This package can be used to perform statistical analysis on mortality data files available from CDC. The SQL import scripts should work with data files from 2005 and later; you should, however, review the documentation for the years you are interested in in order to understand the structure of the files.
It is assumed that you have Julia installed, as well as access to a MySQL/MariaDB server where you can create databases.
- Use the file
src/Usmort.sqlto create the database and the
usmuseraccount with SELECT rights on the database tables, e.g.
mysql -u root -p <Usmort.sql.
- Download the zipped data file for the year you are interested in e.g.
wget ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/DVS/mortality/mort2006us.zipUnzip the file. Note that the size of the uncompressed data file is more that 1 GB.
- Rename the uncompressed data file to
src/Usdeathsimp.sqlto import it into the database, e.g.
mysql -u root -p --local-infile=1 <Usdeathsimp.sql.
- Definitions of different causes of deaths are taken from the data file used
by my Mortchartgen package. You
can install that package in Julia e.g. by
ageca is used to build a DataFrame with the number of deaths for
a given sex and year matching regular expressions in the underlying cause of
death and the concatenation of the entity-axis conditions on the death certificate,
grouped by 27 age groups. The function
caprop returns a DataFrame with the
relative number of deaths for a pair of frames returned by
Data for some causes of death and dimensions such as race, level of education,
martial status and place of death is imported from the file
which can be easily extended.
In order to retrieve all deaths for males in 2006:
using Usmort allexpr = caexpr(:all) allm06 = ageca(2006, :M, allexpr)
In order to retrieve all deaths among females in 2006 with respiratory infection (ICD-10 J00--J22) on the death certificate, and then calculate the proportion of these deaths with circulatory disease as underlying cause:
using Usmort allexpr = caexpr(:all) respinfexpr = Usmort.caexpr(:respinf) circexpr = Usmort.caexpr(:circ) respinff06ent = ageca(2006, :F, allexpr, [respinfexpr]) circrespinff06ent = ageca(2006, :F, circexpr, [respinfexpr]) circrespinff06entp = caprop(circrespinff06ent, respinff06ent)
Queries can be refined by using keywords in the
ageca calls. If a par of
[a,b] is given as values of the
edu03 keyword arguments,
the query will select records with the person's education coded according to
the 1989 and 2003 standards with the education level within the interval
Further keyword arguments may be given in the format
Field = [expression, operator], where
expression is an expression to match and
the name of a MySQL comparison operator or function like
for valid field names (and compare their position with the documentation for
the data files). In order to retrieve all 2006 deaths among never-married
females with lower than high school education:
using Usmort, MySQL allexpr = caexpr(:all) lowed89 = [0,8] lowed03 = [1,1] allflowedsing06 = ageca(2006, :F, allexpr; edu89 = lowed89, edu03 = lowed03, Mart = ["S", "=", MYSQL_TYPE_VARCHAR])
The performance of queries can often be improved by adding indexes on e.g. the
Datayear fields in the
In order to plot age-specific proportions of deaths in 2006 due to tumors (as underlying cause) among females for different levels of education:
using Usmort ed06fall = framedict(2006, :F, :all, :ed) ed06ftum = framedict(2006, :F, :tum, :ed) propplot(ed06ftum, ed06fall)
In order to plot age-specific proportions of deaths in 2006 due to respiratory infection (as mentioned on the death certificate) among females for different levels of education:
using Usmort ed06fall = framedict(2006, :F, :all, :ed) ed06fallrespinf = framedict(2006, :F, :all, :ed, [:respinf]) propplot(ed06fallrespinf, ed06fall)
In order to plot age-specific proportions of deaths in 2006 due to respiratory infection among females stacked by place of death:
using Usmort place06frespinf = framedict(2006, :F, :respinf, :dplace) stackdimplot(place06frespinf)
In order to plot age-specific proportions of deaths in 2006 and 2014 due to tumors among females with short education:
using Usmort ed06fall = framedict(2006, :F, :all, :ed) ed06ftum = framedict(2006, :F, :tum, :ed) ed14fall = framedict(2014, :F, :all, :ed) ed14ftum = framedict(2014, :F, :tum, :ed) edftum = [ed06ftum; ed14ftum] edfall = [ed06fall; ed14fall] groupyearplot(edftum, edfall, 1)