#mortchartgen
`mortchartgen` is a tool for creating charts of mortality trends for different countries, age groups and causes of death, based on data from the WHO Mortality Database. The tool uses Pandas and matplotlib to generate the charts and stores the data in a MySQL database. A YAML configuration file specifies the charts to be generated.

I use the tool to generate charts for the website Mortalitetsdiagram ("mortality charts"). Files for this site (excluding the SVG charts themselves) are included in the subdirectory `mortchart-site`. Currently, the generated charts are in Swedish. I am not affiliated with WHO, and they are not responsible for any interpretations of mortality trends based on charts generated from the tool. The tool is licensed under an ISC license.
##Setup
It is assumed that you have a working Python setup, as well as access to a MySQL/MariaDB server, with a user privileged to create databases. The unzipped data files will require about 500 MB of disk space. The script `download.py` imports `requests`, `shutil`, `os` and `zipfile`; `tableimp.py` imports `os` and `subprocess`; and the main script, `chartgen.py`, imports `sqlalchemy`, `numexpr`, `pandas`, `matplotlib`, `yaml`, `os`, `random`, `statsmodels` and `time`.
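Before running the scripts, it can be convenient to check that the modules listed above are importable. The following is a minimal sketch and not part of the repository; the function name `missing_modules` and the `REQUIRED` table are hypothetical, and the module lists are simply copied from the paragraph above.

```python
from importlib.util import find_spec

# Top-level modules named in the setup paragraph, grouped by script.
REQUIRED = {
    "download.py": ["requests", "shutil", "os", "zipfile"],
    "tableimp.py": ["os", "subprocess"],
    "chartgen.py": ["sqlalchemy", "numexpr", "pandas", "matplotlib", "yaml",
                    "os", "random", "statsmodels", "time"],
}


def missing_modules(required=REQUIRED):
    """Return {script: [modules that cannot be found]} for incomplete scripts."""
    missing = {}
    for script, modules in required.items():
        # find_spec returns None when a top-level module is not installed.
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[script] = absent
    return missing


if __name__ == "__main__":
    for script, absent in missing_modules().items():
        print(f"{script}: missing {', '.join(absent)}")
```

The check only tests whether a module can be located; it does not import it or verify its version.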
1. Run `download.py [directory]` to download the data files and documentation from the WHO website and unzip them into `directory`.
2. Read the SQL file `setupdb.sql` into the MySQL client, e.g. `mysql --defaults-extra-file=tableimp.cnf < setupdb.sql`. This will create a database `Morticd` with two tables, `Pop` and `Deaths`, as well as a user `whomuser` with select rights granted on these tables, which is used for the SQL queries in the chart generator. You can use the provided file `tableimp.cnf` in this step and the next, as shown in the example, but then you have to adjust the relevant settings in the file (e.g. user, password, host and socket) in order for the database connection to work. For more information about the fields in the tables, consult the WHO documentation.
3. Load the unzipped data files into the newly created tables. The file `pop` should be loaded into the table `Pop`, and the files with names starting with `Mort` should all be loaded into the table `Deaths`. The script `tableimp.py` loops through the data files and reads them into the tables using `mysqlimport`. You can call the script with `tableimp.py [directory]`, where `directory` is the download directory specified in step 1. The default configuration is to read the files locally from the client, and this has to be supported by the MySQL server. Otherwise, move the files into a location where the server can read them directly and remove the `local` option in `tableimp.cnf`.
4. Run `tablemod.py`. This stores tables of population and number of deaths (for the populations and cause-of-death groups specified in `chartgen.yaml`) in a SQLite database, `chartgen.db`. This speeds up the chart generation (see below) by avoiding repeated querying of the MySQL database with regular expressions. Some values in the dictionary `conn_config` (read from `settings` in `chartgen.yaml`) may also have to be changed in order for the database connection to work. In particular, you should change `host` and `unix_socket` to suit your MySQL server.
##Generate the charts
Call the function `batchplot` in `chartgen.py` to generate the charts. This function is called automatically if `chartgen.py` is invoked from the system shell. The charts are saved as SVG files in the subdirectory `mortchart-site/charts`. If you want to skip certain countries, age groups or causes of death, comment out the relevant lines in `chartgen.yaml`. However, the cause `all` cannot be excluded, because it is used to compute the percentage of total deaths for other causes.
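To see why `all` is required, consider how a percentage of total deaths is computed: the count for a given cause is divided by the count for `all` in the same year. The sketch below uses plain dictionaries rather than the dataframes in `chartgen.py`; the function name, the data layout and the cause key `circ` are hypothetical, and the numbers are made up.

```python
def percent_of_total(deaths_by_cause, cause):
    """Return deaths from `cause` as a percentage of all deaths, per year."""
    if "all" not in deaths_by_cause:
        raise KeyError("the cause 'all' is needed as the denominator")
    total = deaths_by_cause["all"]
    return {
        year: 100.0 * n / total[year]
        for year, n in deaths_by_cause[cause].items()
        if total.get(year)  # skip years with a missing or zero total
    }


# Made-up numbers, for illustration only.
example = {
    "all": {2000: 1000, 2001: 1200},
    "circ": {2000: 450, 2001: 480},
}
print(percent_of_total(example, "circ"))  # {2000: 45.0, 2001: 40.0}
```

Without the `all` entry there is no denominator, so the function raises an error instead of returning percentages.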
##CSV generation
If `savecsv` under `settings` in `chartgen.yaml` is `true`, `chartgen.py` will save the dataframes used to generate the charts as CSV files in the subdirectory `csv`, so that they can be further analysed in other programs.
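The switch can be illustrated with a small standard-library sketch. It is not the code in `chartgen.py`, which saves pandas dataframes; the function name `save_rows_if_enabled`, its parameters and the file naming are assumptions made for the example, and `settings` stands for the `settings` mapping after it has been read from `chartgen.yaml`.

```python
import csv
from pathlib import Path


def save_rows_if_enabled(settings, name, header, rows, outdir="csv"):
    """Write rows to outdir/name.csv when settings['savecsv'] is true.

    Returns the path written, or None when CSV output is disabled.
    """
    if not settings.get("savecsv", False):
        return None
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{name}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

When `savecsv` is false or absent, nothing is written and the output directory is not created.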
##Special charts with R
The R script `specchartgen.r` demonstrates how the generated CSV files can be used. It contains the functions `agetrends.plot`, which generates charts showing secular trends for a given combination of sex, cause and an interval of 5-year age groups; `sexratio.trends.plot`, which generates charts showing secular trends in sex ratios for mortality rates/percentages; and `ctrisyear.plot`, which generates charts comparing mortality between countries for a given cause and year. `ctrisyear.plot` can generate scatterplots of female vs. male mortality or bar charts for a single sex. The function `ctriesyr.batchplot` uses `ctrisyear.plot` to generate charts for all causes and age groups in `chartgen.yaml` and for all years in a given sequence, and exports these as SVG files in the subdirectory `mortchart-site/charts/ctriesyr`. The function `causedist.plot` generates charts of the age-specific distribution of causes of death for a given country, sex and year. By default, the list of causes is read from `causedist` in `chartgen.yaml`.
All charts are generated using ggplot2, and the script also uses the packages tidyr, yaml, XML, gridSVG, plyr and rjson.
The function `lmortfunc.test` in `specchartgen.r` can be used to perform so-called Gompertzian analysis of mortality trends. By calling the function `paramsplot` in `mortparams.py`, results with parameters can be plotted using the TeX facilities in matplotlib.
In addition to the packages used by `chartgen.py`, `mortparams.py` imports rpy2 for communication with R. The model is fitted with Levenberg-Marquardt nonlinear least squares (using minpack.lm). If `lmortfunc.test` is called with `mortfunc = 'weibull'`, the mortality data are fitted to the two-parameter Weibull function instead of the Gompertz function (cf. Juckett and Rosenberg (1993)). It is also possible to fit survivorship curves for the subpopulation that dies of a particular cause, instead of mortality curves, by calling `lmortfunc.test` with `type = 'surv'`. Fits of the mortality curves corresponding to these survival curves (i.e. normalized to the fraction dying of the given cause) can be obtained by calling the function with `type = 'rate'` (the default) and `normrate = TRUE`. For this normalization, life tables are constructed using LifeTables.
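As a rough illustration of what fitting a Gompertz curve means, the sketch below assumes the common form rate(age) = a * exp(b * age); the exact parametrization used in `specchartgen.r` may differ. Note that the sketch deliberately swaps in a simpler technique: ordinary least squares on the logarithm of the rates, not the Levenberg-Marquardt nonlinear fit described above. The function names and the synthetic data are hypothetical.

```python
import math


def gompertz(age, a, b):
    """Gompertz mortality rate at a given age: a * exp(b * age)."""
    return a * math.exp(b * age)


def fit_gompertz_loglinear(ages, rates):
    """Estimate (a, b) by least squares on ln(rate) = ln(a) + b * age."""
    n = len(ages)
    logs = [math.log(r) for r in rates]
    mean_x = sum(ages) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, logs))
    b = sxy / sxx                       # slope of the log-linear fit
    a = math.exp(mean_y - b * mean_x)   # intercept, transformed back
    return a, b


# Synthetic rates generated from known parameters, not real data.
ages = list(range(40, 90, 5))
rates = [gompertz(x, 1e-4, 0.09) for x in ages]
a_hat, b_hat = fit_gompertz_loglinear(ages, rates)
```

On noise-free synthetic data the log-linear fit recovers the parameters used to generate it. On real data the two approaches weight the observations differently, so their estimates will generally not coincide.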
By calling `lmortfunc.test` with `pc = 'p'` or `pc = 'c'`, the analysis can be performed by period or by birth cohort; the latter is only implemented for unnormalized mortality curves, however.
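The difference between the two modes lies in which observations are grouped into one curve: a period analysis takes all ages in one calendar year, while a cohort analysis follows people born in the same year across calendar years. The sketch below is an assumption-laden illustration, not the logic of `lmortfunc.test`: the function name and data layout are hypothetical, ages are single numbers (for example the lower bound of an age group), and the birth year is approximated as year minus age.

```python
def select_observations(rates, pc, value):
    """Pick age-specific rates for one period or one birth cohort.

    rates maps (year, age) to a mortality rate; pc is 'p' or 'c'.
    For pc='p', value is a calendar year; for pc='c', a birth year.
    """
    if pc not in ("p", "c"):
        raise ValueError("pc must be 'p' or 'c'")
    selected = {}
    for (year, age), rate in sorted(rates.items()):
        # Period: match on calendar year. Cohort: match on approximate birth year.
        key = year if pc == "p" else year - age
        if key == value:
            selected[age] = rate
    return selected


# Made-up rates, for illustration only.
demo = {(1950, 50): 0.010, (1960, 50): 0.009, (1960, 60): 0.020, (1970, 70): 0.050}
period_1960 = select_observations(demo, "p", 1960)
cohort_1900 = select_observations(demo, "c", 1900)
```

Here `period_1960` contains the rates at ages 50 and 60 observed in 1960, whereas `cohort_1900` follows the 1900 birth cohort through ages 50, 60 and 70.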
By calling `obspred_plot` in `mortparams.py` on an object returned by `paramsplot`, it is possible to plot observed data for a list of years against the predictions made by the nonlinear regressions.
##Generate the index page and documentation source
By running `mortchart-site/indexgen.py` you can generate `index.html` and `mortchartdoc_norefhead.md` in `mortchart-site`, based on the settings in `chartgen.yaml` and the templates `index.jinja` and `mortchartdoc.jinja` in `mortchart-site/jinjatempl` (which use Jinja2). The first file contains a bare form, which you can use to search among the charts in a web browser, and the second file can be used to generate the site documentation in PDF or HTML format.
##Generate docs
Run `make pdfbib` in `mortchart-site` to generate PDF documentation from the Markdown source. This requires a LaTeX distribution as well as Pandoc (to convert the Markdown). The HTML documentation is generated automatically when the site is built (see below).
##Generate the Mortchart site
The full site is now generated using Hakyll, a static site generator which is tightly integrated with Pandoc and uses the Haskell compiler GHC. To generate the site for the first time, run `make buildinit` in the directory `mortchart-site` (it will be generated in `mortchart-site/_site`). The program assumes that the charts (both those made by `chartgen.py` and those made by `ctriesyr.batchplot` in `specchartgen.r`) have been generated. To update the site, run `make build`; if you modify `site.hs`, update with `make rebuild`.