Great East Japan Earthquake Evacuees. Data are sourced from Japan's Reconstruction Agency. Two series are provided in this Data Package: monthly totals and monthly totals by prefecture.
The inspiration for extracting these data came from a comment and a visualization shared by NHK News Web developer Satoshi Yamamoto in the European Journalism Centre's Doing Journalism with Data MOOC.
Great East Japan Earthquake evacuees by prefecture, compiled by Japan's Reconstruction Agency, and originally published as PDF files.
Run the following command from this directory to download and process the data:
make data
Extraction
tabula-extractor, an Apache PDFBox-powered JRuby "table extraction engine", is used to extract the title, date, and evacuees-by-prefecture data from pages three to five of each file.
Transformation
In the intermediate CSVs, column totals and calculated changes in evacuee numbers are deleted. The first column, which holds prefecture IDs and names, is split into two columns: ISO 3166-2:JP codes and prefecture names. Any note found in a table cell is moved to a new column at the end of the same row. Lastly, the data date is appended to the end of each row. Missing or empty values are marked with 'NA'.
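The per-row steps described above can be sketched in Python as follows. This is only an illustration, not the script that `make data` runs: the function name `transform_row`, the assumed cell layouts (a first cell such as "04 Miyagi", and notes trailing a number inside a cell), and the date format are all assumptions made for the example. It also assumes totals and change columns have already been dropped.

```python
import re

# Assumed cell layout: an optional number (with thousands separators),
# optionally followed by free-text note. Hypothetical, for illustration only.
CELL_RE = re.compile(r"^\s*([\d,]*)\s*(.*?)\s*$")


def transform_row(row, data_date):
    """Transform one extracted row (a list of strings) into a processed row.

    Assumes row[0] looks like '04 Miyagi': a numeric prefecture ID,
    whitespace, then the prefecture name.
    """
    pref_id, pref_name = row[0].split(None, 1)
    # ISO 3166-2:JP codes have the form JP-01 .. JP-47.
    code = f"JP-{int(pref_id):02d}"

    values, notes = [], []
    for cell in row[1:]:
        number, note = CELL_RE.match(cell).groups()
        # Strip thousands separators; mark missing or empty values with 'NA'.
        values.append(number.replace(",", "") or "NA")
        if note:
            notes.append(note)

    # Notes go in a new column at the end of the row, then the data date.
    return [code, pref_name, *values, "; ".join(notes) or "NA", data_date]


if __name__ == "__main__":
    print(transform_row(["04 Miyagi", "1,234", "", "56 see note"], "2014-06-12"))
```

Keeping notes in a single trailing column leaves the numeric columns clean, so they can be parsed as integers without special cases.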
The original PDFs are downloaded to ./archive/pdf, the extracted CSVs are output to ./archive/csv, and the processed data are output to ./data.
These data are made available under the Public Domain Dedication and License v1.0 whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/