hhtyo/great-east-japan-earthquake-evacuees

Great East Japan Earthquake Evacuees. Data are sourced from Japan's Reconstruction Agency. Two series are provided in this Data Package: monthly totals and monthly totals by prefecture.

Background

The inspiration for extracting these data came from a comment and a visualization shared by NHK News Web developer Satoshi Yamamoto in the European Journalism Centre's Doing Journalism with Data MOOC.

Data

Description

Great East Japan Earthquake evacuees by prefecture, compiled by Japan's Reconstruction Agency, and originally published as PDF files.

Data Preparation

Processing

Run the following command from this directory to download and process the data:

make data

Processing Notes

Extraction

tabula-extractor, an Apache PDFBox-powered, JRuby "table extraction engine", is used to extract the title, date, and evacuees by prefecture data from pages three to five of each file.

Transformation

In the intermediate CSVs, column totals and calculated changes in evacuee numbers are deleted. The first column, which holds prefecture IDs and names, is split into two columns: ISO 3166-2:JP codes and prefecture names. Any note that appeared in a table cell is moved to a new column at the end of the same row. Lastly, the data date is appended to the end of each row. Missing or empty values are marked with 'NA'.
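As a rough illustration of the split, fill, and date steps described above, here is a minimal Python sketch. It is not the repository's actual processing script: the function name `transform_row`, the assumed shape of the first column (a one- or two-digit prefecture ID followed by the name), and the date format in the example are all assumptions made for illustration. Note handling and the removal of totals are left out.

```python
import re


def transform_row(row, data_date):
    """Transform one row of an intermediate CSV (hypothetical helper).

    Assumes the first cell looks like "<prefecture ID> <name>", for example
    "1 Hokkaido", which is an assumption about the extracted layout.
    """
    first, *values = row

    # Split the combined ID/name cell into its two parts.
    match = re.match(r"\s*(\d{1,2})\s*(.+?)\s*$", first)
    if match is None:
        raise ValueError(f"unexpected prefecture cell: {first!r}")

    # ISO 3166-2:JP codes have the form JP-01 through JP-47.
    iso_code = f"JP-{int(match.group(1)):02d}"
    name = match.group(2)

    # Mark missing or empty values with 'NA'.
    cleaned = [v.strip() if v.strip() else "NA" for v in values]

    # Append the data date to the end of the row.
    return [iso_code, name, *cleaned, data_date]


example = transform_row(["1 Hokkaido", "1,234", "", " 56 "], "2014-05-15")
print(example)
```

Keeping the date as the last field of every row means rows from different monthly files can be concatenated into one series without losing track of which report each row came from.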

Resources

The original PDFs are downloaded to ./archive/pdf, the CSVs extracted from them are output to ./archive/csv, and the processed data are output to ./data.

License

ODC-PDDL-1.0

These data are made available under the Public Domain Dedication and License v1.0, whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/
