
Dump of code used in the making of my "The Map of History" map

The Map of History

A more detailed blog post can be found on my website

This markdown file focuses on the specific scripts and code used, not the why and what of the project. See the above link for details.

Brief description

This project attempts to show the geographic locations in which history is recorded by using the date and location of death of people with entries in Wikidata

The following code is used in the process of converting the raw database dump into the final maps

note The scripts listed in this repo are in no way efficient or perfect; they just work. They are not intended for reuse, just bodged code. Do not use them for any important application.

I - Filtering the Data

inspect-data.js can be used to print the first few lines of a file to the console. This is useful for very large files that text editors can't handle

  • The entire Wikidata JSON dump is downloaded from Wikidata
  • The database dump is decompressed (in this case, I downloaded the gzip version and used 7-zip to decompress)
  • count-lines.js is used to estimate the size of the dump, although a number is already given directly by Wikidata
  • filter.js is used to filter out the data that will not be used. The results are written into another JSON file. The resulting file is about 140 megabytes.
    • These scripts are run with Node, written in JS, and not optimized for speed, which makes them pretty slow, but the dataset is not too big so that's acceptable to me
  • Data is filtered again with second-filter.js: unused entries are removed and dates are converted to Number. Note that the calendar system doesn't really matter because precision is not of concern here (at least for the Gregorian-Julian difference)
  • IMPORTANT: the string 'HALT' is manually appended on a new line in sfiltered.json
  • get-location-entity-list.js is used to extract the unique IDs of needed locations
  • The file generated by the previous script, along with the main database dump, is used by get-coord-from-dump.js to generate a lookup table of needed coordinates
  • Commas and brackets are manually added to the resulting file; I forgot to code that in and never fixed it, but it's a simple find-and-replace
  • generate-final-data.js is used to generate ready.json, a list of data that can be trivially categorized and plotted onto a map
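The per-line filtering idea can be sketched roughly like this (a hypothetical simplification; the real filter.js may keep different fields). Each line of the Wikidata JSON dump is one entity followed by a comma, so lines can be parsed independently; P570 is the date-of-death property and P20 is place of death.

```javascript
// Hypothetical simplification of the filtering step. Each line of the
// Wikidata JSON dump is one entity followed by a comma (the first and
// last lines are the array brackets), so lines parse one at a time.
function filterLine(line) {
  const trimmed = line.trim().replace(/,$/, '');
  if (!trimmed.startsWith('{')) return null; // skip the '[' / ']' wrapper lines
  let entity;
  try { entity = JSON.parse(trimmed); } catch { return null; }
  // P570 = date of death, P20 = place of death
  const died = entity.claims?.P570?.[0]?.mainsnak?.datavalue?.value?.time;
  const place = entity.claims?.P20?.[0]?.mainsnak?.datavalue?.value?.id;
  if (!died || !place) return null; // drop people without both values
  return { id: entity.id, died, place };
}
```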

The flow of data from the database dump to the map-ready JSON can also be seen in the following diagram

data flow diagram
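The two lookup steps in the list above — collecting the unique place IDs, then pulling each place's coordinate out of the dump — might look something like this sketch (field shapes assumed from Wikidata's dump format; P625 is the coordinate-location property):

```javascript
// Hypothetical sketch of the coordinate-lookup step. `wanted` holds the
// unique place QIDs from the filtered people; P625 is Wikidata's
// coordinate-location property.
function uniquePlaceIds(people) {
  return new Set(people.map((p) => p.place));
}

function coordLookup(entities, wanted) {
  const table = {};
  for (const e of entities) {
    if (!wanted.has(e.id)) continue; // only places we actually need
    const v = e.claims?.P625?.[0]?.mainsnak?.datavalue?.value;
    if (v) table[e.id] = { lat: v.latitude, lon: v.longitude };
  }
  return table;
}
```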

note These scripts are not perfect; some will drop a few entries. That should be fine for this application, but you might want to add safeguards against dropped entries if you want to use these scripts for something else.

II - Analyzing and preparing the data

Before starting, brackets need to be added to ready.json and the trailing comma needs to be removed in order to make it a valid JSON array
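This manual fix-up could also be done in a couple of lines, for instance (a sketch, not part of the repo):

```javascript
// Sketch of the manual fix-up: drop the trailing comma and wrap the
// concatenated objects in brackets so the file parses as a JSON array.
function makeValidJsonArray(text) {
  const body = text.trim().replace(/,\s*$/, '');
  return '[\n' + body + '\n]';
}
```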

final-data-analysis.js can be used to view a very rudimentary analysis of the data
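As an illustration, "rudimentary analysis" could be as little as an entry count and a year range (a hypothetical stand-in; the actual script may report other figures):

```javascript
// Hypothetical stand-in for the kind of stats such a script might print:
// entry count plus the earliest and latest year of death.
function summarize(records) {
  const years = records.map((r) => r.year);
  return {
    count: years.length,
    earliest: Math.min(...years),
    latest: Math.max(...years),
  };
}
```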

  • sort-and-to-CSV.js is used to sort the data and convert it to CSV. This script also anonymizes the data and drops some columns.
    • Then:
    • the first row is manually added
    • the CSV is manually split into files, although this could easily be done with a script
    • the coordinates (90,180) and (-90,-180) are added to the CSV to ensure Datashader plots the entire world map
  • plot.py is used to plot the data into a PNG.
    • It is recommended that Conda is used with this script
    • Datashader is the main driver of the plotting
  • The PNGs output by the last script can be used as masks for the next step
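The sort-and-anonymize step might be sketched like this (hypothetical field names; the real script's columns may differ):

```javascript
// Hypothetical sketch: sort by year of death and emit anonymous
// lat,lon,year CSV rows, dropping names and Wikidata IDs.
function toCsvRows(records) {
  return records
    .slice() // don't mutate the caller's array
    .sort((a, b) => a.year - b.year)
    .map((r) => `${r.lat},${r.lon},${r.year}`);
}
```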

III - Making the final images

The PNGs from the previous step can be imported into an image editor, masked, and superimposed over a world map. As for the background, Wikimedia Commons' File:BlankMap-Equirectangular.svg worked well after scaling, thanks to its projection and easy-to-adjust colors. I also did a little blurring (and masking of the blur) to make stray points more visible.

Final touches and captions are added, and the image is ready

The final products and more details can be seen here