NYC Space/Time Directory ETL tool

Extract/Transform/Load tool for NYC Space/Time Directory data: it loads separate data modules that perform ETL tasks, such as downloading data and transforming it into the NYC Space/Time Directory data model.

For more information about the NYC Space/Time Directory project, as well as datasets produced by spacetime-etl, see

ETL Modules

Space/Time's ETL modules are separate Node.js modules which need to be installed individually. Each ETL module represents a NYC Space/Time Directory dataset or data transformation, and defines a set of steps; spacetime-etl loads these modules, and executes the steps they define.
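As a rough illustration of that structure: an ETL module exports its steps as an array of functions (see module.exports.steps in the example module under Creating an ETL module from scratch). The sketch below lists a loaded module's step names. stepNames is a hypothetical helper, and deriving a step's name from its function name is an assumption made here, not documented spacetime-etl behavior.

```javascript
// Hypothetical helper: list the step names of a loaded ETL module.
// Assumption: a step's name is the name of its function (e.g. 'download').
function stepNames (etlModule) {
  if (!etlModule || !Array.isArray(etlModule.steps)) {
    throw new Error('ETL module must export a steps array')
  }
  return etlModule.steps.map((step) => step.name)
}

// Inline stand-in for a module that would normally be loaded with require()
const exampleModule = {
  steps: [
    function download (config, dirs, tools, callback) { callback() },
    function transform (config, dirs, tools, callback) { callback() }
  ]
}

console.log(stepNames(exampleModule)) // [ 'download', 'transform' ]
```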

Some examples:

| ETL module | Description |
|---|---|
| etl-mapwarper | Outlines of maps from Map Warper, NYPL's tool for georectifying historical maps |
| etl-group-maps | Map Warper maps, grouped by decade; used by Maps by Decade |
| etl-spacetime-graph | Graph of all NYC Space/Time Directory datasets |
| etl-oldnyc | Locations of 40,000 geotagged photos from OldNYC |

For more ETL modules, see GitHub.


Configuration

spacetime-etl is configured in the NYC Space/Time Directory configuration file, under the etl key.

The following configuration options must be specified:

| Parameter | Description |
|---|---|
| moduleDir | Path (absolute, or relative to the data tool) where spacetime-etl looks for data modules |
| modulePrefix | Directory name prefix used to identify data modules (e.g. the etl- in etl-mapwarper); defaults to etl- |
| outputDir | Directory to which ETL modules write their data |


Example:

etl:
  modulePrefix: "etl-"
  moduleDir: /Users/bertspaan/code/etl-modules
  outputDir: /Users/bertspaan/data/spacetime/etl
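A minimal sketch of how moduleDir and modulePrefix might fit together, assuming a module's ID is its directory name with the prefix stripped (etl-mapwarper becomes mapwarper, matching the module IDs used on the command line). findModules is a hypothetical helper, not part of spacetime-etl's API, and baseDir stands in for the data tool's directory.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical helper: list IDs of ETL modules installed in moduleDir.
// Assumes each module lives in a directory named <modulePrefix><id>.
// A relative moduleDir is resolved against baseDir; an absolute one is used as-is.
function findModules (moduleDir, modulePrefix = 'etl-', baseDir = process.cwd()) {
  const dir = path.resolve(baseDir, moduleDir)
  return fs.readdirSync(dir, { withFileTypes: true })
    // Keep only directories that carry the module prefix
    .filter((entry) => entry.isDirectory() && entry.name.startsWith(modulePrefix))
    // Strip the prefix to get the module ID
    .map((entry) => entry.name.slice(modulePrefix.length))
    .sort()
}
```

With the three modules from Installing ETL Modules cloned, findModules('/Users/bertspaan/code/etl-modules') would return ['mapwarper', 'nyc-wards', 'oldnyc'] under these assumptions.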

Individual ETL modules can also be configured in this configuration file; see the README of each ETL module for details. Example:

        PPL: 'st:Place'
        PPLX: 'st:Neighborhood'

Usage & Installation

Installing ETL Modules

To use spacetime-etl to run ETL modules, you first need to install them. Go to the directory specified by the moduleDir configuration option, and clone the ETL modules you need, for example:

git clone
git clone
git clone

Then, install the dependencies of each module:

cd etl-nyc-wards
npm install
cd ..
cd etl-mapwarper
npm install
cd ..
cd etl-oldnyc
npm install

You can now use spacetime-etl to run the three ETL modules you have just installed: nyc-wards, mapwarper and oldnyc.

Command-line Interface


Install spacetime-etl globally:

npm install -g nypl-spacetime/spacetime-etl

Run spacetime-etl without command-line arguments to get a list of the available data modules:

spacetime-etl


To run one or more ETL modules, provide their IDs as command-line arguments:

spacetime-etl mapwarper oldnyc ...

Alternatively, you can select the processing steps you want to run:


By default, all steps are run consecutively.
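Running steps consecutively can be sketched as a callback chain that stops at the first error. runSteps is a hypothetical helper, not spacetime-etl's implementation; it assumes steps have the (config, dirs, tools, callback) signature used by ETL modules and that each step writes to <outputDir>/<step>/<module ID>. It only sets dirs.current and does not create any directories.

```javascript
const path = require('node:path')

// Hypothetical runner: call a module's steps one after another,
// stopping at the first error.
function runSteps (moduleId, steps, options, callback) {
  const { config = {}, tools = {}, outputDir } = options
  let index = 0

  function next (err) {
    if (err) return callback(err)
    if (index >= steps.length) return callback(null)
    const step = steps[index++]
    // Assumed layout: <outputDir>/<step>/<moduleId>
    const dirs = { current: path.join(outputDir, step.name, moduleId) }
    step(config, dirs, tools, next)
  }

  next(null)
}
```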

From a Node.js script


Install spacetime-etl in your project:

npm install nypl-spacetime/spacetime-etl

Usage (to run this example, first install etl-mapwarper, see Installing ETL Modules):

const etl = require('spacetime-etl')

// Fetch all installed ETL modules:
const modules = etl.modules()

// Execute all steps:
etl.execute('mapwarper', (err) => {
  if (err) {
    console.error(err.message)
  } else {
    console.log('Done!')
  }
})

// Execute a single step:
etl.execute('', (err) => {
  if (err) {
    console.error(err.message)
  } else {
    console.log('Done!')
  }
})

The produced data files are written to a subdirectory of the configured output directory: <outputDir>/<step>/mapwarper.
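In the example above, both etl.execute calls are started right away, so the two runs may overlap. If you need modules to run one after another, a small Promise wrapper can help. This sketch only assumes execute(id, callback) as used above; executeAsync and runSequentially are hypothetical helpers. util.promisify(etl.execute.bind(etl)) would be an alternative; the explicit Promise just keeps the behavior easy to see.

```javascript
// Wrap a callback-style execute(id, callback) in a Promise.
function executeAsync (etl, id) {
  return new Promise((resolve, reject) => {
    etl.execute(id, (err) => (err ? reject(err) : resolve()))
  })
}

// Run modules strictly one after another; rejects on the first error,
// and later modules are not started.
async function runSequentially (etl, ids) {
  for (const id of ids) {
    await executeAsync(etl, id)
  }
}

// Usage, assuming etl = require('spacetime-etl') and both modules installed:
// runSequentially(etl, ['mapwarper', 'oldnyc'])
//   .then(() => console.log('Done!'))
//   .catch((err) => console.error(err.message))
```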

Creating an ETL module from scratch

It's easy! Let's say we want to write a scraper which, very illegally, reads photos and their metadata from the NYC Municipal Archives Online Gallery.

First, create a directory in spacetime-etl's moduleDir with the following name:

mkdir etl-nyc-municipal-archives

In this directory, create two files:

First, nyc-municipal-archives.dataset.json, holding the metadata of the ETL module and the resulting dataset:

{
  "id": "nyc-municipal-archives",
  "title": "NYC Municipal Archives Online Gallery",
  "license": "CC0",
  "description": "The NYC Municipal Archives Online Gallery provides research access to over 900,000 items digitized from the Municipal Archives' vast holdings, including photographs, maps, motion-pictures and audio recordings",
  "author": "Bert Spaan",
  "website": ""
}
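In this example the module directory, the dataset file and the id field share one name (etl-nyc-municipal-archives, nyc-municipal-archives.dataset.json, "nyc-municipal-archives"), so a quick check can catch typos early. checkDatasetFile is a hypothetical helper, not something spacetime-etl provides, and it assumes this etl-<id> / <id>.dataset.json naming convention.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Hypothetical check: the directory etl-<id> should contain
// <id>.dataset.json whose "id" field equals <id>.
function checkDatasetFile (moduleDirPath, modulePrefix = 'etl-') {
  const dirName = path.basename(moduleDirPath)
  if (!dirName.startsWith(modulePrefix)) {
    throw new Error(`${dirName} does not start with ${modulePrefix}`)
  }
  const id = dirName.slice(modulePrefix.length)
  const file = path.join(moduleDirPath, `${id}.dataset.json`)
  const dataset = JSON.parse(fs.readFileSync(file, 'utf8'))
  if (dataset.id !== id) {
    throw new Error(`id in ${path.basename(file)} is "${dataset.id}", expected "${id}"`)
  }
  return dataset
}
```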

The actual code goes in nyc-municipal-archives.js:

function download (config, dirs, tools, callback) {
  // Download data, write data to output directory;
  //   dirs.current contains the path of the
  //   output directory of the current step

  // config object contains configuration from
  //   this module's section (if available)

  // Signal that this step is finished
  callback()
}

function transform (config, dirs, tools, callback) {
  // Read downloaded data from output directory;
  //   dirs also contains the path of the
  //   output directory of the download step

  // Do data transformations, and write the
  //   resulting Space/Time objects to disk
  //   using tools.writer

  const object = {
    type: 'object',
    obj: {
      id: 1,
      type: 'st:Photo',
      data: {
        title: '',
        collection: ''
      },
      geometry: {
        type: 'Point',
        coordinates: [/* longitude, latitude */]
      }
    }
  }

  tools.writer.writeObject(object, callback)
}

module.exports.steps = [
  download,
  transform
]

You can now run this ETL module with the following command:

spacetime-etl nyc-municipal-archives
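To try out a single step without spacetime-etl, you could call it with a stub tools object. This is a sketch that assumes your steps only use tools.writer.writeObject(object, callback), as the example module does; createStubTools and runStep are hypothetical names, and the stub keeps objects in memory instead of writing them to disk.

```javascript
// Stub tools object: collects written objects in memory.
function createStubTools () {
  const written = []
  return {
    written,
    writer: {
      writeObject (object, callback) {
        written.push(object)
        // Call back asynchronously, as real I/O would
        setImmediate(callback)
      }
    }
  }
}

// Run one step and resolve with the objects it wrote.
function runStep (step, config, dirs) {
  const tools = createStubTools()
  return new Promise((resolve, reject) => {
    step(config, dirs, tools, (err) => (err ? reject(err) : resolve(tools.written)))
  })
}

// Usage with a step shaped like transform above (illustrative data):
function transform (config, dirs, tools, callback) {
  tools.writer.writeObject({ type: 'object', obj: { id: 1, type: 'st:Photo' } }, callback)
}

runStep(transform, {}, { current: '/tmp/transform' })
  .then((objects) => console.log(objects.length)) // 1
```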

Copyright (C) 2015 Waag Society, 2017 The New York Public Library

