NYC Space/Time Directory ETL tool

Extract/Transform/Load tool for NYC Space/Time Directory data: it loads separate data modules which perform ETL tasks, such as downloading and transforming data to the NYC Space/Time Directory data model.

For more information about the NYC Space/Time Directory project, as well as datasets produced by spacetime-etl, see

ETL Modules

Space/Time's ETL modules are separate Node.js modules which need to be installed individually. Each ETL module represents an NYC Space/Time Directory dataset or data transformation and defines a set of steps; spacetime-etl loads these modules and executes the steps they define.

Some examples:

| ETL Module | Description |
|---|---|
| etl-mapwarper | Outlines of maps from Map Warper, NYPL's tool for georectifying historical maps |
| etl-group-maps | Map Warper maps, grouped by decade; used by Maps by Decade |
| etl-spacetime-graph | Graph of all NYC Space/Time Directory datasets |
| etl-oldnyc | Locations of 40,000 geotagged photos from OldNYC |

For more ETL modules, see GitHub.


Configuration

The data tool is configured in the NYC Space/Time Directory configuration file, under the etl key.

The following configuration options must be specified:

| Parameter | Description |
|---|---|
| moduleDir | Path (absolute, or relative to the data tool) where spacetime-etl looks for data modules |
| modulePrefix | Directory prefix used to identify data modules (e.g. etl-mapwarper); default is etl- |
| outputDir | Directory to which ETL modules write their data |


etl:
  modulePrefix: "etl-"
  moduleDir: /Users/bertspaan/code/etl-modules
  outputDir: /Users/bertspaan/data/spacetime/etl
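How moduleDir and modulePrefix work together can be sketched as follows. This is a minimal illustration, not spacetime-etl's actual implementation: the function name findModules is hypothetical, and it assumes that a module's ID is its directory name with the prefix removed (as with etl-mapwarper and mapwarper).

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper (not part of spacetime-etl): lists the directories
// in moduleDir whose name starts with modulePrefix, and derives a module
// ID from each directory name by stripping the prefix
function findModules (moduleDir, modulePrefix = 'etl-') {
  return fs.readdirSync(moduleDir, { withFileTypes: true })
    .filter((entry) => entry.isDirectory() && entry.name.startsWith(modulePrefix))
    .map((entry) => ({
      id: entry.name.slice(modulePrefix.length),
      dir: path.join(moduleDir, entry.name)
    }))
}
```

Called with the moduleDir from the example configuration, this would return one entry per cloned etl-* directory; plain files and directories without the prefix are ignored.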

The separate ETL modules can also be configured in the configuration file. Please see the README of the respective ETL module for more information. Example:

        PPL: 'st:Place'
        PPLX: 'st:Neighborhood'

Usage & Installation

Installing ETL Modules

To use spacetime-etl to run ETL modules, you first need to install them. Go to the directory specified by the moduleDir configuration option, and clone the ETL modules you need, for example:

git clone
git clone
git clone

Then, install the dependencies of each module:

cd etl-nyc-wards
npm install
cd ..
cd etl-mapwarper
npm install
cd ..
cd etl-oldnyc
npm install

You can now use spacetime-etl to run the three ETL modules you have just installed: nyc-wards, mapwarper and oldnyc.

Command-line Interface


Installation:

npm install -g nypl-spacetime/spacetime-etl

Run the data tool without command-line arguments to get a list of the available data modules:

spacetime-etl

To run one or more ETL modules, provide their IDs as command-line arguments:

spacetime-etl mapwarper oldnyc ...

Alternatively, you can select the processing steps you want to run:


By default, all steps are run consecutively.
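Running steps consecutively with callback-style step functions can be sketched as below. This is an illustration under assumptions, not the tool's own code: runSteps is a hypothetical name, the steps are assumed to have the (config, dirs, tools, callback) signature shown in the module example later in this README, and stopping at the first error is an assumption about the desired behavior.

```javascript
// Hypothetical runner (not spacetime-etl's implementation): calls each
// step function in order and only starts the next step after the
// previous one has called its callback without an error
function runSteps (steps, config, dirs, tools, done) {
  let index = 0

  function next (err) {
    // Stop at the first failing step
    if (err) return done(err)
    // All steps finished
    if (index === steps.length) return done(null)

    const step = steps[index]
    index += 1
    step(config, dirs, tools, next)
  }

  next()
}
```

A call such as runSteps([download, transform], {}, dirs, tools, done) would run download first and start transform only once download has finished.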

From a Node.js script


Installation:

npm install nypl-spacetime/spacetime-etl

Usage (to run this example, first install etl-mapwarper, see Installing ETL Modules):

const etl = require('spacetime-etl')

// Fetch all installed ETL modules:
const modules = etl.modules()

// Execute all steps:
etl.execute('mapwarper', (err) => {
  if (err) {
    console.error(err)
  } else {
    console.log('Done!')
  }
})

// Execute a single step:
etl.execute('', (err) => {
  if (err) {
    console.error(err)
  } else {
    console.log('Done!')
  }
})

The resulting data files are written to a subdirectory of the configured output directory: <outputDir>/<step>/mapwarper.
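As a small worked example of this layout, the path can be built with Node's path module. The helper name stepOutputDir is hypothetical, and the step name transform is borrowed from the module example further down.

```javascript
const path = require('path')

// Hypothetical helper: builds <outputDir>/<step>/<module ID>
function stepOutputDir (outputDir, step, moduleId) {
  return path.join(outputDir, step, moduleId)
}

// Uses the outputDir from the example configuration
console.log(stepOutputDir('/Users/bertspaan/data/spacetime/etl', 'transform', 'mapwarper'))
```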

Creating an ETL module from scratch

It's easy! Let's say we want to write a scraper which, very illegally, reads photos and their metadata from the NYC Municipal Archives Online Gallery.

First, create a directory in spacetime-etl's moduleDir with the following name:

mkdir etl-nyc-municipal-archives

In this directory, create two files:

First, nyc-municipal-archives.dataset.json, holding the metadata of the ETL module and the resulting dataset:

{
  "id": "nyc-municipal-archives",
  "title": "NYC Municipal Archives Online Gallery",
  "license": "CC0",
  "description": "The NYC Municipal Archives Online Gallery provides research access to over 900,000 items digitized from the Municipal Archives' vast holdings, including photographs, maps, motion-pictures and audio recordings",
  "author": "Bert Spaan",
  "website": ""
}
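A quick way to catch typos in the metadata file is to parse it and check that the fields used in the example are present. The sketch below is not part of spacetime-etl: missingFields is a hypothetical helper, and treating all six fields as required is an assumption made for illustration.

```javascript
// Fields taken from the example dataset file; whether spacetime-etl
// requires all of them is an assumption
const FIELDS = ['id', 'title', 'license', 'description', 'author', 'website']

// Hypothetical helper: parses the contents of a dataset JSON file and
// returns the names of the fields that are missing.
// Throws a SyntaxError if the text is not valid JSON.
function missingFields (json) {
  const dataset = JSON.parse(json)
  return FIELDS.filter((field) => !(field in dataset))
}
```

It could be called with the file's contents, for example missingFields(fs.readFileSync('nyc-municipal-archives.dataset.json', 'utf8')); an empty array means every field from the example is present.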

The actual code goes in nyc-municipal-archives.js:

function download (config, dirs, tools, callback) {
  // Download data, write data to output directory;
  //   dirs.current contains the path of the
  //   output directory of the current step

  // config object contains configuration from
  //   this module's section (if available)

  callback()
}

function transform (config, dirs, tools, callback) {
  // Read downloaded data from output directory;
  //   the dirs object also contains the path of the
  //   output directory of the download step

  // Do data transformations, and write the
  //   resulting Space/Time objects to disk
  //   using tools.writer

  const object = {
    type: 'object',
    obj: {
      id: 1,
      type: 'st:Photo',
      data: {
        title: '',
        collection: ''
      },
      geometry: {
        type: 'Point',
        coordinates: [
          // longitude, latitude
        ]
      }
    }
  }

  tools.writer.writeObject(object, callback)
}

module.exports.steps = [
  download,
  transform
]
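Before running the module through spacetime-etl, a step function can be exercised on its own with a stand-in for tools.writer. The stub below is a sketch: only the writeObject(object, callback) call is taken from the example above. The in-memory collecting behavior, the createStubWriter and exampleTransform names, and the /tmp/transform path are assumptions for illustration, and the real writer writes to disk.

```javascript
// Stand-in for tools.writer that collects objects in memory
// instead of writing them to disk
function createStubWriter () {
  const written = []
  return {
    written,
    writeObject (object, callback) {
      written.push(object)
      callback()
    }
  }
}

// Hypothetical step with the same signature as the steps above
function exampleTransform (config, dirs, tools, callback) {
  const object = {
    type: 'object',
    obj: { id: 1, type: 'st:Photo' }
  }
  tools.writer.writeObject(object, callback)
}

const writer = createStubWriter()
exampleTransform({}, { current: '/tmp/transform' }, { writer }, (err) => {
  if (err) throw err
  console.log(`Wrote ${writer.written.length} object(s)`)
})
```

Because the stub keeps everything in writer.written, the objects a step produces can be inspected directly after the callback fires.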

You can now run this ETL module with the following command:

spacetime-etl nyc-municipal-archives

Copyright (C) 2015 Waag Society, 2017 The New York Public Library

