
Augmenter Importer specification

Rémy Greinhofer edited this page May 11, 2019 · 5 revisions

Goal

The goal of the augmenters is to automatically augment the raw data sets with extra information.

An example is the geocoding augmenter, which adds coordinates (latitude and longitude) to a fatality entry.

How it works

An augmenter generates the data used to augment a raw data set. It writes this extra data to a file, called an "augmentation", which is later injected into the raw data set.

This process is very similar to generating database migrations and then applying them: augmentations are applied using the scrapd-merger tool.
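As a rough illustration of what an augmenter produces, the Python sketch below builds an augmentation from a raw data set. It is not taken from any existing ScrAPD tool: `build_augmentation`, the `lookup` callback (standing in for a service such as a geocoder), and the `Location` field used in the usage example are all hypothetical. Entries without results are skipped, matching the default described under Options.

```python
from typing import Any, Callable, Dict, List, Optional

Entry = Dict[str, Any]


def build_augmentation(
    raw_entries: List[Entry],
    lookup: Callable[[Entry], Optional[Entry]],
) -> List[Entry]:
    """Build an augmentation: one object per case, holding only the extra fields.

    ``lookup`` is a hypothetical callback standing in for the external service
    (e.g. a geocoder); it returns the extra fields for an entry, or None.
    """
    augmentation: List[Entry] = []
    for entry in raw_entries:
        case = entry.get("Case")
        if case is None:
            # The merger matches on "Case", so such an entry could never be applied.
            continue
        extra = lookup(entry)
        if not extra:
            # Entries without results are excluded by default.
            continue
        augmentation.append({"Case": case, **extra})
    return augmentation


if __name__ == "__main__":
    # Fake geocoder backed by a fixed table; "Location" is a hypothetical field name.
    table = {"Location A": {"Latitude": 30.303625, "Longitude": -97.67139}}
    raw = [{"Case": "19-0400694", "Location": "Location A"}]
    print(build_augmentation(raw, lambda e: table.get(e.get("Location"))))
```

Keeping the service call behind a callback makes the augmenter easy to unit test with a fake service, which fits the testing requirement below.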

Conventions

General

  • Augmenters must be written in Python or Go.
  • Augmenters written in Python should not have any external dependencies other than those ScrAPD already uses.
  • Augmenters must have unit as well as integration tests (if applicable).
  • Augmenters must generate a file containing the data to be injected into a fatality case.
  • A file containing the data to be injected is called an augmentation.
  • An augmentation is applied using the scrapd-merger.
    • The key for matching the information is "Case".
    • If "Case" is not found, the entry is ignored.
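The matching rules above can be sketched in Python as follows. This only illustrates matching on "Case" and is not the scrapd-merger's actual implementation: `apply_augmentation` is a hypothetical name, and letting augmentation fields overwrite existing fields with the same name is an assumption.

```python
import copy
from typing import Any, Dict, List

Entry = Dict[str, Any]


def apply_augmentation(raw: List[Entry], augmentation: List[Entry]) -> List[Entry]:
    """Return a copy of ``raw`` with the augmentation's fields merged in, matched on "Case"."""
    # Index the augmentation by case number; entries without "Case" are ignored.
    by_case = {a["Case"]: a for a in augmentation if a.get("Case") is not None}
    merged: List[Entry] = []
    for entry in raw:
        new_entry = copy.deepcopy(entry)  # never mutate the input data set
        extra = by_case.get(entry.get("Case"))
        if extra is not None:
            # Assumption: augmentation values overwrite fields with the same name.
            new_entry.update({k: v for k, v in extra.items() if k != "Case"})
        merged.append(new_entry)
    return merged
```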

Options

The augmenters must implement the following flags:

  • Include entries without results (they are excluded by default)
    • -e, --empties
  • Include entries which don't match an existing entry
    • -x, --extras
  • Add existing augmentations to avoid reprocessing entries that were previously processed
    • --augmentations [augmentation_1, ...]
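In a Python augmenter, these flags map naturally onto the standard `argparse` module. The sketch below is hypothetical: the positional `dataset` argument and the `pending_entries` helper (which expects previous augmentations already loaded as lists) are assumptions, and `--extras` is only parsed here because its exact behavior depends on the augmenter.

```python
import argparse
from typing import Any, Dict, List

Entry = Dict[str, Any]


def build_parser() -> argparse.ArgumentParser:
    """Parser for the flags every augmenter must implement, plus a hypothetical input argument."""
    parser = argparse.ArgumentParser(description="Hypothetical ScrAPD augmenter")
    # Assumption: the raw data set is passed as a positional argument.
    parser.add_argument("dataset", help="JSON file containing the raw data set")
    parser.add_argument("-e", "--empties", action="store_true",
                        help="include entries without results (excluded by default)")
    parser.add_argument("-x", "--extras", action="store_true",
                        help="include entries which don't match an existing entry")
    parser.add_argument("--augmentations", nargs="+", default=[], metavar="AUGMENTATION",
                        help="existing augmentations whose cases must not be reprocessed")
    return parser


def pending_entries(raw: List[Entry], previous: List[List[Entry]]) -> List[Entry]:
    """Drop raw entries whose case already appears in a previously loaded augmentation."""
    done = {e["Case"] for aug in previous for e in aug if "Case" in e}
    return [e for e in raw if e.get("Case") not in done]
```

Because `--augmentations` accepts several values, the data set should come before it in a hypothetical invocation such as `scrapd-augmenter-geocoding-tamu.py raw.json -e --augmentations augmentation-geocoding-tamu-2017.json`; otherwise `argparse` would treat the data set as one more augmentation.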

Input

  • A JSON file representing the raw data set.
  • The format is a list of objects.
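A minimal Python sketch for reading and checking this input follows. `parse_dataset` and `load_dataset` are hypothetical helper names, and rejecting anything other than a list of objects is an assumption about how strict an augmenter should be.

```python
import json
from typing import Any, Dict, List


def parse_dataset(text: str) -> List[Dict[str, Any]]:
    """Parse a raw data set and check that it is a JSON list of objects."""
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(item, dict) for item in data):
        raise ValueError("expected a JSON list of objects")
    return data


def load_dataset(path: str) -> List[Dict[str, Any]]:
    """Read and validate a raw data set from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return parse_dataset(f.read())
```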

Output

  • Printed on screen or written to a file
  • The format is a list of objects:
[
  {
    "Case": "19-0400694",
    "Latitude": 30.303625,
    "Longitude": -97.67139
  },
  {
    "Case": "19-0370320",
    "Latitude": 30.243967,
    "Longitude": -97.764366
  }
]
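Producing this output from Python needs only the standard `json` module. In the sketch below, `dump_augmentation` and `write_augmentation` are hypothetical names, and the two-space indentation simply mirrors the example above.

```python
import json
import sys
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


def dump_augmentation(entries: List[Entry]) -> str:
    """Serialize an augmentation as an indented JSON list of objects."""
    return json.dumps(entries, indent=2, ensure_ascii=False) + "\n"


def write_augmentation(entries: List[Entry], path: Optional[str] = None) -> None:
    """Print the augmentation on screen, or write it to ``path`` when one is given."""
    text = dump_augmentation(entries)
    if path is None:
        sys.stdout.write(text)
    else:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
```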

Augmenter naming

  • The general format is: scrapd-{type}-{operation_or_datatype}-{service}.
    • type: tool type (augmenter or importer)
    • operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
    • service: the name of the service used to perform the operation or retrieve the data
  • All the components of the name must be in lower case
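The convention can be checked mechanically. The regular expression below is a sketch that assumes each component is a single lowercase alphanumeric word (no extra hyphens) and that a `.py` extension is optional, as in the examples; `parse_tool_name` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# Assumption: components are single lowercase alphanumeric words; ".py" is optional.
TOOL_NAME = re.compile(
    r"scrapd-(?P<type>augmenter|importer)"
    r"-(?P<operation_or_datatype>[a-z0-9]+)"
    r"-(?P<service>[a-z0-9]+)"
    r"(?:\.py)?"
)


def parse_tool_name(name: str) -> Optional[Dict[str, str]]:
    """Split a tool name into its components, or return None if it breaks the convention."""
    match = TOOL_NAME.fullmatch(name)
    return match.groupdict() if match else None
```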

Examples

  • scrapd-augmenter-geocoding-geocensus.py
  • scrapd-augmenter-geocoding-tamu.py
  • scrapd-importer-dataset-apd.py

Augmentation naming

The naming convention for augmentations is very similar to the one for augmenters, EXCEPT that it MUST include the year:

  • The general format is: augmentation-{operation_or_datatype}-{service}-{year}.
    • operation_or_datatype: the type of operation performed by the augmenter or the type of data added to the data set
    • service: the name of the service used to perform the operation or retrieve the data
  • All the components of the name must be in lower case
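Building augmentation names programmatically avoids mistakes in the year or in the case of the components. The sketch below is hypothetical: `augmentation_filename` is not part of ScrAPD, the `.json` extension is taken from the examples, and a four-digit year is assumed.

```python
import re

# Assumption: same single-word, lowercase components as the augmenter names.
COMPONENT = re.compile(r"[a-z0-9]+")


def augmentation_filename(operation_or_datatype: str, service: str, year: int) -> str:
    """Return augmentation-{operation_or_datatype}-{service}-{year}.json."""
    for component in (operation_or_datatype, service):
        if not COMPONENT.fullmatch(component):
            raise ValueError(f"invalid name component: {component!r}")
    if not 1000 <= year <= 9999:
        raise ValueError(f"expected a four-digit year, got {year}")
    return f"augmentation-{operation_or_datatype}-{service}-{year}.json"
```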

Examples

  • augmentation-geocoding-geocensus-2017.json
  • augmentation-geocoding-tamu-2017.json
  • augmentation-import-apd-2017.json