This repository has been archived by the owner on Aug 13, 2021. It is now read-only.


nesta.core.batchables.crunchbase.rst

File metadata and controls

19 lines (14 loc) · 904 Bytes

Crunchbase data (private companies)

NB: The Crunchbase pipeline may not work until this issue has been resolved.

Batchables for the collection and processing of Crunchbase data. As documented under packages and routines, the pipeline is executed in the order listed below. Documentation for the run.py files is also given below, but it is not very informative; you are better off looking under packages and routines.

The data is collected from proprietary data dumps, parsed into MySQL (tier 0) and then, after processing, piped into Elasticsearch (tier 1).
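The two-tier flow described above can be sketched roughly as follows. This is a minimal illustration and not the code in the run.py files: an in-memory SQLite database stands in for MySQL (tier 0), a plain dictionary stands in for an Elasticsearch index (tier 1), and the sample dump, the table name `organizations`, and the functions `collect_to_tier0` and `tier0_to_tier1` are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for a proprietary Crunchbase data dump (CSV export).
RAW_DUMP = """id,name,country_code
1,Acme Ltd,gbr
2,Widget GmbH,deu
"""


def collect_to_tier0(raw_csv, conn):
    """Parse the dump and load its rows into a relational table (tier 0)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS organizations "
        "(id TEXT PRIMARY KEY, name TEXT, country_code TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # INSERT OR REPLACE keeps the load idempotent if a dump is re-run.
    conn.executemany(
        "INSERT OR REPLACE INTO organizations "
        "VALUES (:id, :name, :country_code)",
        rows,
    )
    conn.commit()
    return len(rows)


def tier0_to_tier1(conn, index):
    """Read tier-0 rows, process them, and write documents to tier 1."""
    cursor = conn.execute(
        "SELECT id, name, country_code FROM organizations ORDER BY id"
    )
    for org_id, name, country_code in cursor:
        # Illustrative processing step: normalise the country code.
        index[org_id] = {
            "name": name.strip(),
            "country_code": country_code.upper(),
        }
    return len(index)


conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
index = {}                          # assumption: dict instead of Elasticsearch
n_rows = collect_to_tier0(RAW_DUMP, conn)
n_docs = tier0_to_tier1(conn, index)
print(n_rows, n_docs)
```

Keeping collection and indexing as separate steps mirrors the two batchables listed below: the first stage can be re-run against a new dump without touching the index, and the second stage reads only from tier 0.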

core.batchables.crunchbase.crunchbase_collect.run

core.batchables.crunchbase.crunchbase_elasticsearch.run