# `datapackage-pipelines` demo

This is the directory we're working in.
Note the pipeline specification in `pipeline-spec.yaml` and the `add_constant.py` file, which the pipeline's `add_constant` step refers to.

In [1]:
!ls -l

total 136
-rw-r--r--  1 adam  staff  35478 Apr  3 12:27 Datapackage Pipelines Demo.ipynb
-rw-r--r--  1 adam  staff    629 Feb  1 18:35 add_constant.py
-rw-r--r--  1 adam  staff  20480 Mar 27 18:05 celerybeat-schedule
-rw-r--r--  1 adam  staff    928 Feb  1 18:35 co2-information-cdiac.zip
-rw-r--r--  1 adam  staff    839 Apr  3 12:25 pipeline-spec.yaml


Let's look at the pipeline specification:

In [2]:
!cat pipeline-spec.yaml

worldbank-co2-emissions:
  schedule:
    crontab: '0 * * * *'
  pipeline:
    -
      run: add_metadata
      parameters:
        name: 'co2-emissions'
        title: 'CO2 emissions [metric tons per capita]'
        homepage: 'http://worldbank.org/'
    -
      run: add_resource
      parameters:
        name: 'global-data'
        url: "http://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=excel"
        format: xls
        headers: 4
    -
      run: stream_remote_resources
    -
      run: set_types
      parameters:
         resources: global-data
         types:
           "[12][0-9]{3}":
              type: number
    -
      run: add_constant
      parameters:
         column-name: the_constant
         value: the value
    -
      run: dump.to_path
      parameters:
          out-path: co2-emisonss-wb
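The contents of `add_constant.py` are not shown in this demo. As a rough sketch of what a step like `add_constant` does, assuming rows arrive as dictionaries and the descriptor is a plain `dict`, the logic could look like the following. The function names and the tiny stand-in descriptor and rows are hypothetical, and the wiring to the framework's processor interface is left out.

```python
def add_constant_field(datapackage, column_name):
    """Declare the new column in every resource schema of the descriptor."""
    for resource in datapackage["resources"]:
        resource["schema"]["fields"].append({"name": column_name, "type": "string"})
    return datapackage


def add_constant_rows(rows, column_name, value):
    """Yield each row (a dict) with the constant column set."""
    for row in rows:
        row[column_name] = value
        yield row


# Parameters as they appear under `add_constant` in pipeline-spec.yaml.
parameters = {"column-name": "the_constant", "value": "the value"}

# Stand-in descriptor and rows, for illustration only (not the real data).
datapackage = {
    "resources": [
        {
            "name": "global-data",
            "schema": {"fields": [{"name": "Country Name", "type": "string"}]},
        }
    ]
}
rows = [{"Country Name": "Aruba"}, {"Country Name": "Afghanistan"}]

datapackage = add_constant_field(datapackage, parameters["column-name"])
result = list(add_constant_rows(rows, parameters["column-name"], parameters["value"]))
print(result[0])
```

Processing rows through a generator keeps the step streaming: each row is modified and passed on without loading the whole resource into memory.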



Running `dpp` with no arguments lists the pipelines it finds:

In [3]:
!dpp

Available Pipelines:
- ./worldbank-co2-emissions 


Now run the pipeline by name:

In [4]:
!dpp run ./worldbank-co2-emissions

DEBUG   :Main                            :Using selector: KqueueSelector
INFO    :Main                            :RUNNING ./worldbank-co2-emissions
INFO    :Main                            :- add_metadata
INFO    :Main                            :- add_resource
INFO    :Main                            :- stream_remote_resources
INFO    :Main                            :- set_types
INFO    :Main                            :- add_constant
INFO    :Main                            :- dump.to_path
INFO    :Main                            :DONE /Users/adam/code/os/datapackage-pipelines/datapackage_pipelines/specs/../lib/add_metadata.py
INFO    :Main                            :DONE /Users/adam/code/os/datapackage-pipelines/datapackage_pipelines/specs/../lib/add_resource.py
INFO    :Main                            :stream_remote_resources: INFO    :EN.ATM.CO2E.PC?downloadformat=excel: OPENING http://api.worldbank.org/v2/en/indicator/EN.ATM.CO2E.PC?downloadformat=excel
INFO    :Main          

After the run, the `co2-emisonss-wb` directory named in `out-path` appears next to the spec:

In [5]:
!ls -l

total 136
-rw-r--r--  1 adam  staff  35478 Apr  3 12:27 Datapackage Pipelines Demo.ipynb
-rw-r--r--  1 adam  staff    629 Feb  1 18:35 add_constant.py
-rw-r--r--  1 adam  staff  20480 Mar 27 18:05 celerybeat-schedule
drwxr-xr-x  4 adam  staff    136 Apr  3 12:27 co2-emisonss-wb
-rw-r--r--  1 adam  staff    928 Feb  1 18:35 co2-information-cdiac.zip
-rw-r--r--  1 adam  staff    839 Apr  3 12:25 pipeline-spec.yaml


In [6]:
!ls -l co2-emisonss-wb/

total 16
drwxr-xr-x  3 adam  staff   102 Apr  3 12:27 data
-rw-------  1 adam  staff  4902 Apr  3 12:27 datapackage.json


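The generated descriptor carries the name and title set by `add_metadata`, and the year columns are typed as `number`:

In [7]: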
%cat co2-emisonss-wb/datapackage.json | json_pp

{
   "count_of_rows" : 264,
   "title" : "CO2 emissions [metric tons per capita]",
   "name" : "co2-emissions",
   "resources" : [
      {
         "headers" : 4,
         "encoding" : "utf-8",
         "schema" : {
            "fields" : [
               {
                  "type" : "string",
                  "name" : "Country Name"
               },
               {
                  "name" : "Country Code",
                  "type" : "string"
               },
               {
                  "name" : "Indicator Name",
                  "type" : "string"
               },
               {
                  "type" : "string",
                  "name" : "Indicator Code"
               },
               {
                  "name" : "1960",
                  "decimalChar" : ".",
                  "type" : "number",
                  "groupChar" : ""
               },
               {
                  "name" : "1961",
                  "groupChar" : 
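The year fields end up typed as `number` because the `set_types` step in the spec keys its types by the regular expression `[12][0-9]{3}`. A small sketch of how that pattern selects columns, assuming the pattern has to match the whole field name (the exact matching rule used by the processor is an assumption here):

```python
import re

# Pattern copied from the `set_types` parameters in pipeline-spec.yaml.
pattern = re.compile(r"[12][0-9]{3}")

# A few field names taken from the CSV header produced by the pipeline.
field_names = [
    "Country Name",
    "Country Code",
    "Indicator Name",
    "Indicator Code",
    "1960",
    "1961",
    "2016",
]

# Keep only the names that the pattern matches in full.
year_fields = [name for name in field_names if pattern.fullmatch(name)]
print(year_fields)  # ['1960', '1961', '2016']
```

Keying the types by a pattern means one entry covers every year column, so the spec does not need to list each year separately.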

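The data itself is written as CSV; the last column is the `the_constant` field added by the `add_constant` step:

In [8]: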
!head co2-emisonss-wb/data/global-data.csv


Country Name,Country Code,Indicator Name,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,the_constant
Aruba,ABW,CO2 emissions (metric tons per capita),EN.ATM.CO2E.PC,,,,,,,,,,,,,,,,,,,,,,,,,,,2.8683193921205543,7.234964017142395,10.026507523290272,10.634732599292175,27.850035399369247,27.407594819182023,22.83981827507877,22.103768379817375,20.79719687092568,20.040995443567464,19.438031131678585,19.31197116341124,19.62153398414226,19.65258864770123,25.54767879548306,25.382489719465198,24.9755145007632,24.90906560841107,24.51054262623807,24.964530995391428,24.766706337399583,25.613714951886028,24.750133212291054,24.876705845231523,24.182702245145034,23.922412101710876,12.713613235279755,8.515395303193712,,,,the value
Afghanistan,AFG,CO2