# Running pipelines

The server runs the pipelines periodically, taking pipeline and data dependencies into account.

You can also run specific pipelines manually, either during development or to run custom pipelines.

## Change directory to project root

The Jupyter notebooks run in the `jupyter-notebooks` directory. To run pipelines, change to its parent directory, which is the project root.

When running with Docker, the project root is `/pipelines`.

In [1]:
import os

os.chdir('..')
os.getcwd()

'/pipelines'
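Re-running the cell above moves one more level up on every execution. A minimal sketch of a guard against that, assuming the notebooks directory is named `jupyter-notebooks` as described in this section; the helper name `chdir_to_project_root` is hypothetical and not part of the project:

```python
import os

# Assumption: the directory name is taken from the text of this section.
NOTEBOOKS_DIR_NAME = 'jupyter-notebooks'


def chdir_to_project_root(notebooks_dir_name=NOTEBOOKS_DIR_NAME):
    """Move to the parent directory only when currently inside the notebooks directory."""
    cwd = os.getcwd()
    if os.path.basename(cwd) == notebooks_dir_name:
        os.chdir(os.path.dirname(cwd))
    # Return the resulting working directory, whether or not it changed.
    return os.getcwd()
```

Calling `chdir_to_project_root()` a second time leaves the working directory unchanged, so the cell can be re-run safely.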

## List the available pipelines

In [2]:
!{'dpp'}

Available Pipelines:
- ./knesset/kns_knessetdates (*)
- ./knesset/kns_govministry (*)
- ./knesset/kns_itemtype (*)
- ./knesset/kns_status (*)
- ./members/kns_person (*)
- ./members/kns_position (*)
- ./members/kns_persontoposition (*)
- ./members/kns_mksitecode (*)
- ./members/mk_individual (E)
	Dirty dependency: Cannot run until dependency is executed: ./members/kns_mksitecode
	Missing dependency: Couldn't open datapackage data/members/kns_mksitecode/datapackage.json
	Dirty dependency: Cannot run until dependency is executed: ./members/kns_persontoposition
	Missing dependency: Couldn't open datapackage data/members/kns_persontoposition/datapackage.json
	Dirty dependency: Cannot run until dependency is executed: ./members/kns_position
	Missing dependency: Couldn't open datapackage data/members/kns_position/datapackage.json
	Dirty dependency: Cannot run until dependency is executed: ./members/kns_person
	Missing dependency: Couldn't open datapackage data/members/kns_person/datapackage.j
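If the listing needs to be processed programmatically, the text output can be parsed. The sketch below assumes the line layout shown in the output above (`- <pipeline id> (<marker>)`) and treats the markers as opaque strings; `parse_pipeline_listing` is a hypothetical helper, not part of `dpp`:

```python
import re

# Matches lines such as "- ./members/kns_person (*)"; the marker part is optional.
PIPELINE_LINE = re.compile(r'^- (?P<id>\S+)(?: \((?P<marker>[^)]*)\))?\s*$')


def parse_pipeline_listing(text):
    """Return (pipeline_id, marker) pairs; marker is None when no marker is shown."""
    pipelines = []
    for line in text.splitlines():
        match = PIPELINE_LINE.match(line)
        if match:
            pipelines.append((match.group('id'), match.group('marker')))
    return pipelines


# Sample text copied from the listing format above; indented detail lines are skipped.
sample = """Available Pipelines:
- ./knesset/kns_status (*)
- ./members/mk_individual (E)
\tDirty dependency: Cannot run until dependency is executed: ./members/kns_mksitecode
"""
print(parse_pipeline_listing(sample))
```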

## Run a pipeline

The following runs the `./committees/kns_committee` pipeline, which downloads committees from the Knesset API.

In [3]:
!{'dpp run --verbose ./committees/kns_committee'}

[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd RUNNING ./committees/kns_committee
[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd Collecting dependencies
[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd Running async task
[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd Waiting for completion
[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd Async task starting
[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd Searching for existing caches
[./committees/kns_committee:T_0] >>> INFO    :ce2aa8bd Building process chain:
[./committees/kns_committee:T_0] >>> INFO    :- ..datapackage_pipelines_knesset.dataservice.processors.add_dataservice_collection_resource
[./committees/kns_committee:T_0] >>> INFO    :- ..datapackage_pipelines_knesset.common.processors.throttle
[./committees/kns_committee:T_0] >>> INFO    :- knesset.dump_to_path
[./committees/kns_committee:T_0] >>> INFO    :- knesset.dump_to_sql
[./committees/kns_committee:T_0] >>> INFO    :- (sink)
[.
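Outside a notebook, the same command can be launched from plain Python. This is a sketch only: it reuses the `dpp run --verbose` invocation shown above, assumes `dpp` is on the `PATH` and reports failure through a non-zero exit code, and the helper names are hypothetical:

```python
import shlex
import subprocess


def build_dpp_run_command(pipeline_id, verbose=True):
    """Build the argument list for `dpp run`, mirroring the command used above."""
    command = ['dpp', 'run']
    if verbose:
        command.append('--verbose')
    command.append(pipeline_id)
    return command


def run_pipeline(pipeline_id, verbose=True):
    """Run the pipeline; raises CalledProcessError if the process exits non-zero."""
    command = build_dpp_run_command(pipeline_id, verbose)
    print('Running:', ' '.join(shlex.quote(part) for part in command))
    return subprocess.run(command, check=True)
```

For example, `run_pipeline('./committees/kns_committee')` would start the pipeline from the project root.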

## Inspect the output datapackage descriptor

Pipelines use datapackages as the primary input and output data.

Pipeline and datapackage names usually match, so the output of the `./committees/kns_committee` pipeline is available locally at `./data/committees/kns_committee/datapackage.json`.

In [4]:
KNS_COMMITTEE_DATAPACKAGE_PATH = './data/committees/kns_committee/datapackage.json'
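The naming convention described above can be expressed as a small function. This is a sketch under the assumption that the pipeline follows the usual convention; `datapackage_path_for` and the `data_root` parameter are hypothetical names introduced for illustration:

```python
def datapackage_path_for(pipeline_id, data_root='./data'):
    """Map a pipeline id such as './committees/kns_committee' to its descriptor path."""
    # Drop the leading './' so the id can be appended to the data root.
    relative_id = pipeline_id[2:] if pipeline_id.startswith('./') else pipeline_id
    return f"{data_root.rstrip('/')}/{relative_id}/datapackage.json"


print(datapackage_path_for('./committees/kns_committee'))
```

Pipelines whose output path does not match their name need the path written out explicitly, as in the cell above.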

Each package may contain multiple resources. List the resource names available in the `kns_committee` package:

In [5]:
from datapackage import Package

kns_committee_package = Package(KNS_COMMITTEE_DATAPACKAGE_PATH)
kns_committee_package.resource_names

['kns_committee']

In [6]:
KNS_COMMITTEE_RESOURE_NAME = 'kns_committee'

Inspect the `kns_committee` resource descriptor, which includes metadata and field descriptions.

In [7]:
import yaml

print(yaml.dump(kns_committee_package.get_resource(KNS_COMMITTEE_RESOURE_NAME).descriptor, 
                allow_unicode=True, default_flow_style=False))

bytes: 175227
count_of_rows: 729
dialect:
  delimiter: ','
  doubleQuote: true
  lineTerminator: "\r\n"
  quoteChar: '"'
  skipInitialSpace: false
encoding: utf-8
format: csv
hash: 7a034fe5da80e37c797770486ab35e79
name: kns_committee
path: kns_committee.csv
profile: data-resource
schema:
  fields:
  - description: קוד הוועדה
    name: CommitteeID
    type: integer
  - description: שם הוועדה
    name: Name
    type: string
  - description: קוד הקטגוריה של הוועדה
    name: CategoryID
    type: integer
  - description: 'תיאור הקטגוריה של הוועדה בכל כנסת, כל הוועדות מוקמות מחדש. השדה
      קטגוריה כולל את רשימת הקטגוריות הנושאיות שאליהן משויכות הוועדות. למשל הקטגוריה
      של ועדת הפנים והגנת הסביבה היא "פנים" וכך היה גם כאשר שם הוועדה היה ועדת הפנים
      ואיכות הסביבה. גם ועדות המשנה של כל ועדה משויכות לקטגוריה שלה. מדובר בשיוך נושאי
      של הוועדות.

      '
    name: CategoryDesc
    type: string
  - description: מספר הכנסת
    name: KnessetNum
    type: integer
  - description: קוד ס
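When only the field names and types are of interest, the descriptor can be reduced to a compact summary. The sketch below works on a plain dictionary; `summarize_fields` is a hypothetical helper, and `sample_descriptor` is a hand-written excerpt of the output above rather than the full descriptor:

```python
def summarize_fields(resource_descriptor):
    """Return (name, type) pairs for every field in a resource descriptor dict."""
    fields = resource_descriptor.get('schema', {}).get('fields', [])
    return [(field['name'], field.get('type')) for field in fields]


# Hand-written excerpt of the descriptor printed above (descriptions omitted).
sample_descriptor = {
    'name': 'kns_committee',
    'schema': {
        'fields': [
            {'name': 'CommitteeID', 'type': 'integer'},
            {'name': 'Name', 'type': 'string'},
            {'name': 'CategoryID', 'type': 'integer'},
        ]
    },
}

for name, field_type in summarize_fields(sample_descriptor):
    print(f'{name}: {field_type}')
```

With the real package, the same function can be given `kns_committee_package.get_resource(KNS_COMMITTEE_RESOURE_NAME).descriptor`.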

Print the first 5 rows of data

In [8]:
for i, row in enumerate(kns_committee_package.get_resource(KNS_COMMITTEE_RESOURE_NAME).iter(keyed=True), 1):
    if i > 5:
        break  # stop after 5 rows instead of iterating over the whole resource
    print(f'-- row {i} --')
    print(yaml.dump(row, allow_unicode=True, default_flow_style=False))
    

-- row 1 --
AdditionalTypeDesc: קבועה
AdditionalTypeID: 991
CategoryDesc: ועדת הכנסת
CategoryID: 1
CommitteeID: 1
CommitteeParentName: null
CommitteeTypeDesc: ועדת הכנסת
CommitteeTypeID: 70
Email: vadatk@knesset.gov.il
FinishDate: null
IsCurrent: true
KnessetNum: 15
LastUpdatedDate: 2017-04-24 16:47:06
Name: הכנסת
ParentCommitteeID: null
StartDate: 1999-06-07 00:00:00

-- row 2 --
AdditionalTypeDesc: קבועה
AdditionalTypeID: 991
CategoryDesc: ועדת הכספים
CategoryID: 2
CommitteeID: 2
CommitteeParentName: null
CommitteeTypeDesc: ועדה ראשית
CommitteeTypeID: 71
Email: null
FinishDate: null
IsCurrent: true
KnessetNum: 15
LastUpdatedDate: 2015-03-20 12:02:57
Name: הכספים
ParentCommitteeID: null
StartDate: 1999-06-07 00:00:00

-- row 3 --
AdditionalTypeDesc: קבועה
AdditionalTypeID: 991
CategoryDesc: ועדת החוץ והביטחון
CategoryID: 4
CommitteeID: 3
CommitteeParentName: null
CommitteeTypeDesc: ועדה ראשית
CommitteeTypeID: 71
Email: null
FinishDate: null
IsCurrent: true
KnessetNum: 15
LastUpdatedDa