# Render site pages

[dpp](https://github.com/frictionlessdata/datapackage-pipelines) runs the Knesset data pipelines periodically on our server.

This notebook shows how to run pipelines that render pages for the static website at https://oknesset.org

## Load the source data

Download the source data. This can take a few minutes.

In [None]:
!{'cd /pipelines; KNESSET_LOAD_FROM_URL=1 dpp run --concurrency 4 '\
  './committees/kns_committee,'\
  './people/committee-meeting-attendees,'\
  './members/mk_individual'}

## Run the build pipeline

This pipeline aggregates the relevant data and allows filtering it for quicker development cycles.

To change the filter, uncomment and modify the filter step of the `build` pipeline in `committees/dist/knesset.source-spec.yaml`.

The build pipeline can take a few minutes the first time it runs.

In [2]:
!{'cd /pipelines; dpp run --verbose ./committees/dist/build'}

[./committees/dist/build:T_0] >>> INFO    :168911d3 RUNNING ./committees/dist/build
[./committees/dist/build:T_0] >>> INFO    :168911d3 Collecting dependencies
[./committees/dist/build:T_0] >>> INFO    :168911d3 Running async task
[./committees/dist/build:T_0] >>> INFO    :168911d3 Waiting for completion
[./committees/dist/build:T_0] >>> INFO    :168911d3 Async task starting
[./committees/dist/build:T_0] >>> INFO    :168911d3 Searching for existing caches
[./committees/dist/build:T_0] >>> INFO    :168911d3 Building process chain:
[./committees/dist/build:T_0] >>> INFO    :- load_resource
[./committees/dist/build:T_0] >>> INFO    :- knesset.load_large_csv_resource
[./committees/dist/build:T_0] >>> INFO    :- knesset.rename_resource
[./committees/dist/build:T_0] >>> INFO    :- load_resource
[./committees/dist/build:T_0] >>> INFO    :- filter
[./committees/dist/build:T_0] >>> INFO    :- build_meetings
[./committees/dist/build:T_0] >>> INFO    :- dump.to_path
[./committees/dist/build:T_0] 

## Download some protocol files for rendering

Upgrade to the latest version of the dataflows library:

In [None]:
!{'pip install --upgrade dataflows'}

Restart the kernel if the package was upgraded.

Choose some session IDs to download protocol files for:

In [1]:
session_ids = [2063122, 2063126]

In [2]:
from dataflows import Flow, load, printer, filter_rows

sessions_data = Flow(
    load('/pipelines/data/committees/kns_committeesession/datapackage.json'),
    filter_rows(lambda row: row['CommitteeSessionID'] in session_ids),
    printer(tablefmt='html')
).results()

#,CommitteeSessionID (integer),Number (integer),KnessetNum (integer),TypeID (integer),TypeDesc (string),CommitteeID (integer),Location (string),SessionUrl (string),BroadcastUrl (string),StartDate (datetime),FinishDate (datetime),Note (string),LastUpdatedDate (datetime),download_crc32c (string),download_filename (string),download_filesize (integer),parts_crc32c (string),parts_filesize (integer),parts_parsed_filename (string),text_crc32c (string),text_filesize (integer),text_parsed_filename (string),topics (array),committee_name (string)
1,2063122,29,15,161,פתוחה,2045,"חדר הוועדה, באגף הוועדות (קדמה), קומה 3, חדר 3710",http://main.knesset.gov.il/Activity/committees/Pages/AllCommitteesAgenda.aspx?Tab=3&ItemID=2063122,,2000-07-05 00:00:00,2000-07-05 00:00:00,"פניות ציבור בנושא איכות והתאמה לתקנים של שירותי הסעדה בבתי-הספר, פעוטונים, קייטנות ומוסדות ציבור",2018-10-10 11:03:06,UCgupg==,files/23/4/3/434231.DOC,47154,/4kpmQ==,85239,files/2/0/2063122.csv,pybkkw==,85134,files/2/0/2063122.txt,,המיוחדת לפניות הציבור
2,2063126,33,15,161,פתוחה,2045,"חדר הוועדה, באגף הוועדות (קדמה), קומה 3, חדר 3710",http://main.knesset.gov.il/Activity/committees/Pages/AllCommitteesAgenda.aspx?Tab=3&ItemID=2063126,,2000-10-30 00:00:00,2000-10-30 00:00:00,פניות של דיירי רחוב מאור הגולה בשכונת שפירא בתל-אביב שביתם נהרס והם ממשיכים לשלם משכנתא ולא מקבלים כ ...,2018-10-10 11:03:06,ryN9+g==,files/23/4/3/434233.DOC,36724,qiGAHw==,56525,files/2/0/2063126.csv,+Gw5Mw==,56419,files/2/0/2063126.txt,,המיוחדת לפניות הציבור
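
`Flow(...).results()` returns a tuple whose first element holds the rows of every resource, which is why the next cell iterates over `sessions_data[0][0]`: the rows of the first (here the only) resource. A minimal sketch of that unpacking follows. It uses a hand-built stand-in for the results tuple so it runs without the pipeline data, and the helper name `first_resource_rows` is hypothetical:

```python
def first_resource_rows(results):
    """Return the rows of the first resource from a Flow(...).results() tuple.

    results() returns (resources, datapackage, stats), where `resources` is a
    list containing one list of row dicts per resource.
    """
    resources, _datapackage, _stats = results
    return resources[0]


# Stand-in for the tuple returned by results(); only two fields per row are kept.
fake_results = (
    [[{'CommitteeSessionID': 2063122, 'KnessetNum': 15},
      {'CommitteeSessionID': 2063126, 'KnessetNum': 15}]],
    None,  # the datapackage object in a real run
    {},    # the stats dict in a real run
)

rows = first_resource_rows(fake_results)
print([row['CommitteeSessionID'] for row in rows])
```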


In [7]:
import os
import subprocess
import sys

# sessions_data[0][0] holds the rows of the first (and only) resource
for session in sessions_data[0][0]:
    for attr in ['text_parsed_filename', 'parts_parsed_filename']:
        # Text files and parts files live in separate directories
        pathpart = 'meeting_protocols_text' if attr == 'text_parsed_filename' else 'meeting_protocols_parts'
        url = 'https://production.oknesset.org/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        filename = '/pipelines/data/committees/{}/{}'.format(pathpart, session[attr])
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        # Pass the arguments as a list so no shell quoting is needed
        cmd = ['curl', '-s', '-o', filename, url]
        print(' '.join(cmd), file=sys.stderr)
        subprocess.check_call(cmd)

curl -s -o /pipelines/data/committees/meeting_protocols_text/files/2/0/2063122.txt https://production.oknesset.org/pipelines/data/committees/meeting_protocols_text/files/2/0/2063122.txt
curl -s -o /pipelines/data/committees/meeting_protocols_parts/files/2/0/2063122.csv https://production.oknesset.org/pipelines/data/committees/meeting_protocols_parts/files/2/0/2063122.csv
curl -s -o /pipelines/data/committees/meeting_protocols_text/files/2/0/2063126.txt https://production.oknesset.org/pipelines/data/committees/meeting_protocols_text/files/2/0/2063126.txt
curl -s -o /pipelines/data/committees/meeting_protocols_parts/files/2/0/2063126.csv https://production.oknesset.org/pipelines/data/committees/meeting_protocols_parts/files/2/0/2063126.csv
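
If `curl` is not available in the environment, the same downloads could be done with the standard library. The sketch below mirrors the URL and path layout used above. The names `protocol_targets` and `download_protocols` are hypothetical helpers, not part of the pipelines. One difference in behavior: `urllib` raises an exception on an HTTP error status, whereas `curl -s` without `-f` saves the error page and exits successfully.

```python
import os
import urllib.request

BASE_URL = 'https://production.oknesset.org/pipelines/data/committees'
BASE_DIR = '/pipelines/data/committees'

# Maps each row attribute to the directory that holds its files.
PATHPARTS = {
    'text_parsed_filename': 'meeting_protocols_text',
    'parts_parsed_filename': 'meeting_protocols_parts',
}


def protocol_targets(session):
    """Return (url, local_filename) pairs for one session row."""
    return [
        ('{}/{}/{}'.format(BASE_URL, pathpart, session[attr]),
         '{}/{}/{}'.format(BASE_DIR, pathpart, session[attr]))
        for attr, pathpart in PATHPARTS.items()
    ]


def download_protocols(sessions):
    """Download the protocol files of every session row (needs network access)."""
    for session in sessions:
        for url, filename in protocol_targets(session):
            os.makedirs(os.path.dirname(filename), exist_ok=True)
            urllib.request.urlretrieve(url, filename)
```

In the notebook this would be called as `download_protocols(sessions_data[0][0])`.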


## Delete dist hash files

In [8]:
%%bash
find /pipelines/data/committees/dist -type f -name '*.hash' -delete
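
The same cleanup can be written in Python, which also makes it easy to report which files were removed. This is a minimal sketch of an equivalent to the `find` call above; `delete_hash_files` is a hypothetical helper, not part of the pipelines:

```python
from pathlib import Path


def delete_hash_files(root):
    """Delete every regular file named *.hash under root and return the deleted paths."""
    deleted = []
    for path in Path(root).rglob('*.hash'):
        if path.is_file():
            path.unlink()
            deleted.append(path)
    return deleted
```

In the notebook this would be called as `delete_hash_files('/pipelines/data/committees/dist')`.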

## Render pages

Run the render pipelines in the following order:

### Meetings

In [9]:
!{'cd /pipelines; dpp run ./committees/dist/render_meetings'}

./committees/dist/render_meetings: WAITING FOR OUTPUT
./committees/dist/render_meetings: RUNNING, processed 94 rows
./committees/dist/render_meetings: SUCCESS, processed 94 rows
INFO    :RESULTS:
INFO    :SUCCESS: ./committees/dist/render_meetings {'bytes': 1742, 'count_of_rows': 94, 'dataset_name': '_', 'failed meetings': 0, 'hash': 'fb41c59fff6c4eced438aa6e29556b24', 'kns_committees': 756, 'meetings': 94, 'mk_individuals': 1015}
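
The `SUCCESS` line ends with the pipeline's stats, which are worth checking for failed meetings before continuing. The sketch below parses them under the assumption that the stats are printed as a Python dict literal, as in the output above; other dpp versions may format the line differently. `parse_success_stats` is a hypothetical helper.

```python
import ast


def parse_success_stats(line):
    """Extract the stats dict from a 'SUCCESS: <pipeline> {...}' log line."""
    start = line.index('{')  # the dict literal starts at the first brace
    return ast.literal_eval(line[start:])


# The log line copied from the output above.
line = ("INFO    :SUCCESS: ./committees/dist/render_meetings "
        "{'bytes': 1742, 'count_of_rows': 94, 'dataset_name': '_', "
        "'failed meetings': 0, 'hash': 'fb41c59fff6c4eced438aa6e29556b24', "
        "'kns_committees': 756, 'meetings': 94, 'mk_individuals': 1015}")

stats = parse_success_stats(line)
print(stats['meetings'], stats['failed meetings'])
```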


#### Rendered meetings stats

In [10]:
from dataflows import Flow, load, printer, filter_rows, add_field

def add_filenames():
    """Add the paths of the rendered HTML and JSON files to each row."""

    def _add_filenames(row):
        # Files are sharded by the first two digits of the session ID,
        # e.g. meetings/2/0/2063122.html
        session_id = str(row['CommitteeSessionID'])
        for ext in ['html', 'json']:
            row['rendered_' + ext] = '/pipelines/data/committees/dist/dist/meetings/{}/{}/{}.{}'.format(
                session_id[0], session_id[1], session_id, ext)

    return Flow(
        add_field('rendered_html', 'string'),
        add_field('rendered_json', 'string'),
        _add_filenames
    )

rendered_meetings = Flow(
    load('/pipelines/data/committees/dist/rendered_meetings_stats/datapackage.json'), 
    add_filenames(),
    filter_rows(lambda row: row['CommitteeSessionID'] in session_ids),
    printer(tablefmt='html')
).results()[0][0]

#,CommitteeSessionID (integer),num_speech_parts (integer),hash (string),rendered_html (string),rendered_json (string)
1,2063122,186,,/pipelines/data/committees/dist/dist/meetings/2/0/2063122.html,/pipelines/data/committees/dist/dist/meetings/2/0/2063122.json
2,2063126,209,,/pipelines/data/committees/dist/dist/meetings/2/0/2063126.html,/pipelines/data/committees/dist/dist/meetings/2/0/2063126.json
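
The table lists where each rendered file is expected, not whether it exists. A minimal sketch for checking that the files were actually written, reusing the same path layout; `rendered_path` and `missing_rendered_files` are hypothetical helper names:

```python
import os

DIST_ROOT = '/pipelines/data/committees/dist/dist/meetings'


def rendered_path(session_id, ext, root=DIST_ROOT):
    """Build the rendered file path, sharded by the first two digits of the ID."""
    sid = str(session_id)
    return '{}/{}/{}/{}.{}'.format(root, sid[0], sid[1], sid, ext)


def missing_rendered_files(session_ids, root=DIST_ROOT):
    """Return the expected HTML and JSON paths that do not exist on disk."""
    missing = []
    for session_id in session_ids:
        for ext in ('html', 'json'):
            path = rendered_path(session_id, ext, root)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

In the notebook, `missing_rendered_files(session_ids)` should return an empty list after a successful render.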


### Committees and homepage

In [13]:
!{'cd /pipelines; dpp run ./committees/dist/render_committees'}

./committees/dist/render_committees: WAITING FOR OUTPUT
./committees/dist/render_committees: SUCCESS, processed 0 rows
INFO    :RESULTS:
INFO    :SUCCESS: ./committees/dist/render_committees {'all chairpersons': 756, 'all committees': 756, 'all meeting stats': 94, 'all meetings': 94, 'all members': 7446, 'all mks': 1015, 'all others': 2, 'all replacements': 244, 'all watchers': 2, 'built index': 1, 'built_committees': 756, 'built_knesset_nums': 21, 'failed_committees': 0, 'failed_knesset_nums': 0}


### Members / Factions

In [12]:
!{'cd /pipelines; dpp run ./committees/dist/create_members,./committees/dist/build_positions,./committees/dist/create_factions'}

./committees/dist/build_positions: WAITING FOR OUTPUT
./committees/dist/build_positions: RUNNING, processed 100 rows
./committees/dist/build_positions: RUNNING, processed 200 rows
./committees/dist/build_positions: RUNNING, processed 300 rows
./committees/dist/build_positions: RUNNING, processed 400 rows
./committees/dist/build_positions: RUNNING, processed 500 rows
./committees/dist/build_positions: RUNNING, processed 600 rows
./committees/dist/build_positions: RUNNING, processed 700 rows
./committees/dist/build_positions: RUNNING, processed 800 rows
./committees/dist/build_positions: RUNNING, processed 900 rows
./committees/dist/build_positions: RUNNING, processed 1000 rows
./committees/dist/build_positions: RUNNING, processed 1100 rows
./committees/dist/build_po

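The render steps can also be chained from a single cell so that a failure stops the sequence. The sketch below reuses the `dpp run` invocations from this section in the same order. `render_commands` and `run_render_steps` are hypothetical helpers, and stopping on failure relies on `dpp` exiting with a non-zero status when a pipeline fails, which is an assumption here.

```python
import subprocess

# Render steps, in the order given in this section.
RENDER_STEPS = [
    './committees/dist/render_meetings',
    './committees/dist/render_committees',
    './committees/dist/create_members,'
    './committees/dist/build_positions,'
    './committees/dist/create_factions',
]


def render_commands(steps=RENDER_STEPS):
    """Return one 'dpp run' argument list per render step, in order."""
    return [['dpp', 'run', step] for step in steps]


def run_render_steps(cwd='/pipelines'):
    """Run the render steps in order; check_call raises on a non-zero exit status."""
    for cmd in render_commands():
        subprocess.check_call(cmd, cwd=cwd)
```
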
## Showing the rendered pages

To serve the site, locate the local directory that corresponds to /pipelines/data/committees/dist/dist and run the following from inside it:
    
`python -m http.server 8000`

Pages should be available at http://localhost:8000/
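
The server can also be started from Python without changing the working directory. This is a minimal sketch that assumes Python 3.7 or later, where `SimpleHTTPRequestHandler` accepts a `directory` argument. `make_server` is a hypothetical helper and the path in the usage comment is a placeholder for your local directory.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8000):
    """Create an HTTP server on localhost that serves files from `directory`."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(('localhost', port), handler)


# Usage (blocks until interrupted); replace the placeholder path:
# make_server('/path/to/dist/dist').serve_forever()
```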