## Data Registry usability setup

Use this section to set up the notebook for a usability analysis of OCDS data from the [Data Registry](https://data.open-contracting.org/).

### Setup

#### Install and import libraries

In [None]:
%%shell
pip install --upgrade 'ocdskingfishercolab<0.4' psycopg2-binary > pip.log

In [None]:
import pandas as pd
from google.colab.files import download
from google.colab import data_table

from ocdskingfishercolab import (
    set_spreadsheet_name,
    save_dataframe_to_sheet,
    download_dataframe_as_csv)

# Load https://pypi.org/project/ipython-sql/
%load_ext sql
# Load https://colab.research.google.com/notebooks/data_table.ipynb
%load_ext google.colab.data_table

Set the name of the spreadsheet to export results to:

In [None]:
spreadsheet_name = 'feedback_results'

set_spreadsheet_name(spreadsheet_name)

#### Install Cardinal

This notebook uses [Cardinal](https://cardinal.readthedocs.io/en/latest/), a command-line tool for calculating red flags from OCDS data. The cells below download and unzip its Linux binary, `ocdscardinal`.

In [None]:
! curl -sSOL https://github.com/open-contracting/cardinal-rs/releases/download/0.0.2/ocdscardinal-0.0.2-linux-64-bit.zip

In [None]:
! unzip -oj ocdscardinal-0.0.2-linux-64-bit.zip ocdscardinal-0.0.2-linux-64-bit/ocdscardinal
! ls

### Download the data from the Data Registry

To select the data source, go to the [OCDS Data Registry](https://data.open-contracting.org/) and select the desired publisher. For that publisher, copy the URL of a **JSON Lines (.jsonl) file** and paste it when the cell below prompts you.

**The registry also describes each data source and links directly to the publisher's website, where you can find more information about the scope of the publication.**

<img src="https://drive.google.com/uc?id=10dlm8c55pN89YTGEyZgvsLDc8fFMLNf0"  width="200" height="300">

In [None]:
url = input('Paste the URL of the .jsonl file from the Data Registry: ').strip()

In [None]:
! curl -sSOJ "$url"

In the Files tab on the left-hand side of the notebook, find the `.gz` file you just downloaded (e.g. `chile_compra_api_releases_full.jsonl.gz`) and enter its name when the cell below prompts you (see example):

<img src="https://drive.google.com/uc?id=19z86Nj5OY7Y8REfcd2sZbFPXDTAWZYS6" width="200" height="200">

In [None]:
file = input('Add the name of the .gz file: ').strip()

In [None]:
# Drop only the trailing .gz, e.g. releases.jsonl.gz -> releases.jsonl
file_jsonl = file.strip().removesuffix('.gz')

In [None]:
! gunzip -f $file

In [None]:
! ls -lh $file_jsonl
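If you prefer to stay in Python, the decompression step can also be done with the standard library. The sketch below is an alternative to `gunzip`, not part of the original workflow; the helper name `decompress_gz` is hypothetical, and unlike `gunzip -f` it keeps the original `.gz` file.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(gz_path: str) -> str:
    """Hypothetical helper: decompress a .gz file next to itself and return the new file name.

    Unlike `gunzip -f`, the original .gz file is kept.
    """
    source = Path(gz_path.strip())
    if source.suffix != '.gz':
        raise ValueError(f'Expected a .gz file, got {source.name!r}')
    target = source.with_suffix('')  # e.g. releases.jsonl.gz -> releases.jsonl
    with gzip.open(source, 'rb') as f_in, open(target, 'wb') as f_out:
        # Stream in chunks so large files are not loaded into memory at once
        shutil.copyfileobj(f_in, f_out)
    return str(target)
```

For example, `file_jsonl = decompress_gz(file)` would take the place of the cells that derive `file_jsonl` and run `gunzip`.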

### Calculate the list of fields

Use the ocdscardinal [`coverage` command](https://cardinal.readthedocs.io/en/latest/cli/coverage.html) to count how many times each OCDS field is non-empty in the dataset, then load the results into a DataFrame.

**The command below reads the `.jsonl` file you entered above and writes the results to `result_fields.json`. If you change the output file name, change it in the next code cell too.**

In [None]:
! ./ocdscardinal coverage $file_jsonl > result_fields.json
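To make the coverage counts easier to interpret, here is a rough pure-Python approximation of the idea: walk every release and count each non-empty value at its path. This is not Cardinal's implementation. The path notation (`/` between object members, `[]` for array items, a trailing `/` for an object itself) is an assumption modelled on the paths the next cell cleans up, and Cardinal's real output may differ in details such as how arrays and the root object are counted. The function names are hypothetical.

```python
import json
from collections import Counter


def _is_empty(value) -> bool:
    # Assumption: null, "", [] and {} count as empty; 0 and false do not
    return value is None or value == '' or value == [] or value == {}


def _walk(value, path: str, counts: Counter) -> None:
    if _is_empty(value):
        return
    if isinstance(value, dict):
        counts[path + '/'] += 1  # the object itself, e.g. '/tender/'
        for key, member in value.items():
            _walk(member, f'{path}/{key}', counts)
    elif isinstance(value, list):
        # Each array item is counted under the same path, e.g. '/awards[]/id'
        for item in value:
            _walk(item, path + '[]', counts)
    else:
        counts[path] += 1  # a non-empty scalar, e.g. '/ocid'


def approximate_coverage(jsonl_path: str) -> dict:
    """Hypothetical helper: count non-empty values per path in a JSON Lines file."""
    counts = Counter()
    with open(jsonl_path, encoding='utf-8') as f:
        for line in f:
            if line.strip():  # skip blank lines
                _walk(json.loads(line), '', counts)
    return dict(counts)
```

Because array items are counted individually, a field inside an array (such as an award ID) can be counted more times than there are releases. Pure Python is slow on large files, so this is only practical as a check against a small sample.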

The table below lists the published fields and, for each one, the number of times it is non-empty across the [OCDS releases](https://standard.open-contracting.org/latest/en/schema/reference/).

In [None]:
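# Load the coverage output (a JSON object mapping field paths to counts) as a Series
coverage = pd.read_json('result_fields.json', typ='series')
fields = coverage.rename('releases').rename_axis('path').reset_index()

# Keep only paths ending in a lowercase letter (named fields),
# dropping container paths such as those ending in '/' or '[]'
fields_table = fields[fields['path'].str.contains('[a-z]$')].copy()
# Remove array brackets and the leading slash, e.g. '/awards[]/id' -> 'awards/id'
fields_table['path'] = fields_table['path'].str.replace(r'[\[\]]|^/', '', regex=True)
fields_table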
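To see what the filtering above keeps and drops, here is the same logic wrapped in a function and applied to a small, made-up coverage result. The helper name `clean_field_paths` and the sample paths are hypothetical, not real output from Cardinal.

```python
import pandas as pd


def clean_field_paths(coverage: dict) -> pd.DataFrame:
    """Hypothetical helper: turn a {path: count} mapping into a table of named fields."""
    fields = pd.Series(coverage, name='releases').rename_axis('path').reset_index()
    # Keep only paths ending in a lowercase letter; drops '/', '/tender/', '/tag[]', ...
    table = fields[fields['path'].str.contains('[a-z]$')].copy()
    # Strip array brackets and the leading slash
    table['path'] = table['path'].str.replace(r'[\[\]]|^/', '', regex=True)
    return table.reset_index(drop=True)


# Made-up sample input, for illustration only
sample = {'/': 2, '/ocid': 2, '/tender/': 1, '/tender/id': 1,
          '/awards[]/': 2, '/awards[]/id': 2, '/tag[]': 1}
print(clean_field_paths(sample))
```

With this sample, the remaining paths are `ocid`, `tender/id` and `awards/id`. Note that the `[a-z]$` filter drops every path ending in `[]`, so if the coverage output reports an array of strings that way (like the hypothetical `/tag[]` above), it will not appear in the table.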


In [None]:
save_dataframe_to_sheet(fields_table, 'fields')