# OC4IDS Database - Check and Import

Use this notebook to check data using the OC4IDS Data Review Tool and to import both the data and the check results into the OC4IDS database.

If your data is formatted as a project package, edit the `source_id` and the download URL, then run all cells in the notebook (`Ctrl+F9`) and enter your database credentials at the prompt.

Otherwise, you need to reformat your data into a project package and save it as `project_package.json` before running the notebook.
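
If your source data is a plain list of project objects, the reformatting step might look like the sketch below. The `wrap_projects` and `save_package` helpers and all placeholder values are hypothetical, and the metadata fields used here (`uri`, `publishedDate`, `version`, `publisher`) are an assumption that should be checked against the OC4IDS project package schema for the version you publish.

```python
import json
from datetime import datetime, timezone


def wrap_projects(projects, uri, publisher_name, version):
    """Wrap a list of project objects in a minimal package envelope (hypothetical helper)."""
    return {
        "uri": uri,
        # Timestamp of packaging, in UTC
        "publishedDate": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "version": version,
        "publisher": {"name": publisher_name},
        "projects": list(projects),
    }


def save_package(package, path="project_package.json"):
    """Write the package to the file name the rest of the notebook expects."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(package, f, ensure_ascii=False)


# Example usage with placeholder values:
# package = wrap_projects(my_projects, "https://example.com/package.json", "Example publisher", "0.9")
# save_package(package)
```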

Enter database credentials.

> **Helpdesk analysts:** See [CRM-6335](https://crm.open-contracting.org/issues/6335).

In [None]:
import getpass

user = 'postgres'
password = getpass.getpass('Password:')

Set `source_id`:

In [None]:
source_id = 'example'

Download a project package:

In [None]:
%%shell

curl -L https://standard.open-contracting.org/infrastructure/latest/en/_static/example.json > project_package.json
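
Before running the remaining cells, it may be worth confirming that the downloaded file really is a project package. The sketch below only checks for a top-level `projects` array, which is what the import step relies on; `extract_projects` and `load_projects` are hypothetical helper names, not part of any OC4IDS tool.

```python
import json


def extract_projects(package):
    """Return the list of projects, or raise if the structure is not a project package."""
    projects = package.get("projects") if isinstance(package, dict) else None
    if not isinstance(projects, list):
        raise ValueError("not a project package: no top-level 'projects' array")
    return projects


def load_projects(path="project_package.json"):
    """Read the file saved by the download cell and return its projects."""
    with open(path, encoding="utf-8") as f:
        return extract_projects(json.load(f))


# Example usage in the notebook:
# print(len(load_projects()), "projects found")
```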

## Setup

Install `psql` client:

In [None]:
%%shell

sudo apt-get update
sudo apt-get install -y postgresql-client

Create a `.pgpass` file with database credentials:

In [None]:
!touch ~/.pgpass
!chmod 0600 ~/.pgpass
!echo database-1.cmc8bohiuyg3.us-east-1.rds.amazonaws.com:5432:postgres:{user}:{password} > ~/.pgpass
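
The `echo` command above writes the password verbatim, which can fail if the password contains characters the shell interprets, or a `:` or `\`, both of which must be backslash-escaped in `.pgpass` fields. A minimal Python sketch of the same step with escaping follows; `pgpass_escape`, `pgpass_line` and `write_pgpass` are hypothetical helpers.

```python
import os


def pgpass_escape(field):
    """Backslash-escape backslashes and colons, as .pgpass requires."""
    return str(field).replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host, port, database, user, password):
    """Build one hostname:port:database:username:password entry."""
    return ":".join(pgpass_escape(x) for x in (host, port, database, user, password))


def write_pgpass(line, path=None):
    """Overwrite the .pgpass file with a single entry, readable only by the owner."""
    path = path or os.path.expanduser("~/.pgpass")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(line + "\n")
    # The mode passed to os.open only applies when the file is created,
    # so set the permissions explicitly as well.
    os.chmod(path, 0o600)


# Example usage in the notebook:
# write_pgpass(pgpass_line("database-1.cmc8bohiuyg3.us-east-1.rds.amazonaws.com", 5432, "postgres", user, password))
```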

Install `jq`:

In [None]:
%%shell

sudo apt-get install -y jq

Connect notebook to database:

In [None]:
connection_string = 'postgresql://' + user + ':' + password + '@database-1.cmc8bohiuyg3.us-east-1.rds.amazonaws.com/postgres'

%load_ext sql
%sql $connection_string
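
Concatenating the password directly into the connection URL can break if it contains characters such as `@`, `/` or `:`. A minimal sketch that percent-encodes the credentials first, using a hypothetical `build_connection_string` helper:

```python
from urllib.parse import quote


def build_connection_string(user, password, host, database):
    """Build a PostgreSQL URL with percent-encoded credentials."""
    return (
        "postgresql://"
        + quote(user, safe="")
        + ":"
        + quote(password, safe="")
        + "@"
        + host
        + "/"
        + database
    )


# Example usage in the notebook:
# connection_string = build_connection_string(
#     user, password, "database-1.cmc8bohiuyg3.us-east-1.rds.amazonaws.com", "postgres")
```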

Install lib-cove-oc4ids:

In [None]:
%%shell

pip install --upgrade 'jsonschema>=3.0.0'
pip install libcoveoc4ids

## Check data

Check data using `libcoveoc4ids`:

In [None]:
%%shell

libcoveoc4ids project_package.json > results.json
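
To get a quick overview of the check results before importing them, something like the following could be used. The sketch assumes `results.json` holds a single JSON object and does not assume any particular key names, since these may vary between `libcoveoc4ids` versions; `summarize_results` is a hypothetical helper.

```python
import json


def summarize_results(results):
    """Map each top-level key to its item count (lists and objects) or to its value (scalars)."""
    summary = {}
    for key, value in results.items():
        if isinstance(value, (list, dict)):
            summary[key] = len(value)
        else:
            summary[key] = value
    return summary


# Example usage in the notebook:
# with open("results.json", encoding="utf-8") as f:
#     print(json.dumps(summarize_results(json.load(f)), indent=2))
```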

## Import data and check results

Use `jq` to generate a newline-delimited JSON file from the project package:

In [None]:
%%shell

jq -crM '.projects[]' project_package.json > projects.json
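
For reference, the `jq` step writes one compact JSON document per line, one line per entry in `projects`. A roughly equivalent Python sketch is shown below (`to_ndjson` is a hypothetical helper; number formatting may differ slightly from `jq`'s output).

```python
import json


def to_ndjson(package):
    """Serialize each project as compact JSON on its own line."""
    return "".join(
        json.dumps(project, separators=(",", ":"), ensure_ascii=False) + "\n"
        for project in package["projects"]
    )


# Example usage in the notebook:
# with open("project_package.json", encoding="utf-8") as src, \
#         open("projects.json", "w", encoding="utf-8") as dst:
#     dst.write(to_ndjson(json.load(src)))
```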

Import data to `temp_data` table:

In [None]:
%%sql

delete from temp_data;

In [None]:
!cat projects.json | psql -h "database-1.cmc8bohiuyg3.us-east-1.rds.amazonaws.com" -U {user} -d "postgres" -c "COPY temp_data (data) FROM STDIN WITH escape '\' quote e'\x01' delimiter e'\x02' CSV"
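
The `COPY` command above sets the CSV quote and delimiter to the control characters `\x01` and `\x02` so that each line is loaded as a single value. This relies on neither character appearing literally in `projects.json`. JSON serializers escape control characters inside strings, so this should normally hold, but a quick check can be sketched as follows (`unsafe_lines` is a hypothetical helper).

```python
def unsafe_lines(lines):
    """Return the 1-based numbers of lines containing the COPY quote or delimiter character."""
    return [
        number
        for number, line in enumerate(lines, start=1)
        # chr(1) and chr(2) are the quote and delimiter characters used by the COPY command
        if chr(1) in line or chr(2) in line
    ]


# Example usage in the notebook:
# with open("projects.json", encoding="utf-8") as f:
#     print(unsafe_lines(f))
```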

Import check results to `temp_checks`:

In [None]:
%%sql

delete from temp_checks;

In [None]:
!cat results.json | jq -crM . | psql -h "database-1.cmc8bohiuyg3.us-east-1.rds.amazonaws.com" -U {user} -d "postgres" -c "COPY temp_checks (cove_output) FROM STDIN WITH escape '\' quote e'\x01' delimiter e'\x02' CSV"

Create collection, copy data to `projects` table, copy check results to `collection_check` table, populate `field_counts` and `project_fields` tables:

In [None]:
%%sql

INSERT INTO collection (source_id, data_version)
    VALUES (:source_id, CURRENT_TIMESTAMP);

INSERT INTO projects (collection_id, project_id, data)
SELECT
    (
        SELECT
            id
        FROM
            collection
        ORDER BY
            id DESC
        LIMIT 1) AS collection_id,
    trim(BOTH '"' FROM (data -> 'id')::text) AS project_id,
    data AS data
FROM
    temp_data;

DELETE FROM temp_data;

INSERT INTO collection_check (collection_id, cove_output)
SELECT
    (
        SELECT
            id
        FROM
            collection
        ORDER BY
            id DESC
        LIMIT 1) AS collection_id,
    cove_output AS cove_output
FROM
    temp_checks;

DELETE FROM temp_checks;

INSERT INTO field_counts
SELECT
    (
        SELECT
            id
        FROM
            collection
        ORDER BY
            id DESC
        LIMIT 1) AS collection_id,
    path,
    regexp_split_to_array(path, '/') AS path_array,
    sum(object_property) object_property,
    sum(array_item) array_count,
    count(DISTINCT id) distinct_projects
FROM
    projects
    CROSS JOIN flatten (data)
WHERE
    collection_id = (
        SELECT
            id
        FROM
            collection
        ORDER BY
            id DESC
        LIMIT 1)
GROUP BY
    collection_id,
    path;

WITH RECURSIVE paths (
    project_id,
    path,
    "value"
) AS (
    SELECT
        project_id,
        (key_value).KEY "path",
        (key_value).value "value",
        'true'::boolean "use_path"
    FROM (
        SELECT
            project_id,
            jsonb_each(data) key_value
        FROM
            projects
        WHERE
            collection_id = (
                SELECT
                    id
                FROM
                    collection
                ORDER BY
                    id DESC
                LIMIT 1)) a
    UNION ALL (
        SELECT
            project_id,
            CASE WHEN key_value IS NOT NULL THEN
                path || '/'::text || (key_value).KEY::text
            ELSE
                path
            END "path",
            CASE WHEN key_value IS NOT NULL THEN
                (key_value).value
            ELSE
                array_value
            END "value",
            key_value IS NOT NULL "use_path"
        FROM (
            SELECT
                project_id,
                path,
                jsonb_each(
                    CASE WHEN jsonb_typeof(value) = 'object' THEN
                        value
                    ELSE
                        '{}'::jsonb
                    END) key_value,
                jsonb_array_elements(
                    CASE WHEN jsonb_typeof(value) = 'array'
                        AND jsonb_typeof(value -> 0) = 'object' THEN
                        value
                    ELSE
                        '[]'::jsonb
                    END) "array_value"
            FROM
                paths) a))
    INSERT INTO project_fields
    SELECT
        (
            SELECT
                id
            FROM
                collection
            ORDER BY
                id DESC
            LIMIT 1) AS collection_id,
        project_id,
        array_agg(path) AS paths
FROM
    paths
WHERE
    use_path
GROUP BY
    project_id;
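
To illustrate what the recursive query collects into `project_fields`, here is a Python sketch of the same traversal as I read it: every object key contributes a `/`-separated path, arrays whose first item is an object are descended into without adding an index to the path, and scalars and other arrays are not expanded. `field_paths` is a hypothetical helper, and the order of the paths may differ from what the database returns.

```python
def field_paths(project):
    """List the path of every object key in a project, once per occurrence."""
    paths = []

    def walk(value, prefix):
        if isinstance(value, dict):
            for key, child in value.items():
                path = prefix + "/" + key if prefix else key
                paths.append(path)
                walk(child, path)
        elif isinstance(value, list) and value and isinstance(value[0], dict):
            # Array items keep the path of the array itself (no index is added)
            for item in value:
                walk(item, prefix)

    walk(project, "")
    return paths


example = {
    "id": "p1",
    "sectors": ["a"],
    "parties": [{"id": "x", "name": "n"}, {"id": "y"}],
    "period": {"startDate": "d"},
}
print(sorted(field_paths(example)))
```

Because paths are recorded once per occurrence, `parties/id` appears twice for the example project, once for each item in `parties`.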
