# OC4IDS Publisher Status Report

## How to author a report

1. Load the data that you want to report on using the [data import notebook](https://colab.research.google.com/github/open-contracting/oc4ids_database/blob/main/OC4IDS_Database_Data_Import.ipynb).
2. Run the status checks using the [status checks notebook]().
3. Run the cells in [Appendix 1: Report Setup](#scrollTo=GYwdqQevW-zi).
4. Run all cells in the [Summary](), [Criteria](), [Checks]() and [Metrics]() sections.
5. For each criterion and check whose methodology is `manual`:
   1. Follow the instructions in the methodology.
   2. Add code and/or Markdown cells to document your findings (e.g. check failures).
   3. Save the results.
6. Run the cells in the Summary section to update the summary table.
7. Remove this cell.

## Introduction

This report assesses the status of OC4IDS publications. It covers:

* Quality [criteria](#scrollTo=7N-KAMJHQkad) that all OC4IDS publications should meet.
* Other [checks](#scrollTo=JZ_mhib6Q_sK) on the quality of OC4IDS data.
* [Metrics](#scrollTo=xmCUh-_CtLwA) related to the criteria and checks.
* [Coverage](#scrollTo=yX4trq4L2Nro) measured against the OC4IDS schema, the core CoST IDS elements and the CoST IDS sustainability modules.

## Data sources

This report covers data from the following OC4IDS publications:

In [5]:
# @title ### Publications

%%sql

select
  source_id,
  data_version as collection_date
from
  collection
join
  run_collection
on
  collection.id = run_collection.collection_id
where
  run_id = :run_id
order by
  source_id asc;

|  | source_id | collection_date |
|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | 2024-08-02 05:56:23.308692 |
| 1 | indonesia_cost_west_lombok | 2024-08-02 06:03:16.244687 |
| 2 | malawi_cost_malawi | 2024-08-02 06:17:25.776151 |
| 3 | mexico_cost_jalisco | 2024-08-02 05:51:54.429629 |
| 4 | mexico_nuevo_leon | 2024-08-02 06:01:51.084424 |
| 5 | uganda_gpp | 2024-08-02 06:06:35.558474 |


## Summary

This section provides a summary of criteria and check results. It is intended to support comparison between publications and assessment of the overall quality of the corpus of OC4IDS data.

`True` indicates success against a criterion or check, `False` indicates failure and `None` indicates that a criterion or check was not assessed.



In [6]:
# @title ### Comparison table
get_results(run_id = run_id, extra_results = manual_checks)

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Check: Contract values are realistic | True | True | False | True | True | False |
| Check: Coordinates are valid | True | False | False | True | True | True |
| Check: Dates are realistic | True | False | False | False | True | True |
| Check: Funder names are realistic |  |  |  |  |  |  |
| Check: Public authority names are realistic |  |  |  |  |  |  |
| Check: Roles are set | True | False | False | True | True | False |
| Check: Sectors are standardised | False | True | False | False | True | True |
| Check: Supplier names are realistic |  |  |  |  |  |  |
| Criteria: Active | False | True | True | True | True | True |
| Criteria: Appropriate |  |  |  |  |  |  |


## Criteria

This section assesses publications against pass/fail criteria that all publications should meet.

### Registered

**Description:**

The data uses a registered OC4IDS prefix in its project identifiers.

**Methodology:** `automated`

Check against the [list of registered prefixes](https://standard.open-contracting.org/infrastructure/latest/en/reference/prefixes/).

**Output:**

List of project identifiers without a registered prefix.
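A minimal sketch of this check, assuming the registered prefixes have already been loaded into a set. The two prefixes in the example are taken from identifiers in this report and stand in for the full list; `unregistered_ids` is a hypothetical helper, not a function defined in this notebook. Only the start of each identifier is compared, because the separator after the prefix varies between publications (`-` and `_` both appear in this report).

```python
def unregistered_ids(project_ids, registered_prefixes):
    """Return the project identifiers that do not start with a registered prefix."""
    # str.startswith accepts a tuple, so every prefix is tested in one call.
    prefixes = tuple(registered_prefixes)
    return [project_id for project_id in project_ids if not project_id.startswith(prefixes)]


# Illustrative subset of prefixes; load the full list from the prefixes page in practice.
registered = {"oc4ids-jj5f2u", "oc4ids-iuq5r5"}
print(unregistered_ids(["oc4ids-jj5f2u-0aa03690", "oc4ids-iuq5r5_449", "project-123"], registered))
# ['project-123']
```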

In [7]:
# @title #### Output
get_output(run_id = run_id, check_id = 'criteria_registered')

In [8]:
# @title #### Results
get_results(run_id = run_id, check_id='criteria_registered')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Criteria: Registered | True | True | True | True | True | True |


### Discoverable

**Description:**

It is possible to discover the data by navigating a website whose homepage is indexed by popular web search engines.

**Methodology:** `manual`

Ask the publisher where the access methods are publicly listed and/or review the publisher’s website.

**Output:**

None

In [9]:
# @title #### Results

display_result_widgets('criteria_discoverable')


### Retrievable


**Description:**

It is possible to automate the download of all the data, either using an HTML page listing bulk download URLs, or using only machine-readable data as input.

**Methodology:** `manual`

First review: Author and run a Python scraper.

Subsequent reviews: Run the Python scraper and update if needed.
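For the first case (an HTML page listing bulk download URLs), one way to start a scraper is to collect the download links from that page with the standard library's `html.parser`. This is a sketch: the file extensions treated as downloads and the helper names are assumptions, and fetching the page (for example with `urllib.request.urlopen(page_url).read().decode("utf-8")`) and downloading each URL are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DownloadLinkParser(HTMLParser):
    """Collect links from <a href> tags that end in a bulk-download file extension."""

    def __init__(self, base_url, extensions=(".json", ".csv", ".xlsx", ".zip")):
        super().__init__()
        self.base_url = base_url
        self.extensions = extensions
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string and letter case when checking the extension.
        if href and href.split("?")[0].lower().endswith(self.extensions):
            # Resolve relative links against the listing page's URL.
            self.urls.append(urljoin(self.base_url, href))


def find_download_urls(html, base_url):
    """Return the absolute download URLs found in an HTML listing page."""
    parser = DownloadLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

On subsequent reviews, comparing the returned list with the previous run's list is a quick way to spot a changed page layout.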

**Output:**

None

In [10]:
# @title #### Results

display_result_widgets('criteria_retrievable')


### Reviewable

**Description:**

The OC4IDS Data Review Tool is able to report results on the data.

**Methodology:** `manual`

Check that libcoveoc4ids reports results.


**Output:**

None

In [11]:
# @title #### Results

display_result_widgets('criteria_reviewable')


### Appropriate

**Description:**

Concepts are published in accordance with the semantics of OC4IDS, rather than using non-OC4IDS fields or codes. There must be no more than 10 cases in which a concept that is covered by an OC4IDS field or code is disclosed using another field or code.

**Methodology:** `manual`

Review the output to identify concepts covered by a field or code in OC4IDS but disclosed using another field or code.


**Output:**

List of additional fields and example values reported by the Data Review Tool.
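To show where paths like those in the output below come from, this sketch flattens an OC4IDS package into slash-separated field paths (dropping array indices) and lists the paths that are not in a given set of schema paths, with a count and up to three example values each. It is a simplified stand-in, not libcoveoc4ids's implementation; `known_paths` would need to be derived from the OC4IDS schema, and the tiny set in the example is purely illustrative. Deciding whether an additional field duplicates an OC4IDS concept remains a manual judgement.

```python
from collections import defaultdict


def iter_paths(value, path=""):
    """Yield (path, leaf) pairs, dropping array indices from the path."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_paths(child, f"{path}/{key}")
    elif isinstance(value, list):
        for item in value:
            yield from iter_paths(item, path)
    else:
        yield path, value


def additional_fields(package, known_paths, max_examples=3):
    """Count leaf paths that are not in known_paths and keep a few example values."""
    counts = defaultdict(int)
    examples = defaultdict(list)
    for path, leaf in iter_paths(package):
        if path in known_paths:
            continue
        counts[path] += 1
        if len(examples[path]) < max_examples:
            examples[path].append(leaf)
    return {path: {"count": counts[path], "examples": examples[path]} for path in counts}


package = {"projects": [{"id": "oc4ids-x-1", "budget": {"budgetBreakdown": [{"id": "b1"}, {"id": "b2"}]}}]}
print(additional_fields(package, known_paths={"/projects/id"}))
# {'/projects/budget/budgetBreakdown/id': {'count': 2, 'examples': ['b1', 'b2']}}
```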

In [12]:
# @title #### Output

%%sql

select
  source_id,
  output.key as path,
  output.value -> 'count' as count,
  output.value -> 'examples' as examples
from
  check_results
cross join
  jsonb_each(output) as output
join collection on
  collection_id = collection.id
where
  run_id = :run_id
and
  check_id = 'criteria_appropriate'
order by
  source_id asc;

|  | source_id | path | count | examples |
|---|---|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | /projects/budget/budgetBreakdown/id | 6 | [4_Construction_of_1_No._CHPS_Compound_at_Yabi... |
| 1 | ghana_cost_sekondi_takoradi | /projects/budget/budgetBreakdown/currency | 6 | [GHS, GHS, GHS] |
| 2 | ghana_cost_sekondi_takoradi | /projects/budget/budgetBreakdown/description | 6 | [Construction of 1 No. CHPS Compound at Yabiw,... |
| 3 | ghana_cost_sekondi_takoradi | /projects/parties/additionalContactPoints/name | 54 | [William Tei-Kpoti, C.Ing Ebenezer Annoh - Kwa... |
| 4 | ghana_cost_sekondi_takoradi | /projects/budget/budgetBreakdown/period/endDate | 6 | [2020-03-30T00:00:00.000Z, 2021-11-15T00:00:00... |
| 5 | ghana_cost_sekondi_takoradi | /projects/parties/additionalContactPoints/email | 54 | [wteikpoti@yahoo.com, eakwafot@yahoo.com, eric... |
| 6 | ghana_cost_sekondi_takoradi | /projects/budget/budgetBreakdown/period/startDate | 6 | [2019-03-15T00:00:00.000Z, 2021-07-15T00:00:00... |
| 7 | ghana_cost_sekondi_takoradi | /projects/parties/additionalContactPoints/tele... | 54 | [0244678562, 0507128000, 0208447948] |
| 8 | ghana_cost_sekondi_takoradi | /projects/budget/budgetBreakdown/period/maxExt... | 3 | [2022-02-28T00:00:00.000Z, 2022-08-31T00:00:00... |
| 9 | indonesia_cost_west_lombok | /projects/budget/budgetBreakdown/id | 270 | [SIE-0011/2017, PEI-0105/2018, PEI-0119/2019] |


In [13]:
# @title #### Results

display_result_widgets('criteria_appropriate')


### Active

**Description:**

The data has been updated within the last 12 months.

**Methodology:** `automated`

Check that at least one project has a top-level `updated` value within the last 12 months.

**Output:**

None. For more information, see the [last updated metric](#scrollTo=RdJl4q6sw-pj).
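A minimal sketch of this methodology, assuming project dicts with an ISO 8601 `updated` string. It interprets "within the last 12 months" as on or after the same calendar date one year earlier; the report's own implementation may define the window differently. `parse_datetime` treats values without a UTC offset as UTC, which is also an assumption.

```python
from datetime import date, datetime, timezone


def parse_datetime(value):
    """Parse an ISO 8601 string; values without a UTC offset are assumed to be UTC."""
    # Replace a trailing "Z" so that fromisoformat also works before Python 3.11.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def one_year_before(day):
    """Return the same calendar date one year earlier; 29 February maps to 28 February."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        return day.replace(year=day.year - 1, day=28)


def is_active(projects, today=None):
    """Return True if any project's top-level `updated` date is within the last 12 months."""
    if today is None:
        today = datetime.now(timezone.utc).date()
    cutoff = one_year_before(today)
    return any(
        parse_datetime(project["updated"]).date() >= cutoff
        for project in projects
        if project.get("updated")
    )
```

Passing `today` explicitly (for example, the collection date) keeps the result reproducible when a report is re-run later.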

In [14]:
# @title #### Results
get_results(run_id = run_id, check_id='criteria_active')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Criteria: Active | False | True | True | True | True | True |


### Documented

**Description:**

The publisher provides a publication policy/data user guide alongside the data.

**Methodology:** `manual`

Ask the publisher where the publication policy/data user guide is publicly available and/or review the publisher’s website.

**Output:**

None


In [15]:
# @title #### Results

display_result_widgets('criteria_documented')


### Accessible

**Description:**

The data is available as a bulk download in tabular (CSV or spreadsheet) format.

**Methodology:** `manual`

Ask the publisher for a link to the bulk downloads and/or review the publisher’s website.

**Output:**

None

In [16]:
# @title #### Results

display_result_widgets('criteria_accessible')


### Valid

**Description:**

The OC4IDS Data Review Tool reports no validation errors.

**Methodology:** `automated`

Use libcoveoc4ids to generate a list of validation errors.

**Output:**

None. For more information, see the [validation error count metric](#scrollTo=HYOcgsFSxKWD).

In [17]:
# @title #### Results

get_results(run_id = run_id, check_id='criteria_valid')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Criteria: Valid | False | False | False | False | False | True |


### Conformant

**Description:**

The OC4IDS Data Review Tool reports no structure warnings.

**Methodology:** `automated`

Use libcoveoc4ids to generate a list of structure warnings.

**Output:**

None


In [18]:
# @title #### Results

get_results(run_id = run_id, check_id='criteria_conformant')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Criteria: Conformant | True | True | True | False | True | True |


## Checks

This section documents the results of pass/fail checks on the quality of OC4IDS data.

### Sectors are standardised

**Description:**

Projects are classified against the OC4IDS sector codelist.

**Methodology:** `automated`

Check that `sector` is present for at least one project and that it contains no values from outside the OC4IDS sector codelist.

**Output:**

List of additional sector codes.
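The two conditions in the methodology can be sketched as follows. The codelist in the example is a hypothetical subset for illustration, not the OC4IDS sector codelist; load the real codelist in practice. `check_sectors` is a hypothetical helper.

```python
def check_sectors(projects, codelist):
    """Return (passed, additional_codes) for the 'Sectors are standardised' check."""
    used = set()
    for project in projects:
        used.update(project.get("sector", []))
    additional = sorted(used - set(codelist))
    # Fail if no project has a sector, or if any code is outside the codelist.
    return bool(used) and not additional, additional


# Hypothetical subset of codes, for illustration only.
codelist = {"education", "health", "transport"}
print(check_sectors([{"sector": ["transport", "agriculture"]}, {"id": "no-sector"}], codelist))
# (False, ['agriculture'])
```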

In [19]:
# @title #### Output

%%sql

select
  source_id,
  output -> 'all_projects' as additional_codes
from
  check_results
join collection on
  collection_id = collection.id
where
  run_id = :run_id
and
  check_id = 'semantics_sector_codelist'
order by
  source_id asc;

|  | source_id | additional_codes |
|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | [culture, sports and recreation, transport (po... |
| 1 | indonesia_cost_west_lombok |  |
| 2 | malawi_cost_malawi | [agriculture, homeland_security] |
| 3 | mexico_cost_jalisco | [por_definir] |
| 4 | mexico_nuevo_leon |  |
| 5 | uganda_gpp |  |


In [20]:
# @title #### Results

get_results(run_id = run_id, check_id='semantics_sector_codelist')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Check: Sectors are standardised | False | True | False | False | True | True |


### Public authority names are realistic

**Description:**

Check that a sample of public authority names is realistic, e.g. that the names belong to government departments rather than to suppliers or individuals.

**Methodology:** `manual`

Review the output and check that names are realistic.

**Output:**

Sample of `publicAuthority/name` values.

In [21]:
# @title #### Output

get_output(run_id, 'semantics_public_authority_names')

|  | source_id | project_id | output |
|---|---|---|---|
| 0 | indonesia_cost_west_lombok | oc4ids-jj5f2u-1d2e63a0-fd51-11ed-bb9f-c1147151... | Dinas Lingkungan Hidup Kabupaten Lombok Barat,... |
| 1 | indonesia_cost_west_lombok | oc4ids-jj5f2u-42eb8c40-fd0e-11ed-89a0-391652bd... | Dinas Kesehatan, Bidang Sumber Daya Kesehatan |
| 2 | indonesia_cost_west_lombok | oc4ids-jj5f2u-48767050-fd90-11ed-ae27-c7c77287... | DINAS PERTANIAN LOMBOK BARAT, Dinas Pertanian |
| 3 | indonesia_cost_west_lombok | oc4ids-jj5f2u-4d16d940-c14d-11ed-948e-df17dcc0... | satpolpp lobar, Satpolpp |
| 4 | indonesia_cost_west_lombok | oc4ids-jj5f2u-4e9aba00-fca6-11ed-aabd-47697b57... | Dispar Lobar, Bidang Destinasi Pariwisata |
| 5 | indonesia_cost_west_lombok | oc4ids-jj5f2u-58630f30-fd32-11ed-b023-81ce51c6... | DPUTR, Bidang Bina Marga |
| 6 | indonesia_cost_west_lombok | oc4ids-jj5f2u-6feff700-fe24-11ed-8f04-3b059f5b... | Dinas Perumahan dan Permukiman, Dinas Perumaha... |
| 7 | indonesia_cost_west_lombok | oc4ids-jj5f2u-abe027e0-1d8c-11ef-93e4-ef158f31... | Dispar Lobar, Bidang Pemasaran Pariwisata |
| 8 | indonesia_cost_west_lombok | oc4ids-jj5f2u-bb36b3b0-fd5e-11ed-a750-fbd6f22f... | Sekretariat Daerah, Sekretariat Daerah |
| 9 | indonesia_cost_west_lombok | oc4ids-jj5f2u-fd9e6b60-1be1-11ef-a04d-81e723be... | Badan Perencanaan Pembangunan Daerah, Sekretar... |


In [22]:
# @title #### Results

display_result_widgets('semantics_public_authority_names')


### Supplier names are realistic

**Description:**

Check that a sample of supplier names is realistic, e.g. that the names belong to private businesses rather than to government departments.


**Methodology:** `manual`

Review the output and check that names are realistic.

**Output:**

Sample of `contractingProcesses/summary/suppliers/name` values.


In [23]:
# @title #### Output

get_output(run_id, 'semantics_supplier_names')

|  | source_id | project_id | output |
|---|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-38_Construction_Of_10_Seater_Wc | M/S Opo-Max |
| 1 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-39_Construction_Of_1no._6-Unit_C... | M/S US Construction Limited |
| 2 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-85_Construction_Of_1no.CHPS_Comp... | Smartfalcon Company Limited |
| 3 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-73_Construction_Of_1no._Fish_Smo... | M/S Richtech Enterprise Limited |
| 4 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-19_Construction_of_a_900mm_U-dra... | m/s Dagbene Borns Company Ltd |
| 5 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-56_Construction_Of_1_No_3_Unit_C... | M/S U.S. Global Co. Ltd |
| 6 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-75_Construction_Of_Male_And_Fema... | M/S U.S. Global Co. Ltd |
| 7 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-46_Construction_Of_1no.__3_Unit_... | 1 |
| 8 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-48_Construction_Of_Storm_Drain_F... | FINISHERS COMPANY LIMTED |
| 9 | indonesia_cost_west_lombok | oc4ids-jj5f2u-0aa03690-c14c-11ed-9c82-91310aed... | CV. KENCANA PUTIH |


In [24]:
# @title #### Results

display_result_widgets('semantics_supplier_names')


### Project budgets are realistic

**Description:**

Check that project budgets are non-zero and less than 5bn USD.

**Methodology:** `automated`

Convert `budget/amount` to USD and check it against the thresholds.

**Output:**

List of unrealistic budgets.
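A sketch of the threshold check, assuming each project's budget is at `budget/amount` with `amount` and `currency` keys, and that a table of USD conversion rates is supplied. The rates in the example are placeholders, not real exchange rates (`XTS` is the ISO 4217 code reserved for testing). Skipping amounts in currencies without a rate, and flagging negative amounts alongside zero, are choices of this sketch rather than documented behaviour.

```python
MAX_USD = 5_000_000_000  # exclusive upper threshold: 5bn USD


def unrealistic_budgets(projects, usd_rates, max_usd=MAX_USD):
    """Return (project id, budget in USD) pairs that fail the thresholds."""
    flagged = []
    for project in projects:
        amount = project.get("budget", {}).get("amount", {})
        value, currency = amount.get("amount"), amount.get("currency")
        if value is None or currency not in usd_rates:
            continue  # nothing to convert
        usd = value * usd_rates[currency]
        # Zero fails "non-zero"; flagging negative amounts too is an assumption.
        if usd <= 0 or usd >= max_usd:
            flagged.append((project.get("id"), usd))
    return flagged


rates = {"USD": 1.0, "XTS": 0.5}  # placeholder rates
projects = [
    {"id": "a", "budget": {"amount": {"amount": 0, "currency": "USD"}}},
    {"id": "b", "budget": {"amount": {"amount": 2_000_000, "currency": "XTS"}}},
]
print(unrealistic_budgets(projects, rates))  # [('a', 0.0)]
```

The same function applies to contract values if it reads `contractingProcesses/summary/contractValue` instead of `budget/amount`.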


In [29]:
# @title #### Output

get_output(run_id, 'semantics_budgets').rename(columns={"output": "budget_usd"})

In [30]:
# @title #### Results

get_results(run_id, 'semantics_budgets')

KeyError: 'check'

### Contract values are realistic

**Description:**

Check that contract values are non-zero and less than 5bn USD.

**Methodology:** `automated`

Convert `contractingProcesses/summary/contractValue` to USD and check against the thresholds.

**Output:**

List of unrealistic contract values.


In [27]:
# @title #### Output

get_output(run_id, 'semantics_contract_values').rename(columns={"output": "contract_value_usd"})

|  | source_id | project_id | contract_value_usd |
|---|---|---|---|
| 0 | malawi_cost_malawi | oc4ids-iuq5r5_449 | 0.0 |
| 1 | uganda_gpp | oc4ids-o8h2mh-1668587026-128 | 0.0 |


In [31]:
# @title #### Results

get_results(run_id, 'semantics_contract_values')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Check: Contract values are realistic | True | True | False | True | True | False |


### Funder names are realistic

**Description:**

Check that a sample of funder names is realistic, e.g. that the names belong to government agencies, donors or multilateral financial institutions rather than to private businesses.

**Methodology:** `manual`

Review the output and check that names are realistic.

**Output:**

Sample of `parties/name` values.


In [32]:
# @title #### Output
get_output(run_id, 'semantics_funder_names')

|  | source_id | project_id | output |
|---|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-66_Construction_of_Proposed_Two_... | District Assemblies Common Fund |
| 1 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-1_Upgrading_of_Kokompe_Light-Ind... | Agence Française de\nDéveloppement |
| 2 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-26_Construction_Of_Out-Patient_D... | District Assemblies Common Fund |
| 3 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-84_Construction_Of_1no._3unit_Cl... | DACF - Responsiveness Factor Grant |
| 4 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-76_Rehabilitation_of_Municipal_H... | DACF - Responsiveness Factor Grant |
| 5 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-34_Construction_of_1No._3-unit_c... | DACF - Responsiveness Factor Grant |
| 6 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-29_Reconstruction_Of_1no._Open_M... | DACF - Responsiveness Factor Grant |
| 7 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-49_Construction_of_1No._2Unit_cl... | DACF - Responsiveness Factor Grant |
| 8 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-47_Construction_&_Furnishing_Of_... | DACF - Responsiveness Factor Grant |
| 9 | ghana_cost_sekondi_takoradi | oc4ids-o2imm9-65_Rehabilitation_Of_12-Unit_Cla... | District Assemblies Common Fund |


In [33]:
# @title #### Results

display_result_widgets('semantics_funder_names')


### Dates are realistic

**Description:**

Check that dates are after 1st January 1970 and before 1st January 2050.

**Methodology:** `automated`

Check the following dates against the thresholds:

* `updated`
* `period/startDate`
* `period/endDate`
* `completion/endDate`

**Output:**

List of unrealistic dates.
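A sketch of the date check for a single project, assuming ISO 8601 strings. Both bounds are exclusive, so `1970-01-01T08:00:00+08:00` (the same instant as 1 January 1970 00:00 UTC, as in the output below) is flagged. Treating values without a UTC offset as UTC, and flagging unparseable strings, are assumptions of this sketch; the helper names are hypothetical.

```python
from datetime import datetime, timezone

LOWER = datetime(1970, 1, 1, tzinfo=timezone.utc)  # dates must be after this instant
UPPER = datetime(2050, 1, 1, tzinfo=timezone.utc)  # and before this one


def parse_datetime(value):
    """Parse an ISO 8601 string; values without a UTC offset are assumed to be UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def unrealistic_dates(project):
    """Return the checked date strings of one project that fall outside the thresholds."""
    candidates = [
        project.get("updated"),
        project.get("period", {}).get("startDate"),
        project.get("period", {}).get("endDate"),
        project.get("completion", {}).get("endDate"),
    ]
    flagged = []
    for value in candidates:
        if not value:
            continue
        try:
            realistic = LOWER < parse_datetime(value) < UPPER
        except (ValueError, OverflowError):
            realistic = False  # unparseable dates are flagged (an assumption)
        if not realistic:
            flagged.append(value)
    return flagged
```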

In [34]:
# @title #### Output

get_output(run_id, 'semantics_dates')

|  | source_id | project_id | output |
|---|---|---|---|
| 0 | indonesia_cost_west_lombok | oc4ids-jj5f2u-c04efa80-fe43-11ed-8cb6-c5065064... | 1970-01-01T08:00:00+08:00 |
| 1 | indonesia_cost_west_lombok | oc4ids-jj5f2u-00058770-fe4d-11ed-a83d-1b7c3287... | 1970-01-01T08:00:00+08:00 |
| 2 | indonesia_cost_west_lombok | oc4ids-jj5f2u-000a4240-fd6a-11ed-8449-67186a4a... | 1970-01-01T08:00:00+08:00 |
| 3 | indonesia_cost_west_lombok | oc4ids-jj5f2u-0010a500-fe2a-11ed-93bd-0faab4a7... | 1970-01-01T08:00:00+08:00 |
| 4 | indonesia_cost_west_lombok | oc4ids-jj5f2u-001e1310-2237-11ef-b559-51910036... | 1970-01-01T08:00:00+08:00 |
| ... | ... | ... | ... |
| 1867 | indonesia_cost_west_lombok | oc4ids-jj5f2u-ff00da50-fc7f-11ed-8201-3d41c4be... | 1970-01-01T08:00:00+08:00 |
| 1868 | indonesia_cost_west_lombok | oc4ids-jj5f2u-ff867260-fe1c-11ed-86fa-217e216b... | 1970-01-01T08:00:00+08:00 |
| 1869 | malawi_cost_malawi | oc4ids-iuq5r5_82 | 1922-06-18T00:00:00Z |
| 1870 | malawi_cost_malawi | oc4ids-iuq5r5_597 | 0202-01-01T00:00:00Z |


In [35]:
# @title #### Results

get_results(run_id, 'semantics_dates')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Check: Dates are realistic | True | False | False | False | True | True |


### Roles are set

**Description:**

Check that each organization's `.roles` are consistent with the fields that reference it.

**Methodology:** `automated`

Check that:

* The organization referenced in `publicAuthority` has 'publicAuthority' in `.roles`.
* The organizations referenced in `budget/sourceParty` have 'sourceParty' in `.roles`.
* The organizations referenced in `contractingProcesses/summary/tender/tenderers` have 'tenderer' in `.roles`.
* The organization referenced in `contractingProcesses/summary/tender/procuringEntity` has 'procuringEntity' in `.roles`.
* The organization referenced in `contractingProcesses/summary/tender/administrativeEntity` has 'administrativeEntity' in `.roles`.
* The organizations referenced in `contractingProcesses/summary/suppliers` have 'supplier' in `.roles`.

**Output:**

List of missing roles.
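The rules above can be expressed as (path, role) pairs and checked against the `parties` array. This sketch resolves each slash-separated path exactly as written in the list, expanding arrays along the way, matches parties on `id`, and reports each role that a referenced party lacks. `RULES`, `resolve` and `missing_roles` are hypothetical names, not part of this notebook.

```python
RULES = [
    ("publicAuthority", "publicAuthority"),
    ("budget/sourceParty", "sourceParty"),
    ("contractingProcesses/summary/tender/tenderers", "tenderer"),
    ("contractingProcesses/summary/tender/procuringEntity", "procuringEntity"),
    ("contractingProcesses/summary/tender/administrativeEntity", "administrativeEntity"),
    ("contractingProcesses/summary/suppliers", "supplier"),
]


def resolve(node, path):
    """Yield every value at a slash-separated path, expanding arrays along the way."""
    if isinstance(node, list):
        for item in node:
            yield from resolve(item, path)
    elif not path:
        yield node
    elif isinstance(node, dict):
        head, _, rest = path.partition("/")
        if head in node:
            yield from resolve(node[head], rest)


def missing_roles(project):
    """Return the roles that referenced organizations lack in `parties`."""
    roles = {party.get("id"): set(party.get("roles", [])) for party in project.get("parties", [])}
    missing = []
    for path, role in RULES:
        for reference in resolve(project, path):
            if isinstance(reference, dict) and role not in roles.get(reference.get("id"), set()):
                missing.append(role)
    return missing
```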

In [36]:
# @title #### Output

get_output(run_id, 'semantics_role_coherence')

|  | source_id | project_id | output |
|---|---|---|---|
| 0 | indonesia_cost_west_lombok | oc4ids-jj5f2u-0aa03690-c14c-11ed-9c82-91310aed... | administrativeEntity |
| 1 | indonesia_cost_west_lombok | oc4ids-jj5f2u-11423830-62d2-11ec-9d91-3f1f1f18... | administrativeEntity |
| 2 | indonesia_cost_west_lombok | oc4ids-jj5f2u-41ed6970-639a-11ec-ad71-af01031f... | administrativeEntity |
| 3 | indonesia_cost_west_lombok | oc4ids-jj5f2u-49628ea0-c163-11ed-abd5-c969b33d... | administrativeEntity |
| 4 | indonesia_cost_west_lombok | oc4ids-jj5f2u-4d16d940-c14d-11ed-948e-df17dcc0... | administrativeEntity |
| ... | ... | ... | ... |
| 253 | uganda_gpp | oc4ids-o8h2mh-1667464372-113 | supplier |
| 254 | uganda_gpp | oc4ids-o8h2mh-1668170130-115 | supplier |
| 255 | uganda_gpp | oc4ids-o8h2mh-1668587026-128 | supplier |
| 256 | uganda_gpp | oc4ids-o8h2mh-1690965224-143 | supplier |


In [37]:
# @title #### Results

get_results(run_id, 'semantics_role_coherence')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Check: Roles are set | True | False | False | True | True | False |


### Coordinates are valid

**Description:**

Check that project location coordinates are valid.

**Methodology:** `automated`

Check that `locations/geometry/coordinates` are in the range of [-90, 90] for latitudes and [-180, 180] for longitudes.

**Output:**

List of invalid coordinates.
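GeoJSON positions list longitude first and latitude second, so the two ranges apply to the first and second numbers of each position respectively. This sketch walks a geometry's (possibly nested) `coordinates` and returns the positions outside those ranges; reporting positions with fewer than two numbers as invalid is an assumption, and the helper names are hypothetical.

```python
def iter_positions(coordinates):
    """Yield each position (a list of numbers) from nested GeoJSON coordinates."""
    if not isinstance(coordinates, list):
        return
    if coordinates and all(isinstance(c, (int, float)) for c in coordinates):
        yield coordinates
    else:
        for child in coordinates:
            yield from iter_positions(child)


def invalid_positions(geometry):
    """Return positions whose longitude or latitude is out of range."""
    invalid = []
    for position in iter_positions(geometry.get("coordinates")):
        if len(position) < 2:
            invalid.append(position)
            continue
        longitude, latitude = position[0], position[1]
        if not (-180 <= longitude <= 180 and -90 <= latitude <= 90):
            invalid.append(position)
    return invalid
```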

In [38]:
# @title #### Output

get_output(run_id, 'semantics_coordinates')

|  | source_id | project_id | output |
|---|---|---|---|
| 0 | indonesia_cost_west_lombok | oc4ids-jj5f2u-02f12100-c14f-11ed-96a0-7b4f4f6c... | [6378137, 6356752] |
| 1 | indonesia_cost_west_lombok | oc4ids-jj5f2u-0400b820-6387-11ec-a9af-d39c9a52... | [6378137, 6356752] |
| 2 | indonesia_cost_west_lombok | oc4ids-jj5f2u-06eb0bf0-fd30-11ed-8dc5-ab58fb26... | [6378137, 6356752] |
| 3 | indonesia_cost_west_lombok | oc4ids-jj5f2u-07189ad0-fd32-11ed-a195-d3ad3632... | [6378137, 6356752] |
| 4 | indonesia_cost_west_lombok | oc4ids-jj5f2u-07394ca0-fd2a-11ed-a8a3-53bca292... | [6378137, 6356752] |
| 5 | indonesia_cost_west_lombok | oc4ids-jj5f2u-091c6c40-c14f-11ed-97c4-7f94450a... | [6378137, 6356752] |
| 6 | indonesia_cost_west_lombok | oc4ids-jj5f2u-226de4c0-fe59-11ed-862d-5b339e76... | [6378137, 6356752] |
| 7 | indonesia_cost_west_lombok | oc4ids-jj5f2u-23340fb0-6754-11eb-8891-db8af83c... | [6378137, 6356752] |
| 8 | indonesia_cost_west_lombok | oc4ids-jj5f2u-252a6db0-fd2c-11ed-8a91-2d5056f1... | [6378137, 6356752] |
| 9 | indonesia_cost_west_lombok | oc4ids-jj5f2u-270acbf0-fd31-11ed-a10d-6be05ac4... | [6378137, 6356752] |


In [39]:
# @title #### Results

get_results(run_id, 'semantics_coordinates')

| check | ghana_cost_sekondi_takoradi | indonesia_cost_west_lombok | malawi_cost_malawi | mexico_cost_jalisco | mexico_nuevo_leon | uganda_gpp |
|---|---|---|---|---|---|---|
| Check: Coordinates are valid | True | False | False | True | True | True |


## Metrics

This section provides measurements related to the criteria and checks. No judgements are associated with these measurements; rather, they provide additional context for the pass/fail criteria and checks.

### New project count

**Description:**

A count of projects added since the previous report.

**Methodology:** `automated`

Identify projects added since the previous report by comparing project identifiers (`id`).
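Comparing identifiers amounts to a set difference. A minimal sketch, assuming the identifiers from the previous report are available as a list; with this approach, projects removed since the previous report are not counted and duplicate identifiers are counted once. `new_project_count` is a hypothetical helper.

```python
def new_project_count(current_ids, previous_ids):
    """Count identifiers present now that were absent from the previous report."""
    return len(set(current_ids) - set(previous_ids))


print(new_project_count(["p1", "p2", "p3", "p3"], ["p1", "p4"]))  # 2
```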

In [40]:
# @title #### Output

get_metric_output(run_id, 'metrics_new_projects')

|  | source_id | count |
|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | 54 |
| 1 | indonesia_cost_west_lombok | 2133 |
| 2 | malawi_cost_malawi | 488 |
| 3 | mexico_cost_jalisco | 66 |
| 4 | mexico_nuevo_leon | 23 |
| 5 | uganda_gpp | 10 |


### Last updated date

**Description:**

The last updated date of the most recently updated project.


**Methodology:** `automated`

The maximum `updated` value amongst the projects in the dataset.

In [41]:
# @title #### Output

get_metric_output(run_id, 'metrics_last_updated')

|  | source_id | count |
|---|---|---|
| 0 | ghana_cost_sekondi_takoradi |  |
| 1 | indonesia_cost_west_lombok | 2024-07-03 |
| 2 | malawi_cost_malawi | 2024-05-30 |
| 3 | mexico_cost_jalisco | 2024-06-14 |
| 4 | mexico_nuevo_leon | 2023-12-28 |
| 5 | uganda_gpp | 2023-08-02 |


### Earliest project start date

**Description:**

The earliest project start date.

**Methodology:** `automated`

The minimum `period/startDate` amongst the projects in the dataset.


In [42]:
# @title #### Output

get_metric_output(run_id, 'metrics_earliest_start_date')

|  | source_id | count |
|---|---|---|
| 0 | ghana_cost_sekondi_takoradi | 2015-01-05 |
| 1 | indonesia_cost_west_lombok | 0020-02-13 |
| 2 | malawi_cost_malawi | 0202-01-01 |
| 3 | mexico_cost_jalisco |  |
| 4 | mexico_nuevo_leon | 2012-01-30 |
| 5 | uganda_gpp | 2019-06-30 |


### Latest project end date

**Description:**

The latest project end date.

**Methodology:** `automated`

The maximum `period/endDate` amongst the projects in the dataset.

In [43]:
# @title #### Output

get_metric_output(run_id, 'metrics_latest_end_date')

Unnamed: 0,source_id,count
0,ghana_cost_sekondi_takoradi,2023-08-31
1,indonesia_cost_west_lombok,2024-08-31
2,malawi_cost_malawi,2033-03-31
3,mexico_cost_jalisco,
4,mexico_nuevo_leon,2027-07-30
5,uganda_gpp,2025-12-30


### Additional field count

**Description:**

A count of non-OC4IDS fields in the dataset.


**Methodology:** `automated`

Use libcoveoc4ids to generate a count of additional fields.
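libcoveoc4ids performs this check against the OC4IDS schema. The sketch below only illustrates the idea and does not use libcoveoc4ids: it assumes a hypothetical set of schema paths and counts distinct field paths in the data that are not in that set. The names `field_paths` and `additional_fields` are hypothetical.

```python
def field_paths(obj, prefix=""):
    """Yield slash-separated paths for every field in a nested JSON-like object."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}" if prefix else key
            yield path
            yield from field_paths(value, path)
    elif isinstance(obj, list):
        # Array items share their parent's path, as in the coverage tables.
        for item in obj:
            yield from field_paths(item, prefix)


def additional_fields(projects, schema_paths):
    """Return the sorted field paths that appear in the data but not in the schema."""
    found = set()
    for project in projects:
        found.update(field_paths(project))
    return sorted(found - schema_paths)


# Hypothetical subset of schema paths and data, for illustration only.
schema_paths = {"id", "title", "parties", "parties/id", "parties/name"}
projects = [{"id": "P1", "title": "Road", "parties": [{"id": "1", "localRef": "x"}], "internalCode": "A7"}]
extra = additional_fields(projects, schema_paths)
print(extra, len(extra))  # ['internalCode', 'parties/localRef'] 2
```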


In [44]:
# @title #### Output

get_metric_output(run_id, 'metrics_additional_field_count')

Unnamed: 0,source_id,count
0,ghana_cost_sekondi_takoradi,12.0
1,indonesia_cost_west_lombok,22.0
2,malawi_cost_malawi,14.0
3,mexico_cost_jalisco,7.0
4,mexico_nuevo_leon,22.0
5,uganda_gpp,


### Project count

**Description:**

A count of projects in the dataset.

**Methodology:** `automated`

Count the projects in the dataset.

In [45]:
# @title #### Output

get_metric_output(run_id, 'metrics_project_count')

Unnamed: 0,source_id,count
0,ghana_cost_sekondi_takoradi,54
1,indonesia_cost_west_lombok,2165
2,malawi_cost_malawi,632
3,mexico_cost_jalisco,635
4,mexico_nuevo_leon,254
5,uganda_gpp,50


### Validation error count

**Description:**

A count of the validation errors reported by the OC4IDS data review tool.

**Methodology:** `automated`

Count the types of validation error reported by libcoveoc4ids, not the number of occurrences of each error type.
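A minimal sketch of the difference between counting error types and counting occurrences, assuming errors are available as hypothetical `(error type, path)` pairs. This is not the libcoveoc4ids output format; `validation_error_counts` is a hypothetical name.

```python
from collections import Counter


def validation_error_counts(errors):
    """Return the number of distinct error types and the occurrences of each type."""
    occurrences = Counter(error_type for error_type, _path in errors)
    # The metric is len(occurrences), not sum(occurrences.values()).
    return len(occurrences), occurrences


# Hypothetical (error type, path) pairs, for illustration only.
errors = [
    ("required", "projects/0/id"),
    ("required", "projects/3/id"),
    ("format", "projects/1/period/startDate"),
]
types, occurrences = validation_error_counts(errors)
print(types, sum(occurrences.values()))  # 2 3
```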


In [46]:
# @title #### Output

get_metric_output(run_id, 'metrics_validation_error_count')

Unnamed: 0,source_id,count
0,ghana_cost_sekondi_takoradi,2.0
1,indonesia_cost_west_lombok,5.0
2,malawi_cost_malawi,1.0
3,mexico_cost_jalisco,4.0
4,mexico_nuevo_leon,5.0
5,uganda_gpp,


### Structure warning count

**Description:**

A count of the structure warnings reported by the OC4IDS data review tool.

**Methodology:** `automated`

Count the structure warnings reported by libcoveoc4ids, not the number of occurrences of each structure warning.

In [47]:
# @title #### Output

get_metric_output(run_id, 'metrics_structure_warning_count')

Unnamed: 0,source_id,count
0,ghana_cost_sekondi_takoradi,
1,indonesia_cost_west_lombok,
2,malawi_cost_malawi,
3,mexico_cost_jalisco,1.0
4,mexico_nuevo_leon,
5,uganda_gpp,


## Coverage

This section measures data coverage against the OC4IDS schema, the core CoST IDS elements and the CoST IDS sustainability modules: that is, which fields in the schema, and which CoST IDS elements, are present in each data source.

### OC4IDS

If a field is on an object in an array, then coverage is reported for each object in the array. For example, if there are 100 projects, each with 5 parties, then coverage of the `parties` field is reported out of 100, but coverage of its child fields (like `parties/id`) is reported out of 500.

Child fields are reported in the context of their parent field. For example, if there are 100 projects, 10 of which set `publicAuthority`, then coverage of the `publicAuthority` field is reported out of 100, but coverage of its child fields (like `publicAuthority/id`) is reported out of 10.
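The two rules above can be sketched in Python. This is a simplified illustration, assuming each project is a plain dict; it is not the SQL used in `get_schema_coverage`, which works from precomputed field counts. The `coverage` helper and the example data are hypothetical.

```python
def coverage(projects, parent, child):
    """Return (coverage of a top-level field, coverage of one of its child fields).

    Hypothetical helper: `parent` is a top-level field holding either an array of
    objects or a single object; `child` is a field on those objects.
    """
    if not projects:
        return 0.0, 0.0
    # The parent field is reported out of the number of projects.
    with_parent = [p[parent] for p in projects if p.get(parent)]
    parent_coverage = len(with_parent) / len(projects)

    # Child fields are reported out of the parent objects: one per array item,
    # or one per project that sets the parent when it is a single object.
    objects = []
    for value in with_parent:
        objects.extend(value if isinstance(value, list) else [value])
    with_child = sum(1 for obj in objects if child in obj)
    child_coverage = with_child / len(objects) if objects else 0.0
    return round(parent_coverage, 2), round(child_coverage, 2)


# 4 projects; 2 set `parties` (5 parties in total), 4 of which set `id`.
projects = [
    {"parties": [{"id": "1"}, {"id": "2"}, {"name": "X"}]},
    {"parties": [{"id": "3"}, {"id": "4"}]},
    {},
    {},
]
print(coverage(projects, "parties", "id"))  # (0.5, 0.8)

# 4 projects; 2 set `publicAuthority`, 1 of which sets `id`.
authorities = [{"publicAuthority": {"id": "A"}}, {"publicAuthority": {"name": "B"}}, {}, {}]
print(coverage(authorities, "publicAuthority", "id"))  # (0.5, 0.5)
```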

In [67]:
# Get the collection (one per publication) for each source in the selected run
collection_ids = %sql select collection_id, source_id from run_collection join collection on run_collection.collection_id = collection.id where run_id = :run_id order by source_id asc;

oc4ids_coverage = pd.DataFrame()

# Calculate schema coverage for each collection and stack the results
for collection_id in collection_ids['collection_id']:
  oc4ids_coverage = pd.concat([oc4ids_coverage, get_schema_coverage(collection_id)])

# Label each row with its publication's source_id
oc4ids_coverage = oc4ids_coverage.merge(collection_ids, on="collection_id")

# One row per schema field, one column per publication
oc4ids_coverage = oc4ids_coverage.pivot(index=['path', 'title', 'required'], columns='source_id', values='coverage')

oc4ids_coverage

path,title,required,ghana_cost_sekondi_takoradi,indonesia_cost_west_lombok,malawi_cost_malawi,mexico_cost_jalisco,mexico_nuevo_leon,uganda_gpp
additionalClassifications,Additional classifications,False,0.00,0.00,0.00,0.00,0.00,0.00
additionalClassifications/description,Description,False,0.00,0.00,0.00,0.00,0.00,0.00
additionalClassifications/id,ID,False,0.00,0.00,0.00,0.00,0.00,0.00
additionalClassifications/scheme,Scheme,False,0.00,0.00,0.00,0.00,0.00,0.00
additionalClassifications/uri,URI,False,0.00,0.00,0.00,0.00,0.00,0.00
...,...,...,...,...,...,...,...,...
sector,Project sector,False,1.00,0.69,0.83,1.00,1.00,1.00
status,Status,False,1.00,1.00,1.00,1.00,1.00,1.00
title,Project title,False,1.00,1.00,1.00,1.00,1.00,1.00
type,Project type,False,1.00,1.00,1.00,1.00,1.00,0.00


### CoST IDS

Coverage is measured as the proportion of projects that include the required fields for an element. Coverage is not measured at more granular levels (for example, the proportion of individual contracting processes within a project that include the required fields), so there may be some false positives. See this issue for a detailed explanation.
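To illustrate how project-level measurement can produce false positives, the sketch below assumes that a project counts as covering an element if any of its contracting processes includes all of the element's required fields. That rule, the flat field names in `REQUIRED` and the helper names are hypothetical simplifications, not the indicator definitions used by the status checks.

```python
# Hypothetical required fields for an element, for illustration only.
REQUIRED = ("contractValue", "suppliers")


def has_fields(obj, fields):
    # A field counts as present if it is set to a truthy value (a simplification).
    return all(obj.get(field) for field in fields)


def project_level_coverage(projects, fields):
    """Proportion of projects with at least one contracting process that has all fields."""
    covered = sum(
        1
        for project in projects
        if any(has_fields(cp, fields) for cp in project.get("contractingProcesses", []))
    )
    return covered / len(projects) if projects else 0.0


def process_level_coverage(projects, fields):
    """Proportion of individual contracting processes that have all fields."""
    processes = [cp for project in projects for cp in project.get("contractingProcesses", [])]
    covered = sum(1 for cp in processes if has_fields(cp, fields))
    return covered / len(processes) if processes else 0.0


projects = [
    {"contractingProcesses": [{"contractValue": 100, "suppliers": ["S1"]}, {"contractValue": 50}]},
    {"contractingProcesses": [{"contractValue": 75, "suppliers": ["S2"]}]},
]
print(project_level_coverage(projects, REQUIRED))  # 1.0
print(round(process_level_coverage(projects, REQUIRED), 2))  # 0.67
```

Every project is counted as covered, even though one of the three contracting processes lacks a required field.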

In [11]:
results = get_indicator_coverage_results(run_id, 'cost_ids')

results = results.pivot(index=['indicator'], columns='source_id', values='coverage')

results.style

indicator,ghana_cost_sekondi_takoradi,indonesia_cost_west_lombok,malawi_cost_malawi,mexico_cost_jalisco,mexico_nuevo_leon,uganda_gpp
Budget amendment decision,0.0,0.0,0.0522151898734177,0.0,0.0,0.0
Completion date,0.8148148148148148,0.0124711316397228,0.430379746835443,0.9626623376623376,0.6811023622047244,0.52
Contact details,0.6481481481481481,0.0,0.5110759493670886,1.0,1.0,0.0
Contract administrative entity,0.0,0.0,0.0,0.0,0.0,0.0
Contract agreement and conditions,0.0,0.0,0.0,0.0,0.0,0.0
Contract amendments,0.0,0.0,0.0,0.0,0.0,0.0
Contract firm(s),0.0,0.0,0.0,0.0,0.0,0.0
Contract officials and roles,0.0,0.0,0.0,0.0,0.0,0.0
Contract price,0.0,0.0244803695150115,0.370253164556962,0.9983766233766233,0.0,0.74
Contract scope of work,1.0,0.0244803695150115,0.370253164556962,0.0,0.0,0.74


### Sustainability modules

Coverage is measured as the proportion of projects that include the required fields for an element. Coverage is not measured at more granular levels (for example, the proportion of individual contracting processes within a project that include the required fields), so there may be some false positives. See this issue for a detailed explanation.

In [12]:
results = get_indicator_coverage_results(run_id, 'sustainability_modules')

results = results.pivot(index=['indicator'], columns='source_id', values='coverage')

results.style

indicator,ghana_cost_sekondi_takoradi,indonesia_cost_west_lombok,malawi_cost_malawi,mexico_cost_jalisco,mexico_nuevo_leon,uganda_gpp
1.10: Maintenance plan or program,0.0,0.0,0.0,0.0,0.0,0.0
1.11: Asset lifetime,0.0,0.0,0.0,0.0,0.0,0.0
1.1: Procurement strategy,0.0,0.0,0.0,0.0,0.0,0.0
1.2: Life cycle cost,0.0,0.0,0.0,0.0,0.0,0.0
1.3: Life cycle cost calculation methodology,0.0,0.0,0.0,0.0,0.0,0.0
"1.4: Funding source for preparation, implementation and maintenance",0.0,0.0,0.0,0.0,0.0,0.0
"1.5: Budget for preparation, implementation and maintenance",0.0,0.0,0.0,0.0,0.0,0.0
1.6: Cost benefit analysis,0.0,0.0,0.0,0.0,0.0,0.0
1.7: Value for money,0.0,0.0,0.0,0.0,0.0,0.0
1.8: Budget projections,0.0,0.0,0.0,0.0,0.0,0.0


## Appendix 1: Report Setup

In [1]:
# @title ### Install requirements
# @markdown After running this cell, you must restart the session (Ctrl+M+.)
!pip install --upgrade ipython-sql > pip.log
!pip install --upgrade "pandas>=2.2"

In [2]:
# @title ### Connect to the database
# @markdown ODS users: enter the password for the `readonly` user, from the ODS password database.
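import getpass
from urllib.parse import quote_plus

print('Enter your credentials')
user = 'readonly'
password = getpass.getpass('Password:')

# URL-encode the password so that special characters (e.g. @ or /) don't break the connection string
connection_string = 'postgresql://' + user + ':' + quote_plus(password) + '@oc4ids-database-2.cuujgua4wses.us-east-1.rds.amazonaws.com/postgres'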
%load_ext sql
%sql $connection_string
%config SqlMagic.autopandas = True  # Return Pandas DataFrames instead of regular result sets
%config SqlMagic.displaycon = False  # Don't show connection string after execute
%config SqlMagic.feedback = False  # Don't print number of rows affected by DML


Enter your credentials
Password:··········


In [3]:
# @title Choose a `run_id` to report on
from ipywidgets import interact

def set_run_id(id):
  global run_id
  run_id = id

  global source_ids
  source_ids = %sql select source_id from run_collection join collection on run_collection.collection_id = collection.id where run_id = :run_id order by source_id asc;
  source_ids = source_ids['source_id']

run_ids = %sql select distinct run_id from run_collection order by run_id desc;

interact(set_run_id, id=run_ids['run_id']);


In [4]:
# @title Setup notebook environment

# https://colab.research.google.com/notebooks/data_table.ipynb
%load_ext google.colab.data_table
from google.colab.data_table import DataTable
DataTable.max_columns = 50 # Increase max columns so that dataframes with many columns are rendered as data tables
DataTable.include_index = False # Remove the index from data tables for easier copy-pasting to Google Docs
DataTable.num_rows_per_page = 10

import functools
import ipywidgets
import pandas as pd

from IPython.display import display

manual_checks = {}

In [5]:
# @title ### Define functions

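def get_results(run_id = None, check_id = None, extra_results = None):
  # Default to the run selected in the dropdown when called, not when defined
  if run_id is None:
    run_id = globals()['run_id']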

  query = f"""

  select
    case
      when check_id = 'criteria_registered' then 'Criteria: Registered'
      when check_id = 'criteria_discoverable' then 'Criteria: Discoverable'
      when check_id = 'criteria_retrievable' then 'Criteria: Retrievable'
      when check_id = 'criteria_reviewable' then 'Criteria: Reviewable'
      when check_id = 'criteria_appropriate' then 'Criteria: Appropriate'
      when check_id = 'criteria_active' then 'Criteria: Active'
      when check_id = 'criteria_documented' then 'Criteria: Documented'
      when check_id = 'criteria_accessible' then 'Criteria: Accessible'
      when check_id = 'criteria_valid' then 'Criteria: Valid'
      when check_id = 'criteria_conformant' then 'Criteria: Conformant'
      when check_id = 'semantics_sector_codelist' then 'Check: Sectors are standardised'
      when check_id = 'semantics_public_authority_names' then 'Check: Public authority names are realistic'
      when check_id = 'semantics_supplier_names' then 'Check: Supplier names are realistic'
      when check_id = 'semantics_budgets' then 'Check: Project budgets are realistic'
      when check_id = 'semantics_contract_values' then 'Check: Contract values are realistic'
      when check_id = 'semantics_funder_names' then 'Check: Funder names are realistic'
      when check_id = 'semantics_dates' then 'Check: Dates are realistic'
      when check_id = 'semantics_role_coherence' then 'Check: Roles are set'
      when check_id = 'semantics_coordinates' then 'Check: Coordinates are valid'
      else check_id
    end as check,
    source_id,
    result
  from
    check_results
  join collection on
    collection_id = collection.id
  where
    run_id = '{run_id}'
    and (left(check_id, 8) = 'criteria'
      or left(check_id, 9) = 'semantics')
    {f"and check_id = '{check_id}'" if check_id else ""}
  order by
    array_position(array[
    'criteria_registered',
    'criteria_discoverable',
    'criteria_retrievable',
    'criteria_reviewable',
    'criteria_appropriate',
    'criteria_active',
    'criteria_documented',
    'criteria_accessible',
    'criteria_valid',
    'criteria_conformant',
    'semantics_sector_codelist',
    'semantics_public_authority_names',
    'semantics_supplier_names',
    'semantics_budgets',
    'semantics_contract_values',
    'semantics_funder_names',
    'semantics_dates',
    'semantics_role_coherence',
    'semantics_coordinates'],
    check_id) asc,
    source_id asc;

  """

  results = %sql {query}

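  if extra_results is not None:
    # Add the results of manual checks, e.g. from manual_checks
    extra_rows = [
      {'check': check, 'source_id': source_id, 'result': result}
      for check, source in extra_results.items()
      for source_id, result in source.items()
    ]
    results = pd.concat([results, pd.DataFrame(extra_rows)], ignore_index=True)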

  results = results.pivot(index=['check'], columns='source_id', values='result')

  styler = results.style

  # Colour passes green, failures red and missing results grey
  return styler.map(lambda x: 'background-color:rgba(0, 255, 0, 0.25);' if x == True else ('background-color:rgba(255, 0, 0, 0.25);' if x == False else 'background-color:rgba(100, 100, 100, 0.25);'))

def get_output(run_id, check_id):

  query = f"""

  select
    source_id,
    key as project_id,
    value as output
  from
    check_results
  cross join
    jsonb_each(output)
  join collection on
    collection_id = collection.id
  where
    run_id = '{run_id}'
  and
    check_id = '{check_id}'
  order by
    check_id, source_id;

  """

  output = %sql {query}

  return output

def get_metric_output(run_id, check_id):

  query = f"""

  select
    source_id,
    coalesce(output->'count', output->'date') as count
  from check_results
  join collection on
    collection_id = collection.id
  where
    run_id = '{run_id}'
  and
    check_id = '{check_id}'
  order by
    check_id, source_id;

  """

  output = %sql {query}

  return output

def save_results(b, check_id, widgets):
  global manual_checks

  results = {source_id: widget.value for source_id, widget in widgets.items()}

  manual_checks[check_id] = results

def display_result_widgets(check_id):
  global source_ids

  widgets = {}

  description_length = max([len(source_id) for source_id in source_ids])

  for source_id in source_ids:

    widgets[source_id] = ipywidgets.Dropdown(
      options=[True, False, None],
      value=None,
      description=f'{source_id}:',
      disabled=False,
      layout={'width': '35em'},
      style={'description_width': f'{description_length}em'}
    )

  button = ipywidgets.Button(description="Save")

  for widget in widgets.values():
    display(widget)

  display(button)

  button.on_click(functools.partial(save_results, check_id = check_id, widgets = widgets))

def get_schema_coverage(collection_id):

  query = """

    WITH project_count AS (
    SELECT
      count(*)::NUMERIC
    FROM
      projects
    WHERE
      collection_id = :collection_id ),
    field_counts_filtered AS (
    SELECT
      *
    FROM
      field_counts
    WHERE
      field_counts.collection_id = :collection_id
    )
    SELECT DISTINCT ON (oc4ids_schema.path)
      :collection_id as collection_id,
      oc4ids_schema.path,
      title,
      CASE
        WHEN substring(RANGE FROM 1 FOR 1)::int = 1 THEN TRUE
        ELSE FALSE
      END AS required,
      COALESCE(CASE
        WHEN array_length(field_counts.path_array,
        1) = 1 THEN round(field_counts.object_property::NUMERIC / (
        SELECT
          *
        FROM
          project_count),
        2)
        ELSE
        CASE
          WHEN parent_field_counts.array_count = 0 THEN round(field_counts.object_property::NUMERIC / parent_field_counts.object_property::NUMERIC,
          2)
          ELSE round(field_counts.object_property::NUMERIC / parent_field_counts.array_count::NUMERIC,
          2)
        END
      END, 0.00) AS coverage
    FROM
      oc4ids_schema
    LEFT JOIN field_counts_filtered AS field_counts ON
      oc4ids_schema.path = field_counts.path
    LEFT JOIN field_counts_filtered AS parent_field_counts ON
      array_to_string(field_counts.path_array[1:array_length(field_counts.path_array,
      1)-1],
      '/') = parent_field_counts.PATH;

  """

  results = %sql {query}

  return results

def get_indicator_coverage_results(run_id, indicator_source):

  query = """

    select
      collection_id,
      source_id,
      indicator,
      successes/checks as coverage
    from
      indicator_coverage
    join
      collection on collection_id = collection.id
    where
      indicator_source = :indicator_source
    and
      run_id = :run_id;

  """

  results = %sql {query}

  return results