## Check data quality

### Local extensions and additional fields and codes

[Conforming publications](https://standard.open-contracting.org/latest/en/schema/conformance_and_extensions/#publication-conformance) must not use terms from outside the OCDS schema where the OCDS schema's terms would suffice.

Use this section to identify fields and codes, whether defined in local extensions or reported as additional, that should be mapped to fields and codes in the OCDS schema and extensions.

You can use the following resources to find fields and codes with similar semantics:

* [OCDS Schema and Codelist Reference](https://standard.open-contracting.org/latest/en/schema/), for fields in the core OCDS schema.
* [OCDS Extensions Field and Code Search](https://open-contracting.github.io/editor-tools/), for fields and codes in OCDS extensions.
* [GitHub issue tracker](https://github.com/open-contracting/standard/issues), for discussions about adding new fields and codes.

Check that field and code names [conform to the style guide](https://ocds-standard-development-handbook.readthedocs.io/en/latest/meta/schema_style_guide.html#field-and-code-names) and report any issues to the publisher.
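
A first pass over the names can be automated. The sketch below assumes that the style guide expects lowerCamelCase names; confirm this against the linked guide before relying on it. The sample names are hypothetical.

```python
import re

# Assumption: names should be lowerCamelCase (ASCII letters and digits only,
# starting with a lowercase letter). Adjust the pattern to the style guide.
LOWER_CAMEL_CASE = re.compile(r"[a-z][a-zA-Z0-9]*")


def non_conforming_names(names):
    """Return the names that do not look like lowerCamelCase."""
    return [name for name in names if not LOWER_CAMEL_CASE.fullmatch(name)]


# Hypothetical additional field names, for illustration only.
sample_names = ["procurementMethodLocal", "Tender_Status", "budget-line", "title_es"]
print(non_conforming_names(sample_names))
# ['Tender_Status', 'budget-line', 'title_es']
```

Names such as `title_es` are flagged by this pattern even though they may be legitimate language variations, so review the output manually.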

If you cannot find a suitable mapping for an additional field or code, open a GitHub issue to describe the semantics of the field or code and to discuss how to model it. Report any issues to the publisher.


#### Local extensions

For each field and code in extensions authored by the publisher, in addition to the above checks, consider whether to [review the extension in detail](https://docs.google.com/document/d/1bRhVVkuTPXw6acE2opKD-Yj80y0lvAsfAqDjG632nNY/).

Generate a list of extensions declared in the package metadata:

In [None]:
%%sql

select 
  collection_id,
  release_type,
  jsonb_array_elements(package_data -> 'extensions') as extension,
  count(*) as count
from
  release_summary
where
  collection_id in :collection_ids
AND
  package_data is not null
GROUP BY
  collection_id,
  release_type,
  extension
ORDER BY
  collection_id,
  release_type,
  count DESC;

#### Additional fields

Note that the DRT also reports additional fields in the following scenarios:

* Fields from undeclared extensions
* Fields with language variations, e.g. `title_es`. You do not need to report language variations to the publisher, but you should check that the field [conforms to the rules for language variations](https://standard.open-contracting.org/latest/en/schema/reference/#language).
* OCDS 1.0 data with fields from extensions. You should report the fields to the publisher and recommend that they upgrade to OCDS 1.1.
* Data with no package, or where a version is not declared, which is checked against both the 1.0 and 1.1 schemas. You should filter the `schema_version` to get additional fields against the latest version of the schema and report them to the publisher.
* Fields from extensions that patch multiple schemas ([issue](https://github.com/OpenDataServices/cove/issues/1132)). If additional fields are reported in the package schema, you should check whether any extensions have multiple schemas before reporting them to the publisher.
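
To separate language variations such as `title_es` from other additional fields, a simple pattern match can help. This is a minimal sketch that assumes a language variation is a field name followed by an underscore and a language tag; it checks only the shape of the name, not whether the tag is a real language code or whether the base field is one that OCDS allows to be translated.

```python
import re

# Assumption: a language variation is a field name, an underscore, and a
# language tag made of a 2-3 letter language code with optional subtags.
LANGUAGE_VARIATION = re.compile(
    r"(?P<base>[A-Za-z][A-Za-z0-9]*)_(?P<tag>[A-Za-z]{2,3}(?:-[A-Za-z0-9]{2,8})*)"
)


def split_language_variation(field_name):
    """Return (base, tag) if the name looks like a language variation, else None."""
    match = LANGUAGE_VARIATION.fullmatch(field_name)
    if match is None:
        return None
    return match.group("base"), match.group("tag")


print(split_language_variation("title_es"))           # ('title', 'es')
print(split_language_variation("description_pt-BR"))  # ('description', 'pt-BR')
print(split_language_variation("localStatus"))        # None
```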

Generate a list of additional fields reported by the DRT:

In [None]:
%%sql

WITH check_results AS (
    SELECT
      *,
      CASE WHEN (release_type IN ('record', 'embedded_release')) THEN record_check ELSE release_check END as results
    FROM
      release_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type <> 'compiled_release'
),
counts AS (
  SELECT
    collection_id,
    release_type,
    additional_fields ->> 'path' AS path,
    additional_fields ->> 'field_name' AS field,
    SUM(CAST(additional_fields ->> 'count' AS int)) AS count,
    results -> 'schema_url' AS schema_version
  FROM
    check_results
  CROSS JOIN
    jsonb_array_elements(results -> 'all_additional_fields') AS additional_fields
  GROUP BY
    collection_id,
    release_type,
    schema_version,
    field,
    path
  ORDER BY
    schema_version,
    path,
    count DESC
), examples AS (
  SELECT DISTINCT ON (collection_id, release_type, results -> 'schema_url', additional_fields ->> 'path', additional_fields ->> 'field_name')
    collection_id,
    release_type,
    results -> 'schema_url' AS schema_version,
    additional_fields ->> 'path' AS path,
    additional_fields ->> 'field_name' AS field,
    additional_fields ->> 'examples' AS examples
  FROM
    check_results
  CROSS JOIN
    jsonb_array_elements(results -> 'all_additional_fields') AS additional_fields
  WHERE
    additional_fields ->> 'examples' <> '[]'
  AND
    additional_fields ->> 'examples' IS NOT NULL
)
SELECT
  counts.collection_id,
  counts.release_type,
  counts.schema_version,
  counts.path path,
  counts.field field,
  count,
  examples examples
FROM
  counts
LEFT JOIN
  examples
USING (collection_id, release_type, schema_version, path, field)
ORDER BY
  counts.schema_version,
  path,
  field;

##### **Additional field examples**

Use the query in the following cell to generate a release package containing an example release for each additional field.

In [None]:
query = """
WITH additional_field_releases AS (
  SELECT
    ocid as ocid,
    release.release_id as release_id,
    data_id as data_id,
    additional_fields->>'path' AS path,
    additional_fields->>'field_name' AS field
  FROM
    release_check
  CROSS JOIN
    jsonb_array_elements(cove_output->'all_additional_fields') AS additional_fields
  JOIN
    release ON release_check.release_id = release.id
  WHERE
    collection_id IN :collection_ids
), additional_fields as(
  SELECT DISTINCT
    path,
    field
  FROM
    additional_field_releases
), examples AS (
  SELECT DISTINCT ON (additional_fields.path, additional_fields.field)
    additional_fields.path,
    additional_fields.field,
    ocid,
    release_id,
    data_id,
    data
  FROM
    additional_fields
  INNER JOIN
    additional_field_releases
  ON
    additional_fields.path = additional_field_releases.path AND additional_fields.field = additional_field_releases.field
  JOIN
    data ON data.id = data_id
  ORDER BY
    additional_fields.path,
    additional_fields.field
)
SELECT
  jsonb_build_object('releases', jsonb_agg(data)) release_package
FROM
  examples
"""

results = %sql {query}

render_json(results['release_package'][0])

#### Additional codes

List additional open codelist values reported by the DRT:

In [None]:
%%sql

WITH check_results AS (
    SELECT
      *,
      CASE WHEN (release_type IN ('record', 'embedded_release')) THEN record_check ELSE release_check END as results
    FROM
      release_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type <> 'compiled_release'
)
SELECT
  collection_id,
  release_type,
  additional_open_codelist_values.value -> 'codelist' codelist,
  codes.value code,
  count(*) occurrences
FROM
  check_results
CROSS JOIN
  jsonb_each(results -> 'additional_open_codelist_values') additional_open_codelist_values
CROSS JOIN
  jsonb_array_elements(value -> 'values') codes
GROUP BY
  collection_id,
  release_type,
  codelist,
  code
ORDER BY
  collection_id,
  release_type,
  codelist,
  count(*) DESC


### Conformance

#### Deprecated fields

Before a field or codelist is removed from the standard, it is first marked as [deprecated](https://standard.open-contracting.org/latest/en/governance/deprecation/#deprecation).

Use this section to check for deprecated fields.

Generate a list of deprecated fields:

In [None]:
%%sql

SELECT DISTINCT ON (
  collection_id,
  path,
  deprecated_version,
  explanation
)
  collection_id,
  regexp_replace(TRIM('"' from paths::text), '\/[0-9]+', '', 'g') || '/' || (deprecated_fields ->> 'field') as path,
  deprecated_fields -> 'explanation' -> 0 as deprecated_version,
  deprecated_fields -> 'explanation' -> 1 as explanation,
  ocid as example_ocid
FROM
  release_check
CROSS JOIN
  jsonb_array_elements(cove_output -> 'deprecated_fields') AS deprecated_fields
CROSS JOIN
  jsonb_array_elements(deprecated_fields -> 'paths') as paths
JOIN
  release on release_check.release_id = release.id
WHERE
  collection_id in :collection_ids;


#### Metadata

##### **Package metadata**

OCDS data must be published within either a [release package](https://standard.open-contracting.org/latest/en/schema/reference/#package-metadata) or a [record package](https://standard.open-contracting.org/latest/en/schema/records_reference/#package-metadata).

Use this section to check that the values in the package metadata conform to the descriptions in the schema.

Look out for the following issues and report them to the publisher:

* Placeholder values
* Empty strings and objects
* Discrepancies in the package metadata between different releases 

Generate a summary of the package metadata values used in each collection:

In [None]:
%%sql

SELECT
	collection_id,
	release_type,
	package_data-> 'version' AS ocds_version,
	package_data-> 'publisher' -> 'name' AS publisher_name,
	package_data-> 'publisher' -> 'scheme' AS publisher_scheme,
	package_data-> 'publisher' -> 'uid' AS publisher_uid,
	package_data-> 'publisher' -> 'uri' AS publisher_uri,
	package_data-> 'license' AS license,
	package_data-> 'publicationPolicy' AS publicationPolicy,
	count(*)
FROM
	release_summary
WHERE
  collection_id in :collection_ids
AND
  release_type != 'compiled_release'
GROUP BY
	collection_id,
	release_type,
	publisher_name,
	publisher_scheme,
	publisher_uid,
	publisher_uri,
	license,
	publicationPolicy,
	ocds_version;

##### **Release tags**

> Releases must be tagged with one or more values from the [release tag codelist](https://standard.open-contracting.org/latest/en/schema/codelists/#release-tag). Tags may be used to filter releases and to understand the kind of information that a release might contain.

Use this section to check that release tags reflect the data included in each release.

Read the descriptions in the codelist to understand which sections can be provided for each tag.

Remember that releases can repeat information from previous releases.

Generate a summary of the sections published for each release tag.

Note that this check only counts whether the section exists, not whether it contains any fields or objects, so the results may include empty objects (e.g. `planning`) and arrays (e.g. `awards`).

In [None]:
%%sql release_tag_section_summary <<

WITH IMPLEMENTATION AS (
SELECT
	cs.collection_id,
	cs.release_type,
	release_tag,
	COUNT(contract -> 'implementation') AS IMPLEMENTATION
FROM
	contracts_summary cs
LEFT JOIN release_summary
		USING (id)
GROUP BY
	cs.collection_id,
	cs.release_type,
	release_tag ),
sections AS (
SELECT
	collection_id,
	release_type,
	release_tag,
	count(*) AS release_count,
	COUNT(RELEASE -> 'planning') AS planning,
	COUNT(RELEASE -> 'tender') AS tender,
	COUNT(RELEASE -> 'awards') AS award,
	COUNT(RELEASE -> 'contracts') AS contract
FROM
	release_summary
GROUP BY
	collection_id,
	release_type,
	release_tag )
SELECT
	collection_id,
	release_type,
	sections.release_tag,
	release_count,
	planning,
	tender,
	award,
	contract,
	IMPLEMENTATION
FROM
	sections
LEFT JOIN IMPLEMENTATION
		USING (collection_id,
	release_type,
	release_tag);


In [None]:
release_tag_section_summary

In [None]:
save_dataframe_to_sheet(release_tag_section_summary, 'release_tags')
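
Because the summary counts a section as present even when it is empty, it can be useful to check individual releases for empty sections. The following is a minimal sketch that operates on a release loaded as a Python dictionary; the function name and the sample release are hypothetical.

```python
def empty_sections(release, sections=("planning", "tender", "awards", "contracts")):
    """Return the sections that are present in the release but contain nothing."""
    return [
        name
        for name in sections
        if name in release and release[name] in ({}, [], None)
    ]


# Hypothetical release, for illustration only.
sample_release = {
    "ocid": "ocds-213czf-1",
    "tag": ["award"],
    "planning": {},
    "tender": {"id": "1"},
    "awards": [],
}
print(empty_sections(sample_release))
# ['planning', 'awards']
```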

##### **Release date**

Use this section to check that the releases do not all share the same date.

Generate a count of releases by date:

In [None]:
%%sql

select
  collection_id,
  release_type,
  release_date,
  count(*) as release_count
from
  release_summary
group by
  collection_id,
  release_type,
  release_date
order by
  collection_id,
  release_type,
  release_count desc;
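
One way to read the counts above is to compute the share of releases that carry the most common date. This is a sketch under the assumption that the release dates have been loaded into a Python list; the sample dates are hypothetical.

```python
from collections import Counter


def most_common_date_share(dates):
    """Return the most frequent date and the share of releases that use it."""
    if not dates:
        return None, 0.0
    date, count = Counter(dates).most_common(1)[0]
    return date, count / len(dates)


# Hypothetical release dates, for illustration only.
sample_dates = ["2020-01-01T00:00:00Z"] * 3 + ["2020-02-01T00:00:00Z"]
print(most_common_date_share(sample_dates))
# ('2020-01-01T00:00:00Z', 0.75)
```

A share close to 1.0 suggests that the publisher may be setting the release date to a fixed value, such as the date of export.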

##### **Language**

> The default language of the data using either two-letter [ISO639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes), or extended [BCP47 language tags](http://www.w3.org/International/articles/language-tags/). The use of lowercase two-letter codes from [ISO639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) is recommended.

Use this section to check that the code declared in `language` reflects the language used in free-text fields in the data.

Generate a list of language codes used in the data, with an example release for each language.

In [None]:
%%sql

select distinct on (collection_id, release_type, release_language)
  collection_id,
  release_type,
  release_language,
  release as example_release
from
  release_summary
order by
  collection_id,
  release_type,
  release_language,
  random();
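
The shape of each language code can be checked before reading the example releases. The sketch below only tests whether a code looks like a lowercase two-letter code, as recommended in the description quoted above; it does not verify that the code exists in ISO 639-1, and the category labels are hypothetical.

```python
import re


def classify_language_code(code):
    """Classify a `language` value by its shape only."""
    if code is None or code == "":
        return "missing"
    if re.fullmatch(r"[a-z]{2}", code):
        return "two-letter lowercase"
    if re.fullmatch(r"[A-Za-z]{2}", code):
        return "two-letter, not lowercase"
    return "other (check against BCP47)"


for code in ("es", "ES", "en-GB", None):
    print(repr(code), "->", classify_language_code(code))
```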

#### Change history

OCDS supports the publication of a change history, using [releases and records](https://standard.open-contracting.org/latest/en/getting_started/releases_and_records/#releases-and-records).

Fully implemented, releases and records can be used to publish the following for each contracting process:

* Multiple OCDS releases, one for each change or update to the contracting process
* A single OCDS record, containing:
  * `releases` - an index of releases for the contracting process
  * optionally, a `compiledRelease` - the latest version of the data about the contracting process
  * optionally, a `versionedRelease` - a history of changes for each field

However, many publishers use the ['easy releases'](https://standard.open-contracting.org/latest/en/guidance/build/easy_releases/) approach: publish a single release per contracting process with the latest version of the data about the contracting process.

Use this section to understand the approach used by the publisher.

##### **Multiple releases per contracting process**

Use this section to:

* check if there are multiple releases per contracting process
* check the distribution of releases per contracting process
* examine examples of contracting processes with multiple releases

Generate statistics on the minimum, maximum, average and standard deviation of releases per contracting process.

In [None]:
%%sql

WITH release_counts AS (
SELECT
	collection_id,
	release_type,
	ocid,
	count(*) AS release_count
FROM
	release_summary rs
WHERE
  collection_id in :collection_ids
GROUP BY
	collection_id ,
	release_type,
	ocid )
SELECT
	collection_id,
	release_type,
	MIN(release_count) AS min_releases_per_ocid,
	MAX(release_count) AS max_releases_per_ocid,
	ROUND(AVG(release_count), 2) AS avg_releases_per_ocid,
	ROUND(STDDEV(release_count), 2) AS sd_releases_per_ocid
FROM
	release_counts
GROUP BY
	collection_id,
	release_type;
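
To make the statistics concrete, the same calculation can be sketched in Python for a single collection. The sample `ocid` values are hypothetical. PostgreSQL's `STDDEV` is the sample standard deviation, which corresponds to `statistics.stdev`.

```python
import statistics
from collections import Counter


def release_count_stats(ocids):
    """Summarize the number of releases per ocid, given one ocid per release."""
    counts = list(Counter(ocids).values())
    return {
        "min": min(counts),
        "max": max(counts),
        "avg": round(statistics.mean(counts), 2),
        # The sample standard deviation is undefined for a single ocid.
        "sd": round(statistics.stdev(counts), 2) if len(counts) > 1 else None,
    }


# Hypothetical ocids: "a" has 3 releases, "b" has 1 and "c" has 2.
sample_ocids = ["a", "a", "a", "b", "c", "c"]
print(release_count_stats(sample_ocids))
```

If the minimum and maximum are both 1, the publisher is likely publishing a single release per contracting process.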

Generate a summary of the release count per contracting process:

In [None]:
%%sql release_count_summary <<

WITH release_counts AS (
SELECT
	collection_id,
	release_type,
	ocid,
	count(*) AS release_count
FROM
	release_summary rs
WHERE
  collection_id IN :collection_ids
AND
  release_type IN ('release', 'embedded_release')
GROUP BY
	collection_id ,
	release_type,
	ocid )
SELECT
	collection_id,
	release_type,
	release_count,
	count(*) AS contracting_processes
FROM
	release_counts
GROUP BY
	collection_id,
	release_type,
	release_count;

In [None]:
release_count_summary

Plot the distribution of releases per contracting process:

In [None]:
%%sql release_counts <<

WITH release_counts AS (
  SELECT
    collection_id,
    release_type,
    ocid,
    count(*) AS release_count
  FROM
    release_summary rs
  WHERE
    collection_id in :collection_ids
  AND
    release_type IN ('release', 'embedded_release')
  GROUP BY
    collection_id ,
    release_type,
    ocid
)
SELECT
  collection_id,
  release_type,
  release_count,
  count(*) as ocid_count
FROM
  release_counts
GROUP BY
  collection_id,
  release_type,
  release_count;


In [None]:
release_count_chart = sns.catplot(x="release_count", y="ocid_count", kind="bar", col="collection_id", hue="release_type", data=release_counts).set_xticklabels(rotation=90)

for ax in release_count_chart.axes.flat:
  format_thousands(ax.yaxis)

plt.show()

Generate a spreadsheet containing the top 5 contracting processes with the most releases. 

Specific things to check include:

* Does the `date` field differ between releases?
* Does the `tag` field differ between releases?

Also check for differences in which fields are provided for each release and for differences in the values of fields.


In [None]:
%%sql multiple_release_examples <<

WITH ranked_ocids AS (
  SELECT
      collection_id,
      release_type,
      ocid,
      count(*),
      row_number() OVER (PARTITION BY collection_id, release_type ORDER BY count(*) DESC) as row_number
    FROM
      release_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type in ('release', 'embedded_release')
    GROUP BY
      collection_id,
      release_type,
      ocid
)

SELECT
  jsonb_build_object('releases', jsonb_agg(release)) as release_package
FROM
  release_summary
WHERE
  collection_id in :collection_ids
AND
  release_type in ('release', 'embedded_release')
AND
  ocid in
    (SELECT ocid FROM ranked_ocids where row_number <=5);

In [None]:
save_dataframe_to_spreadsheet(multiple_release_examples, '{}_{}_multiple_releases'.format(source_id, '-'.join([str(id) for id in collection_ids])))

##### **Release ID**

The release identifier must be updated when the information about a contracting process changes.

A common error is to set the release ID to the same value as the `ocid`, or to set it to a subset of the `ocid`, and to neglect to update it.

Use this section to check that the release ID differs from the `ocid`.

Generate a list of releases where `id` is the same as the `ocid`, or is contained in it.

In [None]:
%%sql

SELECT
  collection_id,
  release_type,
  ocid,
  release_id
from
  release_summary
where
  collection_id in :collection_ids
and
  (
    ocid = release_id
      OR
    ocid ilike '%%' || release_id || '%%'
  )
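
The condition in the query above can be sketched in Python as a case-insensitive substring test. The function name and sample values are hypothetical. Unlike `ILIKE`, this sketch treats `%` and `_` in the release ID as literal characters.

```python
def release_id_repeats_ocid(ocid, release_id):
    """Return True if the release ID equals the ocid or is contained in it."""
    if not ocid or not release_id:
        return False
    return release_id.lower() in ocid.lower()


print(release_id_repeats_ocid("ocds-213czf-000-00001", "ocds-213czf-000-00001"))  # True
print(release_id_repeats_ocid("ocds-213czf-000-00001", "000-00001"))              # True
print(release_id_repeats_ocid("ocds-213czf-000-00001", "000-00001-02-award"))     # False
```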

#### Overfill

In a whole dataset, we expect there to be some differences between the values, items and dates listed in the tender, award and contract sections of OCDS.

In an effort to publish as many fields as possible, publishers sometimes ignore semantics and map one field from their data source to several fields in OCDS, known as overfill.

Use this section to identify instances of overfill.

##### **Awards and contracts**

Use this section to check if there are any differences between the following fields in the award and contract sections:

* `awards/date` and `contracts/dateSigned`
* `awards/value` and `contracts/value`
* `awards/items` and `contracts/items`
* `awards/contractPeriod` and `contracts/period`
* `awards/documents` and `contracts/documents`

In [None]:
%%sql

SELECT
  contracts_summary.collection_id,
  contracts_summary.release_type,
  CASE WHEN awards_summary.award_date = contracts_summary.dateSigned THEN true ELSE false END AS date_match,
  CASE WHEN
    (awards_summary.award_value_amount = contracts_summary.contract_value_amount)
  AND
    (awards_summary.award_value_currency = contracts_summary.contract_value_currency)
  THEN
    true
  ELSE
    false
  END AS value_match,
  CASE WHEN
    (awards_summary.award_contractperiod_startDate = contracts_summary.contract_period_startDate)
  AND
    (awards_summary.award_contractperiod_endDate = contracts_summary.contract_period_endDate)
  THEN
    true
  ELSE
    false
  END AS period_match,
  CASE WHEN awards_summary.award ->> 'documents' = contracts_summary.contract ->> 'documents' THEN true ELSE false END AS documents_match,
  count(contracts_summary.id) as contract_count
FROM
  contracts_summary
JOIN
  awards_summary
ON
  awards_summary.id = contracts_summary.id
AND
  awards_summary.award_id = contracts_summary.award_id
WHERE
  contracts_summary.collection_id IN :collection_ids
AND
  contracts_summary.release_type IN ('record', 'compiled_release')
GROUP BY
  contracts_summary.collection_id,
  contracts_summary.release_type,
  date_match,
  value_match,
  period_match,
  documents_match
ORDER BY
  contracts_summary.collection_id,
  contracts_summary.release_type,
  contract_count DESC
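
To inspect a single award and its contract in detail, the field pairs listed above can be compared directly. This is a minimal sketch that operates on an award and a contract loaded as Python dictionaries. As in the query, a pair counts as a match only if both fields are present and equal. The sample data is hypothetical.

```python
# Pairs of (award field, contract field) taken from the list above.
FIELD_PAIRS = [
    ("date", "dateSigned"),
    ("value", "value"),
    ("items", "items"),
    ("contractPeriod", "period"),
    ("documents", "documents"),
]


def matching_fields(award, contract):
    """Map each award field path to whether it equals the paired contract field."""
    return {
        f"awards/{award_field}": (
            award_field in award
            and contract_field in contract
            and award[award_field] == contract[contract_field]
        )
        for award_field, contract_field in FIELD_PAIRS
    }


# Hypothetical award and contract, for illustration only.
sample_award = {
    "id": "1",
    "date": "2020-03-01T00:00:00Z",
    "value": {"amount": 100, "currency": "USD"},
}
sample_contract = {
    "id": "1",
    "awardID": "1",
    "dateSigned": "2020-03-01T00:00:00Z",
    "value": {"amount": 90, "currency": "USD"},
}
print(matching_fields(sample_award, sample_contract))
```

If every pair matches across most of a dataset, the award and contract fields are probably populated from the same source field.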

##### **Items**

Items are attached to the tender, award and contract sections of an OCDS release, so that users can see if there were any changes to the items being procured during the contracting process.

Use this section to check for differences between the items attached to the tender, award and contract sections.

In [None]:
%%sql

SELECT
  tender_summary.collection_id,
  tender_summary.release_type,
  CASE WHEN contracts_summary.contract -> 'items' = awards_summary.award -> 'items' THEN true ELSE false END AS award_contract_match,
  count(contracts_summary.id) AS contracts_count,
  CASE WHEN awards_summary.award -> 'items' = tender_summary.tender -> 'items' THEN true ELSE false END AS tender_award_match,
  count(awards_summary.id) AS awards_count
FROM
  tender_summary
JOIN
  awards_summary USING (id)
LEFT JOIN
  contracts_summary USING (id, award_id)
WHERE
  tender_summary.collection_id IN :collection_ids
AND
  tender_summary.release_type IN ('record', 'compiled_release')
GROUP BY
  tender_summary.collection_id,
  tender_summary.release_type,
  award_contract_match,
  tender_award_match


#### Placeholder values

Use this section to check for placeholder values.

Manually review the example release to identify placeholder values, e.g. 'n/a', 'test', '1970-01-01T00:00:00Z' etc.

Get an example release:

In [None]:
%%sql example_releases <<

WITH examples AS (
  SELECT DISTINCT ON (collection_id, release_type)
    collection_id,
    release_type,
    release
  FROM
    release_summary
  WHERE
    collection_id IN :collection_ids
  AND
    release_type IN ('release', 'embedded_release')
  ORDER BY
    collection_id,
    release_type,
    random()
)
SELECT
  jsonb_build_object('releases', jsonb_agg(release)) release_package
FROM
  examples


In [None]:
render(example_releases['release_package'][0])
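
A simple search can complement the manual review. The sketch below walks a release loaded as a Python dictionary and reports string values that match a set of known placeholders. The set contains only the examples mentioned above and should be extended for each publisher; the sample release is hypothetical.

```python
# Placeholder values from the examples above, lowercased for comparison.
PLACEHOLDERS = {"n/a", "test", "1970-01-01t00:00:00z"}


def find_placeholders(value, path=""):
    """Yield (path, value) for each string in the data that is a known placeholder."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from find_placeholders(child, f"{path}/{key}" if path else key)
    elif isinstance(value, list):
        # Array indexes are omitted so that paths are comparable across items.
        for child in value:
            yield from find_placeholders(child, path)
    elif isinstance(value, str) and value.strip().lower() in PLACEHOLDERS:
        yield path, value


# Hypothetical release, for illustration only.
sample_release = {
    "ocid": "ocds-213czf-1",
    "date": "1970-01-01T00:00:00Z",
    "tender": {
        "title": "test",
        "items": [{"description": "N/A"}, {"description": "Pens"}],
    },
}
print(list(find_placeholders(sample_release)))
```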

### Coherence

#### Organization identifiers 

Publishers should collect and publish [organization identifiers](https://standard.open-contracting.org/latest/en/schema/identifiers/#organization-ids).

Use this section to check for invalid or incorrect organization identifiers.

For each organization identifier:

1. Look up the `scheme` in [org-id.guide](http://org-id.guide/) and follow the guidance to look up the organization identifiers in the register.
1. Check that the identifier exists in the register.

Get a list of all organization identifiers and select a random sample of 3 identifiers from each organization identifier scheme:

In [None]:
%%sql organization_identifiers <<

SELECT
  collection_id,
  release_type,
  party ->> 'name' as name,
  party -> 'identifier' ->> 'legalName' as legalName,
  roles,
  party -> 'identifier' ->> 'scheme' as scheme,
  party -> 'identifier' ->> 'id' as id,
  ocid
FROM
  parties_summary
WHERE
  collection_id IN :collection_ids;

In [None]:
organization_identifiers.groupby(['collection_id', 'release_type', 'scheme']).sample(n=3)

#### Foreign companies 

Publishers sometimes erroneously populate `.scheme` for foreign-registered companies with the code for a domestic organization register.

Use this section to check that the correct organization identifier scheme is provided for foreign-registered companies.

Set the `country` variable to the name of the country for the publisher before running the query.

For each organization identifier:

1. Look up the `scheme` in [org-id.guide](http://org-id.guide/) and follow the guidance to look up the organization identifiers in the register.
1. Check that the identifier exists in the register.

In [None]:
country = 'Paraguay'

In [None]:
%%sql

SELECT
  collection_id,
  release_type,
  name,
  scheme,
  id,
  legalName,
  country,
  roles
FROM
  (SELECT DISTINCT
    collection_id,
    release_type,
    party ->> 'name' as name, 
    party -> 'identifier' ->> 'scheme' scheme,
    party -> 'identifier' ->> 'id' id,
    party -> 'identifier' ->> 'legalName' legalName,
    party -> 'address' ->> 'country' as country,
    roles, 
    rank()
  OVER
    (PARTITION BY collection_id, release_id, party -> 'identifier' ->> 'scheme' ORDER BY random()) 
  FROM
    parties_summary
  WHERE
    collection_id IN :collection_ids
  AND
    release_type IN ('record', 'compiled_release')
  AND
    party -> 'address' ->> 'country' NOT ILIKE :country
  ) AS identifiers
WHERE
  rank <= 3
ORDER BY
  scheme;

#### Document metadata

Use this section to check that document metadata is accurate.

Retrieve the document from the `url` and check that each metadata field accurately reflects the actual document.

Get a random document:

In [None]:
%%sql

WITH documents AS (
    SELECT
      collection_id,
      release_type,
      'planning' as section,
      ocid,
      document
    FROM
      planning_documents_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type IN ('record', 'compiled_release')
  UNION
    SELECT
      collection_id,
      release_type,
      'tender' as section,
      ocid,
      document
    FROM
      tender_documents_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type IN ('record', 'compiled_release') 
  UNION
    SELECT
      collection_id,
      release_type,
      'awards' as section,
      ocid,
      document
    FROM
      award_documents_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type IN ('record', 'compiled_release')
  UNION
    SELECT
      collection_id,
      release_type,
      'contracts' as section,
      ocid,
      document
    FROM
      contract_documents_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type IN ('record', 'compiled_release') 
  UNION
    SELECT
      collection_id,
      release_type,
      'implementation' as section,
      ocid,
      document
    FROM
      contract_implementation_documents_summary
    WHERE
      collection_id IN :collection_ids
    AND
      release_type IN ('record', 'compiled_release')
  )
SELECT
  DISTINCT ON (collection_id, release_type)
  collection_id,
  release_type,
  section,
  document ->> 'id' as id,
  document ->> 'documentType' as documentType,
  document ->> 'title' as title,
  document ->> 'description' as description,
  document ->> 'url' as url,
  document ->> 'datePublished' as datePublished,
  document ->> 'dateModified' as dateModified,
  document ->> 'format' as format,
  document ->> 'language' as language
FROM
  documents
ORDER BY
  collection_id,
  release_type,
  random();

# Coverage

Use this section to check whether the data includes key fields.

## Organization identifiers

Use this section to check whether the data includes organization identifiers for buyers and procuring entities, and for suppliers and tenderers.

Calculate the coverage of `parties/identifier/id` and `parties/identifier/scheme` grouped by `parties/roles`:

In [None]:
%%sql

SELECT
  collection_id,
  release_type,
  CASE
    WHEN roles @> '["buyer"]'::jsonb THEN 'buyer'
    WHEN roles @> '["procuringEntity"]'::jsonb THEN 'procuringEntity'
    WHEN roles @> '["supplier"]'::jsonb THEN 'supplier'
    WHEN roles @> '["tenderer"]'::jsonb THEN 'tenderer'
    ELSE 'other'
  END as role,
  COUNT(*) party_count,
  ROUND(SUM(CASE WHEN party -> 'identifier' ->> 'id' is not null THEN 1 ELSE 0 END)::numeric / COUNT(*), 2) id_coverage,
  ROUND(SUM(CASE WHEN party -> 'identifier' ->> 'scheme' is not null THEN 1 ELSE 0 END)::numeric / COUNT(*), 2) scheme_coverage
FROM
  parties_summary
WHERE
  collection_id in :collection_ids
AND
  release_type = 'compiled_release'
GROUP BY
  collection_id,
  release_type,
  role;
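
The coverage calculation can be illustrated on a small list of parties. This is a sketch that mirrors the role priority of the `CASE` expression above and computes the share of parties with an `identifier/id` for each role. The function names and sample parties are hypothetical.

```python
# Roles are checked in the same order as the CASE expression in the query.
ROLE_PRIORITY = ("buyer", "procuringEntity", "supplier", "tenderer")


def role_of(party):
    """Return the first matching role in priority order, or 'other'."""
    roles = party.get("roles") or []
    for role in ROLE_PRIORITY:
        if role in roles:
            return role
    return "other"


def id_coverage(parties):
    """Return the share of parties with an identifier id, grouped by role."""
    totals = {}
    for party in parties:
        entry = totals.setdefault(role_of(party), {"party_count": 0, "with_id": 0})
        identifier = party.get("identifier") or {}
        entry["party_count"] += 1
        entry["with_id"] += identifier.get("id") is not None
    return {
        role: round(entry["with_id"] / entry["party_count"], 2)
        for role, entry in totals.items()
    }


# Hypothetical parties, for illustration only.
sample_parties = [
    {"roles": ["buyer"], "identifier": {"scheme": "XX-REG", "id": "1"}},
    {"roles": ["supplier", "tenderer"], "identifier": {"scheme": "XX-REG", "id": "2"}},
    {"roles": ["supplier"]},
]
print(id_coverage(sample_parties))
# {'buyer': 1.0, 'supplier': 0.5}
```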

## Item classifications

Use this section to check whether the data includes item classifications.

Calculate the coverage of `items/classification/id` and `items/classification/scheme` grouped by `stage`:

In [None]:
%%sql

WITH items AS (
  SELECT
    collection_id,
    release_type,
    'tender' as stage,
    item -> 'classification' ->> 'id' as id,
    item -> 'classification' ->> 'scheme' as scheme
  FROM 
    tender_items_summary
  WHERE
    collection_id in :collection_ids
  AND
    release_type = 'compiled_release'
  UNION ALL
  SELECT
    collection_id,
    release_type,
    'award' as stage,
    item -> 'classification' ->> 'id' as id,
    item -> 'classification' ->> 'scheme' as scheme
  FROM 
    award_items_summary
  WHERE
    collection_id in :collection_ids
  AND
    release_type = 'compiled_release'
  UNION ALL
  SELECT
    collection_id,
    release_type,
    'contract' as stage,
    item -> 'classification' ->> 'id' as id,
    item -> 'classification' ->> 'scheme' as scheme
  FROM 
    contract_items_summary
  WHERE
    collection_id in :collection_ids
  AND
    release_type = 'compiled_release'
  
)
SELECT
  collection_id,
  release_type,
  stage,
  COUNT(*) item_count,
  ROUND(SUM(CASE WHEN id is not null THEN 1 ELSE 0 END)::numeric / COUNT(*), 2) id_coverage,
  ROUND(SUM(CASE WHEN scheme is not null THEN 1 ELSE 0 END)::numeric / COUNT(*), 2) scheme_coverage
FROM 
  items
GROUP BY
  collection_id,
  release_type,
  stage;

## Documents

OCDS encourages the disclosure of both data and documents, to enable both systemic analysis of large numbers of contracting processes and detailed scrutiny of individual procurements.

Use this section to check whether the data includes any documents.

Get a count of documents from each `documents` array:

In [None]:
%%sql

SELECT
  COUNT(*) as contracting_processes,
  SUM(ps.documents_count) as planning_documents,
  SUM(ts.documents_count) as tender_documents,
  SUM(total_award_documents) as award_documents,
  SUM(total_contract_documents) as contract_documents,
  SUM(total_contract_implementation_documents) as implementation_documents
FROM
  release_summary rs
LEFT JOIN
  planning_summary ps ON rs.id = ps.id
LEFT JOIN
  tender_summary ts ON rs.id = ts.id
WHERE
  rs.collection_id in :collection_ids
AND
  rs.release_type = 'compiled_release'