## Check scope



Use this section to check:

* how many releases, records and compiled releases your data contains
* what stages of the contracting process your data covers
* what date range your data covers

### Release and record counts

Collections in Kingfisher Process contain either [releases](https://standard.open-contracting.org/latest/en/schema/reference/), [records](https://standard.open-contracting.org/latest/en/schema/records_reference/) or [compiled releases](https://standard.open-contracting.org/latest/en/schema/records_reference/#compiled-release). Kingfisher generates compiled release collections from release or record collections.
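The sketch below is a simplified illustration of what compiling means. It is not Kingfisher's code and only approximates the OCDS merge routine. It folds one `ocid`'s releases together, oldest first, under these rules: objects merge field by field, an explicit `null` removes a field, arrays of objects that all have an `id` merge item by item, and anything else takes the newer value. It assumes every `date` uses the same UTC offset, so the strings sort chronologically. `merge_value`, `compile_releases` and the sample releases are hypothetical.

```python
from typing import Any


def merge_value(prior: Any, new: Any) -> Any:
    """Merge a newer value onto a prior one, using simplified OCDS-style rules."""
    if isinstance(prior, dict) and isinstance(new, dict):
        merged = dict(prior)
        for key, value in new.items():
            if value is None:
                # An explicit null in the newer release removes the field.
                merged.pop(key, None)
            else:
                merged[key] = merge_value(merged.get(key), value)
        return merged
    if (
        isinstance(prior, list)
        and isinstance(new, list)
        and all(isinstance(item, dict) and "id" in item for item in prior + new)
    ):
        # Arrays of objects that all have an `id` are merged item by item, matched on `id`.
        by_id = {item["id"]: item for item in prior}
        for item in new:
            by_id[item["id"]] = merge_value(by_id.get(item["id"]), item)
        return list(by_id.values())
    # Anything else (scalars, other arrays, fields not seen before) takes the newer value.
    return new


def compile_releases(releases: list[dict]) -> dict:
    """Fold one ocid's releases, oldest first, into a single compiled release."""
    compiled: dict = {}
    # Sorting on the raw string assumes every `date` uses the same UTC offset.
    for release in sorted(releases, key=lambda release: release["date"]):
        compiled = merge_value(compiled, release)
    compiled["tag"] = ["compiled"]
    return compiled


# Hypothetical releases for one contracting process, deliberately out of date order.
releases = [
    {"ocid": "ocds-213czf-1", "id": "3", "date": "2020-03-01T00:00:00Z", "tag": ["tenderUpdate"],
     "tender": {"status": "complete"}},
    {"ocid": "ocds-213czf-1", "id": "1", "date": "2020-01-01T00:00:00Z", "tag": ["tender"],
     "tender": {"id": "t1", "status": "active"}},
    {"ocid": "ocds-213czf-1", "id": "2", "date": "2020-02-01T00:00:00Z", "tag": ["award"],
     "awards": [{"id": "a1", "status": "active"}]},
]
compiled = compile_releases(releases)
```

The compiled release keeps the tender's `id` from the first release, takes its `status` from the latest one, and picks up the award from the second.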

Use this section to check that the data contains the expected number of releases, records and compiled releases.

Generate a count of releases, records and compiled releases for each collection.

In [None]:
%%sql

SELECT
	id AS collection_id,
	cached_releases_count AS releases_count,
	cached_records_count AS records_count,
	cached_compiled_releases_count AS compiled_releases_count
FROM
	collection
WHERE
	id IN :collection_ids
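You can make this check fail loudly by comparing the counts with figures you expect, for example from the publisher's portal. The sketch below assumes the query's rows are available as dicts keyed by the column names above. `compare_counts`, the sample rows and the expected figures are all hypothetical.

```python
def compare_counts(
    actual: dict[int, dict[str, int]], expected: dict[int, dict[str, int]]
) -> list[str]:
    """Return a readable message for every count that differs from what was expected."""
    problems = []
    for collection_id, expected_counts in expected.items():
        actual_counts = actual.get(collection_id, {})
        for name, expected_count in expected_counts.items():
            actual_count = actual_counts.get(name)
            if actual_count != expected_count:
                problems.append(
                    f"collection {collection_id}: {name} is {actual_count}, expected {expected_count}"
                )
    return problems


# Hypothetical query output, one dict per row.
rows = [
    {"collection_id": 1, "releases_count": 1200, "records_count": 0, "compiled_releases_count": 0},
    {"collection_id": 2, "releases_count": 0, "records_count": 0, "compiled_releases_count": 450},
]
# Index the counts by collection, dropping the id column itself.
actual = {row["collection_id"]: {k: v for k, v in row.items() if k != "collection_id"} for row in rows}
# Hypothetical expectations: only the counts listed here are checked.
expected = {1: {"releases_count": 1200}, 2: {"compiled_releases_count": 500}}

problems = compare_counts(actual, expected)
```

Because a compiled release collection contains one compiled release per `ocid`, the number of unique OCIDs in the source collection is a natural expected value for `compiled_releases_count`.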

### Contracting process stages

Use this section to check that the data covers the expected stages of the contracting process.

#### Release tags

[Release tags](https://standard.open-contracting.org/latest/en/schema/codelists/#release-tag) indicate the stage of the contracting process an OCDS release relates to.

Generate a summary of releases by `tag`.

In [None]:
%%sql

SELECT
	collection_id,
	release_type,
	tag,
	count(*)
FROM
	release_summary
WHERE
	collection_id IN :collection_ids
GROUP BY
	collection_id,
	release_type,
	tag
ORDER BY
	collection_id;
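To see how tags roll up into stages, the sketch below groups releases by stage in plain Python. It assumes releases are parsed JSON dicts. The stage of each tag is taken from the code's prefix (`tenderAmendment` counts as tender, `contractTermination` as contract). That prefix grouping is an assumption of this sketch, not part of the codelist. `stage_of` and `releases_per_stage` are hypothetical helpers.

```python
from collections import Counter

STAGE_PREFIXES = ("planning", "tender", "award", "contract", "implementation")


def stage_of(tag: str) -> str:
    """Map a release tag to a stage by its prefix; anything else (e.g. 'compiled') is 'other'."""
    for prefix in STAGE_PREFIXES:
        if tag.startswith(prefix):
            return prefix
    return "other"


def releases_per_stage(releases: list[dict]) -> Counter:
    """Count releases per stage; a release with several tags counts once per stage it touches."""
    counts: Counter = Counter()
    for release in releases:
        counts.update({stage_of(tag) for tag in release.get("tag", [])})
    return counts


# Hypothetical releases.
releases = [
    {"tag": ["tender"]},
    {"tag": ["tenderAmendment"]},
    {"tag": ["award", "contract"]},
    {"tag": ["compiled"]},
]
counts = releases_per_stage(releases)
```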

#### Objects per stage

In OCDS, data is organized into an object for each stage of the contracting process. Each compiled release has at most one `Planning` object, at most one `Tender` object, any number of `Award` objects and any number of `Contract` objects, and each `Contract` object has at most one `Implementation` object. As such, the number of `Award` objects can exceed the number of unique OCIDs, but the number of `Tender` objects can't.
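The sketch below expresses these cardinalities in plain Python over compiled releases parsed from JSON. It counts objects per stage and checks that `Tender` objects don't outnumber unique OCIDs. `count_objects_per_stage` and the sample data are hypothetical.

```python
def count_objects_per_stage(compiled_releases: list[dict]) -> dict[str, int]:
    """Count stage objects across compiled releases, following OCDS cardinalities."""
    counts = {"planning": 0, "tender": 0, "awards": 0, "contracts": 0, "implementation": 0}
    for release in compiled_releases:
        # `planning` and `tender` are single objects: each release contributes 0 or 1.
        counts["planning"] += int("planning" in release)
        counts["tender"] += int("tender" in release)
        # `awards` and `contracts` are arrays: each release contributes any number.
        counts["awards"] += len(release.get("awards", []))
        contracts = release.get("contracts", [])
        counts["contracts"] += len(contracts)
        # Each contract has at most one `implementation` object.
        counts["implementation"] += sum(1 for contract in contracts if "implementation" in contract)
    return counts


# Hypothetical compiled releases.
compiled_releases = [
    {"ocid": "ocds-213czf-1", "tender": {"id": "t1"},
     "awards": [{"id": "a1"}, {"id": "a2"}, {"id": "a3"}],
     "contracts": [{"id": "c1", "implementation": {}}]},
    {"ocid": "ocds-213czf-2", "planning": {}, "tender": {"id": "t2"}},
]
counts = count_objects_per_stage(compiled_releases)
unique_ocids = len({release["ocid"] for release in compiled_releases})

# Tender objects can never outnumber unique OCIDs; award objects can (3 awards, 2 OCIDs here).
assert counts["tender"] <= unique_ocids
```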

Generate and plot a count of objects per stage:

In [None]:
query = """

  SELECT
    CASE WHEN paths.path = 'contracts/implementation' THEN 'implementation' ELSE paths.path END as stage,
    CASE WHEN paths.path IN ('planning', 'tender', 'contracts/implementation') THEN
      GREATEST (object_property, 0)
    ELSE
      GREATEST (array_count, 0)
    END AS object_count
  FROM (
    SELECT
      unnest(ARRAY['planning', 'tender', 'awards', 'contracts', 'contracts/implementation']) AS path) AS paths
    LEFT JOIN (
      SELECT
        *
      FROM
        field_counts
      WHERE
        collection_id IN :collection_ids
        AND release_type = 'compiled_release'
        AND path IN ('planning', 'tender', 'awards', 'contracts', 'contracts/implementation')) AS field_counts USING (path)

"""

objects_per_stage = %sql {query}

objects_per_stage_chart = sns.catplot(x="stage", y="object_count", kind="bar", data=objects_per_stage).set_xticklabels(rotation=90)

for ax in objects_per_stage_chart.axes.flat:
  format_thousands(ax.yaxis)

objects_per_stage

### Date ranges


Use this section to check that the data covers the expected date range.

Generate a summary of the earliest and latest `date`, `awards/date` and `contracts/dateSigned`.

In [None]:
%%sql

SELECT
	collection_id,
	release_type,
	'release_date' AS date_type,
	min(date) AS min,
	max(date) AS max
FROM
	release_summary
WHERE
	collection_id IN :collection_ids
GROUP BY
	collection_id,
	release_type,
	date_type
UNION ALL
SELECT
	collection_id,
	release_type,
	'award_date' AS date_type,
	min(first_award_date) AS min,
	max(last_award_date) AS max
FROM
	release_summary
WHERE
	collection_id IN :collection_ids
GROUP BY
	collection_id,
	release_type,
	date_type
UNION ALL
SELECT
	collection_id,
	release_type,
	'contract_datesigned' AS date_type,
	min(first_contract_datesigned) AS min,
	max(last_contract_datesigned) AS max
FROM
	release_summary
WHERE
	collection_id IN :collection_ids
GROUP BY
	collection_id,
	release_type,
	date_type
ORDER BY
	collection_id,
	release_type,
	date_type;
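The same ranges can be computed directly from release JSON, which is useful for spot-checking a sample outside Kingfisher. The sketch below assumes every date carries a UTC offset, as OCDS date-times should. It parses values instead of comparing strings, so dates with different offsets compare correctly. `date_ranges` and the sample data are hypothetical.

```python
from datetime import datetime, timezone


def parse(value: str) -> datetime:
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11, so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def date_ranges(releases: list[dict]) -> dict[str, tuple]:
    """Return (earliest, latest) for `date`, `awards/date` and `contracts/dateSigned`."""
    found: dict[str, list[datetime]] = {"release_date": [], "award_date": [], "contract_datesigned": []}
    for release in releases:
        if release.get("date"):
            found["release_date"].append(parse(release["date"]))
        for award in release.get("awards", []):
            if award.get("date"):
                found["award_date"].append(parse(award["date"]))
        for contract in release.get("contracts", []):
            if contract.get("dateSigned"):
                found["contract_datesigned"].append(parse(contract["dateSigned"]))
    # A date type with no values at all gets (None, None) rather than raising.
    return {name: (min(values), max(values)) if values else (None, None) for name, values in found.items()}


# Hypothetical releases.
releases = [
    {"date": "2020-01-15T00:00:00Z", "awards": [{"date": "2020-03-01T00:00:00Z"}]},
    {"date": "2021-06-30T12:00:00-05:00", "contracts": [{"dateSigned": "2020-04-01T00:00:00Z"}]},
]
ranges = date_ranges(releases)
```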

### Release date distribution

Use this section to check that releases are distributed as expected.

Plot the count of releases per month:

In [None]:
query = """

SELECT
  collection_id::text,
  release_type,
  date,
  count(*) AS release_count
FROM
  release_summary
WHERE
  collection_id IN :collection_ids
GROUP BY
  collection_id,
  release_type,
  date
ORDER BY
  date ASC;

"""

release_dates = %sql {query}

# Resample by month, selecting only `release_count` so the grouping columns aren't summed
# (summing them concatenates the strings and breaks reset_index)
release_dates = (
  release_dates.set_index('date')
  .groupby(['collection_id', 'release_type'])['release_count']
  .resample('M')
  .sum()
  .reset_index()
)

# Draw on the axes created here, with one line per collection and release type
fig, ax = plt.subplots(figsize=[15, 5])
sns.lineplot(data=release_dates, x='date', y='release_count', hue='collection_id', style='release_type', ax=ax)

format_thousands(ax.yaxis)
sns.despine()
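The sketch below shows the same monthly bucketing without pandas. It counts release dates per calendar month and fills empty months with zero, so gaps in publication stay visible as they do after resampling. It assumes ISO 8601 dates with a UTC offset and buckets each date by the month in its own offset. `releases_per_month` and the sample dates are hypothetical.

```python
from collections import Counter
from datetime import datetime


def releases_per_month(dates: list[str]) -> dict[tuple[int, int], int]:
    """Count ISO 8601 release dates per (year, month), including empty months in between."""
    months: Counter = Counter()
    for value in dates:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
        months[(parsed.year, parsed.month)] += 1
    # Fill every month between the first and last with at least zero.
    if months:
        year, month = min(months)
        last = max(months)
        while (year, month) <= last:
            months.setdefault((year, month), 0)
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return dict(sorted(months.items()))


# Hypothetical release dates with no releases in December 2020.
per_month = releases_per_month(["2020-11-03T00:00:00Z", "2020-11-20T00:00:00Z", "2021-01-05T00:00:00Z"])
```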

### Extensions 

Use this section to check which extensions the data uses.

Generate a list of extensions declared in the package metadata:

In [None]:
%%sql

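SELECT
  collection_id,
  release_type,
  extension,
  count(*) AS count
FROM
  release_summary
  -- Expand each declared extension into its own row
  CROSS JOIN LATERAL jsonb_array_elements(package_data -> 'extensions') AS e(extension)
WHERE
  collection_id IN :collection_ids
  -- Skip missing or malformed `extensions` values, which jsonb_array_elements can't expand
  AND jsonb_typeof(package_data -> 'extensions') = 'array'
GROUP BY
  collection_id,
  release_type,
  extension
ORDER BY
  collection_id,
  release_type,
  count DESC;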
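For a quick check on raw files, the sketch below tallies declared extensions in plain Python. It assumes one package-metadata dict per release, mirroring the query, and skips packages that omit `extensions` or declare it as something other than an array. `count_extensions` and the example URLs are hypothetical.

```python
from collections import Counter


def count_extensions(packages: list[dict]) -> list[tuple[str, int]]:
    """Return (extension URL, count) pairs, most frequently declared first."""
    counts: Counter = Counter()
    for package in packages:
        extensions = package.get("extensions")
        # Ignore packages without a well-formed `extensions` array.
        if not isinstance(extensions, list):
            continue
        counts.update(extensions)
    return counts.most_common()


# Hypothetical package metadata, including a missing and a malformed `extensions` value.
packages = [
    {"extensions": ["https://example.com/a/extension.json", "https://example.com/b/extension.json"]},
    {"extensions": ["https://example.com/a/extension.json"]},
    {},
    {"extensions": {"not": "an array"}},
]
extension_counts = count_extensions(packages)
```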