
Compiled release checks: Display values/details in the quality failure samples table #49

Open
duncandewhurst opened this issue Jul 29, 2020 · 8 comments
Labels: compiled release checks (Relating to compiled release-level check pages)
Milestone: Priority
Comments

@duncandewhurst

duncandewhurst commented Jul 29, 2020

Original title: Amendment dates check: List of amendments with issues

Similar to open-contracting-archive/pelican#53, from the amendment dates check I can see that there are 4,820 amendments with incoherent dates, but I can't see which amendments/dates are incoherent.

For feedback to the publisher, it would be better if I could report which amendments/dates have issues. Ideally, I would like to export a results table with the following columns:

ocid, amendment path, amendment/id, date path, date parent id

For example, for a contract amendment dated before the contract's dateSigned:

ocds-xxxxx-123456, contracts/amendments, 123, contract/dateSigned, 456

Here, '456' is the id of the contract.

@jpmckinney
Member

The check performs four comparisons, so the table would have to be a fair bit more complicated if you want the publisher to be able to see what is incoherent about the dates.

In your simpler table, since the error might be with either of the two dates in each comparison, it's not clear to me which should be reported.

The metadata in the previews shows the pairs of date paths and values that fail. For Moldova, it seems that they frequently amend tenders before the tender period.

We should clarify how the publisher implemented the semantics. The tenderPeriod does mean the submission period, so it's possible to amend a published tender before the start of the submission period (if the submission period doesn't open immediately after publication). In that case, maybe the action is to change the definition of the check.

@duncandewhurst
Author

I think the table can express each type of incoherence:

ocid, amendment path, amendment/id, date path, date parent id
ocds-xxxxx-123456, tender/amendments, ta-01, release/date, r-01
ocds-xxxxx-123456, tender/amendments, ta-01, tender/tenderPeriod/startDate, t-01
ocds-xxxxx-123456, contracts/amendments, ca-01, release/date, r-01
ocds-xxxxx-123456, contracts/amendments, ca-01, contract/dateSigned, c-01
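A minimal sketch of how a check could emit one row per failed comparison in this format, assuming a compiled release as a Python dict. The function name `amendment_date_failures` is hypothetical, and the direction of each comparison (an amendment should not be later than the release date, nor earlier than the tender period start or the contract signature date) is an assumption for illustration, not the tool's actual rule.

```python
from datetime import datetime


def parse(value):
    # Assumes ISO 8601 date-times that include a time zone offset; "Z" is
    # rewritten because datetime.fromisoformat only accepts it from Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def amendment_date_failures(release):
    """Yield one (ocid, amendment path, amendment/id, date path, date parent id)
    row per failed comparison in a compiled release.

    Hypothetical rules, for illustration only:
    - an amendment should not be dated after the release date;
    - a tender amendment should not be dated before tender/tenderPeriod/startDate;
    - a contract amendment should not be dated before contract/dateSigned.
    """
    ocid, release_date = release["ocid"], parse(release["date"])
    tender = release.get("tender", {})
    # Each group: (amendment path, amendments, parent id, other date path, other date value)
    groups = [("tender/amendments", tender.get("amendments", []), tender.get("id"),
               "tender/tenderPeriod/startDate",
               tender.get("tenderPeriod", {}).get("startDate"))]
    for contract in release.get("contracts", []):
        groups.append(("contracts/amendments", contract.get("amendments", []),
                       contract.get("id"), "contract/dateSigned", contract.get("dateSigned")))

    for amendment_path, amendments, parent_id, date_path, other_date in groups:
        for amendment in amendments:
            if not amendment.get("date"):
                continue  # nothing to compare
            amendment_date = parse(amendment["date"])
            if amendment_date > release_date:
                yield (ocid, amendment_path, amendment.get("id"), "release/date", release.get("id"))
            if other_date and amendment_date < parse(other_date):
                yield (ocid, amendment_path, amendment.get("id"), date_path, parent_id)


example_release = {
    "ocid": "ocds-xxxxx-123456", "id": "r-01", "date": "2020-01-10T00:00:00Z",
    "tender": {"id": "t-01", "tenderPeriod": {"startDate": "2020-01-05T00:00:00Z"},
               "amendments": [{"id": "ta-01", "date": "2020-01-01T00:00:00Z"}]},
    "contracts": [{"id": "c-01", "dateSigned": "2020-01-08T00:00:00Z",
                   "amendments": [{"id": "ca-01", "date": "2020-02-01T00:00:00Z"}]}],
}

for row in amendment_date_failures(example_release):
    print(", ".join(row))
```

With this made-up release, the sketch prints a tender/tenderPeriod/startDate row for ta-01 and a release/date row for ca-01, i.e. the second and third rows of the table above.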

As an analyst, I could then summarise each type and provide examples:

  • There are X tender amendments which are before the release date (e.g. amendment/id in ocid)
  • There are X tender amendments which are before the tender period (e.g. amendment/id in ocid)
  • There are X contract amendments which are before the release date (e.g. amendment/id in ocid)
  • There are X contract amendments which are before the contract signature date (e.g. amendment/id in contract/id in ocid)
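The summary above amounts to grouping the rows by (amendment path, date path) and keeping a count plus one example per group. A minimal sketch, assuming rows shaped like the table above; the function names `summarise` and `report_lines` and the sample rows are hypothetical.

```python
def summarise(rows):
    """Group (ocid, amendment path, amendment/id, date path, date parent id) rows
    by (amendment path, date path), keeping a count and the first example seen."""
    summary = {}
    for ocid, amendment_path, amendment_id, date_path, parent_id in rows:
        entry = summary.setdefault((amendment_path, date_path),
                                   {"count": 0, "example": (amendment_id, parent_id, ocid)})
        entry["count"] += 1
    return summary


def report_lines(summary):
    # One line per type of incoherence, in the order each type was first seen.
    for (amendment_path, date_path), entry in summary.items():
        amendment_id, parent_id, ocid = entry["example"]
        yield (f"{amendment_path} vs {date_path}: {entry['count']} failure(s), "
               f"e.g. {amendment_id} (parent {parent_id}) in {ocid}")


rows = [
    ("ocds-xxxxx-123456", "tender/amendments", "ta-01", "release/date", "r-01"),
    ("ocds-xxxxx-123456", "tender/amendments", "ta-01", "tender/tenderPeriod/startDate", "t-01"),
    ("ocds-xxxxx-123456", "contracts/amendments", "ca-01", "release/date", "r-01"),
    ("ocds-xxxxx-123456", "contracts/amendments", "ca-01", "contract/dateSigned", "c-01"),
    ("ocds-xxxxx-654321", "contracts/amendments", "ca-07", "contract/dateSigned", "c-03"),
]

for line in report_lines(summarise(rows)):
    print(line)
```

Each printed line maps onto one bullet above, so the analyst only has to reword it for the publisher.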

With the current set-up, I can report to Moldova that there are definitely issues with tender amendments dated before the tender period (subject to how they interpreted it). However, I don't know whether there are other date coherence issues that don't appear in the previews I happened to click into, so I also have to say 'there might also be issues with...' and explain the rest of the checks.

@jpmckinney
Member

Wouldn't it help to have the actual date values, not just the paths and IDs?

I'm thinking we can perhaps add a tag to the reporting feature that reformats the metadata from multiple samples into a table. @hrubyjan
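A minimal sketch of reformatting the metadata from multiple samples into one table, including the date values. The metadata shape (a `failed_pairs` list of path/value pairs per sample) and the names `samples_to_rows` and `render_tsv` are hypothetical; the tool's real metadata format may differ.

```python
# Hypothetical metadata shape for each failed sample (the real format may differ):
# {"ocid": ..., "meta": {"failed_pairs": [{"path_1", "value_1", "path_2", "value_2"}]}}
COLUMNS = ("ocid", "path_1", "value_1", "path_2", "value_2")


def samples_to_rows(samples):
    """Flatten the failed date pairs of many samples into one row per pair."""
    rows = []
    for sample in samples:
        for pair in sample.get("meta", {}).get("failed_pairs", []):
            rows.append((sample["ocid"],) + tuple(pair[column] for column in COLUMNS[1:]))
    return rows


def render_tsv(rows):
    """Render the rows as tab-separated text with a header line."""
    return "\n".join(["\t".join(COLUMNS)] + ["\t".join(row) for row in rows])


samples = [
    {"ocid": "ocds-xxxxx-1", "meta": {"failed_pairs": [
        {"path_1": "tender/amendments/date", "value_1": "2020-01-01",
         "path_2": "tender/tenderPeriod/startDate", "value_2": "2020-01-04"},
        {"path_1": "tender/amendments/date", "value_1": "2020-01-02",
         "path_2": "tender/tenderPeriod/startDate", "value_2": "2020-01-04"}]}},
    {"ocid": "ocds-xxxxx-2", "meta": {"failed_pairs": [
        {"path_1": "contracts/amendments/date", "value_1": "2019-03-01",
         "path_2": "contracts/dateSigned", "value_2": "2020-03-01"}]}},
]

print(render_tsv(samples_to_rows(samples)))
```

Tab-separated text is chosen here only because it pastes cleanly into a spreadsheet; any tabular output would do.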

@duncandewhurst
Author

"Wouldn't it help to have the actual date values, not just the paths and IDs?"

I think it's most useful to have the relevant IDs, e.g. to help track down an issue with the OCDS mapping/export for a particular procedure type or system, but I guess the actual dates could be useful to catch other types of issue, e.g. all amendments having the date 1970-01-01.

@jpmckinney
Member

jpmckinney commented Jul 30, 2020

For me, the dates give a sense of whether it's, e.g., a time zone issue (just a few hours off), a semantic issue (a few days off, as here), or a clear data quality or mapping issue (months or years off). Pulling up a compiled release based on the OCID (the publisher might not have compiled releases, so they'd have to find the specific releases) and then navigating to an entry in an array is a lot of effort, I think. The values alone can be sufficient to diagnose the issue.
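The heuristic above can be sketched as a classification by the size of the gap between the two dates. The thresholds (14 hours, 31 days) and the name `classify_gap` are hypothetical choices for illustration, not values from the tool.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds that only illustrate the heuristic above.
TIME_ZONE_LIMIT = timedelta(hours=14)  # UTC offsets range from -12:00 to +14:00
SEMANTIC_LIMIT = timedelta(days=31)


def parse(value):
    # Both values must include a time zone offset (or both omit one), otherwise
    # subtracting them raises TypeError. "Z" is rewritten for Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def classify_gap(value_1, value_2):
    """Guess the kind of problem from the size of the gap between two dates."""
    gap = abs(parse(value_1) - parse(value_2))
    if gap <= TIME_ZONE_LIMIT:
        return "possible time zone issue"
    if gap <= SEMANTIC_LIMIT:
        return "possible semantic issue"
    return "likely data quality or mapping issue"


print(classify_gap("2020-01-01T00:00:00Z", "2020-01-04T00:00:00Z"))
```

A label like this is only a hint; as noted above, the raw values are what let a reader judge the cause.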

@duncandewhurst
Author

Okay, makes sense. In any case, it would be useful to know which combinations of amendment date and other date field have issues.

@jpmckinney jpmckinney changed the title Amendment dates check: Export list of amendments with issues Amendment dates check: List of amendments with issues Sep 2, 2020
@jpmckinney jpmckinney transferred this issue from open-contracting-archive/pelican Sep 14, 2021
@jpmckinney jpmckinney added the compiled release checks Relating to compiled release-level check pages label Sep 14, 2021
@jpmckinney jpmckinney changed the title Amendment dates check: List of amendments with issues Display the objects with quality issues Sep 14, 2021
@jpmckinney
Member

Merging #48 into this issue:

Title: consistent.period_duration_in_days: Report which periods the check failed for

For example, from checking a few sample failures, it looks like the issue is with tender.tenderPeriod, but I would like to know whether any of the other periods in the dataset are also affected, so that I can report that to the publisher.
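A minimal sketch of reporting which periods fail, by walking each compiled release and counting the paths of inconsistent periods across a dataset. It assumes the rule that durationInDays should equal the whole days between startDate and endDate (or maxExtentDate when endDate is absent); that rule and the name `inconsistent_periods` are assumptions, not the check's confirmed definition.

```python
from collections import Counter
from datetime import datetime


def parse(value):
    # Assumes ISO 8601 date-times with an offset; "Z" is rewritten for Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def inconsistent_periods(data, path=""):
    """Yield the path of each period whose durationInDays disagrees with its dates.

    Assumed rule: durationInDays equals the whole days (timedelta.days, which
    truncates partial days) between startDate and endDate, or maxExtentDate if
    endDate is absent. Array indices are dropped so paths can be grouped.
    """
    if isinstance(data, dict):
        end = data.get("endDate") or data.get("maxExtentDate")
        if "durationInDays" in data and data.get("startDate") and end:
            expected = (parse(end) - parse(data["startDate"])).days
            if data["durationInDays"] != expected:
                yield path
        for key, value in data.items():
            yield from inconsistent_periods(value, f"{path}/{key}" if path else key)
    elif isinstance(data, list):
        for item in data:
            yield from inconsistent_periods(item, path)


releases = [{
    "tender": {"tenderPeriod": {"startDate": "2020-01-01T00:00:00Z",
                                "endDate": "2020-01-11T00:00:00Z", "durationInDays": 15}},
    "contracts": [{"period": {"startDate": "2020-02-01T00:00:00Z",
                              "endDate": "2020-03-01T00:00:00Z", "durationInDays": 29}}],
}]

print(Counter(path for release in releases for path in inconsistent_periods(release)))
```

In this made-up release, only tender/tenderPeriod is reported (10 days between the dates, not 15); the contract period is consistent because February 2020 has 29 days.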

@jpmckinney
Member

Merging #47 into this issue:

Title: Field paths that fail the check of "Contracting process timeline"

If we have many errors, it's not possible to know exactly which fields cause this failure. For example, in https://dqt.datlab.eu/resource/201/detail/coherent.dates we have 6,147 failed compiled releases, making it difficult to add the field paths to the Data Quality Feedback Report:

You should check your OCDS mapping for [field paths] and, if necessary, update your data pipeline so that your data is consistent.

Even with only 10 errors, we have to open each release one by one to find which field has the problem.
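If the check exposed the pair of date paths behind each failure, the [field paths] placeholder could be filled from counts instead of by opening releases one by one. A minimal sketch, assuming hypothetical (ocid, path_1, path_2) failure records; the names `field_path_counts` and `feedback_sentence` and the sample paths are illustrative only.

```python
from collections import Counter

TEMPLATE = ("You should check your OCDS mapping for {paths} and, if necessary, update "
            "your data pipeline so that your data is consistent.")


def field_path_counts(failures):
    """Count how often each field path takes part in a failed date comparison."""
    counts = Counter()
    for _ocid, path_1, path_2 in failures:
        counts.update([path_1, path_2])
    return counts


def feedback_sentence(failures, top=3):
    # Fill the report template with the most frequently implicated field paths.
    paths = [path for path, _count in field_path_counts(failures).most_common(top)]
    return TEMPLATE.format(paths=", ".join(paths))


failures = [
    ("ocds-xxxxx-1", "tender/tenderPeriod/endDate", "awards/date"),
    ("ocds-xxxxx-2", "tender/tenderPeriod/endDate", "awards/date"),
    ("ocds-xxxxx-3", "contracts/dateSigned", "awards/date"),
]

print(feedback_sentence(failures, top=2))
```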

@jpmckinney jpmckinney changed the title Display the objects with quality issues Display the objects/values with quality issues Sep 14, 2021
@jpmckinney jpmckinney added the field checks Relating to field-level check pages label Sep 14, 2021
@jpmckinney jpmckinney removed the field checks Relating to field-level check pages label Sep 14, 2021
@jpmckinney jpmckinney added this to the Priority milestone Dec 1, 2021
@jpmckinney jpmckinney changed the title Display the objects/values with quality issues Compiled release checks: Display values/details in the quality failure samples table Aug 11, 2023

2 participants