This issue is under consideration for updates to the core OCDS standard in 1.1
It sits alongside proposals for a substantial extension to budgets which would allow multi-year and multi-source budgets to be captured. See #377
It builds upon past discussion in #345
In the current version of OCDS we have a very simple budget block which talks of linking out to the 'Budget Data Package' for more in-depth information.
However, the Budget Data Package has been superseded by the Fiscal Data Package which does not currently have the concept of a transaction identifier and for which current publication approaches do not focus on providing data at a stable URI.
This makes cross-linking between OCDS and FDP challenging.
We also see
The schema allows both string and uri formats for
We will also update the valid types for
Suggested draft documentation updates for the
We will explore guidance to the effect that URIs should include a # component indicating the identifier of the particular budget line item.
For example, if a contract is funded through the DFID aid project with the IATI Identifier 'GB-1-107171-101' then a contract process planning record could cross-reference this by:
An application would need to be 'IATI aware' to understand that #GB-1-107171-101 refers to an entry in the XML file found at that URL with GB-1-107171-101 as the value of //iati-activities/iati-activity/iati-identifier
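A minimal sketch of what an 'IATI aware' consumer might do with such a reference. The URL below is a placeholder (the real file location is not given here), the XML is a simplified stand-in for an IATI activity file, and the function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Placeholder URI: host and path are assumptions, only the fragment comes
# from the example identifier discussed above.
BUDGET_URI = "https://example.org/iati/activities.xml#GB-1-107171-101"

# Simplified stand-in for the XML document that would be fetched from the URL.
SAMPLE_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>GB-1-107171-100</iati-identifier>
    <title>Another project</title>
  </iati-activity>
  <iati-activity>
    <iati-identifier>GB-1-107171-101</iati-identifier>
    <title>Example funded project</title>
  </iati-activity>
</iati-activities>"""


def split_budget_uri(uri):
    """Separate the document URL from the '#' fragment (the line-item identifier)."""
    url, fragment = urldefrag(uri)
    return url, fragment


def find_activity(xml_text, identifier):
    """Return the iati-activity element whose iati-identifier matches, or None."""
    root = ET.fromstring(xml_text)
    for activity in root.iter("iati-activity"):
        found = activity.findtext("iati-identifier")
        if found is not None and found.strip() == identifier:
            return activity
    return None


url, identifier = split_budget_uri(BUDGET_URI)
activity = find_activity(SAMPLE_XML, identifier)
```

The fragment carries no meaning by itself: a generic client can fetch `url`, but only code that knows the IATI structure can turn `identifier` into a specific activity.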
The updated approach in the Fiscal Data Package does not currently appear to offer either:
(a) Stable URIs for packages;
which makes this approach very difficult.
Budget breakdown in #377
Please indicate support or opposition for this proposal using the +1 / -1 buttons or a comment. If opposing the proposal, please give clear justifications and, where possible, make an alternative proposal.
Views on the discussion points are welcome.
I'm not sure I'm clear on the motivation for this. A lot of other IDs in OCDS can be string or integer.
We can't actually make the change to the schema and be backwards compatible, because any data that uses an integer would have been valid, but now invalid. We could possibly think about deprecating the use of integers here though.
@Bjwebb I had understood that the mixing of strings and integers as possible field values was an issue for tools like flatten-tool. However, if not, happy to leave this / just deprecate use of integers to slowly move to string-only to avoid placing extra requirements on future tools to handle both.
I think this is possibly more of an issue for tabulate, and any other attempt to import OCDS into a database, than flatten-tool specifically.
I count 12 identifiers in OCDS that can be string or integer, and this issue only addresses 2 of them. As far as I can tell, they all have the same problem, so should any proposed change apply to all of them?
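To illustrate the backwards-compatibility point raised above, here is a rough sketch using a hand-rolled type check rather than a real JSON Schema validator. The function names are hypothetical and the check only covers the two types under discussion:

```python
def matches_type(value, allowed_types):
    """Return True if value matches any of the allowed JSON type names."""
    checks = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    }
    return any(checks[name](value) for name in allowed_types)


def normalise_id(value):
    """One possible migration path: coerce every identifier to a string."""
    return str(value)


current_rule = ["string", "integer"]   # identifiers may be either today
tightened_rule = ["string"]            # what a string-only schema would allow

integer_id = 42
valid_before = matches_type(integer_id, current_rule)
valid_after = matches_type(integer_id, tightened_rule)
valid_after_migration = matches_type(normalise_id(integer_id), tightened_rule)
```

Data publishing `42` as an integer passes the current rule and fails the tightened one, which is why deprecation plus coercion by consumers is gentler than an immediate schema change.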
We can easily add a transaction ID to Fiscal Data Package.
We do have a very strong preference for implementing data models that reflect the data we actually see. I can't remember ever seeing a budget document with transaction identifiers. I have seen spending data with transaction identifiers.
Have you got some examples of budgets with transaction identifiers, so we could have a look and understand how this linkage will work with real data?
About URIs, the data package specifications don't assume that all data is available on persistent, immutable, publicly accessible URIs. While that would be desirable, we don't want to enforce it at the level of the specification. I don't see this as a blocker, however - a transaction id could be a string that is a URI, or not.
Thanks for exploring this.
Our approach has always been that open data specifications should seek to balance publisher and user needs.
Just because data has never been represented a particular way to serve internal government needs does not mean that representation will not be important for consumers of the open data version of that dataset - and so specification development offers an opportunity for a conversation that connects data from inside governments, with user needs outside.
Of course, there should be care taken not to invent new data requirements without being clear on their feasibility - but including identifiers is, I would argue, a very important part of creating an ecosystem of distributed open data.
My understanding (from conversations with budget specialists... not direct experience), is that it should be possible to construct identifiers for budget lines from the various budget-line components and classifications.
I.e. Budget items may not have an existing ID, but a composite ID could be created for them.
I'm not certain that this would yield unique budget line identifiers (e.g. an identifier might span multiple budget lines), but even in that case it could be useful to help connect contracts back to their budget sources.
Agreed that immutable URIs cannot be enforced at the level of the specification - but unless there is guidance on this we can point to, it makes it hard to reference FDP specifically.
Agreed on including identifiers in theory. It is just that, in the absence of actual identifiers from the source, one goes down other paths which may or may not have the desired effect. e.g.: using the OpenSpending internal identifier as the transaction ID for a budget line - one could wonder if this is a good thing. From your perspective, I might guess that it is a good thing - OpenSpending is designed as a persistent, web-accessible service, and therefore can be a URI provider. However, what then is the relationship of this representation of the data to the source of it?
It is actually extremely complex, and I've explored this quite deeply. Even hashing all the values of a budget line does not guarantee uniqueness in a single data source (I've had real examples from UK govt. where multiple transactions for a single department in a given month are identical, and definitely different transactions), let alone globally. One can even add the row number in a source file to the hash, which would give source-level uniqueness - then, one is confronted with the problem of updates at the source - if the row number of the same line of data changes, is it still the same line of data?
These possibilities alone make it close to impossible to uniquely identify any budget or spending line if the source does not provide an internal transaction id.
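The hashing problem described above can be sketched as follows. The rows are invented for illustration (they are not the UK data mentioned), and `line_hash` is a hypothetical helper, not part of any specification:

```python
import hashlib
import json


def line_hash(row, row_number=None):
    """Hash a line's values, optionally mixing in its position in the source file."""
    payload = {"row": row}
    if row_number is not None:
        payload["row_number"] = row_number
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Two different transactions that happen to share every published value.
rows = [
    {"department": "Dept A", "month": "2016-01", "amount": "1000.00"},
    {"department": "Dept A", "month": "2016-01", "amount": "1000.00"},
]

value_only = [line_hash(r) for r in rows]
with_position = [line_hash(r, i) for i, r in enumerate(rows)]

# Value-only hashes collide; adding the row number separates them.
collide = value_only[0] == value_only[1]
distinct = with_position[0] != with_position[1]

# The source is republished with one new line at the top, shifting every row.
updated = [{"department": "Dept B", "month": "2016-01", "amount": "5.00"}] + rows
shifted = line_hash(updated[1], 1)  # the line that used to be row 0

# The same line no longer has the same identifier...
stable = shifted == with_position[0]
# ...and now carries the identifier previously held by the other transaction.
mistaken_identity = shifted == with_position[1]
```

Value-only hashing cannot tell the two transactions apart, while position-based hashing breaks as soon as the source file is reordered or extended.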
In the context of OCDS, we ask publishers to link out to additional contextual budget information.
In the update proposed for 1.1, we would have a situation where:
As I understand it, because of the way FDP has evolved, we would still need to remove the formal reference to it from OCDS, but could include a link out to some guidance / blog posts / other content showing the different potential ways (or ideally examples of practice) for people making this linkage work.
Agree there is complexity here - but there are also important use-cases of being able to track between contracts and budgets that can't be served without encouraging publishers to find an approach to creating identifiers.
We face the same challenges with the concept of a 'procurement process', where user needs call for the ability to link tenders, contracts and awards - but many prior systems don't clearly link these. The role of the spec in this case is to show what users need and to encourage publishers to find approaches to meet that need.
On this, I 100% agree on the use cases. However, if the source data can't meet this promise, a spec can't either. And, that is why composite keys in this regard are actually dangerous in terms of the goals we'd want to achieve - they can't even guarantee uniqueness in a single dataset, and therefore encouraging their usage could be misleading or worse.
Either sounds fine to me (formal reference, or not), but, if we made an official "semantic type" of transaction ID, not as a
On the second point first - a
On the first point - I think it's important to understand specifications as part of a dynamic system of data production: the underlying data and its features are not static. We've seen how people adapt systems and data to meet specifications, and so a spec (in the context of
Flagging that Budget Data Package is also referenced in the
Based on the discussion above I think this just needs updating to read "Fiscal Data Package" rather than "Budget Data Package" which I will include in the updates to the transaction block in #372
During peer-review there was a request for a minor revision:
However, the schema states for
As human readable documents can be included in the planning.documents block, we don't propose to update the schema guidance.