Updates to core budget block #378

Closed
timgdavies opened this issue Sep 18, 2016 · 13 comments

@timgdavies commented Sep 18, 2016

This issue is under consideration as an update to the core OCDS standard in version 1.1.

It sits alongside proposals for a substantial extension to budgets, which would allow multi-year and multi-source budgets to be captured (see #377).

It builds upon past discussion in #345.

The issue

In the current version of OCDS we have a very simple budget block which talks of linking out to the 'Budget Data Package' for more in-depth information.

However, the Budget Data Package has been superseded by the Fiscal Data Package which does not currently have the concept of a transaction identifier and for which current publication approaches do not focus on providing data at a stable URI.

This makes cross-linking between OCDS and FDP challenging.

We also see budget.source, intended as a link to the BDP in which an identified budget line item would exist, commonly misused to provide the name of a budget line or of the department providing the budget.

The schema allows both string and uri formats for budget.source, which may be the source of this confusion. Changing budget.source to accept only a URI would not be backwards compatible.
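To illustrate the confusion, a validator could flag budget.source values that are free text rather than links. This is a minimal heuristic sketch, assuming that only absolute http(s) URLs count as links; the function names and sample values are hypothetical and not part of any OCDS tool.

```python
from __future__ import annotations

from urllib.parse import urlparse


def looks_like_uri(value: str) -> bool:
    """Heuristic: treat only absolute http(s) URLs with a host as links."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def flag_free_text_sources(sources: list[str]) -> list[str]:
    """Return the budget.source values that look like names rather than URIs."""
    return [s for s in sources if not looks_like_uri(s)]


# Hypothetical budget.source values, mixing intended and observed usage.
examples = [
    "http://example.org/budgets/2016.json",
    "Ministry of Health",
    "Budget line 04.1 - Primary care",
]
print(flag_free_text_sources(examples))
# ['Ministry of Health', 'Budget line 04.1 - Primary care']
```

A heuristic like this can only report likely misuse; it cannot tell whether a valid-looking URL actually points at budget data.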

The proposal

We propose

  • deprecating (see #367) the budget.source field;
  • updating documentation for the budget block to make clear that the uri field can be used in a general-purpose way to link to any machine-readable source of budget information.

We will also update the valid types for id and projectID to strings only (they are currently able to be string or integer).

Suggested draft documentation updates for the uri field are below:

Current text:

A URI pointing directly to a machine-readable record about the related budget or projects for this contracting process.

Proposed update:

A URI pointing directly to a machine-readable record about the budget line-item or line-items that fund this contracting process. Information may be provided in a range of formats, including using IATI, the Open Fiscal Data Standard or any other standard which provides structured data on budget sources.
Human readable documents can be included using the planning.documents block.

Discussion

We will explore guidance to the effect that URIs should include a # component indicating the identifier of the particular budget line item.

For example, if a contract is funded through the DFID aid project with the IATI Identifier 'GB-1-107171-101' then a contract process planning record could cross-reference this by:

  • Including a planning.budget block
  • Including the URI http://iati.dfid.gov.uk/iati_files/Country/DFID-Afghanistan-AF.xml#GB-1-107171-101 in the uri field

An application would need to be 'IATI aware' to understand that #GB-1-107171-101 refers to the entry in the XML file at that URL whose //iati-activities/iati-activity/iati-identifier value is GB-1-107171-101.
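A minimal sketch of what 'IATI aware' resolution could look like, using only the Python standard library. It assumes the fragment holds the IATI identifier verbatim (percent-encoded if needed); the helper names and the trimmed sample XML are hypothetical, and the sketch parses an inline string rather than downloading the DFID file.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote, urldefrag

# Hypothetical, heavily trimmed IATI activity file, inlined for illustration only.
SAMPLE_IATI_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>GB-1-107171-101</iati-identifier>
    <title><narrative>Example activity</narrative></title>
  </iati-activity>
  <iati-activity>
    <iati-identifier>GB-1-999999-101</iati-identifier>
  </iati-activity>
</iati-activities>"""


def build_budget_uri(file_url: str, identifier: str) -> str:
    """Append the line-item identifier to the file URL as a percent-encoded fragment."""
    base = urldefrag(file_url).url  # drop any existing fragment
    return f"{base}#{quote(identifier, safe='')}"


def split_budget_uri(uri: str) -> tuple[str, str]:
    """Split a budget uri into (file URL, decoded line-item identifier)."""
    base, fragment = urldefrag(uri)
    return base, unquote(fragment)


def find_iati_activity(xml_text: str, identifier: str) -> ET.Element | None:
    """Return the <iati-activity> whose <iati-identifier> matches, or None."""
    root = ET.fromstring(xml_text)  # root element is <iati-activities>
    # Mirrors the XPath //iati-activities/iati-activity/iati-identifier.
    for activity in root.findall("iati-activity"):
        if (activity.findtext("iati-identifier") or "").strip() == identifier:
            return activity
    return None


uri = build_budget_uri(
    "http://iati.dfid.gov.uk/iati_files/Country/DFID-Afghanistan-AF.xml",
    "GB-1-107171-101",
)
file_url, identifier = split_budget_uri(uri)
activity = find_iati_activity(SAMPLE_IATI_XML, identifier)
print(identifier, activity.findtext("title/narrative"))  # GB-1-107171-101 Example activity
```

A real consumer would first fetch file_url and parse the downloaded file; the sketch skips networking so that it runs offline.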

The updated approach in the Fiscal Data Package does not currently appear to offer either:

(a) Stable URIs for packages;
(b) Line-item identifiers;

which makes this approach very difficult.

See also

Budget breakdown in #377

Questions

  • @Bjwebb @kindly, can we make the string/integer -> string change I suggest above while retaining backwards compatibility? @Edafe, is it possible to check whether any publishers are currently providing these fields as integers?
    • As a follow-up: if we wanted to deprecate integers for these fields but continue to accept them as valid in 1.1, is there an easy way to mark this up in the schema?
  • OpenSpending / FDP folks: is there any way we could make a cross-reference from an Open Contracting process to data in the new FDP format?
  • Should we introduce an explicit URI for projects, as well as the project name and ID? The proposal above removes the idea that uri might also link to project information, and makes it budget-specific.

Engagement

Please indicate support for or opposition to this proposal using the +1 / -1 buttons or a comment. If opposing the proposal, please give clear justifications and, where possible, make an alternative proposal.

Views on the discussion points are welcome.

@Bjwebb commented Sep 21, 2016

We will also update the valid types for id and projectID to strings only (they are currently able to be string or integer).

I'm not sure I'm clear on the motivation for this. A lot of other IDs in OCDS can be string or integer.

@Bjwebb @kindly can we make the string,integer -> string change I suggest above whilst retaining backwards compatibility.

We can't actually make the change to the schema and remain backwards compatible, because any data that used an integer would have been valid but would now be invalid. We could, however, think about deprecating the use of integers here.
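The compatibility problem, and the deprecation alternative, can be sketched as plain type checks. This is an illustrative Python sketch rather than OCDS tooling; the function names and warning text are hypothetical, and in practice these rules would live in the schema and its validator.

```python
from __future__ import annotations

from typing import Any

ID_FIELDS = ("id", "projectID")


def is_string_or_integer(value: Any) -> bool:
    """Current rule as described in this issue: string or integer."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, str) or (
        isinstance(value, int) and not isinstance(value, bool)
    )


def is_string_only(value: Any) -> bool:
    """Proposed strict rule: strings only."""
    return isinstance(value, str)


def check_budget_ids(budget: dict) -> tuple[list[str], list[str]]:
    """Deprecation path: integers stay valid but produce a warning."""
    errors: list[str] = []
    warnings: list[str] = []
    for field in ID_FIELDS:
        if field not in budget:
            continue
        value = budget[field]
        if not is_string_or_integer(value):
            errors.append(f"{field}: must be a string or integer")
        elif not is_string_only(value):
            warnings.append(f"{field}: integer values are deprecated; use a string")
    return errors, warnings


legacy_budget = {"id": 1234, "projectID": "P-2016-01"}  # hypothetical 1.0 data
print(is_string_or_integer(legacy_budget["id"]))  # True: valid today
print(is_string_only(legacy_budget["id"]))  # False: a strict change would break it
print(check_budget_ids(legacy_budget))
# ([], ['id: integer values are deprecated; use a string'])
```

The strict rule turns previously valid data into errors, while the deprecation path keeps it valid and only adds a warning, which is the distinction drawn above.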

@duncandewhurst commented Sep 22, 2016

The schema also refers to the Budget Data Package in contract.implementation.transactions.id and contract.implementation.transactions.source; we should also update the text for these fields.

@timgdavies commented Sep 22, 2016

@Bjwebb I had understood that mixing strings and integers as possible field values was an issue for tools like flatten-tool. However, if not, I'm happy to leave this, or just to deprecate the use of integers and move slowly to string-only, so as to avoid placing extra requirements on future tools to handle both.

@Bjwebb commented Sep 26, 2016

I think this is possibly more of an issue for tabulate, and any other attempt to import OCDS into a database, than flatten-tool specifically.

I count 12 identifiers in OCDS that can be string or integer, and this issue only addresses 2 of them. As far as I can tell, they all have the same problem, so should any proposed change apply to all of them?
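A count like this can be reproduced by walking the schema. A minimal sketch, assuming the schema is already loaded as a Python dict (for example with json.load) and that mixed types are expressed as a "type" list; it does not follow $ref pointers, and the sample schema is a hypothetical fragment, so it will not reproduce the figure of 12.

```python
from __future__ import annotations


def find_mixed_type_fields(schema: dict, path: str = "") -> list[str]:
    """Return dotted paths of fields whose "type" allows both string and integer."""
    found: list[str] = []
    types = schema.get("type")
    if isinstance(types, list) and "string" in types and "integer" in types:
        found.append(path)
    for name, sub in schema.get("properties", {}).items():
        found.extend(find_mixed_type_fields(sub, f"{path}.{name}" if path else name))
    items = schema.get("items")
    if isinstance(items, dict):
        found.extend(find_mixed_type_fields(items, path))
    # Walk named definitions too; $ref pointers are not followed.
    for name, sub in schema.get("definitions", {}).items():
        found.extend(find_mixed_type_fields(sub, name))
    return found


# Hypothetical, trimmed schema fragment; not the real OCDS release schema.
SAMPLE_SCHEMA = {
    "properties": {"ocid": {"type": "string"}},
    "definitions": {
        "Budget": {
            "type": "object",
            "properties": {
                "id": {"type": ["string", "integer"]},
                "projectID": {"type": ["string", "integer"]},
                "project": {"type": ["string"]},
            },
        },
        "Item": {"properties": {"id": {"type": ["string", "integer", "null"]}}},
    },
}

print(find_mixed_type_fields(SAMPLE_SCHEMA))
# ['Budget.id', 'Budget.projectID', 'Item.id']
```

Running the same walk over the full release schema would give the complete list to which any string-only change or deprecation could be applied consistently.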

@pwalsh commented Dec 14, 2016

Hi @timgdavies

Please do @-mention @pwalsh or @akariv from the FDP team so we can jump in here; I was not aware of this discussion.

We can easily add a transaction ID to Fiscal Data Package.

We do have a very strong preference for implementing data models that reflect the data we actually see. I can't remember ever seeing a budget document with transaction identifiers. I have seen spending data with transaction identifiers.

Have you got some examples of budgets with transaction identifiers, so we could have a look and understand how this linkage will work with real data?

About URIs, the data package specifications don't assume that all data is available on persistent, immutable, publicly accessible URIs. While that would be desirable, we don't want to enforce it at the level of the specification. I don't see this as a blocker, however - a transaction id could be a string that is a URI, or not.

@timgdavies commented Dec 14, 2016

Thanks for exploring this.

Our approach has always been that open data specifications should seek to balance publisher and user needs.

Just because data has never been represented in a particular way to serve internal government needs does not mean that representation will not be important for consumers of the open data version of that dataset, and so specification development offers an opportunity for a conversation that connects data from inside governments with user needs outside.

Of course, there should be care taken not to invent new data requirements without being clear on their feasibility - but including identifiers is, I would argue, a very important part of creating an ecosystem of distributed open data.

On budgets

My understanding (from conversations with budget specialists, not direct experience) is that it should be possible to construct identifiers for budget lines from the various budget-line components and classifications.

That is, budget items may not have an existing ID, but a composite ID could be created for them.

I'm not certain that this would yield unique budget line identifiers (e.g. an identifier might span multiple budget lines), but even in that case it could be useful to help connect contracts back to their budget sources.
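The idea of a composite identifier, and the uniqueness caveat, can be sketched as follows. The classification field names (fiscal_year, administrative, functional, economic), the separator and the sample lines are all hypothetical; the point is only that joining classification codes yields a usable key, and that duplicates have to be checked for rather than assumed away.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical classification fields; real budgets use their own segments.
KEY_FIELDS = ("fiscal_year", "administrative", "functional", "economic")
SEPARATOR = "-"


def composite_budget_id(line: dict, fields: tuple[str, ...] = KEY_FIELDS) -> str:
    """Join the classification codes of one budget line into an identifier."""
    parts = []
    for field in fields:
        code = str(line.get(field, "")).strip()
        if not code:
            raise ValueError(f"missing classification: {field}")
        if SEPARATOR in code:
            raise ValueError(f"{field} code {code!r} contains the separator")
        parts.append(code)
    return SEPARATOR.join(parts)


def ambiguous_ids(lines: list[dict]) -> list[str]:
    """Composite IDs shared by more than one budget line (i.e. not unique)."""
    counts = Counter(composite_budget_id(line) for line in lines)
    return [ident for ident, n in counts.items() if n > 1]


budget_lines = [  # hypothetical data
    {"fiscal_year": 2016, "administrative": "12", "functional": "04.1", "economic": "221", "amount": 500},
    {"fiscal_year": 2016, "administrative": "12", "functional": "04.1", "economic": "221", "amount": 250},
    {"fiscal_year": 2016, "administrative": "12", "functional": "07.3", "economic": "221", "amount": 900},
]
print(composite_budget_id(budget_lines[2]))  # 2016-12-07.3-221
print(ambiguous_ids(budget_lines))  # ['2016-12-04.1-221']
```

Here the first two lines share an identifier, which is the "identifier might span multiple budget lines" case: still useful for linking a contract back to its budget source, but not a unique key.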

On URIs

Agreed that immutable URIs cannot be enforced at the level of the specification - but unless there is guidance on this we can point to, it makes it hard to reference FDP specifically.

@pwalsh commented Dec 14, 2016

@timgdavies

Agreed on including identifiers in theory. It is just that, in the absence of actual identifiers from the source, one goes down other paths which may or may not have the desired effect, e.g. using the OpenSpending internal identifier as the transaction ID for a budget line. One could wonder whether this is a good thing. From your perspective, I might guess that it is: OpenSpending is designed as a persistent, web-accessible service, and can therefore be a URI provider. However, what then is the relationship of this representation of the data to its source?

Composite keys:

It is actually extremely complex, and I've explored this quite deeply. Even hashing all the values of a budget line does not guarantee uniqueness within a single data source (I've seen real examples from UK government data where multiple transactions for a single department in a given month are identical, yet are definitely different transactions), let alone globally. One could add the row number in the source file to the hash, which would give source-level uniqueness, but then one is confronted with the problem of updates at the source: if the row number of the same line of data changes, is it still the same line of data?

These possibilities alone make it close to impossible to uniquely identify any budget or spending line if the source does not provide an internal transaction id.
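The two failure modes described above can be reproduced in a few lines. This is a sketch with made-up rows (not the UK data referred to), using SHA-256 over a sorted-key JSON serialisation as the hash; any deterministic hash of the values behaves the same way.

```python
import hashlib
import json


def row_hash(row: dict) -> str:
    """SHA-256 over a canonical JSON serialisation of the row's values."""
    canonical = json.dumps(row, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def row_hash_with_position(row: dict, row_number: int) -> str:
    """Same hash, but with the row's position in the source file mixed in."""
    # "_row" is a hypothetical key assumed not to clash with real column names.
    return row_hash({**row, "_row": row_number})


# Hypothetical spending rows: the first two are distinct payments with identical values.
rows = [
    {"department": "Dept A", "month": "2016-05", "supplier": "Acme Ltd", "amount": 1200},
    {"department": "Dept A", "month": "2016-05", "supplier": "Acme Ltd", "amount": 1200},
    {"department": "Dept A", "month": "2016-05", "supplier": "Beta plc", "amount": 800},
]

# 1. Hashing values alone: two different transactions get the same "identifier".
print(row_hash(rows[0]) == row_hash(rows[1]))  # True

# 2. Adding the row number restores uniqueness within this version of the file...
before = [row_hash_with_position(r, i) for i, r in enumerate(rows)]
print(len(set(before)))  # 3

# 3. ...but if the publisher re-sorts the file, the same line gets a new identifier.
resorted = [rows[2], rows[0], rows[1]]
after = [row_hash_with_position(r, i) for i, r in enumerate(resorted)]
print(before[2] == after[0])  # False: "Beta plc" moved from row 2 to row 0
```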

@timgdavies commented Dec 14, 2016

Thanks @pwalsh

In the context of OCDS, we ask publishers to link out to additional contextual budget information.

In the update proposed for 1.1, we would have a situation where:

  • If they are choosing to manage that information persistently via Open Spending, they can link there.
  • If they choose to add identifiers to a bulk FDP (and FDP permits this in a standard way), then they can link there.
  • If they have some other system providing URIs for individual budget lines, or aggregated data on budget lines but with some fragment identifiers that support linking, they can link there.

As I understand, because of the way FDP has evolved, we would still need to remove the formal reference to it from OCDS, but could include a link out to some guidance / blog posts / other content showing the different potential ways (or ideally examples of practice) for people making this linkage work.

Agree there is complexity here - but there are also important use-cases of being able to track between contracts and budgets that can't be served without encouraging publishers to find an approach to creating identifiers.

We face the same challenges with the concept of a 'procurement process', where user needs call for the ability to link tenders, contracts and awards - but many prior systems don't clearly link these. The role of the spec in this case is to show what users need and to encourage publishers to find approaches to meet that need.

@pwalsh commented Dec 14, 2016

Agree there is complexity here - but there are also important use-cases of being able to track between contracts and budgets that can't be served without encouraging publishers to find an approach to creating identifiers.

On this, I 100% agree on the use cases. However, if the source data can't meet this promise, a spec can't either. And, that is why composite keys in this regard are actually dangerous in terms of the goals we'd want to achieve - they can't even guarantee uniqueness in a single dataset, and therefore encouraging their usage could be misleading or worse.

As I understand, because of the way FDP has evolved, we would still need to remove the formal reference to it from OCDS, but could include a link out to some guidance / blog posts / other content showing the different potential ways (or ideally examples of practice) for people making this linkage work.

Either sounds fine to me (formal reference, or not), but, if we made an official "semantic type" of transaction ID, not as a MUST (for all the reasons above), but as a MAY, would that be what you need to keep a formal reference to FDP?

@timgdavies commented Dec 14, 2016

Thanks @pwalsh

On the second point first - a MAY should get us to the place where we can encourage use of FDP alongside other solutions and be able to mention it (which I would like to be able to do).

On the first point, I think it's important to understand specifications as part of a dynamic system of data production: the underlying data and its features are not static. We've seen how people adapt systems and data to meet specifications, and so a spec (using MAY for things which cannot be required right now) offers an opportunity to work on aligning data production and user needs in pragmatic ways.

@pwalsh commented Dec 14, 2016

Great, @timgdavies. I'll talk with @akariv about getting a transaction ID type added.

@akariv anything to add here in this context?

@duncandewhurst commented Jan 16, 2017

Flagging that Budget Data Package is also referenced in the contract.implementation.transactions section of the schema:

A spending transaction related to the contracting process. Draws upon the data models of the Fiscal Data Package and the International Aid Transparency Initiative and should be used to cross-reference to more detailed information held using a Budget Data Package, IATI file, or to provide enough information to allow a user to manually or automatically cross-reference with some other published source of transactional spending data.

Based on the discussion above, I think this just needs updating to read "Fiscal Data Package" rather than "Budget Data Package", which I will include in the updates to the transaction block in #372.

@timgdavies commented Apr 11, 2017

During peer-review there was a request for a minor revision:

The URL should not be limited to machine-readable documents. In the Mexican case, we have web pages with more information about the budget line than any machine-readable file. These web pages are fed with information from the Open Fiscal Data Package, for which Mexico was the first implementation pilot.

However, the schema states for uri that this should be:

A URI pointing directly to a machine-readable record about the budget line-item or line-items that fund this contracting process. Information may be provided in a range of formats, including using IATI, the Open Fiscal Data Standard or any other standard which provides structured data on budget sources. Human readable documents can be included using the planning.documents block.

As human readable documents can be included in the planning.documents block, we don't propose to update the schema guidance.

@timgdavies closed this Jun 15, 2017
@pwalsh mentioned this issue Aug 29, 2017
@pwalsh mentioned this issue Sep 7, 2017