This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Clarify the schema requirements or provide an alternate version for non-federal sources #247

Closed
philipashlock opened this issue Jan 13, 2014 · 16 comments

Comments

@philipashlock
Contributor

The data.json schema is already based on the DCAT standard, and its particular implementation is clearly intended to serve as a standard not only for the federal government but for others as well. For this to happen, the schema needs to delineate which fields and requirements are specific to federal agencies.

For resources like data.gov, this is important because it would help them federate data sources from state and local governments that publish using a compatible data.json schema.

philipashlock referenced this issue in opendataphilly/Open-Data-Catalog Jan 13, 2014
@JeanneHolm

+1

@georgethomas

From http://www.w3.org/TR/vocab-dcat/:

"A DCAT profile is a specification for data catalogs that adds additional constraints to DCAT. A data catalog that conforms to the profile also conforms to DCAT. Additional constraints in a profile may include:

  • A minimum set of required metadata fields
  • Classes and properties for additional metadata fields not covered in DCAT
  • Controlled vocabularies or URI sets as acceptable values for properties
  • Requirements for specific access mechanisms (RDF syntaxes, protocols) to the catalog's RDF description"

I believe some examples of profiles from other governments using DCAT can be found here:

http://www.w3.org/2011/gld/wiki/DCAT_Implementations


@mhogeweg
Contributor

Would a state/local implementation be a profile of DCAT proper, or a profile 'on top of' the Data.gov profile? Would state/local DCAT have to conform to the same validation rules as federal DCAT? That would be hard given fields such as bureauCode or programCode that have a federal focus. Perhaps the definition of those fields could be extended to include all agencies and programs that share data through Data.gov?

@gbinal
Contributor

gbinal commented Jan 14, 2014

Just to check, it seems like an initial version of this would be as simple as noting that the following fields do not apply to non-federal agencies:

  • Bureau Code
  • Program Code
  • Data Quality
  • Primary IT Investment UII
  • System of Records

I don't necessarily know what form that would take, but I figure that's the meat of it. I imagine it's more complicated than that, though.
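One form that "these fields do not apply" could take in tooling is a filter that drops the federal-only keys from a dataset entry. A minimal Python sketch follows; the JSON key spellings are assumptions based on the schema's camelCase naming (check them against the schema version in use), and the helper name and sample entry are hypothetical.

```python
# Assumed JSON keys for the federal-only fields listed above;
# verify exact spellings against the schema version in use.
FEDERAL_ONLY_FIELDS = frozenset({
    "bureauCode",
    "programCode",
    "dataQuality",
    "primaryITInvestmentUII",
    "systemOfRecords",
})


def strip_federal_fields(dataset: dict) -> dict:
    """Hypothetical helper: return a copy of `dataset` without federal-only keys."""
    return {key: value for key, value in dataset.items()
            if key not in FEDERAL_ONLY_FIELDS}


if __name__ == "__main__":
    # Made-up entry for illustration only.
    entry = {
        "title": "Parking Citations",
        "bureauCode": ["015:11"],
        "programCode": ["015:001"],
        "keyword": ["parking", "transportation"],
    }
    print(strip_federal_fields(entry))
    # {'title': 'Parking Citations', 'keyword': ['parking', 'transportation']}
```

The input dict is left unmodified, so the same entry can still be validated against the full federal profile.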

@nsinai
Contributor

nsinai commented Jan 24, 2014

+1

@gbinal, those look right.

Is there a minimum? E.g., a similar structure with required, required-if-applicable, etc.

@ianjkalin

As a benchmark for non-Federal participation, check out these automatically generated data.json files from city, county and state open data portals:

https://data.raleighnc.gov/data.json
https://data.montgomerycountymd.gov/data.json
https://data.ny.gov/data.json

You'll notice that each government has chosen its own metadata template, but there is a good deal of common-sense overlap with the federal metadata standard.

@jpvelez

jpvelez commented Feb 11, 2014

Forgive me if this has already been discussed elsewhere, but what are the potential uses of this open data metadata standard? The best way to prioritize which fields should be in or out is to think about who needs to use them.

Here are a few use cases I'm familiar with:

EDIT: looks like the conversation and work are incredibly far advanced, and the fields are largely settled, which is awesome. This can just be a roundup of work that's proposed or being done with this kind of metadata.

  • Open Data Report Card: there's been a lot of discussion around developing an open data report card, so people could see how much progress a government entity has made and how it compares to its peers. The easiest and most sustainable way to do this is if the metrics in such a report card could be generated programmatically from a data.json file or similar. Ideally, the report cards would allow apples-to-apples comparisons between... actually apples. (As in, compare cities to cities, and so on.)
  • Alternative interfaces for understanding / using data portals: some work has been done on providing better interfaces to open data, whether visualizing single data portals (a data portal treemap) or creating search engines for many of them.
  • Research on open data: @tlevine has done some preliminary analysis on Socrata data portals, and lots of academics are interested in this.
  • Connecting open datasets to their upstream data systems and downstream applications: the City of @Chicago (@tomschenkjr) is developing an open source data dictionary to document the databases the city has in production. There's been talk of connecting databases / tables represented in that system to the datasets they feed on Chicago's data portal. On the civic apps side of things, Chicago's open gov hack night has developed a lightweight mechanism (forked by NYC) for tracking civic apps: compile a list of repos, then get all other project details from the GitHub API. There's also been talk of programmatically tying projects to their source datasets. Clearly, making it easy to connect data systems, open datasets, and resulting applications would improve all of the use cases above.
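The report-card idea can start from something as small as measuring how often each field is filled in across a catalog. A minimal Python sketch, assuming the data.json document is either a bare JSON array of datasets or an object with a `dataset` array; the function names and the "share of datasets with a non-empty value" metric are hypothetical choices, not part of any existing report card.

```python
import json


def load_datasets(raw: str) -> list:
    """Parse a data.json document into a list of dataset entries.

    Assumes either a bare array of datasets or an object with a
    "dataset" array; any other layout raises ValueError.
    """
    doc = json.loads(raw)
    if isinstance(doc, list):
        return doc
    if isinstance(doc, dict) and isinstance(doc.get("dataset"), list):
        return doc["dataset"]
    raise ValueError("unrecognized data.json layout")


def field_coverage(datasets: list, fields: list) -> dict:
    """Share of datasets (0.0 to 1.0) with a non-empty value for each field."""
    if not datasets:
        return {field: 0.0 for field in fields}
    coverage = {}
    for field in fields:
        # Missing keys, None, and empty strings/lists/objects count as unfilled.
        filled = sum(1 for d in datasets if d.get(field) not in (None, "", [], {}))
        coverage[field] = filled / len(datasets)
    return coverage


if __name__ == "__main__":
    sample = '[{"title": "A", "license": "CC0"}, {"title": "B", "license": ""}]'
    print(field_coverage(load_datasets(sample), ["title", "license"]))
    # {'title': 1.0, 'license': 0.5}
```

Because the metric only depends on field presence, the same function could compare a city's file against another city's, provided both follow a compatible schema.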

@philipashlock
Contributor Author

@jpvelez Thanks for sharing those! Your comment is essentially a first draft of content that could make up a new section of the Project Open Data website on case studies or opportunities for using the schema. @gbinal Perhaps we can carve out a heading on the front page where it would make sense to put things like this?

@philipashlock philipashlock added this to the Next Version of Common Core Metadata Schema milestone May 8, 2014
@rebeccawilliams
Contributor

Of note: we discussed this briefly at Thursday's Common-Core Metadata Schema Review (see #325) where an alternative schema for non-federal sources was generally +1'ed, with a particular emphasis on removing federally focused fields for non-federal sources.

Left unresolved was: should additional fields be required for non-federal sources (e.g. license)?

@gbinal
Contributor

gbinal commented Jul 21, 2014

@philipashlock - agreed. I think linking to a state/local section from the homepage makes sense.

@gbinal
Contributor

gbinal commented Jul 21, 2014

Also, I recommend taking the question of any additional required fields for non-federal sources and treating that as a separate issue from this.

@gbinal gbinal removed this from the Next Version of Common Core Metadata Schema (1.0 -> 1.1.) milestone Jul 24, 2014
@rebeccawilliams
Contributor

@jpvelez @philipashlock, I'm adding the new "Metadata: Existing Practices and Survey" from the DataSF Resources page to the list of resources here, and I'm happy to discuss additions in a new issue.

@mhogeweg
Contributor

mhogeweg commented Aug 1, 2014

A discussion about metadata practices is incomplete without the work done by FGDC, the work by states in the GIS Inventory, and international metadata initiatives such as INSPIRE (Europe), GEMINI (UK), ANZLIC (Australia/New Zealand), ...

@gbinal
Contributor

gbinal commented Sep 15, 2014

Thanks for the pull request, @philipashlock. Do you think that it would be sufficient to address this issue?

@philipashlock
Contributor Author

@gbinal I think it's most of the way there. It would also be useful to have a separate section that provides more background for non-federal sources on the different requirements, and I'll add that (now as part of the v1.1 branch; thanks for accepting the PR). I'll also make an update as part of v1.1 to move programCode and bureauCode up to the required section, now that we've called out the federal USG-specific fields.
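Moving bureauCode and programCode into the required section "but only for federal agencies" amounts to a conditional requirement. A minimal Python sketch of how a validator might express that, assuming the publisher's federal status is known to the caller; the `COMMON_REQUIRED` tuple is an illustrative subset rather than the schema's complete required list, and all names here are hypothetical.

```python
# Illustrative subset of required keys; not the schema's complete list.
COMMON_REQUIRED = ("title", "description", "keyword", "modified",
                   "publisher", "contactPoint", "identifier", "accessLevel")
# Required only when the publisher is a federal agency.
FEDERAL_REQUIRED = ("bureauCode", "programCode")


def missing_required(dataset: dict, is_federal: bool) -> list:
    """Hypothetical check: list required keys that are absent or empty."""
    required = COMMON_REQUIRED + (FEDERAL_REQUIRED if is_federal else ())
    return [field for field in required
            if dataset.get(field) in (None, "", [])]


if __name__ == "__main__":
    entry = {field: "x" for field in COMMON_REQUIRED}
    print(missing_required(entry, is_federal=False))  # []
    print(missing_required(entry, is_federal=True))   # ['bureauCode', 'programCode']
```

Keeping the federal-only keys in a separate tuple means a state or local profile can reuse the common list unchanged.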

philipashlock added a commit that referenced this issue Sep 15, 2014
But only for federal agencies. See #247
@gbinal
Contributor

gbinal commented Nov 10, 2014

Thanks everyone for driving the conversation around this issue and helping to assemble the v1.1 metadata update.

There appears to be strong consensus around this issue, which has been accepted in the v1.1 update and merged into Project Open Data. Project Open Data is a living project though. Please continue any conversations around how the schema can be improved with new issues and pull requests!

It's important for government staff as well as the public to continue to collaborate to make the Open Data Policy ever better. Though the v1.1 update is a substantial update, future iterations do not have to be, so whatever your ideas - big or small - please continue to work with this community to improve how government manages and opens its data.

10 participants