Allow CSV resource schema to be string #264

Closed
patcon opened this Issue May 26, 2016 · 12 comments

6 participants

@patcon
patcon commented May 26, 2016

It seems that the base spec allows a resource's schema to be an object or a string, but the jsontable and tabular data spec insists that it be an object.

Would it be reasonable to allow this to be a string? It would help keep the individual resource objects small when there are many separate resources (i.e. different years) with the same columns.

http://data.okfn.org/tools/validate?url=https%3A%2F%2Fraw.githubusercontent.com%2Fpeel-datasets%2Fland-tenure%2Fmaster%2Fdatapackage.json

@pwalsh
Member
pwalsh commented Jul 12, 2016

@patcon a string, as in, the URL to a JSON Table Schema definition?

@patcon
patcon commented Jul 13, 2016

No, sorry, a string as in a reference to a key in the "schemas" hash elsewhere in the json:
https://raw.githubusercontent.com/peel-datasets/land-tenure/master/datapackage.json
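
For illustration, the layout being described might look like the sketch below, written as a Python dict rather than raw JSON. The package name, resource names, paths, and field names are hypothetical and are not taken from the linked datapackage.json; only the top-level "schemas" hash and the string-valued "schema" key on each resource reflect what is described above.

```python
import json

# Hypothetical descriptor: names, paths and fields are made up for illustration.
descriptor = {
    "name": "example-package",
    "schemas": {
        # Defined once, under a key that resources can refer to.
        "tenure": {
            "fields": [
                {"name": "area", "type": "string"},
                {"name": "households", "type": "integer"},
            ]
        }
    },
    "resources": [
        # One resource per year, each pointing at the shared schema by key.
        {"name": f"tenure-{year}", "path": f"data/{year}.csv", "schema": "tenure"}
        for year in (2001, 2006, 2011)
    ],
}

# Every resource carries only a short string, not a copy of the schema.
for resource in descriptor["resources"]:
    assert isinstance(resource["schema"], str)
    assert resource["schema"] in descriptor["schemas"]

print(json.dumps(descriptor["resources"][0]))
```

The point of the layout is that adding another year adds one short resource entry and leaves the column definitions untouched.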

@rufuspollock
Contributor

JTS would always be an object because it is a spec which defines that object.

Tabular Data Package should inherit Data Package but the "target" of schema must always be a JTS - either "inline" (the default) or via any of the Data Package reference routes.

@danfowler
Contributor

@pwalsh @rgrp

The Data Package spec defines that the schema on a resource can be one of three things: (a) a string which is a URL pointer to a schema* (b) a string which is a reference to a schema defined in a schemas object in the same datapackage.json (as in the above example) or (c) an "inline" schema object like JSON Table Schema (what we typically expect).

http://specs.frictionlessdata.io/data-packages/#schemas-property

The problem is that the Tabular Data Package spec doesn't refer to the Data Package reference routes when it defines the requirements for a schema. It reads like, "this should be an object" and the JSON Schema that validates Tabular Data Packages backs this up.

"The resource metadata MUST include a schema attribute whose value MUST conform to the JSON Table Schema"

To resolve this issue, the language in the Tabular Data Package spec should be updated to refer to the referencing mechanism already present in the Data Package spec and the JSON Schemas that validate the tabular and base datapackage.json profiles should be updated.

* Indirectly, via another hosted datapackage.json. This would probably be cleaner as just a direct link to a self-contained JSON Table Schema .json file.
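
As a rough sketch of the three routes (a), (b) and (c) listed above, a consumer could classify a resource's schema value along these lines. The function name `resolve_schema` and the route labels are hypothetical and do not come from any existing library. Treating any http(s) string as route (a) is an assumption, and actually fetching the URL is deliberately left out.

```python
from urllib.parse import urlparse


def resolve_schema(descriptor, resource):
    """Return (route, value) for a resource's schema property.

    route is "inline", "named" or "url". For "url" the value is the URL
    string itself; retrieving it is outside the scope of this sketch.
    """
    schema = resource["schema"]
    if isinstance(schema, dict):
        # (c) an inline schema object
        return "inline", schema
    if not isinstance(schema, str):
        raise TypeError("schema must be an object or a string")
    if urlparse(schema).scheme in ("http", "https"):
        # (a) a URL pointer (assumption: only http/https count as URLs)
        return "url", schema
    # (b) a key in the package-level "schemas" object
    schemas = descriptor.get("schemas", {})
    if schema not in schemas:
        raise KeyError(f"no schema named {schema!r} in the 'schemas' object")
    return "named", schemas[schema]


# Usage with a made-up package.
jts = {"fields": [{"name": "id", "type": "integer"}]}
package = {"schemas": {"shared": jts}}
print(resolve_schema(package, {"schema": "shared"}))
print(resolve_schema(package, {"schema": jts}))
print(resolve_schema(package, {"schema": "http://example.com/schema.json"}))
```

Whichever route is taken, the value that comes back for (b) and (c) is the same kind of object, which is why the requirement that the schema be a JSON Table Schema can stay unchanged.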

@pwalsh
Member
pwalsh commented Jul 28, 2016

@danfowler yes, thanks.

@rgrp I think we should remove, at least for V1, the various options for setting a schema. It is too flexible. I'd prefer something simple and explicit: schema property on a resource can be an object (the schema) or a url (to a schema object as json).

@rufuspollock
Contributor

@pwalsh if you want to change the current definition of schema property let's open a new issue and debate. I know that the functionality it offers is pretty valuable to some people - at the same time it does add complexity ...

@danfowler for now we definitely should correct so that TDP simply says target of schema property must be a JTS. Can you do a PR for this?

@roll roll added the backlog label Aug 8, 2016
@danfowler
Contributor

@rgrp can do

@danfowler
Contributor

@rgrp Actually, TDP is pretty clear throughout that the schema should be JTS.

@rufuspollock
Contributor

@danfowler right but the point was to emphasize that although the schema property target must be JTS that target can either be inline or "out of line" as per parent Data Package spec.
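
A minimal sketch of that distinction: the check below accepts a schema given inline or by a key in the package-level "schemas" object, and in both cases requires the target to look like a JTS. The helper names are hypothetical, URL references are not handled, and the structural test (a "fields" list whose entries each have a "name") is only a rough stand-in for real JSON Table Schema validation.

```python
def looks_like_jts(schema):
    """Rough structural check, not a full JSON Table Schema validation."""
    if not isinstance(schema, dict):
        return False
    fields = schema.get("fields")
    if not isinstance(fields, list):
        return False
    return all(isinstance(f, dict) and "name" in f for f in fields)


def check_resource(descriptor, resource):
    """Accept an inline object or a key in 'schemas'; the target must be a JTS."""
    schema = resource.get("schema")
    if isinstance(schema, str):
        # Out of line: look the name up in the package-level "schemas" object.
        schema = descriptor.get("schemas", {}).get(schema)
    return looks_like_jts(schema)


# Usage with a made-up package: both forms pass, a dangling name does not.
jts = {"fields": [{"name": "id", "type": "integer"}]}
package = {"schemas": {"shared": jts}}
print(check_resource(package, {"schema": jts}))
print(check_resource(package, {"schema": "shared"}))
print(check_resource(package, {"schema": "unknown"}))
```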

@roll roll removed the backlog label Aug 29, 2016
@rufuspollock rufuspollock modified the milestone: Current Sep 27, 2016
@muehlenpfordt

@pwalsh: I see you questioned whether option (b) as defined by @danfowler

(b) a string which is a reference to a schema defined in a schemas object in the same datapackage.json

should remain part of the specs.

On behalf of the project Open Power System Data, I can say this: some of our datapackages include the same content in a CSV and an XLSX file. We find option (b) useful, as it allows us to define a schema just once and use it for both resources.

@pwalsh
Member
pwalsh commented Oct 12, 2016

@muehlenpfordt ok, got it.

@rgrp maybe we need to consider #295 for v1, which is a generic solution to the problem of referencing properties in the same object, thereby removing the type of special-casing that bothers me in the current spec, as described here.

@rufuspollock
Contributor

FIXED.

To summarize the issue here as it got a bit confused: TDP was not clear that schema follows Data Package spec in allowing the value of schema property to be specified both inline and out of line. This has now been corrected. (Tabular Data Package was clear that schema value had to be a JSON Table Schema.)

Note: #295 etc is something different and can be considered quite separately.

@rufuspollock rufuspollock added a commit that referenced this issue Dec 1, 2016
@rufuspollock rufuspollock [tdp][s]: clarify that TDP supports both inline and out of line schema values - fixes #264.

Issue: TDP was **not** clear that `schema` follows Data Package spec in allowing the value of schema property to be specified both inline and out of line. This has now been corrected. (Tabular Data Package was clear that `schema` value had to be a JSON Table Schema.)
e33020d
@roll roll added duplicate and removed duplicate labels Dec 6, 2016