
Versioning: Data Package v2 or v1.1? #858

Open
peterdesmet opened this issue Dec 20, 2023 · 16 comments

@peterdesmet
Member

Hi all, the communication on the Frictionless specs update names it v2 (version 2, see also #853 #857). The announcement blog post also states (emphasis mine):

The modular approach will of course still be the cornerstone of the Frictionless specs v2, and we won’t introduce any breaking changes.

I'm very happy no breaking changes will be introduced; I think that should be a guiding principle. But following semantic versioning, the specs update should then be a minor version. Given that all major specs† are currently v1, I would argue that the upcoming release is v1.1.

I understand that v2 indicates that there is serious momentum behind the current development (dedicated project, new website). But to anyone who's not closely following Frictionless, v2 seems like a major overhaul without backward compatibility. A v1.1 would (correctly) communicate that while Data Package is now its own standard, most things will work as expected. It also sets us on a path to incorporate more changes in future (minor) releases.

Sidenote: will we version Data Package (the collection of standards) as a whole or will the 4 standards be versioned separately (current approach)? I see benefits and downsides with both approaches.

†All major specs are v1: Data Package, Tabular Data Package, Data Resource, Tabular Data Resource and Table Schema. The exception is CSV Dialect, which is v1.2, but since it seems this one is being renamed to Table Dialect, one could argue to start over. Some of the other experimental specs (like Fiscal Package or Views) have other version numbers, like 1.0-rc.1 and 1.0-beta.

@khusmann

+1 -- When I heard the v2 announcement, I immediately assumed it would include breaking changes and was surprised to find it was going to be backwards compatible.

Was v2 chosen because v1.1 felt like it wasn't communicating enough "distance" from v1.0 given the new website, dplib, etc.? If so, a jump to v1.5 might be another option to create separation before/after this initiative, which I would interpret as "major overhaul but no breaking changes".

... that said my opinion isn't very strong on this, so I'm happy to defer to whatever strategy has the most consensus/momentum.

Sidenote: will we version Data Package (the collection of standards) as a whole or will the 4 standards be versioned separately (current approach)? I see benefits and downsides with both approaches.

I think this is an excellent question and definitely warrants further discussion. How it is handled seems intertwined with the standard's governance structure / processes moving forward... Is this the sort of thing we want to/are planning to cover in the working group?

@nichtich
Contributor

I would not be surprised if there were an edge case of some artificial piece of data that is compliant with 1.0 but not with the new version, because the existing wording allows things that were never intended to be allowed. Moreover, I think a version 2.0 will attract rather than discourage use.

@fjuniorr

fjuniorr commented Dec 22, 2023

I don't even think we will need artificial data to hit this problem. #379 and #697 are breaking changes that are likely to be discussed and that at some point were added¹ to frictionless-py v5.

Sidenote: will we version Data Package (the collection of standards) as a whole or will the 4 standards be versioned separately (current approach)? I see benefits and downsides with both approaches.

Thinking about "communication simplicity", I think they should be versioned as a whole. This quote from @roll captures the problem quite well:

For example, we would like to make our Python libs 100% compatible with/implementing the specs. TBH, at the moment I don't really understand what that means: whether there is a frozen v1 of the specs to be compatible with, where all the current spec changes go (a v1.1/v2 branch of the specs), etc.

To give another example, I can see how frictionless-r could support Tabular Data Resource v2 with #379 but not support CSV/Table Dialect v2 with #697. However, this creates an explosion in the number of ways a client could be "standard compliant", creating confusion for users.

Footnotes

  1. I think https://github.com/frictionlessdata/specs/issues/379 was removed after https://github.com/frictionlessdata/frictionless-py/issues/868, but frictionless-py 5.16.0 converts "dialect": {"delimiter": ";"} to "dialect": {"csv": {"delimiter": ";"}} unless system.standards = "v1" is specified. I noticed this after having some difficulties in creating data packages that would play nice with both frictionless-py and frictionless-r.
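
To make the footnote concrete, here are the two descriptor shapes it refers to, written out as a small sketch (the resource name and path are made up; the two dialect shapes come straight from the footnote above):

```python
# Dialect as written per the v1 specs: CSV Dialect properties sit
# directly under "dialect".
resource_v1 = {
    "name": "example",  # hypothetical resource
    "path": "data.csv",
    "dialect": {"delimiter": ";"},
}

# Dialect as emitted by frictionless-py 5.16.0 unless system.standards = "v1"
# is set: the CSV properties are nested under a "csv" key (cf. #697).
resource_v5 = {
    "name": "example",
    "path": "data.csv",
    "dialect": {"csv": {"delimiter": ";"}},
}
```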

@roll roll added this to the v2 milestone Jan 3, 2024
@roll roll added the general label Jan 3, 2024
@roll roll removed this from the v2 milestone Jan 3, 2024
@roll
Member

roll commented Jan 4, 2024

I think it's a valid point, and as a Working Group, we can vote on the version when we have finished the changelog.

Peter outlined the pros of staying on v1.1, so I'll add some arguments in favor of v2:

  • I think we should try pursuing the idea of having no breaking changes in the specs, forever. That sounds really doable in my opinion, as it's only a matter of not changing data types and not strongly changing the semantics of existing metadata properties. So if we stick to v1.1 we might never get a v2 at all (which isn't bad, just stating it)
  • At the same time, we're currently doing the first update in 5 years, and it will include many (mostly minor) changes and some new features. In the future, if e.g. we need to add just one thing like package.propX, by semver we would still be bumping to v1.3. So we would get two versions, v1.2 and v1.3 (and following), that are not comparable in size and importance. I think a v2 followed by small v2.1, v2.2, etc. releases would communicate the structure of the changes better
  • As already mentioned, naming major updates with major versions (v2, and let's say v3 in a few years), even if they are not breaking, is just easier for communication, funding, etc.

TBH, I'm not sure the specs need 100% compliance with semver, as they're not software. For example, JSON Schema versioning was "Draft X" for years and is now yyyy-mm based. Honestly speaking, those "Draft X" labels looked really weird, but they kinda worked: implementors just thought about being compliant with draft "version X".

@roll
Member

roll commented Jan 5, 2024

@peterdesmet
I think we need to treat the core standard and the domain-specific extensions as separate projects, so it will be core vX, camtrap vY, fiscal vZ, etc. So I would just version the datapackage repository as a whole (I guess you do the same for camtrap).

PS.
Fiscal Data Package, as a domain-specific extension, has moved to its own project: https://github.com/frictionlessdata/datapackage-fiscal

@khusmann

I just realized "backwards compatibility" / "no breaking changes" has different levels/types of strictness, and I'm not clear where we stand:

  1. An implementation designed for v2 spec should be equally capable of reading v1 data packages

  2. An implementation designed for v1 spec should be capable of reading v2 data packages (albeit with reduced features)

Different types of modifications to the spec break in different ways:

  • adding a new optional prop in v2 does not break either type of compatibility

  • removing a prop in v2 breaks (1) but not (2)

  • changing a prop type from integer in v1 to integer | string in v2 breaks (2) but not (1)

etc.

In general, it's easier to upgrade software than existing data artifacts... so I'd argue we should hold to (1) and relax (2) to give us more freedom for v2 improvements. It also puts me squarely in the v2 semver camp: although a given v2 spec implementation will be "backwards compatible with v1 data", it is still "breaking" in that v2 data will not necessarily work with a v1 implementation.
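
A minimal sketch of the third bullet above, using the Python jsonschema package and a made-up property name, to show how widening a type keeps direction (1) intact while breaking direction (2):

```python
from jsonschema import validate, ValidationError

# Hypothetical profile fragments: "prop" is an integer in v1 and is widened
# to integer-or-string in v2 (the names are illustrative, not from the specs).
profile_v1 = {"type": "object", "properties": {"prop": {"type": "integer"}}}
profile_v2 = {"type": "object", "properties": {"prop": {"type": ["integer", "string"]}}}

v1_descriptor = {"prop": 42}      # written against the v1 spec
v2_descriptor = {"prop": "high"}  # uses the widened v2 type

# Direction (1): a v2 implementation reads v1 data -- still fine.
validate(v1_descriptor, profile_v2)

# Direction (2): a v1 implementation reads v2 data -- this is what breaks.
try:
    validate(v2_descriptor, profile_v1)
except ValidationError as error:
    print(error.message)  # "'high' is not of type 'integer'"
```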

@peterdesmet
Member Author

Thanks @khusmann for the summary, I completely agree that we should hold to (1) and relax (2), i.e. future software applications should still be able to read v1 data packages (since those will be around for a long time), but can be slow in adopting new features of v2.

I draw a different conclusion regarding the versioning though, since a v2 spec suggests (to me) that software implementations can at some point give up on v1. A v1.1 indicates that this is still within the same major version of the spec.

@roll
Member

roll commented Jan 25, 2024

@peterdesmet
Answering frictionlessdata/datapackage#12 (comment) as I think it will be good to have everything related to the versioning discussion in one place.

Why is it structurally non-breaking for implementations?

By a structurally breaking change I mean something that would fail all implementations on the next nightly build. That would happen if we made a breaking change to one of the JSON Schema profiles, e.g. changing schema.fields to be a mapping instead of an array.

Unfortunately, because the specs were in some places written very broadly, we also have a grey zone. Maybe finiteNumber was a bad example of it, but something like the any format for dates is: the specs just say that it's implementation-specific, so changing this would be breaking in an implementation-specific way.

So in my head, for v2 I have these tiers (and my opinion on whether changes are possible):

  • profile (JSON Schema) level breaking -> no for v2 (and probably no forever)
  • semantically/grey-zone/etc. breaking -> discussable
  • not breaking -> yes
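
To make the first tier concrete, here's a hypothetical sketch of the schema.fields example mentioned above (the field names are made up):

```python
# Every existing Table Schema lists fields as an array of field descriptors:
schema_today = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "title", "type": "string"},
    ]
}

# Redefining schema.fields as a mapping keyed by field name would make every
# existing descriptor fail JSON Schema (profile) validation on the next
# nightly build -- the kind of change ruled out for v2 (and probably forever):
schema_hypothetical = {
    "fields": {
        "id": {"type": "integer"},
        "title": {"type": "string"},
    }
}
```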

@roll
Member

roll commented Jan 25, 2024

Also, it's a specific of working on standards that many kinds of new features (e.g. an added property) don't have full forward compatibility: a new constraint, for example, kind of breaks the validation completeness of current implementations. So maybe this kind of change could be what differentiates major from minor in our case (see the sketch after these bullets). E.g.:

  • source.version -> minor, as it's covered by JSON Schema validation
  • constraints.inclusiveMaximum -> major, as it requires implementation updates and affects validation completeness
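
A rough sketch of why a new constraint affects validation completeness (the field and the older constraint list are illustrative, not exhaustive):

```python
# A field written against a spec version that has the new constraint:
field = {
    "name": "score",
    "type": "integer",
    "constraints": {"inclusiveMaximum": 100},
}

# An implementation that only knows the earlier constraint vocabulary will
# silently skip the unknown key and report out-of-range rows as valid,
# whereas an updated implementation would flag them. Adding source.version,
# by contrast, is fully handled by JSON Schema validation of the descriptor.
known_constraints = {"required", "unique", "minimum", "maximum", "enum", "pattern"}
enforced = {k: v for k, v in field["constraints"].items() if k in known_constraints}
print(enforced)  # {} -> inclusiveMaximum is never enforced by the older implementation
```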

@peterdesmet
Member Author

@roll, since you wanted everything related to versioning to be part of this discussion, I'm also referring to this comment by @khughitt and me regarding implementations retrieving or detecting the version of the Data Package spec:

Tangential but, this makes me wonder whether it would make sense to modify the validation machinery to support validating against earlier versions of the spec?

That would be useful, but rather than implementations (or users) guessing what version of the spec was used for a datapackage.json, it would likely be good if that were indicated. I don't think this is currently possible?

@roll
Member

roll commented Jan 25, 2024

I think on the Standard side, we need to decide whether we provide standard version information for an individual descriptor e.g. as proposed here #444

I think every implementation is free to decide how to handle it, as it's just a matter of resources. E.g. some implementations can have a feature to validate against versions X, Y, and Z, and some just against Y.

Note that currently we consider datapackage.json to be versionless.

@peterdesmet
Member Author

peterdesmet commented Jan 29, 2024

I think the rules for changing the Data Package spec should be declared (on the spec website or elsewhere). I currently find it difficult to assess whether PRs follow the rules. Here's a first attempt:

General rules

(in line with @khusmann's statement that software is easier to update than data artifacts #858 (comment))

  1. An existing datapackage.json that is valid MUST NOT become invalid in the future.
  2. A new datapackage.json MAY be invalid because a software implementation does not support the latest version of the specification (yet).

Because of these rules, a datapackage.json does not have to indicate what version of Data Package it uses (i.e. it is versionless). Implementations have no direct way of assessing the version (even though this would make it easier, see #858 (comment), it is not something that we can require from data publishers, imo).

Versioning

  1. The Data Package specification is versioned. This is new over 1.0, where changes were added without increasing the version.
  2. The Data Package specification is versioned as a whole: a number of changes are considered, discussed, added or refused and released as a new minor version.

Property changes

  1. A property MUST NOT change type
  2. A property MAY allow an additional type (e.g. also accept an array). @roll, you want to avoid this as a rule, but it does offer flexibility, cf. Make contributor role an array of strings #804 (comment)
  3. A property MUST NOT become required
  4. A property MAY become optional. Example: Make contributors[].title and sources[].title not required datapackage#7
  5. A property MUST NOT add an enum
  6. A property MAY remove an enum. Example: Allow free text role for the contributors property #809 (see the sketch after this list)
  7. A property MUST NOT remove enum values
  8. A property MAY add enum values
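
A small sketch of the enum rules (5-8), again using the Python jsonschema package; the v1 role list is quoted from memory and the rejected value is made up:

```python
from jsonschema import validate, ValidationError

# Profile fragment for contributors[].role with the v1 enum, and the same
# property after the enum is removed (#809):
role_with_enum = {
    "type": "string",
    "enum": ["author", "publisher", "maintainer", "wrangler", "contributor"],
}
role_free_text = {"type": "string"}

# Rule 6: removing the enum keeps every previously valid value valid.
validate("author", role_free_text)

# Rule 5 (mirror image): adding an enum would invalidate existing free-text
# values such as this hypothetical one.
try:
    validate("data steward", role_with_enum)
except ValidationError as error:
    print(error.message)
```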

Table schema changes

  1. A field type MUST NOT change its default format. Example: does Updated date/time definitions datapackage#23 align with this?
  2. A field type MUST NOT remove format pattern options
  3. A field type MAY add format pattern options

New properties

  1. A new property MAY make a datapackage.json invalid (because of general rule 2). Example: Added field.missingValues datapackage#24
  2. A new property CANNOT be required

Removed properties

  1. Removing a property CANNOT make a datapackage.json invalid (because of general rule 1)

@khughitt

khughitt commented Feb 1, 2024

Thanks for taking the time to put this together, @peterdesmet! This seems like a great idea.

I think it would be useful to use this as a starting point for a description of the revision process in the docs.

I'll create a separate issue so that it can be tracked separately from the issue discussion here.

@fomcl

fomcl commented Mar 7, 2024

My 2 cents here:

@roll
Member

roll commented Apr 26, 2024

@peterdesmet
Regarding provisional properties, I think we have an even more elegant solution: for example, using a special Data Package Draft/Next extension (or a profile per feature) where we can test new features and ideas without actually affecting the core specs themselves. Users would just need to use a draft Data Package profile to join the testing.

And then, once we have an established release cycle, we can merge tested features into the core specs on a schedule. Using this approach, feature development can even be decentralized.
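
A loose sketch of what opting in could look like for a publisher (the profile URL and the mechanism are hypothetical, not an agreed design):

```python
# A package that joins testing by pointing its profile at a hypothetical
# Data Package Draft/Next profile instead of the core one:
package = {
    "name": "example-package",
    "profile": "https://example.org/profiles/data-package-next.json",  # hypothetical URL
    "resources": [
        {"name": "data", "path": "data.csv"},
    ],
}
```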

@peterdesmet
Member Author

peterdesmet commented Apr 29, 2024

@roll sounds promising, would have to see it in action to fully understand. 😄
