Extend Coopy Highlighter Diff format with column type changes #164
Comments
Thanks @edwindj. There's also work on refining the types in json table scheme in #159. What do you think about leaving types in a separate optional row, like:
I'm thinking that the spec could leave space for meta data associated with columns via a series of The advantage of the separate rows is that the cells can behave exactly as in ordinary rows and be parsed in just the same way.
Also, I understand from edwindj/daff#6 that you like having a single file for expressing diffs, and that may be the way to go. But just as the Tabular Data Package spec proposes data in csv and schema in json, there may be something to be said for expressing schema differences in a hierarchical format like json rather than trying to flatten types out.
@paulfitz I like your syntax for extra lines that may be ignored by consumers. Regarding type changes in one file or two: should we follow the diff paradigm of storing all changes in one text, or the json table schema paradigm of describing meta data (changes) in a json file? The latter would force all users to use json table schema, which I find too strict. Maybe we should support both, with a preference for json table schema. When a schema is available it should be used, otherwise a less expressive form can be used with the Note that a solution in the spirit of datapackage probably would not calculate a diff, but just reference two resources: table remote and table local.
I agreed it would make sense to stick the new syntax in. I could take a shot also at adding support for it in This feature should make diffs more useful within an environment with a single kind of data source, even if it wouldn't be very useful for interchange between different kinds of data sources.
Great! I will follow your changes and implement them in daff for R.
@paulfitz should this remain open? Are there pending changes? Otherwise let's close with a summary.
@rgrp can we keep it open a while longer? I've been plugging away on this, close to maturing.
Tables with meta-data that can be expressed in tabular form can now have changes in that meta-data included in diffs and applied in patches, following ideas in: frictionlessdata/specs#164 Example implementation for Sqlite tables to follow soon (I hope).
@paulfitz fantastic!
@paulfitz Great!
I implemented a version of this some time back, and then got distracted working on a demo for it with sqlite. Suppose we have a
And we modify the type of a column, add another column, and add a row:
Then daff would report this diff:
To use this in R, you'd need to implement some code that reports the properties of each column that you care about. That is sufficient for diffing. For patching, you'd need to be able to accept a description of the changes in a particular format and make them happen. I'll need to document this better if you're still interested in pursuing this, @edwindj.
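The "report the properties of each column" step for diffing could look roughly like the following. This is only a sketch in Python against SQLite, not daff's actual interface: the helper names `column_properties` and `property_changes` and the `demo` table are made up for illustration, and the only real API used is the standard `sqlite3` module with SQLite's `PRAGMA table_info`.

```python
import sqlite3


def column_properties(conn, table):
    """Return {column_name: {"type", "notnull", "pk"}} for one table.

    Hypothetical helper; `table` is assumed to be a trusted, simple name.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {
        name: {"type": ctype, "notnull": bool(notnull), "pk": bool(pk)}
        for _cid, name, ctype, notnull, _default, pk in rows
    }


def property_changes(before, after):
    """Map each column whose properties differ to an (old, new) pair.

    A column missing on one side is reported with None on that side.
    """
    changes = {}
    for name in before.keys() | after.keys():
        old, new = before.get(name), after.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Two in-memory databases standing in for the "before" and "after" tables.
old_db = sqlite3.connect(":memory:")
old_db.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, amount INTEGER)")
new_db = sqlite3.connect(":memory:")
new_db.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, amount REAL, note TEXT)")

changes = property_changes(
    column_properties(old_db, "demo"),
    column_properties(new_db, "demo"),
)
for name, (old, new) in sorted(changes.items()):
    print(name, old, "->", new)
```

Patching would need the reverse direction (turning such a description into `ALTER TABLE` statements or a table rebuild), which this sketch does not attempt.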
@paulfitz, I'm still interested :-), documentation helps, but I will update my R code so this example works. Won't be until end of this week.
This issue was moved to okfn/specs#3
A useful addition to the coopy highlighter diff format would be column type changes.
For example:
and
The Coopy Diff is:
A typed version of the format could be:
In which the schema row can contain a column type change. IMHO type information is not obligatory, but should be interpreted by an implementation as a type suggestion, since types differ across programming languages. The types of json table schema seem like a good candidate for denoting common types.
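To make "type suggestion" concrete, here is a rough Python sketch of how a consumer might treat type cells. The cell layout is an assumption for illustration only: it reuses the format's `->` marker to write a type change as `integer->number`, and the names `SUGGESTED_CONVERTERS`, `parse_type_cell` and `convert` are hypothetical. Unknown types or values that do not convert are simply left as text, which is the "suggestion, not obligation" behaviour.

```python
# A few json table schema type names mapped to Python converters
# (illustrative subset, not a complete mapping).
SUGGESTED_CONVERTERS = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": lambda text: text.strip().lower() in ("true", "1"),
}


def parse_type_cell(cell):
    """Split a type cell such as 'integer->number' into (old_type, new_type).

    A cell without '->' means the type is unchanged.
    """
    if "->" in cell:
        old, new = cell.split("->", 1)
        return old.strip(), new.strip()
    return cell.strip(), cell.strip()


def convert(value, type_name):
    """Apply the suggested type when possible, otherwise keep the raw text."""
    converter = SUGGESTED_CONVERTERS.get(type_name)
    if converter is None:
        return value  # type unknown to this implementation: ignore the suggestion
    try:
        return converter(value)
    except ValueError:
        return value  # value does not fit the suggested type: keep it as text


# Hypothetical type cells for two columns, plus one data row.
type_row = ["string", "integer->number"]
data_row = ["Amsterdam", "3.5"]

new_types = [parse_type_cell(cell)[1] for cell in type_row]
typed_row = [convert(value, t) for value, t in zip(data_row, new_types)]
print(typed_row)
```

An implementation in another language would swap in its own converter table, which is why the type names are best kept to a shared vocabulary such as json table schema.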