[WIP] Add an on-disk format for datasets using Frictionless and feather #38
A challenge of the data catalog is to identify the metadata we need to get data into Grapher, and to make sure we have a format that can capture it all. This PR makes a first attempt at an on-disk format for datasets.
Frictionless + feather
Note: this is a prototype format meant as a starting point. We expect to be able to change this on-disk format arbitrarily in future.
dir/mytable1.feather, dir/mytable2.feather, ...
datapackage.json
datapackage.json conforms to the Frictionless data standard, letting us use their tools for validation.
Rich data frames and series
Dataset protocol and AboutThisDataset class for metadata
RichDataFrame class and AboutThisTable class for metadata
RichDataSeries class and AboutThisSeries class for metadata
obj.metadata
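As a rough illustration of how the metadata classes and the on-disk layout above might fit together, here is a minimal Python sketch using only the standard library. Only the class names AboutThisDataset and AboutThisTable, the per-table feather files and datapackage.json come from this description. The fields (short_name, title, description) and the helpers build_datapackage and write_datapackage are hypothetical, chosen for illustration, and may not match the code in this PR.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class AboutThisTable:
    # Hypothetical fields; the real class in the PR may differ.
    short_name: str
    title: Optional[str] = None
    description: Optional[str] = None


@dataclass
class AboutThisDataset:
    # Hypothetical fields; the real class in the PR may differ.
    short_name: str
    title: Optional[str] = None
    tables: List[AboutThisTable] = field(default_factory=list)


def build_datapackage(about: AboutThisDataset) -> dict:
    """Build a Frictionless-style descriptor with one resource per feather file."""
    resources = []
    for table in about.tables:
        resource = {
            "name": table.short_name,
            "path": f"{table.short_name}.feather",
            "format": "feather",
        }
        # Only emit optional keys that were actually set.
        if table.title is not None:
            resource["title"] = table.title
        if table.description is not None:
            resource["description"] = table.description
        resources.append(resource)

    package = {"name": about.short_name, "resources": resources}
    if about.title is not None:
        package["title"] = about.title
    return package


def write_datapackage(about: AboutThisDataset, dataset_dir: Path) -> Path:
    """Write datapackage.json into the dataset folder and return its path."""
    dataset_dir.mkdir(parents=True, exist_ok=True)
    path = dataset_dir / "datapackage.json"
    path.write_text(json.dumps(build_datapackage(about), indent=2))
    return path
```

The sketch only writes the descriptor. The feather files themselves would be written separately, for example with pandas' DataFrame.to_feather, which needs pyarrow installed.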
Todo
After this, the plan is to get review, merge, and then try to import the WHO GHO dataset into this format.