New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

README.md included in a data package not added as a resource in CKAN #60

Open
Stephen-Gates opened this Issue Dec 6, 2017 · 6 comments

Comments

Projects
None yet
4 participants
@Stephen-Gates

Stephen-Gates commented Dec 6, 2017

A data package can include a README.md. When I upload a datapackage.zip file I expect the README.md inside to be added as a resource in CKAN.

Currently the README.md is not uploaded so provenance information is lost.

@amercader

This comment has been minimized.

Member

amercader commented Feb 8, 2018

That sounds like a sensible convention. The original implementation only looked at what was defined in the spec so this was skipped.

Note that READMEs will get currently imported if they are listed as resources in the descriptors just like any other file.

Implementation

This will be quite a lot trickier than it would appear. We are basically passing the uploaded zipfile to the datapackage library, that handles all the extracting etc. We get a descriptor object back that does not include the README file (or any other file included in the zip). So anything we implement will need to be on top of that.

I don't want to re-extract the files just to check if there is a README (and we probably can't), so the only thing I can think of is guessing the temp folder where datapackage extracted the zip file here (using a resource path and the name of zip file), and check the contents to see if there is a README file. If it is, create an additional resource with _create_and_upload_local_resource().

We probably want to check that there isn't an actual resource with the README to avoid duplication.

Estimate

1 day

@amercader amercader added MVP v1.0 and removed MVP v1.0 labels Feb 8, 2018

@amercader amercader added this to the MVP v1 milestone Feb 8, 2018

@GeraldGrootRoessink

This comment has been minimized.

GeraldGrootRoessink commented Feb 10, 2018

To anticipate the importing of the README as a CKAN-resource I am looking for a markdown_view. Has any work been done of that?

@Stephen-Gates

This comment has been minimized.

Stephen-Gates commented Feb 10, 2018

We use https://github.com/markdown-it/markdown-it in Data Curator to allow the input and preview of markdown.

Links to the CommonMark reference and community implementations at at https://github.com/commonmark/CommonMark

@GeraldGrootRoessink

This comment has been minimized.

GeraldGrootRoessink commented Feb 11, 2018

Thanks, I will look into it.
First I would like the imported readme.md to be readable in CKAN. Maybe it is doable to create a fourth text_view for resources with format=md using the helper-function: h.render_markdown?
Is that a good idea?

@mattRedBox

This comment has been minimized.

mattRedBox commented Jul 2, 2018

thanks @amercader
Yep not a pretty solution, but I can't find another way of doing this neatly either - I think perhaps adding README as resource is not the way to go as my understanding is (and happy to be corrected here) that a README isn't considered part of the resources for a datapackage as it doesn't hold data - and can't be validated and inferred like other resources - , and working out the temp folder (from existing resources) seems the only way for now.
Although datapackage has 'base_path' property, which might make this simpler, but it is deprecated (but maybe it is just default_base_path that is meant to be deprecated?). Asked @roll about this.

@mattRedBox

This comment has been minimized.

mattRedBox commented Jul 19, 2018

Hi @amercader
In the end I've disassociated the sense of resources (just in my head) between frictionless datapackages and ckan.
Have a PR for this with uploading zip, where README.md is a local resource (as you have advised, and then download ensures it doesn't exist in associated package.json). Just waiting for approval on our side.
Also had a question, just in regards to issue of handling downloads:

For the upcoming PR I have considered that as the datapackage.json would not reference README.md, containing that README.md in the download zip accounts for the README.md file. But as datapackage.json was being created as file (as would never/rarely be large) and then downloaded, so too for README.md (the case would usually/always be README.md is only a small file by comparison), so it would be OK to add this to, a simple zip, which would have a set list of filenames - that's the only way I can consider restricting this atm, unless it should also have some explicit size check (apart from python's zip library constraint for size) (atm just README.md) that could be downloaded in this way (ie: without using queue or considering the delay in creating the zip and downloading. The intention is that the 'simple' zip would never contain potentially very large data resources such as CSV files in the zip.
That way we can complete this functionality for including a README.md, without tackling the much larger requirement (which I don't think I can find room for), in issue 52
What are your thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment