This repository has been archived by the owner on May 20, 2019. It is now read-only.

install (download/import) command #3

Closed
rufuspollock opened this issue Nov 12, 2013 · 12 comments

@rufuspollock
Contributor

"install" (download/import) a data package onto disk (note integration in other apps is outside of scope here).

Motivating user story: I'm building an app, doing some analysis, etc., and I want to get a bunch of data from data packages into my project.

dpm install {url}
dpm install {git-url / github url}
dpm install {pkg-name}           # if we have a registry
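
The three invocation forms above imply that the argument has to be classified before anything is fetched. A minimal sketch of one way to do that, in Python for illustration only (dpm itself is not written in Python); the function name `classify_install_target` and the rules it applies are assumptions, not dpm's actual behaviour:

```python
from urllib.parse import urlparse


def classify_install_target(target):
    """Guess whether an install argument is a git remote, a plain URL or a registry name.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    parsed = urlparse(target)
    # scp-style remotes, git:// URLs and anything ending in .git are treated as git.
    if target.startswith("git@") or parsed.scheme == "git" or parsed.path.endswith(".git"):
        return "git"
    if parsed.scheme in ("http", "https"):
        # GitHub URLs are treated as git repositories, everything else as a plain URL.
        return "git" if parsed.netloc == "github.com" else "url"
    # Anything without a recognised scheme is assumed to be a registry package name.
    return "name"
```

For example, `classify_install_target("gdp")` would be routed to the registry, while an `https://github.com/...` address would be handled as a git source.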

Questions

  • into my project could mean: a) data files on disk at a standard location b) into a database c) into my tool. Here we focus only on (a)
  • Do I want entire data package or just the data resources (?) - let's get it all
  • What about ones that are in git - should we clone? (Ans: yes???)
  • Where do we install to? Ans: datapackages/{datapackage-name}/...
  • How do we install from a URL? Answer: get the datapackage.json and then download the resources?
  • What happens if the dp.json has urls and no path? Download to local and set path to local location
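
The answers above (fetch datapackage.json, download the resources, install under datapackages/{datapackage-name}/, and set a local path for url-only resources) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not dpm's implementation: the names `install_from_url` and `fetch_bytes`, the `data/` subfolder for url-only resources, and the choice to prefer `path` over `url` when both are present are all hypothetical.

```python
import json
import os
import posixpath
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


def fetch_bytes(url):
    """Default fetcher: download a URL and return the body as bytes."""
    with urlopen(url) as response:
        return response.read()


def install_from_url(package_url, dest_root="datapackages", fetch=fetch_bytes):
    """Install the package served at package_url into dest_root/{name}/."""
    base = package_url.rstrip("/") + "/"
    descriptor = json.loads(fetch(urljoin(base, "datapackage.json")).decode("utf-8"))
    target = os.path.join(dest_root, descriptor["name"])
    os.makedirs(target, exist_ok=True)

    for index, resource in enumerate(descriptor.get("resources", [])):
        rel_path = resource.get("path")
        if rel_path:
            # A path is resolved relative to where datapackage.json was served.
            source = urljoin(base, rel_path)
        elif resource.get("url"):
            # url but no path: download locally and record the new local path.
            source = resource["url"]
            filename = posixpath.basename(urlparse(source).path) or "resource-%d" % index
            rel_path = "data/" + filename
            resource["path"] = rel_path
        else:
            continue  # nothing to download (e.g. inline data)
        # Note: this sketch does not validate rel_path against escaping target.
        local_file = os.path.join(target, *rel_path.split("/"))
        os.makedirs(os.path.dirname(local_file), exist_ok=True)
        with open(local_file, "wb") as handle:
            handle.write(fetch(source))

    # Write the (possibly updated) descriptor next to the downloaded data.
    with open(os.path.join(target, "datapackage.json"), "w", encoding="utf-8") as handle:
        json.dump(descriptor, handle, indent=2)
    return target
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; git sources would need a separate code path (a clone) that is not shown here.
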
@ghost ghost assigned sballesteros Dec 23, 2013
@rufuspollock
Contributor Author

@sballesteros I merged the notes from our chat the other day here

@rufuspollock rufuspollock mentioned this issue Dec 28, 2013
11 tasks
@sballesteros
Contributor

update here: cat, get, clone and install are documented (draft) and working for the registry. doc/*.md explains the differences between those commands (a good reading order is 0 clone, 1 cat, 2 get, 3 install).

@rufuspollock
Contributor Author

@sballesteros is install from url working? I'd love to use that!

@sballesteros
Contributor

@rgrp I am still trying to figure out how to do clone knowing only the content of datapackage.json (i.e. how to find all the files that are associated with a data package (scripts...) but not listed as resources).
If you look at https://github.com/component/component/wiki/Spec you have

Component developers MUST explicitly state the relevant file(s) via scripts, styles and others.

Maybe we should force that (listing all the files) in the spec. There is the files property of package.json (for npm) but it allows directories.

Of course all of this is trivial if we only support github URLs but I would like a more generic solution here.

EDIT: to be clear, here I am talking about files not listed in datapackage.resources

@max-mapper

IMO the default should be to clone entire paths/repos/folder unless there is a specific manifest defined

@max-mapper

so if you have a really bare datapackage.json:

{"name": "mydatapackage", "version": "0.0.1"}

and it is served from data.com/mydatapackage

i.e. it doesn't define any resources, and also has no json-ld context, how about we look for these default files, in order, until we find one:

archives:

data.com/mydatapackage/data.tar.gz
data.com/mydatapackage/data.zip
data.com/mydatapackage/data.csv
data.com/mydatapackage/data.json

(more common/sensible defaults welcome)

if the datapackage.json defines a baseURI property (note: not sure if this is in the datapackage spec or not, i'm open to other names e.g. url, external or something) then it would look for the above list using baseURI as the base URI
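
The lookup proposed above could be sketched like this, in Python for illustration. The names `candidate_urls` and `find_default_data` are hypothetical, `baseURI` is the tentative property name from the comment (not a confirmed part of the spec), and `exists` stands in for whatever check (for instance an HTTP HEAD request) the tool would really perform:

```python
DEFAULT_DATA_FILES = ["data.tar.gz", "data.zip", "data.csv", "data.json"]


def candidate_urls(descriptor, served_from):
    """List the default data locations to probe, in priority order."""
    # A baseURI in the descriptor (tentative name) overrides the serving location.
    base = descriptor.get("baseURI") or served_from
    base = base.rstrip("/") + "/"
    return [base + name for name in DEFAULT_DATA_FILES]


def find_default_data(descriptor, served_from, exists):
    """Return the first default data file that exists, or None.

    `exists` is a caller-supplied callable taking a URL and returning a bool.
    """
    if descriptor.get("resources"):
        return None  # an explicit manifest takes precedence over the defaults
    for url in candidate_urls(descriptor, served_from):
        if exists(url):
            return url
    return None
```

With the bare descriptor above served from data.com/mydatapackage, the first candidate probed would be data.com/mydatapackage/data.tar.gz. The json-ld context check mentioned in the comment is left out of this sketch.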

@rufuspollock
Contributor Author

@sballesteros I think we're putting too many obligations on dpm. We could assume that whoever publishes takes care of pushing scripts, or that they use e.g. git (so you git clone, then get the data urls if they aren't in the git repo). I think we want to do the simplest thing we can at the start that gets us working, and if we miss out scripts we live with it (note: couldn't we reuse the scripts tag from commonjs for some of this?)

@sballesteros
Contributor

@rgrp I agree. Currently refactoring a lot of stuff.

@rufuspollock
Contributor Author

@maxogden I wonder if allowing for default behaviour like this is making a rod for our own backs down the line. Why not oblige data package creators to list resources if they exist ... (it's not that hard to add one entry ...)

@max-mapper

@rgrp what negatives do you see? one is that if publishers aren't required to provide a file manifest, they could publish a package that accidentally doesn't include any data. This is easy to fix though -- we have a pre-publish step that throws if no default data file can be found.
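
A pre-publish step of the kind described could look roughly like the following Python sketch. The names `prepublish_check` and `MissingDataError` are hypothetical, and the default file list is the one proposed earlier in this thread:

```python
import os

DEFAULT_DATA_FILES = ["data.tar.gz", "data.zip", "data.csv", "data.json"]


class MissingDataError(Exception):
    """Raised when a package lists no resources and ships no default data file."""


def prepublish_check(package_dir, descriptor):
    """Return the default data file found, or None if resources are listed.

    Raises MissingDataError when neither resources nor a default file exist.
    """
    if descriptor.get("resources"):
        return None  # explicit manifest present, nothing to guess
    for name in DEFAULT_DATA_FILES:
        if os.path.isfile(os.path.join(package_dir, name)):
            return name
    raise MissingDataError(
        "no resources listed and none of %s found in %s"
        % (", ".join(DEFAULT_DATA_FILES), package_dir)
    )
```

Raising before upload means an empty package fails on the publisher's machine rather than surfacing later as a broken install.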

@rufuspollock
Contributor Author

@maxogden the downsides I see (in addition to the one you mention) are:

  • forward compatibility - i.e. we have to go on supporting these defaults long into the future
  • generally I think explicitness for users here is good
  • plus if someone is adding e.g. a csv resource it would be good to have this in the resources attribute so we get the schema stuff etc ...

That said, I do see the plus here of making it even easier and faster to publish - a major concern at this early stage :-)

So I'm probably a -0.5 at this point but do see the benefits ...

@rufuspollock
Contributor Author

Here's a gif showing off the new feature:

[animated gif: dpm-install]

3 participants