Feature/plugins and storage #51
Note: `validate()` checks whether a **schema** is a valid JSON Table Schema. It does **not** validate data against a schema.

### Export/import
I don't think we should include this as part of default package. We should refer to the plugins, but not include them by default. We can still have this section in README, but make it more clear as a Plugin section and explain the downloads required for this to work.
I thought you and @vitorbaptista were both leaning more toward an out-of-the-box solution (next step: #52).
Personally, I also prefer a distinction between `core` and `plugins`, where users need to install a plugin themselves, its description lives in the `plugins` section, and the `export/import` section has just general info about it.
We should sync on this with `datapackage-py`, so @vitorbaptista's opinion is required here.
@roll no: definitely stuff like this as plugins, meaning, install them if you need them. The out-of-the-box thing was more about the `registry` and `validate` stuff, where I finally agreed that @vitorbaptista was right that they are too core to be external modules.
I'm +1 on this; I'll update the PR.
`datapackage-py` should then follow this logic too.
We should strive to make the "most common path" as easy as possible. For example, I think importing and exporting to a zip file is pretty common with `datapackage-py`, so it's in core, but I wouldn't add (say) BigQuery.
Here maybe SQLAlchemy is common enough to be in core, but I don't know. I'm happy with whatever you guys decide.
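
The "install them if you need them" approach discussed above could look roughly like the sketch below. It only illustrates the idea, assuming each plugin ships as a separately installed Python module. The helper `load_plugin` and its error message are hypothetical and are not part of jsontableschema-py.

```python
import importlib


def load_plugin(module_name):
    """Import an optional plugin module, or fail with an install hint.

    `load_plugin` is a hypothetical helper used for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Core stays importable without the plugin; the user only sees
        # this error when asking for a feature that needs the plugin.
        raise ImportError(
            "Plugin '%s' is not installed. Install it separately "
            "to use this feature." % module_name
        )
```

With this shape, core features keep working on a bare install, and the cost of a missing plugin is a clear error at the point of use.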
@roll this is good for me. I also want @vitorbaptista to look at it, as we've all been in this conversation since the beginning. After his comments we can merge. One note: I think the original model class I wrote was a bit all over the place and, thinking in wider terms, we should probably break this up into two things soon (ref. here):
This pushes most of the This would pretty much finalise the API of this package as far as I see.
I need some time to read the other things you've added, but I want to say right away that it's awesome you've mentioned
@roll yes. great. Actually, @vitorbaptista probably has most of the required
I've updated the README in the PR to make the distinction between core and plugins.
@roll ok great. We'll wait for @vitorbaptista to look early in the week, and then merge.
Great! For Some notes:
```python
with io.open(filepath) as stream:
    schema = json.load(stream)

is_valid = jsontableschema.validate(schema)
```
With the current API, `jsontableschema.validate` is only used to catch exceptions. What I mean is that `is_valid` will never be `False`. This method should always be inside a `try/except` block, so I suggest changing this example to something like:

```python
try:
    jsontableschema.validate(schema)
except jsontableschema.exceptions.SchemaValidationError as e:
    # handle errors
    pass
```

For this reason, the return value of `jsontableschema.validate` doesn't matter. I wouldn't return anything, to make this clearer.
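
To make the point concrete, here is a minimal, self-contained sketch of a raise-only validator. The `validate` function and `SchemaValidationError` class below are local stand-ins so the example runs without the library, and the single check on `fields` is an assumption for illustration, not the real validation rules. The `is_valid` helper is hypothetical and shows what a caller would write if a boolean is wanted.

```python
class SchemaValidationError(Exception):
    """Stand-in for jsontableschema.exceptions.SchemaValidationError."""


def validate(schema):
    """Stand-in validator: raises on an invalid schema, returns nothing.

    The real checks live in jsontableschema; this one only requires a
    'fields' list so that the example can run on its own.
    """
    if not isinstance(schema, dict) or not isinstance(schema.get("fields"), list):
        raise SchemaValidationError("schema must be an object with a 'fields' list")


def is_valid(schema):
    """Hypothetical helper that turns the exception into a boolean."""
    try:
        validate(schema)
    except SchemaValidationError:
        return False
    return True
```

Because `validate` returns nothing, an assignment like `is_valid = validate(schema)` carries no information, which is the confusion the review comment is trying to avoid.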
It's not part of this PR's scope, but I think it's OK to fix it here. I've updated the PR.
I always mix up importing and exporting resources. Importing a resource is going from a JSON Table Schema file to (say) a SQL table, and exporting is going from a SQL table to a JTS file, right? Reading the code, I've seen that files are created both while importing and while exporting. I couldn't figure out why.

I'm also concerned with the tests. This PR is mostly about creating methods that delegate to other libraries, so it's OK to assume that these other libraries work and just test that we're calling them correctly. I'd prefer to have fewer mocks, testing at least that the CSV files were written correctly, but it's OK.

There are some naming issues, with the test class named

```python
try:
    import mock
except ImportError:
    import unittest.mock as mock
```

To support both Python versions. There are a few tests missing as well, especially for the

Overall, the changes look good. I would just like to improve the tests a bit before merging.
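
As an illustration of the "fewer mocks" preference, a test can write a real CSV file to a temporary directory and read it back. This is only a sketch: `export_rows` is a hypothetical stand-in for the export step under test, not a function from this PR, and the sample data is made up.

```python
import csv
import os
import tempfile


def export_rows(path, headers, rows):
    """Hypothetical export step: write headers and rows to a CSV file."""
    with open(path, "w", newline="") as stream:
        writer = csv.writer(stream)
        writer.writerow(headers)
        writer.writerows(rows)


def read_rows(path):
    """Read a CSV file back as a list of lists of strings."""
    with open(path, newline="") as stream:
        return list(csv.reader(stream))


def test_export_writes_expected_csv():
    # No mocks: the file is really written, then checked on disk.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "resource.csv")
        export_rows(path, ["id", "name"], [[1, "english"], [2, "spanish"]])
        assert read_rows(path) == [
            ["id", "name"],
            ["1", "english"],
            ["2", "spanish"],
        ]
```

Note that values come back as strings, since CSV carries no type information.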
@roll good points from @vitorbaptista as always. Would be great to prioritize this when you return from holidays so we can merge.
Fixed test name - thanks. About And about
The most important point is naming =) When user has Here I really don't know how it would be better. But I know we should be consistent across all our codebase. So what do you (@vitorbaptista @pwalsh) think about import/export naming for:
Because if we have for example Maybe a better way would be to avoid the words import/export and use something else.
Agreed that moving from About naming, as the code is related to a jsontableschema's resource, even though it isn't OOP, the "export" and "import" should be from the point of view of a resource. Exporting makes more sense to me, as we're exporting a resource to another place (DB, ...), but importing is a bit tricky because we're not really importing a resource from someplace (DB) to someplace else (filesystem), but actually converting something (DB) to a resource. Thinking in OOP terms, if I saw In the end, Confusing...
In our final design (with Another way: release with better naming and mark it as an unstable part of the API. But I don't see a much better naming for now.
@roll @vitorbaptista how about
Or it's much more interesting. Across our codebase it could be:
I'm happy for this to be merged and work on the naming afterwards. Maybe it'll become clearer when the code is changed to a
Big downside of But in our situation maybe we don't need
and
If we have classes like
I thought a little bit more about it and I have to agree with Vitor - without For example, a use case: a very big resource stored in BigQuery. We probably want to provide an interface to work with it without saving it to the filesystem or memory( So I see two options for now:
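
To illustrate the BigQuery use case above, the sketch below shows one way an interface could expose rows lazily, so a large resource is never fully copied to disk or held in memory. Everything here is an assumption for illustration: `LazyStorage`, its `read` method, and `fake_remote_rows` are hypothetical names, and the generator merely stands in for a paged remote query.

```python
class LazyStorage(object):
    """Hypothetical storage interface that yields rows on demand."""

    def __init__(self, row_sources):
        # Maps a table name to a zero-argument callable that returns
        # a fresh iterator of rows each time it is called.
        self._row_sources = row_sources

    def read(self, table):
        """Yield rows one by one instead of returning a full list."""
        for row in self._row_sources[table]():
            yield row


def fake_remote_rows():
    """Stand-in for a paged remote query; yields rows one at a time."""
    for i in range(1, 4):
        yield (i, "row-%d" % i)


storage = LazyStorage({"data": fake_remote_rows})
first_row = next(storage.read("data"))  # only one row has been produced
```

A caller can iterate, filter, or count rows without materializing the whole table, which is the property the use case asks for.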
@roll I'm happy to go with your last points. I'd rather we design
…data/jsontableschema-py into feature/plugins-and-storage
Changes Unknown when pulling bd21ab4 on feature/plugins-and-storage into * on master*.
Changes: