-
Notifications
You must be signed in to change notification settings - Fork 1.2k
summon: first draft with python specific implementation #2971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@MrOutis enable the pre-commit to avoid restyled triggered? |
|
@shcheklein , yes, already did :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
dvc/api.py
Outdated
| def summon(name, repo=None, rev=None): | ||
| # 1. Read grimoire.yaml | ||
| # 2. Pull dependencies (TODO) | ||
| # 3. Get the call and parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we suppose to "deserialize" not-python objects when the yml file contains type: CSV for examlpe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is important from an external library developer's point of view - when I develop a library to deserialize CSV or Parquet for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmpetrov , decided not to implement type: csv in this first iteration because it was ambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. But please make sure it won't affect design decisions - see my comment about yml file format and file/method/params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmpetrov dvc.api.summon() is not supposed to deserialize type: python, we agreed to move this functionality into a separate library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor right. But what the library supposes to call from DVC? I expected to have some generalized approach of getting objects. So, we can create the library and custom types easily on top of this DVC call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor , did you mean type: csv ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a question about default name,
Is actually name, what we want to use by default?
Shouldn't it be {something}.dvcsummon? In case of no fname provided we could be gathering all the dvcsummon's and try to find proper object among them by name
@pared, that's interesting. Not sure what we are planning to support. @dmpetrov , @shcheklein , are we supporting this case? ☝️ |
dvc/api.py
Outdated
| old_dir = os.path.abspath(os.curdir) | ||
| os.chdir(path) | ||
| sys.path.insert(0, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to do both probably. Also, use try/finally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is needed, @Suor, or at least I couldn't rely solely on os.chdir.
Looks like importlib.import_module doesn't care of the current directory (well, sys.path is not being updated when running os.chdir)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, do you mean try/finally instead of the context manager, or inside the context manager?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understood, you can only use one
summon.yamlat the time, so there's no need to search through all summon files as we do with dvcfiles .
We discussed this above - I suggested to change the method signature.
Yes, we'd like to avoid the summon file search. At the same time, we cannot use a predefined file in a repo root dir - it blocks the entire scenario for mono-repo teams. So, summon.yaml is default file that can be changed by the user when user calls the methiod.
dvc/api.py
Outdated
| def summon(name, repo=None, rev=None): | ||
| # 1. Read grimoire.yaml | ||
| # 2. Pull dependencies (TODO) | ||
| # 3. Get the call and parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. But please make sure it won't affect design decisions - see my comment about yml file format and file/method/params.
dvc/api.py
Outdated
| def summon(name, repo=None, rev=None): | ||
| # 1. Read grimoire.yaml | ||
| # 2. Pull dependencies (TODO) | ||
| # 3. Get the call and parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor right. But what the library supposes to call from DVC? I expected to have some generalized approach of getting objects. So, we can create the library and custom types easily on top of this DVC call.
|
@MrOutis I want to clarify a piece of requirements that might be not clear from the description. We need to support custome types such as CSV. I want to emphasize, that we should have all the functions in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make a SummonError exception class, use it for all summon errors with appropriate messages. Add a test or a couple, which will catch those.
dvc/api.py
Outdated
| # TODO: Write a meaningful docstring about `summon` | ||
| with _make_repo(repo, rev=rev) as _repo: | ||
|
|
||
| def pull_dependencies(deps): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Might be moved outside.
- This is actually
repo.pull(deps)after granularity patch is in, see dvc: support granularity for fetch/pull/push/status/checkout #3002
dvc/api.py
Outdated
| try: | ||
| objects = ruamel.yaml.safe_load(fobj.read())["objects"] | ||
| objects = schema(objects) | ||
| return next(x for x in objects if x["name"] == name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, need to check there is only one object named name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to raise an error? Isn't it just better to pick the first one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we need to raise an error, otherwise it might be a nasty surprise for a user.
|
|
||
| try: | ||
| os.chdir(path) | ||
| sys.path.insert(0, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, don't need sys.path.insert() if we already chdired into in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Suor , left a commend right here: #2971 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only it works as long as you have '.' or '' in sys.path, which is the default. Without that lots of python code won't work. Are you sure imports don't work for you without that? Can you test outside dvc on a simple script?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also might be the reason why random tests failed.
dvc/api.py
Outdated
| yield repo | ||
|
|
||
|
|
||
| def summon(name, fname="dvcsummon.yaml", args=None, repo=None, rev=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| def summon(name, fname="dvcsummon.yaml", args=None, repo=None, rev=None): | |
| def summon(name, repo=None, rev=None, summon_file="dvcsummon.yaml", args=None): |
| # XXX: Some issues with this approach: | ||
| # * Not thread safe | ||
| # * Import will pollute sys.modules | ||
| # * Weird errors if there is a name clash within sys.modules |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no good non complicated way to handle this in Python 2. I will come up with an approach next sprint. Leave as is for now.
* Mark optional keywords * Allow extra entries * Include objects into the schema
dvc/api.py
Outdated
| ] | ||
| }, | ||
| required=True, | ||
| extra=ALLOW_EXTRA, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not allow extra everywhere. May not allow it anywhere for now for better future-compatibility. Say only allow objects at the top, and only allow name, meta and summon in the object, all the user data should go under meta, like description, paper, author, etc.
| pull_dependencies(info.get("deps")) | ||
|
|
||
| _args = copy.deepcopy(info.get("args", {})) | ||
| _args.update(args or {}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either remove or {} here or a default value to .get() on prev line.
dvc/api.py
Outdated
| # XXX: Prints ugly absolute path. | ||
| raise SummonError("No such YAML file with path: '{}'".format(path)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should contain whatever user passed, e.g. file ... not found in repo ...
dvc/api.py
Outdated
| # XXX: Prints ugly absolute path. | ||
| raise SummonError("No such YAML file with path: '{}'".format(path)) | ||
| except ruamel.yaml.YAMLError as exc: | ||
| raise SummonError("Failed to parse YAML correctly", exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to say which file in which repo and a line/message of an error if available.
dvc/api.py
Outdated
| raise SummonError("Failed to parse YAML correctly", exc) | ||
| except Invalid as exc: | ||
| # XXX: Doesn't point out the error. | ||
| raise SummonError("YAML file dosen't match with the schema", exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not just yaml file, this is a "summon file" is broken.
dvc/api.py
Outdated
| # XXX: Doesn't point out the error. | ||
| raise SummonError("YAML file dosen't match with the schema", exc) | ||
| except StopIteration: | ||
| raise SummonError("No such object with name '{}'".format(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, need to add summon_file and repo. We might actually add it outside where this info is available. This will fix all of the above and wouldn't need those two be passed here.
|
|
||
| try: | ||
| os.chdir(path) | ||
| sys.path.insert(0, path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only it works as long as you have '.' or '' in sys.path, which is the default. Without that lots of python code won't work. Are you sure imports don't work for you without that? Can you test outside dvc on a simple script?
|
It looks like something is wrong with Python 2.7. Besides that looks good) |
|
@MrOutis it looks like you had a great time on Christmas eve (I'm not sure which emoji to put here) |
|
Are all the remaining fails unrelated? |
No, @Suor , looks like I've introduced some bugs 🤔 |
Looks like |
TODO:
summondvcsummon.yamlandobjectsfor nowartifacts.yaml→dvcsummon.yamlartifact→objectdvcsummon.yaml→formulas.yamlobject→data_objectsummonkeyworddvcsummon.yaml(Error handling)voluptuoustype: python, show error otherwhiseCollect all the cache first and then(Not, that important, let's skip it)pullparams→argsimport_stringinstead ofimportlibsys.pathinstead of current directory withos.chdirsummon()Things omitted on this PR:
python.