Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Dec 18, 2019

TODO:

  • Support overriding the default parameters through summon
  • Find a good name for the YAML and its contents (no magic references 🙁)
    • Staying with dvcsummon.yaml and objects for now
  • artifacts.yamldvcsummon.yaml
  • artifactobject
  • dvcsummon.yamlformulas.yaml
  • objectdata_object
  • Isolate summon parameters under the summon keyword
  • Handle unsafe situations when reading dvcsummon.yaml (Error handling)
    • Check schema with voluptuous
    • Define optional and required keys
    • Add single exception for Invalid schema
    • Add a key for type: python, show error otherwhise
  • Collect all the cache first and then pull (Not, that important, let's skip it)
  • Change paramsargs
  • Use import_string instead of importlib
  • Change sys.path instead of current directory with os.chdir
  • Add parameter for custom YAML file name in summon()
  • Write beautiful code 💅

Things omitted on this PR:

  • Support for different types other than python.
  • Checking requirements.

@shcheklein
Copy link
Member

@MrOutis enable the pre-commit to avoid restyled triggered?

@ghost
Copy link
Author

ghost commented Dec 18, 2019

@shcheklein , yes, already did :)

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

dvc/api.py Outdated
def summon(name, repo=None, rev=None):
# 1. Read grimoire.yaml
# 2. Pull dependencies (TODO)
# 3. Get the call and parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we suppose to "deserialize" not-python objects when the yml file contains type: CSV for examlpe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important from an external library developer's point of view - when I develop a library to deserialize CSV or Parquet for example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmpetrov , decided not to implement type: csv in this first iteration because it was ambiguous.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. But please make sure it won't affect design decisions - see my comment about yml file format and file/method/params.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmpetrov dvc.api.summon() is not supposed to deserialize type: python, we agreed to move this functionality into a separate library.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor right. But what the library supposes to call from DVC? I expected to have some generalized approach of getting objects. So, we can create the library and custom types easily on top of this DVC call.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor , did you mean type: csv ?

@ghost ghost changed the title [wip] summon: ugly hack [wip] summon: first draft with python specific implementation Dec 18, 2019
@ghost ghost requested review from Suor, efiop and pared December 18, 2019 23:27
Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a question about default name,
Is actually name, what we want to use by default?
Shouldn't it be {something}.dvcsummon? In case of no fname provided we could be gathering all the dvcsummon's and try to find proper object among them by name

@ghost
Copy link
Author

ghost commented Dec 19, 2019

I've got a question about default name,
Is actually name, what we want to use by default?
Shouldn't it be {something}.dvcsummon? In case of no fname provided we could be gathering all the dvcsummon's and try to find proper object among them by name

@pared, that's interesting. Not sure what we are planning to support.
As far as I understood, you can only use one summon.yaml at the time, so there's no need to search through all summon files as we do with dvcfiles .

@dmpetrov , @shcheklein , are we supporting this case? ☝️

dvc/api.py Outdated
Comment on lines 121 to 123
old_dir = os.path.abspath(os.curdir)
os.chdir(path)
sys.path.insert(0, path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to do both probably. Also, use try/finally

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is needed, @Suor, or at least I couldn't rely solely on os.chdir.
Looks like importlib.import_module doesn't care of the current directory (well, sys.path is not being updated when running os.chdir)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, do you mean try/finally instead of the context manager, or inside the context manager?

Copy link
Member

@dmpetrov dmpetrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understood, you can only use one summon.yaml at the time, so there's no need to search through all summon files as we do with dvcfiles .

We discussed this above - I suggested to change the method signature.

Yes, we'd like to avoid the summon file search. At the same time, we cannot use a predefined file in a repo root dir - it blocks the entire scenario for mono-repo teams. So, summon.yaml is default file that can be changed by the user when user calls the methiod.

dvc/api.py Outdated
def summon(name, repo=None, rev=None):
# 1. Read grimoire.yaml
# 2. Pull dependencies (TODO)
# 3. Get the call and parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. But please make sure it won't affect design decisions - see my comment about yml file format and file/method/params.

dvc/api.py Outdated
def summon(name, repo=None, rev=None):
# 1. Read grimoire.yaml
# 2. Pull dependencies (TODO)
# 3. Get the call and parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor right. But what the library supposes to call from DVC? I expected to have some generalized approach of getting objects. So, we can create the library and custom types easily on top of this DVC call.

@dmpetrov
Copy link
Member

@MrOutis I want to clarify a piece of requirements that might be not clear from the description. We need to support custome types such as CSV. I want to emphasize, that we should have all the functions in dvc.api to implement a custom type. So, dvc.api.summon should be general enough (we can have a separate general dvc.api.summon_ex and python specific dvc.api.summon) OR we should have a set of methods.

@ghost ghost requested a review from Suor December 24, 2019 08:18
Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a SummonError exception class, use it for all summon errors with appropriate messages. Add a test or a couple, which will catch those.

dvc/api.py Outdated
# TODO: Write a meaningful docstring about `summon`
with _make_repo(repo, rev=rev) as _repo:

def pull_dependencies(deps):
Copy link
Contributor

@Suor Suor Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Might be moved outside.
  2. This is actually repo.pull(deps) after granularity patch is in, see dvc: support granularity for fetch/pull/push/status/checkout #3002

dvc/api.py Outdated
try:
objects = ruamel.yaml.safe_load(fobj.read())["objects"]
objects = schema(objects)
return next(x for x in objects if x["name"] == name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, need to check there is only one object named name.

Copy link
Author

@ghost ghost Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to raise an error? Isn't it just better to pick the first one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we need to raise an error, otherwise it might be a nasty surprise for a user.


try:
os.chdir(path)
sys.path.insert(0, path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, don't need sys.path.insert() if we already chdired into in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor , left a commend right here: #2971 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only it works as long as you have '.' or '' in sys.path, which is the default. Without that lots of python code won't work. Are you sure imports don't work for you without that? Can you test outside dvc on a simple script?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also might be the reason why random tests failed.

dvc/api.py Outdated
yield repo


def summon(name, fname="dvcsummon.yaml", args=None, repo=None, rev=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def summon(name, fname="dvcsummon.yaml", args=None, repo=None, rev=None):
def summon(name, repo=None, rev=None, summon_file="dvcsummon.yaml", args=None):

Comment on lines +147 to +150
# XXX: Some issues with this approach:
# * Not thread safe
# * Import will pollute sys.modules
# * Weird errors if there is a name clash within sys.modules
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no good non complicated way to handle this in Python 2. I will come up with an approach next sprint. Leave as is for now.

dvc/api.py Outdated
]
},
required=True,
extra=ALLOW_EXTRA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not allow extra everywhere. May not allow it anywhere for now for better future-compatibility. Say only allow objects at the top, and only allow name, meta and summon in the object, all the user data should go under meta, like description, paper, author, etc.

pull_dependencies(info.get("deps"))

_args = copy.deepcopy(info.get("args", {}))
_args.update(args or {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either remove or {} here or a default value to .get() on prev line.

dvc/api.py Outdated
Comment on lines 138 to 139
# XXX: Prints ugly absolute path.
raise SummonError("No such YAML file with path: '{}'".format(path))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should contain whatever user passed, e.g. file ... not found in repo ...

dvc/api.py Outdated
# XXX: Prints ugly absolute path.
raise SummonError("No such YAML file with path: '{}'".format(path))
except ruamel.yaml.YAMLError as exc:
raise SummonError("Failed to parse YAML correctly", exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to say which file in which repo and a line/message of an error if available.

dvc/api.py Outdated
raise SummonError("Failed to parse YAML correctly", exc)
except Invalid as exc:
# XXX: Doesn't point out the error.
raise SummonError("YAML file dosen't match with the schema", exc)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not just yaml file, this is a "summon file" is broken.

dvc/api.py Outdated
# XXX: Doesn't point out the error.
raise SummonError("YAML file dosen't match with the schema", exc)
except StopIteration:
raise SummonError("No such object with name '{}'".format(name))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, need to add summon_file and repo. We might actually add it outside where this info is available. This will fix all of the above and wouldn't need those two be passed here.


try:
os.chdir(path)
sys.path.insert(0, path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only it works as long as you have '.' or '' in sys.path, which is the default. Without that lots of python code won't work. Are you sure imports don't work for you without that? Can you test outside dvc on a simple script?

@Suor
Copy link
Contributor

Suor commented Dec 25, 2019

It looks like something is wrong with Python 2.7. Besides that looks good)

@dmpetrov
Copy link
Member

dmpetrov commented Dec 25, 2019

@MrOutis it looks like you had a great time on Christmas eve (I'm not sure which emoji to put here)
🎄👨‍💻?

@Suor
Copy link
Contributor

Suor commented Dec 26, 2019

Are all the remaining fails unrelated?

@ghost
Copy link
Author

ghost commented Dec 27, 2019

Are all the remaining fails unrelated?

No, @Suor , looks like I've introduced some bugs 🤔
Tests are running correctly on master but my branch fails to run the test suite without errors

@ghost
Copy link
Author

ghost commented Dec 27, 2019

No, @Suor , looks like I've introduced some bugs thinking
Tests are running correctly on master but my branch fails to run the test suite without errors

Looks like sys.path.pop wasn't on the correct index

@ghost ghost changed the title [wip] summon: first draft with python specific implementation summon: first draft with python specific implementation Dec 27, 2019
@efiop efiop merged commit 1038cfc into iterative:master Dec 27, 2019
@ghost ghost deleted the summon branch December 27, 2019 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants