Contributor

@gurobokum gurobokum commented Jan 14, 2020

Close #2920

Additional details

I would like to suggest defining and using the following terms to avoid ambiguity:

  • config - the global repo config
  • settings - the dict for a single config section
    Hence, refactor RemoteBASE from
    def __init__(self, repo, config):  ->  def __init__(self, repo, settings):

with config renamed to settings wherever it refers to a single section.

Then look up a setting as follows:

  1. Check the component's settings first
  2. Fall back to the config if required
  3. Return the default
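
The lookup order above can be sketched roughly as follows. This is only an illustration under assumptions: the config is modelled as a plain nested dict, and `get_setting`, its parameters, and the default of 4 are hypothetical names and values, not DVC's API.

```python
def get_setting(settings, config, section, name, default=None):
    """Resolve `name` using the three-step order described above."""
    # 1. The component's own settings (one config section) take priority.
    if name in settings:
        return settings[name]
    # 2. Fall back to the given section of the global config.
    fallback = config.get(section, {})
    if name in fallback:
        return fallback[name]
    # 3. Nothing configured anywhere: return the default.
    return default


# Hypothetical config: a dict of sections, each section a dict of settings.
config = {
    "core": {"checksum_jobs": 100},
    'remote "aws"': {"url": "s3://bucket/name"},
}
settings = config['remote "aws"']

print(get_setting(settings, config, "core", "checksum_jobs", default=4))  # 100
print(get_setting(settings, config, "core", "url"))  # s3://bucket/name
```

Passing the fallback section explicitly keeps the helper independent of any particular component; whether that is the right shape for DVC is a design question for the follow-up issue.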

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏


@JIoJIaJIu, sorry for the confusion and extra work, but we are in the middle of a migration from unittest to pytest.
Could you rewrite your tests to pytest? Happy to help if you have further questions about this)
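
For readers unfamiliar with the migration being requested, here is a rough sketch of what moving a test from unittest style to pytest style typically looks like. `parse_jobs` and both tests are hypothetical stand-ins, not code from this PR.

```python
import unittest


def parse_jobs(value):
    """Hypothetical function under test: coerce a config value to int."""
    return int(value)


# unittest style: a TestCase subclass using assert* methods.
class TestParseJobs(unittest.TestCase):
    def test_parse(self):
        self.assertEqual(parse_jobs("100"), 100)


# pytest style: a plain function with a bare assert. pytest collects
# functions named test_* from test files, so no class is needed.
def test_parse_jobs():
    assert parse_jobs("100") == 100
```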

Contributor Author

Could you rewrite your tests to pytest?

Did I understand your request correctly?
gurobokum@42c34d7


@JIoJIaJIu, you did) Thanks!


Are we going to have both core and remote checksum_jobs? Why not go remote-only?

Contributor Author

I've implemented the fallback logic as described in the comment.

Contributor

@JIoJIaJIu Got it, thanks! Actually, thinking about it, if checksum_jobs didn't work anyway, we could consider just getting rid of it, as it doesn't make much sense in many cases. E.g. if core.checksum_jobs is set to 200, then it makes sense for a local remote or s3, but will break everything for an ssh remote, as we will run out of connections. But removing it would break backward compatibility, so it's not worth bothering with right now. So your current solution is correct, let's stick with it for now 👍

@ghost ghost Jan 14, 2020

@efiop, so are we staying with both core and remote for backward-compatibility reasons?

Contributor

@MrOutis Yep, at least for now.

ghost commented Jan 14, 2020

Thanks for the PR, @JIoJIaJIu, left some comments :)

Is your suggestion about renaming dvc.config.config to dvc.config.settings?

Contributor Author

gurobokum commented Jan 14, 2020

Thanks for the PR, @JIoJIaJIu, left some comments :)

Is your suggestion about renaming dvc.config.config to dvc.config.settings?

Hello @MrOutis
No, I meant the case where a component is initialized with only a part of dvc.config.
Here, for example, that part is named settings:

return _get(settings)(repo, settings)

But further down, in RemoteBASE, it is named config, which introduces ambiguity:
def __init__(self, repo, config):

I suggest simply defining the terms as follows:

# whole config
...
[core]  # specific section - core
# everything under a section is a setting
checksum_jobs = 100

['remote "aws"']  # specific section - remote aws
# everything under a section is a setting
url = s3://bucket/name
...

And change

def __init__(self, repo, config):
to

def __init__(self, repo, settings):

This means that a component (Remote) can be initialized with settings, which take first priority, and can use config as a fallback if needed, rather than using two configs where one is a child of the other.
What do you think?

Contributor

No need to pass repo, you have self.repo set already.

Contributor Author

I decided to pass repo as an argument to avoid an initialization-order dependency; otherwise we would need to make sure that self.repo is initialized first. But I'm not sure it's a good approach.

Contributor

Sorry, not sure I follow. Could you elaborate/rephrase, please?

Contributor Author

I meant that self.repo needs to be initialized first, like

def __init__(...):
    self.repo = ...
    self.checksum_jobs = ...

This introduces the constraint that self.checksum_jobs must be initialized after self.repo, so a refactoring that changes the order would break the constructor.
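
To make the ordering concern concrete, here is a small sketch with hypothetical classes (`FakeRepo`, `Remote`), a hypothetical `repo_first` switch, and a made-up default of 4; it is not the PR's code. The helper reads `self.repo`, so it only works if `self.repo` is assigned before it is called.

```python
class FakeRepo:
    """Hypothetical stand-in for a repo object carrying a config dict."""
    config = {"core": {"checksum_jobs": 100}}


class Remote:
    def __init__(self, repo, repo_first=True):
        if repo_first:
            self.repo = repo
            self.checksum_jobs = self._get_checksum_jobs()
        else:
            # Swapped order: the helper runs before self.repo exists.
            self.checksum_jobs = self._get_checksum_jobs()
            self.repo = repo

    def _get_checksum_jobs(self):
        # Depends on self.repo having been assigned already.
        return self.repo.config.get("core", {}).get("checksum_jobs", 4)


print(Remote(FakeRepo()).checksum_jobs)  # 100

try:
    Remote(FakeRepo(), repo_first=False)
except AttributeError as exc:
    print("broken ordering:", exc)
```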

Contributor

Well, but it will break in an obvious way with an AttributeError, as self.repo won't be set. I would understand your logic if you had made _get_checksum_jobs a static method or a helper function, but you've made it a full-blown method, so using self.repo there would be totally fine. But the current implementation works too. I guess another way to make it obvious would be to use _get_checksum_jobs(repo.config if repo else {}, config) 🙂 But, again, these are non-vital details and nitpicking, so your call 🙂 The current implementation of this is mergeable.

Contributor

Could you please clarify what AttributeError is for?

Contributor Author

There are places in the tests that pass None or a custom Repo() to RemoteBASE, so this is how I check for that case.
https://github.com/JIoJIaJIu/dvc/blob/42c34d7269c24187a09bbebe0469647862a75fdd/tests/unit/remote/test_gdrive.py#L27

Contributor

Why not if self.repo then? AttributeError might be too generic, as it will also catch errors raised deeper in the call. Your first implementation had both if self.repo and except AttributeError, right? Let's keep only if self.repo then.
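
A small sketch of why catching AttributeError can be too broad. All names here (`BuggyConfig`, `FakeRepo`, both functions, the default of 4) are hypothetical and exist only to show the masking effect; they are not taken from the PR.

```python
class BuggyConfig:
    """Hypothetical config object whose lookup has an unrelated bug."""

    def get(self, section, default=None):
        # Misspelled attribute: raises AttributeError deep inside the call.
        return self.sectoins.get(section, default)


class FakeRepo:
    config = BuggyConfig()


def jobs_with_except(repo, default=4):
    try:
        return repo.config.get("core", {}).get("checksum_jobs", default)
    except AttributeError:
        # Meant for repo=None, but also hides the bug in BuggyConfig.
        return default


def jobs_with_check(repo, default=4):
    if not repo:
        return default
    # Any AttributeError raised deeper now propagates to the caller.
    return repo.config.get("core", {}).get("checksum_jobs", default)


print(jobs_with_except(None))        # 4
print(jobs_with_except(FakeRepo()))  # 4, the bug is silently masked

try:
    jobs_with_check(FakeRepo())
except AttributeError as exc:
    print("bug surfaced:", exc)
```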

Contributor Author

Why not if self.repo then

Because here there is a repo but no config: https://github.com/JIoJIaJIu/dvc/blob/42c34d7269c24187a09bbebe0469647862a75fdd/tests/unit/remote/test_gdrive.py#L27
I didn't dare touch the test,
so I need to check self.repo, and self.repo may not have a config. I can change it to

getattr(self.repo, 'config', {})

Should I?

Contributor

That specific test will still have repo.config though, because of https://github.com/iterative/dvc/blob/master/dvc/repo/__init__.py#L82 . Maybe some other test is not mocking it properly; in that case it should be fixed, unless it is unreasonably hard, of course 🙂


@JIoJIaJIu , I thought I had left a comment about AttributeError, but I couldn't find it.
Do you remember if I posted it? Am I going cuckoo 🙈 ?

Contributor Author

To avoid checking whether the config attribute exists, I extended the Repo class in the GDrive tests:
gurobokum@aec7781#diff-b780ac32bee8290565619b80d7051c85R18
Could you please double check that it doesn't break anything?

Contributor Author

@MrOutis I resolved it to mark that the changes were applied. Is that OK, or is it better to keep all threads open?


@JIoJIaJIu, it's fine to resolve them, I do the same) Just couldn't find it this time 😅

Contributor

efiop commented Jan 14, 2020

@JIoJIaJIu Looks good!

Good point about settings! Could you please create an issue for that?

Contributor

@efiop efiop left a comment

Looks great! Please see one minor comment above.

Contributor

@efiop efiop left a comment

Thanks!

@ghost ghost left a comment

thanks, @JIoJIaJIu !

@efiop efiop merged commit 6753e7b into iterative:master Jan 15, 2020
Contributor

efiop commented Jan 15, 2020

@JIoJIaJIu Oops, looks like we forgot about docs. Could you please either create an issue in dvc.org or, even better, submit a PR to dvc.org?

Development

Successfully merging this pull request may close these issues.

core.checksum_jobs doesn't work for remotes

2 participants