Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc: pretty slow on Mac #2582

Closed
jorgeorpinel opened this issue Oct 6, 2019 · 16 comments · Fixed by #2628 or #2642
Closed

dvc: pretty slow on Mac #2582

jorgeorpinel opened this issue Oct 6, 2019 · 16 comments · Fixed by #2628 or #2642
Assignees
Labels
bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP. research

Comments

@jorgeorpinel
Copy link
Contributor

I experience around 10 seconds before and another 5 to 10 after dvc version output (below) on my Macbook Air.

DVC version: 0.61.2
Python version: 3.7.3
Platform: Darwin-18.7.0-x86_64-i386-64bit
Binary: False
Cache: reflink - True, hardlink - True, symlink - True
Filesystem type (cache directory): ('apfs', '/dev/disk1s1')
Filesystem type (workspace): ('apfs', '/dev/disk1s1')

Attached are a couple logs of python3 -mcProfile -s cumtime -m dvc version, the second one after opting out from analytics:

python3 -mcProfile -s cumtime -m dvc version (with analytics).log
python3 -mcProfile -s cumtime -m dvc version (without analytics).log

@jorgeorpinel jorgeorpinel changed the title DVC is slow on Mac Basic DVC commands are pretty slow on my Mac Oct 6, 2019
@jorgeorpinel
Copy link
Contributor Author

p.s. My connection is pretty fast, no VPN, and I've experienced this on several different networks recently (residential, WeWork, cafes, hotel).

@efiop efiop added p0-critical Critical issue. Needs to be fixed ASAP. bug Did we break something? labels Oct 8, 2019
@efiop efiop assigned efiop and unassigned efiop Oct 8, 2019
@efiop efiop added the research label Oct 8, 2019
@efiop
Copy link
Member

efiop commented Oct 10, 2019

Tried adding sleep(100) to places where requests is called on both mac and linux and was not able to reproduce. @jorgeorpinel Are you still able to reproduce? If so, could you check if there are any dvc processes running in the background? E.g. ps -xaf | grep dvc?

@efiop efiop self-assigned this Oct 10, 2019
@jorgeorpinel
Copy link
Contributor Author

Yeah still happening on my machine with the latest DVC version. I ran ps -xaf | grep dvc a bunch of times on a separate terminal and It's just ...Python /usr/local/bin/dvc version for the first 10 seconds (before output) and then both that and ...Python /usr/local/bin/dvc daemon -q analytics ... for the last 10 seconds after output.

But oh well, as long as this is a problem only on my machine then it's not really an issue, right? Feel fee to close unless you want to investigate further. Glad to give you SSH access or something.

@efiop
Copy link
Member

efiop commented Oct 11, 2019

@jorgeorpinel No reason to close this, we don't know if it is happening to anyone else, but they are just stopping using it without reporting the issue :) Let's have a meeting together, so I could take a look, maybe it is something obvious that I'm missing. I'll ping you in PM's. Thank you 🙂

@efiop
Copy link
Member

efiop commented Oct 14, 2019

We've had a debugging session with @jorgeorpinel and found that it is flufl.lock -> socket.getfqdn -> socket.gethostbyaddr that is slow. On further inspection, we've found that he has some pretty unusual dns settings that seem to be the cause of this. He has now adjusted his dns settings and we are waiting for new results.

@efiop
Copy link
Member

efiop commented Oct 14, 2019

Ok, so turns out on Mac "Computer name" that you set in Settings -> Sharing is not recognisable by anyone until you enable "Remote Login" service. So even ping Ruslans-MacBook-Pro.local will hang and then fail. The moment you enable that service, everything starts to work instantly. /etc/hosts is not updated by that operation, so not quite sure yet about the precise mechanism. But I am able to reproduce now and I am investigating.

@efiop
Copy link
Member

efiop commented Oct 14, 2019

Ok, so it is a cpython bug https://bugs.python.org/issue5004 . There is a patch for it from like 9 years ago. We can either monkeypatch it or patch it in flufl.lock and or try to push this fix into cpython as well. Working on this right now.

@jorgeorpinel
Copy link
Contributor Author

Thanks for looking at this Ruslan! Glad there's a workaround we can use in lock.py so Mac users don't have a bad experience with DVC (I guess most people don't enable Remote login) even if that cpython bug doesn't get addressed soon.

@efiop
Copy link
Member

efiop commented Oct 14, 2019

For the record, submitted a PR https://gitlab.com/warsaw/flufl.lock/merge_requests/12 . @jorgeorpinel confirmed that it works for him now.

@jorgeorpinel jorgeorpinel changed the title Basic DVC commands are pretty slow on my Mac dvc: pretty slow on Mac Oct 14, 2019
efiop added a commit to efiop/dvc that referenced this issue Oct 18, 2019
Workaround for slow and obsoleted gethostbyaddr used in
`socket.getfqdn()`. See [1], [2] and [3] for more info.

[1] https://bugs.python.org/issue5004
[2] iterative#2582
[3] https://gitlab.com/warsaw/flufl.lock/merge_requests/12

Flufl.lock guys seem unresponsive, so we have to monkeypatch
this by ourselves, at least for now.

Fixes iterative#2582
efiop added a commit that referenced this issue Oct 18, 2019
Workaround for slow and obsoleted gethostbyaddr used in
`socket.getfqdn()`. See [1], [2] and [3] for more info.

[1] https://bugs.python.org/issue5004
[2] #2582
[3] https://gitlab.com/warsaw/flufl.lock/merge_requests/12

Flufl.lock guys seem unresponsive, so we have to monkeypatch
this by ourselves, at least for now.

Fixes #2582
@efiop
Copy link
Member

efiop commented Oct 18, 2019

@jorgeorpinel 0.63.3 is out. Please upgrade 🙂

@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Oct 20, 2019

Nice! It's fast now before output but still around 13 seconds to get back to the console prompt AFTER the command output, whether with or without analytics... Here's the profiler output: https://pastebin.com/isBDYC4E

@jorgeorpinel jorgeorpinel reopened this Oct 20, 2019
@efiop
Copy link
Member

efiop commented Oct 20, 2019

@jorgeorpinel Ok, this time this seems to be OS X fault for not caching dns requests rust-lang/rust#31665 . Will prepare a workaround ASAP.

@efiop
Copy link
Member

efiop commented Oct 20, 2019

Aaaand I can fully reproduce the original issue but now with getaddrinfo on my conda on mac 😬 Looking into that...

efiop added a commit to efiop/dvc that referenced this issue Oct 21, 2019
This is basically Lock.__init__ copy-paste, except that
instead of using `socket.getfqdn()` we use `socket.gethostname()`
to speed this up. We've seen [1] `getfqdn()` take ~5sec to return
anything, which is way too slow. `gethostname()` is actually a
fallback for `getfqdn()` when it is not able to resolve a
canonical hostname through network. The claimfile that uses
`self._hostname` is still usable, as it uses `pid` and random
number to generate the resulting lock file name, which is unique
enough for our application.

[1] iterative#2582

Fixes iterative#2582
@efiop
Copy link
Member

efiop commented Oct 21, 2019

@jorgeorpinel Could you please install

pip install --force-reinstall git+https://github.com/efiop/dvc@2582

and see if that helps? It works for me, but I just want to make sure.

@jorgeorpinel
Copy link
Contributor Author

Ran that (with pip3 in my case) and it's fast now! Under 1 sec to output dvc version and immediately back to the prompt. The installation did run into these errors though, pasting here in case it's relevant:

ERROR: awscli 1.16.215 has requirement colorama<=0.3.9,>=0.2.5, but you'll have colorama 0.4.1 which is incompatible.
ERROR: docker-compose 1.23.2 has requirement requests!=2.11.0,!=2.12.2,!=2.18.0,<2.21,>=2.6.1, but you'll have requests 2.22.0 which is incompatible.
ERROR: awsebcli 3.15.3 has requirement colorama<0.4.0,>=0.3.9, but you'll have colorama 0.4.1 which is incompatible.
ERROR: awsebcli 3.15.3 has requirement future<0.17.0,>=0.16.0, but you'll have future 0.18.1 which is incompatible.
ERROR: awsebcli 3.15.3 has requirement pathspec==0.5.9, but you'll have pathspec 0.6.0 which is incompatible.
ERROR: awsebcli 3.15.3 has requirement requests<2.21,>=2.20.1, but you'll have requests 2.22.0 which is incompatible.
ERROR: awsebcli 3.15.3 has requirement six<1.12.0,>=1.11.0, but you'll have six 1.12.0 which is incompatible.
ERROR: awsebcli 3.15.3 has requirement urllib3<1.25,>=1.24.1, but you'll have urllib3 1.25.6 which is incompatible.

@efiop
Copy link
Member

efiop commented Oct 21, 2019

@jorgeorpinel Thanks! I think we have a separate issue for those.

efiop added a commit that referenced this issue Oct 21, 2019
This is basically Lock.__init__ copy-paste, except that
instead of using `socket.getfqdn()` we use `socket.gethostname()`
to speed this up. We've seen [1] `getfqdn()` take ~5sec to return
anything, which is way too slow. `gethostname()` is actually a
fallback for `getfqdn()` when it is not able to resolve a
canonical hostname through network. The claimfile that uses
`self._hostname` is still usable, as it uses `pid` and random
number to generate the resulting lock file name, which is unique
enough for our application.

[1] #2582

Fixes #2582
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p0-critical Critical issue. Needs to be fixed ASAP. research
Projects
None yet
2 participants