pull from external repo: support --j to limit the number of parallel connections #3396
Comments
@pommedeterresautee I was not able to reproduce it 🤔 Could you please run … Also, when you run …

I see quite a lot of progress bars, too many for my terminal, which crazily scroll the output: `dvc pull -j 1`

@iterative/engineering someone who is using Linux, could you please check really quick? @pommedeterresautee could you check if you have anything in your DVC config file related to the number of jobs? It's …

Had a quick look; not sure if this is the issue, but …

@casperdcl great catch! I just realized that it …

@efiop it looks like the reason for this is clear, can we prioritize and add this?

Never underestimate the debugging power of Debian on a phone :)
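
The truncated comments above appear to point at the S3 transfer layer. As a hedged illustration (not a quote of the thread's actual finding): DVC's S3 remote is built on boto3, whose `TransferConfig` defaults to `max_concurrency=10`, so each individual file transfer can open up to ten connections on its own and `-j 1` alone does not cap the total. A minimal sketch of capping per-transfer concurrency directly with boto3; the bucket, key, and file names are hypothetical:

```python
# Minimal sketch: capping per-transfer concurrency in boto3 directly.
# Assumes boto3 is installed and AWS credentials are configured;
# bucket/key/file names below are hypothetical.
import boto3
from boto3.s3.transfer import TransferConfig

# boto3's default TransferConfig uses max_concurrency=10, so a pull
# running N jobs can open roughly N * 10 simultaneous connections.
config = TransferConfig(max_concurrency=1)  # one connection per transfer

s3 = boto3.client("s3")
s3.download_file(
    Bucket="my-bucket",
    Key="data/large-file.bin",
    Filename="large-file.bin",
    Config=config,
)
```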
We are introducing DVC in our company and were quite happy until we started using it on a large project containing a few hundred thousand files representing approximately 300 GB. We use S3 as storage.

When someone from our team did a `dvc pull` of this project, it sucked up the whole internet bandwidth of our office. We tried to mitigate the issue by limiting the number of concurrent jobs to 1 (option `-j 1`), but it was not enough. Our IT Ops team told us that dvc had opened hundreds of concurrent connections to download files from our S3 bucket, which explains why we were able to use up most of the bandwidth.

Is there any option other than `--jobs` to limit the number of parallel connections that we should take care of? Is there an existing workaround for this situation?
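
For completeness, here is a sketch of the behaviour this issue requests: a single jobs value that bounds the total number of connections. This is a hypothetical helper under the assumptions above (boto3, direct bucket access), not DVC's actual implementation:

```python
# Hypothetical helper, not DVC's actual code: make one `jobs` value
# bound the total number of S3 connections by combining a worker pool
# of that size with single-connection transfers.
import os
from concurrent.futures import ThreadPoolExecutor

import boto3
from boto3.s3.transfer import TransferConfig


def pull(bucket: str, keys: list[str], jobs: int = 4) -> None:
    s3 = boto3.client("s3")  # boto3 clients are thread-safe
    config = TransferConfig(max_concurrency=1)  # 1 connection per file
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        for key in keys:
            # Total simultaneous connections == jobs, regardless of
            # how many files are queued.
            pool.submit(
                s3.download_file,
                bucket,
                key,
                os.path.basename(key),
                Config=config,
            )
```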