Why does mamba often redownload conda-forge/osx-64 channel index rather than checking/using cached version? #2021
I'm on mobile so pardon the short response: IIUC there are 3 scenarios: …
If an index is downloaded every time, it probably means the server doesn't send some HTTP headers relevant for caching. There is an option that I don't remember right now that controls the maximum frequency for steps 2/3. I have it set to a couple of hours.
Thanks! What's the difference between 1. and 2.? In 1, you don't even make a request? How do you know that the local cache is up to date then? Could it be that the package index is faulty and updated too often despite no fundamental changes?
Yes, essentially the HTTP cache headers say: don't check for changes in the next X seconds. E.g. for conda-forge that would be something like 20 minutes, because that's the frequency of their repodata updates. Then when you do check for updates, it might still be the case that there are none: you send the server your local cache's timestamp and it responds with HTTP 304 "Not Modified". In that case we don't download or update anything, but we still made the request. If the index has bad or no cache headers, then Mamba has to check every time you install anything. And if the server does not support responding with 304, then whenever Mamba makes a request it has to perform a full download.
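The three scenarios described above (fresh cache, stale-but-unchanged, actual re-download) can be sketched as plain decision logic. This is an illustrative sketch, not mamba's actual implementation; the function name, values, and the `revalidate` callback are made up for the example.

```python
def cache_action(fetched_at: float, max_age: float, now: float,
                 revalidate) -> str:
    """Decide what to do with a cached repodata file.

    fetched_at: when the cache entry was written (epoch seconds)
    max_age:    freshness window from the Cache-Control header, in seconds
    revalidate: callable that sends a conditional GET and returns True
                if the server answered 304 Not Modified
    """
    if now - fetched_at < max_age:
        return "use cache"   # scenario 1: still fresh, no request at all
    if revalidate():
        return "no change"   # scenario 2: request made, server said 304
    return "download"        # scenario 3: index changed, full re-download

# conda-forge regenerates repodata roughly every 20 minutes,
# hence a freshness window on the order of 1200 seconds.
print(cache_action(0, 1200, 600, lambda: True))    # use cache
print(cache_action(0, 1200, 2000, lambda: True))   # no change
print(cache_action(0, 1200, 2000, lambda: False))  # download
```

Note that "no change" still costs a network round-trip, which is why it is slower than "use cache" even though nothing is downloaded.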
xref #1504
Actually, I think what was really confusing me is how slow the download is. I have a 20-50 MB/s connection, yet the indices only download at 1 MB/s... Instead of 8 seconds, this should take 1 second; then this wouldn't matter. Is this throttling from Anaconda infrastructure? Quite surprising that this is so slow - modern infrastructure should be faster...
Try this as a benchmark |
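The benchmark command itself was lost in formatting. A rough equivalent in Python, assuming the conda-forge osx-64 repodata URL discussed in this thread (the `benchmark` call requires network access, so it is left commented out):

```python
import time
import urllib.request

# URL assumed from this thread's topic (conda-forge osx-64 channel index).
URL = "https://conda.anaconda.org/conda-forge/osx-64/repodata.json"

def throughput_mb_s(num_bytes: int, seconds: float) -> float:
    """Average throughput in decimal MB/s."""
    return num_bytes / seconds / 1_000_000

def benchmark(url: str = URL) -> float:
    """Download url once and return the measured throughput in MB/s."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        data = resp.read()  # bytes as received (compressed if gzip was honored)
    elapsed = time.monotonic() - start
    print(f"{len(data)} bytes in {elapsed:.1f}s = "
          f"{throughput_mb_s(len(data), elapsed):.1f} MB/s")
    return throughput_mb_s(len(data), elapsed)

# benchmark()  # uncomment to run; needs network access
```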
Hm, interesting. I think Mamba doesn't download the … What's your …? What version of Mamba are you on? Can you try a newer/older one?
Aha, that may be it! It could be that the gzip compression takes time on the server! Well, … mamba 0.27.0. Happy to try out a lower/higher version. Can you reproduce? I'm happy to try out stuff, but it would be good to know whether this is just me. How do I find download_threads? Couldn't find anything in the docs, info, etc.
I don't think so; it's very likely pre-compressed.
I'm not sure you can compare this. Does curl report the speed relative to the compressed or the uncompressed file size?
I guess we just rely on whatever compression curl negotiates with the server.
Probably the raw size. Even if not, it is definitely much faster: it takes only 3 seconds even when downloading uncompressed, vs. 10 seconds for mamba. Verbose curl logs: …
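The compressed-vs-uncompressed question above matters because the same transfer yields two very different "speeds" depending on which byte count you divide by. A small sketch with illustrative numbers only (the sizes and timing below are invented for the example, not measurements):

```python
def reported_speed_mb_s(bytes_counted: int, seconds: float) -> float:
    """Speed in decimal MB/s, relative to whichever byte count is passed in."""
    return bytes_counted / seconds / 1_000_000

# Illustrative numbers: a ~25 MB repodata.json that gzips to ~5 MB,
# transferred in 5 seconds. Same transfer; only the divisor differs.
compressed, uncompressed, secs = 5_000_000, 25_000_000, 5.0

print(reported_speed_mb_s(compressed, secs))    # 1.0 -> "slow" on-the-wire rate
print(reported_speed_mb_s(uncompressed, secs))  # 5.0 -> "fast" decoded-data rate
```

A tool that counts wire bytes and one that counts decoded bytes will disagree by exactly the compression ratio, even though the download took the same time.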
For me it's roughly the same duration of download with curl and Mamba:
Baseline:
The …
For …
I guess it would be good to change to the …
Right, so you can reproduce that mamba is very slow in downloading - or at least, that it shows the download speed of the compressed file yet downloads at the speed of the uncompressed one? Maybe that's where the confusion comes in. How exciting that we may have figured out a way to make mamba updates much faster 😄
Yeah, unfortunately not everything is documented yet. You can put it into your …
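The config file name above was lost in formatting. Assuming the standard YAML config file (commonly `~/.mambarc`, with `~/.condarc` also honored), the `download_threads` setting asked about earlier would look like this; treat the filename and value as assumptions to verify against your setup:

```yaml
# File location assumed: ~/.mambarc (mamba also reads ~/.condarc)
# Raise the value toward the number of channels you use.
download_threads: 20
```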
So upping the threads to >= the number of channels is another good idea. :)
Yeah, that seems to be the key point here: Mamba and curl report different numbers. But the duration that's being reported seems accurate and similar to what curl reports.
Upping threads to 20 was already an improvement for envs with 8 channels! Thanks for the tip!
Yeah, we might want to consider increasing that number. It isn't actually the number of threads (xref #1963), and the curl default is 50 if you use …
Hmm, I always thought that Anaconda was somehow limiting the total download speed to ~3 Mb/s for repodata.json, but maybe that isn't the case :)
We could change … to …, which will add all libcurl-supported encodings automatically. Don't really think it will change much, though. I am on not-great hotel wifi right now -- could someone compare: …
I am not a big fan of the added complexity of using … Also, the upcoming … The …
So at least it looks like it's not "mamba"'s fault :) It could be that the Anaconda CDN servers aren't caching the gzipped response and it's slowly encoded on the fly?! Or Anaconda could be limiting the bandwidth/speed on purpose for this file?! Idk
The … Also, bandwidth doesn't seem to be limited: …
So this actually seems to be a server problem. Where can we raise this?
This didn't age well, @corneliusroemer. Another lesson in: never make assumptions, and if you do, verify them.
The conda Slack, or the infra repo under conda or conda-incubator.
Sometimes not being a pro has advantages, @jonashaag. This does indeed look like a server issue - but why don't you just move to bz2?
@wolfv @jonashaag, I also opened a parallel issue in the conda-forge repo, where @jakirkham has now asked for it to be opened in conda/infra: conda-forge/conda-forge.github.io#1835
Yes, I think the …
I've opened something here: conda/infrastructure#637
I am going to close this for the moment. The repodata.zst support should land with the next release (optionally enabled with …).
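The exact option name was cut off above. As an assumption to verify against your release's documentation, the setting for opting into zstd-compressed repodata is commonly spelled `repodata_use_zst` in the YAML config:

```yaml
# Option name assumed -- check the configuration docs for your mamba version
repodata_use_zst: true
```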
Great news :) Very happy to see that …
As far as I'm aware, … Do you plan on including it in …?
I'm using mamba 0.27.0 on an M1 mac.
I'm confused why mamba often seems to redownload the entire package index (~25 MB) instead of simply checking whether there have been any changes, e.g. using an ETag or last-modified time.
Where can I read more about the package index cache, and how to configure it?
There seem to be 3 different modes: …
For some reason the index seems to be downloaded for certain channels but not for others. This is odd.
What's the difference between "Using cache" and "No change"? The operation leading to "No change" is slower; does this mean the whole index is redownloaded and compared, instead of using some faster check like a hash? How does "Using cache" work? Is there a time-to-live on the cache, or do you check some hash every time?
It seems that some indexes are updated every few seconds - is that realistic? Or is it a bug in mamba and/or the index server?