Unify download for antenna models / proposal tables / shower library with additional fallback servers #673
Conversation
I think this is great. You only missed one line (see the in-line comment).
Let's discuss the automatic ordering in our meeting. Downloading is fairly rare since files are always buffered locally, so optimizing the order is not very important.
More important might be a safeguard against multiple simultaneous downloads of the same file from cluster jobs.
if r.status_code != requests.codes.ok:
    logger.error("error in download of proposal tables")
    raise IOError
"downloading pre-calculated proposal tables for {}. This can take a while...".format(config_file))
did you forget to delete this line?
I intended to leave it here as is. The logger call starts on the line above, so just this line, beginning with a bare string, looks odd here.
NuRadioReco/utilities/dataservers.py
Outdated
from datetime import datetime
from geolite2 import geolite2

geo = geolite2.reader()
Rather than using geolocation, why not just store the server time zone along with the server names? I don't think we're ever going to have significantly more than two?
I guess the response time test is more robust anyway though...
Ok, I removed the function entirely. Just having the response time around is enough, since we are not enabling the option for now anyway (we agreed with @cg-laser that since each new file is only downloaded once, we would not reorder the servers but leave DESY as the "master").
response_times = []
available_dataservers = []

for dataserver in dataservers:
it could make sense to parallelize this, though in practice it probably doesn't matter very much with just two servers...
Yeah... right... I don't expect that we will ever have many more than two servers. I suggest leaving it as is.
@@ -0,0 +1,115 @@
import requests
import os
import filelock
filelock claims it requires python >= 3.8, is that a problem?
Well, the tests download without problems. We currently require Python 3.6 for NuRadioMC. Do you know of alternatives? Or would you pin filelock to v3.4.1, from when they dropped Python 3.6 support?
I guess pip should do the right thing and automatically download an older version if needed (and it looks like the old versions still exist on pypi), so I think this should be fine.
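If one wanted to make the boundary explicit instead of relying on pip's resolver, PEP 508 environment markers in the requirements would be one option (a sketch; the pin `3.4.1` is taken from the comment above and should be verified against filelock's release notes):

```
filelock; python_version >= "3.8"
filelock <= 3.4.1; python_version < "3.8"
```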
if os.path.isfile(target_path):
    logger.warning(f"{target_path} already exists. Maybe download was already completed by another instance?")
    return
elif unpack_tarball and (len(glob(os.path.dirname(target_path) + "/*.dat")) > 0):  # just check if any .dat files are present (similar to NuRadioProposal.py)
are the data files ever updated? Or will the name get changed in that case?
The download function only gets called if the file needs downloading (either because the hash sum is not the latest version or because the file is not there). The if/elif here just catches the case when 1000 jobs are started and the first one that acquired the lock has already performed the download...
code.write(r.content)
logger.warning("...download finished.")

if unpack_tarball and target_path.endswith(".tar.gz"):
FYI, unpack_archive supports additional archive formats. Not sure if it would make sense to support .tar.xz to save some bandwidth in some cases?
Good, but I don't think this is really needed. The proposal tarballs are the only tarballed files, and we chose to have them as .tar.gz. They are not that big, and you only need to download them once per NuRadioMC installation.
If the tests are running fine, pip must be resolving the correct version automatically, so I think it's fine to merge?
NuRadioReco.utilities.dataservers.py
Things one could discuss: dataservers.py?

./NuRadioMC/SignalGen/ARZ/ARZ.py: URL = 'http://arianna.ps.uci.edu/~arianna/data/ce_shower_library/ARZ_library_v{:d}.{:d}.pkl'.format(*self._version)
I think this server does not exist any more?