Add protections for the get_data utility #293
I'm confused as to why all of the tools now call out to `gwdatafind`. It looks like the returned file lists aren't being reused (which would be a good reason to call `find_urls` upfront), so can they not just let gwpy do the `gwdatafind` call for them?
gwdetchar/io/datafind.py
Outdated
# read from frames or NDS
if source:
if source is not None:
if isinstance(channel, (list, tuple)):
channel = remove_missing_channels(channel, source)
return series_class.read(
source, channel, start=start, end=end, nproc=nproc,
verbose=verbose, **kwargs)
elif isinstance(channel, str):
It occurs to me that this should probably be `elif not isinstance(channel, (list, tuple))` to match the condition above that declares `series_class`.
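A minimal sketch of why the two conditions differ, assuming a channel may be passed as an object that is neither a `str` nor a list/tuple. `FakeChannel` and both helper functions are hypothetical stand-ins for illustration, not part of gwdetchar or gwpy:

```python
class FakeChannel:
    """Hypothetical stand-in for a channel object that is not a plain str."""

    def __init__(self, name):
        self.name = name


def branch_with_str_check(channel):
    """Mimic the reviewed code: only str inputs reach the single-channel branch."""
    if isinstance(channel, (list, tuple)):
        return "multiple"
    elif isinstance(channel, str):
        return "single"
    return None  # any other type falls through both branches


def branch_with_negated_check(channel):
    """Mimic the suggestion: anything that is not a list/tuple is a single channel."""
    if isinstance(channel, (list, tuple)):
        return "multiple"
    elif not isinstance(channel, (list, tuple)):
        return "single"


chan = FakeChannel("L1:EXAMPLE-CHANNEL")  # made-up channel name
print(branch_with_str_check(chan))      # falls through
print(branch_with_negated_check(chan))  # handled as a single channel
```

With the negated check the two branches are exhaustive, so no input type can silently fall through.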
Thanks @duncanmmacleod, I've changed this in the latest commit. I also restored the logic that attempts to load a cache using `gwdatafind` because I remembered that it's much simpler that way, but added some protection against non-existent frame files.
3ba8ff0 to 2407255
57f7b0e to f047644
55ff36f to be4e43b
…the unit tests for get_data
@duncanmmacleod, I just saw your longer comment, sorry about that. Response below: I prefer to have For example,
gwdetchar/io/datafind.py
Outdated
if isinstance(channel, (list, tuple)):
channel = remove_missing_channels(channel, source)
return series_class.read(
source, channel, start=start, end=end, nproc=nproc,
verbose=verbose, **kwargs)
elif isinstance(channel, str):
return series_class.fetch(
except (HTTPError, TypeError): # frame files not found
Can you explain when an `HTTPError` would be raised?
@duncanmmacleod, the `HTTPError` is raised whenever `gwdatafind.find_urls` can't locate frame files (you get a `400 Bad Request`).
I think that if you don't pass `obs` and the first letter of `frametype` is not a meaningful observatory code (e.g. for SenseMon frames), this will also lead to `HTTPError`. The `TypeError` case comes when you pass neither `frametype` nor `source`. In all these cases, the idea is that it would then fall back to `TimeSeries.get` with no protection against missing channels in frame files.
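The fallback described above can be sketched roughly as follows. This is only an illustration of the control flow: `find_frames`, `read_frames`, and `get_remote` are hypothetical stand-ins for the datafind lookup, `TimeSeries.read`, and `TimeSeries.get`, and the standard library's `urllib.error.HTTPError` is used on the assumption that the real lookup raises an exception of a similar kind.

```python
from urllib.error import HTTPError


def find_frames(frametype, start, end):
    """Hypothetical stand-in for a datafind lookup of frame files."""
    if frametype is None:
        # stands in for the TypeError case mentioned in the thread
        raise TypeError("frametype is required")
    if frametype != "L1_R":
        # stands in for the server answering 400 Bad Request
        raise HTTPError("https://datafind.invalid", 400, "Bad Request", None, None)
    return ["L-L1_R-0-64.gwf"]  # made-up cache entry


def read_frames(cache, channel):
    """Hypothetical stand-in for reading from a local frame cache."""
    return ("read", channel)


def get_remote(channel):
    """Hypothetical stand-in for the generic fallback data access."""
    return ("get", channel)


def get_data_sketch(channel, start, end, frametype=None):
    """Try to build a frame cache first; fall back if that fails."""
    try:
        cache = find_frames(frametype, start, end)
    except (HTTPError, TypeError):  # frame files not found
        return get_remote(channel)
    return read_frames(cache, channel)


print(get_data_sketch("X", 0, 64, frametype="L1_R"))
print(get_data_sketch("X", 0, 64, frametype=None))
```

Keeping the `try` body down to the lookup alone means an error raised while reading the frames is not mistaken for a failed lookup.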
@duncanmmacleod, I stand corrected: if you attempt to read SenseMon frames with `frametype='SenseMonitor_hoft_L1_M'` and `obs=None`, the result is an `IORegistryError: Format could not be identified.` Passing `frametype=None` falls back to `TimeSeries.get`, and passing both `frametype` and `obs` successfully uses `TimeSeries.read`.
@duncanmmacleod, I've fixed this so that the I've also updated the docstring to [hopefully] make it clearer to the user which arguments do what, and when.
Aye, looks good.
This PR fixes a bug in the `get_data` utility by adding protections against nonexistent frame files when attempting to build a frame file cache, and falling back to `TimeSeries{Dict}.get` rather than `fetch` when that fails. Unit tests are also updated to reflect these changes.

Other changes include:
- `obs` keyword argument from `get_data`, and infer the observatory from the `frametype` (when applicable) using a regular expression
- `obs` from a few scripts
- `gwdetchar-conlog`
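As a rough illustration of inferring the observatory from the `frametype` with a regular expression, here is a minimal sketch. The pattern and the `infer_obs` helper are assumptions made for this example (a frametype that begins with an interferometer prefix such as `L1_`); the expression actually used in this PR may differ.

```python
import re

# Assumed convention: one capital letter, one digit, then an underscore
# at the very start of the frametype (e.g. "L1_R")
_OBS_PATTERN = re.compile(r"\A([A-Z])\d_")


def infer_obs(frametype):
    """Return the observatory letter, or None if it cannot be inferred."""
    match = _OBS_PATTERN.match(frametype)
    return match.group(1) if match else None


print(infer_obs("L1_R"))                    # "L"
print(infer_obs("SenseMonitor_hoft_L1_M"))  # None: no prefix at the start
```

Returning `None` when nothing matches leaves it to the caller to decide whether to skip the frame lookup or ask for an explicit value.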
cc @duncanmmacleod