Some JSOC data fails to download using Fido #3735

Closed
vit1-irk opened this issue Jan 27, 2020 · 19 comments
Labels
Bug (Probably a bug); Effort Medium (Requires a moderate time investment); net (Affects the net submodule); Package Intermediate (Requires some knowledge of the internal structure of SunPy); Priority Medium (Non-urgent action required)

@vit1-irk

Description

I'm trying to download SDO HMI magnetic data for selected time ranges. Sometimes there are MISSING entries in the search results. If you try to download those "corrupted" UnifiedResponse objects, Fido fails.

Expected behavior

A working download without any corrupted objects, or the ability to filter them out.

Actual behavior

Fido.fetch() crashes with an error (see the gist below).

Steps to Reproduce

This gist: https://gist.github.com/vit1-irk/19d9ebbc69281fd142316f52a2f019ce

Errors also happen when downloading from the website

See: https://jsoc.stanford.edu/ajax/lookdata.html
It looks like the problem is on the JSOC website, but maybe SunPy should handle these MISSING entries itself.

System Details

Included in the gist. Btw, sorry, but this time the additional debugging info (like query dumps) didn't work out properly.

@Cadair
Member

Cadair commented Jan 27, 2020

yeah looks like we should be skipping over missing files rather than raising an Exception and stopping.

@Cadair Cadair added Bug Probably a bug Effort Medium Requires a moderate time investment net Affects the net submodule Package Intermediate Requires some knowledge of the internal structure of SunPy labels Jan 27, 2020
@twentyse7en

Hey, I would like to work on this.

@nabobalis
Contributor

Go ahead @twentyse7en

@samaloney
Contributor

yeah looks like we should be skipping over missing files rather than raising an Exception and stopping.

I don't think we can skip over files, as we create a drms export request using the entire time range and the other arguments; also, I don't think JSOC even supports this. I do know that if you include an extra query argument in the export request, [? QUALITY >= 0 ?], you can get around this by excluding the missing files. I don't know if this is desired, though, as the search results and the list of downloaded files will then differ, and the user won't know that some of the data isn't being downloaded.
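For illustration, here is a minimal sketch of how a DRMS record-set string with that extra clause could be assembled. The helper `build_recordset` is hypothetical (it is not part of sunpy or drms); the series, time range, and segments are taken from the query discussed in this thread.

```python
def build_recordset(series, start, end, segments=(), where=None):
    """Assemble a DRMS record-set string (hypothetical helper, not a sunpy/drms API)."""
    # Series name followed by the prime-key time range.
    query = f"{series}[{start}-{end}]"
    # Optional keyword condition, e.g. "QUALITY >= 0" to drop records with no files.
    if where:
        query += f"[? {where} ?]"
    # Optional list of segments to export.
    if segments:
        query += "{" + ",".join(segments) + "}"
    return query


query = build_recordset(
    "hmi.B_720s",
    "2017.09.06_05:40:00_TAI",
    "2017.09.06_06:30:00_TAI",
    segments=("field", "inclination", "azimuth", "disambig"),
    where="QUALITY >= 0",
)
print(query)
```

A string like this could then be handed to the drms client directly (for example as the `ds` argument of `drms.Client().export`), with the caveat raised above: records excluded by the clause will silently be absent from the download.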

@samaloney
Contributor

@vit1-irk
Author

vit1-irk commented Feb 3, 2020

@samaloney

Not a good idea but it does 'fix' the problem

Maybe this hack is enough, thanks for mentioning. Waiting until upstream has any other suggestions

@Cadair
Member

Cadair commented Feb 3, 2020

Do we have the ability to search by Quality? If not, we could add it as an attr, and then at least this workaround would be part of the search API 😀

@ejm4567
Contributor

ejm4567 commented Feb 3, 2020

Yes, ideally that skipping should be done on the VSO side, in the GetData function that generates the list of URLs. It would also be better if more info were returned as to why files are marked as "missing".
Question: are the segments in the Fido fetch handled as logical OR or logical AND in the drms module?

@Cadair
Member

Cadair commented Feb 4, 2020

@ejm4567 To be clear this is our direct JSOC / drms client here, not going through VSO at all.

The segments here:

segments = a.jsoc.Segment('field') & a.jsoc.Segment('inclination') & a.jsoc.Segment('azimuth') & a.jsoc.Segment('disambig')

are being handled as logical AND (the & operator); we have a logical OR (|) operator as well.

@ejm4567
Contributor

ejm4567 commented Feb 4, 2020

Thanks, I figured it was using the drms module. Given that the query was for an HMI series, the string of segment field names asked for implies to me that one, or more, of them are either not populated, or missing in the DRMS DB itself. The fix is best in the C code in DRMS before it returns to drms.

@pscherrer

One should always include a query clause: [?QUALITY >= 0?] to avoid records with no files associated.

@Cadair
Member

Cadair commented Feb 5, 2020

Thanks for the advice @pscherrer 😄

I think that the best way to solve this would be to:

  • Implement a new jsoc.attrs.Quality attr.
  • Add support for that where we build our query, and default it to >= 0
  • Document this so it doesn't catch people out.
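As a rough sketch of the "default it to >= 0" step: the `Quality` class and `apply_default_quality` function below are hypothetical stand-ins, not the actual sunpy implementation. The idea is that a default quality condition is appended only when the user has not supplied one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quality:
    """Hypothetical attr holding a comparison against the QUALITY keyword."""
    condition: str

    def as_clause(self):
        # Render the condition in DRMS query-clause form.
        return f"[? QUALITY {self.condition} ?]"


def apply_default_quality(attrs, default=Quality(">= 0")):
    """Return attrs unchanged if a Quality is present, else append the default."""
    if any(isinstance(attr, Quality) for attr in attrs):
        return list(attrs)
    return [*attrs, default]


# No Quality given: the default is added.
print(apply_default_quality(["hmi.B_720s"]))
# An explicit Quality is left alone.
print(apply_default_quality(["hmi.B_720s", Quality("== 0")]))
```

Appending the default only when no Quality is present keeps an explicit user choice in control, which matters given the earlier concern that silently filtered records make the search and download lists differ.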

samaloney pushed a commit to samaloney/sunpy that referenced this issue Feb 5, 2020
* Add new Quality attr to jsoc
* Add default Quality('? QUALITY >=0 ?') to all jsoc queries
samaloney pushed a commit to samaloney/sunpy that referenced this issue Feb 5, 2020
* Add new Quality attr to jsoc.attrs
* Add default equivalent to Quality('>=0') to all jsoc queries
@pscherrer

pscherrer commented Feb 5, 2020 via email

@pscherrer

pscherrer commented Feb 5, 2020 via email

@ayshih
Member

ayshih commented Feb 5, 2020

Who is best person to contact.

Me! I'll write you an email.

samaloney pushed a commit to samaloney/sunpy that referenced this issue Mar 18, 2020
* Add new Quality attr to jsoc.attrs
* Add default equivalent to Quality('>=0') to all jsoc queries
@samaloney
Contributor

Some very applicable information on sunpy/drms#37

@nabobalis nabobalis added the Priority Medium Non-urgent action required label Jul 11, 2020
@samaloney
Contributor

I think the only way we can handle this cleanly is to have instrument-specific missing-data attributes, e.g. attrs.jsoc.SDO_AIA_MISSING and attrs.jsoc.SDO_HMI_MISSING, as I don't think it's possible to support general queries on the quality flag with the supported operations.

@pscherrer

pscherrer commented Jul 15, 2020 via email

@dstansby dstansby added this to New reports in sunpy bugs via automation Oct 17, 2020
@dstansby
Member

dstansby commented Aug 6, 2022

The code I've pasted below works for me, so I think this is no longer an issue. If anyone disagrees, feel free to comment/re-open, or open a fresh issue.

import astropy.units as u
import astropy.time
import sunpy
from sunpy.net import jsoc, fido_factory, Fido, attrs as a
import drms

series_name = "hmi.B_720s"
notifier = a.jsoc.Notify("d.stansby@ucl.ac.uk")
series = a.jsoc.Series(series_name)
segments = a.jsoc.Segment('field') & a.jsoc.Segment('inclination') & a.jsoc.Segment('azimuth') & a.jsoc.Segment('disambig')

attrs_time = a.Time('2017/09/06 05:40', '2017/09/06 06:30')
res = Fido.search(attrs_time, series, notifier, segments)

print(res)
# Takes a while and downloads ~180MB
dl_files = Fido.fetch(res)

@dstansby dstansby closed this as completed Aug 6, 2022
sunpy bugs automation moved this from Live bugs to Fixed Aug 6, 2022
Projects
sunpy bugs
Resolved
9 participants