Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functionality to client to download new GOES 16/17 data and reprocessed 13/14/15 #4394

Merged
merged 34 commits into from Nov 21, 2020

Conversation

hayesla
Copy link
Member

@hayesla hayesla commented Jul 27, 2020

Description

This is adding the functionality to the XRSClient to get access to the GOES XRS 16 and 17 data.

It also now updates the client so that the default GOES13/14/15 data queried will return the new reprocessed data from the NOAA site rather than the old data. The old data can still be accessed through a new attrs a.goes.VersionData. I'm not too sure if this is how we want to go about this - but I think we should still provide an option to be able to download the old data (i.e. the data that this client used to return).

This is for the high time resolution data (2s for 13/14/15 and 1s for 16/17)

Fixes #4173
Fixes #4376

These new data files are in netCDF format so at the moment do not work with sunpy.timeseries (which will be in another PR)

@hayesla hayesla requested a review from a team as a code owner July 27, 2020 21:24
@pep8speaks
Copy link

pep8speaks commented Jul 27, 2020

Hello @hayesla! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 19:101: E501 line too long (104 > 100 characters)

Line 63:101: E501 line too long (117 > 100 characters)
Line 69:101: E501 line too long (124 > 100 characters)
Line 71:101: E501 line too long (108 > 100 characters)
Line 74:101: E501 line too long (116 > 100 characters)
Line 75:101: E501 line too long (109 > 100 characters)
Line 77:101: E501 line too long (106 > 100 characters)
Line 134:101: E501 line too long (131 > 100 characters)
Line 141:101: E501 line too long (124 > 100 characters)
Line 143:40: E225 missing whitespace around operator
Line 145:101: E501 line too long (128 > 100 characters)
Line 161:101: E501 line too long (103 > 100 characters)

Line 62:101: E501 line too long (113 > 100 characters)
Line 76:101: E501 line too long (113 > 100 characters)

Line 279:101: E501 line too long (120 > 100 characters)

Comment last updated at 2020-11-20 22:52:53 UTC

@nabobalis nabobalis added this to the 2.1 milestone Jul 27, 2020
@nabobalis nabobalis added the net Affects the net submodule label Jul 27, 2020
@hayesla hayesla requested a review from a team as a code owner July 28, 2020 03:08
@wtbarnes
Copy link
Member

This is great! 🎉

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking as needs changes as per #4394 (review)

@Cadair
Copy link
Member

Cadair commented Sep 10, 2020

@hayesla #4321 has been merged so this needs a re-work 😄

@hayesla
Copy link
Member Author

hayesla commented Sep 16, 2020

@hayesla #4321 has been merged so this needs a re-work 😄

on it 🌞

@hayesla
Copy link
Member Author

hayesla commented Sep 17, 2020

This is still a work in progress - the main issue being how we deal with the old GOES 13, 14 and 15 data that's on the NASA server. Really people should be using the new re-processed science quality data from the NOAA site (in netcdf), but maybe we should still allow access to the old fits files? The only difference is the scaling factor which is removed in the new netcdf data (which will change the peak flux value/GOES class).

At the moment I've added a a.goes.VersionData attrs - but I'm not married to that idea. Actually its pretty messy.

What are peoples thoughts on this? @wtbarnes @Cadair

@wtbarnes
Copy link
Member

Seems reasonable to me. Would this be something that the user always has to specify? Or does it take on a default value to always return the latest (netCDF) data?

@kakirastern
Copy link

Regarding the build_docs check, we have the warnings:


    goes_ts = sunpy.timeseries.TimeSeries(goes_files, source='XRS', concatenate=True)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/timeseries/timeseries_factory.py", line 406, in __call__
    new_ts = self._check_registered_widgets(filepath=filepath, **kwargs)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/timeseries/timeseries_factory.py", line 529, in _check_registered_widgets
    data, meta, units = WidgetType._parse_file(filepath)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/timeseries/sources/goes.py", line 165, in _parse_file
    hdus = sunpy.io.read_file(filepath)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/io/file_tools.py", line 89, in read_file
    readername = _detect_filetype(filepath)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/io/file_tools.py", line 214, in _detect_filetype
    raise UnrecognizedFileTypeError("The requested filetype is not currently "
sunpy.io.file_tools.UnrecognizedFileTypeError: The requested filetype is not currently supported by SunPy.


/home/vsts/work/1/s/examples/time_series/goes_hek_m25.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/vsts/work/1/s/examples/time_series/goes_hek_m25.py", line 24, in <module>
    goes = TimeSeries(files)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/timeseries/timeseries_factory.py", line 406, in __call__
    new_ts = self._check_registered_widgets(filepath=filepath, **kwargs)
  File "/home/vsts/work/1/s/.tox/build_docs/lib/python3.8/site-packages/sunpy/timeseries/timeseries_factory.py", line 529, in _check_registered_widgets
    data, meta, units = WidgetType._parse_file(filepath)
TypeError: cannot unpack non-iterable NotImplementedType object

@hayesla
Copy link
Member Author

hayesla commented Nov 17, 2020

The build_doc is failing as its waiting on #4592

@kakirastern
Copy link

The build_doc is failing as its waiting on #4592

I see. Thanks for the heads up!

@Cadair
Copy link
Member

Cadair commented Nov 18, 2020

Oh fun, the goes server is giving us urllib.error.HTTPError: HTTP Error 429: Too Many Requests

@Cadair
Copy link
Member

Cadair commented Nov 18, 2020

Also happened on the online tests, I wonder if this is the scraper making too many calls to the server or similar. The first thing we should check would be to make sure we aren't being inefficient with our use of the scraper or at any other point in the high level code. The next thing we could do is catch a 409 response and implement a delay of a few hundred ms and retry.

@@ -471,7 +471,7 @@

__all__ = 'parse search findall with_pattern'.split()

log = logging.getLogger(__name__)
log = logging.getLogger("parse")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is annoying, as we now have to remember to change this on every update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For context, not doing this was causing an error to be raised when the log level was set to debug. I assume because the name of this logger started with sunpy it was being sent through some of the astropy logging infrastructure in a way that it wasn't supposed to be and throwing a traceback.

Copy link
Member

@Cadair Cadair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think there is some deduplication that can happen on the calls to scraper though.

@@ -22,7 +22,22 @@ class XRSClient(GenericClient):
"""
Provides access to the GOES XRS fits files archive.

Searches data hosted by the `Solar Data Analysis Center <https://umbra.nascom.nasa.gov/goes/fits/>`__.
Searches data for the GOES XRS data both on NASA servers prior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both? I thought we defaulted to just new servers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep - but for the data prior to GOES 13 the NASA server data is correct (NOAA haven't provided re-processed data)

sunpy/net/dataretriever/sources/goes.py Outdated Show resolved Hide resolved
sunpy/net/dataretriever/sources/goes.py Outdated Show resolved Hide resolved
sunpy/net/dataretriever/sources/goes.py Outdated Show resolved Hide resolved
sunpy/net/dataretriever/sources/goes.py Outdated Show resolved Hide resolved
sunpy/net/dataretriever/sources/goes.py Outdated Show resolved Hide resolved
2.1 automation moved this from Review in progress to Reviewer approved Nov 21, 2020
@nabobalis nabobalis removed the request for review from dstansby November 21, 2020 11:13
@nabobalis nabobalis merged commit e3017be into sunpy:master Nov 21, 2020
2.1 automation moved this from Reviewer approved to Done Nov 21, 2020
@Cadair
Copy link
Member

Cadair commented Nov 21, 2020

Thanks @hayesla this and #4592 was an awesome effort 🎉

@kakirastern
Copy link

🎉

@dstansby dstansby removed the Needs Review Needs reviews before merge label Jan 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
net Affects the net submodule
Projects
No open projects
2.1
Done
Development

Successfully merging this pull request may close these issues.

GOES 13-15 XRS data out of date New GOES XRS (16- 17) data will be in netcdf format
8 participants