get_goes_event_list doesn't return for TimeRange > several months #40

Open
jmason86 opened this issue Jan 25, 2018 · 6 comments

Comments

@jmason86

get_goes_event_list queries the HEK. @Cadair says that the HEK doesn't seem to do well with TimeRange values spanning many months. I've been waiting 16 hours for the event list for a 4-year time range for flares > C1.0 (I assume it has actually hung). The equivalent routine in IDL solarsoft (rd_gev.pro) returns in a few seconds for the same inputs. That implementation predates the HEK by a couple of decades. It requires that your local solarsoft database include the GOES events files, so to stay up to date, your system would need to run sswdb update periodically.

But there is another way. NOAA posts the event list directly on their site, although at the moment it hasn't been updated in 6 months. I'm asking around to see if there's another site. Scraping this site would, of course, carry no guarantee of long-term reliability: my bookmark to the page was out of date just now and did not redirect me to the link above. Still, scraping should be much quicker than going through the HEK. So this new feature could be an alternative implementation that is triggered by, e.g., a timeout, a try/except, or a user-specified kwarg.
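As a rough illustration of the trigger options mentioned above (timeout, try/except, or kwarg), here is a minimal sketch. Everything in it is an assumption made for illustration: `call_with_timeout`, `get_event_list`, the `source` kwarg, and the two query callables are hypothetical names, not part of sunpy.

```python
import threading


def call_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the failure back to the calling thread
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        # The worker is abandoned, not killed; being a daemon thread,
        # it will not keep the interpreter alive at exit.
        raise TimeoutError(f"no response after {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def get_event_list(timerange, query_hek, query_noaa, source="auto", timeout=300.0):
    """Hypothetical front end: try the HEK first, fall back to NOAA files.

    query_hek and query_noaa are caller-supplied callables taking the
    time range; source is the user-facing override kwarg.
    """
    if source == "hek":
        return query_hek(timerange)
    if source == "noaa":
        return query_noaa(timerange)
    if source != "auto":
        raise ValueError(f"unknown source: {source!r}")
    try:
        return call_with_timeout(query_hek, timeout, timerange)
    except Exception:
        # Covers both a timeout and any error raised by the HEK query.
        return query_noaa(timerange)
```

Passing the two backends in as callables keeps the sketch independent of how either query is actually implemented.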

Implementation parsing note:
The format that NOAA uses for the date in those files is very wonky. An example: 31777170628, which seems to be
[31777] = some sort of static identifier
[170628] = 2017-06-28
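A minimal sketch of parsing that token, assuming the layout is exactly as described: five leading characters treated as an opaque prefix, followed by a YYMMDD date. The function name `parse_noaa_event_date` is hypothetical. The two-digit year relies on Python's `%y` pivot (69-99 map to 1969-1999, 00-68 to 2000-2068).

```python
from datetime import date, datetime


def parse_noaa_event_date(token):
    """Split an 11-digit token such as '31777170628' into (prefix, date).

    Assumes the first five digits are an opaque identifier and the
    remaining six are YYMMDD, as in the example above.
    """
    token = token.strip()
    if len(token) != 11 or not token.isdigit():
        raise ValueError(f"expected 11 digits, got {token!r}")
    prefix, yymmdd = token[:5], token[5:]
    # %y resolves the two-digit year using the POSIX pivot convention.
    return prefix, datetime.strptime(yymmdd, "%y%m%d").date()


prefix, event_date = parse_noaa_event_date("31777170628")
```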

@jmason86

jmason86 commented Jan 25, 2018

Here's a NOAA site for events that is more up to date, but doesn't stretch back as far: ftp://ftp.ngdc.noaa.gov/STP/swpc_products/daily_reports/solar_event_reports/

@jmason86

After contacting some people at NOAA, it looks like there's no super clean way to do this but it's not that bad either. The interface in sunpy will be clean.

On the backend,

  1. First check the NGDC site (same one I originally linked, available as http or ftp). That data ranges from 1975 to (hopefully) present (but may not be up to date, which is the case right now).

  2. If the dates requested by the user go further forward in time than what's available on the NGDC site, check the SWPC site instead (ftp only). That data ranges from 2015 to present (and is up to date right now).

NGDC ftp: ftp://ftp.ngdc.noaa.gov/STP/space-weather/solar-data/solar-features/solar-flares/x-rays/goes/xrs/

SWPC ftp: ftp://ftp.swpc.noaa.gov/pub/indices/events/
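The two-step backend choice above could be sketched roughly as follows. This is only an illustration: `plan_sources` is a hypothetical helper, and `ngdc_last_available` (the last date actually present on the NGDC site) is assumed to be supplied by the caller, for example after listing the directory. No download is performed here.

```python
from datetime import date, timedelta

# URLs taken from the comment above.
NGDC_URL = ("ftp://ftp.ngdc.noaa.gov/STP/space-weather/solar-data/"
            "solar-features/solar-flares/x-rays/goes/xrs/")
SWPC_URL = "ftp://ftp.swpc.noaa.gov/pub/indices/events/"


def plan_sources(start, end, ngdc_last_available):
    """Return a list of (url, start, end) tuples covering the requested dates.

    NGDC is used up to the last date it holds; any remainder of the
    request that extends past that date is assigned to SWPC.
    """
    if start > end:
        raise ValueError("start must not be after end")
    plan = []
    if start <= ngdc_last_available:
        plan.append((NGDC_URL, start, min(end, ngdc_last_available)))
    if end > ngdc_last_available:
        swpc_start = max(start, ngdc_last_available + timedelta(days=1))
        plan.append((SWPC_URL, swpc_start, end))
    return plan
```

The sketch does not check the stated coverage limits (1975 for NGDC, 2015 for SWPC); a real implementation would need to handle requests that fall outside both.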


@dpshelio
Member

@jmason86 - All that data is up to date in the HELIO Event Catalog. SunPy has an interface to load it, but it's not too intuitive. (Oh... and it seems you can only do time queries :( ) In any case, we should make a Fido-like interface to query HELIO, HEK, etc.

@jmason86

Ooh, that's a nice reference! Fido could be a good place for this to go, as long as it subsumed get_goes_event_list, although it may not be as intuitive for how to specify you want the event list as opposed to the GOES/XRS irradiance data. get_goes_event_list is a perfect function name. Choosing between the two interfaces and the options for backend data acquisition is beyond me.

@aringlis
Member

aringlis commented Aug 2, 2018

@jmason86 - I tested the get_goes_event_list routine to try to understand why it's so slow, using a 4-year time range.

The following query to the HEK

from sunpy.net import hek
from sunpy.time import TimeRange

client = hek.HEKClient()
tr = TimeRange('2011-01-01', '2015-01-01')

# GOES-observed flares above class C1 within the time range
result = client.search(hek.attrs.Time(tr.start, tr.end),
                       hek.attrs.EventType('FL'),
                       hek.attrs.FL.GOESCls > 'C1',
                       hek.attrs.OBS.Observatory == 'GOES')

completed in around 5 minutes, with 6290 entries in result. That's quite slow, and I'm not really sure why the HEK response takes that long. Running get_goes_event_list produced about the same run times, but on more than one occasion the code seemed to hang indefinitely instead of completing.

For sanity, I manually checked the run time of the loop in the get_goes_event_list function:

from sunpy.time import parse_time

goes_event_list = []

# Convert each HEK result row into a plain dict of GOES event properties
for r in result:
    goes_event = {
        'event_date': parse_time(r['event_starttime']).date().strftime(
            '%Y-%m-%d'),
        'start_time': parse_time(r['event_starttime']),
        'peak_time': parse_time(r['event_peaktime']),
        'end_time': parse_time(r['event_endtime']),
        'goes_class': str(r['fl_goescls']),
        'goes_location': (r['event_coord1'], r['event_coord2']),
        'noaa_active_region': r['ar_noaanum']
    }
    goes_event_list.append(goes_event)

It completed almost instantly, so that's not a source of delay. It seems to be entirely down to how long the HEK response takes. For some reason, the request to the HEK also seems prone to hang/timeout. I have no idea as to why, unfortunately.
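Since the delay appears to lie in the single long HEK request, one untested idea is to split the time range into shorter windows and query each separately. Nothing in this thread confirms that shorter requests avoid the hang, so treat the following as a sketch only. `split_time_range` is a hypothetical helper and the 30-day default window is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def split_time_range(start, end, window=timedelta(days=30)):
    """Split [start, end) into consecutive windows no longer than `window`."""
    if end <= start:
        raise ValueError("end must be after start")
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    windows = []
    current = start
    while current < end:
        upper = min(current + window, end)  # clip the last window to `end`
        windows.append((current, upper))
        current = upper
    return windows


# Each (window_start, window_end) pair could then be passed to its own
# HEK query, with the per-window results concatenated afterwards.
windows = split_time_range(datetime(2011, 1, 1), datetime(2015, 1, 1))
```

Because adjacent windows share a boundary instant, events falling exactly on a boundary might be returned twice and would need de-duplicating.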

@nabobalis nabobalis transferred this issue from sunpy/sunpy Nov 23, 2020