Improve performance of timerange extraction in scraper #6101

dstansby · 2022-04-24T08:56:49Z

I was digging into why some SUVI tests take so long to run (even with vcrpy, i.e. ignoring the remote response times). When there are a lot of URLs to parse, get_timerange_from_exdict is the bottleneck, as it is run on every URL. Most of the time was spent in astropy.time.Time logic, but this is unnecessary as this code doesn't use any special features of Time, so:

- switch to just using datetime
- pre-compute the deltas
- re-order the many if statements so they're not all executed if seconds are specified in the timestamp

This leads to a ~50% improvement in running the scraper for me. On some SUVI searches in the current test suite, this reduces their runtime by ~3 seconds.
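The three changes listed above can be illustrated roughly as follows. This is only a minimal sketch and not the code from this PR: the function name `timerange_from_fields`, the `_DELTAS` table, the shape of the `fields` dict, and the half-open `(start, end)` convention are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical table, built once at import time rather than once per URL.
# Ordered finest unit first so a full timestamp matches on the first check.
_DELTAS = (
    ("second", timedelta(seconds=1)),
    ("minute", timedelta(minutes=1)),
    ("hour", timedelta(hours=1)),
    ("day", timedelta(days=1)),
)


def timerange_from_fields(fields):
    """Return (start, end) for the interval covered by parsed timestamp fields.

    `fields` is assumed to map names such as "year" or "second" to strings
    extracted from a URL. `end` is exclusive (an assumption of this sketch).
    """
    start = datetime(
        int(fields["year"]),
        int(fields.get("month", 1)),
        int(fields.get("day", 1)),
        int(fields.get("hour", 0)),
        int(fields.get("minute", 0)),
        int(fields.get("second", 0)),
    )
    # Fixed-length units: return as soon as the finest present unit is found.
    for name, delta in _DELTAS:
        if name in fields:
            return start, start + delta
    # Calendar months vary in length, so step to the first of the next month.
    if "month" in fields:
        if start.month == 12:
            return start, start.replace(year=start.year + 1, month=1)
        return start, start.replace(month=start.month + 1)
    # Only a year was given.
    return start, start.replace(year=start.year + 1)
```

With plain `datetime` and `timedelta` objects, each call is a handful of integer conversions and one addition, which is the kind of per-URL cost the description above is aiming for.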

samaloney · 2022-04-25T16:14:23Z

This looks great - I have a vague recollection of something about leap seconds which was why we use astropy.time.Time but I can't remember or see how it would make a difference.
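For context on the leap-second question, the standard-library snippet below shows the two places where plain datetime differs from a leap-second-aware time type. It is an illustrative sketch only; whether any scraped URLs actually contain such timestamps is not established in this thread, and the helper name `can_represent_leap_second` is hypothetical.

```python
from datetime import datetime, timedelta

# A leap second was inserted at the end of 2016-12-31 (UTC).
before = datetime(2016, 12, 31, 23, 59, 59)
after = datetime(2017, 1, 1, 0, 0, 0)

# datetime treats every day as exactly 86400 s, so this difference is 1 s
# even though 2 s of real time elapsed across the leap second.
naive_gap = after - before


def can_represent_leap_second():
    """Check whether datetime accepts a timestamp with second == 60."""
    try:
        datetime(2016, 12, 31, 23, 59, 60)
    except ValueError:
        # datetime only allows seconds in the range 0..59.
        return False
    return True
```

So the switch would only change behaviour for a timestamp that lands exactly on a leap second, or for a range that is measured across one.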

nabobalis · 2022-04-26T05:02:27Z

pre-commit.ci autofix

for more information, see https://pre-commit.ci

nabobalis · 2022-04-26T05:54:54Z

The SRS remote tests are failing on main. The other online failures are flaky ones.

dstansby force-pushed the scraper-perf branch 2 times, most recently from 4d48739 to a55f245 Compare April 24, 2022 09:09

Improve performance of timerange extraction in scraper

56b8969

dstansby force-pushed the scraper-perf branch from a55f245 to 56b8969 Compare April 24, 2022 17:07

dstansby marked this pull request as ready for review April 25, 2022 11:29

dstansby requested a review from a team as a code owner April 25, 2022 11:29

dstansby added the backport 4.0 label Apr 25, 2022

Add changelog

fab8253

nabobalis approved these changes Apr 26, 2022

nabobalis added net Affects the net submodule CodeFix labels Apr 26, 2022

nabobalis added the Merge When CI Passes Hit that merge button when it's all green! label Apr 26, 2022

[pre-commit.ci] auto fixes from pre-commit.com hooks

174d1a7

for more information, see https://pre-commit.ci

nabobalis merged commit 4b3ca5f into sunpy:main Apr 26, 2022

sunpy-backport bot mentioned this pull request Apr 26, 2022

[Backport 4.0] Improve performance of timerange extraction in scraper #6108

Merged

dstansby deleted the scraper-perf branch April 26, 2022 17:00
