Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up Fermi GBM data parsing for TimeSeries #5943

Merged
merged 9 commits into from Apr 13, 2022

Conversation

hayesla
Copy link
Member

@hayesla hayesla commented Mar 8, 2022

PR Description

This fixes #5942 for which reading in fermi GBM files was taking an age and a half

PR Checklist

  • I have followed the guidelines in the Contributing document
  • Changes follow the coding style of this project
  • Changes have been formatted and linted
  • Changes pass pytest style unit tests (and pytest passes).
  • Changes include any required corresponding changes to the documentation
  • Changes have been rebased and squashed to the minimal number (typically 1) of relevant commits
  • Changes have a descriptive commit message with a short title
  • I have added a Fixes #XXXX - or Closes #XXXX - comment to auto-close the issue that your PR addresses

@hayesla hayesla requested a review from a team as a code owner March 8, 2022 20:26
@hayesla
Copy link
Member Author

hayesla commented Mar 8, 2022

for context - it now takes ~0.2 and ~2s to read the CSPEC and CTIME files, respectively compared to ~12, 200s beforehand.

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the figure images that have changed (https://75296-2165383-gh.circle-artifacts.com/0/.tmp/py38-figure/figure_test_images/fig_comparison.html) it seems like the data being returned in the 800-2000 keV range is different now? If that's bugfix it should be documented in the changelog; if it's a newly introduced bug it should be fixed 😄

@hayesla
Copy link
Member Author

hayesla commented Mar 9, 2022

yes sorry I forgot to mention, I've slightly changed the energy edges used for binning (instead of using np.searchsort) and that's the differences here - but it is what we want. I'll update the changelog now! :)

@nabobalis nabobalis added BugFix timeseries Affects the timeseries submodule labels Mar 9, 2022
@nabobalis nabobalis requested a review from dstansby March 9, 2022 19:29
Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this fixes existing behaviour, approving. It seems odd to me that we're doing this kind of inaccurate reduction to the data though, instead of just returning what's in the file...

Copy link
Member

@ayshih ayshih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... The speed-up is great, but I'm a little uncomfortable with this change being a backported bugfix because it does change the approach to binning the data (which wasn't necessary for the speed-up). While the prior approach wasn't documented and could be considered nonintuitive, this PR's approach also isn't documented and could also be considered nonintuitive. At the very least, a note should be added to class's docstring to describe what precisely is being done.

Plus, someone should implement a proper rebinning that splits the data bins, and then the binning approach will change again.

@hayesla
Copy link
Member Author

hayesla commented Mar 11, 2022

ok yes, I agree - how about I change this PR to just be a speed up, and then another one that properly deals with the binning?

@nabobalis
Copy link
Contributor

How about we backport a separate PR with just the speed fix and leave the bin changes for 4.0?

@nabobalis nabobalis added No Backport A PR that isn't to be backported to any release branch. (To be used as a flag to other maintainers) and removed backport 3.1 labels Mar 11, 2022
@nabobalis
Copy link
Contributor

We can create a seperate changelog for the bin changes and label that breaking for 4.0?

@nabobalis nabobalis requested review from ayshih and removed request for ayshih March 16, 2022 22:57
Copy link
Member

@ayshih ayshih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dosctring for GBMSummaryTimeSeries ought to elaborate about the rebinning approach. The existing statement "Note that the data is re-binned from the original 128..." does not have enough detail to discriminate between the old approach and the new approach. Given the lack of detail, a user might reasonably assume a rebinning approach that includes splitting bins, which is neither the old or the new approaches.

@nabobalis nabobalis added this to the 4.0.0 milestone Mar 23, 2022
@dstansby dstansby marked this pull request as draft April 5, 2022 08:33
Copy link
Member

@ayshih ayshih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are the changes for a speed improvement alone, without modifying the rebinning scheme, so it is safe to backport. Given that the speed improvement is ~2 orders of magnitude, the changelog entry should probably be bugfix instead of trivial.

sunpy/timeseries/sources/fermi_gbm.py Show resolved Hide resolved
sunpy/timeseries/sources/fermi_gbm.py Outdated Show resolved Hide resolved
sunpy/timeseries/sources/fermi_gbm.py Outdated Show resolved Hide resolved
changelog/5943.breaking.rst Outdated Show resolved Hide resolved
sunpy/timeseries/sources/fermi_gbm.py Outdated Show resolved Hide resolved
Copy link
Member

@ayshih ayshih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add another bugfix changelog entry:

Fixed bugs with the rebinning and per-keV calculation for Fermi/GBM summary lightcurves.

sunpy/timeseries/sources/fermi_gbm.py Outdated Show resolved Hide resolved
sunpy/timeseries/sources/fermi_gbm.py Outdated Show resolved Hide resolved
sunpy/timeseries/sources/fermi_gbm.py Outdated Show resolved Hide resolved
@ayshih
Copy link
Member

ayshih commented Apr 13, 2022

It turns out that the per-keV calculation is currently wrong, so since we have to bugfix the lightcurves already, I'm now on board with rolling in the better rebinning at the same time. Let's backport this

@ayshih ayshih added backport 3.0 and removed No Backport A PR that isn't to be backported to any release branch. (To be used as a flag to other maintainers) labels Apr 13, 2022
Co-authored-by: Albert Y. Shih <ayshih@gmail.com>
@nabobalis nabobalis marked this pull request as ready for review April 13, 2022 05:30
@nabobalis nabobalis requested a review from ayshih April 13, 2022 16:10
Copy link
Member

@ayshih ayshih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tweaks to the changelog entries, otherwise looks good (but, I wrote most of it...)

changelog/5943.bugfix.1.rst Outdated Show resolved Hide resolved
changelog/5943.bugfix.2.rst Outdated Show resolved Hide resolved
Co-authored-by: Albert Y. Shih <ayshih@gmail.com>
@nabobalis nabobalis added the Merge When CI Passes Hit that merge button when it's all green! label Apr 13, 2022
@nabobalis
Copy link
Contributor

Doc fails are examples that all I can see.

@nabobalis nabobalis merged commit a4d6a75 into sunpy:main Apr 13, 2022
@sunpy-backport
Copy link

The backport to 3.0 failed:

Commits ["e445aa6651f22af9687afd195577ac4d3c4aa56d","f347b3fb29f0ef440283a9c774acf81a22f10560","5355f42ee97abd1b4053e97416b56ba5cc2e1d05","1e345e405339891eaea96fa7b18332b04cc5efae","03df401e118991e39bc2a93bb0fd363ae1eb5004","8f784408633ff409684822747e5bc74912e3f25c","123c3d90cf98feaa9327642c6fb44b7541eaa00d","ffe12f02c0b718a5ca805d2fe60afb4ae1f317ef","4038ac4c925f5eec375f841f764f076703c463f0"] could not be cherry-picked on top of 3.0

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub.
git fetch
# Create new working tree.
git worktree add .worktrees/backport 3.0
# Navigate to the new directory.
cd .worktrees/backport
# Cherry-pick all the commits of this pull request and resolve the likely conflicts.
git cherry-pick e445aa6651f22af9687afd195577ac4d3c4aa56d f347b3fb29f0ef440283a9c774acf81a22f10560 5355f42ee97abd1b4053e97416b56ba5cc2e1d05 1e345e405339891eaea96fa7b18332b04cc5efae 03df401e118991e39bc2a93bb0fd363ae1eb5004 8f784408633ff409684822747e5bc74912e3f25c 123c3d90cf98feaa9327642c6fb44b7541eaa00d ffe12f02c0b718a5ca805d2fe60afb4ae1f317ef 4038ac4c925f5eec375f841f764f076703c463f0
# Create a new branch with these backported commits.
git checkout -b backport-5943-to-3.0
# Push it to GitHub.
git push --set-upstream origin backport-5943-to-3.0
# Go back to the original working tree.
cd ../..
# Delete the working tree.
git worktree remove .worktrees/backport

Then, create a pull request where the base branch is 3.0 and the compare/head branch is backport-5943-to-3.0.

@sunpy-backport
Copy link

The backport to 3.1 failed:

Commits ["e445aa6651f22af9687afd195577ac4d3c4aa56d","f347b3fb29f0ef440283a9c774acf81a22f10560","5355f42ee97abd1b4053e97416b56ba5cc2e1d05","1e345e405339891eaea96fa7b18332b04cc5efae","03df401e118991e39bc2a93bb0fd363ae1eb5004","8f784408633ff409684822747e5bc74912e3f25c","123c3d90cf98feaa9327642c6fb44b7541eaa00d","ffe12f02c0b718a5ca805d2fe60afb4ae1f317ef","4038ac4c925f5eec375f841f764f076703c463f0"] could not be cherry-picked on top of 3.1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub.
git fetch
# Create new working tree.
git worktree add .worktrees/backport 3.1
# Navigate to the new directory.
cd .worktrees/backport
# Cherry-pick all the commits of this pull request and resolve the likely conflicts.
git cherry-pick e445aa6651f22af9687afd195577ac4d3c4aa56d f347b3fb29f0ef440283a9c774acf81a22f10560 5355f42ee97abd1b4053e97416b56ba5cc2e1d05 1e345e405339891eaea96fa7b18332b04cc5efae 03df401e118991e39bc2a93bb0fd363ae1eb5004 8f784408633ff409684822747e5bc74912e3f25c 123c3d90cf98feaa9327642c6fb44b7541eaa00d ffe12f02c0b718a5ca805d2fe60afb4ae1f317ef 4038ac4c925f5eec375f841f764f076703c463f0
# Create a new branch with these backported commits.
git checkout -b backport-5943-to-3.1
# Push it to GitHub.
git push --set-upstream origin backport-5943-to-3.1
# Go back to the original working tree.
cd ../..
# Delete the working tree.
git worktree remove .worktrees/backport

Then, create a pull request where the base branch is 3.1 and the compare/head branch is backport-5943-to-3.1.

@nabobalis nabobalis added the Still Needs Manual Backport This PR needs manually backporting label Apr 13, 2022
@nabobalis nabobalis removed the Still Needs Manual Backport This PR needs manually backporting label Apr 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merge When CI Passes Hit that merge button when it's all green! timeseries Affects the timeseries submodule
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing GBM files for GBMSummaryTimeSeries takes too long
4 participants