New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speeding up Fermi GBM data parsing for TimeSeries #5943
Conversation
for context - it now takes ~0.2 and ~2s to read the CSPEC and CTIME files, respectively compared to ~12, 200s beforehand. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the figure images that have changed (https://75296-2165383-gh.circle-artifacts.com/0/.tmp/py38-figure/figure_test_images/fig_comparison.html) it seems like the data being returned in the 800-2000 keV range is different now? If that's bugfix it should be documented in the changelog; if it's a newly introduced bug it should be fixed 😄
yes sorry I forgot to mention, I've slightly changed the energy edges used for binning (instead of using np.searchsort) and that's the differences here - but it is what we want. I'll update the changelog now! :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this fixes existing behaviour, approving. It seems odd to me that we're doing this kind of inaccurate reduction to the data though, instead of just returning what's in the file...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... The speed-up is great, but I'm a little uncomfortable with this change being a backported bugfix because it does change the approach to binning the data (which wasn't necessary for the speed-up). While the prior approach wasn't documented and could be considered nonintuitive, this PR's approach also isn't documented and could also be considered nonintuitive. At the very least, a note should be added to class's docstring to describe what precisely is being done.
Plus, someone should implement a proper rebinning that splits the data bins, and then the binning approach will change again.
ok yes, I agree - how about I change this PR to just be a speed up, and then another one that properly deals with the binning? |
How about we backport a separate PR with just the speed fix and leave the bin changes for 4.0? |
We can create a seperate changelog for the bin changes and label that breaking for 4.0? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dosctring for GBMSummaryTimeSeries
ought to elaborate about the rebinning approach. The existing statement "Note that the data is re-binned from the original 128..." does not have enough detail to discriminate between the old approach and the new approach. Given the lack of detail, a user might reasonably assume a rebinning approach that includes splitting bins, which is neither the old or the new approaches.
4e78b1b
to
c30c4e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the changes for a speed improvement alone, without modifying the rebinning scheme, so it is safe to backport. Given that the speed improvement is ~2 orders of magnitude, the changelog entry should probably be bugfix
instead of trivial
.
Co-authored-by: Albert Y. Shih <ayshih@gmail.com>
9b913f6
to
123c3d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add another bugfix
changelog entry:
Fixed bugs with the rebinning and per-keV calculation for Fermi/GBM summary lightcurves.
It turns out that the per-keV calculation is currently wrong, so since we have to bugfix the lightcurves already, I'm now on board with rolling in the better rebinning at the same time. Let's backport this |
ee2afe4
to
e956ecd
Compare
e956ecd
to
190cf85
Compare
Co-authored-by: Albert Y. Shih <ayshih@gmail.com>
190cf85
to
ffe12f0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tweaks to the changelog entries, otherwise looks good (but, I wrote most of it...)
Co-authored-by: Albert Y. Shih <ayshih@gmail.com>
Doc fails are examples that all I can see. |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub.
git fetch
# Create new working tree.
git worktree add .worktrees/backport 3.0
# Navigate to the new directory.
cd .worktrees/backport
# Cherry-pick all the commits of this pull request and resolve the likely conflicts.
git cherry-pick e445aa6651f22af9687afd195577ac4d3c4aa56d f347b3fb29f0ef440283a9c774acf81a22f10560 5355f42ee97abd1b4053e97416b56ba5cc2e1d05 1e345e405339891eaea96fa7b18332b04cc5efae 03df401e118991e39bc2a93bb0fd363ae1eb5004 8f784408633ff409684822747e5bc74912e3f25c 123c3d90cf98feaa9327642c6fb44b7541eaa00d ffe12f02c0b718a5ca805d2fe60afb4ae1f317ef 4038ac4c925f5eec375f841f764f076703c463f0
# Create a new branch with these backported commits.
git checkout -b backport-5943-to-3.0
# Push it to GitHub.
git push --set-upstream origin backport-5943-to-3.0
# Go back to the original working tree.
cd ../..
# Delete the working tree.
git worktree remove .worktrees/backport Then, create a pull request where the |
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub.
git fetch
# Create new working tree.
git worktree add .worktrees/backport 3.1
# Navigate to the new directory.
cd .worktrees/backport
# Cherry-pick all the commits of this pull request and resolve the likely conflicts.
git cherry-pick e445aa6651f22af9687afd195577ac4d3c4aa56d f347b3fb29f0ef440283a9c774acf81a22f10560 5355f42ee97abd1b4053e97416b56ba5cc2e1d05 1e345e405339891eaea96fa7b18332b04cc5efae 03df401e118991e39bc2a93bb0fd363ae1eb5004 8f784408633ff409684822747e5bc74912e3f25c 123c3d90cf98feaa9327642c6fb44b7541eaa00d ffe12f02c0b718a5ca805d2fe60afb4ae1f317ef 4038ac4c925f5eec375f841f764f076703c463f0
# Create a new branch with these backported commits.
git checkout -b backport-5943-to-3.1
# Push it to GitHub.
git push --set-upstream origin backport-5943-to-3.1
# Go back to the original working tree.
cd ../..
# Delete the working tree.
git worktree remove .worktrees/backport Then, create a pull request where the |
PR Description
This fixes #5942 for which reading in fermi GBM files was taking an age and a half
PR Checklist
pytest
passes).Fixes #XXXX -
orCloses #XXXX -
comment to auto-close the issue that your PR addresses