Conversation
There is no obvious advice to stop using this, but from reading around, the general tone seems to be to start moving away from its use.
As there are thousands of isotopes but only dozens of half-life units, if we pre-compute a value for each unit and store it, we can reference that rather than call the conversion function for each isotope.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #26 +/- ##
==========================================
- Coverage 99.73% 99.73% -0.01%
==========================================
Files 12 12
Lines 751 749 -2
==========================================
- Hits 749 747 -2
Misses 2 2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
We will use this simple script to quantify progress import timeit
from nuclearmasses.mass_table import MassTable
runs = 5
t = timeit.repeat("MassTable()", globals=globals(), number=1, repeat=runs)
print(f"{runs} runs: min={min(t):.6f}s, max={max(t):.6f}s, avg={sum(t)/runs:.6f}s")With the HEAD of main (ced0658) before the branch: |
|
This was used in early development for debugging. We don't use it now so remove. We can add back in if required.
As there are only ~20 different time units and we create a dictionary anyway, importing astropy just for that seemed like overkill. We have left the conversion function as it is and set up a hard coded dictionary that the function now uses.
The columns are misaligned for 92Br in the 2016 table so the values are not parsed correctly. Fix manually.
|
Use pyinstrument for targeted optimisations. Much simpler script from nuclearmasses.mass_table import MassTable
MassTable()Then run with pyinstrument -r html ./my_profile.py |
A large fraction of time in the initial read was doing the removal of any and all instances of the # character. If we use the fact that only string and object column types will contain this character after parsing with read_fwf we can run on fewer columns and don't need to use regex, saving time.
|
Can't replicate above values, particularly their ratios. Probably related to what else my machine was doing at the time. Wrapper script to checkout hash, run the timer and return to where we were for a list of commits. This should hopefully give a consistent set of outputs. #!/usr/bin/env bash
set -euo pipefail
HASHES=(
"ced0658"
"8535832"
"6524f56"
)
for h in "${HASHES[@]}"; do
git checkout "${h}" >/dev/null 2>&1
if output=$(python "${HOME}/tmp/nm/timer.py" 2>/dev/null); then
# pyinstrument -r html "${HOME}/tmp/nm/my_profile.py"
:
else
output="ERROR"
fi
git checkout - >/dev/null 2>&1
echo "$h $output"
doneCan't have the timer or profile scripts with the repo (unless I commit them) as it causes issues with the checkout. Running with the latest commit (6524f56) ced0658 5 runs: min=1.094489s, max=1.226982s, avg=1.149681s
8535832 5 runs: min=0.876762s, max=0.946023s, avg=0.893084s
6524f56 5 runs: min=0.732890s, max=0.746327s, avg=0.738968s |
The dictionaries are constant so can be shared by any and all instances of this class. Moving them into the class definition, but outside of the __init__ allows that to happen.
We were only using it for np.nan as a return value, we now use None.
The module does not need pytest to run, so do not list is as a required dependency. A user will need ruff to check any changes they make so while we are updating this file, add ruff as another optional dev dependency.
PR #26 has grown slightly so add more details of what has been done.
|
I think we have found all of the obvious optimisations, and made some good gains with an almost 30% speed-up. Final comparison |
Start looking for any easy wins to speed things up.
A few seconds to read 25 separate 3-6k line files isn't too bad, but now we have the functionality set up, working and tested, let's see if we can make some optimisations.