Potential memory leak using lightkurve.read
#1388
Comments
The problem exists for TESS SPOC lightcurves:
But the slightly older lightkurve
PR #1299 seems to have introduced the problem. Workaround:
In lightkurve/src/lightkurve/io/generic.py, lines 37 to 39 (as of commit f8e8c16), replace the code with:

else:
    hdulist = fits.open(filename)

Essentially, this reverts to the old logic. The old logic has some issues, but I think they mostly manifest in edge cases. I don't understand why the new logic causes a memory leak, though.
Confirmed that the memory leak indeed stems from the
A sample script that demonstrates the problem:

from glob import glob
import tracemalloc
import astropy
from astropy.io import fits
import lightkurve as lk
from copy import deepcopy
# import gc

# Show lk.read memory leak issues in
# https://github.com/lightkurve/lightkurve/issues/1388
print('lightkurve', lk.__version__)
print('astropy', astropy.__version__)


def read_lite(filename):
    # simulate the logic in
    # https://github.com/lightkurve/lightkurve/pull/1299
    hdulist = None
    with fits.open(filename) as hdulist:
        hdulist = deepcopy(hdulist)
    return hdulist


# files is a list of sample SPOC TESS lc
files = sorted(glob("spoc_samples/*_lc.fits"))

tracemalloc.start()
for i, f in enumerate(files[:]):
    l = read_lite(f)
    # print memory used so far in MB
    a, b = tracemalloc.get_traced_memory()
    print(i, a / 1e6, b / 1e6)
    # del l
    # gc.collect()

Similar memory leak:
- revert PR#1299
* Fixed memleak for lc in #1388
* Fixed memleak for tpf in #1388
* add test for read HDUList
* Explicit tests for read memory leaks (LC & TPF)
  - Run in memtest workflow in CI (pytest -m memtest --remote-data)
* Test tpf.from_fits_images() to ensure no unclosed file handles
* Revert lc.hdu change in PR #1299
* Revert raising ResourceWarning as error during tests in PR #1299
  - For it to actually work (to ensure no unclosed files), "error::pytest.PytestUnraisableExceptionWarning" will also be needed, but it'll create many false alarms.
  - Explicit tests on unclosed file handles are done in specific tests instead.
* add changelog [skip ci]
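The explicit unclosed-file-handle tests mentioned in the commit message are not reproduced in this thread. As a rough illustration only, here is a minimal standard-library sketch of how such a check could be written; it is not lightkurve's actual test code, and the helper names (`unclosed_file_warnings`, `read_without_closing`, `read_and_close`) are hypothetical. It relies on CPython emitting a `ResourceWarning` when an open file object is finalized without being closed.

```python
import gc
import os
import tempfile
import warnings


def unclosed_file_warnings(func, *args):
    """Run func and return the ResourceWarnings raised during the call and cleanup."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # ResourceWarning is ignored by default
        func(*args)
        gc.collect()  # finalize any unreachable file objects
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def read_without_closing(path):
    # Hypothetical reader that forgets to close its file handle.
    f = open(path, "rb")
    return f.read()


def read_and_close(path):
    # Hypothetical reader that closes its file handle via a context manager.
    with open(path, "rb") as f:
        return f.read()


# Create a small sample file to read.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sample")
    sample_path = tmp.name

leaked = unclosed_file_warnings(read_without_closing, sample_path)
closed = unclosed_file_warnings(read_and_close, sample_path)
os.remove(sample_path)

print("warnings from unclosed reader:", len(leaked))
print("warnings from closing reader:", len(closed))
```

Recording the warning inside a specific test, as sketched here, avoids turning every `ResourceWarning` in the suite into an error, which is the false-alarm concern raised in the commit message.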
commit 68fdf03
Author: Sam Lee <orionlee@users.noreply.github.com>
Date: Tue Dec 5 08:14:18 2023 -0800
Fix memory leak in reading LC/TPF (lightkurve#1390)
Problem description
I am using lightkurve.read to read the Kepler Bonus light curves from my filesystem. The total memory used increases with each new file, even as the variable the light curve is read into is overwritten. After reading a few hundred light curves, not even doing any light curve operations, several GB of memory are used. The behavior persists even when I delete the light curve after reading it in, and when forcing garbage collection.
Example
prints the following to the terminal:
Expected behavior
I expect the total memory usage to increase for the first few files, but then level off as memory is freed up and reallocated. This is the output when I replace lk.read with astropy.table.Table.read:
Here, the maximum memory used never goes above about 11 GB.
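To make the "grows without bound" versus "levels off" distinction concrete, here is a small sketch that records traced memory after each call to a reader function. It uses only the standard library; `traced_memory_per_call`, `leaky_reader`, and `clean_reader` are hypothetical stand-ins, not lightkurve or astropy functions, and the leak is simulated by deliberately keeping a reference alive.

```python
import gc
import tracemalloc


def traced_memory_per_call(func, items):
    """Call func on each item, discard the result, and record traced memory in MB."""
    tracemalloc.start()
    usage = []
    try:
        for item in items:
            result = func(item)
            del result
            gc.collect()
            current, _peak = tracemalloc.get_traced_memory()
            usage.append(current / 1e6)
    finally:
        tracemalloc.stop()
    return usage


# Hypothetical stand-ins for a file reader.
_retained = []


def leaky_reader(n):
    data = bytearray(n)
    _retained.append(data)  # simulates a reference that is never released
    return data


def clean_reader(n):
    return bytearray(n)


sizes = [1_000_000] * 5
leaky = traced_memory_per_call(leaky_reader, sizes)
clean = traced_memory_per_call(clean_reader, sizes)
print("leaky:", [round(m, 1) for m in leaky])
print("clean:", [round(m, 1) for m in clean])
```

With a real reader in place of the stand-ins, a series that keeps climbing after `del` and `gc.collect()` suggests that something still holds a reference to the data, whereas a flat series matches the expected behavior described above.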
Environment