Ensure FITS filestream is closed on generic read #1299

Merged: 2 commits, May 16, 2023

Conversation

bmorris3
Contributor

The generic light curve reader opens FITS files to determine how to parse the contents. The call to fits.open leaves a file stream open, which can prevent later and/or downstream calls from opening that same file. This workaround ensures the filestream is closed, and simply makes a copy of the list hdulist.

@bmorris3
Contributor Author

bmorris3 commented Apr 18, 2023

With further testing, it looks like this particular failure may be the result of a mismatch between the astropy and lightkurve versions. If I update to astropy dev, no fix is needed. I'll keep investigating.

@Nschanche
Collaborator

This may not work as a change. I think the data stream needs to remain open as we still need access to the data.
For example, line 49 calls Table.read(hdulist[ext], format="fits"). With the proposed change, this fails with the error, "ValueError: I/O operation on closed file".
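
The failure mode described here can be reproduced without astropy. The sketch below is only an illustrative stand-in: `LazyTable` is a hypothetical class (not part of lightkurve or astropy) that, like a lazily loaded HDU, keeps a reference to the stream and reads bytes only on demand. Closing the stream before that deferred read raises a `ValueError` about a closed file.

```python
import os
import tempfile


class LazyTable:
    """Hypothetical stand-in for an object that reads its data lazily."""

    def __init__(self, stream):
        self._stream = stream  # keeps a reference, reads nothing yet

    def read(self):
        return self._stream.read()


# Write a small throwaway file to read back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"TIME,FLUX\n1.0,2.0\n")

stream = open(path, "rb")
table = LazyTable(stream)
stream.close()  # closed before the deferred read has happened

try:
    table.read()
    error_message = None
except ValueError as exc:
    error_message = str(exc)  # the read fails because the stream is closed

os.remove(path)
```

The exact wording of the message depends on the kind of stream, so the sketch only records it rather than assuming a specific string.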

@christinahedges
Collaborator

@Nschanche @bmorris3 there are parts of the LightCurve and TargetPixelFile classes that rely on these files being open, as @Nschanche points out.

We could change the functionality so that the hdu attribute always reopens the file, rather than leaving it open. @Nschanche what do you think?
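
A minimal sketch of the "always reopen" idea, using plain files rather than FITS. `ReopeningProduct` and its `payload` property are hypothetical names chosen for illustration; this is not how lightkurve's `hdu` attribute is implemented.

```python
import os
import tempfile


class ReopeningProduct:
    """Hypothetical container that stores a path instead of an open stream."""

    def __init__(self, path):
        self.path = path

    @property
    def payload(self):
        # Open, read, and close on every access; no handle outlives the call.
        with open(self.path, "rb") as stream:
            return stream.read()


fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"example bytes")

product = ReopeningProduct(path)
first = product.payload   # opens and closes the file
second = product.payload  # opens and closes it again
os.remove(path)
```

The trade-off in this design is repeated disk I/O on every access, and the file must still exist at the stored path whenever the property is read.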

@bmorris3
Contributor Author

I was wrong that my implementation was making a copy of the hdulist, which I've corrected in the latest commit (e50c2e2).

I see the hdu attribute challenge in the LC/TPF classes. I'll make another commit in a minute with a similar fix there. It doesn't look like an open file stream is ever needed, since lightkurve doesn't modify the original file on disk, only makes copies.

@bmorris3
Contributor Author

I think in ab001e7 I've closed the remaining open file streams. If #1299 (comment) isn't correct and you do need open file streams anywhere, I'm happy to revise again.

@Nschanche
Collaborator

Yes, I agree they don't need to be open, the data just needs to be accessible. The deep copy you added looks good to me.
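
The distinction between "open" and "accessible" can be illustrated with the standard library alone. In this sketch (an analogy using a CSV file, not the actual lightkurve change), the rows are fully materialised in memory inside the `with` block, so they stay usable after the stream is closed.

```python
import csv
import os
import tempfile

fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as out:
    out.write("time,flux\n1.0,2.0\n")

with open(path, newline="") as stream:
    # list(...) forces every row to be read while the stream is still open.
    rows = list(csv.DictReader(stream))

stream_closed = stream.closed  # the with block has closed the handle
first_flux = rows[0]["flux"]   # the data is still available in memory

os.remove(path)
```

The cost of this approach is that the whole contents are held in memory for as long as the in-memory copy is referenced.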

@bmorris3
Contributor Author

Thanks @Nschanche! Let me know if there's anything else you'd like added @christinahedges.

@christinahedges removed the "🥜 easy to close" label Apr 25, 2023
@christinahedges
Collaborator

Hey @bmorris3, I am a little confused by this PR and I want to be sure I understand. Can you post a comment here with an example showing the issue you're encountering, and why you're suggesting this as a fix?

@dfm
Collaborator

dfm commented May 1, 2023

I'm finding that this PR introduces one regression (something to do with pickling) that I'll try to track down, but otherwise I think this should be merged to catch memory leaks!

Edit: I'd say that it could be better to refactor such that we don't need to carry around a full copy of the HDUList (which should contain all the data in memory), but that's a much larger refactor than what is suggested here.

bmorris3 and others added 2 commits May 1, 2023 14:14
@bmorris3
Contributor Author

bmorris3 commented May 1, 2023

@christinahedges I opened this PR because if you read a local FITS file with lightkurve, it keeps the file stream open. If other packages/scripts read that same FITS file after lightkurve has opened but not closed the file stream, they can crash.

This exact problem was occurring specifically in spacetelescope/lcviz#10, which caused two tests to fail. We opted to merge that PR after commenting out the failing tests, with the plan to uncomment them when this lightkurve PR is merged.

@bmorris3
Contributor Author

bmorris3 commented May 5, 2023

@christinahedges Does that sound ok?

@tylerapritchard added the "🥜 easy to close" label May 16, 2023
@christinahedges
Collaborator

This looks good to me. I think @dfm is right that we eventually need a better solution for carrying around this FITS file.

@christinahedges merged commit 769baeb into lightkurve:main May 16, 2023
8 checks passed
@orionlee
Collaborator

orionlee commented Nov 19, 2023

@bmorris3 In which use cases does leaving the file stream open become a problem?

My understanding is that normally, once a LightCurve object is no longer in use, the file stream will be closed. Simultaneous reads from multiple clients on the same underlying file are not a problem (at least on Windows / Linux).

The workaround here, with deepcopy(hdulist), causes a memory leak in typical (if not all) cases. See #1388 (comment).


One scenario in which leaving the file stream open would cause a problem is opening a corrupted FITS file on Windows. For that specific case, another solution can be used (ensure the generic reader closes hdulist upon any exception during read).
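
A minimal sketch of the "close on any exception" alternative, written against plain files. `open_and_validate` and the validator callbacks are hypothetical names for illustration; the real generic reader may be structured differently. The stream is closed only when validation fails, and stays open for the caller on success.

```python
import os
import tempfile


def open_and_validate(path, validate):
    """Return an open stream, closing it first if validation fails."""
    stream = open(path, "rb")
    try:
        validate(stream)
    except Exception:
        stream.close()  # do not leak the handle on a failed read
        raise
    return stream  # on success the caller owns the open stream


seen = []


def reject_everything(stream):
    # Simulates a corrupted file: remember the stream, then fail.
    seen.append(stream)
    raise ValueError("not a valid file")


fd, path = tempfile.mkstemp()
os.close(fd)

try:
    open_and_validate(path, reject_everything)
    failed = False
except ValueError:
    failed = True

good = open_and_validate(path, lambda stream: None)
still_open = not good.closed
good.close()
os.remove(path)  # succeeds because no handle is left open
```

Closing the handle before re-raising matters most on Windows, where an open handle typically prevents the file from being deleted or replaced.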

@bmorris3
Contributor Author

@orionlee The problematic case is explained here: #1299 (comment).

@orionlee
Collaborator

@bmorris3 Just to be sure, are you referring to the test failures in #1299 (comment), specifically the ResourceWarning about an unclosed file? E.g.

78: PytestUnraisableExceptionWarning: Exception ignored in: <_io.FileIO [closed]>

  Traceback (most recent call last):
    File "C:\dev\lightkurve\src\lightkurve\io\kepler.py", line 34, in read_kepler_lightcurve     
      lc = read_generic_lightcurve(
  ResourceWarning: unclosed file <_io.BufferedReader name='C:\\Users\\SL\\.astropy\\cache\\download\\url\\0dd9ac829437a6265e20d71bc15c9e18\\contents'>
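
For context, CPython emits a `ResourceWarning` like the one above when a file object is finalized without having been closed. The sketch below shows one way to capture such a warning with the standard library. It assumes CPython's reference-counting behaviour; other interpreters may finalize the object later.

```python
import gc
import os
import tempfile
import warnings

fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ResourceWarning is ignored by default
    leaked = open(path, "rb")
    del leaked    # drop the last reference without calling close()
    gc.collect()  # not required on CPython, but harmless

# Keep only the warnings about unclosed resources.
resource_warnings = [
    w for w in caught if issubclass(w.category, ResourceWarning)
]

os.remove(path)
```

Because the warning is raised from a finalizer, pytest reports it wrapped in `PytestUnraisableExceptionWarning`, as in the traceback above, rather than as a normal exception in the test body.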

orionlee added a commit to orionlee/lightkurve that referenced this pull request Nov 23, 2023
…#1299

- For it to actually work (to ensure no unclosed files), "error::pytest.PytestUnraisableExceptionWarning" will also be needed
- but it'll create many false alarms.
- Explicit tests on unclosed file handles are done in specific tests instead.
orionlee added a commit to orionlee/lightkurve that referenced this pull request Nov 25, 2023
orionlee added a commit that referenced this pull request Dec 5, 2023
* Fixed memleak for lc in #1388

* Fixed memleak for tpf in #1388

* add test for read HDUList

* Explicit tests for read memory leaks (LC & TPF)
- Run in memtest workflow in CI (pytest -m memtest --remote-data)

* Test tpf.from_fits_images() to ensure no unclosed file handles

* Revert lc.hdu change in PR #1299

* Revert raising ResourceWarning as error during tests in PR #1299
- For it to actually work (to ensure no unclosed files), "error::pytest.PytestUnraisableExceptionWarning" will also be needed
- but it'll create many false alarms.
- Explicit tests on unclosed file handles are done in specific tests instead.

* add changelog  [skip ci]
danhey added a commit to danhey/lightkurve that referenced this pull request Dec 13, 2023
commit 5e4c619
Author: Sam Lee <orionlee@users.noreply.github.com>
Date:   Tue Dec 5 08:21:02 2023 -0800

    pytests: isolate astropy cache from user defaults (lightkurve#1391)

commit eabc909
Author: Sam Lee <orionlee@users.noreply.github.com>
Date:   Tue Dec 5 08:20:24 2023 -0800

    Support QLP changes in sectors 56+ (lightkurve#1392)

    * QLP sector 56+: handle default flux_err column

    * handle QLP-specific quality bitmask

    * docstring updates for QLP sectors 56+

    * add changelog [skip ci]

commit 68fdf03
Author: Sam Lee <orionlee@users.noreply.github.com>
Date:   Tue Dec 5 08:14:18 2023 -0800

    Fix memory leak in reading LC/TPF  (lightkurve#1390)

    * Fixed memleak for lc in lightkurve#1388

    * Fixed memleak for tpf in lightkurve#1388

    * add test for read HDUList

    * Explicit tests for read memory leaks (LC & TPF)
    - Run in memtest workflow in CI (pytest -m memtest --remote-data)

    * Test tpf.from_fits_images() to ensure no unclosed file handles

    * Revert lc.hdu change in PR lightkurve#1299

    * Revert raising ResourceWarning as error during tests in PR lightkurve#1299
    - For it to actually work (to ensure no unclosed files), "error::pytest.PytestUnraisableExceptionWarning" will also be needed
    - but it'll create many false alarms.
    - Explicit tests on unclosed file handles are done in specific tests instead.

    * add changelog  [skip ci]

commit f8e8c16
Author: Christina Hedges <christina.l.hedges@nasa.gov>
Date:   Fri Nov 3 11:12:48 2023 -0400

    updating to v2.5.0dev [skip ci]

commit 1a6b7c2
Author: Christina Hedges <christina.l.hedges@nasa.gov>
Date:   Fri Nov 3 11:02:06 2023 -0400

    release v2.4.2 [skip ci]

commit 47cbfcf
Author: Christina Hedges <christina.l.hedges@nasa.gov>
Date:   Fri Nov 3 11:01:05 2023 -0400

    releasing v2.4.2

commit e3bd292
Author: Christina Hedges <14965634+christinahedges@users.noreply.github.com>
Date:   Fri Nov 3 01:59:04 2023 -0400

    Revert "Update the stylefile 💅 (lightkurve#1311)" (lightkurve#1382)

    This reverts commit 3b9a0af.

commit 7be1a0f
Author: Christina Hedges <14965634+christinahedges@users.noreply.github.com>
Date:   Thu Nov 2 20:51:37 2023 -0400

    fix changelog and version number

commit 3b9a0af
Author: Daniel <38233719+danhey@users.noreply.github.com>
Date:   Thu Nov 2 10:33:31 2023 -1000

    Update the stylefile 💅 (lightkurve#1311)

    * update stylefile

    * mark edge colors

    * changed lightkurve plotting style

    * update merge conflict [skip ci]

    ---------

    Co-authored-by: Christina Hedges <christina.l.hedges@nasa.gov>

commit 7a7dc2a
Author: Sam Lee <orionlee@users.noreply.github.com>
Date:   Thu Nov 2 12:02:17 2023 -0700

    Fix download error due to no dataURL in MAST result (lightkurve#1380)

    * Fix download error due to missing dataURL in MAST result.

    * handle changes in MAST for some Kepler search due to extra KBONUS-BKG
    - some (but not all) Kepler search returns an extra KBONUS-BKG

    * fixing searchresult ordering and HLSP URL

    * update changelog [skip ci]

    ---------

    Co-authored-by: Christina Hedges <christina.l.hedges@nasa.gov>

commit b098e81
Author: Rebekah <rebekahhounsell@gmail.com>
Date:   Thu Nov 2 14:28:28 2023 -0400

    Changed flux_raw to flux_corr for TASOC files (lightkurve#1333)

    * updated to corr

    * trying to re-initiate checks

commit ce610e8
Author: Nschanche <nschanch@umd.edu>
Date:   Thu Nov 2 14:23:19 2023 -0400

    Update searching-for-data-products (lightkurve#1370)

    Modified text to reflect search results

commit dde5582
Author: H. Arda Güler <80536083+arda-guler@users.noreply.github.com>
Date:   Thu Nov 2 21:22:48 2023 +0300

    Remove redundant conditional (lightkurve#1374)

    in regressioncorrector.py

commit 4f2dbed
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Thu Oct 5 10:31:37 2023 -0400

    Bump actions/checkout from 3 to 4 (lightkurve#1367)

    Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4.
    - [Release notes](https://github.com/actions/checkout/releases)
    - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
    - [Commits](actions/checkout@v3...v4)

    ---
    updated-dependencies:
    - dependency-name: actions/checkout
      dependency-type: direct:production
      update-type: version-update:semver-major
    ...

    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

commit 382fd3a
Author: Christina Hedges <christina.l.hedges@nasa.gov>
Date:   Wed Sep 6 09:43:50 2023 -0400

    fix log

commit 8b24061
Author: Nschanche <nicole.e.schanche@nasa.gov>
Date:   Tue Sep 5 17:45:42 2023 -0400

    Updated filename check to fix issue lightkurve#1358 (lightkurve#1364)

commit 5a1d1d1
Author: Christina Hedges <14965634+christinahedges@users.noreply.github.com>
Date:   Tue Sep 5 17:45:01 2023 -0400

    Jupyterhub support (lightkurve#1363)

    * Modify show/interact functions to automatically supply a callable
    for the notebook_url parameter to adapt to operating behind a
    JupyterHub proxy with randomly generated ports for the Bokeh server.
    ---------

    Co-authored-by: jaytmiller <jmiller@stsci.edu>

commit 6176eb0
Author: Christina Hedges <christina.l.hedges@nasa.gov>
Date:   Tue Sep 5 15:17:26 2023 -0400

    fix numpy bug in search

commit 7d485b6
Author: Zé Vinícius <jvmirca@gmail.com>
Date:   Wed Aug 23 04:42:53 2023 +0800

    Raise a RuntimeError in case arclength cant be computed (lightkurve#1331)

    * check if arclength can be computed, else raise an error

    * handle both quantity and numpy arrays

    ---------

    Co-authored-by: Christina Hedges <christina.l.hedges@nasa.gov>

commit e751c16
Author: Christina Hedges <14965634+christinahedges@users.noreply.github.com>
Date:   Tue Aug 22 16:42:35 2023 -0400

    fixing CDIPs stitching bug (lightkurve#1361)

commit 394246f
Author: Zé Vinícius <jvmirca@gmail.com>
Date:   Wed Aug 23 03:41:00 2023 +0800

    Expose n_iters to the pca method in DesignMatrix (lightkurve#1334)

    * expose n_iters from fbpca so that users can control accuracy of optimality

    * Update src/lightkurve/correctors/designmatrix.py

    Co-authored-by: Dan Foreman-Mackey <dfm@dfm.io>

    * updated changelog

    ---------

    Co-authored-by: Dan Foreman-Mackey <dfm@dfm.io>
    Co-authored-by: Christina Hedges <christina.l.hedges@nasa.gov>

commit edcdd65
Author: Daniel <38233719+danhey@users.noreply.github.com>
Date:   Tue Aug 22 21:13:39 2023 +0200

    fix outlier removal bug (lightkurve#1313)

    * fix outlier removal bug

    * update test

    * add nan to test

    * update comment

    ---------

    Co-authored-by: Christina Hedges <14965634+christinahedges@users.noreply.github.com>

commit 8962d85
Author: Christina Hedges <14965634+christinahedges@users.noreply.github.com>
Date:   Tue Aug 22 15:12:38 2023 -0400

    Updated lightkurve aperture functions to be compliant with numpy 1.25.0 (lightkurve#1360)

    * updated aperture function

    * fixing @Nschanche comments [skip ci]

    * update changelog [skip ci]
Labels
🥜 easy to close

6 participants