
Reduce RAM cost for large datasets #21

Open
taylorbell57 opened this issue Jun 12, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@taylorbell57
Collaborator

Investigate the possibility of using dask when performing photometry to avoid needing to load all the data into RAM simultaneously (especially for gigantic datasets like HD 189733b and HD 209458b). dask.array.clip combined with median and std functions could approximate astropy's sigma clipping. This might be too time costly to be the default, but it should be offered as an option for analyses on weaker computers. This is a lower priority, though, since only four previous datasets have so much data that they would overflow 8 GB of RAM.
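The dask.array.clip idea could be sketched roughly as follows. This is a hypothetical, untested sketch: the function name, the chunking choices, and a single-pass median/std clip standing in for astropy's iterative sigma clipping are all assumptions, not SPCA code.

```python
import dask.array as da

def lazy_sigma_clip(stack, sigma=5.0):
    """Approximate one pass of sigma clipping without holding the whole
    image stack in RAM at once (hypothetical sketch).

    stack: array-like of shape (nframes, ny, nx); wrap an on-disk array
    with da.from_array so frames are loaded lazily.
    """
    # Keep the time axis in one chunk and split spatially, so the
    # per-pixel median/std reductions over time stay cheap.
    arr = da.asarray(stack).rechunk((-1, "auto", "auto"))
    med = da.nanmedian(arr, axis=0)  # per-pixel median over time
    std = da.nanstd(arr, axis=0)     # per-pixel scatter over time
    # Clip instead of masking: values beyond sigma*std are pinned to the
    # bound, roughly mimicking one sigma-clip iteration.
    return da.clip(arr, med - sigma * std, med + sigma * std)
```

The result stays lazy; calling `.compute()` (or iterating over blocks) evaluates it chunk by chunk, so peak RAM is set by the chunk size rather than the full dataset.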

If this is affecting you and you would like to use SPCA for your work, please reach out and we can prioritize this!

@taylorbell57 taylorbell57 added the enhancement New feature or request label Jun 12, 2020
@taylorbell57
Collaborator Author

Lines 441-449 of Photometry_Common.py cause a RAM spike:

- The spike comes from the sorting on line 449. We can simply defer the sort until after running the photometry, since we don't care about the order of the frames before binning or applying a highpass filter. Setting results=None also does nothing.
- The sigma clipping on line 455 also causes a large RAM spike. Changing it to a nested for-loop that sigma clips one pixel index at a time would be somewhat slower and/or more CPU intensive, but wouldn't use any more RAM than necessary.
- Subtracting the background causes an even larger RAM spike, which is what eventually crashes the code for the 55 Cnc e observations. It's unclear why that spike happens, but it occurs when calling Pool(ncpu), which doesn't make sense to me. I'll continue looking into this later.
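The nested for-loop idea could look something like this. A minimal NumPy sketch, not the actual SPCA code: the function name and the simple median/std clip replacing astropy's sigma_clip are assumptions.

```python
import numpy as np

def sigma_clip_per_pixel(stack, sigma=5.0, maxiters=2):
    """Sigma clip each pixel's time series in place, one pixel at a time
    (hypothetical sketch). Clipped samples are replaced with NaN.

    Peak extra memory is a single (nframes,) time series, instead of
    the temporaries a full-array sigma clip allocates.
    stack: float array of shape (nframes, ny, nx).
    """
    nframes, ny, nx = stack.shape
    for j in range(ny):
        for i in range(nx):
            series = stack[:, j, i]  # a view, so edits hit the stack
            for _ in range(maxiters):
                med = np.nanmedian(series)
                std = np.nanstd(series)
                bad = np.abs(series - med) > sigma * std
                if not bad.any():
                    break
                series[bad] = np.nan
    return stack
```

The trade-off named above is visible here: the Python-level double loop over pixels costs CPU time, but no array larger than one time series is ever allocated.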

taylorbell57 added a commit that referenced this issue Feb 17, 2021
Switching to a nested for-loop rather than an array operation for sigma clipping to significantly reduce RAM usage while only marginally increasing runtime. This change partially addresses Issue #21 by removing one of the largest RAM spikes. Still need to move array sorting to the photometry routines and investigate background subtraction for RAM spikes.
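The remaining sort deferral could be as simple as carrying the unsorted photometry through and applying a single argsort at the end, just before binning or highpass filtering. A hypothetical helper; the names are illustrative, not from the SPCA codebase.

```python
import numpy as np

def sort_after_photometry(times, fluxes):
    """Sort photometry outputs by time stamp after the fact
    (hypothetical sketch).

    times: (nframes,) array of frame times, possibly unordered
    fluxes: (nframes,) photometry results in the same unsorted order
    """
    # One small index array replaces sorting the full image stack up front.
    order = np.argsort(times)
    return times[order], fluxes[order]
```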

1 participant