Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get_data/store_data should copy arrays by default #654

Open
jameswilburlewis opened this issue Nov 27, 2023 · 3 comments
Open

get_data/store_data should copy arrays by default #654

jameswilburlewis opened this issue Nov 27, 2023 · 3 comments
Assignees
Labels
bug Something isn't working Priority 1 Highest task priority pytplot Issues involving the pytplot package

Comments

@jameswilburlewis
Copy link
Contributor

As things stand now, get_data returns a reference, rather than a copy, of the time and data arrays. So any modifications to the returned arrays will affect the stored values in the tplot variable, which is not at all the desired behavior.

There may be cases where the user knows the data won't be modified, and prefers a reference rather than a copy, to avoid the memory and performance penalty of making the copies. We can add a keyword to specify references rather than copies.

@jameswilburlewis jameswilburlewis added bug Something isn't working pytplot Issues involving the pytplot package Priority 1 Highest task priority labels Nov 27, 2023
@jameswilburlewis jameswilburlewis self-assigned this Nov 27, 2023
@xandrd
Copy link
Collaborator

xandrd commented Nov 27, 2023

As I said during the meeting, I assumed that it was a default behavior.
I had to use deepcopy at lease here

d = deepcopy(get_data(tvar)) # NOTE: Indexes are not the same as in SPEDAS, e.g. 27888x2x5

from copy import deepcopy
d = deepcopy(get_data(tvar))

There could be other places in the code and in other peoples' projects. I recommend to include a flag for deepcopy and during the major release have a note that the default behavior is changing. But it might brake things and it will definitely affect the performance.

@nickssl
Copy link
Contributor

nickssl commented Nov 27, 2023

Another problem with store_data is that it crashes when it is used with two tplot variables and one of them does not exist. IDL handles this correctly.

@jameswilburlewis
Copy link
Contributor Author

As mentioned at the meeting yesterday, we'll need to identify any code (e.g. mms utilities) that relies on the current behavior, and change those get_data calls to get direct references and not copies. This is a somewhat tricky problem...I can think of 3 approaches we might be able to use: 1) Static analysis (with help from ChatGPT). 2) Add some hooks to store_data/get_data/del_data to detect modifications to the stored data. 3) Ask Eric if we're stumped after the first two methods...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Priority 1 Highest task priority pytplot Issues involving the pytplot package
Projects
None yet
Development

No branches or pull requests

3 participants