Replace a bunch of independent collections with a dataclass #716
In principle I like this more than lists, but do you know how this manages memory etc. internally? In the original version, instead of having separate lists and a dict for computes, it was simply one large dict. I changed it to separate lists because I am unsure how the memory layout of a dictionary and similar structures works, so I feared that having a dictionary/class would require contiguous memory and therefore copying some very large data around in RAM rather often.
If we can clarify the memory concerns, we could directly use a DataContainer instead of a dictionary to hold all the lists, if that is the plan for all input and output data.
I'm not sure I understand correctly, but values in dictionaries are not contiguous in memory. The dict just stores pointers to the original values you assign to it. No copying takes place, and the only overhead of the dictionary is a small piece of contiguous memory for its buckets, where the pointers to the values are kept. You can check this with:
import numpy as np
a = np.linspace(0, 1, 10_000)
d = {'a': a, 'b': a}
assert a is d['a'] is d['b']
assert id(a) == id(d['a']) == id(d['b'])
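The same check can be extended to a dataclass holding lists and a dict, which is the layout discussed in this PR. The sketch below is only an illustration: `Collected` and its field names are hypothetical stand-ins rather than the class from this PR, and a large `bytearray` stands in for a NumPy array so that the example needs only the standard library. Assignment and `append` store references in both cases.

```python
import sys
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collected:  # hypothetical stand-in, not the dataclass proposed in this PR
    positions: List[bytearray] = field(default_factory=list)
    computes: Dict[str, bytearray] = field(default_factory=dict)


# A ~10 MB buffer standing in for one large parsed array.
big = bytearray(10_000_000)

data = Collected()
data.positions.append(big)      # stores a reference, no copy
data.computes["energy"] = big   # stores a reference, no copy ("energy" is a made-up key)

# Every container entry is the very same object as `big`.
same_object = data.positions[0] is big and data.computes["energy"] is big

# The container itself stays tiny, independent of the payload size.
container_size = sys.getsizeof(data)
buffer_size = sys.getsizeof(big)
```

Growing `data.positions` may reallocate the list's internal array of pointers, but the buffers those pointers refer to are never moved or copied.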
Then nothing speaks against exchanging the independent lists for a dict / DataContainer, imo. Is there an advantage of using the class proposed by @liamhuber compared to a DataContainer/dictionary? What I could imagine to be helpful is to store everything in dicts/DataContainers/classes based on the necessary treatment. What I do right now is manually call the necessary np.matmul etc. for each part individually. What I could imagine is something like:
As a first workaround for my memory problem I wanted to reduce the number of steps within the dump using OVITO, and then noticed that pyiron uses and parses only direct coordinates, while OVITO uses the Cartesian values. Probably not the most common use case, but right now parsing will fail if, for example, someone defines the dump command without direct unwrapped coordinates.
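For readers unfamiliar with the direct vs. Cartesian distinction mentioned above, here is a minimal sketch of the conversion. It assumes the common convention that each row of the cell matrix is one lattice vector, so that Cartesian = direct @ cell; `direct_to_cartesian` is a hypothetical helper written in plain Python, not a pyiron or OVITO function, and in practice a single `np.matmul` would do the same job.

```python
from typing import List, Sequence


def direct_to_cartesian(
    direct: Sequence[Sequence[float]], cell: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Convert direct (fractional) coordinates to Cartesian ones.

    Assumes each row of `cell` is a lattice vector, i.e. cartesian = direct @ cell.
    """
    return [
        [sum(frac[k] * cell[k][j] for k in range(3)) for j in range(3)]
        for frac in direct
    ]


# Orthorhombic example cell with edge lengths 2, 3 and 4.
cell = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]
direct = [[0.5, 0.5, 0.5], [0.25, 0.0, 1.0]]

cartesian = direct_to_cartesian(direct, cell)
# cartesian == [[1.0, 1.5, 2.0], [0.5, 0.0, 4.0]]
```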
Yes, in general Python only stores pointers, and the elements of a list are not contiguous in memory either. Therefore,
Agreed, I don't see any hard barrier between the different implementations
IMO:
I'm not sure I totally understand. For now it looks like both this PR and the original PR are compatible with that, e.g.
to_transform = [
    dump.unwrapped_positions,
    dump.forces,
    ...
]
no_ops = [
    v for v in dump.computes.values()
]
I could also imagine adding some sort of transformation method right to the dataclass, either by making separate dataclasses for each case and modifying
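One possible shape for a transformation method on the dataclass itself is sketched below. This is an assumption about how it could look, not what the PR implements: `DumpData`, the `transform` metadata key, `transformed`, `scale_by` and the `c_energy` key are all hypothetical, and a simple scaling function stands in for the real operation (e.g. `np.matmul` with the cell). Field metadata marks which fields belong to the `to_transform` group; everything else is passed through like `no_ops`.

```python
from dataclasses import dataclass, field, fields, replace
from typing import Callable, Dict, List

Rows = List[List[float]]


@dataclass
class DumpData:  # hypothetical name, not the class from this PR
    # Fields tagged with transform=True get the transformation applied.
    unwrapped_positions: Rows = field(
        default_factory=list, metadata={"transform": True}
    )
    forces: Rows = field(default_factory=list, metadata={"transform": True})
    # Untagged fields are passed through untouched (the "no_ops" group).
    computes: Dict[str, List[float]] = field(default_factory=dict)

    def transformed(self, func: Callable[[Rows], Rows]) -> "DumpData":
        """Return a new instance with func applied to every tagged field."""
        changes = {
            f.name: func(getattr(self, f.name))
            for f in fields(self)
            if f.metadata.get("transform", False)
        }
        return replace(self, **changes)


def scale_by(factor: float) -> Callable[[Rows], Rows]:
    # Stand-in for the real transformation, e.g. a matmul with the cell.
    return lambda rows: [[factor * x for x in row] for row in rows]


dump = DumpData(
    unwrapped_positions=[[0.0, 0.5, 1.0]],
    forces=[[1.0, 2.0, 3.0]],
    computes={"c_energy": [0.1]},
)
scaled = dump.transformed(scale_by(2.0))
```

With this layout, adding a new field only requires deciding whether to tag it; the untagged `computes` dict in `scaled` is the same object as in `dump`, so nothing is copied for the pass-through group.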
Advantages:
DataContainer
to hold input and output a bit easier
Disadvantages: