Use _ensure_str_array_size in FlattenedStorage.extend #1411

Merged
merged 2 commits into from
Apr 17, 2024

Conversation

pmrv (Contributor) commented Apr 15, 2024

Should fix #1407.

@Leimeroth Can you give this a try and add your example as a test case?
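
For readers who have not hit this before: the linked issue title says strings were being cut off on extend. One common way this happens in NumPy is assigning a longer string into a fixed-width unicode array, which truncates silently. The sketch below shows that behaviour and one way to widen the array first. `ensure_str_width` is a hypothetical helper written for this illustration; it is not pyiron's `_ensure_str_array_size`, and its signature is an assumption.

```python
import numpy as np


def ensure_str_width(array, incoming):
    """Hypothetical helper (not pyiron's implementation).

    Return `array`, widened if necessary so that it can hold every
    string in `incoming` without truncation.
    """
    incoming = np.asarray(incoming)
    # Only fixed-width unicode arrays are handled in this sketch.
    if array.dtype.kind != "U" or incoming.dtype.kind != "U":
        return array
    if incoming.dtype.itemsize > array.dtype.itemsize:
        # astype copies the data into a wider fixed-width dtype.
        return array.astype(incoming.dtype)
    return array


store = np.array(["a", "b"], dtype="U1")
new = np.array(["longer"])

# Without widening, NumPy silently keeps only the first character.
truncated = store.copy()
truncated[0] = new[0]

# With widening, the full string fits.
widened = ensure_str_width(store, new)
widened[0] = new[0]
```

Here `truncated[0]` ends up as `"l"`, while `widened[0]` holds the full `"longer"`.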

pmrv requested a review from Leimeroth on April 15, 2024 10:10
pmrv added the format_black label (reformat the code using the black standard) on Apr 15, 2024
Leimeroth (Member) commented

It works, but I am actually confused about why. I am adding the EAM example for the publication notebook, and the training container there holds a byte-encoded str (dtype.char 'S'). I guess it is accessed in a way that decodes it when it is read from the HDF5 container. However, I am not sure this would always be the case, since right now the only problem is 'identifier', which is also defined as a property.

pmrv (Contributor, Author) commented Apr 15, 2024

HDF5 wants to store unicode as UTF-8, but numpy uses UCS-4 (UTF-32, dtype 'U') internally, so to_hdf/from_hdf convert to and from raw UTF-8 encoded bytestrings (dtype 'S') for the file.
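
A minimal sketch of that conversion, using only NumPy's own `np.char.encode` and `np.char.decode`. This is not the actual to_hdf/from_hdf code; the array contents and variable names are made up for illustration.

```python
import numpy as np

# In memory: fixed-width unicode, 4 bytes per character (dtype char 'U').
names = np.array(["Al", "Müller"])

# For the file: UTF-8 encoded bytestrings (dtype char 'S').
# 'ü' takes two bytes in UTF-8, so "Müller" needs 7 bytes, not 6.
as_bytes = np.char.encode(names, "utf-8")

# Reading back: decode the bytes into a unicode array again.
round_trip = np.char.decode(as_bytes, "utf-8")
```

Because the byte length and the character length differ for non-ASCII text, the widths of the 'S' and 'U' arrays are not interchangeable.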

jan-janssen (Member) left a comment

Looks good to me

jan-janssen (Member) commented

Does it make sense to submit this fix to h5io?

jan-janssen (Member) commented

@pmrv Can this be merged? I would like to release a new pyiron_base version and would like to include this pull request.

pmrv merged commit 00d7b44 into main on Apr 17, 2024
25 checks passed
pmrv deleted the flatte branch on April 17, 2024 16:47
pmrv (Contributor, Author) commented Apr 21, 2024

> Does it make sense to submit this fix to h5io?

Not sure what you mean. Are you saying that this way of storing unicode string arrays in HDF5 should, in general, live in h5io?

Labels
format_black reformat the code using the black standard
Development

Successfully merging this pull request may close these issues.

Flattened storage extend cutting off str
4 participants