
Convert dtype=object arrays if possible #518

Merged — niklassiemer merged 9 commits into master from backwards_hdf_dtype_obj on Nov 18, 2021

Conversation

niklassiemer (Member, Author)

No description provided.

pyiron_base/generic/hdfio.py (outdated)
#TODO: remove this function upon 1.0.0 release
@staticmethod
def _convert_dtype_obj_array(obj: np.ndarray):
result = np.array(obj.tolist())
niklassiemer (Member, Author)

If someone has a better function to do this, I would be happy. Especially since the docstring states:

Notes
-----
The array may be recreated via ``a = np.array(a.tolist())``, although this
may sometimes lose precision.
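A quick sketch of that round trip (the array contents here are made up for illustration; only the `np.array(a.tolist())` recipe comes from the docstring):

```python
import numpy as np

# A dtype=object array as it may come back from HDF5: a 1D array whose
# elements are equally shaped sub-arrays.
obj = np.empty(2, dtype=object)
obj[0] = np.array([1.0, 2.0, 3.0])
obj[1] = np.array([4.0, 5.0, 6.0])

# The docstring's recipe: rebuild a dense array from the nested lists.
a = np.array(obj.tolist())

print(a.shape, a.dtype)  # a dense (2, 3) numeric array instead of dtype='O'
```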

Member

I wouldn't be too worried. I think they mean that you might get a conversion from int64 down to int32. AFAIK we don't have any users who care about using long int/floats (or short ones for more memory efficiency), so any sloppiness here should be perfectly safe.

@liamhuber (Member) left a comment

lgtm

f"Returned array was converted from dtype='O' to dtype={result.dtype} "
f"via `np.array(result.tolist())`.\n"
f"Please run rewrite_hdf5() to update this data! "
f"To update all your data run update tool.")
Member

On the PR that introduces the tool we need to remember to come back and reference it here.

Member

^^ @pmrv

pmrv (Contributor) commented Nov 16, 2021

Am currently working on identifying affected projects and testing the update bot. I can verify that normal lammps jobs submitted with 0.3.8 are affected, e.g. when you do

>>> !h5ls -r {j.project_hdf5.file_name}/{j.name}/output/generic
/cells                   Group
/cells/data              Dataset {3003, 3}
/cells/index             Dataset {1001}
/energy_pot              Dataset {1001}
/energy_tot              Dataset {1001}
/forces                  Group
/forces/data             Dataset {54054, 3}
/forces/index            Dataset {1001}
/indices                 Group
/indices/data            Dataset {54054}
/indices/index           Dataset {1001}
/positions               Group
/positions/data          Dataset {54054, 3}
/positions/index         Dataset {1001}
/pressures               Group
/pressures/data          Dataset {3003, 3}
/pressures/index         Dataset {1001}
/steps                   Dataset {1001}
/temperature             Dataset {1001}
/unwrapped_positions     Group
/unwrapped_positions/data Dataset {54054, 3}
/unwrapped_positions/index Dataset {1001}
/velocities              Group
/velocities/data         Dataset {54054, 3}
/velocities/index        Dataset {1001}
/volume                  Dataset {1001}

instead of this with this branch

>>> !h5ls -r {j.project_hdf5.file_name}/{j.name}/output/generic
/cells                   Dataset {1001, 3, 3}
/energy_pot              Dataset {1001}
/energy_tot              Dataset {1001}
/forces                  Dataset {1001, 54, 3}
/indices                 Dataset {1001, 54}
/positions               Dataset {1001, 54, 3}
/pressures               Dataset {1001, 3, 3}
/steps                   Dataset {1001}
/temperature             Dataset {1001}
/unwrapped_positions     Dataset {1001, 54, 3}
/velocities              Dataset {1001, 54, 3}
/volume                  Dataset {1001}
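The two layouts line up with simple shape arithmetic (a sketch; the flattening into `/data` plus `/index` datasets is inferred from the listings above, not taken from h5io's actual code):

```python
import numpy as np

n_steps, n_atoms = 1001, 54

# dtype=object array of per-step (n_atoms, 3) arrays, as written before this fix
forces = np.empty(n_steps, dtype=object)
for i in range(n_steps):
    forces[i] = np.zeros((n_atoms, 3))

# Old layout: one concatenated dataset plus a per-step index
flat = np.concatenate(forces.tolist())   # corresponds to /forces/data  {54054, 3}
index = np.arange(n_steps)               # corresponds to /forces/index {1001}

# New layout: a single dense dataset
dense = np.array(forces.tolist())        # corresponds to /forces       {1001, 54, 3}

print(flat.shape, dense.shape)
```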

pmrv (Contributor) commented Nov 16, 2021

Then with the update script I just pushed I get the 'correct' thing again

>>> !python /u/zora/software/pyiron_base/update_scripts/pyiron_base_0.3_to_0.4.py {pr.path}

100%|█████████████████████████████████████████████| 1/1 [00:00<00:00,  3.66it/s]
>>> !h5ls -r {j.project_hdf5.file_name}/{j.name}/output/generic

/cells                   Dataset {1001, 3, 3}
/energy_pot              Dataset {1001}
/energy_tot              Dataset {1001}
/forces                  Dataset {1001, 54, 3}
/indices                 Dataset {1001, 54}
/positions               Dataset {1001, 54, 3}
/pressures               Dataset {1001, 3, 3}
/steps                   Dataset {1001}
/temperature             Dataset {1001}
/unwrapped_positions     Dataset {1001, 54, 3}
/velocities              Dataset {1001, 54, 3}
/volume                  Dataset {1001}

pmrv (Contributor) commented Nov 16, 2021

I'd like to test it on a larger project, but I'm running into the other bug at #517 with this branch, so it will have to wait until tomorrow.

niklassiemer (Member, Author)

> Am currently working on identifying affected projects and testing the update bot. I can verify that normal lammps jobs submitted with 0.3.8 are affected […]

Now I am slightly confused. You are talking about different hdf files produced with 0.3.8 and this branch, which should behave like the current master in this respect? If yes, then I am fine :)

pmrv (Contributor) commented Nov 16, 2021

> Now I am slightly confused. You are talking about different hdf files produced with 0.3.8 and this branch, which should behave like the current master in this respect? If yes, then I am fine :)

Yes, everything after #503 should write as in the second example, everything before (including 0.3.8) as in the first example.

niklassiemer (Member, Author)

Although I canceled the windows-latest 3.9 test, the output shows the complete test suite passing. Thus, I take this as a passing test! The Codacy complaint is irrelevant. Therefore, @pmrv, merge once you have been able to test it.

niklassiemer mentioned this pull request Nov 17, 2021
niklassiemer (Member, Author)

We really should hurry a bit and merge this and #519 and release 0.4.0 tomorrow. We need the writing of dtype=object arrays to be fixed and the h5io issue to be solved in our conda release as well!

niklassiemer (Member, Author)

@pmrv Do you want to add something to the robot already or leave it as it is and make it faster in the next release?

niklassiemer merged commit 9253669 into master on Nov 18, 2021
The delete-merged-branch bot deleted the backwards_hdf_dtype_obj branch on November 18, 2021 17:17