From beb66c7f6abca6dfa41d79ac73932a59f4940256 Mon Sep 17 00:00:00 2001
From: ZekeUnterberg
Date: Sun, 13 Dec 2020 19:38:02 -0500
Subject: [PATCH] update fork (#2)

* no copy when loading astropy
* typo
* Bumping to 3.4.4
* Package license and requirements files
* Bump to 3.4.5
* Added some tests to Travis CI and AppVeyor that check if hickle can be properly packaged up, distributed and installed.
* And maybe I should not forget twine in the requirements_test.txt.
* Moved the tests directory from root to ./hickle.
* Added missing tests folder to the MANIFEST.in file.
* Add dict-type permanently to types_dict.
* Subclasses of supported types can now be pickled as well (although not yet with their proper type).
* Removed all cases of saving dtypes as a single element list.
* Renamed the 'type' attribute to 'base_type' in preparation for adding subclass support.
* Also make sure that strings are saved as single elements.
* The types_dict now uses tuples with the create-functions and the hkl_dtype key.
* All create-functions now take an extra 'base_type' string, that describes what the hkl_type is that will be saved to HDF5.
* Groups also obtain their base_type from the create_dataset_lookup()-function now.
* The actual type of a hickled object is now saved as well (in pickled form).
* Finalized implementing support for subclasses.
* Coveralls -> codecov.io
* Add codecov.io badge
* The order of the dict item keys are now saved as well. Any dict that is loaded will be initialized with the items sorted in that order. For all types that derive from dict, the dict will be initialized using its true type directly (apparently, I had written it that way before already for some reason). This fixes #65.
* Hickle's required HDF5 attributes are now solely applied to the data group that contains the hickled Python object, instead of the entire file (this allows for hickled objects to be more easily added to already existing HDF5-files, without messing up the root of that file).
* Datasets and groups now solely use 'group_xxx' if there is more than a single item at that level. All 'create_xxx' functions are now passed a string containing the name of the group/dataset that they should create.
* Added forgotten base_key_type attribute to PyContainer.
* Reverted working tree back to before using 'track_order'. Added missing 'six' requirement. Added a test for testing the dumping and loading of an OrderedDict.
* The root of a hickled group is no longer read now, as it is not necessary. Removed the auxiliary attributes that were required for reading it. The true type of a dict key is now saved as well, even though it is not used for anything (simply saving it now in case we want to use it later).
* The version is now stored using a single Python file, whose strings are read using regex.
* HDF5 groups can now be given as a file_obj as well when dumping and loading. Providing a path that does not start with '/' will automatically add it to it. Added tests for these functionalities.
* Arbitrary-precision integers can now be dumped and loaded properly.
* Also make sure that 'long' dtypes are taken into account on Python 2.
* make hickle work with pathlib.Path Basically, any package /module that saves to file supports this too (including `h5py`).
* make Python 2 compatible
* Changed wording.
* Added six requirement, and added minimum versions for all requirements.
* Now 'dill' is always used for dumping and loading serialized objects. Added a test for dumping and loading a local session function.
* Add support for actually loading serialized data in Python 2.
* Add new test to main as well.
* Make sure new changes are also used for Python 2
* Update file_opener re #123
* Fixed documentation of `dump` and `load` to be NumPy doc style (and look a bit better). Replaced broken `pickle` documentation link with a proper one.
* Only lists and tuples are now recognized as acceptable iterables. All other iterables are either handled separately, or simply pickled.
* Changed the lookup system to use object typing for more consistency.
* Added test for detecting the problem raised in telegraphic/hickle#125
* Added support for hickling dicts with slashes in their dict keys.
* Make sure that functional backslashes still work properly in dict keys.
* Loaders are now only loaded when they are required for dumping or loading a specific object.
* Make sure to do proper future import.
* Raise an error if a dict item key contains a double backslash.
* Only filter out import errors due to the loader not being found.
* As Python 2 apparently only reports the last part of the name of a non-importable module, search for something a bit more specific.
* Some small QoL changes.
* The py_type of a pickled object is no longer saved to HDF5, as it is not necessary to restore the object's original state.
* Removed legacy support for v1 and v2. Added start of legacy support for v3. v4 now stores its HDF5 attributes using a 'HICKLE_' prefix, to allow users to add attributes to the group without interference.
* Objects can now be hickled to existing HDF5 groups, as long as they don't contain any datasets or groups themselves.
* Made sure that v3 always uses relative imports, to make it easier to port any functionality change from v3 to v4. (Even though I am not a fan of relative imports)
* The version is now stored using a single Python file, whose strings are read using regex.
* Backported change to version store location to v3 as well. Bumped version to 3.4.7 to include the latest changes.
* Removed support for Python 2. Added legacy support for hickle v3 (currently uses v3.4.7).
* Remove testing for Python 2.7 as well.
* Always specify the mode with which to open an HDF5-file.
* Test requirements updates.
* And make sure to always execute 'pytest'.
* Removed basically everything that has to do with Python 2.7 from v4. As legacy_v3 is expected to be able to load files made with Python 2.7, these are not changed.
* Many, many QoL changes. Converted all v4 files to be PEP8 compliant. Rewritten 'load_python3' into 'load_builtins'. The 'load_numpy' and 'load_builtins' modules are only loaded when they are required, like all other loaders. Removed the 'containers_type_dict' as the true container type is already saved anyway. Astropy's classes are now reinitialized using their true type instantly. Astropy constants can now be properly dumped and loaded.
* Save types of dict keys as a normal string as well.
* Some minor improvements.
* Added test for opening binary files, and make sure that any 'b' is removed from the file mode. (#131)
* Added pytests for all uncovered lines. Removed lines that are never used (and thus cannot be covered). Added 'NoneType' to the dict of acceptable dict key types, as Nones can be used as keys.
* Replaced all instances of 'a HDF' with 'an HDF'.
* Removed the index.md file in the docs, as it is now simply pointing to the README.md file instead.
* Badges!
* Added few classifiers to the setup.py file.
* Update requirement for pytest.
* Removed use of 'track_times'.
* NumPy arrays with unicode strings can now be properly dumped and loaded.
* NumPy arrays containing non-NumPy objects can now be properly dumped and loaded.
* Added all missing tests for full 100% coverage!!!
* Make sure that kwargs is always passed to 'create_dataset'.
* If hickle fails to save an object to HDF5 using its standard methods, it will fall back to pickling it (and emits a warning saying why that was necessary). Simplified the way in which python scalars are hickled.
* Actually mention that the line in parentheses is the reason for serializing.
* Use proper development status identifier.
* Make sure to preserve the subclass type of a NumPy array when dumping.
* make sure that a SkyCoord object will be properly saved and retrieved when it's a scalar or N-D array
* revert the change to legacy_v3/load_astropy.py
* Updated legacy v3 to v3.4.8.

Co-authored-by: Kewei Li
Co-authored-by: Danny Price
Co-authored-by: Isuru Fernando
Co-authored-by: Ellert van der Velden
Co-authored-by: Bas Nijholt
Co-authored-by: Rui Xue
---
 .appveyor.yml | 25 +-
 .travis.yml | 11 +-
 MANIFEST.in | 14 +
 README.md | 14 +-
 docs/source/conf.py | 4 +-
 docs/source/index.md | 191 -----
 docs/source/toc.rst | 8 +-
 hickle/__init__.py | 11 +-
 hickle/__version__.py | 13 +
 hickle/helpers.py | 77 ++-
 hickle/hickle.py | 650 +++++++++---------
 hickle/hickle_legacy.py | 535 --------------
 hickle/legacy_v3/__init__.py | 2 +
 hickle/legacy_v3/__version__.py | 13 +
 hickle/legacy_v3/helpers.py | 113 +++
 .../hickle.py} | 524 ++++++--------
 hickle/legacy_v3/loaders/__init__.py | 1 +
 hickle/legacy_v3/loaders/load_astropy.py | 237 +++++++
 hickle/legacy_v3/loaders/load_numpy.py | 145 ++++
 hickle/legacy_v3/loaders/load_pandas.py | 4 +
 .../{ => legacy_v3}/loaders/load_python3.py | 7 +-
 hickle/legacy_v3/loaders/load_scipy.py | 92 +++
 hickle/legacy_v3/lookup.py | 238 +++++++
 hickle/loaders/__init__.py | 1 -
 hickle/loaders/load_astropy.py | 274 ++++----
 hickle/loaders/load_builtins.py | 169 +++++
 hickle/loaders/load_numpy.py | 173 ++---
 hickle/loaders/load_pandas.py | 3 +-
 hickle/loaders/load_python.py | 141 ----
 hickle/loaders/load_scipy.py | 111 +--
 hickle/lookup.py | 291 ++++----
 {tests => hickle/tests}/__init__.py | 0
 .../legacy_hkls/generate_test_hickle.py | 0
 .../tests/legacy_hkls/hickle_3_4_8.hkl | Bin 16072 -> 16504 bytes
 {tests => hickle/tests}/test_astropy.py | 69 +-
 {tests => hickle/tests}/test_hickle.py | 608 +++++++++-------
 hickle/tests/test_hickle_helpers.py | 49 ++
 hickle/tests/test_legacy_load.py | 37 +
 {tests => hickle/tests}/test_scipy.py | 14 +-
 requirements.txt | 7 +-
 requirements_test.txt | 14 +-
 setup.cfg | 9 +-
 setup.py | 40 +-
 tests/legacy_hkls/hickle_1_1_0.hkl | Bin 7768 -> 0 bytes
 tests/legacy_hkls/hickle_1_3_2.hkl | Bin 7768 -> 0 bytes
 tests/legacy_hkls/hickle_1_4_0.hkl | Bin 7768 -> 0 bytes
 tests/legacy_hkls/hickle_2_1_0.hkl | Bin 16072 -> 0 bytes
 tests/test_hickle_helpers.py | 63 --
 tests/test_legacy_load.py | 30 -
 49 files changed, 2668 insertions(+), 2364 deletions(-)
 create mode 100644 MANIFEST.in
 delete mode 100644 docs/source/index.md
 create mode 100644 hickle/__version__.py
 delete mode 100644 hickle/hickle_legacy.py
 create mode 100644 hickle/legacy_v3/__init__.py
 create mode 100644 hickle/legacy_v3/__version__.py
 create mode 100644 hickle/legacy_v3/helpers.py
 rename hickle/{hickle_legacy2.py => legacy_v3/hickle.py} (53%)
 create mode 100644 hickle/legacy_v3/loaders/__init__.py
 create mode 100644 hickle/legacy_v3/loaders/load_astropy.py
 create mode 100644 hickle/legacy_v3/loaders/load_numpy.py
 create
mode 100644 hickle/legacy_v3/loaders/load_pandas.py rename hickle/{ => legacy_v3}/loaders/load_python3.py (97%) create mode 100644 hickle/legacy_v3/loaders/load_scipy.py create mode 100644 hickle/legacy_v3/lookup.py create mode 100644 hickle/loaders/load_builtins.py delete mode 100644 hickle/loaders/load_python.py rename {tests => hickle/tests}/__init__.py (100%) rename {tests => hickle/tests}/legacy_hkls/generate_test_hickle.py (100%) rename tests/legacy_hkls/hickle_2_0_5.hkl => hickle/tests/legacy_hkls/hickle_3_4_8.hkl (67%) rename {tests => hickle/tests}/test_astropy.py (68%) rename {tests => hickle/tests}/test_hickle.py (56%) create mode 100644 hickle/tests/test_hickle_helpers.py create mode 100644 hickle/tests/test_legacy_load.py rename {tests => hickle/tests}/test_scipy.py (89%) delete mode 100644 tests/legacy_hkls/hickle_1_1_0.hkl delete mode 100644 tests/legacy_hkls/hickle_1_3_2.hkl delete mode 100644 tests/legacy_hkls/hickle_1_4_0.hkl delete mode 100644 tests/legacy_hkls/hickle_2_1_0.hkl delete mode 100644 tests/test_hickle_helpers.py delete mode 100644 tests/test_legacy_load.py diff --git a/.appveyor.yml b/.appveyor.yml index 49e367f1..3cf76822 100644 --- a/.appveyor.yml +++ b/.appveyor.yml @@ -1,13 +1,5 @@ environment: matrix: - - PYTHON: "C:\\Python27" - PYTHON_VERSION: "2.7.x" - PYTHON_ARCH: "32" - - - PYTHON: "C:\\Python27-x64" - PYTHON_VERSION: "2.7.x" - PYTHON_ARCH: "64" - - PYTHON: "C:\\Python35" PYTHON_VERSION: "3.5.x" PYTHON_ARCH: "32" @@ -32,6 +24,14 @@ environment: PYTHON_VERSION: "3.7.x" PYTHON_ARCH: "64" + - PYTHON: "C:\\Python38" + PYTHON_VERSION: "3.8.x" + PYTHON_ARCH: "32" + + - PYTHON: "C:\\Python38-x64" + PYTHON_VERSION: "3.8.x" + PYTHON_ARCH: "64" + install: # Prepend newly installed Python to the PATH of this build (this cannot be # done from inside the powershell script as it would require to restart @@ -41,11 +41,14 @@ install: # Upgrade pip - "python -m pip install --user --upgrade pip setuptools wheel" - # Install testing requirements and hickle + # Install testing requirements - "pip install -r requirements_test.txt" - - "pip install ." build: false test_script: - - "python setup.py test" + - "check-manifest" + - "python setup.py sdist bdist_wheel" + - "twine check dist/*" + - "pip install ." + - "pytest" diff --git a/.travis.yml b/.travis.yml index 865c1c47..42323a8a 100644 --- a/.travis.yml +++ b/.travis.yml @@ -2,10 +2,10 @@ language: python dist: xenial python: - - "2.7" - "3.5" - "3.6" - "3.7" + - "3.8" # command to install dependencies install: @@ -13,10 +13,13 @@ install: - sudo apt-get install -qq libhdf5-serial-dev - python -m pip install --upgrade pip setuptools wheel - pip install -r requirements_test.txt - - pip install . script: - - python setup.py test + - check-manifest + - python setup.py sdist bdist_wheel + - twine check dist/* + - pip install . 
+ - pytest # Run code coverage -after_success: coveralls +after_success: codecov diff --git a/MANIFEST.in b/MANIFEST.in new file mode 100644 index 00000000..01c0ebf8 --- /dev/null +++ b/MANIFEST.in @@ -0,0 +1,14 @@ +include LICENSE +include *.md +include MANIFEST.in +include requirements*.txt +recursive-include hickle tests * + +exclude docs +recursive-exclude docs * +exclude *.yml +exclude .nojekyll +exclude .pylintrc +exclude paper* +recursive-exclude * __pycache__ +recursive-exclude * *.py[co] diff --git a/README.md b/README.md index bda2a34a..fcf171d8 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,16 @@ -[![Build Status](https://travis-ci.org/telegraphic/hickle.svg?branch=master)](https://travis-ci.org/telegraphic/hickle) -[![Build status](https://ci.appveyor.com/api/projects/status/8cwrkjpwxet5jmgp?svg=true)](https://ci.appveyor.com/project/telegraphic/hickle) +[![PyPI - Latest Release](https://img.shields.io/pypi/v/hickle.svg?logo=pypi&logoColor=white&label=PyPI)](https://pypi.python.org/pypi/hickle) +[![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/hickle.svg?logo=python&logoColor=white&label=Python)](https://pypi.python.org/pypi/hickle) +[![Travis CI - Build Status](https://img.shields.io/travis/com/telegraphic/hickle/master.svg?logo=travis%20ci&logoColor=white&label=Travis%20CI)](https://travis-ci.com/telegraphic/hickle) +[![AppVeyor - Build Status](https://img.shields.io/appveyor/ci/telegraphic/hickle/master.svg?logo=appveyor&logoColor=white&label=AppVeyor)](https://ci.appveyor.com/project/telegraphic/hickle) +[![CodeCov - Coverage Status](https://img.shields.io/codecov/c/github/telegraphic/hickle/master.svg?logo=codecov&logoColor=white&label=Coverage)](https://codecov.io/gh/telegraphic/hickle/branches/master) [![JOSS Status](http://joss.theoj.org/papers/0c6638f84a1a574913ed7c6dd1051847/status.svg)](http://joss.theoj.org/papers/0c6638f84a1a574913ed7c6dd1051847) Hickle ====== -Hickle is a [HDF5](https://www.hdfgroup.org/solutions/hdf5/) based clone of `pickle`, with a twist: instead of serializing to a pickle file, -Hickle dumps to a HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is +Hickle is an [HDF5](https://www.hdfgroup.org/solutions/hdf5/) based clone of `pickle`, with a twist: instead of serializing to a pickle file, +Hickle dumps to an HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is really an amalgam of `h5py` and `dill`/`pickle` with extended functionality. That is: `hickle` is a neat little way of dumping python variables to HDF5 files that can be read in most programming @@ -95,6 +98,7 @@ These file-level options are abstracted away from the data model. Recent changes -------------- +* June 2020: Major refactor to version 4, and removal of support for Python 2. * December 2018: Accepted to Journal of Open-Source Software (JOSS). * June 2018: Major refactor and support for Python 3. * Aug 2016: Added support for scipy sparse matrices `bsr_matrix`, `csr_matrix` and `csc_matrix`. @@ -153,7 +157,7 @@ Install with `pip` by running `pip install hickle` from the command line. ### Manual install -1. You should have Python 2.7 and above installed +1. You should have Python 3.5 and above installed 2. 
Install h5py (Official page: http://docs.h5py.org/en/latest/build.html) diff --git a/docs/source/conf.py b/docs/source/conf.py index 2321175c..8b803a5d 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -23,7 +23,7 @@ # -- Project information ----------------------------------------------------- project = u'hickle' -copyright = u'2018, Danny Price' +copyright = u'2018-2020, Danny Price, Ellert van der Velden and contributors' author = u'Danny Price' # The short X.Y version @@ -60,7 +60,7 @@ #source_suffix = '.rst' # The master toctree document. -master_doc = 'index' +master_doc = '../../README' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. diff --git a/docs/source/index.md b/docs/source/index.md deleted file mode 100644 index 099b4c01..00000000 --- a/docs/source/index.md +++ /dev/null @@ -1,191 +0,0 @@ -[![Build Status](https://travis-ci.org/telegraphic/hickle.svg?branch=master)](https://travis-ci.org/telegraphic/hickle) -[![JOSS Status](http://joss.theoj.org/papers/0c6638f84a1a574913ed7c6dd1051847/status.svg)](http://joss.theoj.org/papers/0c6638f84a1a574913ed7c6dd1051847) - - -Hickle -====== - -Hickle is a [HDF5](https://www.hdfgroup.org/solutions/hdf5/) based clone of `pickle`, with a twist: instead of serializing to a pickle file, -Hickle dumps to a HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is -really an amalgam of `h5py` and `dill`/`pickle` with extended functionality. - -That is: `hickle` is a neat little way of dumping python variables to HDF5 files that can be read in most programming -languages, not just Python. Hickle is fast, and allows for transparent compression of your data (LZF / GZIP). - -Why use Hickle? ---------------- - -While `hickle` is designed to be a drop-in replacement for `pickle` (or something like `json`), it works very differently. -Instead of serializing / json-izing, it instead stores the data using the excellent [h5py](https://www.h5py.org/) module. - -The main reasons to use hickle are: - - 1. It's faster than pickle and cPickle. - 2. It stores data in HDF5. - 3. You can easily compress your data. - -The main reasons not to use hickle are: - - 1. You don't want to store your data in HDF5. While hickle can serialize arbitrary python objects, this functionality is provided only for convenience, and you're probably better off just using the pickle module. - 2. You want to convert your data in human-readable JSON/YAML, in which case, you should do that instead. - -So, if you want your data in HDF5, or if your pickling is taking too long, give hickle a try. -Hickle is particularly good at storing large numpy arrays, thanks to `h5py` running under the hood. - -Documentation -------------- - -Documentation for hickle can be found at [telegraphic.github.io/hickle/](http://telegraphic.github.io/hickle/). - - -Usage example -------------- - -Hickle is nice and easy to use, and should look very familiar to those of you who have pickled before. - -In short, `hickle` provides two methods: a [hickle.load](http://telegraphic.github.io/hickle/toc.html#hickle.load) -method, for loading hickle files, and a [hickle.dump](http://telegraphic.github.io/hickle/toc.html#hickle.dump) -method, for dumping data into HDF5. 
Here's a complete example: - -```python -import os -import hickle as hkl -import numpy as np - -# Create a numpy array of data -array_obj = np.ones(32768, dtype='float32') - -# Dump to file -hkl.dump(array_obj, 'test.hkl', mode='w') - -# Dump data, with compression -hkl.dump(array_obj, 'test_gzip.hkl', mode='w', compression='gzip') - -# Compare filesizes -print('uncompressed: %i bytes' % os.path.getsize('test.hkl')) -print('compressed: %i bytes' % os.path.getsize('test_gzip.hkl')) - -# Load data -array_hkl = hkl.load('test_gzip.hkl') - -# Check the two are the same file -assert array_hkl.dtype == array_obj.dtype -assert np.all((array_hkl, array_obj)) -``` - -### HDF5 compression options - -A major benefit of `hickle` over `pickle` is that it allows fancy HDF5 features to -be applied, by passing on keyword arguments on to `h5py`. So, you can do things like: - ```python - hkl.dump(array_obj, 'test_lzf.hkl', mode='w', compression='lzf', scaleoffset=0, - chunks=(100, 100), shuffle=True, fletcher32=True) - ``` -A detailed explanation of these keywords is given at http://docs.h5py.org/en/latest/high/dataset.html, -but we give a quick rundown below. - -In HDF5, datasets are stored as B-trees, a tree data structure that has speed benefits over contiguous -blocks of data. In the B-tree, data are split into [chunks](http://docs.h5py.org/en/latest/high/dataset.html#chunked-storage), -which is leveraged to allow [dataset resizing](http://docs.h5py.org/en/latest/high/dataset.html#resizable-datasets) and -compression via [filter pipelines](http://docs.h5py.org/en/latest/high/dataset.html#filter-pipeline). Filters such as -`shuffle` and `scaleoffset` move your data around to improve compression ratios, and `fletcher32` computes a checksum. -These file-level options are abstracted away from the data model. - -Recent changes --------------- - -* December 2018: Accepted to Journal of Open-Source Software (JOSS). -* June 2018: Major refactor and support for Python 3. -* Aug 2016: Added support for scipy sparse matrices `bsr_matrix`, `csr_matrix` and `csc_matrix`. - -Performance comparison ----------------------- - -Hickle runs a lot faster than pickle with its default settings, and a little faster than pickle with `protocol=2` set: - -```Python -In [1]: import numpy as np - -In [2]: x = np.random.random((2000, 2000)) - -In [3]: import pickle - -In [4]: f = open('foo.pkl', 'w') - -In [5]: %time pickle.dump(x, f) # slow by default -CPU times: user 2 s, sys: 274 ms, total: 2.27 s -Wall time: 2.74 s - -In [6]: f = open('foo.pkl', 'w') - -In [7]: %time pickle.dump(x, f, protocol=2) # actually very fast -CPU times: user 18.8 ms, sys: 36 ms, total: 54.8 ms -Wall time: 55.6 ms - -In [8]: import hickle - -In [9]: f = open('foo.hkl', 'w') - -In [10]: %time hickle.dump(x, f) # a bit faster -dumping to file -CPU times: user 764 µs, sys: 35.6 ms, total: 36.4 ms -Wall time: 36.2 ms -``` - -So if you do continue to use pickle, add the `protocol=2` keyword (thanks @mrocklin for pointing this out). - -For storing python dictionaries of lists, hickle beats the python json encoder, but is slower than uJson. For a dictionary with 64 entries, each containing a 4096 length list of random numbers, the times are: - - - json took 2633.263 ms - uJson took 138.482 ms - hickle took 232.181 ms - - -It should be noted that these comparisons are of course not fair: storing in HDF5 will not help you convert something into JSON, nor will it help you serialize a string. 
But for quick storage of the contents of a python variable, it's a pretty good option. - -Installation guidelines ------------------------ - -### Easy method -Install with `pip` by running `pip install hickle` from the command line. - -### Manual install - -1. You should have Python 2.7 and above installed - -2. Install h5py -(Official page: http://docs.h5py.org/en/latest/build.html) - -3. Install hdf5 -(Official page: http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL) - -4. Download `hickle`: -via terminal: git clone https://github.com/telegraphic/hickle.git -via manual download: Go to https://github.com/telegraphic/hickle and on right hand side you will find `Download ZIP` file - -5. cd to your downloaded `hickle` directory - -6. Then run the following command in the `hickle` directory: - `python setup.py install` - -### Testing - -Once installed from source, run `python setup.py test` to check it's all working. - - -Bugs & contributing --------------------- - -Contributions and bugfixes are very welcome. Please check out our [contribution guidelines](https://github.com/telegraphic/hickle/blob/master/CONTRIBUTING.md) -for more details on how to contribute to development. - - -Referencing hickle ------------------- - -If you use `hickle` in academic research, we would be grateful if you could reference [our paper](http://joss.theoj.org/papers/0c6638f84a1a574913ed7c6dd1051847) in the [Journal of Open-Source Software (JOSS)](http://joss.theoj.org/about). - -``` -Price et al., (2018). Hickle: A HDF5-based python pickle replacement. Journal of Open Source Software, 3(32), 1115, https://doi.org/10.21105/joss.01115 -``` diff --git a/docs/source/toc.rst b/docs/source/toc.rst index 59184598..ac824dcd 100644 --- a/docs/source/toc.rst +++ b/docs/source/toc.rst @@ -7,12 +7,12 @@ Welcome to hickle's documentation! ================================== -Hickle is a HDF5-based clone of `pickle`, with a twist: instead of serializing to a pickle file, -Hickle dumps to a HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is +Hickle is an HDF5-based clone of `pickle`, with a twist: instead of serializing to a pickle file, +Hickle dumps to an HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is really an amalgam of `h5py` and `dill`/`pickle` with extended functionality. -That is: `hickle` is a neat little way of dumping python variables to HDF5 files that can be read in most programming -languages, not just Python. Hickle is fast, and allows for transparent compression of your data (LZF / GZIP). +That is: `hickle` is a neat little way of dumping python variables to HDF5 files that can be read in most programming +languages, not just Python. Hickle is fast, and allows for transparent compression of your data (LZF / GZIP). diff --git a/hickle/__init__.py b/hickle/__init__.py index 46e2ea2c..03d475e0 100644 --- a/hickle/__init__.py +++ b/hickle/__init__.py @@ -1,4 +1,11 @@ -from .hickle import dump, load -from .hickle import __version__ +# hickle imports +from .__version__ import __version__ +from . 
import hickle +from .hickle import * +# All declaration +__all__ = ['hickle'] +__all__.extend(hickle.__all__) +# Author declaration +__author__ = "Danny Price, Ellert van der Velden and contributors" diff --git a/hickle/__version__.py b/hickle/__version__.py new file mode 100644 index 00000000..087fa0e3 --- /dev/null +++ b/hickle/__version__.py @@ -0,0 +1,13 @@ +# -*- coding: utf-8 -*- + +""" +Hickle Version +============== +Stores the different versions of the *hickle* package. + +""" + + +# %% VERSIONS +# Default/Latest/Current version +__version__ = '4.0.1' diff --git a/hickle/helpers.py b/hickle/helpers.py index b7e0034e..95e66ed1 100644 --- a/hickle/helpers.py +++ b/hickle/helpers.py @@ -1,20 +1,28 @@ +# %% IMPORTS +# Built-in imports import re -import six + +# Package imports +import dill as pickle + + +# %% FUNCTION DEFINITIONS +def get_type(h_node): + """ Helper function to return the py_type for an HDF node """ + base_type = h_node.attrs['base_type'] + if base_type != b'pickle': + py_type = pickle.loads(h_node.attrs['type']) + else: + py_type = None + return py_type, base_type + def get_type_and_data(h_node): - """ Helper function to return the py_type and data block for a HDF node """ - py_type = h_node.attrs["type"][0] + """ Helper function to return the py_type and data block for an HDF node""" + py_type, base_type = get_type(h_node) data = h_node[()] -# if h_node.shape == (): -# data = h_node.value -# else: -# data = h_node[:] - return py_type, data + return py_type, base_type, data -def get_type(h_node): - """ Helper function to return the py_type for a HDF node """ - py_type = h_node.attrs["type"][0] - return py_type def sort_keys(key_list): """ Take a list of strings and sort it by integer value within string @@ -28,27 +36,26 @@ def sort_keys(key_list): # Py3 h5py returns an irritating KeysView object # Py3 also complains about bytes and strings, convert all keys to bytes - if six.PY3: - key_list2 = [] - for key in key_list: - if isinstance(key, str): - key = bytes(key, 'ascii') - key_list2.append(key) - key_list = key_list2 + key_list2 = [] + for key in key_list: + if isinstance(key, str): + key = bytes(key, 'ascii') + key_list2.append(key) + key_list = key_list2 # Check which keys contain a number numbered_keys = [re.search(br'\d+', key) for key in key_list] # Sort the keys on number if they have it, or normally if not if(len(key_list) and not numbered_keys.count(None)): - to_int = lambda x: int(re.search(br'\d+', x).group(0)) - return(sorted(key_list, key=to_int)) + return(sorted(key_list, + key=lambda x: int(re.search(br'\d+', x).group(0)))) else: return(sorted(key_list)) def check_is_iterable(py_obj): - """ Check whether a python object is iterable. + """ Check whether a python object is a built-in iterable. Note: this treats unicode and string as NON ITERABLE @@ -58,17 +65,9 @@ def check_is_iterable(py_obj): Returns: iter_ok (bool): True if item is iterable, False is item is not """ - if six.PY2: - string_types = (str, unicode) - else: - string_types = (str, bytes, bytearray) - if isinstance(py_obj, string_types): - return False - try: - iter(py_obj) - return True - except TypeError: - return False + + # Check if py_obj is an accepted iterable and return + return(isinstance(py_obj, (tuple, list, set))) def check_is_hashable(py_obj): @@ -96,18 +95,22 @@ def check_iterable_item_type(iter_obj): Returns: iter_type: type of item contained within the iterable. If - the iterable has many types, a boolean False is returned instead. 
+ the iterable has many types, a boolean False is returned instead. References: - http://stackoverflow.com/questions/13252333/python-check-if-all-elements-of-a-list-are-the-same-type + http://stackoverflow.com/questions/13252333 """ + iseq = iter(iter_obj) try: first_type = type(next(iseq)) except StopIteration: return False - except Exception as ex: + except Exception: # pragma: no cover return False else: - return first_type if all((type(x) is first_type) for x in iseq) else False + if all([type(x) is first_type for x in iseq]): + return(first_type) + else: + return(False) diff --git a/hickle/hickle.py b/hickle/hickle.py index 24b38c3e..5b13d1bf 100644 --- a/hickle/hickle.py +++ b/hickle/hickle.py @@ -4,9 +4,9 @@ Created by Danny Price 2016-02-03. -Hickle is a HDF5 based clone of Pickle. Instead of serializing to a pickle -file, Hickle dumps to a HDF5 file. It is designed to be as similar to pickle in -usage as possible, providing a load() and dump() function. +Hickle is an HDF5 based clone of Pickle. Instead of serializing to a pickle +file, Hickle dumps to an HDF5 file. It is designed to be as similar to pickle +in usage as possible, providing a load() and dump() function. ## Notes @@ -22,202 +22,138 @@ """ -from __future__ import absolute_import, division, print_function -import sys -import os -from pkg_resources import get_distribution, DistributionNotFound -from ast import literal_eval - -import numpy as np -import h5py as h5 - - -from .helpers import get_type, sort_keys, check_is_iterable, check_iterable_item_type -from .lookup import types_dict, hkl_types_dict, types_not_to_sort, \ - container_types_dict, container_key_types_dict -from .lookup import check_is_ndarray_like - -try: - from exceptions import Exception - from types import NoneType -except ImportError: - pass # above imports will fail in python3 - -from six import PY2, PY3, string_types, integer_types +# %% IMPORTS +# Built-in imports import io +from pathlib import Path +import sys +import warnings -# Make several aliases for Python2/Python3 compatibility -if PY3: - file = io.TextIOWrapper +# Package imports +import dill as pickle +import h5py as h5 +import numpy as np -# Import a default 'pickler' -# Not the nicest import code, but should work on Py2/Py3 -try: - import dill as pickle -except ImportError: - try: - import cPickle as pickle - except ImportError: - import pickle +# hickle imports +from hickle import __version__ +from hickle.helpers import ( + get_type, sort_keys, check_is_iterable, check_iterable_item_type) +from hickle.lookup import ( + types_dict, hkl_types_dict, types_not_to_sort, dict_key_types_dict, + check_is_ndarray_like, load_loader) -import warnings +# All declaration +__all__ = ['dump', 'load'] -try: - __version__ = get_distribution('hickle').version -except DistributionNotFound: - __version__ = '0.0.0 - please install via pip/setup.py' +# %% CLASS DEFINITIONS ################## # Error handling # ################## class FileError(Exception): """ An exception raised if the file is fishy """ - def __init__(self): - return - - def __str__(self): - return ("Cannot open file. Please pass either a filename " - "string, a file object, or a h5py.File") + pass class ClosedFileError(Exception): """ An exception raised if the file is fishy """ - def __init__(self): - return - - def __str__(self): - return ("HDF5 file has been closed. 
Please pass either " - "a filename string, a file object, or an open h5py.File") - - -class NoMatchError(Exception): - """ An exception raised if the object type is not understood (or - supported)""" - def __init__(self): - return - - def __str__(self): - return ("Error: this type of python object cannot be converted into a " - "hickle.") + pass -class ToDoError(Exception): +class ToDoError(Exception): # pragma: no cover """ An exception raised for non-implemented functionality""" - def __init__(self): - return - def __str__(self): return "Error: this functionality hasn't been implemented yet." -class SerializedWarning(UserWarning): - """ An object type was not understood - - The data will be serialized using pickle. - """ - pass - - -###################### -# H5PY file wrappers # -###################### - -class H5GroupWrapper(h5.Group): - """ Group wrapper that provides a track_times kwarg. - - track_times is a boolean flag that can be set to False, so that two - files created at different times will have identical MD5 hashes. - """ - def create_dataset(self, *args, **kwargs): - kwargs['track_times'] = getattr(self, 'track_times', True) - return super(H5GroupWrapper, self).create_dataset(*args, **kwargs) - - def create_group(self, *args, **kwargs): - group = super(H5GroupWrapper, self).create_group(*args, **kwargs) - group.__class__ = H5GroupWrapper - group.track_times = getattr(self, 'track_times', True) - return group - - -class H5FileWrapper(h5.File): - """ Wrapper for h5py File that provides a track_times kwarg. - - track_times is a boolean flag that can be set to False, so that two - files created at different times will have identical MD5 hashes. +# %% FUNCTION DEFINITIONS +def file_opener(f, path, mode='r'): """ - def create_dataset(self, *args, **kwargs): - kwargs['track_times'] = getattr(self, 'track_times', True) - return super(H5FileWrapper, self).create_dataset(*args, **kwargs) - - def create_group(self, *args, **kwargs): - group = super(H5FileWrapper, self).create_group(*args, **kwargs) - group.__class__ = H5GroupWrapper - group.track_times = getattr(self, 'track_times', True) - return group - - -def file_opener(f, mode='r', track_times=True): - """ A file opener helper function with some error handling. This can open - files through a file object, a h5py file, or just the filename. - - Args: - f (file, h5py.File, or string): File-identifier, e.g. filename or file object. - mode (str): File open mode. Only required if opening by filename string. - track_times (bool): Track time in HDF5; turn off if you want hickling at - different times to produce identical files (e.g. for MD5 hash check). + A file opener helper function with some error handling. + This can open files through a file object, an h5py file, or just the + filename. + + Parameters + ---------- + f : file object, str or :obj:`~h5py.Group` object + File to open for dumping or loading purposes. + If str, `file_obj` provides the path of the HDF5-file that must be + used. + If :obj:`~h5py._hl.group.Group`, the group (or file) in an open + HDF5-file that must be used. + path : str + Path within HDF5-file or group to dump to/load from. + mode : str, optional + Accepted values are 'r' (read only), 'w' (write; default) or 'a' + (append). + Ignored if file is a file object. 
""" # Assume that we will have to close the file after dump or load close_flag = True + # Make sure that the given path always starts with '/' + if not path.startswith('/'): + path = '/%s' % (path) + # Were we handed a file object or just a file name string? - if isinstance(f, (file, io.TextIOWrapper)): + if isinstance(f, (io.TextIOWrapper, io.BufferedWriter)): filename, mode = f.name, f.mode f.close() + mode = mode.replace('b', '') h5f = h5.File(filename, mode) - elif isinstance(f, string_types): + elif isinstance(f, (str, Path)): filename = f h5f = h5.File(filename, mode) - elif isinstance(f, (H5FileWrapper, h5._hl.files.File)): + elif isinstance(f, h5._hl.group.Group): try: - filename = f.filename + filename = f.file.filename except ValueError: - raise ClosedFileError - h5f = f + raise ClosedFileError("HDF5 file has been closed. Please pass " + "either a filename string, a file object, or" + "an open HDF5-file") + path = ''.join([f.name, path]) + h5f = f.file + + if path.endswith('/'): + path = path[:-1] + # Since this file was already open, do not close the file afterward close_flag = False + else: print(f.__class__) - raise FileError + raise FileError("Cannot open file. Please pass either a filename " + "string, a file object, or a h5py.File") - h5f.__class__ = H5FileWrapper - h5f.track_times = track_times - return(h5f, close_flag) + return(h5f, path, close_flag) ########### # DUMPERS # ########### +# Get list of dumpable dtypes +dumpable_dtypes = [bool, complex, bytes, float, int, str] -def _dump(py_obj, h_group, call_id=0, **kwargs): - """ Dump a python object to a group within a HDF5 file. + +def _dump(py_obj, h_group, call_id=None, **kwargs): + """ Dump a python object to a group within an HDF5 file. This function is called recursively by the main dump() function. Args: py_obj: python object to dump. h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - # Get list of dumpable dtypes - dumpable_dtypes = [] - for lst in [[bool, complex, bytes, float], string_types, integer_types]: - dumpable_dtypes.extend(lst) + # Check if we have a unloaded loader for the provided py_obj + load_loader(py_obj) # Firstly, check if item is a numpy array. If so, just dump it. if check_is_ndarray_like(py_obj): @@ -232,14 +168,14 @@ def _dump(py_obj, h_group, call_id=0, **kwargs): item_type = check_iterable_item_type(py_obj) # item_type == False implies multiple types. Create a dataset - if item_type is False: + if not item_type: h_subgroup = create_hkl_group(py_obj, h_group, call_id) for ii, py_subobj in enumerate(py_obj): _dump(py_subobj, h_subgroup, call_id=ii, **kwargs) # otherwise, subitems have same type. Check if subtype is an iterable - # (e.g. list of lists), or not (e.g. list of ints, which should be treated - # as a single dataset). + # (e.g. list of lists), or not (e.g. list of ints, which should be + # treated as a single dataset). else: if item_type in dumpable_dtypes: create_hkl_dataset(py_obj, h_group, call_id, **kwargs) @@ -253,21 +189,31 @@ def _dump(py_obj, h_group, call_id=0, **kwargs): create_hkl_dataset(py_obj, h_group, call_id, **kwargs) -def dump(py_obj, file_obj, mode='w', track_times=True, path='/', **kwargs): - """ Write a pickled representation of obj to the open file object file. +def dump(py_obj, file_obj, mode='w', path='/', **kwargs): + """ + Write a hickled representation of `py_obj` to the provided `file_obj`. 
+ + Parameters + ---------- + py_obj : object + Python object to hickle to HDF5. + file_obj : file object, str or :obj:`~h5py.Group` object + File in which to store the object. + If str, `file_obj` provides the path of the HDF5-file that must be + used. + If :obj:`~h5py._hl.group.Group`, the group (or file) in an open + HDF5-file that must be used. + mode : str, optional + Accepted values are 'r' (read only), 'w' (write; default) or 'a' + (append). + Ignored if file is a file object. + path : str, optional + Path within HDF5-file or group to save data to. + Defaults to root ('/'). + kwargs : keyword arguments + Additional keyword arguments that must be provided to the + :meth:`~h5py._hl.group.Group.create_dataset` method. - Args: - obj (object): python object o store in a Hickle - file: file object, filename string, or h5py.File object - file in which to store the object. A h5py.File or a filename is also - acceptable. - mode (str): optional argument, 'r' (read only), 'w' (write) or 'a' (append). - Ignored if file is a file object. - compression (str): optional argument. Applies compression to dataset. Options: None, gzip, - lzf (+ szip, if installed) - track_times (bool): optional argument. If set to False, repeated hickling will produce - identical files. - path (str): path within hdf5 file to save data to. Defaults to root / """ # Make sure that file is not closed unless modified @@ -276,31 +222,28 @@ def dump(py_obj, file_obj, mode='w', track_times=True, path='/', **kwargs): try: # Open the file - h5f, close_flag = file_opener(file_obj, mode, track_times) - h5f.attrs["CLASS"] = b'hickle' - h5f.attrs["VERSION"] = get_distribution('hickle').version - h5f.attrs["type"] = [b'hickle'] + h5f, path, close_flag = file_opener(file_obj, path, mode) + # Log which version of python was used to generate the hickle file pv = sys.version_info py_ver = "%i.%i.%i" % (pv[0], pv[1], pv[2]) - h5f.attrs["PYTHON_VERSION"] = py_ver - - h_root_group = h5f.get(path) - if h_root_group is None: + # Try to create the root group + try: h_root_group = h5f.create_group(path) - h_root_group.attrs["type"] = [b'hickle'] + + # If that is not possible, check if it is empty + except ValueError as error: + # Raise error if this group is not empty + if len(h5f[path]): + raise error + else: + h_root_group = h5f.get(path) + + h_root_group.attrs["HICKLE_VERSION"] = __version__ + h_root_group.attrs["HICKLE_PYTHON_VERSION"] = py_ver _dump(py_obj, h_root_group, **kwargs) - except NoMatchError: - fname = h5f.filename - h5f.close() - try: - os.remove(fname) - except OSError: - warnings.warn("Dump failed. Could not remove %s" % fname) - finally: - raise NoMatchError finally: # Close the file if requested. # Closing a file twice will not cause any problems @@ -309,7 +252,7 @@ def dump(py_obj, file_obj, mode='w', track_times=True, path='/', **kwargs): def create_dataset_lookup(py_obj): - """ What type of object are we trying to pickle? This is a python + """ What type of object are we trying to hickle? This is a python dictionary based equivalent of a case statement. It returns the correct helper function for a given data type. 
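As a minimal sketch of the `dump()`/`load()` API documented in the hunk above (assuming the hickle v4 behaviour this patch describes; the file names and sample data are hypothetical, not part of the patch):

```python
# Minimal sketch, assuming the v4 dump()/load() behaviour described in this
# patch; file names and sample data here are hypothetical.
import h5py
import numpy as np
import hickle as hkl

data = {'arr': np.arange(10), 'label': 'example'}

# Dump to a path inside the HDF5 file; a missing leading '/' is prepended
# automatically by file_opener().
hkl.dump(data, 'test.hkl', mode='w', path='my_data')

# Extra keyword arguments are forwarded to h5py's create_dataset(),
# e.g. to enable compression.
hkl.dump(data, 'test_gzip.hkl', mode='w', path='my_data', compression='gzip')

# An open (and still empty) h5py group can also be passed as file_obj.
with h5py.File('existing.h5', 'w') as f:
    hkl.dump(data, f.create_group('results'), path='run_1')

# Load the object back from the same path.
restored = hkl.load('test_gzip.hkl', path='my_data')
assert restored['label'] == 'example'
```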
@@ -318,47 +261,88 @@ def create_dataset_lookup(py_obj): Returns: match: function that should be used to dump data to a new dataset + base_type: the base type of the data that will be dumped """ - t = type(py_obj) - types_lookup = {dict: create_dict_dataset} - types_lookup.update(types_dict) - match = types_lookup.get(t, no_match) + # Obtain the MRO of this object + mro_list = py_obj.__class__.mro() - return match + # Create a type_map + type_map = map(types_dict.get, mro_list) + # Loop over the entire type_map until something else than None is found + for type_item in type_map: + if type_item is not None: + return(type_item) -def create_hkl_dataset(py_obj, h_group, call_id=0, **kwargs): +def create_hkl_dataset(py_obj, h_group, call_id=None, **kwargs): """ Create a dataset within the hickle HDF5 file Args: py_obj: python object to dump. h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - #lookup dataset creator type based on python object type - create_dataset = create_dataset_lookup(py_obj) + # lookup dataset creator type based on python object type + create_dataset, base_type = create_dataset_lookup(py_obj) - # do the creation - create_dataset(py_obj, h_group, call_id, **kwargs) + # Set the name of this dataset + name = 'data%s' % ("_%i" % (call_id) if call_id is not None else '') + + # Try to create the dataset + try: + h_subgroup = create_dataset(py_obj, h_group, name, **kwargs) + # If that fails, pickle the object instead + except Exception as error: + # Make sure builtins loader is loaded + load_loader(object) + + # Obtain the proper dataset creator and base type + create_dataset, base_type = types_dict[object] + + # Make sure that a group/dataset with name 'name' does not exist + try: + del h_group[name] + except Exception: + pass + # Create the pickled dataset + h_subgroup = create_dataset(py_obj, h_group, name, error, **kwargs) -def create_hkl_group(py_obj, h_group, call_id=0): + # Save base type of py_obj + h_subgroup.attrs['base_type'] = base_type + + # Save a pickled version of the true type of py_obj if necessary + if base_type != b'pickle' and 'type' not in h_subgroup.attrs: + h_subgroup.attrs['type'] = np.array(pickle.dumps(py_obj.__class__)) + + +def create_hkl_group(py_obj, h_group, call_id=None): """ Create a new group within the hickle file Args: h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - h_subgroup = h_group.create_group('data_%i' % call_id) - h_subgroup.attrs['type'] = [str(type(py_obj)).encode('ascii', 'ignore')] + + # Set the name of this group + if isinstance(call_id, str): + name = call_id + else: + name = 'data%s' % ("_%i" % (call_id) if call_id is not None else '') + + h_subgroup = h_group.create_group(name) + h_subgroup.attrs['type'] = np.array(pickle.dumps(py_obj.__class__)) + h_subgroup.attrs['base_type'] = create_dataset_lookup(py_obj)[1] return h_subgroup -def create_dict_dataset(py_obj, h_group, call_id=0, **kwargs): +def create_dict_dataset(py_obj, h_group, name, **kwargs): """ Creates a data group for each key in dictionary Notes: @@ -370,43 +354,43 @@ def create_dict_dataset(py_obj, h_group, call_id=0, **kwargs): Args: py_obj: python object to dump; should be dictionary h_group (h5.File.group): group to dump data into. 
- call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - h_dictgroup = h_group.create_group('data_%i' % call_id) - h_dictgroup.attrs['type'] = [str(type(py_obj)).encode('ascii', 'ignore')] - for key, py_subobj in py_obj.items(): - if isinstance(key, string_types): - h_subgroup = h_dictgroup.create_group("%r" % (key)) - else: - h_subgroup = h_dictgroup.create_group(str(key)) - h_subgroup.attrs["type"] = [b'dict_item'] + h_dictgroup = h_group.create_group(name) - h_subgroup.attrs["key_type"] = [str(type(key)).encode('ascii', 'ignore')] + for idx, (key, py_subobj) in enumerate(py_obj.items()): + # Obtain the raw string representation of this key + subgroup_key = "%r" % (key) - _dump(py_subobj, h_subgroup, call_id=0, **kwargs) + # Make sure that the '\\\\' is not in the key, or raise error if so + if '\\\\' in subgroup_key: + del h_group[name] + raise ValueError("Dict item keys containing the '\\\\' string are " + "not supported!") + # Replace any forward slashes with double backslashes + subgroup_key = subgroup_key.replace('/', '\\\\') + h_subgroup = h_dictgroup.create_group(subgroup_key) + h_subgroup.attrs['base_type'] = b'dict_item' -def no_match(py_obj, h_group, call_id=0, **kwargs): - """ If no match is made, raise an exception + h_subgroup.attrs['key_base_type'] = bytes(type(key).__name__, 'ascii') + h_subgroup.attrs['key_type'] = np.array(pickle.dumps(key.__class__)) - Args: - py_obj: python object to dump; default if item is not matched. - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. - """ - pickled_obj = pickle.dumps(py_obj) - d = h_group.create_dataset('data_%i' % call_id, data=[pickled_obj]) - d.attrs["type"] = [b'pickle'] + h_subgroup.attrs['key_idx'] = idx - warnings.warn("%s type not understood, data have been serialized" % type(py_obj), - SerializedWarning) + _dump(py_subobj, h_subgroup, call_id=None, **kwargs) + return(h_dictgroup) +# Add create_dict_dataset to types_dict +types_dict[dict] = (create_dict_dataset, b"dict") -############# -## LOADERS ## -############# + +########### +# LOADERS # +########### class PyContainer(list): """ A group-like object into which to load datasets. @@ -419,8 +403,10 @@ class PyContainer(list): def __init__(self): super(PyContainer, self).__init__() self.container_type = None + self.container_base_type = None self.name = None self.key_type = None + self.key_base_type = None def convert(self): """ Convert from PyContainer to python core data type. 
@@ -429,29 +415,47 @@ def convert(self): (or other type specified in lookup.py) """ - if self.container_type in container_types_dict.keys(): - convert_fn = container_types_dict[self.container_type] - return convert_fn(self) - if self.container_type == str(dict).encode('ascii', 'ignore'): - keys = [] + # If this container is a dict, convert its items properly + if self.container_base_type == b"dict": + # Create empty list of items + items = [[]]*len(self) + + # Loop over all items in the container for item in self: - key = item.name.split('/')[-1] - key_type = item.key_type[0] - if key_type in container_key_types_dict.keys(): - to_type_fn = container_key_types_dict[key_type] + # Obtain the name of this item + key = item.name.split('/')[-1].replace('\\\\', '/') + + # Obtain the base type and index of this item's key + key_base_type = item.key_base_type + key_idx = item.key_idx + + # If this key has a type that must be converted, do so + if key_base_type in dict_key_types_dict.keys(): + to_type_fn = dict_key_types_dict[key_base_type] key = to_type_fn(key) - keys.append(key) - items = [item[0] for item in self] - return dict(zip(keys, items)) + # Insert item at the correct index into the list + items[key_idx] = [key, item[0]] + + # Initialize dict using its true type and return + return(self.container_type(items)) + + # In all other cases, return container else: - return self + # If container has a true type defined, convert to that first + if self.container_type is not None: + return(self.container_type(self)) + + # If not, return the container itself + else: + return(self) -def no_match_load(key): + +def no_match_load(key): # pragma: no cover """ If no match is made when loading, need to raise an exception """ raise RuntimeError("Cannot load %s data type" % key) - #pass + def load_dataset_lookup(key): """ What type of object are we trying to unpickle? This is a python @@ -469,82 +473,98 @@ def load_dataset_lookup(key): return match -def load(fileobj, path='/', safe=True): - """ Load a hickle file and reconstruct a python object - Args: - fileobj: file object, h5py.File, or filename string - safe (bool): Disable automatic depickling of arbitrary python objects. - DO NOT set this to False unless the file is from a trusted source. - (see http://www.cs.jhu.edu/~s/musings/pickle.html for an explanation) +def load(file_obj, path='/', safe=True): + """ + Load the Python object stored in `file_obj` at `path` and return it. + + Parameters + ---------- + file_obj : file object, str or :obj:`~h5py.Group` object + File from which to load the object. + If str, `file_obj` provides the path of the HDF5-file that must be + used. + If :obj:`~h5py._hl.group.Group`, the group (or file) in an open + HDF5-file that must be used. + path : str, optional + Path within HDF5-file or group to load data from. + Defaults to root ('/'). + safe : bool, optional + Disable automatic depickling of arbitrary python objects. + DO NOT set this to False unless the file is from a trusted source. + (See https://docs.python.org/3/library/pickle.html for an explanation) + + Returns + ------- + py_obj : object + The unhickled Python object. - path (str): path within hdf5 file to save data to. 
Defaults to root / """ # Make sure that the file is not closed unless modified # This is to avoid trying to close a file that was never opened close_flag = False + # Try to read the provided file_obj as a hickle file try: - h5f, close_flag = file_opener(fileobj) + h5f, path, close_flag = file_opener(file_obj, path, 'r') h_root_group = h5f.get(path) - try: - assert 'CLASS' in h5f.attrs.keys() - assert 'VERSION' in h5f.attrs.keys() - VER = h5f.attrs['VERSION'] + + # Define attributes h_root_group must have + v3_attrs = ['CLASS', 'VERSION', 'PYTHON_VERSION'] + v4_attrs = ['HICKLE_VERSION', 'HICKLE_PYTHON_VERSION'] + + # Check if the proper attributes for v3 loading are available + if all(map(h_root_group.attrs.get, v3_attrs)): + # Check if group attribute 'CLASS' has value 'hickle + if(h_root_group.attrs['CLASS'] != b'hickle'): # pragma: no cover + # If not, raise error + raise AttributeError("HDF5-file attribute 'CLASS' does not " + "have value 'hickle'!") + + # Obtain version with which the file was made try: - VER_MAJOR = int(VER) - except ValueError: - VER_MAJOR = int(VER[0]) - if VER_MAJOR == 1: - if PY2: - warnings.warn("Hickle file versioned as V1, attempting legacy loading...") - from . import hickle_legacy - return hickle_legacy.load(fileobj, safe) - else: - raise RuntimeError("Cannot open file. This file was likely" - " created with Python 2 and an old hickle version.") - elif VER_MAJOR == 2: - if PY2: - warnings.warn("Hickle file appears to be old version (v2), attempting " - "legacy loading...") - from . import hickle_legacy2 - return hickle_legacy2.load(fileobj, path=path, safe=safe) - else: - raise RuntimeError("Cannot open file. This file was likely" - " created with Python 2 and an old hickle version.") - # There is an unfortunate period of time where hickle 2.1.0 claims VERSION = int(3) - # For backward compatibility we really need to catch this. - # Actual hickle v3 files are versioned as A.B.C (e.g. 3.1.0) - elif VER_MAJOR == 3 and VER == VER_MAJOR: - if PY2: - warnings.warn("Hickle file appears to be old version (v2.1.0), attempting " - "legacy loading...") - from . import hickle_legacy2 - return hickle_legacy2.load(fileobj, path=path, safe=safe) - else: - raise RuntimeError("Cannot open file. This file was likely" - " created with Python 2 and an old hickle version.") - elif VER_MAJOR >= 3: - py_container = PyContainer() - py_container.container_type = 'hickle' - py_container = _load(py_container, h_root_group) - return py_container[0][0] - - except AssertionError: - if PY2: - warnings.warn("Hickle file is not versioned, attempting legacy loading...") - from . import hickle_legacy - return hickle_legacy.load(fileobj, safe) + major_version = int(h_root_group.attrs['VERSION'][0]) + + # If this cannot be done, then this is not a v3 file + except Exception: # pragma: no cover + raise Exception("This file does not appear to be a hickle v3 " + "file.") + + # Else, if the major version is not 3, it is not a v3 file either else: - raise RuntimeError("Cannot open file. This file was likely" - " created with Python 2 and an old hickle version.") + if(major_version != 3): # pragma: no cover + raise Exception("This file does not appear to be a hickle " + "v3 file.") + + # Load file + from hickle import legacy_v3 + warnings.warn("Input argument 'file_obj' appears to be a file made" + " with hickle v3. 
Using legacy load...") + return(legacy_v3.load(file_obj, path, safe)) + + # Else, check if the proper attributes for v4 loading are available + elif all(map(h_root_group.attrs.get, v4_attrs)): + # Load file + py_container = PyContainer() + py_container = _load(py_container, h_root_group['data']) + return(py_container[0]) + + # Else, raise error + else: # pragma: no cover + raise FileError("HDF5-file does not have the proper attributes!") + + # If this fails, raise error and provide user with caught error message + except Exception as error: # pragma: no cover + raise ValueError("Provided argument 'file_obj' does not appear to be a" + " valid hickle file! (%s)" % (error)) finally: # Close the file if requested. # Closing a file twice will not cause any problems if close_flag: h5f.close() + def load_dataset(h_node): """ Load a dataset, converting into its correct python type @@ -554,14 +574,17 @@ def load_dataset(h_node): Returns: data: reconstructed python object from loaded data """ - py_type = get_type(h_node) + py_type, base_type = get_type(h_node) + load_loader(py_type) + + load_fn = load_dataset_lookup(base_type) + data = load_fn(h_node) + + # If data is not py_type yet, convert to it (unless it is pickle) + if base_type != b'pickle' and type(data) != py_type: + data = py_type(data) + return data - try: - load_fn = load_dataset_lookup(py_type) - return load_fn(h_node) - except: - raise - #raise RuntimeError("Hickle type %s not understood." % py_type) def _load(py_container, h_group): """ Load a hickle file @@ -571,27 +594,30 @@ def _load(py_container, h_group): Args: py_container (PyContainer): Python container to load data into h_group (h5 group or dataset): h5py object, group or dataset, to spider - and load all datasets. + and load all datasets. """ - group_dtype = h5._hl.group.Group - dataset_dtype = h5._hl.dataset.Dataset - - #either a file, group, or dataset - if isinstance(h_group, (H5FileWrapper, group_dtype)): + # Either a file, group, or dataset + if isinstance(h_group, h5._hl.group.Group): py_subcontainer = PyContainer() - try: - py_subcontainer.container_type = bytes(h_group.attrs['type'][0]) - except KeyError: - raise - #py_subcontainer.container_type = '' + py_subcontainer.container_base_type = bytes(h_group.attrs['base_type']) + py_subcontainer.name = h_group.name - if py_subcontainer.container_type == b'dict_item': - py_subcontainer.key_type = h_group.attrs['key_type'] + if py_subcontainer.container_base_type == b'dict_item': + py_subcontainer.key_base_type = h_group.attrs['key_base_type'] + py_obj_type = pickle.loads(h_group.attrs['key_type']) + py_subcontainer.key_type = py_obj_type + py_subcontainer.key_idx = h_group.attrs['key_idx'] + else: + py_obj_type = pickle.loads(h_group.attrs['type']) + py_subcontainer.container_type = py_obj_type + + # Check if we have an unloaded loader for the provided py_obj + load_loader(py_obj_type) - if py_subcontainer.container_type not in types_not_to_sort: + if py_subcontainer.container_base_type not in types_not_to_sort: h_keys = sort_keys(h_group.keys()) else: h_keys = h_group.keys() diff --git a/hickle/hickle_legacy.py b/hickle/hickle_legacy.py deleted file mode 100644 index 61a171fd..00000000 --- a/hickle/hickle_legacy.py +++ /dev/null @@ -1,535 +0,0 @@ -# encoding: utf-8 -""" -# hickle_legacy.py - -Created by Danny Price 2012-05-28. - -Hickle is a HDF5 based clone of Pickle. Instead of serializing to a -pickle file, Hickle dumps to a HDF5 file. It is designed to be as similar -to pickle in usage as possible. 
- -## Notes - -This is a legacy handler, for hickle v1 files. -If V2 reading fails, this will be called as a fail-over. - -""" - -import os -import sys -import numpy as np -import h5py as h5 - -if sys.version_info.major == 3: - NoneType = type(None) -else: - from types import NoneType - -__version__ = "1.3.0" -__author__ = "Danny Price" - -#################### -## Error handling ## -#################### - - -class FileError(Exception): - """ An exception raised if the file is fishy""" - - def __init__(self): - return - - def __str__(self): - print("Error: cannot open file. Please pass either a filename string, a file object, " - "or a h5py.File") - - -class NoMatchError(Exception): - """ An exception raised if the object type is not understood (or supported)""" - - def __init__(self): - return - - def __str__(self): - print("Error: this type of python object cannot be converted into a hickle.") - - -class ToDoError(Exception): - """ An exception raised for non-implemented functionality""" - - def __init__(self): - return - - def __str__(self): - print("Error: this functionality hasn't been implemented yet.") - - -class H5GroupWrapper(h5.Group): - def create_dataset(self, *args, **kwargs): - kwargs['track_times'] = getattr(self, 'track_times', True) - return super(H5GroupWrapper, self).create_dataset(*args, **kwargs) - - def create_group(self, *args, **kwargs): - group = super(H5GroupWrapper, self).create_group(*args, **kwargs) - group.__class__ = H5GroupWrapper - group.track_times = getattr(self, 'track_times', True) - return group - - -class H5FileWrapper(h5.File): - def create_dataset(self, *args, **kwargs): - kwargs['track_times'] = getattr(self, 'track_times', True) - return super(H5FileWrapper, self).create_dataset(*args, **kwargs) - - def create_group(self, *args, **kwargs): - group = super(H5FileWrapper, self).create_group(*args, **kwargs) - group.__class__ = H5GroupWrapper - group.track_times = getattr(self, 'track_times', True) - return group - - -def file_opener(f, mode='r', track_times=True): - """ A file opener helper function with some error handling. - - This can open files through a file object, a h5py file, or just the filename. - """ - # Were we handed a file object or just a file name string? 
- if isinstance(f, file): - filename, mode = f.name, f.mode - f.close() - h5f = h5.File(filename, mode) - - elif isinstance(f, h5._hl.files.File): - h5f = f - elif isinstance(f, str): - filename = f - h5f = h5.File(filename, mode) - else: - raise FileError - - h5f.__class__ = H5FileWrapper - h5f.track_times = track_times - return h5f - - -############# -## dumpers ## -############# - -def dump_ndarray(obj, h5f, **kwargs): - """ dumps an ndarray object to h5py file""" - h5f.create_dataset('data', data=obj, **kwargs) - h5f.create_dataset('type', data=['ndarray']) - - -def dump_np_dtype(obj, h5f, **kwargs): - """ dumps an np dtype object to h5py file""" - h5f.create_dataset('data', data=obj) - h5f.create_dataset('type', data=['np_dtype']) - - -def dump_np_dtype_dict(obj, h5f, **kwargs): - """ dumps an np dtype object within a group""" - h5f.create_dataset('data', data=obj) - h5f.create_dataset('_data', data=['np_dtype']) - - -def dump_masked(obj, h5f, **kwargs): - """ dumps an ndarray object to h5py file""" - h5f.create_dataset('data', data=obj, **kwargs) - h5f.create_dataset('mask', data=obj.mask, **kwargs) - h5f.create_dataset('type', data=['masked']) - - -def dump_list(obj, h5f, **kwargs): - """ dumps a list object to h5py file""" - - # Check if there are any numpy arrays in the list - contains_numpy = any(isinstance(el, np.ndarray) for el in obj) - - if contains_numpy: - _dump_list_np(obj, h5f, **kwargs) - else: - h5f.create_dataset('data', data=obj, **kwargs) - h5f.create_dataset('type', data=['list']) - - -def _dump_list_np(obj, h5f, **kwargs): - """ Dump a list of numpy objects to file """ - - np_group = h5f.create_group('data') - h5f.create_dataset('type', data=['np_list']) - - ii = 0 - for np_item in obj: - np_group.create_dataset("%s" % ii, data=np_item, **kwargs) - ii += 1 - - -def dump_tuple(obj, h5f, **kwargs): - """ dumps a list object to h5py file""" - - # Check if there are any numpy arrays in the list - contains_numpy = any(isinstance(el, np.ndarray) for el in obj) - - if contains_numpy: - _dump_tuple_np(obj, h5f, **kwargs) - else: - h5f.create_dataset('data', data=obj, **kwargs) - h5f.create_dataset('type', data=['tuple']) - - -def _dump_tuple_np(obj, h5f, **kwargs): - """ Dump a tuple of numpy objects to file """ - - np_group = h5f.create_group('data') - h5f.create_dataset('type', data=['np_tuple']) - - ii = 0 - for np_item in obj: - np_group.create_dataset("%s" % ii, data=np_item, **kwargs) - ii += 1 - - -def dump_set(obj, h5f, **kwargs): - """ dumps a set object to h5py file""" - obj = list(obj) - h5f.create_dataset('data', data=obj, **kwargs) - h5f.create_dataset('type', data=['set']) - - -def dump_string(obj, h5f, **kwargs): - """ dumps a list object to h5py file""" - h5f.create_dataset('data', data=[obj], **kwargs) - h5f.create_dataset('type', data=['string']) - - -def dump_none(obj, h5f, **kwargs): - """ Dump None type to file """ - h5f.create_dataset('data', data=[0], **kwargs) - h5f.create_dataset('type', data=['none']) - - -def dump_unicode(obj, h5f, **kwargs): - """ dumps a list object to h5py file""" - dt = h5.special_dtype(vlen=unicode) - ll = len(obj) - dset = h5f.create_dataset('data', shape=(ll, ), dtype=dt, **kwargs) - dset[:ll] = obj - h5f.create_dataset('type', data=['unicode']) - - -def _dump_dict(dd, hgroup, **kwargs): - for key in dd: - if type(dd[key]) in (str, int, float, unicode, bool): - # Figure out type to be stored - types = {str: 'str', int: 'int', float: 'float', - unicode: 'unicode', bool: 'bool', NoneType: 'none'} - _key = 
types.get(type(dd[key])) - - # Store along with dtype info - if _key == 'unicode': - dd[key] = str(dd[key]) - - hgroup.create_dataset("%s" % key, data=[dd[key]], **kwargs) - hgroup.create_dataset("_%s" % key, data=[_key]) - - elif type(dd[key]) in (type(np.array([1])), type(np.ma.array([1]))): - - if hasattr(dd[key], 'mask'): - hgroup.create_dataset("_%s" % key, data=["masked"]) - hgroup.create_dataset("%s" % key, data=dd[key].data, **kwargs) - hgroup.create_dataset("_%s_mask" % key, data=dd[key].mask, **kwargs) - else: - hgroup.create_dataset("_%s" % key, data=["ndarray"]) - hgroup.create_dataset("%s" % key, data=dd[key], **kwargs) - - elif type(dd[key]) is list: - hgroup.create_dataset("%s" % key, data=dd[key], **kwargs) - hgroup.create_dataset("_%s" % key, data=["list"]) - - elif type(dd[key]) is tuple: - hgroup.create_dataset("%s" % key, data=dd[key], **kwargs) - hgroup.create_dataset("_%s" % key, data=["tuple"]) - - elif type(dd[key]) is set: - hgroup.create_dataset("%s" % key, data=list(dd[key]), **kwargs) - hgroup.create_dataset("_%s" % key, data=["set"]) - - elif isinstance(dd[key], dict): - new_group = hgroup.create_group("%s" % key) - _dump_dict(dd[key], new_group, **kwargs) - - elif type(dd[key]) is NoneType: - hgroup.create_dataset("%s" % key, data=[0], **kwargs) - hgroup.create_dataset("_%s" % key, data=["none"]) - - else: - if type(dd[key]).__module__ == np.__name__: - #print type(dd[key]) - hgroup.create_dataset("%s" % key, data=dd[key]) - hgroup.create_dataset("_%s" % key, data=["np_dtype"]) - #new_group = hgroup.create_group("%s" % key) - #dump_np_dtype_dict(dd[key], new_group) - else: - raise NoMatchError - - -def dump_dict(obj, h5f='', **kwargs): - """ dumps a dictionary to h5py file """ - h5f.create_dataset('type', data=['dict']) - hgroup = h5f.create_group('data') - _dump_dict(obj, hgroup, **kwargs) - - -def no_match(obj, h5f, *args, **kwargs): - """ If no match is made, raise an exception """ - try: - import dill as cPickle - except ImportError: - import cPickle - - pickled_obj = cPickle.dumps(obj) - h5f.create_dataset('type', data=['pickle']) - h5f.create_dataset('data', data=[pickled_obj]) - - print("Warning: %s type not understood, data have been serialized" % type(obj)) - #raise NoMatchError - - -def dumper_lookup(obj): - """ What type of object are we trying to pickle? - - This is a python dictionary based equivalent of a case statement. - It returns the correct helper function for a given data type. - """ - t = type(obj) - - types = { - list: dump_list, - tuple: dump_tuple, - set: dump_set, - dict: dump_dict, - str: dump_string, - unicode: dump_unicode, - NoneType: dump_none, - np.ndarray: dump_ndarray, - np.ma.core.MaskedArray: dump_masked, - np.float16: dump_np_dtype, - np.float32: dump_np_dtype, - np.float64: dump_np_dtype, - np.int8: dump_np_dtype, - np.int16: dump_np_dtype, - np.int32: dump_np_dtype, - np.int64: dump_np_dtype, - np.uint8: dump_np_dtype, - np.uint16: dump_np_dtype, - np.uint32: dump_np_dtype, - np.uint64: dump_np_dtype, - np.complex64: dump_np_dtype, - np.complex128: dump_np_dtype, - } - - match = types.get(t, no_match) - return match - - -def dump(obj, file, mode='w', track_times=True, **kwargs): - """ Write a pickled representation of obj to the open file object file. - - Parameters - ---------- - obj: object - python object o store in a Hickle - file: file object, filename string, or h5py.File object - file in which to store the object. A h5py.File or a filename is also acceptable. 
- mode: string - optional argument, 'r' (read only), 'w' (write) or 'a' (append). Ignored if file - is a file object. - compression: str - optional argument. Applies compression to dataset. Options: None, gzip, lzf (+ szip, - if installed) - track_times: bool - optional argument. If set to False, repeated hickling will produce identical files. - """ - - try: - # See what kind of object to dump - dumper = dumper_lookup(obj) - # Open the file - h5f = file_opener(file, mode, track_times) - print("dumping %s to file %s" % (type(obj), repr(h5f))) - dumper(obj, h5f, **kwargs) - h5f.close() - except NoMatchError: - fname = h5f.filename - h5f.close() - try: - os.remove(fname) - except: - print("Warning: dump failed. Could not remove %s" % fname) - finally: - raise NoMatchError - - -############# -## loaders ## -############# - -def load(file, safe=True): - """ Load a hickle file and reconstruct a python object - - Parameters - ---------- - file: file object, h5py.File, or filename string - - safe (bool): Disable automatic depickling of arbitrary python objects. - DO NOT set this to False unless the file is from a trusted source. - (see http://www.cs.jhu.edu/~s/musings/pickle.html for an explanation) - """ - - try: - h5f = file_opener(file) - dtype = h5f["type"][0] - - if dtype == 'dict': - group = h5f["data"] - data = load_dict(group) - elif dtype == 'pickle': - data = load_pickle(h5f, safe) - elif dtype == 'np_list': - group = h5f["data"] - data = load_np_list(group) - elif dtype == 'np_tuple': - group = h5f["data"] - data = load_np_tuple(group) - elif dtype == 'masked': - data = np.ma.array(h5f["data"][:], mask=h5f["mask"][:]) - elif dtype == 'none': - data = None - else: - if dtype in ('string', 'unicode'): - data = h5f["data"][0] - else: - try: - data = h5f["data"][:] - except ValueError: - data = h5f["data"] - types = { - 'list': list, - 'set': set, - 'unicode': unicode, - 'string': str, - 'ndarray': load_ndarray, - 'np_dtype': load_np_dtype - } - - mod = types.get(dtype, no_match) - data = mod(data) - finally: - if 'h5f' in locals(): - h5f.close() - return data - - -def load_pickle(h5f, safe=True): - """ Deserialize and load a pickled object within a hickle file - - WARNING: Pickle has - - Parameters - ---------- - h5f: h5py.File object - - safe (bool): Disable automatic depickling of arbitrary python objects. - DO NOT set this to False unless the file is from a trusted source. - (see http://www.cs.jhu.edu/~s/musings/pickle.html for an explanation) - """ - - if not safe: - try: - import dill as cPickle - except ImportError: - import cPickle - - data = h5f["data"][:] - data = cPickle.loads(data[0]) - return data - else: - print("\nWarning: Object is of an unknown type, and has not been loaded") - print(" for security reasons (it could be malicious code). If") - print(" you wish to continue, manually set safe=False\n") - - -def load_np_list(group): - """ load a numpy list """ - np_list = [] - for key in sorted(group.keys()): - data = group[key][:] - np_list.append(data) - return np_list - - -def load_np_tuple(group): - """ load a tuple containing numpy arrays """ - return tuple(load_np_list(group)) - - -def load_ndarray(arr): - """ Load a numpy array """ - # Nothing to be done! 
- return arr - - -def load_np_dtype(arr): - """ Load a numpy array """ - # Just return first value - return arr.value - - -def load_dict(group): - """ Load dictionary """ - - dd = {} - for key in group.keys(): - if isinstance(group[key], h5._hl.group.Group): - new_group = group[key] - dd[key] = load_dict(new_group) - elif not key.startswith("_"): - _key = "_%s" % key - - if group[_key][0] == 'np_dtype': - dd[key] = group[key].value - elif group[_key][0] in ('str', 'int', 'float', 'unicode', 'bool'): - dd[key] = group[key][0] - elif group[_key][0] == 'masked': - key_ma = "_%s_mask" % key - dd[key] = np.ma.array(group[key][:], mask=group[key_ma]) - else: - dd[key] = group[key][:] - - # Convert numpy constructs back to string - dtype = group[_key][0] - types = {'str': str, 'int': int, 'float': float, - 'unicode': unicode, 'bool': bool, 'list': list, 'none' : NoneType} - try: - mod = types.get(dtype) - if dtype == 'none': - dd[key] = None - else: - dd[key] = mod(dd[key]) - except: - pass - return dd - - -def load_large(file): - """ Load a large hickle file (returns the h5py object not the data) - - Parameters - ---------- - file: file object, h5py.File, or filename string - """ - - h5f = file_opener(file) - return h5f diff --git a/hickle/legacy_v3/__init__.py b/hickle/legacy_v3/__init__.py new file mode 100644 index 00000000..aa473ba2 --- /dev/null +++ b/hickle/legacy_v3/__init__.py @@ -0,0 +1,2 @@ +from .hickle import dump, load +from .__version__ import __version__ diff --git a/hickle/legacy_v3/__version__.py b/hickle/legacy_v3/__version__.py new file mode 100644 index 00000000..df6da975 --- /dev/null +++ b/hickle/legacy_v3/__version__.py @@ -0,0 +1,13 @@ +# -*- coding: utf-8 -*- + +""" +Hickle Version +============== +Stores the different versions of the *Hickle* package. + +""" + + +# %% VERSIONS +# Default/Latest/Current version +__version__ = '3.4.8' diff --git a/hickle/legacy_v3/helpers.py b/hickle/legacy_v3/helpers.py new file mode 100644 index 00000000..b7e0034e --- /dev/null +++ b/hickle/legacy_v3/helpers.py @@ -0,0 +1,113 @@ +import re +import six + +def get_type_and_data(h_node): + """ Helper function to return the py_type and data block for a HDF node """ + py_type = h_node.attrs["type"][0] + data = h_node[()] +# if h_node.shape == (): +# data = h_node.value +# else: +# data = h_node[:] + return py_type, data + +def get_type(h_node): + """ Helper function to return the py_type for a HDF node """ + py_type = h_node.attrs["type"][0] + return py_type + +def sort_keys(key_list): + """ Take a list of strings and sort it by integer value within string + + Args: + key_list (list): List of keys + + Returns: + key_list_sorted (list): List of keys, sorted by integer + """ + + # Py3 h5py returns an irritating KeysView object + # Py3 also complains about bytes and strings, convert all keys to bytes + if six.PY3: + key_list2 = [] + for key in key_list: + if isinstance(key, str): + key = bytes(key, 'ascii') + key_list2.append(key) + key_list = key_list2 + + # Check which keys contain a number + numbered_keys = [re.search(br'\d+', key) for key in key_list] + + # Sort the keys on number if they have it, or normally if not + if(len(key_list) and not numbered_keys.count(None)): + to_int = lambda x: int(re.search(br'\d+', x).group(0)) + return(sorted(key_list, key=to_int)) + else: + return(sorted(key_list)) + + +def check_is_iterable(py_obj): + """ Check whether a python object is iterable. 
+ + Note: this treats unicode and string as NON ITERABLE + + Args: + py_obj: python object to test + + Returns: + iter_ok (bool): True if item is iterable, False is item is not + """ + if six.PY2: + string_types = (str, unicode) + else: + string_types = (str, bytes, bytearray) + if isinstance(py_obj, string_types): + return False + try: + iter(py_obj) + return True + except TypeError: + return False + + +def check_is_hashable(py_obj): + """ Check if a python object is hashable + + Note: this function is currently not used, but is useful for future + development. + + Args: + py_obj: python object to test + """ + + try: + py_obj.__hash__() + return True + except TypeError: + return False + + +def check_iterable_item_type(iter_obj): + """ Check if all items within an iterable are the same type. + + Args: + iter_obj: iterable object + + Returns: + iter_type: type of item contained within the iterable. If + the iterable has many types, a boolean False is returned instead. + + References: + http://stackoverflow.com/questions/13252333/python-check-if-all-elements-of-a-list-are-the-same-type + """ + iseq = iter(iter_obj) + + try: + first_type = type(next(iseq)) + except StopIteration: + return False + except Exception as ex: + return False + else: + return first_type if all((type(x) is first_type) for x in iseq) else False diff --git a/hickle/hickle_legacy2.py b/hickle/legacy_v3/hickle.py similarity index 53% rename from hickle/hickle_legacy2.py rename to hickle/legacy_v3/hickle.py index 4d018fde..0179835d 100644 --- a/hickle/hickle_legacy2.py +++ b/hickle/legacy_v3/hickle.py @@ -1,18 +1,41 @@ # encoding: utf-8 """ -# hickle_legacy2.py +# hickle.py Created by Danny Price 2016-02-03. -This is a legacy handler, for hickle v2 files. -If V3 reading fails, this will be called as a fail-over. +Hickle is a HDF5 based clone of Pickle. Instead of serializing to a pickle +file, Hickle dumps to a HDF5 file. It is designed to be as similar to pickle in +usage as possible, providing a load() and dump() function. + +## Notes + +Hickle has two main advantages over Pickle: +1) LARGE PICKLE HANDLING. Unpickling a large pickle is slow, as the Unpickler +reads the entire pickle thing and loads it into memory. In comparison, HDF5 +files are designed for large datasets. Things are only loaded when accessed. + +2) CROSS PLATFORM SUPPORT. Attempting to unpickle a pickle pickled on Windows +on Linux and vice versa is likely to fail with errors like "Insecure string +pickle". HDF5 files will load fine, as long as both machines have +h5py installed. 
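A typical round trip with the two public functions described above (a short example, assuming hickle and numpy are installed; 'example.hkl' is just an illustrative filename):

import hickle as hkl
import numpy as np

data = {'weights': np.arange(10), 'label': 'run-1', 'threshold': 0.5}
hkl.dump(data, 'example.hkl', mode='w')    # serialise the object to an HDF5 file
restored = hkl.load('example.hkl')         # reconstruct the Python object
assert (restored['weights'] == data['weights']).all()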
""" +from __future__ import absolute_import, division, print_function +import sys import os +from pkg_resources import get_distribution, DistributionNotFound +from ast import literal_eval + import numpy as np import h5py as h5 -import re + + +from .__version__ import __version__ +from .helpers import get_type, sort_keys, check_is_iterable, check_iterable_item_type +from .lookup import (types_dict, hkl_types_dict, types_not_to_sort, + container_types_dict, container_key_types_dict, check_is_ndarray_like) try: from exceptions import Exception @@ -20,10 +43,24 @@ except ImportError: pass # above imports will fail in python3 -import warnings -__version__ = "2.0.4" -__author__ = "Danny Price" +from six import PY2, PY3, string_types, integer_types +import io +# Make several aliases for Python2/Python3 compatibility +if PY3: + file = io.TextIOWrapper + +# Import dill as pickle +import dill as pickle + +try: + from pathlib import Path + string_like_types = string_types + (Path,) +except ImportError: + # Python 2 does not have pathlib + string_like_types = string_types + +import warnings ################## # Error handling # @@ -69,6 +106,14 @@ def __str__(self): return "Error: this functionality hasn't been implemented yet." +class SerializedWarning(UserWarning): + """ An object type was not understood + + The data will be serialized using pickle. + """ + pass + + ###################### # H5PY file wrappers # ###################### @@ -118,85 +163,40 @@ def file_opener(f, mode='r', track_times=True): different times to produce identical files (e.g. for MD5 hash check). """ + + # Assume that we will have to close the file after dump or load + close_flag = True + # Were we handed a file object or just a file name string? - if isinstance(f, file): + if isinstance(f, (file, io.TextIOWrapper, io.BufferedWriter)): filename, mode = f.name, f.mode f.close() + mode = mode.replace('b', '') h5f = h5.File(filename, mode) - elif isinstance(f, str) or isinstance(f, unicode): + elif isinstance(f, string_like_types): filename = f h5f = h5.File(filename, mode) - elif isinstance(f, H5FileWrapper) or isinstance(f, h5._hl.files.File): + elif isinstance(f, (H5FileWrapper, h5._hl.files.File)): try: filename = f.filename except ValueError: - raise ClosedFileError() + raise ClosedFileError h5f = f + # Since this file was already open, do not close the file afterward + close_flag = False else: - print(type(f)) + print(f.__class__) raise FileError h5f.__class__ = H5FileWrapper h5f.track_times = track_times - return h5f + return(h5f, close_flag) ########### # DUMPERS # ########### -def check_is_iterable(py_obj): - """ Check whether a python object is iterable. - - Note: this treats unicode and string as NON ITERABLE - - Args: - py_obj: python object to test - - Returns: - iter_ok (bool): True if item is iterable, False is item is not - """ - if type(py_obj) in (str, unicode): - return False - try: - iter(py_obj) - return True - except TypeError: - return False - - -def check_iterable_item_type(iter_obj): - """ Check if all items within an iterable are the same type. - - Args: - iter_obj: iterable object - - Returns: - iter_type: type of item contained within the iterable. If - the iterable has many types, a boolean False is returned instead. 
- - References: - http://stackoverflow.com/questions/13252333/python-check-if-all-elements-of-a-list-are-the-same-type - """ - iseq = iter(iter_obj) - first_type = type(next(iseq)) - return first_type if all((type(x) is first_type) for x in iseq) else False - - -def check_is_numpy_array(py_obj): - """ Check if a python object is a numpy array (masked or regular) - - Args: - py_obj: python object to check whether it is a numpy array - - Returns - is_numpy (bool): Returns True if it is a numpy array, else False if it isn't - """ - - is_numpy = type(py_obj) in (type(np.array([1])), type(np.ma.array([1]))) - - return is_numpy - def _dump(py_obj, h_group, call_id=0, **kwargs): """ Dump a python object to a group within a HDF5 file. @@ -209,13 +209,20 @@ def _dump(py_obj, h_group, call_id=0, **kwargs): call_id (int): index to identify object's relative location in the iterable. """ - dumpable_dtypes = set([bool, int, float, long, complex, str, unicode]) + # Get list of dumpable dtypes + dumpable_dtypes = [] + for lst in [[bool, complex, bytes, float], string_types, integer_types]: + dumpable_dtypes.extend(lst) # Firstly, check if item is a numpy array. If so, just dump it. - if check_is_numpy_array(py_obj): + if check_is_ndarray_like(py_obj): create_hkl_dataset(py_obj, h_group, call_id, **kwargs) - # next, check if item is iterable + # Next, check if item is a dict + elif isinstance(py_obj, dict): + create_hkl_dataset(py_obj, h_group, call_id, **kwargs) + + # If not, check if item is iterable elif check_is_iterable(py_obj): item_type = check_iterable_item_type(py_obj) @@ -234,7 +241,6 @@ def _dump(py_obj, h_group, call_id=0, **kwargs): else: h_subgroup = create_hkl_group(py_obj, h_group, call_id) for ii, py_subobj in enumerate(py_obj): - #print py_subobj, h_subgroup, ii _dump(py_subobj, h_subgroup, call_id=ii, **kwargs) # item is not iterable, so create a dataset for it @@ -259,21 +265,28 @@ def dump(py_obj, file_obj, mode='w', track_times=True, path='/', **kwargs): path (str): path within hdf5 file to save data to. Defaults to root / """ + # Make sure that file is not closed unless modified + # This is to avoid trying to close a file that was never opened + close_flag = False + try: # Open the file - h5f = file_opener(file_obj, mode, track_times) - h5f.attrs["CLASS"] = 'hickle' - h5f.attrs["VERSION"] = 2 - h5f.attrs["type"] = ['hickle'] + h5f, close_flag = file_opener(file_obj, mode, track_times) + h5f.attrs["CLASS"] = b'hickle' + h5f.attrs["VERSION"] = __version__ + h5f.attrs["type"] = [b'hickle'] + # Log which version of python was used to generate the hickle file + pv = sys.version_info + py_ver = "%i.%i.%i" % (pv[0], pv[1], pv[2]) + h5f.attrs["PYTHON_VERSION"] = py_ver h_root_group = h5f.get(path) if h_root_group is None: h_root_group = h5f.create_group(path) - h_root_group.attrs["type"] = ['hickle'] + h_root_group.attrs["type"] = [b'hickle'] _dump(py_obj, h_root_group, **kwargs) - h5f.close() except NoMatchError: fname = h5f.filename h5f.close() @@ -283,6 +296,11 @@ def dump(py_obj, file_obj, mode='w', track_times=True, path='/', **kwargs): warnings.warn("Dump failed. Could not remove %s" % fname) finally: raise NoMatchError + finally: + # Close the file if requested. 
+ # Closing a file twice will not cause any problems + if close_flag: + h5f.close() def create_dataset_lookup(py_obj): @@ -297,41 +315,15 @@ def create_dataset_lookup(py_obj): match: function that should be used to dump data to a new dataset """ t = type(py_obj) + types_lookup = {dict: create_dict_dataset} + types_lookup.update(types_dict) + + match = types_lookup.get(t, no_match) - types = { - dict: create_dict_dataset, - list: create_listlike_dataset, - tuple: create_listlike_dataset, - set: create_listlike_dataset, - str: create_stringlike_dataset, - unicode: create_stringlike_dataset, - int: create_python_dtype_dataset, - float: create_python_dtype_dataset, - long: create_python_dtype_dataset, - bool: create_python_dtype_dataset, - complex: create_python_dtype_dataset, - NoneType: create_none_dataset, - np.ndarray: create_np_array_dataset, - np.ma.core.MaskedArray: create_np_array_dataset, - np.float16: create_np_dtype_dataset, - np.float32: create_np_dtype_dataset, - np.float64: create_np_dtype_dataset, - np.int8: create_np_dtype_dataset, - np.int16: create_np_dtype_dataset, - np.int32: create_np_dtype_dataset, - np.int64: create_np_dtype_dataset, - np.uint8: create_np_dtype_dataset, - np.uint16: create_np_dtype_dataset, - np.uint32: create_np_dtype_dataset, - np.uint64: create_np_dtype_dataset, - np.complex64: create_np_dtype_dataset, - np.complex128: create_np_dtype_dataset - } - - match = types.get(t, no_match) return match + def create_hkl_dataset(py_obj, h_group, call_id=0, **kwargs): """ Create a dataset within the hickle HDF5 file @@ -357,114 +349,37 @@ def create_hkl_group(py_obj, h_group, call_id=0): """ h_subgroup = h_group.create_group('data_%i' % call_id) - h_subgroup.attrs["type"] = [str(type(py_obj))] + h_subgroup.attrs['type'] = [str(type(py_obj)).encode('ascii', 'ignore')] return h_subgroup -def create_listlike_dataset(py_obj, h_group, call_id=0, **kwargs): - """ Dumper for list, set, tuple - - Args: - py_obj: python object to dump; should be list-like - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. - """ - dtype = str(type(py_obj)) - obj = list(py_obj) - d = h_group.create_dataset('data_%i' % call_id, data=obj, **kwargs) - d.attrs["type"] = [dtype] - - -def create_np_dtype_dataset(py_obj, h_group, call_id=0, **kwargs): - """ dumps an np dtype object to h5py file - - Args: - py_obj: python object to dump; should be a numpy scalar, e.g. np.float16(1) - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. - """ - d = h_group.create_dataset('data_%i' % call_id, data=py_obj, **kwargs) - d.attrs["type"] = ['np_dtype'] - d.attrs["np_dtype"] = str(d.dtype) - - -def create_python_dtype_dataset(py_obj, h_group, call_id=0, **kwargs): - """ dumps a python dtype object to h5py file - - Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. 
- """ - d = h_group.create_dataset('data_%i' % call_id, data=py_obj, - dtype=type(py_obj), **kwargs) - d.attrs["type"] = ['python_dtype'] - d.attrs['python_subdtype'] = str(type(py_obj)) - - def create_dict_dataset(py_obj, h_group, call_id=0, **kwargs): """ Creates a data group for each key in dictionary + Notes: + This is a very important function which uses the recursive _dump + method to build up hierarchical data models stored in the HDF5 file. + As this is critical to functioning, it is kept in the main hickle.py + file instead of in the loaders/ directory. + Args: py_obj: python object to dump; should be dictionary h_group (h5.File.group): group to dump data into. call_id (int): index to identify object's relative location in the iterable. """ h_dictgroup = h_group.create_group('data_%i' % call_id) - h_dictgroup.attrs["type"] = ['dict'] - for key, py_subobj in py_obj.items(): - h_subgroup = h_dictgroup.create_group(key) - h_subgroup.attrs["type"] = ['dict_item'] - _dump(py_subobj, h_subgroup, call_id=0, **kwargs) - - -def create_np_array_dataset(py_obj, h_group, call_id=0, **kwargs): - """ dumps an ndarray object to h5py file - - Args: - py_obj: python object to dump; should be a numpy array or np.ma.array (masked) - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. - """ - if isinstance(py_obj, type(np.ma.array([1]))): - d = h_group.create_dataset('data_%i' % call_id, data=py_obj, **kwargs) - #m = h_group.create_dataset('mask_%i' % call_id, data=py_obj.mask, **kwargs) - m = h_group.create_dataset('data_%i_mask' % call_id, data=py_obj.mask, **kwargs) - d.attrs["type"] = ['ndarray_masked_data'] - m.attrs["type"] = ['ndarray_masked_mask'] - else: - d = h_group.create_dataset('data_%i' % call_id, data=py_obj, **kwargs) - d.attrs["type"] = ['ndarray'] - - -def create_stringlike_dataset(py_obj, h_group, call_id=0, **kwargs): - """ dumps a list object to h5py file - - Args: - py_obj: python object to dump; should be string-like (unicode or string) - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. - """ - if isinstance(py_obj, str): - d = h_group.create_dataset('data_%i' % call_id, data=[py_obj], **kwargs) - d.attrs["type"] = ['string'] - else: - dt = h5.special_dtype(vlen=unicode) - dset = h_group.create_dataset('data_%i' % call_id, shape=(1, ), dtype=dt, **kwargs) - dset[0] = py_obj - dset.attrs['type'] = ['unicode'] + h_dictgroup.attrs['type'] = [str(type(py_obj)).encode('ascii', 'ignore')] + for key, py_subobj in py_obj.items(): + if isinstance(key, string_types): + h_subgroup = h_dictgroup.create_group("%r" % (key)) + else: + h_subgroup = h_dictgroup.create_group(str(key)) + h_subgroup.attrs["type"] = [b'dict_item'] -def create_none_dataset(py_obj, h_group, call_id=0, **kwargs): - """ Dump None type to file + h_subgroup.attrs["key_type"] = [str(type(key)).encode('ascii', 'ignore')] - Args: - py_obj: python object to dump; must be None object - h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. - """ - d = h_group.create_dataset('data_%i' % call_id, data=[0], **kwargs) - d.attrs["type"] = ['none'] + _dump(py_subobj, h_subgroup, call_id=0, **kwargs) def no_match(py_obj, h_group, call_id=0, **kwargs): @@ -475,17 +390,13 @@ def no_match(py_obj, h_group, call_id=0, **kwargs): h_group (h5.File.group): group to dump data into. 
call_id (int): index to identify object's relative location in the iterable. """ - try: - import dill as cPickle - except ImportError: - import cPickle - - pickled_obj = cPickle.dumps(py_obj) + pickled_obj = pickle.dumps(py_obj) d = h_group.create_dataset('data_%i' % call_id, data=[pickled_obj]) - d.attrs["type"] = ['pickle'] + d.attrs["type"] = [b'pickle'] + + warnings.warn("%s type not understood, data have been serialized" % type(py_obj), + SerializedWarning) - warnings.warn("%s type not understood, data have been " - "serialized" % type(py_obj)) ############# @@ -504,25 +415,54 @@ def __init__(self): super(PyContainer, self).__init__() self.container_type = None self.name = None + self.key_type = None def convert(self): """ Convert from PyContainer to python core data type. Returns: self, either as a list, tuple, set or dict + (or other type specified in lookup.py) """ - if self.container_type == "": - return list(self) - if self.container_type == "": - return tuple(self) - if self.container_type == "": - return set(self) - if self.container_type == "dict": - keys = [str(item.name.split('/')[-1]) for item in self] + + if self.container_type in container_types_dict.keys(): + convert_fn = container_types_dict[self.container_type] + return convert_fn(self) + if self.container_type == str(dict).encode('ascii', 'ignore'): + keys = [] + for item in self: + key = item.name.split('/')[-1] + key_type = item.key_type[0] + if key_type in container_key_types_dict.keys(): + to_type_fn = container_key_types_dict[key_type] + key = to_type_fn(key) + keys.append(key) + items = [item[0] for item in self] return dict(zip(keys, items)) else: return self +def no_match_load(key): + """ If no match is made when loading, need to raise an exception + """ + raise RuntimeError("Cannot load %s data type" % key) + #pass + +def load_dataset_lookup(key): + """ What type of object are we trying to unpickle? This is a python + dictionary based equivalent of a case statement. It returns the type + a given 'type' keyword in the hickle file. + + Args: + py_obj: python object to look-up what function to use to dump to disk + + Returns: + match: function that should be used to dump data to a new dataset + """ + + match = hkl_types_dict.get(key, no_match_load) + + return match def load(fileobj, path='/', safe=True): """ Load a hickle file and reconstruct a python object @@ -536,25 +476,70 @@ def load(fileobj, path='/', safe=True): path (str): path within hdf5 file to save data to. Defaults to root / """ + # Make sure that the file is not closed unless modified + # This is to avoid trying to close a file that was never opened + close_flag = False + try: - h5f = file_opener(fileobj) + h5f, close_flag = file_opener(fileobj) h_root_group = h5f.get(path) - try: assert 'CLASS' in h5f.attrs.keys() assert 'VERSION' in h5f.attrs.keys() - py_container = PyContainer() - py_container.container_type = 'hickle' - py_container = _load(py_container, h_root_group) - return py_container[0][0] + VER = h5f.attrs['VERSION'] + try: + VER_MAJOR = int(VER) + except ValueError: + VER_MAJOR = int(VER[0]) + if VER_MAJOR == 1: + if PY2: + warnings.warn("Hickle file versioned as V1, attempting legacy loading...") + from . import hickle_legacy + return hickle_legacy.load(fileobj, safe) + else: + raise RuntimeError("Cannot open file. This file was likely" + " created with Python 2 and an old hickle version.") + elif VER_MAJOR == 2: + if PY2: + warnings.warn("Hickle file appears to be old version (v2), attempting " + "legacy loading...") + from . 
import hickle_legacy2 + return hickle_legacy2.load(fileobj, path=path, safe=safe) + else: + raise RuntimeError("Cannot open file. This file was likely" + " created with Python 2 and an old hickle version.") + # There is an unfortunate period of time where hickle 2.1.0 claims VERSION = int(3) + # For backward compatibility we really need to catch this. + # Actual hickle v3 files are versioned as A.B.C (e.g. 3.1.0) + elif VER_MAJOR == 3 and VER == VER_MAJOR: + if PY2: + warnings.warn("Hickle file appears to be old version (v2.1.0), attempting " + "legacy loading...") + from . import hickle_legacy2 + return hickle_legacy2.load(fileobj, path=path, safe=safe) + else: + raise RuntimeError("Cannot open file. This file was likely" + " created with Python 2 and an old hickle version.") + elif VER_MAJOR >= 3: + py_container = PyContainer() + py_container.container_type = 'hickle' + py_container = _load(py_container, h_root_group) + return py_container[0][0] + except AssertionError: - import hickle_legacy - return hickle_legacy.load(fileobj, safe) + if PY2: + warnings.warn("Hickle file is not versioned, attempting legacy loading...") + from . import hickle_legacy + return hickle_legacy.load(fileobj, safe) + else: + raise RuntimeError("Cannot open file. This file was likely" + " created with Python 2 and an old hickle version.") finally: - if 'h5f' in locals(): + # Close the file if requested. + # Closing a file twice will not cause any problems + if close_flag: h5f.close() - def load_dataset(h_node): """ Load a dataset, converting into its correct python type @@ -564,72 +549,14 @@ def load_dataset(h_node): Returns: data: reconstructed python object from loaded data """ - py_type = h_node.attrs["type"][0] - - if h_node.shape == (): - data = h_node.value - else: - data = h_node[:] - - if py_type == "": - #print self.name - return list(data) - elif py_type == "": - return tuple(data) - elif py_type == "": - return set(data) - elif py_type == "np_dtype": - subtype = h_node.attrs["np_dtype"] - data = np.array(data, dtype=subtype) - return data - elif py_type == 'ndarray': - return np.array(data) - elif py_type == 'ndarray_masked_data': - try: - mask_path = h_node.name + "_mask" - h_root = h_node.parent - mask = h_root.get(mask_path)[:] - except IndexError: - mask = h_root.get(mask_path) - except ValueError: - mask = h_root.get(mask_path) - data = np.ma.array(data, mask=mask) - return data - elif py_type == 'python_dtype': - subtype = h_node.attrs["python_subdtype"] - type_dict = { - "": int, - "": float, - "": long, - "": bool, - "": complex - } - tcast = type_dict.get(subtype) - return tcast(data) - elif py_type == 'string': - return str(data[0]) - elif py_type == 'unicode': - return unicode(data[0]) - elif py_type == 'none': - return None - else: - print(h_node.name, py_type, h_node.attrs.keys()) - return data - - -def sort_keys(key_list): - """ Take a list of strings and sort it by integer value within string - - Args: - key_list (list): List of keys - - Returns: - key_list_sorted (list): List of keys, sorted by integer - """ - to_int = lambda x: int(re.search('\d+', x).group(0)) - keys_by_int = sorted([(to_int(key), key) for key in key_list]) - return [ii[1] for ii in keys_by_int] + py_type = get_type(h_node) + try: + load_fn = load_dataset_lookup(py_type) + return load_fn(h_node) + except: + raise + #raise RuntimeError("Hickle type %s not understood." 
% py_type) def _load(py_container, h_group): """ Load a hickle file @@ -646,12 +573,20 @@ def _load(py_container, h_group): dataset_dtype = h5._hl.dataset.Dataset #either a file, group, or dataset - if isinstance(h_group, H5FileWrapper) or isinstance(h_group, group_dtype): + if isinstance(h_group, (H5FileWrapper, group_dtype)): + py_subcontainer = PyContainer() - py_subcontainer.container_type = h_group.attrs['type'][0] + try: + py_subcontainer.container_type = bytes(h_group.attrs['type'][0]) + except KeyError: + raise + #py_subcontainer.container_type = '' py_subcontainer.name = h_group.name - if py_subcontainer.container_type != 'dict': + if py_subcontainer.container_type == b'dict_item': + py_subcontainer.key_type = h_group.attrs['key_type'] + + if py_subcontainer.container_type not in types_not_to_sort: h_keys = sort_keys(h_group.keys()) else: h_keys = h_group.keys() @@ -668,5 +603,4 @@ def _load(py_container, h_group): subdata = load_dataset(h_group) py_container.append(subdata) - #print h_group.name, py_container return py_container diff --git a/hickle/legacy_v3/loaders/__init__.py b/hickle/legacy_v3/loaders/__init__.py new file mode 100644 index 00000000..3be6bd29 --- /dev/null +++ b/hickle/legacy_v3/loaders/__init__.py @@ -0,0 +1 @@ +from __future__ import absolute_import \ No newline at end of file diff --git a/hickle/legacy_v3/loaders/load_astropy.py b/hickle/legacy_v3/loaders/load_astropy.py new file mode 100644 index 00000000..17ca3d4c --- /dev/null +++ b/hickle/legacy_v3/loaders/load_astropy.py @@ -0,0 +1,237 @@ +import numpy as np +from astropy.units import Quantity +from astropy.coordinates import Angle, SkyCoord +from astropy.constants import Constant, EMConstant +from astropy.table import Table +from astropy.time import Time + +from ..helpers import get_type_and_data +import six + +def create_astropy_quantity(py_obj, h_group, call_id=0, **kwargs): + """ dumps an astropy quantity + + Args: + py_obj: python object to dump; should be a python type (int, float, bool etc) + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the iterable. + """ + # kwarg compression etc does not work on scalars + d = h_group.create_dataset('data_%i' % call_id, data=py_obj.value, + dtype='float64') #, **kwargs) + d.attrs["type"] = [b'astropy_quantity'] + if six.PY3: + unit = bytes(str(py_obj.unit), 'ascii') + else: + unit = str(py_obj.unit) + d.attrs['unit'] = [unit] + +def create_astropy_angle(py_obj, h_group, call_id=0, **kwargs): + """ dumps an astropy quantity + + Args: + py_obj: python object to dump; should be a python type (int, float, bool etc) + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the iterable. + """ + # kwarg compression etc does not work on scalars + d = h_group.create_dataset('data_%i' % call_id, data=py_obj.value, + dtype='float64') #, **kwargs) + d.attrs["type"] = [b'astropy_angle'] + if six.PY3: + unit = str(py_obj.unit).encode('ascii') + else: + unit = str(py_obj.unit) + d.attrs['unit'] = [unit] + +def create_astropy_skycoord(py_obj, h_group, call_id=0, **kwargs): + """ dumps an astropy quantity + + Args: + py_obj: python object to dump; should be a python type (int, float, bool etc) + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the iterable. 
+ """ + # kwarg compression etc does not work on scalars + lat = py_obj.data.lat.value + lon = py_obj.data.lon.value + dd = np.stack((lon, lat), axis=-1) + + d = h_group.create_dataset('data_%i' % call_id, data=dd, + dtype='float64') #, **kwargs) + d.attrs["type"] = [b'astropy_skycoord'] + if six.PY3: + lon_unit = str(py_obj.data.lon.unit).encode('ascii') + lat_unit = str(py_obj.data.lat.unit).encode('ascii') + else: + lon_unit = str(py_obj.data.lon.unit) + lat_unit = str(py_obj.data.lat.unit) + d.attrs['lon_unit'] = [lon_unit] + d.attrs['lat_unit'] = [lat_unit] + +def create_astropy_time(py_obj, h_group, call_id=0, **kwargs): + """ dumps an astropy Time object + + Args: + py_obj: python object to dump; should be a python type (int, float, bool etc) + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the iterable. + """ + + # kwarg compression etc does not work on scalars + data = py_obj.value + dtype = str(py_obj.value.dtype) + + # Need to catch string times + if '" : load_list_dataset, + "" : load_tuple_dataset + } + +3) container_types_dict +container_types_dict: mapping required to convert the PyContainer object in hickle.py + back into the required native type. PyContainer is required as + some iterable types are immutable (do not have an append() function). + Here is an example: + container_types_dict = { + "": list, + "": tuple + } + +4) container_key_types_dict +container_key_types_dict: mapping specifically for converting hickled dict data back into + a dictionary with the same key type. While python dictionary keys + can be any hashable object, in HDF5 a unicode/string is required + for a dataset name. Example: + container_key_types_dict = { + "": str, + "": unicode + } + +5) types_not_to_sort +type_not_to_sort is a list of hickle type attributes that may be hierarchical, +but don't require sorting by integer index. + +## Extending hickle to add support for other classes and types + +The process to add new load/dump capabilities is as follows: + +1) Create a file called load_[newstuff].py in loaders/ +2) In the load_[newstuff].py file, define your create_dataset and load_dataset functions, + along with all required mapping dictionaries. +3) Add an import call here, and populate the lookup dictionaries with update() calls: + # Add loaders for [newstuff] + try: + from .loaders.load_[newstuff[ import types_dict as ns_types_dict + from .loaders.load_[newstuff[ import hkl_types_dict as ns_hkl_types_dict + types_dict.update(ns_types_dict) + hkl_types_dict.update(ns_hkl_types_dict) + ... 
(Add container_types_dict etc if required) + except ImportError: + raise +""" + +import six +from ast import literal_eval + +def return_first(x): + """ Return first element of a list """ + return x[0] + +def load_nothing(h_hode): + pass + +types_dict = {} + +hkl_types_dict = {} + +types_not_to_sort = [b'dict', b'csr_matrix', b'csc_matrix', b'bsr_matrix'] + +container_types_dict = { + b"": list, + b"": tuple, + b"": set, + b"": list, + b"": tuple, + b"": set, + b"csr_matrix": return_first, + b"csc_matrix": return_first, + b"bsr_matrix": return_first + } + +# Technically, any hashable object can be used, for now sticking with built-in types +container_key_types_dict = { + b"": literal_eval, + b"": float, + b"": bool, + b"": int, + b"": complex, + b"": literal_eval, + b"": literal_eval, + b"": float, + b"": bool, + b"": int, + b"": complex, + b"": literal_eval + } + +if six.PY2: + container_key_types_dict[b""] = literal_eval + container_key_types_dict[b""] = long + +# Add loaders for built-in python types +if six.PY2: + from .loaders.load_python import types_dict as py_types_dict + from .loaders.load_python import hkl_types_dict as py_hkl_types_dict +else: + from .loaders.load_python3 import types_dict as py_types_dict + from .loaders.load_python3 import hkl_types_dict as py_hkl_types_dict + +types_dict.update(py_types_dict) +hkl_types_dict.update(py_hkl_types_dict) + +# Add loaders for numpy types +from .loaders.load_numpy import types_dict as np_types_dict +from .loaders.load_numpy import hkl_types_dict as np_hkl_types_dict +from .loaders.load_numpy import check_is_numpy_array +types_dict.update(np_types_dict) +hkl_types_dict.update(np_hkl_types_dict) + +####################### +## ND-ARRAY checking ## +####################### + +ndarray_like_check_fns = [ + check_is_numpy_array +] + +def check_is_ndarray_like(py_obj): + is_ndarray_like = False + for ii, check_fn in enumerate(ndarray_like_check_fns): + is_ndarray_like = check_fn(py_obj) + if is_ndarray_like: + break + return is_ndarray_like + + + + +####################### +## loading optional ## +####################### + +def register_class(myclass_type, hkl_str, dump_function, load_function, + to_sort=True, ndarray_check_fn=None): + """ Register a new hickle class. + + Args: + myclass_type type(class): type of class + dump_function (function def): function to write data to HDF5 + load_function (function def): function to load data from HDF5 + is_iterable (bool): Is the item iterable? + hkl_str (str): String to write to HDF5 file to describe class + to_sort (bool): If the item is iterable, does it require sorting? + ndarray_check_fn (function def): function to use to check if + + """ + types_dict.update({myclass_type: dump_function}) + hkl_types_dict.update({hkl_str: load_function}) + if to_sort == False: + types_not_to_sort.append(hkl_str) + if ndarray_check_fn is not None: + ndarray_like_check_fns.append(ndarray_check_fn) + +def register_class_list(class_list): + """ Register multiple classes in a list + + Args: + class_list (list): A list, where each item is an argument to + the register_class() function. + + Notes: This just runs the code: + for item in mylist: + register_class(*item) + """ + for class_item in class_list: + register_class(*class_item) + +def register_class_exclude(hkl_str_to_ignore): + """ Tell loading funciton to ignore any HDF5 dataset with attribute 'type=XYZ' + + Args: + hkl_str_to_ignore (str): attribute type=string to ignore and exclude from loading. 
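As an illustration of the register_class() helper defined above, a user-defined type could be hooked into the legacy v3 lookup tables as follows (a sketch only; MyUnit, its two helper functions and the b'myunit' type string are made up for this example and are not part of hickle):

class MyUnit(object):
    """ Hypothetical user class used only for this example. """

    def __init__(self, value):
        self.value = value

def create_myunit_dataset(py_obj, h_group, call_id=0, **kwargs):
    # Store the wrapped value and tag the dataset with its hickle type string
    d = h_group.create_dataset('data_%i' % call_id, data=py_obj.value)
    d.attrs['type'] = [b'myunit']

def load_myunit_dataset(h_node):
    # Rebuild the original object from the stored dataset value
    return MyUnit(h_node[()])

register_class(MyUnit, b'myunit', create_myunit_dataset, load_myunit_dataset)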
+ """ + hkl_types_dict[hkl_str_to_ignore] = load_nothing + +def register_exclude_list(exclude_list): + """ Ignore HDF5 datasets with attribute type='XYZ' from loading + + ArgsL + exclude_list (list): List of strings, which correspond to hdf5/hickle + type= attributes not to load. + """ + for hkl_str in exclude_list: + register_class_exclude(hkl_str) + +######################## +## Scipy sparse array ## +######################## + +try: + from .loaders.load_scipy import class_register, exclude_register + register_class_list(class_register) + register_exclude_list(exclude_register) +except ImportError: + pass +except NameError: + pass + +#################### +## Astropy stuff ## +#################### + +try: + from .loaders.load_astropy import class_register + register_class_list(class_register) +except ImportError: + pass + +################## +## Pandas stuff ## +################## + +try: + from .loaders.load_pandas import class_register + register_class_list(class_register) +except ImportError: + pass diff --git a/hickle/loaders/__init__.py b/hickle/loaders/__init__.py index 3be6bd29..e69de29b 100644 --- a/hickle/loaders/__init__.py +++ b/hickle/loaders/__init__.py @@ -1 +0,0 @@ -from __future__ import absolute_import \ No newline at end of file diff --git a/hickle/loaders/load_astropy.py b/hickle/loaders/load_astropy.py index dd8efce6..7857d78e 100644 --- a/hickle/loaders/load_astropy.py +++ b/hickle/loaders/load_astropy.py @@ -1,84 +1,87 @@ -import numpy as np -from astropy.units import Quantity +# %% IMPORTS +# Package imports from astropy.coordinates import Angle, SkyCoord -from astropy.constants import Constant, EMConstant +from astropy.constants import Constant from astropy.table import Table from astropy.time import Time +from astropy.units import Quantity +import numpy as np +# hickle imports from hickle.helpers import get_type_and_data -import six -def create_astropy_quantity(py_obj, h_group, call_id=0, **kwargs): + +# %% FUNCTION DEFINITIONS +def create_astropy_quantity(py_obj, h_group, name, **kwargs): """ dumps an astropy quantity Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) + py_obj: python object to dump; should be a python type (int, float, + bool etc) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - # kwarg compression etc does not work on scalars - d = h_group.create_dataset('data_%i' % call_id, data=py_obj.value, - dtype='float64') #, **kwargs) - d.attrs["type"] = [b'astropy_quantity'] - if six.PY3: - unit = bytes(str(py_obj.unit), 'ascii') - else: - unit = str(py_obj.unit) - d.attrs['unit'] = [unit] -def create_astropy_angle(py_obj, h_group, call_id=0, **kwargs): + d = h_group.create_dataset(name, data=py_obj.value, dtype='float64', + **kwargs) + unit = bytes(str(py_obj.unit), 'ascii') + d.attrs['unit'] = unit + return(d) + + +def create_astropy_angle(py_obj, h_group, name, **kwargs): """ dumps an astropy quantity Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) + py_obj: python object to dump; should be a python type (int, float, + bool etc) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. 
""" - # kwarg compression etc does not work on scalars - d = h_group.create_dataset('data_%i' % call_id, data=py_obj.value, - dtype='float64') #, **kwargs) - d.attrs["type"] = [b'astropy_angle'] - if six.PY3: - unit = str(py_obj.unit).encode('ascii') - else: - unit = str(py_obj.unit) - d.attrs['unit'] = [unit] -def create_astropy_skycoord(py_obj, h_group, call_id=0, **kwargs): + d = h_group.create_dataset(name, data=py_obj.value, dtype='float64', + **kwargs) + unit = str(py_obj.unit).encode('ascii') + d.attrs['unit'] = unit + return(d) + + +def create_astropy_skycoord(py_obj, h_group, name, **kwargs): """ dumps an astropy quantity Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) + py_obj: python object to dump; should be a python type (int, float, + bool etc) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - # kwarg compression etc does not work on scalars + lat = py_obj.data.lat.value lon = py_obj.data.lon.value - dd = np.column_stack((lon, lat)) - - d = h_group.create_dataset('data_%i' % call_id, data=dd, - dtype='float64') #, **kwargs) - d.attrs["type"] = [b'astropy_skycoord'] - if six.PY3: - lon_unit = str(py_obj.data.lon.unit).encode('ascii') - lat_unit = str(py_obj.data.lat.unit).encode('ascii') - else: - lon_unit = str(py_obj.data.lon.unit) - lat_unit = str(py_obj.data.lat.unit) - d.attrs['lon_unit'] = [lon_unit] - d.attrs['lat_unit'] = [lat_unit] + dd = np.stack((lon, lat), axis=-1) + + d = h_group.create_dataset(name, data=dd, dtype='float64', **kwargs) + lon_unit = str(py_obj.data.lon.unit).encode('ascii') + lat_unit = str(py_obj.data.lat.unit).encode('ascii') + d.attrs['lon_unit'] = lon_unit + d.attrs['lat_unit'] = lat_unit + return(d) + -def create_astropy_time(py_obj, h_group, call_id=0, **kwargs): +def create_astropy_time(py_obj, h_group, name, **kwargs): """ dumps an astropy Time object Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) + py_obj: python object to dump; should be a python type (int, float, + bool etc) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. 
""" - # kwarg compression etc does not work on scalars data = py_obj.value dtype = str(py_obj.value.dtype) @@ -90,148 +93,143 @@ def create_astropy_time(py_obj, h_group, call_id=0, **kwargs): for item in py_obj.value: data.append(str(item).encode('ascii')) - d = h_group.create_dataset('data_%i' % call_id, data=data, dtype=dtype) #, **kwargs) - d.attrs["type"] = [b'astropy_time'] - if six.PY2: - fmt = str(py_obj.format) - scale = str(py_obj.scale) - else: - fmt = str(py_obj.format).encode('ascii') - scale = str(py_obj.scale).encode('ascii') - d.attrs['format'] = [fmt] - d.attrs['scale'] = [scale] + d = h_group.create_dataset(name, data=data, dtype=dtype, **kwargs) + fmt = str(py_obj.format).encode('ascii') + scale = str(py_obj.scale).encode('ascii') + d.attrs['format'] = fmt + d.attrs['scale'] = scale + + return(d) -def create_astropy_constant(py_obj, h_group, call_id=0, **kwargs): + +def create_astropy_constant(py_obj, h_group, name, **kwargs): """ dumps an astropy constant Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) + py_obj: python object to dump; should be a python type (int, float, + bool etc) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - # kwarg compression etc does not work on scalars - d = h_group.create_dataset('data_%i' % call_id, data=py_obj.value, - dtype='float64') #, **kwargs) - d.attrs["type"] = [b'astropy_constant'] - d.attrs["unit"] = [str(py_obj.unit)] - d.attrs["abbrev"] = [str(py_obj.abbrev)] - d.attrs["name"] = [str(py_obj.name)] - d.attrs["reference"] = [str(py_obj.reference)] - d.attrs["uncertainty"] = [py_obj.uncertainty] + + d = h_group.create_dataset(name, data=py_obj.value, dtype='float64', + **kwargs) + d.attrs["unit"] = str(py_obj.unit) + d.attrs["abbrev"] = str(py_obj.abbrev) + d.attrs["name"] = str(py_obj.name) + d.attrs["reference"] = str(py_obj.reference) + d.attrs["uncertainty"] = py_obj.uncertainty if py_obj.system: - d.attrs["system"] = [py_obj.system] + d.attrs["system"] = py_obj.system + return(d) -def create_astropy_table(py_obj, h_group, call_id=0, **kwargs): +def create_astropy_table(py_obj, h_group, name, **kwargs): """ Dump an astropy Table Args: - py_obj: python object to dump; should be a python type (int, float, bool etc) + py_obj: python object to dump; should be a python type (int, float, + bool etc) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. 
""" data = py_obj.as_array() - d = h_group.create_dataset('data_%i' % call_id, data=data, dtype=data.dtype, **kwargs) - d.attrs['type'] = [b'astropy_table'] + d = h_group.create_dataset(name, data=data, dtype=data.dtype, **kwargs) - if six.PY3: - colnames = [bytes(cn, 'ascii') for cn in py_obj.colnames] - else: - colnames = py_obj.colnames + colnames = [bytes(cn, 'ascii') for cn in py_obj.colnames] d.attrs['colnames'] = colnames for key, value in py_obj.meta.items(): - d.attrs[key] = value + d.attrs[key] = value + return(d) def load_astropy_quantity_dataset(h_node): - py_type, data = get_type_and_data(h_node) - unit = h_node.attrs["unit"][0] - q = Quantity(data, unit) + py_type, _, data = get_type_and_data(h_node) + unit = h_node.attrs["unit"] + q = py_type(data, unit, copy=False) return q + def load_astropy_time_dataset(h_node): - py_type, data = get_type_and_data(h_node) - if six.PY3: - fmt = h_node.attrs["format"][0].decode('ascii') - scale = h_node.attrs["scale"][0].decode('ascii') - else: - fmt = h_node.attrs["format"][0] - scale = h_node.attrs["scale"][0] - q = Time(data, format=fmt, scale=scale) + py_type, _, data = get_type_and_data(h_node) + fmt = h_node.attrs["format"].decode('ascii') + scale = h_node.attrs["scale"].decode('ascii') + q = py_type(data, format=fmt, scale=scale) return q + def load_astropy_angle_dataset(h_node): - py_type, data = get_type_and_data(h_node) - unit = h_node.attrs["unit"][0] - q = Angle(data, unit) + py_type, _, data = get_type_and_data(h_node) + unit = h_node.attrs["unit"] + q = py_type(data, unit) return q + def load_astropy_skycoord_dataset(h_node): - py_type, data = get_type_and_data(h_node) - lon_unit = h_node.attrs["lon_unit"][0] - lat_unit = h_node.attrs["lat_unit"][0] - q = SkyCoord(data[:,0], data[:, 1], unit=(lon_unit, lat_unit)) + py_type, _, data = get_type_and_data(h_node) + lon_unit = h_node.attrs["lon_unit"] + lat_unit = h_node.attrs["lat_unit"] + q = py_type(data[..., 0], data[..., 1], unit=(lon_unit, lat_unit)) return q + def load_astropy_constant_dataset(h_node): - py_type, data = get_type_and_data(h_node) - unit = h_node.attrs["unit"][0] - abbrev = h_node.attrs["abbrev"][0] - name = h_node.attrs["name"][0] - ref = h_node.attrs["reference"][0] - unc = h_node.attrs["uncertainty"][0] + py_type, _, data = get_type_and_data(h_node) + unit = h_node.attrs["unit"] + abbrev = h_node.attrs["abbrev"] + name = h_node.attrs["name"] + ref = h_node.attrs["reference"] + unc = h_node.attrs["uncertainty"] system = None if "system" in h_node.attrs.keys(): - system = h_node.attrs["system"][0] + system = h_node.attrs["system"] - c = Constant(abbrev, name, data, unit, unc, ref, system) + c = py_type(abbrev, name, data, unit, unc, ref, system) return c + def load_astropy_table(h_node): - py_type, data = get_type_and_data(h_node) + py_type, _, data = get_type_and_data(h_node) metadata = dict(h_node.attrs.items()) metadata.pop('type') + metadata.pop('base_type') metadata.pop('colnames') - if six.PY3: - colnames = [cn.decode('ascii') for cn in h_node.attrs["colnames"]] - else: - colnames = h_node.attrs["colnames"] + colnames = [cn.decode('ascii') for cn in h_node.attrs["colnames"]] - t = Table(data, names=colnames, meta=metadata) + t = py_type(data, names=colnames, meta=metadata) return t + def check_is_astropy_table(py_obj): return isinstance(py_obj, Table) + def check_is_astropy_quantity_array(py_obj): - if isinstance(py_obj, Quantity) or isinstance(py_obj, Time) or \ - isinstance(py_obj, Angle) or isinstance(py_obj, SkyCoord): - if py_obj.isscalar: - return 
False - else: - return True + if(isinstance(py_obj, (Quantity, Time, Angle, SkyCoord)) and + not py_obj.isscalar): + return(True) else: - return False - + return(False) -##################### -# Lookup dictionary # -##################### +# %% REGISTERS class_register = [ - [Quantity, b'astropy_quantity', create_astropy_quantity, load_astropy_quantity_dataset, - True, check_is_astropy_quantity_array], - [Time, b'astropy_time', create_astropy_time, load_astropy_time_dataset, - True, check_is_astropy_quantity_array], - [Angle, b'astropy_angle', create_astropy_angle, load_astropy_angle_dataset, - True, check_is_astropy_quantity_array], - [SkyCoord, b'astropy_skycoord', create_astropy_skycoord, load_astropy_skycoord_dataset, - True, check_is_astropy_quantity_array], - [Constant, b'astropy_constant', create_astropy_constant, load_astropy_constant_dataset, - True, None], - [Table, b'astropy_table', create_astropy_table, load_astropy_table, - True, check_is_astropy_table] -] + [Quantity, b'astropy_quantity', create_astropy_quantity, + load_astropy_quantity_dataset, check_is_astropy_quantity_array], + [Time, b'astropy_time', create_astropy_time, load_astropy_time_dataset, + check_is_astropy_quantity_array], + [Angle, b'astropy_angle', create_astropy_angle, load_astropy_angle_dataset, + check_is_astropy_quantity_array], + [SkyCoord, b'astropy_skycoord', create_astropy_skycoord, + load_astropy_skycoord_dataset, check_is_astropy_quantity_array], + [Constant, b'astropy_constant', create_astropy_constant, + load_astropy_constant_dataset], + [Table, b'astropy_table', create_astropy_table, load_astropy_table, + check_is_astropy_table]] + +exclude_register = [] diff --git a/hickle/loaders/load_builtins.py b/hickle/loaders/load_builtins.py new file mode 100644 index 00000000..d9c21b03 --- /dev/null +++ b/hickle/loaders/load_builtins.py @@ -0,0 +1,169 @@ +# encoding: utf-8 +""" +# load_python.py + +Handlers for dumping and loading built-in python types. +NB: As these are for built-in types, they are critical to the functioning of +hickle. + +""" + + +# %% IMPORTS +# Built-in imports +import warnings + +# Package imports +import dill as pickle +import numpy as np + +# hickle imports +from hickle.helpers import get_type_and_data + + +# %% CLASS DEFINITIONS +class SerializedWarning(UserWarning): + """ An object type was not understood + + The data will be serialized using pickle. + """ + pass + + +# %% FUNCTION DEFINITIONS +def create_listlike_dataset(py_obj, h_group, name, **kwargs): + """ Dumper for list, set, tuple + + Args: + py_obj: python object to dump; should be list-like + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the + iterable. + """ + + obj = list(py_obj) + + # h5py does not handle Py3 'str' objects well. 
Need to catch this + # Only need to check first element as this method + # is only called if all elements have same dtype + str_type = None + if type(obj[0]) in (str, bytes): + str_type = bytes(type(obj[0]).__name__, 'ascii') + + if type(obj[0]) is str: + obj = [bytes(oo, 'utf8') for oo in obj] + + d = h_group.create_dataset(name, data=obj, **kwargs) + + # Need to add some metadata to aid in unpickling if it's a string type + if str_type is not None: + d.attrs["str_type"] = str_type + return(d) + + +def create_scalar_dataset(py_obj, h_group, name, **kwargs): + """ dumps a python dtype object to h5py file + + Args: + py_obj: python object to dump; should be a scalar (int, float, + bool, str, etc) + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the + iterable. + """ + + # Make sure 'compression' is not in kwargs + kwargs.pop('compression', None) + + # If py_obj is an integer and cannot be stored in 64-bits, convert to str + if isinstance(py_obj, int) and (py_obj.bit_length() > 64): + py_obj = bytes(str(py_obj), 'ascii') + + d = h_group.create_dataset(name, data=py_obj, **kwargs) + return(d) + + +def create_none_dataset(py_obj, h_group, name, **kwargs): + """ Dump None type to file + + Args: + py_obj: python object to dump; must be None object + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the + iterable. + """ + d = h_group.create_dataset(name, data=b'None', **kwargs) + return(d) + + +def create_pickled_dataset(py_obj, h_group, name, reason=None, **kwargs): + """ If no match is made, raise a warning + + Args: + py_obj: python object to dump; default if item is not matched. + h_group (h5.File.group): group to dump data into. + call_id (int): index to identify object's relative location in the + iterable. 
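    For illustration only (the file name is hypothetical and not part of this
    patch), this fallback means that any object without a registered loader is
    serialized with dill and a SerializedWarning is emitted on dump, as the
    tests added in this patch also check:

        import hickle as hkl

        class Unregistered(object):
            def __init__(self):
                self.payload = [1, 2, 3]

        hkl.dump(Unregistered(), 'fallback_example.h5')  # warns: SerializedWarning
        obj = hkl.load('fallback_example.h5')
        assert obj.payload == [1, 2, 3]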
+ """ + reason_str = " (Reason: %s)" % (reason) if reason is not None else "" + pickled_obj = pickle.dumps(py_obj) + d = h_group.create_dataset(name, data=np.array(pickled_obj), **kwargs) + + warnings.warn("%r type not understood, data has been serialized%s" + % (py_obj.__class__.__name__, reason_str), SerializedWarning) + return(d) + + +def load_list_dataset(h_node): + _, _, data = get_type_and_data(h_node) + str_type = h_node.attrs.get('str_type', None) + + if str_type == b'str': + return(np.array(data, copy=False, dtype=str).tolist()) + else: + return(data.tolist()) + + +def load_tuple_dataset(h_node): + data = load_list_dataset(h_node) + return tuple(data) + + +def load_set_dataset(h_node): + data = load_list_dataset(h_node) + return set(data) + + +def load_none_dataset(h_node): + return None + + +def load_pickled_data(h_node): + _, _, data = get_type_and_data(h_node) + return pickle.loads(data) + + +def load_scalar_dataset(h_node): + _, base_type, data = get_type_and_data(h_node) + + if(base_type == b'int'): + data = int(data) + + return(data) + + +# %% REGISTERS +class_register = [ + [list, b"list", create_listlike_dataset, load_list_dataset], + [tuple, b"tuple", create_listlike_dataset, load_tuple_dataset], + [set, b"set", create_listlike_dataset, load_set_dataset], + [bytes, b"bytes", create_scalar_dataset, load_scalar_dataset], + [str, b"str", create_scalar_dataset, load_scalar_dataset], + [int, b"int", create_scalar_dataset, load_scalar_dataset], + [float, b"float", create_scalar_dataset, load_scalar_dataset], + [complex, b"complex", create_scalar_dataset, load_scalar_dataset], + [bool, b"bool", create_scalar_dataset, load_scalar_dataset], + [type(None), b"None", create_none_dataset, load_none_dataset], + [object, b"pickle", create_pickled_dataset, load_pickled_data]] + +exclude_register = [] diff --git a/hickle/loaders/load_numpy.py b/hickle/loaders/load_numpy.py index 7a31b12e..38c40a86 100644 --- a/hickle/loaders/load_numpy.py +++ b/hickle/loaders/load_numpy.py @@ -5,13 +5,18 @@ Utilities and dump / load handlers for handling numpy and scipy arrays """ -import six -import numpy as np +# %% IMPORTS +# Package imports +import numpy as np +import dill as pickle +# hickle imports from hickle.helpers import get_type_and_data +from hickle.hickle import _dump +# %% FUNCTION DEFINITIONS def check_is_numpy_array(py_obj): """ Check if a python object is a numpy array (masked or regular) @@ -19,127 +24,133 @@ def check_is_numpy_array(py_obj): py_obj: python object to check whether it is a numpy array Returns - is_numpy (bool): Returns True if it is a numpy array, else False if it isn't + is_numpy (bool): Returns True if it is a numpy array, else False if it + isn't """ - is_numpy = type(py_obj) in (type(np.array([1])), type(np.ma.array([1]))) - - return is_numpy + return(isinstance(py_obj, np.ndarray)) -def create_np_scalar_dataset(py_obj, h_group, call_id=0, **kwargs): +def create_np_scalar_dataset(py_obj, h_group, name, **kwargs): """ dumps an np dtype object to h5py file Args: - py_obj: python object to dump; should be a numpy scalar, e.g. np.float16(1) + py_obj: python object to dump; should be a numpy scalar, e.g. + np.float16(1) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - # DO NOT PASS KWARGS TO SCALAR DATASETS! 
- d = h_group.create_dataset('data_%i' % call_id, data=py_obj) # **kwargs) - d.attrs["type"] = [b'np_scalar'] + d = h_group.create_dataset(name, data=py_obj, **kwargs) - if six.PY2: - d.attrs["np_dtype"] = str(d.dtype) - else: - d.attrs["np_dtype"] = bytes(str(d.dtype), 'ascii') + d.attrs["np_dtype"] = bytes(str(d.dtype), 'ascii') + return(d) -def create_np_dtype(py_obj, h_group, call_id=0, **kwargs): +def create_np_dtype(py_obj, h_group, name, **kwargs): """ dumps an np dtype object to h5py file Args: - py_obj: python object to dump; should be a numpy scalar, e.g. np.float16(1) + py_obj: python object to dump; should be a numpy dtype, e.g. + np.float16 h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - d = h_group.create_dataset('data_%i' % call_id, data=[str(py_obj)]) - d.attrs["type"] = [b'np_dtype'] + d = h_group.create_dataset(name, data=str(py_obj), **kwargs) + return(d) -def create_np_array_dataset(py_obj, h_group, call_id=0, **kwargs): +def create_np_array_dataset(py_obj, h_group, name, **kwargs): """ dumps an ndarray object to h5py file Args: - py_obj: python object to dump; should be a numpy array or np.ma.array (masked) + py_obj: python object to dump; should be a numpy array or np.ma.array + (masked) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - if isinstance(py_obj, type(np.ma.array([1]))): - d = h_group.create_dataset('data_%i' % call_id, data=py_obj, **kwargs) - #m = h_group.create_dataset('mask_%i' % call_id, data=py_obj.mask, **kwargs) - m = h_group.create_dataset('data_%i_mask' % call_id, data=py_obj.mask, **kwargs) - d.attrs["type"] = [b'ndarray_masked_data'] - m.attrs["type"] = [b'ndarray_masked_mask'] + + # Obtain dtype of py_obj + dtype = str(py_obj.dtype) + + # Check if py_obj contains strings + if '": int, - "": float, - "": long, - "": bool, - "": complex - } - tcast = type_dict.get(subtype) - return tcast(data) - -types_dict = { - list: create_listlike_dataset, - tuple: create_listlike_dataset, - set: create_listlike_dataset, - str: create_stringlike_dataset, - unicode: create_stringlike_dataset, - int: create_python_dtype_dataset, - float: create_python_dtype_dataset, - long: create_python_dtype_dataset, - bool: create_python_dtype_dataset, - complex: create_python_dtype_dataset, - NoneType: create_none_dataset, -} - -hkl_types_dict = { - "" : load_list_dataset, - "" : load_tuple_dataset, - "" : load_set_dataset, - "python_dtype" : load_python_dtype_dataset, - "string" : load_string_dataset, - "unicode" : load_unicode_dataset, - "none" : load_none_dataset -} - diff --git a/hickle/loaders/load_scipy.py b/hickle/loaders/load_scipy.py index ab09fe23..06298def 100644 --- a/hickle/loaders/load_scipy.py +++ b/hickle/loaders/load_scipy.py @@ -1,9 +1,20 @@ -import six +# %% IMPORTS +# Package imports +import dill as pickle +import numpy as np import scipy from scipy import sparse +# hickle imports from hickle.helpers import get_type_and_data + +# %% FUNCTION DEFINITIONS +def return_first(x): + """ Return first element of a list """ + return x[0] + + def check_is_scipy_sparse_array(py_obj): """ Check if a python object is a scipy sparse array @@ -11,74 +22,83 @@ def check_is_scipy_sparse_array(py_obj): py_obj: python object to check whether it 
is a sparse array Returns - is_numpy (bool): Returns True if it is a sparse array, else False if it isn't + is_numpy (bool): Returns True if it is a sparse array, else False if it + isn't """ - t_csr = type(scipy.sparse.csr_matrix([0])) - t_csc = type(scipy.sparse.csc_matrix([0])) - t_bsr = type(scipy.sparse.bsr_matrix([0])) + t_csr = sparse.csr_matrix + t_csc = sparse.csc_matrix + t_bsr = sparse.bsr_matrix is_sparse = type(py_obj) in (t_csr, t_csc, t_bsr) return is_sparse -def create_sparse_dataset(py_obj, h_group, call_id=0, **kwargs): +def create_sparse_dataset(py_obj, h_group, name, **kwargs): """ dumps an sparse array to h5py file Args: - py_obj: python object to dump; should be a numpy array or np.ma.array (masked) + py_obj: python object to dump; should be a numpy array or np.ma.array + (masked) h_group (h5.File.group): group to dump data into. - call_id (int): index to identify object's relative location in the iterable. + call_id (int): index to identify object's relative location in the + iterable. """ - h_sparsegroup = h_group.create_group('data_%i' % call_id) + h_sparsegroup = h_group.create_group(name) data = h_sparsegroup.create_dataset('data', data=py_obj.data, **kwargs) - indices = h_sparsegroup.create_dataset('indices', data=py_obj.indices, **kwargs) - indptr = h_sparsegroup.create_dataset('indptr', data=py_obj.indptr, **kwargs) + indices = h_sparsegroup.create_dataset('indices', data=py_obj.indices, + **kwargs) + indptr = h_sparsegroup.create_dataset('indptr', data=py_obj.indptr, + **kwargs) shape = h_sparsegroup.create_dataset('shape', data=py_obj.shape, **kwargs) - if isinstance(py_obj, type(sparse.csr_matrix([0]))): + if isinstance(py_obj, sparse.csr_matrix): type_str = 'csr' - elif isinstance(py_obj, type(sparse.csc_matrix([0]))): + elif isinstance(py_obj, sparse.csc_matrix): type_str = 'csc' - elif isinstance(py_obj, type(sparse.bsr_matrix([0]))): + elif isinstance(py_obj, sparse.bsr_matrix): type_str = 'bsr' - if six.PY2: - h_sparsegroup.attrs["type"] = [b'%s_matrix' % type_str] - data.attrs["type"] = [b"%s_matrix_data" % type_str] - indices.attrs["type"] = [b"%s_matrix_indices" % type_str] - indptr.attrs["type"] = [b"%s_matrix_indptr" % type_str] - shape.attrs["type"] = [b"%s_matrix_shape" % type_str] - else: - h_sparsegroup.attrs["type"] = [bytes(str('%s_matrix' % type_str), 'ascii')] - data.attrs["type"] = [bytes(str("%s_matrix_data" % type_str), 'ascii')] - indices.attrs["type"] = [bytes(str("%s_matrix_indices" % type_str), 'ascii')] - indptr.attrs["type"] = [bytes(str("%s_matrix_indptr" % type_str), 'ascii')] - shape.attrs["type"] = [bytes(str("%s_matrix_shape" % type_str), 'ascii')] + NoneType = type(None) + h_sparsegroup.attrs['type'] = np.array(pickle.dumps(return_first)) + h_sparsegroup.attrs['base_type'] = ('%s_matrix' % type_str).encode('ascii') + indices.attrs['type'] = np.array(pickle.dumps(NoneType)) + indices.attrs['base_type'] =\ + ("%s_matrix_indices" % type_str).encode('ascii') + indptr.attrs['type'] = np.array(pickle.dumps(NoneType)) + indptr.attrs['base_type'] = ("%s_matrix_indptr" % type_str).encode('ascii') + shape.attrs['type'] = np.array(pickle.dumps(NoneType)) + shape.attrs['base_type'] = ("%s_matrix_shape" % type_str).encode('ascii') + + return(data) -def load_sparse_matrix_data(h_node): - py_type, data = get_type_and_data(h_node) - h_root = h_node.parent +def load_sparse_matrix_data(h_node): + _, base_type, data = get_type_and_data(h_node) + h_root = h_node.parent indices = h_root.get('indices')[:] - indptr = h_root.get('indptr')[:] - 
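# Illustrative sketch of the layout produced by create_sparse_dataset above
# (rough; the type/base_type attributes of the returned 'data' child are
# stamped by the calling dump machinery, not by this module):
#
#   group          attrs: type=pickle(return_first), base_type=b'csr_matrix'
#   group/data     the non-zero values; this is the dataset returned above
#   group/indices  attrs: base_type=b'csr_matrix_indices'  (excluded on load)
#   group/indptr   attrs: base_type=b'csr_matrix_indptr'   (excluded on load)
#   group/shape    attrs: base_type=b'csr_matrix_shape'    (excluded on load)
#
# This loader therefore receives the 'data' node and reaches its siblings
# through h_node.parent, while the exclude_register at the bottom of this
# module keeps those helper datasets from being loaded as standalone objects.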
shape = h_root.get('shape')[:] - - if py_type == b'csc_matrix_data': - smat = sparse.csc_matrix((data, indices, indptr), dtype=data.dtype, shape=shape) - elif py_type == b'csr_matrix_data': - smat = sparse.csr_matrix((data, indices, indptr), dtype=data.dtype, shape=shape) - elif py_type == b'bsr_matrix_data': - smat = sparse.bsr_matrix((data, indices, indptr), dtype=data.dtype, shape=shape) + indptr = h_root.get('indptr')[:] + shape = h_root.get('shape')[:] + + if base_type == b'csc_matrix': + smat = sparse.csc_matrix((data, indices, indptr), dtype=data.dtype, + shape=shape) + elif base_type == b'csr_matrix': + smat = sparse.csr_matrix((data, indices, indptr), dtype=data.dtype, + shape=shape) + elif base_type == b'bsr_matrix': + smat = sparse.bsr_matrix((data, indices, indptr), dtype=data.dtype, + shape=shape) return smat - - - +# %% REGISTERS class_register = [ - [scipy.sparse.csr_matrix, b'csr_matrix_data', create_sparse_dataset, load_sparse_matrix_data, False, check_is_scipy_sparse_array], - [scipy.sparse.csc_matrix, b'csc_matrix_data', create_sparse_dataset, load_sparse_matrix_data, False, check_is_scipy_sparse_array], - [scipy.sparse.bsr_matrix, b'bsr_matrix_data', create_sparse_dataset, load_sparse_matrix_data, False, check_is_scipy_sparse_array], + [scipy.sparse.csr_matrix, b'csr_matrix', create_sparse_dataset, + load_sparse_matrix_data, check_is_scipy_sparse_array, False], + [scipy.sparse.csc_matrix, b'csc_matrix', create_sparse_dataset, + load_sparse_matrix_data, check_is_scipy_sparse_array, False], + [scipy.sparse.bsr_matrix, b'bsr_matrix', create_sparse_dataset, + load_sparse_matrix_data, check_is_scipy_sparse_array, False], ] exclude_register = [] @@ -87,6 +107,5 @@ def load_sparse_matrix_data(h_node): for mat_type in ('csr', 'csc', 'bsr'): for attrib in ('indices', 'indptr', 'shape'): hkl_key = "%s_matrix_%s" % (mat_type, attrib) - if not six.PY2: - hkl_key = hkl_key.encode('ascii') + hkl_key = hkl_key.encode('ascii') exclude_register.append(hkl_key) diff --git a/hickle/lookup.py b/hickle/lookup.py index 99d13df9..c3cca81c 100644 --- a/hickle/lookup.py +++ b/hickle/lookup.py @@ -1,238 +1,187 @@ """ #lookup.py -This file contains all the mappings between hickle/HDF5 metadata and python types. -There are four dictionaries and one set that are populated here: +This file contains all the mappings between hickle/HDF5 metadata and python +types. +There are three dictionaries that are populated here: 1) types_dict -types_dict: mapping between python types and dataset creation functions, e.g. +Mapping between python types and dataset creation functions, e.g. types_dict = { - list: create_listlike_dataset, - int: create_python_dtype_dataset, - np.ndarray: create_np_array_dataset + list: (create_listlike_dataset, 'list'), + int: (create_python_dtype_dataset, 'int'), + np.ndarray: (create_np_array_dataset, 'ndarray'), } 2) hkl_types_dict -hkl_types_dict: mapping between hickle metadata and dataset loading functions, e.g. +Mapping between hickle metadata and dataset loading functions, e.g. hkl_types_dict = { - "" : load_list_dataset, - "" : load_tuple_dataset + 'list': load_list_dataset, + 'tuple': load_tuple_dataset } -3) container_types_dict -container_types_dict: mapping required to convert the PyContainer object in hickle.py - back into the required native type. PyContainer is required as - some iterable types are immutable (do not have an append() function). 
- Here is an example: - container_types_dict = { - "": list, - "": tuple - } +3) dict_key_types_dict +Mapping specifically for converting hickled dict data back into a dictionary +with the same key type. While python dictionary keys can be any hashable +object, in HDF5 a unicode/string is required for a dataset name. -4) container_key_types_dict -container_key_types_dict: mapping specifically for converting hickled dict data back into - a dictionary with the same key type. While python dictionary keys - can be any hashable object, in HDF5 a unicode/string is required - for a dataset name. Example: - container_key_types_dict = { - "": str, - "": unicode +Example: + dict_key_types_dict = { + 'str': literal_eval, + 'float': float } -5) types_not_to_sort -type_not_to_sort is a list of hickle type attributes that may be hierarchical, -but don't require sorting by integer index. - ## Extending hickle to add support for other classes and types The process to add new load/dump capabilities is as follows: 1) Create a file called load_[newstuff].py in loaders/ -2) In the load_[newstuff].py file, define your create_dataset and load_dataset functions, - along with all required mapping dictionaries. -3) Add an import call here, and populate the lookup dictionaries with update() calls: - # Add loaders for [newstuff] - try: - from .loaders.load_[newstuff[ import types_dict as ns_types_dict - from .loaders.load_[newstuff[ import hkl_types_dict as ns_hkl_types_dict - types_dict.update(ns_types_dict) - hkl_types_dict.update(ns_hkl_types_dict) - ... (Add container_types_dict etc if required) - except ImportError: - raise +2) In the load_[newstuff].py file, define your create_dataset and load_dataset + functions, along with the 'class_register' and 'exclude_register' lists. 
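As a purely illustrative sketch (all names here are hypothetical and not part
of this patch), a minimal loaders/load_mytype.py following these conventions
could look like:

    # %% IMPORTS
    from hickle.helpers import get_type_and_data


    # %% CLASS DEFINITIONS
    class MyType(object):
        def __init__(self, value):
            self.value = value


    # %% FUNCTION DEFINITIONS
    def create_mytype_dataset(py_obj, h_group, name, **kwargs):
        # Store the wrapped value as a plain dataset and return it
        return(h_group.create_dataset(name, data=py_obj.value, **kwargs))


    def load_mytype_dataset(h_node):
        # get_type_and_data() yields the pickled true type, the base_type
        # string and the stored data
        py_type, _, data = get_type_and_data(h_node)
        return(py_type(data))


    # %% REGISTERS
    class_register = [
        [MyType, b'mytype', create_mytype_dataset, load_mytype_dataset]]
    exclude_register = []

Note that such a loader is only imported on demand: load_loader() below derives
the module name from the first component of the dumped object's __module__, so
this sketch assumes MyType lives in a top-level package named 'mytype'.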
+ """ -import six -from ast import literal_eval -def return_first(x): - """ Return first element of a list """ - return x[0] +# %% IMPORTS +# Built-in imports +from ast import literal_eval +from importlib import import_module +from inspect import isclass +from itertools import starmap -def load_nothing(h_hode): - pass +# %% GLOBALS +# Define dict of all acceptable types types_dict = {} +# Define dict of all acceptable hickle types hkl_types_dict = {} -types_not_to_sort = [b'dict', b'csr_matrix', b'csc_matrix', b'bsr_matrix'] - -container_types_dict = { - b"": list, - b"": tuple, - b"": set, - b"": list, - b"": tuple, - b"": set, - b"csr_matrix": return_first, - b"csc_matrix": return_first, - b"bsr_matrix": return_first - } +# Define list of types that should never be sorted +types_not_to_sort = [] -# Technically, any hashable object can be used, for now sticking with built-in types -container_key_types_dict = { - b"": literal_eval, - b"": float, - b"": bool, - b"": int, - b"": complex, - b"": literal_eval, - b"": literal_eval, - b"": float, - b"": bool, - b"": int, - b"": complex, - b"": literal_eval - } +# Empty list of loaded loader names +loaded_loaders = [] -if six.PY2: - container_key_types_dict[b""] = literal_eval - container_key_types_dict[b""] = long +# Define dict containing validation functions for ndarray-like objects +ndarray_like_check_fns = {} -# Add loaders for built-in python types -if six.PY2: - from .loaders.load_python import types_dict as py_types_dict - from .loaders.load_python import hkl_types_dict as py_hkl_types_dict -else: - from .loaders.load_python3 import types_dict as py_types_dict - from .loaders.load_python3 import hkl_types_dict as py_hkl_types_dict +# Define conversion dict of all acceptable dict key types +dict_key_types_dict = { + b'str': literal_eval, + b'float': float, + b'bool': bool, + b'int': int, + b'complex': complex, + b'tuple': literal_eval, + b'NoneType': literal_eval, + } -types_dict.update(py_types_dict) -hkl_types_dict.update(py_hkl_types_dict) -# Add loaders for numpy types -from .loaders.load_numpy import types_dict as np_types_dict -from .loaders.load_numpy import hkl_types_dict as np_hkl_types_dict -from .loaders.load_numpy import check_is_numpy_array -types_dict.update(np_types_dict) -hkl_types_dict.update(np_hkl_types_dict) +# %% FUNCTION DEFINITIONS +def load_nothing(h_node): + pass -####################### -## ND-ARRAY checking ## -####################### -ndarray_like_check_fns = [ - check_is_numpy_array -] +##################### +# ND-ARRAY checking # +##################### def check_is_ndarray_like(py_obj): - is_ndarray_like = False - for ii, check_fn in enumerate(ndarray_like_check_fns): - is_ndarray_like = check_fn(py_obj) - if is_ndarray_like: - break - return is_ndarray_like + # Obtain the MRO of this object + mro_list = py_obj.__class__.mro() + # Create a function map + func_map = map(ndarray_like_check_fns.get, mro_list) + # Loop over the entire func_map until something else than None is found + for func_item in func_map: + if func_item is not None: + return(func_item(py_obj)) + # If that did not happen, then py_obj is not ndarray_like + else: + return(False) -####################### -## loading optional ## -####################### +##################### +# loading optional # +##################### +# This function registers a class to be used by hickle def register_class(myclass_type, hkl_str, dump_function, load_function, - to_sort=True, ndarray_check_fn=None): + ndarray_check_fn=None, to_sort=True): """ Register a new 
hickle class. Args: myclass_type type(class): type of class + hkl_str (str): String to write to HDF5 file to describe class dump_function (function def): function to write data to HDF5 load_function (function def): function to load data from HDF5 - is_iterable (bool): Is the item iterable? - hkl_str (str): String to write to HDF5 file to describe class - to_sort (bool): If the item is iterable, does it require sorting? ndarray_check_fn (function def): function to use to check if + to_sort (bool): If the item is iterable, does it require sorting? """ - types_dict.update({myclass_type: dump_function}) - hkl_types_dict.update({hkl_str: load_function}) - if to_sort == False: + types_dict[myclass_type] = (dump_function, hkl_str) + hkl_types_dict[hkl_str] = load_function + if not to_sort: types_not_to_sort.append(hkl_str) if ndarray_check_fn is not None: - ndarray_like_check_fns.append(ndarray_check_fn) - -def register_class_list(class_list): - """ Register multiple classes in a list - - Args: - class_list (list): A list, where each item is an argument to - the register_class() function. + ndarray_like_check_fns[myclass_type] = ndarray_check_fn - Notes: This just runs the code: - for item in mylist: - register_class(*item) - """ - for class_item in class_list: - register_class(*class_item) def register_class_exclude(hkl_str_to_ignore): - """ Tell loading funciton to ignore any HDF5 dataset with attribute 'type=XYZ' + """ Tell loading funciton to ignore any HDF5 dataset with attribute + 'type=XYZ' Args: - hkl_str_to_ignore (str): attribute type=string to ignore and exclude from loading. + hkl_str_to_ignore (str): attribute type=string to ignore and exclude + from loading. """ hkl_types_dict[hkl_str_to_ignore] = load_nothing -def register_exclude_list(exclude_list): - """ Ignore HDF5 datasets with attribute type='XYZ' from loading - ArgsL - exclude_list (list): List of strings, which correspond to hdf5/hickle - type= attributes not to load. +# This function checks if an additional loader is required for given py_obj +def load_loader(py_obj): """ - for hkl_str in exclude_list: - register_class_exclude(hkl_str) - -######################## -## Scipy sparse array ## -######################## - -try: - from .loaders.load_scipy import class_register, exclude_register - register_class_list(class_register) - register_exclude_list(exclude_register) -except ImportError: - pass -except NameError: - pass - -#################### -## Astropy stuff ## -#################### - -try: - from .loaders.load_astropy import class_register - register_class_list(class_register) -except ImportError: - pass + Checks if given `py_obj` requires an additional loader to be handled + properly and loads it if so. 
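    For example (illustrative only), the loader module name is derived from
    the first component of the dumped object's __module__:

        import numpy as np

        pkg_name = np.ndarray.__module__.split('.')[0]        # 'numpy'
        loader_name = 'hickle.loaders.load_%s' % (pkg_name)   # 'hickle.loaders.load_numpy'

    so dumping the first ndarray in a session triggers a one-time import of
    hickle/loaders/load_numpy.py and registration of its class_register and
    exclude_register entries.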
-################## -## Pandas stuff ## -################## + """ -try: - from .loaders.load_pandas import class_register - register_class_list(class_register) -except ImportError: - pass + # Obtain the MRO of this object + if isclass(py_obj): + mro_list = py_obj.mro() + else: + mro_list = py_obj.__class__.mro() + + # Loop over the entire mro_list + for mro_item in mro_list: + # Check if mro_item can be found in types_dict and return if so + if mro_item in types_dict: + return + + # Obtain the package name of mro_item + pkg_name = mro_item.__module__.split('.')[0] + + # Obtain the name of the associated loader + loader_name = 'hickle.loaders.load_%s' % (pkg_name) + + # Check if this module is already loaded, and return if so + if loader_name in loaded_loaders: + return + + # Try to load a loader with this name + try: + loader = import_module(loader_name) + # If any module is not found, catch error and check it + except ImportError as error: + # Check if the error was due to a package in loader not being found + if 'hickle' not in error.args[0]: # pragma: no cover + # If so, reraise the error + raise + # If such a loader does exist, register classes and return + else: + list(starmap(register_class, loader.class_register)) + list(map(register_class_exclude, loader.exclude_register)) + loaded_loaders.append(loader_name) + return diff --git a/tests/__init__.py b/hickle/tests/__init__.py similarity index 100% rename from tests/__init__.py rename to hickle/tests/__init__.py diff --git a/tests/legacy_hkls/generate_test_hickle.py b/hickle/tests/legacy_hkls/generate_test_hickle.py similarity index 100% rename from tests/legacy_hkls/generate_test_hickle.py rename to hickle/tests/legacy_hkls/generate_test_hickle.py diff --git a/tests/legacy_hkls/hickle_2_0_5.hkl b/hickle/tests/legacy_hkls/hickle_3_4_8.hkl similarity index 67% rename from tests/legacy_hkls/hickle_2_0_5.hkl rename to hickle/tests/legacy_hkls/hickle_3_4_8.hkl index bedd2e4c53f8bea1c19071871f35e3a3964e3912..976ced42ba35dd494bcd0dc791307c37aff46fa8 100644 GIT binary patch literal 16504 zcmeHO%}-N75Z?zP1wrvE0gd7L5zC1XA&Lg22CFq8K!6xM)j}(;0Scidp*N3s^N9Wh z965UQ$kBN6$e*BpfzIsAyeCUvKcvKz?gqNM^XAR${@(8FzL|X=63?D@cV6pMm?9B% zM)l~T{QTy$6(fq&<@#GLIGk^DzS}|tN3|=w*30xBmhZL_VTNyAaEg4%`8Hob z#3$H7Ga-WQB9B*dTz9D$%oWsgR$WmM zH8J^gZVu{wpcvbR|I`K=vC^KiKnHf6kQT^^XgROqvvV(}r$CNVR!$pAU^xgumm1~v zv;nzS6^iQ_WVsAVP^Sa(FiATk_p*!2E5O;M5`6y!-Hw?z^NH!Hq%^0d_R!RpC#2Q- zKMrU?-KISlYA2TW%vs4E99$oDJoiRnq*SDvoYZ-G(lgP2)ucRt{!mypEB|LCJY?P`F%UEA0crU z^!FySh0H4VCy2jv==jZYrkIo%4zgez2Ysnw^2EQIFKmz-nd4S0-p?c+D*!t@GQ#Ax zLTm6@pnmKRdbhFO($a#NZY$=kjvBz6xTu7`mw*EkGB1DcPnL^Qp-(uh9Sg-mCXa$3 zPF2rju75w-k6uq5%^<7N`uF7yp&<6988ESZ3=jxi|9IUI&U&wZzTI|7gRFCnXSX}a zWNWL7_B61o*t1r!_jsuX$h3ubkzB@JcMNC-=B9EfdoAZ2j>a{+;HEHpyh`1R@vrrqAXUEO__>o5YrzN1H<`{3u6-&~+aSLr;9j_=Y?KYt*@@EU1GaeoEI12}6JFNaOu-+X0Q~$3 z)30$fBpC3=;QrrbsM=%^un1TLECLn*i$LQeQ0cvWXqfuQTfD~au*N4fK7VUIbv2 znDHd}J|M_z^;2GD1C8FsAl0%?RP+7CN;c1*9Z1A7&{a#xTpwb`6IYAsQ^gzRFMR!& zy1CE2tz_Sa#);9|_fs`Is_smfeOEk=M9@juck`TsweRM+ftyv@_wesq@u*5z94eQF a@A*Rdt z{sNEw1djX!jvhVY&5MQ8+ufVWgv~-83TCDt^XloD`L??=)4lnW9Dmi_b-hbKGEH$# zMBO5P98%hQi7i~`{&^}Gly6i1HG~C2bO`8+QG1m5TVZVYMeT2ACnkiXWd58}x5vQr z8g!wZ5J*mp&cJ@>XwxjEdn9L!MK)8+q~jn-n3CL4XWm2ds>*dlB&4puJRRbiFvZy9 zXeuS^y>en$8~o!MP-1yK9kN}Ly9wGNB2IivSvJMHiP_Yf=_wJFG)R{4XraK{jff!{ zcV5!PL~(012mMziQ<%dra7Lb&bK6<`xa3l@M#8A-WJvBXhN)O@_uUN$u}MRM^wz$C zg#5g!XVNYIGXkpK4$)>yBuBjWOx_hrQP@~shqmlWKE1G(%Vmk6kXc)UqMdW{c9C^U z2#fmFjP*7FN-EPl%~wlrTz&S@o2;`aXNANXfi?BhPj9~6M9LBkP;p`X*y*g~*lc@N 
z$z6!oW7CtaZJ-6Po#4mA_M&Weki#$-9IM|c6#Lzi0ezi|OEdxXe^^?{5CwMZTvk`0 z>ly)#fJQ(gpb^jrXaqC@8Uc-fMnEH=5zq*<8iCZ*^td5CedLl9G4hwNA3*h=w0b>f zm4A>QG|`7UW1P?GKJ~imLjd&*eD#=S$j`@mmh8p$O=@Q^UX zIaZnOfa=lbNH9Qj@yA$Tx;d(EKEZbOLkb$ebOTgR-X%u=C}Y2_idQ)vQymYvAlK>_ z4*|b!+9~Fiz5OuqUPDE}5M3D8Az;{YGEQL``GK+;zkyXjdngogNg-wfDQpPKEqIcF zwJTHy;@_FR_qFyHku-k$Tiq(!D}{8H<3k+B*}jeQ*qF&rmL#v?SOxX_BYx{x!!(hC zKtv=do)>lFiS*poHx}^G1;rIX23Um9SMO=EGE-$01OMfC3j6ulpFd_h_C`kjxl${~ zssvv?L;5*V$yV1i0vZ90fJQ(gpb89PuuJFs60JSxemip?3C&ZySG}~>A>j8u zi>=%joBciu-v`zEX|4NGbMJ$l%6$;;b$lP=b96aB>HW~O@TpbaOHXn??3N68J$#Jk zoZiz}a@P4ZouP(=M7<9t?lr`&Z3p|=%}#vkO#iWGV>;sRJ*)ipS!OK zD>>UN{Os}p`OizVy&$_>dWMABE}w0AsCV^#c6o>7i21h4E~{~$(_)v)=VV0SLxjwhq)6a24SuS2dG%?74BK?LpIzRj_-Bc539`$v7f7h> za(fDrpIwfV95E`bvde1R{_ol4ptyF-^>g%CBNS##h{20u@Q@6)|9N@TLd+^Ps`|Xl yzo)E@-`5l0o~Pd{+x)=zg$A!)yP(=-bGSCW?mZfTM&sJ<@5xAXP}Gi&YySZ)%Vz=r diff --git a/tests/test_astropy.py b/hickle/tests/test_astropy.py similarity index 68% rename from tests/test_astropy.py rename to hickle/tests/test_astropy.py index 2086ec37..4c2caf96 100644 --- a/tests/test_astropy.py +++ b/hickle/tests/test_astropy.py @@ -1,17 +1,22 @@ -import hickle as hkl +# %% IMPORTS +# Package imports from astropy.units import Quantity from astropy.time import Time from astropy.coordinates import Angle, SkyCoord -from astropy.constants import Constant, EMConstant, G +import astropy.constants as apc from astropy.table import Table import numpy as np from py.path import local +# hickle imports +import hickle as hkl + # Set the current working directory to the temporary directory local.get_temproot().chdir() -def test_astropy_quantity(): +# %% FUNCTION DEFINITIONS +def test_astropy_quantity(): for uu in ['m^3', 'm^3 / s', 'kg/pc']: a = Quantity(7, unit=uu) @@ -27,12 +32,16 @@ def test_astropy_quantity(): assert a == b assert a.unit == b.unit -def TODO_test_astropy_constant(): - hkl.dump(G, "test_ap.h5") - gg = hkl.load("test_ap.h5") - print(G) - print(gg) +def test_astropy_constant(): + hkl.dump(apc.G, "test_ap.h5") + gg = hkl.load("test_ap.h5") + assert gg == apc.G + + hkl.dump(apc.cgs.e, 'test_ap.h5') + ee = hkl.load('test_ap.h5') + assert ee == apc.cgs.e + def test_astropy_table(): t = Table([[1, 2], [3, 4]], names=('a', 'b'), meta={'name': 'test_thing'}) @@ -52,8 +61,9 @@ def test_astropy_table(): assert np.allclose(t['a'].astype('float32'), t2['a'].astype('float32')) assert np.allclose(t['b'].astype('float32'), t2['b'].astype('float32')) + def test_astropy_quantity_array(): - a = Quantity([1,2,3], unit='m') + a = Quantity([1, 2, 3], unit='m') hkl.dump(a, "test_ap.h5") b = hkl.load("test_ap.h5") @@ -61,6 +71,7 @@ def test_astropy_quantity_array(): assert np.allclose(a.value, b.value) assert a.unit == b.unit + def test_astropy_time_array(): times = ['1999-01-01T00:00:00.123456789', '2010-01-01T00:00:00'] t1 = Time(times, format='isot', scale='utc') @@ -87,6 +98,7 @@ def test_astropy_time_array(): assert t1.format == t2.format assert t1.scale == t2.scale + def test_astropy_angle(): for uu in ['radian', 'degree']: a = Angle(1.02, unit=uu) @@ -96,8 +108,9 @@ def test_astropy_angle(): assert a == b assert a.unit == b.unit + def test_astropy_angle_array(): - a = Angle([1,2,3], unit='degree') + a = Angle([1, 2, 3], unit='degree') hkl.dump(a, "test_ap.h5") b = hkl.load("test_ap.h5") @@ -105,29 +118,55 @@ def test_astropy_angle_array(): assert np.allclose(a.value, b.value) assert a.unit == b.unit + def test_astropy_skycoord(): - ra = Angle(['1d20m', '1d21m'], unit='degree') - dec = Angle(['33d0m0s', '33d01m'], 
unit='degree') + ra = Angle('1d20m', unit='degree') + dec = Angle('33d0m0s', unit='degree') + radec = SkyCoord(ra, dec) + hkl.dump(radec, "test_ap.h5") + radec2 = hkl.load("test_ap.h5") + assert radec.ra == radec2.ra + assert radec.dec == radec2.dec + + ra = Angle('1d20m', unit='hourangle') + dec = Angle('33d0m0s', unit='degree') + radec = SkyCoord(ra, dec) + hkl.dump(radec, "test_ap.h5") + radec2 = hkl.load("test_ap.h5") + assert radec.ra == radec2.ra + assert radec.dec == radec2.dec + + +def test_astropy_skycoord_array(): + ra = Angle(['1d20m', '0d21m'], unit='degree') + dec = Angle(['33d0m0s', '-33d01m'], unit='degree') radec = SkyCoord(ra, dec) hkl.dump(radec, "test_ap.h5") radec2 = hkl.load("test_ap.h5") assert np.allclose(radec.ra.value, radec2.ra.value) assert np.allclose(radec.dec.value, radec2.dec.value) + assert radec.ra.shape == radec2.ra.shape + assert radec.dec.shape == radec2.dec.shape - ra = Angle(['1d20m', '1d21m'], unit='hourangle') - dec = Angle(['33d0m0s', '33d01m'], unit='degree') + ra = Angle([['1d20m', '0d21m'], ['1d20m', '0d21m']], unit='hourangle') + dec = Angle([['33d0m0s', '33d01m'], ['33d0m0s', '33d01m']], unit='degree') radec = SkyCoord(ra, dec) hkl.dump(radec, "test_ap.h5") radec2 = hkl.load("test_ap.h5") assert np.allclose(radec.ra.value, radec2.ra.value) assert np.allclose(radec.dec.value, radec2.dec.value) + assert radec.ra.shape == radec2.ra.shape + assert radec.dec.shape == radec2.dec.shape + +# %% MAIN SCRIPT if __name__ == "__main__": test_astropy_quantity() - #test_astropy_constant() + test_astropy_constant() test_astropy_table() test_astropy_quantity_array() test_astropy_time_array() test_astropy_angle() test_astropy_angle_array() test_astropy_skycoord() + test_astropy_skycoord_array() diff --git a/tests/test_hickle.py b/hickle/tests/test_hickle.py similarity index 56% rename from tests/test_hickle.py rename to hickle/tests/test_hickle.py index 54910542..5ecfb3db 100644 --- a/tests/test_hickle.py +++ b/hickle/tests/test_hickle.py @@ -7,23 +7,27 @@ """ -import h5py -import hashlib -import numpy as np + +# %% IMPORTS +# Built-in imports +from collections import OrderedDict as odict import os -import six -import time from pprint import pprint +# Package imports +import h5py +import numpy as np from py.path import local +import pytest -import hickle -from hickle.hickle import * - +# hickle imports +from hickle import dump, helpers, hickle, load, loaders # Set current working directory to the temporary directory local.get_temproot().chdir() + +# %% GLOBALS NESTED_DICT = { "level1_1": { "level2_1": [1, 2, 3], @@ -42,53 +46,113 @@ } } -DUMP_CACHE = [] # Used in test_track_times() + +# %% HELPER DEFINITIONS +# Define a test function that must be serialized and unpacked again +def func(a, b, c=0): + return(a, b, c) + + +# Define a class that must always be pickled +class with_state(object): + def __init__(self): + self.a = 12 + self.b = { + 'love': np.ones([12, 7]), + 'hatred': np.zeros([4, 9])} + + def __getstate__(self): + self.a *= 2 + return({ + 'a': self.a, + 'b': self.b}) + + def __setstate__(self, state): + self.a = state['a'] + self.b = state['b'] + + def __getitem__(self, index): + if(index == 0): + return(self.a) + if(index < 2): + return(self.b['hatred']) + if(index > 2): + raise ValueError("index unknown") + return(self.b['love']) + + +# %% FUNCTION DEFINITIONS +def test_invalid_file(): + """ Test if trying to use a non-file object fails. 
""" + + with pytest.raises(hickle.FileError): + dump('test', ()) + + +def test_state_obj(): + """ Dumping and loading a class object with pickle states + + https://github.com/telegraphic/hickle/issues/125""" + filename, mode = 'test.h5', 'w' + obj = with_state() + with pytest.warns(loaders.load_builtins.SerializedWarning): + dump(obj, filename, mode) + obj_hkl = load(filename) + assert type(obj) == type(obj_hkl) + assert np.allclose(obj[1], obj_hkl[1]) + + +def test_local_func(): + """ Dumping and loading a local function + + https://github.com/telegraphic/hickle/issues/119""" + filename, mode = 'test.h5', 'w' + with pytest.warns(loaders.load_builtins.SerializedWarning): + dump(func, filename, mode) + func_hkl = load(filename) + assert type(func) == type(func_hkl) + assert func(1, 2) == func_hkl(1, 2) + + +def test_binary_file(): + """ Test if using a binary file works + + https://github.com/telegraphic/hickle/issues/123""" + + with open("test.hdf5", "w") as f: + hickle.dump(None, f) + + with open("test.hdf5", "wb") as f: + hickle.dump(None, f) + + +def test_non_empty_group(): + """ Test if attempting to dump to a group with data fails """ + + hickle.dump(None, 'test.hdf5') + with pytest.raises(ValueError): + dump(None, 'test.hdf5', 'r+') def test_string(): """ Dumping and loading a string """ - if six.PY2: - filename, mode = 'test.h5', 'w' - string_obj = "The quick brown fox jumps over the lazy dog" - dump(string_obj, filename, mode) - string_hkl = load(filename) - #print "Initial list: %s"%list_obj - #print "Unhickled data: %s"%list_hkl - assert type(string_obj) == type(string_hkl) == str - assert string_obj == string_hkl - else: - pass - - -def test_unicode(): - """ Dumping and loading a unicode string """ - if six.PY2: - filename, mode = 'test.h5', 'w' - u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) - dump(u, filename, mode) - u_hkl = load(filename) - - assert type(u) == type(u_hkl) == unicode - assert u == u_hkl - # For those interested, uncomment below to see what those codes are: - # for i, c in enumerate(u_hkl): - # print i, '%04x' % ord(c), unicodedata.category(c), - # print unicodedata.name(c) - else: - pass - - -def test_unicode2(): - if six.PY2: - a = u"unicode test" - dump(a, 'test.hkl', mode='w') - - z = load('test.hkl') - assert a == z - assert type(a) == type(z) == unicode - pprint(z) - else: - pass + filename, mode = 'test.h5', 'w' + string_obj = "The quick brown fox jumps over the lazy dog" + dump(string_obj, filename, mode) + string_hkl = load(filename) + assert isinstance(string_hkl, str) + assert string_obj == string_hkl + + +def test_65bit_int(): + """ Dumping and loading an integer with arbitrary precision + + https://github.com/telegraphic/hickle/issues/113""" + i = 2**65-1 + dump(i, 'test.hdf5') + i_hkl = load('test.hdf5') + assert i == i_hkl + def test_list(): """ Dumping and loading a list """ @@ -96,20 +160,18 @@ def test_list(): list_obj = [1, 2, 3, 4, 5] dump(list_obj, filename, mode=mode) list_hkl = load(filename) - #print(f'Initial list: {list_obj}') - #print(f'Unhickled data: {list_hkl}') try: - assert type(list_obj) == type(list_hkl) == list + assert isinstance(list_hkl, list) assert list_obj == list_hkl import h5py - a = h5py.File(filename) + a = h5py.File(filename, 'r') a.close() except AssertionError: print("ERR:", list_obj, list_hkl) import h5py - raise() + raise def test_set(): @@ -118,15 +180,12 @@ def test_set(): list_obj = set([1, 0, 3, 4.5, 11.2]) dump(list_obj, filename, mode) list_hkl = load(filename) - #print "Initial list: 
%s"%list_obj - #print "Unhickled data: %s"%list_hkl try: - assert type(list_obj) == type(list_hkl) == set + assert isinstance(list_hkl, set) assert list_obj == list_hkl except AssertionError: print(type(list_obj)) print(type(list_hkl)) - #os.remove(filename) raise @@ -151,7 +210,7 @@ def test_numpy(): def test_masked(): """ Test masked numpy array """ filename, mode = 'test.h5', 'w' - a = np.ma.array([1,2,3,4], dtype='float32', mask=[0,1,0,0]) + a = np.ma.array([1, 2, 3, 4], dtype='float32', mask=[0, 1, 0, 0]) dump(a, filename, mode) a_hkl = load(filename) @@ -165,21 +224,57 @@ def test_masked(): raise +def test_object_numpy(): + """ Dumping and loading a NumPy array containing non-NumPy objects. + + https://github.com/telegraphic/hickle/issues/90""" + + arr = np.array([[NESTED_DICT], ('What is this?',), {1, 2, 3, 7, 1}]) + dump(arr, 'test.hdf5') + arr_hkl = load('test.hdf5') + assert np.all(arr == arr_hkl) + + arr2 = np.array(NESTED_DICT) + dump(arr2, 'test.hdf5') + arr_hkl2 = load('test.hdf5') + assert np.all(arr2 == arr_hkl2) + + +def test_string_numpy(): + """ Dumping and loading NumPy arrays containing Python 3 strings. """ + + arr = np.array(["1313e", "was", "maybe?", "here"]) + dump(arr, 'test.hdf5') + arr_hkl = load('test.hdf5') + assert np.all(arr == arr_hkl) + + +def test_list_object_numpy(): + """ Dumping and loading a list of NumPy arrays with objects. + + https://github.com/telegraphic/hickle/issues/90""" + + lst = [np.array(NESTED_DICT), np.array([('What is this?',), + {1, 2, 3, 7, 1}])] + dump(lst, 'test.hdf5') + lst_hkl = load('test.hdf5') + assert np.all(lst[0] == lst_hkl[0]) + assert np.all(lst[1] == lst_hkl[1]) + + def test_dict(): """ Test dictionary dumping and loading """ filename, mode = 'test.h5', 'w' dd = { - 'name' : b'Danny', - 'age' : 28, - 'height' : 6.1, - 'dork' : True, - 'nums' : [1, 2, 3], - 'narr' : np.array([1,2,3]), - #'unic' : u'dan[at]thetelegraphic.com' + 'name': b'Danny', + 'age': 28, + 'height': 6.1, + 'dork': True, + 'nums': [1, 2, 3], + 'narr': np.array([1, 2, 3]), } - dump(dd, filename, mode) dd_hkl = load(filename) @@ -187,12 +282,11 @@ def test_dict(): try: assert k in dd_hkl.keys() - if type(dd[k]) is type(np.array([1])): + if isinstance(dd[k], np.ndarray): assert np.all((dd[k], dd_hkl[k])) else: - #assert dd_hkl[k] == dd[k] pass - assert type(dd_hkl[k]) == type(dd[k]) + assert isinstance(dd_hkl[k], dd[k].__class__) except AssertionError: print(k) print(dd_hkl[k]) @@ -201,8 +295,26 @@ def test_dict(): raise +def test_odict(): + """ Test ordered dictionary dumping and loading + + https://github.com/telegraphic/hickle/issues/65""" + filename, mode = 'test.hdf5', 'w' + + od = odict(((3, [3, 0.1]), (7, [5, 0.1]), (5, [3, 0.1]))) + dump(od, filename, mode) + od_hkl = load(filename) + + assert od.keys() == od_hkl.keys() + + for od_item, od_hkl_item in zip(od.items(), od_hkl.items()): + assert od_item == od_hkl_item + + def test_empty_dict(): - """ Test empty dictionary dumping and loading """ + """ Test empty dictionary dumping and loading + + https://github.com/telegraphic/hickle/issues/91""" filename, mode = 'test.h5', 'w' dump({}, filename, mode) @@ -242,7 +354,7 @@ def test_dict_int_key(): } dump(dd, filename, mode) - dd_hkl = load(filename) + load(filename) def test_dict_nested(): @@ -255,7 +367,7 @@ def test_dict_nested(): dd_hkl = load(filename) ll_hkl = dd_hkl["level1_3"]["level2_1"]["level3_1"] - ll = dd["level1_3"]["level2_1"]["level3_1"] + ll = dd["level1_3"]["level2_1"]["level3_1"] assert ll == ll_hkl @@ -265,8 +377,8 @@ def 
test_masked_dict(): filename, mode = 'test.h5', 'w' dd = { - "data" : np.ma.array([1,2,3], mask=[True, False, False]), - "data2" : np.array([1,2,3,4,5]) + "data": np.ma.array([1, 2, 3], mask=[True, False, False]), + "data2": np.array([1, 2, 3, 4, 5]) } dump(dd, filename, mode) @@ -275,15 +387,15 @@ def test_masked_dict(): for k in dd.keys(): try: assert k in dd_hkl.keys() - if type(dd[k]) is type(np.array([1])): + if isinstance(dd[k], np.ndarray): assert np.all((dd[k], dd_hkl[k])) - elif type(dd[k]) is type(np.ma.array([1])): + elif isinstance(dd[k], np.ma.MaskedArray): print(dd[k].data) print(dd_hkl[k].data) assert np.allclose(dd[k].data, dd_hkl[k].data) assert np.allclose(dd[k].mask, dd_hkl[k].mask) - assert type(dd_hkl[k]) == type(dd[k]) + assert isinstance(dd_hkl[k], dd[k].__class__) except AssertionError: print(k) @@ -321,43 +433,6 @@ def test_np_float(): assert dd[str(dt)] == dd_hkl[str(dt)] -def md5sum(filename, blocksize=65536): - """ Compute MD5 sum for a given file """ - hash = hashlib.md5() - - with open(filename, "r+b") as f: - for block in iter(lambda: f.read(blocksize), ""): - hash.update(block) - return hash.hexdigest() - - -def caching_dump(obj, filename, *args, **kwargs): - """ Save arguments of all dump calls """ - DUMP_CACHE.append((obj, filename, args, kwargs)) - return hickle_dump(obj, filename, *args, **kwargs) - - -def test_track_times(): - """ Verify that track_times = False produces identical files """ - hashes = [] - for obj, filename, mode, kwargs in DUMP_CACHE: - if isinstance(filename, hickle.H5FileWrapper): - filename = str(filename.file_name) - kwargs['track_times'] = False - caching_dump(obj, filename, mode, **kwargs) - hashes.append(md5sum(filename)) - - time.sleep(1) - - for hash1, (obj, filename, mode, kwargs) in zip(hashes, DUMP_CACHE): - if isinstance(filename, hickle.H5FileWrapper): - filename = str(filename.file_name) - caching_dump(obj, filename, mode, **kwargs) - hash2 = md5sum(filename) - print(hash1, hash2) - assert hash1 == hash2 - - def test_comp_kwargs(): """ Test compression with some kwargs for shuffle and chunking """ @@ -375,17 +450,16 @@ def test_comp_kwargs(): for sh in shuffles: for so in scaleoffsets: kwargs = { - 'compression' : cc, + 'compression': cc, 'dtype': dt, 'chunks': ch, 'shuffle': sh, 'scaleoffset': so } - #array_obj = np.random.random_integers(low=-8192, high=8192, size=(1000, 1000)).astype(dt) array_obj = NESTED_DICT dump(array_obj, filename, mode, compression=cc) print(kwargs, os.path.getsize(filename)) - array_hkl = load(filename) + load(filename) def test_list_numpy(): @@ -424,34 +498,28 @@ def test_tuple_numpy(): assert isinstance(dd_hkl[0], np.ndarray) -def test_none(): - """ Test None type hickling """ +def test_numpy_dtype(): + """ Dumping and loading a NumPy dtype """ - filename, mode = 'test.h5', 'w' + dtype = np.dtype('float16') + dump(dtype, 'test.hdf5') + dtype_hkl = load('test.hdf5') + assert dtype == dtype_hkl - a = None - dump(a, filename, mode) - dd_hkl = load(filename) - print(a) - print(dd_hkl) - - assert isinstance(dd_hkl, type(None)) - - -def test_dict_none(): +def test_none(): """ Test None type hickling """ filename, mode = 'test.h5', 'w' - a = {'a': 1, 'b' : None} + a = None dump(a, filename, mode) dd_hkl = load(filename) print(a) print(dd_hkl) - assert isinstance(a['b'], type(None)) + assert isinstance(dd_hkl, type(None)) def test_file_open_close(): @@ -467,15 +535,42 @@ def test_file_open_close(): f.close() try: dump(a, f, mode='w') - except hickle.hickle.ClosedFileError: + except 
hickle.ClosedFileError: print("Tests: Closed file exception caught") +def test_hdf5_group(): + import h5py + file = h5py.File('test.hdf5', 'w') + group = file.create_group('test_group') + a = np.arange(5) + dump(a, group) + file.close() + + a_hkl = load('test.hdf5', path='/test_group') + assert np.allclose(a_hkl, a) + + file = h5py.File('test.hdf5', 'r+') + group = file.create_group('test_group2') + b = np.arange(8) + + dump(b, group, path='deeper/and_deeper') + file.close() + + b_hkl = load('test.hdf5', path='/test_group2/deeper/and_deeper') + assert np.allclose(b_hkl, b) + + file = h5py.File('test.hdf5', 'r') + b_hkl2 = load(file['test_group2'], path='deeper/and_deeper') + assert np.allclose(b_hkl2, b) + file.close() + + def test_list_order(): """ https://github.com/telegraphic/hickle/issues/26 """ d = [np.arange(n + 1) for n in range(20)] - hickle.dump(d, 'test.h5') - d_hkl = hickle.load('test.h5') + dump(d, 'test.h5') + d_hkl = load('test.h5') try: for ii, xx in enumerate(d): @@ -490,9 +585,10 @@ def test_list_order(): def test_embedded_array(): """ See https://github.com/telegraphic/hickle/issues/24 """ - d_orig = [[np.array([10., 20.]), np.array([10, 20, 30])], [np.array([10, 2]), np.array([1.])]] - hickle.dump(d_orig, 'test.h5') - d_hkl = hickle.load('test.h5') + d_orig = [[np.array([10., 20.]), np.array([10, 20, 30])], + [np.array([10, 2]), np.array([1.])]] + dump(d_orig, 'test.h5') + d_hkl = load('test.h5') for ii, xx in enumerate(d_orig): for jj, yy in enumerate(xx): @@ -502,20 +598,18 @@ def test_embedded_array(): print(d_orig) -################ -## NEW TESTS ## -################ - - +############## +# NEW TESTS # +############### def generate_nested(): a = [1, 2, 3] b = [a, a, a] c = [a, b, 's'] d = [a, b, c, c, a] e = [d, d, d, d, 1] - f = {'a' : a, 'b' : b, 'e' : e} - g = {'f' : f, 'a' : e, 'd': d} - h = {'h': g, 'g' : f} + f = {'a': a, 'b': b, 'e': e} + g = {'f': f, 'a': e, 'd': d} + h = {'h': g, 'g': f} z = [f, a, b, c, d, e, f, g, h, g, h] a = np.array([1, 2, 3, 4]) b = set([1, 2, 3, 4, 5]) @@ -529,23 +623,22 @@ def test_is_iterable(): a = [1, 2, 3] b = 1 - assert check_is_iterable(a) == True - assert check_is_iterable(b) == False + assert helpers.check_is_iterable(a) + assert not helpers.check_is_iterable(b) def test_check_iterable_item_type(): - a = [1, 2, 3] b = [a, a, a] c = [a, b, 's'] - type_a = check_iterable_item_type(a) - type_b = check_iterable_item_type(b) - type_c = check_iterable_item_type(c) + type_a = helpers.check_iterable_item_type(a) + type_b = helpers.check_iterable_item_type(b) + type_c = helpers.check_iterable_item_type(c) assert type_a is int assert type_b is list - assert type_c == False + assert not type_c def test_dump_nested(): @@ -555,26 +648,24 @@ def test_dump_nested(): dump(z, 'test.hkl', mode='w') -def test_with_dump(): +def test_with_open_file(): + """ + Testing dumping and loading to an open file + + https://github.com/telegraphic/hickle/issues/92""" + lst = [1] - tpl = (1) + tpl = (1,) dct = {1: 1} arr = np.array([1]) - with h5py.File('test.hkl') as file: + with h5py.File('test.hkl', 'w') as file: dump(lst, file, path='/lst') dump(tpl, file, path='/tpl') dump(dct, file, path='/dct') dump(arr, file, path='/arr') - -def test_with_load(): - lst = [1] - tpl = (1) - dct = {1: 1} - arr = np.array([1]) - - with h5py.File('test.hkl') as file: + with h5py.File('test.hkl', 'r') as file: assert load(file, '/lst') == lst assert load(file, '/tpl') == tpl assert load(file, '/dct') == dct @@ -582,7 +673,6 @@ def test_with_load(): def test_load(): - a = 
set([1, 2, 3, 4]) b = set([5, 6, 7, 8]) c = set([9, 10, 11, 12]) @@ -605,13 +695,12 @@ def test_sort_keys(): print(keys) print(keys_sorted) - assert sort_keys(keys) == keys_sorted + assert helpers.sort_keys(keys) == keys_sorted def test_ndarray(): - - a = np.array([1,2,3]) - b = np.array([2,3,4]) + a = np.array([1, 2, 3]) + b = np.array([2, 3, 4]) z = (a, b) print("Original:") @@ -624,9 +713,8 @@ def test_ndarray(): def test_ndarray_masked(): - - a = np.ma.array([1,2,3]) - b = np.ma.array([2,3,4], mask=[True, False, True]) + a = np.ma.array([1, 2, 3]) + b = np.ma.array([2, 3, 4], mask=[True, False, True]) z = (a, b) print("Original:") @@ -650,13 +738,8 @@ def test_simple_dict(): def test_complex_dict(): a = {'akey': 1, 'akey2': 2} - if six.PY2: - # NO LONG TYPE IN PY3! - b = {'bkey': 2.0, 'bkey3': long(3.0)} - else: - b = a c = {'ckey': "hello", "ckey2": "hi there"} - z = {'zkey1': a, 'zkey2': b, 'zkey3': c} + z = {'zkey1': a, 'zkey2': a, 'zkey3': c} print("Original:") pprint(z) @@ -666,7 +749,12 @@ def test_complex_dict(): z = load('test.hkl') pprint(z) + def test_multi_hickle(): + """ Dumping to and loading from the same file several times + + https://github.com/telegraphic/hickle/issues/20""" + a = {'a': 123, 'b': [1, 2, 4]} if os.path.exists("test.hkl"): @@ -676,55 +764,54 @@ def test_multi_hickle(): dump(a, "test.hkl", path="/test3", mode="r+") dump(a, "test.hkl", path="/test4", mode="r+") - a = load("test.hkl", path="/test") - b = load("test.hkl", path="/test2") - c = load("test.hkl", path="/test3") - d = load("test.hkl", path="/test4") + load("test.hkl", path="/test") + load("test.hkl", path="/test2") + load("test.hkl", path="/test3") + load("test.hkl", path="/test4") + def test_complex(): """ Test complex value dtype is handled correctly https://github.com/telegraphic/hickle/issues/29 """ - data = {"A":1.5, "B":1.5 + 1j, "C":np.linspace(0,1,4) + 2j} + data = {"A": 1.5, "B": 1.5 + 1j, "C": np.linspace(0, 1, 4) + 2j} dump(data, "test.hkl") data2 = load("test.hkl") for key in data.keys(): - assert type(data[key]) == type(data2[key]) + assert isinstance(data[key], data2[key].__class__) + def test_nonstring_keys(): """ Test that keys are reconstructed back to their original datatypes https://github.com/telegraphic/hickle/issues/36 """ - if six.PY2: - u = unichr(233) + unichr(0x0bf2) + unichr(3972) + unichr(6000) - - data = {u'test': 123, - 'def': 456, - 'hik' : np.array([1,2,3]), - u: u, - 0: 0, - True: 'hi', - 1.1 : 'hey', - #2L : 'omg', - 1j: 'complex_hashable', - (1, 2): 'boo', - ('A', 17.4, 42): [1, 7, 'A'], - (): '1313e was here', - '0': 0 - } - #data = {'0': 123, 'def': 456} - print(data) - dump(data, "test.hkl") - data2 = load("test.hkl") - print(data2) - - for key in data.keys(): - assert key in data2.keys() - - print(data2) - else: - pass + + data = { + u'test': 123, + 'def': [b'test'], + 'hik': np.array([1, 2, 3]), + 0: 0, + True: ['test'], + 1.1: 'hey', + 1j: 'complex_hashable', + (1, 2): 'boo', + ('A', 17.4, 42): [1, 7, 'A'], + (): '1313e was here', + '0': 0, + None: None + } + + print(data) + dump(data, "test.hkl") + data2 = load("test.hkl") + print(data2) + + for key in data.keys(): + assert key in data2.keys() + + print(data2) + def test_scalar_compression(): """ Test bug where compression causes a crash on scalar datasets @@ -732,30 +819,28 @@ def test_scalar_compression(): (Scalars are incompressible!) 
https://github.com/telegraphic/hickle/issues/37 """ - data = {'a' : 0, 'b' : np.float(2), 'c' : True} + data = {'a': 0, 'b': np.float(2), 'c': True} dump(data, "test.hkl", compression='gzip') data2 = load("test.hkl") print(data2) for key in data.keys(): - assert type(data[key]) == type(data2[key]) + assert isinstance(data[key], data2[key].__class__) + def test_bytes(): """ Dumping and loading a string. PYTHON3 ONLY """ - if six.PY3: - filename, mode = 'test.h5', 'w' - string_obj = b"The quick brown fox jumps over the lazy dog" - dump(string_obj, filename, mode) - string_hkl = load(filename) - #print "Initial list: %s"%list_obj - #print "Unhickled data: %s"%list_hkl - print(type(string_obj)) - print(type(string_hkl)) - assert type(string_obj) == type(string_hkl) == bytes - assert string_obj == string_hkl - else: - pass + + filename, mode = 'test.h5', 'w' + string_obj = b"The quick brown fox jumps over the lazy dog" + dump(string_obj, filename, mode) + string_hkl = load(filename) + print(type(string_obj)) + print(type(string_hkl)) + assert isinstance(string_hkl, bytes) + assert string_obj == string_hkl + def test_np_scalar(): """ Numpy scalar datatype @@ -763,13 +848,34 @@ def test_np_scalar(): https://github.com/telegraphic/hickle/issues/50 """ - fid='test.h5py' - r0={'test': np.float64(10.)} - s = dump(r0, fid) + fid = 'test.h5py' + r0 = {'test': np.float64(10.)} + dump(r0, fid) r = load(fid) print(r) - assert type(r0['test']) == type(r['test']) + assert isinstance(r0['test'], r['test'].__class__) + + +def test_slash_dict_keys(): + """ Support for having slashes in dict keys + https://github.com/telegraphic/hickle/issues/124""" + dct = {'a/b': [1, '2'], 1.4: 3} + + dump(dct, 'test.hdf5', 'w') + dct_hkl = load('test.hdf5') + + assert isinstance(dct_hkl, dict) + for key, val in dct_hkl.items(): + assert val == dct.get(key) + + # Check that having backslashes in dict keys will serialize the dict + dct2 = {'a\\b': [1, '2'], 1.4: 3} + with pytest.warns(loaders.load_builtins.SerializedWarning): + dump(dct2, 'test.hdf5') + + +# %% MAIN SCRIPT if __name__ == '__main__': """ Some tests and examples """ test_sort_keys() @@ -778,13 +884,14 @@ def test_np_scalar(): test_scalar_compression() test_complex() test_file_open_close() - test_dict_none() + test_hdf5_group() test_none() test_masked_dict() test_list() test_set() test_numpy() test_dict() + test_odict() test_empty_dict() test_compression() test_masked() @@ -792,27 +899,18 @@ def test_np_scalar(): test_comp_kwargs() test_list_numpy() test_tuple_numpy() - test_track_times() test_list_order() test_embedded_array() test_np_float() - - if six.PY2: - test_unicode() - test_unicode2() - test_string() - test_nonstring_keys() - - if six.PY3: - test_bytes() - + test_string() + test_nonstring_keys() + test_bytes() # NEW TESTS test_is_iterable() test_check_iterable_item_type() test_dump_nested() - test_with_dump() - test_with_load() + test_with_open_file() test_load() test_sort_keys() test_ndarray() @@ -821,6 +919,16 @@ def test_np_scalar(): test_complex_dict() test_multi_hickle() test_dict_int_key() + test_local_func() + test_binary_file() + test_state_obj() + test_slash_dict_keys() + test_invalid_file() + test_non_empty_group() + test_numpy_dtype() + test_object_numpy() + test_string_numpy() + test_list_object_numpy() # Cleanup - print("ALL TESTS PASSED!") \ No newline at end of file + print("ALL TESTS PASSED!") diff --git a/hickle/tests/test_hickle_helpers.py b/hickle/tests/test_hickle_helpers.py new file mode 100644 index 00000000..f5dab275 --- /dev/null 
+++ b/hickle/tests/test_hickle_helpers.py @@ -0,0 +1,49 @@ +#! /usr/bin/env python +# encoding: utf-8 +""" +# test_hickle_helpers.py + +Unit tests for hickle module -- helper functions. + +""" + + +# %% IMPORTS +# Package imports +import numpy as np + +# hickle imports +from hickle.helpers import ( + check_is_hashable, check_is_iterable, check_iterable_item_type) +from hickle.loaders.load_numpy import check_is_numpy_array + + +# %% FUNCTION DEFINITIONS +def test_check_is_iterable(): + assert check_is_iterable([1, 2, 3]) + assert not check_is_iterable(1) + + +def test_check_is_hashable(): + assert check_is_hashable(1) + assert not check_is_hashable([1, 2, 3]) + + +def test_check_iterable_item_type(): + assert check_iterable_item_type([1, 2, 3]) is int + assert not check_iterable_item_type([int(1), float(1)]) + assert not check_iterable_item_type([]) + + +def test_check_is_numpy_array(): + assert check_is_numpy_array(np.array([1, 2, 3])) + assert check_is_numpy_array(np.ma.array([1, 2, 3])) + assert not check_is_numpy_array([1, 2]) + + +# %% MAIN SCRIPT +if __name__ == "__main__": + test_check_is_hashable() + test_check_is_iterable() + test_check_is_numpy_array() + test_check_iterable_item_type() diff --git a/hickle/tests/test_legacy_load.py b/hickle/tests/test_legacy_load.py new file mode 100644 index 00000000..caf2bc89 --- /dev/null +++ b/hickle/tests/test_legacy_load.py @@ -0,0 +1,37 @@ +# %% IMPORTS +# Built-in imports +import glob +from os import path +import warnings + +# Package imports +import h5py + +# hickle imports +import hickle as hkl + + +# %% FUNCTION DEFINITIONS +def test_legacy_load(): + dirpath = path.dirname(__file__) + filelist = sorted(glob.glob(path.join(dirpath, 'legacy_hkls/*.hkl'))) + + # Make all warnings show + warnings.simplefilter("always") + + for filename in filelist: + try: + print(filename) + a = hkl.load(filename) + except Exception: + with h5py.File(filename) as a: + print(a.attrs.items()) + print(a.items()) + for key, item in a.items(): + print(item.attrs.items()) + raise + + +# %% MAIN SCRIPT +if __name__ == "__main__": + test_legacy_load() diff --git a/tests/test_scipy.py b/hickle/tests/test_scipy.py similarity index 89% rename from tests/test_scipy.py rename to hickle/tests/test_scipy.py index ab78311d..d7d811f8 100644 --- a/tests/test_scipy.py +++ b/hickle/tests/test_scipy.py @@ -1,15 +1,18 @@ +# %% IMPORTS +# Package imports import numpy as np +from py.path import local from scipy.sparse import csr_matrix, csc_matrix, bsr_matrix +# hickle imports import hickle from hickle.loaders.load_scipy import check_is_scipy_sparse_array -from py.path import local - # Set the current working directory to the temporary directory local.get_temproot().chdir() +# %% FUNCTION DEFINITIONS def test_is_sparse(): sm0 = csr_matrix((3, 4), dtype=np.int8) sm1 = csc_matrix((1, 2)) @@ -19,8 +22,6 @@ def test_is_sparse(): def test_sparse_matrix(): - sm0 = csr_matrix((3, 4), dtype=np.int8).toarray() - row = np.array([0, 0, 1, 2, 2, 2]) col = np.array([0, 2, 2, 0, 1, 2]) data = np.array([1, 2, 3, 4, 5, 6]) @@ -30,7 +31,7 @@ def test_sparse_matrix(): indptr = np.array([0, 2, 3, 6]) indices = np.array([0, 2, 2, 0, 1, 2]) data = np.array([1, 2, 3, 4, 5, 6]).repeat(4).reshape(6, 2, 2) - sm3 = bsr_matrix((data,indices, indptr), shape=(6, 6)) + sm3 = bsr_matrix((data, indices, indptr), shape=(6, 6)) hickle.dump(sm1, 'test_sp.h5') sm1_h = hickle.load('test_sp.h5') @@ -52,6 +53,7 @@ def test_sparse_matrix(): assert sm3_h. 
shape == sm3.shape +# %% MAIN SCRIPT if __name__ == "__main__": test_sparse_matrix() - test_is_sparse() \ No newline at end of file + test_is_sparse() diff --git a/requirements.txt b/requirements.txt index b3846b87..e6bf7127 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1,3 +1,4 @@ -h5py -numpy -dill +dill>=0.3.0 +h5py>=2.8.0 +numpy>=1.8 +six>=1.11.0 diff --git a/requirements_test.txt b/requirements_test.txt index 754b013a..14882e21 100644 --- a/requirements_test.txt +++ b/requirements_test.txt @@ -1,8 +1,8 @@ -pytest +codecov +pytest>=4.6.0 pytest-cov -pytest-runner -astropy<3.1;python_version>="3" -astropy<3.0;python_version=="2.7.*" -scipy -pandas -coveralls +astropy>=1.3,<4.0 +scipy>=1.0.0 +pandas>=0.24.0 +check-manifest +twine>=1.13.0 \ No newline at end of file diff --git a/setup.cfg b/setup.cfg index 48a6afe9..d1f49f6a 100644 --- a/setup.cfg +++ b/setup.cfg @@ -5,4 +5,11 @@ description-file=README.md test=pytest [tool:pytest] -addopts=--verbose --cov=hickle +addopts=--verbose --cov --cov-config=setup.cfg --cov-report=term-missing + +[coverage:run] +include=hickle/* +omit= + hickle/tests/* + hickle/*/tests/* + hickle/legacy_v3/* diff --git a/setup.py b/setup.py index aba344d6..252ca0d0 100644 --- a/setup.py +++ b/setup.py @@ -6,11 +6,13 @@ # TEST: twine upload --repository-url https://test.pypi.org/legacy/ dist/* # twine upload dist/* +from codecs import open +import re + from setuptools import setup, find_packages import sys -version = '3.4.3' -author = 'Danny Price' +author = "Danny Price, Ellert van der Velden and contributors" with open("README.md", "r") as fh: long_description = fh.read() @@ -21,22 +23,46 @@ with open("requirements_test.txt", 'r') as fh: test_requirements = fh.read().splitlines() +# Read the __version__.py file +with open('hickle/__version__.py', 'r') as f: + vf = f.read() + +# Obtain version from read-in __version__.py file +version = re.search(r"^_*version_* = ['\"]([^'\"]*)['\"]", vf, re.M).group(1) + setup(name='hickle', version=version, - description='Hickle - a HDF5 based version of pickle', + description='Hickle - an HDF5 based version of pickle', long_description=long_description, long_description_content_type='text/markdown', author=author, author_email='dan@thetelegraphic.com', url='http://github.com/telegraphic/hickle', - download_url='https://github.com/telegraphic/hickle/archive/%s.tar.gz' % version, + download_url=('https://github.com/telegraphic/hickle/archive/v%s.zip' + % (version)), platforms='Cross platform (Linux, Mac OSX, Windows)', + classifiers=[ + 'Development Status :: 5 - Production/Stable', + 'Intended Audience :: Developers', + 'Intended Audience :: Science/Research', + 'License :: OSI Approved', + 'Natural Language :: English', + 'Operating System :: MacOS', + 'Operating System :: Microsoft :: Windows', + 'Operating System :: Unix', + 'Programming Language :: Python', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: 3.5', + 'Programming Language :: Python :: 3.6', + 'Programming Language :: Python :: 3.7', + 'Programming Language :: Python :: 3.8', + 'Topic :: Software Development :: Libraries :: Python Modules', + 'Topic :: Utilities', + ], keywords=['pickle', 'hdf5', 'data storage', 'data export'], - #py_modules = ['hickle', 'hickle_legacy'], install_requires=requirements, tests_require=test_requirements, -# setup_requires = ['pytest-runner', 'pytest-cov'], - python_requires='>=2.7', + python_requires='>=3.5', packages=find_packages(), zip_safe=False, ) diff --git 
a/tests/legacy_hkls/hickle_1_1_0.hkl b/tests/legacy_hkls/hickle_1_1_0.hkl
deleted file mode 100644
index 9f056b8ed36edaefd4a4fef965e4eaf33ed73fb7..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

diff --git a/tests/legacy_hkls/hickle_1_3_2.hkl b/tests/legacy_hkls/hickle_1_3_2.hkl
deleted file mode 100644
index 629a7496e111a95f571a06eb9329b81fbfbfc96f..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

diff --git a/tests/legacy_hkls/hickle_1_4_0.hkl b/tests/legacy_hkls/hickle_1_4_0.hkl
deleted file mode 100644
index a420e71d44022ab83574f4e2560e5e1e899f9a7b..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

diff --git a/tests/legacy_hkls/hickle_2_1_0.hkl b/tests/legacy_hkls/hickle_2_1_0.hkl
deleted file mode 100644
index 68288e51e37dbb01d25faf0118e8830568ac3c4c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001