Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

馃挭馃徏 making orjson serialization more robust, see #118 #131

Merged
merged 4 commits into from
Nov 8, 2022

Conversation

jonasvdd
Copy link
Member

No description provided.

@codecov-commenter
Copy link

codecov-commenter commented Oct 31, 2022

Codecov Report

Merging #131 (c262153) into main (38ea31c) will decrease coverage by 3.57%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #131      +/-   ##
==========================================
- Coverage   97.44%   93.87%   -3.58%     
==========================================
  Files          11       11              
  Lines         862      865       +3     
==========================================
- Hits          840      812      -28     
- Misses         22       53      +31     
Impacted Files Coverage 螖
...ler/figure_resampler/figure_resampler_interface.py 99.19% <100.00%> (-0.54%) 猬囷笍
...tly_resampler/figure_resampler/figure_resampler.py 64.91% <0.00%> (-25.44%) 猬囷笍

馃摚 We鈥檙e building smart automated test selection to slash your CI/CD build times. Learn more

@jonasvdd jonasvdd requested a review from jvdd October 31, 2022 18:15
@jonasvdd jonasvdd added the enhancement New feature or request label Oct 31, 2022
# NOTE:
# * float16 and float128 aren't supported with latest orjson versions (3.8.1)
# * this method assumes that the it will not get a float128 series
if series.dtype in [np.float16]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, actually this allows for weird / unexpected behavior - see below

y = np.randon.randn(3_000).astype(np.float16)

When len(y) > default_n_shown_samples => works like a charm 馃敟
image

When len(y) <= default_n_shown_samples => crashes 馃挜

image

The eventual error is quite clean tho
TypeError: unsupported datatype in numpy array


What is your opinion on this? @jonasvdd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we better add smth to the FAQ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you've cut of the error message, will try to reproduce this locally and see how clean the error message actually is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay indeed the error stacktrace is rather clean 猬囷笍

image

tests/test_figure_resampler.py Outdated Show resolved Hide resolved
@jvdd
Copy link
Member

jvdd commented Nov 5, 2022

After giving it some more thought. I see two paths forward;

  1. We revert orjson to an optional dependency.
    => Once all the major datatypes (16-bit -> 64-bit) are supported we can lock it in as a dependency.

  2. We keep orjson as dependency & add a section to the FAQ about issues when dealing with float16
    Basically we will suggest there to uninstall orjson or change the

So I'll go forward with removing the code that makes orjson "more robust".


On another note, I think the key to add f16 support to orjson (since it is not a primitive data type in rust), is the serialization in the half package. However, this does not seem trivial to me 馃槵

@jonasvdd
Copy link
Member Author

jonasvdd commented Nov 5, 2022

I think if we make orjson a dependency and lock the version to be above 3.8.x we will already support most of the datatypes? However, the float16 datatype will then fail all the time (unless we do some float16->float32 casting), Which I'm actually okay with. This would reduce this method:

    @staticmethod
    def _parse_dtype_orjson(series: np.ndarray) -> np.ndarray:
        """Verify the orjson compatibility of the series and convert it if needed."""
        # NOTE:
        #    * float16 and float128 aren't supported with latest orjson versions (3.8.1)
        #    * this method assumes that the it will not get a float128 series
        if series.dtype in [np.float16]:
            return series.astype(np.float32)

        # orjson < 3.8.0 encoding cannot encode with int16 & uint16 dtype
        if series.dtype in [np.int16, np.uint16]:
            major_v, minor_v = list(map(int, orjson.__version__.split(".")))[:2]
            if major_v < 3 or major_v == 3 and minor_v < 8:
                if series.dtype == np.uint16:
                    return series.astype("uint32")
                return series.astype(np.int32)
        return series

to this code:

    @staticmethod
    def _parse_dtype_orjson(series: np.ndarray) -> np.ndarray:
        """Verify the orjson compatibility of the series and convert it if needed."""
        # NOTE:
        #    * float16 and float128 aren't supported with latest orjson versions (3.8.1)
        #    * this method assumes that the it will not get a float128 series
        if series.dtype in [np.float16]:
            return series.astype(np.float32)
        return series

Note that we do not provide any support for the float128 datatype so-far.

Copy link
Member

@jvdd jvdd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree with my latest changes @jonasvdd we can merge it and create a new release

Copy link
Member Author

@jonasvdd jonasvdd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jonasvdd jonasvdd merged commit 3d6b49a into main Nov 8, 2022
@jvdd jvdd deleted the orjson_werkzeug branch December 12, 2022 07:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants