
json fails to serialise numpy.int64 #68501

Open
thomas-arildsen mannequin opened this issue May 28, 2015 · 19 comments
Labels
  • 3.7 (EOL) end of life
  • stdlib (Python modules in the Lib dir)
  • type-bug (An unexpected behavior, bug, or error)

Comments


thomas-arildsen mannequin commented May 28, 2015

BPO 24313
Nosy @pitrou, @bitdancer, @njsmith, @eli-b, @serhiy-storchaka, @isidentical, @vlbrown, @mxposed
Files
  • debug_json.py: Minimal example to demonstrate the problem
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2015-05-28.08:32:31.477>
    labels = ['type-bug', '3.7']
    title = 'json fails to serialise numpy.int64'
    updated_at = <Date 2022-02-24.00:38:48.258>
    user = 'https://bugs.python.org/thomas-arildsen'

    bugs.python.org fields:

    activity = <Date 2022-02-24.00:38:48.258>
    actor = 'mxposed'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2015-05-28.08:32:31.477>
    creator = 'thomas-arildsen'
    dependencies = []
    files = ['39530']
    hgrepos = []
    issue_num = 24313
    keywords = []
    message_count = 17.0
    messages = ['244288', '244321', '244352', '244355', '244359', '244363', '244370', '244371', '254734', '257451', '257455', '257459', '350567', '350581', '355133', '355143', '413869']
    nosy_count = 10.0
    nosy_names = ['pitrou', 'r.david.murray', 'njs', 'Eli_B', 'serhiy.storchaka', 'thomas-arildsen', 'Amit Feller', 'BTaskaya', 'vlbrown', 'mxposed']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue24313'
    versions = ['Python 3.7']


    thomas-arildsen mannequin commented May 28, 2015

    When I run the attached example in Python 2.7.9, it succeeds. In Python 3.4, it fails as shown below. I use json 2.0.9 and numpy 1.9.2 with both versions of Python; Python and all packages are provided by Anaconda 2.2.0.
    The error seems to be caused by the serialised object containing a numpy.int64 type. It might fail with other 64-bit numpy types as well (untested).

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)
    /home/tha/tmp/debug_json/debug_json.py in <module>()
          4 test = {'value': np.int64(1)}
          5 
    ----> 6 obj=json.dumps(test)

    /home/tha/.conda/envs/python3/lib/python3.4/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    228 cls is None and indent is None and separators is None and
    229 default is None and not sort_keys and not kw):
    --> 230 return _default_encoder.encode(obj)
    231 if cls is None:
    232 cls = JSONEncoder

    /home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in encode(self, o)
    190 # exceptions aren't as detailed. The list call should be roughly
    191 # equivalent to the PySequence_Fast that ''.join() would do.
    --> 192 chunks = self.iterencode(o, _one_shot=True)
    193 if not isinstance(chunks, (list, tuple)):
    194 chunks = list(chunks)

    /home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in iterencode(self, o, _one_shot)
    248 self.key_separator, self.item_separator, self.sort_keys,
    249 self.skipkeys, _one_shot)
    --> 250 return _iterencode(o, 0)
    251
    252 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

    /home/tha/.conda/envs/python3/lib/python3.4/json/encoder.py in default(self, o)
    171
    172 """
    --> 173 raise TypeError(repr(o) + " is not JSON serializable")
    174
    175 def encode(self, o):

    TypeError: 1 is not JSON serializable

    @thomas-arildsen thomas-arildsen mannequin added the type-crash A hard crash of the interpreter, possibly with a core dump label May 28, 2015
    @bitdancer
    Member

    All python3 ints are what used to be long ints in python2, so the code that recognized short ints no longer exists. Do the numpy types implement __index__? It looks like json doesn't check for __index__, and I wonder if it should.
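(Numpy integer scalars do implement `__index__`. A minimal sketch of the point being made, using a hypothetical stand-in class so it runs without numpy: `operator.index()` accepts any object with the hook, but `json.dumps` never consults it.)

```python
import json
import operator

class FakeInt64:
    """Stand-in for numpy.int64: an integer-like type implementing __index__."""
    def __init__(self, value):
        self._value = value
    def __index__(self):
        return self._value
    def __repr__(self):
        return f"FakeInt64({self._value})"

n = FakeInt64(1)
# operator.index() consults __index__ and returns a plain int ...
print(operator.index(n))  # 1
# ... but json.dumps does not, so this still raises TypeError:
try:
    json.dumps({'value': n})
except TypeError as e:
    print(e)
```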


    pitrou commented May 28, 2015

    It looks like json doesn't check for __index__, and I wonder if it should.

    I don't know. Simply, under 2.7, int64 inherits from int:

    >>> np.int64.__mro__
    (<type 'numpy.int64'>, <type 'numpy.signedinteger'>, <type 'numpy.integer'>, <type 'numpy.number'>, <type 'numpy.generic'>, <type 'int'>, <type 'object'>)

    while it doesn't under 3.x:

    >>> np.int64.__mro__
    (<class 'numpy.int64'>, <class 'numpy.signedinteger'>, <class 'numpy.integer'>, <class 'numpy.number'>, <class 'numpy.generic'>, <class 'object'>)

    @pitrou pitrou added type-feature A feature request or enhancement and removed type-crash A hard crash of the interpreter, possibly with a core dump labels May 28, 2015
    @bitdancer
    Member

    Ah, so this is a numpy bug?

    @serhiy-storchaka
    Member

    Yes, it looks like a bug (or rather a missing feature) in numpy, but numpy has no chance to fix it without help from Python. The json module is not flexible enough.

    For now this issue can only be worked around on the user side, with a special default handler.

    >>> import numpy, json
    >>> def default(o):
    ...     if isinstance(o, numpy.integer): return int(o)
    ...     raise TypeError
    ... 
    >>> json.dumps({'value': numpy.int64(42)}, default=default)
    '{"value": 42}'
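The same workaround can also be packaged as a `json.JSONEncoder` subclass and passed via the `cls` argument; a sketch using a hypothetical stand-in integer type in place of `numpy.integer`, so it runs without numpy:

```python
import json

class FakeNumpyInt:
    """Stand-in for a numpy integer scalar in this sketch."""
    def __init__(self, value):
        self.value = value
    def __int__(self):
        return self.value

class NumpyIntEncoder(json.JSONEncoder):
    def default(self, o):
        # Convert the integer-like scalar to a plain Python int
        if isinstance(o, FakeNumpyInt):
            return int(o)
        # Fall back to the base class, which raises TypeError
        return super().default(o)

print(json.dumps({'value': FakeNumpyInt(42)}, cls=NumpyIntEncoder))  # {"value": 42}
```

With real numpy, the `isinstance` check would be against `numpy.integer` instead of the stand-in class.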


    pitrou commented May 29, 2015

    I wouldn't call it a bug in Numpy (a quirk, perhaps?). Numpy ints are fixed-width ints, so some of them can inherit from Python int in 2.x, but not in 3.x.
    Not all of them do even in 2.x, though, since the bit widths differ:

    >>> issubclass(np.int64, int)
    True
    >>> issubclass(np.int32, int)
    False
    >>> issubclass(np.int16, int)
    False

    @bitdancer
    Member

    So in python2, some were json serializable and some weren't? Yes, I'd call that a quirk :)

    So back to the question of whether it makes sense for json to look for __index__ to decide if something can be serialized as an int. If not, I don't think there is anything we can do.


    pitrou commented May 29, 2015

    I don't know about __index__, but there's the ages-old discussion of allowing some kind of __json__ hook on types. Of course, none of those solutions would allow round-tripping.
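(Such a `__json__` hook was never added to the stdlib, but the idea can be emulated today through `default`. A sketch; the `__json__` method name here is purely conventional, not a real protocol:)

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __json__(self):
        # Hypothetical hook: return a JSON-compatible representation
        return {'x': self.x, 'y': self.y}

def default(o):
    # Dispatch to the object's __json__ method if it defines one
    hook = getattr(o, '__json__', None)
    if hook is not None:
        return hook()
    raise TypeError(f'Object of type {type(o).__name__} is not JSON serializable')

print(json.dumps({'p': Point(1, 2)}, default=default))  # {"p": {"x": 1, "y": 2}}
```

As noted, none of these approaches round-trip: the deserialized value is a plain dict, not a `Point`.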


    eli-b mannequin commented Nov 16, 2015

    On 64-bit Windows, my 64-bit Python 2.7.9 and my 32-bit 2.7.10 Python both reproduce the failure with a similar traceback.


    thomas-arildsen mannequin commented Jan 4, 2016

    Is there any possibility that json could implement special handling of NumPy types? This "lack of a feature" seems to have propagated back into Python 2.7 now in some recent update...


    njsmith commented Jan 4, 2016

    Nothing's changed in python 2.7. Basically: (a) no numpy ints have ever serialized in py3. (b) in py2, either np.int32 *xor* np.int64 will serialize correctly, and which one it is depends on sizeof(long) in the C compiler used to build Python. (This follows from the fact that in py2, the Python 'int' type is always the same size as C 'long'.)

    So the end result is: on OS X and Linux, 32-bit Pythons can JSON-serialize np.int32 objects, and 64-bit Pythons can JSON-serialize np.int64 objects, because 64-bit OS X and Linux are LP64. On Windows, both 32- and 64-bit Pythons can JSON-serialize np.int32 objects and can't serialize np.int64 objects, because 64-bit Windows is LLP64.
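(The sizeof(long) dependence is easy to check from Python itself; a small stdlib-only sketch:)

```python
import struct

# Width of the C 'long' type on this interpreter build:
# 8 bytes on LP64 platforms (64-bit Linux/macOS), 4 bytes on LLP64
# (64-bit Windows) and on all 32-bit builds.
long_bits = struct.calcsize('l') * 8
print(f'C long is {long_bits} bits on this build')
```

On Python 2 this width determined which fixed-width numpy integers subclassed `int`; Python 3's `int` is arbitrary-precision, so none of them do.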


    thomas-arildsen mannequin commented Jan 4, 2016

    Thanks for the clarification.


    vlbrown mannequin commented Aug 26, 2019

    This is still broken. With pandas being popular, it's more likely someone might hit it. Can we fix this?

    At the very least, the error message needs to be made much more specific.

    I have created a dictionary containing pandas stats.

    def summary_stats(s):
        """
        Calculate summary statistics for a series or list, s.
        Returns a dictionary.
        """
        stats = {
            'count': 0,
            'max': 0,
            'min': 0,
            'mean': 0,
            'median': 0,
            'mode': 0,
            'std': 0,
            'z': (0, 0)
        }

        stats['count'] = s.count()
        stats['max'] = s.max()
        stats['min'] = s.min()
        stats['mean'] = round(s.mean(), 3)
        stats['median'] = s.median()
        stats['mode'] = s.mode()[0]
        stats['std'] = round(s.std(), 3)

        std3 = 3 * stats['std']
        low_z = round(stats['mean'] - std3, 3)
        high_z = round(stats['mean'] + std3, 3)
        stats['z'] = (low_z, high_z)

        return stats
    

    Apparently, pandas (sometimes) returns numpy ints and numpy floats.

    Here's a piece of the dictionary:

     {'count': 597,
       'max': 0.95,
       'min': 0.01,
       'mean': 0.585,
       'median': 0.58,
       'mode': 0.59,
       'std': 0.122,
       'z': (0.219, 0.951)}
    
    It looks fine, but when I try to dump the dict to json:

        with open('Data/station_stats.json', 'w') as fp:
            json.dump(station_stats, fp)

    
    I get this error:

        TypeError: Object of type int64 is not JSON serializable

    **Much searching** led me to discover that I apparently have numpy ints, which I have confirmed:

        for key, value in station_stats['657']['Fluorescence'].items():
            print(key, value, type(value))

    count 3183 <class 'numpy.int64'>
    max 2.8 <class 'float'>
    min 0.02 <class 'float'>
    mean 0.323 <class 'float'>
    median 0.28 <class 'float'>
    mode 0.24 <class 'numpy.float64'>
    std 0.194 <class 'float'>
    z (-0.259, 0.905) <class 'tuple'>
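(A common workaround for output like this is a `default` handler that calls the scalar's `item()` method; numpy scalars return the equivalent plain Python object from `.item()`. Sketched below with a hypothetical stand-in class so the example runs without numpy:)

```python
import json

class FakeScalar:
    """Stand-in for a numpy scalar such as numpy.int64 or numpy.float64."""
    def __init__(self, value):
        self._value = value
    def item(self):
        # numpy scalars expose .item(), returning a plain Python int/float
        return self._value

def np_default(o):
    # Delegate to .item() when available, mimicking numpy scalar conversion
    if hasattr(o, 'item'):
        return o.item()
    raise TypeError(f'Object of type {type(o).__name__} is not JSON serializable')

stats = {'count': FakeScalar(3183), 'mode': FakeScalar(0.24)}
print(json.dumps(stats, default=np_default))  # {"count": 3183, "mode": 0.24}
```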

    
    

    Problem description

    pandas statistics sometimes produce numpy numerics.

    numpy ints are not supported by json.dump

    Expected Output

    I expect ints, floats, strings, ... to be JSON serializable.

    INSTALLED VERSIONS

    commit : None
    python : 3.7.3.final.0
    python-bits : 64
    OS : Darwin
    OS-release : 15.6.0
    machine : x86_64
    processor : i386
    byteorder : little
    LC_ALL : None
    LANG : en_US.UTF-8
    LOCALE : en_US.UTF-8

    pandas : 0.25.0
    numpy : 1.16.4
    pytz : 2019.1
    dateutil : 2.8.0
    pip : 19.1.1
    setuptools : 41.0.1
    Cython : 0.29.12
    pytest : 5.0.1
    hypothesis : None
    sphinx : 2.1.2
    blosc : None
    feather : None
    xlsxwriter : 1.1.8
    lxml.etree : 4.3.4
    html5lib : 1.0.1
    pymysql : 0.9.3
    psycopg2 : None
    jinja2 : 2.10.1
    IPython : 7.7.0
    pandas_datareader: None
    bs4 : 4.7.1
    bottleneck : 1.2.1
    fastparquet : None
    gcsfs : None
    lxml.etree : 4.3.4
    matplotlib : 3.1.0
    numexpr : 2.6.9
    odfpy : None
    openpyxl : 2.6.2
    pandas_gbq : None
    pyarrow : None
    pytables : None
    s3fs : None
    scipy : 1.3.0
    sqlalchemy : 1.3.5
    tables : 3.5.2
    xarray : None
    xlrd : 1.2.0
    xlwt : 1.3.0
    xlsxwriter : 1.1.8


    @vlbrown vlbrown mannequin added 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error and removed type-feature A feature request or enhancement labels Aug 26, 2019

    vlbrown mannequin commented Aug 26, 2019

    Note also that pandas DataFrame.to_json() method has no issue with int64. Perhaps you could borrow their code.

    @isidentical
    Sponsor Member

    What is the next step for this 4-year-old issue? I think I can prepare a patch for using __index__ (as suggested by @r.david.murray).

    @serhiy-storchaka
    Member

    We could use __index__ for serializing numpy.int64. But what to do with numpy.float32 and numpy.float128? It is part of a much larger problem (which includes other numbers, collections, encoded strings, named tuples, data classes, etc.). I am working on it, but there is a lot of work.
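(A `default` handler covering both cases could try `operator.index()` for integer-like scalars and fall back to `float()` for float-like ones; a sketch with hypothetical stand-in types. Note that coercing a wider type like float128 through `float()` can lose precision, which is part of the problem described above:)

```python
import json
import operator

class IntLike:
    """Stand-in for numpy.int64: implements __index__."""
    def __init__(self, v): self._v = v
    def __index__(self): return self._v

class FloatLike:
    """Stand-in for numpy.float32: implements __float__."""
    def __init__(self, v): self._v = v
    def __float__(self): return self._v

def default(o):
    # Integer-likes serialize losslessly via __index__ ...
    try:
        return operator.index(o)
    except TypeError:
        pass
    # ... but coercing wider floats through float() may silently lose
    # precision, which is why this is not a complete fix.
    try:
        return float(o)
    except TypeError:
        raise TypeError(f'{o!r} is not JSON serializable') from None

print(json.dumps([IntLike(7), FloatLike(0.5)], default=default))  # [7, 0.5]
```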


    mxposed mannequin commented Feb 24, 2022

    Just ran into this. Are there any updates? Is there any task to contribute to regarding this?

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @petsuter

    there's the ages-old discussion of allowing some kind of json hook on types

    See #71549

    @iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 23, 2023
    @flofriday

    As far as I can see, this issue can be closed.
    There is no precedent for giving some libraries like numpy special treatment in the interpreter (even if they are popular), and the more general discussion of allowing custom hooks for json serialisation already has its own thread, as previously mentioned.
