
Support serializing numpy and pandas types #1180

Merged · 4 commits, Mar 30, 2020

Conversation

@sethmlarson (Contributor) commented Mar 25, 2020

This PR attempts to import numpy and pandas and, if either library is found, adds its types to the list that the default JSONSerializer supports. Numpy adds the integer, float, boolean, ndarray, and datetime types. Pandas adds support for Series, Timestamp, and NA -> None. Am I missing any important types that can be safely serialized to JSON?

Notably I left out DataFrame and numpy.nan. NaN is already handled by JSON and doesn't have semantics for Elasticsearch (at least I don't think it does?), and DataFrame seemed a bit too heavy to support natively. Better for users to call DataFrame.to_json() themselves?

I also wanted to confirm my thinking that it's appropriate to support Series and ndarray. Or is that also too presumptive of what a user wants from the library?
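To make the motivation concrete, here is a minimal sketch of the problem the PR addresses and the general shape of the fix. The `default` function below is illustrative only, not the actual JSONSerializer API from this repository:

```python
import json
import numpy as np

# Plain json.dumps cannot handle numpy integer scalars out of the box:
try:
    json.dumps({"value": np.int64(42)})
except TypeError:
    print("np.int64 is not JSON serializable by default")

# A minimal fallback in the spirit of this PR (hypothetical helper, not
# the library's real serializer): unwrap numpy scalars via .item().
def default(obj):
    if isinstance(obj, (np.integer, np.floating, np.bool_)):
        return obj.item()
    raise TypeError(f"Unable to serialize {obj!r}")

print(json.dumps({"value": np.int64(42)}, default=default))  # {"value": 42}
```

The actual implementation registers these conversions inside the client's default JSONSerializer rather than requiring callers to pass `default=` themselves.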

Closes #1178
Closes elastic/eland#142

@stevedodson left a comment:

LGTM

```python
elif isinstance(data, np.ndarray):
    return data.tolist()
if pd:
    if isinstance(data, pd.Series):
```
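For context, the excerpt above converts both container types to plain Python lists. A self-contained sketch of that logic (`to_jsonable` is an illustrative name, not the function in the diff):

```python
import numpy as np
import pandas as pd

# Hedged sketch of the conversion shown in the diff: both ndarray and
# Series flatten to plain Python lists via .tolist(), which also
# unwraps numpy scalars into native Python types element by element.
def to_jsonable(data):
    if isinstance(data, np.ndarray):
        return data.tolist()
    if isinstance(data, pd.Series):
        return data.tolist()
    raise TypeError(f"Unable to serialize {data!r}")

print(to_jsonable(np.array([1, 2, 3])))    # [1, 2, 3]
print(to_jsonable(pd.Series([1.5, 2.5])))  # [1.5, 2.5]
```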
A contributor commented:
I'm definitely not the expert here (so I might be asking the wrong question), but I've recently been working a bit with pandas dtypes and I'm wondering how this serialiser will handle the types of the elements inside a pd.Series or a pd.DataFrame. Usually these elements are common numerical types, so they're probably handled by the numpy converters, but sometimes a user can set these dtypes to pandas-specific things like category. For example, I recently did something like this in my Jupyter notebook:

```python
# finally we will correct the mappings on the remaining columns
mappings = {'carat': 'float64',
            'cut': 'category',
            'color': 'category',
            'depth': 'float64',
            'table': 'float64',
            'price': 'float64',
            'x': 'float64',
            'y': 'float64',
            'z': 'float64'}

df_cleaned = df_cleaned.astype(mappings)
```

How would the serialisation handle category or some of the other pandas-specific dtypes in a Series? (Here is a list of some more exotic dtypes: https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html#dtypes)
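A quick illustration of what happens in practice with a categorical Series under the `.tolist()` conversion the PR uses: the values come back as plain Python strings, so the categorical information is dropped at serialization time (the data values survive, the dtype does not):

```python
import pandas as pd

# A categorical Series: dtype is 'category', values are strings.
s = pd.Series(["ideal", "premium", "ideal"], dtype="category")
print(s.dtype)              # category
print(s.tolist())           # ['ideal', 'premium', 'ideal']
print(type(s.tolist()[0]))  # <class 'str'>
```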

@sethmlarson (Contributor, Author) replied:

Category is an interesting one: we'd lose the "categorical" aspect of the value if we serialize it to a string, but maybe that's fine? Maintaining the categorical aspect would require configuration on the mapping, but as long as that's done it would be a solution. So maybe we do category -> str?
