## A JSON format compatible with the Pandas data structure 
-------     
      
### Introduction
The data type is not explicitly taken into account in the current JSON interface.

The existing solution is to supply a separate data schema alongside the data.

### Proposal
To have a simple, compact and reversible solution, I propose to use the [JSON-NTV format (Named and Typed Value)](https://github.com/loco-philippe/NTV#readme) - which integrates the notion of type - and its JSON-TAB variation for tabular data.    
This solution supports a large number of types (not necessarily Pandas dtypes).
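
The typed names used throughout this notebook follow a `name::type` pattern (for example `dates::date`). As a rough illustration only, the sketch below splits such a key with plain Python. `split_typed_name` is a hypothetical helper, not part of `json_ntv`, and it only handles the `::` form seen in these examples.

```python
def split_typed_name(key):
    """Split a 'name::type' key into (name, type); type is None when absent."""
    name, sep, type_name = key.partition('::')
    return name, (type_name if sep else None)


# Keys taken from the examples in this notebook
for key in ('dates::date', 'value', '::int32'):
    print(key, '->', split_typed_name(key))
```

Because the type travels with the name, a reader of the JSON data does not need a separate schema to know how to interpret each column.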

### Content
This notebook uses examples to present some key points.

*(links are active in Jupyter Notebook or nbviewer)*
- [0 - Simple example](#0---Simple-example)
- [1 - Current Json interface](#1---Current-Json-interface)
    - [Example : simple column](#Example-:-simple-column)
    - [Types and Json interface](#Types-and-Json-interface)
    - [Data compactness](#data-compactness)
    - [External types](#external-types)
- [2 - Series](#2---Series)
    - [Simple example](#Simple-example)
    - [Typed example](#Typed-example)
    - [Examples with a non-Pandas type](#Examples-with-a-non-Pandas-type)
    - [Categorical examples](#Categorical-examples)
- [3 - DataFrame](#3---DataFrame)
    - [Initial example](#Initial-example)
    - [Complete example](#Complete-example)
    - [Json data can be annotated](#Json-data-can-be-annotated)
    - [Categorical data can be included](#Categorical-data-can-be-included)
- [4 - Annexe : Series tests](#4---Annexe-:-Series-tests)     
        
### References
- [JSON-NTV specification](https://github.com/loco-philippe/NTV/blob/main/documentation/JSON-NTV-standard.pdf)
- [JSON-TAB specification](https://github.com/loco-philippe/NTV/blob/main/documentation/JSON-TAB-standard.pdf)
- [JSON-NTV classes and methods](https://loco-philippe.github.io/NTV/json_ntv.html)

This Notebook can also be viewed at [nbviewer](http://nbviewer.org/github/loco-philippe/NTV/tree/main/example)

In [1]:
import math
from pprint import pprint
from json_ntv import Ntv, NtvConnector
import pandas as pd
from shapely.geometry import Point
from datetime import date, datetime

## 0 - Simple example

- The example is a DataFrame with several dtypes

In [2]:


tab_data = {'index':           [100, 200, 300, 400, 500, 600],
            'dates::date':     pd.Series([date(1964,1,1), date(1985,2,5), date(2022,1,21), date(1964,1,1), date(1985,2,5), date(2022,1,21)]), 
            'value':           [10, 10, 20, 20, 30, 30],
            'value32':         pd.Series([12, 12, 22, 22, 32, 32], dtype='int32'),
            'res':             [10, 20, 30, 10, 20, 30],
            'coord::point':    pd.Series([Point(1,2), Point(3,4), Point(5,6), Point(7,8), Point(3,4), Point(5,6)]),
            'names':           pd.Series(['john', 'eric', 'judith', 'mila', 'hector', 'maria'], dtype='string'),
            'unique':          True }
df = pd.DataFrame(tab_data).set_index('index')
print(df.dtypes)
df

dates::date     object
value            int64
value32          int32
res              int64
coord::point    object
names           string
unique            bool
dtype: object




index,dates::date,value,value32,res,coord::point,names,unique
100,1964-01-01,10,12,10,POINT (1 2),john,True
200,1985-02-05,10,12,20,POINT (3 4),eric,True
300,2022-01-21,20,22,30,POINT (5 6),judith,True
400,1964-01-01,20,22,10,POINT (7 8),mila,True
500,1985-02-05,30,32,20,POINT (3 4),hector,True
600,2022-01-21,30,32,30,POINT (5 6),maria,True


- the example has a simple and compact JSON representation including dtype

In [3]:
df_json = Ntv.obj(df)
pprint(df_json.to_obj(), compact=True, width=120)

{':tab': {'coord::point': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [3.0, 4.0], [5.0, 6.0]],
          'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'],
          'index': [100, 200, 300, 400, 500, 600],
          'names::string': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
          'res': [10, 20, 30, 10, 20, 30],
          'unique': [True, True, True, True, True, True],
          'value': [10, 10, 20, 20, 30, 30],
          'value32::int32': [12, 12, 22, 22, 32, 32]}}


- The json conversion is reversible : df_from_json equals initial df

In [4]:
df_from_json = df_json.to_obj(format='obj')
print('df created from JSON is equal to initial df ? ', df_from_json.equals(df))
df_from_json

df created from JSON is equal to initial df ?  True




,dates::date,value,value32,res,coord::point,names,unique
100,1964-01-01,10,12,10,POINT (1 2),john,True
200,1985-02-05,10,12,20,POINT (3 4),eric,True
300,2022-01-21,20,22,30,POINT (5 6),judith,True
400,1964-01-01,20,22,10,POINT (7 8),mila,True
500,1985-02-05,30,32,20,POINT (3 4),hector,True
600,2022-01-21,30,32,30,POINT (5 6),maria,True
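
The round trip above works because the type is carried in the column name. The sketch below reproduces that idea for a single date column using only the standard library. `encode_dates` and `decode_field` are hypothetical helpers written for this illustration, not the `json_ntv` implementation.

```python
import json
from datetime import date


def encode_dates(name, values):
    """Serialize a list of dates under a 'name::date' key."""
    return json.dumps({name + '::date': [d.isoformat() for d in values]})


def decode_field(text):
    """Rebuild (name, values); ISO strings become dates only if the key says so."""
    key, values = next(iter(json.loads(text).items()))
    name, sep, type_name = key.partition('::')
    if sep and type_name == 'date':
        values = [date.fromisoformat(v) for v in values]
    return name, values


dates = [date(1964, 1, 1), date(1985, 2, 5), date(2022, 1, 21)]
text = encode_dates('dates', dates)
print(text)
print(decode_field(text) == ('dates', dates))  # True
```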


## 1 - Current Json interface

### Example : simple column
- the interface is not reversible with this example

In [5]:
df = pd.read_json('{"test integer":[1,2,3], "test string": ["a", "b", "c"]}')
print(df)
# but to_json() cannot recreate the initial JSON data

   test integer test string
0             1           a
1             2           b
2             3           c


### Types and Json interface 

- the only way to keep the types in the json interface is to use the orient='table' option

In [6]:
df  = pd.DataFrame(pd.Series([10,20], name='int32', dtype='Int32'))

# dtype is not included in usual json interface
df.to_json()

'{"int32":{"0":10,"1":20}}'

- only a few types are allowed in the json-table interface: int64, float64, bool, datetime64, timedelta64, categorical

In [7]:
# 'int32' is lost in json-table interface
df2 = pd.read_json(df.to_json(orient='table'), orient='table')
print(df2.dtypes)
print('\nis Json translation reversible ? ', df.equals(df2))

int32    int64
dtype: object

is Json translation reversible ?  False


- allowed types are not always kept in json interface 

In [8]:
df = pd.DataFrame(pd.Series([10,20], name='float64', dtype='float64'))
print(df.dtypes, '\n')
df2 = pd.read_json(df.to_json(orient='records'), orient='records')
print(df2.dtypes)
print('\nis Json translation reversible ? ', df.equals(df2))

float64    float64
dtype: object 

float64    int64
dtype: object

is Json translation reversible ?  False


In [9]:
sr = pd.Series([math.nan,math.nan], name='nan')
print(sr.dtype, '\n')
sr2 = pd.read_json(sr.to_json(), typ='series')
print(sr2)
print('\nis Json translation reversible ? ', sr.equals(sr2))

float64 

0   NaT
1   NaT
dtype: datetime64[ns]

is Json translation reversible ?  False
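
The NaN case illustrates a general ambiguity: once NaN is written as `null`, the JSON text no longer says which type the values had. The following standard-library sketch shows how a declared type removes that ambiguity. The helper names and the `'float64'` label are assumptions made for this example.

```python
import json
import math


def to_json_nulls(values):
    """Write NaN as null, since strict JSON has no NaN literal."""
    return json.dumps([None if isinstance(v, float) and math.isnan(v) else v
                       for v in values])


def from_json_typed(type_name, text):
    """Read values back; null becomes NaN only when the declared type is float64."""
    values = json.loads(text)
    if type_name == 'float64':
        return [math.nan if v is None else float(v) for v in values]
    return values


text = to_json_nulls([math.nan, math.nan])
print(text)                               # [null, null]
untyped = from_json_typed(None, text)     # [None, None]: the float type is lost
typed = from_json_typed('float64', text)  # NaN values restored
print(untyped, all(math.isnan(v) for v in typed))
```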


- data with 'object' dtype is kept only in certain cases

In [10]:
dfd = pd.DataFrame({'dates': [date(2021, 3, 1), date(2021, 3, 3)]})

print(dfd.to_json(default_handler=date.isoformat), '\n')
print(dfd.to_json(orient='table'), '\n')

dfd2 = pd.read_json(dfd.to_json(orient='table'), orient='table')
print(dfd2)

print('\nis Json translation reversible ? ', dfd.equals(dfd2))

{"dates":{"0":1614556800000,"1":1614729600000}} 

{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"dates","type":"string"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"dates":"2021-03-01T00:00:00.000"},{"index":1,"dates":"2021-03-03T00:00:00.000"}]} 

                     dates
0  2021-03-01T00:00:00.000
1  2021-03-03T00:00:00.000

is Json translation reversible ?  False


In [11]:
dfd = pd.DataFrame({'tuple': [(2021, 3, 1), (2021, 3, 3)]})
print(dfd, '\n')
print(dfd.to_json(), '\n')
print(dfd.to_json(orient='table'), '\n')
dfd2 = pd.read_json(dfd.to_json(orient='table'), orient='table')
print(dfd2)
print('\nis Json translation reversible ? ', dfd.equals(dfd2))

          tuple
0  (2021, 3, 1)
1  (2021, 3, 3) 

{"tuple":{"0":[2021,3,1],"1":[2021,3,3]}} 

{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"tuple","type":"string"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"tuple":[2021,3,1]},{"index":1,"tuple":[2021,3,3]}]} 

          tuple
0  [2021, 3, 1]
1  [2021, 3, 3]

is Json translation reversible ?  False
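
The two `object` cases above fail for a reason that is independent of pandas: plain JSON has no tuple and no date type. A minimal standard-library check, shown only as an illustration:

```python
import json
from datetime import date

# Tuples are written as JSON arrays and come back as lists
original = {'tuple': [(2021, 3, 1), (2021, 3, 3)]}
restored = json.loads(json.dumps(original))
print(restored)              # {'tuple': [[2021, 3, 1], [2021, 3, 3]]}
print(restored == original)  # False

# Dates are written as strings and come back as strings
dates = {'dates': [date(2021, 3, 1), date(2021, 3, 3)]}
as_text = json.loads(json.dumps(dates, default=date.isoformat))
print(as_text)               # {'dates': ['2021-03-01', '2021-03-03']}
print(as_text == dates)      # False
```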


- with categorical dtype, the underlying dtype is not included in json interface

In [12]:
df = pd.DataFrame(pd.Series([10,20], name='float', dtype='float64'), dtype='category')
print(df.to_json(orient='table'))

{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"float","type":"any","constraints":{"enum":[10.0,20.0]},"ordered":false}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"float":10.0},{"index":1,"float":20.0}]}


### Data compactness
- the json-table interface is not compact (in this example its size is double or triple the size of the compact format)

In [13]:
tab_data = {'dates':           ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], 
            'value':           [10, 10, 20, 20, 30, 30],
            'names':           ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
            'unique':          [True, True, True, True, True, True] }

df = pd.DataFrame(tab_data)
print(df, '\n')

# length with compact interface
print(Ntv.obj(df).to_obj(encoded=True))
print(len(Ntv.obj(df).to_obj(encoded=True)), '\n')

# length with current interface
print(df.to_json(orient='table'))
print(len(df.to_json(orient='table')), '\n')

        dates  value   names  unique
0  1964-01-01     10    john    True
1  1985-02-05     10    eric    True
2  2022-01-21     20  judith    True
3  1964-01-01     20    mila    True
4  1985-02-05     30  hector    True
5  2022-01-21     30   maria    True 

{":tab": {"index": [0, 1, 2, 3, 4, 5], "dates": ["1964-01-01", "1985-02-05", "2022-01-21", "1964-01-01", "1985-02-05", "2022-01-21"], "value": [10, 10, 20, 20, 30, 30], "names": ["john", "eric", "judith", "mila", "hector", "maria"], "unique": [true, true, true, true, true, true]}}
281 

{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"dates","type":"string"},{"name":"value","type":"integer"},{"name":"names","type":"string"},{"name":"unique","type":"boolean"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":0,"dates":"1964-01-01","value":10,"names":"john","unique":true},{"index":1,"dates":"1985-02-05","value":10,"names":"eric","unique":true},{"index":2,"dates":"2022-01-21","value":20,"names":"judit

In [14]:
tab_data = {'dates':           ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], 
            'value':           [10, 10, 20, 20, 30, 30],
            'names':           ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
            'unique':          [True, True, True, True, True, True] }

df = pd.DataFrame(tab_data, dtype='category')
print(df, '\n')

# length with compact interface
print(Ntv.obj(df).to_obj(encoded=True))
print(len(Ntv.obj(df).to_obj(encoded=True)), '\n')

# length with current interface
print(df.to_json(orient='table'))
print(len(df.to_json(orient='table')), '\n')

        dates value   names unique
0  1964-01-01    10    john   True
1  1985-02-05    10    eric   True
2  2022-01-21    20  judith   True
3  1964-01-01    20    mila   True
4  1985-02-05    30  hector   True
5  2022-01-21    30   maria   True 

{":tab": {"index": [0, 1, 2, 3, 4, 5], "dates": [["1964-01-01", "1985-02-05", "2022-01-21"], [0, 1, 2, 0, 1, 2]], "value": [[10, 20, 30], [0, 0, 1, 1, 2, 2]], "names": [["eric", "hector", "john", "judith", "maria", "mila"], [2, 0, 3, 5, 1, 4]], "unique": [[true], [0, 0, 0, 0, 0, 0]]}}
285 

{"schema":{"fields":[{"name":"index","type":"integer"},{"name":"dates","type":"any","constraints":{"enum":["1964-01-01","1985-02-05","2022-01-21"]},"ordered":false},{"name":"value","type":"any","constraints":{"enum":[10,20,30]},"ordered":false},{"name":"names","type":"any","constraints":{"enum":["eric","hector","john","judith","maria","mila"]},"ordered":false},{"name":"unique","type":"boolean","constraints":{"enum":[true]},"ordered":false}],"primaryKey":["i
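
Much of the size difference comes from the layout itself: a record-oriented format repeats every column name on every row, while a column-oriented format writes each name once. The sketch below compares the two layouts with the standard library on the same data as above. It does not reproduce the exact `orient='table'` output, which also carries a schema.

```python
import json

columns = {
    'dates': ['1964-01-01', '1985-02-05', '2022-01-21',
              '1964-01-01', '1985-02-05', '2022-01-21'],
    'value': [10, 10, 20, 20, 30, 30],
    'names': ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
    'unique': [True] * 6,
}

# Record-oriented layout: one object per row, so each key is repeated per row
records = [dict(zip(columns, row)) for row in zip(*columns.values())]

col_text = json.dumps(columns)
rec_text = json.dumps(records)
print(len(col_text), len(rec_text))
print(len(col_text) < len(rec_text))  # True
```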

### Interface is reversible only with json dtype
- see previous examples

### External types
- the interface does not accept external types
- to integrate external types, it is necessary to first create ExtensionArray and ExtensionDtype objects
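
By contrast, when the type is named in the JSON data, an external type can stay a plain Python object in an `object` column and only needs a pair of conversion functions. The sketch below illustrates this with a small registry keyed by type name. `CODECS`, `encode_column`, `decode_column` and the `Point` stand-in are all hypothetical and do not describe the `json_ntv` API.

```python
from collections import namedtuple

# Stand-in for an external type such as a geometry point
Point = namedtuple('Point', ['x', 'y'])

# Registry: type name -> (object to JSON value, JSON value to object)
CODECS = {
    'point': (lambda p: [p.x, p.y], lambda v: Point(*v)),
}


def encode_column(type_name, values):
    """Convert objects to JSON-compatible values using the registered codec."""
    to_json, _ = CODECS[type_name]
    return [to_json(v) for v in values]


def decode_column(type_name, values):
    """Convert JSON values back to objects using the registered codec."""
    _, from_json = CODECS[type_name]
    return [from_json(v) for v in values]


points = [Point(1.0, 2.0), Point(3.0, 4.0)]
encoded = encode_column('point', points)
print(encoded)                                    # [[1.0, 2.0], [3.0, 4.0]]
print(decode_column('point', encoded) == points)  # True
```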

## 2 - Series

### Simple example

In [15]:
field_data = {'value': [1, 2, 3]}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype conforms to Ntv type
print('pandas object :\n' + str(sr))
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
0    1
1    2
2    3
Name: value, dtype: int64

Json representation : 
     {":field": {"value": [1, 2, 3]}}

is Json translation reversible ?  True


### Typed example

In [16]:
field_data = {'dates::datetime': ['1964-01-01', '1985-02-05', '2022-01-21']}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype conforms to Ntv type
print('pandas object :\n', sr)
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
 0   1964-01-01
1   1985-02-05
2   2022-01-21
Name: dates, dtype: datetime64[ns]

Json representation : 
     {":field": {"dates::datetime": ["1964-01-01T00:00:00.000", "1985-02-05T00:00:00.000", "2022-01-21T00:00:00.000"]}}

is Json translation reversible ?  True


### Examples with a non-Pandas type

In [17]:
field_data = {'dates::date': ['1964-01-01', '1985-02-05', '2022-01-21']}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype : object
print('pandas object :\n' + str(sr))
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
0    1964-01-01
1    1985-02-05
2    2022-01-21
Name: dates::date, dtype: object

Json representation : 
     {":field": {"dates::date": ["1964-01-01", "1985-02-05", "2022-01-21"]}}

is Json translation reversible ?  True


In [18]:
field_data = {'coord::point':    [[1,2], [3,4], [5,6]]}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype : object
print('pandas object :\n' + str(sr))
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
0    POINT (1 2)
1    POINT (3 4)
2    POINT (5 6)
Name: coord::point, dtype: object

Json representation : 
     {":field": {"coord::point": [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]}}

is Json translation reversible ?  True




### Categorical examples
- available only with hashable data

In [19]:
field_data = {"integer": [[1, 2], [0, 1, 0, 1]]}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype : category
print('pandas object :\n' + str(sr))
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
0    1
1    2
2    1
3    2
Name: integer, dtype: category
Categories (2, Int64): [1, 2]

Json representation : 
     {":field": {"integer": [[1, 2], [0, 1, 0, 1]]}}

is Json translation reversible ?  True


In [20]:
field_data = {'dates': [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 0, 2]]}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype : category
print('pandas object :\n' + str(sr))
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
0    1964-01-01
1    1985-02-05
2    1964-01-01
3    2022-01-21
Name: dates::date, dtype: category
Categories (3, object): [1964-01-01, 1985-02-05, 2022-01-21]

Json representation : 
     {":field": {"dates": [{"::date": ["1964-01-01", "1985-02-05", "2022-01-21"]}, [0, 1, 0, 2]]}}

is Json translation reversible ?  True


In [21]:
field_data = {'test_array': [{'::array': [[1,2], [3,4], [5,6]]}, [0, 1, 0, 2]]}
field = Ntv.obj({':field': field_data})
sr = field.to_obj(format='obj')
# pandas dtype : category
print('pandas object :\n' + str(sr))
print('\nJson representation : \n    ', Ntv.obj(sr))
print('\nis Json translation reversible ? ', sr.equals(Ntv.obj(sr).to_obj(format='obj')))

pandas object :
0    (1, 2)
1    (3, 4)
2    (1, 2)
3    (5, 6)
Name: test_array::array, dtype: category
Categories (3, object): [(1, 2), (3, 4), (5, 6)]

Json representation : 
     {":field": {"test_array": [{"::array": [[1, 2], [3, 4], [5, 6]]}, [0, 1, 0, 2]]}}

is Json translation reversible ?  True
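
The categorical examples above all use the same two-part layout: a list of distinct values followed by a list of integer codes pointing into it. A minimal standard-library sketch of that encoding is given below. `to_categorical` and `from_categorical` are hypothetical names, and sorting the categories is an assumption chosen to match the outputs shown in this notebook. Values must be hashable and mutually comparable for this version to work.

```python
def to_categorical(values):
    """Encode values as [categories, codes], with sorted distinct categories."""
    categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    return [categories, [position[v] for v in values]]


def from_categorical(encoded):
    """Rebuild the full list of values from [categories, codes]."""
    categories, codes = encoded
    return [categories[c] for c in codes]


names = ['john', 'eric', 'judith', 'mila', 'hector', 'maria']
encoded = to_categorical(names)
print(encoded)
print(from_categorical(encoded) == names)                  # True
print(from_categorical([[1, 2], [0, 1, 0, 1]]))            # [1, 2, 1, 2]
```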


## 3 - DataFrame

### Initial example

In [22]:
df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

print('pandas dtype :\n' + str(df.dtypes))
print('\npandas object :\n' + str(df))
print('\nJson representation : \n    ', Ntv.obj(df))
print('\nis Json translation reversible ? ', df.equals(Ntv.obj(df).to_obj(format='obj')))

pandas dtype :
A    object
B    object
dtype: object

pandas object :
   A  B
0  a  b
1  b  c
2  c  c
3  a  d

Json representation : 
     {":tab": {"index": [0, 1, 2, 3], "A": ["a", "b", "c", "a"], "B": ["b", "c", "c", "d"]}}

is Json translation reversible ?  True


### Complete example
- index data
- Pandas dtype (int32, bool, string)
- NTV type (date, point) -> object dtype
- unique data (a single value shared by every row)

In [23]:
tab_data = {'index':           [100, 200, 300, 400, 500, 600],
            'dates::date':     ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], 
            'value':           [10, 10, 20, 20, 30, 30],
            'value32::int32':  [12, 12, 22, 22, 32, 32],
            'res':             [10, 20, 30, 10, 20, 30],
            'coord::point':    [[1,2], [3,4], [5,6], [7,8], [3,4], [5,6]],
            'names::string':   ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
            'unique':          True }
tab = Ntv.obj({':tab'  : tab_data})
df = tab.to_obj(format='obj')

print('pandas dtype :\n' + str(df.dtypes))
print('\npandas object :\n' + str(df))
print('\nJson representation : \n    ', Ntv.obj(df))
print('\nis Json translation reversible ? ', df.equals(Ntv.obj(df).to_obj(format='obj')))

pandas dtype :
dates::date     object
value            int64
value32          int32
res              int64
coord::point    object
names           string
unique            bool
dtype: object

pandas object :
    dates::date  value  value32  res coord::point   names  unique
100  1964-01-01     10       12   10  POINT (1 2)    john    True
200  1985-02-05     10       12   20  POINT (3 4)    eric    True
300  2022-01-21     20       22   30  POINT (5 6)  judith    True
400  1964-01-01     20       22   10  POINT (7 8)    mila    True
500  1985-02-05     30       32   20  POINT (3 4)  hector    True
600  2022-01-21     30       32   30  POINT (5 6)   maria    True

Json representation : 
     {":tab": {"index": [100, 200, 300, 400, 500, 600], "dates::date": ["1964-01-01", "1985-02-05", "2022-01-21", "1964-01-01", "1985-02-05", "2022-01-21"], "value": [10, 10, 20, 20, 30, 30], "value32::int32": [12, 12, 22, 22, 32, 32], "res": [10, 20, 30, 10, 20, 30], "coord::point": [[1.0, 2.0], [3.0, 4.0
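
In the input of this example the `unique` column is given as a single value, yet the resulting DataFrame holds it on every row. The sketch below shows one plausible way to expand such a value with plain Python. `expand_columns` is a hypothetical helper for illustration, not the library's code.

```python
def expand_columns(columns):
    """Repeat scalar column values so every column has the same length."""
    lengths = {len(v) for v in columns.values() if isinstance(v, list)}
    if len(lengths) != 1:
        raise ValueError('list columns must share a single length')
    (n,) = lengths
    return {name: v if isinstance(v, list) else [v] * n
            for name, v in columns.items()}


tab = {'value': [10, 10, 20], 'coord::point': [[1, 2], [3, 4], [5, 6]], 'unique': True}
print(expand_columns(tab)['unique'])  # [True, True, True]
```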



### Json data can be annotated

In [24]:
tab_data = {'index':           [100, 200, 300, 400, 500, 600],
            'dates::date':     ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], 
            'value':           [10, 10, 20, 20, {'valid?': 30}, 30],
            'value32::int32':  [12, 12, 22, 22, 32, 32],
            'res':             {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},
            'coord::point':    [[1,2], [3,4], [5,6], [7,8], {'same as 2nd point': [3,4]}, [5,6]],
            'names::string':   ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
            'unique':          True }
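# NOTE: this second definition overrides the one above, so only the 'res' annotations are used below
tab_data = {'index':           [100, 200, 300, 400, 500, 600],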
            'dates::date':     ['1964-01-01', '1985-02-05', '2022-01-21', '1964-01-01', '1985-02-05', '2022-01-21'], 
            'value':           [10, 10, 20, 20, 30, 30],
            'value32::int32':  [12, 12, 22, 22, 32, 32],
            'res':             {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},
            'coord::point':    [[1,2], [3,4], [5,6], [7,8], [3,4], [5,6]],
            'names::string':   ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
            'unique':          True }
tab = Ntv.obj({':tab'  : tab_data})
df2 = tab.to_obj(format='obj', annotated=True)

print('is DataFrame identical ? ', df.equals(df2))

is DataFrame identical ?  True
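
In the annotated data, a value is wrapped as `{'note': value}` and a whole column can be written as a dict of named values. The DataFrame ignores these names. The sketch below shows the idea with a hypothetical `strip_annotations` helper. It assumes that any single-key dict is an annotation, which would be wrong for columns whose real values are single-key dicts.

```python
def strip_annotations(column):
    """Drop annotation names: {'note': value} -> value; dict column -> its values."""
    if isinstance(column, dict):
        return list(column.values())
    return [next(iter(v.values())) if isinstance(v, dict) and len(v) == 1 else v
            for v in column]


print(strip_annotations([10, 10, 20, 20, {'valid?': 30}, 30]))
# [10, 10, 20, 20, 30, 30]
print(strip_annotations({'res1': 10, 'res2': 20, 'res3': 30}))
# [10, 20, 30]
print(strip_annotations([[1, 2], {'same as 2nd point': [3, 4]}]))
# [[1, 2], [3, 4]]
```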




### Categorical data can be included

In [25]:
df = pd.DataFrame({"A": list("abca"), "B": list("bccd")}, dtype="category")

print('pandas dtype :\n' + str(df.dtypes))
print('\npandas object :\n' + str(df))
print('\nJson representation : \n    ', Ntv.obj(df))
print('\nis Json translation reversible ? ', df.equals(Ntv.obj(df).to_obj(format='obj')))

pandas dtype :
A    category
B    category
dtype: object

pandas object :
   A  B
0  a  b
1  b  c
2  c  c
3  a  d

Json representation : 
     {":tab": {"index": [0, 1, 2, 3], "A": [["a", "b", "c"], [0, 1, 2, 0]], "B": [["b", "c", "d"], [0, 1, 1, 2]]}}

is Json translation reversible ?  True


In [26]:
tab_data = {'index':           [100, 200, 300, 400, 500, 600],
            'dates':           [{'::date': ['1964-01-01', '1985-02-05', '2022-01-21']}, [0, 1, 2, 0, 1, 2]],
            'value':           [[10, 20, {'valid?': 30}], [0, 0, 1, 1, 2, 2]],
            'value32::int32':  [12, 12, 22, 22, 32, 32],
            'res':             {'res1': 10, 'res2': 20, 'res3': 30, 'res4': 10, 'res5': 20, 'res6': 30},
            'coord::point':    [[1,2], [3,4], [5,6], [7,8], {'same as 2nd point': [3,4]}, [5,6]],
            'names::string':   ['john', 'eric', 'judith', 'mila', 'hector', 'maria'],
            'unique:boolean':  True }
tab = Ntv.obj({':tab'  : tab_data})
df = tab.to_obj(format='obj', annotated=True)

print('pandas dtype :\n' + str(df.dtypes))
print('\npandas object :\n' + str(df))
print('\nJson representation : \n    ', Ntv.obj(df))
print('\nis Json translation reversible ? ', df.equals(Ntv.obj(df).to_obj(format='obj')))

pandas dtype :
dates::date        category
value              category
value32               int32
res                   int64
coord::point         object
names                string
unique::boolean        bool
dtype: object

pandas object :
    dates::date value  value32  res coord::point   names  unique::boolean
100  1964-01-01    10       12   10  POINT (1 2)    john             True
200  1985-02-05    10       12   20  POINT (3 4)    eric             True
300  2022-01-21    20       22   30  POINT (5 6)  judith             True
400  1964-01-01    20       22   10  POINT (7 8)    mila             True
500  1985-02-05    30       32   20  POINT (3 4)  hector             True
600  2022-01-21    30       32   30  POINT (5 6)   maria             True

Json representation : 
     {":tab": {"index": [100, 200, 300, 400, 500, 600], "dates": [{"::date": ["1964-01-01", "1985-02-05", "2022-01-21"]}, [0, 1, 2, 0, 1, 2]], "value": [[10, 20, 30], [0, 0, 1, 1, 2, 2]], "value32::int32": [12, 12, 2



## 4 - Annexe : Series tests

In [27]:
srs = [# without ntv_type
       pd.Series([{'a': 2, 'e':4}, {'a': 3, 'e':5}, {'a': 4, 'e':6}]),  
       pd.Series([[1,2], [3,4], [5,6]]),  
       pd.Series([[1,2], [3,4], {'a': 3, 'e':5}]),  
       pd.Series([True, False, True]),
       pd.Series(['az', 'er', 'cd']),
       pd.Series([1,2,3]),
       pd.Series([1.1,2,3]),
       pd.Series([math.nan, math.nan]), # JSON conversion bug: read back as datetime NaT
       
       pd.Series([10,20,30], dtype='Int64'),
       pd.Series([True, False, True], dtype='boolean'),
       pd.Series([1,2,3], dtype='Int64'), 
       pd.Series([math.nan, math.nan], dtype='float64'), # JSON conversion bug: read back as datetime NaT
    
       # with ntv_type only in json data
       pd.Series([pd.NaT, pd.NaT, pd.NaT]),
       pd.Series([datetime(2022, 1, 1), datetime(2022, 1, 2)], dtype='datetime64[ns]'),
       pd.Series(pd.to_timedelta(['1D', '2D'])),
       pd.Series([1,2,3], dtype='Int32'), 
       pd.Series(['az', 'er', 'cd'], dtype='string'), 
       pd.Series([1,2,3], dtype='UInt64'), # not reversible
    
       # with ntv_type in Series name and in json data
       pd.Series([1,2,3], name='::int64'),
       pd.Series([1,2,3], dtype='UInt64', name='::uint64'),   # name is unnecessary
       pd.Series([1,2,3], dtype='Float64', name='::float64'), # forces the dtype in the JSON conversion
       pd.Series([[1,2], [3,4], [5,6]], name='::array'),  
       pd.Series([None, None, None], name='::null'), 
       
       # with ntv_type unknown in pandas
       pd.Series([date(2022, 1, 1), date(2022, 1, 2)], name='::date'),
       pd.Series([Point(1, 0), Point(1, 1), Point(1, 2)], name='::point'),
       pd.Series([1,2,3], dtype='object', name='::day'), 

       #pd.Series([datetime(2022, 1, 1), datetime(2022, 1, 2), datetime(2022, 1, 3)], dtype='datetime64[ns, UTC]'), # to be handled
]
for sr in srs:
    print(sr.equals(Ntv.obj(sr).to_obj(format='obj')) or sr.equals(Ntv.obj(sr).to_obj(format='obj', alias=True)), 
          Ntv.obj(sr).to_obj(format='obj').name == sr.name,
          Ntv.obj(sr))


True True {":field": [{"a": 2, "e": 4}, {"a": 3, "e": 5}, {"a": 4, "e": 6}]}
True True {":field": [[1, 2], [3, 4], [5, 6]]}
True True {":field": [[1, 2], [3, 4], {"a": 3, "e": 5}]}
True True {":field": [true, false, true]}
True True {":field": ["az", "er", "cd"]}
True True {":field": [1, 2, 3]}
True True {":field": [1.1, 2.0, 3.0]}
False True {":field": [null, null]}
True True {":field": [10, 20, 30]}
True True {":field": [true, false, true]}
True True {":field": [1, 2, 3]}
False True {":field": [null, null]}
False True {":field": {"::datetime": [{":json": null}, {":json": null}, {":json": null}]}}
True True {":field": {"::datetime": ["2022-01-01T00:00:00.000", "2022-01-02T00:00:00.000"]}}
True True {":field": {"::durationiso": ["P1DT0H0M0S", "P2DT0H0M0S"]}}
True True {":field": {"::int32": [1, 2, 3]}}
True True {":field": {"::string": ["az", "er", "cd"]}}
True True {":field": {"::uint64": [1, 2, 3]}}
True True {":field": {"::int64": [1, 2, 3]}}
True False {":field": {"::uint64": [1, 2, 3]}}
True True {":field": {"::float64": [1.0, 2.0, 3.0]}}
True True {":field": {"::array": [[1, 2], [3, 4], [5, 6]]}}
True True {":field": {"::null": [{":json": null}, {":json": null}, {":json": null}]}}
True True {":field": {"::date": ["2022-01-01", "2022-01-02



In [28]:
for a in [{'test::int32': [1,2,3]},
          {'test': [1,2,3]},
          [1.0, 2.1, 3.0],
          ['er', 'et', 'ez'],
          [True, False, True],
          {'::boolean': [True, False, True]},
          {'::string': ['er', 'et', 'ez']},
          {'test::float32': [1.0, 2.5, 3.0]},
          {'::int64': [1,2,3]},
          {'::datetime': ["2021-12-31T23:00:00.000","2022-01-01T23:00:00.000"] },
          {'::object': [{'a': 3, 'e':5}, {'a': 4, 'e':6}]}]:
    ntv = Ntv.from_obj({':field': a})
    print(Ntv.obj(ntv.to_obj(format='obj')) == ntv, ntv)

True {":field": {"test::int32": [1, 2, 3]}}
True {":field": {"test": [1, 2, 3]}}
True {":field": [1.0, 2.1, 3.0]}
True {":field": ["er", "et", "ez"]}
True {":field": [true, false, true]}
True {":field": {"::boolean": [true, false, true]}}
True {":field": {"::string": ["er", "et", "ez"]}}
True {":field": {"test::float32": [1.0, 2.5, 3.0]}}
True {":field": {"::int64": [1, 2, 3]}}
True {":field": {"::datetime": ["2021-12-31T23:00:00.000", "2022-01-01T23:00:00.000"]}}
True {":field": {"::object": [{"a": 3, "e": 5}, {"a": 4, "e": 6}]}}


In [29]:
for a in [{'test': [{'::int32': [1,2,3]}, [0,1,2,0,1,2]]},
          {'test': [[1,2,3], [0,1,2,0,1,2]]},
          [[1.0, 2.1, 3.0], [0,1,2,0,1,2]],
          [['er', 'et', 'ez'], [0,1,2,0,1,2]],
          [[True, False], [0,1,0,1,0,1]],
          [{'::string': ['er', 'et', 'ez']}, [0,1,2,0,1,2]],
          {'test':[{'::float32': [1.0, 2.5, 3.0]}, [0,1,2,0,1,2]]},
          [{'::int64': [1,2,3]}, [0,1,2,0,1,2]],
          [{'::datetime': ["2021-12-31T23:00:00.000","2022-01-01T23:00:00.000"] }, [0,1,0,1,0,1]],
          [{'::boolean': [True, False]}, [0,1,0,1,0,1]],
          {'test_array': [{'::array': [[1,2], [3,4], [5,6]]}, [0, 1, 0, 2]]}]:
    ntv = Ntv.from_obj({':field': a}, def_type='json')
    print(Ntv.obj(ntv.to_obj(format='obj')) == ntv, ntv)


True {":field": {"test": [{"::int32": [1, 2, 3]}, [0, 1, 2, 0, 1, 2]]}}
True {":field": {"test": [[1, 2, 3], [0, 1, 2, 0, 1, 2]]}}
True {":field": [[1.0, 2.1, 3.0], [0, 1, 2, 0, 1, 2]]}
True {":field": [["er", "et", "ez"], [0, 1, 2, 0, 1, 2]]}
True {":field": [[true, false], [0, 1, 0, 1, 0, 1]]}
True {":field": [{"::string": ["er", "et", "ez"]}, [0, 1, 2, 0, 1, 2]]}
True {":field": {"test": [{"::float32": [1.0, 2.5, 3.0]}, [0, 1, 2, 0, 1, 2]]}}
True {":field": [{"::int64": [1, 2, 3]}, [0, 1, 2, 0, 1, 2]]}
True {":field": [{"::datetime": ["2021-12-31T23:00:00.000", "2022-01-01T23:00:00.000"]}, [0, 1, 0, 1, 0, 1]]}
True {":field": [{"::boolean": [true, false]}, [0, 1, 0, 1, 0, 1]]}
True {":field": {"test_array": [{"::array": [[1, 2], [3, 4], [5, 6]]}, [0, 1, 0, 2]]}}
