# JSON format for multidimensional data
version : 2024-03-11

## Introduction
This memo is a proposal to implement a compact and reversible (lossless round-trip) JSON interface for multidimensional data and in particular for Numpy (see issue #12481).

The [JSON-NTV](https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html) (Named and Typed Value) format is a JSON format that integrates a notion of type.
    
In particular, it makes it possible to provide a reversible (lossless round-trip) interface for multidimensional data.
     
This format has also been implemented for [tabular data](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html) (see the NTV-pandas package available in the [pandas ecosystem](https://pandas.pydata.org/community/ecosystem.html) and the [PDEP12](https://pandas.pydata.org/pdeps/0012-compact-and-reversible-JSON-interface.html) specification).

This memo presents the implementation for multidimensional data.

## Benefits
The use of this format has the following advantages:

- Support for data types not known to Numpy
- Reversible (lossless round-trip) format
- Interoperability with other tools for tabular or multidimensional data (e.g. pandas, Xarray)
- Ease of sharing (plain JSON format)
- Binary encoding possible (e.g. CBOR format)
- A single format that integrates data of different kinds


## Terminology

- darray (unidimensional array) is an ordered collection of 'items'. A darray can be represented with several formats (e.g. simple list, categorical format, sparse format).
- ndarray (multidimensional array) is an N-dimensional array of homogeneous data types. An ndarray entity is defined by:
    - a darray of 'items' of the same type
    - a shape, which defines the order and the length of the indexes
    - a data type
    The darray is the flattened multidimensional data in row-major order.
- xndarray (labelled multidimensional array) is an ndarray identified by a name. An xndarray entity has optional additional data:
    - add_name: the name of a property (additional name appended to the ndarray name)
    - ntv_type: an extension of the ndarray data type
    - links: the names of the indexes
    - metadata: JSON object holding metadata

    The ndarray can be included in the xndarray or only referenced by a resolvable URI.
    Xndarray data can be:

    - named-array: ndarray with a name (without additional data)
    - variable: named-array with named indexes (links)
    - additional-array: named-array whose name is extended (add_name)
- xdataset (coordinated multidimensional array) is a collection of xndarray. This collection can be interpreted as a simple group or as an interconnected collection (names are used as pointers between xndarray items).
In the context of an xdataset, xndarray data can be:

    - dimension: named-array whose name is present in the links of a variable of the xdataset
    - data-array: named-array whose name is not present in the links of any variable of the xdataset
    - data-var: variable whose links equal the list of dimensions of the xdataset
    - coordinate: variable whose links differ from the list of dimensions of the xdataset

An xdataset is valid if:
- the included xndarray are valid
- each name in links is the name of an xndarray
- the shape of a variable xndarray is consistent with the shapes of the xndarray named in its links

An xdataset is multidimensional if it is valid and contains more than one data-var.

An xdataset is unidimensional if it is valid and contains a single data-var.

In all other cases, an xdataset is a simple group of xndarray.
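
The roles defined above depend only on names and links, so they can be derived mechanically. The following is a minimal sketch in plain Python, not the ntv-numpy implementation: the function `classify_xndarrays` and its input (a dict mapping each name to its list of links) are hypothetical, additional-arrays and metadata are ignored, and links are compared with the dimensions as sets, which is an assumption about how "equals" should be read.

```python
def classify_xndarrays(links_by_name):
    """Classify the xndarray names of an xdataset by role.

    links_by_name (hypothetical input): {name: list of links}, with an
    empty list for a named-array without links.
    """
    variables = {name: links for name, links in links_by_name.items() if links}
    named_arrays = [name for name, links in links_by_name.items() if not links]
    # Every name that appears in the links of at least one variable.
    linked = {name for links in variables.values() for name in links}

    dimensions = [name for name in named_arrays if name in linked]
    data_arrays = [name for name in named_arrays if name not in linked]
    # Assumption: "equals" is read as set equality (order ignored).
    data_vars = [name for name, links in variables.items()
                 if set(links) == set(dimensions)]
    coordinates = [name for name in variables if name not in data_vars]
    return {"dimensions": dimensions, "data_arrays": data_arrays,
            "data_vars": data_vars, "coordinates": coordinates}


roles = classify_xndarrays({
    "var2": ["x", "y"], "ranking": ["var2"], "z": ["x"],
    "x": [], "y": [], "z_bis": [],
})
print(roles)
```

With this input, 'x' and 'y' are dimensions, 'z_bis' is a data-array, 'var2' is the only data-var, and 'ranking' and 'z' are coordinates.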

Examples:
- Numpy.array corresponds to ndarray
- Xarray.DataArray, Xarray.Dataset, scipp.DataArray, scipp.DataGroup and scipp.Dataset correspond to xdataset

## NTV data
NTV format is a data representation with three attributes:
- NTVname (string)
- NTVtype (enumerated string)
- NTVvalue (JSON object) 

Two entities are defined:
- NTVlist : ordered list of entities
- NTVsingle : entity not composed of other entities

The JSON representation of NTVsingle entities is :
- value :
```json
    25, 'test', [1,2]
```

- name and value : 
```json
    {'test': 25}, {'test:': [1,2]}
```
- type and value :
```json
    {':day': 25}, {':point': [1,2]}
```
- name, type and value : 
```json
    {'equinox:date': '2023-09-23'}, {'paris:point': [2.35, 48.86]}
```
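
The four cases above follow one rule: the key is `name:type`, with either part omitted when it is absent, and no key at all when both are absent. A minimal sketch of that rule, inferred only from the examples above (the function `ntv_single_json` is a hypothetical helper, not part of the json_ntv package):

```python
def ntv_single_json(value, name="", ntv_type=""):
    """Build the JSON form of an NTVsingle from its three attributes."""
    if not name and not ntv_type:
        return value                      # value only
    key = f"{name}:{ntv_type}" if ntv_type else name
    return {key: value}                   # name and/or type, then value


print(ntv_single_json(25))
print(ntv_single_json(25, name="test"))
print(ntv_single_json(25, ntv_type="day"))
print(ntv_single_json("2023-09-23", name="equinox", ntv_type="date"))
```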

The JSON representation of NTVlist entities is :

- { 'name_NTVlist:type_NTVlist': { JSON_entity1, ... JSON_entityN } } if the entities have a JSON-member representation
```json
    { 'example': {'equinox:date': '2023-09-23', 'paris:point': [2.35, 48.86] }}
```

- { 'name_NTVlist:type_NTVlist': [ JSON_entity1, ... JSON_entityN ] } in the other cases
```json
    { 'example': [25, {'paris:point': [2.35, 48.86] }, 'test']}
```    

## NTV multidimensional data

### Multidimensional types
Multidimensional data is characterized by four NTVtype (see [Appendix](#Appendix---JSON-representation)):

- `darray` for darray data
- `ndarray` for ndarray data
- `xndarray` for xndarray data
- `xdataset` for xdataset data

The JSON representation of these NTVtype is defined below.

### JSON representation of `darray` data
JSON `darray` is a JsonArray.

The JSON representation is obtained by:

- Converting the structure of Numpy data into a JSON structure
- Converting elementary data into JSON primitives
- Mapping the Numpy dtype and the nature of the object into an NTV type

Example of formats available:

Simple format:
```json
[ "apple", "apple", "orange", "apple", "apple", "pepper", "banana", "apple" ]
```
Categorical format:
```json
[["orange","pepper","apple","banana"], [2, 2, 0, 2, 2, 1, 3, 2] ]
```
Sparse format (8 is the length; the value at index -1, here "apple", is the default value):
```json
[["orange","pepper","banana", "apple"], [8], [2, 5, 6, -1]]
```
Periodic format (representation of [10, 10, 20, 20, 30, 30, 10, 10, 20, 20, 30, 30, 10, 10, 20, 20, 30, 30], where 18 is the length and 2 is the repetition coefficient):
```json
[[10, 20, 30], [18], [2]]
```
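
Each compact format can be expanded back to the simple format. The decoders below are a sketch under the layouts suggested by the examples above (values followed by codes, or values followed by length and positions); the function names are hypothetical and this is not the ntv-numpy code.

```python
def decode_categorical(values, codes):
    """[values, codes]: each code is an index into values."""
    return [values[code] for code in codes]


def decode_sparse(values, length, positions):
    """[values, [length], positions]: position -1 marks the default value."""
    default = values[positions.index(-1)]
    full = [default] * length
    for value, position in zip(values, positions):
        if position != -1:
            full[position] = value
    return full


def decode_periodic(values, length, coef):
    """[values, [length], [coef]]: each value is repeated coef times, cyclically."""
    block = [value for value in values for _ in range(coef)]
    return [block[i % len(block)] for i in range(length)]


simple = ["apple", "apple", "orange", "apple", "apple", "pepper", "banana", "apple"]
print(decode_categorical(["orange", "pepper", "apple", "banana"],
                         [2, 2, 0, 2, 2, 1, 3, 2]) == simple)
print(decode_sparse(["orange", "pepper", "banana", "apple"], 8, [2, 5, 6, -1]) == simple)
print(decode_periodic([10, 20, 30], 18, 2))
```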

### JSON representation of `ndarray` data
JSON `ndarray` is a JsonArray with three JsonElements (NTVtype, shape, darray) where darray is a JSON `darray`:

Example: 

ndarray with simple format
```json
["int32", [2, 2], [30, 40, 30, 40]]
```
ndarray with categorical format
```json
["int32", [2, 2], [[30, 40], [0, 1, 0, 1]]]
```
ndarray with sparse format
```json
["int32", [2, 2], [[30, 30, 40], [4], [0, 2, -1]]]
```
ndarray with implicit format (format defined by the linked 'x' ndarray)
```json
["int32", [[10, 20], 'x']]
```
ndarray with relative format (format defined by the linked 'x' ndarray)
```json
["int32", [[10, 20], 'x', [ 0, 1, 0, 0 ]]]
```
ndarray with extended ntv_type
```json
["int32[kg]", [1, 2, 3, 4]]
```
ndarray defined by a URI
```json
'https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv'
```
Note:
- shape is optional with unidimensional data (deduced from darray)
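
Reading such a structure amounts to separating the optional type and shape from the darray, then rebuilding the nesting in row-major order. A minimal sketch, limited to the simple darray format (the helpers `parse_ndarray`, `unflatten` and `to_nested` are hypothetical names; URI strings and the implicit and relative formats are not handled):

```python
from math import prod


def parse_ndarray(js):
    """Split a JSON ndarray into (ntv_type, shape, darray)."""
    darray = js[-1]
    ntv_type = js[0] if isinstance(js[0], str) else None
    head = js[1:-1] if ntv_type is not None else js[:-1]
    # Without an explicit shape, the data is unidimensional.
    shape = head[0] if head else [len(darray)]
    return ntv_type, shape, darray


def unflatten(flat, shape):
    """Rebuild nested lists from row-major flattened data."""
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [unflatten(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def to_nested(js):
    ntv_type, shape, darray = parse_ndarray(js)
    if prod(shape) != len(darray):
        raise ValueError("shape is not consistent with the darray length")
    return ntv_type, unflatten(darray, shape)


print(to_nested(["int32", [2, 2], [30, 40, 30, 40]]))
print(to_nested(["int32[kg]", [1, 2, 3, 4]]))
```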

### JSON representation of `xndarray` data

JSON `xndarray` is a JsonObject with a single JsonMember {'name': xvalue}.

xvalue is a JsonArray with three optional JsonElements:

- ndarray: JSON representation or URI string
- links: JsonArray with the names of the linked ndarray
- meta: JsonObject with metadata

If ndarray or links are not present, meta can be a single string.

An `xndarray` entity can be metadata, a named-array, an additional-array or a variable:

metadata:
```json
{'unit': 'kg'}
{'meta': {'dict': 'everything'}}
```
named-array (data-array or dimension in the context of a dataset):
```json
{':xndarray': [['int64[kg]', [10, 20]]]}
{'y': [['string', [2], ['y1', 'y2']]]}
```
named-array with metadata:
```json
{'x': [[['x1', 'x2']], {'test': 21}]}
```
additional-array:
```json
{'x.mask': [[[True, False]]]}
{'x.uncertainty': [[[0.1, 0.2]]]}
{'z.variance': [[[0.1, 0.2]]]}
```
Variable (data-var or coordinate in the context of a dataset):
```json
{'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']]}
{'z': [[['z1', 'z2']], ['x']]}
{'ranking': [[[2, 2], [10, 20, 20, 10]], ['var1']]}
{'ranking': [[[2, 2], [[10, 20], [0, 1, 1, 0]]], ['var1']]}
```
Variable (ndarray defined by a URI):
```json
{'var1': ['https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv', ['x', 'y']]}
```
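
The examples above can be told apart by the JSON type and position of each element of xvalue. The sketch below is one possible reading, not the ntv-numpy parser: `split_xvalue` is a hypothetical helper, and it assumes that the ndarray (or its URI), when present, is always the first element of the JsonArray.

```python
def split_xvalue(xvalue):
    """Return (ndarray, links, meta) from the value of an xndarray member."""
    if not isinstance(xvalue, list):
        # Metadata only, e.g. {'unit': 'kg'} or {'meta': {'dict': 'everything'}}
        return None, None, xvalue
    ndarray = links = meta = None
    for position, item in enumerate(xvalue):
        if isinstance(item, dict):
            meta = item
        elif position == 0:
            ndarray = item      # JSON ndarray (list) or URI (string)
        elif isinstance(item, list):
            links = item
    return ndarray, links, meta


print(split_xvalue([[['x1', 'x2']], {'test': 21}]))
print(split_xvalue([['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']]))
print(split_xvalue('kg'))
```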

### JSON representation of `xdataset` data

JSON `xdataset` is a JsonObject where each JsonMember is a `xndarray`. 

Example:
```json
{
'var1': ['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv', ['x', 'y']],
'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']],
'ranking': [[[2, 2], [1, 2, 3, 4]], ['var1']],
'x': [[['x1', 'x2']], {'test': 21}],
'y': [[['y1', 'y2']]],
'z': [[['z1', 'z2']], ['x']],
'z_bis': [[['z1_bis', 'z2_bis']]],
'x.mask': [[[True, False]]],
'x.uncertainty': [[[0.1, 0.2]], ['x']],
'z.variance': [[[0.1, 0.2]]],
'unit': 'kg',
'info': {'example': 'everything'}
}
```
This example is an xdataset group with:
- two dimensions ('x' and 'y')
- two data-vars ('var1' and 'var2') defined with the 'x' and 'y' dimensions
- one data-array ('z_bis')
- two coordinates ('z' and 'ranking'). The 'ranking' coordinate is associated with the 'x' and 'y' dimensions through 'var1'
- two metadata ('unit' and 'info')
- two additional xndarray for the 'x' dimension ('uncertainty' and 'mask')
- one additional xndarray for the 'z' coordinate ('variance')

If 'var1' and 'z_bis' xndarray are removed, the xdataset becomes a unidimensional xdataset.


## Numpy Ndarray
Numpy ndarrays are represented with the `ndarray` NTVtype, in one of two ways:

- mapping: the Numpy dtype is associated with an NTVtype
- conversion: the Python type of the data is associated with an NTVtype

These functions are similar to those used for the JSON-NTV representation of Pandas data ([see pandas examples](https://nbviewer.org/github/loco-philippe/ntv-pandas/blob/main/example/example_ntv_pandas.ipynb)).

### Mapping
This is the most frequent case and concerns the following Numpy dtypes:

- datetime[x] (D, Y, M, s, ms, us, ns, ps, fs)
- timedelta[x] (D, Y, M, s, ms, us, ns, ps, fs)
- bool, bytes, str
- int, uint (8,16,32,64)
- float (16, 32, 64)

For these dtypes, the conversion is reversible.

In [1]:
import numpy as np
from numpy_ntv_connector import read_json, to_json

ex = np.array([10, 10, 20, 10, 30, 50]).astype('int64').reshape((2, 3))

print("example (with and without dtype) :\n")
print(to_json(ex))
print(to_json(ex, extension='kg'))
print(to_json(ex, notype=True))
print(to_json(ex, format='complete'))

ex2 = read_json(to_json(ex), header=False)
print("\nreversibility : ", np.array_equal(ex2, ex))

print("\nother examples :\n")
print(to_json(np.array(['test1', 'test2'], dtype='str')))
print(to_json(np.array(['2022-01-01', '2023-01-01'], dtype='datetime64[D]')))
print(to_json(np.array(['2022', '2023'], dtype='datetime64[Y]')))
print(to_json(np.array([b'abc\x09', b'abc'], dtype='bytes')))
print(to_json(np.array([True, False], dtype='bool')))


### Conversion
This case concerns the Numpy dtype 'object' holding specific Python data:

- datetime.time
- decimal.Decimal
- shapely classes (Point, LineString, Polygon, geojson, geometry)
- list, dict, NoneType, bytes
- ndarray, xndarray, xdataset, field, tab, ntv

For these Python classes, the conversion is reversible.

In [2]:
from datetime import time
from decimal import Decimal
from shapely.geometry import Point
from json_ntv import NtvSingle, Ntv
import pandas as pd

ex = np.array([time(10, 2, 3), time(20, 2, 3)])

print("example (with and without dtype) :\n")
print(to_json(ex))

ex2 = read_json(to_json(ex), header=False)
print("\nreversibility : ", np.array_equal(ex2, ex))

print("\nother examples (len = 2) :\n")
print(to_json(np.array([Decimal('10.5'), Decimal('20.5')])))
print(to_json(np.array([Point([1,2]), Point([3,4])])))
print(to_json(np.array([None, None])))
print(to_json(np.array([{'one':1}, {'two':2}])))
print(to_json(np.fromiter([[1,2], [3,4]], dtype='object')))
print(to_json(np.fromiter([np.array([1, 2], dtype='int64'), 
                           np.array(['test1', 'test2'], dtype='str')], dtype='object')))
print(to_json(np.fromiter([Ntv.obj({':point':[1,2]}), NtvSingle(12, 'noon', 'hour')], dtype='object')))
print(to_json(np.fromiter([pd.Series([1,2,3]), pd.Series([4,5,6])], dtype='object')))

example (with and without dtype) :

{':ndarray': ['time', ['10:02:03', '20:02:03']]}

reversibility :  True

other examples (len = 2) :

{':ndarray': ['decimal64', [10.5, 20.5]]}
{':ndarray': ['point', [[1.0, 2.0], [3.0, 4.0]]]}
{':ndarray': ['null', [None, None]]}
{':ndarray': ['object', [{'one': 1}, {'two': 2}]]}
{':ndarray': ['array', [[1, 2], [3, 4]]]}
{':ndarray': ['ndarray', [['int64', [1, 2]], ['string', ['test1', 'test2']]]]}
{':ndarray': ['NtvSingle', [{":point": [1, 2]}, {"noon:hour": 12}]]}
{':ndarray': ['field', [[1, 2, 3], [4, 5, 6]]]}
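
The outputs above suggest a direct association between the Python class of the items and the NTVtype. A minimal lookup sketch, reconstructed only from the outputs shown in this section (the table `NTV_TYPE_BY_CLASS` and the function `guess_ntv_type` are hypothetical names, not part of ntv-numpy, and the table is deliberately incomplete):

```python
from datetime import time
from decimal import Decimal

# Associations observed in the outputs above; other classes are left out.
NTV_TYPE_BY_CLASS = {
    time: "time",
    Decimal: "decimal64",
    type(None): "null",
    dict: "object",
    list: "array",
}


def guess_ntv_type(items):
    """Return the NTVtype shared by all items, or None if it cannot be guessed."""
    classes = {type(item) for item in items}
    if len(classes) != 1:
        return None
    return NTV_TYPE_BY_CLASS.get(classes.pop())


print(guess_ntv_type([time(10, 2, 3), time(20, 2, 3)]))
print(guess_ntv_type([Decimal("10.5"), Decimal("20.5")]))
print(guess_ntv_type([{"one": 1}, {"two": 2}]))
```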


### Other NTVtype
JSON data with other NTVtype can be converted to an ndarray, but this conversion is not reversible.

Reversibility is obtained with Xndarray entities.

This case concerns:

- data with a type extension (e.g. number with unit: 'int64[kg]')
- data with a generic format (NTVtype: json, int, float, number)
- numbers with a semantic value (NTVtype: month, day, wday, yday, week, hour, minute, second)
- strings with a semantic value (NTVtype: base16, base32, base64, period, duration, jpointer, uri, uriref, iri, iriref, email, regex, hostname, ipv4, ipv6, file, geojson)
- objects with a semantic value (geometry, timearray)
- data with a type from another namespace (e.g. schemaorg type: 'org.propertyID')

In [3]:
print('example of ndarray:\n')
print(read_json(['int64[kg]', [1, 2, 3, 4]]))
print(read_json(['month', [1, 2, 3, 4]]))
print(read_json(['json', [1, 'two', {'three': 3}]]))
print(read_json(['email', ['John Doe <jdoe@mac.example>', 'Anna Doe <adoe@mac.example>']]))

example of ndarray:

[1 2 3 4]
[1 2 3 4]
[1 'two' {'three': 3}]
['John Doe <jdoe@mac.example>' 'Anna Doe <adoe@mac.example>']
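
The first output shows why the conversion is not reversible: the 'kg' extension of 'int64[kg]' has no place in a plain Numpy ndarray. Separating the base type from its extension can be sketched with a regular expression (the helper `split_ntv_type` is hypothetical and only covers the `base[extension]` spelling used in this memo):

```python
import re

# Optional base type followed by an optional bracketed extension.
_TYPE_PATTERN = re.compile(r"^(?P<base>[^\[\]]*)(?:\[(?P<ext>[^\[\]]*)\])?$")


def split_ntv_type(ntv_type):
    """Return (base_type, extension); extension is None when absent."""
    match = _TYPE_PATTERN.match(ntv_type)
    if match is None:
        raise ValueError(f"unexpected NTVtype spelling: {ntv_type!r}")
    return match.group("base"), match.group("ext")


print(split_ntv_type("int64[kg]"))
print(split_ntv_type("month"))
```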


## Xndarray
Xndarray is the entity associated with the `xndarray` NTVtype.

The `ndarray` entity included is a Numpy ndarray.


### Metadata

In [4]:
from ntv_numpy import Xndarray

print("example with only metadata :\n")
print(Xndarray.read_json({'unit': 'kg'}))
print(Xndarray.read_json({'meta': {'everything': 1}}))

print("\nreversibility :", to_json(Xndarray.read_json({'unit': 'kg'}), header=False) == {'unit': 'kg'})


example with only metadata :

{"unit:xndarray": "kg"}
{"meta:xndarray": {"everything": 1}}

reversibility : True


### Simple ndarray

In [5]:
print("example with Numpy ndarray :\n")
xn = Xndarray(nda=np.array([1, 'two', {'three': 3}], dtype='object'), ntv_type='json')
print(xn.to_json())

print("\nreversibility : ", Xndarray.read_json(xn.to_json()) == xn)

print('\nexample of ndarray:\n')
print(Xndarray(nda=np.array([1, 2, 3, 4], dtype='object'), ntv_type='int64[kg]'))
print(Xndarray(nda=np.array([[1, 2], [3, 4]], dtype='object'), ntv_type='int'))
print(Xndarray(nda=np.array([1, 2, 3, 4], dtype='object'), ntv_type='month'))
print(Xndarray(nda=np.array(['1F23', '236A5E'], dtype='object'), ntv_type='base16'))
print(Xndarray(nda=np.array(['P3Y6M4DT12H30M5S'], dtype='object'), ntv_type='duration'))
print(Xndarray(nda=np.array(['P3Y6M4DT12H30M5S'], dtype='object'), ntv_type='duration'))
print(Xndarray(nda=np.array(['geo:13.4125,103.86673'], dtype='object'), ntv_type='uri'))
print(Xndarray(nda=np.array(['192.168.1.1'], dtype='object'), ntv_type='ipv4'))
print(Xndarray(nda=np.array(['John Doe <jdoe@mac.example>', 'Ann Doe <adoe@mac.example>'], dtype='object'), ntv_type='email'))

example with Numpy ndarray :

{':xndarray': [['json', [1, 'two', {'three': 3}]]]}

reversibility :  True

example of ndarray:

{":xndarray": [["int64[kg]", [1, 2, 3, 4]]]}
{":xndarray": [["int", [2, 2], [1, 2, 3, 4]]]}
{":xndarray": [["month", [1, 2, 3, 4]]]}
{":xndarray": [["base16", ["1F23", "236A5E"]]]}
{":xndarray": [["duration", ["P3Y6M4DT12H30M5S"]]]}
{":xndarray": [["duration", ["P3Y6M4DT12H30M5S"]]]}
{":xndarray": [["uri", ["geo:13.4125,103.86673"]]]}
{":xndarray": [["ipv4", ["192.168.1.1"]]]}
{":xndarray": [["email", ["John Doe <jdoe@mac.example>", "Ann Doe <adoe@mac.example>"]]]}


### Named-ndarray

In [6]:
print("example with named ndarray :\n")
xn = Xndarray('example', nda=np.array(['x1', 'x2']))
print(xn.to_json())

print("\nreversibility : ", Xndarray.read_json(xn.to_json()) == xn)

example with named ndarray :

{'example:xndarray': [['string', ['x1', 'x2']]]}

reversibility :  True


### Additional-ndarray

In [7]:
print("example with additional ndarray :\n")
xn = Xndarray('x.mask', nda=np.array([True, False]))
print(xn.to_json())

print("\nreversibility : ", Xndarray.read_json(xn.to_json()) == xn)

example with additional ndarray :

{'x.mask:xndarray': [['boolean', [True, False]]]}

reversibility :  True


### Variable

In [8]:
print("example with variable :\n")
xn = Xndarray('var2', nda=np.array([10.1, 0.4, 3.4, 8.2]).reshape([2,2]), ntv_type='float[kg]', links = ['x', 'y'])
print(xn.to_json())

print("\nreversibility : ", Xndarray.read_json(xn.to_json()) == xn)

example with variable :

{'var2:xndarray': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']]}

reversibility :  True


### URI Variable

In [9]:
print("example with URI variable :\n")
xn = Xndarray('var1', uri='https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv', 
              links = ['x', 'y'])
print(xn.to_json())

print("\nreversibility : ", Xndarray.read_json(xn.to_json()) == xn)

example with URI variable :

{'var1:xndarray': ['https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv', ['x', 'y']]}

reversibility :  True


## Xdataset
Xdataset is the entity associated with the `xdataset` NTVtype.

The `xndarray` entities included are Xndarray.


### Simple example

In [10]:
from ntv_numpy import Xdataset

xn1 = Xndarray.read_json({'x': [['string', ['x1', 'x2']]]})
xn2 = Xndarray.read_json({'z': [['string', ['z1', 'z2']], ['x']]})

xd = Xdataset([xn1, xn2])

print(xd.to_json())
print("\nreversibility : ", Xdataset.read_json(xd.to_json()) == xd)

{':xdataset': {'x': [['string', [2], ['x1', 'x2']]], 'z': [['string', [2], ['z1', 'z2']], ['x']]}}

reversibility :  True


### Complete example

In [18]:
from pprint import pprint

example = {'test': {
                'var1': ['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv', 
                         ['x', 'y']],
                'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']],
                'ranking': [[[2, 2], [1, 2, 3, 4]], ['var2']],
                'x': [[['x1', 'x2']], {'test': 21}],
                'y': [[['y1', 'y2']]],
                'z': [[['z1', 'z2']], ['x']],
                'z_bis': [[['z1_bis', 'z2_bis']]],
                'x.mask': [[[True, False]], ['x']],
                'x.variance': [[[0.1, 0.2]], ['x']],
                'z.variance': [[[0.1, 0.2]], ['x']],
                'unit': 'kg',
                'info': {'example': 'everything'}}}
xd = Xdataset.read_json(example)        

pprint(xd.to_json())
print("\nreversibility : ", Xdataset.read_json(xd.to_json()) == xd)

{'test:xdataset': {'info': {'example': 'everything'},
                   'ranking': [['int32', [2, 2], [1, 2, 3, 4]], ['var2']],
                   'unit': 'kg',
                   'var1': ['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv',
                            ['x', 'y']],
                   'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]],
                            ['x', 'y']],
                   'x': [['string', [2], ['x1', 'x2']], {'test': 21}],
                   'x.mask': [['boolean', [2], [True, False]], ['x']],
                   'x.variance': [['float64', [2], [0.1, 0.2]], ['x']],
                   'y': [['string', [2], ['y1', 'y2']]],
                   'z': [['string', [2], ['z1', 'z2']], ['x']],
                   'z.variance': [['float64', [2], [0.1, 0.2]], ['x']],
                   'z_bis': [['string', [2], ['z1_bis', 'z2_bis']]]}}

reversibility :  True


### Analysis
The `info` property returns a dict with the properties of an Xdataset:
- name : name of the Xdataset
- xtype : meta (only metadata), mono (unidimensional array), multi (multidimensional array), group (collection of Xndarray)
- data_vars : list of the names of data-var Xndarray
- data_arrays : list of the names of data-array Xndarray
- dimensions : list of the names of dimension Xndarray
- coordinates : list of the names of coordinate Xndarray
- additionals : list of the names of additional Xndarray
- metadata : list of the names of metadata
- validity : undefined (xtype of a Xndarray has value 'relative' or 'inconsistent'), inconsistent ('shape' or 'links' of a variable is inconsistent), valid (other cases)
- width : number of Xndarray included

In [12]:
xd.info

{'name': 'test',
 'xtype': 'group',
 'data_vars': ['var1', 'var2'],
 'data_arrays': ['z_bis'],
 'dimensions': ['x', 'y'],
 'coordinates': ['ranking', 'z'],
 'additionals': ['x.mask', 'x.variance', 'z.variance'],
 'metadata': ['info', 'unit'],
 'validity': 'undefined',
 'width': 12}

In [19]:
del xd['var1']
xd.info

{'name': 'test',
 'xtype': 'mono',
 'data_vars': ['var2'],
 'data_arrays': ['z_bis'],
 'dimensions': ['x', 'y'],
 'coordinates': ['ranking', 'z'],
 'additionals': ['x.mask', 'x.variance', 'z.variance'],
 'metadata': ['info', 'unit'],
 'validity': 'valid',
 'length': 2,
 'width': 11}
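
The two `xtype` values above agree with the definitions given in the Terminology section. A minimal sketch of that decision, assuming those definitions are the whole rule (the function `guess_xtype` and its parameters are hypothetical; ntv-numpy may apply further conditions):

```python
def guess_xtype(n_arrays, n_data_vars, valid):
    """Return 'meta', 'mono', 'multi' or 'group' for an xdataset.

    n_arrays: number of xndarray that hold an ndarray (not pure metadata)
    n_data_vars: number of data-var xndarray
    valid: True when the xdataset validity is 'valid'
    """
    if n_arrays == 0:
        return "meta"
    if valid and n_data_vars > 1:
        return "multi"
    if valid and n_data_vars == 1:
        return "mono"
    return "group"


# First call above: two data-vars but validity 'undefined'.
print(guess_xtype(n_arrays=10, n_data_vars=2, valid=False))
# Second call above: one data-var and validity 'valid'.
print(guess_xtype(n_arrays=9, n_data_vars=1, valid=True))
```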

## Using in NTV data
The NTV format makes it possible to group data of different types in the same structure or in specific structure ([see NTV overview](https://nbviewer.org/github/loco-philippe/NTV/blob/main/example/example_ntv.ipynb)).

### Include multidimensional data in NTV structure

In [14]:
from json_ntv import NtvList

nd = np.array([1, 2, 3, 4])
xn1 = Xndarray('example', nda=np.array(['x1', 'x2']))
xn2 = Xndarray.read_json({'z': [['string', ['z1', 'z2']], ['x']]})
xd = Xdataset([xn1, xn2], 'test')

print('example NTVsingle\n')
print(NtvSingle(nd))

print('\nexample NTVlist\n')
print(NtvList([nd, xn1, xd]))

print('\nexample with mixed data (Json representation)\n')
mixte = Ntv.obj({'mixed': [xn1, {'coordinate':Point(1,2)}, {'pandas series': pd.Series([1,2,3])}]})
print(mixte)

print('\nexample with mixed data (object representation)\n')
pprint(mixte.to_obj(format='obj'), width=150)

example NTVsingle

{":ndarray": ["int32", [1, 2, 3, 4]]}

example NTVlist

{":ndarray": ["int32", [1, 2, 3, 4]], ":xndarray": [["string", ["x1", "x2"]]], ":xdataset": {"test:xdataset": {"example": [["string", ["x1", "x2"]]], "z": [["string", ["z1", "z2"]], ["x"]]}}}

example with mixed data (Json representation)

{"mixed": {":xndarray": [["string", ["x1", "x2"]]], "coordinate:point": [1.0, 2.0], "pandas series:field": [1, 2, 3]}}

example with mixed data (object representation)

{'mixed': [Xndarray[], {'coordinate': <POINT (1 2)>}, {'pandas series': 0    1
1    2
2    3
dtype: int64}]}


### Include multidimensional data in other objects

In [15]:
sr = pd.Series([1, 2, nd, xn1])
mixin = Ntv.obj({'mixin': sr})
print(mixin)

{"mixin:field": [1, 2, {":ndarray": ["int32", [1, 2, 3, 4]]}, {":xndarray": [["string", ["x1", "x2"]]]}]}


## Equivalence of tabular format and multi-dimensional format
The conversion between the tabular and multidimensional formats is simple and lossless.

We can therefore share data between a tabular tool and a multidimensional tool via this format.
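
The principle can be illustrated without any library: the columns of the tabular form are the Cartesian product of the dimension labels, taken in row-major order, next to the flattened values. This is a sketch of the idea only (the helpers `to_columns` and `from_columns` are hypothetical and are not the `to_json_tab` / `read_json_tab` functions used below); the reverse direction assumes a full crossing of the dimensions, already sorted in row-major order.

```python
from itertools import product


def to_columns(dims, values, value_name="value"):
    """dims: {dimension name: labels} in axis order; values: row-major flat data."""
    rows = list(product(*dims.values()))
    if len(rows) != len(values):
        raise ValueError("values do not match the product of the dimensions")
    columns = {name: [row[i] for row in rows] for i, name in enumerate(dims)}
    columns[value_name] = list(values)
    return columns


def from_columns(columns, dim_names, value_name="value"):
    """Recover (dims, shape, flat values) from sorted, fully crossed columns."""
    dims = {name: list(dict.fromkeys(columns[name])) for name in dim_names}
    shape = [len(labels) for labels in dims.values()]
    return dims, shape, list(columns[value_name])


dims = {"x": ["x1", "x2"], "y": ["y1", "y2"]}
table = to_columns(dims, [1.0, 2.0, 3.0, 4.0])
print(table)
print(from_columns(table, ["x", "y"]))
```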

### Format conversion

In [16]:
from numpy_ntv_connector import to_json_tab, read_json_tab

print("example without axes :\n")
pprint(to_json(ex))
pprint(to_json_tab(ex))

"""print("\nexample with axes :\n")
pprint(to_json(a, add=add), width=150)
pprint(to_json_tab(a, add), width=150)"""

example without axes :

{':ndarray': ['time', ['10:02:03', '20:02:03']]}
{':tab': {'data::object': [datetime.time(10, 2, 3), datetime.time(20, 2, 3)],
          'dim_0': [[0, 1], [1]]}}


'print("\nexample with axes :\n")\npprint(to_json(a, add=add), width=150)\npprint(to_json_tab(a, add), width=150)'

### Compatibility with tabular tools

In [17]:
import ntv_pandas as npd

print('pandas DataFrame :')
ex_df = npd.read_json(to_json_tab(ex))
display(ex_df)

a_df = npd.read_json(to_json_tab(a, add))
display(a_df)
print('dtypes:\n' + str(a_df.dtypes))

pandas DataFrame :


NtvError: ntv_value is not compatible with ntv_type

### Tabular data to multidimensional data
Tabular data can be analysed to identify the dimensions and the fields associated with each axis.

In [None]:
analys = a_df.npd.analysis(True)
print('dimension:\n', analys.dimension)
partition = analys.field_partition(mode='id')
print('\npartition:\n', partition)

'primary' fields are converted into axes.

The shape is deduced from the length of axes (categorical format)

In [None]:
a_df_sort = a_df.sort_values(partition['primary'])
a3, add3 = read_json_tab(a_df_sort.npd.to_json(header=False, index=False))

print("\nreversibility :\n")
print(np.array_equal(a3, a), Ntv.obj(add3) == Ntv.obj(add))

## Astropy specific points
This chapter presents some points related to the Astropy data structure that can be integrated into the JSON-NTV format.

### Units and quantities
- 'unit' is a specific type
- three options are available for quantities:

    - option 1 : add specific types including unit
    - option 2 : add unit as type extension for existing types
    - option 3 : including unit in the name

- Option 2 is retained 

    - This option can be extended to other usages. For example:
    
    ```
    {"comment:string[fr]": "Paris est une belle ville"}
    ```

    - This option is compatible with NTV structure. For example 
    
    ```
    {"list_of_ndarray::ndarray[kg]": { "array1": [1, 2, 3, 4], "array2": [5, 6, 7, 8]}}
    ```

In [None]:
ntv = Ntv.obj({"list_of_ndarray::ndarray[kg]": { "array1": [[2, 2], [1, 2, 3, 4]], "array2": [[2, 2], [5, 6, 7, 8]]}})
print('json representation :\n', ntv[0])
print('\nNdarray representation :\n', ntv[0].to_obj(format='obj', type=True))

Example Unit

```
{'mass:unit': 'kg'}
```
    
Example Quantities

```
{'ex_simple:[m/s]':            0.47}
{'ex_simple_typ:float64[m/s]': 0.47}
{'ex_array:ndarray[m/s]':      [2., 2.5, 3., 3.5, 4., 4.5, 5.]}
{'ex_array_typ:ndarray[m/s]':  ['float64', [2., 2.5, 3., 3.5, 4., 4.5, 5.]]}
```

### Coordinates
The existing 'point' type (and also the other types: pointstr, line, polygon, multipolygon, box...) can be used with the coordinate object (perhaps with a type extension). e.g.

```
{':point[icrs]' : [ 10.625, 41.2] }
```

### Table
The `Table` or `QTable` object is represented using the `tab` format dedicated to tabular structures.

```
{'ex_table:tab': {'x:string': ['x1', 'x1', 'x2', 'x2'], 
                  'y:string': ['y1', 'y2', 'y1', 'y2'], 
                  'value:float64[kg]': [1.0, 2.0, 3.0, 4.0],
                  'meta': 'everything'}}
```

This ensures interoperability with tabular tools (e.g. Pandas).


### Other structures

The other structures were not examined. They can be integrated using the following tools:

- Addition of new types: for types having a transversal character
- Adding type via a Namespace: for types associated with specific data (e.g. "astro.xxx")
- Definition of an imposed structure for a given type
- Using specific type extension


## Appendix - JSON representation

### Multidimension
Multidimensional data is usually expressed in JSON as a nested JSON-array.
This representation has a drawback: it is not possible to include objects that are themselves represented by an array in this structure.

    e.g. What is the JSON representation of this 1-dimension ndarray : array([list([1, 2]), list([3, 4])], dtype=object) ? 

A first solution is to use a JSON-object instead of a JSON-array :

```json
    [{":array":[1, 2]}, {":array":[3, 4]}]
```

A second solution is to convert an array to a string :

```json
    ["[1, 2]", "[3, 4]"]
```

A third solution is to represent the shape and the flattened array :

```json
    [[2], [1, 2, 3, 4]]
```       


### Type representation

Another constraint is to represent the data type. Three options :

option 1 : use NTV representation  -> The examples above will become

```json
    {":int32": [{":array":[1, 2]}, {":array":[3, 4]}]}
    {":int32": ["[1, 2]", "[3, 4]"]}
    {":int32": [[2], [1, 2, 3, 4]]}
```

option 2 : use JSON-array representation  -> The examples above will become

```json
    ["int32", [{":array":[1, 2]}, {":array":[3, 4]}]]
    ["int32", ["[1, 2]", "[3, 4]"]]
    ["int32", [2], [1, 2, 3, 4]]
```

option 3 : use JSON_object representation  -> The examples above will become

```json
    {"type": "int32", "data": [{":array":[1, 2]}, {":array":[3, 4]}]}
    {"type": "int32", "data": ["[1, 2]", "[3, 4]"]}
    {"type": "int32", "shape": [2], "data": [1, 2, 3, 4]}
```


### Additional data

Multidimensional data is associated with complementary data :
- Name of axes (one axis per dimension)
- Axis values (optional)
- Additional variables associated with one or more axes (optional)
- Metadata (optional)
- name (optional)

The JSON representation must take this additional data into account.

Two solutions :
- solution 1 : structure multi-dimensional data + additional data
```json
 {'data'   : ['int32', [2, 3, 2, 2], 
              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]], 
  'dims'   : ['x', 'y', 'z', 'option'],
  'coords': {'x': [['x1', 'x2']] , 
             'y': [['y1', 'y2', 'y3']],
             'z': ['string', ['z1', 'z2']],
             'option': [[True, False]],
             'xy': {'dims': ['x','y'], 'data': [[2,3], ['x1y1','x1y2','x1y3','x2y1','x2y2','x2y3']]}, 
             'opt_num': {'dims': ['option'], 'data': [[0, 1]]}},
   'attrs'  : {'meta': 'everything'}}
```
- solution 2 : unique structure for multi-dimensional data and additional data
```json
 {'type'   : 'int32',
  'shape'  : [2, 3, 2, 2],
  'data'   : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24], 
  'dims'   : ['x', 'y', 'z', 'option'],
  'dimsco' : [['x1', 'x2'],['y1', 'y2', 'y3'], ['z1', 'z2'], [True, False]],
  'coords' : {'xy':{'dims': ['x', 'y'], 'data': [[2,3], ['x1y1', 'x1y2', 'x1y3', 'x2y1', 'x2y2', 'x2y3']]}, 
              'opt_num': {'dims': 'option', 'data': [0, 1]}},
  'attrs'  : {'meta': 'everything'}}
```

### NTV representation

Both structures can be represented with 
- NTVtype : `ndarray` or `xndarray`
- NTVname : name of the structure (optional)
- NTVvalue : JSON structure of `ndarray` or `xndarray`

Example:

```json
{ ':ndarray': ndarray_structure }
{ 'test:xndarray': xndarray_structure }
```

### Structure retained

The proposal is to retain two structures:
- `ndarray` to represent the data without additional data. This structure is the simplest and most compact. 

    `ndarray` is a JSON-array composed of :
    - values of data (JSON-array of flattened data)
    - shape of the data (JSON-array of axis length) - optional if the dimension is 1
    - type of data (string) - optional if the type is implicit

 ```json
              ["int32", [2, 2], [1, 2, 3, 4]]
              ["int32", [1, 2, 3, 4]]
              [[2, 2], [1, 2, 3, 4]]
              [[1, 2, 3, 4]]
 ```
 
- `xndarray` to represent data and additional data. This structure completes the `ndarray` structure.

    `xndarray` is a JSON-object composed of :

    - `data` (JSON-ndarray): values of data (see above)
    - `dims` (JSON-array): names of the axes (one axis per dimension) - optional
    - `coords` (JSON-object of `ndarray` or `xndarray`): additional variables associated with one or more axes - optional
    - `attrs` (JSON-object): Metadata - optional
    - `name` (string): name - optional
        
 ```json
{'example1:ndarray':
   ['int32', [2, 3, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
}
{'example2:xndarray':
    {'data'  : ['int32', [2, 3, 2], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]], 
     'dims'   : ['x', 'y', 'option'],
     'coords' : {
         'x': [['x1', 'x2']] , 
         'y': ['string', ['y1', 'y2', 'y3']],
         'option': [[True, False]],
         'xy': {'dims': ['x','y'], 'data': [[2,3], ['x1y1','x1y2','x1y3','x2y1','x2y2','x2y3']]}, 
         'opt_num': {'dims': ['option'], 'data': [[0, 1]]}},
     'attrs'  : {'meta': 'everything'}}
}
```
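
The retained `ndarray` structure can be produced from nested data by computing the shape and flattening in row-major order. A minimal sketch, assuming regular (non-ragged) nesting and items that are not lists themselves (the helpers `shape_of`, `flatten` and `to_json_ndarray` are hypothetical names, not the ntv-numpy API):

```python
def shape_of(nested):
    """Length of each nesting level, following the first item of each level."""
    shape = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        if not level:
            break
        level = level[0]
    return shape


def flatten(nested):
    """Flatten nested lists in row-major order."""
    if not isinstance(nested, list):
        return [nested]
    return [item for sub in nested for item in flatten(sub)]


def to_json_ndarray(nested, ntv_type=None):
    """Build [type, shape, data], omitting the type if None and the shape if 1-D."""
    shape = shape_of(nested)
    result = [] if ntv_type is None else [ntv_type]
    if len(shape) > 1:
        result.append(shape)
    result.append(flatten(nested))
    return result


print(to_json_ndarray([[1, 2], [3, 4]], "int32"))
print(to_json_ndarray([1, 2, 3, 4]))
```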