# Neutral format for multidimensional data
-------    
version : 2024-03-26

## Introduction

This memo is a proposal to implement a neutral format for multidimensional data. 

The proposed format is based on the following principles:

- a neutral format available for tabular or multidimensional tools (e.g. Numpy, pandas, xarray, scipp),
- equivalence between the tabular format and the multidimensional format,
- support for a wide variety of data types as defined in the NTV format,
- a reversible (lossless round-trip) interface with multidimensional tools,
- a reversible and compact JSON format (including categorical and sparse formats),
- ease of sharing and exchanging multidimensional data.

It builds on:

- the definition of the [JSON-NTV](https://www.ietf.org/archive/id/draft-thomy-json-ntv-02.html) format (Named and Typed Value), which integrates a notion of type into the JSON format (see the [JSON-NTV package](https://pypi.org/project/json-ntv/)),
- its variant for tabular data (the [NTV-TAB](https://www.ietf.org/archive/id/draft-thomy-ntv-tab-00.html) specification) and its implementation for pandas (the [NTV-pandas package](https://pypi.org/project/ntv-pandas/), available in the [pandas ecosystem](https://pandas.pydata.org/community/ecosystem.html), and [PDEP12](https://pandas.pydata.org/pdeps/0012-compact-and-reversible-JSON-interface.html)),
- the analysis of tabular structures to identify multidimensional data (the [TAB-analysis](https://pypi.org/project/tab-analysis/) package).

## Contents
*(active links on Jupyter Notebook or [Nbviewer](http://nbviewer.org/github/loco-philippe/Environmental-Sensing/tree/main/python/Validation/irve/Analyse/IRVE_indicateurs.ipynb))*
- [Introduction](#Introduction)
- [Contents](#Contents)
- [Benefits](#Benefits)
- [Data structure](#Data-structure)
    - [Terminology](#Terminology)
    - [JSON example](#JSON-example)
- [Interoperability](#Interoperability)
    - [Xarray interoperability](#Xarray-interoperability)
    - [Scipp interoperability](#Scipp-interoperability)
- [Exchanging and sharing data](#Exchanging-and-sharing-data)
    - [Neutral format](#Neutral-format)
    - [URI usage](#URI-usage)
    - [Summary](#Summary)

      
- [Appendix - NTV data](#Appendix---NTV-data)
- [Appendix - JSON representation](#Appendix---JSON-representation)
    - [Multidimensional NTVtype](#Multidimensional-NTVtype)
    - [JSON darray data](#JSON-darray-data)
    - [JSON ndarray data](#JSON-ndarray-data)
    - [JSON xndarray data](#JSON-xndarray-data)
    - [JSON xdataset data](#JSON-xdataset-data)
- [Appendix - Python entities](#Appendix---Python-entities)
    - [Ndarray and Numpy ndarray](#Ndarray-and-Numpy-ndarray)
    - [Xndarray](#Xndarray)
    - [Xdataset](#Xdataset)
- [Appendix - Using in NTV data](#Appendix---Using-in-NTV-data)
    - [Including in NTV structure](#Including-in-NTV-structure)
    - [Including in other objects](#Including-in-other-objects)
- [Appendix - Equivalence with tabular format](#Appendix---Equivalence-with-tabular-format)
- [Appendix - Astropy specific points](#Appendix---Astropy-specific-points)

## Benefits

The use of this format has the following advantages:

- a neutral format available for tabular or multidimensional tools (e.g. Numpy, pandas, xarray, scipp),
- support for a wide variety of data types as defined in the NTV format,
- a high level of interoperability between tools,
- a reversible and compact JSON format (lossless round-trip, categorical and sparse formats, mixing of JSON structure and binary coding),
- ease of sharing multidimensional data.

## Data structure
The data structure includes the structures defined by other multi-dimensional tools (e.g. variables, dimensions, coordinates, variances, masks, units, metadata).

We distinguish:
- elementary data: ordered collections, which can be represented in specific formats (darray, ndarray),
- structural data: represented in JSON format (xndarray, xdataset).

This data structure makes it possible to build reversible interfaces and interoperable tools.

### Terminology

- **darray (unidimensional array)** is an ordered collection of 'items' of the same type. A darray can be represented in several formats (e.g. simple list, categorical format, sparse format).
- **ndarray (multidimensional array)** is an N-dimensional array of homogeneous data types. An ndarray entity is defined by:
    - a darray (multidimensional data flattened in row-major order) or a URI (string location of the data),
    - a shape (order and length of the axes),
    - an NTVtype (semantic data type).

    An ndarray can be *absolute* (defined by a darray) or *relative* (defined by a resolvable URI).
- **xndarray (labelled multidimensional array)** is an ndarray identified by a name. An xndarray entity has optional additional data:
    - add_name: the name of a property (an additional name appended to the ndarray name),
    - links: the names of the linked xndarrays,
    - metadata: JSON-object metadata.

    An xndarray can be:
    - *metadata*: the ndarray is empty (None); add_name and links are not present,
    - *named-array*: without additional data,
    - *variable*: named-array with links and without add_name (e.g. dims),
    - *additional-array*: named-array whose name is extended (add_name is present).
- **xdataset (coordinated multidimensional array)** is a collection of xndarrays. This collection can be interpreted as a simple group or as an interconnected collection (names are used as pointers between xndarray items).

    In the context of an xdataset, an included xndarray can be:

    - *dimension*: named-array whose name is present in the links of a variable of the xdataset,
    - *data-array*: named-array whose name is not present in the links of any variable of the xdataset,
    - *data-var*: variable whose links equal the list of dimensions of the xdataset,
    - *coordinate*: variable whose links do not equal the list of dimensions of the xdataset,
    - *mask*: additional-array whose data type is boolean,
    - *data-add*: additional-array whose data type is not boolean.

    An xdataset is valid if:
    - the included xndarrays are valid,
    - the names in links are names of xndarrays,
    - the shape of each xndarray is consistent with the shapes of the xndarrays defined in its links.

    An xdataset is multidimensional if it is valid and contains more than one data-var.

    An xdataset is unidimensional if it is valid and contains a single data-var.

    In all other cases, an xdataset is a simple group of xndarrays.

*Note*
- Numpy.ndarray corresponds to ndarray
- Xarray.DataArray, Xarray.Dataset, scipp.DataArray, scipp.DataGroup, scipp.Dataset correspond to xdataset
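The classification rules above only depend on names, links and data types, so they can be sketched in plain Python, independently of ntv-numpy. In this minimal sketch, each xndarray is summarized by a hypothetical descriptor dict (`has_ndarray`, `links`, `boolean`), a dot in the name is assumed to separate the name from its add_name, and links are compared with the dimensions as ordered lists; it illustrates the rules and is not the ntv-numpy implementation. The descriptors mirror the xdataset of the JSON example.

```python
def classify(xnds):
    """Classify the xndarrays of an xdataset following the Terminology rules.

    xnds maps each full name to a hypothetical descriptor dict with the keys
    'has_ndarray' (default True), 'links' (default None), 'boolean' (default False).
    A dot in the name is assumed to separate the name from its add_name.
    """
    kinds = {}
    for name, desc in xnds.items():
        if not desc.get('has_ndarray', True):
            kinds[name] = 'metadata'
        elif '.' in name:
            kinds[name] = 'additional-array'
        elif desc.get('links'):
            kinds[name] = 'variable'
        else:
            kinds[name] = 'named-array'

    # names referenced by the links of at least one variable
    linked = {lk for n, k in kinds.items() if k == 'variable' for lk in xnds[n]['links']}
    dimensions = [n for n, k in kinds.items() if k == 'named-array' and n in linked]

    result = {'dimensions': dimensions, 'data_arrays': [], 'data_vars': [],
              'coordinates': [], 'masks': [], 'data_add': [], 'metadata': []}
    for name, kind in kinds.items():
        desc = xnds[name]
        if kind == 'metadata':
            result['metadata'].append(name)
        elif kind == 'named-array' and name not in linked:
            result['data_arrays'].append(name)
        elif kind == 'variable':
            # links compared with the dimensions as ordered lists (assumption)
            key = 'data_vars' if desc['links'] == dimensions else 'coordinates'
            result[key].append(name)
        elif kind == 'additional-array':
            result['masks' if desc.get('boolean') else 'data_add'].append(name)
    return result


# descriptors of the xdataset used in the JSON example
example_xnds = {
    'var1': {'links': ['x', 'y']},
    'var1.variance': {},
    'var1.mask1': {'links': ['x'], 'boolean': True},
    'var1.mask2': {'boolean': True},
    'var2': {'links': ['x', 'y']},
    'x': {},
    'y': {},
    'ranking': {'links': ['var1']},
    'z': {'links': ['x']},
    'z.uncertainty': {},
    'z_bis': {},
    'info': {'has_ndarray': False},
}
result = classify(example_xnds)
```

Note that `ranking` is a coordinate even though it is linked to a data-var rather than to dimensions: only named-arrays can become dimensions.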

### JSON example

The example below is the JSON representation of the xdataset named 'example'.

The JSON format is detailed in the appendices.


```python
example = {
    'example:xdataset': {
        'var1': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']],
        'var1.variance': [[[2, 2], [0.1, 0.2, 0.3, 0.4]]],
        'var1.mask1': [[[True, False]], ['x']],
        'var1.mask2': [[[2, 2], [True, False, False, True]]],

        'var2': [['var2.ntv'], ['x', 'y']],

        'x': [['string', ['23F0AE', '578B98']], {'test': 21}],
        'y': [['date', ['2021-01-01', '2022-02-02']]],

        'ranking': [['month', [2, 2], [1, 2, 3, 4]], ['var1']],
        'z': [['float', [10, 20]], ['x']],
        'z.uncertainty': [[[0.1, 0.2]]],

        'z_bis': [[['z1_bis', 'z2_bis']]],

        'info': {'path': 'https://github.com/loco-philippe/ntv-numpy/tree/main/example/'}
    }
}
```

The first line (`'example:xdataset'`) follows the NTV representation `{'NTVname:NTVtype': NTVvalue}`.

The other lines are the xndarrays included in the xdataset (JsonObjects):
   - `x` and `y` are *dimensions* (*named-arrays* present in *links*)
   - `var1` and `var2` are *data-vars* (*links* equal to the list of *dimensions*)
   - `var1.variance` and `z.uncertainty` are *data-add* (*add_name* and non-boolean NTVtype)
   - `var1.mask1` and `var1.mask2` are *masks* (*add_name* and boolean NTVtype)
   - `ranking` and `z` are *coordinates* (*links* not equal to the list of *dimensions*)
   - `z_bis` is a *data-array* (*named-array* not present in *links*)
   - `info` is *metadata*
   - `var1` has an NTVtype with an extension (`kg`)
   - `var2` has a *relative* ndarray
   - `x` has metadata

The `Xdataset` object is associated with this xdataset structure:

```python
from ntv_numpy import Xdataset
import ntv_pandas as npd

x_example = Xdataset.read_json(example)
x_example.info
```

```
{'name': 'example',
 'xtype': 'group',
 'data_vars': ['var1', 'var2'],
 'data_arrays': ['z_bis'],
 'dimensions': ['x', 'y'],
 'coordinates': ['ranking', 'z'],
 'additionals': ['var1.mask1', 'var1.mask2', 'var1.variance', 'z.uncertainty'],
 'metadata': ['info'],
 'validity': 'undefined',
 'length': 4,
 'width': 12}
```

Note:
- The JSON representation is equivalent to the `Xdataset` entity (the JSON conversion is reversible):

```python
x_json = x_example.to_json()
x_example_json = Xdataset.read_json(x_json)
x_example_json == x_example
```

```
True
```

## Interoperability

The xdataset structure is compatible with multi-dimensional tools.

### Xarray interoperability

```python
x_xarray = x_example.to_xarray()
x_xarray
```

Notes:
- the `Xdataset` is translated into an `xr.Dataset` (or an `xr.DataArray`),
- the NTVtype defined for each array is stored as an attribute (e.g. 'month' is an attribute of the 'ranking' coordinate),
- data not compatible with an `xr.Dataset` is stored as attributes:
    - `var2` is a relative *data-var* and is not included in the `xr.Dataset`,
    - the `Xdataset` name is not available in an `xr.Dataset`,
    - `z_bis` is a *data-array* and is not included in the `xr.Dataset`.

```python
x_example_xr = Xdataset.from_xarray(x_xarray)
x_example_xr == x_example_json == x_example
```

```
True
```

The interface is lossless and reversible.

### Scipp interoperability

```python
x_scipp = x_example.to_scipp()
print(x_scipp['example'])
```

```
<scipp.Dataset>
Dimensions: Sizes[x:string:2, y:date:2, ]
Coordinates:
* ranking:month               int32  [dimensionless]  (x:string, y:date)  [1, 2, 3, 4]
* x:string                   string  [dimensionless]  (x:string)  ["23F0AE", "578B98"]
* y:date                  datetime64             [ns]  (y:date)  [2021-01-01T00:00:00.000000000, 2022-02-02T00:00:00.000000000]
* z:float                   float64  [dimensionless]  (x:string)  [10, 20]
Data:
  var1:float                float64             [kg]  (x:string, y:date)  [10.1, 0.4, 3.4, 8.2]  [0.1, 0.2, 0.3, 0.4]
    Masks:
        mask1:boolean                bool  [dimensionless]  (x:string)  [True, False]
        mask2:boolean                bool  [dimensionless]  (x:string, y:date)  [True, False, False, True]
```

```python
x_scipp
```

Notes:
- the `Xdataset` is translated into a `sc.DataGroup` containing a `sc.Dataset`,
- the NTVtype is appended to each name (e.g. `var1:float`, `x:string`),
- the variance is included in the `sc.Variable`,
- masks are attached to the `sc.DataArray`,
- data not compatible with a `sc.Dataset` is stored in the `sc.DataGroup`:
    - `var2` is a relative *data-var* and is not included in the `sc.Dataset`,
    - `z.uncertainty` is a *data-add* other than a variance and is not included as a variance,
    - the `Xdataset` name is not available in a `sc.Dataset`,
    - `z_bis` is a *data-array* and is not included in the `sc.Dataset`,
    - attributes are included in the `sc.Dataset`.

```python
x_example_sc = Xdataset.from_scipp(x_scipp)
x_example_sc == x_example_xr == x_example_json == x_example
```

```
True
```

The interface is lossless and reversible.

## Exchanging and sharing data

### Neutral format
The JSON format presented here does not use any data specific to any of the existing tools, and the interface with multidimensional data processing tools is reversible.

The proposed format is therefore neutral and can be used to exchange multidimensional data between different tools or platforms.

### URI usage
An alternative approach is to exchange only the structural data and to make the elementary data downloadable.

This also makes it possible to manage the elementary data independently of the dataset.

The application of this approach to the previous example is shown below.

```python
# only structural data
example = {
    'example:xdataset': {
        'var1': [['float[kg]', [2, 2], 'var1.ntv'], ['x', 'y']],
        'var1.variance': [[[2, 2], 'var1_variance.ntv']],
        'var1.mask1': [['var1_mask1.ntv'], ['x']],
        'var1.mask2': [[[2, 2], 'var1_mask2.ntv']],

        'var2': [['var2.ntv'], ['x', 'y']],

        'x': [['x.ntv'], {'test': 21}],
        'y': [['date', 'y.ntv']],

        'ranking': [['month', [2, 2], 'ranking.ntv'], ['var1']],
        'z': [['float', 'z.ntv'], ['x']],
        'z.uncertainty': [['z_uncertainty.ntv']],

        'z_bis': [['z_bis.ntv']],

        'info': {'path': 'https://github.com/loco-philippe/ntv-numpy/tree/main/example/'}
    }
}

x_example_mixte = Xdataset.read_json(example)
x_example_mixte.info
```

```
{'name': 'example',
 'xtype': 'group',
 'data_vars': ['var1', 'var2'],
 'data_arrays': ['z_bis'],
 'dimensions': ['x', 'y'],
 'coordinates': ['ranking', 'z'],
 'additionals': ['var1.mask1', 'var1.mask2', 'var1.variance', 'z.uncertainty'],
 'metadata': ['info'],
 'validity': 'undefined',
 'width': 12}
```

The elementary data is then added to the structure, here from its JSON representation:

```python
from ntv_numpy import Ndarray
from copy import copy

# simulation of reading ".ntv" json files at the indicated "path"
var1          = [[10.1, 0.4, 3.4, 8.2]]
var1_variance = ['float', [0.1, 0.2, 0.3, 0.4]]
var1_mask1    = [[True, False]]
var1_mask2    = [[True, False, False, True]]
var2          = ['var2.ntv']
x             = ['string', ['23F0AE', '578B98']]
y             = ['date', ['2021-01-01', '2022-02-02']]
ranking       = [[1, 2, 3, 4]]
z             = [[10.0, 20.0]]
z_uncertainty = [[0.1, 0.2]]
z_bis         = [['z1_bis', 'z2_bis']]

json_files = [var1, var1_variance, var1_mask1, var1_mask2, var2, x, y, ranking, z, z_uncertainty, z_bis]

x_example_mixte_json = copy(x_example_mixte)

# the order of json_files follows the order of the xndarrays in the xdataset
for data, xnda in zip(json_files, x_example_mixte_json.xnd):
    xnda.set_ndarray(Ndarray.read_json(data))

x_example_mixte_json == x_example_sc == x_example_xr == x_example_json == x_example
```

```
True
```

The elementary data can also be provided as Numpy arrays (or Ndarray entities):

```python
import numpy as np

# simulation of reading files at the indicated "path"
var1          = np.array([10.1, 0.4, 3.4, 8.2])
var1_variance = Ndarray([0.1, 0.2, 0.3, 0.4], ntv_type='float')
var1_mask1    = np.array([True, False])
var1_mask2    = np.array([True, False, False, True])
var2          = Ndarray('var2.ntv')
x             = np.array(['23F0AE', '578B98'])
y             = np.array(['2021-01-01', '2022-02-02'], dtype='datetime64[D]')
ranking       = np.array([1, 2, 3, 4])
z             = np.array([10.0, 20.0])
z_uncertainty = np.array([0.1, 0.2])
z_bis         = np.array(['z1_bis', 'z2_bis'])

array_data = [var1, var1_variance, var1_mask1, var1_mask2, var2, x, y, ranking, z, z_uncertainty, z_bis]

x_example_mixte_numpy = copy(x_example_mixte)
for data, xnda in zip(array_data, x_example_mixte_numpy.xnd):
    xnda.set_ndarray(Ndarray(data))

x_example_mixte_numpy == x_example_mixte_json == x_example_sc == x_example_xr == x_example_json == x_example
```

```
True
```
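In the structure-only example, relative locations such as `'var1.ntv'` are read at the `path` given in the `info` metadata. One way to resolve them is sketched below with the standard `urllib.parse.urljoin`; the helper `resolve_uris` and the convention that `info['path']` is the base location are assumptions drawn from the example, not part of ntv-numpy.

```python
from urllib.parse import urljoin


def resolve_uris(xdataset_json, meta_name='info'):
    """Map each xndarray name to the absolute location of its ndarray.

    Hypothetical helper: relative locations are resolved against the 'path'
    entry of the metadata xndarray `meta_name`.
    """
    (_, body), = xdataset_json.items()
    meta = body.get(meta_name)
    base = meta.get('path', '') if isinstance(meta, dict) else ''
    locations = {}
    for name, xvalue in body.items():
        # metadata entries are not lists; the other entries start with the ndarray
        if not (isinstance(xvalue, list) and xvalue and isinstance(xvalue[0], list)):
            continue
        ndarray = xvalue[0]
        if ndarray and isinstance(ndarray[-1], str):  # data given as a URI
            locations[name] = urljoin(base, ndarray[-1])
    return locations


structure = {'example:xdataset': {
    'var1': [['float[kg]', [2, 2], 'var1.ntv'], ['x', 'y']],
    'x': [['x.ntv'], {'test': 21}],
    'y': [['date', 'y.ntv']],
    'info': {'path': 'https://github.com/loco-philippe/ntv-numpy/tree/main/example/'}}}
locations = resolve_uris(structure)
```

Absolute URIs are left unchanged by `urljoin`, so relative and absolute ndarrays can be mixed in the same xdataset.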

### Summary
This example shows that using a neutral format together with lossless interfaces provides complete interoperability between tools (here Numpy, xarray and scipp).

## Appendix - NTV data
NTV format is a data representation with three attributes:
- NTVname (string)
- NTVtype (string from an enumerated set)
- NTVvalue (JSON value)

Two entities are defined:
- NTVlist: ordered list of entities
- NTVsingle: entity not composed of other entities

The JSON representation of NTVsingle entities is:
- value:
```json
    25, 'test', [1,2]
```

- name and value:
```json
    {'test': 25}, {'test:': [1,2]}
```
- type and value:
```json
    {':day': 25}, {':point': [1,2]}
```
- type, name and value:
```json
    {'equinox:date': '2023-09-23'}, {'paris:point': [2.35, 48.86]}
```

The JSON representation of NTVlist entities is:

- if all entities have a JSON-member representation (`"name": value`):
```json
    { 'name_NTVlist:type_NTVlist': {JSON_entity1, ..., JSON_entityn} }

    { 'example': {'equinox:date': '2023-09-23', 'paris:point': [2.35, 48.86]} }
```

- in the other cases:
```json
    { 'name_NTVlist:type_NTVlist': [JSON_entity1, ..., JSON_entityn] }

    { 'example': [25, {'paris:point': [2.35, 48.86]}, 'test'] }
```
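The single and list representations above can be sketched together in plain Python. This is a simplified reading of the rules in this appendix, not the JSON-NTV package: a one-member JSON object is read as `'name:type': value` (`None` meaning "absent"), and the JSON-object form of a list is used only when every entity has a non-empty, distinct name (an assumption). The function names are hypothetical.

```python
def parse_single(jsn):
    """Split a JSON NTVsingle into (name, ntv_type, value); None means absent."""
    if isinstance(jsn, dict) and len(jsn) == 1:
        (key, value), = jsn.items()
        name, sep, ntv_type = key.partition(':')
        return name, (ntv_type if sep else None), value
    return None, None, jsn


def single_to_json(name, ntv_type, value):
    """Inverse of parse_single."""
    if name is None and ntv_type is None:
        return value
    return {(name or '') + (':' + ntv_type if ntv_type is not None else ''): value}


def list_to_json(entities, name=None, ntv_type=None):
    """Build the JSON of an NTVlist from (name, ntv_type, value) triples."""
    names = [ent[0] for ent in entities]
    if all(names) and len(set(names)) == len(names):
        # every entity can be written as a JSON member -> JSON object
        body = {}
        for ent in entities:
            body.update(single_to_json(*ent))
    else:
        body = [single_to_json(*ent) for ent in entities]
    return single_to_json(name, ntv_type, body)


named = [parse_single({'equinox:date': '2023-09-23'}),
         parse_single({'paris:point': [2.35, 48.86]})]
mixed = [parse_single(25), parse_single({'paris:point': [2.35, 48.86]}), parse_single('test')]
json_named = list_to_json(named, 'example')   # object form
json_mixed = list_to_json(mixed, 'example')   # array form
```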

## Appendix - JSON representation

### Multidimensional NTVtype
Multidimensional data is characterized by four NTVtypes:

- `darray` for darray data
- `ndarray` for ndarray data
- `xndarray` for xndarray data
- `xdataset` for xdataset data

The JSON representation of these NTVtype is defined below.

### JSON darray data
JSON `darray` is a JsonArray.

The JSON representation is obtained by:

- converting the structure of the darray data into a JSON structure,
- converting the elementary data into JSON primitives.

Examples of available formats:

Simple format:
```json
[ "apple", "apple", "orange", "apple", "apple", "pepper", "banana", "apple" ]
```
Categorical format:
```json
[["orange","pepper","apple","banana"], [2, 2, 0, 2, 2, 1, 3, 2] ]
```
Sparse format (8: length, -1: marks the default value):
```json
[["orange","pepper","banana", "apple"], [8], [2, 5, 6, -1]]
```
Periodic format (18: length, 2: repetition coefficient), *representation of [10, 10, 20, 20, 30, 30, 10, 10, 20, 20, 30, 30, 10, 10, 20, 20, 30, 30]*:
```json
[[10, 20, 30], [18], [2]]
```
```

### JSON ndarray data
JSON `ndarray` is a JsonArray with three JsonElements (NTVtype, shape, darray) where darray is a JSON `darray`:

Examples:

ndarray with simple format
```json
["int32", [2, 2], [30, 40, 30, 40]]
```
ndarray with categorical format
```json
["int32", [2, 2], [[30, 40], [0, 1, 0, 1]]]
```
ndarray with sparse format
```json
["int32", [2, 2], [[30, 30, 40], [4], [0, 2, -1]]]
```
ndarray with implicit format (format defined by the linked 'x' ndarray)
```json
["int32", [[10, 20], 'x']]
```
ndarray with relative format (format defined by the linked 'x' ndarray)
```json
["int32", [[10, 20], 'x', [ 0, 1, 0, 0 ]]]
```
ndarray with simple format and list data
```json
["point", [2, 2], [[5, 15], [25, 35], [45, 55], [65, 75]]]
```
ndarray with extended ntv_type
```json
["int32[kg]", [1, 2, 3, 4]]
```
ndarray without ntv_type
```json
[[1, 2, 3, 4]]
```
ndarray defined by a URI
```json
["https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv"]
["int32[kg]", "https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv"]
```
Note:
- shape is optional with unidimensional data (deduced from darray)
- NTVtype is optional (deduced from data type)
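The optional elements make the JSON ndarray positional: a leading string is the NTVtype, a middle list is the shape, and the last element is the darray or a URI. The sketch below applies these rules for the simple darray format only (categorical, sparse and implicit formats are not decoded) and rebuilds nested lists from data flattened in row-major order. Both function names are hypothetical and not the ntv-numpy API.

```python
def parse_ndarray(jsn):
    """Split a JSON ndarray into (ntv_type, shape, data_or_uri).

    Simple darray format only; a string as last element is taken as a URI.
    """
    items = list(jsn)
    ntv_type = items.pop(0) if len(items) > 1 and isinstance(items[0], str) else None
    shape = items.pop(0) if len(items) == 2 else None
    data = items[0]
    if isinstance(data, str):          # relative ndarray: shape may be unknown
        return ntv_type, shape, data
    if shape is None:                  # unidimensional: shape deduced from darray
        shape = [len(data)]
    return ntv_type, shape, data


def unflatten(flat, shape):
    """Rebuild nested lists from data flattened in row-major order."""
    if len(shape) <= 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [unflatten(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


ntv_type, shape, data = parse_ndarray(["int32", [2, 2], [30, 40, 30, 40]])
nested = unflatten(data, shape)
```

Row-major order means the last axis varies fastest, which is also the default order of `numpy.reshape`.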

### JSON xndarray data

JSON `xndarray` is a JsonObject with a single JsonMember {'name': xvalue}.

xvalue is a JsonArray with three optional JsonElements:

- ndarray: JSON representation or URI string,
- links: JsonArray with the names of the linked ndarrays,
- meta: JsonObject with metadata.

If neither ndarray nor links is present, meta can be a single string.

An `xndarray` entity can be metadata, a named-array, a variable or an additional-array:

metadata:
```json
{'unit': 'kg'}
{'meta': {'dict': 'everything'}}
```
named-array (data-array or dimension in the context of a dataset):
```json
{':xndarray': [['int64[kg]', [10, 20]]]}
{'y': [['string', [2], ['y1', 'y2']]]}
```
named-array with metadata:
```json
{'x': [[['x1', 'x2']], {'test': 21}]}
```
additional-array:
```json
{'x.mask': [[[True, False]]]}
{'x.uncertainty': [[[0.1, 0.2]]]}
{'z.variance': [[[0.1, 0.2]]]}
```
Variable (data-var or coordinate in the context of a dataset):
```json
{'var1': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']]}
{'z': [[['z1', 'z2']], ['x']]}
{'ranking': [[[2, 2], [10, 20, 20, 10]], ['var1']]}
{'ranking': [[[2, 2], [[10, 20], [0, 1, 1, 0]]], ['var1']]}
```
Variable (ndarray defined by a URI):
```json
{'var2': [['https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv'], ['x', 'y']]}
```
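The examples above can be told apart from their layout alone. The sketch below assumes that the first array element of xvalue is the ndarray, a second array element is the links, a JSON object is the metadata, a non-array xvalue is metadata, and a dot in the name separates the add_name (as in `x.mask`). The function `parse_xndarray` is hypothetical, not the ntv-numpy API.

```python
def parse_xndarray(jsn):
    """Split a JSON xndarray {'name': xvalue} into its parts and its kind."""
    (full_name, xvalue), = jsn.items()
    full_name = full_name.split(':')[0]          # drop an optional ':xndarray' type
    name, _, add_name = full_name.partition('.')
    ndarray = links = meta = None
    if isinstance(xvalue, list):
        for element in xvalue:
            if isinstance(element, dict):
                meta = element
            elif ndarray is None:
                ndarray = element                # first array element: ndarray or URI
            else:
                links = element                  # second array element: links
    else:
        meta = xvalue                            # e.g. {'unit': 'kg'}
    if ndarray is None:
        kind = 'metadata'
    elif add_name:
        kind = 'additional-array'
    elif links:
        kind = 'variable'
    else:
        kind = 'named-array'
    return {'name': name, 'add_name': add_name or None, 'ndarray': ndarray,
            'links': links, 'meta': meta, 'kind': kind}


parsed_x = parse_xndarray({'x': [[['x1', 'x2']], {'test': 21}]})
parsed_mask = parse_xndarray({'x.mask': [[[True, False]]]})
parsed_var = parse_xndarray({'var1': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']]})
```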

### JSON xdataset data

JSON `xdataset` is a JsonObject where each JsonMember is an `xndarray`.

Example:
```json
{
    'var2': ['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv', ['x', 'y']],
    'var1': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']],
    'ranking': [[[2, 2], [1, 2, 3, 4]], ['var1']],
    'x': [[['x1', 'x2']], {'test': 21}],
    'y': [[['y1', 'y2']]],
    'z': [[['z1', 'z2']], ['x']],
    'z_bis': [[['z1_bis', 'z2_bis']]],
    'x.mask': [[[True, False]]],
    'x.uncertainty': [[[0.1, 0.2]], ['x']],
    'z.variance': [[[0.1, 0.2]]],
    'unit': 'kg',
    'info': {'example': 'everything'}
}
```
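The validity rules of the Terminology section can be checked on shapes alone. The sketch below uses the example above, where `var2` is defined by a URI, so its shape is unknown and the result is 'undefined'. The shapes are written by hand, the function is hypothetical, and reading "consistent shape" as "the concatenation of the shapes of the linked xndarrays" is an assumption that matches the examples (`var1` linked to `x` and `y`, `ranking` linked to `var1`).

```python
def validity(shapes, links):
    """Validity of an xdataset from its shapes and links (sketch).

    shapes: name -> shape (list), or None when the ndarray is a URI (relative).
    links:  name -> list of linked names, for the xndarrays that have links.
    Assumption: a consistent shape is the concatenation of the linked shapes.
    """
    if any(shape is None for shape in shapes.values()):
        return 'undefined'                      # relative ndarray: shape unknown
    for name, linked in links.items():
        if any(lk not in shapes for lk in linked):
            return 'inconsistent'               # link to an unknown xndarray
        expected = [dim for lk in linked for dim in shapes[lk]]
        if shapes[name] != expected:
            return 'inconsistent'
    return 'valid'


shapes = {'var2': None, 'var1': [2, 2], 'ranking': [2, 2], 'x': [2], 'y': [2],
          'z': [2], 'z_bis': [2], 'x.mask': [2], 'x.uncertainty': [2], 'z.variance': [2]}
links = {'var2': ['x', 'y'], 'var1': ['x', 'y'], 'ranking': ['var1'],
         'z': ['x'], 'x.uncertainty': ['x']}

relative = validity(shapes, links)                                   # var2 is a URI
absolute = validity({**shapes, 'var2': [2, 2]}, links)               # var2 loaded
broken = validity({**shapes, 'var2': [2, 2], 'z': [3]}, links)       # z inconsistent
```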

## Appendix - Python entities

### Ndarray and Numpy ndarray
numpy.ndarray is used to represent elementary data in Ndarray.

A numpy.ndarray is equivalent to an Ndarray in two cases:

- mapping: the numpy dtype is associated with an NTVtype,
- conversion: the Python type of the data is associated with an NTVtype.

In other cases, an NTVtype is required to define an Ndarray from a numpy.ndarray.

These functions are similar to those used for the JSON-NTV representation of Pandas data ([see pandas examples](https://nbviewer.org/github/loco-philippe/ntv-pandas/blob/main/example/example_ntv_pandas.ipynb)).

#### Mapping
This is the most frequent case and concerns these numpy dtypes:

- datetime64[x] (D, Y, M, s, ms, us, ns, ps, fs)
- timedelta64[x] (D, Y, M, s, ms, us, ns, ps, fs)
- bool, bytes, str
- int, uint (8, 16, 32, 64)
- float (16, 32, 64)

For these dtypes, the conversion is reversible.

```python
import numpy as np
from ntv_numpy import Ndarray

examples = [
    np.array([10, 10, 20, 10, 30, 50]).astype('int64').reshape((2, 3)),
    np.array(['test1', 'test2'], dtype='str'),
    np.array(['2022-01-01', '2023-01-01'], dtype='datetime64[D]'),
    np.array(['2022', '2023'], dtype='datetime64[Y]'),
    np.array([b'abc\x09', b'abc'], dtype='bytes'),
    np.array([True, False], dtype='bool')]

for example in examples:
    equal = np.array_equal(example, Ndarray(example).ndarray)
    jsn = Ndarray(example).to_json(header=False)
    print('reversibility : ', equal, ', JSON representation : ', jsn)
```

```
reversibility :  True , JSON representation :  ['int64', [2, 3], [10, 10, 20, 10, 30, 50]]
reversibility :  True , JSON representation :  ['string', ['test1', 'test2']]
reversibility :  True , JSON representation :  ['date', ['2022-01-01', '2023-01-01']]
reversibility :  True , JSON representation :  ['year', ['2022', '2023']]
reversibility :  True , JSON representation :  ['base16', [b'abc\t', b'abc']]
reversibility :  True , JSON representation :  ['boolean', [True, False]]
```


#### Conversion
This case concerns the numpy dtype 'object' with specific Python data:

- datetime.time
- decimal.Decimal
- shapely classes (Point, LineString, Polygon, geojson, geometry)
- list, dict, NoneType
- ndarray, xndarray, xdataset, field, tab, ntv

For these Python classes, the conversion is reversible.

```python
from datetime import time
from decimal import Decimal
from shapely.geometry import Point
from json_ntv import NtvSingle, Ntv
import pandas as pd

examples = [
    np.array([time(10, 2, 3), time(20, 2, 3)]),
    np.array([Decimal('10.5'), Decimal('20.5')]),
    np.array([Point([1,2]), Point([3,4])]),
    np.array([None, None]),
    np.array([{'one':1}, {'two':2}]),
    np.fromiter([[1,2], [3,4]], dtype='object')]

for example in examples:
    reverse = Ndarray(example).ndarray
    equal = np.array_equal(example, reverse)
    jsn_example = Ndarray(example).to_json(header=False)
    print('reversibility : ', equal, ', JSON representation : ', jsn_example)

# nested entities: reversibility is checked on the JSON representation
examples = [
    np.fromiter([np.array([1, 2], dtype='int64'),
                 np.array(['test1', 'test2'], dtype='str')], dtype='object'),
    np.fromiter([Ntv.obj({':point':[1,2]}), NtvSingle(12, 'noon', 'hour')], dtype='object'),
    np.fromiter([pd.Series([1,2,3]), pd.Series([4,5,6])], dtype='object')]

for example in examples:
    reverse = Ndarray(example).ndarray
    jsn_example = Ndarray(example).to_json(header=False)
    jsn_reverse = Ndarray(reverse).to_json(header=False)
    equal = jsn_example == jsn_reverse

    print('reversibility : ', equal, ', JSON representation : ', jsn_example)
```

```
reversibility :  True , JSON representation :  ['time', ['10:02:03', '20:02:03']]
reversibility :  True , JSON representation :  ['decimal64', [10.5, 20.5]]
reversibility :  True , JSON representation :  ['point', [[1.0, 2.0], [3.0, 4.0]]]
reversibility :  True , JSON representation :  ['null', [None, None]]
reversibility :  True , JSON representation :  ['object', [{'one': 1}, {'two': 2}]]
reversibility :  True , JSON representation :  ['array', [[1, 2], [3, 4]]]
reversibility :  True , JSON representation :  ['ndarray', [['int64', [1, 2]], ['string', ['test1', 'test2']]]]
reversibility :  True , JSON representation :  ['object', [{":point": [1, 2]}, {"noon:hour": 12}]]
reversibility :  True , JSON representation :  ['field', [[1, 2, 3], [4, 5, 6]]]
```


#### Other NTVtype
JSON data with other NTVtypes can be converted into a numpy.ndarray, but this conversion is not reversible.

Reversibility is obtained with Ndarray entities, which keep the NTVtype.

This case concerns:

- data with a type extension (e.g. number with unit: 'int64[kg]')
- data with a generic format (NTVtype: json, int, float, number)
- numbers with a semantic value (NTVtype: month, day, wday, yday, week, hour, minute, second)
- strings with a semantic value (NTVtype: base16, base32, base64, period, duration, jpointer, uri, uriref, iri, iriref, email, regex, hostname, ipv4, ipv6, file, geojson)
- objects with a semantic value (geometry, timearray)
- data with a type from another namespace (e.g. schemaorg type: 'org.propertyID')
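For the first and last cases, the information lost by numpy is carried by the NTVtype string itself: the numpy array for `'int64[kg]'` has dtype int64, and the unit only survives in the type. A small parser shows what such a string carries. It assumes a single optional namespace prefix ending with a dot and an optional extension in square brackets; both the regular expression and `split_ntv_type` are hypothetical, not the JSON-NTV grammar.

```python
import re

# optional namespace prefix ('org.'), base type, optional extension ('[kg]')
NTV_TYPE = re.compile(
    r'(?:(?P<namespace>[^.\[\]]+)\.)?(?P<base>[^.\[\]]+)(?:\[(?P<ext>[^\[\]]+)\])?')


def split_ntv_type(ntv_type):
    """Return (namespace, base, extension) of an NTVtype string (sketch)."""
    match = NTV_TYPE.fullmatch(ntv_type)
    if match is None:
        raise ValueError(f'unexpected NTVtype: {ntv_type!r}')
    return match.group('namespace'), match.group('base'), match.group('ext')


unit_type = split_ntv_type('int64[kg]')        # base kept by numpy, 'kg' kept by Ndarray
schema_type = split_ntv_type('org.propertyID')
```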

```python
examples_json  = [
    ['int64[kg]', [1, 2, 3, 4]],
    ['month', [2, 2], [1, 2, 3, 4]],
    ['json', [1, 'two', {'three': 3}]],
    ['email', ['John Doe <jdoe@mac.example>', 'Anna Doe <adoe@mac.example>']]]

for ex_json in examples_json:
    nda = Ndarray.read_json(ex_json)
    equal = nda == Ndarray.read_json(nda.to_json())
    print('reversibility : ', equal, ', Numpy entity : ', repr(nda.ndarray))
```

```
reversibility :  True , Numpy entity :  array([1, 2, 3, 4], dtype=int64)
reversibility :  True , Numpy entity :  array([[1, 2],
       [3, 4]])
reversibility :  True , Numpy entity :  array([1, 'two', {'three': 3}], dtype=object)
reversibility :  True , Numpy entity :  array(['John Doe <jdoe@mac.example>', 'Anna Doe <adoe@mac.example>'],
      dtype='<U27')
```


### Xndarray
Xndarray is the entity associated with the `xndarray` NTVtype.

The `ndarray` entity included is a Numpy ndarray.


#### Metadata

```python
from ntv_numpy import Xndarray

examples_json  = [
    {'unit': 'kg'},
    {'meta': {'everything': 1}}]

for ex_json in examples_json:
    xnda = Xndarray.read_json(ex_json)
    equal = xnda == Xndarray.read_json(xnda.to_json())
    print('reversibility : ', equal, ', Xndarray entity : ', xnda.to_json(header=False))
```

```
reversibility :  True , Xndarray entity :  {'unit': 'kg'}
reversibility :  True , Xndarray entity :  {'meta': {'everything': 1}}
```


#### Simple ndarray

```python
examples  = [Ndarray(np.array([1, 'two', {'three': 3}],  dtype='object'), ntv_type='json'),
             Ndarray(np.array([1, 2, 3, 4],              dtype='object'), ntv_type='int64[kg]'),
             Ndarray(np.array([1, 2, 3, 4],              dtype='object'), ntv_type='month'),
             Ndarray(np.array([[1, 2], [3, 4]],          dtype='object'), ntv_type='int'),
             Ndarray(np.array(['1F23', '236A5E'],        dtype='object'), ntv_type='base16'),
             Ndarray(np.array(['P3Y6M4DT12H30M5S'],      dtype='object'), ntv_type='duration'),
             Ndarray(np.array(['geo:13.4125,103.86673'], dtype='object'), ntv_type='uri'),
             Ndarray(np.array(['192.168.1.1'],           dtype='object'), ntv_type='ipv4'),
             Ndarray(np.array(['John Doe <jdoe@mac.example>', 'Ann Doe <adoe@mac.example>'], dtype='object'), ntv_type='email')
            ]

for example in examples:
    xnda = Xndarray('example', example)
    equal = xnda == Xndarray.read_json(xnda.to_json())
    print('reversibility : ', equal, ', Xndarray entity : ', xnda.to_json(header=False))
```

```
reversibility :  True , Xndarray entity :  {'example': [['json', [1, 'two', {'three': 3}]]]}
reversibility :  True , Xndarray entity :  {'example': [['int64[kg]', [1, 2, 3, 4]]]}
reversibility :  True , Xndarray entity :  {'example': [['month', [1, 2, 3, 4]]]}
reversibility :  True , Xndarray entity :  {'example': [['int', [2, 2], [1, 2, 3, 4]]]}
reversibility :  True , Xndarray entity :  {'example': [['base16', [b'1F23', b'236A5E']]]}
reversibility :  True , Xndarray entity :  {'example': [['duration', ['P3Y6M4DT12H30M5S']]]}
reversibility :  True , Xndarray entity :  {'example': [['uri', ['geo:13.4125,103.86673']]]}
reversibility :  True , Xndarray entity :  {'example': [['ipv4', ['192.168.1.1']]]}
reversibility :  True , Xndarray entity :  {'example': [['email', ['John Doe <jdoe@mac.example>', 'Ann Doe <adoe@mac.example>']]]}
```


#### Named-ndarray

```python
xnda = Xndarray('example', np.array(['x1', 'x2']))
equal = xnda == Xndarray.read_json(xnda.to_json())

print('reversibility : ', equal, ', Xndarray entity : ', xnda.to_json(header=False))
```

```
reversibility :  True , Xndarray entity :  {'example': [['string', ['x1', 'x2']]]}
```


#### Additional-ndarray

```python
xnda = Xndarray('x.mask', np.array([True, False]))
equal = xnda == Xndarray.read_json(xnda.to_json())

print('reversibility : ', equal, ', Xndarray entity : ', xnda.to_json(header=False))
```

```
reversibility :  True , Xndarray entity :  {'x.mask': [['boolean', [True, False]]]}
```


#### Variable

```python
xnda = Xndarray('var2', Ndarray(np.array([10.1, 0.4, 3.4, 8.2]).reshape([2, 2]), ntv_type='float[kg]'), links = ['x', 'y'])
equal = xnda == Xndarray.read_json(xnda.to_json())

print('reversibility : ', equal, ', Xndarray entity : ', xnda.to_json(header=False))
```

```
reversibility :  True , Xndarray entity :  {'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']]}
```


#### URI Variable

```python
xnda = Xndarray('var1', Ndarray('https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv'), links = ['x', 'y'])
equal = xnda == Xndarray.read_json(xnda.to_json())

print('reversibility : ', equal, ', Xndarray entity : ', xnda.to_json(header=False))
```

```
reversibility :  True , Xndarray entity :  {'var1': [['https://raw.githubusercontent.com/loco-philippe/ntv-numpy/master/example/ex_ndarray.ntv'], ['x', 'y']]}
```


### Xdataset
Xdataset is the entity associated with the `xdataset` NTVtype.

The `xndarray` entities included are Xndarray entities.


#### Simple example

```python
from ntv_numpy import Xdataset

xn1 = Xndarray.read_json({'x': [['string', ['x1', 'x2']]]})
xn2 = Xndarray.read_json({'z': [['string', ['z1', 'z2']], ['x']]})

xd = Xdataset([xn1, xn2])

print('reversibility : ', Xdataset.read_json(xd.to_json()) == xd, ', Xdataset entity : ', xd.to_json(header=False))
```

```
reversibility :  True , Xdataset entity :  {'x': [['string', ['x1', 'x2']]], 'z': [['string', ['z1', 'z2']], ['x']]}
```


#### Complete example

```python
from pprint import pprint

example = {'test': {
                'var1': [['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv'],
                         ['x', 'y']],
                'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']],
                'ranking': [[[2, 2], [1, 2, 3, 4]], ['var2']],
                'x': [[['x1', 'x2']], {'test': 21}],
                'y': [[['y1', 'y2']]],
                'z': [[['z1', 'z2']], ['x']],
                'z_bis': [[['z1_bis', 'z2_bis']]],
                'x.mask': [[[True, False]], ['x']],
                'x.variance': [[[0.1, 0.2]], ['x']],
                'z.variance': [[[0.1, 0.2]], ['x']],
                'unit': 'kg',
                'info': {'example': 'everything'}}}
xd = Xdataset.read_json(example)

pprint(xd.to_json())
print("\nreversibility : ", Xdataset.read_json(xd.to_json()) == xd)
```

```
{'test:xdataset': {'info': {'example': 'everything'},
                   'ranking': [['int', [2, 2], [1, 2, 3, 4]], ['var2']],
                   'unit': 'kg',
                   'var1': [['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv'],
                            ['x', 'y']],
                   'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]],
                            ['x', 'y']],
                   'x': [['string', ['x1', 'x2']], {'test': 21}],
                   'x.mask': [['boolean', [True, False]], ['x']],
                   'x.variance': [['float64', [0.1, 0.2]], ['x']],
                   'y': [['string', ['y1', 'y2']]],
                   'z': [['string', ['z1', 'z2']], ['x']],
                   'z.variance': [['float64', [0.1, 0.2]], ['x']],
                   'z_bis': [['string', ['z1_bis', 'z2_bis']]]}}

reversibility :  True
```


#### Analysis
The `info` property returns a dict with the properties of an Xdataset:
- name : name of the Xdataset
- xtype : meta (metadata only), mono (a single data variable), multi (several data variables), group (collection of Xndarray)
- data_vars : names of the data-variable Xndarray
- data_arrays : names of the data-array Xndarray
- dimensions : names of the dimension Xndarray
- coordinates : names of the coordinate Xndarray
- additionals : names of the additional Xndarray
- metadata : names of the metadata
- validity : undefined (the xtype of an Xndarray is 'relative' or 'inconsistent'), inconsistent (the 'shape' or 'links' of a variable is inconsistent), valid (other cases)
- width : number of Xndarray included
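The rules behind these categories are not detailed here. The sketch below is a hypothetical, simplified re-derivation using plain dictionaries. It reproduces the `data_vars`, `data_arrays`, `dimensions`, `coordinates`, `additionals`, `metadata` and `width` values displayed below for the complete example. However, its rules are assumptions inferred from that output, not the ntv-numpy implementation:

- non-list members are metadata;
- dotted names are additionals;
- an unlinked array that is referenced by a link is a dimension, otherwise a data array;
- a linked array is a data variable when its links are exactly the dimensions, and a coordinate otherwise.

`xtype` and `validity` are not computed.

```python
def xndarray_links(value):
    """Return the links of an Xndarray JSON value ([] when there are none).

    Assumed layout, as in this notebook: [ndarray, optional links, optional metadata]
    """
    for item in value[1:]:
        if isinstance(item, list):
            return item
    return []


def classify(dataset):
    """Sort the members of an Xdataset JSON dict into the categories of `info`."""
    arrays = {k: v for k, v in dataset.items() if isinstance(v, list)}
    # members that are not lists ('unit', 'info') are treated as metadata
    metadata = sorted(k for k in dataset if k not in arrays)
    # dotted names ('x.mask') are treated as additionals of another Xndarray
    additionals = sorted(k for k in arrays if '.' in k)
    links = {k: xndarray_links(v) for k, v in arrays.items() if '.' not in k}
    unlinked = [k for k, lnk in links.items() if not lnk]
    referenced = {name for lnk in links.values() for name in lnk}
    dimensions = sorted(k for k in unlinked if k in referenced)
    dims = set(dimensions)
    return {
        'data_vars': sorted(k for k, lnk in links.items() if lnk and set(lnk) == dims),
        'data_arrays': sorted(k for k in unlinked if k not in referenced),
        'dimensions': dimensions,
        'coordinates': sorted(k for k, lnk in links.items() if lnk and set(lnk) != dims),
        'additionals': additionals,
        'metadata': metadata,
        'width': len(dataset),
    }


test = {
    'var1': [['https://github.com/loco-philippe/ntv-numpy/tree/main/example/ex_ndarray.ntv'], ['x', 'y']],
    'var2': [['float[kg]', [2, 2], [10.1, 0.4, 3.4, 8.2]], ['x', 'y']],
    'ranking': [[[2, 2], [1, 2, 3, 4]], ['var2']],
    'x': [[['x1', 'x2']], {'test': 21}],
    'y': [[['y1', 'y2']]],
    'z': [[['z1', 'z2']], ['x']],
    'z_bis': [[['z1_bis', 'z2_bis']]],
    'x.mask': [[[True, False]], ['x']],
    'x.variance': [[[0.1, 0.2]], ['x']],
    'z.variance': [[[0.1, 0.2]], ['x']],
    'unit': 'kg',
    'info': {'example': 'everything'},
}
print(classify(test))
```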

In [23]:
xd.info

{'name': 'test',
 'xtype': 'group',
 'data_vars': ['var1', 'var2'],
 'data_arrays': ['z_bis'],
 'dimensions': ['x', 'y'],
 'coordinates': ['ranking', 'z'],
 'additionals': ['x.mask', 'x.variance', 'z.variance'],
 'metadata': ['info', 'unit'],
 'validity': 'undefined',
 'width': 12}

In [24]:
del xd['var1']
xd.info

{'name': 'test',
 'xtype': 'mono',
 'data_vars': ['var2'],
 'data_arrays': ['z_bis'],
 'dimensions': ['x', 'y'],
 'coordinates': ['ranking', 'z'],
 'additionals': ['x.mask', 'x.variance', 'z.variance'],
 'metadata': ['info', 'unit'],
 'validity': 'valid',
 'length': 4,
 'width': 11}

## Appendix - Using in NTV data
The NTV format makes it possible to group data of different types in a single structure or in a specific structure ([see NTV overview](https://nbviewer.org/github/loco-philippe/NTV/blob/main/example/example_ntv.ipynb)).
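Each member of such a structure carries its own type in its key, after the `:` separator, so values of different types can sit side by side. To illustrate, the JSON printed by the mixed-data cell below can be read back with the standard `json` module. `split_key` and `members` are hypothetical helpers, not part of the json-ntv API. Splitting on the first `:` is a simplifying assumption that ignores the `::` form.

```python
import json

# JSON text copied from the output of the "mixed data" cell of this notebook
mixed_json = ('{"mixed": {":xndarray": {"example:xndarray": [["string", ["x1", "x2"]]]}, '
              '"coordinate:point": [1.0, 2.0], "pandas series:field": [1, 2, 3]}}')


def split_key(key):
    """Split an NTV member key 'name:type' into (name, type).

    Simplified sketch: the type is whatever follows the first ':'
    (None when there is no ':'); the '::' form is not handled here.
    """
    name, sep, ntv_type = key.partition(':')
    return name, (ntv_type if sep else None)


def members(obj):
    """List (name, type) for each member of a one-member NTV object."""
    (_, content), = obj.items()
    return [split_key(key) for key in content]


print(members(json.loads(mixed_json)))
```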

### Including in NTV structure

In [25]:
import numpy as np
from json_ntv import NtvList, NtvSingle

nd = np.array([1, 2, 3, 4])
xn1 = Xndarray('example', nda=np.array(['x1', 'x2']))
xn2 = Xndarray.read_json({'z': [['string', ['z1', 'z2']], ['x']]})
xd = Xdataset([xn1, xn2], 'test')

print('example NTVsingle\n')
print(NtvSingle(nd))

example NTVsingle

{":narray": ["int32", [1, 2, 3, 4]]}


In [26]:
print('example NTVlist\n')
print(NtvList([nd, xn1, xd]))

example NTVlist

{":narray": ["int32", [1, 2, 3, 4]], ":xndarray": {"example:xndarray": [["string", ["x1", "x2"]]]}, ":xdataset": {"test:xdataset": {"example": [["string", ["x1", "x2"]]], "z": [["string", ["z1", "z2"]], ["x"]]}}}


In [27]:
import pandas as pd
from shapely.geometry import Point
from json_ntv import Ntv

print('example with mixed data (Json representation)\n')
mixte = Ntv.obj({'mixed': [xn1, {'coordinate': Point(1, 2)}, {'pandas series': pd.Series([1, 2, 3])}]})
print(mixte)

example with mixed data (Json representation)

{"mixed": {":xndarray": {"example:xndarray": [["string", ["x1", "x2"]]]}, "coordinate:point": [1.0, 2.0], "pandas series:field": [1, 2, 3]}}


In [28]:
print('example with mixed data (object representation)\n')
pprint(mixte.to_obj(format='obj'), width=20)

example with mixed data (object representation)

{'example:xndarray': [['string', ['x1', 'x2']]]}
{'mixed': [Xndarray[example],
           {'coordinate': <POINT (1 2)>},
           {'pandas series': 0    1
1    2
2    3
dtype: int64}]}


### Including in other objects

In [29]:
sr = pd.Series([1, 2, nd, xn1])
mixin = Ntv.obj({'mixin': sr})
print(mixin)

{"mixin:field": [1, 2, {":narray": ["int32", [1, 2, 3, 4]]}, {":xndarray": {"example:xndarray": [["string", ["x1", "x2"]]]}}]}


## Appendix - Equivalence with tabular format
Conversion between the tabular and multidimensional formats is simple and lossless.

We can therefore share data between a tabular tool and a multidimensional tool via this format.
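The equivalence rests on a simple rule. Each axis of the array becomes a column of indices, and the array values become a data column. The sketch below uses only the standard library to reproduce the compact columns printed by `to_json_tab` in the next cell (column types omitted). The reading of `[values, [repeat]]` is an assumption inferred from that output and from the DataFrame display: each value is repeated `repeat` times, then the block is cycled. `to_tab` and `expand` are hypothetical helpers.

```python
from math import prod


def to_tab(values, shape):
    """Convert a flat row-major array and its shape into tabular columns.

    Hypothetical sketch of the layout printed by `to_json_tab` in this notebook:
    'data' holds the values, each 'dim_i' holds [axis values, [repeat]], where
    repeat is the number of consecutive rows sharing one value of axis i.
    """
    if prod(shape) != len(values):
        raise ValueError("shape does not match the number of values")
    tab = {'data': list(values)}
    for i, length in enumerate(shape):
        tab[f'dim_{i}'] = [list(range(length)), [prod(shape[i + 1:])]]
    return tab


def expand(dim, length):
    """Expand a compact [axis values, [repeat]] column to its full length."""
    axis_values, (repeat,) = dim
    # repeat each value, then cycle the block until the column is complete
    block = [v for v in axis_values for _ in range(repeat)]
    return [block[k % len(block)] for k in range(length)]


tab = to_tab([10, 10, 20, 10, 30, 50], (2, 3))
print(tab)
print(expand(tab['dim_0'], 6), expand(tab['dim_1'], 6))
```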

### Format conversion

In [30]:
from numpy_ntv_connector import to_json_tab, read_json_tab

example = np.array([10, 10, 20, 10, 30, 50]).astype('int64').reshape((2, 3))
print("example without axes :\n")
pprint(to_json_tab(example))

"""print("\nexample with axes :\n")
pprint(to_json(a, add=add), width=150)
pprint(to_json_tab(a, add), width=150)"""

example without axes :

{':tab': {'data::int64': [10, 10, 20, 10, 30, 50],
          'dim_0': [[0, 1], [3]],
          'dim_1': [[0, 1, 2], [1]]}}


'print("\nexample with axes :\n")\npprint(to_json(a, add=add), width=150)\npprint(to_json_tab(a, add), width=150)'

### Compatibility with tabular tools

In [31]:
import ntv_pandas as npd

print('pandas DataFrame :')
ex_df = npd.read_json(to_json_tab(example))
display(ex_df)

"""a_df = npd.read_json(to_json_tab(a, add))
display(a_df)
print('dtypes:\n' + str(a_df.dtypes))"""

pandas DataFrame :


|   | dim_0 | dim_1 | data::int64 |
|---|-------|-------|-------------|
| 0 | 0 | 0 | 10 |
| 1 | 0 | 1 | 10 |
| 2 | 0 | 2 | 20 |
| 3 | 1 | 0 | 10 |
| 4 | 1 | 1 | 30 |
| 5 | 1 | 2 | 50 |


"a_df = npd.read_json(to_json_tab(a, add))\ndisplay(a_df)\nprint('dtypes:\n' + str(a_df.dtypes))"

### Tabular data to multidimensional data
Tabular data can be analysed to identify the dimensions and the fields associated with each axis.

'''analys = a_df.npd.analysis(True)
print('dimension:\n', analys.dimension)
partition = analys.field_partition(mode='id')
print('\npartition:\n', partition)'''
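TAB-analysis derives the dimensions from a richer analysis of the relationships between fields, and its algorithm is not reproduced here. A much simpler stand-in is sketched below. A set of fields can serve as axes when every combination of their values appears exactly once, so that the number of rows equals the product of their distinct counts. `is_full_product` and `find_axes` are hypothetical helpers. The brute-force search is only practical for a few columns.

```python
from itertools import combinations
from math import prod


def is_full_product(columns, fields):
    """True if the fields form a complete, duplicate-free cross product."""
    n_rows = len(next(iter(columns.values())))
    counts = [len(set(columns[f])) for f in fields]
    combos = set(zip(*(columns[f] for f in fields)))
    return len(combos) == n_rows == prod(counts)


def find_axes(columns):
    """Return the smallest set of fields that indexes the table as a grid."""
    names = list(columns)
    for size in range(1, len(names) + 1):
        for fields in combinations(names, size):
            if is_full_product(columns, fields):
                return list(fields)
    return None


# columns of the DataFrame displayed above (column types omitted)
columns = {'dim_0': [0, 0, 0, 1, 1, 1],
           'dim_1': [0, 1, 2, 0, 1, 2],
           'data': [10, 10, 20, 10, 30, 50]}
print(find_axes(columns))
```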

The 'primary' fields are converted into axes.

The shape is deduced from the lengths of the axes (categorical format).

'''a_df_sort = a_df.sort_values(partition['primary'])
a3, add3 = read_json_tab(a_df_sort.npd.to_json(header=False, index=False))

print("\nreversibility :\n")
print(np.array_equal(a3, a), Ntv.obj(add3) == Ntv.obj(add))'''
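The last two steps can be sketched without pandas or NumPy. First, the rows are sorted on the primary fields. Then the shape is deduced from the number of distinct values of each axis. `tab_to_array` and `to_nested` are hypothetical helpers. The sketch assumes that the axis fields form a complete grid, with each combination appearing exactly once, and that the rows start in an arbitrary order.

```python
from math import prod


def to_nested(flat, shape):
    """Reshape a flat row-major list into nested lists of the given shape."""
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])
    return [to_nested(flat[k * step:(k + 1) * step], shape[1:]) for k in range(shape[0])]


def tab_to_array(columns, axes, data='data'):
    """Rebuild a nested array from tabular columns and their axis fields."""
    # sort the rows on the axis fields so that the data ends up in row-major order
    rows = sorted(zip(*(columns[a] for a in axes), columns[data]))
    # the shape is given by the number of distinct values of each axis
    shape = [len(set(columns[a])) for a in axes]
    return to_nested([row[-1] for row in rows], shape)


# the table displayed above, with its rows shuffled
shuffled = {'dim_0': [1, 0, 1, 0, 1, 0],
            'dim_1': [2, 0, 0, 2, 1, 1],
            'data': [50, 10, 10, 20, 30, 10]}
print(tab_to_array(shuffled, ['dim_0', 'dim_1']))
```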

## Appendix - Astropy specific points
This appendix presents some points related to the Astropy data structures that can be integrated into the JSON-NTV format.

### Units and quantities
- 'unit' is a specific type
- three options are available for quantities:

    - option 1 : add specific types that include the unit
    - option 2 : add the unit as a type extension of existing types
    - option 3 : include the unit in the name

- Option 2 is retained 

    - This option can be extended to other usages. For example:
    
    ```
    {"comment:string[fr]": "Paris est une belle ville"}
    ```

    - This option is compatible with the NTV structure. For example:
    
    ```
    {"list_of_ndarray::ndarray[kg]": { "array1": [1, 2, 3, 4], "array2": [5, 6, 7, 8]}}
    ```
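With option 2, a type such as `float[m/s]` is read as a base type followed by an extension in brackets. In a `name::type` list, the type applies to every member: the output of `ntv[0]` further below shows `array1:ndarray[kg]`. A minimal sketch with the `re` module follows. `parse_type` and `expand_list` are hypothetical helpers, not json-ntv functions, and the regular expression assumes a single, non-nested bracket.

```python
import re

# 'type[extension]', e.g. 'float[m/s]', 'string[fr]', 'ndarray[kg]'
TYPE_PATTERN = re.compile(r'^(?P<base>[^\[\]]+)(?:\[(?P<ext>[^\[\]]*)\])?$')


def parse_type(ntv_type):
    """Split an NTV type into (base type, extension or None)."""
    match = TYPE_PATTERN.match(ntv_type)
    if match is None:
        raise ValueError(f"unexpected type: {ntv_type}")
    return match['base'], match['ext']


def expand_list(key, members):
    """Give each member of a 'name::type' list the shared type explicitly."""
    _, sep, ntv_type = key.partition('::')
    if not sep:
        raise ValueError("expected a 'name::type' key")
    return {f'{member}:{ntv_type}': value for member, value in members.items()}


print(parse_type('float[m/s]'))
print(parse_type('string[fr]'))
print(expand_list('list_of_ndarray::ndarray[kg]',
                  {'array1': [1, 2, 3, 4], 'array2': [5, 6, 7, 8]}))
```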

#### Unit as ntv_type

In [32]:
NtvSingle('kg', 'mass', 'unit')

{"mass:unit": "kg"}

#### Quantities

In [33]:
examples = [
    Ndarray([2., 2.5, 3., 3.5, 4., 4.5, 5.], ntv_type='float[m/s]'),
    Xndarray('example', Ndarray([2., 2.5, 3., 3.5, 4., 4.5, 5.], ntv_type='float[m/s]'))
]
for example in examples:
    print(example.to_json(header=False))
    print(example.to_json())

['float[m/s]', [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]]
{':ndarray': ['float[m/s]', [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]]}
{'example': [['float[m/s]', [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]]]}
{'example:xndarray': [['float[m/s]', [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]]]}


In [34]:
ntv = Ntv.obj({"list_of_ndarray::ndarray[kg]": { "array1": [[2, 2], [1, 2, 3, 4]], "array2": [[2, 2], [5, 6, 7, 8]]}})

print('json representation :\n', ntv[0])
print('\njson representation of a list :\n', ntv)
print('\nNdarray representation :\n', ntv[0].to_obj(format='obj', type=True))

json representation :
 {"array1:ndarray[kg]": [[2, 2], [1, 2, 3, 4]]}

json representation of a list :
 {"list_of_ndarray::ndarray[kg]": {"array1": [[2, 2], [1, 2, 3, 4]], "array2": [[2, 2], [5, 6, 7, 8]]}}

Ndarray representation :
 {'array1:ndarray[kg]': Ndarray(int, [2, 2])}


### Coordinates
The existing 'point' type (as well as the other types: pointstr, line, polygon, multipolygon, box, ...) can be used for coordinate objects, possibly with a type extension. For example:

```
{":point[icrs]": [10.625, 41.2]}
```

### Table
The `Table` or `QTable` object is represented using the `tab` format dedicated to tabular structures.

```
{"ex_table:tab": {"x:string": ["x1", "x1", "x2", "x2"],
                  "y:string": ["y1", "y2", "y1", "y2"],
                  "value:float64[kg]": [1.0, 2.0, 3.0, 4.0],
                  "meta": "everything"}}
```

This ensures interoperability with tabular tools (e.g. pandas).


### Other structures

Other structures have not been examined. They can be integrated using the following tools:

- Adding new types: for cross-cutting types
- Adding types via a namespace: for types associated with specific data (e.g. `astro.xxx`)
- Defining an imposed structure for a given type
- Using specific type extensions
