# Xarray - Pandas converter
---------------------------

## Simple example

This first example is taken from the Xarray user guide (section ["working with pandas"](https://docs.xarray.dev/en/stable/user-guide/pandas.html)), with additional `attrs` metadata.

### Xarray interface

In [1]:
import numpy as np
import pandas as pd
import xarray as xr

import ntv_pandas as npd
import ntv_numpy as nnp

ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.randn(2, 3))},
    coords={
        "x": [10, 20],
        "y": ["a", "b", "c"],
        "along_x": ("x", np.random.randn(2)),
        "scalar": 123,
    },
    attrs={"example": "Xarray user-guide"}
)
ds

In [2]:
df = ds.to_dataframe()
df

| x | y | foo | along_x | scalar |
|---|---|---|---|---|
| 10 | a | 0.660686 | -1.701165 | 123 |
| 10 | b | 0.473133 | -1.701165 | 123 |
| 10 | c | 0.887423 | -1.701165 | 123 |
| 20 | a | 0.456966 | 0.777313 | 123 |
| 20 | b | -1.205474 | 0.777313 | 123 |
| 20 | c | 0.645359 | 0.777313 | 123 |


In [3]:
xr.Dataset.from_dataframe(df)

This example shows that the conversion is not reversible (the roundtrip is lossy) and that the size of the ``Dataset`` increases.

In particular, the following deviations are observed after a roundtrip:

- a non-dimension Dataset ``coordinate`` is converted into a ``variable``
- a non-dimension DataArray ``coordinate`` is not converted
- the ``dtype`` is not always preserved (e.g. "str" is converted to "object")
- the ``attrs`` metadata is not converted
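
Some of the deviations listed above can be checked directly with plain xarray calls. The following is a minimal sketch that rebuilds the same Dataset; the variable name `roundtrip` is introduced here for illustration only and is not part of the original notebook.

```python
import numpy as np
import xarray as xr

# Same Dataset as in the example above.
ds = xr.Dataset(
    {"foo": (("x", "y"), np.random.randn(2, 3))},
    coords={
        "x": [10, 20],
        "y": ["a", "b", "c"],
        "along_x": ("x", np.random.randn(2)),
        "scalar": 123,
    },
    attrs={"example": "Xarray user-guide"},
)

# Dataset -> DataFrame -> Dataset using the built-in xarray interface.
roundtrip = xr.Dataset.from_dataframe(ds.to_dataframe())

print(ds.identical(roundtrip))           # False: the roundtrip is lossy
print("along_x" in roundtrip.data_vars)  # True: the coordinate became a variable
print(roundtrip["along_x"].dims)         # ('x', 'y'): broadcast over both dimensions
print(roundtrip.attrs)                   # {}: the metadata is dropped
print(ds.nbytes, roundtrip.nbytes)       # size before and after the roundtrip
```

Because `along_x` and `scalar` are broadcast over both dimensions, the Dataset obtained after the roundtrip stores more values than the original one.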

### ntv_pandas converter : Dataset -> DataFrame

Three options are available:

- **ntv_type**: Boolean (default True) - if False, the `ntv_type` is not included in the column names
- **info**: Boolean (default True) - if True, `DataFrame.attrs` contains the multidimensional structure
- **index**: Boolean (default True) - if True, dimensions are converted into `indexes`

In [4]:
df_min = ds.nnp.to_dataframe(ntv_type=False, info=False, index=False)
df_min

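|   | x | y | along_x | foo | scalar |
|---|---|---|---|---|---|
| 0 | 10 | a | -1.701165 | 0.660686 | 123 |
| 1 | 10 | b | -1.701165 | 0.473133 | 123 |
| 2 | 10 | c | -1.701165 | 0.887423 | 123 |
| 3 | 20 | a | 0.777313 | 0.456966 | 123 |
| 4 | 20 | b | 0.777313 | -1.205474 | 123 |
| 5 | 20 | c | 0.777313 | 0.645359 | 123 |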


In [5]:
df_full = ds.nnp.to_dataframe()
df_full

| x:int32 | y:string | along_x:float64 | foo:float64 | scalar:int32 |
|---|---|---|---|---|
| 10 | a | -1.701165 | 0.660686 | 123 |
| 10 | b | -1.701165 | 0.473133 | 123 |
| 10 | c | -1.701165 | 0.887423 | 123 |
| 20 | a | 0.777313 | 0.456966 | 123 |
| 20 | b | 0.777313 | -1.205474 | 123 |
| 20 | c | 0.777313 | 0.645359 | 123 |


### ntv_pandas converter : DataFrame -> Dataset

The conversion is lossless: the multidimensional structure is either read from `DataFrame.attrs` or inferred from the tabular data, where it is hidden by the flat layout.
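
To illustrate what "finding the multidimensional structure" can mean, here is a minimal sketch in plain pandas. It is not the algorithm used by `ntv_pandas`; the helpers `determines` and `infer_dims` are hypothetical, and the sketch assumes the dimension columns (`x`, `y`) are already known. For each remaining column it looks for the smallest set of dimensions that fully determines the column values.

```python
from itertools import combinations

import pandas as pd

# Tabular data equivalent to `df_min` above.
df = pd.DataFrame({
    "x": [10, 10, 10, 20, 20, 20],
    "y": ["a", "b", "c", "a", "b", "c"],
    "along_x": [-1.701165] * 3 + [0.777313] * 3,
    "foo": [0.660686, 0.473133, 0.887423, 0.456966, -1.205474, 0.645359],
    "scalar": [123] * 6,
})


def determines(df, subset, col):
    """Hypothetical helper: True if the columns in `subset` fix the value of `col`."""
    if not subset:
        # No dimension at all: the column must be constant.
        return df[col].nunique() == 1
    return bool((df.groupby(list(subset))[col].nunique() == 1).all())


def infer_dims(df, dims):
    """Hypothetical helper: map each non-dimension column to the smallest
    tuple of dimensions that determines its values (None if there is none)."""
    result = {}
    for col in (c for c in df.columns if c not in dims):
        result[col] = None
        # Try the empty set first, then single dimensions, then pairs, ...
        for size in range(len(dims) + 1):
            matches = [s for s in combinations(dims, size) if determines(df, s, col)]
            if matches:
                result[col] = matches[0]
                break
    return result


print(infer_dims(df, ["x", "y"]))
# {'along_x': ('x',), 'foo': ('x', 'y'), 'scalar': ()}
```

In this sketch `scalar` is recognised as a constant, `along_x` as a coordinate attached to `x` only, and `foo` as a variable over both dimensions, which matches the structure of the original Dataset.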

Three options are available:

- **dims**: list of strings (default None) - order of dimensions to apply
- **dataset**: Boolean (default True) - if False and there is a single data_var, return an `xr.DataArray`
- **info**: Boolean (default True) - if True, use `DataFrame.attrs`

In [6]:
ds_min = df_min.npd.to_xarray()
ds_min

In [7]:
ds_full = df_full.npd.to_xarray()
ds_full

Note:

- The multidimensional structure is preserved in both cases (`ds_min` and `ds_full`)
- The `dtype` is preserved in both cases