# Analysis: Example

----
## Goal
- Show, on a simple example, the main uses of the analysis module for tabular data.

## Presentation of the example
Let's take the example of a table containing the price of some fruits and vegetables.

|product|plants   |plts|quantity|price|price level|group   |id  |supplier|location|valid|
|:-----:|:-------:|:--:|:------:|:---:|:---------:|:------:|:--:|:------:|:------:|:---:|
|apple  |fruit    |fr  |1 kg    |1    |low        |fruit 1 |1001|sup1    |fr      |ok   |
|apple  |fruit    |fr  |10 kg   |10   |low        |fruit 10|1002|sup1    |gb      |ok   |
|orange |fruit    |fr  |1 kg    |2    |high       |fruit 1 |1003|sup1    |es      |ok   |
|orange |fruit    |fr  |10 kg   |20   |high       |veget   |1004|sup2    |ch      |ok   |
|peppers|vegetable|ve  |1 kg    |1.5  |low        |veget   |1005|sup2    |gb      |ok   |
|peppers|vegetable|ve  |10 kg   |15   |low        |veget   |1006|sup2    |fr      |ok   |
|carrot |vegetable|ve  |1 kg    |1.5  |high       |veget   |1007|sup2    |es      |ok   |
|carrot |vegetable|ve  |10 kg   |20   |high       |veget   |1008|sup1    |ch      |ok   |


The price depends on the product and on the packaging (1 kg or 10 kg).

In [1]:
fruits = {'plants':      ['fruit', 'fruit', 'fruit', 'fruit', 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
          'plts':        ['fr', 'fr', 'fr', 'fr', 've', 've', 've', 've'], 
          'quantity':    ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg'],
          'product':     ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'carrot', 'carrot'],
          'price':       [1, 10, 2, 20, 1.5, 15, 1.5, 20],
          'price level': ['low', 'low', 'high', 'high', 'low', 'low', 'high', 'high'],
          'group':       ['fruit 1', 'fruit 10', 'fruit 1', 'veget', 'veget', 'veget', 'veget', 'veget'],
          'id':          [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008],
          'supplier':    ["sup1", "sup1", "sup1", "sup2", "sup2", "sup2", "sup2", "sup1"],
          'location':    ["fr", "gb", "es", "ch", "gb", "fr", "es", "ch"],
          'valid':       ["ok", "ok", "ok", "ok", "ok", "ok", "ok", "ok"]}

In [2]:
import pandas as pd
import ntv_pandas as npd

dts = pd.DataFrame(fruits)
adts = dts.npd.analysis(distr=True)

## Partitions
A partition is a minimal list of fields whose combinations of values are all different across the records of the dataset.
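
To make the definition concrete, here is a minimal sketch in plain Python (it does not use `ntv_pandas`). The helper names `combinations_are_distinct` and `is_fully_crossed` are hypothetical, and only a subset of the columns is copied. The second helper encodes an assumption that is not stated in this notebook: that the fields of a partition are also fully crossed, meaning that every possible combination of their values occurs exactly once. This is not the algorithm used by the analysis module.

```python
from math import prod

# Subset of the columns of the `fruits` table (same values, same row order).
fruits = {
    'plants':   ['fruit', 'fruit', 'fruit', 'fruit',
                 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'location': ['fr', 'gb', 'es', 'ch', 'gb', 'fr', 'es', 'ch'],
}


def combinations_are_distinct(data, fields):
    """True if no two rows share the same combination of values for `fields`."""
    rows = list(zip(*(data[field] for field in fields)))
    return len(set(rows)) == len(rows)


def is_fully_crossed(data, fields):
    """True if every possible combination of the field values occurs exactly once.

    Assumption made for this sketch, not a documented rule of the module.
    """
    n_rows = len(data[fields[0]])
    n_possible = prod(len(set(data[field])) for field in fields)
    return combinations_are_distinct(data, fields) and n_possible == n_rows


print(combinations_are_distinct(fruits, ['product', 'quantity']))  # True
print(combinations_are_distinct(fruits, ['plants', 'quantity']))   # False
print(is_fully_crossed(fruits, ['location', 'plants']))            # True
print(is_fully_crossed(fruits, ['location', 'product']))           # False
```

In this table, `['location', 'product']` has distinct combinations but covers only 8 of the 16 possible pairs, which may be why it does not appear among the partitions returned by the module.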

In [3]:
adts.partitions(mode='id')

[['plants', 'price level', 'quantity'],
 ['price level', 'quantity', 'supplier'],
 ['location', 'plants'],
 ['location', 'supplier'],
 ['product', 'quantity'],
 ['id']]

The dimension of a Dataset is the number of fields in its largest partition.

In [4]:
adts.dimension

3
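
As a quick cross-check, the same value can be recomputed by hand from the list of partitions printed above. The sketch below simply copies that list; it does not call the analysis module.

```python
# Partitions as printed by `adts.partitions(mode='id')`.
partitions = [['plants', 'price level', 'quantity'],
              ['price level', 'quantity', 'supplier'],
              ['location', 'plants'],
              ['location', 'supplier'],
              ['product', 'quantity'],
              ['id']]

# The dimension is the length of the longest partition.
dimension = max(len(partition) for partition in partitions)
print(dimension)  # 3
```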

For a given partition, the fields of the Dataset are classified as:
- primary: partition fields
- secondary: fields derived from or coupled to primary fields
- unique: unique fields
- variable: other fields
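
The notions of "derived" and "coupled" can be illustrated with a small plain-Python sketch. It assumes that a field is derived from another when each value of the source field is always paired with a single value of the target field, and that two fields are coupled when this holds in both directions. The helpers `is_derived` and `is_coupled` are hypothetical names and are not part of `ntv_pandas`.

```python
# Three columns of the `fruits` table (same values, same row order).
fruits = {
    'plants':  ['fruit', 'fruit', 'fruit', 'fruit',
                'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'plts':    ['fr', 'fr', 'fr', 'fr', 've', 've', 've', 've'],
    'product': ['apple', 'apple', 'orange', 'orange',
                'peppers', 'peppers', 'carrot', 'carrot'],
}


def is_derived(data, source, target):
    """True if each value of `source` is always paired with one value of `target`."""
    mapping = {}
    for src, tgt in zip(data[source], data[target]):
        # setdefault stores the first target seen for a source value.
        if mapping.setdefault(src, tgt) != tgt:
            return False
    return True


def is_coupled(data, field_a, field_b):
    """True if the two fields determine each other (one-to-one relationship)."""
    return is_derived(data, field_a, field_b) and is_derived(data, field_b, field_a)


print(is_derived(fruits, 'product', 'plants'))  # True: a product belongs to one plant type
print(is_derived(fruits, 'plants', 'product'))  # False: 'fruit' covers two products
print(is_coupled(fruits, 'plants', 'plts'))     # True: 'plts' is an abbreviation of 'plants'
```

Under this reading, `plts` qualifies as secondary whenever `plants` is a primary field, and `plants` qualifies as secondary when `product` is a primary field.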


In [5]:
adts.field_partition(mode='id')  # first partition

{'primary': ['plants', 'quantity', 'price level'],
 'secondary': ['plts'],
 'mixte': ['product'],
 'unique': ['valid'],
 'variable': ['price', 'group', 'id', 'supplier', 'location']}

In [6]:
adts.relation_partition()

{'plants': ['plants'],
 'price level': ['price level'],
 'quantity': ['quantity'],
 'plts': ['plants'],
 'product': ['plants', 'price level'],
 'valid': [],
 'price': ['plants', 'price level', 'quantity'],
 'group': ['plants', 'price level', 'quantity'],
 'id': ['plants', 'price level', 'quantity'],
 'supplier': ['plants', 'price level', 'quantity'],
 'location': ['plants', 'price level', 'quantity']}

In [7]:
adts.field_partition(mode='id', partition=['product', 'quantity'])

{'primary': ['product', 'quantity'],
 'secondary': ['plants', 'plts', 'price level'],
 'mixte': [],
 'unique': ['valid'],
 'variable': ['price', 'group', 'id', 'supplier', 'location']}

In [8]:
adts.relation_partition(partition=['product', 'quantity'])

{'product': ['product'],
 'quantity': ['quantity'],
 'plants': ['product'],
 'plts': ['plants'],
 'price level': ['product'],
 'valid': [],
 'price': ['product', 'quantity'],
 'group': ['product', 'quantity'],
 'id': ['product', 'quantity'],
 'supplier': ['product', 'quantity'],
 'location': ['product', 'quantity']}

## Use of Partitions
Given a partition, a Dataset can be converted into a multi-dimensional entity whose dimensions are the fields of the partition.
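
The idea can be sketched without any library, taking the `['product', 'quantity']` partition as an assumption: since each (product, quantity) pair occurs exactly once, the `price` column can be laid out as a 4 x 2 grid. The helper `axis_values` is a hypothetical name, and this is only an illustration, not how `Xdataset` is implemented.

```python
# Three columns of the `fruits` table (same values, same row order).
fruits = {
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 1.5, 20],
}


def axis_values(values):
    """Distinct values, in order of first appearance."""
    return list(dict.fromkeys(values))


products = axis_values(fruits['product'])     # first dimension
quantities = axis_values(fruits['quantity'])  # second dimension

# One cell per (product, quantity) pair; None would mark a missing combination.
grid = [[None] * len(quantities) for _ in products]
for product, quantity, price in zip(fruits['product'], fruits['quantity'], fruits['price']):
    grid[products.index(product)][quantities.index(quantity)] = price

for product, row in zip(products, grid):
    print(product, row)
# apple [1, 10]
# orange [2, 20]
# peppers [1.5, 15]
# carrot [1.5, 20]
```

No cell is left empty and none is written twice, which is what makes the tabular and the multi-dimensional forms equivalent for this partition.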

In [None]:
from base64 import b64encode
from IPython.display import Image, display

display(Image(url="https://mermaid.ink/img/" + b64encode(open('fruits.mmd', 'r', encoding="utf-8").read().encode("utf-8")).decode("ascii")))

In [10]:
from ntv_numpy import Xdataset

xdt = Xdataset.from_dataframe(dts) # Xdataset is a neutral format
xdt.to_xarray(json_name=False)

In [11]:
df3 = xdt.to_dataframe(json_name=False).reset_index()
df2 = dts.sort_values(adts.partitions(mode='id')[0]).reset_index(drop=True)
df4 = df3.sort_values(adts.partitions(mode='id')[0]).reset_index(drop=True)[df2.columns]

In [12]:
df4.equals(df2)

True

In [13]:
df4

|   |plants   |plts|quantity|product|price|price level|group   |id  |supplier|location|valid|
|:-:|:-------:|:--:|:------:|:-----:|:---:|:---------:|:------:|:--:|:------:|:------:|:---:|
|0  |fruit    |fr  |1 kg    |orange |2.0  |high       |fruit 1 |1003|sup1    |es      |ok   |
|1  |fruit    |fr  |10 kg   |orange |20.0 |high       |veget   |1004|sup2    |ch      |ok   |
|2  |fruit    |fr  |1 kg    |apple  |1.0  |low        |fruit 1 |1001|sup1    |fr      |ok   |
|3  |fruit    |fr  |10 kg   |apple  |10.0 |low        |fruit 10|1002|sup1    |gb      |ok   |
|4  |vegetable|ve  |1 kg    |carrot |1.5  |high       |veget   |1007|sup2    |es      |ok   |
|5  |vegetable|ve  |10 kg   |carrot |20.0 |high       |veget   |1008|sup1    |ch      |ok   |
|6  |vegetable|ve  |1 kg    |peppers|1.5  |low        |veget   |1005|sup2    |gb      |ok   |
|7  |vegetable|ve  |10 kg   |peppers|15.0 |low        |veget   |1006|sup2    |fr      |ok   |


In [14]:
df2

|   |plants   |plts|quantity|product|price|price level|group   |id  |supplier|location|valid|
|:-:|:-------:|:--:|:------:|:-----:|:---:|:---------:|:------:|:--:|:------:|:------:|:---:|
|0  |fruit    |fr  |1 kg    |orange |2.0  |high       |fruit 1 |1003|sup1    |es      |ok   |
|1  |fruit    |fr  |10 kg   |orange |20.0 |high       |veget   |1004|sup2    |ch      |ok   |
|2  |fruit    |fr  |1 kg    |apple  |1.0  |low        |fruit 1 |1001|sup1    |fr      |ok   |
|3  |fruit    |fr  |10 kg   |apple  |10.0 |low        |fruit 10|1002|sup1    |gb      |ok   |
|4  |vegetable|ve  |1 kg    |carrot |1.5  |high       |veget   |1007|sup2    |es      |ok   |
|5  |vegetable|ve  |10 kg   |carrot |20.0 |high       |veget   |1008|sup1    |ch      |ok   |
|6  |vegetable|ve  |1 kg    |peppers|1.5  |low        |veget   |1005|sup2    |gb      |ok   |
|7  |vegetable|ve  |10 kg   |peppers|15.0 |low        |veget   |1006|sup2    |fr      |ok   |


In [15]:
xdt = Xdataset.from_dataframe(dts, dims=['product', 'quantity'])
xdt.to_xarray(idxname=['product', 'quantity'])