# Object: Ilist first example
## Goal
- show, on a simple example, how Ilist can analyze tabular data
--------------------------------

## Presentation of the example
The example illustrates the tool's ability to understand and modify a data structure.

Let's take the example of a CSV file containing the prices of some fruits and vegetables.

|product|plants|quantity|price|
|:----:|:----:|:----:|:----:|
|apple|fruit|1 kg|1|
|apple|fruit|10 kg|10|
|orange|fruit|1 kg|2|
|orange|fruit|10 kg|20|
|peppers|vegetable|1 kg|1.5|
|peppers|vegetable|10 kg|15|
|banana|fruit|1 kg|0.5|
|banana|fruit|10 kg|5|

The price depends on the product and on the packaging (1 kg or 10 kg).


In [1]:
import os
os.chdir('../../ES')
from ilist import Ilist

prices = Ilist.Idic( {'plants':   ['fruit', 'fruit','fruit','fruit','vegetable', 'vegetable', 'fruit', 'fruit'],
                      'quantity': ['1 kg', '10 kg', '1 kg', '10 kg','1 kg', '10 kg','1 kg', '10 kg'], 
                      'product':  ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'banana', 'banana'], 
                      'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5]}, var = 3)

## Matrix transformation
With a single command, we can turn this table into a matrix while keeping the full dataset.

Behind the scenes, this command looks for the columns whose values are "crossed" (product and quantity) and for those that are associated with another column (plants). It then passes this information to a tool suited to the analysis of indexed matrices (e.g. Xarray).


In [2]:
print(prices.to_xarray())

<xarray.DataArray 'Ilist' (quantity: 2, product: 4)>
array([[5, 20, 15, 10],
       [0.5, 2, 1.5, 1]], dtype=object)
Coordinates:
  * quantity      (quantity) object '10 kg' '1 kg'
    quantity_row  (quantity) int32 0 1
    quantity_str  (quantity) object '10 kg' '1 kg'
  * product       (product) object 'banana' 'orange' 'peppers' 'apple'
    product_row   (product) int32 0 1 2 3
    product_str   (product) object 'banana' 'orange' 'peppers' 'apple'
    plants        (product) object 'fruit' 'fruit' 'vegetable' 'fruit'
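
The distinction between "crossed" and "associated" columns can be illustrated without Ilist. The sketch below is only an assumption about what such a test might look like, not the library's actual implementation: the `relationship` helper is hypothetical, and it simply compares the number of distinct value pairs with the number of distinct values in each column.

```python
# Same index columns as the 'prices' example.
columns = {
    "plants":   ["fruit", "fruit", "fruit", "fruit", "vegetable", "vegetable", "fruit", "fruit"],
    "quantity": ["1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg"],
    "product":  ["apple", "apple", "orange", "orange", "peppers", "peppers", "banana", "banana"],
}


def relationship(left, right):
    """Classify how two columns of equal length relate (hypothetical helper)."""
    pairs = set(zip(left, right))
    n_left, n_right = len(set(left)), len(set(right))
    if len(pairs) == n_left * n_right:
        return "crossed"      # every combination of values occurs
    if len(pairs) == max(n_left, n_right):
        return "associated"   # the column with more values determines the other
    return "other"


crossed = relationship(columns["product"], columns["quantity"])
associated = relationship(columns["product"], columns["plants"])
print(crossed, associated)
```

With these data, 'product' and 'quantity' produce all 4 x 2 combinations, whereas each 'product' value appears with a single 'plants' value.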


## Aggregation
We can also imagine that these data were produced by several people and then combined to form the 'prices' object:
- one person for the fruit data,
- one for the vegetable data,
- and another to put everything together.

In [3]:
fruit      = Ilist.Idic({'product':  ['apple', 'apple', 'orange', 'orange', 'banana', 'banana'],
                         'quantity': ['1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg'], 
                         'price':    [1, 10, 2, 20, 0.5, 5]}, var=2)

vegetable  = Ilist.Idic({'product':  ['peppers', 'peppers'],
                         'quantity': ['1 kg', '10 kg'], 
                         'price':    [1.5, 15]}, var=2)
                         
total      = Ilist.Idic({'plants':   ['fruit', 'vegetable'],
                         'price':    [fruit, vegetable]}, var=1)

The 'prices' object is then a representation of the 'total' object.

This approach makes it possible to maintain data traceability and to build aggregation processes in line with business processes.

In [4]:
prices = total.merge(mergeidx=True)
print(prices)

    ["price", [1, 10, 2, 20, 0.5, 5, 1.5, 15]]

    ["plants", ["fruit", "fruit", "fruit", "fruit", "fruit", "fruit", "vegetable", "vegetable"]]
    ["product", ["apple", "apple", "orange", "orange", "banana", "banana", "peppers", "peppers"]]
    ["quantity", ["1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg"]]
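
To make the merge step more concrete, here is a minimal sketch that reproduces the same flattening with plain dictionaries. It is an illustration under stated assumptions, not the way `merge` is implemented in Ilist: the `flatten` helper is hypothetical, and the dictionaries are simple stand-ins for the 'fruit', 'vegetable' and 'total' objects.

```python
# Plain-dict stand-ins for the 'fruit', 'vegetable' and 'total' objects.
fruit = {
    "product":  ["apple", "apple", "orange", "orange", "banana", "banana"],
    "quantity": ["1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg"],
    "price":    [1, 10, 2, 20, 0.5, 5],
}
vegetable = {
    "product":  ["peppers", "peppers"],
    "quantity": ["1 kg", "10 kg"],
    "price":    [1.5, 15],
}
total = {"plants": ["fruit", "vegetable"], "price": [fruit, vegetable]}


def flatten(nested, label, payload):
    """Expand each nested table and repeat its label on every row (hypothetical helper)."""
    flat = {}
    for tag, table in zip(nested[label], nested[payload]):
        n_rows = len(next(iter(table.values())))
        flat.setdefault(label, []).extend([tag] * n_rows)
        for name, values in table.items():
            flat.setdefault(name, []).extend(values)
    return flat


merged = flatten(total, "plants", "price")
print(merged["plants"])
print(merged["price"])
```

Each nested table keeps its own rows, and the 'plants' label of the enclosing table is repeated once per row, which gives the same column contents as the merged 'prices' object.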



## What if...
...there is an error in the recorded data (e.g. a 'vegetable' instead of a 'fruit')?

The Ilist object can no longer tell that 'plants' is associated with 'product' (is 'banana' a 'fruit' or a 'vegetable'?), so 'plants' becomes a third dimension of the matrix.

In [5]:
prices = Ilist.Idic( {'plants':   ['fruit', 'fruit','fruit','fruit','vegetable', 'vegetable', 'vegetable', 'fruit'],
                      'quantity': ['1 kg', '10 kg', '1 kg', '10 kg','1 kg', '10 kg','1 kg', '10 kg'], 
                      'product':  ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'banana', 'banana'], 
                      'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5]}, var = 3)
print(prices.to_xarray())

<xarray.DataArray 'Ilist' (plants: 2, quantity: 2, product: 4)>
array([[['?', '?', 15, '?'],
        [0.5, '?', 1.5, '?']],

       [[5, 20, '?', 10],
        ['?', 2, '?', 1]]], dtype=object)
Coordinates:
  * plants        (plants) object 'vegetable' 'fruit'
    plants_row    (plants) int32 0 1
    plants_str    (plants) object 'vegetable' 'fruit'
  * quantity      (quantity) object '10 kg' '1 kg'
    quantity_row  (quantity) int32 0 1
    quantity_str  (quantity) object '10 kg' '1 kg'
  * product       (product) object 'banana' 'orange' 'peppers' 'apple'
    product_row   (product) int32 0 1 2 3
    product_str   (product) object 'banana' 'orange' 'peppers' 'apple'
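
One way to locate this kind of inconsistency before building the matrix is to list the 'product' values that appear with more than one 'plants' value. The sketch below uses only the standard library and is independent of Ilist; the variable names are illustrative.

```python
from collections import defaultdict

products = ["apple", "apple", "orange", "orange", "peppers", "peppers", "banana", "banana"]
plants = ["fruit", "fruit", "fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit"]

# Collect every 'plants' value seen for each product.
seen = defaultdict(set)
for prod, plant in zip(products, plants):
    seen[prod].add(plant)

# A product is inconsistent when it maps to more than one 'plants' value.
conflicts = {prod: sorted(kinds) for prod, kinds in seen.items() if len(kinds) > 1}
print(conflicts)  # {'banana': ['fruit', 'vegetable']}
```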


## But fortunately...
...there is a solution!

We can require 'plants' to be associated with 'product'. In that case, the Ilist object resolves the conflict by treating 'banana-fruit' and 'banana-vegetable' as two distinct products.


In [6]:
prices.nindex('product').coupling(prices.nindex('plants'))
print(prices.to_xarray())

<xarray.DataArray 'Ilist' (quantity: 2, product: 5)>
array([[5, '?', 10, 20, 15],
       ['?', 0.5, 1, 2, 1.5]], dtype=object)
Coordinates:
  * quantity      (quantity) object '10 kg' '1 kg'
    quantity_row  (quantity) int32 0 1
    quantity_str  (quantity) object '10 kg' '1 kg'
  * product       (product) object 'banana' 'banana' 'apple' 'orange' 'peppers'
    product_row   (product) int32 0 1 2 3 4
    product_str   (product) object 'banana' 'banana' 'apple' 'orange' 'peppers'
    plants        (product) object 'fruit' 'vegetable' ... 'fruit' 'vegetable'


We then recover a two-dimensional matrix like the initial one, except that 'banana' appears twice: with a price for 10 kg of 'banana-fruit' and a price for 1 kg of 'banana-vegetable'.
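
The effect of the coupling can be sketched in plain Python by treating each distinct (product, plants) pair as a single axis value. This is only an illustration of the idea, under the assumption that missing cells are shown as '?'; it does not reproduce the internals of `coupling`, and the axis ordering differs from the Xarray output.

```python
rows = list(zip(
    ["apple", "apple", "orange", "orange", "peppers", "peppers", "banana", "banana"],      # product
    ["fruit", "fruit", "fruit", "fruit", "vegetable", "vegetable", "vegetable", "fruit"],  # plants
    ["1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg", "1 kg", "10 kg"],                  # quantity
    [1, 10, 2, 20, 1.5, 15, 0.5, 5],                                                       # price
))

# Coupling: each distinct (product, plants) pair becomes one value of the product axis.
pairs = sorted({(prod, plant) for prod, plant, _, _ in rows})
quantities = sorted({qty for _, _, qty, _ in rows})

# Look up the price of every (pair, quantity) cell; '?' marks a missing cell.
price_of = {(prod, plant, qty): price for prod, plant, qty, price in rows}
matrix = {
    qty: [price_of.get((prod, plant, qty), "?") for prod, plant in pairs]
    for qty in quantities
}

print(pairs)
for qty, line in matrix.items():
    print(qty, line)
```

The product axis now has five values instead of four, and each 'banana' variant keeps only the price that was actually recorded for it.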

------
## Conclusion

This example demonstrates several points:
- we can build a tabular dataset that follows a business process while guaranteeing the integrity of the data,
- we can automatically analyze tabular data (such as CSV or Excel files) to deduce the type of relationship that links the fields together,
- we can restructure the data of these fields without having to modify them (the operation is reversible),
- we can impose relationships between fields and measure the differences between the specification and the result,
- we can extend the notion of tabular data to complex data (e.g. an Ilist nested inside another Ilist),
- we can interface with data analysis tools (e.g. Xarray).