# Dataset: structure analysis

## Goals

- understand the structure of a Dataset object
- introduce the methods used to manage that structure


-----

## Example

The Dataset reused in this example is:

<img src="https://loco-philippe.github.io/ES/ilist_merge.png" width="700">


In [1]:
from observation import Sdataset
from pprint import pprint

img = Sdataset.from_file('score.il')                # reload the Dataset built in the aggregation notebook

## Relationships

The relationships between the Fields can be displayed from the `img` Dataset:

<img src="https://loco-philippe.github.io/ES/ilist_canonical.png" width="600">


In [2]:
pprint(img.category)
print('\n', img.tree())

{'course': 'secondary',
 'examen': 'secondary',
 'first name': 'secondary',
 'full name': 'secondary',
 'group': 'secondary',
 'last name': 'secondary',
 'score': 'secondary',
 'surname': 'coupled',
 'year': 'unique'}

 -1: root-derived (13)
   0 : score (11)
      1 : course (3)
   2 : year (1)
   3 : examen (3)
   4 : full name (4)
      5 : last name (3)
      6 : first name (3)
      7 : surname (4)
      8 : group (3)
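
The categories and the tree above describe how Fields relate to one another. As a rough illustration only, the sketch below classifies the relationship between two Fields by counting distinct values and distinct pairs of values. The function name `relationship` and its counting rules are assumptions made for this example, not the library's implementation; the sample values are taken from this notebook's data.

```python
def relationship(field_a, field_b):
    """Classify how two equal-length lists of values relate (illustrative only)."""
    if len(field_a) != len(field_b):
        raise ValueError("both fields must have the same number of records")
    n_a = len(set(field_a))                    # distinct values in the first field
    n_b = len(set(field_b))                    # distinct values in the second field
    n_pairs = len(set(zip(field_a, field_b)))  # distinct (a, b) combinations
    if n_a == 1 or n_b == 1:
        return "unique"      # one field holds a single value
    if n_pairs == n_a == n_b:
        return "coupled"     # one-to-one correspondence
    if n_pairs == max(n_a, n_b):
        return "derived"     # one field determines the other
    if n_pairs == n_a * n_b:
        return "crossed"     # every combination is present
    return "linked"          # anything in between


full_name = ["anne white", "philippe white", "philippe black", "camille red"]
surname = ["skyler", "heisenberg", "gus", "saul"]
group = ["gr1", "gr2", "gr3", "gr3"]
year = [2021, 2021, 2021, 2021]
course = ["math", "math", "english", "english"]
examen = ["t1", "t2", "t1", "t2"]

print(relationship(full_name, surname))  # coupled
print(relationship(full_name, group))    # derived
print(relationship(full_name, year))     # unique
print(relationship(course, examen))      # crossed
```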


## Transformation to a complete Dataset

A Dataset is complete when its primary Fields are crossed, that is, when every combination of their values is present. A complete Dataset can be converted into a matrix or stored in a compact file.

In [3]:
img.full(fillvalue=float('nan'), idxname=['full name', 'course', 'examen'])   # convert secondary indexes into primary indexes

pprint(img.category)

print('\nlength :', len(img), 'dimension :', img.dimension)

{'course': 'primary',
 'examen': 'primary',
 'first name': 'secondary',
 'full name': 'primary',
 'group': 'secondary',
 'last name': 'secondary',
 'score': 'variable',
 'surname': 'coupled',
 'year': 'unique'}

length : 36 dimension : 3
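
The length of 36 is the product of the number of distinct values of the three primary Fields (4 full names, 3 courses, 3 exams). The sketch below reproduces that idea in plain Python: it adds one record for every missing combination and fills the variable with `nan`. The function `complete` and the dictionary-based record layout are hypothetical helpers for this illustration, not what `img.full` does internally; the records are a small subset of the notebook's scores.

```python
from itertools import product
from math import isnan, nan


def complete(records, primary, variable, fillvalue=nan):
    """Return one record per combination of the primary field values."""
    # Distinct values of each primary field, in order of first appearance.
    domains = [list(dict.fromkeys(rec[name] for rec in records)) for name in primary]
    # Values already known, indexed by their combination of primary values.
    known = {tuple(rec[name] for name in primary): rec[variable] for rec in records}
    return [
        {**dict(zip(primary, combo)), variable: known.get(combo, fillvalue)}
        for combo in product(*domains)
    ]


records = [
    {"full name": "anne white", "course": "math", "examen": "t1", "score": 11},
    {"full name": "philippe white", "course": "math", "examen": "t1", "score": 15},
    {"full name": "camille red", "course": "software", "examen": "t2", "score": 18},
    {"full name": "philippe black", "course": "english", "examen": "t1", "score": 6},
    {"full name": "philippe black", "course": "software", "examen": "t3", "score": 18},
]

full = complete(records, ["full name", "course", "examen"], "score")
print(len(full))                                      # 36 = 4 * 3 * 3
print(sum(not isnan(rec["score"]) for rec in full))   # 5 known scores remain
```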


## Indexed matrix

By adjusting codecs or values, a Dataset can be transformed into a matrix with a chosen dimension.

In [4]:
print(img.to_xarray())
print('\nObject img is complete and have the canonical order ? ', img.complete, img.iscanonorder())


<xarray.DataArray 'score' (course: 3, examen: 3, full name: 4)>
array([[[nan, nan, nan, nan],
        [nan, nan, nan, 18],
        [nan, nan, 18, 17]],

       [[11, 15, nan, nan],
        [13, nan, nan, nan],
        [15, nan, nan, nan]],

       [[nan, nan, 6, 2],
        [10, 8, nan, 4],
        [12, nan, nan, nan]]], dtype=object)
Coordinates:
  * course      (course) object 'software' 'math' 'english'
  * examen      (examen) object 't1' 't2' 't3'
  * full name   (full name) object 'anne white' ... 'camille red'
    last name   (full name) object 'white' 'white' 'black' 'red'
    first name  (full name) object 'anne' 'philippe' 'philippe' 'camille'
    group       (full name) object 'gr1' 'gr2' 'gr3' 'gr3'
    surname     (full name) object 'skyler' 'heisenberg' 'gus' 'saul'
Attributes:
    year:     2021

Object img is complete and have the canonical order ?  True True
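
Because the Dataset is complete and in canonical order, its flat list of scores maps directly onto the 3 x 3 x 4 array shown above. The sketch below folds that flat list into nested lists, with the last dimension varying fastest. The helper `reshape` is a hypothetical stand-in written for this example and is not part of the library; the score values are those of this notebook.

```python
from math import nan, prod


def reshape(values, shape):
    """Fold a flat list into nested lists, last dimension varying fastest."""
    if len(values) != prod(shape):
        raise ValueError("the number of values must equal the product of the shape")
    if len(shape) == 1:
        return list(values)
    step = len(values) // shape[0]   # number of values in each slice of the first axis
    return [reshape(values[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


scores = [nan, nan, nan, nan, nan, nan, nan, 18, nan, nan, 18, 17,
          11, 15, nan, nan, 13, nan, nan, nan, 15, nan, nan, nan,
          nan, nan, 6, 2, 10, 8, nan, 4, 12, nan, nan, nan]

# Axes: course (3), examen (3), full name (4)
matrix = reshape(scores, (3, 3, 4))
print(matrix[0][1][3])   # 18: course 'software', examen 't2', full name 'camille red'
print(matrix[2][1])      # [10, 8, nan, 4]
```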


In [5]:
img.nindex('score').tostdcodec(inplace=True)
print('absolute keys are not necessary in the JSON object when the Dataset is complete:\n')
pprint(img.to_ntv().to_obj(), width=200)
print('\nconversion is reversible ? ', img.from_ntv(img.to_ntv()) == img)

absolute keys are not necessary in the JSON object when the Dataset is complete:

{'course': [['software', 'math', 'english'], [12]],
 'examen': [['t1', 't2', 't3'], [4]],
 'first name': [['camille', 'philippe', 'anne'], 2, [2, 1, 1, 0]],
 'full name': [['anne white', 'philippe white', 'philippe black', 'camille red'], [1]],
 'group': [['gr1', 'gr3', 'gr2'], 2, [0, 2, 1, 1]],
 'last name': [['red', 'black', 'white'], 2, [2, 2, 1, 0]],
 'score': [nan, nan, nan, nan, nan, nan, nan, 18, nan, nan, 18, 17, 11, 15, nan, nan, 13, nan, nan, nan, 15, nan, nan, nan, nan, nan, 6, 2, 10, 8, nan, 4, 12, nan, nan, nan],
 'surname': [['skyler', 'heisenberg', 'gus', 'saul'], 2],
 'year': 2021}

conversion is reversible ?  True
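
In the JSON above, the primary Fields carry a single number (`[12]`, `[4]`, `[1]`) instead of a list of keys. One plausible reading, consistent with the array printed earlier, is that this number is a repetition coefficient: each codec value is repeated that many times and the codec then cycles. The sketch below rebuilds the full columns under that assumption; the function `decode` is hypothetical and this interpretation is not taken from the library's documentation.

```python
def decode(codec, coef, length):
    """Expand a codec into `length` values: each value repeated `coef` times, then cycled."""
    return [codec[(i // coef) % len(codec)] for i in range(length)]


# (codec, coefficient) pairs copied from the JSON output of the notebook
fields = {
    "course": (["software", "math", "english"], 12),
    "examen": (["t1", "t2", "t3"], 4),
    "full name": (["anne white", "philippe white", "philippe black", "camille red"], 1),
}
length = 36

columns = {name: decode(codec, coef, length) for name, (codec, coef) in fields.items()}

# Record 7 holds the first non-missing score (18) in the 'score' list.
print([columns[name][7] for name in fields])
# ['software', 't2', 'camille red']
```

Under this reading, the three columns enumerate all 36 combinations exactly once, which is why no explicit key list is needed.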


In [6]:
# reduce the matrix to dimension 2
img.nindex('course').coupling(img.nindex('examen'))   # transform two linked Fields into two derived or coupled Fields
print('new dimension : ', img.dimension, '\n')
img.to_xarray()

new dimension :  2 

