# Dataset : creation

## Goals

- show the different ways to create a Dataset object

## Prerequisite

- creating and updating Field objects 
-----

In [1]:
from observation import Sdataset, Ndataset, Sfield, Nfield

## Creation modes
A Dataset object is a list of Field objects with the same length.

Several modes of creation exist:
- Dataset         : creation with a list of Field objects
- Dataset.ext     : creation with a list of value lists (and the field names)
- Dataset.ntv     : creation with a JSON object

In [2]:
month = ['jan', 'feb', 'apr', 'jan', 'sep', 'dec', 'apr', 'may', 'jan']
quarter = ['q1',  'q1',  'q2',  'q1',  'q3',  'q4',  'q2',  'q2',  'q1']
idxmonth   = Sfield.ntv({'month': month})
idxquarter = Sfield.ntv({'quarter': quarter})

il  = Sdataset([idxmonth, idxquarter])
il1 = Sdataset.ntv({'month': month, 'quarter': quarter})
il2 = Sdataset.ext([month, quarter], ['month', 'quarter'])
il3 = Sdataset(il)

print('same object ? ', il == il1 == il2 == il3, '\n')

print(il)
print('infos (number of record, number of index, first record ) : ', len(il), il.lenindex, il[0])
il

same object ?  True 

variables :
    {'month': ['jan', 'feb', 'apr', 'jan', 'sep', 'dec', 'apr', 'may', 'jan']}
    {'quarter': ['q1', 'q1', 'q2', 'q1', 'q3', 'q4', 'q2', 'q2', 'q1']}

infos (number of record, number of index, first record ) :  9 2 ['jan', 'q1']


Sdataset[9, 2]
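
The creation modes above accept the same data in different shapes. As a rough mental model only, the plain-Python sketch below (it does not use the `observation` library, and `check_same_length` is a hypothetical helper) shows that named columns and "values plus names" describe the same table, and that all fields must share one length.

```python
# Plain-Python mental model; this is NOT the observation API.
month = ['jan', 'feb', 'apr', 'jan', 'sep', 'dec', 'apr', 'may', 'jan']
quarter = ['q1', 'q1', 'q2', 'q1', 'q3', 'q4', 'q2', 'q2', 'q1']

# Two equivalent inputs: named columns, or value lists plus a list of names.
by_name = {'month': month, 'quarter': quarter}
by_lists = dict(zip(['month', 'quarter'], [month, quarter]))


def check_same_length(columns):
    """Return the common length of the columns, or raise if they differ."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f'fields have different lengths: {sorted(lengths)}')
    return lengths.pop() if lengths else 0


n_records = check_same_length(by_name)                      # number of records
first_record = [values[0] for values in by_name.values()]   # one value per field
print(by_name == by_lists, n_records, len(by_name), first_record)
```

The printed values correspond to the `len(il)`, `il.lenindex` and `il[0]` information displayed by the cell above.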

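## Indexing
Element indexing follows the conventions of standard Python sequences (integers, negative positions, slices). In addition, a tuple of positions such as `il1[1,6]` selects several records at once. A single position returns one record; a slice or a tuple returns one list of values per field.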

In [3]:
print(il1[1])
print(il1.record(1))        # the 'record' method returns the values of a row
print(il1[1,6])
print(il1[4:])
print(il1[:2])
print(il1[1:7:2])
print(il1[-4:8])
print(il1[-3,3,1])

['feb', 'q1']
['feb', 'q1']
[['feb', 'apr'], ['q1', 'q2']]
[['sep', 'dec', 'apr', 'may', 'jan'], ['q3', 'q4', 'q2', 'q2', 'q1']]
[['jan', 'feb'], ['q1', 'q1']]
[['feb', 'jan', 'dec'], ['q1', 'q1', 'q4']]
[['dec', 'apr', 'may'], ['q4', 'q2', 'q2']]
[['apr', 'jan', 'feb'], ['q2', 'q1', 'q1']]
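
To make the shape of these results explicit, here is a minimal sketch that reproduces them on plain column lists. The `select` function is a hypothetical helper written for this illustration; how the library implements indexing internally is not shown in this notebook.

```python
# Hypothetical helper for illustration; the library's own indexing may differ.
columns = [
    ['jan', 'feb', 'apr', 'jan', 'sep', 'dec', 'apr', 'may', 'jan'],
    ['q1', 'q1', 'q2', 'q1', 'q3', 'q4', 'q2', 'q2', 'q1'],
]


def select(columns, key):
    """Mimic the results shown above: a record for an int, sub-columns otherwise."""
    if isinstance(key, (int, slice)):
        # int -> one value per field; slice -> one sub-list per field
        return [col[key] for col in columns]
    # tuple of positions -> values picked in the requested order, per field
    return [[col[i] for i in key] for col in columns]


one = select(columns, 1)
pair = select(columns, (1, 6))
step = select(columns, slice(1, 7, 2))
mixed = select(columns, (-3, 3, 1))
print(one, pair, step, mixed, sep='\n')
```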


## Representation
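Dataset objects have several representations.

Note: the representation of quarter differs from that of month because quarter is 'derived' from month. In the output below, the quarter entry holds its codec, the position of its parent field (0, i.e. month) and one key per value of the month codec.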

In [4]:
print('object with codec and keys : \n', il1.to_ntv())
print('\nobject with values : \n', il1.to_ntv(modecodec='full'))
print('\nstring with codec and keys : \n', il1.to_ntv().to_obj(encoded=True))
print('\nbinary with codec and keys : \n', il1.to_ntv().to_obj(encoded=True, format='cbor'))

object with codec and keys : 
 {"month": [["jan", "feb", "apr", "sep", "dec", "may"], [0, 1, 2, 0, 3, 4, 2, 5, 0]], "quarter": [["q1", "q2", "q3", "q4"], 0, [0, 0, 1, 2, 3, 1]]}

object with values : 
 {"month": ["jan", "feb", "apr", "jan", "sep", "dec", "apr", "may", "jan"], "quarter": ["q1", "q1", "q2", "q1", "q3", "q4", "q2", "q2", "q1"]}

string with codec and keys : 
 {"month": [["jan", "feb", "apr", "sep", "dec", "may"], [0, 1, 2, 0, 3, 4, 2, 5, 0]], "quarter": [["q1", "q2", "q3", "q4"], 0, [0, 0, 1, 2, 3, 1]]}

binary with codec and keys : 
 b'\xa2emonth\x82\x86cjancfebcaprcsepcdeccmay\x89\x00\x01\x02\x00\x03\x04\x02\x05\x00gquarter\x83\x84bq1bq2bq3bq4\x00\x86\x00\x00\x01\x02\x03\x01'
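
The codec and keys shown above can be recomputed by hand. The sketch below is plain Python with two hypothetical helpers (`encode`, `relative_keys`); it assumes the codec lists distinct values in order of first appearance, which matches the output above but is not confirmed here as the library's general rule.

```python
# Plain-Python illustration of the codec / keys idea; not the library's code.
month = ['jan', 'feb', 'apr', 'jan', 'sep', 'dec', 'apr', 'may', 'jan']
quarter = ['q1', 'q1', 'q2', 'q1', 'q3', 'q4', 'q2', 'q2', 'q1']


def encode(values):
    """Return (codec, keys): distinct values in first-seen order and positions into them."""
    codec = list(dict.fromkeys(values))
    position = {value: i for i, value in enumerate(codec)}
    return codec, [position[value] for value in values]


def relative_keys(parent_keys, child_keys, parent_size):
    """Return one child key per parent codec entry (requires a derived child)."""
    rel = [None] * parent_size
    for p, c in zip(parent_keys, child_keys):
        if rel[p] is not None and rel[p] != c:
            raise ValueError('child field is not derived from parent field')
        rel[p] = c
    return rel


month_codec, month_keys = encode(month)
quarter_codec, quarter_keys = encode(quarter)
quarter_rel = relative_keys(month_keys, quarter_keys, len(month_codec))
print(month_codec, month_keys)
print(quarter_codec, quarter_rel)
```

Because each month maps to exactly one quarter, six relative keys are enough for quarter instead of nine.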


## JSON compatibility

Dataset objects are completely defined by their JSON representation in any format (object, string, binary): the object reconstructed from its JSON representation is identical to the initial object.

In [5]:
print('Object compatibility : ', 
      il1 ==
      Sdataset.ntv(il1.to_ntv()) == 
      Sdataset.ntv(il1.to_ntv(modecodec='full')) == 
      Sdataset.from_ntv(il1.to_ntv().to_obj(encoded=True), decode_str=True) ==
      Sdataset.from_ntv(il1.to_ntv().to_obj(encoded=True, format='cbor'), decode_str=True))

Object compatibility :  True
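
The same round trip can be followed by hand with the standard `json` module. This is a sketch under assumptions: the `decode` helper is hypothetical, the entry layouts (`[codec, keys]` and `[codec, parent position, relative keys]`) are inferred from the output printed in the Representation section, and a parent field is assumed to appear before the field derived from it.

```python
import json

# Compact representation copied from the output of the Representation section.
compact = {
    "month": [["jan", "feb", "apr", "sep", "dec", "may"], [0, 1, 2, 0, 3, 4, 2, 5, 0]],
    "quarter": [["q1", "q2", "q3", "q4"], 0, [0, 0, 1, 2, 3, 1]],
}


def decode(compact):
    """Expand the compact form into full value lists (assumed layout, see above)."""
    names = list(compact)
    full_keys, full = {}, {}
    for name, entry in compact.items():
        if len(entry) == 2:
            codec, keys = entry                      # [codec, keys]
        else:
            codec, parent, rel = entry               # [codec, parent position, relative keys]
            keys = [rel[k] for k in full_keys[names[parent]]]
        full_keys[name] = keys
        full[name] = [codec[k] for k in keys]
    return full


text = json.dumps(compact)            # string form
restored = decode(json.loads(text))   # back to full values
print(restored)
```

The restored values are the ones printed for `modecodec='full'` earlier in the notebook.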
