# Example: Time series analysis

This example presents the analysis and the representation of hierarchical data.


## Use case description

The time series represents a daily event counter.

The aim of the study is to identify the influential temporal parameters.

The time series is stored in a table together with potential temporal parameters:

- study: study description
- n_event: number of events
- date: date (isoformat)
- year: year of the date
- yearmonth: year and month of the date
- month: month of the date
- weekday: day of the week (ISO numbering, 1 = Monday to 7 = Sunday)
- yearday: day of the year (0 for 1 January)
- day: day of the date
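
As an illustration, the temporal parameters listed above can be derived from a single ISO date with the standard library alone. This is a minimal sketch: the helper name `temporal_parameters` is hypothetical and is not part of the notebook.

```python
from datetime import date

def temporal_parameters(iso_date):
    """Return the temporal parameters of one ISO date string (hypothetical helper)."""
    d = date.fromisoformat(iso_date)
    return {
        'date': d.isoformat(),
        'year': d.year,
        'yearmonth': d.isoformat()[:7],            # 'YYYY-MM'
        'month': d.month,
        'weekday': d.isoweekday(),                 # 1 = Monday ... 7 = Sunday
        'yearday': (d - date(d.year, 1, 1)).days,  # 0 for 1 January
        'day': d.day,
    }

params = temporal_parameters('2006-08-03')
print(params)
```

Every parameter is a pure function of `date`, which is why `date` alone is enough to rebuild the other temporal columns.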

## Data acquisition

In [35]:
from datetime import date, timedelta
from random import random

columns = ['study', 'n_event', 'date', 'year', 'yearmonth', 'month', 'weekday', 'yearday', 'day']

study = 'event analysis'
t0 = date.fromisoformat('2000-01-01')
event0 = 100

def data_acquisition(n_dates):
    '''Return a list of records, one per day starting at t0.'''
    data = []
    for dt in range(n_dates):
        dat = t0 + timedelta(dt)
        dat_iso = dat.isoformat()
        year = dat.year
        yearmonth = dat_iso[:7]
        month = dat.month
        weekday = dat.isoweekday()
        yearday = (dat - date(dat.year, 1, 1)).days  # 0 for 1 January
        day = dat.day
        n_event = round(event0 + (month-6) * random() + (weekday - 3.5) * random() + 0.1 * yearday * random() + random() * 10) 
        #data.append((study, n_event, dat_iso, year, yearmonth, month, weekday, yearday, day))
        data.append((study, n_event, dat_iso, year, yearmonth, month))
    return data

In [62]:
import pandas as pd

time_sr = pd.DataFrame(data_acquisition(10000), columns=columns[:6]).sample(frac=1).reset_index(drop=True)
time_sr

| | study | n_event | date | year | yearmonth | month |
|---|---|---|---|---|---|---|
| 0 | event analysis | 113 | 2006-08-03 | 2006 | 2006-08 | 8 |
| 1 | event analysis | 126 | 2007-09-17 | 2007 | 2007-09 | 9 |
| 2 | event analysis | 119 | 2011-08-19 | 2011 | 2011-08 | 8 |
| 3 | event analysis | 115 | 2008-06-28 | 2008 | 2008-06 | 6 |
| 4 | event analysis | 114 | 2012-05-04 | 2012 | 2012-05 | 5 |
| ... | ... | ... | ... | ... | ... | ... |
| 9995 | event analysis | 125 | 2012-07-21 | 2012 | 2012-07 | 7 |
| 9996 | event analysis | 114 | 2025-07-28 | 2025 | 2025-07 | 7 |
| 9997 | event analysis | 122 | 2002-09-05 | 2002 | 2002-09 | 9 |
| 9998 | event analysis | 117 | 2018-05-04 | 2018 | 2018-05 | 5 |


In [63]:
import ntv_pandas

analys = time_sr.npd.analysis()
print(analys.tree())

-1: root-derived (10000)
   0 : study (9999 - 1)
   1 : n_event (9943 - 57)
   2 : date (0 - 10000)
   4 : yearmonth (9671 - 329)
      3 : year (301 - 28)
      5 : month (317 - 12)
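
The tree above can be cross-checked without `ntv_pandas`. The sketch below rebuilds the deterministic columns for the same 10000 days, counts their distinct values, and tests whether one field is derived from another, meaning each parent value maps to exactly one child value. The helper names `n_distinct` and `is_derived` are hypothetical. Reading the second number on each tree line as the field's distinct-value count, and the first as the difference from its parent's count, is an interpretation drawn from the figures shown, not from the library documentation.

```python
from datetime import date, timedelta

# Rebuild the deterministic columns of the notebook for 10000 consecutive days.
t0 = date.fromisoformat('2000-01-01')
records = []
for offset in range(10000):
    d = t0 + timedelta(offset)
    records.append({'date': d.isoformat(), 'year': d.year,
                    'yearmonth': d.isoformat()[:7], 'month': d.month})

def n_distinct(field):
    """Number of distinct values taken by a field."""
    return len({rec[field] for rec in records})

def is_derived(child, parent):
    """True if every parent value is associated with exactly one child value."""
    mapping = {}
    for rec in records:
        if mapping.setdefault(rec[parent], rec[child]) != rec[child]:
            return False
    return True

for field in ('date', 'yearmonth', 'year', 'month'):
    print(field, n_distinct(field))

print(is_derived('year', 'yearmonth'))   # year hangs under yearmonth in the tree
print(is_derived('month', 'yearmonth'))  # so does month
print(is_derived('month', 'year'))       # a year spans several months
```

Under this reading, `yearmonth (9671 - 329)` corresponds to 329 distinct values out of 10000 rows, and `year (301 - 28)` to 28 distinct values under a parent that has 329.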


In [64]:
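from tab_dataset.dataset import Sdataset

time_sd = Sdataset(time_sr)
# encoded=True: serialized form, whose length is measured here
jsn = time_sd.to_ntv(modecodec='optimize').to_obj(encoded=True)
# encoded=False: plain Python object, reused for the CBOR dump in the next cell
jsn2 = time_sd.to_ntv(modecodec='optimize').to_obj(encoded=False)
len(jsn)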

230039

In [65]:
import cbor2
len(cbor2.dumps(jsn2))

148384

In [66]:
len(time_sr.to_parquet())

103545
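
To put the three sizes side by side, the short sketch below expresses each one relative to the first. The figures are copied from the cell outputs above; the dictionary labels are descriptive names chosen here, and because `n_event` is random the exact sizes will differ from run to run.

```python
# Sizes reported by the cells above (length of the encoded NTV output,
# of the CBOR dump and of the Parquet buffer).
sizes = {
    'NTV optimize, encoded': 230039,
    'NTV optimize, CBOR': 148384,
    'Parquet': 103545,
}

baseline = sizes['NTV optimize, encoded']
ratios = {name: size / baseline for name, size in sizes.items()}

for name, size in sizes.items():
    print(f'{name}: {size} ({ratios[name]:.1%} of the encoded NTV size)')
```

On this run the CBOR dump is roughly two thirds of the encoded NTV size, and the Parquet buffer is a little under half of it.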