## Make Transactions
In this step we group the events belonging to the same participant according to a given condition (e.g., the same day). At this point the events are assumed to be either in the form $(pid, ts, label)$ or in the form $(pid, ts, l_1, \ldots, l_n)$. In the latter case the user specifies an index sequence $I = i_1 < \ldots < i_m$, where $i_j \in \{1, \ldots, n\}$ for every $1 \leq j \leq m$, and the tuple with a single label is obtained from $(pid, ts, l_1, \ldots, l_n)$ by concatenating the labels selected by $I$, i.e., $(pid, ts, l_1, \ldots, l_n)$ is turned into
$(pid, ts, l_{i_1}\cdot \ldots \cdot l_{i_m})$.
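
The label concatenation described above can be sketched in plain Python. This is only an illustration: the helper name `concat_labels`, the separator, and the sample event are assumptions and are not part of the notebook's pipeline. Indices are 1-based, as in the formula.

```python
def concat_labels(event, indices, sep="·"):
    """Turn (pid, ts, l_1, ..., l_n) into (pid, ts, l_{i_1}·...·l_{i_m}).

    `indices` is the sequence I: 1-based and strictly increasing.
    (Hypothetical helper, written only to illustrate the definition.)
    """
    pid, ts, *labels = event
    n = len(labels)
    if any(not 1 <= i <= n for i in indices):
        raise ValueError(f"every index must lie in 1..{n}")
    if any(a >= b for a, b in zip(indices, indices[1:])):
        raise ValueError("indices must be strictly increasing")
    # Select the requested labels and join them into a single label.
    return (pid, ts, sep.join(str(labels[i - 1]) for i in indices))


# Made-up event with n = 3 labels; I = (1, 3) keeps the first and third.
event = ("p01", "2019-11-01 08:00:00", "cluster 1", "Friday", "high")
single = concat_labels(event, [1, 3])
```

With $I = (1, 3)$ the second label is dropped and the remaining two are joined into one label.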

In [1]:
import nbimporter
from ETLBasics_t1 import calories_to_df
from FeatureReduction_t4 import calories_kmeans
import dateutil as du
import dateutil.parser  # load the submodule explicitly so that du.parser is available
import pandas as pd
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import distance
PATH = '../../../datasets/pmdata/'

In [2]:
r  = calories_kmeans(calories_to_df(PATH, [1,2,3]))

In [3]:
r['label'] = r.apply(lambda x:   f'cluster: {x.cluster_index.values[0]}, day: {x.week_day.values[0]}', axis = 1)

In [19]:
r['id'] = [x[0] for x in list(r.index)]
r['vt']  = [du.parser.parse(x[1]).strftime('%Y-%m-%d') for x in list(r.index)]
r['hour'] = [du.parser.parse(x[1]).strftime('%H') for x in list(r.index)]

Here we decided to make one transaction per participant per day.

In [25]:
r['t_item'] = r.apply(lambda x:   [x.hour.values[0], x.label.values[0]]  , axis = 1)

In [39]:
transactions = list(r[['id', 'vt', 't_item']].groupby(['id','vt' ])['t_item'].apply(list))
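
To make the grouping step concrete, here is a minimal sketch in plain Python that mirrors what the `groupby` above produces: one list of `[hour, label]` items per (participant, day) pair, with groups ordered by key. The function name `make_transactions` and the sample events are made up for illustration. It uses the standard-library `datetime` rather than `dateutil`, so it assumes ISO-formatted timestamps.

```python
from collections import defaultdict
from datetime import datetime


def make_transactions(events):
    """Group (pid, ts, label) events into one transaction per (pid, day)."""
    groups = defaultdict(list)
    for pid, ts, label in events:
        moment = datetime.fromisoformat(ts)
        day = moment.strftime("%Y-%m-%d")   # grouping condition: same day
        hour = moment.strftime("%H")        # kept inside each item
        groups[(pid, day)].append([hour, label])
    # Sort the keys so the order matches a sorted group-by on (pid, day).
    return [groups[key] for key in sorted(groups)]


# Hypothetical events, not taken from the pmdata dataset.
events = [
    ("p01", "2019-11-01 00:15:00", "cluster: 4, day: Friday"),
    ("p01", "2019-11-01 06:30:00", "cluster: 0, day: Friday"),
    ("p01", "2019-11-02 00:05:00", "cluster: 4, day: Saturday"),
    ("p02", "2019-11-01 08:00:00", "cluster: 1, day: Friday"),
]
transactions = make_transactions(events)
```

The four events collapse into three transactions: two for `p01` (one per day) and one for `p02`.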

Let us wrap these steps in a function so that other notebooks can import it.

In [48]:
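def make_cal_transactions(path, partecipants, n_clusters=5):
    """Return one transaction per (participant, day); each item is [hour, label]."""
    # Load the calorie events and assign each one to a k-means cluster.
    r = calories_kmeans(calories_to_df(path, partecipants), n_clusters=n_clusters)
    # Build a single label from the cluster index and the week day.
    r['label'] = r.apply(lambda x: f'calories cluster: {x.cluster_index.values[0]}, day: {x.week_day.values[0]}', axis=1)
    # The index holds (participant id, timestamp) pairs.
    r['id'] = [x[0] for x in list(r.index)]
    r['vt'] = [du.parser.parse(x[1]).strftime('%Y-%m-%d') for x in list(r.index)]
    r['hour'] = [du.parser.parse(x[1]).strftime('%H') for x in list(r.index)]
    r['t_item'] = r.apply(lambda x: [int(x.hour.values[0]), x.label.values[0]], axis=1)
    # Group the items by participant and day: one list of items per group.
    transactions = list(r[['id', 'vt', 't_item']].groupby(['id', 'vt'])['t_item'].apply(list))
    return transactions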

In [49]:
make_cal_transactions(PATH, [1,2])

[[['00', 'cluster: 4, day: Friday'],
  ['01', 'cluster: 4, day: Friday'],
  ['02', 'cluster: 4, day: Friday'],
  ['03', 'cluster: 4, day: Friday'],
  ['04', 'cluster: 4, day: Friday'],
  ['05', 'cluster: 4, day: Friday'],
  ['06', 'cluster: 0, day: Friday'],
  ['07', 'cluster: 0, day: Friday'],
  ['08', 'cluster: 1, day: Friday'],
  ['09', 'cluster: 1, day: Friday'],
  ['10', 'cluster: 1, day: Friday'],
  ['11', 'cluster: 1, day: Friday'],
  ['12', 'cluster: 1, day: Friday'],
  ['13', 'cluster: 1, day: Friday'],
  ['14', 'cluster: 1, day: Friday'],
  ['15', 'cluster: 2, day: Friday'],
  ['16', 'cluster: 2, day: Friday'],
  ['17', 'cluster: 0, day: Friday'],
  ['18', 'cluster: 4, day: Friday'],
  ['19', 'cluster: 1, day: Friday'],
  ['20', 'cluster: 2, day: Friday'],
  ['21', 'cluster: 1, day: Friday'],
  ['22', 'cluster: 0, day: Friday'],
  ['23', 'cluster: 1, day: Friday']],
 [['00', 'cluster: 4, day: Saturday'],
  ['01', 'cluster: 4, day: Saturday'],
  ['02', 'cluster: 4, day: Saturd