# Data processing: aggregating routines

This guide shows how to aggregate routines.

---

In `location.ipynb`, we analyze location data and derive routines from it.
Here, we aggregate those routines so that they can be visualized.

Run `location.ipynb` first.

## Aggregate daily

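Aggregate the raw routines into one daily routine: for each time slot, keep the routine that occurs most often.

Schema of the aggregated DataFrame: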
- start_at
- end_at
- routine
- user_id
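
The schema above can be checked programmatically before any aggregation or plotting. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns`, the constant `EXPECTED_COLUMNS`, and the toy values are assumptions made for illustration.

```python
import pandas as pd

# Column names taken from the schema list above.
EXPECTED_COLUMNS = ['user_id', 'start_at', 'end_at', 'routine']


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Toy frame that follows the schema (the values are made up for illustration).
example_df = pd.DataFrame({
    'user_id': ['P0000'],
    'start_at': ['09:00:00'],
    'end_at': ['09:15:00'],
    'routine': ['CLASS'],
})

print(missing_columns(example_df))                            # []
print(missing_columns(example_df.drop(columns=['routine'])))  # ['routine']
```

An empty list means every expected column is present.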


In [8]:
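# Change USER_ID to the participant you want to process.
# Note: despite the *_DIRECTORY names, the two constants below are paths to CSV files.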
USER_ID = 'P3029'
LOCATION_DATASET_DIRECTORY = f'../csv/routines_raw/{USER_ID}-location.csv'
SLEEP_DATASET_DIRECTORY = f'../csv/routines_raw/{USER_ID}-sleep.csv'
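
Because this notebook depends on CSV files written by an earlier step, it can help to confirm that they exist before loading them. This is a minimal sketch under that assumption; the helper name `missing_files` is hypothetical and does not appear in the original notebook.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [str(p) for p in paths if not Path(p).is_file()]


# Hypothetical usage with the constants defined in the cell above:
# missing = missing_files([LOCATION_DATASET_DIRECTORY, SLEEP_DATASET_DIRECTORY])
# if missing:
#     raise FileNotFoundError(f'Run the upstream notebooks first; missing: {missing}')
```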

In [16]:
import pandas as pd

raw_df = pd.concat([
    pd.read_csv(LOCATION_DATASET_DIRECTORY)[['user_id', 'start_at', 'end_at', 'weekday', 'routine']],
    pd.read_csv(SLEEP_DATASET_DIRECTORY)[['user_id', 'start_at', 'end_at', 'weekday', 'routine']]
])
raw_df['start_at'] = pd.to_datetime(raw_df['start_at'])
raw_df['end_at'] = pd.to_datetime(raw_df['end_at'])

raw_df

|  | user_id | start_at | end_at | weekday | routine |
| --- | --- | --- | --- | --- | --- |
| 0 | P3029 | 2019-04-30 09:00:00+09:00 | 2019-04-30 09:15:00+09:00 | 1 | CLASS |
| 1 | P3029 | 2019-04-30 09:45:00+09:00 | 2019-04-30 10:00:00+09:00 | 1 | CLASS |
| 2 | P3029 | 2019-04-30 10:00:00+09:00 | 2019-04-30 10:15:00+09:00 | 1 | CLASS |
| 3 | P3029 | 2019-04-30 10:15:00+09:00 | 2019-04-30 10:30:00+09:00 | 1 | CLASS |
| 4 | P3029 | 2019-04-30 11:00:00+09:00 | 2019-04-30 11:15:00+09:00 | 1 | CLASS |
| ... | ... | ... | ... | ... | ... |
| 1 | P3029 | 2019-05-02 03:45:00+09:00 | 2019-05-02 08:15:00+09:00 | 3 | SLEEP |
| 2 | P3029 | 2019-05-03 02:45:00+09:00 | 2019-05-03 08:45:00+09:00 | 4 | SLEEP |
| 3 | P3029 | 2019-05-04 03:45:00+09:00 | 2019-05-04 09:15:00+09:00 | 5 | SLEEP |
| 4 | P3029 | 2019-05-05 03:00:00+09:00 | 2019-05-05 07:30:00+09:00 | 6 | SLEEP |


In [14]:
raw_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 98 entries, 0 to 5
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype                    
---  ------    --------------  -----                    
 0   user_id   98 non-null     object                   
 1   start_at  98 non-null     datetime64[ns, UTC+09:00]
 2   end_at    98 non-null     datetime64[ns, UTC+09:00]
 3   weekday   98 non-null     int64                    
 4   routine   98 non-null     object                   
dtypes: datetime64[ns, UTC+09:00](2), int64(1), object(2)
memory usage: 4.6+ KB
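
The `Index: 98 entries, 0 to 5` line shows that `pd.concat` kept each file's own row labels, so the same label can appear more than once. This does not affect the `groupby` steps in this notebook, but it can be surprising with label-based selection such as `.loc`. The sketch below uses made-up toy frames (not the real CSVs) to show the difference that `ignore_index=True` makes.

```python
import pandas as pd

# Two toy frames standing in for the location and sleep CSVs (values are made up).
location_part = pd.DataFrame({'routine': ['CLASS', 'STUDY', 'CLASS']})
sleep_part = pd.DataFrame({'routine': ['SLEEP', 'SLEEP']})

# Default behaviour: each frame keeps its own row labels.
kept = pd.concat([location_part, sleep_part])
# With ignore_index=True the result is renumbered from 0.
renumbered = pd.concat([location_part, sleep_part], ignore_index=True)

print(kept.index.tolist())        # [0, 1, 2, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
print(kept.index.is_unique)       # False
```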


In [17]:
daily_df = raw_df.copy()
daily_df['start_at'] = daily_df['start_at'].dt.time
daily_df['end_at'] = daily_df['end_at'].dt.time

daily_df = daily_df.groupby(['user_id', 'start_at', 'end_at']) \
                   .agg(routine=('routine', lambda x: pd.Series.mode(x)[0])) \
                   .reset_index()

daily_df

|  | user_id | start_at | end_at | routine |
| --- | --- | --- | --- | --- |
| 0 | P3029 | 00:00:00 | 00:15:00 | INDOOR |
| 1 | P3029 | 00:15:00 | 00:30:00 | INDOOR |
| 2 | P3029 | 00:30:00 | 00:45:00 | INDOOR |
| 3 | P3029 | 00:45:00 | 01:00:00 | STUDY |
| 4 | P3029 | 02:00:00 | 02:15:00 | STUDY |
| 5 | P3029 | 02:45:00 | 08:45:00 | SLEEP |
| 6 | P3029 | 03:00:00 | 07:30:00 | SLEEP |
| 7 | P3029 | 03:00:00 | 08:15:00 | SLEEP |
| 8 | P3029 | 03:45:00 | 08:15:00 | SLEEP |
| 9 | P3029 | 03:45:00 | 09:15:00 | SLEEP |


In [18]:
daily_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   user_id   58 non-null     object
 1   start_at  58 non-null     object
 2   end_at    58 non-null     object
 3   routine   58 non-null     object
dtypes: object(4)
memory usage: 1.9+ KB
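
The aggregation above keeps, for each `(user_id, start_at, end_at)` group, the most frequent routine. The sketch below reproduces that step on a small made-up frame (the values and the name `toy` are illustrative assumptions). It also shows the tie-breaking behaviour: `Series.mode` returns its results sorted, so taking element `[0]` picks the alphabetically first routine when counts are tied.

```python
import pandas as pd

# Toy observations of two time slots across several days (values are made up).
toy = pd.DataFrame({
    'user_id': ['P0000'] * 5,
    'start_at': ['09:00:00'] * 3 + ['10:00:00'] * 2,
    'end_at': ['09:15:00'] * 3 + ['10:15:00'] * 2,
    'routine': ['CLASS', 'STUDY', 'CLASS', 'STUDY', 'CLASS'],
})

most_common = (
    toy.groupby(['user_id', 'start_at', 'end_at'])
       .agg(routine=('routine', lambda x: x.mode()[0]))
       .reset_index()
)

# 09:00 slot: CLASS appears twice, STUDY once, so CLASS wins.
# 10:00 slot: STUDY and CLASS are tied; the sorted mode puts CLASS first.
print(most_common['routine'].tolist())  # ['CLASS', 'CLASS']
```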


### Check the daily routine

In [19]:
import plotly.express as px

# Work on a copy so the assignments below do not trigger SettingWithCopyWarning.
daily_fig_df = daily_df[daily_df['routine'] != 'INDOOR'].copy()
# start_at holds datetime.time values; convert them to strings so they can be parsed as datetimes.
daily_fig_df['start_at'] = pd.to_datetime(daily_fig_df['start_at'].astype(str), format='%H:%M:%S')
# Every bar is drawn as a 15-minute slot from start_at, so multi-hour rows (e.g. SLEEP) appear shortened.
daily_fig_df['end_at'] = daily_fig_df['start_at'] + pd.Timedelta(minutes=15)

fig = px.timeline(
    daily_fig_df,
    x_start='start_at',
    x_end='end_at',
    y='user_id',
    color='routine',
    height=400,
    width=1200,
)
fig.update_xaxes(tickformat="%H:%M")

fig.show()


### Export daily routine

In [73]:
daily_df.to_csv(f'../csv/routines/{USER_ID}-daily.csv', index=False)

-----

## Aggregate weekly

In [75]:
weekly_df = raw_df.copy()
weekly_df['start_at'] = weekly_df['start_at'].dt.time
weekly_df['end_at'] = weekly_df['end_at'].dt.time

weekly_df = weekly_df.groupby(['user_id', 'start_at', 'end_at', 'weekday']) \
    .agg(routine=('routine', lambda x: pd.Series.mode(x)[0])) \
    .reset_index()

weekly_df

|  | user_id | start_at | end_at | weekday | routine |
| --- | --- | --- | --- | --- | --- |
| 0 | P3029 | 00:00:00 | 00:15:00 | 4 | INDOOR |
| 1 | P3029 | 00:15:00 | 00:30:00 | 2 | INDOOR |
| 2 | P3029 | 00:15:00 | 00:30:00 | 4 | INDOOR |
| 3 | P3029 | 00:30:00 | 00:45:00 | 4 | INDOOR |
| 4 | P3029 | 00:45:00 | 01:00:00 | 4 | STUDY |
| ... | ... | ... | ... | ... | ... |
| 87 | P3029 | 21:45:00 | 22:00:00 | 3 | STUDY |
| 88 | P3029 | 21:45:00 | 22:00:00 | 4 | INDOOR |
| 89 | P3029 | 22:45:00 | 23:00:00 | 3 | STUDY |
| 90 | P3029 | 23:15:00 | 23:30:00 | 3 | STUDY |

### Check the weekly routine

In [80]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Work on a copy so the assignments below do not trigger SettingWithCopyWarning.
weekly_fig_df = weekly_df[weekly_df['routine'] != 'INDOOR'].copy()
# start_at holds datetime.time values; convert them to strings so they can be parsed as datetimes.
weekly_fig_df['start_at'] = pd.to_datetime(weekly_fig_df['start_at'].astype(str), format='%H:%M:%S')
# Every bar is drawn as a 15-minute slot from start_at.
weekly_fig_df['end_at'] = weekly_fig_df['start_at'] + pd.Timedelta(minutes=15)

# Build one timeline per weekday (0 = Monday ... 6 = Sunday) and stack them as subplot rows.
fig = make_subplots(rows=7, cols=1, shared_xaxes=True, vertical_spacing=0.02)
timeline_figs = []
for i in range(7):
    timeline_figs.append(
        px.timeline(weekly_fig_df[weekly_fig_df["weekday"] == i], x_start="start_at", x_end="end_at", y="user_id", color="routine")
    )

for i in range(7):
    f = timeline_figs[i]
    # Show legend entries only for the first row to avoid duplicated entries.
    show_legend = i == 0
    for j in range(len(f.data)):
        fig.add_trace(go.Bar(f.data[j], showlegend=show_legend), row=(i + 1), col=1)

fig.update_yaxes(title="Mon", row=1, col=1)
fig.update_yaxes(title="Tue", row=2, col=1)
fig.update_yaxes(title="Wed", row=3, col=1)
fig.update_yaxes(title="Thu", row=4, col=1)
fig.update_yaxes(title="Fri", row=5, col=1)
fig.update_yaxes(title="Sat", row=6, col=1)
fig.update_yaxes(title="Sun", row=7, col=1)

fig.update_xaxes(tickformat="%H:%M")
fig.update_layout(title="Weekly Routine")

fig.update_xaxes(type="date")

fig.update_layout(width=700, height=600)


### Export weekly routine

In [81]:
weekly_df.to_csv(f'../csv/routines/{USER_ID}-weekly.csv', index=False)
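
When the exported file is read back, the time columns arrive as plain strings. The sketch below shows one way to restore them as `datetime.time` values. It reads from an in-memory string with made-up rows rather than the real file, and the names `csv_text` and `loaded` are illustrative.

```python
import io

import pandas as pd

# Stand-in for an exported weekly CSV (contents are made up for illustration).
csv_text = (
    'user_id,start_at,end_at,weekday,routine\n'
    'P0000,00:45:00,01:00:00,4,STUDY\n'
    'P0000,21:45:00,22:00:00,3,STUDY\n'
)

loaded = pd.read_csv(io.StringIO(csv_text))
for col in ['start_at', 'end_at']:
    # Parse 'HH:MM:SS' strings back into datetime.time objects.
    loaded[col] = pd.to_datetime(loaded[col], format='%H:%M:%S').dt.time

print(loaded['start_at'].tolist())  # [datetime.time(0, 45), datetime.time(21, 45)]
```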