# Features

The `motion` table consists of what appear to be sensor triggers. We can treat the data as an independent time-series per home.

First, convert the triggers to a time series, which allows more flexible methods of generating features.

In [1]:
import pandas as pd
pd.options.display.max_columns = 40
from lib.data.features import read_raw_data, transform_sensor_triggers_to_time_series
from lib.common.paths import DATABASE_LOCATION

raw_data = read_raw_data(DATABASE_LOCATION, train=True)
time_series = transform_sensor_triggers_to_time_series(raw_data)
time_series.head(5)

| | home_id | datetime | multiple_occupancy | WC1 | bathroom1 | bedroom1 | conservatory | dining room | hallway | kitchen | living room | lounge | study |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:27:15+00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:28:19+00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:35:04+00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:20+00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 4 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:42+00:00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
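
The internals of `transform_sensor_triggers_to_time_series` are not shown here. As a rough sketch of what such a conversion could look like, the example below one-hot encodes a trigger log so that each room becomes a 0/1 column. The function name `triggers_to_time_series` and the toy data are hypothetical, and the raw columns `home_id`, `datetime` and `location` are assumed from the output above.

```python
import pandas as pd

# Hypothetical raw trigger log: one row per sensor trigger.
raw = pd.DataFrame(
    {
        "home_id": ["a", "a", "a", "b"],
        "datetime": pd.to_datetime(
            [
                "2024-01-01 08:27:15",
                "2024-01-01 08:28:19",
                "2024-01-01 08:36:42",
                "2024-01-01 09:00:00",
            ]
        ),
        "location": ["hallway", "hallway", "kitchen", "lounge"],
    }
)


def triggers_to_time_series(raw: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode `location` so each room becomes a 0/1 column."""
    one_hot = pd.get_dummies(raw["location"], dtype=int)
    wide = pd.concat([raw[["home_id", "datetime"]], one_hot], axis=1)
    # Sort so each home forms its own chronologically ordered series.
    return wide.sort_values(["home_id", "datetime"]).reset_index(drop=True)


example_series = triggers_to_time_series(raw)
```

Each row of `example_series` has exactly one room column set to 1, matching the shape of the table above.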


## Multiple locations simultaneously

Hypothesis: two or more rooms showing activity at the same time indicates a multiple-occupant household.

A single occupant can also trigger two rooms within a short interval, for example when walking between rooms. However, these events should be less frequent than genuine multi-occupant events, where occupants are active in different rooms and keep triggering motion sensors over a long period.

We may also want to consider relatively large windows here, as the individuals in the two rooms will not necessarily move at the same time.

In [2]:
from lib.data.features import add_multiple_location_triggers_in_window

multi_location_windows = ["5min", "30min", "1h", "2h"]
locations = list(set(raw_data["location"]))
for window in multi_location_windows:
    time_series = add_multiple_location_triggers_in_window(time_series, window, locations)

multiple_location_event_columns = [f"multiple_room_triggers_{window}" for window in multi_location_windows]
display(time_series[["home_id", "datetime", "multiple_occupancy"] + multiple_location_event_columns].head(5))

| | home_id | datetime | multiple_occupancy | multiple_room_triggers_5min | multiple_room_triggers_30min | multiple_room_triggers_1h | multiple_room_triggers_2h |
|---|---|---|---|---|---|---|---|
| 0 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:27:15+00:00 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:28:19+00:00 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:35:04+00:00 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:20+00:00 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:42+00:00 | 0 | 1 | 1 | 1 | 1 |
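
One way to build such a flag is a trailing time window per home: a row is marked when two or more distinct rooms fired within the window ending at that row. The sketch below is only an assumption about how `add_multiple_location_triggers_in_window` might behave; `flag_multi_room` and the toy data are hypothetical.

```python
import pandas as pd


def flag_multi_room(series: pd.DataFrame, window: str, rooms: list[str]) -> pd.Series:
    """Return 1 where two or more rooms fired within the trailing window."""
    parts = []
    for _, group in series.groupby("home_id", sort=False):
        group = group.sort_values("datetime")
        # Rolling max per room: 1 if that room fired anywhere in the window.
        active = group.set_index("datetime")[rooms].rolling(window).max()
        flags = (active.sum(axis=1) >= 2).astype(int)
        # Restore the original row labels so results align with `series`.
        parts.append(pd.Series(flags.to_numpy(), index=group.index))
    return pd.concat(parts).reindex(series.index)


example = pd.DataFrame(
    {
        "home_id": ["a"] * 4,
        "datetime": pd.to_datetime(
            [
                "2024-01-01 08:35:04",
                "2024-01-01 08:36:20",
                "2024-01-01 08:36:42",
                "2024-01-01 08:50:00",
            ]
        ),
        "hallway": [1, 1, 0, 0],
        "kitchen": [0, 0, 1, 1],
    }
)

flags_5min = flag_multi_room(example, "5min", ["hallway", "kitchen"])
flags_30min = flag_multi_room(example, "30min", ["hallway", "kitchen"])
```

In this toy data the kitchen trigger at 08:36:42 follows a hallway trigger by 22 seconds, so both windows flag it. The kitchen trigger at 08:50:00 is flagged only by the 30-minute window, which illustrates why the larger windows capture occupants who do not move at the same time.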


## Events per hour

Hypothesis: households with more than one occupant trigger more sensors over the same period of time.

A good example is trips to the bathroom: household members usually make these independently and at different times, while the expected number of trips per person stays roughly the same, so the total number of triggers should grow with the number of occupants.

This feature could be weakened by the variance in the number of motion sensor triggers per person. For example, a more active single occupant could produce many more triggers than two sedentary individuals in a multiple-occupancy home.

In addition, since we are dividing by the elapsed time, the rate is undefined at the first sensor trigger, where the elapsed time is zero: the division yields `inf` or `NaN` there.

In [3]:
from lib.data.features import add_cumulative_triggers, add_elapsed_time

time_series = add_cumulative_triggers(time_series, locations + multiple_location_event_columns)
time_series = add_elapsed_time(time_series)

new_features = []
for col in ["total_cumulative"] + multiple_location_event_columns:
    new_col = col.replace("_cumulative", "") + "_per_hour"
    time_series[new_col] = time_series[col] / time_series["elapsed_time_hours"]
    new_features.append(new_col)

display(time_series[["home_id", "datetime", "multiple_occupancy"] + new_features].head(5))

| | home_id | datetime | multiple_occupancy | total_per_hour | multiple_room_triggers_5min_per_hour | multiple_room_triggers_30min_per_hour | multiple_room_triggers_1h_per_hour | multiple_room_triggers_2h_per_hour |
|---|---|---|---|---|---|---|---|---|
| 0 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:27:15+00:00 | 0 | inf | NaN | NaN | NaN | NaN |
| 1 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:28:19+00:00 | 0 | 112.5 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:35:04+00:00 | 0 | 23.027719 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:20+00:00 | 0 | 26.422018 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:42+00:00 | 0 | 57.142857 | 6.349206 | 6.349206 | 6.349206 | 6.349206 |
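
The first row shows the zero-elapsed-time problem: a non-zero count divided by zero gives `inf`, and zero divided by zero gives `NaN`. A possible way to make this uniform, sketched below, is to mask zero elapsed time before dividing so that every undefined rate becomes `NaN`. The helper `rate_per_hour` is hypothetical and not part of `lib.data.features`.

```python
import pandas as pd


def rate_per_hour(count: pd.Series, elapsed_hours: pd.Series) -> pd.Series:
    """Divide counts by elapsed hours, giving NaN where no time has elapsed."""
    # `where` keeps positive durations and replaces everything else with NaN.
    safe_elapsed = elapsed_hours.where(elapsed_hours > 0)
    return count / safe_elapsed


# Toy values chosen to mirror the first three rows above:
# 2 triggers after 64 seconds, 3 triggers after 469 seconds.
counts = pd.Series([1, 2, 3])
elapsed = pd.Series([0.0, 64 / 3600, 469 / 3600])
rates = rate_per_hour(counts, elapsed)
```

Whether `NaN` is the right choice depends on the downstream model; some estimators reject missing values, in which case the first row per home could be dropped instead.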


## Bathroom weight

Hypothesis: occupants tend to visit the bathroom independently, so the proportion of bathroom triggers should be higher in a multiple-occupancy home.

As discussed above, the bathroom is "special" in that each occupant should trigger it independently. If the occupants otherwise spend their time together in a shared room, the bathroom should account for a higher proportion of the total motion triggers.

In [4]:
time_series["bathroom_proportion"] = (time_series["bathroom1_cumulative"] + time_series["WC1_cumulative"]) / time_series["total_cumulative"]
display(time_series[["home_id", "datetime", "multiple_occupancy", "bathroom_proportion"]].head(5))

| | home_id | datetime | multiple_occupancy | bathroom_proportion |
|---|---|---|---|---|
| 0 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:27:15+00:00 | 0 | 0.0 |
| 1 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:28:19+00:00 | 0 | 0.0 |
| 2 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:20+00:00 | 0 | 0.0 |
| 3 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:20+00:00 | 0 | 0.0 |
| 4 | 0904961f621c9bd03542b43b992ec431 | 2024-01-01 08:36:42+00:00 | 0 | 0.0 |
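
Since the first rows of the real data contain no bathroom triggers, a small worked example may make the feature easier to follow. The sketch below assumes the cumulative columns are running per-home counts of the 0/1 room columns; `bathroom_proportion` and the toy data are hypothetical. Note that the denominator here counts room triggers only, which may differ from `total_cumulative` if that column also includes the multi-room event columns passed to `add_cumulative_triggers`.

```python
import pandas as pd


def bathroom_proportion(
    series: pd.DataFrame, bathroom_rooms: list[str], all_rooms: list[str]
) -> pd.Series:
    """Running share of a home's triggers that came from bathroom sensors."""
    # Running trigger count per room, restarted for each home.
    cumulative = series.groupby("home_id")[all_rooms].cumsum()
    total = cumulative.sum(axis=1)
    return cumulative[bathroom_rooms].sum(axis=1) / total


rooms = ["hallway", "bathroom1", "kitchen", "WC1"]
example = pd.DataFrame(
    {
        "home_id": ["a", "a", "a", "a", "b", "b"],
        "hallway": [1, 0, 0, 0, 1, 1],
        "bathroom1": [0, 1, 0, 0, 0, 0],
        "kitchen": [0, 0, 1, 0, 0, 0],
        "WC1": [0, 0, 0, 1, 0, 0],
    }
)

proportion = bathroom_proportion(example, ["bathroom1", "WC1"], rooms)
```

For home `a` the running proportion is 0, 1/2, 1/3 and 2/4 across its four triggers, while home `b` never triggers a bathroom sensor and stays at 0.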
