<h1>Helvar Starter Kit</h1>
<h3>Welcome to Helvar's Junction Challenge!</h3>
<p>This notebook helps you get acquainted with the dataset. You can follow the instructions to load and visualize the data. It is provided for convenience only: you are welcome to use whatever tools you are comfortable with, as we are only interested in results!</p>
<p>Let's begin by loading the libraries this notebook needs.</p>

In [1]:
%load_ext autoreload
%autoreload 2
import pandas as pd
import numpy as np
import base64
import imageio as iio
from plotting import Plotting
import plotly.graph_objects as go

<p>This starter kit contains sample data files. The full datasets can be downloaded as described in the README document. Please remember to place the downloaded zip file in the data folder and unzip it.</p>
<ol>
<li>First, load the pickle file and convert the timestamps to the Helsinki timezone.</li>
<li>Next, load the JSON file containing the device IDs and their coordinates.</li>
<li>Finally, load the PNG file both as a NumPy array and as a base64-encoded image. We need the latter for plotting with the Plotly helper functions.</li>
</ol>

In [2]:
site = 'site_1'

In [3]:
df_events = pd.read_pickle(f'./data/{site}/{site}.pkl', compression='gzip')

In [4]:
df_events.loc[:, 'timestamp'] = (pd.to_datetime(df_events['timestamp'], utc=True)
                                 .dt.tz_convert('Europe/Helsinki')
                                 .dt.tz_localize(None))

In [5]:
df_events.head(5)

Unnamed: 0,timestamp,deviceid
0,2021-06-27 14:18:04.530398,30
1,2021-06-27 14:18:37.952030,30
2,2021-06-27 14:23:31.079242,30
3,2021-06-27 14:24:02.759153,30
4,2021-06-27 14:24:40.095851,30


In [6]:
df_devices = pd.read_json(f'./data/{site}/{site}.json')

In [7]:
df_devices.head(5)

Unnamed: 0,deviceid,x,y
0,0,3105.50645,1848.24023
1,1,2923.139872,1833.963047
2,2,3887.345883,1334.631008
3,3,4182.726187,1350.439368
4,4,4331.192525,1355.676081


In [8]:
with open(f'./data/{site}/{site}.png', "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode()

In [9]:
img = iio.imread(f'./data/{site}/{site}.png')
img.shape

(2635, 5270, 4)

We can now plot the floorplan to get a feel for where the devices are placed.

In [10]:
scaling_factor = 3 # Set to 1 for highest resolution
plotting_obj = Plotting(bg_img=encoded_string, dims=(img.shape[1], img.shape[0]), df_devices=df_devices, scaling_factor=scaling_factor)
plotting_obj.run(renderer='browser') # Switch to iframe if you would like to view it here

Opening in existing browser session.


## View Occupancy by Day
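<p>Since we are dealing with irregular IoT event data, we need to define a time-series window size and compute a statistic over the events in each window. The <code>In [11]</code> cell counts events per one-second window (<code>'1s'</code>) across all devices and fills empty windows with 0. To get the sum of events per day instead, use a daily frequency such as <code>'1D'</code>.</p>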
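<p>As a minimal sketch of per-day counting, the snippet below uses a small synthetic table with the same <code>timestamp</code> and <code>deviceid</code> columns as <code>df_events</code>. The names <code>toy_events</code> and <code>count_events</code> are hypothetical and not part of the starter kit.</p>

```python
import pandas as pd

def count_events(events: pd.DataFrame, window: str) -> pd.Series:
    """Count events per fixed-size window; empty windows are reported as 0."""
    counts = events.set_index('timestamp').resample(window).size()
    return counts.rename('value')

# Synthetic stand-in for df_events (assumption: same column names).
toy_events = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2021-06-27 14:18:04', '2021-06-27 14:18:37',
        '2021-06-27 23:59:59', '2021-06-29 08:00:00',
    ]),
    'deviceid': [30, 30, 31, 30],
})

per_day = count_events(toy_events, '1D')      # one value per calendar day
per_hour = count_events(toy_events, '60min')  # same events, hourly windows
```

<p>Changing only the <code>window</code> string switches between daily, hourly, or finer windows, which makes it easy to compare window sizes.</p>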

In [11]:
df_events_day = df_events.copy()
df_events_day.loc[:, 'timestamp'] = df_events_day['timestamp'].dt.floor('1s')
df_events_day.loc[:, 'value'] = 1.0
df_events_day = df_events_day.groupby('timestamp').sum()
df_events_day = df_events_day.drop(['deviceid'], axis=1)
df_events_day = df_events_day.reindex(pd.date_range(df_events_day.index.min(), df_events_day.index.max(), freq='1s')).fillna(0)

In [12]:
df_events_day

Unnamed: 0,value
2021-05-05 00:04:02,2.0
2021-05-05 00:04:03,0.0
2021-05-05 00:04:04,0.0
2021-05-05 00:04:05,0.0
2021-05-05 00:04:06,0.0
...,...
2021-10-31 23:54:40,0.0
2021-10-31 23:54:41,0.0
2021-10-31 23:54:42,0.0
2021-10-31 23:54:43,0.0


In [13]:
fig = go.Figure(data=[go.Scatter(x=df_events_day.index, y=df_events_day['value'])],
                layout=dict(height=700, width=1500))
fig.show()

## Animate Data

<p>The plotting helper script contains an animation engine built on Plotly, and it is simple to use. The idea is to aggregate the events into fixed time bins and then visualize how the motion sensors are triggered through one day. You can choose different bin sizes, but remember that finer bins produce more frames and might lead to performance issues. With 5-minute bins, for example, one day gives (60//5) * 24 = 288 frames. The notebook cells in this section bin at one second, which gives 86,400 frames per day, so only a 2,000-frame slice is animated.</p>
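<p>The 5-minute binning described above can be sketched as follows. This is an illustration on synthetic data, not the notebook's own pipeline: <code>toy_events</code> is hypothetical, and the <code>frames</code> and <code>ts</code> lists simply mirror the shapes built later in this notebook before they are passed to <code>populate_data</code>.</p>

```python
import numpy as np
import pandas as pd

# Synthetic events for one day; the real notebook would use df_events instead.
rng = np.random.default_rng(0)
day_start = pd.Timestamp('2021-09-07')
toy_events = pd.DataFrame({
    'timestamp': day_start + pd.to_timedelta(rng.integers(0, 86400, size=500), unit='s'),
    'deviceid': rng.integers(0, 4, size=500),
})

# Floor each event to its 5-minute bin and count events per (bin, device).
toy_events['timestamp'] = toy_events['timestamp'].dt.floor('5min')
binned = (toy_events.groupby(['timestamp', 'deviceid']).size()
          .unstack('deviceid', fill_value=0))

# Reindex so every 5-minute bin of the day is present, even if no event fired.
full_index = pd.date_range(day_start, periods=(60 // 5) * 24, freq='5min')
binned = binned.reindex(full_index, fill_value=0)

# One dict per frame (device id -> count) plus a matching timestamp label.
frames = binned.to_dict(orient='records')
ts = [{'index': str(t)} for t in binned.index]
```

<p>Here <code>len(frames)</code> equals the 288 frames computed above, one per 5-minute bin.</p>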

In [15]:

df_events_day = df_events[df_events.timestamp.dt.date.astype(str) == "2021-09-07"].copy()
df_events_day

Unnamed: 0,timestamp,deviceid
138435,2021-09-07 00:26:12.453436,30
138436,2021-09-07 00:26:12.453436,30
138437,2021-09-07 00:26:12.453436,30
150222,2021-09-07 00:07:15.304654,30
150223,2021-09-07 00:07:43.462599,30
...,...,...
1561018,2021-09-07 10:06:21.740995,33
1561019,2021-09-07 10:06:15.900765,22
1561020,2021-09-07 10:06:19.582178,32
1561021,2021-09-07 10:06:14.287156,47


In [16]:

df_events_day.timestamp = df_events_day.timestamp.dt.floor('1s')
df_events_day

Unnamed: 0,timestamp,deviceid
138435,2021-09-07 00:26:12,30
138436,2021-09-07 00:26:12,30
138437,2021-09-07 00:26:12,30
150222,2021-09-07 00:07:15,30
150223,2021-09-07 00:07:43,30
...,...,...
1561018,2021-09-07 10:06:21,33
1561019,2021-09-07 10:06:15,22
1561020,2021-09-07 10:06:19,32
1561021,2021-09-07 10:06:14,47


In [17]:

df_events_day.loc[:, 'b'] = 1
df_events_day

Unnamed: 0,timestamp,deviceid,b
138435,2021-09-07 00:26:12,30,1
138436,2021-09-07 00:26:12,30,1
138437,2021-09-07 00:26:12,30,1
150222,2021-09-07 00:07:15,30,1
150223,2021-09-07 00:07:43,30,1
...,...,...,...
1561018,2021-09-07 10:06:21,33,1
1561019,2021-09-07 10:06:15,22,1
1561020,2021-09-07 10:06:19,32,1
1561021,2021-09-07 10:06:14,47,1


In [18]:

df_events_day = df_events_day.groupby(['deviceid', 'timestamp']).sum()
df_events_day

Unnamed: 0_level_0,Unnamed: 1_level_0,b
deviceid,timestamp,Unnamed: 2_level_1
0,2021-09-07 06:02:37,25
0,2021-09-07 06:03:20,1
0,2021-09-07 06:07:44,1
0,2021-09-07 06:08:28,1
0,2021-09-07 06:13:07,18
...,...,...
55,2021-09-07 15:54:46,5
55,2021-09-07 15:55:28,1
55,2021-09-07 15:55:57,1
55,2021-09-07 15:56:30,2


In [None]:

df_events_day = df_events_day.pivot_table(index='timestamp', columns='deviceid', values='b')
df_events_day

In [None]:

df_events_day = df_events_day.reindex(pd.date_range(df_events_day.index.min().floor('1D'), df_events_day.index.max().ceil('1D'), freq='1s', closed='left')).fillna(0)
df_events_day

In [19]:
df_events_day = df_events[df_events.timestamp.dt.date.astype(str) == "2021-09-07"].copy()
df_events_day.timestamp = df_events_day.timestamp.dt.floor('1s')  # 1-second bins
df_events_day.loc[:, 'b'] = 1  # one count per event
df_events_day = df_events_day.groupby(['deviceid', 'timestamp']).sum()
df_events_day = df_events_day.pivot_table(index='timestamp', columns='deviceid', values='b')
# Cover the whole day and fill bins without events with 0.
# Note: pandas 2.0 removed `closed` from pd.date_range; use inclusive='left' there.
df_events_day = df_events_day.reindex(pd.date_range(df_events_day.index.min().floor('1D'), df_events_day.index.max().ceil('1D'), freq='1s', closed='left')).fillna(0)


In [14]:
df_events_day.shape

(86400, 48)

In [9]:
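# Bin the full dataset into 1-minute windows and count events per device.
df_events.timestamp = df_events.timestamp.dt.floor('1min')
df_events.loc[:, 'b'] = 1
df_events = df_events.groupby(['deviceid', 'timestamp']).sum()
df_events = df_events.pivot_table(index='timestamp', columns='deviceid', values='b')
# Reindex at the bin frequency. With freq='1s' the full date range needs 60x more rows,
# which likely caused the dead kernel recorded below.
df_events = df_events.reindex(pd.date_range(df_events.index.min().floor('1D'), df_events.index.max().ceil('1D'), freq='1min', closed='left')).fillna(0)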

In [None]:
df_events

Error: Kernel is dead

In [39]:
frames = df_events_day.to_dict(orient='records')
ts = df_events_day.reset_index()[['index']].astype(str).to_dict(orient='records')

In [40]:
half = len(frames) // 2

In [41]:
plotting_obj = Plotting(bg_img=encoded_string, dims=(img.shape[1], img.shape[0]), df_devices=df_devices, scaling_factor=3)
plotting_obj.populate_data(frames[half:half+2000], ts[half:half+2000])
plotting_obj.run(renderer='browser')

Opening in existing browser session.


## Challenge Category 1
<p>For the first challenge, we want to solve a real-world problem: indoor device mapping. The floorplans we have provided already contain mapped devices, meaning each device has a defined location on the floorplan. In reality, producing this mapping takes a lot of time, because we need to physically identify each device inside the building and place it on the floorplan. The objective of this exercise is to come up with ways to speed up this process. We have written down some ideas below, but feel free to be creative and come up with entirely new ideas of your own.</p>

#### Example Approach:
<p>Let's assume the floorplan contains 500 devices. We can delete 400 devices and use the occupancy event data to identify which devices are neighbours. Once the neighbours have been identified, we can use the data to locate the missing 400 devices! This means our engineers only need to locate 100 devices, then let the system run for N days to find the rest.</p>
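<p>One possible way to turn the neighbour idea into code is sketched below. It is an assumption-laden illustration, not Helvar's method: devices that fire in the same time bins are treated as neighbours, and each unmapped device is placed at the correlation-weighted mean of its most correlated mapped devices. The function <code>estimate_positions</code> and the synthetic corridor data are hypothetical.</p>

```python
import numpy as np
import pandas as pd

def estimate_positions(counts, known_xy, k=2):
    """Place each unmapped device at the correlation-weighted mean position
    of its k most correlated mapped devices."""
    corr = counts.corr()
    estimates = {}
    for dev in counts.columns.difference(known_xy.index):
        # Keep only positive correlations with mapped devices.
        scores = corr.loc[dev, known_xy.index].clip(lower=0).nlargest(k)
        estimates[dev] = np.average(known_xy.loc[scores.index], axis=0, weights=scores)
    return pd.DataFrame.from_dict(estimates, orient='index', columns=list(known_xy.columns))

# Synthetic corridor: six sensors in a row, one person walking past them once.
device_x = np.array([5, 15, 25, 35, 45, 55])
walker = np.arange(-10, 71)  # walker position at each time step
counts = pd.DataFrame(
    {dev: (np.abs(walker - x) <= 8).astype(int) for dev, x in enumerate(device_x)}
)
all_xy = pd.DataFrame({'x': device_x, 'y': 0.0})  # row index doubles as deviceid
known_xy = all_xy.drop(index=[1, 4])              # pretend devices 1 and 4 are unmapped

estimated_xy = estimate_positions(counts, known_xy)
```

<p>On this idealised data the two hidden devices land between their neighbours. Real event data is noisier, so the bin size, <code>k</code>, and the weighting are all choices worth experimenting with.</p>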
<p>The machine learning aspect of the challenge comes from two goals: 1. maximizing the number of deleted devices that can still be located, and 2. finding the smallest value of N that still gives accurate mappings. You can use the Euclidean distance between the predicted location of a device and its actual location on our provided floorplans to measure the accuracy of your algorithm. We have provided 5 different sites with a variety of device configurations and data, so be sure to create proper training and test datasets!</p>
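<p>The Euclidean distance metric can be sketched as below, assuming predictions and ground truth are tables indexed by device ID with <code>x</code> and <code>y</code> columns, as in <code>df_devices</code>. The helper name <code>mapping_error</code> and the sample values are hypothetical.</p>

```python
import numpy as np
import pandas as pd

def mapping_error(predicted, actual):
    """Per-device Euclidean distance between predicted and actual (x, y)."""
    common = predicted.index.intersection(actual.index)
    diff = predicted.loc[common, ['x', 'y']] - actual.loc[common, ['x', 'y']]
    return pd.Series(np.hypot(diff['x'], diff['y']), index=common, name='error')

# Made-up positions for two devices, indexed by deviceid.
actual = pd.DataFrame({'x': [0.0, 10.0], 'y': [0.0, 0.0]}, index=[1, 4])
predicted = pd.DataFrame({'x': [3.0, 10.0], 'y': [4.0, 0.0]}, index=[1, 4])

errors = mapping_error(predicted, actual)
mean_error = errors.mean()
```

<p>The errors are expressed in the same units as the <code>x</code> and <code>y</code> columns. Reporting the mean or median per site makes it easier to compare runs with different numbers of deleted devices and different values of N.</p>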


## Challenge Category 2
<p>The second challenge is more about providing value to the client. We have provided data from real-world buildings, which are occupied according to certain predictable patterns. For example, a school might only observe occupancy during the morning, while a hospital or a shopping mall might observe occupancy throughout the day. The objective of this exercise is to determine how people move through buildings by combining spatial and temporal data.</p>

#### Example Approach:
<p>We can define a graph network of devices, with the edge weights representing the distance between devices. Then depending on the proximity of devices and the correlation of events within a certain time-window, say 15 minutes, we can cluster the most frequently visited spaces in a building. By creating similar sequences across the day, week or month we can determine patterns that show us how the building is used at different times of the day, or different days of the week etc. We can also try to figure out which paths are dominant and which are used least frequently. Such information is extremely valuable to building owners and tenants.</p> 
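<p>A minimal sketch of the graph idea follows, under stated assumptions: edge weights are pairwise Euclidean distances, and two devices are linked when they are both close together and active in the same 15-minute windows. The functions <code>distance_matrix</code> and <code>linked_devices</code>, the thresholds, and the sample data are all hypothetical.</p>

```python
import numpy as np
import pandas as pd

def distance_matrix(devices):
    """Pairwise Euclidean distances between devices (the edge weights)."""
    xy = devices[['x', 'y']].to_numpy(dtype=float)
    delta = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((delta ** 2).sum(axis=-1))
    return pd.DataFrame(dist, index=devices['deviceid'], columns=devices['deviceid'])

def linked_devices(events, devices, window='15min', max_dist=15.0, min_corr=0.5):
    """Pairs of devices that are close together and active in the same windows."""
    counts = (events.groupby([pd.Grouper(key='timestamp', freq=window), 'deviceid'])
              .size().unstack('deviceid', fill_value=0))
    corr = counts.corr()
    dist = distance_matrix(devices).loc[corr.index, corr.columns]
    mask = (dist <= max_dist) & (corr >= min_corr)
    return [(int(a), int(b)) for a in mask.index for b in mask.columns
            if a < b and mask.loc[a, b]]

# Made-up layout: devices 0 and 1 are adjacent, device 2 is far away.
devices = pd.DataFrame({'deviceid': [0, 1, 2],
                        'x': [0.0, 10.0, 100.0], 'y': [0.0, 0.0, 0.0]})
events = pd.DataFrame({
    'timestamp': pd.to_datetime(['2021-09-07 09:01', '2021-09-07 09:03',
                                 '2021-09-07 09:20', '2021-09-07 09:31',
                                 '2021-09-07 09:40', '2021-09-07 09:50']),
    'deviceid': [0, 1, 2, 0, 1, 2],
})

edges = linked_devices(events, devices)
```

<p>The resulting edge list can be fed to a graph library or a clustering step of your choice. The distance and correlation thresholds are placeholders to tune per site.</p>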
<p>You can experiment with different windowing approaches. Try to think about what sort of metrics are important. Is it really worth investigating how the occupancy changes at midnight, or is it more useful to understand how people move through the building at 9 AM? Also, does the pattern change during the week? For example, do more people visit on Monday or on Friday?</p>
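<p>To compare hours of the day and days of the week, one option is a weekday-by-hour table of event counts. The sketch below assumes a <code>timestamp</code> column like the one in <code>df_events</code>; the function name <code>occupancy_profile</code> and the sample timestamps are hypothetical.</p>

```python
import pandas as pd

def occupancy_profile(events):
    """Event counts by weekday (rows, Monday=0) and hour of day (columns)."""
    ts = events['timestamp']
    counts = (events.groupby([ts.dt.dayofweek.rename('weekday'),
                              ts.dt.hour.rename('hour')])
              .size().unstack('hour', fill_value=0))
    # Make sure every weekday and hour appears, even without events.
    return counts.reindex(index=range(7), columns=range(24), fill_value=0)

# Made-up events: three on a Monday (2021-09-06), one on a Friday (2021-09-10).
toy_events = pd.DataFrame({'timestamp': pd.to_datetime([
    '2021-09-06 00:10', '2021-09-06 09:05',
    '2021-09-06 09:40', '2021-09-10 09:10',
])})

profile = occupancy_profile(toy_events)
```

<p>A table like this can be drawn directly as a heatmap, which is one way to show a customer when the building is actually in use.</p>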
<p>We would love to see your ideas on what is the best way to present the results of this type of data analysis to the customers. UI/UX experts, we're looking at you!</p>


## Challenge Category 3
<p>The last challenge is related to a type of sensor that is not common in smart buildings at the moment: audio sensors. We already have motion sensors that generate occupancy data, and we decided to augment that with audio data. We want to explore the utility of incorporating more data sources in a smart building.</p>
<p>The data was collected in a garage using 4 audio sensors placed in a rectangular grid. There is a lot of activity in the garage: people walk, drive cars, or ride bicycles. The objective is to identify these events by combining the audio streams from all 4 sensors, and to use the motion data to pinpoint where the activity took place.</p>

#### Example Approach:
<p>This is an extremely open-ended challenge, and there are numerous ways to tackle it. You could use clever audio signal processing, or deep learning to detect events. The choice is yours.</p> 
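<p>As one example of the signal processing route, the sketch below estimates the time delay between two audio streams from the peak of their cross-correlation. It assumes the sensors are time-synchronised and share a known sample rate, which may not hold for the provided recordings. The function <code>estimate_delay</code> and the synthetic signals are hypothetical.</p>

```python
import numpy as np

def estimate_delay(reference, delayed, sample_rate):
    """Delay in seconds of `delayed` relative to `reference`, taken from
    the peak of their cross-correlation."""
    xcorr = np.correlate(delayed, reference, mode='full')
    lag = np.argmax(xcorr) - (len(reference) - 1)  # lag in samples
    return lag / sample_rate

# Synthetic example: a noise burst that reaches the second sensor 25 samples later.
sample_rate = 1000  # Hz, assumed for illustration
rng = np.random.default_rng(0)
reference = np.zeros(1000)
reference[100:200] = rng.standard_normal(100)
delayed = np.roll(reference, 25)

delay = estimate_delay(reference, delayed, sample_rate)
```

<p>Pairwise delays between the 4 sensors could then be combined with the sensor positions and the motion events to narrow down where a sound originated. How well this works depends on echoes and noise in the garage.</p>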