### WiFi Data Processing ###

After retrieving the WiFi data from the platform through API calls, the data is processed and presented in this Python notebook. Because the API call was restricted to traces from the TPM building, and traces were only stored when they met several conditions, only a small amount of cleaning is needed.

In [18]:
import pandas as pd
from datetime import datetime

After importing the necessary libraries, the data frame is read and cleaned: rows with a missing location or a zero timestamp are removed, and duplicates are dropped. The timestamps are also converted to datetime objects, so that date-based functions can be applied to them.

In [19]:
df = pd.read_csv("my_csv_goed2.csv")
df.dropna(subset = ["location"], inplace=True)

df = df[(df[['associationTime']] != 0).all(axis=1)]
df = df[(df[['firstSeenTime']] != 0).all(axis=1)]
df = df[(df[['updateTime']] != 0).all(axis=1)]

df["associationTime"] = df["associationTime"].astype('int64')
df["firstSeenTime"] = df["firstSeenTime"].astype('int64')
df["updateTime"] = df["updateTime"].astype('int64')

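def todate(x):
    # The API returns timestamps in milliseconds since the epoch; convert to seconds.
    x = x / 1000
    # Keep only the calendar date so that traces can be compared per day.
    # For full timestamps, use "%A, %B %d, %Y %H:%M:%S" (24-hour clock) instead.
    return datetime.fromtimestamp(x).strftime("%A, %B %d, %Y")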

df["associationTime"] = df["associationTime"].apply(todate)
df["firstSeenTime"] = df["firstSeenTime"].apply(todate)
df["updateTime"] = df["updateTime"].apply(todate)

df["associationTime"] = pd.to_datetime(df.associationTime)
df["firstSeenTime"] = pd.to_datetime(df.firstSeenTime)
df["updateTime"] = pd.to_datetime(df.updateTime)
df["macAddressMonth"] = df["macAddressMonth"].str[0:8]

df = df.drop_duplicates()
df = df.sort_values(by=['macAddressMonth', 'associationTime'])
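The cell above formats each timestamp as a string and then parses it back. As a minimal sketch of an alternative, assuming the time columns hold millisecond Unix timestamps as above, `pd.to_datetime` with `unit="ms"` can convert them directly. The toy values and the helper name `to_day` are made up for illustration. Note that this variant interprets the timestamps as UTC, whereas `datetime.fromtimestamp` uses the local time zone of the machine, so dates close to midnight may differ.

```python
import pandas as pd

# Toy data (hypothetical values): millisecond Unix timestamps.
toy = pd.DataFrame({
    "associationTime": [1625650200000, 1625736600000],
    "updateTime": [1625653800000, 1625740200000],
})

def to_day(series):
    """Convert millisecond timestamps to datetimes with the time set to 00:00."""
    return pd.to_datetime(series, unit="ms").dt.normalize()

# Apply the same conversion to every time column.
for col in ["associationTime", "updateTime"]:
    toy[col] = to_day(toy[col])

print(toy)
```

Dropping the `.dt.normalize()` call keeps the time of day, which may be useful when the order of traces within a day matters.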

A snippet of the resulting data frame is shown in the cell below.

In [20]:
df.head()

Unnamed: 0,apName,associationTime,deviceType,firstSeenTime,location,macAddressMonth,updateTime
44170,A-31-0-081,2021-07-14,Unclassified,2021-05-18,TUDelft > 31-TBM > 3e Verdieping,00147477,2021-07-14
606,A-31-0-015,2021-07-08,Unclassified,2021-07-07,TUDelft > 31-TBM > Beganegrond,0022A717,2021-07-09
196,A-31-0-011,2021-07-08,"iPhone12,1",2021-06-23,TUDelft > 31-TBM > Beganegrond,0126EBEC,2021-07-14
260,A-31-0-044,2021-07-09,"iPad8,9",2021-06-28,TUDelft > 31-TBM > 1e Verdieping,0190078B,2021-07-09
1125,A-31-0-074,2021-07-13,Unclassified,2021-07-13,TUDelft > 31-TBM > 3e Verdieping,0195D253,2021-07-13


Out of all the available traces (4705), 3956 traces can be ascribed to devices that were traced more than once.

In [13]:
df[df.groupby('macAddressMonth')['macAddressMonth'].transform('size') > 2].sort_values("macAddressMonth")

Unnamed: 0,apName,associationTime,deviceType,firstSeenTime,location,macAddressMonth,updateTime
19001,A-31-0-080,2021-07-14 12:04:30,Xiaomi Device,2021-05-11 08:48:35,TUDelft > 31-TBM > 3e Verdieping,0425D910,2021-07-14 12:04:32
17295,A-31-0-067,2021-07-14 11:19:11,Xiaomi Device,2021-05-11 08:48:35,TUDelft > 31-TBM > 2e Verdieping,0425D910,2021-07-14 11:46:12
43321,A-31-0-066,2021-07-14 03:41:00,Xiaomi Device,2021-05-11 08:48:35,TUDelft > 31-TBM > 2e Verdieping,0425D910,2021-07-14 03:41:02
18252,A-31-0-067,2021-07-14 11:19:11,Xiaomi Device,2021-05-11 08:48:35,TUDelft > 31-TBM > 2e Verdieping,0425D910,2021-07-14 12:01:17
59860,A-31-0-024,2021-07-14 05:46:48,Xiaomi Device,2021-05-11 08:48:35,TUDelft > 31-TBM > Beganegrond,0425D910,2021-07-14 05:51:49
...,...,...,...,...,...,...,...
47925,A-31-0-064,2021-07-14 11:49:23,Unclassified,2021-06-23 12:09:35,TUDelft > 31-TBM > 2e Verdieping,FBB514F2,2021-07-14 04:16:13
32765,A-31-0-064,2021-07-14 11:49:23,Unclassified,2021-06-23 12:09:35,TUDelft > 31-TBM > 2e Verdieping,FBB514F2,2021-07-14 02:01:55
27085,A-31-0-064,2021-07-14 11:49:23,Unclassified,2021-06-23 12:09:35,TUDelft > 31-TBM > 2e Verdieping,FBB514F2,2021-07-14 01:16:42
42283,A-31-0-064,2021-07-14 11:49:23,Unclassified,2021-06-23 12:09:35,TUDelft > 31-TBM > 2e Verdieping,FBB514F2,2021-07-14 03:32:26


However, what matters is whether a device was detected on multiple days. To check this, the traces are grouped per day, keeping at most one trace per device per day.

In [40]:
df_07 = df[df['associationTime'] == "2021-07-07"].drop_duplicates(subset=['macAddressMonth']) # wednesday
df_08 = df[df['associationTime'] == "2021-07-08"].drop_duplicates(subset=['macAddressMonth']) # thursday
df_09 = df[df['associationTime'] == "2021-07-09"].drop_duplicates(subset=['macAddressMonth']) # friday
df_10 = df[df['associationTime'] == "2021-07-10"].drop_duplicates(subset=['macAddressMonth']) # saturday
df_11 = df[df['associationTime'] == "2021-07-11"].drop_duplicates(subset=['macAddressMonth']) # sunday
df_12 = df[df['associationTime'] == "2021-07-12"].drop_duplicates(subset=['macAddressMonth']) # monday
df_13 = df[df['associationTime'] == "2021-07-13"].drop_duplicates(subset=['macAddressMonth']) # tuesday
df_14 = df[df['associationTime'] == "2021-07-14"].drop_duplicates(subset=['macAddressMonth']) # wednesday

In [51]:
# DataFrame.append was removed in pandas 2.0; pd.concat combines the daily frames.
df_cleaned = pd.concat([df_07, df_08, df_09, df_10, df_11, df_12, df_13, df_14])
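The per-day frames above can also be produced in one step. The following is a rough sketch on made-up data, assuming `associationTime` has already been reduced to whole days as in the cleaning cell. The names `toy`, `in_week` and `per_day` are hypothetical.

```python
import pandas as pd

# Toy traces (hypothetical values) with the same column names as above.
toy = pd.DataFrame({
    "macAddressMonth": ["AAAA0001", "AAAA0001", "AAAA0001", "BBBB0002", "CCCC0003"],
    "associationTime": pd.to_datetime(
        ["2021-07-07", "2021-07-07", "2021-07-08", "2021-07-07", "2021-07-20"]
    ),
})

# Keep only traces inside the observation week (both bounds inclusive).
in_week = toy["associationTime"].between(
    pd.Timestamp("2021-07-07"), pd.Timestamp("2021-07-14")
)

# One row per device per day, comparable to combining df_07 ... df_14.
per_day = (
    toy[in_week]
    .drop_duplicates(subset=["macAddressMonth", "associationTime"])
    .sort_values(["associationTime", "macAddressMonth"])
    .reset_index(drop=True)
)

print(per_day)
```

Dropping duplicates on the pair of device and day avoids maintaining one variable per date, which makes it easier to change the observation window later.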

In [61]:
df_cleaned.head()

Unnamed: 0,apName,associationTime,deviceType,firstSeenTime,location,macAddressMonth,updateTime
587,A-31-0-050,2021-07-07,Unclassified,2021-07-07,TUDelft > 31-TBM > 1e Verdieping,05B220EA,2021-07-07
607,A-31-0-005,2021-07-07,Unclassified,2021-07-07,TUDelft > 31-TBM > Beganegrond,0ACE29FA,2021-07-07
203,A-31-0-011,2021-07-07,"iPhone11,2",2021-06-24,TUDelft > 31-TBM > Beganegrond,0D807256,2021-07-09
416,A-31-0-061,2021-07-07,Intel-Device,2021-07-05,TUDelft > 31-TBM > 2e Verdieping,0F03E157,2021-07-07
285,A-31-0-011,2021-07-07,Unclassified,2021-06-29,TUDelft > 31-TBM > Beganegrond,0F3D02CF,2021-07-07


First, we check how many people were traced during the week of 7 to 14 July 2021, counting each unique MAC address as one person.

In [62]:
df_number = df_cleaned.drop_duplicates(subset=['macAddressMonth'])
df_number.head()

Unnamed: 0,apName,associationTime,deviceType,firstSeenTime,location,macAddressMonth,updateTime
587,A-31-0-050,2021-07-07,Unclassified,2021-07-07,TUDelft > 31-TBM > 1e Verdieping,05B220EA,2021-07-07
607,A-31-0-005,2021-07-07,Unclassified,2021-07-07,TUDelft > 31-TBM > Beganegrond,0ACE29FA,2021-07-07
203,A-31-0-011,2021-07-07,"iPhone11,2",2021-06-24,TUDelft > 31-TBM > Beganegrond,0D807256,2021-07-09
416,A-31-0-061,2021-07-07,Intel-Device,2021-07-05,TUDelft > 31-TBM > 2e Verdieping,0F03E157,2021-07-07
285,A-31-0-011,2021-07-07,Unclassified,2021-06-29,TUDelft > 31-TBM > Beganegrond,0F3D02CF,2021-07-07


834 people were traced during the week.
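The cell above only shows the first rows, not the count itself. A minimal sketch of how such a count could be obtained is given below, using made-up data. The frame `toy_cleaned` is a hypothetical stand-in for `df_cleaned`.

```python
import pandas as pd

# Toy stand-in for df_cleaned (hypothetical values): one row per device per day.
toy_cleaned = pd.DataFrame({
    "macAddressMonth": ["AAAA0001", "BBBB0002", "AAAA0001", "CCCC0003"],
    "associationTime": pd.to_datetime(
        ["2021-07-07", "2021-07-07", "2021-07-08", "2021-07-09"]
    ),
})

# Number of distinct devices seen during the week.
n_devices = toy_cleaned["macAddressMonth"].nunique()

# Same number via the drop_duplicates approach used in the notebook.
n_devices_alt = len(toy_cleaned.drop_duplicates(subset=["macAddressMonth"]))

print(n_devices, n_devices_alt)
```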

Next, we check which MAC addresses were traced on more than one day, as shown below.

In [53]:
df_cleaned[df_cleaned.groupby('macAddressMonth')['macAddressMonth'].transform('size') > 1]

Unnamed: 0,apName,associationTime,deviceType,firstSeenTime,location,macAddressMonth,updateTime
411,A-31-0-060,2021-07-07,Intel-Device,2021-07-05,TUDelft > 31-TBM > 2e Verdieping,167DBEBB,2021-07-07
49,A-31-0-057,2021-07-07,Intel-Device,2021-05-12,TUDelft > 31-TBM > 2e Verdieping,715C9BB3,2021-07-07
97,A-31-0-024,2021-07-07,Unclassified,2021-05-26,TUDelft > 31-TBM > Beganegrond,98A7D9A2,2021-07-07
600,A-31-0-011,2021-07-07,Apple-Device,2021-07-07,TUDelft > 31-TBM > Beganegrond,A0974C41,2021-07-07
94,A-31-0-045,2021-07-07,Unclassified,2021-05-26,TUDelft > 31-TBM > 1e Verdieping,BD7AE397,2021-07-07
...,...,...,...,...,...,...,...
11872,A-31-0-016,2021-07-14,"iPhone11,2",2021-07-02,TUDelft > 31-TBM > Beganegrond,CB26B9CD,2021-07-14
4702,A-31-0-075,2021-07-14,Unclassified,2021-05-19,TUDelft > 31-TBM > 3e Verdieping,CE7B8663,2021-07-14
48589,A-31-0-069,2021-07-14,Unclassified,2021-06-17,TUDelft > 31-TBM > 3e Verdieping,F84A30A9,2021-07-14
25055,A-31-0-052,2021-07-14,Liteon Device,2021-05-07,TUDelft > 31-TBM > 1e Verdieping,FA9679B6,2021-07-14


Out of the 834 people traced, 122 were traced on more than one day.
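Filtering with `transform('size') > 1` returns one row per device per day, so the number of rows is larger than the number of returning devices. As a hedged sketch on made-up data, the number of distinct days per device can be counted directly. The names `toy_cleaned`, `days_per_device`, `returning` and `distribution` are hypothetical.

```python
import pandas as pd

# Toy stand-in for df_cleaned (hypothetical values): one row per device per day.
toy_cleaned = pd.DataFrame({
    "macAddressMonth": [
        "AAAA0001", "BBBB0002", "AAAA0001", "CCCC0003", "BBBB0002", "AAAA0001",
    ],
    "associationTime": pd.to_datetime([
        "2021-07-07", "2021-07-07", "2021-07-08",
        "2021-07-09", "2021-07-12", "2021-07-13",
    ]),
})

# Number of distinct days on which each device was seen.
days_per_device = toy_cleaned.groupby("macAddressMonth")["associationTime"].nunique()

# Devices that came back on at least one other day.
returning = days_per_device[days_per_device > 1]

# How many devices were seen on exactly 1, 2, 3, ... days.
distribution = days_per_device.value_counts().sort_index()

print(len(returning))
print(distribution)
```

The `distribution` series answers the questions about more than one day and more than two days in one table, instead of needing one filter per threshold.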

We also checked whether people were present on more than two days; however, this was not the case.

In [63]:
df_cleaned[df_cleaned.groupby('macAddressMonth')['macAddressMonth'].transform('size') > 2]

Unnamed: 0,apName,associationTime,deviceType,firstSeenTime,location,macAddressMonth,updateTime
