<div class="alert alert-block alert-info">
 
Code from this notebook has been folded into the `build_features.py` source code under the function name `get_beacon_gps_intersection`.
    
</div>

# Intersection Between Beiwe-Recorded GPS and Beacon-Recorded IAQ
This notebook assesses how much data are available in the intersection between the GPS and IAQ datasets.

In [28]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd

%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


# Data Import
As always, we start by importing the necessary data from each modality.

## Beiwe GPS
GPS data are downsampled to one-minute intervals.

In [34]:
gps = pd.read_csv("../data/processed/beiwe-gps-ux_s20.csv",parse_dates=["timestamp"],infer_datetime_format=True)
gps.head()

Unnamed: 0,timestamp,utc,lat,long,altitude,accuracy,beiwe
0,2020-05-06 01:00:00,2020-05-06T06:00:05.477,30.23705,-97.71051,135.77695,65.0,15tejjtw
1,2020-05-06 02:05:00,2020-05-06T07:05:39.725,30.23706,-97.71049,135.84357,65.0,15tejjtw
2,2020-05-06 02:06:00,2020-05-06T07:06:06.808,30.23706,-97.71049,135.84357,65.0,15tejjtw
3,2020-05-06 03:01:00,2020-05-06T08:01:07.179,30.23716,-97.71047,139.34561,64.00967,15tejjtw
4,2020-05-06 03:02:00,2020-05-06T08:02:00.546,30.23716,-97.71047,139.34561,32.00483,15tejjtw


## Beacon IAQ
Beacon data are downsampled to 2-minute intervals because the beacons' finest sampling resolution is slightly longer than 1 minute (at least for the utx000 data).

In [9]:
iaq = pd.read_csv("../data/processed/beacon-ux_s20.csv",parse_dates=["timestamp"],infer_datetime_format=True)
iaq.head()

Unnamed: 0,timestamp,tvoc,lux,no2,co,co2,pm1_number,pm2p5_number,pm10_number,pm1_mass,pm2p5_mass,pm10_mass,temperature_c,rh,beacon,beiwe,redcap
0,2020-06-08 13:00:00,67.766667,3.61488,3.526111,13.922047,,12.081799,11.458559,11.201085,0.74428,0.429834,1.935866,27.383333,46.586667,1,kyj367pi,10
1,2020-06-08 13:02:00,67.9625,3.64395,3.526111,13.906931,,12.157965,11.542477,11.28288,0.750738,0.40297,1.940782,27.390625,46.58125,1,kyj367pi,10
2,2020-06-08 13:04:00,68.847059,3.63516,3.529306,13.893371,,12.044653,11.436841,11.182763,0.742682,0.482686,1.937115,27.397059,46.597059,1,kyj367pi,10
3,2020-06-08 13:06:00,69.788889,3.58734,3.529677,13.874056,,12.01994,11.401453,11.147062,0.739928,0.52202,1.933971,27.402778,46.619444,1,kyj367pi,10
4,2020-06-08 13:08:00,70.552632,3.582777,3.530139,13.862026,,12.040436,11.431434,11.17738,0.742263,0.494546,1.936699,27.407895,46.639474,1,kyj367pi,10


# Resampling
Because the two datasets have different sampling rates, we downsample the GPS data further, to 2-minute intervals, so that its timestamps line up with the IAQ data.

In [46]:
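resampled = []
for pt in gps["beiwe"].unique():
    # work on one participant at a time so bins never mix participants
    gps_by_pt = gps[gps["beiwe"] == pt].set_index("timestamp")
    # average numeric columns only; string columns (utc, beiwe) cannot be averaged
    gps_by_pt = gps_by_pt.resample("2min").mean(numeric_only=True)
    # drop empty 2-minute bins, then restore the participant ID
    gps_by_pt = gps_by_pt.dropna().reset_index()
    gps_by_pt["beiwe"] = pt
    resampled.append(gps_by_pt)
# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
gps_resampled = pd.concat(resampled)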
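
To make the binning step concrete, here is a minimal sketch on a made-up single-participant trace. The timestamps and latitudes are hypothetical and not taken from the study data; the sketch only illustrates how 2-minute bins average the readings inside them and how `dropna` removes the bins that a gap in the trace leaves empty.

```python
import pandas as pd

# Hypothetical one-participant GPS trace with a gap after 01:03
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-05-06 01:00", "2020-05-06 01:01",
        "2020-05-06 01:03", "2020-05-06 01:08",
    ]),
    "lat": [30.0, 30.2, 30.4, 30.6],
})

# Bins are left-closed and labeled by their start: 01:00, 01:02, 01:04, ...
binned = toy.set_index("timestamp").resample("2min").mean()

# The 01:04 and 01:06 bins hold no readings, so their mean is NaN; drop them
binned = binned.dropna().reset_index()
print(binned)
```

The first bin averages the 01:00 and 01:01 readings, while the empty bins created by the gap disappear instead of being carried into the merge.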

# Intersection
With the data imported, we define a function to handle the intersection between the two datasets, matching rows on both the participant ID and the timestamp.

In [47]:
def gps_iaq_intersection(df1, df2, byid="beiwe", join_col="timestamp"):
    """Returns the dataframe of the inner merge of the two datasets"""
    
    merged = df1.merge(right=df2,on=[byid,join_col],how="inner")
    return merged

In [48]:
intersection = gps_iaq_intersection(gps_resampled,iaq)
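
As a rough illustration of what the inner merge keeps, the sketch below applies the same `merge` call to two tiny, made-up frames. The participant IDs (`aaa`, `bbb`, `ccc`) and all values are hypothetical. A row survives only when the same participant has a record at the same timestamp in both frames.

```python
import pandas as pd

# Hypothetical GPS records for two participants
gps_toy = pd.DataFrame({
    "beiwe": ["aaa", "aaa", "bbb"],
    "timestamp": pd.to_datetime(
        ["2020-06-08 13:00", "2020-06-08 13:02", "2020-06-08 13:00"]),
    "lat": [30.1, 30.2, 30.3],
})

# Hypothetical IAQ records; "ccc" has no GPS data and "bbb" has no IAQ data
iaq_toy = pd.DataFrame({
    "beiwe": ["aaa", "aaa", "ccc"],
    "timestamp": pd.to_datetime(
        ["2020-06-08 13:02", "2020-06-08 13:04", "2020-06-08 13:00"]),
    "co2": [600.0, 610.0, 620.0],
})

# Same call as in gps_iaq_intersection: both keys must match
merged_toy = gps_toy.merge(right=iaq_toy, on=["beiwe", "timestamp"], how="inner")
print(merged_toy)
```

Only the `aaa` record at 13:02 appears in both frames, so the result has a single row carrying both `lat` and `co2`.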

# Comparison
We compare the number of data points and participants in the merged dataset against the original datasets.

## Number of Data Points

In [49]:
print(f"Number of Datapoints:\n\tGPS:\t{len(gps)}\n\tIAQ:\t{len(iaq)}\n\tInt:\t{len(intersection)}")

Number of Datapoints:
	GPS:	554860
	IAQ:	1398554
	Int:	134643


<div class="alert alert-block alert-warning">
 
Resampling the GPS data to 2-minute intervals yields 134,643 data points in the intersection, whereas merging the 1-minute GPS data directly yields only 87,189. **Be sure to resample**.
    
</div>
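
A likely reason for the difference is that 1-minute GPS records stamped on odd minutes can never match the beacon's 2-minute grid. The sketch below reproduces that effect on made-up data for a single participant, which is why it merges on `timestamp` alone. All values are hypothetical.

```python
import pandas as pd

# Hypothetical 1-minute GPS fixes; two of the three fall on odd minutes
gps_toy = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-06-08 13:01", "2020-06-08 13:03", "2020-06-08 13:04"]),
    "lat": [30.1, 30.2, 30.3],
})

# Hypothetical beacon readings on the 2-minute grid
iaq_toy = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-06-08 13:00", "2020-06-08 13:02", "2020-06-08 13:04"]),
    "co2": [600.0, 610.0, 620.0],
})

# Merging the raw GPS data: only the 13:04 fix lines up
raw_n = len(gps_toy.merge(iaq_toy, on="timestamp", how="inner"))

# Binning to 2 minutes first moves 13:01 -> 13:00 and 13:03 -> 13:02
gps_binned = (gps_toy.set_index("timestamp")
              .resample("2min").mean()
              .dropna().reset_index())
resampled_n = len(gps_binned.merge(iaq_toy, on="timestamp", how="inner"))

print(raw_n, resampled_n)  # 1 3
```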

## Number of Participants

In [40]:
byid="beiwe"
print(f"Number of Participants:\n\tGPS:\t{len(gps[byid].unique())}\n\tIAQ:\t{len(iaq[byid].unique())}\n\tInt:\t{len(intersection[byid].unique())}")

Number of Participants:
	GPS:	52
	IAQ:	25
	Int:	23


<div class="alert alert-block alert-success">
 
Looks like we only lose two participants from the IAQ dataset (25 down to 23).
    
</div>
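
To see which participants drop out, rather than only how many, a set difference on the ID column is enough. The sketch below uses made-up IDs (`p01` to `p03`) in place of the real `iaq["beiwe"]` and `intersection["beiwe"]` columns.

```python
import pandas as pd

# Hypothetical stand-ins for iaq["beiwe"] and intersection["beiwe"]
iaq_ids = pd.Series(["p01", "p02", "p03", "p01"])
intersection_ids = pd.Series(["p01", "p03"])

# Participants with IAQ data that never overlap with GPS data
lost = sorted(set(iaq_ids.unique()) - set(intersection_ids.unique()))
print(lost)  # ['p02']
```

Applied to the real data, the same pattern would list the two participants who are present in the IAQ dataset but absent from the intersection.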