# Resampling

The data are noisy enough that a smoothing step is needed to reduce interference. One way to reduce high-frequency noise is resampling, i.e., changing the frequency of a time series to produce a data set with a different cadence. The data set analyzed here has several RSSI values for each second. This granularity is not necessary for localization purposes, since a person cannot change position substantially in such a short time. Therefore, we can reduce some of the interference by lowering the frequency of the measurements.

## Downsampling

The process of decreasing the frequency of a time series is called downsampling. Two choices are fundamental in this process: the frequency to which the data are resampled and the summary statistic used to group the data, i.e., the aggregate function.
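
As a minimal sketch of these two choices, the cell below downsamples a small hand-made table. The data, the dates, and the names `toy` and `toy_one` are hypothetical and serve only as illustration; the real data set is loaded further down.

```python
import pandas as pd

# Hypothetical toy data: six readings spread over two seconds.
toy = pd.DataFrame({
    'Time': pd.to_datetime([
        '2021-01-01 00:00:00.0', '2021-01-01 00:00:00.3', '2021-01-01 00:00:00.7',
        '2021-01-01 00:00:01.1', '2021-01-01 00:00:01.5', '2021-01-01 00:00:01.9',
    ]),
    'rssi': [-70, -72, -74, -60, -62, -64],
})

# Choice 1: the target frequency ('1s').
# Choice 2: the aggregate function (here the mean of each one-second bin).
toy_one = toy.resample('1s', on='Time').mean()
print(toy_one['rssi'].tolist())  # [-72.0, -62.0]
```

The six input rows collapse into two rows, one per second, each holding the mean of the readings that fell into that second.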

First, let's read the cleaned data obtained in the data preparation phase.

In [None]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
df = pd.read_csv('../dataset/clean_data.csv')

Then we need to convert the `Time` column to timestamps, since the resample function will use it as the time index.

In [None]:
df['Time'] = df['Time'].apply(pd.Timestamp)
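
The resample function needs datetime-like values, either as the index or in a column named through the `on` parameter. The sketch below, on hypothetical toy data, suggests that the two routes give the same result; the names `via_column` and `via_index` are illustrative only.

```python
import pandas as pd

# Hypothetical toy data with timestamps stored as strings, as in a CSV file.
toy = pd.DataFrame({
    'Time': ['2021-01-01 00:00:00.2', '2021-01-01 00:00:00.8', '2021-01-01 00:00:01.4'],
    'rssi': [-70, -80, -65],
})
toy['Time'] = pd.to_datetime(toy['Time'])

# Route 1: keep 'Time' as a column and point resample at it.
via_column = toy.resample('1s', on='Time').mean()

# Route 2: promote 'Time' to the index first.
via_index = toy.set_index('Time').resample('1s').mean()

print(via_column['rssi'].tolist())  # [-75.0, -65.0]
print(via_index['rssi'].tolist())   # [-75.0, -65.0]
```

This notebook follows the first route, so `Time` stays an ordinary column in `df`.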

### Frequency
The first decision is the resolution of the resampling, i.e., the new frequency of the measurements.
Since the data set contains RSSI values from a smartphone carried by a person, an interval of a few seconds is suitable. We will try a one-second frequency and a two-second frequency.
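
To get a feel for how the chosen frequency affects the size of the result, the following sketch resamples a hypothetical stream of 10 readings per second lasting 10 seconds. The sampling rate and the constant RSSI value are assumptions made for illustration, not properties of the real data.

```python
import pandas as pd

# Hypothetical toy data: 10 readings per second for 10 seconds.
times = pd.date_range('2021-01-01', periods=100, freq='100ms')
toy = pd.DataFrame({'Time': times, 'rssi': -70.0})

rows_one = len(toy.resample('1s', on='Time').mean())
rows_two = len(toy.resample('2s', on='Time').mean())
print(rows_one, rows_two)  # 10 5
```

Doubling the interval halves the number of rows, so each output value summarizes twice as many readings.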

### Aggregation function
The second decision concerns the aggregation function. There are several options, such as sum, mean, and median, but in the context of indoor localization the mean and the median seem the most appropriate.
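
The sketch below applies the three candidates to a single hypothetical one-second window that contains one spike. The values are invented for illustration; which statistic gives the smoother series on the real data is an empirical question, examined in the following sections.

```python
import numpy as np

# Hypothetical one-second window of RSSI readings with one isolated spike (-95).
window = np.array([-70, -71, -69, -70, -95])

print(np.sum(window))     # -375: grows with the number of readings in the window
print(np.mean(window))    # -75.0: pulled toward the spike
print(np.median(window))  # -70.0: unaffected by a single spike
```

The sum depends on how many readings happen to fall into the window, which is one reason it is a poor fit for RSSI values, while the mean and the median both stay on the RSSI scale.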

#### One-second frequency
The first line of the following cell uses the resample method to produce a new time series with a one-second frequency. The RSSI value of each interval is computed by taking the mean over each one-second window.

In [None]:
# Lowercase 's' is the seconds alias; uppercase 'S' is deprecated in recent pandas versions.
mean_df_one = df.resample('1s', on='Time').mean()

mean_df_one

It is interesting to notice that the number of rows is reduced from 13,584 to 1,201, showing that the initial data set contains an average of about 11 RSSI values per second.
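
The average quoted above is simply the ratio of the two row counts. The sketch below reproduces that arithmetic and then shows, on hypothetical toy data, how `size()` can count the readings in each one-second bin; the `toy` table is an assumption for illustration.

```python
import pandas as pd

# Row counts reported above for the original and the downsampled data.
rows_before, rows_after = 13584, 1201
print(round(rows_before / rows_after, 2))  # 11.31

# Hypothetical toy data: three readings in the first second, one in the next.
toy = pd.DataFrame({
    'Time': pd.to_datetime([
        '2021-01-01 00:00:00.1', '2021-01-01 00:00:00.5',
        '2021-01-01 00:00:00.9', '2021-01-01 00:00:01.2',
    ]),
    'rssi': [-70, -71, -69, -80],
})
counts = toy.resample('1s', on='Time').size()
print(counts.tolist())  # [3, 1]
```

On the real data, `df.resample('1s', on='Time').size()` would give the per-second counts, which may vary around the average and can be zero for seconds with no readings.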

In [None]:
mean_df_one.plot(title='Resample 1 second using mean', figsize=(20, 12))

The first line of the following cell uses the resample method again to produce a time series with a one-second frequency. This time, the RSSI value of each interval is computed by taking the median over each one-second window.

In [None]:
median_df_one = df.resample('1s', on='Time').median()
median_df_one.plot(title='Resample 1 second using median', figsize=(20, 12))

We compare the two downsampled series using a line chart.

In [None]:
plt.figure(figsize=(20, 12))

plt.plot(mean_df_one['rssi'], label='Mean')
plt.plot(median_df_one['rssi'], linestyle='dotted',  label='Median')

plt.legend()
plt.xlabel('Time')
plt.ylabel('RSSI')

The comparison shows that using the mean as the aggregate function is more effective in reducing the interference. Specifically, the median-based downsampled series still contains high-frequency noise.
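
The visual comparison could be complemented with a number. The sketch below defines a hypothetical `roughness` helper, the standard deviation of consecutive differences, and applies it to synthetic data. Both the metric and the Gaussian-noise signal are assumptions for illustration and are not part of the original analysis.

```python
import numpy as np
import pandas as pd

def roughness(series):
    """Hypothetical metric: standard deviation of consecutive differences."""
    return series.diff().std()

# Synthetic stand-in for the RSSI data: a flat signal plus Gaussian noise at 10 Hz.
rng = np.random.default_rng(0)
times = pd.date_range('2021-01-01', periods=600, freq='100ms')
toy = pd.DataFrame({'Time': times, 'rssi': -70 + rng.normal(0, 3, size=600)})

mean_toy = toy.resample('1s', on='Time').mean()
median_toy = toy.resample('1s', on='Time').median()

# A lower value suggests a smoother line.
print('mean  :', roughness(mean_toy['rssi']))
print('median:', roughness(median_toy['rssi']))
```

In the notebook, the same helper could be called on `mean_df_one['rssi']` and `median_df_one['rssi']` to check whether the numbers agree with the impression given by the chart.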

#### Two-second frequency
The first line of the following cell uses the resample method to produce a new time series with a two-second frequency. The RSSI value of each interval is computed by taking the mean over each two-second window.

In [None]:
mean_df_two = df.resample('2s', on='Time').mean()

mean_df_two.plot(title='Resample 2 seconds using mean', figsize=(20, 12))

The first line of the following cell uses the resample method again to produce a time series with a two-second frequency. This time, the RSSI value of each interval is computed by taking the median over each two-second window.

In [None]:
median_df_two = df.resample('2s', on='Time').median()
median_df_two.plot(title='Resample 2 seconds using median', figsize=(20, 12))

We compare the two downsampled series using a line chart.

In [None]:
plt.figure(figsize=(20, 12))

plt.plot(mean_df_two['rssi'], label='Mean')
plt.plot(median_df_two['rssi'], linestyle='dotted', label='Median')

plt.legend()
plt.xlabel('Time')
plt.ylabel('RSSI')

As noticed before, the mean-based downsampling provides a signal with less interference and a smoother line than the median-based downsampling.

### Results

Indoor localization applications usually require a precision that is compatible with both the one-second and the two-second frequency. However, for clarity, we choose the two-second frequency, since it produces more readable graphs thanks to a data set with fewer rows.
Moreover, based on the results obtained, we decide to downsample using the mean as the aggregate function.

In [None]:
mean_df_two

Finally, we save the downsampled dataframe to a *.csv* file.

In [None]:
mean_df_two.to_csv('../dataset/resample_mean.csv', index=True)
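
Because the file is written with `index=True`, the `Time` index becomes the first column of the CSV. A later step could restore it as a datetime index when reading the file back. The sketch below demonstrates the round trip on hypothetical toy data, using an in-memory buffer in place of the real file path.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downsampled dataframe.
toy = pd.DataFrame(
    {'rssi': [-72.0, -62.0]},
    index=pd.DatetimeIndex(['2021-01-01 00:00:00', '2021-01-01 00:00:02'], name='Time'),
)

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=True)
buffer.seek(0)

# Read it back, restoring 'Time' as a parsed datetime index.
restored = pd.read_csv(buffer, index_col='Time', parse_dates=True)
print(type(restored.index).__name__)  # DatetimeIndex
```

With the real file, the same `index_col='Time', parse_dates=True` arguments would presumably apply to `pd.read_csv('../dataset/resample_mean.csv', ...)`.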