# Training on 2 classes

In this experiment, we trained on 2 classes: `all_off` (0), where all communication modules are off, and `all_on` (1), where all of them are on.

We tested under the following conditions:

Number of test cases: 1

Location: inside, in a kitchen, next to a computer and several appliances

Communication modules (internal producers of electromagnetic emissions)
- wifi
- bluetooth
- mobile antenna

Unfortunately, due to a miscommunication in our team, the `all_off` phase did not completely turn off the mobile antenna (we turned off mobile data, but did not turn on airplane mode).

In [134]:
import glob
import csv
import pandas as pd
import re
import numpy as np

csv_path = ""
data_path = "../data/interim/2_stages"
files = glob.glob(f"{data_path}/*uncalibrated.csv")
data_uncalibrated = [pd.read_csv(file, delimiter=',', index_col=0) for file in files]
# Parse the class id from the "stage_<id>" part of each filename
classes = [int(re.search(r'stage_(\d+)', filename).group(1)) for filename in files]
n_of_train_cases = len(classes)
print(f"{n_of_train_cases = }", classes)
print(data_uncalibrated[0])
print(len(data_uncalibrated))

n_of_train_cases = 2 [6, 1]
        X_UnCal    Y_UnCal    Z_UnCal     X_Bias    Y_Bias     Z_Bias  \
0    -56.449398  45.408398 -93.073800 -17.579866  46.09261 -32.968437   
1    -55.400200  46.457600 -92.402800 -17.579866  46.09261 -32.968437   
2    -56.974000  44.883800 -93.415400 -17.579866  46.09261 -32.968437   
3    -47.579998  47.726400 -86.180800 -17.579866  46.09261 -32.968437   
4    -56.302998  44.908200 -93.146996 -17.579866  46.09261 -32.968437   
...         ...        ...        ...        ...       ...        ...   
5993 -34.904198  50.398197 -80.239395 -17.579866  46.09261 -32.968437   
5994 -35.599598  50.227398 -79.946600 -17.579866  46.09261 -32.968437   
5995 -35.050600  49.715000 -79.800200 -17.579866  46.09261 -32.968437   
5996 -35.855800  50.117600 -79.226800 -17.579866  46.09261 -32.968437   
5997 -34.050198  50.434800 -79.275600 -17.579866  46.09261 -32.968437   

      time_uncalibrated  
0          1.671096e+12  
1          1.671096e+12  
2          1.6710

We use the uncalibrated sensor values. Since the sampling rate during the `all_on` phase was twice as high, we downsampled that recording by keeping every second reading.

**Estimated sampling rate: 200 Hz**
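
A rate like this can be estimated from the timestamp column. The sketch below is a minimal illustration, assuming `time_uncalibrated` holds Unix timestamps in milliseconds (an assumption based on the magnitude of the printed values, about 1.67e12). `estimate_sampling_rate` is a hypothetical helper and the timestamps are synthetic, not taken from the recordings.

```python
import numpy as np

def estimate_sampling_rate(timestamps_ms):
    """Estimate the sampling rate in Hz from timestamps given in milliseconds."""
    # The median gap between consecutive readings is robust to occasional dropped samples.
    median_gap_ms = np.median(np.diff(np.asarray(timestamps_ms, dtype=float)))
    return 1000.0 / median_gap_ms

# Synthetic timestamps (assumed layout): one recording with 5 ms spacing,
# one with 2.5 ms spacing, i.e. twice the rate.
t_slow = 1.671096e12 + 5.0 * np.arange(1000)
t_fast = 1.671096e12 + 2.5 * np.arange(2000)

rate_slow = estimate_sampling_rate(t_slow)
rate_fast = estimate_sampling_rate(t_fast)
# Keeping every second reading halves the rate of the faster recording.
rate_fast_decimated = estimate_sampling_rate(t_fast[::2])
print(rate_slow, rate_fast, rate_fast_decimated)
```

On the synthetic data, slicing with `[::2]` brings the faster recording down to the same rate as the slower one, which is what the `all_on` correction in the next cells relies on.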

In [135]:
def to_numpy(df: pd.DataFrame):
    return df[['X_UnCal', 'Y_UnCal', 'Z_UnCal']].values

In [136]:
class_0_3d = to_numpy(data_uncalibrated[0])
class_1_3d = to_numpy(data_uncalibrated[1][::2])

# class_0_3d
print(len(class_0_3d))
print(len(class_1_3d))

5998
5998


We generated sequences with a fixed window size (1000 readings, 5 seconds) and a stride of 100.

In [137]:
def subsequences(ts, window, stride=2):
    assert ts.shape[1] == 3
    return np.lib.stride_tricks.sliding_window_view(ts, (window, ts.shape[1]))[:,0,:,:][::stride]

test = np.array([
    [0, 1, 2, 3, 4, 5, 6, 7, 8],
    [9, 10, 11, 12, 13, 14, 15, 16, 17], 
    [18,19,20,21,22,23,24,25,26]]).transpose()
print(test)
result = subsequences(test, window=4, stride=3)
print(result.shape)
print(result)

[[ 0  9 18]
 [ 1 10 19]
 [ 2 11 20]
 [ 3 12 21]
 [ 4 13 22]
 [ 5 14 23]
 [ 6 15 24]
 [ 7 16 25]
 [ 8 17 26]]
(2, 4, 3)
[[[ 0  9 18]
  [ 1 10 19]
  [ 2 11 20]
  [ 3 12 21]]

 [[ 3 12 21]
  [ 4 13 22]
  [ 5 14 23]
  [ 6 15 24]]]


We obtain 50 test cases for each testing mode (`all_on` and `all_off`); each test case is a `(1000, 3)` matrix with axes `(readings, spatial_dimension_xyz)`.

In [138]:
test_cases_0 = subsequences(class_0_3d, window=1000, stride=100)
test_cases_1 = subsequences(class_1_3d, window=1000, stride=100)
print(test_cases_0.shape)
print(test_cases_1.shape)

(50, 1000, 3)
(50, 1000, 3)
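
The count of 50 windows per class follows from the recording length, the window size and the stride. A small check of that arithmetic, where `n_windows` is a hypothetical helper that is not part of the notebook:

```python
def n_windows(n_readings: int, window: int, stride: int) -> int:
    """Number of full windows when sliding `window` over the data in steps of `stride`."""
    if n_readings < window:
        return 0
    return (n_readings - window) // stride + 1

# Values from this notebook: 5998 readings, window 1000, stride 100.
count = n_windows(5998, window=1000, stride=100)
print(count)  # (5998 - 1000) // 100 + 1

# Consecutive windows share window - stride readings.
overlap = 1000 - 100
print(overlap)
```

The same formula gives 2 windows for the 9-row toy example with `window=4, stride=3`, matching the `(2, 4, 3)` shape printed earlier.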


We use `rfftn`, the N-dimensional discrete Fourier transform for real input, applied along the time axis only (`axes=0`).

In [139]:
from scipy.fft import rfftn
def fourier(df, threshold=10):
    # Real-input FFT along the time axis; drop the first `threshold` low-frequency bins
    return rfftn(df, axes=0, norm="forward")[threshold:]

In [140]:
from scipy.fft import rfftn

rfftn(class_0_3d, axes=0, norm="forward")

array([[-3.89750882e+01+0.j        ,  4.85352078e+01+0.j        ,
        -8.29530468e+01+0.j        ],
       [-1.20560617e+00+1.42817799j, -2.18458987e-01+0.60430655j,
        -5.93090107e-01+1.31465085j],
       [-8.87296186e-01+0.82105826j, -1.83730692e-01+0.21872908j,
        -5.54312004e-01+0.36532047j],
       ...,
       [-1.09438635e-02+0.03007365j, -5.58393800e-04-0.00222816j,
         1.17132753e-02+0.01659711j],
       [ 4.56946178e-02+0.00367642j,  8.05905579e-03+0.00622745j,
         5.55719169e-02+0.01617404j],
       [ 9.81268109e-02+0.j        ,  2.96477344e-02+0.j        ,
         7.45587461e-02+0.j        ]])

In [141]:
from scipy.fft import rfftn
cut = 20

X_0 = np.array([fourier(test_case, cut) for test_case in test_cases_0])
X_1 = np.array([fourier(test_case, cut) for test_case in test_cases_1])

Y_0 = np.zeros(X_0.shape[0])
Y_1 = np.ones(X_1.shape[0])

print(Y_0.shape)
print(Y_1.shape)

print(X_0.shape)
print(X_1.shape)

(50,)
(50,)
(50, 481, 3)
(50, 481, 3)
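
The 481 rows can be related to physical frequencies. The sketch below assumes the estimated 200 Hz sampling rate and uses `numpy.fft.rfftfreq` to list the bin frequencies of one 1000-reading window; the variable names are illustrative.

```python
import numpy as np

sampling_rate_hz = 200.0   # estimated rate from this notebook (an estimate, not measured here)
window = 1000              # readings per test case
cut = 20                   # number of low-frequency bins dropped

# Bin frequencies of a real-input FFT over one window.
freqs = np.fft.rfftfreq(window, d=1.0 / sampling_rate_hz)

print(len(freqs))          # window // 2 + 1 bins in total
print(freqs[1])            # spacing between bins in Hz
print(freqs[cut])          # lowest frequency kept after the cut
print(len(freqs[cut:]))    # bins remaining after the cut
```

Under that assumption the bins are 0.2 Hz apart, so cutting 20 bins removes the constant (0 Hz) component and everything below 4 Hz, and the remaining 481 bins cover 4 Hz to 100 Hz.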


The Fourier transform returns complex values; we keep only the real parts.

In [142]:
X = np.concatenate((X_0, X_1), axis=0)
Y = np.concatenate((Y_0, Y_1), axis=0)

# to real values
X = np.real(X)

print(X.shape)
print(Y.shape)


(100, 481, 3)
(100,)
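
Keeping only the real part makes each feature depend on the phase of a frequency component, not only on its strength. The toy example below illustrates this with two cosines of equal amplitude and different phase. The magnitude (`np.abs`) is shown for comparison as one possible alternative, not as what this notebook uses. Dividing `np.fft.rfft` by the length reproduces the `norm="forward"` scaling.

```python
import numpy as np

n = 1000
k = 50                                   # frequency bin of the test tone
t = np.arange(n)

in_phase = np.cos(2 * np.pi * k * t / n)
shifted = np.cos(2 * np.pi * k * t / n + np.pi / 2)   # same tone, 90 degree phase shift

# Dividing by n matches the norm="forward" scaling.
spec_in_phase = np.fft.rfft(in_phase) / n
spec_shifted = np.fft.rfft(shifted) / n

# Real part versus magnitude at the tone's bin.
print(np.real(spec_in_phase[k]), np.abs(spec_in_phase[k]))
print(np.real(spec_shifted[k]), np.abs(spec_shifted[k]))
```

The real part is about 0.5 for the first signal and about 0 for the shifted one, while the magnitude is about 0.5 for both.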


Two methods for projecting each test case into a 1-dimensional vector:
- pick one of the 3 spatial dimensions
- flatten

In [143]:

# pick one dimension (index 1 is the Y axis, despite the variable name)
X_dim_0 = X[:,:,1]
print(X_dim_0.shape)

# flatten
X_flatten = np.array([el.flatten('F') for el in X])
print(X_flatten.shape)

X_to_train = X_dim_0
# X_to_train = X_flatten

(100, 481)
(100, 1443)
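
The `'F'` argument controls the order in which the three axes end up in the flattened vector. A toy illustration with a 2-by-3 matrix standing in for one `(481, 3)` test case:

```python
import numpy as np

# Toy "test case": 2 frequency bins (rows) by 3 axes X, Y, Z (columns).
case = np.array([[1, 2, 3],
                 [4, 5, 6]])

flat_f = case.flatten('F')   # column-major: all X values, then all Y, then all Z
flat_c = case.flatten()      # row-major default: X, Y, Z of bin 0, then of bin 1

print(flat_f)  # [1 4 2 5 3 6]
print(flat_c)  # [1 2 3 4 5 6]
```

With 481 bins per axis, columns `0:481` of `X_flatten` therefore hold the X axis, `481:962` the Y axis and `962:1443` the Z axis.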


## One dimension, Fourier, SVM

In [144]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn import svm

X_train, X_test, y_train, y_test = train_test_split(X_to_train, Y, test_size=0.33, random_state=42)
print(y_train.shape)
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(sum([1 for i in range(len(y_pred)) if y_pred[i] == y_test[i]])/len(y_pred))

classes_real = ["all_off", "all_on"]
classes_pred = ["all_off_predicted", "all_on_predicted"]
pd.DataFrame(confusion_matrix(y_test, y_pred), classes_real, classes_pred)

(67,)
0.6060606060606061


| | all_off_predicted | all_on_predicted |
|---|---|---|
| all_off | 6 | 13 |
| all_on | 0 | 14 |
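
The printed accuracy can be recovered from the confusion matrix, which also shows how the errors are distributed. A short check using the numbers reported above:

```python
import numpy as np

# Confusion matrix from this section: rows are true classes, columns are predictions.
cm = np.array([[6, 13],    # all_off: 6 predicted all_off, 13 predicted all_on
               [0, 14]])   # all_on:  0 predicted all_off, 14 predicted all_on

accuracy = np.trace(cm) / cm.sum()               # (6 + 14) / 33
recall_per_class = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class recognised

print(accuracy)
print(recall_per_class)
```

Every `all_on` window is recognised, but only 6 of the 19 `all_off` windows are, so the classifier labels most test windows as `all_on`.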


## 2 classes, Fourier + PCA (1443 -> 100 dim) + SVM

In [145]:
from sklearn.decomposition import PCA

print(X_flatten.shape)

pca = PCA(n_components=100)
X_PCA =  pca.fit(X_flatten).transform(X_flatten)
print(X_PCA.shape)

(100, 1443)
(100, 100)
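
With as many components as samples, PCA cannot discard any variance: mean-centring leaves at most `n_samples - 1` directions with non-zero variance. The sketch below shows this with NumPy's SVD on random toy data (10 samples standing in for 100, 30 features standing in for 1443); it is an illustration of the principle, not a re-run of the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 10, 30          # toy sizes, not the notebook's data
data = rng.normal(size=(n_samples, n_features))

# PCA works on mean-centred data; centring removes one degree of freedom.
centred = data - data.mean(axis=0)
singular_values = np.linalg.svd(centred, compute_uv=False)

# Share of the total variance carried by each candidate component.
explained = singular_values**2 / np.sum(singular_values**2)
print(len(singular_values))             # one candidate component per sample
print(explained[-1])                    # the last one is numerically zero
print(explained[:-1].sum())             # the others carry all of the variance
```

By the same argument, 100 components fitted on 100 samples keep all of the sample variance, so a smaller `n_components` would be needed for the projection to act as a real reduction.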


In [146]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn import svm

X_train, X_test, y_train, y_test = train_test_split(X_PCA, Y, test_size=0.33, random_state=42)
print(y_train.shape)
clf = svm.SVC(decision_function_shape='ovo')
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(sum([1 for i in range(len(y_pred)) if y_pred[i] == y_test[i]])/len(y_pred))

classes_real = ["all_off", "all_on"]
classes_pred = ["all_off_predicted", "all_on_predicted"]
pd.DataFrame(confusion_matrix(y_test, y_pred), classes_real, classes_pred)

(67,)
0.5757575757575758


| | all_off_predicted | all_on_predicted |
|---|---|---|
| all_off | 5 | 14 |
| all_on | 0 | 14 |


## 2 classes, Fourier + flattening + RF

Fourier analysis. Cut the first 20 frequencies. Flatten the `(freq, 3)` matrix. Random forest of depth 2.

In [147]:
from sklearn.ensemble import RandomForestClassifier
X_train, X_test, y_train, y_test = train_test_split(X_flatten, Y, test_size=0.33)
clf = RandomForestClassifier(max_depth=2)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(sum([1 for i in range(len(y_pred)) if y_pred[i] == y_test[i]])/len(y_pred))

classes_real = ["all_off", "all_on"]
classes_pred = ["all_off_predicted", "all_on_predicted"]
pd.DataFrame(confusion_matrix(y_test, y_pred), classes_real, classes_pred)

0.9696969696969697


| | all_off_predicted | all_on_predicted |
|---|---|---|
| all_off | 16 | 1 |
| all_on | 0 | 16 |


## Remarks
- RF results could be overfitted (the entire dataset comes from one 60min sampling session)
- Dimensionality reduction at the test-case level is TBD; we could run PCA on every single test case
- This should be tested on multiple sampling sessions and multiple phones
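
One concrete way the overfitting concern can arise: consecutive windows share 900 of their 1000 readings (window 1000, stride 100), so a random split places near-duplicate windows in both the training and the test set. The sketch below is one possible mitigation, not something done in this notebook: split each class chronologically and drop the windows that straddle the boundary. `chronological_split` is a hypothetical helper.

```python
import numpy as np

def chronological_split(n_windows, window, stride, test_fraction=0.33):
    """Return (train_idx, test_idx) of window indices that share no readings."""
    # Windows i and j overlap when |i - j| * stride < window,
    # so ceil(window / stride) - 1 windows before the test block are dropped.
    gap = -(-window // stride) - 1
    n_test = int(round(n_windows * test_fraction))
    test_start = n_windows - n_test
    train_idx = np.arange(0, max(test_start - gap, 0))
    test_idx = np.arange(test_start, n_windows)
    return train_idx, test_idx

# Sizes from this notebook: 50 windows per class, window 1000, stride 100.
train_idx, test_idx = chronological_split(50, window=1000, stride=100)
print(len(train_idx), len(test_idx))

# Index of the last reading used for training and the first used for testing.
last_train_reading = train_idx[-1] * 100 + 1000 - 1
first_test_reading = test_idx[0] * 100
print(last_train_reading < first_test_reading)
```

The cost is a smaller training set, since the dropped windows are unused; with only 50 windows per class this is a noticeable loss, which is another argument for recording more sessions.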