# Biometric Authentication of Smartphone Users with Support Vector Machines
**Math189R - Midterm Project**  
Nico Espinosa Dice  
*April, 2020*

## Importing Data

In [1]:
import pandas as pd
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn import metrics

In [4]:
train_data = pd.read_csv("../Data/train.csv")
test_data = pd.read_csv("../Data/test.csv")
questions = pd.read_csv("../Data/questions.csv")
sample_submission = pd.read_csv("../Data/sampleSubmission.csv")

## Data Exploration

Verifies that the set of unique device IDs in the Device column of train.csv equals the set of unique IDs in the QuizDevice column of questions.csv:

In [5]:
# Sorted lists of the unique device IDs found in each file
train_devices = sorted(train_data["Device"].unique())
quiz_devices = sorted(questions["QuizDevice"].unique())

train_devices == quiz_devices

True
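
The check above only reports whether the two ID sets match. As a minimal sketch on toy data (the values and the names toy_train and toy_questions are made up for illustration), a set-based comparison can also show which IDs are missing from either side:

```python
import pandas as pd

# Toy stand-ins for train.csv and questions.csv (illustrative values only)
toy_train = pd.DataFrame({"Device": [7, 7, 8, 9, 9]})
toy_questions = pd.DataFrame({"QuizDevice": [9, 8, 7, 7]})

train_ids = set(toy_train["Device"].unique())
quiz_ids = set(toy_questions["QuizDevice"].unique())

# True when both columns contain exactly the same unique IDs
same_ids = train_ids == quiz_ids

# IDs present on one side only; both lists are empty when the sets match
only_in_train = sorted(train_ids - quiz_ids)
only_in_quiz = sorted(quiz_ids - train_ids)

print(same_ids, only_in_train, only_in_quiz)
```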

Shape of datasets:

In [6]:
print("train_data shape:", train_data.shape)
print("test_data shape:", test_data.shape)

train_data shape: (29563983, 5)
test_data shape: (27007200, 5)


**Features of training data:**  
T = time (Unix time in milliseconds since 1/1/1970)  
X = acceleration along the x-axis, measured in g  
Y = acceleration along the y-axis, measured in g  
Z = acceleration along the z-axis, measured in g  
Device = unique ID of the device that generated the samples
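
Since T is stored as Unix time in milliseconds, it can be converted to readable timestamps when inspecting the data. The sketch below assumes made-up T values and a hypothetical toy frame; it is not part of the original notebook:

```python
import pandas as pd

# Made-up millisecond timestamps standing in for the T column
toy = pd.DataFrame({"T": [1336645068000, 1336645068200, 1336645068400]})

# Interpret T as milliseconds since the Unix epoch
toy["timestamp"] = pd.to_datetime(toy["T"], unit="ms")

# Gap between consecutive samples, in milliseconds (first row has no predecessor)
toy["dt_ms"] = toy["T"].diff()

print(toy)
```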

In [7]:
print(train_data.columns)

Index(['T', 'X', 'Y', 'Z', 'Device'], dtype='object')


## Data Preparation

The cell below is commented out because the full dataset is too large and too slow to work with for quick experiments.

In [8]:
# Creates a dataset of only the X values: time, x-acceleration, y-acceleration, and z-acceleration
# X_train = train_data[["T", "X", "Y", "Z"]]
# X_test = test_data[["T", "X", "Y", "Z"]]

# # Creates a dataset of the target
# y_train = train_data[["Device"]]

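In the cells below, we take a random 1.7% sample of the training data (smaller_data), then select the rows for three devices from the full train_data. The model is trained and tested on this three-device subset.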

In [9]:
smaller_data = train_data.sample(frac = 0.017)

In [10]:
selected_devices = [7, 8, 9]
three_devices = train_data.loc[train_data['Device'].isin(selected_devices)]

In [11]:
X_train, X_test, y_train, y_test = train_test_split(
    three_devices[["T", "X", "Y", "Z"]],
    three_devices["Device"],
    test_size=0.3,
    random_state=1,
)

In [12]:
# new shape
X_train.shape

(697977, 4)
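
The split used here is a plain random 70/30 split. If the three devices contribute very different numbers of rows, one option is to pass stratify to train_test_split so both splits keep the same class proportions. This is a sketch on synthetic data, not what the notebook ran; the toy frame and its class sizes are assumptions:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic, deliberately imbalanced stand-in for three_devices
toy = pd.DataFrame({
    "T": np.arange(300),
    "X": rng.normal(size=300),
    "Y": rng.normal(size=300),
    "Z": rng.normal(size=300),
    "Device": np.repeat([7, 8, 9], [200, 70, 30]),
})

X_tr, X_te, y_tr, y_te = train_test_split(
    toy[["T", "X", "Y", "Z"]],
    toy["Device"],
    test_size=0.3,
    random_state=1,
    stratify=toy["Device"],  # keep per-device proportions in both splits
)

# Per-device share of rows in each split
train_share = y_tr.value_counts(normalize=True).sort_index()
test_share = y_te.value_counts(normalize=True).sort_index()
print(train_share, test_share)
```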

Variance of features:

In [19]:
X_train.var()

T    9.527624e+18
X    7.083804e+00
Y    1.307122e+01
Z    1.911240e+01
dtype: float64
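
The variance of T is many orders of magnitude larger than that of X, Y, and Z. Because the RBF kernel depends on distances between samples, an unscaled feature of this size can dominate the kernel. One common remedy is to standardize the features before the SVM. The sketch below uses synthetic data and a scikit-learn pipeline; it illustrates the idea and is not the model trained in this notebook:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200

# Synthetic data: one huge-scale column (like T) that carries no class signal,
# and three small-scale columns (like X, Y, Z) that do
labels = rng.integers(0, 2, size=n)
big = rng.normal(1.3e12, 3e9, size=n)
small = rng.normal(loc=labels[:, None] * 3.0, scale=1.0, size=(n, 3))
X_toy = np.column_stack([big, small])

# The scaler is fit on the training data only, inside the pipeline
scaled_svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scaled_svm.fit(X_toy, labels)

# After standardization every column has unit variance
scaled_variances = StandardScaler().fit_transform(X_toy).var(axis=0)
train_acc = scaled_svm.score(X_toy, labels)
print(scaled_variances, train_acc)
```

Wrapping the scaler and the classifier in one pipeline means the same scaling is applied automatically at prediction time.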

## Model Implementation

Creates a classification SVM with a Radial Basis Function (RBF) kernel:

In [13]:
svm_model = svm.SVC(kernel='rbf')

Trains the SVM on the training split (X_train, y_train):

In [None]:
svm_model.fit(X_train, y_train)
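
Fitting an SVC on several hundred thousand rows can take a long time; the scikit-learn documentation notes that fit time scales at least quadratically with the number of samples. A possible first step is to fit on a small random subsample to check that the pipeline works end to end. The sketch below uses synthetic data, and the subsample size of 500 is an arbitrary assumption:

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-ins for X_train and y_train
toy_X = pd.DataFrame(rng.normal(size=(5000, 3)), columns=["X", "Y", "Z"])
toy_y = pd.Series(rng.choice([7, 8, 9], size=5000), name="Device")

# Draw a reproducible random subsample of row labels
sub_idx = toy_X.sample(n=500, random_state=1).index

# Fit on the subsample only; the full fit can follow once this works
quick_svm = SVC(kernel="rbf").fit(toy_X.loc[sub_idx], toy_y.loc[sub_idx])
print(quick_svm.classes_)
```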

## Model Prediction and Evaluation

Outputs the SVM's predictions for the held-out test split (X_test):

In [None]:
y_pred = svm_model.predict(X_test)

Evaluates the model's accuracy (i.e., the fraction of test samples classified correctly):

In [None]:
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
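
Accuracy alone does not show which devices are confused with one another. A confusion matrix and a per-class report give that breakdown. The labels below are made up for illustration and are not results from this model:

```python
from sklearn import metrics

# Made-up true and predicted device labels
y_true_toy = [7, 7, 8, 8, 9, 9, 9]
y_pred_toy = [7, 8, 8, 8, 9, 9, 7]

# Rows are true devices, columns are predicted devices, in the order 7, 8, 9
cm = metrics.confusion_matrix(y_true_toy, y_pred_toy, labels=[7, 8, 9])
acc = metrics.accuracy_score(y_true_toy, y_pred_toy)

print(cm)
print("Accuracy:", acc)
# Per-device precision, recall, and F1
print(metrics.classification_report(y_true_toy, y_pred_toy, labels=[7, 8, 9]))
```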

## Saving Model

Saves the trained model to a file, then loads it back:

In [None]:
from joblib import dump, load
dump(svm_model, 'svm_2.joblib')

In [None]:
new_svm = load('svm_2.joblib')
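
After loading, it can be worth confirming that the restored model behaves like the original. The sketch below does this round trip on a small synthetic model in a temporary directory; the file name toy_svm.joblib is hypothetical:

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Small synthetic training set with three device labels
X_toy = rng.normal(size=(60, 3))
y_toy = np.repeat([7, 8, 9], 20)

model = SVC(kernel="rbf").fit(X_toy, y_toy)

# Save and reload inside a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_svm.joblib")
    dump(model, path)
    restored = load(path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
print(same_predictions)
```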