# 1. Exploratory Data Analysis

## 1.1. Understanding the Project

The primary objective of this project is to predict and identify users' private information, including their **ID number**, **gender**, and **age**. This is achieved through the analysis of two types of data: **head and hand motion data**, and **traffic data**.

Data collection involved conducting experiments with **100 participants** who engaged in gameplay across **two sets**: **Beat Saber and Cooking Simulator** for the first set, and **Medal of Honor and Forklift Simulator** for the second set. Each set consisted of a **slow game** and a **fast game**. For the first set, Beat Saber was the fast game, and Cooking Simulator was the slow one. For the second set, Forklift Simulator was the fast game, and Medal of Honor was the slow one. Each participant played each game for multiple sessions, with a minimum of **10 minutes** per session. Participants could choose to start with either the fast or slow game based on their experience with the VR headset. The participants were divided into **two groups**, with each group situated in a separate room with similar conditions. The VR headset used for the experiments was the **Oculus Quest 2**.

Throughout the experiments, participants performed different types of movement, such as **walking** and **joystick interactions**, depending on the selected game. Traffic data was recorded at the same time by connecting the VR headset to a device capable of capturing the traffic while participants played.

Among the initially planned **100 participants**, we opted to analyze the data from only **60 participants**. This decision was prompted by technical challenges and disruptions experienced by the remaining participants. Some individuals encountered technical issues during the experiments, while others had to discontinue their participation due to VR sickness. As a result, the dataset was narrowed down to the data collected from the 60 participants who completed the experiments successfully.


## 1.2. Understanding the Games
In this project we selected four well-known VR games: Beat Saber, Cooking Simulator, Medal of Honor, and Forklift Simulator. These games capture different movement patterns.
### 1.2.1. Beat Saber
![Beat Saber](https://www.donanimhaber.com/images/images/haber/116345/src/facebook-sanal-gerceklik-sirketi-beat-games-i-satin-aldi116345_0.jpg)
Image source :  https://www.donanimhaber.com/facebook-sanal-gerceklik-sirketi-beat-games-i-satin-aldi--116345

Beat Saber is a virtual reality rhythm game in which the objective is to slash beat cubes with a lightsaber as they approach [1]. The game is about maintaining rhythm: players hold a lightsaber in each hand, one blue and one red. As the music starts, blocks marked with colored arrows glide toward the player, as in other rhythm games. The player's task is to slash each block in the direction indicated by its arrow while avoiding incoming obstacles through body and lightsaber movements.

Success in Beat Saber hinges on precise timing; successful slashes contribute to an escalating combo multiplier and, consequently, a higher final score. While the initial gameplay may seem straightforward, numerous streaming videos reveal the potential for chaos to ensue rapidly. To accommodate a range of skill levels, each song features multiple difficulty levels, allowing beginners to easily engage and experts to revel in a challenging experience.

Sources : 

[1] :  https://www.meta.com/en-gb/experiences/2448060205267927/

[2] :  https://www.windowscentral.com/beat-saber-everything-we-know-about-vr-rhythm-game

### 1.2.2. Cooking Simulator
### 1.2.3. Medal of Honor
### 1.2.4. Forklift Simulator

## 1.2. Descriptive Statistics

In [2]:
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 
import plotly
import os
import plotly.express as px

In [51]:
# Store all the files paths related to the participants data
all_files = []
data_path = '../data/raw/Raw_traffic_and_movement_data/'
for path, subdirs, files in os.walk(data_path):
    for name in files:
        all_files.append(path + '/' + name)

# Separate the traffic data paths from the movement data paths
traffic_data = [x for x in all_files if '_traffic.csv' in x]
movement_data = [x for x in all_files if '_movement.csv' in x]
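Before analysing the files, it can help to confirm that every participant folder contains both kinds of recording. The following is a minimal sketch, not part of the original notebook: the helper names `group_files_by_participant` and `incomplete_participants` are hypothetical, the participant is assumed to be the first folder below the data root, and files are matched by suffix rather than by substring.

```python
from collections import defaultdict
from pathlib import Path


def group_files_by_participant(root):
    """Map each top-level participant folder to its traffic and movement CSVs."""
    root = Path(root)
    grouped = defaultdict(lambda: {"traffic": [], "movement": []})
    for csv_path in sorted(root.rglob("*.csv")):
        # Assumption: the first path component below `root` is the participant folder.
        participant = csv_path.relative_to(root).parts[0]
        if csv_path.name.endswith("_traffic.csv"):
            grouped[participant]["traffic"].append(csv_path)
        elif csv_path.name.endswith("_movement.csv"):
            grouped[participant]["movement"].append(csv_path)
    return dict(grouped)


def incomplete_participants(grouped):
    """Return the participants that lack either traffic or movement files."""
    return sorted(
        name for name, kinds in grouped.items()
        if not kinds["traffic"] or not kinds["movement"]
    )
```

Calling `incomplete_participants(group_files_by_participant(data_path))` would then list any folder that needs a closer look before it enters the analysis.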

In [52]:
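# Assign a numeric ID to each participant folder.
# Sorting keeps the mapping reproducible, since os.walk does not guarantee an order.
participants = sorted(next(os.walk(data_path))[1])
participant_dict = {participant: idx + 1 for idx, participant in enumerate(participants)}
print(participant_dict)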

{'group1_order1_user0': 1, 'group1_order1_user1': 2, 'group1_order1_user10': 3, 'group1_order1_user11': 4, 'group1_order1_user12': 5, 'group1_order1_user13': 6, 'group1_order1_user14': 7, 'group1_order1_user2': 8, 'group1_order1_user3': 9, 'group1_order1_user4': 10, 'group1_order1_user5': 11, 'group1_order1_user6': 12, 'group1_order1_user7': 13, 'group1_order1_user8': 14, 'group1_order1_user9': 15, 'group1_order2_user0': 16, 'group1_order2_user1': 17, 'group1_order2_user10': 18, 'group1_order2_user11': 19, 'group1_order2_user12': 20, 'group1_order2_user13': 21, 'group1_order2_user14': 22, 'group1_order2_user2': 23, 'group1_order2_user3': 24, 'group1_order2_user4': 25, 'group1_order2_user5': 26, 'group1_order2_user6': 27, 'group1_order2_user7': 28, 'group1_order2_user8': 29, 'group1_order2_user9': 30, 'group2_order1_user0': 31, 'group2_order1_user1': 32, 'group2_order1_user10': 33, 'group2_order1_user11': 34, 'group2_order1_user12': 35, 'group2_order1_user13': 36, 'group2_order1_user14'
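In the mapping above, `user10` receives ID 3 because the folder names are ordered as plain strings, so `user10` comes before `user2`. If numeric order is preferred, a natural sort key is one option. This is only a sketch: `natural_key` and `assign_ids` are hypothetical helpers, and using them would change the IDs printed above.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'user2' sorts before 'user10'."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", name)]


def assign_ids(participants):
    """Assign 1-based IDs to participant names in natural (numeric-aware) order."""
    ordered = sorted(participants, key=natural_key)
    return {name: idx for idx, name in enumerate(ordered, start=1)}


folders = ["group1_order1_user10", "group1_order1_user2", "group1_order1_user0"]
print(assign_ids(folders))
# {'group1_order1_user0': 1, 'group1_order1_user2': 2, 'group1_order1_user10': 3}
```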

In [54]:
# Print the statistical description of the first 10 movement files
# (5 participants, one fast and one slow recording each)
for file_path in movement_data[:10]:
    df = pd.read_csv(file_path)
    parts = file_path.split('/')
    # parts[4] is the participant folder, given the depth of data_path
    speed = 'fast' if 'fast' in parts[-1] else 'slow'
    print(f'Statistical Description of participant {participant_dict[parts[4]]} - {speed} Movement')
    display(df.describe())

Statistical Description of participant 1 - fast Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,73871.0,73871.0,73871.0,73871.0,73871.0,73871.0,73871.0,73871.0
mean,-0.073011,1.419461,-0.133501,0.853448,0.004862,-0.168108,0.026216,616.093601
std,0.159273,0.061764,0.258087,0.338726,0.042083,0.354022,0.028707,355.707639
min,-0.638,0.735,-0.99,-0.797,-0.481,-1.0,-0.232,0.0
25%,-0.179,1.421,-0.351,0.953,-0.018,-0.245,0.015,308.0405
50%,-0.071,1.429,-0.071,0.992,0.002,-0.025,0.028,616.097
75%,0.019,1.435,0.107,0.998,0.023,0.04,0.041,924.138
max,0.568,1.479,0.393,1.0,0.288,0.752,0.24,1232.178


Statistical Description of participant 1 - slow Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,79913.0,79913.0,79913.0,79913.0,79913.0,79913.0,79913.0,79913.0
mean,0.082856,1.401169,0.66733,-0.136351,0.032191,0.905499,0.095999,666.467389
std,0.208084,0.087422,0.219664,0.329357,0.088946,0.115421,0.146681,384.80083
min,-0.616,0.69,0.071,-0.914,-0.242,0.405,-0.278,0.0
25%,-0.018,1.397,0.598,-0.355,-0.018,0.896,-0.041,333.216
50%,0.119,1.414,0.742,-0.129,0.019,0.948,0.132,666.45
75%,0.229,1.438,0.805,0.103,0.063,0.975,0.208,999.718
max,0.678,1.546,1.104,0.687,0.444,1.0,0.49,1332.952


Statistical Description of participant 2 - fast Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,69422.0,69422.0,69422.0,69422.0,69422.0,69422.0,69422.0,69422.0
mean,-0.012797,1.475351,-0.038835,0.852061,-0.074982,-0.167686,-0.012848,578.961114
std,0.182488,0.230942,0.158999,0.337979,0.065435,0.345225,0.048956,334.261432
min,-1.848,0.0,-0.642,-0.735,-0.897,-0.999,-0.248,0.0
25%,-0.111,1.502,-0.149,0.94,-0.104,-0.222,-0.034,289.48925
50%,-0.038,1.52,-0.02,0.991,-0.082,-0.018,-0.004,578.9615
75%,0.104,1.533,0.074,0.996,-0.053,0.022,0.011,868.43375
max,0.789,1.656,0.725,1.0,0.37,0.538,0.598,1157.923


Statistical Description of participant 2 - slow Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,55444.0,55444.0,55444.0,55444.0,55444.0,55444.0,55444.0,55444.0
mean,-0.117749,1.436281,-0.17445,-0.451855,0.111716,-0.437905,-0.098536,497.222756
std,0.572471,0.189291,0.431202,0.436453,0.144541,0.581953,0.178423,266.966421
min,-1.321,0.602,-0.925,-0.998,-0.628,-1.0,-0.82,34.873
25%,-0.658,1.451,-0.52,-0.706,0.004,-0.803,-0.242,266.02675
50%,0.105,1.487,-0.295,-0.593,0.141,-0.727,-0.13,497.23
75%,0.334,1.521,0.216,-0.388,0.214,0.00025,0.041,728.41625
max,0.885,1.585,0.994,0.855,0.686,1.0,0.333,959.653


Statistical Description of participant 3 - fast Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,65618.0,65618.0,65618.0,65618.0,65618.0,65618.0,65618.0,65618.0
mean,0.059768,1.645056,-0.348529,0.851493,-0.040626,-0.176594,0.045702,547.228413
std,0.182366,0.069553,0.315707,0.336984,0.052013,0.349937,0.036278,315.946138
min,-0.594,0.946,-1.042,-0.722,-0.614,-0.999,-0.219,0.006
25%,-0.058,1.648,-0.607,0.936,-0.063,-0.251,0.031,273.62025
50%,0.026,1.656,-0.386,0.992,-0.034,-0.025,0.052,547.2295
75%,0.171,1.661,-0.106,0.997,-0.012,0.025,0.07,820.83875
max,0.658,1.698,0.559,1.0,0.135,0.593,0.194,1094.449


Statistical Description of participant 3 - slow Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,37148.0,37148.0,37148.0,37148.0,37148.0,37148.0,37148.0,37148.0
mean,-0.318917,1.562906,-0.257345,0.879954,-0.202073,-0.270173,-0.030043,309.80459
std,0.10361,0.155403,0.087332,0.112355,0.177593,0.242836,0.088425,178.871974
min,-0.775,0.0,-1.577,0.363,-0.61,-0.878,-0.295,0.006
25%,-0.385,1.527,-0.288,0.81975,-0.323,-0.446,-0.095,154.9
50%,-0.338,1.584,-0.252,0.926,-0.232,-0.266,0.0,309.812
75%,-0.248,1.647,-0.219,0.962,-0.048,-0.111,0.034,464.70725
max,0.0,1.705,0.0,1.0,0.166,0.548,0.277,619.604


Statistical Description of participant 4 - fast Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,77739.0,77739.0,77739.0,77739.0,77739.0,77739.0,77739.0,77739.0
mean,-0.131216,1.52893,-1.006319,0.32865,-0.00212,-0.080795,-0.002716,648.364897
std,0.247557,0.081036,0.483521,0.833296,0.033337,0.43528,0.023378,374.344633
min,-1.176,0.77,-2.052,-1.0,-0.426,-1.0,-0.346,0.0
25%,-0.248,1.521,-1.426,-0.747,-0.018,-0.204,-0.012,324.1505
50%,-0.111,1.544,-1.106,0.879,-0.001,-0.013,-0.002,648.401
75%,0.008,1.557,-0.532,0.998,0.016,0.059,0.008,972.5525
max,0.792,1.591,0.097,1.0,0.287,1.0,0.144,1296.704


Statistical Description of participant 4 - slow Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,74327.0,74327.0,74327.0,74327.0,74327.0,74327.0,74327.0,74327.0
mean,-0.155172,1.47219,-0.023718,0.104574,0.015125,0.778556,0.083553,619.90863
std,0.201464,0.147104,0.240734,0.517105,0.104958,0.282938,0.131324,357.908962
min,-0.75,0.643,-1.257,-1.0,-0.367,-0.91,-0.264,0.0
25%,-0.277,1.473,-0.097,-0.354,-0.038,0.745,-0.015,309.975
50%,-0.128,1.505,-0.015,0.01,0.012,0.876,0.092,619.9
75%,-0.008,1.537,0.067,0.574,0.077,0.955,0.185,929.8745
max,0.369,1.597,0.996,1.0,0.547,1.0,0.426,1239.799


Statistical Description of participant 5 - fast Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,64000.0,64000.0,64000.0,64000.0,64000.0,64000.0,64000.0,64000.0
mean,0.109736,1.539494,0.139459,-0.85176,0.044997,0.164378,0.011917,533.738204
std,0.187034,0.05728,0.212747,0.341629,0.04912,0.353831,0.031744,308.153621
min,-0.603,0.973,-0.471,-1.0,-0.271,-0.543,-0.137,0.014
25%,-0.008,1.543,-0.037,-0.997,0.023,-0.045,-0.007,266.87575
50%,0.087,1.547,0.137,-0.994,0.046,0.009,0.008,533.738
75%,0.212,1.552,0.314,-0.939,0.066,0.242,0.027,800.601
max,0.804,1.602,0.699,0.78,0.576,1.0,0.374,1067.464


Statistical Description of participant 5 - slow Movement


Unnamed: 0,HeadPosX,HeadPosY,HeadPosZ,HeadOrientationW,HeadOrientationX,HeadOrientationY,HeadOrientationZ,time
count,79942.0,79942.0,79942.0,79942.0,79942.0,79942.0,79942.0,79942.0
mean,-0.357395,1.503843,-0.383036,-0.292871,0.030118,0.279942,0.057808,666.747064
std,0.635892,0.094245,0.393907,0.554947,0.143357,0.683194,0.190669,384.927392
min,-1.629,0.745,-1.315,-1.0,-0.468,-1.0,-0.507,0.009
25%,-0.838,1.484,-0.621,-0.832,-0.049,-0.231,-0.046,333.39425
50%,-0.46,1.519,-0.381,-0.278,0.012,0.57,0.0345,666.732
75%,0.146,1.555,-0.168,0.07,0.1,0.917,0.192,1000.10275
max,1.172,1.616,0.867,1.0,0.557,1.0,0.49,1333.44
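The `count` row and the range of the `time` column in the tables above allow a rough estimate of the tracking sample rate. The sketch below assumes that `time` is measured in seconds (which the source does not state) and that samples are evenly spaced. The function name `estimate_sampling_rate` is hypothetical.

```python
def estimate_sampling_rate(n_samples, t_start, t_end):
    """Average number of samples per second, assuming `time` is in seconds."""
    duration = t_end - t_start
    if duration <= 0:
        raise ValueError("t_end must be greater than t_start")
    # n_samples points span (n_samples - 1) intervals.
    return (n_samples - 1) / duration


# (count, min time, max time) copied from three of the tables above
recordings = {
    "participant 1 - fast": (73871, 0.0, 1232.178),
    "participant 2 - slow": (55444, 34.873, 959.653),
    "participant 3 - slow": (37148, 0.006, 619.604),
}
for label, (count, t_min, t_max) in recordings.items():
    print(label, estimate_sampling_rate(count, t_min, t_max))
```

Under these assumptions each recording works out to roughly 60 samples per second.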


In [3]:
# Load the pre-computed per-participant summary statistics of the slow-game movement data
df = pd.read_csv('C:/Users/salim/vr_user_identification/data/processed/movement_slow_stat.csv')
df.head()

Unnamed: 0,HeadPosX_mean,HeadPosX_std,HeadPosX_min,HeadPosX_25%,HeadPosX_50%,HeadPosX_75%,HeadPosX_max,HeadPosY_mean,HeadPosY_std,HeadPosY_min,...,HeadOrientationY_75%,HeadOrientationY_max,HeadOrientationZ_mean,HeadOrientationZ_std,HeadOrientationZ_min,HeadOrientationZ_25%,HeadOrientationZ_50%,HeadOrientationZ_75%,HeadOrientationZ_max,ID_
0,0.082856,0.208084,-0.616,-0.018,0.119,0.229,0.678,1.401169,0.087422,0.69,...,0.975,1.0,0.095999,0.146681,-0.278,-0.041,0.132,0.208,0.49,group1_order1_user0
1,-0.117749,0.572471,-1.321,-0.658,0.105,0.334,0.885,1.436281,0.189291,0.602,...,0.00025,1.0,-0.098536,0.178423,-0.82,-0.242,-0.13,0.041,0.333,group1_order1_user1
2,-0.318917,0.10361,-0.775,-0.385,-0.338,-0.248,0.0,1.562906,0.155403,0.0,...,-0.111,0.548,-0.030043,0.088425,-0.295,-0.095,0.0,0.034,0.277,group1_order1_user10
3,-0.155172,0.201464,-0.75,-0.277,-0.128,-0.008,0.369,1.47219,0.147104,0.643,...,0.955,1.0,0.083553,0.131324,-0.264,-0.015,0.092,0.185,0.426,group1_order1_user11
4,-0.357395,0.635892,-1.629,-0.838,-0.46,0.146,1.172,1.503843,0.094245,0.745,...,0.917,1.0,0.057808,0.190669,-0.507,-0.046,0.0345,0.192,0.49,group1_order1_user12
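The column names in this table (`HeadPosX_mean`, `HeadPosX_std`, ..., `ID_`) suggest that each row is a flattened `describe()` output for one recording. The notebook does not show how the file was built, so the following is only a sketch of one way such a row could be produced. `summarise_recording` is a hypothetical helper, and dropping the `count` row is an assumption based on the visible columns.

```python
import pandas as pd


def summarise_recording(df, participant_id):
    """Flatten df.describe() into one row with columns such as 'HeadPosX_mean'."""
    # Assumption: 'count' is dropped because no '*_count' column appears in the table.
    stats = df.describe().drop(index="count")
    row = {
        f"{col}_{stat}": stats.loc[stat, col]
        for col in stats.columns
        for stat in stats.index
    }
    row["ID_"] = participant_id
    return row


# Tiny synthetic recording, used only to show the shape of the result
demo = pd.DataFrame({"HeadPosX": [0.0, 1.0, 2.0], "HeadPosY": [1.0, 1.0, 1.0]})
summary = pd.DataFrame([summarise_recording(demo, "demo_user")])
print(summary.columns.tolist())
```

Building one such row per movement file and concatenating them would give a table with one line per participant, which is a convenient input for comparing participants.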


## 1.3. Univariate Analysis

## 1.4. Multivariate Analysis