# 1. Exploratory Data Analysis

## 1.1. Understanding the Project

The primary objective of this project is to predict and identify users' private information, including their **ID number**, **gender**, and **age**. This is achieved through the analysis of two types of data: **head and hand motion data**, and **traffic data**.

Data collection involved conducting experiments with **100 participants** who engaged in gameplay across **two sets**: **Beat Saber and Cooking Simulator** for the first set, and **Medal of Honor and Forklift Simulator** for the second set. Each set consisted of a **slow game** and a **fast game**. For the first set, Beat Saber was the fast game, and Cooking Simulator was the slow one. For the second set, Forklift Simulator was the fast game, and Medal of Honor was the slow one. Each participant played each game for multiple sessions, with a minimum of **10 minutes** per session. Participants could choose to start with either the fast or slow game based on their experience with the VR headset. The participants were divided into **two groups**, with each group situated in a separate room with similar conditions. The VR headset used for the experiments was the **Oculus Quest 2**.
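The two-set design can be written down as a small lookup table. This is convenient later, because the recordings are labeled only as fast or slow. The following is a minimal sketch: `GAME_SETS` and `game_for` are hypothetical names and are not part of the project code.

```python
# Hypothetical encoding of the design described above:
# two sets, each pairing one fast game with one slow game.
GAME_SETS = {
    1: {"fast": "Beat Saber", "slow": "Cooking Simulator"},
    2: {"fast": "Forklift Simulator", "slow": "Medal of Honor"},
}


def game_for(game_set: int, speed: str) -> str:
    """Return the game played in a given set at a given speed ('fast' or 'slow')."""
    if speed not in ("fast", "slow"):
        raise ValueError(f"speed must be 'fast' or 'slow', got {speed!r}")
    return GAME_SETS[game_set][speed]


print(game_for(1, "fast"))  # Beat Saber
print(game_for(2, "slow"))  # Medal of Honor
```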

Throughout the experiments, participants performed diverse types of movement, such as **walking** and **joystick interactions**, depending on the game being played. At the same time, traffic data was recorded by connecting the VR headset to a device capable of capturing the headset's traffic during gameplay.

Among the initially planned **100 participants**, we opted to analyze the data from only **60 participants**. This decision was prompted by technical challenges and disruptions experienced by the remaining participants. Some individuals encountered technical issues during the experiments, while others had to discontinue their participation due to VR sickness. As a result, the dataset was narrowed down to the data collected from the 60 participants who completed the experiments successfully.


## 1.2. Understanding the Games
In this project, we selected four well-known VR games: Beat Saber, Cooking Simulator, Medal of Honor, and Forklift Simulator. Together, these games capture different movement patterns.
### 1.2.1. Beat Saber
![Beat Saber](https://www.donanimhaber.com/images/images/haber/116345/src/facebook-sanal-gerceklik-sirketi-beat-games-i-satin-aldi116345_0.jpg)
Image source: https://www.donanimhaber.com/facebook-sanal-gerceklik-sirketi-beat-games-i-satin-aldi--116345

Beat Saber is a virtual reality rhythm game in which the player slashes beat cubes with lightsabers as they approach [1]. The game is about keeping rhythm: the player holds a lightsaber in each hand, one blue and one red. When the music starts, blocks marked with colored arrows glide toward the player, as in other rhythm games. The player must slash each block in the direction of its arrow while dodging incoming obstacles through body and lightsaber movements.

Success in Beat Saber depends on precise timing: successful slashes build up a combo multiplier and, in turn, a higher final score. Although the gameplay may seem simple at first, many streamed play-throughs show how quickly it can become chaotic. To accommodate a range of skill levels, each song offers multiple difficulty levels, so beginners can get started easily while experts can enjoy a real challenge.

Sources:

[1]: https://www.meta.com/en-gb/experiences/2448060205267927/

[2]: https://www.windowscentral.com/beat-saber-everything-we-know-about-vr-rhythm-game

### 1.2.2. Cooking Simulator
### 1.2.3. Medal of Honor
### 1.2.4. Forklift Simulator

## 1.3. Descriptive Statistics

In [12]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly
import plotly.express as px
from IPython.display import display

In [9]:
# Collect the paths of all files in the participants' data folders
all_files = []
data_path = '../data/raw/Raw_traffic_and_movement_data/'
for path, subdirs, files in os.walk(data_path):
    for name in files:
        all_files.append(path + '/' + name)

# Separate the traffic data paths from the movement data paths
traffic_data = [x for x in all_files if '_traffic' in x]
movement_data = [x for x in all_files if '_movement' in x]

In [10]:
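# Assign a numeric ID to each participant (one sub-folder per participant);
# sorting makes the mapping independent of the file system's listing order
participants = sorted(next(os.walk(data_path))[1])
participant_dict = {participant: idx + 1 for idx, participant in enumerate(participants)}
print(participant_dict)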

{'group1_order1_user0': 1, 'group1_order1_user1': 2, 'group1_order1_user10': 3, 'group1_order1_user11': 4, 'group1_order1_user12': 5, 'group1_order1_user13': 6, 'group1_order1_user14': 7, 'group1_order1_user2': 8, 'group1_order1_user3': 9, 'group1_order1_user4': 10, 'group1_order1_user5': 11, 'group1_order1_user6': 12, 'group1_order1_user7': 13, 'group1_order1_user8': 14, 'group1_order1_user9': 15, 'group1_order2_user0': 16, 'group1_order2_user1': 17, 'group1_order2_user10': 18, 'group1_order2_user11': 19, 'group1_order2_user12': 20, 'group1_order2_user13': 21, 'group1_order2_user14': 22, 'group1_order2_user2': 23, 'group1_order2_user3': 24, 'group1_order2_user4': 25, 'group1_order2_user5': 26, 'group1_order2_user6': 27, 'group1_order2_user7': 28, 'group1_order2_user8': 29, 'group1_order2_user9': 30, 'group2_order1_user0': 31, 'group2_order1_user1': 32, 'group2_order1_user10': 33, 'group2_order1_user11': 34, 'group2_order1_user12': 35, 'group2_order1_user13': 36, 'group2_order1_user14'
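The folder names themselves carry structure (`group<g>_order<o>_user<u>`), which can be parsed into separate columns for grouping later in the analysis. A minimal sketch, assuming every participant folder follows this pattern. The `group` field presumably corresponds to the two rooms described earlier, while the meaning of `order` is not documented here. `parse_participant_folder` and `participants_table` are hypothetical helpers.

```python
import re

import pandas as pd

# Assumes folder names like 'group1_order1_user0', as printed above;
# the meaning of 'order' is not documented in the text
FOLDER_PATTERN = re.compile(r"^group(\d+)_order(\d+)_user(\d+)$")


def parse_participant_folder(name: str) -> dict:
    """Split a participant folder name into its group, order and user numbers."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected participant folder name: {name!r}")
    group, order, user = (int(part) for part in match.groups())
    return {"folder": name, "group": group, "order": order, "user": user}


def participants_table(participant_dict: dict) -> pd.DataFrame:
    """One row per participant, indexed by the numeric ID."""
    rows = [
        {"id": pid, **parse_participant_folder(folder)}
        for folder, pid in participant_dict.items()
    ]
    return pd.DataFrame(rows).set_index("id").sort_index()


example = {"group1_order1_user0": 1, "group1_order1_user1": 2, "group2_order1_user0": 31}
print(participants_table(example))
```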

In [11]:
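# Print the statistical description of the first 10 movement files
# (intended as 5 participants, one fast and one slow recording each)
for file_path in movement_data[:10]:
    # The participant folder is the first path component below data_path;
    # files stored directly in data_path belong to no participant and are skipped
    participant = os.path.relpath(file_path, data_path).split(os.sep)[0]
    if participant not in participant_dict:
        continue
    df = pd.read_csv(file_path)
    speed = 'fast' if 'fast' in os.path.basename(file_path) else 'slow'
    print(f'Statistical description of participant {participant_dict[participant]} - {speed} movement')
    display(df.describe())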
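An alternative to keeping separate path lists is to build one tidy index of all recordings, with the participant folder, data type and game speed as columns. The sketch below uses `pathlib` and takes the participant from the path relative to `data_path`, so it does not depend on the path separator and skips files that do not sit inside a participant folder. It assumes that the data type (`_traffic` / `_movement`) and the speed (`fast`) appear in the file name, as the substring checks in the cells above suggest. `build_file_index` is a hypothetical helper, and the demo file names are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd


def build_file_index(data_path: str) -> pd.DataFrame:
    """Return one row per traffic/movement file found inside a participant folder."""
    root = Path(data_path)
    rows = []
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue
        parts = file.relative_to(root).parts
        # Files directly in data_path have no participant folder: skip them
        if len(parts) < 2:
            continue
        name = file.name.lower()  # case-insensitive matching on the file name
        if "_traffic" in name:
            kind = "traffic"
        elif "_movement" in name:
            kind = "movement"
        else:
            continue
        speed = "fast" if "fast" in name else "slow"
        rows.append({"participant": parts[0], "kind": kind, "speed": speed, "path": str(file)})
    return pd.DataFrame(rows, columns=["participant", "kind", "speed", "path"])


# Usage on a throwaway directory tree (made-up file names)
with tempfile.TemporaryDirectory() as tmp:
    for rel in [
        "group1_order1_user0/fast_movement.csv",
        "group1_order1_user0/slow_movement.csv",
        "group1_order1_user0/fast_traffic.csv",
        "stray_movement.csv",  # top-level file, ignored by build_file_index
    ]:
        target = Path(tmp) / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    index = build_file_index(tmp)

print(index[["participant", "kind", "speed"]])
```

Filtering the index (for example `index[(index["kind"] == "movement") & (index["speed"] == "slow")]`) then replaces the hand-written list comprehensions.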

In [18]:
# Load the pre-computed movement statistics for the slow game (one row per participant)
df = pd.read_csv('../data/processed/movement_slow_stat.csv')
df.head()

| Unnamed: 0 | HeadPosX_count | HeadPosX_mean | HeadPosX_std | HeadPosX_min | HeadPosX_25% | HeadPosX_50% | HeadPosX_75% | HeadPosX_max | HeadPosY_count | HeadPosY_mean | ... | HeadOrientationZ_max | time_count | time_mean | time_std | time_min | time_25% | time_50% | time_75% | time_max | IDs_ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 79913.0 | 0.082856 | 0.208084 | -0.616 | -0.018 | 0.119 | 0.229 | 0.678 | 79913.0 | 1.401169 | ... | 0.49 | 79913.0 | 666.467389 | 384.80083 | 0.0 | 333.216 | 666.45 | 999.718 | 1332.952 | group1_order1_user0 |
| 1 | 55444.0 | -0.117749 | 0.572471 | -1.321 | -0.658 | 0.105 | 0.334 | 0.885 | 55444.0 | 1.436281 | ... | 0.333 | 55444.0 | 497.222756 | 266.966421 | 34.873 | 266.02675 | 497.23 | 728.41625 | 959.653 | group1_order1_user1 |
| 2 | 37148.0 | -0.318917 | 0.10361 | -0.775 | -0.385 | -0.338 | -0.248 | 0.0 | 37148.0 | 1.562906 | ... | 0.277 | 37148.0 | 309.80459 | 178.871974 | 0.006 | 154.9 | 309.812 | 464.70725 | 619.604 | group1_order1_user10 |
| 3 | 74327.0 | -0.155172 | 0.201464 | -0.75 | -0.277 | -0.128 | -0.008 | 0.369 | 74327.0 | 1.47219 | ... | 0.426 | 74327.0 | 619.90863 | 357.908962 | 0.0 | 309.975 | 619.9 | 929.8745 | 1239.799 | group1_order1_user11 |
| 4 | 79942.0 | -0.357395 | 0.635892 | -1.629 | -0.838 | -0.46 | 0.146 | 1.172 | 79942.0 | 1.503843 | ... | 0.49 | 79942.0 | 666.747064 | 384.927392 | 0.009 | 333.39425 | 666.732 | 1000.10275 | 1333.44 | group1_order1_user12 |
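The column names of `movement_slow_stat.csv` (`HeadPosX_count`, `HeadPosX_mean`, ..., `time_max`, `IDs_`) follow the pattern of a flattened `df.describe()`, with one row per recording. The processing code is not shown in this section, so the following is only a sketch of how such a table could be produced. `describe_row`, `describe_table` and the synthetic recordings are hypothetical and stand in for the real movement files.

```python
import numpy as np
import pandas as pd


def describe_row(df: pd.DataFrame, participant: str) -> dict:
    """Flatten df.describe() into a dict keyed '<column>_<statistic>'."""
    desc = df.describe()  # index: count, mean, std, min, 25%, 50%, 75%, max
    # unstack() turns the table into a Series indexed by (column, statistic),
    # ordered column by column, like the columns of the processed file
    row = {f"{col}_{stat}": value for (col, stat), value in desc.unstack().items()}
    row["IDs_"] = participant  # participant folder name, as in the IDs_ column
    return row


def describe_table(recordings: dict) -> pd.DataFrame:
    """One row per recording, built from {participant folder: DataFrame}."""
    return pd.DataFrame([describe_row(df, pid) for pid, df in recordings.items()])


# Synthetic recordings standing in for real movement files
rng = np.random.default_rng(0)
recordings = {
    "group1_order1_user0": pd.DataFrame(
        {"HeadPosX": rng.normal(size=100), "time": np.arange(100) / 60}
    ),
    "group1_order1_user1": pd.DataFrame(
        {"HeadPosX": rng.normal(size=50), "time": np.arange(50) / 60}
    ),
}
table = describe_table(recordings)
print(table.columns.tolist())
```

Writing such a table with `to_csv(..., index=False)`, or reading it back with `pd.read_csv(..., index_col=0)`, avoids the extra `Unnamed: 0` column seen in the loaded file.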


## 1.4. Univariate Analysis

## 1.5. Multivariate Analysis