In [2]:
# importing libraries here to better keep track of them

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [3]:
# Reading csv files into memory

injury_history_raw = pd.read_csv("data/injury_history(injury_history).csv", sep = ",", encoding = 'ISO-8859-1')
muscle_imbalance_raw = pd.read_csv("data/injury_history(muscle_imbalance_data).csv", sep = ",", encoding = 'ISO-8859-1')
player_sessions_raw = pd.read_csv("data/injury_history(player_sessions).csv", sep = ",", encoding = 'ISO-8859-1')

In [4]:
print(f"Raw Injury Data: \n {injury_history_raw.head()} \n")
print(f"Raw Muscle Imbalance Data: \n {muscle_imbalance_raw.head()} \n")
print(f"Raw Player Session Data: \n {player_sessions_raw.head()} \n")

Raw Injury Data: 
    Player.ID             Name  Group.Id    Injury Type   Body Part   Side  \
0        101  Jordan Matthews       201  Muscle Strain  Quadriceps  Right   
1        101  Jordan Matthews       201     Tendonitis       Wrist   Left   
2        101  Jordan Matthews       201     Tendonitis    Shoulder  Right   
3        103   Malik Robinson       203         Strain       Groin  Right   
4        103   Malik Robinson       203       Fracture       Wrist   Left   

  Injury Date Severity  Recovery Time (days)  \
0   12/5/2023  Grade 2                    51   
1  10/25/2023      NaN                    11   
2   7/22/2023      NaN                    12   
3   6/28/2023  Grade 1                    20   
4   2/14/2023      NaN                    68   

                                    Additional Notes  
0  Grade 2 quadriceps strain with partial tearing...  
1  De Quervain's tenosynovitis. Swelling and pain...  
2  Rotator cuff tendonitis due to overuse. Anti-i...  
3  Grade 

In [5]:
print(f'Columns present in injury data: \n {injury_history_raw.columns} \n')
print(f'Columns present in muscle imbalance data: \n {muscle_imbalance_raw.columns} \n')
print(f'Columns present in player session data: \n {player_sessions_raw.columns} \n')

Columns present in injury data: 
 Index(['Player.ID', 'Name', 'Group.Id', 'Injury Type', 'Body Part', 'Side',
       'Injury Date', 'Severity', 'Recovery Time (days)', 'Additional Notes'],
      dtype='object') 

Columns present in muscle imbalance data: 
 Index(['Player.ID', 'Session ID', 'Player Name', 'Date Recorded',
       'Hamstring To Quad Ratio', 'Quad Imbalance Percent',
       'HamstringImbalance Percent', 'Calf Imbalance Percent',
       'Groin Imbalance Percent'],
      dtype='object') 

Columns present in player session data: 
 Index(['Name', 'Player.ID', 'Group.Id', 'Group.name', 'League.ID',
       'Session.ID', 'Session_Date', 'Position', 'Distance..mi.',
       'Distance...min..mi.', 'Duration..s.', 'Steps', 'Speed....of.max......',
       'Speed..max....mph.', 'Speed..?ò...mph.', 'Time..s.',
       'Accumulated.Acceleration.Load', 'Anaerobic.Activity..distance...mi.',
       'Jump.Load..J.', 'Heart.Rate..?ò...bpm.', 'Heart.Rate..min....bpm.',
       'Heart.Rate..max..

## Player Session Data column research

#### TRIMP - A way to measure Training Impulse (Aerobic Training Load). It is a product of Training volume (minutes) and Training intensity (beats per minutes). 

TRIMP = Training VOlume * Training Intensity. For example if a workout is 50 minutes and the average heart rate (bpm) is 140 bpm, TRIMP score is calculated out to 7000. TRIMP is affected by a number of factors but we can look for correlation on a correlation matrix as well. 
    - Physical Settings
    - Resting and Maximal Heart Rate
    - Gender (since its mbb there's not going to be any variance with this feature)

TRIMP can be used to compare sessions of different lengths, or to compare high-intensity sessions with longer game data. For example if a session with TRIMP score of 108 might be considered moderate, the TRIMP/min score could indicate an intense session if Training Volume is lower. Training Stress Score (TSS) is a proprietary variant based on a score of 100 for a 1-hour maximum sustained effort. 
Positive correlation between Heart Rate and Effort

---

In [11]:
print(f'Size of injury history data: \n {muscle_imbalance_raw.shape}\n')
print(f'Size of muscle imbalance data: \n {muscle_imbalance_raw.shape}\n')
print(f'Size of player session data: \n {muscle_imbalance_raw.shape}\n')

Size of injury history data: 
 (182, 9)

Size of muscle imbalance data: 
 (182, 9)

Size of player session data: 
 (182, 9)



## Null Values

In [14]:
print(f'Null Values in injury history data: \n {injury_history_raw.isnull().sum()}')
print(f'Null Values in muscle imbalance data: \n {muscle_imbalance_raw.isnull().sum()}')
print(f'Null Values in player session data: \n {player_sessions_raw.isnull().sum()}')

Null Values in injury history data: 
 Player.ID                0
Name                     0
Group.Id                 0
Injury Type              0
Body Part                0
Side                     5
Injury Date              0
Severity                10
Recovery Time (days)     0
Additional Notes         0
dtype: int64
Null Values in muscle imbalance data: 
 Player.ID                     0
Session ID                    0
Player Name                   0
Date Recorded                 0
Hamstring To Quad Ratio       0
Quad Imbalance Percent        0
HamstringImbalance Percent    0
Calf Imbalance Percent        0
Groin Imbalance Percent       0
dtype: int64
Null Values in player session data: 
 Name                                  0
Player.ID                             0
Group.Id                              0
Group.name                            0
League.ID                             0
Session.ID                            0
Session_Date                          0
Position            

Okay so with null values, injury history data is the only dataset with null values. There are 2 columns with null values with a total of 15 null values. Less than 10% of the data. We should delete the data since we are working with health data instead of imputing the values. I think it will introduce bias through assumption. Our train test split is going to be pretty weak since we only will have 167 instances. 