# Aim

The aim of this code is to clean the injury feature, whose information is spread across three columns: InjJoint, InjDefn and SpecInjury.
- first: We will look at the NaN values in the InjDefn column. Based on the other columns, most of them correspond to runners with no injury.
- then: We will clean the NaN values in the InjJoint column, based on InjDefn.
- then: We will merge the different spellings of no injury ('No injury', 'No Injury', 'No injury,No injury') and other inconsistently coded values.
- then: We will create injury groups.

In [1]:
import pandas as pd
import numpy as np

In [None]:
# Load the run metadata
data = pd.read_csv('../data/meta/run_data_meta.csv')
# Restore pandas' default limit on the number of displayed rows
pd.reset_option('display.max_rows')


## Cleaning missing InjDefn and InjJoint values

In [3]:
# Select only the columns needed for the injury cleaning:
# filename (to identify rows and merge back later), InjJoint, InjDefn and SpecInjury
inj_data = data[['filename', 'InjJoint', 'InjDefn', 'SpecInjury']]
# Keep the rows where the injured joint is missing
nulls_inj = inj_data[inj_data['InjJoint'].isnull()]
nulls_inj

|  | filename | InjJoint | InjDefn | SpecInjury |
|---|---|---|---|---|
| 33 | 20140410T152849.json |  | No injury |  |
| 34 | 20140410T153617.json |  | No injury |  |
| 36 | 20140617T103150.json |  | No injury |  |
| 48 | 20140414T094847.json |  | No injury |  |
| 50 | 20140402T132349.json |  | No injury |  |
| ... | ... | ... | ... | ... |
| 1824 | 20150310T120115.json |  | No injury |  |
| 1825 | 20150310T121326.json |  | No injury |  |
| 1826 | 20150312T142834.json |  | No injury |  |
| 1827 | 20150312T143944.json |  | No injury |  |
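Before filling anything, it can help to see how InjDefn and SpecInjury relate in the rows where InjJoint is missing. A minimal sketch using `pd.crosstab` on a small, made-up DataFrame (the toy rows are illustrative, not taken from `run_data_meta.csv`); on the real data the same call would be applied to `nulls_inj`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with the three injury columns (illustrative only)
toy = pd.DataFrame({
    "InjJoint": [np.nan, np.nan, np.nan, "Knee"],
    "InjDefn": ["No injury", np.nan, "No injury", "Continuing to train in pain"],
    "SpecInjury": [np.nan, "No injury", np.nan, "fill in specifics below"],
})

# Keep only rows where the joint is missing
missing_joint = toy[toy["InjJoint"].isna()]

# Cross-tabulate InjDefn against SpecInjury; fillna turns NaN into its own category
table = pd.crosstab(
    missing_joint["InjDefn"].fillna("<NaN>"),
    missing_joint["SpecInjury"].fillna("<NaN>"),
)
print(table)
```

Without `fillna`, `crosstab` would silently drop the NaN rows, which are exactly the ones of interest here.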


In [4]:
nulls_inj['InjDefn'].isnull().sum()

72

In [5]:
nulls_inj['InjJoint'].isnull().sum()

234
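Because `nulls_inj` was built by keeping only the rows where InjJoint is null, 234 is simply the number of rows in `nulls_inj`; 72 of those rows also lack InjDefn. Calling `isna().sum()` on the whole frame reports every column's count in one call. A sketch on hypothetical toy data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy subset (illustrative values only)
toy = pd.DataFrame({
    "filename": ["a.json", "b.json", "c.json"],
    "InjJoint": [np.nan, np.nan, "Knee"],
    "InjDefn": ["No injury", np.nan, "2 workouts missed in a row"],
})

# Keep rows with a missing joint, then count NaNs in every column in one call
nulls = toy[toy["InjJoint"].isna()]
missing_per_column = nulls.isna().sum()
print(missing_per_column)

# By construction, the InjJoint count equals the number of filtered rows
assert missing_per_column["InjJoint"] == len(nulls)
```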

In [6]:
nulls_inj[nulls_inj['InjDefn'].isnull()]

|  | filename | InjJoint | InjDefn | SpecInjury |
|---|---|---|---|---|
| 1092 | 20120522T121758.json |  |  | No injury |
| 1093 | 20120524T105624.json |  |  | No injury |
| 1094 | 20120528T103155.json |  |  | No injury |
| 1095 | 20120605T115027.json |  |  | No injury |
| 1096 | 20120605T124837.json |  |  | No injury |
| ... | ... | ... | ... | ... |
| 1331 | 20140609T123854.json |  |  | No injury |
| 1332 | 20140602T105322.json |  |  | No injury |
| 1333 | 20140606T121315.json |  |  | No injury |
| 1334 | 20140610T115941.json |  |  | No injury |
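The next cell assumes that every row with a missing InjDefn has 'No injury' in SpecInjury, but the preview above shows only 10 of the 72 rows, and the fill is applied to the whole of `data`. A quick way to check the assumption over all rows, sketched here on toy data (on the real data, replace `toy` with `data`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows; on the real data, use the full `data` frame instead
toy = pd.DataFrame({
    "InjDefn": [np.nan, np.nan, "No injury", "Continuing to train in pain"],
    "SpecInjury": ["No injury", "No injury", np.nan, "fill in specifics below"],
})

# Rows whose InjDefn is missing
defn_missing = toy["InjDefn"].isna()

# True only if every such row has 'No injury' in SpecInjury
all_no_injury = bool((toy.loc[defn_missing, "SpecInjury"] == "No injury").all())
print(all_no_injury)

# Rows that would break the assumption (a NaN SpecInjury also counts as an exception)
exceptions = toy[defn_missing & (toy["SpecInjury"] != "No injury")]
print(exceptions)
```

Printing `exceptions` rather than only the boolean makes any offending rows easy to inspect.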


In [23]:
# Replace the missing InjDefn values with 'No injury', because all of these rows
# have 'No injury' in SpecInjury
cleaned_nulls_inj_df = data.copy()
# Assign the result back instead of calling fillna(..., inplace=True) on a column:
# that is chained assignment, which stops updating the DataFrame in pandas 3.0
cleaned_nulls_inj_df['InjDefn'] = cleaned_nulls_inj_df['InjDefn'].fillna('No injury')


In [24]:
cleaned_nulls_inj_df['InjDefn'].isnull().sum() #verification

0
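pandas offers two ways to fill a single column without a chained `inplace=True` call: assign the filled Series back to the column, or pass a `{column: value}` dict to `DataFrame.fillna`. A small sketch on hypothetical data showing that both give the same result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows (illustrative only)
toy = pd.DataFrame({
    "InjDefn": [np.nan, "No injury", "Training volume/intensity affected"],
    "SpecInjury": ["No injury", np.nan, np.nan],
})

# Option 1: assign the filled column back
filled_a = toy.copy()
filled_a["InjDefn"] = filled_a["InjDefn"].fillna("No injury")

# Option 2: pass a {column: value} dict to DataFrame.fillna;
# columns not in the dict (here SpecInjury) are left untouched
filled_b = toy.fillna({"InjDefn": "No injury"})

print(filled_b)
```

The dict form is convenient when several columns need different fill values in one step.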

Next, we clean InjJoint using the information in InjDefn.

In [25]:
cleaned_nulls_inj_df[cleaned_nulls_inj_df['InjDefn'] == 'No injury']

|  | sub_id | datestring | filename | speed_r | age | Height | Weight | Gender | DominantLeg | InjDefn | ... | SpecInjury2 | Activities | Level | YrsRunning | RaceDistance | RaceTimeHrs | RaceTimeMins | RaceTimeSecs | YrPR | NumRaces |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 100560 | 2012-07-17 10:37:48 | 20120717T103748.json | 2.657365 | 33 | 179.3 | 83.0 | Female | Right | No injury | ... |  | Yoga | Recreational |  | Casual Runner (no times) |  |  |  |  |  |
| 4 | 101481 | 2012-07-17 10:50:21 | 20120717T105021.json | 2.625088 | 32 | 176.3 | 58.6 | Female |  | No injury | ... |  |  |  |  |  |  |  |  |  |  |
| 8 | 100658 | 2012-11-22 14:03:16 | 20121122T140316.json | 2.434180 | 22 | 172.0 | 69.0 | Female | Right | No injury | ... |  | running | Recreational | 7.0 | Half Marathon |  |  |  |  |  |
| 10 | 100727 | 2013-04-10 10:54:46 | 20130410T105446.json | 2.724679 | 22 | 170.0 | 63.0 | Female | Left | No injury | ... |  | Running | Recreational | 8.0 | 10k |  | 40 | 00 |  | 4.0 |
| 11 | 100767 | 2013-06-06 13:46:51 | 20130606T134651.json | 2.988546 | 33 | 180.0 | 69.0 | Male | Left | No injury | ... |  | running | Competitive | 10.0 | 10k |  | 38 |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1824 | 200985 | 2015-03-10 12:01:15 | 20150310T120115.json | 2.768780 | 40 | 176.0 | 66.8 | Female | Right | No injury | ... |  | Running, biking, general fitness training | Recreational | 25.0 | Half Marathon | HH | MM | SS |  | 1.0 |
| 1825 | 200985 | 2015-03-10 12:13:26 | 20150310T121326.json | 3.359120 | 40 | 176.0 | 66.8 | Female | Right | No injury | ... |  | Running, biking, general fitness training | Recreational | 25.0 | Half Marathon | HH | MM | SS |  | 1.0 |
| 1826 | 200986 | 2015-03-12 14:28:34 | 20150312T142834.json | 2.858234 | 20 | 174.0 | 56.8 | Female | Right | No injury | ... |  | Track | Competitive | 8.0 | Other distance | HH | 5 | 30 | 2013.0 | 0.0 |
| 1827 | 200986 | 2015-03-12 14:39:44 | 20150312T143944.json | 4.876998 | 20 | 174.0 | 56.8 | Female | Right | No injury | ... |  | Track | Competitive | 8.0 | Other distance | HH | 5 | 30 | 2013.0 | 0.0 |


In [37]:
cleaned_nulls_total = cleaned_nulls_inj_df.copy()

# Rows where the runner reports no injury...
condition_injdefn_no_injury = cleaned_nulls_total['InjDefn'] == 'No injury'
# ...and the joint is either missing or the catch-all 'Other'
condition_injjoint_nan_or_other = (cleaned_nulls_total['InjJoint'].isnull()
                                   | (cleaned_nulls_total['InjJoint'] == 'Other'))

rows_to_edit = condition_injdefn_no_injury & condition_injjoint_nan_or_other

# For those rows, the joint label should also read 'No injury'
cleaned_nulls_total.loc[rows_to_edit, 'InjJoint'] = 'No injury'
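The `.loc` assignment above can also be expressed with `Series.mask`, which returns a new Series with values replaced wherever the condition is True. A sketch on made-up rows, using the same two conditions, that checks both approaches agree:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows covering each case of the two conditions
toy = pd.DataFrame({
    "InjDefn": ["No injury", "No injury", "No injury", "2 workouts missed in a row"],
    "InjJoint": [np.nan, "Other", "Knee", np.nan],
})

# Same two conditions as in the notebook cell
no_injury = toy["InjDefn"] == "No injury"
joint_nan_or_other = toy["InjJoint"].isna() | (toy["InjJoint"] == "Other")
rows_to_edit = no_injury & joint_nan_or_other

# mask() replaces values where the condition is True and keeps the rest
via_mask = toy["InjJoint"].mask(rows_to_edit, "No injury")

# Reference result with .loc, as in the notebook
via_loc = toy["InjJoint"].copy()
via_loc.loc[rows_to_edit] = "No injury"

print(pd.DataFrame({"mask": via_mask, "loc": via_loc}))
```

Note that the last toy row keeps its NaN joint: its InjDefn is not 'No injury', so neither approach touches it.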


In [38]:
cleaned_nulls_total['InjJoint']


0                      Knee
1              Lumbar Spine
2                Hip/Pelvis
3                 No Injury
4       No injury,No injury
               ...         
1827              No injury
1828             Hip/Pelvis
1829           Lumbar Spine
1830              No injury
1831                   Foot
Name: InjJoint, Length: 1832, dtype: object

In [39]:
cleaned_nulls_total['InjJoint'].isnull().sum()

0

## Cleaning the InjJoint feature

In [40]:
cleaned_nulls_total['InjJoint'].value_counts()

InjJoint
Knee                   348
No injury              238
No Injury              228
No injury,No injury    189
Lower Leg              185
Thigh                  181
Foot                   141
Hip/Pelvis             136
Ankle                  107
Lumbar Spine            41
Sacroiliac Joint        22
Other                   16
Name: count, dtype: int64
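The counts show three spellings of the same label ('No injury', 'No Injury' and the duplicated 'No injury,No injury'). The next cell maps them explicitly. A rule-based alternative, sketched below with a hypothetical helper `normalize_joint`, would also absorb other casings or repeated entries if they appeared. It assumes duplicates are joined with a comma, as in this dataset; a multi-joint value such as 'Knee,Foot' would be kept as is.

```python
import numpy as np
import pandas as pd

def normalize_joint(value):
    """Hypothetical helper: unify 'no injury' casing and collapse comma-joined duplicates.

    Assumes duplicates are joined with ',' (as in 'No injury,No injury').
    Missing values are returned unchanged.
    """
    if pd.isna(value):
        return value
    # Split on commas, strip whitespace and drop empty pieces
    parts = [p.strip() for p in str(value).split(",") if p.strip()]
    # Map any casing of 'no injury' to the canonical label
    parts = ["No injury" if p.lower() == "no injury" else p for p in parts]
    # Remove repeats while keeping first-seen order
    return ",".join(dict.fromkeys(parts))

joints = pd.Series(["No injury", "No Injury", "No injury,No injury", "Knee", "Knee,Foot", np.nan])
normalized = joints.map(normalize_joint)
print(normalized.value_counts(dropna=False))
```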

In [41]:
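# Merge the spellings of 'No injury' and group adjacent body regions.
# Note: DataFrame.replace applies this mapping to every column, not only InjJoint.
cleaned_nulls_total.replace({'No Injury': 'No injury',
                             'No injury,No injury': 'No injury',
                             'Foot': 'Foot/Ankle',
                             'Ankle': 'Foot/Ankle',
                             'Lumbar Spine': 'Hip/Pelvis',
                             'Sacroiliac Joint': 'Hip/Pelvis'}, inplace=True)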
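Applying the same mapping to the counts printed by `value_counts()` previews the regrouped distribution without touching the data. A sketch that copies those counts by hand and relabels them with `Series.rename` before summing per group:

```python
import pandas as pd

# Counts copied from the value_counts() output above (before regrouping)
counts = pd.Series({
    "Knee": 348, "No injury": 238, "No Injury": 228, "No injury,No injury": 189,
    "Lower Leg": 185, "Thigh": 181, "Foot": 141, "Hip/Pelvis": 136,
    "Ankle": 107, "Lumbar Spine": 41, "Sacroiliac Joint": 22, "Other": 16,
})

# Same mapping as the replace() cell; labels not listed keep their name
mapping = {
    "No Injury": "No injury", "No injury,No injury": "No injury",
    "Foot": "Foot/Ankle", "Ankle": "Foot/Ankle",
    "Lumbar Spine": "Hip/Pelvis", "Sacroiliac Joint": "Hip/Pelvis",
}

# Relabel the index, then add up the counts that now share a label
grouped = counts.rename(index=mapping).groupby(level=0).sum().sort_values(ascending=False)
print(grouped)
```

With these counts the groups come out as No injury 655, Knee 348, Foot/Ankle 248, Hip/Pelvis 199, Lower Leg 185, Thigh 181 and Other 16, still 1,832 rows in total.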

In [42]:
cleaned_nulls_total[cleaned_nulls_total['InjJoint'] == 'Other'][['InjJoint', 'SpecInjury', 'InjDefn']]

|  | InjJoint | SpecInjury | InjDefn |
|---|---|---|---|
| 9 | Other | fill in specifics below | Continuing to train in pain |
| 74 | Other |  | Training volume/intensity affected |
| 78 | Other |  | Training volume/intensity affected |
| 142 | Other |  | Continuing to train in pain |
| 202 | Other |  | Training volume/intensity affected |
| 227 | Other |  | 2 workouts missed in a row |
| 289 | Other |  | Continuing to train in pain |
| 457 | Other |  | Training volume/intensity affected |
| 774 | Other | fill in specifics below | Continuing to train in pain |
| 804 | Other | fill in specifics below | Continuing to train in pain |
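In the 'Other' rows shown, SpecInjury is either empty or the placeholder 'fill in specifics below', so it carries no detail that could reassign these rows to a joint. A minimal sketch for flagging such rows, using three rows copied from the output above and assuming the placeholder text always appears exactly as shown:

```python
import numpy as np
import pandas as pd

# Three 'Other' rows copied from the output above (rows 9, 74 and 227)
toy = pd.DataFrame({
    "InjJoint": ["Other", "Other", "Other"],
    "SpecInjury": ["fill in specifics below", np.nan, np.nan],
    "InjDefn": ["Continuing to train in pain",
                "Training volume/intensity affected",
                "2 workouts missed in a row"],
})

# Assumption: this exact string is an unfilled form placeholder
PLACEHOLDER = "fill in specifics below"

# A row has no usable detail if SpecInjury is missing or only the placeholder
no_detail = toy["SpecInjury"].isna() | (toy["SpecInjury"] == PLACEHOLDER)
print(no_detail.sum())
print(toy.loc[no_detail, "InjDefn"].value_counts())
```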
