<a href="https://colab.research.google.com/github/philipp-lampert/mymandible/blob/main/mymandible.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the code for the mymandible.com project
Note: The project is still under active development.

Our dataset contains two distinct categories of missing data, each absent for a different reason.


*   `NaN` marks data that is genuinely missing. A value would have made sense here, but none was recorded.

*   `N/A` marks data that is missing because the field does not apply. For example, `fistula_date` cannot have a value if `fistula = False`.

Because pandas does not distinguish between different reasons for missingness, only `NaN` is treated as a true missing value on a technical level. `N/A` is stored as just another category (an ordinary string).

This matters when analyzing patterns of missingness and performing imputation: imputing an `N/A` value would be illogical, since no answer would make sense, whereas imputing a `NaN` value is a valid approach, since a true value exists but was not recorded.
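
The distinction above can be sketched on a small toy table. This is only an illustration: the column `fistula_side` and the mode-based imputation are assumptions made for the example, not part of the project's dataset or its imputation strategy.

```python
import numpy as np
import pandas as pd

# Toy data; "fistula_side" is a hypothetical column used only for illustration.
toy = pd.DataFrame({
    "fistula": [True, False, True, True],
    "fistula_side": ["left", "N/A", np.nan, "left"],
})

# Only NaN is counted as missing; "N/A" is an ordinary category.
missing_mask = toy["fistula_side"].isna()
n_missing = int(missing_mask.sum())

# Naive mode imputation (an assumption for this sketch): fill NaN with the
# most common applicable value and leave "N/A" untouched.
applicable = toy.loc[toy["fistula_side"] != "N/A", "fistula_side"]
most_common = applicable.mode()[0]
imputed = toy["fistula_side"].fillna(most_common)

print(n_missing)
print(imputed.tolist())
```

Because `N/A` is a plain string, `isna()` and `fillna()` skip it, so only the genuinely missing entry is counted and imputed.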

In [20]:
import numpy as np
import pandas as pd
from google.colab import files
uploaded = files.upload()


In [34]:
df = pd.read_csv("BFlapsRevised_DATA_2023-10-21_2036.csv", keep_default_na=False, na_values="NaN")  # Only the literal string "NaN" is parsed as missing
df = df.replace("", "N/A")  # REDCap leaves all non-applicable fields blank, so we give them an explicit value here
df.head()


Unnamed: 0,id,collector_name,other_collector_name,sex_female,indication,comorbidity___none,comorbidity___smoking,comorbidity___alcohol,comorbidity___copd,comorbidity___hypertension,...,nonunion_location___nan,complication_bony___none,complication_bony___fracture,complication_bony___dislocation,complication_bony___nan,days_to_fracture,days_to_dislocation,tmj_luxation,days_to_tmj_luxation,imaging_complete
0,1,philipp,,False,flap_loss,1,0,0,0,0,...,0,0,0,1,0,,210.0,False,,1
1,2,philipp,,True,malignant_tumor,0,0,0,0,0,...,0,1,0,0,0,,,False,,1
2,3,philipp,,False,osteoradionecrosis,0,0,0,0,1,...,0,1,0,0,0,,,False,,1
3,4,philipp,,True,malignant_tumor,0,1,0,0,0,...,0,1,0,0,0,,,False,,1
4,5,philipp,,False,malignant_tumor,0,1,1,0,0,...,0,0,0,0,0,,,,,1
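
To make the effect of `keep_default_na=False` and `na_values="NaN"` easier to see, here is a minimal sketch that parses a tiny in-memory CSV instead of the real export. The three rows are made up for illustration; only the column names `indication` and `imaging` are borrowed from the dataset.

```python
import io

import pandas as pd

# Made-up CSV text: "NaN" marks a truly missing value, a blank marks a
# non-applicable field (as in the REDCap export described above).
csv_text = (
    "id,indication,imaging\n"
    "1,flap_loss,opg\n"
    "2,NaN,\n"
    "3,malignant_tumor,NaN\n"
)

# keep_default_na=False stops pandas from treating blanks as missing;
# na_values="NaN" makes the literal string "NaN" the only missing marker.
demo = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values="NaN")

# Blank strings survive parsing and can now be relabeled as "N/A".
demo = demo.replace("", "N/A")

print(demo)
print(demo.isna().sum())
```

With the default settings, both the blank and the `NaN` entries would be read as missing, and the two categories could no longer be told apart.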


In [35]:
df = df.drop(columns=["predictors_complete", "outcomes_complete", "imaging_complete"])  # Remove auto-generated REDCap columns
df = df.loc[:, ~df.columns.str.startswith("days_to_")]  # Remove date-related columns

For checkbox (multiple-choice) items, REDCap does not assign missing values (`NaN`) directly to each option column. Instead, it creates an additional column ending in `___nan`. We therefore have to set every option column of a checkbox item to `NaN` in the rows where its `___nan` column equals 1.
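
The rule can be sketched on a toy frame with two checkbox items. The option `complication___infection` and all values are hypothetical. Matching on the item name plus the `___` separator is a choice made for this sketch so that `complication` does not also select the `complication_plate___...` columns.

```python
import numpy as np
import pandas as pd

# Toy frame with two checkbox items; values and the "infection" option are made up.
toy = pd.DataFrame({
    "complication___none": [1, 0, 0],
    "complication___infection": [0, 1, 0],
    "complication___nan": [0, 0, 1],
    "complication_plate___none": [1, 1, 1],
    "complication_plate___nan": [0, 0, 0],
})

for feature in ["complication", "complication_plate"]:
    # Rows where the whole checkbox item was marked as missing.
    is_missing = toy[f"{feature}___nan"] == 1
    # Include the "___" separator so similarly named items do not overlap.
    option_columns = toy.columns[toy.columns.str.startswith(f"{feature}___")]
    # Integer 0/1 columns cannot hold NaN, so cast to float first.
    toy[option_columns] = toy[option_columns].astype(float)
    toy.loc[is_missing, option_columns] = np.nan
    # The helper column is no longer needed.
    toy = toy.drop(columns=f"{feature}___nan")

print(toy)
```

In the third row, only the `complication___` options become `NaN`. The `complication_plate___none` column keeps its value because its own `___nan` flag is 0.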

In [36]:
nan_columns = df.filter(like="___nan").columns
nan_columns

Index(['comorbidity___nan', 'radiotherapy___nan', 'chemotherapy___nan',
       'urkens_classification___nan', 'venous_anastomosis_type___nan',
       'venous_anastomosis_tool___nan', 'complication___nan',
       'complication_plate___nan', 'plate_exposure_location___nan',
       'implant___nan', 'nonunion_location___nan', 'complication_bony___nan'],
      dtype='object')

In [38]:
checkbox_list = [name.split("___nan")[0] for name in nan_columns]
checkbox_list

['comorbidity',
 'radiotherapy',
 'chemotherapy',
 'urkens_classification',
 'venous_anastomosis_type',
 'venous_anastomosis_tool',
 'complication',
 'complication_plate',
 'plate_exposure_location',
 'implant',
 'nonunion_location',
 'complication_bony']

In [39]:
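for feature in checkbox_list:
  row_with_nan = df[f"{feature}___nan"] == 1
  # Match on "<feature>___" so that e.g. "complication" does not also select "complication_plate___..." columns
  columns = df.columns[df.columns.str.startswith(f"{feature}___")]
  df[columns] = df[columns].astype(float)  # Integer 0/1 columns cannot hold NaN
  df.loc[row_with_nan, columns] = np.nan
  df = df.drop(f"{feature}___nan", axis=1)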

In [41]:
df.head()

Unnamed: 0,id,collector_name,other_collector_name,sex_female,indication,comorbidity___none,comorbidity___smoking,comorbidity___alcohol,comorbidity___copd,comorbidity___hypertension,...,implant___plate_removal,implant___iliac_crest_augmentation,imaging,nonunion,nonunion_location___mandible_flap,nonunion_location___flap_flap,complication_bony___none,complication_bony___fracture,complication_bony___dislocation,tmj_luxation
0,1,philipp,,False,flap_loss,1.0,0.0,0.0,0.0,0.0,...,0.0,1.0,opg,True,1.0,0.0,0.0,0.0,1.0,False
1,2,philipp,,True,malignant_tumor,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,opg,False,0.0,0.0,1.0,0.0,0.0,False
2,3,philipp,,False,osteoradionecrosis,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,opg,False,0.0,0.0,1.0,0.0,0.0,False
3,4,philipp,,True,malignant_tumor,0.0,1.0,0.0,0.0,0.0,...,1.0,0.0,opg,False,0.0,0.0,1.0,0.0,0.0,False
4,5,philipp,,False,malignant_tumor,0.0,1.0,1.0,0.0,0.0,...,0.0,0.0,none,,0.0,0.0,0.0,0.0,0.0,
