<a href="https://colab.research.google.com/github/philipp-lampert/mymandible/blob/main/mymandible.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome to the code for the mymandible.com project
Note: The project is still under active development.

In [120]:
import numpy as np
import pandas as pd
pd.set_option("display.max_rows", None)

from google.colab import files
uploaded = files.upload()


Saving BFlapsRevised_DATA_2023-10-21_2036.csv to BFlapsRevised_DATA_2023-10-21_2036 (1).csv


Our dataset contains two distinct kinds of missing data, each missing for a different reason.


*   `NaN` marks data that is missing for various reasons although a valid value theoretically exists.

*   `-9999` marks data that is missing because the variable as a whole is not relevant to the specific patient. For example, it does not make sense to ask how many bone segments were used to reconstruct the mandible if the patient simply received a plate bridging the gap, without any bone transplant.

As pandas represents only one kind of missing value, only `NaN` is a true missing value on a technical level. `-9999` therefore serves as a numeric placeholder so that it can also be used in numeric columns.

This is important when analyzing patterns of missingness and performing imputation: while it makes sense to try to impute `NaN` values, doing the same for `-9999` values would be illogical because no value would make sense.
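
The distinction can be illustrated with a small sketch. The toy dataframe, its values, and the median imputation are made up for illustration and are not taken from the project data; the sketch also assumes that `-9999` has already been written into the non-applicable cells.

```python
import numpy as np
import pandas as pd

NOT_APPLICABLE = -9999  # numeric placeholder described above

# Hypothetical toy column, not part of the real dataset
toy = pd.DataFrame(
    {"flap_segment_count": [3, np.nan, NOT_APPLICABLE, 2, NOT_APPLICABLE]}
)

# True missing values: a valid value exists but was not recorded
truly_missing = toy["flap_segment_count"].isna()
# Non-applicable values: the question does not apply to this patient
not_applicable = toy["flap_segment_count"] == NOT_APPLICABLE

# Impute only the truly missing cells, using the median of the
# observed and applicable values; placeholders are left untouched
observed = toy.loc[~truly_missing & ~not_applicable, "flap_segment_count"]
imputed = toy["flap_segment_count"].mask(truly_missing, observed.median())
```

Keeping the two masks separate means that summary statistics and imputation can ignore the placeholder rather than treating `-9999` as a real measurement.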

In [121]:
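# na_filter=False keeps blank fields as empty strings instead of converting them to NaN on import
df = pd.read_csv("BFlapsRevised_DATA_2023-10-21_2036.csv", na_filter = False)
# Treat the literal string "NaN" and blank fields as missing values
df = df.replace(["NaN", ""], pd.NA)
#df = df.replace("", "-9999") # RedCap leaves all non-applicable fields blank so we have to give them a value here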

For checkbox (multiple-choice) items, RedCap does not assign missing values (`NaN`) directly to each option column but instead creates an additional column whose name always ends in `___nan`. Thus, we have to set every option column of a checkbox group to `NaN` in each row where `___nan == 1`.

In [122]:
nan_columns = df.filter(like = "___nan").columns
checkbox_groups = [name.split("___nan")[0] for name in nan_columns]

In [123]:
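for name in checkbox_groups:
    row_with_nan = df[f"{name}___nan"] == 1
    # Match the full "<name>___" prefix so that columns of similarly named variables are not touched
    columns = df.columns[df.columns.str.startswith(f"{name}___")]
    df.loc[row_with_nan, columns] = pd.NA
    df = df.drop(f"{name}___nan", axis=1)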
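
The effect of this step is easier to see on a tiny example. The following is a minimal sketch on a made-up export with a single checkbox group; the dataframe, its values, and the cast to `Int64` are assumptions for illustration, not part of the notebook's pipeline.

```python
import pandas as pd

# Hypothetical export with one checkbox group "radiotherapy"
toy = pd.DataFrame(
    {
        "radiotherapy___none": [1, 0, 0],
        "radiotherapy___pre_surgery": [0, 1, 0],
        "radiotherapy___nan": [0, 0, 1],  # third patient: answer unknown
        "age": [61, 58, 70],
    }
)

group = "radiotherapy"
unknown = toy[f"{group}___nan"] == 1

# All option columns of the group, including the ___nan helper column
options = [c for c in toy.columns if c.startswith(f"{group}___")]

# Cast to a nullable integer dtype first so the columns can hold pd.NA
toy = toy.astype({c: "Int64" for c in options})

# Blank out the whole group for patients whose answer is unknown
toy.loc[unknown, options] = pd.NA

# The helper column has served its purpose
toy = toy.drop(columns=f"{group}___nan")
```

After this, the third row is missing in both option columns, while unrelated columns such as `age` are unchanged.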


With missing values now correctly represented in our dataframe, let's remove the auto-generated RedCap columns that are only relevant during data collection.

In [124]:
df = df.drop(["id", "predictors_complete", "outcomes_complete", "imaging_complete"], axis = 1)
df.head()

Unnamed: 0,collector_name,other_collector_name,sex_female,indication,comorbidity___none,comorbidity___smoking,comorbidity___alcohol,comorbidity___copd,comorbidity___hypertension,comorbidity___type_1_diabetes,...,days_to_nonunion,nonunion_location___mandible_flap,nonunion_location___flap_flap,complication_bony___none,complication_bony___fracture,complication_bony___dislocation,days_to_fracture,days_to_dislocation,tmj_luxation,days_to_tmj_luxation
0,philipp,,False,flap_loss,1.0,0.0,0.0,0.0,0.0,0.0,...,210.0,1.0,0.0,0.0,0.0,1.0,,210.0,False,
1,philipp,,True,malignant_tumor,0.0,0.0,0.0,0.0,0.0,0.0,...,,0.0,0.0,1.0,0.0,0.0,,,False,
2,philipp,,False,osteoradionecrosis,0.0,0.0,0.0,0.0,1.0,0.0,...,,0.0,0.0,1.0,0.0,0.0,,,False,
3,philipp,,True,malignant_tumor,0.0,1.0,0.0,0.0,0.0,0.0,...,,0.0,0.0,1.0,0.0,0.0,,,False,
4,philipp,,False,malignant_tumor,0.0,1.0,1.0,0.0,0.0,0.0,...,,0.0,0.0,0.0,0.0,0.0,,,,


Now, we will define the type of each column (boolean, integer, categorical etc.).

In [125]:
for column in df.columns:
    if "___" in column:
        df[column] = df[column].astype('boolean')

In [126]:
boolean_columns = ['sex_female', 'skin_transplanted', 'flap_loss', 'nonunion', 'tmj_luxation']

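for column in boolean_columns:
    # Map the strings 'True'/'False' to real booleans; any other value (e.g. <NA> or an actual bool) is kept as-is
    df[column] = np.where(df[column] == 'True', True, np.where(df[column] == 'False', False, df[column]))
    df[column] = df[column].astype('boolean')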

In [127]:
df = df.astype(
    {
       'indication' : 'category',
       'info_other_comorbidity' : 'string',
       'which_autoimmune_disease' : 'string',
       'which_bleeding_disorder' : 'string',
       'prior_flap' : 'category',
       'age_surgery_years' : 'UInt8',
       'flap_donor_site' : 'category',
       'flap_revision' : 'category',
       'days_to_flap_revision' : 'UInt16',
       'plate_type' : 'category',
       'long_plate_thickness' : 'category',
       'tmj_replacement_type' : 'category',
       'flap_segment_count' : 'category',
       'surgery_duration_min' : 'UInt16',
       'height_cm' : 'UInt8',
       'weight_kg' : 'UInt8',
       'bmi' : 'Float32',
       'flap_loss_type' : 'category',
       'days_to_flap_loss' : 'Int16',
       'days_to_whd_recipient_site' : 'Int16',
       'days_to_whd_donor_site' : 'Int16',
       'days_to_abscess' : 'Int16',
       'days_to_fistula' : 'Int16',
       'days_to_vestibuloplasty' : 'Int16',
       'days_to_osteoradionecrosis' : 'Int16',
       'days_to_bone_exposure' : 'Int16',
       'days_to_plate_exposure' : 'Int16',
       'days_to_plate_removal' : 'Int16',
       'days_to_plate_fracture' : 'Int16',
       'days_to_plate_loosening' : 'Int16',
       'days_to_implant_received' : 'Int16',
       'days_to_implant_planned' : 'Int16',
       'days_to_implant_plate_removal' : 'Int16',
       'days_to_iliac_crest_augmentation' : 'Int16',
       'days_to_follow_up' : 'Int16',
       'imaging' : 'category',
       'days_to_imaging' : 'Int16',
       'nonunion' : 'boolean',
       'days_to_nonunion' : 'Int16',
       'days_to_fracture' : 'Int16',
       'days_to_dislocation' : 'Int16',
       'days_to_tmj_luxation' : 'Int16'
    }
)
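
The capitalized dtype names used above (`'UInt8'`, `'Int16'`, `'Float32'`) and `'boolean'` are pandas' nullable extension types, which can hold `<NA>` alongside regular values. The sketch below, using made-up values, shows why this matters for an integer column with gaps.

```python
import pandas as pd

# Hypothetical values; None becomes NaN, so the series starts out as float64
days = pd.Series([210, None, 35])

# Nullable integer dtype: whole numbers are kept, the gap becomes <NA>
nullable = days.astype("Int16")

# NumPy's plain integer dtype has no missing-value marker, so the cast fails
try:
    days.astype("int16")
    numpy_cast_failed = False
except (ValueError, TypeError):
    numpy_cast_failed = True
```

With the nullable dtypes, columns such as `days_to_nonunion` can stay integer-valued even though most patients have no entry.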

To make the data easier to work with, let's divide our dataframe into predictor and outcome variables.

In [128]:
predictor_variables = ['sex_female',
       'indication', 'comorbidity___none', 'comorbidity___smoking',
       'comorbidity___alcohol', 'comorbidity___copd',
       'comorbidity___hypertension', 'comorbidity___type_1_diabetes',
       'comorbidity___type_2_diabetes', 'comorbidity___atherosclerosis',
       'comorbidity___coronary_heart_disease',
       'comorbidity___peripheral_artery_disease',
       'comorbidity___hyperlipoproteinemia',
       'comorbidity___hypercholesterolemia', 'comorbidity___osteoporosis',
       'comorbidity___hypothyroidism', 'comorbidity___hyperthyroidism',
       'comorbidity___chronic_kidney_disease',
       'comorbidity___factor_v_deficiency', 'comorbidity___cachexia',
       'comorbidity___bleeding_disorder',
       'comorbidity___autoimmune_disease', 'comorbidity___other',
       'info_other_comorbidity', 'which_autoimmune_disease',
       'which_bleeding_disorder', 'prior_flap', 'age_surgery_years',
       'flap_donor_site', 'flap_revision', 'days_to_flap_revision',
       'radiotherapy___none', 'radiotherapy___pre_surgery',
       'radiotherapy___post_surgery', 'chemotherapy___none',
       'chemotherapy___pre_surgery', 'chemotherapy___post_surgery',
       'plate_type', 'long_plate_thickness', 'urkens_classification___c',
       'urkens_classification___r', 'urkens_classification___b',
       'urkens_classification___s', 'tmj_replacement_type',
       'flap_segment_count', 'surgery_duration_min', 'height_cm',
       'weight_kg', 'bmi', 'skin_transplanted',
       'venous_anastomosis_type___end_end',
       'venous_anastomosis_type___end_side',
       'venous_anastomosis_tool___coupler',
       'venous_anastomosis_tool___suture']

predictors_df = df[predictor_variables]
outcomes_df = df.drop(predictor_variables, axis = 1)
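
A quick sanity check can confirm that such a split is a clean partition. This is a minimal sketch on a made-up miniature dataframe; the `toy_*` names are hypothetical stand-ins for `df`, `predictor_variables`, `predictors_df` and `outcomes_df`.

```python
import pandas as pd

# Hypothetical miniature dataframe standing in for `df`
toy = pd.DataFrame(
    {
        "sex_female": [True],
        "bmi": [23.1],
        "flap_loss": [False],
        "nonunion": [False],
    }
)
toy_predictors = ["sex_female", "bmi"]

# Fail early with a readable message if a listed predictor is absent
missing = [c for c in toy_predictors if c not in toy.columns]
assert not missing, f"Unknown predictor columns: {missing}"

toy_predictors_df = toy[toy_predictors]
toy_outcomes_df = toy.drop(columns=toy_predictors)

# Every column ends up in exactly one of the two frames
assert set(toy_predictors_df.columns).isdisjoint(toy_outcomes_df.columns)
assert len(toy_predictors_df.columns) + len(toy_outcomes_df.columns) == len(toy.columns)
```

Because the outcome frame is defined as "everything that is not a predictor", any column missing from the predictor list ends up among the outcomes. In the dataframe shown earlier, this includes `collector_name` and `other_collector_name`.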

Let's now take another look at the `predictors_df`.

In [129]:
predictors_df.head()

Unnamed: 0,sex_female,indication,comorbidity___none,comorbidity___smoking,comorbidity___alcohol,comorbidity___copd,comorbidity___hypertension,comorbidity___type_1_diabetes,comorbidity___type_2_diabetes,comorbidity___atherosclerosis,...,flap_segment_count,surgery_duration_min,height_cm,weight_kg,bmi,skin_transplanted,venous_anastomosis_type___end_end,venous_anastomosis_type___end_side,venous_anastomosis_tool___coupler,venous_anastomosis_tool___suture
0,False,flap_loss,True,False,False,False,False,False,False,False,...,three,441,184,92,27.173914,,False,True,False,True
1,True,malignant_tumor,False,False,False,False,False,False,False,False,...,one,430,160,51,19.921875,False,False,True,False,True
2,False,osteoradionecrosis,False,False,False,False,True,False,False,False,...,two,478,188,77,21.785875,True,True,False,False,True
3,True,malignant_tumor,False,True,False,False,False,False,False,False,...,three,474,175,61,19.918367,True,False,True,False,True
4,False,malignant_tumor,False,True,True,False,False,False,False,False,...,three,536,174,70,23.120625,True,False,True,False,True
