# Crush Rig Predictive Models
* __Classifier for Trauma Score__
* __Regressor for serosa thickness delta__

Written by Matt MacDonald for CIGITI at the Hospital for Sick Children Toronto
***

All tools for manipulating the data come from the crush_read.py and crush_plot.py files. The objective of this notebook is to predict the histological targets from the force/position crush data: a classifier (logistic regression or otherwise) for the trauma score and a regressor for the serosa thickness change.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.rcParams['figure.dpi'] = 150

In [None]:
from pdb import set_trace
from warnings import warn

The crush data must be collected using the crush rig and crush.py and stored in the expected folder structure at the root directory indicated by PATH.

In [None]:
from crush_read import *
from crush_plot import *
PATH

Load all data and modify as needed.

In [None]:
study = study_outline(PATH)
targets = study_targets(PATH)
crushes = study_data(study)

In [None]:
crushes = modify(crushes)
crushes = calculate(crushes)

In [None]:
c = random(crushes)
time_plot(c, trim=False)
time_plot(c)

Prepare data for model training and confirm no NaN issues.

In [None]:
X, y, legend = preprocess(crushes, targets)
X.shape

In [None]:
print('Reference for categorical features:')
legend

In [None]:
X.isna().sum()

Remove the holding strain feature since only the STOP protocol is being considered.

In [None]:
X = X.drop('Holding Strain', axis=1)
X.columns

Crush duration is heavily correlated with thickness because thickness is calculated from the position above the crush platform at the time of contact.

In [None]:
X[['Thickness (mm)', 'Crush Duration (s)']].corr()

Generate matrix of correlations to aid understanding.

In [None]:
y

Focus on the two targets being investigated in the analysis: 
* serosa thickness change significance
* pathologist trauma score rating

In [None]:
key_targets = y[['P Score', 'Trauma Score']].copy()
key_targets = key_targets.rename(columns={'P Score': 'Serosa Change Significance'})
key_targets.head()

In [None]:
pd.concat([X, key_targets], axis=1).corr()

In [None]:
import seaborn as sns

sns.heatmap(pd.concat([X, key_targets], axis=1).corr(), center=0, vmin=-1, vmax=1, cmap='RdBu');

Interesting correlations:
- target stress is strongly correlated with trauma score and p score as expected
- target stress is not correlated with the absolute thickness change, likely because of serosal variability in patients
- strain and stiffness metrics are correlated with the thickness and tissue type of the sample
- trauma score and p score correlate, suggesting a relationship between the pathologist opinion and serosal thickness

Let's look at the trauma score and serosa thickness metric correlations more closely.

In [None]:
key_targets.corr()

In [None]:
sns.heatmap(key_targets.corr(), center=0, vmin=-1, vmax=1, cmap='RdBu');

In [None]:
key_targets.sort_values('Serosa Change Significance').plot(x='Serosa Change Significance', y='Trauma Score', kind='scatter', alpha=0.5);
plt.yticks([0, 1, 2]);

Look for anomalies in the data.

In [None]:
X['Crush Duration (s)'].plot(style='.');

In [None]:
long = X['Crush Duration (s)'] > 35
long.sum()

In [None]:
time_plot(crushes[long], trim=False)

In [None]:
X[long]

Contact stiffness is low for this sample because the initial contact time was triggered incorrectly: the crush rig detected contact too early, a false positive. The calculated thickness is also erroneously high. Most other figures are okay, however, so it is not a major issue. This would have to be fixed in the source data if desired.

In [None]:
X[X['Thickness (mm)'] > 10]

Prepare binary classification targets.

In [None]:
y = binary_classes(y)

In [None]:
classifier_targets = y.columns[4:]
for col in classifier_targets:
    most_common = y[col].value_counts().idxmax()
    s = (y[col] == most_common).sum()
    c = y[col].count()
    r = s / c
    print(f"{col}\n    - baseline accuracy = {s}/{c} ({r:.2%})")

In [None]:
X.describe()

In [None]:
y.describe()

In [None]:
(y['Tissue Damage'] == 0).sum()

In [None]:
y['Tissue Damage'].sum()

In [None]:
y['Major Tissue Damage'].sum()

The major tissue damage target is unbalanced. With so few positive samples, there may not be enough data to train an accurate classifier.
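
One common mitigation, which the grid search later in this notebook also considers through `class_weight='balanced'`, is to weight each class inversely to its frequency. The sketch below reproduces the weighting formula documented by scikit-learn, `n_samples / (n_classes * class_count)`, in plain Python. The label counts are invented for illustration and are not the study data.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical labels: 16 negatives and 4 positives (not the study data)
example_labels = [0] * 16 + [1] * 4
weights = balanced_class_weights(example_labels)
print(weights)
```

The rare class receives the larger weight, so each of its samples contributes more to the training loss.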

In [None]:
plt.figure()
trauma_score = y['Tissue Damage'].copy()
trauma_score[y['Major Tissue Damage']] = 2
serosa_delta = y['Percent Serosa Change']
plt.scatter(serosa_delta, trauma_score, color='indigo')
plt.ylabel('Tissue Trauma Score')
plt.xlabel('Percent Serosa Change')
plt.yticks([0, 1, 2]);

In [None]:
plt.figure(figsize=(5, 1))
plt.scatter(100 * y['Percent Serosa Change'], y['Tissue Damage'], color='indigo', alpha=0.25, s=100)
plt.xlabel('Serosa Thickness Change (%)')
plt.yticks([0, 1], ['No Trauma', 'Trauma'])
plt.ylim([-0.5, 1.5]);

Let's see colon vs small bowel.

In [None]:
col_mask = X['Tissue'] == False
sb_mask = X['Tissue'] == True

In [None]:
plt.figure(figsize=(5, 1))
plt.scatter(100 * y[col_mask]['Percent Serosa Change'], y[col_mask]['Tissue Damage'], color='indigo', alpha=0.25, s=100)
plt.xlabel('Serosa Thickness Change (%)')
plt.yticks([0, 1], ['No Trauma', 'Trauma'])
plt.ylim([-0.5, 1.5])
plt.title('Colon');

In [None]:
plt.figure(figsize=(5, 1))
plt.scatter(100 * y[sb_mask]['Percent Serosa Change'], y[sb_mask]['Tissue Damage'], color='indigo', alpha=0.25, s=100)
plt.xlabel('Serosa Thickness Change (%)')
plt.yticks([0, 1], ['No Trauma', 'Tissue Trauma'])
plt.ylim([-0.5, 1.5])
plt.title('Small Bowel');

In [None]:
plt.figure()
s = 0.25
m = y.shape[0]
y1 = y['Tissue Damage']
y2 = y['Significant Serosa Change']
rx = np.random.rand(m) * s - (s / 2)
ry = np.random.rand(m) * s - (s / 2)
plt.scatter(x=y1 + rx, y=y2 + ry, color='seagreen', alpha=0.25, s=100)
plt.xticks([0, 1], ['No Trauma', 'Trauma'])
plt.yticks([0, 1], ['Not Significant', 'Significant'])
plt.xlim([-0.5, 1.5])
plt.ylim([-0.5, 1.5])
plt.title('Serosa Thickness Change vs. Trauma Score')

cnts = [sum([x != y for x, y in zip(y1, y2) if x == 0]),
        sum([x == y for x, y in zip(y1, y2) if x == 1]),
        sum([x == y for x, y in zip(y1, y2) if x == 0]),
        sum([x != y for x, y in zip(y1, y2) if x == 1])]

plt.text(-0.08, 1.25, f"n={cnts[0]} ({100 * cnts[0] / m:.0f}%)", size=10)
plt.text(0.92, 1.25, f"n={cnts[1]} ({100 * cnts[1] / m:.0f}%)", size=10)
plt.text(-0.08, 0.25, f"n={cnts[2]} ({100 * cnts[2] / m:.0f}%)", size=10)
plt.text(0.92, 0.25, f"n={cnts[3]} ({100 * cnts[3] / m:.0f}%)", size=10)

print('Top left N = {} / {}'.format(cnts[0], m))
print('Top right N = {} / {}'.format(cnts[1], m))
print('Bottom left N = {} / {}'.format(cnts[2], m))
print('Bottom right N = {} / {}'.format(cnts[3], m))
print('Agreement = {} / {}'.format(sum(cnts[1:3]), m))

Visualize the key variable, target stress. The corresponding loads in grams are listed below for reference.

In [None]:
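def gram_to_megapascal(load):
    # Load in grams -> stress in MPa (N/mm^2) under a 5 mm diameter pin
    return (9.81 * load) / (1000 * np.pi * (5 / 2) ** 2)

def megapascal_to_gram(stress):
    # Stress in MPa (N/mm^2) -> equivalent load in grams under a 5 mm diameter pin
    return (1000 * np.pi * (5 / 2) ** 2) * stress / 9.81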

for load in [200, 400, 600, 800, 1000, 1200]:  # test loads in grams
    print(f"{gram_to_megapascal(load):6.2f} (MPa) = {load:5} (grams)")
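
The conversion above divides the weight of the load by the contact area of the pin. With $m$ the load in grams, $g = 9.81\ \mathrm{m/s^2}$, and $d = 5\ \mathrm{mm}$ the pin diameter noted in the code comment, the force is in newtons and the area in square millimetres, so the result is in MPa:

$$
\sigma = \frac{F}{A} = \frac{(m / 1000)\, g}{\pi (d / 2)^2}
$$

For example, $m = 1000$ g gives $\sigma = 9.81 / (\pi \cdot 2.5^2) \approx 0.50$ MPa.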

In [None]:
x_name = 'Target Stress (MPa)'
for y_name in y.columns:
    plt.figure()
    plt.scatter(x=X[x_name], y=y[y_name], alpha=0.25, s=100)
    plt.xlabel(x_name)
    plt.ylabel(y_name)

The goal for the prediction algorithm is to provide a metric for preventing tissue damage intraoperatively. Thus it has the following requirements:

1. Good overall accuracy so it is reliable without being restrictive
2. High recall such that it is conservative, limiting the occurrence of false negatives
3. Simple with limited input so that it can be implemented cheaply in real time

Further to requirement 3 above, no histology features can be used to make the prediction.
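
Requirement 2 can be made concrete with a small sketch. Recall is the fraction of truly damaging crushes that the model flags, so every false negative lowers it directly. The labels below are invented for illustration only, and the helper name is hypothetical.

```python
def recall_and_precision(y_true, y_pred):
    """Return (recall, precision) for the positive class labelled 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision

# Hypothetical outcomes: 1 = tissue damage, 0 = no damage
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
recall, precision = recall_and_precision(y_true, y_pred)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```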

In [None]:
def get_freq(crush):
    time = crush.index
    delta = time[1:] - time[:-1]
    return 1 / np.mean(delta.total_seconds())

freqs = crushes['Data'].apply(get_freq)
freqs.mean()

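Sample frequency is approximately 31 Hz.

Nyquist frequency is half the sample frequency: 31 / 2 = 15.5 Hz.

Cutoff frequency of the 3rd order Butterworth digital filter, taking the 0.2 cutoff as normalized to the Nyquist frequency, is 0.2 * 15.5 = 3.1 Hz.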
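
The relationship between the sample rate, the Nyquist frequency, and the filter cutoff can be sketched as follows. This assumes the cutoff is specified as a fraction of the Nyquist frequency, which is the convention `scipy.signal.butter` uses when no `fs` argument is given. The function name is hypothetical.

```python
def cutoff_hz(sample_rate_hz, normalized_cutoff):
    """Convert a Nyquist-normalized cutoff (0 to 1) into hertz."""
    nyquist_hz = sample_rate_hz / 2  # Nyquist frequency is half the sample rate
    return normalized_cutoff * nyquist_hz

sample_rate_hz = 31  # approximate mean sample rate of the crush data
print(f"Nyquist = {sample_rate_hz / 2} Hz, "
      f"cutoff = {cutoff_hz(sample_rate_hz, 0.2):.1f} Hz")
```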

# Data Prep
Define the targets and set the random seed for the model training.

In [None]:
SEED = 42
np.random.seed(SEED)  # call the function; assigning to it would overwrite it

In [None]:
y.columns

In [None]:
targets = ['Percent Serosa Change',  # regression
           'Significant Serosa Change',
           'Tissue Damage',
           'Major Tissue Damage']

class_labels = {'Percent Serosa Change': None,
                'Significant Serosa Change': ['No Change', 'Significant Change'],
                'Tissue Damage': ['No Damage', 'Damage'],
                'Major Tissue Damage': ['No Damage or Minor Damage', 'Major Damage']}

In [None]:
for i, targ in enumerate(targets):
    print(i, targ)

Split the data.

In [None]:
from sklearn.model_selection import train_test_split

X_np = X.values.astype(np.float64)
y_np = y[targets].values

X_train, X_test, y_train, y_test = train_test_split(X_np, y_np, test_size=0.2, random_state=SEED)

In [None]:
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)

Standardizing the features makes the final model harder to apply and interpret manually, so it is not advisable in this case. The same applies to PCA, which would make feature importances hard to determine.

In [None]:
pd.DataFrame(X_train).boxplot();

In [None]:
features = X.columns.values
for i, feat in enumerate(features):
    print(i, feat)

In [None]:
stress_feat = 6
features[stress_feat]

In [None]:
duration_feat = 5
features[duration_feat]

In [None]:
thickness_feat = 3
features[thickness_feat]

In [None]:
strain_feat = 7
features[strain_feat]

In [None]:
tissue_feat = 0
features[tissue_feat]

# Model Prep
Create functions needed for fitting predictive models to the data.

For classification we will use logistic regression due to its long-standing success as a binary classification model for medical data. For regression we will rely on lasso regression to take advantage of its feature selection behaviour, given the number of low-correlation features in the dataset. Both are interpretable and simple enough to be practical for hand calculations or lookup tables in the OR.
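
To illustrate what such a hand calculation could look like: a single-feature logistic regression reduces to one multiply, one add, and a sigmoid, and its 0.5-probability threshold is simply `-intercept / coefficient`. The coefficient and intercept below are placeholder values chosen for illustration, not fitted results from this study, and the function names are hypothetical.

```python
import math

def damage_probability(stress_mpa, coef, intercept):
    """Logistic model: p = 1 / (1 + exp(-(coef * stress + intercept)))."""
    return 1.0 / (1.0 + math.exp(-(coef * stress_mpa + intercept)))

def stress_threshold(coef, intercept):
    """Stress at which the predicted probability crosses 0.5."""
    return -intercept / coef

# Placeholder parameters (hypothetical, not taken from a fitted model)
coef, intercept = 8.0, -2.0
limit = stress_threshold(coef, intercept)
p_at_limit = damage_probability(limit, coef, intercept)
print(f"threshold = {limit:.3f} MPa, p at threshold = {p_at_limit:.2f}")
```

A lookup table for the OR could then be produced by evaluating `damage_probability` over a handful of stress values.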

In [None]:
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression, LassoCV
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score, roc_auc_score, mean_squared_error
from sklearn.metrics import confusion_matrix, classification_report, roc_curve

In [None]:
def build_lasso(X, y):
    # Fit regressor with an L1 penalty, choosing alpha by 5-fold cross validation
    model = LassoCV(n_alphas=10, cv=5, random_state=SEED)
    return model.fit(X, y)
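
The feature selection behaviour of lasso comes from the L1 penalty, which can set coefficients exactly to zero. As a simplified illustration (exact only for an orthonormal design, which this dataset is not assumed to be), the lasso solution is the least-squares coefficient passed through a soft-thresholding operator. The function name and the example values are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values with |z| <= lam become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

# Hypothetical least-squares coefficients for three features
ols_coefs = [0.50, -0.30, 0.05]
lasso_coefs = [soft_threshold(c, lam=0.1) for c in ols_coefs]
print(lasso_coefs)
```

The weakest coefficient is removed entirely while the stronger ones are only shrunk, which is why low-correlation features tend to drop out of the fitted model.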

In [None]:
def build_logreg(X, y):

    # Model
    model = LogisticRegression(multi_class='auto', random_state=SEED, solver='liblinear', max_iter=5000)

    # Hyperparameter tuning
    param_grid = [{}]
    param_grid[0]["penalty"] = ['l1', 'l2']
    param_grid[0]["C"] = [0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30]
    param_grid[0]["class_weight"] = [None, 'balanced']

    # Perform grid search
    clf = GridSearchCV(estimator=model, cv=5, refit=True,
                       param_grid=param_grid, verbose=1, scoring='balanced_accuracy')
    clf.fit(X, y)
    
    return clf

In [None]:
def plot_fit(X, y, model, target):
    # Plot predicted and actual values against the most influential feature
    best_feat = np.argmax(np.abs(model.coef_))
    plt.figure()
    plt.plot(X[:, best_feat], model.predict(X), 'k.')
    plt.scatter(X[:, best_feat], y, alpha=0.25, s=100)
    plt.xlabel(features[stress_feat])  # assumed to be best feature
    plt.ylabel(target)
    plt.legend(['Predicted', 'Actual'])
    plt.show()

def assess_lasso(X, y, model, target):
    # RMSE and fit curve for regressors
    plot_fit(X, y, model, target)

    return {'RMSE': np.sqrt(mean_squared_error(y, model.predict(X)))}

In [None]:
def plot_auc(X, y, model, target):
    plt.figure()
    scores = model.predict_proba(X)[:, 1]  # positive-class probabilities
    plt.plot(*roc_curve(y, scores)[:2],
             label=f"ROC curve (AUC = {roc_auc_score(y, scores):.2f})")
    plt.plot([0, 1], [0, 1], 'k--')
    plt.xlim([-0.05, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend(loc='lower right')
    plt.title(target)
    plt.show()

def assess_logreg(X, y, model, target):       
    
    # Classification metrics and AUC for classifiers
    plot_auc(X, y, model, target)
    report = classification_report(y, model.predict(X),
                                   digits=3,
                                   output_dict=True)

    return report['1.0']  # positive class only
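
Because the area under the ROC curve is a ranking metric, it is normally computed from continuous scores such as `predict_proba` output rather than from hard class predictions. A minimal pure-Python sketch of the pairwise definition follows. The helper name and the toy scores are illustrative only.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example with invented scores
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```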

### Recursive Feature Elimination (unused, delete?)

In [None]:
from sklearn.feature_selection import RFE, RFECV

def optimal_reg(X, y, target, n_features=None):
    # Restrict the features used to get a simpler model
    
    # Lasso and MSE for regression
    if class_labels[target] is None:
        model = lasso_reg(X, y)
        if n_features is not None:
            rfe = RFE(model, n_features_to_select=n_features)
        else:
            rfe = RFECV(model, cv=5, scoring='neg_mean_squared_error', n_jobs=4)
    
    # Logistic regression and accuracy for classification
    else:
        model = logit_reg(X, y)
        if n_features is not None:
            rfe = RFE(model, n_features_to_select=n_features)
        else:
            rfe = RFECV(model, cv=5, scoring='accuracy', n_jobs=4)
    
    return rfe.fit(X, y)

In [None]:
def build(X, y, model, n_features=None):
    '''
    Convenience function to build multiple models
    Remove any features deemed to be irrelevant by recursive feature elimination
    '''
    if n_features is None:
        n_features = X.shape[1]
    rfe = restricted_reg(X, y, model, n_features)
    
    # Rank the features
    rank = pd.DataFrame({'features': features,
                         'ranking': rfe.ranking_})
    rank = rank.sort_values(by='ranking')
    selected = rank.features[:n_features].tolist()
    
    # Train the model once more on just the selected features
    mask = rfe.support_
    model = logit_reg(X[:, mask], y)
    
    return {'model': model,
            'rank': rank,
            'features': selected,
            'n_features': n_features}

# Models

Build a linear regression model for the continuous target and logistic regression models for the class targets.

### Percent Change in Serosa Thickness
Regression model with Lasso.

In [None]:
idx = 0
targets[idx]

In [None]:
model = build_lasso(X_train, y_train[:, idx])

In [None]:
assess_lasso(X_train, y_train[:, idx], model, targets[idx])

In [None]:
assess_lasso(X_test, y_test[:, idx], model, targets[idx])

In [None]:
feat_importances = pd.DataFrame(np.vstack([features, np.abs(model.coef_)]).T, columns= ['Feature', 'Coefficient'])
feat_importances = feat_importances.sort_values('Coefficient', ascending=True).set_index('Feature')
feat_importances

In [None]:
feat_importances.plot.barh();

How does it perform with just the target stress to work with?

In [None]:
X_train_single = X_train[:, stress_feat].reshape(-1, 1)
X_test_single = X_test[:, stress_feat].reshape(-1, 1)
model = build_lasso(X_train_single, y_train[:, idx])

In [None]:
assess_lasso(X_train_single, y_train[:, idx], model, targets[idx])

In [None]:
assess_lasso(X_test_single, y_test[:, idx], model, targets[idx])

### Significant Change in Serosa Thickness

In [None]:
idx = 1
targets[idx]

In [None]:
clf = build_logreg(X_train, y_train[:, idx])

In [None]:
clf.best_params_

In [None]:
assess_logreg(X_train, y_train[:, idx], clf, targets[idx])

In [None]:
assess_logreg(X_test, y_test[:, idx], clf, targets[idx])

In [None]:
feat_importances = pd.DataFrame(np.vstack([features, np.abs(clf.best_estimator_.coef_)]).T, columns= ['Feature', 'Coefficient'])
feat_importances = feat_importances.sort_values('Coefficient', ascending=True).set_index('Feature')
feat_importances

In [None]:
feat_importances.plot.barh();

How does it perform with just the target stress to work with?

In [None]:
X_train_single = X_train[:, stress_feat].reshape(-1, 1)
X_test_single = X_test[:, stress_feat].reshape(-1, 1)
clf = build_logreg(X_train_single, y_train[:, idx])

In [None]:
assess_logreg(X_train_single, y_train[:, idx], clf, targets[idx])

In [None]:
assess_logreg(X_test_single, y_test[:, idx], clf, targets[idx])

There is no measurable difference if duration is included along with target stress.

In [None]:
clf = build_logreg(X_train[:, [stress_feat, duration_feat]], y_train[:, idx])
assess_logreg(X_test[:, [stress_feat, duration_feat]], y_test[:, idx], clf, targets[idx])

Similarly, including the thickness makes no measurable difference.

In [None]:
clf = build_logreg(X_train[:, [stress_feat, duration_feat, thickness_feat]], y_train[:, idx])
assess_logreg(X_test[:, [stress_feat, duration_feat, thickness_feat]], y_test[:, idx], clf, targets[idx])

Review the model parameters for the target stress only model.

In [None]:
clf = build_logreg(X_train_single, y_train[:, idx])
model = clf.best_estimator_
print('Model coefficients:')
print(model.coef_)
print('Model features:')
print(features[stress_feat])
print('Model intercept:')
print(model.intercept_)
print('Model parameters:')
print(clf.best_params_)

Where does the model predict significant serosa thickness change then?

In [None]:
stress = np.linspace(0, 1, 1000).reshape(-1, 1)
pred = model.predict(stress)
prob = model.predict_proba(stress)

In [None]:
print(stress[np.argmax(pred)] * 1000, 'kPa limit')

In [None]:
print(megapascal_to_gram(stress[np.argmax(pred)]), 'gram limit')

In [None]:
plt.scatter(stress, pred)
plt.scatter(stress, prob[:, 1]);

### Tissue Damage

In [None]:
idx = 2
targets[idx]

In [None]:
clf = build_logreg(X_train, y_train[:, idx])

In [None]:
clf.best_params_

In [None]:
assess_logreg(X_train, y_train[:, idx], clf, targets[idx])

In [None]:
assess_logreg(X_test, y_test[:, idx], clf, targets[idx])

In [None]:
feat_importances = pd.DataFrame(np.vstack([features, np.abs(clf.best_estimator_.coef_)]).T, columns= ['Feature', 'Coefficient'])
feat_importances = feat_importances.sort_values('Coefficient', ascending=True).set_index('Feature')
feat_importances

In [None]:
feat_importances.plot.barh();

How does it perform with just the target stress to work with?

In [None]:
X_train_single = X_train[:, stress_feat].reshape(-1, 1)
X_test_single = X_test[:, stress_feat].reshape(-1, 1)
clf = build_logreg(X_train_single, y_train[:, idx])

In [None]:
assess_logreg(X_train_single, y_train[:, idx], clf, targets[idx])

In [None]:
assess_logreg(X_test_single, y_test[:, idx], clf, targets[idx])

There is a measurable difference if strain is included along with target stress.

In [None]:
clf = build_logreg(X_train[:, [stress_feat, strain_feat]], y_train[:, idx])
assess_logreg(X_test[:, [stress_feat, strain_feat]], y_test[:, idx], clf, targets[idx])

There is a slight degradation in performance if the tissue type is included as well.

In [None]:
clf = build_logreg(X_train[:, [stress_feat, strain_feat, tissue_feat]], y_train[:, idx])
assess_logreg(X_test[:, [stress_feat, strain_feat, tissue_feat]], y_test[:, idx], clf, targets[idx])

Review the model parameters for the target stress only model.

In [None]:
clf = build_logreg(X_train_single, y_train[:, idx])
model = clf.best_estimator_
print('Model coefficients:')
print(model.coef_)
print('Model features:')
print(features[stress_feat])
print('Model intercept:')
print(model.intercept_)
print('Model parameters:')
print(clf.best_params_)

Where does the model predict tissue damage then?

In [None]:
stress = np.linspace(0, 1, 1000).reshape(-1, 1)
pred = model.predict(stress)
prob = model.predict_proba(stress)

In [None]:
print(stress[np.argmax(pred)] * 1000, 'kPa limit')

In [None]:
print(megapascal_to_gram(stress[np.argmax(pred)]), 'gram limit')

In [None]:
plt.scatter(stress, pred)
plt.scatter(stress, prob[:, 1]);

### Major Tissue Damage

Select a specific indicator from the targets and split the dataset.

In [None]:
idx = 3
targets[idx]

In [None]:
clf = build_logreg(X_train, y_train[:, idx])

In [None]:
y_train[:, idx].sum()

Only 4 positive examples for major damage! As feared, that is far too few to build a usable model.

## Voting Classifier
Combining the two single models may make a more powerful model.

In [None]:
from sklearn.ensemble import VotingClassifier

serosa_model = build_logreg(X_train_single, y_train[:, 1]).best_estimator_
trauma_model = build_logreg(X_train_single, y_train[:, 2]).best_estimator_

In [None]:
# Use the trauma score rating as the gold standard target
voting_clf = VotingClassifier([('Serosa', serosa_model), ('Trauma', trauma_model)], voting='soft')
voting_clf.fit(X_train_single, y_train[:, 2])

In [None]:
assess_logreg(X_train_single, y_train[:, 2], voting_clf, targets[2])

In [None]:
assess_logreg(X_test_single, y_test[:, 2], voting_clf, targets[2])

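Unfortunately, voting with a single feature does not improve the outcome. This may be partly because `VotingClassifier.fit` refits clones of both member models on the trauma target, so the two members differ only in their tuned hyperparameters. The same is seen if all features are included.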

In [None]:
serosa_model = build_logreg(X_train, y_train[:, 1]).best_estimator_
trauma_model = build_logreg(X_train, y_train[:, 2]).best_estimator_
voting_clf = VotingClassifier([('Serosa', serosa_model), ('Trauma', trauma_model)], voting='soft')
voting_clf.fit(X_train, y_train[:, 2])
assess_logreg(X_test, y_test[:, 2], voting_clf, targets[2])
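
For reference, soft voting averages the class probabilities of the member models and predicts the class with the highest mean. A minimal sketch for the binary case, using invented probabilities (the function name is hypothetical):

```python
def soft_vote(prob_a, prob_b):
    """Average two lists of positive-class probabilities.

    Returns the mean probabilities and the resulting labels, where the
    positive class wins only if its mean probability exceeds 0.5.
    """
    mean_prob = [(a + b) / 2 for a, b in zip(prob_a, prob_b)]
    labels = [1 if p > 0.5 else 0 for p in mean_prob]
    return mean_prob, labels

# Invented positive-class probabilities from two models for three samples
serosa_prob = [0.20, 0.60, 0.90]
trauma_prob = [0.30, 0.30, 0.70]
mean_prob, labels = soft_vote(serosa_prob, trauma_prob)
print(mean_prob, labels)
```

When both members give similar probabilities, the average barely moves the decision, which is consistent with voting offering little gain here.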

Look at disagreements between models.