# Transfer learning with deep convolutional neural network for liver steatosis assessment in ultrasound images
Reproduce Results of [Transfer learning with deep convolutional neural network for liver steatosis assessment in ultrasound images](https://pubmed.ncbi.nlm.nih.gov/30094778/)


> We used a pre-trained CNN to extract features based on B-mode images. Next, using the neural features, we employed the support vector machine (SVM) algorithm to classify images containing fatty liver. Aside of fatty liver classification, it is clinically relevant to quantify the grade of liver steatosis. For this task, we used the extracted features and the Lasso regression method. In both cases, liver biopsy results served as a reference. The performance of the pro- posed approach was compared with the GLCM methods.





In [1]:
import sys
sys.path.append('../src')

from utils.reduce import reduce_pca
from utils.split import train_test_split

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.model_selection import GroupKFold
from sklearn.metrics import accuracy_score

from itertools import product
import pickle
import pandas as pd
import numpy as np
import mlflow
import matplotlib.pyplot as plt

In [8]:
mlflow.set_experiment('scattering_svm_pca_experiment')

  and should_run_async(code)


## Feature Reduction/Selection

#### Upload Scattering Features

In [9]:
with open('../data/03_features/scattering_features.pickle', 'rb') as handle:
    scatter_dict = pickle.load(handle)
    df_scattering = scatter_dict['df']
    scattering_params = {'J':scatter_dict['J'],
                         'M':scatter_dict['M'],
                         'N':scatter_dict['N']}

#### Apply PCA

Since sklearn is used for PCA, the dataset will be transformed into a panda dataframe.

In [None]:
pca_n_components = 10
df_scattering_10 = reduce_pca(data=df_scattering, n_components=pca_n_components)

# Cross Validation using SVM Classification

> Methods that exclude outliers were used to normalize the features. Patient-specific leave-one-out cross-validation (LOOCV) was applied to evaluate the classification. In each case, the test set consisted of10 images from the same patient and the training set contained 540 images from the remaining 54 patients. For each training set, fivefold cross-validation and grid search were applied to indicate the optimal SVM classifier hyperparameters and the best kernel. To address the problem of class imbalance, the SVM hyperparameter C of each class was adjusted inversely proportional to that class frequency in the training set. Label 1 indicated the image containing a fatty liver and label −1 otherwise. 


In [5]:
df_train, df_test = train_test_split(df_scattering_10)

  and should_run_async(code)


In [6]:
# Set the parameters by cross-validation
param_gamma = [1e-3, 1e-4]
param_C = [1, 10, 100, 1000]
rbf_params = list(product(['kernel'],param_gamma, param_C))
linear_params = list(product(['linear'],param_C))
params = rbf_params + linear_params

In [7]:
standardize = True

df_train_pid = df_train.pop('id')
df_train_y = df_train.pop('class')
search_metrics = {}

for param in params:
    # Do cross-validation
    with mlflow.start_run():
        mlflow.log_param('pca_n',pca_n_components)
        mlflow.log_params(scattering_params)
        group_kfold = GroupKFold(n_splits=5)
        metrics = []
        for train_index, valid_index in group_kfold.split(df_train, 
                                                          df_train_y, 
                                                          df_train_pid):
            X_train, X_valid = df_train.iloc[train_index], df_train.iloc[valid_index]
            y_train, y_valid = df_train_y.iloc[train_index], df_train_y.iloc[valid_index]

            if standardize:
                scaler = StandardScaler()
                X_train = scaler.fit_transform(X_train)
                X_valid = scaler.transform(X_valid)
            
            mlflow.log_param('model',f'svm: {param[0]}')
            if param[0] == 'kernel': 
                mlflow.log_param('gamma',param[1])
                mlflow.log_param('C',param[2])
                model = SVC(gamma=param[1], C=param[2])
            if param[0] == 'linear': 
                model = LinearSVC(C=param[1])
                mlflow.log_param('C',param[1])
            model.fit(X_train, y_train)
            predictions = model.predict(X_valid)
            acc = accuracy_score(y_valid, predictions)

            metrics.append(acc)
        
        search_metrics[str(param)] = np.mean(metrics)
        
        mlflow.log_metric('accuracy',np.mean(metrics))
        



# Run Mlflow to see results

`!mlflow ui`

Should launch something like this:



In [None]:
# !mlflow ui

# Test Prediction

In [5]:
# !mlflow ui 
# Set a new mlflow experiment
# Use the best hyperparameters to train a model on the whole training data
# Test and record results!
mlflow.set_experiment('test_results_dataset_liver_bmodes_steatosis_assessment_IJCARS')

Best combination of hyper parameters
<img width="782" alt="Screen Shot 2020-09-29 at 8 47 03 PM" src="https://user-images.githubusercontent.com/23482039/94630966-36573680-0295-11eb-9352-18b1796b3fd4.png">

In [12]:
with open('../data/03_features/scattering_features.pickle', 'rb') as handle:
    scatter_dict = pickle.load(handle)
    df_scattering = scatter_dict['df']
    scattering_params = {'J':scatter_dict['J'],
                         'M':scatter_dict['M'],
                         'N':scatter_dict['N']}

  and should_run_async(code)


In [None]:
pca_n_components = 10
df_scattering_10 = reduce_pca(data=df_scattering, n_components=pca_n_components)
df_train, df_test = train_test_split(df_scattering_10)
standardize = True


df_train.pop('id')
df_test.pop('id')
df_train_y = df_train.pop('class')
df_test_y = df_test.pop('class')


In [None]:
pca_n_components = 10
df_scattering_10 = reduce_pca(data=df_scattering, n_components=pca_n_components)
df_train, df_test = train_test_split(df_scattering_10)

In [8]:
def get_train_test_patients_id(ids, train_sz: float=.9, seed:int=2020):
    # Get the list of patient id's and shuffle that list 
    patient_ids = ids.unique()
    np.random.seed(seed)
    np.random.shuffle(patient_ids)
    # Create two lists of patient id's for training and testing
    train_patient_cnt = int(len(patient_ids) * train_sz)
    train_id = patient_ids[0:int(train_patient_cnt)]
    test_id = patient_ids[int(train_patient_cnt): int(len(patient_ids))]
    return train_id, test_id

In [15]:
data= df_scattering
train_sz = 0.9
seed =1221

train_id, test_id = get_train_test_patients_id(data['id'])
# Separate the features from id and class columns
train_data = data[data['id'].isin(train_id)].reset_index(drop=True)
test_data = data[data['id'].isin(test_id)].reset_index(drop=True)
#TO DO: IS IT SHUFFLE?
train_data.sample(frac=1)

  and should_run_async(code)


Unnamed: 0,id,class,0,1,2,3,4,5,6,7,...,1390922,1390923,1390924,1390925,1390926,1390927,1390928,1390929,1390930,1390931
221,25,1,0.000010,-0.000004,-8.925645e-06,-0.000005,-2.201625e-06,-6.780629e-06,0.000076,0.054140,...,2.328133,5.571801,1.289984,0.635552,6.082054,5.703537,2.455366,3.204983,1.724655,1.941889
352,40,1,0.000011,-0.000002,-3.070657e-06,-0.000004,-4.009341e-06,-3.803893e-06,0.000080,0.054145,...,2.631998,2.607885,1.887473,0.699335,6.082669,5.704036,2.455938,3.204574,1.724959,1.941498
31,5,0,0.000011,0.000001,-1.291469e-06,-0.000002,-4.378272e-06,-2.550456e-06,0.000086,0.054142,...,1.634720,2.301602,1.910247,0.709768,6.082740,5.708612,2.458519,3.201056,1.728634,1.946914
292,34,1,0.000014,0.000005,4.332238e-06,0.000003,1.048949e-06,-2.928814e-06,0.000082,0.054143,...,2.721247,1.106732,0.413090,0.638495,6.082207,5.703805,2.455143,3.201264,1.726209,1.944834
208,23,1,0.000019,0.000007,5.984728e-06,0.000004,6.737261e-06,7.060763e-06,0.000088,0.054152,...,2.333182,5.574312,1.291569,0.636656,6.083462,5.707928,2.458575,3.203622,1.724986,1.942188
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
389,43,1,0.000019,0.000008,8.876254e-06,0.000004,5.266569e-06,5.355347e-06,0.000088,0.054147,...,6.033401,5.955864,1.293151,0.629484,6.072688,5.707843,2.459794,3.203723,1.724595,1.941337
203,23,1,0.000018,0.000005,2.537354e-06,-0.000004,-1.223681e-06,2.036426e-06,0.000087,0.054145,...,2.328592,5.571931,1.290058,0.635608,6.082691,5.707336,2.458105,3.203240,1.724670,1.941920
142,17,0,0.000006,-0.000007,-1.001323e-05,-0.000009,-9.256087e-06,-1.319945e-05,0.000073,0.054137,...,3.192692,1.916151,1.401622,0.682439,6.082611,5.704041,2.455608,3.203341,1.728342,1.946159
232,26,1,0.000013,-0.000001,-1.901093e-06,-0.000002,7.519861e-07,-1.609779e-06,0.000077,0.054142,...,2.721246,1.106731,0.413087,0.638494,6.082796,5.707552,2.458146,3.200641,1.722166,1.945236


In [None]:
with mlflow.start_run():
    model =  SVC(gamma= 1e-4, C=1000)
    model.fit(df_train, df_train_y)
    predictions = model.predict(df_test)
    acc = accuracy_score(df_test_y, predictions)
    mlflow.log_param('Model', 'Scattering features + PCA + SVM')
    mlflow.log_metric('accuracy', acc)


In [None]:
print('The test accuracy of the model is ', acc)

In [9]:
#id_c = df_scattering.pop('id')
#class_c = df_scattering.pop('class')

pca = PCA(n_components=10)
data = pca.fit_transform(df_scattering)

In [13]:
(pca.explained_variance_ratio_).sum()

0.6470887