# Transfer learning with deep convolutional neural network for liver steatosis assessment in ultrasound images
This notebook reproduces the results of [Transfer learning with deep convolutional neural network for liver steatosis assessment in ultrasound images](https://pubmed.ncbi.nlm.nih.gov/30094778/).


> We used a pre-trained CNN to extract features based on B-mode images. Next, using the neural features, we employed the support vector machine (SVM) algorithm to classify images containing fatty liver. Aside of fatty liver classification, it is clinically relevant to quantify the grade of liver steatosis. For this task, we used the extracted features and the Lasso regression method. In both cases, liver biopsy results served as a reference. The performance of the proposed approach was compared with the GLCM methods.





In [10]:
import sys
sys.path.append('../src')
import warnings
warnings.filterwarnings("ignore") 

from utils.reduce import reduce_pca
from utils.split import train_test_split

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import GroupKFold
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression

from itertools import product
import pickle
import pandas as pd
import numpy as np
import mlflow
import matplotlib.pyplot as plt

In [11]:
mlflow.set_experiment('scattering_pca_logistic_regression_experiment')

## Feature Reduction/Selection

#### Load Scattering Features

In [18]:
with open('../data/03_features/scattering_features.pickle', 'rb') as handle:
    scatter_dict = pickle.load(handle)
    df_scattering = scatter_dict['df']
    scattering_params = {'J':scatter_dict['J'],
                         'M':scatter_dict['M'],
                         'N':scatter_dict['N']}

#### Apply PCA

Since sklearn is used for PCA, the dataset is first converted into a pandas DataFrame.

In [19]:
pca_n_components = 20
# The "_10" suffix is only a name; the number of components is set by pca_n_components
df_scattering_10 = reduce_pca(data=df_scattering, n_components=pca_n_components)
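
`reduce_pca` is a project helper from `utils.reduce` whose source is not shown in this notebook. As a rough illustration of the kind of projection such a helper performs, here is a minimal NumPy sketch. The function name `pca_reduce` and the random demo matrix are assumptions made for this example, and the real helper may differ (for instance in how it treats the `id` and `class` columns).

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # PCA works on mean-centered features
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Hypothetical demo data: 50 samples with 8 features each
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 8))
X_reduced = pca_reduce(X_demo, n_components=3)
print(X_reduced.shape)
```

The projected columns are ordered by decreasing variance, which is why keeping only the first `n_components` retains the directions that carry the most variance.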

# Cross Validation using Logistic Regression Classification

The paper describes its evaluation protocol as quoted below. This notebook follows the same patient-wise idea but substitutes logistic regression for the SVM and uses 5-fold cross-validation grouped by patient `id` on the training split.

> Methods that exclude outliers were used to normalize the features. Patient-specific leave-one-out cross-validation (LOOCV) was applied to evaluate the classification. In each case, the test set consisted of 10 images from the same patient and the training set contained 540 images from the remaining 54 patients. For each training set, fivefold cross-validation and grid search were applied to indicate the optimal SVM classifier hyperparameters and the best kernel. To address the problem of class imbalance, the SVM hyperparameter C of each class was adjusted inversely proportional to that class frequency in the training set. Label 1 indicated the image containing a fatty liver and label −1 otherwise.
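
The two ideas in the quoted protocol, holding out all images of one patient at a time and weighting classes inversely to their frequency, can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's code: the function names and the toy data are hypothetical, and the weighting formula `n_samples / (n_classes * class_count)` is one common convention (the one scikit-learn uses for `class_weight='balanced'`).

```python
from collections import Counter


def leave_one_patient_out(patient_ids):
    """Yield (held_out_patient, train_idx, test_idx), one split per patient."""
    for held_out in sorted(set(patient_ids)):
        test_idx = [i for i, pid in enumerate(patient_ids) if pid == held_out]
        train_idx = [i for i, pid in enumerate(patient_ids) if pid != held_out]
        yield held_out, train_idx, test_idx


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    return {c: n_samples / (len(counts) * k) for c, k in counts.items()}


# Toy data (hypothetical): 3 patients with 2 images each
patient_ids = [1, 1, 2, 2, 3, 3]
labels = [1, 1, 1, 1, -1, -1]

splits = list(leave_one_patient_out(patient_ids))
weights = inverse_frequency_weights(labels)
print(len(splits), weights)
```

Because every image of the held-out patient lands in the test indices, no patient contributes to both sides of a split.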


In [20]:
df_train, df_test = train_test_split(df_scattering_10)

In [21]:
# Candidate values of C (inverse regularization strength) to compare by cross-validation
# Other LogisticRegression options, not searched here:
#param_penalty = ['l1', 'l2', 'elasticnet', None]
#param_solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
params = [1, 10, 100, 1000]

In [22]:
standardize = True

df_train_pid = df_train.pop('id')
df_train_y = df_train.pop('class')


In [None]:

search_metrics = {}

for param in params:
    # One MLflow run per candidate value of C
    with mlflow.start_run():
        mlflow.log_param('pca_n', pca_n_components)
        mlflow.log_param('C', param)
        mlflow.log_params(scattering_params)
        # Group folds by patient id so one patient's images never span train and validation
        group_kfold = GroupKFold(n_splits=5)
        metrics = []
        for train_index, valid_index in group_kfold.split(df_train,
                                                          df_train_y,
                                                          df_train_pid):
            X_train, X_valid = df_train.iloc[train_index], df_train.iloc[valid_index]
            y_train, y_valid = df_train_y.iloc[train_index], df_train_y.iloc[valid_index]

            if standardize:
                # Fit the scaler on the training fold only, then apply it to the validation fold
                scaler = StandardScaler()
                X_train = scaler.fit_transform(X_train)
                X_valid = scaler.transform(X_valid)

            model = LogisticRegression(C=param)
            model.fit(X_train, y_train)
            predictions = model.predict(X_valid)
            acc = accuracy_score(y_valid, predictions)

            metrics.append(acc)

        # Mean validation accuracy over the five folds
        mean_acc = np.mean(metrics)
        search_metrics[str(param)] = mean_acc

        mlflow.log_metric('accuracy', mean_acc)
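
Once the loop has finished, the best value of `C` can be read from a `search_metrics`-style dictionary. The sketch below shows one way to do that. The scores are placeholders invented for illustration, not results from this notebook, and the name `search_metrics_demo` is hypothetical.

```python
# Placeholder scores (invented for illustration), keyed by str(C) as in the loop
search_metrics_demo = {'1': 0.91, '10': 0.93, '100': 0.93, '1000': 0.90}

# max() returns the first key with the highest score, so ties favor the
# smaller C (stronger regularization) when keys are inserted in ascending order
best_key = max(search_metrics_demo, key=search_metrics_demo.get)
best_C = float(best_key)
print(best_C)
```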
        

# Run MLflow to see results

Running `!mlflow ui` starts the MLflow tracking UI, where the logged runs can be compared by parameter and accuracy.



In [7]:
# !mlflow ui 

INFO: 'test_results' does not exist. Creating a new experiment


# Test Prediction

In [18]:
# Set a new mlflow experiment
# Use the best hyperparameters to train a model on the whole training data
# Test and record results!
mlflow.set_experiment('test_results_dataset_liver_bmodes_steatosis_assessment_IJCARS')

Best combination of hyperparameters:
<img width="711" alt="Screen Shot 2020-09-29 at 9 47 09 PM" src="https://user-images.githubusercontent.com/23482039/94634185-57238a00-029d-11eb-83ba-ab553d65f348.png">



In [19]:
pca_n_components = 10
df_scattering_10 = reduce_pca(data=df_scattering, n_components=pca_n_components)
df_train, df_test = train_test_split(df_scattering_10)
standardize = True


df_train.pop('id')
df_test.pop('id')
df_train_y = df_train.pop('class')
df_test_y = df_test.pop('class')


In [21]:
with mlflow.start_run():
    model = LogisticRegression(C=1)
    model.fit(df_train, df_train_y)
    predictions = model.predict(df_test)
    acc = accuracy_score(df_test_y, predictions)
    mlflow.log_param('Model', 'Scattering features + PCA + Logistic Regression')
    mlflow.log_metric('accuracy', acc)
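
Note that the cross-validation loop standardizes the features, whereas the final fit above sets `standardize = True` without applying a scaler. One way to keep the two stages consistent is to wrap the scaler and the classifier in a scikit-learn pipeline. The sketch below uses synthetic stand-in data (an assumption for this example, not the liver dataset), so its accuracy says nothing about the real model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): two well-separated classes, 4 features
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(5, 1, (40, 4))])
y_train = np.array([0] * 40 + [1] * 40)
X_test = np.vstack([rng.normal(0, 1, (10, 4)), rng.normal(5, 1, (10, 4))])
y_test = np.array([0] * 10 + [1] * 10)

# The pipeline fits the scaler on the training data only and reuses the
# same scaling when predicting on the test data
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=1))
pipe.fit(X_train, y_train)
demo_acc = accuracy_score(y_test, pipe.predict(X_test))
print('demo accuracy:', demo_acc)
```

With a pipeline, the test features are scaled with the statistics learned from the training set, mirroring what the cross-validation loop does per fold.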




In [22]:
print('The test accuracy of the model is ', acc)

The test accuracy of the model is  1.0




In [26]:
df_train_y

0      0
1      0
2      0
3      0
4      0
      ..
485    1
486    1
487    1
488    1
489    1
Name: class, Length: 490, dtype: uint8