# BMI 260 ASSIGNMENT #3 | Mammogram Spring 2018

## Name 1: Joseph Nicolls

## Name 2: Alex Lu

Breast cancer has the highest incidence and second highest mortality rate for women in the US. 
Your task is to utilize machine learning to either classify AND/OR segment mammograms or neither, as long as you justify why it is useful to do whatever it is you want to do.  

## Traditional ML techniques for Classification of Mammograms

###  Background and Relevance

Mammograms are difficult to classify because the features that identify malignant tumors can be vague, and masses can appear anywhere in the breast tissue, in any orientation. Though the setup for this problem suggests the use of CNNs, we want to use traditional ML techniques in this exploratory research. The principal reason is that traditional ML techniques allow a kind of feature analysis that is much harder to do with CNNs. According to a study by Britton et al., the sensitivity of radiologists in classifying mammograms can vary between 53.1% and 74.1%. Through feature analysis, indicative features could be highlighted for radiologists to focus on, potentially raising sensitivities and decreasing the variability between radiologists. In essence, we will attempt to classify the malignancy of tumors within mammograms using traditional machine learning techniques, based on quantitative features derived from the mammogram and categorical features provided by expert analysis of those mammograms.




### Approach and Methods


The images were taken from the Digital Database for Screening Mammography (DDSM), a database which consists of 2620 mammogram studies. These images are X-rays, recorded as grey-scale images. The two classes are fairly balanced, with 52% benign mammograms and 48% malignant.

The dataset also comes with a CSV of precomputed features and a set of semantic descriptions. Previous work had already cropped these images to the region of interest (ROI) around the mass. We will employ the ROI masks to create several new features in addition to the features that were previously computed.

We have divided the data into training and testing sets at random, leaving 20% out for testing (`test_size=0.2` in the code below).
In addition, semantic features associated with some of the images have also been included. These semantic features include the type of view shown in the image, which side of the patient the breast was on, and case descriptions with a limited vocabulary describing characteristics of the mass. We believe that semantic features such as breast density, mass shape, and mass margins can be highly useful. The limited vocabulary of these qualitative features allows one-hot encoding and incorporation into our feature vectors for traditional ML methods.
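
As a minimal sketch of the one-hot idea, assuming toy records rather than the real case descriptions: the column names `mass_shape` and `mass_margins` match the semantic CSV used later, but the category values and the fourth scan ID below are hypothetical. Scans without a semantic description are given an all-zero vector.

```python
import pandas as pd

# Hypothetical toy records; values are illustrative, not taken from the dataset
semantic = pd.DataFrame(
    {
        "mass_shape": ["OVAL", "IRREGULAR", "ROUND"],
        "mass_margins": ["CIRCUMSCRIBED", "SPICULATED", "CIRCUMSCRIBED"],
    },
    index=["P_00001_LEFT_CC", "P_00004_LEFT_CC", "P_00009_RIGHT_CC"],
)

# One indicator column per (feature, category) pair
one_hot = pd.get_dummies(semantic, dtype=float)

# Align to the full list of scans; scans with no description become all zeros
all_scan_ids = list(semantic.index) + ["P_00015_LEFT_MLO"]
aligned = one_hot.reindex(all_scan_ids, fill_value=0.0)

print(aligned)
```

Each described scan ends up with exactly one active indicator per semantic feature, so the encoded vectors can be concatenated directly with the numeric features.
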
The shape statistics we compute are based on the conventional wisdom that malignant tumors typically have irregular borders and a non-spherical shape (Halls, Mammogram and Ultrasound Images Explained). To measure this, we compare the outline of the mass to its convex hull, the smallest convex set that contains the ROI. Two statistics that capture this relationship are the convexity (the ratio of the convex hull perimeter to the ROI perimeter) and the solidity (the ratio of the ROI area to the convex hull area). The lower the convexity and the solidity, the less spherical the ROI is, potentially hinting at malignant character.
In addition, we have calculated a similar metric called extent: the ratio of the area of the ROI to the area of its bounding box.
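
To make the three definitions concrete, here is a small sketch on synthetic masks (a filled square and an L-shape), assuming scikit-image is available. The helper name `shape_descriptors` is ours for illustration; it mirrors the ratios described above but is not the notebook's `get_props`.

```python
import numpy as np
from skimage.measure import regionprops
from skimage.morphology import convex_hull_image


def shape_descriptors(mask):
    """Return (convexity, solidity, extent) for a single-region binary mask."""
    mask = np.asarray(mask).astype(bool)
    region = regionprops(mask.astype(np.uint8))[0]
    hull = convex_hull_image(mask)
    hull_region = regionprops(hull.astype(np.uint8))[0]
    convexity = hull_region.perimeter / region.perimeter  # hull perimeter / ROI perimeter
    solidity = region.area / hull_region.area             # ROI area / hull area
    extent = region.extent                                # ROI area / bounding-box area
    return convexity, solidity, extent


# Toy masks: a filled square (convex) and an L-shape (non-convex)
square = np.zeros((40, 40), dtype=bool)
square[10:30, 10:30] = True

l_shape = square.copy()
l_shape[10:20, 20:30] = False  # cut one quadrant out of the square

for name, mask in [("square", square), ("L-shape", l_shape)]:
    convexity, solidity, extent = shape_descriptors(mask)
    print(f"{name}: convexity={convexity:.2f} solidity={solidity:.2f} extent={extent:.2f}")
```

The L-shape fills only three quarters of its bounding box and leaves a gap inside its convex hull, so both its extent and its solidity drop relative to the square.
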

We will first try to leverage the following features in order to make attempts at classification: 
* features in the features_matrix.csv (ASM, Area, Centroid coordinates, ...) 
* convexity, solidity, and extent. (Calculated from masked ROI)


In order to decrease the feature dimensionality, we considered principal component analysis (PCA) for feature extraction. Doing so sacrifices some variable interpretability; however, because the feature space we're operating in is relatively small, it's not overly burdensome to trace components back to the original features. In addition, identifying features that are strongly associated with each other could provide grounds for recommendations to radiologists. Once we performed PCA, however, we found that one component accounted for 44% of the variability while all other principal components accounted for ~10%. Compared with training on features with normalization and zero-centering alone, PCA did not improve model performance significantly and only contributed to overfitting.
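
The variance fractions quoted above are PCA's explained-variance ratios. As a rough illustration of how they arise and how a component can be traced back to the original features, here is a sketch on synthetic data (not the mammogram features) in which three of six features share one latent factor. All names and sizes here are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(260)

# Synthetic stand-in for a feature matrix: 200 samples, 6 features.
# The first three features share one latent factor, so they are correlated.
latent = rng.normal(size=(200, 1))
X = np.hstack([
    latent + 0.3 * rng.normal(size=(200, 3)),
    rng.normal(size=(200, 3)),
])

# Zero-center and normalize, then run PCA through the SVD
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
_, singular_values, components = np.linalg.svd(Xs, full_matrices=False)
explained_variance_ratio = singular_values**2 / np.sum(singular_values**2)

print(np.round(explained_variance_ratio, 3))  # fraction of variance per component
print(np.round(components[0], 2))             # loadings of the first component
```

The loadings of the first component are concentrated on the three correlated features, which is the kind of backward tracing the paragraph above refers to.
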


First, let's import the packages that we're going to use 

In [6]:
import h5py 
import numpy as np
import pandas as pd
import seaborn as sns
import cv2

import os 
import random
random.seed(42)

from sklearn import preprocessing

from sklearn.linear_model import Lasso
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import cross_val_score, cross_val_predict

from sklearn.metrics import roc_auc_score
from sklearn.metrics import auc
from sklearn.metrics import roc_curve
from sklearn.metrics import confusion_matrix

from sklearn.decomposition import PCA

from sklearn.model_selection import train_test_split

from skimage.morphology import convex_hull_image
from skimage import data, img_as_float
from skimage.util import invert
import matplotlib.pyplot as plt

from skimage.measure import regionprops


Now, let's import from the other code files 

In [7]:
def train_models(train, labels):
    
    svcClf = SVC(random_state = 260)
    svcClf.fit(train, labels)
    
    rfClf = RandomForestClassifier(random_state = 260)
    rfClf.fit(train, labels)
    
    return [svcClf, rfClf]


def Tuner(combined_features, groundtruth):
    '''
    Input: 
        * all features
        * ground truth labels

    Output:
        * tuning grid of (accuracy, sensitivity, specificity), one row per value of C

    Sweeps the SVM penalty C over 0.3, 0.6, ..., 6.0 (20 values).
    '''
    n_steps = 20
    grid = np.zeros((n_steps, 3))
    for i in range(n_steps):
        C = 0.3 * (i + 1)  # avoids accumulating floating-point error from repeated += .3
        grid[i] = EvaluateClassifier(combined_features, groundtruth, C)
    return grid


def EvaluateClassifier(X, Y, C):
    '''
    Input: 
        * Dataset X
        * Labels Y
        * C, the SVM penalty parameter
    Output:
        * outputs performance metrics: accuracy, sens, spec

    Evaluates a linear-kernel support vector classifier with the given penalty C
    using 10-fold cross-validation. No class weighting is applied, since the two
    classes are roughly balanced.
    '''

    svm_man = SVC(C = C, kernel='linear')

    scores = cross_val_score(svm_man, X, Y, cv=10, scoring='accuracy')
    acc = scores.mean()
    print('acc is:')
    print(acc)

    #------ Compute metrics ---------
    y_predict = cross_val_predict(svm_man, X, Y, cv=10)
    conf_mat = confusion_matrix(Y, y_predict)
    tn, fp, fn, tp = conf_mat.ravel()
    sens = float(tp)/(tp+fn)
    spec = float(tn)/(tn+fp)
    print('sens is:')
    print(sens)
    print('spec is:')
    print(spec)
    print(conf_mat)

    return (acc, sens, spec)


def validate_models(models, test, y_true):
    '''
        input: 
            * models, a list of trained models 
            * test, the test set on which models will be evaluated
            * y_true, the ground-truth labels for the test set
    '''
    print("Preparing to validate models")
    for model in models:
        y_pred = model.predict(test)
        conf_mat = confusion_matrix(y_true, y_pred)
        tn, fp, fn, tp = conf_mat.ravel()
        sens = float(tp)/(tp+fn)
        spec = float(tn)/(tn+fp)
        print('sens is:')
        print(sens)
        print('spec is:')
        print(spec)
        print(conf_mat)

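        # NOTE: y_pred holds hard 0/1 predictions, so this value equals the mean of
        # sensitivity and specificity rather than a score-based AUROC.
        print("auroc: " +  str(roc_auc_score(y_true, y_pred)))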
        
    print("Done validating models")

In [8]:
def get_props(data): 
    '''
        input:
            * data, a 2D array; pixels greater than 0 are treated as the ROI
        output:
            * convexity, solidity, extent, major axis length, minor axis length
    '''
    # binarize: every pixel above 0 becomes 255
    ret,thresh_img = cv2.threshold(data,0,255,cv2.THRESH_BINARY)
    thresh_props =  regionprops(thresh_img.astype(int))
    thresh_area = thresh_props[0].area
    thresh_perimeter = thresh_props[0].perimeter
    # convex hull of the ROI, measured the same way
    chull = convex_hull_image(thresh_img)
    props = regionprops(chull.astype(int))
    chull_area = props[0].area
    chull_perimeter = props[0].perimeter 
    convexity = chull_perimeter/thresh_perimeter
    solidity = thresh_area/chull_area
    return convexity, solidity, thresh_props[0].extent, thresh_props[0].major_axis_length, thresh_props[0].minor_axis_length
    

def reduce_dimensionality(raw_data, new_dims=3):
    '''
        input:
            * raw_data, the raw matrix that will be reduced in dimensionality 
        output:
            * the dimensionality-reduced data 
        
    '''
    print("preparing to reduce dimensionality")
    pca = PCA()
    pca.fit(raw_data)
    print(">>> variance explained by each principal component")
    print(pca.explained_variance_ratio_)  
    print(">>> the first principal component")
    print(pca.components_[0])
    reduced = pca.transform(raw_data)[:,:new_dims]
    return reduced


def mean_center_normalize(data):
    data -= np.mean(data, axis = 0)
    data /= np.std(data, axis = 0)
    return data

In [13]:
def gather_images(data_parent_dir):
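    '''
        input:
            * data_parent_dir, the directory whose subdirectories hold the .h5 image files
        output:
            * a dataframe containing image data, label, and name 
            * a dataframe of the shape features computed from each image
            * the list of scan IDs, in the order the images were read
    '''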
    print("preparing to read images")
    
    # initialize data structures 
    data = []
    label = []
    name = []
    convexities = []
    solidities = []
    extents = []
    major_axis_lengths = []
    minor_axis_lengths = []
    
    paths = os.listdir(data_parent_dir)

    scan_ids = []
    
    # iteration over paths, dirs
    for first_path in paths:
        if first_path[0] == '.': # check for random dot files that come up :( 
            continue
        local_dir = os.path.join(data_parent_dir, first_path)
        for image in os.listdir(local_dir):
            scan_ids.append("_".join((first_path, image[:-3])))
            with h5py.File(os.path.join(local_dir, image), 'r') as hf:
                data.append(np.array(hf.get('data')))
                label.append(np.array(hf.get('label')).item(0))
                name.append(np.array(hf.get('name')))
                
                # compute additional features
                convexity, solidity, extent, major_axis_length, minor_axis_length = get_props(hf.get('data')[:, :, 1])
                convexities.append(convexity)
                solidities.append(solidity)
                extents.append(extent)
                major_axis_lengths.append(major_axis_length)
                minor_axis_lengths.append(minor_axis_length)
         
    print(scan_ids[:10])
    scan_ids = ["P_" +scan_id for scan_id in scan_ids]
    
    d = {'pixel_data':data, 'label':label, 'name':name}
    
    d_computed = {'convexity': convexities, 'solidity': solidities, 'extent': extents, 'major_axis_length': major_axis_lengths, 'minor_axis_length': minor_axis_lengths}
   
    df_img = pd.DataFrame(data=d, index=scan_ids)
    df_computed = pd.DataFrame(data=d_computed, index=scan_ids)
    
    return (df_img, df_computed, scan_ids)


def drop_excess_rows(scan_ids, precomputed_df):
    '''
        Drops the rows of precomputed_df whose index is not among scan_ids.
    '''
    scan_ids = set(scan_ids)
    drop_list = [name for name in precomputed_df.index.values if name not in scan_ids]

    print("the drop list has: " + str(len(drop_list)))
    return precomputed_df.drop(drop_list)


def gather_semantic_features(semantics_path):
    semantic_df = pd.read_csv(semantics_path)
    semantic_df.dropna(inplace=True)
    return semantic_df


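def encode_categorical_labels(semantic_df, semantic_feature_names):
    '''
        input:
            * semantic_df, the dataframe of semantic descriptions
            * semantic_feature_names, the categorical columns to encode
        output:
            * semantic_df with each listed column replaced by integer codes
    '''
    for feature in semantic_feature_names:
        le = preprocessing.LabelEncoder()
        # cast to str for both fit and transform so every column encodes consistently
        semantic_df[feature] = le.fit_transform(semantic_df[feature].astype(str))
    return semantic_df


def one_hot_encoding(semantic_df, semantic_feature_names):
    '''
        input:
            * semantic_df, the label-encoded semantic dataframe
            * semantic_feature_names, the columns to one-hot encode
        output:
            * the one-hot matrix and its number of columns
    '''
    # NOTE: recent scikit-learn releases rename `sparse` to `sparse_output`
    enc = preprocessing.OneHotEncoder(sparse=False)
    semantic_one_hots = enc.fit_transform(semantic_df[semantic_feature_names])
    _, one_hot_length = semantic_one_hots.shape
    return (semantic_one_hots, one_hot_length)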


def generate_semantic_df(semantics_path, semantic_feature_names, total_patientIDs):
    '''
        Here, we find that there are semantic descriptions of images that do not
        appear in the h5'd dataset, and also images that appear in this dataset 
        without corresponding semantic descriptions. Images without a description
        keep an all-zero encoding.
    '''

    semantic_df = gather_semantic_features(semantics_path)
    semantic_df = encode_categorical_labels(semantic_df, semantic_feature_names)
    semantic_one_hots, one_hot_length = one_hot_encoding(semantic_df, semantic_feature_names)

    # build "<patient_id>_<side>_<view>" keys that match the scan IDs
    has_semantic = [s1 + "_" + s2 + "_" + s3 for (s1, s2, s3) in zip(list(semantic_df['patient_id']), list(semantic_df['side']), list(semantic_df['view']))]

    # start every image with an all-zero vector
    semantic_encoded_dict = {img: np.zeros(one_hot_length) for img in total_patientIDs}

    # fill in the one-hot rows for images that have a semantic description
    for row, scan_id in enumerate(has_semantic):
        if scan_id in semantic_encoded_dict:
            semantic_encoded_dict[scan_id] = semantic_one_hots[row]

    print('---')
    print(len(has_semantic))
    print(one_hot_length)
    print(len(total_patientIDs))
    print(len(semantic_encoded_dict.keys()))

    encoded_feature_names = ["one_hot #" + str(x) for x in range(1, one_hot_length+1)]
    semantic_encoded_df = pd.DataFrame.from_dict(semantic_encoded_dict, orient='index', columns=encoded_feature_names)
    print(semantic_encoded_df.index)
    return (semantic_encoded_df, encoded_feature_names)
    


We initially predict some potential issues with this approach: the number of features is large, and many are potentially correlated. For that reason, we can implement some dimensionality reduction and take a look at the variation explained by the principal components (and their contents).
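
Before reaching for PCA, one quick way to check for correlated features is to scan the correlation matrix for strongly related pairs. The sketch below uses synthetic values; the feature names are borrowed from this notebook only as labels, and the 0.9 threshold is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic values for illustration only; names are just labels
names = ["area", "major_axis_length", "solidity", "extent"]
area = rng.uniform(100, 1000, size=300)
features = np.column_stack([
    area,
    np.sqrt(area) + rng.normal(scale=0.5, size=300),  # strongly tied to area
    rng.uniform(0.5, 1.0, size=300),
    rng.uniform(0.3, 0.9, size=300),
])

# Pearson correlation between every pair of feature columns
corr = np.corrcoef(features, rowvar=False)

threshold = 0.9  # assumed cutoff for "strongly correlated"
pairs = [
    (names[i], names[j], corr[i, j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Pairs flagged this way are candidates for dropping one member or for combining through dimensionality reduction.
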

The beginnings of our preprocessing pipeline

In [14]:
def preprocess(data_parent_dir, precomputed_path, dim_reduc=None):
    print('Entering preprocessing')
    
    feature_names = []
    
    # reading things in 
    image_df, computed_df, scan_ids = gather_images(data_parent_dir)
    precomputed_df = drop_excess_rows(scan_ids, pd.read_csv(precomputed_path, index_col=0))

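    # an initial merge
    # NOTE: passing an array to `on` pairs the two frames row by row (assuming unique IDs)
    # rather than matching each frame's own index, so both frames must list the scans
    # in the same order; the merge key becomes the first column of the result
    feature_df = pd.merge(computed_df, precomputed_df, on=precomputed_df.index)
    
    # zero-centering, normalization of all data within the feature dataframe 
    # (column 0 is the merge key, so skip it)
    features = feature_df.values[:,1:]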
    print(features.shape)
    new_fts = mean_center_normalize(np.array(features,dtype=np.float32))
    print("==============================")
    print(new_fts.shape)
    
    print(image_df.describe())
    print(computed_df.describe())
    print(precomputed_df.describe())
    
    # dimensionality reduction 
    if dim_reduc is not None:
        new_fts = reduce_dimensionality(new_fts, dim_reduc)
        feature_names += ["pc# "+str(x) for x in range(1, dim_reduc+1)]
        feature_df = pd.DataFrame(new_fts, columns=feature_names, index=precomputed_df.index)
        print("==============================")
        print(feature_df.info())
        
    # adding back to df 
    else:
        feature_names += list(feature_df)[1:]
        feature_df[feature_names] = new_fts
        print("==============================")
        print(feature_df.info())
    
    print(feature_names)
    
    # joining dfs
    df = pd.merge(image_df, feature_df[feature_names], on=feature_df.index)
    df.info()
    
    train, test = train_test_split(df, test_size=0.2, random_state=260)
    
    return (train[feature_names], train['label'], test[feature_names], test['label'], scan_ids)

    

Now that we've pretty much established how our data should be preprocessed, we want to introduce and train some basic machine learning models 

In [15]:
data_parent_dir = "./data_fixed_crop_w_mask"
precomputed_path = "features_matrix.csv"

In [16]:
X_train, y_train, X_test, y_test, all_IDs = preprocess(data_parent_dir, precomputed_path, dim_reduc=None) 

Entering preprocessing
preparing to read images
['00001_LEFT_CC', '00001_LEFT_MLO', '00004_LEFT_CC', '00004_LEFT_MLO', '00004_RIGHT_CC', '00004_RIGHT_MLO', '00009_RIGHT_CC', '00009_RIGHT_MLO', '00015_LEFT_MLO', '00016_LEFT_CC']
the drop list has: 91
(1508, 23)
(1508, 23)
             label
count  1508.000000
mean      0.478780
std       0.499715
min       0.000000
25%       0.000000
50%       0.000000
75%       1.000000
max       1.000000
         convexity     solidity       extent  major_axis_length  \
count  1508.000000  1508.000000  1508.000000        1508.000000   
mean      0.823225     0.863078     0.644197          90.195399   
std       0.151674     0.117002     0.107757          62.893666   
min       0.461828     0.047736     0.016178          31.827394   
25%       0.754172     0.853156     0.620854          55.699932   
50%       0.821002     0.885096     0.662134          68.380669   
75%       0.876884     0.912385     0.699197          93.923356   
max       3.014770   

## Models and Results

For this exploratory research, we examined SVMs and Random Forests. SVMs were a natural choice due to our small feature space and even class distribution. In addition, Random Forest models have the potential to elucidate unconventional relationships between seemingly independent variables. Because we use SVMs, normalization was required to prevent features with large magnitudes from dominating the prediction outcome.
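
The notebook's `mean_center_normalize` standardizes the whole feature matrix before the train/test split. A common alternative, shown here only as a sketch on synthetic data, is to compute the mean and standard deviation on the training rows and reuse them for the test rows, guarding against constant features. The helper names `fit_standardizer` and `apply_standardizer` are hypothetical.

```python
import numpy as np


def fit_standardizer(train):
    """Return per-feature mean and std computed on the training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # leave constant features unscaled
    return mean, std


def apply_standardizer(data, mean, std):
    """Zero-center and scale data with previously fitted statistics."""
    return (data - mean) / std


rng = np.random.default_rng(260)
# Synthetic features on very different scales (e.g. a ratio vs. a pixel length)
X = np.column_stack([
    rng.uniform(0.5, 1.0, size=100),
    rng.uniform(30, 300, size=100),
    np.ones(100),  # constant feature
])
X_train, X_test = X[:80], X[80:]

mean, std = fit_standardizer(X_train)
X_train_s = apply_standardizer(X_train, mean, std)
X_test_s = apply_standardizer(X_test, mean, std)

print(X_train_s.mean(axis=0).round(6), X_train_s.std(axis=0).round(6))
```

After scaling, the ratio-like feature and the length-like feature contribute on comparable scales, which is the property the SVM needs.
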

With the previously computed features alone, these models performed worse than the majority classifier. However, with the addition of our computed features and the one-hot-encoded semantic features, we were able to improve the performance of our classifiers measurably. Before tuning, our SVM had a sensitivity of .49 and a specificity of .82, and our RF had a sensitivity of .58 and a specificity of .76. The area under the receiver operating characteristic curve was .66 for the SVM and .67 for the RF. In order to improve these metrics, we performed hyperparameter tuning. In our tuning, we found that the main measure on which our models fell behind was sensitivity. This makes sense: a mass is relatively easy to spot in a mammogram, but it is much harder to tell a benign mass from a malignant one based on outline alone.
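
Since sensitivity is the weak point, one option is to search over the penalty `C` together with a class weight for the malignant class, scoring by recall. The following is a minimal sketch on synthetic data using scikit-learn's `GridSearchCV`; the grid values and fold count are assumptions, and this is not the tuning procedure used in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data; not the mammogram features
X, y = make_classification(n_samples=200, n_features=10, random_state=260)

param_grid = {
    "C": [0.3, 0.6, 0.9, 1.2],
    "class_weight": [None, {1: 2}, {1: 4}],  # up-weight the positive class
}
search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid,
    scoring="recall",  # recall of the positive class is the sensitivity
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Optimizing recall alone tends to trade away specificity, so in practice the selected setting would need to be checked against both measures.
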


In [17]:
models = train_models(X_train, y_train)
validate_models(models, X_test, y_test)

Preparing to validate models
sens is:
0.4697986577181208
spec is:
0.6862745098039216
[[105  48]
 [ 79  70]]
auroc: 0.5780365837610212
sens is:
0.3422818791946309
spec is:
0.6666666666666666
[[102  51]
 [ 98  51]]
auroc: 0.5044742729306488
Done validating models
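
A note on the `auroc` values printed above: `validate_models` passes hard class predictions from `model.predict` to `roc_auc_score`. With only 0/1 predictions the ROC curve has a single interior point, so the area reduces to the mean of sensitivity and specificity. A short check using the first confusion matrix above:

```python
def rates(conf_mat):
    """Sensitivity and specificity from a 2x2 matrix laid out as [[tn, fp], [fn, tp]]."""
    (tn, fp), (fn, tp) = conf_mat
    return tp / (tp + fn), tn / (tn + fp)


# Confusion matrix printed above for the first model (the SVM)
svm_conf_mat = [[105, 48], [79, 70]]
sens, spec = rates(svm_conf_mat)
balanced = (sens + spec) / 2  # mean of sensitivity and specificity

print(round(sens, 4), round(spec, 4), round(balanced, 4))
```

A score-based AUROC could instead be obtained by passing continuous outputs, for example `SVC.decision_function` or the positive-class column of `RandomForestClassifier.predict_proba`.
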


Now that we've seen our models perform like hot garbage, we're going to try to add in the semantic feature encoding to see if that will help 

In [18]:
def preprocess2(data_parent_dir, precomputed_path, semantics_path, semantic_feature_names, dim_reduc=None):
    print('Entering preprocessing')
    
    feature_names = []
    
    # reading things in 
    image_df, computed_df, scan_ids = gather_images(data_parent_dir)
    precomputed_df = drop_excess_rows(scan_ids, pd.read_csv(precomputed_path, index_col=0))

    # an initial merge
    feature_df = pd.merge(computed_df, precomputed_df, on=precomputed_df.index)
    
    # zero-centering, normalization of all data within the feature dataframe 
    features = feature_df.values[:,1:]
    new_fts = mean_center_normalize(np.array(features,dtype=np.float32))
    
    # dimensionality reduction 
    if dim_reduc is not None:
        new_fts = reduce_dimensionality(new_fts, dim_reduc)
        feature_names += ["pc# "+str(x) for x in range(1, dim_reduc+1)]
        feature_df = pd.DataFrame(new_fts, columns=feature_names, index=precomputed_df.index)


        
    # adding back to df 
    else:
        feature_names += list(feature_df)[1:]
        feature_df[feature_names] = new_fts
    
    
    # get the semantic df
    semantic_df, encoded_feature_names = generate_semantic_df(semantics_path, semantic_feature_names, scan_ids)
    
    
    
    # joining dfs
    df = pd.merge(image_df, feature_df[feature_names], on=feature_df.index)
    print(df.index)
    df.reset_index(drop=True, inplace=True)
    df.set_index(image_df.index, inplace=True)
    #print(semantic_df.info())
    all_features_df = pd.merge(df, semantic_df, left_index=True, right_index=True, sort=False)
    all_features_df.info()
    
    feature_names += encoded_feature_names
    train, test = train_test_split(all_features_df, test_size=0.2, random_state=260)
    
    return (train[feature_names], train['label'], test[feature_names], test['label'], scan_ids)

In [19]:
data_parent_dir = "./data_fixed_crop_w_mask"
precomputed_path = "features_matrix.csv"
semantics_path = "mass_case_description_train_set.csv"

semantic_feature_names = ['breast_density', 'abn_num', 'mass_shape', 'mass_margins', 'assessment']

X_train, y_train, X_test, y_test, scan_ids = preprocess2(data_parent_dir, 
                                                       precomputed_path, 
                                                         semantics_path,
                                                       semantic_feature_names,
                                                       dim_reduc=None)


Entering preprocessing
preparing to read images
['00001_LEFT_CC', '00001_LEFT_MLO', '00004_LEFT_CC', '00004_LEFT_MLO', '00004_RIGHT_CC', '00004_RIGHT_MLO', '00009_RIGHT_CC', '00009_RIGHT_MLO', '00015_LEFT_MLO', '00016_LEFT_CC']
the drop list has: 91
---
1273
49
1508
1508
Index(['P_00001_LEFT_CC', 'P_00001_LEFT_MLO', 'P_00004_LEFT_CC',
       'P_00004_LEFT_MLO', 'P_00004_RIGHT_CC', 'P_00004_RIGHT_MLO',
       'P_00009_RIGHT_CC', 'P_00009_RIGHT_MLO', 'P_00015_LEFT_MLO',
       'P_00016_LEFT_CC',
       ...
       'P_01550_RIGHT_CC', 'P_01550_RIGHT_MLO', 'P_01551_LEFT_CC',
       'P_01551_LEFT_MLO', 'P_01553_RIGHT_CC', 'P_01553_RIGHT_MLO',
       'P_01555_LEFT_MLO', 'P_01556_LEFT_CC', 'P_01557_RIGHT_CC',
       'P_01557_RIGHT_MLO'],
      dtype='object', length=1508)
Int64Index([   0,    1,    2,    3,    4,    5,    6,    7,    8,    9,
            ...
            1498, 1499, 1500, 1501, 1502, 1503, 1504, 1505, 1506, 1507],
           dtype='int64', length=1508)
<class 'pandas.core.frame

In [20]:
models = train_models(X_train, y_train)
validate_models(models, X_test, y_test)

Preparing to validate models
sens is:
0.4899328859060403
spec is:
0.8235294117647058
[[126  27]
 [ 76  73]]
auroc: 0.6567311488353731
sens is:
0.5771812080536913
spec is:
0.7581699346405228
[[116  37]
 [ 63  86]]
auroc: 0.6676755713471071
Done validating models


In [None]:
grid = Tuner(X_train, y_train)

acc is:
0.7371857923497268
sens is:
0.5654450261780105
spec is:
0.8925750394944708
[[565  68]
 [249 324]]
acc is:
0.7421584699453552
sens is:
0.5706806282722513
spec is:
0.8973143759873617
[[568  65]
 [246 327]]
acc is:
0.7438114754098359
sens is:
0.5741710296684118
spec is:
0.8973143759873617
[[568  65]
 [244 329]]
acc is:
0.7429918032786885
sens is:
0.5706806282722513
spec is:
0.8988941548183255
[[569  64]
 [246 327]]
acc is:
0.7446311475409836
sens is:
0.5724258289703316
spec is:
0.9004739336492891
[[570  63]
 [245 328]]
acc is:
0.7413114754098361
sens is:
0.5724258289703316
spec is:
0.8941548183254344
[[566  67]
 [245 328]]
acc is:
0.7396448087431695
sens is:
0.5724258289703316
spec is:
0.8909952606635071
[[564  69]
 [245 328]]
acc is:
0.7396584699453552
sens is:
0.5724258289703316
spec is:
0.8909952606635071
[[564  69]
 [245 328]]
acc is:
0.7396584699453552
sens is:
0.5724258289703316
spec is:
0.8909952606635071
[[564  69]
 [245 328]]
acc is:
0.7404781420765028
sens is:
0.57242582

In [None]:
print(grid)

In [None]:
def VisualizeTuning(df, measure, vmin=None, vmax=None):
    '''
        Draws a heatmap of one tuning measure (e.g. accuracy) across the values of C.
    '''
    ax = sns.heatmap(df, 
                     vmin=vmin, 
                     vmax=vmax,
                     cmap="YlGnBu")
    ax.set_ylabel('C')
    ax.set_title('Exploration -- 10-fold CV %s'%measure)
    plt.show()

In [None]:
index = ["%.1f" % (.3 * (a + 1)) for a in range(20)]  # C values 0.3, 0.6, ..., 6.0
graph_labels = ['Accuracy', 'Sensitivity', 'Specificity']
dfs = [pd.DataFrame({graph_labels[i]:grid[:,i]}, index = index) for i in range(3)]

for i in range(3):
    plt.figure()
    ax = sns.heatmap(dfs[i],  cmap="YlGnBu")

# Future Work

So, it turns out that the semantic features do a little something; unfortunately, we didn't have a lot of opportunities to tune the classifier or do more legit feature engineering, but it's been really exciting to explore the different features and approaches that we can take, even if we're not doing something flashy like CNNs.

As mentioned in our introduction, vanilla ML is significantly more interpretable and parsable in terms of the importance of specific variables. In addition, vanilla ML allows us to incorporate expert opinion and analysis into our models. Because of this, our models do not need to learn relationships between masses and malignant character that are already common domain knowledge; most of our features focus on the irregularity of the mass's borders and its deviation from a spherical shape.
The interpretability of our vanilla ML models allowed us to draw several conclusions from our research. Though we did not incorporate it into our final model, our exploratory research into PCA showed that a single principal component dominates the variability of our features. In addition, our hyperparameter tuning allowed us to formalize a piece of common wisdom about mammogram analysis: differentiating between benign and malignant masses is a difficult problem.


For future work, more feature analysis could potentially result in useful findings and recommendations for radiologists who are examining mammograms. In addition, it would be worthwhile to compare the performance of this model to a CNN to see which images are misclassified due to dependence on conventional metrics. Images that are classified correctly by a CNN but not by vanilla ML approaches would be excellent examples to show radiologists as a warning about the unconventional nature of some types of masses. Finally, due to both lack of time and the limited scope of the data provided, we were unable to train models based on different orientations of images. Training different models for different orientations could provide useful feedback on which variables are more indicative of a mass's malignant character in different kinds of images.


# References

Bradski, G. (2000). The OpenCV Library. Dr. Dobb's Journal of Software Tools.

Britton P, Warwick J, Wallis MG, et al. Measuring the accuracy of diagnostic imaging in symptomatic breast patients: team and individual performance. The British Journal of Radiology. 2012;85(1012):415-422. doi:10.1259/bjr/32906819.

Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.

Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanuelle Gouillart, Tony Yu and the scikit-image contributors. scikit-image: Image processing in Python. PeerJ 2:e453 (2014) http://dx.doi.org/10.7717/peerj.453

