# Responsiveness Predictive Model

This notebook trains a machine learning model that classifies lung images as either responsive or unresponsive to medication. Each image is passed through a 2D Fourier transform, and the magnitudes of the resulting coefficients are fed into either a linear SVM or a random forest classifier. In the runs recorded at the end of this notebook, test accuracy ranges from roughly 85% (SVM) to 96% (random forest).

The intuition for this model is as follows: the Fourier transform decomposes each image into spatial-frequency components, so each feature measures how strongly a particular frequency is present in the image. The classifier then maps these frequency features to a responsive/unresponsive label.
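To make the frequency intuition concrete, here is a minimal sketch on a synthetic image rather than the lung data. The 8x8 cosine pattern and the `1e-6` threshold are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np

# Synthetic 8x8 "image": a cosine with 2 full cycles across the width,
# repeated identically on every row.
n = 8
cols = np.arange(n)
image = np.tile(np.cos(2 * np.pi * 2 * cols / n), (n, 1))

# Magnitude of the 2D Fourier transform, as used for the model's features.
spectrum = np.abs(np.fft.fft2(image))

# Only the bins matching the pattern's frequency carry energy:
# row frequency 0, column frequencies +2 and -2 (index n - 2).
peak_rows, peak_cols = np.nonzero(spectrum > 1e-6)
peaks = sorted(zip(peak_rows.tolist(), peak_cols.tolist()))
print(peaks)
```

A single periodic pattern collapses into two symmetric bins, which is why the magnitude spectrum can act as a compact description of the texture in an image.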

In [43]:
##imports
import torch
import torchvision
import torchvision.transforms as transforms

import numpy as np

import os
from pathlib import Path

from PIL import Image

import torch.nn as nn
import torch.nn.functional as F

from sklearn.svm import LinearSVC, SVC
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import pickle

In [2]:
## labels for the dataset, stored as run lengths over the sorted image list
images_path = Path('Lung/lung_images')

images_list = sorted(entry for entry in images_path.iterdir() if entry.is_file())

# run lengths alternate between the two classes: bad, good, bad, good, ...
label_ctr = [5,5,15,7,9,5,4,18,4,9,4,6,5,4,5,8,8,4,81,5,9,4,5,6,5,10,4,3,9,19,5,6,33,50,5]
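The run-length list above can be expanded and sanity-checked on its own before any images are loaded. The following is a sketch only: the helper name `expand_run_lengths` is hypothetical, and it assumes the same convention as the dataset class in this notebook (-1 for "bad" runs, +1 for "good" runs).

```python
import numpy as np

def expand_run_lengths(run_lengths):
    """Expand alternating run lengths into per-image labels (-1 = bad, +1 = good)."""
    run_lengths = np.asarray(run_lengths)
    # Even-indexed runs are "bad" (-1), odd-indexed runs are "good" (+1).
    signs = np.where(np.arange(len(run_lengths)) % 2 == 0, -1, 1)
    return np.repeat(signs, run_lengths)

label_ctr = [5,5,15,7,9,5,4,18,4,9,4,6,5,4,5,8,8,4,81,5,9,4,5,6,5,10,4,3,9,19,5,6,33,50,5]
labels = expand_run_lengths(label_ctr)

n_bad = int(np.sum(labels == -1))
n_good = int(np.sum(labels == 1))
print(len(labels), n_bad, n_good)
```

Comparing `len(labels)` against the number of image files is a cheap guard against labels drifting out of alignment with the sorted file list.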

In [3]:
###definition of dataset
class LungMatrixDataset(torch.utils.data.Dataset):
    def __init__(self, label_ctr, img_path):
        super().__init__()
        # expand run lengths into per-image labels: -1 (bad) for even runs, +1 (good) for odd runs
        self.labels = []
        for i in range(len(label_ctr)):
            self.labels += [((-1)**(i+1))] * label_ctr[i]

        # read from the img_path argument rather than the global images_path
        images_list = sorted(entry for entry in Path(img_path).iterdir() if entry.is_file())
        self.arrays_list = []
        for item in images_list:
            if item.name == '.DS_Store':
                continue
            # load each image as 8-bit grayscale
            image = Image.open(item).convert('L')
            img_data = np.asarray(image)
            self.arrays_list.append(img_data)

    def __len__(self):
        return len(self.arrays_list)
        
    def __getitem__(self, idx):
        return (self.arrays_list[idx]), (self.labels[idx])

In [4]:
lungdat = LungMatrixDataset(label_ctr, 'Lung/lung_images')

In [45]:
##train test split
X_train, X_test, y_train, y_test = train_test_split(lungdat.arrays_list, lungdat.labels,  train_size = 0.75, random_state = 3)


In [46]:
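# 2D FFT of every image; fft2 transforms the last two axes of the stacked (n_images, height, width) array
X_train_fourier = np.fft.fft2(X_train)
X_test_fourier = np.fft.fft2(X_test)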

In [47]:
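# flatten each raw image into one feature vector; -1 infers height*width (600*800 here)
X_train = np.asarray(X_train).reshape(len(X_train), -1)
X_test = np.asarray(X_test).reshape(len(X_test), -1)

# keep only the magnitude of the complex Fourier coefficients, then flatten
X_train_fourier = np.absolute(X_train_fourier).reshape(len(X_train_fourier), -1)
X_test_fourier = np.absolute(X_test_fourier).reshape(len(X_test_fourier), -1)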
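The transform-and-flatten steps above can be wrapped in one helper so that training and test data are guaranteed to go through the same processing. This is a sketch under assumptions: the function name `fourier_features` is hypothetical, the toy images are random arrays, and all images are assumed to share one shape.

```python
import numpy as np

def fourier_features(images):
    """Return one row of Fourier magnitudes per image."""
    stack = np.asarray(images, dtype=float)        # (n_images, height, width)
    spectrum = np.fft.fft2(stack)                  # 2D FFT over the last two axes
    return np.abs(spectrum).reshape(len(stack), -1)

# Toy stand-ins for grayscale images (values in 0..255, shape 4x6).
rng = np.random.default_rng(0)
toy_images = [rng.integers(0, 256, size=(4, 6)) for _ in range(3)]

features = fourier_features(toy_images)
print(features.shape)
```

The first feature of each row is the zero-frequency term, which equals the sum of the pixel values, so overall image brightness remains part of the feature vector.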

In [41]:
svm = LinearSVC(random_state = 1)

In [42]:
svm.fit(X_train_fourier, y_train)

LinearSVC(random_state=1)

In [44]:
train_acc = svm.score(X_train_fourier, y_train)
test_acc = svm.score(X_test_fourier, y_test)
print(train_acc, test_acc)

1.0 0.8541666666666666


In [48]:
rfc = RandomForestClassifier(random_state = 1)

In [49]:
rfc.fit(X_train_fourier, y_train)

RandomForestClassifier(random_state=1)

In [50]:
rfc_train_acc = rfc.score(X_train_fourier, y_train)
rfc_test_acc = rfc.score(X_test_fourier, y_test)
print(rfc_train_acc, rfc_test_acc)

1.0 0.9583333333333334


SVM Test Performance
data random_state = 1, model random_state = 0: acc = 0.8542
data random_state = 1, model random_state = 1: acc = 0.8542

RFC Test Performance
data random_state = 1, model random_state = 0: acc = 0.8854
data random_state = 1, model random_state = 1: acc = 0.896
data random_state = 3, model random_state = 1: acc = 0.9583


Without Fourier:
train accuracy = 0.4271
test accuracy = 0.4792
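Because the recorded accuracies shift with the split and model seeds, one way to summarize them is to repeat the split over several seeds and report the mean and spread. The sketch below uses synthetic data, not the lung images; the helper name `accuracy_over_seeds`, the toy features, and `n_estimators=50` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic two-class problem standing in for the Fourier features.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=120) > 0, 1, -1)

def accuracy_over_seeds(X, y, seeds):
    """Test accuracy of a random forest for each split/model seed."""
    scores = []
    for seed in seeds:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=0.75, random_state=seed
        )
        model = RandomForestClassifier(n_estimators=50, random_state=seed)
        model.fit(X_tr, y_tr)
        scores.append(model.score(X_te, y_te))
    return np.array(scores)

scores = accuracy_over_seeds(X, y, seeds=range(5))
print(f"mean={scores.mean():.3f} std={scores.std():.3f}")
```

Reporting a mean with its standard deviation makes it easier to tell whether a difference between two classifiers is larger than the variation from the split alone.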

In [88]:
# save the fitted SVM; the context manager closes the file after writing
filename = 'successfull_svm_1.sav'
with open(filename, 'wb') as f:
    pickle.dump(svm, f)


In [89]:
# save the fitted random forest; the context manager closes the file after writing
filename = 'successfull_rfc_1.sav'
with open(filename, 'wb') as f:
    pickle.dump(rfc, f)
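Loading a saved model back follows the same pattern in reverse. The sketch below round-trips a stand-in object through a temporary file; the dictionary and the `model.sav` filename are placeholders for a fitted estimator and its real path.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a fitted estimator; any picklable object round-trips the same way.
model = {"kind": "stand-in", "random_state": 1}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.sav"  # hypothetical filename
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

For a real scikit-learn estimator, the restored object can be used directly, for example by calling `restored.predict(...)` on Fourier features prepared the same way as the training data. Pickle files should only be loaded from trusted sources, and preferably with the same scikit-learn version that wrote them.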
