# ***Classification of MRI scans for dementia patients (wow!)***

Group members:


*   John Fortner, jfortner8
*   Jessica Hernandez, jhernandez312



**Introduction**

With the advent of MRI technology, researchers have become increasingly able to understand how the human brain functions and which kinds of damage produce which symptoms. We intend to use MRI images of patients with varying degrees of dementia to train a machine learning classifier that estimates how far dementia has progressed in a given patient from an MRI scan.

We will attempt to classify images into 4 categories, each associated with one of the following states: NonDemented, VeryMildDemented, MildDemented, or ModerateDemented. Given an MRI scan, this classifier can predict the extent to which an individual is likely affected by Alzheimer's disease, which could make accurate prediction of onset, and access to appropriate treatment, easier to obtain.
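These four states become integer labels when the images are loaded with torchvision's `ImageFolder`, which sorts the class subfolder names alphabetically before numbering them. The labels therefore follow alphabetical order, not severity. A minimal sketch of that mapping, assuming the class folders are named as in the Kaggle download:

```python
# Sketch: mimic how torchvision.datasets.ImageFolder turns class subfolder
# names into integer labels (it numbers the sorted folder names).
# The folder names below are assumed to match the Kaggle download.
class_folders = ["NonDemented", "VeryMildDemented", "MildDemented", "ModerateDemented"]

class_to_idx = {name: idx for idx, name in enumerate(sorted(class_folders))}
idx_to_class = {idx: name for name, idx in class_to_idx.items()}

print(class_to_idx)
# {'MildDemented': 0, 'ModerateDemented': 1, 'NonDemented': 2, 'VeryMildDemented': 3}
```

On the real dataset the same dictionary is available as `train_dataset.class_to_idx`. This ordering is why label 0 refers to MildDemented rather than NonDemented in the class counts printed further down.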

In [None]:
import os
import numpy as numnum
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

import tensorflow as tf

import skimage
from skimage import color
from skimage import io
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, models, transforms, utils
import torch.nn.functional as F
from torch.utils.data import DataLoader

from sklearn.feature_extraction import text
from sklearn.model_selection import train_test_split


#! mkdir ~/.kaggle
#! cp kaggle.json ~/.kaggle/
#! chmod 600 ~/.kaggle/kaggle.json
#! kaggle datasets download datasets/tourist55/alzheimers-dataset-4-class-of-images
#! unzip alzheimers-dataset-4-class-of-images.zip

In [None]:
! pip install kaggle

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
# -p: don't fail if ~/.kaggle already exists
! mkdir -p ~/.kaggle/

In [None]:
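# kaggle.json (your Kaggle API token) must be uploaded to the working directory before running this
!mv ./kaggle.json ~/.kaggle/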

mv: cannot stat './kaggle.json': No such file or directory


In [None]:
!chmod 600 ~/.kaggle/kaggle.json

chmod: cannot access '/root/.kaggle/kaggle.json': No such file or directory


In [None]:
 ! kaggle datasets download tourist55/alzheimers-dataset-4-class-of-images

Downloading alzheimers-dataset-4-class-of-images.zip to /content
 73% 25.0M/34.1M [00:00<00:00, 56.8MB/s]
100% 34.1M/34.1M [00:00<00:00, 62.6MB/s]


In [None]:
#%%capture
!unzip alzheimers-dataset-4-class-of-images.zip

# Code


In [None]:
# Both splits use the same preprocessing: convert to a tensor, then to a single grayscale channel
preprocess = transforms.Compose([transforms.ToTensor(), transforms.Grayscale(num_output_channels=1)])
train_dataset = datasets.ImageFolder(root = 'Alzheimer_s Dataset/train', transform=preprocess)
test_dataset = datasets.ImageFolder(root = 'Alzheimer_s Dataset/test', transform=preprocess)

numDataPoints = len(train_dataset)
num_classes = 4



# Our data is imbalanced (see the class counts printed below).
# We build an array with one weight per training sample and use it to create a
# WeightedRandomSampler, which draws the classes in roughly equal proportions.
print(train_dataset[0][0].shape)

#img = numnum.copy(train_dataset[0][0])
# shifted_train_dataset = numnum.moveaxis(train_dataset,1, -1)

# gray_train_dataset = color.rgb2gray(shifted_train_dataset)
# print(gray_train_dataset.shape)
# plt.imshow(gray_train_dataset[0][0], cmap='magma')


#This creates an array of all the labels, in the same order as the images.
#ImageFolder already stores them in .targets, so we don't have to load every image just to read its label.
labels = numnum.array(train_dataset.targets, dtype=int)
print(labels)

#This counts how many times each label appears
class_sample_count = numnum.bincount(labels, minlength=num_classes)
print(class_sample_count)

#This array holds the weight for each class: the inverse of its frequency, so rarer classes get larger weights
weight = 1.0 / class_sample_count
print(weight)

#This array holds the (unnormalized) weight of each datapoint, i.e. the weight of its class
datapoint_weights = torch.from_numpy(weight[labels])

#This creates the appropriate weighted sampler
weighted_sampler = torch.utils.data.WeightedRandomSampler(datapoint_weights, len(datapoint_weights))
print(weighted_sampler)

#This creates the DataLoaders, using the weighted sampler for the train_loader
train_loader = DataLoader(train_dataset, batch_size=12, num_workers=1, sampler=weighted_sampler)
test_loader = DataLoader(test_dataset, batch_size=12, num_workers=1, shuffle = False)  # order doesn't matter for evaluation

print("Dataloaders created!")


# for i, (data, target) in enumerate(train_loader):
#     print ("batch index {}, 0/1/2/3: {}/{}/{}/{}".format(
#         i,
#         len(numnum.where(target.numpy() == 0)[0]),
#         len(numnum.where(target.numpy() == 1)[0]),
#         len(numnum.where(target.numpy() == 2)[0]),
#         len(numnum.where(target.numpy() == 3)[0])))

torch.Size([1, 208, 176])
[0 0 0 ... 3 3 3]
[ 717   52 2560 1792]
[0.0013947  0.01923077 0.00039063 0.00055804]
<torch.utils.data.sampler.WeightedRandomSampler object at 0x7f01b7bb1f10>
Dataloaders created!
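The printed counts make the effect of the inverse-frequency weights easy to check by hand: every image in class $c$ gets weight $1/n_c$, so each class contributes a total weight of $n_c \cdot (1/n_c) = 1$ and is drawn with probability 1/4 on every draw. A pure-Python sketch of that argument, using `random.choices` as a stand-in for `WeightedRandomSampler` (which also samples with replacement by default):

```python
import random
from collections import Counter

# Class counts printed above, in ImageFolder's (alphabetical) label order.
class_sample_count = [717, 52, 2560, 1792]

# Per-class weight 1/n_c, then one weight per datapoint (same idea as weight[labels]).
weight = [1.0 / n for n in class_sample_count]
labels = [c for c, n in enumerate(class_sample_count) for _ in range(n)]
datapoint_weights = [weight[c] for c in labels]

# Total weight per class is n_c * (1/n_c) = 1, so every class is equally likely.
class_mass = [sum(w for w, c in zip(datapoint_weights, labels) if c == k) for k in range(4)]
print([round(m, 6) for m in class_mass])  # [1.0, 1.0, 1.0, 1.0]

# Draw one "epoch" of indices with replacement, proportional to the weights.
rng = random.Random(0)
drawn = rng.choices(range(len(labels)), weights=datapoint_weights, k=len(labels))
drawn_counts = Counter(labels[i] for i in drawn)
print(drawn_counts)  # roughly 1280 draws per class
```

One consequence: with 5121 draws per epoch and only 52 ModerateDemented images, each of those images is seen about 1280 / 52 ≈ 25 times per epoch on average.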


In [None]:
#Now we can do the model stuff!!!
class CONV_NET(nn.Module):
  # Input: a batch of 1 x 208 x 176 grayscale MRI slices.
  # input_size is currently unused; linear1's input size (30135) is hard-coded for this resolution.
  def __init__(self, big_dropout_rate = 0.0, small_dropout_rate = 0.0, input_size = (176*208*1), num_classes = 4):
    super().__init__()

    # 1x208x176 -> conv1 (5x5) -> 6x204x172 -> pool -> 6x102x86
    self.conv1 = nn.Conv2d(in_channels = 1, out_channels = 6, kernel_size = 5)
    self.relu = nn.ReLU()
    # 6x102x86 -> conv2 (5x5) -> 15x98x82 -> pool -> 15x49x41 = 30135 features
    self.conv2 = nn.Conv2d(in_channels = 6, out_channels = 15, kernel_size = 5)
    self.linear1 = nn.Linear(30135, 2340)
    self.linear2 = nn.Linear(2340, 420)
    self.linear3 = nn.Linear(420, num_classes)
    self.pool = nn.MaxPool2d(2, stride = 2)
    # Heavy dropout on the flattened conv features, lighter dropout after linear1
    self.drop_out_of_school = nn.Dropout(big_dropout_rate)
    self.drop_out_of_prek = nn.Dropout(small_dropout_rate)


  def forward(self, x):
    x = self.pool(self.relu(self.conv1(x)))
    x = self.pool(self.relu(self.conv2(x)))
    x = self.drop_out_of_school(torch.flatten(x, 1))
    x = self.drop_out_of_prek(self.relu(self.linear1(x)))
    x = self.relu(self.linear2(x))
    x = self.linear3(x)  # raw logits; nn.CrossEntropyLoss applies the softmax
    return x
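The hard-coded `30135` in `linear1` is the flattened size of the feature maps after the two conv/pool stages. A small sketch that recomputes it from the usual output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$ (no padding, stride 1 for the convolutions, kernel 2 and stride 2 for the pooling); the helper `out_size` is hypothetical:

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output length of a Conv2d/MaxPool2d layer along one dimension (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

h, w = 208, 176           # input height and width (torch.Size([1, 208, 176]))
for _ in range(2):        # two blocks of conv(5x5) -> maxpool(2, stride 2)
    h, w = out_size(h, 5), out_size(w, 5)
    h, w = out_size(h, 2, stride=2), out_size(w, 2, stride=2)

channels = 15             # out_channels of conv2
print(h, w, channels * h * w)  # 49 41 30135
```

If the input resolution ever changes, `linear1` has to be resized to match; PyTorch's `nn.LazyLinear` is one alternative that infers its input size on the first forward pass.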

In [None]:
#TRAINING THE MODEL
def train(model, train_loader, test_loader, num_epochs = 10):

  optimizer = optim.Adamax(model.parameters(), lr = .003)
  loss_func = nn.CrossEntropyLoss()

  for epoch in range(num_epochs):
    model.train()  # enable dropout (test() switches the model to eval mode)
    for inputs, labels in train_loader:
      out = model(inputs)  # calling the module runs forward()
      loss = loss_func(out, labels)
      optimizer.zero_grad()
      loss.backward()
      optimizer.step()

    # Evaluate after every epoch. train_loader uses the weighted sampler, so
    # "training accuracy" is measured on a class-balanced resample of the training set.
    print("epoch number", epoch + 1, "complete")
    print("Training accuracy:", test(model, train_loader))
    print("Testing accuracy:", test(model, test_loader))
    print()

  # Process is complete.
  print('Training process has finished.')
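`nn.CrossEntropyLoss` takes the raw logits returned by `forward` and combines a log-softmax with the negative log-likelihood of the true class, averaging over the batch by default (`reduction='mean'`). A pure-Python sketch of that computation; the function names are hypothetical:

```python
import math

def cross_entropy(logits, target):
    """-log softmax(logits)[target], using the log-sum-exp trick for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def batch_cross_entropy(batch_logits, targets):
    """Mean loss over a batch, like nn.CrossEntropyLoss() with default settings."""
    losses = [cross_entropy(z, t) for z, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Uniform logits over 4 classes give log(4) ~ 1.386, whatever the label.
uniform_loss = batch_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])
# A confident, correct prediction gives a loss close to 0.
confident_loss = batch_cross_entropy([[10.0, -10.0, -10.0, -10.0]], [0])
print(uniform_loss, confident_loss)
```

This is also why `forward` ends with `linear3` and no softmax: the loss expects unnormalized scores.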



In [None]:
#TESTING THE MODEL
def test(model, test_loader):
    model.eval()  # disable dropout
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)

            # Retrieve the index of the highest-scoring class for each image in the batch
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()
            total += target.size(0)

    return '{}/{} ({:.0f}%)\n'.format(
        correct, total,
        100.0 * correct / total)
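Because the data is heavily imbalanced (only 52 ModerateDemented training images), a single accuracy figure can hide poor performance on the small classes. A minimal sketch of a confusion matrix and per-class recall, assuming the true and predicted labels have been collected into plain Python lists (for example by extending lists with `target.tolist()` and `pred.view(-1).tolist()` inside the loop of `test`); the helper names and toy labels are hypothetical:

```python
def confusion_matrix(y_true, y_pred, num_classes=4):
    """matrix[t][p] counts images of true class t that were predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def per_class_recall(matrix):
    """Fraction of each true class classified correctly (None if the class never occurs)."""
    recalls = []
    for c, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[c] / total if total else None)
    return recalls

# Toy labels (0=MildDemented, 1=ModerateDemented, 2=NonDemented, 3=VeryMildDemented,
# following ImageFolder's alphabetical order).
y_true = [2, 2, 2, 3, 3, 0, 1]
y_pred = [2, 2, 3, 3, 2, 0, 2]
m = confusion_matrix(y_true, y_pred)
recalls = per_class_recall(m)
print(m)
print(recalls)
```

In this toy example overall accuracy is 4/7 (about 57%), yet recall for ModerateDemented is 0, which the overall figure alone would not reveal.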

In [None]:
#now we actually run the model
alz_conv_model = CONV_NET(big_dropout_rate = 0.8, small_dropout_rate = 0.45)
train(alz_conv_model, train_loader, test_loader, num_epochs = 50)
print("Final testing accuracy:", test(alz_conv_model, test_loader))

# Colab only: play a short audio clip as a "training finished" notification
from google.colab import output
#output.eval_js('new Audio("https://upload.wikimedia.org/wikipedia/commons/7/71/EAT_-_06_-_Touch_a_Clown_in_the_Frightness.ogg").play()')
output.eval_js('new Audio("https://upload.wikimedia.org/wikipedia/commons/0/0a/Bach_-_cantata_140._2._recitative.ogg").play()')

epoch number 1 complete
Training accuracy: 2770/5121 (54%)

Testing accuracy: 256/1279 (20%)


epoch number 2 complete
Training accuracy: 3737/5121 (73%)

Testing accuracy: 659/1279 (52%)


epoch number 3 complete
Training accuracy: 3985/5121 (78%)

Testing accuracy: 593/1279 (46%)


epoch number 4 complete
Training accuracy: 4382/5121 (86%)

Testing accuracy: 619/1279 (48%)


epoch number 5 complete
Training accuracy: 4667/5121 (91%)

Testing accuracy: 679/1279 (53%)


epoch number 6 complete
Training accuracy: 4757/5121 (93%)

Testing accuracy: 717/1279 (56%)


epoch number 7 complete
Training accuracy: 4760/5121 (93%)

Testing accuracy: 629/1279 (49%)


epoch number 8 complete
Training accuracy: 4851/5121 (95%)

Testing accuracy: 730/1279 (57%)


epoch number 9 complete
Training accuracy: 5006/5121 (98%)

Testing accuracy: 762/1279 (60%)


epoch number 10 complete
Training accuracy: 5038/5121 (98%)

Testing accuracy: 802/1279 (63%)


epoch number 11 complete
Training accuracy: 5063/