# Final prediction

We have trained two separate models: one diagnoses Alzheimer's from handwriting data, and the other from MRI images. We now combine the two models to produce a diagnosis. Both models were saved after training, so we begin by loading them.

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
from tensorflow.keras.models import load_model
model = load_model('/kaggle/input/mri-predictor/tensorflow2/f10.96-2/1/my_cnn_model.h5')




In [3]:
import pickle

with open('/kaggle/input/handwriitng-classifier/other/claassifier-gb/1/gb_classifier_weights.pkl', 'rb') as file:
    handwriting_model = pickle.load(file)

# Test MRI images

We preprocess the test MRI images on which we will make predictions. Each JPG image is resized to 256x256 and saved as an .npy file whose name carries the name of its class folder.

In [4]:
import os
import cv2
import numpy as np

def jpg_to_npy_in_folders(input_folder, output_folder):
    # Recursively convert every .jpg under input_folder to a 256x256 .npy array
    os.makedirs(output_folder, exist_ok=True)
    for i, item in enumerate(os.listdir(input_folder)):
        item_path = os.path.join(input_folder, item)
        if os.path.isdir(item_path):
            jpg_to_npy_in_folders(item_path, output_folder)
        elif os.path.isfile(item_path) and item.endswith(".jpg"):
            img = cv2.imread(item_path)
            if img is None:
                continue  # cv2.imread returns None for unreadable files
            img = cv2.resize(img, (256, 256))
            # The parent folder name carries the class label (e.g. "Final AD JPEG")
            folder = os.path.basename(input_folder)
            output_file_path = os.path.join(output_folder, f"{folder}{i}.npy")
            np.save(output_file_path, img)

jpg_to_npy_in_folders("/kaggle/input/alzheimersdisease5classdatasetadni/Alzheimers-ADNI/train", "/kaggle/working/test")
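
For reference, a short sketch of the .npy round trip on a synthetic array. It assumes only that images are stored as `uint8` arrays of shape `(256, 256, 3)`, which is what `cv2.imread` followed by `cv2.resize(img, (256, 256))` yields for a colour JPEG. The temporary directory and file names are placeholders, not the notebook's real paths.

```python
import os
import tempfile
import numpy as np

# Synthetic stand-in for one resized image (height, width, channels).
img = np.zeros((256, 256, 3), dtype=np.uint8)

with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(3):
        path = os.path.join(tmp, f"Final CN JPEG{i}.npy")
        np.save(path, img)
        paths.append(path)

    # A single image needs a leading batch axis before it is passed to a model.
    single = np.expand_dims(np.load(paths[0]), axis=0)

    # Several images can be stacked into one batch instead.
    batch = np.stack([np.load(p) for p in paths])

print(single.shape, batch.shape)  # (1, 256, 256, 3) (3, 256, 256, 3)
```

Saving and loading preserves both shape and dtype, so no further conversion is needed at prediction time.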

In [5]:
import os
import numpy as np
directory = '/kaggle/working/test'

files = [file for file in os.listdir(directory) if file.endswith('.npy')]
len(files)

1101

## Class Separation 

The preprocessed images are separated on the basis of class to measure the accuracy of prediction.

In [6]:
ad = [i for i in files if "AD" in i]
cn = [i for i in files if "CN" in i]
emci = [i for i in files if "EMCI" in i]
lmci = [i for i in files if "LMCI" in i]
mci = [i for i in files if " MCI " in i]
print(len(ad), len(cn), len(emci), len(lmci), len(mci))

145 493 204 61 198
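
The substring checks work here only because the plain MCI test is padded with spaces; `"MCI" in name` on its own would also match EMCI and LMCI files. A minimal sketch of a stricter alternative, assuming file names follow the `Final <CLASS> JPEG<n>.npy` pattern seen in this notebook's outputs. The helper name `label_from_filename` is hypothetical.

```python
import re

# Class codes in the order used when the dataframe is built
# (AD=0, CN=1, EMCI=2, LMCI=3, MCI=4).
CLASS_CODES = {"AD": 0, "CN": 1, "EMCI": 2, "LMCI": 3, "MCI": 4}

# Word boundaries keep "MCI" from matching inside "EMCI" or "LMCI".
_LABEL_RE = re.compile(r"\b(AD|CN|EMCI|LMCI|MCI)\b")

def label_from_filename(name):
    """Return the class label embedded in a file name, or None if absent."""
    match = _LABEL_RE.search(name)
    return match.group(1) if match else None

for name in ["Final EMCI JPEG196.npy", "Final MCI JPEG74.npy"]:
    label = label_from_filename(name)
    print(name, "->", label, CLASS_CODES[label])
```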


## Adding into dataframe

The image paths and their class labels are added to a dataframe for easier processing.

In [7]:
import pandas as pd
l = [[ad, 0], [cn, 1], [emci, 2], [lmci, 3], [mci, 4]]
x = []
for i in l:
    for j in i[0]:
        path = os.path.join("/kaggle/working/test", j)
        x.append([i[1], path])

df = pd.DataFrame(x, columns=["Class", "Path"])


# Handwriting Data

The handwriting dataset is loaded and the class labels are converted to numeric form (P becomes 1, everything else 0).

In [8]:
import pandas as pd
df1 = pd.read_csv("/kaggle/input/handwriting-data-to-detect-alzheimers-disease/data.csv")
df1["class"] = pd.Series([1 if i == 'P' else 0 for i in df1["class"]])
df1

Unnamed: 0,ID,air_time1,disp_index1,gmrt_in_air1,gmrt_on_paper1,max_x_extension1,max_y_extension1,mean_acc_in_air1,mean_acc_on_paper1,mean_gmrt1,...,mean_jerk_in_air25,mean_jerk_on_paper25,mean_speed_in_air25,mean_speed_on_paper25,num_of_pendown25,paper_time25,pressure_mean25,pressure_var25,total_time25,class
0,id_1,5160,0.000013,120.804174,86.853334,957,6601,0.361800,0.217459,103.828754,...,0.141434,0.024471,5.596487,3.184589,71,40120,1749.278166,296102.7676,144605,1
1,id_2,51980,0.000016,115.318238,83.448681,1694,6998,0.272513,0.144880,99.383459,...,0.049663,0.018368,1.665973,0.950249,129,126700,1504.768272,278744.2850,298640,1
2,id_3,2600,0.000010,229.933997,172.761858,2333,5802,0.387020,0.181342,201.347928,...,0.178194,0.017174,4.000781,2.392521,74,45480,1431.443492,144411.7055,79025,1
3,id_4,2130,0.000010,369.403342,183.193104,1756,8159,0.556879,0.164502,276.298223,...,0.113905,0.019860,4.206746,1.613522,123,67945,1465.843329,230184.7154,181220,1
4,id_5,2310,0.000007,257.997131,111.275889,987,4732,0.266077,0.145104,184.636510,...,0.121782,0.020872,3.319036,1.680629,92,37285,1841.702561,158290.0255,72575,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
169,id_170,2930,0.000010,241.736477,176.115957,1839,6439,0.253347,0.174663,208.926217,...,0.119152,0.020909,4.508709,2.233198,96,44545,1798.923336,247448.3108,80335,0
170,id_171,2140,0.000009,274.728964,234.495802,2053,8487,0.225537,0.174920,254.612383,...,0.174495,0.017640,4.685573,2.806888,84,37560,1725.619941,160664.6464,345835,0
171,id_172,3830,0.000008,151.536989,171.104693,1287,7352,0.165480,0.161058,161.320841,...,0.114472,0.017194,3.493815,2.510601,88,51675,1915.573488,128727.1241,83445,0
172,id_173,1760,0.000008,289.518195,196.411138,1674,6946,0.518937,0.202613,242.964666,...,0.114472,0.017194,3.493815,2.510601,88,51675,1915.573488,128727.1241,83445,0


## Random sampling

For prediction, we take a random sample of 50 entries from the handwriting dataset and count how many fall into each class, so that an MRI image of a matching class can be paired with each sampled entry. We also draw a random sample of 100 MRI images to pair them with.

In [9]:
x = df1.sample(50)
a = x["class"].value_counts()


In [10]:
q = df.sample(100)
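
`DataFrame.sample` draws a different subset on every run unless it is seeded, so the tables shown in this notebook will not reproduce exactly. A small sketch of seeded sampling on a toy frame; the seed value `42` and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the handwriting table: 10 rows, binary class column.
toy = pd.DataFrame({"ID": [f"id_{i}" for i in range(1, 11)],
                    "class": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]})

SEED = 42  # arbitrary; any fixed integer makes the draw repeatable

first = toy.sample(4, random_state=SEED)
second = toy.sample(4, random_state=SEED)

# Same seed, same rows in the same order.
print(first.index.equals(second.index))

# Per-class counts of the sample, as used to size the MRI subsets.
counts = first["class"].value_counts()
print(counts.to_dict())
```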

# Creation of testing dataframes

We create two dataframes, one for the Alzheimer's cases and one for the non-Alzheimer's cases. To each row we attach the path of a randomly sampled image of the appropriate class, so that each dataframe has one image per handwriting entry. Together they serve as the test data for the final predictions.

### Non-Alzheimer's data

In [11]:
nonAlz = x[x["class"] == 0].copy()  # copy to avoid SettingWithCopyWarning on the assignments below
d = []
e = []
count = 0
for i in q.values:
    if count == a[0]:
        break
    if i[0] == 1:
        d.append(i[1])
        e.append(i[0])
        count += 1

nonAlz.loc[:, "path"] = pd.Series(d, index=nonAlz.index)
nonAlz.loc[:, "mri-class"] = pd.Series(e, index=nonAlz.index)
nonAlz

Unnamed: 0,ID,air_time1,disp_index1,gmrt_in_air1,gmrt_on_paper1,max_x_extension1,max_y_extension1,mean_acc_in_air1,mean_acc_on_paper1,mean_gmrt1,...,mean_speed_in_air25,mean_speed_on_paper25,num_of_pendown25,paper_time25,pressure_mean25,pressure_var25,total_time25,class,path,mri-class
118,id_119,540,7e-06,557.998222,221.396848,1697,6343,0.231201,0.157785,389.697535,...,5.741075,2.53596,73,31590,1942.588003,97106.70791,55120,0,/kaggle/working/test/Final CN JPEG204.npy,1
105,id_106,3990,1e-05,289.788098,180.020926,1348,7379,0.188276,0.147752,234.904512,...,4.997431,3.366661,90,31715,1007.996847,94731.67207,339675,0,/kaggle/working/test/Final CN JPEG318.npy,1
135,id_136,53875,1.4e-05,152.87841,202.675278,18602,6530,0.547489,0.161733,177.776844,...,4.770989,4.387201,120,33060,1634.794465,223263.3926,657345,0,/kaggle/working/test/Final CN JPEG56.npy,1
124,id_125,5120,8e-06,444.560149,206.012058,1883,7998,2.311424,0.207807,325.286103,...,5.226698,2.923295,74,34915,1532.649434,165058.1907,59585,0,/kaggle/working/test/Final CN JPEG158.npy,1
101,id_102,96686,6e-06,358.184944,150.069558,1439,4351,1.553591,0.139658,254.127251,...,4.512446,2.767362,56,36095,1489.56282,109525.6773,55035,0,/kaggle/working/test/Final CN JPEG329.npy,1
133,id_134,2275,9e-06,210.808548,192.709628,2272,7044,0.257618,0.152951,201.759088,...,5.297333,3.282824,71,31485,1778.996189,164438.1775,48445,0,/kaggle/working/test/Final CN JPEG380.npy,1
110,id_111,1270,8e-06,324.458938,180.769514,1382,6677,0.32265,0.15945,252.614226,...,5.322031,3.239461,81,31500,1876.255079,154462.6802,49935,0,/kaggle/working/test/Final CN JPEG113.npy,1
112,id_113,5880,7e-06,167.703955,332.126873,1402,5626,0.33952,0.250085,249.915414,...,6.945456,3.629327,39,30835,1933.080104,86695.5882,43805,0,/kaggle/working/test/Final CN JPEG32.npy,1
152,id_153,1705,8e-06,394.0583,215.581714,2245,7316,0.279453,0.203293,304.820007,...,6.386259,3.044282,43,30670,1933.569775,131252.4962,41660,0,/kaggle/working/test/Final CN JPEG43.npy,1
128,id_129,6645,6e-06,137.550578,158.177782,962,5264,0.225044,0.145105,147.86418,...,3.165987,2.472574,131,42575,1790.781679,198167.4021,90850,0,/kaggle/working/test/Final CN JPEG294.npy,1


### Alzheimer's data

In [12]:
Alz = x[x["class"] == 1].copy()  # copy to avoid SettingWithCopyWarning on the assignments below
d = []
e = []
count = 0
for i in q.values:
    if count == a[1]:
        break
    if i[0] != 1:
        d.append(i[1])
        e.append(i[0])
        count += 1

Alz.loc[:, "path"] = pd.Series(d, index=Alz.index)
Alz.loc[:, "mri-class"] = pd.Series(e, index=Alz.index)

Alz

Unnamed: 0,ID,air_time1,disp_index1,gmrt_in_air1,gmrt_on_paper1,max_x_extension1,max_y_extension1,mean_acc_in_air1,mean_acc_on_paper1,mean_gmrt1,...,mean_speed_in_air25,mean_speed_on_paper25,num_of_pendown25,paper_time25,pressure_mean25,pressure_var25,total_time25,class,path,mri-class
54,id_55,21800,1.9e-05,65.234661,72.40753,2931,8878,0.208208,0.164452,68.821096,...,4.316327,2.991742,129,47705,1764.116654,249158.7225,154130,1,/kaggle/working/test/Final MCI JPEG74.npy,4
86,id_87,1920,1e-05,306.833283,208.536952,1995,6571,0.190981,0.158575,257.685117,...,4.806361,3.335828,60,35920,1929.296214,94373.02807,59175,1,/kaggle/working/test/Final LMCI JPEG30.npy,3
18,id_19,6365,1.1e-05,78.339161,118.660574,1425,7755,0.127188,0.119886,98.499867,...,3.498404,2.073031,119,59035,474.049462,26984.92666,177155,1,/kaggle/working/test/Final EMCI JPEG196.npy,2
24,id_25,5000,9e-06,229.137054,160.55596,1422,7058,0.223681,0.154142,194.846507,...,3.337732,1.126345,113,107945,1708.374543,191200.2791,245265,1,/kaggle/working/test/Final MCI JPEG138.npy,4
13,id_14,1190,8e-06,348.475752,197.691413,1739,7297,0.189114,0.158889,273.083583,...,1.862117,3.123309,74,47020,1563.536368,224006.6311,96260,1,/kaggle/working/test/Final EMCI JPEG71.npy,2
84,id_85,3490,1.3e-05,142.639451,104.093729,1336,6508,0.215218,0.116111,123.36659,...,4.806361,3.335828,60,35920,1929.296214,94373.02807,59175,1,/kaggle/working/test/Final MCI JPEG142.npy,4
63,id_64,3845,1e-05,130.925954,232.722833,1976,8176,0.203339,0.167934,181.824394,...,3.268645,3.221813,103,53655,1611.441338,234715.4461,123765,1,/kaggle/working/test/Final LMCI JPEG34.npy,3
16,id_17,5655,1e-05,124.529036,120.054474,1336,6170,0.482081,0.131626,122.291755,...,4.001031,1.267222,129,92710,793.470338,120854.9382,209480,1,/kaggle/working/test/Final AD JPEG24.npy,0
71,id_72,2615,1.4e-05,637.692285,228.919963,2105,13749,0.268277,0.15577,433.306124,...,5.473425,3.674722,98,30115,1939.105761,101814.9131,75615,1,/kaggle/working/test/Final EMCI JPEG183.npy,2
48,id_49,2400,9e-06,174.058733,126.660105,1251,6197,0.163816,0.163628,150.359419,...,5.040778,2.499739,108,37860,1469.414025,168462.5506,91090,1,/kaggle/working/test/Final AD JPEG142.npy,0
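
The two cells above rely on the 100 sampled images containing at least as many matching scans as there are handwriting rows; if they do not, the `pd.Series(d, index=...)` assignment fails with a length mismatch. A hedged sketch of the same pairing step that samples exactly the number of images required. The function name `attach_mri` and the toy data are hypothetical, and class code `1` is assumed to mean CN, as in the image dataframe built earlier.

```python
import pandas as pd

def attach_mri(hw_rows, mri_df, healthy, seed=None):
    """Pair each handwriting row with one MRI path of a compatible class.

    healthy=True picks CN scans (Class == 1); otherwise any other class.
    """
    mask = (mri_df["Class"] == 1) if healthy else (mri_df["Class"] != 1)
    pool = mri_df[mask]
    if len(pool) < len(hw_rows):
        raise ValueError("not enough MRI images of the required class")
    picked = pool.sample(len(hw_rows), random_state=seed)
    out = hw_rows.copy()  # avoid writing into a slice of the original frame
    out["path"] = picked["Path"].to_numpy()
    out["mri-class"] = picked["Class"].to_numpy()
    return out

# Toy data standing in for the notebook's dataframes.
mri = pd.DataFrame({"Class": [0, 1, 1, 2, 4],
                    "Path": ["a.npy", "b.npy", "c.npy", "d.npy", "e.npy"]})
hw = pd.DataFrame({"ID": ["id_1", "id_2"], "class": [0, 0]}, index=[7, 3])

paired = attach_mri(hw, mri, healthy=True, seed=0)
print(paired)
```

Sampling directly from the filtered pool removes the dependence on how many images of each class happen to land in an intermediate sample.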


## Final test data

For illustration, we draw 7 random rows from the non-Alzheimer's dataframe and 23 from the Alzheimer's dataframe. Each row is drawn independently, so the same row may appear more than once. This is the final test data on which we make predictions.

In [13]:
s = [nonAlz, Alz]
x = []
for i in range(30):
    r = s[0] if i < 7 else s[1]
    x.append(r.sample(1))

# Prediction method

This method takes the handwriting features and the MRI image of one sample. It forms a pipeline: the XGBoost model first uses the handwriting data to check for the possibility of Alzheimer's, and only if it predicts the positive class does the CNN model run on the MRI data to give the class of Alzheimer's present in the patient. The metrics of both models are reported in the training files; since the models are applied here unchanged, we expect those metrics to carry over.

In [14]:
def predictor(data):
    # data is a one-row dataframe: handwriting features plus image path and labels
    inp = data.drop(["ID", "class", "path", "mri-class"], axis=1)
    has_alz = data["class"].values[0] == 1  # ground-truth handwriting label
    handwriting_prediction = handwriting_model.predict(inp)
    if handwriting_prediction[0] == 1:
        # Stage 2: classify the MRI only when the handwriting model is positive
        img = np.load(data["path"].values[0])
        img = np.expand_dims(img, axis=0)  # add the batch dimension
        prediction = np.argmax(model.predict(img), axis=1)[0]
        # No leading "you" here: the print below already supplies it
        actual_case = f"have alzheimer's, type {data['mri-class'].values[0]}" if has_alz else "dont have alzheimer's"
        print(f"Model predicted - You have alzheimer's, type {prediction}")
        print(f"Actual case - You {actual_case}")
    else:
        print("Model predicted - You dont have alzheimer's")
        print(f"Actual case - You {'have' if has_alz else 'dont have'} alzheimer's")
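
The control flow of `predictor` is a two-stage cascade: the handwriting classifier acts as a gate, and the CNN runs only on a positive result. A minimal sketch of that flow with stand-in callables instead of the real models, returning values rather than printing so that results can be collected. `cascade_predict` and both stub functions are hypothetical.

```python
def cascade_predict(features, image, screen, classify):
    """Run `classify` on the image only if `screen` flags the features.

    screen:   callable returning 1 (positive) or 0 (negative)
    classify: callable returning an integer class index
    Returns (screen_result, class_index or None).
    """
    flagged = int(screen(features))
    if flagged != 1:
        return 0, None  # second stage is skipped entirely
    return 1, int(classify(image))

# Stand-ins for the handwriting model and the CNN (not the trained models).
def stub_screen(features):
    return 1 if sum(features) > 1.0 else 0

def stub_classify(image):
    return 4

print(cascade_predict([0.2, 0.3], "scan.npy", stub_screen, stub_classify))  # (0, None)
print(cascade_predict([0.9, 0.8], "scan.npy", stub_screen, stub_classify))  # (1, 4)
```

One consequence of this design is that a false negative from the first stage can never be corrected by the second, so the recall of the screening model bounds the recall of the whole pipeline.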

## Predictions 

The final predictions and the actual classes are printed side by side for each sample so that the two can be compared directly.
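
Reading accuracy off printed pairs is error-prone, so it can help to tally the results. A sketch assuming each prediction has been collected as a `(predicted, actual)` pair, with `None` standing for "no Alzheimer's". The `results` list below is made-up illustration data, not output from the models.

```python
def accuracy(pairs):
    """Fraction of (predicted, actual) pairs that agree exactly."""
    if not pairs:
        raise ValueError("no predictions to score")
    hits = sum(1 for predicted, actual in pairs if predicted == actual)
    return hits / len(pairs)

# Illustrative pairs only: None = healthy, int = MRI class index.
results = [(None, None), (None, None), (0, 0), (4, 4), (1, 2)]

print(f"{accuracy(results):.2f}")  # 4 of 5 agree -> 0.80
```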

In [15]:
for j, i in enumerate(x):
    print(f"Sample {j}")
    predictor(i)
    print("---------------------------------------------------------")

Sample 0
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
---------------------------------------------------------
Sample 1
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
---------------------------------------------------------
Sample 2
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
---------------------------------------------------------
Sample 3
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
---------------------------------------------------------
Sample 4
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
---------------------------------------------------------
Sample 5
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
---------------------------------------------------------
Sample 6
Model predicted - You dont have alzheimer's
Actual case - You dont have alzheimer's
-


Model predicted - You have alzheimer's, type 0
Actual case - You have alzheimer's, type 0
---------------------------------------------------------
Sample 10
Model predicted - You have alzheimer's, type 4
Actual case - You have alzheimer's, type 4
---------------------------------------------------------
Sample 11
Model predicted - You have alzheimer's, type 3
Actual case - You have alzheimer's, type 3
---------------------------------------------------------
Sample 12
Model predicted - You have alzheimer's, type 1
Actual case - You have alzheimer's, type 2
---------------------------------------------------------
Sample 13
Model predicted - You have alzheimer's, type 4
Actual case - You have alzheimer'