
# Nephros: Kidney Disease Detection

This notebook demonstrates the process of classifying kidney diseases using the Kidney CT Scan Dataset. It covers:

1. Dataset Preparation
2. Data Preprocessing
3. Model Development
4. Model Evaluation
5. Conclusions and Next Steps
    

## 1. Dataset Preparation

Run these first steps to download the dataset. Make sure you have already created a Kaggle API token (`kaggle.json`); the first cell uploads it to the Colab session.

In [None]:
from google.colab import files
files.upload()


Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"<redacted>","key":"<redacted>"}'}
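
The Kaggle CLI normally looks for the token in `~/.kaggle/kaggle.json`, so the uploaded file usually has to be moved there before the download command works. The helper below is a minimal sketch of that step; the function name `install_kaggle_credentials` and its arguments are my own, not part of the Kaggle API.

```python
import os
import shutil
import stat


def install_kaggle_credentials(src="kaggle.json", dest_dir=None):
    """Copy the uploaded token into the Kaggle config directory and restrict its permissions."""
    # Assumption: the Kaggle CLI reads its token from ~/.kaggle/kaggle.json by default.
    dest_dir = dest_dir or os.path.join(os.path.expanduser("~"), ".kaggle")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "kaggle.json")
    shutil.copy(src, dest)
    # Owner read/write only (0o600), so the key is not readable by other users.
    os.chmod(dest, stat.S_IRUSR | stat.S_IWUSR)
    return dest
```

Calling `install_kaggle_credentials()` right after the upload cell should be enough in Colab, since the uploaded file lands in the current working directory.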

In [None]:
!kaggle datasets download -d anima890/kidney-ct-scan


Dataset URL: https://www.kaggle.com/datasets/anima890/kidney-ct-scan
License(s): unknown
Downloading kidney-ct-scan.zip to /content
 98% 1.49G/1.52G [00:10<00:00, 139MB/s]
100% 1.52G/1.52G [00:10<00:00, 148MB/s]


In [None]:
!unzip -o kidney-ct-scan.zip -d kidney_ct_scan


Streaming output truncated to the last 5000 lines.
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (4363).jpg  
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (4364).jpg  
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (4365).jpg  
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (4366).jpg  
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (4367).jpg  
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/Normal/Normal- (4368).jpg  
  inflating: kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-

In [None]:
import os

# Path to the extracted dataset
dataset_path = '/content/kidney_ct_scan'
print(os.listdir(dataset_path))  # Check what files or folders exist


['kidneyData.csv', 'CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone']


In [None]:
csv_path = os.path.join(dataset_path, 'kidneyData.csv')  # Update with the actual name if needed
import pandas as pd

data = pd.read_csv(csv_path)
print(data.head())  # Preview the first few rows


   Unnamed: 0       image_id  \
0           0  Tumor- (1044)   
1           1    Tumor- (83)   
2           2   Tumor- (580)   
3           3  Tumor- (1701)   
4           4  Tumor- (1220)   

                                                path   diag  target  Class  
0  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
1  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
2  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
3  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
4  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  


## 2. Data Preprocessing

The next few steps inspect the data and set up the directories needed for the later steps.

In [None]:
import os
import pandas as pd
import shutil
from sklearn.model_selection import train_test_split

# Define the path to the dataset
dataset_csv_path = '/content/kidney_ct_scan/kidneyData.csv'  # Path to the CSV file
base_dir = '/content/data'  # Base directory for organized data

# Load the dataset
data = pd.read_csv(dataset_csv_path)

# Preview the data
print(data.head())


   Unnamed: 0       image_id  \
0           0  Tumor- (1044)   
1           1    Tumor- (83)   
2           2   Tumor- (580)   
3           3  Tumor- (1701)   
4           4  Tumor- (1220)   

                                                path   diag  target  Class  
0  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
1  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
2  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
3  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  
4  /content/data/CT KIDNEY DATASET Normal, CYST, ...  Tumor       3  Tumor  


In [None]:
# Update the path column to point to the correct directory
data['path'] = data['path'].str.replace('/content/data', '/content/kidney_ct_scan')

# Verify the updated paths
print(data['path'].head())


0    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
1    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
2    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
3    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
4    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
Name: path, dtype: object


In [None]:
import os

# List the contents of the main dataset directory
dataset_path = '/content/kidney_ct_scan'
print(os.listdir(dataset_path))  # Show the main folder contents


['kidneyData.csv', 'CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone']


In [None]:
for folder in os.listdir(dataset_path):
    folder_path = os.path.join(dataset_path, folder)
    if os.path.isdir(folder_path):
        print(f"Folder: {folder}")
        print(f"Sample Files: {os.listdir(folder_path)[:5]}")  # Show first 5 files


Folder: CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone
Sample Files: ['CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone']


In [None]:
subfolder_path = os.path.join(dataset_path, 'CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone')
print(os.listdir(subfolder_path)[:10])  # Show first 10 files


['CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone']


In [None]:
nested_folder_path = os.path.join(subfolder_path, 'CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone')
print(os.listdir(nested_folder_path)[:10])  # Show first 10 items in the nested folder


['Normal', 'Tumor', 'Stone', 'Cyst']


In [None]:
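# Update the paths to include the nested folder structure.
# Note: the earlier cell already rewrote the '/content/data' prefix, so this
# pattern no longer matches and the paths stay unchanged (see the output below).
data['path'] = data['path'].str.replace(
    '/content/data/CT KIDNEY DATASET Normal, CYST, TUMOR and STONE',
    '/content/kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone'
)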

# Verify updated paths
print(data['path'].head())


0    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
1    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
2    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
3    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
4    /content/kidney_ct_scan/CT KIDNEY DATASET Norm...
Name: path, dtype: object


In [None]:
import os

nested_folder_path = '/content/kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone'
for class_name in os.listdir(nested_folder_path):
    class_folder = os.path.join(nested_folder_path, class_name)
    if os.path.isdir(class_folder):
        print(f"Class: {class_name}")
        print(f"Sample files: {os.listdir(class_folder)[:5]}")  # Show first 5 files


Class: Normal
Sample files: ['Normal- (2874).jpg', 'Normal- (3812).jpg', 'Normal- (1640).jpg', 'Normal- (2553).jpg', 'Normal- (4579).jpg']
Class: Tumor
Sample files: ['Tumor- (204).jpg', 'Tumor- (1519).jpg', 'Tumor- (1077).jpg', 'Tumor- (1926).jpg', 'Tumor- (1241).jpg']
Class: Stone
Sample files: ['Stone- (1210).jpg', 'Stone- (378).jpg', 'Stone- (595).jpg', 'Stone- (120).jpg', 'Stone- (731).jpg']
Class: Cyst
Sample files: ['Cyst- (492).jpg', 'Cyst- (1214).jpg', 'Cyst- (1770).jpg', 'Cyst- (1044).jpg', 'Cyst- (2927).jpg']


Now that I know the images sit inside a nested folder, the next step is to make sure the data is clean, with no corrupt or missing images.

In [None]:
import pandas as pd
import os

# Path to the nested folder
nested_folder_path = '/content/kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone'

# Create a new DataFrame with correct paths
file_paths = []
labels = []

for class_name in os.listdir(nested_folder_path):
    class_folder = os.path.join(nested_folder_path, class_name)
    if os.path.isdir(class_folder):
        for file_name in os.listdir(class_folder):
            file_paths.append(os.path.join(class_folder, file_name))
            labels.append(class_name)  # Use the folder name as the label

# Create a DataFrame
data_cleaned = pd.DataFrame({'path': file_paths, 'Class': labels})

# Verify the new DataFrame
print(data_cleaned.head())
print(f"Number of files: {len(data_cleaned)}")


                                                path   Class
0  /content/kidney_ct_scan/CT-KIDNEY-DATASET-Norm...  Normal
1  /content/kidney_ct_scan/CT-KIDNEY-DATASET-Norm...  Normal
2  /content/kidney_ct_scan/CT-KIDNEY-DATASET-Norm...  Normal
3  /content/kidney_ct_scan/CT-KIDNEY-DATASET-Norm...  Normal
4  /content/kidney_ct_scan/CT-KIDNEY-DATASET-Norm...  Normal
Number of files: 12446


In [None]:
missing_files = []
for path in data_cleaned['path']:
    if not os.path.exists(path):
        missing_files.append(path)

print(f"Number of missing files: {len(missing_files)}")
print(f"Sample missing files: {missing_files[:5]}")


Number of missing files: 0
Sample missing files: []
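
The check above only confirms that every path exists. As a rough complement for the "corrupt" part, the sketch below flags files that are unreadable, empty, or do not begin with the JPEG start-of-image marker (`FF D8`). It is a lightweight sanity check rather than a full decode, and `find_suspect_jpegs` is a name I made up; opening each file with PIL and calling `Image.verify()` would be a more thorough option.

```python
JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with this start-of-image marker


def find_suspect_jpegs(paths):
    """Return paths that are unreadable, empty, or do not start with the JPEG marker."""
    suspects = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                header = f.read(2)
        except OSError:
            # Missing or unreadable file
            suspects.append(path)
            continue
        if header != JPEG_SOI:
            # Empty files yield b"" here and are flagged as well
            suspects.append(path)
    return suspects
```

In this notebook it could be called as `find_suspect_jpegs(data_cleaned['path'])`; an empty list means every file at least looks like a JPEG.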


In [None]:
dataset_path = '/content/kidney_ct_scan/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone/CT-KIDNEY-DATASET-Normal-Cyst-Tumor-Stone'


Because computational resources are limited for this version, I take a 20% subset of the data and split it into training, validation, and test sets (60/20/20).

In [None]:
from PIL import Image
import shutil
import random

def create_subset(input_dir, output_dir, fraction=0.2, target_size=(224, 224)):
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    for class_name in os.listdir(input_dir):
        class_dir = os.path.join(input_dir, class_name)
        if not os.path.isdir(class_dir):
            continue

        output_class_dir = os.path.join(output_dir, class_name)
        os.makedirs(output_class_dir, exist_ok=True)

        file_paths = [os.path.join(class_dir, f) for f in os.listdir(class_dir)]
        sampled_files = random.sample(file_paths, int(len(file_paths) * fraction))

        for file_path in sampled_files:
            with Image.open(file_path) as img:
                img = img.resize(target_size)
                img.save(os.path.join(output_class_dir, os.path.basename(file_path)))

# Create a subset of the data
create_subset(
    dataset_path,
    '/content/kidney_ct_scan_subset',
    fraction=0.2  # Use 20% of the data
)
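
One caveat with `create_subset` is that `random.sample` draws from the global random state and `os.listdir` returns files in arbitrary order, so a rerun may pick a different 20%. The sketch below shows one way to make the draw repeatable; `sample_fraction` and the seed value are illustrative choices of mine, not something the notebook uses.

```python
import random


def sample_fraction(items, fraction=0.2, seed=42):
    """Pick a reproducible random subset containing int(len(items) * fraction) items."""
    rng = random.Random(seed)   # local generator; the global random state is left untouched
    ordered = sorted(items)     # directory listings come in arbitrary order, so sort first
    return rng.sample(ordered, int(len(ordered) * fraction))
```

Inside `create_subset`, the `random.sample(file_paths, ...)` call could be swapped for `sample_fraction(file_paths, fraction)` to get the same subset on every run.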


In [None]:
import os
from sklearn.model_selection import train_test_split
import shutil

def split_dataset(input_dir, output_dir, train_frac=0.6, val_frac=0.2, test_frac=0.2):
    # Ensure output directories exist
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    train_dir = os.path.join(output_dir, 'train')
    val_dir = os.path.join(output_dir, 'val')
    test_dir = os.path.join(output_dir, 'test')

    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(val_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)

    # Process each class
    for class_name in os.listdir(input_dir):
        class_dir = os.path.join(input_dir, class_name)
        if not os.path.isdir(class_dir):
            continue

        # List all files in the class directory
        files = os.listdir(class_dir)

        # Split files into train and temp (val + test)
        train_files, temp_files = train_test_split(files, test_size=(1 - train_frac), random_state=42)

        # Further split temp into val and test
        val_files, test_files = train_test_split(
            temp_files,
            test_size=(test_frac / (val_frac + test_frac)),
            random_state=42
        )

        # Copy files to their respective directories
        for file_set, target_dir in zip([train_files, val_files, test_files], [train_dir, val_dir, test_dir]):
            class_output_dir = os.path.join(target_dir, class_name)
            os.makedirs(class_output_dir, exist_ok=True)
            for file_name in file_set:
                shutil.copy(os.path.join(class_dir, file_name), os.path.join(class_output_dir, file_name))

# Usage
split_dataset(
    '/content/kidney_ct_scan_subset',  # Input dataset path
    '/content/kidney_ct_scan_split',  # Output dataset path
    train_frac=0.6,                   # 60% training
    val_frac=0.2,                     # 20% validation
    test_frac=0.2                     # 20% test
)
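
The two-stage split works because the second call uses `test_size = test_frac / (val_frac + test_frac) = 0.2 / 0.4 = 0.5`, which halves the remaining 40% into validation and test. To confirm the result on disk, a small counting helper like the one below can be used; `count_images` is a hypothetical name, and it only assumes the `split/class/files` layout created above.

```python
import os


def count_images(split_root):
    """Return {split: {class_name: file_count}} for a train/val/test directory tree."""
    counts = {}
    for split in sorted(os.listdir(split_root)):
        split_dir = os.path.join(split_root, split)
        if not os.path.isdir(split_dir):
            continue
        counts[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if os.path.isdir(class_dir):
                counts[split][class_name] = len(os.listdir(class_dir))
    return counts
```

Running `count_images('/content/kidney_ct_scan_split')` and summing each split gives totals that can be compared with the image counts Keras reports when the generators are created.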


## 3. Model Development

In this section, I experiment with different techniques to improve the model. I first apply transfer learning with the pre-trained MobileNetV2 model. I then fine-tune it by freezing and unfreezing layers of the CNN, adjusting the learning rate, and varying the batch size during training.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0/255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

val_test_datagen = ImageDataGenerator(rescale=1.0/255)

train_generator = train_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

val_generator = val_test_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)


Found 1491 images belonging to 4 classes.
Found 497 images belonging to 4 classes.


In [None]:
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten

base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

model = Sequential([
    base_model,
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.4),
    Dense(4, activation='softmax')  # Adjust for the number of classes
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224_no_top.h5
9406464/9406464 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=50,
    callbacks=[early_stopping]
)


Epoch 1/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 344s 7s/step - accuracy: 0.5686 - loss: 1.5282 - val_accuracy: 0.2998 - val_loss: 37.5389
Epoch 2/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 382s 7s/step - accuracy: 0.7315 - loss: 0.7756 - val_accuracy: 0.3642 - val_loss: 27.9961
Epoch 3/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 332s 7s/step - accuracy: 0.8249 - loss: 0.5270 - val_accuracy: 0.5292 - val_loss: 9.0436
Epoch 4/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 389s 7s/step - accuracy: 0.8099 - loss: 0.6623 - val_accuracy: 0.5231 - val_loss: 17.0320
Epoch 5/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 399s 8s/step - accuracy: 0.8453 - loss: 0.4803 - val_accuracy: 0.6117 - val_loss: 8.8570
Epoch 6/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 357s 7s/step - accuracy: 0.8799 - loss: 0.3994 - val_accuracy: 0.6157 - val_loss: 6.8158
Epoch 7/50
47/47 ━

In [None]:
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np

test_generator = val_test_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/test',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    shuffle=False
)

predictions = model.predict(test_generator)
y_pred = np.argmax(predictions, axis=1)
y_true = test_generator.classes

print(classification_report(y_true, y_pred, target_names=test_generator.class_indices.keys()))
print(confusion_matrix(y_true, y_pred))


Found 499 images belonging to 4 classes.
16/16 ━━━━━━━━━━━━━━━━━━━━ 29s 1s/step
              precision    recall  f1-score   support

        Cyst       0.78      0.54      0.64       149
      Normal       0.59      1.00      0.74       203
       Stone       0.26      0.24      0.25        55
       Tumor       1.00      0.03      0.06        92

    accuracy                           0.60       499
   macro avg       0.66      0.45      0.42       499
weighted avg       0.69      0.60      0.53       499

[[ 80  32  37   0]
 [  0 203   0   0]
 [  9  33  13   0]
 [ 13  76   0   3]]
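
To see where the report's numbers come from, the per-class precision and recall can be recomputed directly from the confusion matrix above (rows are true classes, columns are predicted classes, in the order Cyst, Normal, Stone, Tumor). This is only a small worked check in plain Python; `per_class_metrics` is a helper name I introduced.

```python
labels = ["Cyst", "Normal", "Stone", "Tumor"]
cm = [
    [80, 32, 37, 0],
    [0, 203, 0, 0],
    [9, 33, 13, 0],
    [13, 76, 0, 3],
]


def per_class_metrics(cm, labels):
    """Return {label: (precision, recall)} from a square confusion matrix."""
    metrics = {}
    for i, name in enumerate(labels):
        true_positives = cm[i][i]
        actual = sum(cm[i])                     # row sum: images that truly belong to the class
        predicted = sum(row[i] for row in cm)   # column sum: images predicted as the class
        recall = true_positives / actual if actual else 0.0
        precision = true_positives / predicted if predicted else 0.0
        metrics[name] = (precision, recall)
    return metrics


for name, (precision, recall) in per_class_metrics(cm, labels).items():
    print(f"{name:>6}: precision={precision:.2f} recall={recall:.2f}")
```

The Tumor row makes the main failure visible: only 3 of 92 tumor scans are recognized, while 76 are predicted as Normal, which is why Normal has perfect recall but low precision.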


In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPooling2D, Flatten, Dropout

model = Sequential([
    Conv2D(32, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.3),  # Add dropout to the first convolutional block

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.4),  # Add dropout to the second convolutional block

    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),  # Add dropout before the final Dense layer
    Dense(4, activation='softmax')  # 4 classes
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


In [None]:
from tensorflow.keras.regularizers import l2

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    MaxPooling2D((2, 2)),
    Dropout(0.3),

    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.4),

    Flatten(),
    Dense(128, activation='relu', kernel_regularizer=l2(0.01)),  # Add L2 regularization
    Dropout(0.5),
    Dense(4, activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])


In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=5,  # Stop training if val_loss doesn't improve for 5 epochs
    restore_best_weights=True
)

history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=50,
    callbacks=[early_stopping]
)


Epoch 1/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 191s 4s/step - accuracy: 0.3567 - loss: 5.4755 - val_accuracy: 0.5855 - val_loss: 2.0772
Epoch 2/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 192s 4s/step - accuracy: 0.5648 - loss: 1.8157 - val_accuracy: 0.5654 - val_loss: 1.4822
Epoch 3/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 186s 4s/step - accuracy: 0.6053 - loss: 1.3790 - val_accuracy: 0.5674 - val_loss: 1.3540
Epoch 4/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 194s 4s/step - accuracy: 0.6264 - loss: 1.2413 - val_accuracy: 0.5855 - val_loss: 1.2159
Epoch 5/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 201s 4s/step - accuracy: 0.6169 - loss: 1.2391 - val_accuracy: 0.5795 - val_loss: 1.2722
Epoch 6/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 176s 4s/step - accuracy: 0.6140 - loss: 1.2423 - val_accuracy: 0.6016 - val_loss: 1.1831
Epoch 7/50
47/47 ━━━━

In [None]:
from sklearn.utils.class_weight import compute_class_weight
import numpy as np

# Compute class weights
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes
)
class_weights = dict(enumerate(class_weights))

# Apply class weights during training
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=50,
    class_weight=class_weights,  # Apply class weights here
    callbacks=[early_stopping]
)


Epoch 1/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 179s 4s/step - accuracy: 0.6139 - loss: 1.4357 - val_accuracy: 0.6459 - val_loss: 1.2921
Epoch 2/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 204s 4s/step - accuracy: 0.6092 - loss: 1.4028 - val_accuracy: 0.6137 - val_loss: 1.2489
Epoch 3/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 202s 4s/step - accuracy: 0.6100 - loss: 1.3181 - val_accuracy: 0.6801 - val_loss: 1.1461
Epoch 4/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 200s 4s/step - accuracy: 0.6164 - loss: 1.2927 - val_accuracy: 0.6720 - val_loss: 1.1638
Epoch 5/50
47/47 ━━━━━━━━━━━━━━━━━━━━ 203s 4s/step - accuracy: 0.6264 - loss: 1.3394 - val_accuracy: 0.6781 - val_loss: 1.1776

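With `class_weight='balanced'`, scikit-learn weights each class by `n_samples / (n_classes * class_count)`, so rarer classes contribute more to the loss. The sketch below reproduces that formula in plain Python to make the effect concrete. The label counts are illustrative only: they reuse the test-set supports from the classification report (Cyst 149, Normal 203, Stone 55, Tumor 92), since the per-class training counts are not printed in this notebook.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}


# Illustrative labels only (assumption): test-set supports, not the real training counts.
demo_labels = [0] * 149 + [1] * 203 + [2] * 55 + [3] * 92
weights = balanced_class_weights(demo_labels)
print(weights)
```

With these counts the smallest class (Stone, index 2) gets the largest weight and the largest class (Normal, index 1) the smallest, which is the behavior the training cell relies on.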

In [None]:
# Reinitialize train and validation generators with smaller batch size
train_generator = train_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/train',
    target_size=(224, 224),
    batch_size=16,  # Reduced batch size
    class_mode='categorical'
)

val_generator = val_test_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/val',
    target_size=(224, 224),
    batch_size=16,  # Match the batch size for validation
    class_mode='categorical'
)

# Retrain the model
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=50,
    class_weight=class_weights,  # Retain class weights
    callbacks=[early_stopping]
)


Found 1491 images belonging to 4 classes.
Found 497 images belonging to 4 classes.
Epoch 1/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 203s 2s/step - accuracy: 0.6216 - loss: 1.4020 - val_accuracy: 0.6378 - val_loss: 1.2704
Epoch 2/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 264s 2s/step - accuracy: 0.6219 - loss: 1.3724 - val_accuracy: 0.6559 - val_loss: 1.2540
Epoch 3/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 267s 2s/step - accuracy: 0.6034 - loss: 1.3803 - val_accuracy: 0.6318 - val_loss: 1.3018
Epoch 4/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 202s 2s/step - accuracy: 0.5858 - loss: 1.3729 - val_accuracy: 0.5734 - val_loss: 1.3145
Epoch 5/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 202s 2s/step - accuracy: 0.5651 - loss: 1.3776 - val_accuracy: 0.6459 - val_loss: 1.2020


In [None]:
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Use a learning rate scheduler to reduce the learning rate on a plateau
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,  # Reduce learning rate by half
    patience=3,  # Wait 3 epochs before reducing
    min_lr=1e-6  # Minimum learning rate
)

# Recompile the model with an initial lower learning rate
model.compile(
    optimizer=Adam(learning_rate=1e-4),  # Start with a lower learning rate
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Retrain the model
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=50,
    class_weight=class_weights,
    callbacks=[early_stopping, reduce_lr]
)


Epoch 1/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 206s 2s/step - accuracy: 0.6086 - loss: 1.3503 - val_accuracy: 0.6459 - val_loss: 1.1619 - learning_rate: 1.0000e-04
Epoch 2/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 202s 2s/step - accuracy: 0.6165 - loss: 1.3390 - val_accuracy: 0.6439 - val_loss: 1.1311 - learning_rate: 1.0000e-04
Epoch 3/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 204s 2s/step - accuracy: 0.6462 - loss: 1.2619 - val_accuracy: 0.6338 - val_loss: 1.1025 - learning_rate: 1.0000e-04
Epoch 4/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 268s 2s/step - accuracy: 0.6607 - loss: 1.2102 - val_accuracy: 0.6559 - val_loss: 1.0391 - learning_rate: 1.0000e-04
Epoch 5/50
94/94 ━━━━━━━━━━━━━━━━━━━━ 206s 2s/step - accuracy: 0.6408 - loss: 1.2103 - val_accuracy: 0.6439 - val_loss: 1.1316 - learning_rate: 1.0000e-04
Epoch 6/50
94/94 ━━━━━━━━━━━━━━━━━━━━

In [None]:
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.optimizers import Adam

# Load MobileNetV2 with pretrained weights
base_model = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze all layers in the base model initially
base_model.trainable = False

# Add custom layers on top
model = Sequential([
    base_model,
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(4, activation='softmax')  # 4 output classes
])

# Compile the model
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])


In [None]:
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=10,  # Train for a few epochs with frozen base layers
    class_weight=class_weights,
    callbacks=[early_stopping]
)


Epoch 1/10
94/94 ━━━━━━━━━━━━━━━━━━━━ 120s 1s/step - accuracy: 0.3480 - loss: 2.5266 - val_accuracy: 0.6056 - val_loss: 0.9933
Epoch 2/10
94/94 ━━━━━━━━━━━━━━━━━━━━ 100s 1s/step - accuracy: 0.4996 - loss: 1.2011 - val_accuracy: 0.7243 - val_loss: 0.8214
Epoch 3/10
94/94 ━━━━━━━━━━━━━━━━━━━━ 142s 1s/step - accuracy: 0.5178 - loss: 1.0681 - val_accuracy: 0.6197 - val_loss: 0.9104
Epoch 4/10
94/94 ━━━━━━━━━━━━━━━━━━━━ 142s 1s/step - accuracy: 0.5753 - loss: 0.9569 - val_accuracy: 0.6700 - val_loss: 0.7786
Epoch 5/10
94/94 ━━━━━━━━━━━━━━━━━━━━ 97s 1s/step - accuracy: 0.6065 - loss: 0.9025 - val_accuracy: 0.7505 - val_loss: 0.7126
Epoch 6/10
94/94 ━━━━━━━━━━━━━━━━━━━━ 96s 1s/step - accuracy: 0.6310 - loss: 0.8630 - val_accuracy: 0.7404 - val_loss: 0.6532
Epoch 7/10
94/94 ━━━━━━

In [None]:
# Unfreeze the last few layers of the base model
base_model.trainable = True
for layer in base_model.layers[:-50]:  # Freeze all layers except the last 50
    layer.trainable = False
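
The slicing pattern above (`layers[:-50]`) freezes everything except the last 50 layers. The sketch below wraps the same idea in a small helper and runs it on stand-in objects rather than real Keras layers, so the names `freeze_all_but_last` and `demo_layers` are purely illustrative. One detail it handles explicitly: with a count of 0, the slice `layers[:-0]` is empty, so the loop in the cell above would freeze nothing instead of everything.

```python
from types import SimpleNamespace


def freeze_all_but_last(layers, n_trainable):
    """Mark the last n_trainable layers trainable, freeze the rest, return the trainable count."""
    cutoff = max(len(layers) - n_trainable, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff
    return sum(1 for layer in layers if layer.trainable)


# Stand-in objects with a `trainable` attribute (hypothetical, not real Keras layers)
demo_layers = [SimpleNamespace(name=f"layer_{i}", trainable=False) for i in range(8)]
print(freeze_all_but_last(demo_layers, 3))
```

With a real model the call would look like `freeze_all_but_last(base_model.layers, 50)`, followed by `model.compile(...)` so that the change in trainable layers takes effect.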


In [None]:
model.compile(optimizer=Adam(learning_rate=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])


In [None]:
history_fine_tune = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=20,  # Fine-tune for more epochs
    class_weight=class_weights,
    callbacks=[early_stopping, reduce_lr]  # Use early stopping and learning rate scheduler
)


Epoch 1/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 158s 1s/step - accuracy: 0.5276 - loss: 1.9134 - val_accuracy: 0.7787 - val_loss: 0.5453 - learning_rate: 1.0000e-05
Epoch 2/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 134s 1s/step - accuracy: 0.6271 - loss: 0.9632 - val_accuracy: 0.7324 - val_loss: 0.6372 - learning_rate: 1.0000e-05
Epoch 3/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 133s 1s/step - accuracy: 0.6711 - loss: 0.7757 - val_accuracy: 0.7485 - val_loss: 0.6195 - learning_rate: 1.0000e-05
Epoch 4/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 136s 1s/step - accuracy: 0.6901 - loss: 0.7635 - val_accuracy: 0.7586 - val_loss: 0.5501 - learning_rate: 1.0000e-05
Epoch 5/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 143s 1s/step - accuracy: 0.6894 - loss: 0.7141 - val_accuracy: 0.7746 - val_loss: 0.5354 - learning_rate: 5.0000e-06


In [None]:
# Evaluate the fine-tuned model
loss, accuracy = model.evaluate(test_generator)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")


16/16 ━━━━━━━━━━━━━━━━━━━━ 22s 1s/step - accuracy: 0.8104 - loss: 0.4673
Test Loss: 0.5305715203285217, Test Accuracy: 0.8116232752799988


In [None]:
# Unfreeze more layers in the base model
base_model.trainable = True
for layer in base_model.layers[:-100]:  # Freeze all layers except the last 100
    layer.trainable = False

# Recompile with an even smaller learning rate
model.compile(optimizer=Adam(learning_rate=1e-6), loss='categorical_crossentropy', metrics=['accuracy'])

# Retrain the model
history_fine_tune_more = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=20,
    class_weight=class_weights,
    callbacks=[early_stopping, reduce_lr]
)


Epoch 1/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 153s 1s/step - accuracy: 0.6171 - loss: 1.0209 - val_accuracy: 0.7807 - val_loss: 0.5392 - learning_rate: 1.0000e-06
Epoch 2/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 140s 1s/step - accuracy: 0.6529 - loss: 0.8604 - val_accuracy: 0.7847 - val_loss: 0.5373 - learning_rate: 1.0000e-06
Epoch 3/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 148s 1s/step - accuracy: 0.6512 - loss: 0.8632 - val_accuracy: 0.7907 - val_loss: 0.5327 - learning_rate: 1.0000e-06
Epoch 4/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 137s 1s/step - accuracy: 0.6512 - loss: 0.8482 - val_accuracy: 0.7847 - val_loss: 0.5274 - learning_rate: 1.0000e-06
Epoch 5/20
94/94 ━━━━━━━━━━━━━━━━━━━━ 142s 1s/step - accuracy: 0.6459 - loss: 0.9144 - val_accuracy: 0.7887 - val_loss: 0.5292 - learning_rate: 1.0000e-06


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Recreate the ImageDataGenerator
test_datagen = ImageDataGenerator(rescale=1.0/255.0)

# Path to your test dataset directory
test_dir = '/content/kidney_ct_scan_split/test'

# Reinitialize test_generator
test_generator = test_datagen.flow_from_directory(
    test_dir,
    target_size=(224, 224),  # Same size used during training
    batch_size=16,          # Use the batch size you had before
    class_mode='categorical',
    shuffle=False           # Important for evaluation
)


Found 499 images belonging to 4 classes.


In [None]:
loss, accuracy = model.evaluate(test_generator)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")


32/32 ━━━━━━━━━━━━━━━━━━━━ 5s 40ms/step - accuracy: 0.8084 - loss: 0.4452
Test Loss: 0.5066458582878113, Test Accuracy: 0.797595202922821


In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255.0,
    rotation_range=40,
    width_shift_range=0.3,
    height_shift_range=0.3,
    shear_range=0.3,
    zoom_range=0.3,
    horizontal_flip=True,
    brightness_range=[0.8, 1.2],
    fill_mode='nearest'
)

train_generator = train_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)


Found 1491 images belonging to 4 classes.


In [None]:
val_datagen = ImageDataGenerator(rescale=1.0 / 255.0)
val_generator = val_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

test_datagen = ImageDataGenerator(rescale=1.0 / 255.0)
test_generator = test_datagen.flow_from_directory(
    '/content/kidney_ct_scan_split/test',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    shuffle=False
)


Found 497 images belonging to 4 classes.
Found 499 images belonging to 4 classes.


In [None]:
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Early Stopping
early_stopping = EarlyStopping(
    monitor='val_loss',  # Monitor validation loss
    patience=5,          # Stop training after 5 epochs of no improvement
    restore_best_weights=True  # Restore weights from the best epoch
)

# Learning Rate Reduction
reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',  # Monitor validation loss
    factor=0.1,          # Reduce learning rate by a factor of 10
    patience=3,          # Wait 3 epochs of no improvement before reducing
    min_lr=1e-7,         # Set a minimum learning rate
    verbose=1
)
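
To make the learning-rate schedule concrete, here is a minimal pure-Python sketch of the plateau rule that `ReduceLROnPlateau` applies with `factor=0.1`, `patience=3`, and `min_lr=1e-7`. It is a simplified illustration, not the Keras implementation: the helper name `simulate_reduce_lr`, the starting learning rate, and the validation losses are hypothetical, and `min_delta=1e-4` is assumed to match the callback's default.

```python
def simulate_reduce_lr(val_losses, initial_lr, factor=0.1, patience=3,
                       min_lr=1e-7, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified sketch)."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Validation loss improved by more than min_delta: reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but never below min_lr.
                if lr > min_lr:
                    lr = max(lr * factor, min_lr)
                wait = 0
        schedule.append(lr)
    return schedule


# Hypothetical validation losses: one improvement, then a long plateau.
losses = [0.60, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55, 0.55]
schedule = simulate_reduce_lr(losses, initial_lr=1e-4)
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch}: lr={lr:.1e}")
```

In this sketch the rate drops by a factor of 10 after every three consecutive epochs without improvement, and `min_lr` acts as a floor. Early stopping follows the same counting idea with `patience=5`, except that it ends training instead of lowering the rate.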


In [None]:
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=20,
    callbacks=[early_stopping, reduce_lr]
)


Epoch 1/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 28s 365ms/step - accuracy: 0.6136 - loss: 0.9929 - val_accuracy: 0.7847 - val_loss: 0.5527 - learning_rate: 1.0000e-06
Epoch 2/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 349ms/step - accuracy: 0.5388 - loss: 1.1154 - val_accuracy: 0.8008 - val_loss: 0.5429 - learning_rate: 1.0000e-06
Epoch 3/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 346ms/step - accuracy: 0.5687 - loss: 1.1009 - val_accuracy: 0.8008 - val_loss: 0.5359 - learning_rate: 1.0000e-06
Epoch 4/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 347ms/step - accuracy: 0.5837 - loss: 1.0354 - val_accuracy: 0.8048 - val_loss: 0.5347 - learning_rate: 1.0000e-06
Epoch 5/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 349ms/step - accuracy: 0.6314 - loss: 0.9461 - val_accuracy: 0.8008 - val_loss: 0.5318 - learning_rate: 1.0000e-06
Epoch 6/20
47/47 ━━━━━━━━━━━━━━━

In [None]:
# Training generator
train_generator = train_datagen.flow_from_directory(
    'kidney_ct_scan_split/train',  # Update with your training dataset path
    target_size=(224, 224),
    batch_size=32,  # Same batch size as in the previous run
    class_mode='categorical'
)

# Validation generator
val_generator = val_datagen.flow_from_directory(
    'kidney_ct_scan_split/val',  # Update with your validation dataset path
    target_size=(224, 224),
    batch_size=32,  # Match batch size to training
    class_mode='categorical'
)

# Test generator
test_generator = test_datagen.flow_from_directory(
    'kidney_ct_scan_split/test',  # Update with your test dataset path
    target_size=(224, 224),
    batch_size=32,  # Match batch size to training
    class_mode='categorical',
    shuffle=False  # Keep file order fixed, as in the earlier test generator
)


Found 1491 images belonging to 4 classes.
Found 497 images belonging to 4 classes.
Found 499 images belonging to 4 classes.


In [None]:
history = model.fit(
    train_generator,
    validation_data=val_generator,
    epochs=20,  # Adjust as needed
    callbacks=[early_stopping, reduce_lr]  # Include your existing callbacks
)


Epoch 1/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 349ms/step - accuracy: 0.6264 - loss: 0.9080 - val_accuracy: 0.7988 - val_loss: 0.5106 - learning_rate: 1.0000e-06
Epoch 2/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 345ms/step - accuracy: 0.6290 - loss: 0.8667 - val_accuracy: 0.8008 - val_loss: 0.5102 - learning_rate: 1.0000e-06
Epoch 3/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 349ms/step - accuracy: 0.6546 - loss: 0.8828 - val_accuracy: 0.7988 - val_loss: 0.5102 - learning_rate: 1.0000e-06
Epoch 4/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 343ms/step - accuracy: 0.6514 - loss: 0.8484 - val_accuracy: 0.7988 - val_loss: 0.5095 - learning_rate: 1.0000e-06
Epoch 5/20
47/47 ━━━━━━━━━━━━━━━━━━━━ 19s 344ms/step - accuracy: 0.6288 - loss: 0.8807 - val_accuracy: 0.8028 - val_loss: 0.5086 - learning_rate: 1.0000e-06
Epoch 6/20
47/47 ━━━━━━━━━━━━━━━

In [None]:
loss, accuracy = model.evaluate(test_generator)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")


16/16 ━━━━━━━━━━━━━━━━━━━━ 1s 54ms/step - accuracy: 0.8733 - loss: 0.4248
Test Loss: 0.4314420819282532, Test Accuracy: 0.8496993780136108


In [None]:
from tensorflow.keras.callbacks import ReduceLROnPlateau

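# Same plateau rule as before, but with a higher floor for the learning rate (1e-6 instead of 1e-7)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3, min_lr=1e-6)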


In [None]:
# Training generator
train_generator = train_datagen.flow_from_directory(
    'kidney_ct_scan_split/train',  # Update with your training dataset path
    target_size=(224, 224),
    batch_size=64,  # Increased batch size
    class_mode='categorical'
)

# Validation generator
val_generator = val_datagen.flow_from_directory(
    'kidney_ct_scan_split/val',  # Update with your validation dataset path
    target_size=(224, 224),
    batch_size=64,  # Match batch size to training
    class_mode='categorical'
)

# Test generator
test_generator = test_datagen.flow_from_directory(
    'kidney_ct_scan_split/test',  # Update with your test dataset path
    target_size=(224, 224),
    batch_size=64,  # Match batch size to training
    class_mode='categorical',
    shuffle=False  # Keep file order fixed, as in the earlier test generator
)

Found 1491 images belonging to 4 classes.
Found 497 images belonging to 4 classes.
Found 499 images belonging to 4 classes.
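
The step counts in the training and evaluation logs (`47/47` with a batch size of 32, `24/24` and `8/8` with a batch size of 64) follow directly from the image counts above. A small sketch of that arithmetic, where `steps_per_epoch` is a hypothetical helper name and not part of Keras:

```python
import math


def steps_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches needed to see every image once (the last batch may be partial)."""
    return math.ceil(num_images / batch_size)


# Image counts reported by flow_from_directory in this notebook.
split_sizes = {"train": 1491, "val": 497, "test": 499}

for batch_size in (32, 64):
    steps = {name: steps_per_epoch(n, batch_size) for name, n in split_sizes.items()}
    print(f"batch_size={batch_size}: {steps}")
```

Doubling the batch size roughly halves the number of weight updates per epoch, which is one reason the learning rate is usually revisited when the batch size changes.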


In [None]:
from tensorflow.keras.optimizers import Adam

model.compile(optimizer=Adam(learning_rate=1e-4),  # Start with a higher learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])


In [None]:
from tensorflow.keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(monitor='val_loss',
                               patience=5,
                               restore_best_weights=True)

history = model.fit(train_generator,
                    validation_data=val_generator,
                    epochs=20,
                    callbacks=[reduce_lr, early_stopping])


Epoch 1/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 58s 1s/step - accuracy: 0.6795 - loss: 0.8249 - val_accuracy: 0.7928 - val_loss: 0.5880 - learning_rate: 1.0000e-04
Epoch 2/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 19s 567ms/step - accuracy: 0.7336 - loss: 0.7167 - val_accuracy: 0.8189 - val_loss: 0.4629 - learning_rate: 1.0000e-04
Epoch 3/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 19s 591ms/step - accuracy: 0.7761 - loss: 0.5866 - val_accuracy: 0.8551 - val_loss: 0.3750 - learning_rate: 1.0000e-04
Epoch 4/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 19s 578ms/step - accuracy: 0.7893 - loss: 0.5377 - val_accuracy: 0.8672 - val_loss: 0.3982 - learning_rate: 1.0000e-04
Epoch 5/20
24/24 ━━━━━━━━━━━━━━━━━━━━ 19s 563ms/step - accuracy: 0.8344 - loss: 0.4394 - val_accuracy: 0.8833 - val_loss: 0.3378 - learning_rate: 1.0000e-04
Epoch 6/20
24/24 ━━━━━━━━━━━━━━━━━━

## 4. Model Evaluation

I now evaluate the model on the held-out test set (499 images). It reaches a test loss of about 0.22 and an accuracy of about 92.8%.

In [None]:
loss, accuracy = model.evaluate(test_generator)
print(f"Test Loss: {loss}, Test Accuracy: {accuracy}")


8/8 ━━━━━━━━━━━━━━━━━━━━ 1s 88ms/step - accuracy: 0.9385 - loss: 0.1872
Test Loss: 0.22191764414310455, Test Accuracy: 0.9278557300567627
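
Overall accuracy can hide weak classes, so a per-class breakdown is a natural follow-up. The sketch below builds a confusion matrix and per-class recall in plain Python. The label lists are hypothetical and are not results from this notebook. In the notebook itself, the true labels would presumably come from `test_generator.classes` and the predictions from the argmax of `model.predict(test_generator)`, which only line up if the test generator was created with `shuffle=False`. The helper names `confusion_matrix` and `per_class_recall` are my own.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class that was predicted correctly."""
    recalls = []
    for class_index, row in enumerate(matrix):
        total = sum(row)
        recalls.append(row[class_index] / total if total else 0.0)
    return recalls


# Hypothetical labels for four classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=4)
recalls = per_class_recall(cm)
for row in cm:
    print(row)
print("Per-class recall:", recalls)
```

For a medical classifier, recall on the disease classes is often more informative than overall accuracy, because a missed tumor or stone is costlier than a false alarm.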


Here, I save the trained model in the `.keras` format and then download it from Colab.

In [None]:
import os

# Define the directory where you want to save the model
save_dir = '/content/saved_model'
os.makedirs(save_dir, exist_ok=True)  # Create the directory if it doesn't exist

# Save the model with the .keras extension
model.save(os.path.join(save_dir, 'my_model.keras'))




In [None]:
from google.colab import files

# Download the saved model
files.download('/content/saved_model/my_model.keras')

