# AI-Driven X-ray Diagnosis & Report Generator
### Objective: Detect chest abnormalities in X-rays and auto-generate diagnostic reports.
* Use EfficientNet or Vision Transformers (ViT) trained on ChestX-ray14 or MIMIC-CXR datasets.
* Implement Grad-CAM++ or SHAP for visual explainability.
* Integrate Gemini Pro / GPT-4V for radiology report generation using visual and prediction context.
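
The report-generation objective can be sketched without committing to a particular vision-language API. The helper below is a hypothetical illustration: the name `build_report_prompt`, the 0.5 threshold, and the prompt wording are all assumptions, not part of this notebook. It turns per-class probabilities from the classifier into a text prompt. Sending that prompt together with the image to Gemini Pro or GPT-4V is deliberately left out, since it depends on the client library in use.

```python
def build_report_prompt(probabilities, threshold=0.5, view_position="PA"):
    """Assemble a text prompt for a vision-language model from classifier output.

    probabilities: dict mapping finding name -> predicted probability (0..1).
    The threshold and the prompt wording are illustrative assumptions.
    """
    # Keep only findings at or above the threshold, most confident first.
    flagged = sorted(
        ((name, p) for name, p in probabilities.items() if p >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    if flagged:
        findings = "; ".join(f"{name} ({p:.0%})" for name, p in flagged)
    else:
        findings = "no finding above threshold"
    return (
        f"Chest X-ray, {view_position} view.\n"
        f"Classifier findings: {findings}.\n"
        "Write a concise radiology report (Findings, Impression) and state "
        "uncertainty where the classifier confidence is low."
    )


prompt = build_report_prompt({"Effusion": 0.82, "Cardiomegaly": 0.61, "Hernia": 0.03})
print(prompt)
```

Keeping the prompt builder separate from the API call makes it easy to inspect exactly what prediction context the language model receives.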

In [91]:
%pip install tensorflow
%pip install tensorflow_hub





[notice] A new release of pip is available: 24.3.1 -> 25.0.1
[notice] To update, run: python.exe -m pip install --upgrade pip


Collecting tensorflow_hub
Note: you may need to restart the kernel to use updated packages.

  Downloading tensorflow_hub-0.16.1-py2.py3-none-any.whl.metadata (1.3 kB)
Collecting tf-keras>=2.14.1 (from tensorflow_hub)
  Downloading tf_keras-2.19.0-py3-none-any.whl.metadata (1.8 kB)
Collecting tensorflow<2.20,>=2.19 (from tf-keras>=2.14.1->tensorflow_hub)
  Downloading tensorflow-2.19.0-cp311-cp311-win_amd64.whl.metadata (4.1 kB)
Collecting tensorboard~=2.19.0 (from tensorflow<2.20,>=2.19->tf-keras>=2.14.1->tensorflow_hub)
  Downloading tensorboard-2.19.0-py3-none-any.whl.metadata (1.8 kB)
Collecting ml-dtypes<1.0.0,>=0.5.1 (from tensorflow<2.20,>=2.19->tf-keras>=2.14.1->tensorflow_hub)
  Downloading ml_dtypes-0.5.1-cp311-cp311-win_amd64.whl.metadata (22 kB)
Downloading tensorflow_hub-0.16.1-py2.py3-none-any.whl (30 kB)
Downloading tf_keras-2.19.0-py3-none-any.whl (1.7 MB)

  You can safely remove it manually.
ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\Users\\subah\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\tensorflow\\compiler\\tf2tensorrt\\_pywrap_py_utils.pyd'
Consider using the `--user` option or check the permissions.
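
An `Access is denied` error on a `.pyd` file often means the file is locked by a running process (for example a kernel that has already imported TensorFlow) or that `site-packages` is not writable; which of these applies here is not clear from the log. Before reinstalling, it can help to check what the current kernel can already import. The sketch below uses only the standard library; the helper name `missing_packages` is an assumption made for illustration.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["tensorflow", "tensorflow_hub"]
missing = missing_packages(required)
if missing:
    # Print (not run) the install command for the interpreter behind this kernel;
    # --user follows pip's own suggestion in the error message above.
    print(f"{sys.executable} -m pip install --user {' '.join(missing)}")
else:
    print("All required packages are importable.")
```

If both packages are reported as importable, restarting the kernel may be all that is needed.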




In [93]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from glob import glob
import os

In [71]:


# Define the root directory where images are stored
image_root_dir = r"C:\Users\subah\Downloads\XRAYIMAGES"

# Load the CSV file
data = pd.read_csv('Data_Entry_2017.csv')

# Remove rows with implausible ages (100 or above)
data = data[data['Patient Age'] < 100]

# Collect image paths from every "images*" subfolder (e.g. images01, images03)
data_image_paths = {os.path.basename(x): x for x in 
                    glob(os.path.join(image_root_dir, 'images*', '*.png'))}

# Print number of images found
print('Scans found:', len(data_image_paths), ', Total Headers:', data.shape[0])

# Map the image paths to the dataframe
data['path'] = data['Image Index'].map(data_image_paths.get)

# Convert age to integer
data['Patient Age'] = data['Patient Age'].astype(int)

# Print sample rows
data.sample(3)

Scans found: 25066 , Total Headers: 112104


Unnamed: 0,Image Index,Finding Labels,Follow-up #,Patient ID,Patient Age,Patient Gender,View Position,OriginalImage[Width,Height],OriginalImagePixelSpacing[x,y],Unnamed: 11,path
23645,00006260_006.png,No Finding,6,6260,60,M,AP,3056,2544,0.139,0.139,,C:\Users\subah\Downloads\XRAYIMAGES\images03\0...
78000,00019150_018.png,No Finding,18,19150,67,M,PA,2992,2991,0.143,0.143,,
106822,00028832_000.png,No Finding,0,28832,27,M,PA,2021,2021,0.194311,0.194311,,
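
Only 25066 scans were found for 112104 CSV rows, so most rows end up with an empty `path`, as two of the three sampled rows show. The sketch below illustrates the same file-name-to-path lookup and a simple coverage count using only the standard library. The helper names `index_images` and `coverage` are assumptions for illustration, and the demo runs on a throwaway directory rather than the real image folder.

```python
import os
import tempfile
from glob import glob


def index_images(root_dir, pattern="images*"):
    """Map each PNG file name to its full path under root_dir/<pattern>/."""
    return {os.path.basename(p): p
            for p in glob(os.path.join(root_dir, pattern, "*.png"))}


def coverage(image_names, path_index):
    """Return (found, missing) counts for the file names listed in the CSV."""
    found = sum(1 for name in image_names if name in path_index)
    return found, len(image_names) - found


# Demo on a temporary directory; real use would pass image_root_dir
# and the values of data['Image Index'] instead.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "images01"))
    open(os.path.join(root, "images01", "00000001_000.png"), "wb").close()
    index = index_images(root)
    found, missing = coverage(["00000001_000.png", "00019150_018.png"], index)
    print(f"found={found}, missing={missing}")
```

Reporting the missing count up front makes it obvious how much of the CSV is dropped once rows without a path are filtered out.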


Number of X-ray images per combination of findings

In [72]:
data['Finding Labels'].value_counts()

Finding Labels
No Finding                                                         60353
Infiltration                                                        9546
Atelectasis                                                         4214
Effusion                                                            3955
Nodule                                                              2705
                                                                   ...  
Atelectasis|Consolidation|Edema|Effusion|Infiltration|Pneumonia        1
Atelectasis|Consolidation|Effusion|Emphysema|Mass|Pneumothorax         1
Cardiomegaly|Effusion|Pleural_Thickening|Pneumothorax                  1
Edema|Infiltration|Pneumothorax                                        1
Atelectasis|Consolidation|Mass|Pleural_Thickening|Pneumothorax         1
Name: count, Length: 836, dtype: int64
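
`value_counts()` on the raw column counts each combination of findings (836 distinct strings), not each individual finding. A per-finding tally requires splitting on `|` first. The following is a minimal standard-library sketch on a few made-up rows; the function name `count_findings` is hypothetical.

```python
from collections import Counter


def count_findings(label_strings, sep="|"):
    """Count each individual finding across pipe-separated label strings."""
    counts = Counter()
    for labels in label_strings:
        counts.update(part for part in labels.split(sep) if part)
    return counts


# Toy input in the same format as the 'Finding Labels' column.
sample = ["No Finding", "Infiltration", "Cardiomegaly|Effusion", "Effusion|Infiltration"]
per_finding = count_findings(sample)
print(per_finding.most_common())
```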

In [73]:
found_files_data = data[data['path'].notnull()]
found_files_data['Finding Labels'].value_counts()


Finding Labels
No Finding                                                    14405
Infiltration                                                   1663
Atelectasis                                                    1014
Effusion                                                        837
Nodule                                                          664
                                                              ...  
Effusion|Fibrosis|Nodule|Pleural_Thickening                       1
Edema|Effusion|Infiltration|Pleural_Thickening                    1
Consolidation|Infiltration|Pleural_Thickening|Pneumothorax        1
Effusion|Pleural_Thickening|Pneumonia                             1
Cardiomegaly|Infiltration|Nodule                                  1
Name: count, Length: 404, dtype: int64

In [74]:
# Total number of CSV rows whose image file was found on disk
found_files_data['Finding Labels'].value_counts().sum()

25065

In [75]:
data = data[data['Finding Labels'] != 'No Finding']

In [76]:
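# .copy() gives an independent frame, so later column assignments do not trigger SettingWithCopyWarning
found_files_data1 = data[data['path'].notnull()].copy()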
found_files_data1['Finding Labels'].value_counts()

Finding Labels
Infiltration                                                  1663
Atelectasis                                                   1014
Effusion                                                       837
Nodule                                                         664
Mass                                                           458
                                                              ... 
Effusion|Fibrosis|Nodule|Pleural_Thickening                      1
Edema|Effusion|Infiltration|Pleural_Thickening                   1
Consolidation|Infiltration|Pleural_Thickening|Pneumothorax       1
Effusion|Pleural_Thickening|Pneumonia                            1
Cardiomegaly|Infiltration|Nodule                                 1
Name: count, Length: 403, dtype: int64

In [77]:
found_files_data1['Finding Labels'].value_counts().sum()

10660

In [78]:
found_files_data1.head(10)

Unnamed: 0,Image Index,Finding Labels,Follow-up #,Patient ID,Patient Age,Patient Gender,View Position,OriginalImage[Width,Height],OriginalImagePixelSpacing[x,y],Unnamed: 11,path
0,00000001_000.png,Cardiomegaly,0,1,58,M,PA,2682,2749,0.143,0.143,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
1,00000001_001.png,Cardiomegaly|Emphysema,1,1,58,M,PA,2894,2729,0.143,0.143,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
2,00000001_002.png,Cardiomegaly|Effusion,2,1,58,M,PA,2500,2048,0.168,0.168,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
4,00000003_000.png,Hernia,0,3,81,F,PA,2582,2991,0.143,0.143,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
5,00000003_001.png,Hernia,1,3,74,F,PA,2500,2048,0.168,0.168,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
6,00000003_002.png,Hernia,2,3,75,F,PA,2048,2500,0.168,0.168,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
7,00000003_003.png,Hernia|Infiltration,3,3,76,F,PA,2698,2991,0.143,0.143,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
8,00000003_004.png,Hernia,4,3,77,F,PA,2500,2048,0.168,0.168,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
9,00000003_005.png,Hernia,5,3,78,F,PA,2686,2991,0.143,0.143,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...
10,00000003_006.png,Hernia,6,3,79,F,PA,2992,2991,0.143,0.143,,C:\Users\subah\Downloads\XRAYIMAGES\images01\0...


In [79]:
found_files_data1['Finding Labels'].unique()

array(['Cardiomegaly', 'Cardiomegaly|Emphysema', 'Cardiomegaly|Effusion',
       'Hernia', 'Hernia|Infiltration', 'Mass|Nodule', 'Infiltration',
       'Effusion|Infiltration', 'Nodule', 'Emphysema', 'Effusion',
       'Atelectasis', 'Effusion|Mass', 'Emphysema|Pneumothorax',
       'Pleural_Thickening',
       'Effusion|Emphysema|Infiltration|Pneumothorax',
       'Emphysema|Infiltration|Pleural_Thickening|Pneumothorax',
       'Effusion|Pneumonia|Pneumothorax', 'Pneumothorax',
       'Effusion|Infiltration|Pneumothorax', 'Infiltration|Mass',
       'Infiltration|Mass|Pneumothorax', 'Mass',
       'Cardiomegaly|Infiltration|Mass|Nodule',
       'Cardiomegaly|Effusion|Emphysema|Mass',
       'Atelectasis|Cardiomegaly|Emphysema|Mass|Pneumothorax',
       'Emphysema|Mass', 'Emphysema|Mass|Pneumothorax',
       'Atelectasis|Pneumothorax', 'Cardiomegaly|Emphysema|Pneumothorax',
       'Mass|Pleural_Thickening', 'Cardiomegaly|Mass|Pleural_Thickening',
       'Effusion|Infiltration|Nodule',


Because a single image can carry several findings, we explode the `Finding Labels` column so that each finding gets its own row. Each row then has exactly one label, which keeps the training target unambiguous.
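
To make the effect of exploding concrete, here is a small standard-library sketch of the same split-then-duplicate logic on two rows taken from the table above. The generator name `explode_labels` is an assumption for illustration; the notebook itself uses pandas for this step.

```python
def explode_labels(rows, sep="|"):
    """Yield one (image, finding) pair per finding in a pipe-separated label string."""
    for image, labels in rows:
        for finding in labels.split(sep):
            yield image, finding


rows = [
    ("00000001_001.png", "Cardiomegaly|Emphysema"),
    ("00000003_000.png", "Hernia"),
]
exploded = list(explode_labels(rows))
print(exploded)
```

Note that an image with two findings now appears in two rows, once per finding.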

In [80]:
# Replace any missing labels with an empty string so that the split
# in a later cell never encounters NaN values
print("Before: ",found_files_data1['Finding Labels'].isna().sum())  # Count NaN values
found_files_data1['Finding Labels'] = found_files_data1['Finding Labels'].fillna("")


Before:  0


In [81]:
print("After: ",found_files_data1['Finding Labels'].isna().sum())


After:  0


In [82]:
found_files_data1['Finding Labels'] = found_files_data1['Finding Labels'].astype(str).str.split("|")

In [83]:
df = found_files_data1.explode('Finding Labels')
df['Finding Labels'].unique()

array(['Cardiomegaly', 'Emphysema', 'Effusion', 'Hernia', 'Infiltration',
       'Mass', 'Nodule', 'Atelectasis', 'Pneumothorax',
       'Pleural_Thickening', 'Pneumonia', 'Fibrosis', 'Edema',
       'Consolidation'], dtype=object)
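
One side effect of exploding is that the same image appears once per finding. A commonly used alternative for multi-label data is a multi-hot target vector with one position per class. That is not what this notebook does; the sketch below only shows the encoding, using the 14 class names printed above, with `multi_hot` as a hypothetical helper name.

```python
# The 14 findings listed in the output above, in the same order.
CLASSES = [
    "Cardiomegaly", "Emphysema", "Effusion", "Hernia", "Infiltration",
    "Mass", "Nodule", "Atelectasis", "Pneumothorax",
    "Pleural_Thickening", "Pneumonia", "Fibrosis", "Edema",
    "Consolidation",
]


def multi_hot(label_string, classes=CLASSES, sep="|"):
    """Encode a pipe-separated label string as a 0/1 vector over `classes`."""
    present = set(label_string.split(sep))
    return [1 if name in present else 0 for name in classes]


vector = multi_hot("Cardiomegaly|Effusion")
print(vector)
```

With this encoding each image stays a single training example, and a label outside the class list (such as `No Finding`) maps to an all-zero vector.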

The dataset is imbalanced, as the per-class counts below show. A model trained on it as-is can become biased toward the majority classes, so we can use data augmentation to boost the minority classes.
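
As a rough way to size the augmentation, one can estimate how many augmented copies each original image would need for its class to reach the size of the largest class. The sketch below is an illustration only: `augmentation_plan` is a hypothetical helper, and the three counts are taken from the per-class output in this notebook.

```python
import math


def augmentation_plan(counts, target=None):
    """Return, per class, the number of augmented copies needed per original image
    (rounded up) for the class to reach `target` samples.
    `target` defaults to the size of the largest class."""
    target = target or max(counts.values())
    return {name: max(0, math.ceil(target / n) - 1) for name, n in counts.items()}


counts = {"Infiltration": 3625, "Pneumonia": 266, "Hernia": 68}
plan = augmentation_plan(counts)
print(plan)
```

A very large multiplier for a tiny class (Hernia here) suggests that augmentation alone may lead to overfitting on a handful of images, so it may be worth combining with loss weighting.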

In [84]:
df['Finding Labels'].value_counts()

Finding Labels
Infiltration          3625
Effusion              2476
Atelectasis           2461
Nodule                1351
Mass                  1024
Pneumothorax           935
Consolidation          889
Cardiomegaly           795
Pleural_Thickening     715
Emphysema              533
Fibrosis               514
Edema                  391
Pneumonia              266
Hernia                  68
Name: count, dtype: int64
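
Besides augmentation, these counts can be turned into per-class loss weights. The sketch below uses the inverse-frequency heuristic `total / (n_classes * count)`, which is the same formula scikit-learn applies for `class_weight="balanced"`. Whether such weights suit the model trained later is an assumption, and `inverse_frequency_weights` is a hypothetical helper name.

```python
def inverse_frequency_weights(counts):
    """Weight each class by total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Per-class counts copied from the value_counts() output above.
counts = {
    "Infiltration": 3625, "Effusion": 2476, "Atelectasis": 2461,
    "Nodule": 1351, "Mass": 1024, "Pneumothorax": 935,
    "Consolidation": 889, "Cardiomegaly": 795, "Pleural_Thickening": 715,
    "Emphysema": 533, "Fibrosis": 514, "Edema": 391,
    "Pneumonia": 266, "Hernia": 68,
}
weights = inverse_frequency_weights(counts)
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:<20}{weight:.2f}")
```

Rare classes such as Hernia receive the largest weights, so each of their samples contributes more to the loss than a sample from a frequent class.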