# Detecting COVID-19 with Chest X-Rays using PyTorch

Image classification of chest X-rays into one of three classes: Normal, Viral Pneumonia, COVID-19

The dataset is the [COVID-19 Radiography Dataset](https://www.kaggle.com/tawsifurrahman/covid19-radiography-database) from Kaggle.

For instructions on downloading a Kaggle dataset into Google Colab, see [https://www.kaggle.com/general/74235](https://www.kaggle.com/general/74235).

See also the Kaggle API documentation: [https://www.kaggle.com/docs/api](https://www.kaggle.com/docs/api)


In [None]:
! pip install -q kaggle

In [None]:
from google.colab import files

In [None]:
files.upload()

Saving kaggle.json to kaggle.json


{'kaggle.json': b'{"username":"<redacted>","key":"<redacted>"}'}

In [None]:
! mkdir -p ~/.kaggle

In [None]:
! cp kaggle.json ~/.kaggle/

In [None]:
! chmod 600 ~/.kaggle/kaggle.json
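
The three shell steps above (create the config folder, copy the token, restrict its permissions) can also be sketched in plain Python. This is a minimal sketch rather than part of the Kaggle tooling: the helper name `install_kaggle_token` and its parameters are hypothetical and chosen only for illustration.

```python
import os
import shutil
import stat


def install_kaggle_token(token_path, config_dir):
    """Copy a Kaggle API token into config_dir and make it owner-only.

    Hypothetical helper: mirrors `mkdir`, `cp` and `chmod 600` from the cells above.
    """
    os.makedirs(config_dir, exist_ok=True)            # like `mkdir -p`
    target = os.path.join(config_dir, "kaggle.json")
    shutil.copyfile(token_path, target)                # like `cp`
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)      # like `chmod 600`
    return target
```

In Colab it could be called as `install_kaggle_token("kaggle.json", os.path.expanduser("~/.kaggle"))` once the token has been uploaded.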

In [None]:
! kaggle datasets list -s 'COVID'

ref                                                title                                                size  lastUpdated          downloadCount  
-------------------------------------------------  --------------------------------------------------  -----  -------------------  -------------  
imdevskp/corona-virus-report                       COVID-19 Dataset                                     19MB  2020-08-07 03:47:47          75976  
sudalairajkumar/covid19-in-india                   COVID-19 in India                                   147KB  2020-10-07 08:19:31          79812  
roche-data-science-coalition/uncover               UNCOVER COVID-19 Challenge                          256MB  2020-08-10 14:53:45          18331  
andrewmvd/covid19-ct-scans                         COVID-19 CT scans                                     1GB  2020-04-23 12:29:33           4785  
kimjihoo/coronavirusdataset                        Data Science for COVID-19 (DS4C)                      7MB  2020-07-

In [None]:
! kaggle datasets download -d 'tawsifurrahman/covid19-radiography-database'

Downloading covid19-radiography-database.zip to /content
100% 1.15G/1.15G [00:18<00:00, 56.3MB/s]
100% 1.15G/1.15G [00:18<00:00, 67.2MB/s]


In [None]:
! ls

covid19-radiography-database.zip  dataset  drive  kaggle.json  sample_data


In [None]:
! mkdir -p dataset

In [None]:
! unzip covid19-radiography-database.zip -d dataset

Archive:  covid19-radiography-database.zip
  inflating: dataset/COVID-19 Radiography Database/COVID-19.metadata.xlsx  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (1).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (10).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (100).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (101).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (102).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (103).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (104).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (105).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (106).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (107).png  
  inflating: dataset/COVID-19 Radiography Database/COVID-19/COVID-19 (108)
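
The extraction step can also be done from Python with the standard-library `zipfile` module, which avoids printing one line per extracted image. The function name `extract_archive` is an assumption made for this sketch.

```python
import os
import zipfile


def extract_archive(zip_path, target_dir):
    """Extract every member of zip_path into target_dir; return the member names."""
    os.makedirs(target_dir, exist_ok=True)  # create the target folder if needed
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

With the files from this notebook, the call would look like `extract_archive("covid19-radiography-database.zip", "dataset")`.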

In [None]:
!mkdir -p drive/My\ Drive/dataset/

In [None]:
!cp -r dataset drive/My\ Drive/

# Importing Libraries

In [None]:
import os
import shutil
import random

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
root_dir = 'drive/My Drive/dataset/COVID-19 Radiography Database'
source_dirs = ['NORMAL', 'Viral Pneumonia', 'COVID-19']
class_names = ['normal', 'viral', 'covid']

In [None]:
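# Rename the original folders to the short class names.
# The check makes the cell safe to re-run after the folders have been renamed.
if os.path.isdir(os.path.join(root_dir, source_dirs[1])):
    for i, d in enumerate(source_dirs):
        os.rename(os.path.join(root_dir, d), os.path.join(root_dir, class_names[i]))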
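
The cell above checks only one of the three folders before renaming all of them. A slightly more defensive variant, sketched below, checks each folder on its own so that a partially renamed dataset does not raise an error. The function name `rename_class_dirs` is hypothetical.

```python
import os


def rename_class_dirs(root_dir, source_dirs, class_names):
    """Rename each source folder to its short class name.

    Folders that are missing, or whose target already exists, are skipped.
    Returns the list of class names that were actually renamed.
    """
    renamed = []
    for src_name, dst_name in zip(source_dirs, class_names):
        src = os.path.join(root_dir, src_name)
        dst = os.path.join(root_dir, dst_name)
        if os.path.isdir(src) and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append(dst_name)
    return renamed
```

Calling `rename_class_dirs(root_dir, source_dirs, class_names)` a second time returns an empty list instead of failing.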

# Preparing Training, Dev and Test Sets

In [None]:
# Recreate empty 'dev' and 'test' folders with one subfolder per class.
# Note: rmtree also deletes any images moved there by an earlier run.
for split in ('dev', 'test'):
    split_dir = os.path.join(root_dir, split)
    if os.path.exists(split_dir):
        shutil.rmtree(split_dir)
    for c in class_names:
        os.makedirs(os.path.join(split_dir, c))

for c in class_names:
    images = [x for x in os.listdir(os.path.join(root_dir, c)) if x.lower().endswith('png')]
    # Draw 60 distinct images per class: 30 for the dev set, 30 for the test set.
    # The images left in the class folder form the training set.
    selected = random.sample(images, 60)
    for split, names in (('dev', selected[:30]), ('test', selected[30:])):
        for name in names:
            source_path = os.path.join(root_dir, c, name)
            target_path = os.path.join(root_dir, split, c, name)
            shutil.move(source_path, target_path)
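
The split above draws different images on every run because the random generator is not seeded. A minimal sketch of a reproducible selection step is shown below. It only chooses file names and moves nothing, and the function name `split_names` and its `seed` parameter are assumptions added for illustration.

```python
import random


def split_names(names, n_dev, n_test, seed=0):
    """Return disjoint (dev, test, remaining) lists of file names.

    A local, seeded generator keeps the selection identical across runs.
    """
    if n_dev + n_test > len(names):
        raise ValueError("not enough images for the requested split")
    ordered = sorted(names)              # fixed order, independent of os.listdir
    rng = random.Random(seed)            # local RNG; global random state is untouched
    picked = rng.sample(ordered, n_dev + n_test)
    dev, test = picked[:n_dev], picked[n_dev:]
    held_out = set(picked)
    remaining = [n for n in ordered if n not in held_out]
    return dev, test, remaining
```

The returned `dev` and `test` lists could then be passed to `shutil.move` as in the loop above, while `remaining` lists the images that stay in the class folder for training.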
