# Predict Covid-19 in X-ray images

## Part 1: prepare the dataset

### Step 1: get the images with confirmed COVID-19
Download and extract the [dataset](https://github.com/ieee8023/covid-chestxray-dataset) already prepared by [Dr. Joseph Cohen](https://josephpcohen.com/w/).

In [1]:
import pandas as pd
import pathlib
import shutil
import os

In [2]:
covid_ds_path = '/home/henriklg/Downloads/covid-chestxray-dataset-master'
covid_output_path = './dataset/covid'

# Create the output folder if it does not already exist
pathlib.Path(covid_output_path).mkdir(parents=True, exist_ok=True)

In [3]:
# construct the path to the metadata CSV file and load it
csvPath = os.path.sep.join([covid_ds_path, "metadata.csv"])
df = pd.read_csv(csvPath)
covid_count = 0

# loop over the rows of the COVID-19 data frame
for (i, row) in df.iterrows():
    # if (1) the current case is not COVID-19 or (2) this is not
    # a 'PA' view, then ignore the row
    if row["finding"] != "COVID-19" or row["view"] != "PA":
        continue

    # build the path to the input image file
    imagePath = os.path.sep.join([covid_ds_path, "images", row["filename"]])

    # if the input image file does not exist (there are some errors in
    # the COVID-19 metadata file), ignore the row
    if not os.path.exists(imagePath):
        continue

    # extract the filename from the image path and then construct the
    # path to the copied image file
    filename = row["filename"].split(os.path.sep)[-1]
    outputPath = os.path.sep.join([covid_output_path, filename])
    covid_count += 1

    # copy the image
    shutil.copy2(imagePath, outputPath)

print("Successfully copied over {} covid images!".format(covid_count))

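The row-by-row check on `finding` and `view` can also be written as a single boolean mask. The following is only a minimal sketch of that idea: the small data frame is made-up stand-in data, not the real `metadata.csv`, and only the three column names are taken from the loop above.

```python
import pandas as pd

# Toy stand-in for metadata.csv; these rows are invented for illustration.
df = pd.DataFrame({
    "finding": ["COVID-19", "COVID-19", "SARS", "COVID-19"],
    "view": ["PA", "AP", "PA", "PA"],
    "filename": ["a.jpg", "b.jpg", "c.jpg", "d.png"],
})

# Keep only rows that are both a COVID-19 finding and a PA view.
mask = (df["finding"] == "COVID-19") & (df["view"] == "PA")
selected = df.loc[mask, "filename"].tolist()

print(selected)  # ['a.jpg', 'd.png']
```

The resulting list of filenames could then be fed to the same existence check and `shutil.copy2` call used in the loop.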

### Step 2: get the normal X-ray images
To find X-ray images of healthy patients, I used this [Kaggle dataset](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia). Download and extract it.

In [None]:
from imutils import paths
import random

In [None]:
normal_ds_path = '/home/henriklg/Downloads/chest-xray-pneumonia/chest_xray/chest_xray'
normal_output_path = 'dataset/normal'
normal_count = covid_count # grab as many normal images as we have covid images (in my case 56)

# Create the output folder if it does not already exist
pathlib.Path(normal_output_path).mkdir(parents=True, exist_ok=True)

In [None]:
# grab the NORMAL training image paths from the Kaggle X-ray dataset
# (listing normal_ds_path itself would also pick up the PNEUMONIA images)
basePath = os.path.sep.join([normal_ds_path, "train", "NORMAL"])
imagePaths = list(paths.list_images(basePath))

# randomly sample the image paths
random.seed(42)
random.shuffle(imagePaths)
imagePaths = imagePaths[:normal_count]
count = 0

# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the filename from the image path and then construct the
    # path to the copied image file
    filename = imagePath.split(os.path.sep)[-1]
    outputPath = os.path.sep.join([normal_output_path, filename])

    # copy the image
    shutil.copy2(imagePath, outputPath)
    count += 1

print("Successfully copied over {} normal images!".format(count))

Successfully copied over 56 normal images!
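A fixed seed only gives the same subset on every run if the input list is also in the same order, and directory listings are not guaranteed to be ordered. Below is a minimal sketch of a reproducible draw under that assumption. The file names and the value of `normal_count` are hypothetical placeholders, and `random.sample` is swapped in for the shuffle-and-slice used above.

```python
import random

# Hypothetical file names standing in for the Kaggle NORMAL images.
image_paths = ["NORMAL/IM-{:04d}.jpeg".format(i) for i in range(100)]
normal_count = 5  # placeholder; the notebook uses covid_count here

# Sort first so the draw does not depend on directory listing order.
image_paths = sorted(image_paths)

# A local Random instance keeps the seed from affecting other code.
subset_a = random.Random(42).sample(image_paths, normal_count)
subset_b = random.Random(42).sample(image_paths, normal_count)

print(subset_a == subset_b)  # True
```

Using a dedicated `random.Random` instance is a design choice rather than a requirement; calling `random.seed(42)` as in the cell above works as well when nothing else draws from the global generator in between.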


### Now we have a dataset with one folder for healthy patients and one for COVID-19 patients. Next: train the model!
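Before training, it may be worth confirming that both folders hold the same number of images. The sketch below is one way to do that. `count_images` and `IMAGE_EXTS` are hypothetical helpers that are not part of the notebook, and the demo runs on a throwaway directory so it works without the real `dataset/covid` and `dataset/normal` folders.

```python
import pathlib
import tempfile

# Assumed set of image extensions; adjust to match the actual files.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def count_images(folder):
    """Count files directly inside `folder` that have an image extension."""
    return sum(
        1
        for p in pathlib.Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )

# Demo on a temporary directory with made-up file names.
demo_files = {
    "covid": ["a.jpg", "b.png"],
    "normal": ["c.jpeg", "d.jpg", "notes.txt"],
}
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for label, names in demo_files.items():
        (root / label).mkdir()
        for name in names:
            (root / label / name).touch()
    counts = {label: count_images(root / label) for label in demo_files}

print(counts)  # {'covid': 2, 'normal': 2}
```

On the real data, calling `count_images('dataset/covid')` and `count_images('dataset/normal')` should return the same number if both copy steps succeeded.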