# Pneumonia Classification
Project by Maria Kaltenbrunner and Mohammad Goha
<br>for Computer Vision (FH Technikum) Fall 2020

## What is our project?
We want to train a model that classifies chest X-ray images as pneumonic (showing pneumonia) or non-pneumonic (normal).

## What is Pneumonia?

## Importing the necessary libraries

In [3]:
import os
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import keras
import cv2
from sklearn.model_selection import train_test_split

Using TensorFlow backend.


## Description of the Pneumonia Dataset

![0_basic_data_structure.png](documentation/images/0_basic_data_structure.png)
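The figure shows the expected folder layout, `data/<split>/<label>/<image>`. One quick way to check how many images each split and class contains is to walk that tree. The helper `count_images` below is a hypothetical sketch that assumes exactly this two-level layout, with one file per image and no other files inside the label folders.

```python
import os
from collections import Counter


def count_images(data_dir):
    """Count files per (split, label), assuming a data_dir/<split>/<label>/<image> layout."""
    counts = Counter()
    for split in sorted(os.listdir(data_dir)):
        split_dir = os.path.join(data_dir, split)
        if not os.path.isdir(split_dir):
            continue  # ignore stray files next to the split folders
        for label in sorted(os.listdir(split_dir)):
            label_dir = os.path.join(split_dir, label)
            if os.path.isdir(label_dir):
                # assumes every file in a label folder is one image
                counts[(split, label)] = len(os.listdir(label_dir))
    return counts
```

Calling `count_images('data')` and printing the result gives a per-class breakdown of each split before any image is decoded.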


## Loading the Dataset

In [48]:
def load_images(data_dir, folder, img_size, color=False):
    """
    Loads the images of one dataset split, resizes them and stores them in numpy arrays.

    Parameters:
    data_dir -- string, the path to the main data folder
    folder -- string, name of the split subfolder (e.g. 'train'), containing one subfolder per label
    img_size -- integer, the loaded images are resized to img_size x img_size
    color -- boolean, load 3-channel (BGR) images if True, grayscale images otherwise

    Returns:
    (images, labels) -- tuple of numpy arrays: the resized images and their string labels
    """
    split_dir = os.path.join(data_dir, folder)
    # Every subfolder of the split is one class (e.g. NORMAL, PNEUMONIA)
    label_folders = sorted(name for name in os.listdir(split_dir)
                           if os.path.isdir(os.path.join(split_dir, name)))
    read_flag = cv2.IMREAD_COLOR if color else cv2.IMREAD_GRAYSCALE

    images_resized = []
    string_labels = []
    for label in label_folders:
        label_path = os.path.join(split_dir, label)
        for name in sorted(os.listdir(label_path)):
            img = cv2.imread(os.path.join(label_path, name), read_flag)
            if img is None:  # skip files OpenCV cannot decode (e.g. .DS_Store)
                continue
            # resize each image once, right after loading it
            images_resized.append(cv2.resize(img, (img_size, img_size)))
            string_labels.append(label)

    return np.array(images_resized), np.array(string_labels)

In [49]:
data_dir = 'data'
# Load each split by name: os.listdir() returns folders in arbitrary order,
# so indexing into its result can silently swap train and test.
# (Assumes the Kaggle folder names 'train', 'test' and 'val'.)
train_X, train_y = load_images(data_dir, 'train', 100)
test_X, test_y = load_images(data_dir, 'test', 100)
val_X, val_y = load_images(data_dir, 'val', 100)

In [53]:
print(train_X.shape)
print(test_X.shape)
print(val_X.shape)

print(train_y.shape)
print(test_y.shape)
print(val_y.shape)

print(np.unique(train_y))
print(np.unique(test_y))
print(np.unique(val_y))

(5216, 100, 100)
(624, 100, 100)
(16, 100, 100)
(5216,)
(624,)
(16,)
['NORMAL' 'PNEUMONIA']
['NORMAL' 'PNEUMONIA']
['NORMAL' 'PNEUMONIA']
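The arrays returned by `load_images` still hold `uint8` pixel values and string labels. Before they can be fed to a Keras CNN, they typically need to be scaled, given a channel axis and converted to numeric labels. The following is a minimal sketch, assuming grayscale input (`color=False`) and a binary target with `PNEUMONIA` as the positive class. The function name `prepare_for_keras` is hypothetical.

```python
import numpy as np


def prepare_for_keras(images, string_labels, positive_label="PNEUMONIA"):
    """Scale grayscale uint8 images to [0, 1], add a channel axis and encode labels as 0/1.

    Assumes images has shape (n, h, w), i.e. it was loaded with color=False.
    """
    X = images.astype("float32") / 255.0                   # pixel values 0..255 -> 0.0..1.0
    X = X[..., np.newaxis]                                  # (n, h, w) -> (n, h, w, 1)
    y = (string_labels == positive_label).astype("int32")  # PNEUMONIA -> 1, NORMAL -> 0
    return X, y
```

For example, `train_X_prep, train_y_num = prepare_for_keras(train_X, train_y)` yields inputs of shape `(n, 100, 100, 1)`, which is the layout a `Conv2D` layer with `input_shape=(100, 100, 1)` expects.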


## Exploring the Data
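A useful first exploration step is to check how balanced the two classes are in each split, because a strongly imbalanced training set makes plain accuracy a misleading metric. The sketch below uses `np.unique(..., return_counts=True)`. The helper name `class_distribution` is hypothetical.

```python
import numpy as np


def class_distribution(labels):
    """Return {label: (count, fraction)} for an array of string labels."""
    values, counts = np.unique(labels, return_counts=True)
    total = counts.sum()
    return {str(v): (int(c), float(c / total)) for v, c in zip(values, counts)}
```

It can be applied to every split, e.g. `for name, y in [('train', train_y), ('test', test_y), ('val', val_y)]: print(name, class_distribution(y))`.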

In [None]:
pip freeze > requirements.txt

## Sources
* [Kaggle: Chest X-Ray Pneumonia (Data)](https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia)
* [Kaggle: Pneumonia Detection using CNN](https://www.kaggle.com/madz2000/pneumonia-detection-using-cnn-92-6-accuracy)