# Module 2 Project: Diagnose Pneumonia from Chest X-Rays Using Computer Vision

**[Packages](#import-relevant-packages)**<br>
**[Get Initial Threshold Value](#derive-initial-threshold-value)**<br>
**[Get White Pixels](#count-white-pixels-and-normalize)**<br>
**[Healthy X-Ray Analysis](#identify-healthy-x-rays)**<br>
**[Final Diagnosis](#identify-pneumonia-x-rays)**<br>
**[Recap](#recap)**<br>

## Import relevant packages

```python
import cv2
import numpy as np
import os
from matplotlib import pyplot as plt
```


## Derive Initial Threshold Value

### Load example image of healthy and pneumonia x-ray

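```python
# Load the images (change the paths to match where they are stored on your computer)
healthy_example = cv2.imread('code_md/module_2_Computer_Vision/diagnose_pneumonia_project/pneumonia_project_images/initial_xrays/healthy_xray.jpeg')
pneumonia_example = cv2.imread('code_md/module_2_Computer_Vision/diagnose_pneumonia_project/pneumonia_project_images/initial_xrays/pneumonia_xray.jpeg')

# cv2.imread returns None instead of raising an error when a path is wrong
assert healthy_example is not None, 'could not read the healthy x-ray; check the path'
assert pneumonia_example is not None, 'could not read the pneumonia x-ray; check the path'

# Check whether each image is color (3 channels) or grayscale, and its size
print(healthy_example.shape)
print(pneumonia_example.shape)
```

```
(1024, 1024, 3)
(425, 442, 3)
```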


### Convert the images to grayscale if they are in color

```python
# OpenCV loads color images in BGR channel order; convert 3-channel images to grayscale
if len(healthy_example.shape) == 3 and healthy_example.shape[2] == 3:
    healthy_example = cv2.cvtColor(healthy_example, cv2.COLOR_BGR2GRAY)
    print('converting healthy xray to grayscale')
    print(healthy_example.shape)

if len(pneumonia_example.shape) == 3 and pneumonia_example.shape[2] == 3:
    pneumonia_example = cv2.cvtColor(pneumonia_example, cv2.COLOR_BGR2GRAY)
    print('converting pneumonia xray to grayscale')
    print(pneumonia_example.shape)
```

```
converting healthy xray to grayscale
(1024, 1024)
converting pneumonia xray to grayscale
(425, 442)
```
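For intuition about what `cv2.COLOR_BGR2GRAY` computes: OpenCV's color-conversion documentation gives the gray value as a weighted sum of the channels, Y = 0.299 R + 0.587 G + 0.114 B. The sketch below reproduces that formula in plain NumPy. `bgr_to_gray` is a hypothetical helper for illustration only, and because OpenCV uses its own fixed-point rounding internally, an occasional pixel may differ by one gray level from `cv2.cvtColor`.

```python
import numpy as np


def bgr_to_gray(image_bgr: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) BGR uint8 image to an (H, W) grayscale uint8 image."""
    # OpenCV stores color images channel-last in blue, green, red order
    blue = image_bgr[..., 0].astype(np.float64)
    green = image_bgr[..., 1].astype(np.float64)
    red = image_bgr[..., 2].astype(np.float64)
    # Weighted sum of the channels (the weights OpenCV documents for BGR2GRAY)
    gray = 0.299 * red + 0.587 * green + 0.114 * blue
    # Round and clip back into the 8-bit range
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# A 1 x 3 "image" in BGR order: pure blue, pure red, white
pixels = np.array([[[255, 0, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(bgr_to_gray(pixels))  # [[ 29  76 255]]
```

Green carries the largest weight, so a pure blue pixel turns out much darker in grayscale than a pure red one.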


```python
# Show each image with matplotlib
# imshow applies a color map to single-channel images by default, so set cmap='gray'
plt.imshow(healthy_example, cmap='gray')
plt.title('Healthy xray')
plt.show()

plt.imshow(pneumonia_example, cmap='gray')
plt.title('Pneumonia xray')
plt.show()
```

## Let's try a few thresholding operations to see if we can identify the consolidation

```python
# List of thresholds
thresholds = [50, 75, 100, 125, 150]

# Loop over the thresholds
for thresh in thresholds:
    # Apply binary thresholding with OpenCV: pixels brighter than thresh become
    # 255 (white) and all other pixels become 0 (black)
    # cv2.threshold returns a tuple (retval, image); retval is the threshold that
    # was used, so we unpack it into ret even though we only need the image
    ret, healthy_threshold = cv2.threshold(healthy_example, thresh, 255, cv2.THRESH_BINARY)
    ret, pneumonia_threshold = cv2.threshold(pneumonia_example, thresh, 255, cv2.THRESH_BINARY)

    # Plot the thresholded images side by side with matplotlib
    # subplot(rows, columns, index) places multiple images in one figure
    plt.figure(figsize=(10, 5))

    plt.subplot(1, 2, 1)
    plt.imshow(healthy_threshold, cmap='gray')
    plt.title(f'Healthy xray, threshold = {thresh}')

    plt.subplot(1, 2, 2)
    plt.imshow(pneumonia_threshold, cmap='gray')
    plt.title(f'Pneumonia xray, threshold = {thresh}')

    plt.tight_layout()
    plt.show()
```
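Under the hood, `cv2.THRESH_BINARY` applies a simple per-pixel rule: a pixel becomes `maxval` (255 here) if it is strictly greater than the threshold, and 0 otherwise. A minimal NumPy sketch of that rule, using a hypothetical `binary_threshold` helper, makes the boundary case explicit: a pixel exactly equal to the threshold stays black.

```python
import numpy as np


def binary_threshold(gray: np.ndarray, thresh: int, maxval: int = 255) -> np.ndarray:
    """Mimic cv2.THRESH_BINARY: pixels strictly above thresh become maxval, the rest 0."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)


# One row of gray levels straddling the thresholds tried above
row = np.array([[0, 124, 125, 126, 255]], dtype=np.uint8)
for thresh in [50, 125, 150]:
    print(thresh, binary_threshold(row, thresh))
```

Raising the threshold can only turn white pixels black, never the reverse, which is why the images above get darker as the threshold increases.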

## 125 looks like a good value to use! Let's now count the white pixels in each image after applying the threshold

## Count White Pixels and Normalize

```python
# Apply the chosen threshold of 125 to both images
ret, healthy_threshold = cv2.threshold(healthy_example, 125, 255, cv2.THRESH_BINARY)
ret, pneumonia_threshold = cv2.threshold(pneumonia_example, 125, 255, cv2.THRESH_BINARY)

# Count white pixels
white_pixels_healthy = np.count_nonzero(healthy_threshold == 255)
white_pixels_pneumonia = np.count_nonzero(pneumonia_threshold == 255)

print('The number of white pixels in the healthy xray is: ' + str(white_pixels_healthy))
print('The number of white pixels in the pneumonia xray is: ' + str(white_pixels_pneumonia))
```

```
The number of white pixels in the healthy xray is: 588026
The number of white pixels in the pneumonia xray is: 130848
```


### Wait, that doesn't make sense. The number of white pixels should track the amount of consolidation, and we KNOW there is more consolidation in the pneumonia image... OH! We need to consider the SIZE of the IMAGE: the healthy x-ray is 1024 × 1024 pixels, while the pneumonia x-ray is only 425 × 442. Let's normalize the number of white pixels by the total number of pixels
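Written out, the normalized ratio divides the white-pixel count by the image's height times its width (what NumPy's `.size` returns for a 2-D array). Plugging in the counts printed above:

$$
r = \frac{N_{\text{white}}}{H \times W}
$$

$$
r_{\text{healthy}} = \frac{588026}{1024 \times 1024} = \frac{588026}{1048576} \approx 0.561,
\qquad
r_{\text{pneumonia}} = \frac{130848}{425 \times 442} = \frac{130848}{187850} \approx 0.697
$$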

```python
# Divide each white-pixel count by the total number of pixels in the image
normalized_white_pix_healthy = white_pixels_healthy / healthy_threshold.size
normalized_white_pix_pneumonia = white_pixels_pneumonia / pneumonia_threshold.size

print(normalized_white_pix_healthy)
print(normalized_white_pix_pneumonia)
```

```
0.5607852935791016
0.6965557625765239
```
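The same ratio can be written as the mean of a boolean mask, since `True` counts as 1 and `False` as 0. The hypothetical `white_pixel_ratio` helper below is a sketch of that idea. The toy arrays show why normalizing matters: two images of very different sizes, each half white, get the same ratio even though their raw white-pixel counts differ by a factor of 2500.

```python
import numpy as np


def white_pixel_ratio(binary: np.ndarray, white: int = 255) -> float:
    """Fraction of pixels in a thresholded image that are white."""
    # The mask is True where a pixel is white; its mean equals
    # (number of white pixels) / (total number of pixels)
    return float(np.mean(binary == white))


small = np.array([[255, 0], [255, 0]], dtype=np.uint8)  # 2 of 4 pixels white
large = np.zeros((100, 100), dtype=np.uint8)
large[:, :50] = 255  # 5000 of 10000 pixels white
print(white_pixel_ratio(small), white_pixel_ratio(large))  # 0.5 0.5
```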


## Now that we have the basic moves down, let's choose the ratio of white pixels to total pixels that defines pneumonia, using a few healthy x-rays as a baseline
## Identify healthy X-Rays

```python
# os.walk visits every folder under a given directory and lists the files in each one,
# which makes it easy to load a whole folder of images

# Define a list to store the ratios we accumulate
white_pix_ratio_list = []
for root, dirs, files in os.walk('code_md/module_2_Computer_Vision/diagnose_pneumonia_project/pneumonia_project_images/healthy_xrays'):
    for file in files:
        # Build the full path to each image in the healthy x-rays folder
        path = os.path.join(root, file)
        # Load the image using OpenCV
        image = cv2.imread(path)
        # Skip files OpenCV cannot read (e.g. hidden system files such as .DS_Store)
        if image is None:
            continue
        # Convert color (BGR) images to grayscale if necessary
        if len(image.shape) == 3 and image.shape[2] == 3:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        # Apply the threshold of 125 we chose earlier
        ret, threshold_image = cv2.threshold(image, 125, 255, cv2.THRESH_BINARY)
        # Count white pixels
        white_pixels = np.count_nonzero(threshold_image == 255)
        # Normalize to image size
        normalized_white_pix = white_pixels / threshold_image.size
        # Store the ratio of white pixels to image size in the list
        white_pix_ratio_list.append(normalized_white_pix)

# Calculate the average, minimum, and maximum ratios to guide our diagnoses
avg_healthy_ratio = sum(white_pix_ratio_list) / len(white_pix_ratio_list)
max_healthy_ratio = max(white_pix_ratio_list)
min_healthy_ratio = min(white_pix_ratio_list)
print('average is: ' + str(avg_healthy_ratio))
print('minimum is: ' + str(min_healthy_ratio))
print('maximum is: ' + str(max_healthy_ratio))
```

```
average is: 0.4923174988542816
minimum is: 0.34267572434980664
maximum is: 0.5954362877226362
```
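One fragile spot in the loop above: `os.walk` returns every file in the folder, including hidden files such as `.DS_Store` on macOS, and `cv2.imread` returns `None` for anything it cannot decode. Below is a sketch of a hypothetical helper that collects only image files, assuming the x-rays are saved as `.jpeg`, `.jpg`, or `.png` (adjust `IMAGE_EXTENSIONS` if yours differ).

```python
import os

# Assumption: the x-rays use one of these file extensions
IMAGE_EXTENSIONS = {'.jpeg', '.jpg', '.png'}


def list_image_paths(folder):
    """Return the paths of all image files under folder, sorted for a stable order."""
    paths = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            # splitext splits 'xray_01.jpeg' into ('xray_01', '.jpeg');
            # a hidden file like '.DS_Store' has no extension and is skipped
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                paths.append(os.path.join(root, name))
    return sorted(paths)
```

With it, the nested `os.walk` loops could iterate over `list_image_paths(folder)` directly. Sorting also makes the order of results reproducible, since `os.walk` makes no promise about the order in which it lists files.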


## We got an average ratio of 0.49 across the 5 healthy images we tested, with a maximum of 0.595. Let's aim for a highly specific algorithm and use 0.61 as the threshold, just above the highest healthy value. This minimizes false positives (healthy x-rays labeled as pneumonia) at the cost of possibly more false negatives (pneumonia cases we miss)
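The rule described above can be sketched as two small functions. Both names and the `margin` of 0.015 are assumptions for illustration (the notebook simply picks 0.61 by hand); `diagnose` mirrors the comparison used in the diagnosis loop of this notebook.

```python
def choose_ratio_threshold(healthy_ratios, margin=0.015):
    """Place the cut-off a small margin above the largest healthy ratio."""
    # Every healthy reference image then falls below the cut-off, which favors
    # specificity (few false positives) over sensitivity
    return max(healthy_ratios) + margin


def diagnose(ratio, ratio_threshold):
    """Label an image 'Pneumonia' if its white-pixel ratio exceeds the cut-off."""
    return 'Pneumonia' if ratio > ratio_threshold else 'Healthy'


# Minimum and maximum healthy ratios printed earlier in this notebook
cutoff = choose_ratio_threshold([0.34267572434980664, 0.5954362877226362])
print(round(cutoff, 3))  # 0.61
# White-pixel ratios of the two example x-rays from the start of the notebook
print(diagnose(0.5607852935791016, cutoff))  # Healthy
print(diagnose(0.6965557625765239, cutoff))  # Pneumonia
```

Only the maximum healthy ratio matters for this rule, which also makes it sensitive to a single unusually bright healthy x-ray in the reference set.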
## Identify Pneumonia X-Rays

```python
# Set the ratio threshold chosen above
ratio_threshold = 0.61

# Make a list to store the final diagnoses
diagnosis_list = []
for root, dirs, files in os.walk('code_md/module_2_Computer_Vision/diagnose_pneumonia_project/pneumonia_project_images/pneumonia_xrays'):
    for file in files:
        # Build the full path to each image in the pneumonia x-rays folder
        path = os.path.join(root, file)
        # Load the image using OpenCV
        image_pneumonia = cv2.imread(path)
        # Skip files OpenCV cannot read (e.g. hidden system files such as .DS_Store)
        if image_pneumonia is None:
            continue
        # Convert color (BGR) images to grayscale if necessary
        if len(image_pneumonia.shape) == 3 and image_pneumonia.shape[2] == 3:
            image_pneumonia = cv2.cvtColor(image_pneumonia, cv2.COLOR_BGR2GRAY)
        # Apply the threshold of 125 we chose earlier
        ret, threshold_image_pneu = cv2.threshold(image_pneumonia, 125, 255, cv2.THRESH_BINARY)
        # Count white pixels
        white_pixels_pneu = np.count_nonzero(threshold_image_pneu == 255)
        # Normalize to image size
        normalized_white_pix_pneu = white_pixels_pneu / threshold_image_pneu.size
        # Compare the normalized ratio to the threshold to make a diagnosis
        if normalized_white_pix_pneu > ratio_threshold:
            diagnosis_list.append('Pneumonia')
        else:
            diagnosis_list.append('Healthy')

# Print the final diagnosis list to see how many we got right
print(diagnosis_list)
```

```
['Healthy', 'Pneumonia', 'Pneumonia', 'Pneumonia', 'Pneumonia', 'Pneumonia', 'Pneumonia', 'Pneumonia', 'Pneumonia', 'Pneumonia']
```
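Because every image in this folder is a pneumonia case, the share labeled 'Pneumonia' measures sensitivity (the true positive rate). Measuring specificity would also require healthy x-rays that were not used to pick the threshold. A sketch of a hypothetical helper that computes both from true and predicted labels, returning `None` when a rate has no cases to measure:

```python
def sensitivity_and_specificity(true_labels, predicted_labels, positive='Pneumonia'):
    """Return (sensitivity, specificity); None where the denominator is zero."""
    tp = fn = tn = fp = 0
    for truth, prediction in zip(true_labels, predicted_labels):
        if truth == positive:
            if prediction == positive:
                tp += 1  # pneumonia correctly flagged
            else:
                fn += 1  # pneumonia missed
        elif prediction == positive:
            fp += 1  # healthy image wrongly flagged
        else:
            tn += 1  # healthy image correctly cleared
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# The diagnosis list printed above; every image came from the pneumonia folder
predictions = ['Healthy'] + ['Pneumonia'] * 9
truth = ['Pneumonia'] * len(predictions)
print(sensitivity_and_specificity(truth, predictions))  # (0.9, None)
```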


## Recap
## We correctly flagged 9 of the 10 pneumonia x-rays (90%)! Not bad!
## Obviously, you would never use this tool clinically. Even in our small dataset the white-pixel ratios varied widely, and we didn't take into account key factors such as patient positioning, the field of view of the x-ray, x-ray machine settings, etc. In practice, you might build a convolutional neural network to solve a problem like this, but we hope this gave you a good introduction to some core principles of computer vision and showed how this type of technology can be used in clinical settings! Play around with the notebook, import your own data, and see if you can make it even better!