Dear Kagglers,

This kernel is intended to help beginners who are overwhelmed by this project get "up and running." We hope to provide a basic understanding of the data and how to work with it to make predictions that can be submitted to the competition. If you have any questions, comments, or especially criticisms please let us know in the comments so we can address them! Have a nice day, everyone!

---

# <a href='#contents'>Table of Contents</a> <a id="contents"></a> 
1. <a href='#intro'>Introduction</a>
   *   <a href='#overview'>Project Overview</a>
   *   <a href='#goals'>Goals</a>
2. <a href='#setup'>Set-up</a>
   *   <a href='#dependencies'>Install Dependencies</a>
   *   <a href='#files'>Examine Files</a>
3. <a href='#data'>Data Exploration and Visualization</a>
   *   <a href='#classes'>Classes</a>
   *   <a href='#locations'>Pneumonia Locations</a>
4. <a href='#conclusion'>Conclusion</a>
    *   <a href='#kernels'>Other Helpful Kernels</a>
    *   <a href='#refs'>References</a>

 ---
 ---

## <span style="color:darkgreen">Introduction</span> <a id="intro"></a>
This kernel will provide a simplified approach to loading and examining data for the RSNA Pneumonia Detection Challenge. First we will briefly discuss the challenge. Then we will look at the `csv` files and examine the [DICOM](https://en.wikipedia.org/wiki/DICOM) images. Next we will manipulate the images to maximize their usefulness for model training. Lastly, we will point you to models to explore, including our own version of YOLO.

---

### Project Overview <a id='overview'></a>

Numerous advances in medicine have been accomplished through the use of machine learning on medical imagery. The Radiological Society of North America ([RSNA](http://www.rsna.org)) has sponsored this competition on Kaggle to incentivize the creation of new algorithms that can detect pneumonia in radiographic images. A more detailed definition of the competition is provided on the [Kaggle RSNA Pneumonia Detection Challenge website](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge).

---

### Goals <a id='goals'></a>

To detect pneumonia in medical images we will build an algorithm that finds lung opacities. It must distinguish pneumonia opacities from other lung conditions that can produce similar signals, such as fluid overload, bleeding, volume loss, lung cancer, post-radiation or surgical changes, and fluid in the pleural space.

This will be done by training machine learning models on a set of images that have been labeled by experts to show a box drawn around the presumed pneumonia lung opacity or opacities, when present. Many control images with no pneumonia opacity or with other lung conditions are also present with which to train. A successful algorithm will be able to take unlabeled images and label them accurately by drawing a box around pneumonia lung opacities.

Submissions are scored by the mean average precision of the predicted boxes at different intersection over union (IoU) thresholds. A more detailed explanation can be found on the [challenge's evaluation page](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge#evaluation).
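
To make "intersection over union" concrete, here is a minimal sketch that computes it for two boxes given as `(x, y, width, height)`, the same layout used in the training labels. The two boxes and the threshold values are made up for illustration; the evaluation page defines the thresholds actually used for scoring.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


truth = (100, 100, 200, 200)       # hypothetical labeled box
prediction = (150, 150, 200, 200)  # hypothetical predicted box
score = iou(truth, prediction)

# Illustrative thresholds only (an assumption, not the competition's list).
thresholds = [0.3, 0.5, 0.7]
hits = {t: score > t for t in thresholds}
print(round(score, 3), hits)
```

A predicted box counts as a hit at a given threshold only if its IoU with a labeled box exceeds that threshold, so the same prediction can be a hit at a loose threshold and a miss at a strict one.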

---

(return to <a href='#contents'>Table of Contents</a>)

---
---

## <span style="color:darkgreen">Set-up</span> <a id='setup'></a>

### Install Dependencies <a id='dependencies'></a>

We need to import the libraries and packages we'll be using. These are:
*   [os](https://docs.python.org/3/library/os.html): Python module for operating system functionality (<a href='#os'>here</a>)
*   [pandas](https://pandas.pydata.org/index.html): Python data analysis and processing library, used for reading `csv` files (<a href='#pd'>here</a>)
*   [seaborn](https://seaborn.pydata.org/): Python visualization library (<a href='#seaborn'>here</a>)
*   [matplotlib.pyplot](https://matplotlib.org/users/pyplot_tutorial.html): graphical plotting output library (<a href='#plt'>here</a>)
*   [pydicom](https://github.com/pydicom/pydicom): inspect and modify [DICOM](https://www.dicomstandard.org/) data (<a href='#pydicom'>here</a>)
*   [numpy](https://docs.scipy.org/doc/numpy/reference/): Python package for scientific computations, including linear algebra (<a href='#np'>here</a>)
*   [multiprocessing](https://docs.python.org/3/library/multiprocessing.html): supports spawning processes (<a href='#multiprocessing'>here</a>)
*   [warnings](https://docs.python.org/3/library/warnings.html): provides options for warning control (<a href='#warnings'>here</a>)
*   [matplotlib.patches](https://matplotlib.org/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch): 2D shapes with a face color and an edge color (<a href='#patches'>here</a>)
*   [imgaug](https://imgaug.readthedocs.io/en/latest/): library for image augmentation in machine learning experiments (<a href='#imgaug'>here</a>)
*   [tqdm](https://pypi.org/project/tqdm/): fast, extensible progress meter (<a href='#tqdm'>here</a>)
*   [scikit-learn GaussianMixture](http://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html): allows estimation of the parameters of a Gaussian mixture distribution (<a href='#gaussianmixture'>here</a>)
*   [scikit-image morphology, feature, measure, util, & transform](https://scikit-image.org/): a collection of algorithms for image processing (<a href='#skimage'>here</a>)

Click on `here` to the right of the library to find the code block where it is first used. <a id='warnings'></a>

In [None]:
import os, pandas as pd, seaborn as sns, matplotlib.pyplot as plt, pydicom, numpy as np
import multiprocessing, warnings
import matplotlib.patches as patches

from imgaug import augmenters as iaa
from tqdm import tqdm
from sklearn.mixture import GaussianMixture
from skimage import feature
from skimage import morphology
from skimage import measure
from skimage import util
from skimage import transform

warnings.filterwarnings('ignore')

### Examine Files <a id='files'></a>

We will start by listing the files included in the project. <a id='os'></a>

In [None]:
os.listdir("../input")

Let's first look at the `csv` files in order. We'll use [`pandas.read_csv`](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) to read the data and then look at a few examples. <a id='pd'></a>

In [None]:
stage_1_detailed_class_info = pd.read_csv('../input/stage_1_detailed_class_info.csv')
print(stage_1_detailed_class_info.iloc[2:5])
print("Rows:", stage_1_detailed_class_info.shape[0])
print("Columns:", stage_1_detailed_class_info.shape[1])

There are 28,989 rows with two columns. The first column is the `patientId` and the second is the `class`. How many of those patient IDs are unique?

In [None]:
print("# of unique patient IDs: ", stage_1_detailed_class_info['patientId'].nunique())

Some patients are represented multiple times in the `patientId` column.

Let's break that down.

The line below first counts how many times each `patientId` occurs, then counts how many patients share each number of occurrences. Got it?

In [None]:
stage_1_detailed_class_info['patientId'].value_counts().value_counts()

So 3062 patients are represented twice, 105 thrice, and 11 four times.
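
If the double `value_counts()` is still confusing, the same two-step logic can be sketched in plain Python with `collections.Counter`. The patient IDs here are made up, and this is only an analogue of the pandas call, not what pandas does internally.

```python
from collections import Counter

# Hypothetical patient IDs: "a" and "c" appear once, "b" twice, "d" three times.
patient_ids = ["a", "b", "b", "c", "d", "d", "d"]

# Step 1 (first value_counts): how many rows does each patient have?
rows_per_patient = Counter(patient_ids)

# Step 2 (second value_counts): how many patients have each row count?
patients_per_row_count = Counter(rows_per_patient.values())

print(dict(rows_per_patient))
print(dict(patients_per_row_count))
```

The second result maps "number of rows" to "number of patients with that many rows", which is exactly how to read the output of the cell above.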

---

The sample submission file will show us the format that we should be using for our competition submission after we have created our predictions.

In [None]:
stage_1_sample_submission = pd.read_csv('../input/stage_1_sample_submission.csv')
print(stage_1_sample_submission.iloc[:3])
print("Rows:", stage_1_sample_submission.shape[0])
print("Columns:", stage_1_sample_submission.shape[1])

We see that it has two columns, titled `patientId` and `PredictionString`, and 1000 rows. This tells us we do NOT want to include the row index counters on our submission. We also know, from the [competition evaluation page](https://www.kaggle.com/c/rsna-pneumonia-detection-challenge#evaluation), that if a patient has multiple predicted bounding boxes, the submission should include them all by listing them one after another in the prediction column like this:

`00322d4d-1c29-4943-afc9-b6754be640eb,0.8 10 10 50 50 0.75 100 100 5 5`
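
As a rough sketch of how such a line could be assembled, the helper below joins a list of predictions into a single `PredictionString`. It assumes each group of five numbers is confidence, x, y, width, height, as described on the evaluation page; the function name `format_prediction_string` is our own and not part of any library.

```python
def format_prediction_string(boxes):
    """Join (confidence, x, y, width, height) tuples into one space-separated string."""
    parts = []
    for confidence, x, y, width, height in boxes:
        parts.extend([str(confidence), str(x), str(y), str(width), str(height)])
    return " ".join(parts)


# The two boxes from the example line above.
boxes = [(0.8, 10, 10, 50, 50), (0.75, 100, 100, 5, 5)]
line = "00322d4d-1c29-4943-afc9-b6754be640eb," + format_prediction_string(boxes)
print(line)
```

A patient with no predicted boxes simply gets an empty string from this helper.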

---

Next up we have the training labels file.

In [None]:
stage_1_train_labels = pd.read_csv('../input/stage_1_train_labels.csv')
pd.set_option('display.max_columns', 1000)
pd.set_option('display.width', 1000)
print(stage_1_train_labels.iloc[2:5])
print("Rows:", stage_1_train_labels.shape[0])
print("Columns:", stage_1_train_labels.shape[1])
print("# of unique patient IDs: ", len(list(stage_1_train_labels.patientId.unique())))

Each row consists of the `patientId`, as well as either `NaN` (not a number) if there are no pneumonia bounding boxes indicated or values representing the `x` and `y` coordinates of the upper-left corner of the bounding box followed by the `width` and `height` of the bounding box. `Target` is 0 for no boxes and 1 when a box is present. And again, some patients are represented multiple times. Spoiler: that is because some patients have multiple bounding boxes labeled!
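
Because a patient with several boxes takes up several rows, it is often handy to collapse the labels into one list of boxes per patient. Here is a minimal sketch on a made-up table with the same columns as the training labels; the patient IDs and coordinates are hypothetical.

```python
import pandas as pd

nan = float("nan")

# Hypothetical rows in the same layout as stage_1_train_labels.csv.
labels = pd.DataFrame({
    "patientId": ["p1", "p2", "p2", "p3"],
    "x": [nan, 10.0, 300.0, 50.0],
    "y": [nan, 20.0, 40.0, 60.0],
    "width": [nan, 100.0, 80.0, 70.0],
    "height": [nan, 150.0, 120.0, 90.0],
    "Target": [0, 1, 1, 1],
})

# One entry per patient: a (possibly empty) list of [x, y, width, height] boxes.
boxes_by_patient = {}
for patient_id, rows in labels.groupby("patientId"):
    positive = rows[rows["Target"] == 1]
    boxes_by_patient[patient_id] = positive[["x", "y", "width", "height"]].values.tolist()

print(boxes_by_patient)
```

Patients without pneumonia end up with an empty list rather than a row of `NaN` values.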

---

Let's now look at the metadata available with each image. We'll use pydicom to read the images. <a id='pydicom'></a>

In [None]:
patientId = stage_1_train_labels['patientId'][0]
dicom_file = '../input/stage_1_train_images/%s.dcm' % patientId
dicom_data = pydicom.dcmread(dicom_file)  # dcmread replaces the deprecated read_file

dicom_data

Wow. There is a lot here. The items we are most interested in at this point are the patient's sex and age, the view position of the image (PA for posterior -> anterior or AP for anterior -> posterior), the image pixel spacing, and of course the image itself, which is contained in the `Pixel Data` element.

(return to <a href='#contents'>Table of Contents</a>)

---
---

## <span style="color:darkgreen">Data Exploration and Visualization</span> <a id='data'></a>

First of all, a giant shout-out to [thomasjpfan's kernel](https://www.kaggle.com/thomasjpfan/q-a-with-only-pictures) for a lot of these visualizations. We've adapted some and used others as is and explained the code behind them for this tutorial.

---

Next we will define a function that parses the DICOM metadata for the attributes we are interested in. We get the age, gender, view position, patient ID, pixel spacing, and the fraction of black pixels in each image (we'll see why later).

We use Python's multiprocessing package to speed up these tasks. <a id='multiprocessing'></a> <a id='np'></a>

In [None]:
stage_1_train_labels['aspect_ratio'] = (stage_1_train_labels['width'] / 
                                        stage_1_train_labels['height'])
stage_1_train_labels['area'] = stage_1_train_labels['width'] * stage_1_train_labels['height']

def get_info(patientId, root_dir='../input/stage_1_train_images/'):
    file_name = os.path.join(root_dir, f'{patientId}.dcm')
    dicom_data = pydicom.dcmread(file_name)  # dcmread replaces the deprecated read_file
    return {'age': dicom_data.PatientAge, 
            'gender': dicom_data.PatientSex,
            'view_position': dicom_data.ViewPosition,
            'id': os.path.basename(file_name).split('.')[0],
            'pixel_spacing': float(dicom_data.PixelSpacing[0]),
            'mean_black_pixels': np.mean(dicom_data.pixel_array == 0)}

patient_ids = list(stage_1_train_labels.patientId.unique())
with multiprocessing.Pool(4) as pool:
    result = pool.map(get_info, patient_ids)
    
demo = pd.DataFrame(result)
demo['age'] = demo['age'].astype(int)
demo['gender'] = demo['gender'].astype('category')
demo['view_position'] = demo['view_position'].astype('category')

stage_1_train_labels = (stage_1_train_labels.merge(demo, left_on='patientId', 
                                                   right_on='id', how='left')
                        .drop(columns='id'))

In [None]:
stage_1_train_labels[2:5]
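
A quick note on the `mean_black_pixels` column: comparing the pixel array to 0 gives a boolean array, and taking its mean gives the fraction of pixels that are exactly 0. A tiny sketch with a made-up 4x4 "image" shows the idea.

```python
import numpy as np

# Hypothetical 4x4 "image": the left half is black (0), the right half is not.
image = np.array([
    [0, 0, 120, 200],
    [0, 0, 90, 255],
    [0, 0, 60, 30],
    [0, 0, 10, 5],
])

black_mask = image == 0               # boolean array, True where the pixel is black
black_fraction = np.mean(black_mask)  # True counts as 1 and False as 0
print(black_fraction)
```

So a value of 0.1 in that column means 10% of the image's pixels are pure black.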

---
### Classes <a id='classes'></a>

How many of the patients have pneumonia compared to those that don't? <a id='seaborn'></a> <a id='plt'></a>

In [None]:
sns.set_style('darkgrid')
sns.set_context('notebook', font_scale=1.4)

plt.rcParams['figure.figsize'] = [12, 3]
plt.rcParams['lines.linewidth'] = 1

boxes_per_patient = stage_1_train_labels.groupby('patientId')['Target'].sum()

ax = (boxes_per_patient > 0).value_counts().plot.barh(color=['teal','orange'])
_ = ax.set_title('Pneumonia opacity present')
_ = ax.set_xlabel('Number of patients')
_ = ax.xaxis.set_tick_params(rotation=0)

---

How many bounding boxes designating a pneumonia opacity are present in each image?

In [None]:
ax = boxes_per_patient.value_counts().sort_index().plot.barh()
_ = ax.set_title('Pneumonia opacity bounding boxes per image')
_ = ax.set_xlabel('Number of patients')
_ = ax.set_ylabel('Boxes')
_ = ax.xaxis.set_tick_params(rotation=0)

---

What is the age distribution by gender and target?

In [None]:
g = sns.FacetGrid(col='Target', hue='gender', 
                  data=stage_1_train_labels.drop_duplicates(subset=['patientId']), 
                  height=9, palette=dict(F="red", M="blue"))
_ = g.map(sns.distplot, 'age', hist_kws={'alpha': 0.5}).add_legend()
_ = g.fig.suptitle("What is the age distribution by gender and target?", y=1.02, fontsize=20)

---

What are the areas of the bounding boxes by gender?

In [None]:
areas = stage_1_train_labels.dropna(subset=['area'])
g = sns.FacetGrid(hue='gender', data=areas, height=9, palette=dict(F="red", M="blue"), aspect=1.4)
_ = g.map(sns.distplot, 'area', hist_kws={'alpha': 0.5}).add_legend()
_ = g.fig.suptitle('What are the areas of the bounding boxes by gender?', y=1.01)

Here we see that females tend to have smaller pneumonia bounding boxes. A plausible explanation is that [females generally have lung volumes that are 10%-12% less than that of males](https://www.atsjournals.org/doi/pdf/10.1164/rccm.200208-876OC).

### Pneumonia Locations <a id='locations'></a>

Where is the centroid of the bounding boxes of pneumonia located?

In [None]:
centers = (stage_1_train_labels.dropna(subset=['x'])
           .assign(center_x=stage_1_train_labels.x + stage_1_train_labels.width / 2, 
                   center_y=stage_1_train_labels.y + stage_1_train_labels.height / 2))
ax = sns.jointplot(x="center_x", y="center_y", data=centers, height=6, alpha=0.03, color="red")
_ = ax.fig.suptitle("Where is Pneumonia located?", y=1.01)

Without overlaying this over the mean lung outline it is difficult to make any strong conclusions from this, as the lungs may not be centered on all images. It would be helpful to recalculate the centroid in relation to each lung after segmentation.

---

How is the pixel spacing distributed?

In [None]:
pixel_vc = stage_1_train_labels.drop_duplicates('patientId')['pixel_spacing'].value_counts()
pixel_vc.iloc[4] += pixel_vc.iloc[5] # combine into one as near identical values
ax = pixel_vc.iloc[0:5].plot.barh()
_ = ax.set_yticklabels([f'{ps:.3f}' for ps in pixel_vc.index[:5]])  # one label per plotted bar
_ = ax.set_xlabel('Count')
_ = ax.set_ylabel('Pixel Spacing')
_ = ax.set_title('How is the pixel spacing distributed?')

---

How are the bounding box areas distributed by the number of boxes?

In [None]:
areas_with_count = areas.merge(pd.DataFrame(boxes_per_patient).rename(columns={'Target': 'bbox_count'}), 
                               on='patientId')
g = sns.FacetGrid(hue='bbox_count', data=areas_with_count, height=8, aspect=1.4)
_ = g.map(sns.distplot, 'area').add_legend()
_ = g.fig.suptitle("How are the bounding box areas distributed by the number of boxes?", y=1.01)

---

Where are the outliers? <a id='gaussianmixture'></a>

In [None]:
plt.rcParams['figure.figsize'] = [10, 6]
clf = GaussianMixture(n_components=2)
clf.fit(centers[['center_x', 'center_y']])
center_probs = clf.predict_proba(centers[['center_x', 'center_y']])
Z = -clf.score_samples(centers[['center_x', 'center_y']])
outliers = centers.iloc[Z > 17]
fig, ax = plt.subplots()
centers.plot.scatter('center_x', 'center_y', c=Z, alpha=0.07, cmap='viridis', ax=ax)
outliers.plot.scatter('center_x', 'center_y', c='red', marker='x', s=100, ax=ax)
_ = ax.set_title('Where are the outliers?', fontsize=18)

Again, this is difficult to interpret without normalizing the location of the lungs in each image.

---

What do the outliers look like? <a id='patches'></a>

In [None]:
def get_image(patientId, root_dir='../input/stage_1_train_images/'):
    fn = os.path.join(root_dir, f'{patientId}.dcm')
    dcm_data = pydicom.dcmread(fn)  # dcmread replaces the deprecated read_file
    return dcm_data.pixel_array

def draw_bbs(bbs, ax):
    for bb in bbs.itertuples():
        rect = patches.Rectangle(
            (bb.x, bb.y), bb.width, bb.height,
            linewidth=2, edgecolor='red', facecolor='none')
        ax.add_patch(rect)

def draw_image(img, bbs, ax):
    ax.imshow(img, cmap='gray')
    ax.grid(False)
    ax.set_xticks([])
    ax.set_yticks([])
    if bbs is not None:
        draw_bbs(bbs, ax)

outliers_15 = outliers.drop_duplicates(subset=['patientId']).iloc[:15]
fig, axes = plt.subplots(3, 5)
for row, ax in zip(outliers_15.itertuples(), axes.flatten()):
    img = get_image(row.patientId)
    bbs = stage_1_train_labels.loc[stage_1_train_labels.patientId == row.patientId, ['x', 'y', 'width', 'height']]
    draw_image(img, bbs, ax)
fig.tight_layout(pad=-0.5)

Aha! As feared, there are images that are cropped and off-center. One thing we notice about those is that they have a large quantity of black pixels resulting from the cropping. Let's look into that.

---

What is the distribution of the fraction of black pixels in the images? How many images are made up of more than 10% black pixels?

In [None]:
plt.rcParams['figure.figsize'] = [12, 4]
ax = sns.distplot(stage_1_train_labels.mean_black_pixels)
_ = ax.set_xlabel('Fraction of black pixels')
_ = ax.set_title('Are there images with mostly black pixels?')
# Count unique patients (one image each) rather than label rows
print("Images with more than 10% black pixels: ", 
      stage_1_train_labels.loc[stage_1_train_labels.mean_black_pixels > 0.1, 'patientId'].nunique())

---

What do the images with mostly black pixels look like?

In [None]:
high_black_pixel_patientIds = stage_1_train_labels.loc[stage_1_train_labels.mean_black_pixels > 0.1, 
                                                       'patientId'].drop_duplicates()
fig, axes = plt.subplots(4, 5)
for i, (patient_id, ax) in enumerate(zip(high_black_pixel_patientIds, axes.flatten())):
    row = stage_1_train_labels.loc[stage_1_train_labels.patientId == patient_id]
    img = get_image(row.patientId.iloc[0])
    bbs = row[['x', 'y', 'width', 'height']]
    draw_image(img, bbs, ax)
fig.tight_layout(pad=-1)

---

While we're at it, what do the images with almost no black pixels look like?

In [None]:
high_white_pixel_patientIds = stage_1_train_labels.loc[stage_1_train_labels.mean_black_pixels < 0.000001, 'patientId'].drop_duplicates()
fig, axes = plt.subplots(4, 5)
for patient_id, ax in zip(high_white_pixel_patientIds, axes.flatten()):
    row = stage_1_train_labels.loc[stage_1_train_labels.patientId == patient_id]
    img = get_image(row.patientId.iloc[0])
    bbs = row[['x', 'y', 'width', 'height']]
    draw_image(img, bbs, ax)
fig.tight_layout(pad=-1)

These are generally full-frame images. Some contain pneumonia opacities and others contain other conditions.

---

Can traditional image processing find a bounding box in the cropped images? <a id='skimage'></a>

In [None]:
high_black_pixel_images = np.empty(shape=(high_black_pixel_patientIds.shape[0], 1024, 1024))

for i, patient_id in enumerate(high_black_pixel_patientIds):
    row = stage_1_train_labels.loc[stage_1_train_labels.patientId == patient_id]
    img = get_image(row.patientId.iloc[0])
    high_black_pixel_images[i] = img 
    
high_black_pixel_contours = []
for img in high_black_pixel_images:
    img2 = feature.canny(img != 0)
    img2 = morphology.convex_hull_image(img2)
    c = measure.find_contours(img2, 0)[0]
    c = measure.approximate_polygon(c, 20)
    high_black_pixel_contours.append(c)

fig, axes = plt.subplots(4, 5)
contours = []
for c, img, ax in zip(high_black_pixel_contours, high_black_pixel_images, axes.flatten()):
    draw_image(img, None, ax)
    _ = ax.plot(c[:, 1], c[:, 0], '-b', linewidth=4)
fig.tight_layout(pad=-1)

---

Can the bounding boxes be resized when cropping and resizing the cropped images?

In [None]:
def order_coordinates(coords):
    """Returns coordinates with order:
    (top left, top right, bottom right, bottom left)
    """
    coords = coords[:-1]
    output = np.empty((4, 2), dtype=np.float32)
    dists = coords[:, 1]**2 + coords[:, 0]**2
    ratios = coords[:, 1]/np.sqrt(dists)
    
    tl = coords[np.argmin(dists)]
    br = coords[np.argmax(dists)]
    
    tr = coords[np.argmax(ratios)]
    bl = coords[np.argmin(ratios)]
    
    output[0] = tl
    output[1] = tr
    output[2] = br
    output[3] = bl
    
    return output[:,::-1]

def _convert_bb(bb, tfm):
    x, y, w, h = bb.x, bb.y, bb.width, bb.height
    pts = np.array([
        [x, y],
        [x + w, y],
        [x + w, y + h],
        [x, y + h]
    ])
    new_pts = tfm.inverse(pts)
    pts_min = np.min(new_pts, axis=0)
    pts_max = np.max(new_pts, axis=0)
    
    x, y = pts_min
    w, h = pts_max - pts_min
    
    return np.array([x, y, w, h])

def convert_bbs(bboxs, tfm):
    output = np.empty_like(bboxs, dtype=np.float32)
    
    for i, bb in enumerate(bboxs.itertuples()):
        output[i] = _convert_bb(bb, tfm)
    
    return pd.DataFrame(output, columns=['x', 'y', 'width', 'height'])

fig, axes = plt.subplots(4, 2, figsize=(8, 10))

orig_coords = np.array([[0, 0], [1024, 0], [1024, 1024], [0, 1024]])
interesting_idices = [0, 2, 3, 17]

for i, (ax1, ax2) in zip(interesting_idices, axes):
    patient_id = high_black_pixel_patientIds.iloc[i]
    img = high_black_pixel_images[i]
    contour = high_black_pixel_contours[i]
    
    row = stage_1_train_labels.loc[stage_1_train_labels.patientId == patient_id]
    bbs = row[['x', 'y', 'width', 'height']]
    ordered_coors = order_coordinates(contour)
    tform = transform.estimate_transform('projective', orig_coords, ordered_coors)
    img_t = transform.warp(img, tform, output_shape=(1024, 1024))
    
    new_bbs = convert_bbs(bbs, tform)
    _ = draw_image(img, bbs, ax1)
    _ = draw_image(img_t, new_bbs, ax2)
    
fig.tight_layout(pad=-1)
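
The key step in `_convert_bb` above is to push the four corners of a box through the transform and then take the smallest axis-aligned box that encloses them. `transform.warp` treats the transform as a map from output coordinates back to input coordinates, which is why the code applies `tfm.inverse` to the boxes. The sketch below shows the same corner-mapping idea with a plain NumPy 3x3 matrix instead of a scikit-image transform object; the helper name `transform_box` and the example matrix are our own assumptions for illustration.

```python
import numpy as np


def transform_box(box, matrix):
    """Map an (x, y, width, height) box through a 3x3 homography and return
    the axis-aligned box that encloses the transformed corners."""
    x, y, w, h = box
    corners = np.array([[x, y], [x + w, y], [x + w, y + h], [x, y + h]], dtype=float)
    # Append a column of ones to work in homogeneous coordinates.
    homogeneous = np.hstack([corners, np.ones((4, 1))])
    mapped = homogeneous @ matrix.T
    # Divide by the last coordinate to return to ordinary (x, y) points.
    mapped = mapped[:, :2] / mapped[:, 2:3]
    mins = mapped.min(axis=0)
    maxs = mapped.max(axis=0)
    return np.array([mins[0], mins[1], maxs[0] - mins[0], maxs[1] - mins[1]])


# Hypothetical transform: scale by 2 and shift by (-100, -50).
matrix = np.array([[2.0, 0.0, -100.0],
                   [0.0, 2.0, -50.0],
                   [0.0, 0.0, 1.0]])
new_box = transform_box((200, 150, 100, 80), matrix)
print(new_box)
```

With a true projective transform the mapped corners no longer form a rectangle, so the enclosing box is slightly larger than the warped region it covers.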

---

How are the bounding box aspect ratios distributed?

In [None]:
ax = sns.distplot(stage_1_train_labels['aspect_ratio'].dropna(), norm_hist=True)
_ = ax.set_title("What does the distribution of bounding box aspect ratios look like?")
_ = ax.set_xlabel("Aspect Ratio")

---

What do the images with a high aspect ratio look like?

In [None]:
aspect_ratios = stage_1_train_labels['aspect_ratio'].dropna()
high_aspect_ratio_tr = (stage_1_train_labels.loc[aspect_ratios[aspect_ratios > aspect_ratios.quantile(q=0.99)].index]
                          .drop_duplicates(['patientId']))
fig, axes = plt.subplots(3, 5)
for row, ax in zip(high_aspect_ratio_tr.itertuples(), axes.flatten()):
    img = get_image(row.patientId)
    bbs = stage_1_train_labels.loc[stage_1_train_labels.patientId == row.patientId, ['x', 'y', 'width', 'height']]
    draw_image(img, bbs, ax)
fig.tight_layout(pad=-0.5)

---

Is there a relationship between the bounding box's aspect ratio and area?

In [None]:
g = sns.relplot(x='area', y='aspect_ratio', 
            data=stage_1_train_labels.dropna(subset=['area', 'aspect_ratio']), 
            height=8, alpha=0.8, aspect=1.4,)
_ = g.fig.suptitle("Is there a relationship between the bounding box's aspect ratio and area?", y=1.005)

(return to <a href='#contents'>Table of Contents</a>)

---
---

## <span style="color:darkgreen">Conclusion</span> <a id="conclusion"></a>

So there you have it. We've loaded the data, explored the files included, and examined the images to help us understand our goals and to brainstorm approaches. Where do we go next? Well, there are several other models that have been shared in the kernels.

### Other Kernels  <a id="kernels"></a>

*   [CNN with segmentation](https://www.kaggle.com/jonnedtc/cnn-segmentation-connected-components)
*   [CheXNet](https://www.kaggle.com/ashishpatel26/chexnet-radiologist-level-pneumonia-detection)
*   [NASNet](https://www.kaggle.com/ashishpatel26/beginner-tutorial-nasnet-pneumonia-detection)

You can also check out these excellent Medium articles by NoMonia team members:
* [YOLO Object Detection Walkthrough for the RSNA Pneumonia Detection Challenge](https://medium.com/@hjhuney/yolo-object-detection-walkthrough-for-the-rsna-pneumonia-detection-challenge-123ec9a9adf2) by Jake Huneycutt: a walk-through to get a YOLOv3 model working locally
* [Kaggle RSNA Pneumonia Detection Challenge Explained](https://medium.com/@sebastiannorena/c140b19bf903) by Sebastian Norena: ideas for improving upon the competition-provided starter kernel
* [No-more-moaning…ia: A Journey in Medical Imaging](https://medium.com/@t7jackso/26951f901707) by Robert Jackson: An overview of the competition challenge with discussion on pneumonia, neural networks, and more

### References <a id="refs"></a>

https://www.atsjournals.org/doi/pdf/10.1164/rccm.200208-876OC

(return to <a href='#contents'>Table of Contents</a>)

---
---