In [None]:
import os
import random

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import pydicom

In [None]:
# Get directory names/locations
data_root = os.path.abspath("../input/rsna-intracranial-hemorrhage-detection/")

train_img_root = data_root + "/stage_1_train_images/"
test_img_root  = data_root + "/stage_1_test_images/"

train_labels_path = data_root + "/stage_1_train.csv"
test_labels_path  = data_root + "/stage_1_test.csv"

# List the image file names found in each directory
train_img_paths = os.listdir(train_img_root)
test_img_paths  = os.listdir(test_img_root)

# Dataset size
num_train = len(train_img_paths)
num_test  = len(test_img_paths)

In [None]:
def create_efficient_df(data_path):
    
    # Define the datatypes we're going to use
    final_types = {
        "ID": "str",
        "Label": "float16"
    }
    features = list(final_types.keys())
    
    # Read the CSV in chunks so that machines with less memory can parse it
    # piece by piece instead of in a single call
    df_list = []

    chunksize = 1_000_000

    for df_chunk in pd.read_csv(data_path, dtype=final_types, chunksize=chunksize): 
        df_list.append(df_chunk)
        
    df = pd.concat(df_list)
    # Drop rows that contain NaN or infinite values
    df = df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

    del df_list

    return df

train_labels_df = create_efficient_df(train_labels_path)
train_labels_df[train_labels_df["Label"] > 0].head()

In [None]:
hem_types = [
    "epidural",
    "intraparenchymal",
    "intraventricular",
    "subarachnoid",
    "subdural",
    "any"
]

new_cols = [
    "id",
    "type_0",
    "type_1",
    "type_2",
    "type_3",
    "type_4",
    "type_5"
]

# Every image has one row per label, so rows / labels gives the number of images
num_ids = train_labels_df.shape[0] // len(hem_types)
print("Number of unique image IDs: {}".format(num_ids))

empty_array = np.ones((num_ids, len(new_cols)))
hem_df = pd.DataFrame(data=empty_array, columns=new_cols)

# Fill in the ID of each image
hem_df["id"] = list(train_labels_df.iloc[::len(hem_types)]["ID"].str.split(pat="_").str[1])
    
# Fill in the categorical columns of each image
for hem_ix, hem_col in enumerate(list(hem_df)[1:]):
    hem_df[hem_col] = list(train_labels_df.iloc[hem_ix::len(hem_types), 1])
    
hem_df.info()
hem_df[hem_df["type_5"] > 0].head()

In [None]:
def show_random_img(df, img_root):
    
    # randrange excludes the upper bound, so the index is always valid
    random_ix = random.randrange(df.shape[0])
    random_record = df.iloc[random_ix, :]
    random_id = random_record.iloc[0]
    random_path = img_root + "ID_" + random_id + ".dcm"
    
    # Each record is a single image; positions 1-5 hold the five subtype labels
    title = "Image ID_{}\nEpidural: {}\nIntraparenchymal: {}\nIntraventricular: {}\nSubarachnoid: {}\nSubdural: {}"\
        .format(random_id, *random_record.iloc[1:6])
    
    dicom = pydicom.dcmread(random_path)
    img_array = dicom.pixel_array
    
    plt.imshow(img_array)
    plt.axis("off")
    plt.title(title)

# Intracranial Hemorrhage Introduction

Hi there! My name is Mick and I'm a Biomedical Engineer. I have some industry experience working with stroke patients, as well as working on some aortic intervention medical devices, so I thought I would write a little introduction to the anatomy behind the RSNA Intracranial Hemorrhage dataset here on Kaggle. Without further ado, I'll jump right into everything that I think might be worth knowing to get people started on this dataset!

### Hemorrhage Background

<img src="https://5.imimg.com/data5/RH/SW/GLADMIN-58865352/brain-hemorrhage-treatment-service-500x500.png" width="500">

An intracranial hemorrhage occurs when blood gets somewhere in the skull that it's not supposed to be. Hemorrhages are a serious medical issue and can lead to death if not treated quickly. Some of the most common causes of intracranial hemorrhage are hypertension, stroke, and traumatic brain injury. Hemorrhages are dangerous not only because they result from a bleeding injury within the skull, but also because they produce *hematomas*. A hematoma is a localized collection of blood outside the blood vessels (the term literally means "blood mass"), and inside the rigid skull it can put intense pressure on the surrounding tissue. When pressure increases on a region of the brain, that region can become *ischemic*. Ischemia is a shortage of blood supply that starves tissue of oxygen and, if it persists, leads to the death of that tissue; it most often comes up in the context of the brain or heart. For instance, an ischemic stroke is one in which blood is blocked from reaching a region of the brain, and that region becomes ischemic and dies. The other type of stroke is *hemorrhagic* and results when blood bursts from a vessel and enters regions of the brain where it should not be. Heart attacks involve a similar loss of blood supply, which is why some doctors have pushed to refer to strokes as "brain attacks". Of the two types of stroke, hemorrhagic is generally the more dangerous and difficult to treat, which gives us an idea of why analysis of hemorrhagic images is crucial in medicine.

### Keeping the Brain in Place

One of the most important anatomical relationships to understand for this dataset is that of the cranial meninges. These are the layers that protect the human brain from interacting with the skull or outside environment. They are crucial for keeping the brain seated safely within the skull while also providing functional support to the brain via blood flow. The diagram below shows the three meninges and should give a good orientation for the anatomy we're attempting to understand in this dataset.

For the purposes of this dataset we are trying to classify five different types of intracranial hemorrhage. Hemorrhages are typically classified by the region of the brain in which they occur:
 * Epidural
 * Subdural
 * Intraparenchymal
 * Intraventricular
 * Subarachnoid

<img src="https://upload.wikimedia.org/wikipedia/commons/8/8e/Meninges-en.svg">

## Dataset Exploration

Now that we know a bit about hemorrhages generally, let's take a look at our image dataset. I'd like to find the distribution of each type of hemorrhage and see if some occur more often than others within the RSNA dataset.
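Before counting anything, the labels need to be in one-row-per-image form. The reshaping code earlier in this notebook relies on every image having exactly six consecutive rows in the CSV. As a hedged alternative, here is a minimal sketch that does not depend on row order. It assumes the `ID_<hash>_<type>` ID format seen in the label file; the toy data is made up, and `labels_to_wide`, `image_id`, and `hem_type` are names invented for this illustration.

```python
import pandas as pd

# Tiny made-up stand-in for the label CSV: one row per (image, hemorrhage type).
long_df = pd.DataFrame({
    "ID": [
        "ID_aaa_epidural", "ID_aaa_subdural", "ID_aaa_any",
        "ID_bbb_epidural", "ID_bbb_subdural", "ID_bbb_any",
        "ID_bbb_any",  # deliberate duplicate row, to show it is handled
    ],
    "Label": [0, 1, 1, 0, 0, 0, 0],
})

def labels_to_wide(df):
    """Reshape long-format labels into one row per image, one column per type."""
    # Assumption: repeated IDs are exact duplicates, so keeping the first is safe.
    df = df.drop_duplicates(subset="ID")
    # Split "ID_<hash>_<type>" on the last underscore only.
    parts = df["ID"].str.rsplit("_", n=1, expand=True)
    tidy = pd.DataFrame({
        "image_id": parts[0].to_numpy(),
        "hem_type": parts[1].to_numpy(),
        "label": df["Label"].to_numpy(),
    })
    return tidy.pivot(index="image_id", columns="hem_type", values="label")

wide_df = labels_to_wide(long_df)
print(wide_df)
```

The pivot raises an error if an (image, type) pair still appears twice after de-duplication, which is a useful safety check that the stride-based slicing does not give you.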

This dataset is large, so it is tempting to read it as a scaled-down picture of all hemorrhage cases. For instance, if 10% of the hemorrhages in this dataset are epidural, does that mean that 10% of all hemorrhages are epidural?
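To make the question concrete, it helps to separate two numbers: the share of *all* images that show a given type, and the share of *hemorrhage-positive* images that show it. Here is a small sketch on made-up labels (the rows and the `prevalence` helper are invented for illustration, not taken from the dataset). Note that because one image can carry several labels, the conditional shares can add up to more than 100%.

```python
# Made-up multi-label rows: (epidural, subdural, any) flags for six images.
toy_labels = [
    (0, 0, 0),
    (0, 0, 0),
    (0, 0, 0),
    (1, 0, 1),
    (0, 1, 1),
    (1, 1, 1),  # one image can show more than one hemorrhage type
]

def prevalence(rows, type_ix, any_ix=2):
    """Return (share of all images, share of hemorrhage-positive images)."""
    n_total = len(rows)
    n_any = sum(1 for row in rows if row[any_ix])
    n_type = sum(1 for row in rows if row[type_ix])
    return n_type / n_total, n_type / n_any

p_epi, p_epi_given_any = prevalence(toy_labels, type_ix=0)
p_sub, p_sub_given_any = prevalence(toy_labels, type_ix=1)
print(f"p(epidural) = {p_epi:.2f}, p(epidural | any) = {p_epi_given_any:.2f}")
print(f"p(subdural) = {p_sub:.2f}, p(subdural | any) = {p_sub_given_any:.2f}")
```

Either number only describes this particular collection of images. A competition dataset may well have been sampled differently from what a hospital sees day to day, so I would treat these shares as properties of the dataset rather than of hemorrhages in general.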

In [None]:
hem_counts = hem_df[new_cols[1:]].astype(bool).sum(axis=0)
# The last column ("type_5") is the "any" label; .iloc makes the positional access explicit
num_hem = hem_counts.iloc[-1]
hem_portions = [hem_counts.iloc[hem_type] / num_hem for hem_type in range(6)]
hem_total_portions = [hem_counts.iloc[hem_type] / num_train for hem_type in range(6)]

# What percent chance is there that an image shows a hemorrhage at all?
print("\nProbability that an image in the dataset shows any of the 5 hemorrhage types:")
print("Number of images with hemorrhage:  {}".format(num_hem))
print("Total number of images:            {}".format(num_train))
print("p(hemorrhage): %.2f%%" % (num_hem / num_train * 100))

# Given that an image shows a hemorrhage, what is the percent chance that it is each type of hemorrhage?
print("\nNumber of each type of hemorrhage found in dataset and share of total hemorrhages [p(hem_type | hemorrhage)]: ")
print("%21s | %7s | %9s" % ("hemorrhage type", "count", "portion"))
for hem_type in range(5):
    print("%21s | %7d | %8.2f%%" % (hem_types[hem_type], hem_counts.iloc[hem_type], hem_portions[hem_type] * 100))
    
# What is the chance of each type of hemorrhage?
print("\nNumber of each type of hemorrhage found in dataset and share of the raw total [p(hem_type)]: ")
print("%21s | %7s | %9s" % ("hemorrhage type", "count", "portion"))
for hem_type in range(5):
    print("%21s | %7d | %8.2f%%" % (hem_types[hem_type], hem_counts.iloc[hem_type], hem_total_portions[hem_type] * 100))

# Epidural/Subdural Hemorrhage

<img src="https://www.topneurodocs.com/wp-content/uploads/2017/10/Epidural-Hematoma.jpg">

**Epidural**

An epidural hemorrhage is one that takes place outside of the *dura mater*. This membrane is the outermost of the meninges and is a vital component for homeostasis within the cranium. Injury to the *dura mater* can lead to many clinical dysfunctions because of its important structural role in maintaining the brain's position. Its structural durability gave it its name (*dura mater* in Latin literally means "tough mother"!). This tissue barrier is critical for the transportation of blood to the brain. Unfortunately, because of the precarious nature of the blood vessels supplying blood to the dura mater and removing blood from the brain, the membrane can be quite susceptible to tears and lacerations when traumatic injury occurs. Tears in the *dura mater*, or pressure forming on it, have been likened to an amplified brain freeze that never goes away.

<img src="https://image.slidesharecdn.com/meninges-181111114745/95/meninges-13-638.jpg?cb=1541936925">

One such risk associated with the *dura mater* is a hemorrhage occurring within the brain and causing pressure on the membrane. **Epidural hemorrhages** are those that occur between the *dura mater* and the skull.

**Subdural**

Subdural hemorrhages are hemorrhages that occur between the *dura mater* and the *arachnoid mater*. Because of the *dura mater*'s two-fold nature, one side of it is responsible for carrying arterial blood (to the brain) and one side is responsible for carrying venous blood (away from the brain). Hemorrhages that involve these two different types of blood can have different effects both on patient health and on imaging output. Venous blood is low in oxygen since it has already delivered oxygen to the tissue it supplies. This deoxygenation can make venous blood more difficult to see in some imaging schemes that rely on oxygen concentration or the physical properties of oxygen.

Subdural hemorrhages, as seen in the graphic above, release venous blood, so they will have a different oxygen concentration than epidural hemorrhages. Since subdural hemorrhages occur below the *dura mater*, they are more typically associated with internal physiological damage, as opposed to epidural hemorrhages, which can result from traumatic injury.

### Visualizing Some Epidural and Subdural Hemorrhages

In [None]:
# Set the certainty that we want to use to visualize each of our types of hemorrhage
CERTAINTY = 0.95

In [None]:
epi_df = hem_df[hem_df["type_0"] > CERTAINTY]
show_random_img(epi_df, train_img_root)

In [None]:
sub_df = hem_df[hem_df["type_4"] > CERTAINTY]
show_random_img(sub_df, train_img_root)

# Intraparenchymal Hemorrhage

<img src="https://i.pinimg.com/originals/d9/16/0b/d9160b20c419ac1a4a1744501fa7bf8d.jpg">

Intraparenchymal hemorrhages are those that occur within the *parenchyma* of the brain (we're getting a nice refresher on some classical roots here!). *intra-* means within, so we know that we are within some part of the brain, but what is the *parenchyma*? Well, it turns out the Greek-root definition is a bit less clear than we'd like. In Greek, *παρά* means "beside" and *ἔγχυμα* means "something poured in", so we've got a hemorrhage in a region of the brain whose name means roughly "something poured in beside". That's not very specific! In anatomy, the *parenchyma* refers to the bulk functional tissue of an organ, so most of what we would describe as the brain is the *parenchyma* of the brain.

The *parenchyma* contains the functional units of the brain, mainly the neurons and the glial cells that maintain and assist them. That means that an intraparenchymal hemorrhage is quite serious! These hemorrhages typically occur when a blood vessel bursts due to hypertension, or from the formation and rupture of an aneurysm (think of the artery like a balloon that expands and bursts after a certain amount of pressure).

<img src="https://www.mayoclinic.org/-/media/kcms/gbs/patient-consumer/images/2018/02/22/17/14/10_13_17_brain-aneurysm-rerun_app_thumbnail_640x440_infographic_nocta.jpg" width="200">

### Visualizing Some Intraparenchymal Hemorrhages

In [None]:
iph_df = hem_df[hem_df["type_1"] > CERTAINTY]
show_random_img(iph_df, train_img_root)

# Intraventricular Hemorrhage

<img src="https://www.abclawcenters.com/wp-content/uploads/2016/08/Brain-Hemorrhage-in-Infant.jpg" width="400">

An intraventricular hemorrhage is one that takes place within the ventricles of the brain. We mostly picture the brain as a fleshy ball of *parenchyma*, but an important anatomical feature of the brain is its negative space, the *ventricles* (from the Latin *venter*, meaning belly or womb). Since a ventricle is by definition an absence of tissue, the ventricles don't make up a physical unit within the brain. They are nevertheless extremely useful when diagnosing disorders of the brain, and they play a crucial role in the brain's physiology. An enlarged or shrunken ventricle is often a sign of a serious neurological condition and is quite easy to spot with MRI or CT scanning. The ventricles also serve as a set of landmarks: it is often useful to know where in the brain one is looking when viewing an image, and the ventricles can be used to orient oneself.

<img src="https://img.medscapestatic.com/pi/meds/ckb/24/12424tn.jpg">

The other purpose of the ventricles, not directly related to imaging, is the production of cerebrospinal fluid. CSF, as it is commonly abbreviated, serves many purposes in the healthy functioning of the human brain. It holds the brain in suspension: without it, the dura mater would be heavily strained by the weight of the brain. CSF allows the brain to remain buoyant and reduces stress on the meninges that hold it in place. The other main purpose of CSF is to deliver nutrients to the brain. If the flow of CSF is contaminated or interrupted, it can spell big problems for a patient.

Further, if hemorrhaging is so intense that there is bleeding within the ventricles, then the diagnosis is more serious than for other types of hemorrhage. Intraventricular hemorrhage usually accompanies an intense brain injury and usually occurs alongside other types of hemorrhage.

### Visualizing Some Intraventricular Hemorrhages

In [None]:
ivh_df = hem_df[hem_df["type_2"] > CERTAINTY]
show_random_img(ivh_df, train_img_root)

# Subarachnoid Hemorrhage

<img src="https://www.abclawcenters.com/wp-content/uploads/2017/12/Subarachnoid-hemorrhage.jpg" width="600">

Finally, a subarachnoid hemorrhage is one that occurs when blood enters the subarachnoid space, which lies below the *arachnoid mater* (between it and the *pia mater*). Subarachnoid hemorrhages present many of the same dangers as the other hemorrhage types we've discussed.

<img src="https://images.squarespace-cdn.com/content/v1/52ec8c1ae4b047ccc14d6f29/1466477429475-04CK480O2L6BZLZYYL2K/ke17ZwdGBToddI8pDm48kCFhql1qpbPINhl0TdZ0VLAUqsxRUqqbr1mOJYKfIPR7znJwASjrmwtS7aL_aQryCcm4T3hP7tXlp0Gk8RZ7lKFCRW4BPu10St3TBAUQYVKc_7AFIaSsJWjdIlc-Ia4oWsVN-veugZlBYs-W2_2BR9uMhz-Np6j6LToRdZYF-Jvb/subarachnoid-space.jpg" width="400">

### Visualizing Some Subarachnoid Hemorrhages

In [None]:
sah_df = hem_df[hem_df["type_3"] > CERTAINTY]
show_random_img(sah_df, train_img_root)

# Conclusion

Well, that's about it! This dataset provides images that contain these five types of hemorrhage and a labelling system that tells you which types each image contains. The images are in DICOM format, which is common for MRI or, in this instance, CT imagery. In other notebooks I'll attempt some segmentation and analysis of these images, but in general this notebook should give a good overview of the anatomical considerations needed to get people started on this dataset.
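As a starting point for that follow-up work, one thing worth knowing about CT DICOM files is that the stored pixel values are usually rescaled to Hounsfield units (HU) and then "windowed" before viewing, so that soft tissue and blood are not washed out by bone and air. Here is a minimal sketch of that idea on a made-up array. The default slope and intercept are assumptions for illustration only; in a real file they should come from the DICOM `RescaleSlope` and `RescaleIntercept` attributes. The window of center 40 and width 80 is a commonly quoted brain window, not something taken from this dataset, and the function names are my own.

```python
import numpy as np

def to_hounsfield(pixels, slope=1.0, intercept=-1024.0):
    """Apply the linear DICOM rescale: HU = pixel * slope + intercept."""
    # Assumed defaults; read the real values from the file's metadata.
    return pixels.astype(np.float32) * slope + intercept

def apply_window(hu, center=40.0, width=80.0):
    """Clip HU values to a window and scale the result to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    return (np.clip(hu, low, high) - low) / (high - low)

# Made-up 2x2 "scan" of raw stored values, not real data.
raw = np.array([[0, 1024], [1064, 3000]], dtype=np.int16)
windowed = apply_window(to_hounsfield(raw))
print(windowed)
```

With a real scan, the same two functions could be applied to the `pixel_array` read by `pydicom`, and the result shown with `plt.imshow(windowed, cmap="gray")`.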