## Overview
Intracranial hemorrhage, bleeding that occurs inside the cranium, is a serious health problem requiring rapid and often intensive medical treatment. For example, intracranial hemorrhages account for approximately 10% of strokes in the U.S., where stroke is the fifth-leading cause of death. Identifying the location and type of any hemorrhage present is a critical step in treating the patient.

Diagnosis requires an urgent procedure. When a patient shows acute neurological symptoms such as severe headache or loss of consciousness, highly trained specialists review medical images of the patient’s cranium to look for the presence, location and type of hemorrhage. The process is complicated and often time consuming.

## Collaborators
Four research institutions provided large volumes of de-identified CT studies that were assembled to create the challenge dataset: Stanford University, Thomas Jefferson University, Unity Health Toronto and Universidade Federal de São Paulo (UNIFESP). The American Society of Neuroradiology (ASNR) organized a cadre of more than 60 volunteers to label over 25,000 exams for the challenge dataset. ASNR is the world’s leading organization for the future of neuroradiology, representing more than 5,300 radiologists, researchers, interventionalists, and imaging scientists. MD.ai provided tooling and support for the data annotation process.

The RSNA is an international society of radiologists, medical physicists and other medical professionals with more than 54,000 members from 146 countries across the globe. They see the potential for AI to assist in detection and classification of hemorrhages in order to prioritize and expedite their clinical work.

## Hemorrhage Types
Hemorrhage in the head (intracranial hemorrhage) is a relatively common condition with many causes, including trauma, stroke, aneurysm, vascular malformations, high blood pressure, illicit drugs and blood clotting disorders. The neurologic consequences vary widely, from headache to death, depending on the size, type and location of the hemorrhage. The role of the radiologist is to detect the hemorrhage, characterize its subtype and size, and determine whether it might be jeopardizing critical areas of the brain in a way that requires immediate surgery.
While all acute (i.e. new) hemorrhages appear dense (i.e. white) on computed tomography (CT), the primary imaging features that help Radiologists determine the subtype of hemorrhage are the location, shape and proximity to other structures (see table).
Intraparenchymal hemorrhage is blood that is located completely within the brain itself; intraventricular or subarachnoid hemorrhage is blood that has leaked into the spaces of the brain that normally contain cerebrospinal fluid (the ventricles or subarachnoid cisterns). Extra-axial hemorrhages are blood that collects in the tissue coverings that surround the brain (e.g. subdural or epidural subtypes). (See figure.) Patients may exhibit more than one type of cerebral hemorrhage, which may appear on the same image. While small hemorrhages are typically less morbid than large hemorrhages, even a small hemorrhage can lead to death because it is an indicator of another type of serious abnormality (e.g. cerebral aneurysm).

## Data Description
The training data is provided as a set of image Ids and multiple labels, one for each of five sub-types of hemorrhage, plus an additional label for any, which should always be true if any of the sub-type labels is true.
There is also a target column, Label, indicating the probability of whether that type of hemorrhage exists in the indicated image.
There will be 6 rows per image Id. The label indicated by a particular row will look like [Image Id]_[Sub-type Name], as follows:

Id,Label
1_epidural_hemorrhage,0
1_intraparenchymal_hemorrhage,0
1_intraventricular_hemorrhage,0
1_subarachnoid_hemorrhage,0.6
1_subdural_hemorrhage,0
1_any,0.9
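
As a rough illustration of the row format described above, the following sketch groups rows by image Id and checks the rule that `any` must be positive whenever a sub-type is positive. It uses the sample rows shown above; the helper names `group_by_image` and `any_is_consistent` are hypothetical and not part of the competition materials, and the consistency check assumes binary (0/1) training labels.

```python
from collections import defaultdict

# Sample rows copied from the description above: (Id, Label).
SAMPLE_ROWS = [
    ("1_epidural_hemorrhage", 0.0),
    ("1_intraparenchymal_hemorrhage", 0.0),
    ("1_intraventricular_hemorrhage", 0.0),
    ("1_subarachnoid_hemorrhage", 0.6),
    ("1_subdural_hemorrhage", 0.0),
    ("1_any", 0.9),
]


def group_by_image(rows):
    """Collect the six per-image rows into {image_id: {subtype: label}}."""
    grouped = defaultdict(dict)
    for row_id, label in rows:
        # Split on the first underscore only: "[Image Id]_[Sub-type Name]".
        image_id, subtype = row_id.split("_", 1)
        grouped[image_id][subtype] = label
    return dict(grouped)


def any_is_consistent(labels):
    """For binary labels: 'any' must be 1 whenever some sub-type is 1."""
    subtype_positive = any(v == 1 for k, v in labels.items() if k != "any")
    return labels["any"] == 1 or not subtype_positive


per_image = group_by_image(SAMPLE_ROWS)
print(per_image["1"])
```

Note that the Ids in the actual CSV handled later in this notebook contain two underscores, so the notebook splits them into three parts rather than two.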

## DICOM Images
All provided images are in DICOM format. DICOM images contain associated metadata. This will include PatientID, StudyInstanceUID, SeriesInstanceUID, and other features. You will notice some PatientIDs represented in both the stage 1 train and test sets. This is known and intentional. However, there will be no crossover of PatientIDs into stage 2 test. Additionally, per the rules, “Submission predictions must be based entirely on the pixel data in the provided datasets.” Therefore, you should not expect to use or gain advantage by use of this crossover in stage 1.

For this dataset we need to predict whether a hemorrhage exists in a given image, and what type it is.

## Files
1. stage_1_train.csv - the training set. Contains Ids and target information.
2. stage_1_train_images
3. stage_1_test_images

## Begin Here...

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import scipy as sp

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
# for dirname, _, filenames in os.walk('/kaggle/input'):
#     for filename in filenames:
#         print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

## Load Data

In [2]:
train_data = pd.read_csv('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train.csv')
print(train_data.head(10))

FileNotFoundError: [Errno 2] File b'/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train.csv' does not exist: b'/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train.csv'

### Splitting Data

In [3]:
splitData = train_data['ID'].str.split('_', expand = True)
train_data['class'] = splitData[2]
train_data['fileName'] = splitData[0] + '_' + splitData[1]
train_data = train_data.drop(columns=['ID'],axis=1)
del splitData
print(train_data.head(10))

NameError: name 'train_data' is not defined

In [4]:
pivot_train_data = train_data[['Label', 'fileName', 'class']].drop_duplicates().pivot_table(index = 'fileName',columns=['class'], values='Label')
pivot_train_data = pd.DataFrame(pivot_train_data.to_records())
print(pivot_train_data.head(10))

NameError: name 'train_data' is not defined
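
Because the cells above could not run without the competition data, here is a small self-contained sketch of the same two steps (splitting the `ID` column, then pivoting from one row per label to one row per image) on a made-up table. The Ids `ID_aaa` and `ID_bbb` and their labels are invented for illustration only.

```python
import pandas as pd

# Toy stand-in for stage_1_train.csv: one row per (image, class) pair.
long_df = pd.DataFrame({
    "ID": ["ID_aaa_any", "ID_aaa_epidural", "ID_bbb_any", "ID_bbb_epidural"],
    "Label": [1, 1, 0, 0],
})

# "ID_aaa_epidural" -> ["ID", "aaa", "epidural"]
parts = long_df["ID"].str.split("_", expand=True)
long_df["fileName"] = parts[0] + "_" + parts[1]
long_df["class"] = parts[2]

# One row per image, one column per class.
# pivot_table aggregates with the mean by default, so labels come back as floats.
wide_df = (
    long_df.pivot_table(index="fileName", columns="class", values="Label")
    .reset_index()
)
wide_df.columns.name = None
print(wide_df)
```

Since `pivot_table` averages by default, conflicting duplicate rows for the same image and class would be silently averaged, which is one reason to drop duplicates first as the notebook does.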

## Data Analysis

In [5]:
import matplotlib.image as pltimg
import pydicom

fig = plt.figure(figsize = (20,10))
rows = 5
columns = 5
trainImages = os.listdir('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/')
for i in range(rows*columns):
    ds = pydicom.dcmread('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/' + trainImages[i*100+1])
    fig.add_subplot(rows, columns, i+1)
    plt.imshow(ds.pixel_array, cmap=plt.cm.bone)

FileNotFoundError: [Errno 2] No such file or directory: '/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/'

<Figure size 1440x720 with 0 Axes>

In [6]:
colsToPlot = ['any','epidural','intraparenchymal','intraventricular','subarachnoid','subdural']
rows = 5
columns = 5
for i_col in colsToPlot:
    fig = plt.figure(figsize = (20,10))
    trainImages = list(pivot_train_data.loc[pivot_train_data[i_col]==1,'fileName'])
    fig.suptitle(i_col + ' Images')  # figure-level title shared by all subplots
    for i in range(rows*columns):
        ds = pydicom.dcmread('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/' + trainImages[i*100+1] +'.dcm')
        fig.add_subplot(rows, columns, i+1)
        plt.imshow(ds.pixel_array, cmap=plt.cm.bone)

NameError: name 'pivot_train_data' is not defined

<Figure size 1440x720 with 0 Axes>

In [7]:
for i_col in colsToPlot:
    plt.figure()
    ax = sns.countplot(x=pivot_train_data[i_col])
    ax.set_title(i_col + ' class count')

NameError: name 'pivot_train_data' is not defined

<Figure size 432x288 with 0 Axes>

In [8]:
# dropping of corrupted image from dataset
pivot_train_data = pivot_train_data[pivot_train_data['fileName'] != 'ID_6431af929']

NameError: name 'pivot_train_data' is not defined

## Training Dataset

In [9]:
import keras
from keras.layers import Dense, Activation,Dropout,Conv2D,MaxPooling2D,Flatten,Input,BatchNormalization,AveragePooling2D,LeakyReLU,ZeroPadding2D,Add
from keras.models import Sequential, Model
from keras.initializers import glorot_uniform
from tqdm import tqdm
from sklearn.model_selection import train_test_split
import cv2

pivot_train_data = pivot_train_data.sample(frac=1).reset_index(drop=True)
train_df,val_df = train_test_split(pivot_train_data,test_size = 0.03, random_state = 42)
batch_size = 64

Using TensorFlow backend.


NameError: name 'pivot_train_data' is not defined

In [10]:
y_train = train_df[['any','epidural','intraparenchymal','intraventricular','subarachnoid','subdural']]
y_val = val_df[['any','epidural','intraparenchymal','intraventricular','subarachnoid','subdural']]
train_files = list(train_df['fileName'])

def readDCMFile(fileName):
    ds = pydicom.dcmread(fileName) # read dicom image
    img = ds.pixel_array # get image array
    img = cv2.resize(img, (64, 64), interpolation = cv2.INTER_AREA) 
    return img

def generateImageData(train_files,y_train):
    numBatches = int(np.ceil(len(train_files)/batch_size))
    while True:
        for i in range(numBatches):
            batchFiles = train_files[i*batch_size : (i+1)*batch_size]
            x_batch_data = np.array([readDCMFile('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/' + i_f +'.dcm') for i_f in tqdm(batchFiles)])
            y_batch_data = y_train[i*batch_size : (i+1)*batch_size]
            x_batch_data = np.reshape(x_batch_data,(x_batch_data.shape[0],x_batch_data.shape[1],x_batch_data.shape[2],1))            
            yield x_batch_data,y_batch_data
            
def generateTestImageData(test_files):
    numBatches = int(np.ceil(len(test_files)/batch_size))
    while True:
        for i in range(numBatches):
            batchFiles = test_files[i*batch_size : (i+1)*batch_size]
            x_batch_data = np.array([readDCMFile('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_test_images/' + i_f +'.dcm') for i_f in tqdm(batchFiles)])
            x_batch_data = np.reshape(x_batch_data,(x_batch_data.shape[0],x_batch_data.shape[1],x_batch_data.shape[2],1))
            yield x_batch_data

NameError: name 'train_df' is not defined
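
`readDCMFile` above passes the stored `pixel_array` values on unchanged. CT data is commonly converted to Hounsfield units with the DICOM rescale slope and intercept and then clipped to a display window before viewing or training. The sketch below shows one possible way to do this in NumPy. It is not part of the original notebook, and the slope, intercept and window values used here (centre 40, width 80, a commonly used brain window) are assumptions chosen for illustration; with pydicom the slope and intercept would normally be read from `ds.RescaleSlope` and `ds.RescaleIntercept`.

```python
import numpy as np


def apply_window(pixels, slope, intercept, center, width):
    """Convert stored values to Hounsfield units and scale a window to [0, 1]."""
    hu = pixels.astype(np.float32) * slope + intercept
    low = center - width / 2.0
    high = center + width / 2.0
    # Values outside the window saturate at 0 (dark) or 1 (bright).
    return (np.clip(hu, low, high) - low) / (high - low)


# Toy 2x2 "image" of stored values (assumed slope 1, intercept -1024).
raw = np.array([[0, 1024], [1064, 1104]], dtype=np.int16)
windowed = apply_window(raw, slope=1.0, intercept=-1024.0, center=40.0, width=80.0)
print(windowed)
```

With these assumed settings everything at or below 0 HU maps to 0 and everything at or above 80 HU maps to 1, so the available contrast is spent on the soft-tissue range rather than on air and bone.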

In [11]:
dataGenerator = generateImageData(train_files,train_df[colsToPlot])
val_files = list(val_df['fileName'])
x_val = np.array([readDCMFile('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_train_images/' + i_f +'.dcm') for i_f in tqdm(val_files)])

NameError: name 'generateImageData' is not defined
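
The batching logic of `generateImageData` can be checked without any DICOM files. The following sketch reproduces the same idea (a ceiling division for the number of batches, an endless loop over the data, and a trailing channel axis) on in-memory arrays of zeros. The names `batch_bounds` and `generate_batches` are hypothetical stand-ins, not functions from the notebook.

```python
import numpy as np


def batch_bounds(n_items, batch_size):
    """Yield (start, stop) index pairs; the last batch may be smaller."""
    n_batches = int(np.ceil(n_items / batch_size))
    for i in range(n_batches):
        yield i * batch_size, min((i + 1) * batch_size, n_items)


def generate_batches(images, labels, batch_size):
    """Yield (x, y) batches forever, adding a channel axis to x."""
    while True:
        for start, stop in batch_bounds(len(images), batch_size):
            x = images[start:stop, :, :, np.newaxis].astype(np.float32)
            y = labels[start:stop]
            yield x, y


# Ten fake 64x64 images with six labels each, in batches of four.
images = np.zeros((10, 64, 64))
labels = np.zeros((10, 6))
gen = generate_batches(images, labels, batch_size=4)
shapes = [next(gen)[0].shape for _ in range(4)]
print(shapes)
```

The fourth batch wraps around to the start of the data, which is why a finite `steps_per_epoch` has to be passed when training from a generator like this.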

In [12]:
y_val = val_df[colsToPlot]

NameError: name 'val_df' is not defined

### Loss Function Definition

In [13]:
# loss function definition courtesy https://www.kaggle.com/akensert/resnet50-keras-baseline-model
from keras import backend as K
def logloss(y_true,y_pred):      
    eps = K.epsilon()
    class_weights = np.array([2., 1., 1., 1., 1., 1.])
    y_pred = K.clip(y_pred, eps, 1.0-eps)

    #compute logloss function (vectorised)  
    out = -( y_true *K.log(y_pred)*class_weights
            + (1.0 - y_true) * K.log(1.0 - y_pred)*class_weights)
    return K.mean(out, axis=-1)

def _normalized_weighted_average(arr, weights=None):
    """
    A simple Keras implementation that mimics numpy.average(),
    specifically for this competition
    """
    
    if weights is not None:
        scl = K.sum(weights)
        weights = K.expand_dims(weights, axis=1)
        return K.sum(K.dot(arr, weights), axis=1) / scl
    return K.mean(arr, axis=1)

def weighted_loss(y_true, y_pred):
    """
    Will be used as the metric in model.compile()
    ---------------------------------------------
    
    Similar to the custom loss function 'logloss()' above
    but with normalized weights, which should be very similar 
    to the official competition metric:
        https://www.kaggle.com/kambarakun/lb-probe-weights-n-of-positives-scoring
    and hence:
        sklearn.metrics.log_loss with sample weights
    """      
    
    eps = K.epsilon()
    class_weights = K.variable([2., 1., 1., 1., 1., 1.])
    y_pred = K.clip(y_pred, eps, 1.0-eps)
    loss = -(y_true*K.log(y_pred)
            + (1.0 - y_true) * K.log(1.0 - y_pred))
    loss_samples = _normalized_weighted_average(loss,class_weights)
    return K.mean(loss_samples)
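
To sanity-check the metric outside Keras, the normalized weighted log loss computed by `weighted_loss` can be written in plain NumPy. This is a minimal sketch assuming the same class weights and label order (`any` first) as the cell above; the function name `weighted_log_loss_np` and the example inputs are made up for illustration.

```python
import numpy as np

# 'any' is weighted twice as heavily as each of the five sub-types.
CLASS_WEIGHTS = np.array([2.0, 1.0, 1.0, 1.0, 1.0, 1.0])


def weighted_log_loss_np(y_true, y_pred, weights=CLASS_WEIGHTS, eps=1e-7):
    """Per-label binary log loss, weight-averaged per sample, then averaged."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0 - eps)
    per_label = -(y_true * np.log(y_pred)
                  + (1.0 - y_true) * np.log(1.0 - y_pred))
    per_sample = np.average(per_label, axis=1, weights=weights)
    return float(per_sample.mean())


y_true = [[1, 0, 0, 0, 0, 1]]
uniform = weighted_log_loss_np(y_true, [[0.5] * 6])
confident = weighted_log_loss_np(y_true, [[0.9, 0.1, 0.1, 0.1, 0.1, 0.9]])

# The same mistake costs more on 'any' than on a single sub-type.
miss_any = weighted_log_loss_np([[1, 0, 0, 0, 0, 0]], [[0.1] * 6])
miss_sub = weighted_log_loss_np([[0, 1, 0, 0, 0, 0]], [[0.1] * 6])
print(uniform, confident, miss_any, miss_sub)
```

Predicting 0.5 everywhere gives a loss of ln 2 regardless of the weights, which is a convenient baseline when reading validation scores.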

### Defining Convolutional and Identity Block

In [14]:
def convolutionBlock(X,f,filters,stage,block,s):
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    
    X_shortcut = X
    F1,F2,F3 = filters
    X = Conv2D(filters = F1, kernel_size = (1,1),strides = s, padding = 'valid',name = conv_name_base + '2a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'2a')(X)
    X = Activation('relu')(X)
    
    X = Conv2D(filters = F2, kernel_size = (f,f),strides = 1, padding = 'same',name = conv_name_base + '2b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'2b')(X)
    X = Activation('relu')(X)
    
    X = Conv2D(filters = F3, kernel_size = (1,1),strides = 1, padding = 'valid',name = conv_name_base + '2c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'2c')(X)

    X_shortcut = Conv2D(filters = F3, kernel_size = (1,1),strides = s, padding = 'valid',name = conv_name_base + '1',
               kernel_initializer=glorot_uniform(seed=0))(X_shortcut)
    X_shortcut = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'1')(X_shortcut)
    
    X = Add()([X,X_shortcut])
    X = Activation('relu')(X)
    
    return X

def identityBlock(X,f,filters,stage,block):
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    
    X_shortcut = X
    F1,F2,F3 = filters
    X = Conv2D(filters = F1, kernel_size = (1,1),strides = 1, padding = 'valid',name = conv_name_base + '2a',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'2a')(X)
    X = Activation('relu')(X)
    
    X = Conv2D(filters = F2, kernel_size = (f,f),strides = 1, padding = 'same',name = conv_name_base + '2b',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'2b')(X)
    X = Activation('relu')(X)
    
    X = Conv2D(filters = F3, kernel_size = (1,1),strides = 1, padding = 'valid',name = conv_name_base + '2c',
               kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis = 3,momentum=0.99, epsilon=0.001,name = bn_name_base+'2c')(X)
    
    X = Add()([X,X_shortcut])
    X = Activation('relu')(X)
    
    return X

## Convolutional Neural Network

In [15]:
input_img = Input((64,64,1))
X = Conv2D(filters=3, kernel_size=(1, 1), strides=(1, 1), name="initial_conv2d")(input_img)
X = BatchNormalization(axis=3, name='initial_bn')(X)
X = Activation('relu', name='initial_relu')(X)
X = ZeroPadding2D((3, 3))(X)

# Stage 1
X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', kernel_initializer=glorot_uniform(seed=0))(X)
X = BatchNormalization(axis=3, name='bn_conv1')(X)
X = Activation('relu')(X)
X = MaxPooling2D((3, 3), strides=(2, 2))(X)

# Stage 2
X = convolutionBlock(X, f=3, filters=[64, 64, 256], stage=2, block='a', s=1)
X = identityBlock(X, 3, [64, 64, 256], stage=2, block='b')
X = identityBlock(X, 3, [64, 64, 256], stage=2, block='c')

# Stage 3
X = convolutionBlock(X, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
X = identityBlock(X, 3, [128, 128, 512], stage=3, block='b')
X = identityBlock(X, 3, [128, 128, 512], stage=3, block='c')
X = identityBlock(X, 3, [128, 128, 512], stage=3, block='d')

# Stage 4
X = convolutionBlock(X, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
X = identityBlock(X, 3, [256, 256, 1024], stage=4, block='b')
X = identityBlock(X, 3, [256, 256, 1024], stage=4, block='c')
X = identityBlock(X, 3, [256, 256, 1024], stage=4, block='d')
X = identityBlock(X, 3, [256, 256, 1024], stage=4, block='e')
X = identityBlock(X, 3, [256, 256, 1024], stage=4, block='f')

# Stage 5
X = convolutionBlock(X, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
X = identityBlock(X, 3, [512, 512, 2048], stage=5, block='b')
X = identityBlock(X, 3, [512, 512, 2048], stage=5, block='c')


# Average pooling
X = AveragePooling2D(pool_size=(2, 2), padding='same')(X)
# output layer
X = Flatten()(X)
out = Dense(6,name='fc' + str(6),activation='sigmoid')(X)
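
With a 64x64 input, it helps to trace how the spatial size shrinks through the network defined above. The sketch below applies the usual output-size formulas for 'valid' convolutions and 'same' pooling to the layers that change the size. It is a hand calculation written for this explanation, not output from Keras, and the helper names are hypothetical.

```python
import math


def conv_valid(size, kernel, stride):
    """Output size of a 'valid' convolution or pooling layer."""
    return (size - kernel) // stride + 1


def pool_same(size, stride):
    """Output size of a 'same'-padded pooling layer."""
    return math.ceil(size / stride)


def trace_spatial_size(input_size=64):
    """Follow the height/width through the layers that change it."""
    sizes = {"input": input_size}
    size = input_size + 2 * 3              # ZeroPadding2D((3, 3))
    size = conv_valid(size, 7, 2)          # conv1: 7x7, stride 2
    sizes["conv1"] = size
    size = conv_valid(size, 3, 2)          # MaxPooling2D((3, 3), strides=(2, 2))
    sizes["maxpool"] = size
    # Only the first 1x1 conv of each convolutionBlock is strided.
    for stage, stride in [(2, 1), (3, 2), (4, 2), (5, 2)]:
        size = conv_valid(size, 1, stride)
        sizes["stage%d" % stage] = size
    size = pool_same(size, 2)              # AveragePooling2D((2, 2), padding='same')
    sizes["avgpool"] = size
    return sizes


sizes = trace_spatial_size(64)
flattened = sizes["avgpool"] ** 2 * 2048   # stage 5 ends with 2048 channels
print(sizes, flattened)
```

Under these formulas the feature map is already 2x2 by the end of stage 5, so the final pooling leaves a single 2048-dimensional vector feeding the six-unit output layer.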

### Reshape

In [16]:
x_val = np.reshape(x_val,(x_val.shape[0],x_val.shape[1],x_val.shape[2],1))

NameError: name 'x_val' is not defined

## Model

In [17]:
model_conv = Model(inputs = input_img, outputs = out)
#model_conv.compile(optimizer='Adam',loss = 'categorical_crossentropy',metrics=['accuracy'])
model_conv.compile(optimizer='Adam',loss = logloss,metrics=[weighted_loss])
model_conv.summary()
history_conv = model_conv.fit_generator(dataGenerator,steps_per_epoch=500, epochs=20,validation_data = (x_val,y_val),verbose = False)

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 64, 64, 1)    0                                            
__________________________________________________________________________________________________
initial_conv2d (Conv2D)         (None, 64, 64, 3)    6           input_1[0][0]                    
__________________________________________________________________________________________________
initial_bn (BatchNormalization) (None, 64, 64, 3)    12          initial_conv2d[0][0]             
__________________________________________________________________________________________________
initial_relu (Activation)       (None, 64, 64, 3)    0           initial_bn[0][0]                 
____________________________________________________________________________________________

NameError: name 'dataGenerator' is not defined

## Organizing data for Test Data

In [18]:
test_data = pd.read_csv('/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_sample_submission.csv')
splitData = test_data['ID'].str.split('_', expand = True)
test_data['class'] = splitData[2]
test_data['fileName'] = splitData[0] + '_' + splitData[1]
test_data = test_data.drop(columns=['ID'],axis=1)
del splitData
pivot_test_data = test_data[['fileName','class','Label']].drop_duplicates().pivot_table(index = 'fileName',columns=['class'], values='Label')
pivot_test_data = pd.DataFrame(pivot_test_data.to_records())
test_files = list(pivot_test_data['fileName'])
testDataGenerator = generateTestImageData(test_files)
temp_pred = model_conv.predict_generator(testDataGenerator,steps = int(np.ceil(pivot_test_data.shape[0]/batch_size)),verbose = True)

FileNotFoundError: [Errno 2] File b'/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_sample_submission.csv' does not exist: b'/kaggle/input/rsna-intracranial-hemorrhage-detection/stage_1_sample_submission.csv'

In [19]:
temp_pred.shape

NameError: name 'temp_pred' is not defined

### Prediction

In [20]:
submission_df = pivot_test_data
submission_df['any'] = temp_pred[:,0]
submission_df['epidural'] = temp_pred[:,1]
submission_df['intraparenchymal'] = temp_pred[:,2]
submission_df['intraventricular'] = temp_pred[:,3]
submission_df['subarachnoid'] = temp_pred[:,4]
submission_df['subdural'] = temp_pred[:,5]

NameError: name 'pivot_test_data' is not defined

In [21]:
submission_df = submission_df.melt(id_vars=['fileName'])
submission_df['ID'] = submission_df.fileName + '_' + submission_df.variable
submission_df['Label'] = submission_df['value']
print(submission_df.head(20))

NameError: name 'submission_df' is not defined

In [22]:
submission_df = submission_df.drop(['fileName','variable','value'],axis = 1)
print(submission_df.head(20))

NameError: name 'submission_df' is not defined

In [23]:
submission_df.to_csv('submission.csv', index=False)

NameError: name 'submission_df' is not defined