# Workshop In Data Science

Project made by:
- Ofir Paz
- Netanel Ulitszky

For the course ["Workshop in Data Science"](https://www.openu.ac.il/courses/20936.htm) (20936) at The Open University of Israel.

Personal links:
<style>
table {
    border-collapse: collapse;
    width: 50%;
    margin-left: auto;
    margin-right: auto;
}
td {
    padding: 6px;
    text-align: center;
    border-bottom: 1px solid #DDD;
}

tr:hover {background-color: #333333;}
</style>
<table>
    <tr>
        <td>Ofir Paz</td>
        <td>Netanel Ulitszky</td>
    </tr>
    <tr>
        <td><a href="https://www.linkedin.com/in/ofir-paz">Linkedin</a></td>
        <td><a href="https://www.linkedin.com/in/netanel-ulitszky">Linkedin</a></td>
    </tr>
    <tr>
        <td><a href="https://www.github.com/ofir-paz">Github</a></td>
        <td><a href="https://github.com/netane54544">Github</a></td>
    </tr>
    <tr>
        <td><a href="https://www.kaggle.com/ofirpaz">Kaggle</a></td>
        <td><a href="https://www.kaggle.com/netanelulitszky">Kaggle</a></td>
    </tr>
</table>

<!-- style copy paste -->

<!--
|| (ℹ️) Note
||  Content
 -->
<style>
.note-box {
    border-left: 4px solid #0078D4;
    border-image: 1;
    padding-left: 10px;
    border-radius: 0px;
    padding-top: 2px;
    padding-bottom: 2px;
}
.note-header {
    display: flex;
    align-items: center;
    padding-bottom: 0.3em;
}
.note-icon {
    font-size: 14px;
    color: #0078D4;
    background-color: #1E1E1E;
    border: 2px solid #0078D4;
    border-radius: 100%;
    width: 16px;
    height: 16px;
    display: flex;
    align-items: center;
    justify-content: center;
    margin-right: 8px;
}
.note {
    color: #0078D4;
    font-weight: bold;
}
.note-content {
}
</style>

<!-- Usage Example

<blockquote class="note-box">
 <div class="note-header">
  <span class="note-icon">ℹ️</span><strong class="note">Note</strong> 
 </div>
  <span class="note-content">Hello, this is a beautifully styled note with rounded corners.</span>
</blockquote>

 -->
<!-- End Note -->

## Table Of Contents

<style>
ol {
    list-style-position: inside;  /* Move the list marker inside the padding */
    padding-left: 2em;  /* Add padding to create the effect of a tab */
}
</style>
<details>
 <summary>1. <a href="#introduction">Introduction</a></summary>
 
 1. [Background](#background) 
 2. [The Competition](#the-competition)
</details>
<details>
 <summary>2. <a href="#exploratory-data-analysis">Exploratory Data Analysis</a></summary>

 1. [Imports](#imports)
 2. [MRI Imaging](#mri-imaging)
    - [About MRI](#about-mri)
    - [MRI Sequences](#mri-sequences)
    - [DICOM Files](#dicom-files)
 3. [Data Layout](#data-layout)
 4. [Data Analysis](#data-analysis)
</details>

## Introduction

### Background

We are Ofir Paz and Netanel Ulitszky, two students from The Open University of Israel, and we are excited to present our project for the course "Workshop in Data Science" (20936).

We are both very interested in data science and passionate about the possibilities it offers, so taking this course was a natural choice for us. On the other hand, choosing a topic for our project was more challenging for the same reason. We wanted a topic that is interesting, challenging, relevant, and useful, with big potential for *exploratory data analysis* (EDA). After a long and thorough search, we came across the ["Kaggle"](https://www.kaggle.com) platform and found the ["RSNA 2024 Lumbar Spine Degenerative Classification"](https://www.kaggle.com/competitions/rsna-2024-lumbar-spine-degenerative-classification) competition.

### The Competition

At a medical level, the competition is about helping radiologists diagnose and classify degenerative spine conditions more accurately and efficiently. This can improve the quality of life of patients with these conditions and reduce the time and cost of the diagnosis process.

The problem is that the current diagnosis process is manual and time-consuming, and it requires a lot of expertise and experience from the radiologist. The radiologist needs to analyze the [MRI](#mri-imaging) images of the patient's lumbar spine and classify the different degenerative conditions and their severity. This process is subjective and can be affected by the radiologist's experience, knowledge, and mood.

At a technical level, the competition is about classifying lumbar spine [MRI](#mri-imaging) images with respect to 5 degenerative spine conditions:
1. Left Neural Foraminal Narrowing
2. Right Neural Foraminal Narrowing
3. Left Subarticular Stenosis
4. Right Subarticular Stenosis
5. Spinal Canal Stenosis

Notice that the conditions are divided into two types:
- [Neural Foraminal Narrowing](https://my.clevelandclinic.org/health/diseases/24856-foraminal-stenosis)
- [Stenosis](https://en.wikipedia.org/wiki/Spinal_stenosis)

You can read more about the conditions in the links above.

For each of these conditions, we need to classify **where** it is visible among the spinal disc levels:
1. L1-L2
2. L2-L3
3. L3-L4
4. L4-L5
5. L5-S1

In addition, we need to provide a severity grade for the condition:
1. Normal/Mild
2. Moderate
3. Severe

So, for each patient, we need to provide a severity classification for every combination of condition and spinal disc level: 5 conditions × 5 levels = 25 classifications, with 3 classes each.
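
The 25 targets can be enumerated as the Cartesian product of the conditions and the disc levels. The sketch below is only illustrative: the snake_case label names are an assumption made for this example and are not necessarily the column names used in the competition files.

```python
from itertools import product

# Names taken from the lists above; the snake_case format is an assumption.
CONDITIONS = [
    "left_neural_foraminal_narrowing",
    "right_neural_foraminal_narrowing",
    "left_subarticular_stenosis",
    "right_subarticular_stenosis",
    "spinal_canal_stenosis",
]
LEVELS = ["l1_l2", "l2_l3", "l3_l4", "l4_l5", "l5_s1"]
SEVERITIES = ["normal_mild", "moderate", "severe"]

# One classification target per (condition, level) pair.
TARGETS = [f"{condition}_{level}" for condition, level in product(CONDITIONS, LEVELS)]

print(len(TARGETS))  # 25
print(TARGETS[0])    # left_neural_foraminal_narrowing_l1_l2
```

Each of these 25 targets then takes exactly one of the 3 severity classes.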

## Exploratory Data Analysis

As we are given a very complicated, medically related dataset, we need to understand the data before we can start working with it. In this section, we will explain the data, the MRI imaging, and the data layout.

We will also uncover various statistics about the data through data analysis.

But first, we need to import the Python libraries that we will use in this section.

### Imports

In [None]:
%reload_ext autoreload
%autoreload 2

# Built-in modules.
import os
from pathlib import Path
from typing import List, Tuple, Dict, Any, Union, Optional

# Third-party modules.
import numpy as np
import pydicom
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go

# Custom modules.
import src.config as cfg
from src.plots import (
    plot_pixel_array,
    plot_dicom_series
)

%matplotlib inline

### MRI Imaging

#### About MRI

Magnetic Resonance Imaging ([MRI](https://en.wikipedia.org/wiki/Magnetic_resonance_imaging)) is a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body. MRI scanners use strong magnetic fields, magnetic field gradients, and radio waves to generate images of the organs in the body.

MRI is a very powerful tool in the medical field: it provides detailed images of the body's organs and tissues, and it helps to diagnose a wide range of medical conditions, including degenerative spine conditions.

#### MRI Sequences

MRI can be used to scan different parts of the body, including the lumbar spine. When generating an MRI image, the radiologist needs to set the sequence of the MRI scanner to produce the desired image. A sequence is the specific set of parameters used to generate the image, which affects both the visual appearance of the image and the information it provides.

<blockquote class="note-box">
 <div class="note-header">
  <span class="note-icon">ℹ️</span><strong class="note">Note</strong> 
 </div>
  <span class="note-content">For further reading about MRI sequences and the specifics of the parameters used to configure one, you can read <a href="https://radiopaedia.org/articles/mri-sequence-parameters?lang=us">this article</a>.</span>
</blockquote>

Our data contains MRI images with the following sequences:

1. **Sagittal T2/STIR**
    - Sagittal Plane: This refers to the plane of imaging that divides the body into left and right halves. When imaging is done in the sagittal plane, the images show slices of the body from the side.
    - T2-weighted (T2): T2-weighted imaging highlights fluid and edema. In T2 images, fluids appear bright, and soft tissue contrast is well-differentiated. It is particularly useful for detecting abnormalities in tissues with high water content, such as inflammation, cysts, or tumors.
    - STIR (Short Tau Inversion Recovery): STIR is a special type of T2-weighted sequence that suppresses the fat signal, making it easier to identify areas of edema or inflammation. It's commonly used in musculoskeletal imaging to assess conditions like bone marrow edema, soft tissue injuries, and other pathologies where fat suppression is important.
2. **Sagittal T1**
    - Sagittal Plane: As mentioned above, the sagittal plane divides the body into left and right sections.
    - T1-weighted (T1): T1-weighted imaging provides good anatomical detail, particularly of fat-containing structures, which appear bright in T1 images. T1-weighted sequences are useful for evaluating anatomical structures, the integrity of tissues, and the presence of fatty lesions. T1 images are often used in conjunction with T2 images for a more comprehensive assessment of tissue contrast.
3. **Axial T2**
    - Axial Plane: The axial plane slices the body horizontally, dividing it into upper (superior) and lower (inferior) parts. Axial images are viewed as if looking from the feet upwards.
    - T2-weighted (T2): As with sagittal T2, axial T2-weighted imaging highlights fluid and is useful for detecting pathology in cross-sectional views. Axial T2 images are often used to evaluate the spinal cord, intervertebral discs, and surrounding soft tissues, as well as to assess brain and abdominal structures.

So, to sum things up:
- Sagittal T2/STIR: Side view images with emphasis on fluid and inflammation, with fat signal suppression.
- Sagittal T1: Side view images focusing on detailed anatomical structures with fat appearing bright.
- Axial T2: Horizontal cross-sectional images highlighting fluid and pathology in the body.

Before continuing to explore exactly what the data contains, we need to understand the format of the MRI images.

#### DICOM Files

DICOM (Digital Imaging and Communications in Medicine) is the standard format used to store and transmit medical images, such as MRI images. DICOM files contain both the image data and the metadata associated with the image, such as the patient information, the imaging parameters, and the image acquisition details.

MRI images are stored in DICOM files, which have the file extension ".dcm". Each DICOM file contains a single MRI image, along with its metadata. We can use this metadata to gain further insights into the MRI images and to understand the characteristics of the lumbar spine degenerative conditions.
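
The ".dcm" extension is only a naming convention. A standard DICOM file starts with a 128-byte preamble followed by the 4-byte prefix `DICM`, so a file can be checked without any third-party library. The following is a minimal sketch of such a check; the helper name `looks_like_dicom` is hypothetical and is not part of pydicom or of this project's code.

```python
from pathlib import Path

DICOM_PREAMBLE_SIZE = 128  # bytes of preamble at the start of the file
DICOM_MAGIC = b"DICM"      # 4-byte prefix that follows the preamble


def looks_like_dicom(path: Path) -> bool:
    """Return True if the file has the standard DICOM preamble and prefix."""
    with open(path, "rb") as f:
        header = f.read(DICOM_PREAMBLE_SIZE + len(DICOM_MAGIC))
    # A short or non-DICOM file yields a slice that differs from the prefix.
    return header[DICOM_PREAMBLE_SIZE:] == DICOM_MAGIC
```

A check like this only inspects the header; it says nothing about whether the image data or the metadata inside the file is complete.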

To load and visualize the DICOM files, we have chosen to use [pydicom](https://pydicom.github.io/pydicom/stable/index.html#), a Python library for working with DICOM files. This library provides functions to read and write DICOM files, as well as to extract their metadata.

In the next code block, we will load a sample DICOM file from the dataset and visualize the MRI image and the metadata associated with the image.

In [None]:
example_dicom_path = cfg.TRAIN_IMAGES_PATH / cfg.EXAMPLE_ID / cfg.EXAMPLE_SAGITTAL_T1_ID / "1.dcm"
example_dicom = pydicom.dcmread(example_dicom_path)
example_dicom

To get the MRI image from the DICOM file, we will use the `pixel_array` attribute of the loaded dataset, which contains the pixel data of the image. We will also extract the metadata from the DICOM file and display it to gain insights into the image acquisition details.

In [None]:
plot_pixel_array(example_dicom.pixel_array)
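
The values in `pixel_array` are raw stored intensities (often 16-bit integers), so it is common to rescale them before displaying or comparing images. The following is a minimal sketch of min-max normalization on a synthetic array. It is not necessarily what `plot_pixel_array` does internally, and the helper name `normalize_for_display` is hypothetical.

```python
import numpy as np


def normalize_for_display(pixels: np.ndarray) -> np.ndarray:
    """Rescale an image to floats in [0, 1] using min-max normalization."""
    pixels = pixels.astype(np.float32)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # Constant image: avoid division by zero.
        return np.zeros_like(pixels)
    return (pixels - lo) / (hi - lo)


# Synthetic stand-in for `example_dicom.pixel_array` (assumed values).
fake_slice = np.array([[0, 100], [400, 2000]], dtype=np.uint16)
normalized = normalize_for_display(fake_slice)
```

Min-max scaling is sensitive to a few very bright pixels; clipping to percentiles first is a common alternative.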

We can further enhance the visualization by plotting the entire sequence of slices as a single 3D plot. This provides a more comprehensive view of the MRI sequence and helps to visualize the spatial relationships between the images. For this, we will use [Plotly](https://plotly.com/python/), a Python library for interactive data visualization.

In [None]:
plot_dicom_series(example_dicom_path.parent)

As you can see, the sagittal T1 sequence is indeed a side view image focusing on detailed anatomical structures, with fat appearing bright (you can see fat tissue on the right side, near the skin of the back). We will later use this visualization tool to understand how the conditions are visible in the MRI images.
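
When a whole series directory is loaded, the order of the slices matters. The sketch below assumes that slice files are named with an integer stem, as the example path `1.dcm` suggests, and shows why a plain lexicographic sort is not enough. The helper name `sort_slices_numerically` is hypothetical, and we do not claim that `plot_dicom_series` orders slices this way.

```python
from pathlib import Path
from typing import List


def sort_slices_numerically(paths: List[Path]) -> List[Path]:
    """Sort slice files such as 1.dcm, 2.dcm, 10.dcm by their integer stem."""
    return sorted(paths, key=lambda p: int(p.stem))


names = [Path("10.dcm"), Path("2.dcm"), Path("1.dcm")]
print([p.name for p in sort_slices_numerically(names)])  # ['1.dcm', '2.dcm', '10.dcm']
print([p.name for p in sorted(names)])                   # ['1.dcm', '10.dcm', '2.dcm']
```

File name order is only a proxy for anatomical order; the `InstanceNumber` attribute in the DICOM metadata is another possible sort key.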

### Data Layout

### Data Analysis