<img src="./images/logo_hackathon.png" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right">Dr Ali Sarrami Foroushani</div>
            <div style="text-align: right">Lecturer in Cardiovascular Biomechanics</div>
            <div style="text-align: right">School of Health Sciences</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
     </tr>
</table>

# Hackathon 01: Analysing Brain Aneurysm Rupture Data
Welcome to Hackathon 01 of the Programming for Health Data Science unit. In this hackathon, we analyse a dataset of brain aneurysms. We will read the data, explore it, and generate insights about aneurysm rupture.

A brain aneurysm, also known as a cerebral or intracranial aneurysm, is a bulge or ballooning in a blood vessel in the brain (see Figure 1). This condition can lead to serious complications if the aneurysm ruptures, potentially causing haemorrhagic stroke or other neurological damage. 

<div style="text-align: center;">
    <img src="./images/aneurysm.jpg" width="15%" style="display: inline-block;"/>
    <p style="text-align: center;"><strong>Figure 1:</strong> Brain Aneurysm <a href="https://www.uhsussex.nhs.uk/resources/brain-aneurysm-2/" target="_blank">(source)</a>.</p>
</div>

Understanding the risk factors and characteristics associated with ruptured aneurysms is critical for improving patient outcomes and developing preventive strategies. Common locations of aneurysms include:

- **Internal Carotid Artery (ICA):** Supplies blood to the brain. Specific ICA segments include:
   - **ICA Ophthalmic (ICA Oph):** Supplies the eyes.
   - **ICA Cavernous (ICA Cav):** Passes through the cavernous sinus.
   - **ICA Tip:** Bifurcation point into the anterior and middle cerebral arteries.
- **Middle Cerebral Artery (MCA):** Supplies the lateral cerebral hemisphere.
- **Vertebral Artery (VA):** Part of the vertebrobasilar system, supplying the brain's posterior circulation.
- **Basilar Artery (BA):** Formed by the union of the vertebral arteries, supplying the brainstem and cerebellum.
- **Anterior Communicating Artery (AComA):** Connects the two anterior cerebral arteries and is a common site for aneurysms.

In this analysis, we will work with an open-source dataset called AneuX. More information about the dataset can be found <a href="https://zenodo.org/records/6678442" target="_blank">here</a>.

This dataset contains information about 750 aneurysms, focusing on attributes such as location, size, and various indices related to their characteristics.

You do not need to download the data. It is provided as a CSV file (`data_aneurysms.csv`) in the Hackathon 01 folder.

<div style="text-align: center;">
    <img src="./images/aneurysm_sizing.png" width="25%" style="display: inline-block;"/>
    <p style="text-align: center;"><strong>Figure 2:</strong> Brain Aneurysm Sizing. </p>
</div>

The data we will be using includes:
- **Patient ID:** Unique identifier for each patient
- **Status:** Rupture status (ruptured or unruptured)
- **Location:** Location of the aneurysm in the brain
- **Side:** Left or right side of the brain
- **Gender:** Gender of the patient
- **Age:** Age of the patient
- **AR:** Aspect ratio of the aneurysm (see Figure 2)
- **NSI:** Non-sphericity index of the aneurysm
- **Dmax:** Maximum diameter of the aneurysm (see Figure 2)
- **Dn:** Normal diameter of the neck of the aneurysm (see Figure 2)
- **H:** Height of the aneurysm (see Figure 2)

Through this analysis, we aim to identify patterns and insights that can contribute to a better understanding of aneurysms and their rupture.

## Task 1: Read the data
Let's start by importing the necessary libraries and reading the CSV file containing the aneurysm data. In this task, we will load the dataset `data_aneurysms.csv` into a Pandas DataFrame. This will allow us to manipulate and analyse the data effectively. Use the `pd.read_csv()` function to read the CSV file.

In [2]:
import pandas as pd

# Read the data
df = pd.read_csv('./data/data_aneurysms.csv')
df.head()

| Unnamed: 0 | Patient id | Status | Location | Side | Gender | Age | AR | NSI | Dmax | Dn | H |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p001 | unruptured | ICA oph | left | female | 64.2 | 0.876819 | 0.104587 | 6.22176 | 4.89235 | 4.28971 |
| 1 | p002 | unruptured | VA V4 | left | female | 72.7 | 0.991997 | 0.091552 | 5.62359 | 4.40747 | 4.3722 |
| 2 | p003 | unruptured | MCA bif | right | male | 50.9 | 1.07643 | 0.168756 | 14.7354 | 7.97337 | 8.5828 |
| 3 | p004 | ruptured | ICA pcom | left | female | 47.3 | 2.61395 | 0.256854 | 15.9934 | 5.04588 | 13.1897 |
| 4 | p005 | unruptured | ICA oph | right | female | 47.3 | 1.65683 | 0.209242 | 4.87997 | 3.18561 | 5.27801 |


## Task 2: Print how many aneurysms are in this dataset
We will count the number of rows in the dataframe to determine the total number of aneurysms. Hint: You can use `df.shape` to do this.

In [None]:
# Count the number of aneurysms
# write your code here
print(f'Total number of aneurysms: {num_aneurysms}')
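
As a minimal sketch of the hint, `df.shape` is a `(rows, columns)` tuple, so its first element is the row count. The toy DataFrame below is hypothetical and only stands in for the real data.

```python
import pandas as pd

# Hypothetical toy data standing in for data_aneurysms.csv
toy = pd.DataFrame({
    'Patient id': ['t001', 't002', 't003'],
    'Status': ['ruptured', 'unruptured', 'unruptured'],
})

# shape is a (n_rows, n_columns) tuple
n_rows, n_cols = toy.shape
print(f'Rows: {n_rows}, columns: {n_cols}')
```

`len(toy)` gives the same row count.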

## Task 3: Print how many missing values are in each column
We will check for missing values in the dataframe using the `isnull()` method and `sum()`.

In [None]:
# Check for missing values
# write your code here
print('Missing values in each column:')
print(missing_values)
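
The following sketch shows how `isnull()` and `sum()` combine, using a hypothetical toy DataFrame rather than the aneurysm data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Age and one missing Gender
toy = pd.DataFrame({
    'Age': [64.2, np.nan, 50.9],
    'Gender': ['female', 'male', None],
    'Dmax': [6.2, 5.6, 14.7],
})

# isnull() marks each missing cell as True; sum() counts the True values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```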

## Task 4: Extract all data whose Location starts with ICA
ICA aneurysms are one of the most prevalent types of brain aneurysm. Let's first focus on analysing this type of aneurysm in our dataset. We will filter the dataframe to extract rows where the 'Location' column starts with 'ICA'. Hint: You can use `df[df['Location'].str.startswith('ICA')]` to do this.

In [None]:
# Extract ICA cases
# write your code here
ica_cases.head()
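
One caveat with the hint: if 'Location' contains missing values, `str.startswith` returns NaN for those rows and boolean indexing raises an error. Whether the real data has missing locations is not assumed here. The sketch below, on hypothetical toy data, uses the `na=False` argument to treat missing locations as non-matches.

```python
import pandas as pd

# Hypothetical toy data with one missing Location
toy = pd.DataFrame({
    'Patient id': ['t001', 't002', 't003', 't004'],
    'Location': ['ICA oph', 'MCA bif', None, 'ICA pcom'],
})

# na=False makes rows with a missing Location evaluate to False instead of NaN
is_ica = toy['Location'].str.startswith('ICA', na=False)
toy_ica = toy[is_ica]
print(toy_ica)
```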

## Task 5: Save ICA aneurysms in a new CSV file
Save the extracted ICA aneurysm data into a new CSV file named 'data_aneurysms_ICA.csv'.

In [None]:
# Save ICA cases to CSV
ica_cases.to_csv('data_aneurysms_ICA.csv', index=False)

## Task 6: Print how many cases are ICA
Count the number of rows in the ICA cases dataframe to find out how many cases are in it.

In [None]:
# Count ICA cases
# write your code here
print(f'Total number of ICA cases: {num_ica_cases}')

## Task 7: Print how many missing values in each column for ICA cases
Check for missing values specifically in the ICA cases dataframe. Print how many missing values are in each column for ICA cases.

In [None]:
# Check for missing values in ICA cases
# write your code here
print('Missing values in each column for ICA cases:')
print(missing_values_ica)

## Task 8: Analyse the rupture status for ICA cases
Calculate the number and percentage of ruptured and unruptured cases in the status column.

In [None]:
# Analyze status for ICA cases
# write your code here

print(f'Number of Ruptured cases: {ruptured} ({ruptured_percentage:.2f}%)')
print(f'Number of Unruptured cases: {unruptured} ({unruptured_percentage:.2f}%)')
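
A possible approach, sketched on a hypothetical toy Series, is `value_counts()`. It ignores missing values by default, so the percentages are relative to the non-missing entries. The same pattern applies to the 'Side' and 'Gender' columns.

```python
import pandas as pd

# Hypothetical toy Status column with one missing value
toy_status = pd.Series(
    ['ruptured', 'unruptured', 'unruptured', 'unruptured', None],
    name='Status',
)

# Absolute counts per category (missing values are excluded by default)
counts = toy_status.value_counts()

# normalize=True returns fractions of the non-missing total; multiply by 100 for percentages
percentages = toy_status.value_counts(normalize=True) * 100

print(f"ruptured: {counts['ruptured']} ({percentages['ruptured']:.2f}%)")
print(f"unruptured: {counts['unruptured']} ({percentages['unruptured']:.2f}%)")
```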

## Task 9: Analyse side column for ICA cases
Calculate the number and percentage of cases on the left and right side.

In [None]:
# Analyze side for ICA cases
# write your code here

print(f'Number of Left side cases: {left} ({left_percentage:.2f}%)')
print(f'Number of Right side cases: {right} ({right_percentage:.2f}%)')

## Task 10: Analyse gender column for ICA cases
Calculate the number and percentage of male and female cases.

In [None]:
# Analyze gender for ICA cases
# write your code here

print(f'Number of Female cases: {female} ({female_percentage:.2f}%)')
print(f'Number of Male cases: {male} ({male_percentage:.2f}%)')

## Task 11: Summary statistics for Age
Calculate and print the mean, median, standard deviation, minimum, and maximum for the Age column.

In [None]:
# Statistics for Age
# you can simply use age_stats = ica_cases['Age'].describe() or
# write your code here
print('Statistics for Age:')
print(age_stats)
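
For reference, `describe()` reports the median as the `50%` row. If only the five statistics named in the task are wanted, `agg()` is an alternative. The sketch below uses a hypothetical toy Series, and the same pattern applies to the other numeric columns.

```python
import pandas as pd

# Hypothetical toy ages
toy_age = pd.Series([40.0, 50.0, 60.0, 70.0], name='Age')

# describe() reports count, mean, std, min, quartiles and max; the '50%' row is the median
summary = toy_age.describe()

# agg() returns only the requested statistics (std is the sample standard deviation)
selected = toy_age.agg(['mean', 'median', 'std', 'min', 'max'])
print(selected)
```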

## Task 12: Summary statistics for Aspect Ratio (AR)
Calculate and print the mean, median, standard deviation, minimum, and maximum for the AR column.

In [None]:
# Statistics for AR
# write your code here
print('Statistics for AR:')
print(ar_stats)

## Task 13: Summary statistics for Non-sphericity Index (NSI)
Calculate and print the mean, median, standard deviation, minimum, and maximum for the NSI column.

In [None]:
# Statistics for NSI
# write your code here
print('Statistics for NSI:')
print(nsi_stats)

## Task 14: Summary statistics for Maximum Aneurysm Diameter
Calculate and print the mean, median, standard deviation, minimum, and maximum for the Dmax column.

In [None]:
# Statistics for Dmax
# write your code here
print('Statistics for Dmax:')
print(dmax_stats)

## Task 15: Summary statistics for Dn
We will calculate and print the mean, median, standard deviation, minimum, and maximum for the Dn column.

In [None]:
# Statistics for Dn
# write your code here
print('Statistics for Dn:')
print(dn_stats)

## Task 16: Summary statistics for H
We will calculate and print the mean, median, standard deviation, minimum, and maximum for the H column.

In [None]:
# Statistics for H
# write your code here
print('Statistics for H:')
print(h_stats)

## Task 17: Handle missing values in ICA cases
Decide on the best way to handle missing values in each column and create a new dataset without missing values. Hint: fill missing numeric values with the mean and drop rows with missing categorical values. Use functions like `ica_cases.fillna({'Age': ica_cases['Age'].mean()})` and `ica_cases_no_missing.dropna(subset=['Status', 'Location', 'Side', 'Gender'])`.

In [None]:
# Handle missing values
# For demonstration, we will fill missing numeric values with the mean and drop missing categorical values.

# write your code here

# Save the new dataset without missing values
ica_cases_no_missing.to_csv('data_aneurysms_ICA_noMissing.csv', index=False)
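
The two calls in the hint can be chained as in the sketch below, which uses hypothetical toy data. Both `fillna` and `dropna` return new DataFrames by default, so the results need to be assigned to a variable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing Status and one missing Age
toy = pd.DataFrame({
    'Status': ['ruptured', None, 'unruptured', 'unruptured'],
    'Age': [40.0, 50.0, np.nan, 60.0],
})

# Fill missing numeric values with the column mean (computed from the non-missing values)
toy_filled = toy.fillna({'Age': toy['Age'].mean()})

# Drop rows where a categorical column is missing
toy_clean = toy_filled.dropna(subset=['Status'])
print(toy_clean)
```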

## Task 18: Analyse status column for ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate the number and percentage of ruptured and unruptured cases from the dataset with no missing values.

In [None]:
# Analyze status for ICA cases no missing values
# write your code here

print(f'Number of Ruptured cases (no missing): {ruptured_no_missing} ({ruptured_percentage_no_missing:.2f}%)')
print(f'Number of Unruptured cases (no missing): {unruptured_no_missing} ({unruptured_percentage_no_missing:.2f}%)')

## Task 19: Analyse side column for ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate the number and percentage of cases on the left and right side from the dataset with no missing values.

In [None]:
# Analyze side for ICA cases no missing values
# write your code here

print(f'Number of Left side cases (no missing): {left_no_missing} ({left_percentage_no_missing:.2f}%)')
print(f'Number of Right side cases (no missing): {right_no_missing} ({right_percentage_no_missing:.2f}%)')

## Task 20: Analyse gender column for ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate the number and percentage of male and female cases from the dataset with no missing values.

In [None]:
# Analyze gender for ICA cases no missing values
# write your code here

print(f'Number of Female cases (no missing): {female_no_missing} ({female_percentage_no_missing:.2f}%)')
print(f'Number of Male cases (no missing): {male_no_missing} ({male_percentage_no_missing:.2f}%)')

## Task 21: Summary statistics for Age from ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate and print the mean, median, standard deviation, minimum, and maximum for the Age column.

In [None]:
# Statistics for Age no missing values
# write your code here
print('Statistics for Age (no missing):')
print(age_stats_no_missing)

## Task 22: Summary statistics for AR from ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate and print the mean, median, standard deviation, minimum, and maximum for the AR column.

In [None]:
# Statistics for AR no missing values
# write your code here
print('Statistics for AR (no missing):')
print(ar_stats_no_missing)

## Task 23: Summary statistics for NSI from ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate and print the mean, median, standard deviation, minimum, and maximum for the NSI column.

In [None]:
# Statistics for NSI no missing values
# write your code here
print('Statistics for NSI (no missing):')
print(nsi_stats_no_missing)

## Task 24: Summary statistics for Dmax from ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate and print the mean, median, standard deviation, minimum, and maximum for the Dmax column.

In [None]:
# Statistics for Dmax no missing values
# write your code here
print('Statistics for Dmax (no missing):')
print(dmax_stats_no_missing)

## Task 25: Summary statistics for Dn from ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate and print the mean, median, standard deviation, minimum, and maximum for the Dn column.

In [None]:
# Statistics for Dn no missing values
# write your code here
print('Statistics for Dn (no missing):')
print(dn_stats_no_missing)

## Task 26: Summary statistics for H from ICA cases with no missing values
Now that you have handled the missing values, the number of cases in your data has changed. Recalculate and print the mean, median, standard deviation, minimum, and maximum for the H column.

In [None]:
# Statistics for H no missing values
# write your code here
print('Statistics for H (no missing):')
print(h_stats_no_missing)

## Task 27: Generate histogram and density plots for Age
Generate histogram and density plots of Age for patients with ruptured and unruptured aneurysms, displaying them side-by-side. Overlay the plots for ruptured and unruptured aneurysms to visually compare the age distributions, which can be informative for understanding rupture risks. You can use different colours from the tab20c palette.

In [None]:
# Generate histogram and density plots for Age
import matplotlib.pyplot as plt
import seaborn as sns

# Separate cases with ruptured and unruptured aneurysms
ruptured_cases = ica_cases_no_missing[ica_cases_no_missing['Status'] == 'ruptured']
unruptured_cases = ica_cases_no_missing[ica_cases_no_missing['Status'] == 'unruptured']

# Set the colors
colors = sns.color_palette('tab20c')

# Create figure and axes for subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# write your code here

# Show the plots
plt.tight_layout()
plt.show()

## Task 28: Generate histogram and density plots for AR
Generate histogram and density plots of AR for patients with ruptured and unruptured aneurysms, displaying them side-by-side. Overlay the plots for ruptured and unruptured aneurysms to visually compare the AR distributions, which can be informative for understanding rupture risks. You can use different colours from the tab20c palette.

In [None]:
# Generate histogram and density plots for AR
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# write your code here

# Show the plots
plt.tight_layout()
plt.show()

## Task 29: Generate histogram and density plots for NSI
Generate histogram and density plots of NSI for patients with ruptured and unruptured aneurysms, displaying them side-by-side. Overlay the plots for ruptured and unruptured aneurysms to visually compare the NSI distributions, which can be informative for understanding rupture risks. You can use different colours from the tab20c palette.

In [None]:
# Generate histogram and density plots for NSI
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# write your code here

# Show the plots
plt.tight_layout()
plt.show()

## Task 30: Generate histogram and density plots for Dmax
Generate histogram and density plots of Dmax for patients with ruptured and unruptured aneurysms, displaying them side-by-side. Overlay the plots for ruptured and unruptured aneurysms to visually compare the Dmax distributions, which can be informative for understanding rupture risks. You can use different colours from the tab20c palette.

In [None]:
# Generate histogram and density plots for Dmax
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# write your code here

# Show the plots
plt.tight_layout()
plt.show()

## Task 31: Generate histogram and density plots for Dn
Generate histogram and density plots of Dn for patients with ruptured and unruptured aneurysms, displaying them side-by-side. Overlay the plots for ruptured and unruptured aneurysms to visually compare the Dn distributions, which can be informative for understanding rupture risks. You can use different colours from the tab20c palette.

In [None]:
# Generate histogram and density plots for Dn
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# write your code here

# Show the plots
plt.tight_layout()
plt.show()

## Task 32: Generate histogram and density plots for H
Generate histogram and density plots of H for patients with ruptured and unruptured aneurysms, displaying them side-by-side. Overlay the plots for ruptured and unruptured aneurysms to visually compare the H distributions, which can be informative for understanding rupture risks. You can use different colours from the tab20c palette.

In [None]:
# Generate histogram and density plots for H
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# write your code here

# Show the plots
plt.tight_layout()
plt.show()

## Task 33: Generate heatmap of correlation for ICA cases without missing values
Generate a heatmap of correlations between all numerical data except the 'Patient id', 'Location', and 'Side' columns. First, we will convert 'Status' to numeric (1 for ruptured, 0 for unruptured) and 'Gender' to numeric (1 for female, 0 for male), then we will plot the heatmap.

In [None]:
# Drop unnecessary columns and convert 'Status' and 'Gender' to numeric
ica_clean_numeric = ica_cases_no_missing.drop(['Patient id', 'Location', 'Side'], axis=1)

ica_clean_numeric['Status'] = ica_clean_numeric['Status'].apply(lambda x: 1 if x == 'ruptured' else 0)

# write your code here - convert Gender to numeric

# Generate correlation matrix
# write your code here

# Plot heatmap
# write your code here
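
The encoding and correlation steps might look like the sketch below, shown on hypothetical toy data. It uses `map()` with an explicit dictionary. Unlike the lambda above, `map()` turns any label that is not in the dictionary into NaN rather than 0.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Status': ['ruptured', 'unruptured', 'ruptured', 'unruptured'],
    'Gender': ['female', 'male', 'female', 'female'],
    'Dmax': [12.0, 4.0, 10.0, 6.0],
})

# Encode the two categorical columns as 0/1 on a copy of the data
encoded = toy.copy()
encoded['Status'] = encoded['Status'].map({'ruptured': 1, 'unruptured': 0})
encoded['Gender'] = encoded['Gender'].map({'female': 1, 'male': 0})

# Pairwise Pearson correlations between all (now numeric) columns
corr_matrix = encoded.corr()
print(corr_matrix.round(2))
```

The resulting matrix can then be passed to a plotting call such as `sns.heatmap(corr_matrix, annot=True)`.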

## Task 34: Generate heatmap of correlation between 'Status' and other features
Generate a heatmap to show the correlation between 'Status' (whether an aneurysm is ruptured or not) and all other numerical data, excluding 'Patient id', 'Location', and 'Side'. Again, 'Status' will be converted to numeric (1 for ruptured, 0 for unruptured) and 'Gender' will be converted to numeric (1 for female, 0 for male).

In [None]:
# Correlation matrix focusing on 'Status'
# write your code here

# Plot heatmap
# write your code here
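
To focus on 'Status', one option is to keep only that column of the correlation matrix. The sketch below uses hypothetical, already-encoded toy data. Selecting with double brackets keeps the result two-dimensional, which a heatmap requires.

```python
import pandas as pd

# Hypothetical toy data with Status already encoded as 0/1
encoded = pd.DataFrame({
    'Status': [1, 0, 1, 0],
    'Age': [50.0, 60.0, 55.0, 70.0],
    'Dmax': [12.0, 4.0, 10.0, 6.0],
})

# Keep only the 'Status' column of the matrix and drop its self-correlation row
status_corr = encoded.corr()[['Status']].drop(index='Status')

# Sort so the strongest positive correlations come first
status_corr = status_corr.sort_values('Status', ascending=False)
print(status_corr)
```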

## Task 35: Categorising Aneurysm Location and Handling Missing Values
So far, we have only looked at ICA aneurysms. Let's go back and analyse the entire dataset. In this task, we will categorise the aneurysm location based on the 'Location' column. We'll create a new column 'Site' in the dataframe. If the aneurysm 'Location' contains the word 'ICA', the site will be categorised as 'A'. If it contains 'MCA', the site will be 'B'. If it contains 'VA', 'BA', or 'AComA', the site will be categorised as 'C'. For other locations, the 'Site' will be left empty.
Once we categorise the aneurysm locations, we will decide how to handle missing values (like we did before - filling missing values within the numerical data with the mean and dropping the missing categorical data) in each column and create a new dataset with no missing values.

In [None]:
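# Categorise the aneurysm location into the 'Site' column.
# A missing Location is treated as an empty string so the substring checks do not fail on NaN,
# and unmatched locations get None so that they are counted as missing values below.
df['Site'] = df['Location'].fillna('').apply(lambda loc: 'A' if 'ICA' in loc else ('B' if 'MCA' in loc else ('C' if 'VA' in loc or 'BA' in loc or 'AComA' in loc else None)))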

# Check for missing values before handling
df.isnull().sum()

# Handling missing values
# We need to decide how to handle the missing values depending on the type of column
# Numerical columns can be imputed using mean, median, or other methods
# Categorical columns (like 'Site' and 'Gender') can use mode or other strategies

# For numerical columns, we'll use mean imputation
num_columns = ['Age', 'AR', 'NSI', 'Dmax', 'Dn', 'H']
for col in num_columns:
    # Assign the result back instead of using inplace=True on a column selection,
    # which may not update df in recent pandas versions
    df[col] = df[col].fillna(df[col].mean())

# For categorical columns like 'Site' and 'Gender', we'll use the mode (most frequent value)
df['Site'] = df['Site'].fillna(df['Site'].mode()[0])

# write your code here -  do the same for Gender

# Verify if there are any missing values left
df.isnull().sum()

# Save the cleaned dataset with no missing values
df.to_csv('data_aneurysms_all_noMissing.csv', index=False)

# Display a preview of the cleaned data
df.head()
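
The categorisation rule can also be written as a named function, which some readers may find easier to follow than a nested conditional expression. This is a sketch on hypothetical toy labels. It returns `None` when no rule matches, which is one reading of "left empty", and the function name `categorise_site` is an assumption.

```python
import pandas as pd


def categorise_site(location):
    """Map a Location label to Site 'A', 'B' or 'C'; None if no rule matches."""
    if pd.isnull(location):
        return None
    if 'ICA' in location:
        return 'A'
    if 'MCA' in location:
        return 'B'
    if any(key in location for key in ('VA', 'BA', 'AComA')):
        return 'C'
    return None


# Hypothetical toy labels, including one unmatched label and one missing value
toy_locations = pd.Series(['ICA oph', 'MCA bif', 'VA V4', 'AComA', 'PCA', None])
toy_sites = toy_locations.apply(categorise_site)
print(toy_sites)
```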

## Task 36: Calculate PHASES Score for Rupture Prediction
In this task, we will calculate the PHASES score for each patient. The PHASES score is a clinical tool used to assess the risk of rupture in patients with intracranial aneurysms (see this [reference](https://www.sciencedirect.com/science/article/pii/S1474442213702631?via%3Dihub) for more about PHASES). It takes into account several key factors, including Patient Ethnicity, Age, Hypertension, Size of Aneurysm, Location of Aneurysm, and History of Previous Bleeding. Each factor contributes to the overall score, which helps in stratifying patients based on their risk levels. Although our dataset lacks information on ethnicity, hypertension, and bleeding history, we can still calculate a modified PHASES score using the available attributes to better understand the potential risks associated with aneurysm rupture. Thus, our PHASES score is based on the following criteria:
- **Site Score**: If 'Site' = A, score = 0; If 'Site' = B, score = 2; If 'Site' = C, score = 4.
- **Age Score**: If 'Age' < 70, score = 0; If 'Age' >= 70, score = 1.
- **Size Score**: Based on 'Dmax'.
   - If 'Dmax' < 7, score = 0.
   - If 'Dmax' >= 7 but < 10, score = 3.
   - If 'Dmax' >= 10 but < 20, score = 6.
   - If 'Dmax' >= 20, score = 10.

The final **PHASES** score is the sum of the 'Site Score', 'Age Score', and 'Size Score'. Create a new column called `PHASES`, then calculate and store the PHASES scores in it.
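
As a worked example of these criteria, consider a hypothetical 72-year-old patient with an MCA aneurysm (Site B) whose Dmax is 12 mm. The site contributes 2, the age contributes 1, and the size contributes 6:

$$
\text{PHASES} = \underbrace{2}_{\text{Site Score}} + \underbrace{1}_{\text{Age Score}} + \underbrace{6}_{\text{Size Score}} = 9
$$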

In [None]:
# Load the cleaned data
df_clean = pd.read_csv('data_aneurysms_all_noMissing.csv')

# Create Site Score column

# write your code here

# Create Age Score column

# write your code here

# Create Size Score column based on Dmax

# write your code here - write a function called size_score that returns the size scores

df_clean['Size Score'] = df_clean['Dmax'].apply(size_score)

# Calculate PHASES score

# write your code here

# Preview the data with the new columns
df_clean[['Site', 'Age', 'Dmax', 'Site Score', 'Age Score', 'Size Score', 'PHASES']].head()
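
The task asks for a `size_score` function. As an independent cross-check of its output, the same thresholds can be expressed with `pd.cut`. This is an alternative technique, sketched here on hypothetical toy diameters. It assumes there are no missing Dmax values, since the final integer conversion would fail on NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy diameters, including values exactly on the thresholds
toy_dmax = pd.Series([5.0, 7.0, 9.9, 10.0, 25.0], name='Dmax')

# right=False makes each bin closed on the left: [-inf, 7), [7, 10), [10, 20), [20, inf)
size_bins = [-np.inf, 7, 10, 20, np.inf]
size_points = [0, 3, 6, 10]

# pd.cut returns a categorical; astype(int) turns the labels into plain integers
toy_size_score = pd.cut(toy_dmax, bins=size_bins, labels=size_points, right=False).astype(int)
print(toy_size_score.tolist())
```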


## Task 37: Box Plots of PHASES Score for Ruptured and Unruptured Aneurysms
Generate box plots of the PHASES score for people with ruptured and unruptured aneurysms. You can use colours from the `tab20c` palette: colour 2 for ruptured and colour 6 for unruptured cases.

In [None]:
# Set up colors
colors = sns.color_palette('tab20c')

# Generate box plots for PHASES score by rupture status

# write your code here

## Task 38: Count and Percentage of Ruptured and Unruptured Aneurysms with Dmax > 10
Having a diameter greater than 10 mm is a known risk factor for aneurysm rupture. Iterate over the dataset and count how many ruptured aneurysms have a Dmax greater than 10, and what percentage of all ruptured aneurysms they represent. Similarly, count how many unruptured aneurysms have a Dmax greater than 10 and what percentage of all unruptured aneurysms they represent.

In [None]:
# Initialise counters for ruptured and unruptured aneurysms with Dmax > 10
ruptured_count = 0
unruptured_count = 0

# Total counts for ruptured and unruptured aneurysms
total_ruptured = 0
total_unruptured = 0

# Iterate over the dataframe rows

# write your code here

# Calculate percentages
ruptured_percentage = (ruptured_count / total_ruptured) * 100 if total_ruptured != 0 else 0
unruptured_percentage = (unruptured_count / total_unruptured) * 100 if total_unruptured != 0 else 0

# Print the results
print(f"Ruptured aneurysms with Dmax > 10: {ruptured_count} ({ruptured_percentage:.2f}%)")
print(f"Unruptured aneurysms with Dmax > 10: {unruptured_count} ({unruptured_percentage:.2f}%)")
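
The row-by-row counting pattern is sketched below for the ruptured group only, on hypothetical toy data. The variable names are illustrative, and the unruptured group follows the same structure.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Status': ['ruptured', 'ruptured', 'unruptured', 'unruptured', 'unruptured'],
    'Dmax': [12.0, 8.0, 11.0, 5.0, 6.0],
})

toy_ruptured_large = 0   # ruptured aneurysms with Dmax > 10
toy_ruptured_total = 0   # all ruptured aneurysms

# iterrows() yields (index, row) pairs; the index is not needed here
for _, row in toy.iterrows():
    if row['Status'] == 'ruptured':
        toy_ruptured_total += 1
        if row['Dmax'] > 10:
            toy_ruptured_large += 1

# Guard against division by zero when a group is empty
toy_ruptured_pct = toy_ruptured_large / toy_ruptured_total * 100 if toy_ruptured_total else 0
print(f'{toy_ruptured_large} of {toy_ruptured_total} ({toy_ruptured_pct:.2f}%)')
```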

## Task 39: Count and Percentage of Ruptured and Unruptured Aneurysms with AR > 2
Having an aspect ratio greater than 2.0 is a known risk factor for aneurysm rupture. Iterate over the dataset and count how many ruptured aneurysms have an AR greater than 2, and what percentage they represent. Similarly, we will count how many unruptured aneurysms have an AR greater than 2 and what percentage they represent.

In [None]:
# Initialise counters for ruptured and unruptured aneurysms with AR > 2
ruptured_count_ar = 0
unruptured_count_ar = 0

# Total counts for ruptured and unruptured aneurysms
total_ruptured_ar = 0
total_unruptured_ar = 0

# Iterate over the dataframe rows

# write your code here

# Calculate percentages
ruptured_percentage_ar = (ruptured_count_ar / total_ruptured_ar) * 100 if total_ruptured_ar != 0 else 0
unruptured_percentage_ar = (unruptured_count_ar / total_unruptured_ar) * 100 if total_unruptured_ar != 0 else 0

# Print the results
print(f"Ruptured aneurysms with AR > 2: {ruptured_count_ar} ({ruptured_percentage_ar:.2f}%)")
print(f"Unruptured aneurysms with AR > 2: {unruptured_count_ar} ({unruptured_percentage_ar:.2f}%)")

## Task 40: Count and Percentage of Ruptured and Unruptured Aneurysms with NSI > 0.2
A large non-sphericity index is also a known risk factor for aneurysm rupture. Iterate over the dataset and count how many ruptured aneurysms have an NSI greater than 0.2, and what percentage they represent. Similarly, we will count how many unruptured aneurysms have an NSI greater than 0.2 and what percentage they represent.

In [None]:
# Initialise counters for ruptured and unruptured aneurysms with NSI > 0.2

# write your code here

# Total counts for ruptured and unruptured aneurysms

# write your code here

# Iterate over the dataframe rows

# write your code here

# Calculate percentages

# write your code here

# Print the results
print(f"Ruptured aneurysms with NSI > 0.2: {ruptured_count_nsi} ({ruptured_percentage_nsi:.2f}%)")
print(f"Unruptured aneurysms with NSI > 0.2: {unruptured_count_nsi} ({unruptured_percentage_nsi:.2f}%)")
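
The task asks for an explicit loop. A vectorised version can serve as a cross-check on the loop's counts. The sketch below uses hypothetical toy data: the mean of a boolean Series is the fraction of `True` values, so grouping the threshold test by 'Status' gives the per-group percentage directly.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'Status': ['ruptured', 'ruptured', 'unruptured',
               'unruptured', 'unruptured', 'unruptured'],
    'NSI': [0.25, 0.15, 0.30, 0.10, 0.12, 0.18],
})

# Boolean Series: True where NSI exceeds the threshold
above = toy['NSI'] > 0.2

# Summing booleans counts the True values; their mean is the fraction of True values
counts_above = above.groupby(toy['Status']).sum()
percent_above = above.groupby(toy['Status']).mean() * 100
print(counts_above)
print(percent_above)
```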

## Conclusion
Reflect on your results. How are age, maximum aneurysm diameter, aspect ratio, and non-sphericity associated with aneurysm rupture? Did you see a greater PHASES score in the ruptured aneurysms?