<img src="./intro_images/logo_excercises.png" width="100%" align="left" />

<table style="float:right;">
    <tr>
        <td>                      
            <div style="text-align: right">Dr Ali Sarrami Foroushani</div>
            <div style="text-align: right">Lecturer in Cardiovascular Biomechanics</div>
            <div style="text-align: right">School of Health Sciences</div>
            <div style="text-align: right">University of Manchester</div>
         </td>
     </tr>
</table>

# Python Week One - Exercises

This notebook contains exercises to test your knowledge of Python, covering variables, operators, iteration, data structures, functions, testing, NumPy, vectorised computation, and pandas. Each problem is followed by a code cell for you to write your solution.

> **Note**: In this version, each exercise is followed by a **fully worked solution** so you can verify your approach and results.

In [None]:
# Common imports used across exercises
import numpy as np
import pandas as pd
from pathlib import Path
pd.set_option('display.precision', 3)
pd.set_option('display.max_columns', None)

## Problem 1: Vectorised Computation in Python (NumPy)

In the context of health data science, we are going to calculate the Mean Arterial Pressure (MAP) for a set of patients using the following formula:

$$ MAP = \frac{\text{systolic} + 2 \times \text{diastolic}}{3} $$

You are given the following systolic and diastolic blood pressure values for 5 patients in a table format:

| Patient | Systolic (mmHg) | Diastolic (mmHg) |
|---------|-----------------|------------------|
| 1       | 120             | 80               |
| 2       | 130             | 85               |
| 3       | 110             | 75               |
| 4       | 140             | 90               |
| 5       | 135             | 88               |

1. Use vectorised computation to calculate the MAP for each patient.
2. Print the results as a NumPy array.

In [None]:
# Solution for Problem 1
systolic = np.array([120, 130, 110, 140, 135])
diastolic = np.array([80, 85, 75, 90, 88])
map_values = (systolic + 2*diastolic) / 3
map_values
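
As a quick sanity check on the vectorised solution, the sketch below recomputes MAP with an explicit Python loop and compares the two results. The names `map_vectorised` and `map_loop` are illustrative and not part of the exercise.

```python
import numpy as np

systolic = np.array([120, 130, 110, 140, 135])
diastolic = np.array([80, 85, 75, 90, 88])

# Vectorised: one expression applied element-wise to whole arrays
map_vectorised = (systolic + 2 * diastolic) / 3

# Loop-based equivalent, one patient at a time
map_loop = np.array([(s + 2 * d) / 3 for s, d in zip(systolic, diastolic)])

# Both approaches should agree up to floating-point rounding
print(np.allclose(map_vectorised, map_loop))
print(map_vectorised.round(2))
```

The loop gives the same numbers, but the vectorised form is shorter and avoids per-element Python overhead on large arrays.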

## Problem 2: Standard Deviation and Mean for Health Metrics (NumPy)

You are given the following data representing Body Mass Index (BMI) values for a sample of patients:

`bmi = np.array([22.5, 24.7, 28.1, 30.2, 27.5, 26.1, 23.0])`

1. Calculate the mean and standard deviation of the BMI values.
2. Interpret the results: What do the mean and standard deviation tell us about the population's BMI values?

In [None]:
# Solution for Problem 2
bmi = np.array([22.5, 24.7, 28.1, 30.2, 27.5, 26.1, 23.0])
mean_bmi = bmi.mean()
std_bmi = bmi.std(ddof=0)  # population std; use ddof=1 for sample
print(f"Mean BMI: {mean_bmi:.2f}")
print(f"Std Dev BMI: {std_bmi:.2f}")
print("Interpretation:")
print("- The mean gives the central tendency of BMI in this sample.")
print("- The standard deviation indicates how spread out the BMIs are around the mean.")
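
The solution comment mentions `ddof=0` versus `ddof=1`. The following sketch, using the same BMI values, shows how the two choices differ; the variable names are illustrative.

```python
import numpy as np

bmi = np.array([22.5, 24.7, 28.1, 30.2, 27.5, 26.1, 23.0])
n = bmi.size

std_population = bmi.std(ddof=0)  # divides by n
std_sample = bmi.std(ddof=1)      # divides by n - 1 (Bessel's correction)

print(f"Population std (ddof=0): {std_population:.3f}")
print(f"Sample std (ddof=1):     {std_sample:.3f}")

# The two estimates differ only by the factor sqrt(n / (n - 1))
print(np.isclose(std_sample, std_population * np.sqrt(n / (n - 1))))
```

With only seven values the gap between the two is noticeable; it shrinks as the number of observations grows.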

## Problem 3: Matrix Multiplication for Health Prediction Models (NumPy)

In a health prediction model, you are given a matrix of patient data `X` with each row representing a patient and each column representing a health metric (e.g., age, glucose level, cholesterol level). You are also given a vector `W` of weights for each metric, which represents their importance in predicting the risk score for each patient.

Perform matrix multiplication to compute the risk score for each patient. You are given the following data:

Matrix `X`:

$$ X = \begin{pmatrix} 60 & 110 & 220 \\ 50 & 90 & 180 \\ 70 & 130 & 210 \\ 40 & 85 & 190 \end{pmatrix} $$

Vector `W`:

$$ W = \begin{pmatrix} 0.5 \\ 0.3 \\ 0.2 \end{pmatrix} $$

1. Multiply the matrix `X` with the vector `W` to obtain the risk score for each patient.
2. Print the resulting risk scores as a NumPy array.

In [None]:
# Solution for Problem 3
X = np.array([[60, 110, 220],
              [50,  90, 180],
              [70, 130, 210],
              [40,  85, 190]])
W = np.array([0.5, 0.3, 0.2])
risk_scores = X @ W
risk_scores
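
To see what `X @ W` does here, it can help to check the shapes involved and to rewrite the product as an explicit weighted sum over the columns. This is a minimal sketch; `weighted_sum` is an illustrative name.

```python
import numpy as np

X = np.array([[60, 110, 220],
              [50,  90, 180],
              [70, 130, 210],
              [40,  85, 190]])
W = np.array([0.5, 0.3, 0.2])

# (4, 3) @ (3,) -> (4,): one score per patient
print(X.shape, W.shape, (X @ W).shape)

# Same result written as an explicit weighted sum across columns:
# broadcasting multiplies each column by its weight, then rows are summed
weighted_sum = (X * W).sum(axis=1)
print(np.allclose(X @ W, weighted_sum))
```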

## Problem 4: Vectorised Computation in Python (NumPy)

Given NumPy arrays of patient weights (in kg) and heights (in m), compute the BMI for each patient in a vectorised manner. Use the formula:

$$\text{BMI} = \frac{\text{weight}}{\text{height}^2}$$

Use the following data:
- weights = `[70, 85, 90, 60, 75]`
- heights = `[1.75, 1.80, 1.78, 1.60, 1.70]`

In [None]:
# Solution for Problem 4
weights = np.array([70, 85, 90, 60, 75])
heights = np.array([1.75, 1.80, 1.78, 1.60, 1.70])
bmi_vals = weights / (heights ** 2)
bmi_vals
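
The vectorised BMI values can be combined with a boolean mask, for example to flag which patients fall inside the 18.5 to 24.9 range that Problem 6 treats as healthy. This is a sketch that assumes weights in kg and heights in m; `in_range` is an illustrative name.

```python
import numpy as np

weights = np.array([70, 85, 90, 60, 75])             # kg (assumed)
heights = np.array([1.75, 1.80, 1.78, 1.60, 1.70])   # m (assumed)
bmi_vals = weights / heights ** 2

# Boolean mask: True where BMI falls inside [18.5, 24.9]
in_range = (bmi_vals >= 18.5) & (bmi_vals <= 24.9)

print(bmi_vals.round(2))
print(in_range)
```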

## Problem 5: Pandas in Python

In this task, you will create a DataFrame from the following patient data and save it as a CSV file. Then, you will read the CSV file and calculate the average values for each column.

The patient data is as follows (with units provided):

| Patient ID | Age (years) | Glucose Level (mg/dL) | Cholesterol Level (mg/dL) | Blood Pressure (mmHg) |
|------------|--------------|------------------------|----------------------------|-----------------------|
| 1          | 65           | 110                    | 180                        | 120/80                |
| 2          | 50           | 95                     | 170                        | 130/85                |
| 3          | 45           | 105                    | 190                        | 140/90                |
| 4          | 60           | 100                    | 200                        | 150/95                |
| 5          | 55           | 90                     | 160                        | 125/75                |

### Steps:
1. Create a Pandas DataFrame from the given patient data.
2. Save the DataFrame to a CSV file called `patient_data.csv`.
3. Read the `patient_data.csv` file and display the DataFrame.
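4. Calculate the average of each column except `Patient ID`. `Blood Pressure` is stored as text such as `120/80`, so split it into systolic and diastolic values before averaging.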
5. Display the first few rows of the new DataFrame.

In [None]:
# Solution for Problem 5
data = {
    'Patient ID': [1,2,3,4,5],
    'Age (years)': [65,50,45,60,55],
    'Glucose Level (mg/dL)': [110,95,105,100,90],
    'Cholesterol Level (mg/dL)': [180,170,190,200,160],
    'Blood Pressure (mmHg)': ['120/80','130/85','140/90','150/95','125/75']
}
df = pd.DataFrame(data)

# Save to CSV
csv_path = Path('patient_data.csv')
df.to_csv(csv_path, index=False)

# Read back
df_read = pd.read_csv(csv_path)
print("Data read from CSV:")
display(df_read)

# Split Blood Pressure into Systolic/Diastolic numeric columns for averaging
bp_split = df_read['Blood Pressure (mmHg)'].str.split('/', expand=True).astype(int)
df_read['Systolic'] = bp_split[0]
df_read['Diastolic'] = bp_split[1]

# Select numeric columns except Patient ID for averaging
numeric_cols = ['Age (years)', 'Glucose Level (mg/dL)', 'Cholesterol Level (mg/dL)', 'Systolic', 'Diastolic']
averages = df_read[numeric_cols].mean().to_frame(name='Average').T
print("\nAverages:")
display(averages.head())
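
If the blood pressure column does not need to be averaged, a shorter route is to let pandas skip non-numeric columns. The sketch below also round-trips the data through an in-memory buffer instead of a file on disk, which is an assumption made only to keep the example self-contained.

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Patient ID': [1, 2, 3, 4, 5],
    'Age (years)': [65, 50, 45, 60, 55],
    'Glucose Level (mg/dL)': [110, 95, 105, 100, 90],
    'Cholesterol Level (mg/dL)': [180, 170, 190, 200, 160],
    'Blood Pressure (mmHg)': ['120/80', '130/85', '140/90', '150/95', '125/75'],
})

# Round-trip through an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df_back = pd.read_csv(buffer)

# numeric_only=True skips the text-valued blood pressure column
averages = df_back.drop(columns='Patient ID').mean(numeric_only=True)
print(averages)
```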

## Problem 6: Importing and Using Custom Modules

Create and use a custom Python module for basic health metrics.

1. Create a new Python file named `health_utils.py` with two functions:
   - `calculate_bmi(weight, height)` → returns BMI using `weight / height**2`
   - `is_healthy_bmi(bmi)` → returns `True` if BMI is between 18.5 and 24.9, else `False`.
2. Import the module into your notebook.
3. Use these functions to calculate and print whether a person weighing 72 kg and 1.78 m tall is in the healthy range.

In [None]:
# Solution for Problem 6
# Create the module file
from pathlib import Path
module_code = '''\
def calculate_bmi(weight, height):
    """Compute BMI given weight (kg) and height (m)."""
    return weight / (height ** 2)

def is_healthy_bmi(bmi):
    """Return True if BMI is within [18.5, 24.9], else False."""
    return 18.5 <= bmi <= 24.9
'''
Path('health_utils.py').write_text(module_code)

# Import and use the module
import importlib
importlib.invalidate_caches()  # make sure the freshly written file is visible to the import system
import health_utils

bmi_val = health_utils.calculate_bmi(72, 1.78)
print(f"BMI: {bmi_val:.2f}")
print("Healthy range:", health_utils.is_healthy_bmi(bmi_val))
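
Since this notebook also covers testing, the two functions can be checked with a few plain `assert` statements. In the sketch below the functions are repeated inline so that the block runs without `health_utils.py` on disk; `test_health_utils` is an illustrative name.

```python
import math

# Copies of the two functions from health_utils.py, repeated so the block runs on its own
def calculate_bmi(weight, height):
    return weight / (height ** 2)

def is_healthy_bmi(bmi):
    return 18.5 <= bmi <= 24.9

def test_health_utils():
    # 72 kg at 1.78 m gives a BMI of roughly 22.7
    assert math.isclose(calculate_bmi(72, 1.78), 22.72, abs_tol=0.01)
    # Both boundaries are inclusive
    assert is_healthy_bmi(18.5)
    assert is_healthy_bmi(24.9)
    # Values just outside the range are rejected
    assert not is_healthy_bmi(18.4)
    assert not is_healthy_bmi(25.0)

test_health_utils()
print("All checks passed")
```

Checking the boundary values explicitly documents that the range is inclusive at both ends.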

## Problem 7: Statistical Analysis with NumPy

A researcher collected resting heart rates (in beats per minute) from 10 patients:  
`rates = np.array([72, 75, 78, 70, 69, 80, 77, 74, 76, 73])`

1. Compute the **mean**, **median**, and **standard deviation** using NumPy.
2. Identify which patients (by index) have a heart rate **more than one standard deviation above the mean**.

In [None]:
# Solution for Problem 7
rates = np.array([72, 75, 78, 70, 69, 80, 77, 74, 76, 73])
mean_rate = rates.mean()
median_rate = np.median(rates)
std_rate = rates.std(ddof=0)

above_one_sd_idx = np.where(rates > mean_rate + std_rate)[0]

print(f"Mean: {mean_rate:.2f}, Median: {median_rate:.2f}, Std: {std_rate:.2f}")
print("Indices above mean + 1 SD:", above_one_sd_idx.tolist())
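
An equivalent way to express the threshold is through z-scores, which measure each value's distance from the mean in units of standard deviation. The sketch below also reports values more than one standard deviation below the mean, which goes slightly beyond what the exercise asks.

```python
import numpy as np

rates = np.array([72, 75, 78, 70, 69, 80, 77, 74, 76, 73])

# z-score: distance from the mean in units of standard deviation
z_scores = (rates - rates.mean()) / rates.std(ddof=0)

above = np.flatnonzero(z_scores > 1)    # more than 1 SD above the mean
below = np.flatnonzero(z_scores < -1)   # more than 1 SD below the mean

print("Above:", above.tolist(), rates[above])
print("Below:", below.tolist(), rates[below])
```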

## Problem 8: Boolean and Fancy Indexing with NumPy

Given patient glucose readings:  
`glucose = np.array([90, 110, 130, 95, 145, 160, 125, 105])`

1. Use **boolean indexing** to select patients with glucose levels above 120 mg/dL.  
2. Use **fancy indexing** to extract the glucose readings at indices 1, 3, and 5 (remember: Python indexing starts from 0, so these are the second, fourth, and sixth readings).  
3. Print both results.

In [None]:
# Solution for Problem 8
glucose = np.array([90, 110, 130, 95, 145, 160, 125, 105])
high_glucose = glucose[glucose > 120]
selected = glucose[[1, 3, 5]]
print("Glucose > 120 mg/dL:", high_glucose)
print("Readings at indices 1, 3, 5:", selected)
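
Because indexing starts from 0, it matters whether a list of integers is meant as positions or as patient numbers counted from 1. The sketch below contrasts the two readings; `patient_numbers` is a hypothetical 1-based numbering introduced only for illustration.

```python
import numpy as np

glucose = np.array([90, 110, 130, 95, 145, 160, 125, 105])

# Positions where the boolean mask is True
high_idx = np.flatnonzero(glucose > 120)
print("Indices with glucose > 120:", high_idx.tolist())

# Hypothetical 1-based patient numbers converted to 0-based indices
patient_numbers = np.array([1, 3, 5])
print("By patient number:", glucose[patient_numbers - 1])

# The same integers used directly as 0-based indices pick different readings
print("By index:", glucose[[1, 3, 5]])
```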

## Problem 9: Working with Pandas – Data Cleaning

You are given the following data on patients’ blood tests. Some cholesterol values are missing.

| Patient ID | Age | Glucose | Cholesterol |
|-------------|-----|----------|--------------|
| 1 | 45 | 105 | 180 |
| 2 | 50 | 98 | NaN |
| 3 | 60 | 110 | 200 |
| 4 | 55 | 95 | NaN |
| 5 | 48 | 115 | 190 |

1. Create a Pandas DataFrame from this data.  
2. Use `isnull()` to identify missing values.  
3. Replace missing cholesterol values with the **mean** of the available values.  
4. Display the cleaned DataFrame.

In [None]:
# Solution for Problem 9
df9 = pd.DataFrame({
    'Patient ID': [1,2,3,4,5],
    'Age': [45,50,60,55,48],
    'Glucose': [105,98,110,95,115],
    'Cholesterol': [180, np.nan, 200, np.nan, 190]
})
print("Missing values (True indicates missing):")
display(df9.isnull())

chol_mean = df9['Cholesterol'].mean()
df9['Cholesterol'] = df9['Cholesterol'].fillna(chol_mean)
print("\nCleaned DataFrame:")
display(df9)
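
When imputing values it is often useful to keep track of which entries were filled in. The sketch below counts missing values per column and adds a flag column before filling; the column name `Cholesterol_imputed` is an assumption, not part of the exercise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Patient ID': [1, 2, 3, 4, 5],
    'Cholesterol': [180, np.nan, 200, np.nan, 190],
})

# Count missing values per column before changing anything
print(df.isnull().sum())

# Record which rows are about to be imputed, then fill with the column mean
# (Series.mean() skips NaN by default, so only the available values are averaged)
df['Cholesterol_imputed'] = df['Cholesterol'].isnull()
df['Cholesterol'] = df['Cholesterol'].fillna(df['Cholesterol'].mean())
print(df)
```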

## Problem 10: Data Analysis with Pandas

Using the following dataset:

| Patient ID | Gender | BMI | Blood Pressure | Cholesterol |
|-------------|---------|-----|----------------|--------------|
| 1 | Male | 25.3 | 120 | 180 |
| 2 | Female | 22.4 | 110 | 170 |
| 3 | Male | 27.5 | 130 | 210 |
| 4 | Female | 23.1 | 115 | 190 |
| 5 | Male | 29.2 | 140 | 220 |

1. Create a DataFrame.  
2. Calculate the **average BMI** for male and female patients using `groupby()`.  
3. Add a new column called `High_Cholesterol` that contains `True` if cholesterol > 200, else `False`.  
4. Save the resulting DataFrame as `health_summary.csv`.

In [None]:
# Solution for Problem 10
df10 = pd.DataFrame({
    'Patient ID': [1,2,3,4,5],
    'Gender': ['Male','Female','Male','Female','Male'],
    'BMI': [25.3, 22.4, 27.5, 23.1, 29.2],
    'Blood Pressure': [120,110,130,115,140],
    'Cholesterol': [180,170,210,190,220]
})
avg_bmi_by_gender = df10.groupby('Gender')['BMI'].mean()
print("Average BMI by Gender:")
display(avg_bmi_by_gender)

df10['High_Cholesterol'] = df10['Cholesterol'] > 200
out_path = Path('health_summary.csv')
df10.to_csv(out_path, index=False)
print(f"\nSaved DataFrame to {out_path.resolve()}\n")
display(df10)
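
Several per-group statistics can be computed in one `groupby()` call using named aggregation. The following is a minimal sketch on the same data; the output column names (`mean_bmi`, `n_patients`, `n_high_cholesterol`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male'],
    'BMI': [25.3, 22.4, 27.5, 23.1, 29.2],
    'Cholesterol': [180, 170, 210, 190, 220],
})
df['High_Cholesterol'] = df['Cholesterol'] > 200

# Named aggregation: each keyword becomes an output column
summary = df.groupby('Gender').agg(
    mean_bmi=('BMI', 'mean'),
    n_patients=('BMI', 'count'),
    n_high_cholesterol=('High_Cholesterol', 'sum'),
)
print(summary)
```

Summing a boolean column counts its `True` entries, so `n_high_cholesterol` is the number of patients above the threshold in each group.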