# Python Exercises

This notebook gives you a chance to practice the Python you learned in the `python_basics.ipynb` notebook!

It is divided into cells that you can try out. If you get the correct answer to a puzzle, the next cell will print "Correct" when you run it.

## 1) Importing Libraries

**Task**: Import the `numpy` library as `np`, the `pandas` library as `pd`, and the `matplotlib.pyplot` module as `plt`.

In [1]:
# write your code here

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt

Try using the `help()` function to display the help information for NumPy's `array` function.

In [2]:
# write your code here

help(np.array)

Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
          like=None)

    Create an array.

    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        ``__array__`` method returns an array, or any (nested) sequence.
        If object is a scalar, a 0-dimensional array containing object is
        returned.
    dtype : data-type, optional
        The desired data-type for the array. If not given, NumPy will try to use
        a default ``dtype`` that can represent the values (by applying promotion
        rules when necessary.)
    copy : bool, optional
        If ``True`` (default), then the array data is copied. If ``None``,
        a copy will only be made if ``__array__`` returns a copy, if obj is
        a nested sequence, or if a copy is needed to satisfy any of the other
        requirements (``dtype``, ``order``, 
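The help text above lists the parameters of `np.array`. As a small sketch of three of them, the snippet below sets an explicit `dtype`, shows that the default `copy=True` produces an independent array, and uses `ndmin` to add a leading dimension. The variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3]

# dtype: ask for floats instead of the integer type NumPy would infer
as_float = np.array(values, dtype=float)
print(as_float.dtype)  # float64

# copy=True (the default): the new array does not share data with the source
source = np.array(values)
duplicate = np.array(source)
duplicate[0] = 99
print(source[0])  # 1, the original is unchanged

# ndmin: pad the shape with leading dimensions until it has at least 2
print(np.array(values, ndmin=2).shape)  # (1, 3)
```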

**Task**: Fix the next cell so that it no longer raises an error.

In [4]:
# fix the error in this cell

import sklearn as skl

**Task**: The next cell should be a comment. Turn it into a comment so that it no longer raises an error.

In [6]:
# I am supposed to be a comment and I will not run until I am

## 2) Reading in data

**Task**: Use the pandas library to read in the "sample_patient_data.csv" file in the cell below. You should call your dataset `df`.

In [7]:
# write your code here
df = pd.read_csv("sample_patient_data.csv")
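If `sample_patient_data.csv` is not at hand, `pd.read_csv` works the same way on any file-like object. This sketch reads a few rows copied from the `df.head()` output out of an in-memory string wrapped in `io.StringIO`; the names `csv_text` and `sample` are only for illustration.

```python
import io

import pandas as pd

# A few rows copied from the df.head() output; a stand-in for the real CSV file
csv_text = """patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
P001,45,M,Hypertension,140,5.2,0.85
P002,62,F,Diabetes,130,8.1,0.72
P003,38,F,Asthma,110,5.0,0.95
"""

# read_csv accepts a file path or any object with a read() method
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)  # (3, 7)
print(sample.dtypes)
```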

**Task**: Explore the data, then answer the following questions.

In [8]:
# you can write code for data exploration here
df.head()

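|   | patient_id | age | gender | diagnosis | systolic_bp | hba1c | medication_adherence |
|---|---|---|---|---|---|---|---|
| 0 | P001 | 45 | M | Hypertension | 140 | 5.2 | 0.85 |
| 1 | P002 | 62 | F | Diabetes | 130 | 8.1 | 0.72 |
| 2 | P003 | 38 | F | Asthma | 110 | 5.0 | 0.95 |
| 3 | P004 | 71 | M | Diabetes | 145 | 9.2 | 0.68 |
| 4 | P005 | 29 | F | Hypertension | 125 | 4.8 | 0.9 |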


In [10]:
df.describe()

|       | age | systolic_bp | hba1c | medication_adherence |
|---|---|---|---|---|
| count | 8.0 | 8.0 | 8.0 | 8.0 |
| mean | 51.25 | 132.25 | 6.3375 | 0.81875 |
| std | 15.097303 | 15.229201 | 1.736941 | 0.09433 |
| min | 29.0 | 110.0 | 4.8 | 0.68 |
| 25% | 41.0 | 122.5 | 5.075 | 0.7425 |
| 50% | 50.0 | 134.0 | 5.35 | 0.835 |
| 75% | 63.5 | 141.25 | 7.875 | 0.885 |
| max | 71.0 | 155.0 | 9.2 | 0.95 |


In [11]:
df.dtypes

patient_id               object
age                       int64
gender                   object
diagnosis                object
systolic_bp               int64
hba1c                   float64
medication_adherence    float64
dtype: object

In [13]:
df.shape

(8, 7)

In [14]:
df.columns

Index(['patient_id', 'age', 'gender', 'diagnosis', 'systolic_bp', 'hba1c',
       'medication_adherence'],
      dtype='object')
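The three questions below can also be answered directly in code. This is a sketch on a small stand-in DataFrame built from rows in the `df.head()` output; with the real data, use `df` in place of the illustrative name `sample`.

```python
import pandas as pd

# Stand-in for df: three rows taken from the df.head() output
sample = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "age": [45, 62, 38],
    "gender": ["M", "F", "F"],
    "diagnosis": ["Hypertension", "Diabetes", "Asthma"],
    "systolic_bp": [140, 130, 110],
    "hba1c": [5.2, 8.1, 5.0],
    "medication_adherence": [0.85, 0.72, 0.95],
})

# 1. Number of patients = number of rows
n_patients = len(sample)

# 2. Feature names = column labels
features = sample.columns.tolist()

# 3. Data types, split into numeric and non-numeric (text) columns
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(n_patients)    # 3
print(features)
print(numeric_cols)  # ['age', 'systolic_bp', 'hba1c', 'medication_adherence']
print(text_cols)     # ['patient_id', 'gender', 'diagnosis']
```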

#### 1. How many patients do we have?

<div style="border: 2px solid #ddd; background-color: #f9f9f9; padding: 15px; margin: 10px 0; border-radius: 8px; color: #333;">

*Write your answer in this box*

8 (`df.shape` is `(8, 7)`: 8 rows, one per patient, and 7 columns)

#### 2. What are the features of our data?

<div style="border: 2px solid #ddd; background-color: #f9f9f9; padding: 15px; margin: 10px 0; border-radius: 8px; color: #333;">

*Write your answer in this box*

7 columns: `patient_id`, `age`, `gender`, `diagnosis`, `systolic_bp`, `hba1c`, `medication_adherence`



#### 3. What are the data types in our data?
Remember: each feature has a data type. It can be an integer (a whole number), a float (a decimal number) or a string (text); pandas lists string columns as `object`.

<div style="border: 2px solid #ddd; background-color: #f9f9f9; padding: 15px; margin: 10px 0; border-radius: 8px; color: #333;">

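*Write your answer in this box*

- Integers (`int64`): `age`, `systolic_bp`
- Floats (`float64`): `hba1c`, `medication_adherence`
- Strings (`object`): `patient_id`, `gender`, `diagnosis`
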

## 3) Tasks to check understanding

Use pandas to answer the following questions:

#### 1. Find all female patients who are over the age of 50

In [16]:
# write your code here 
df[(df["age"] > 50) & (df["gender"] == "F")]

|   | patient_id | age | gender | diagnosis | systolic_bp | hba1c | medication_adherence |
|---|---|---|---|---|---|---|---|
| 1 | P002 | 62 | F | Diabetes | 130 | 8.1 | 0.72 |
| 6 | P007 | 68 | F | Asthma | 115 | 5.1 | 0.88 |
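Each comparison in the filter above needs its own parentheses because `&` binds more tightly than `>` and `==` in Python. As an alternative sketch, `DataFrame.query` takes the same condition as a string and allows a plain `and`. The stand-in rows are copied from outputs in this notebook; `sample` is an illustrative name.

```python
import pandas as pd

# Stand-in rows copied from earlier outputs in this notebook
sample = pd.DataFrame({
    "patient_id": ["P001", "P002", "P005", "P007"],
    "age": [45, 62, 29, 68],
    "gender": ["M", "F", "F", "F"],
})

# Boolean-mask version: parentheses are required around each comparison
mask_result = sample[(sample["age"] > 50) & (sample["gender"] == "F")]

# query() version: the condition is a string and can use `and`
query_result = sample.query('age > 50 and gender == "F"')

print(query_result["patient_id"].tolist())  # ['P002', 'P007']
```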


#### 2. Calculate the average medication adherence by diagnosis

In [17]:
# write your code here
df["diagnosis"].value_counts()

diagnosis
Hypertension    3
Diabetes        3
Asthma          2
Name: count, dtype: int64

In [21]:
df.loc[df["diagnosis"] == "Hypertension", "medication_adherence"].mean()

np.float64(0.8566666666666666)
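The two cells above count patients per diagnosis and then average one diagnosis at a time. A `groupby` computes the average for every diagnosis in one step. This sketch uses stand-in rows copied from outputs in this notebook (not the full dataset); on the real data, replace the illustrative name `sample` with `df`.

```python
import pandas as pd

# Stand-in rows copied from outputs in this notebook (P006 is not shown, so it is omitted)
sample = pd.DataFrame({
    "diagnosis": ["Hypertension", "Diabetes", "Asthma", "Diabetes",
                  "Hypertension", "Asthma", "Hypertension"],
    "medication_adherence": [0.85, 0.72, 0.95, 0.68, 0.90, 0.88, 0.82],
})

# Split rows by diagnosis, then average adherence within each group
avg_adherence = sample.groupby("diagnosis")["medication_adherence"].mean()
print(avg_adherence)
```

All three Hypertension rows are included here, so that group's mean matches the 0.857 computed above; the other groups differ from the full dataset because a row is missing.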

#### 3. Find patients with systolic BP between 120 and 140 mmHg

In [22]:
# write your code here

df[(df["systolic_bp"] > 120) & (df["systolic_bp"] < 140)]

|   | patient_id | age | gender | diagnosis | systolic_bp | hba1c | medication_adherence |
|---|---|---|---|---|---|---|---|
| 1 | P002 | 62 | F | Diabetes | 130 | 8.1 | 0.72 |
| 4 | P005 | 29 | F | Hypertension | 125 | 4.8 | 0.9 |
| 7 | P008 | 42 | M | Hypertension | 138 | 5.5 | 0.82 |
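The filter above uses strict `>` and `<`, so a reading of exactly 120 or 140 mmHg is left out. `Series.between` makes that choice explicit: it includes both bounds by default, and `inclusive="neither"` reproduces the strict version (the string values for `inclusive` assume pandas 1.3 or later). A sketch on stand-in rows copied from the outputs above:

```python
import pandas as pd

# Stand-in rows copied from outputs in this notebook
sample = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003", "P005", "P008"],
    "systolic_bp": [140, 130, 110, 125, 138],
})

# Default: both bounds included, so P001 (exactly 140) is kept
inclusive_ids = sample.loc[sample["systolic_bp"].between(120, 140), "patient_id"].tolist()

# Strict bounds, equivalent to (bp > 120) & (bp < 140)
strict_ids = sample.loc[
    sample["systolic_bp"].between(120, 140, inclusive="neither"), "patient_id"
].tolist()

print(inclusive_ids)  # ['P001', 'P002', 'P005', 'P008']
print(strict_ids)     # ['P002', 'P005', 'P008']
```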


In [23]:
df[120 < df["systolic_bp"] < 140]

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
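This cell fails because Python expands the chained comparison `120 < s < 140` into `(120 < s) and (s < 140)`, and `and` needs a single True/False value, which a whole Series cannot provide. Combining the two comparisons with the element-wise `&`, as in the previous cell, avoids the problem. A minimal demonstration on a stand-in Series of blood-pressure values:

```python
import pandas as pd

bp = pd.Series([140, 130, 110, 125, 138])

# Chained comparison: Python calls bool() on a Series, which pandas refuses
try:
    bp[120 < bp < 140]
except ValueError as err:
    print("ValueError:", err)

# Element-wise version: combine two boolean Series with &
in_range = bp[(bp > 120) & (bp < 140)]
print(in_range.tolist())  # [130, 125, 138]
```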

#### 4. What is the average age of the patients?

In [26]:
# write your code here

df["age"].mean()

np.float64(51.25)
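"Average" usually means the mean; `df.describe()` above reports a mean age of 51.25 and a median (the 50% row) of 50.0 for this dataset. The two can differ noticeably when values are spread unevenly. A quick sketch using the five ages from the `df.head()` output:

```python
import pandas as pd

# Ages of the first five patients, copied from the df.head() output
ages = pd.Series([45, 62, 38, 71, 29])

print(ages.mean())    # 49.0, the arithmetic average
print(ages.median())  # 45.0, the middle value after sorting
```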

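#### 5. Save the dataset to a file

Hint: To save a dataset, we can use `df.to_csv()`.
Use `help(df.to_csv)` for information about how to use this function. Leave off the parentheses after `to_csv`: `help(df.to_csv())` would call the function first and ask for help on its result instead.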

In [29]:
df.to_csv("saved_file.csv")
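By default `to_csv` also writes the row index as an unlabeled first column, so reading the file back adds an extra column named `Unnamed: 0`. Passing `index=False` avoids this. The sketch writes to an in-memory string (calling `to_csv()` with no path returns the CSV text) so it runs without touching disk; the stand-in rows come from the `df.head()` output.

```python
import io

import pandas as pd

# Stand-in rows copied from the df.head() output
sample = pd.DataFrame({
    "patient_id": ["P001", "P002"],
    "age": [45, 62],
})

# Default: the index is written as an extra, unnamed first column
with_index = pd.read_csv(io.StringIO(sample.to_csv()))
print(with_index.columns.tolist())  # ['Unnamed: 0', 'patient_id', 'age']

# index=False: only the real columns are written
without_index = pd.read_csv(io.StringIO(sample.to_csv(index=False)))
print(without_index.columns.tolist())  # ['patient_id', 'age']
```

For the real file, the equivalent call would be `df.to_csv("saved_file.csv", index=False)`.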