# Python Exercises

This notebook is intended to give you a chance to practice the Python you learned in the `python_basics.ipynb` notebook!  
  
The notebook is divided into cells that you can try out. If your answer to a puzzle is correct, running the next cell will print "Correct".

## 1) Importing Libraries

**Task**: Import the `numpy` library as `np`, the `pandas` library as `pd` and the `matplotlib.pyplot` library as `plt`

In [3]:
# write your code here

import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt

Try using the `help()` function to display the help information for numpy's `array` function.

In [4]:
# write your code here

help(np.array)

Help on built-in function array in module numpy:

array(...)
    array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0,
          like=None)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an object whose
        __array__ method returns an array, or any (nested) sequence.
        If object is a scalar, a 0-dimensional array containing object is
        returned.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then the type will
        be determined as the minimum type required to hold the objects in the
        sequence.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy will
        only be made if __array__ returns a copy, if obj is a nested sequence,
        or if a copy is needed to satisfy any of the other requirements
        (`dtype`, `order`, etc.).
    order : {'K', 'A', '

**Task**: Fix this next cell so that it no longer raises an error

In [5]:
# fix the error in this cell

import sklearn as skl

**Task**: The next cell should be formatted as a comment. Make the cell into a comment so that it no longer raises an error

In [6]:
# I am supposed to be a comment and I will not run until I am

## 2) Reading in data

**Task**: Use the pandas library to read in the "sample_patient_data.csv" file in the cell below. You should call your dataset `df`.

In [8]:
# write your code here
df = pd.read_csv("sample_patient_data.csv")

**Task**: Explore the data. Answer the following questions when you are done.

In [9]:
# you can write code for data exploration here
df

Unnamed: 0,patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
0,P001,45,M,Hypertension,140,5.2,0.85
1,P002,62,F,Diabetes,130,8.1,0.72
2,P003,38,F,Asthma,110,5.0,0.95
3,P004,71,M,Diabetes,145,9.2,0.68
4,P005,29,F,Hypertension,125,4.8,0.9
5,P006,55,M,Diabetes,155,7.8,0.75
6,P007,68,F,Asthma,115,5.1,0.88
7,P008,42,M,Hypertension,138,5.5,0.82


#### 1. How many patients do we have?

<div style="border: 2px solid #ddd; background-color: #f9f9f9; padding: 15px; margin: 10px 0; border-radius: 8px; color: #333;">

*Write your answer in this box*

There are 8 patients. Be careful: remember that Python starts counting at 0, not 1, so the last row is labelled 7.
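
You can also let pandas count the rows instead of reading the table by eye. The sketch below rebuilds two columns of the table above by hand (an assumption, so that it runs without the CSV file); in the notebook you would use the `df` you already loaded.

```python
import pandas as pd

# Hand-copied from the table above so the example runs without the CSV file.
df = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003", "P004", "P005", "P006", "P007", "P008"],
    "age": [45, 62, 38, 71, 29, 55, 68, 42],
})

n_patients = len(df)        # number of rows
n_rows, n_cols = df.shape   # (rows, columns)
last_label = df.index[-1]   # row labels start at 0, so the last one is one less than the count

print(n_patients, n_rows, n_cols, last_label)
```

The last row label is one less than the number of rows, which is why counting the labels by eye can be misleading.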

#### 2. What are the features of our data?

<div style="border: 2px solid #ddd; background-color: #f9f9f9; padding: 15px; margin: 10px 0; border-radius: 8px; color: #333;">

*Write your answer in this box*

The **features** are the **columns** of the data. They are:
* patient_id
* age
* gender
* diagnosis 
* systolic_bp 
* hba1c 
* medication_adherence
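
The column names can also be listed with `df.columns`. The sketch below uses a hand-built, two-row stand-in for the table above (an assumption, so that it runs without the CSV file). It keeps the `Unnamed: 0` column from the output, which holds row numbers rather than patient information, and filters it out of the feature list.

```python
import pandas as pd

# Two-row stand-in for the patient table, including the stray index column.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "patient_id": ["P001", "P002"],
    "age": [45, 62],
    "gender": ["M", "F"],
    "diagnosis": ["Hypertension", "Diabetes"],
    "systolic_bp": [140, 130],
    "hba1c": [5.2, 8.1],
    "medication_adherence": [0.85, 0.72],
})

# Keep every column name except the saved row numbers.
features = [col for col in df.columns if col != "Unnamed: 0"]
print(features)
```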



#### 3. What are the data types in our data?
Remember: each feature has a data type. It can be an integer (a whole number), a float (a decimal) or a string (text).

<div style="border: 2px solid #ddd; background-color: #f9f9f9; padding: 15px; margin: 10px 0; border-radius: 8px; color: #333;">

*Write your answer in this box*

* patient_id -> string; mix of numbers and letters is a string
* age -> integer 
* gender -> string 
* diagnosis -> string
* systolic_bp -> integer 
* hba1c -> float 
* medication_adherence -> float
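
To check these answers, you can ask pandas which type it assigned to each column with `df.dtypes`. The sketch below uses a small hand-built stand-in for the table (an assumption, so that it runs without the CSV file). How text columns are labelled (for example `object` or a string type) depends on the pandas version, so the example only tests whether a column is numeric.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_numeric_dtype

# Small stand-in with one text, one integer and one float column.
df = pd.DataFrame({
    "patient_id": ["P001", "P002"],
    "age": [45, 62],
    "hba1c": [5.2, 8.1],
})

print(df.dtypes)  # one data type per column

age_is_integer = is_integer_dtype(df["age"])
hba1c_is_float = is_float_dtype(df["hba1c"])
patient_id_is_numeric = is_numeric_dtype(df["patient_id"])  # False: it is text
```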



## 3) Tasks to check understanding

Use pandas to answer the following questions:

#### 1. Find all female patients who are over the age of 50

In [10]:
# write your code here 
# both conditions are needed: female AND older than 50
df[(df["gender"] == "F") & (df["age"] > 50)]

Unnamed: 0,patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
1,P002,62,F,Diabetes,130,8.1,0.72
6,P007,68,F,Asthma,115,5.1,0.88


#### 2. Calculate the average medication adherence by diagnosis

In [None]:
# write your code here
diabetes = df[df["diagnosis"] == "Diabetes"]
diabetes

# avg is (0.72 + 0.68 + 0.75)/3 ≈ 0.72

Unnamed: 0,patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
1,P002,62,F,Diabetes,130,8.1,0.72
3,P004,71,M,Diabetes,145,9.2,0.68
5,P006,55,M,Diabetes,155,7.8,0.75


In [None]:
hypertension = df[df["diagnosis"] == "Hypertension"]
hypertension

# avg is (0.85 + 0.90 + 0.82)/3 ≈ 0.86

Unnamed: 0,patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
0,P001,45,M,Hypertension,140,5.2,0.85
4,P005,29,F,Hypertension,125,4.8,0.9
7,P008,42,M,Hypertension,138,5.5,0.82


In [None]:
asthma = df[df["diagnosis"] == "Asthma"]
asthma

# avg is (0.95 + 0.88)/2 = 0.915 ≈ 0.92

Unnamed: 0,patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
2,P003,38,F,Asthma,110,5.0,0.95
6,P007,68,F,Asthma,115,5.1,0.88
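
The three hand calculations above can also be done in one step with `groupby`, which splits the rows by diagnosis and averages each group. This is a sketch rather than the only way to do it; the two columns are copied by hand from the table (an assumption, so that it runs without the CSV file).

```python
import pandas as pd

# Diagnosis and adherence values copied from the patient table, in row order.
df = pd.DataFrame({
    "diagnosis": ["Hypertension", "Diabetes", "Asthma", "Diabetes",
                  "Hypertension", "Diabetes", "Asthma", "Hypertension"],
    "medication_adherence": [0.85, 0.72, 0.95, 0.68, 0.90, 0.75, 0.88, 0.82],
})

# One mean per diagnosis, indexed by the diagnosis name.
mean_adherence = df.groupby("diagnosis")["medication_adherence"].mean()
print(mean_adherence)
```

This scales better than filtering each diagnosis separately, because no code needs to change when a new diagnosis appears in the data.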


#### 3. Find patients with systolic BP between 120 and 140 mmHg (inclusive)

In [None]:
# write your code here
df[(df["systolic_bp"] <= 140) & (df["systolic_bp"] >= 120)]

# remember, to do "and", we use &

Unnamed: 0,patient_id,age,gender,diagnosis,systolic_bp,hba1c,medication_adherence
0,P001,45,M,Hypertension,140,5.2,0.85
1,P002,62,F,Diabetes,130,8.1,0.72
4,P005,29,F,Hypertension,125,4.8,0.9
7,P008,42,M,Hypertension,138,5.5,0.82
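
A range filter like this can be written in two ways that should give the same rows. The sketch below compares them on two columns copied by hand from the table (an assumption, so that it runs without the CSV file).

```python
import pandas as pd

# Patient IDs and blood pressures copied from the table, in row order.
df = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003", "P004", "P005", "P006", "P007", "P008"],
    "systolic_bp": [140, 130, 110, 145, 125, 155, 115, 138],
})

# Each comparison needs its own parentheses because & binds more tightly than <= and >=.
with_mask = df[(df["systolic_bp"] >= 120) & (df["systolic_bp"] <= 140)]

# Series.between includes both endpoints by default.
with_between = df[df["systolic_bp"].between(120, 140)]

print(with_between["patient_id"].tolist())
```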


#### 4. What is the average age of the patients?

In [19]:
# write your code here
df["age"].mean()

51.25

#### 5. Save the dataset to a file 
  
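Hint: To save a dataset, we can use `df.to_csv()`. 
Use `help(df.to_csv)` (without the brackets after `to_csv`) for information about how to use this function.

In [None]:
# pass the function itself to help(); df.to_csv() with brackets would call it instead
# df must already exist: if you see "NameError: name 'df' is not defined", re-run the pd.read_csv cell first
help(df.to_csv)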
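
When saving, it helps to know that `to_csv` writes the row labels as an extra column by default. Reading such a file back produces a column named `Unnamed: 0`, like the one in the tables above. The sketch below shows the difference that `index=False` makes. The file name and the two-row dataset are hypothetical, chosen only for this example, and the file is written to a temporary folder.

```python
import os
import tempfile

import pandas as pd

# Two-row stand-in for the patient data.
df = pd.DataFrame({
    "patient_id": ["P001", "P002"],
    "age": [45, 62],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical output file name, used for this example only.
    path = os.path.join(tmp_dir, "patients_copy.csv")

    df.to_csv(path)                # default: row labels are written too
    with_index = pd.read_csv(path)

    df.to_csv(path, index=False)   # leave the row labels out
    without_index = pd.read_csv(path)

print(list(with_index.columns))
print(list(without_index.columns))
```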