# Notebook template
- This template is inspired by the deeplearning.ai AI4M course.

Welcome to assignment **Course**!

You will do *tasks* that align with *learning objectives*.

**You will learn:**
- Learning Objective #1
  - Example activity or topic #1
  - Example activity or topic #2
- ...
**e.g., C3M1**
- How to analyze data from a randomized controlled trial
  - using traditional statistical methods
  - and more recent machine learning techniques

### This assignment covers the following topics

- [1. Topic 1](#1)
    - [1.1 Subtopic 1.1](#1-1)
        - [1.1.1 Subsubtopic 1.1.1](#1-1-1)
        - [Exercise 1](#ex-01)

**e.g., C3M1**
- [4. Machine Learning Approaches](#4)
  - [4.1 T-Learner](#4-1)
      - [Exercise 9](#ex-09)
      - [Exercise 10](#ex-10)
      - [Exercise 11](#ex-11)

## Packages

We'll first import all the packages that we need for this assignment. 

- `pandas` is what we'll use to manipulate our data
- `numpy` is a library for mathematical and scientific operations
- `matplotlib` is a plotting library
- `sklearn` contains a lot of efficient tools for machine learning and statistical modeling
- `random` allows us to generate random numbers in Python
- `lifelines` is an open-source library that implements the c-statistic (concordance index)
- `itertools` will help us with hyperparameter search
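
As a rough illustration of the last bullet, `itertools.product` can enumerate every combination in a hyperparameter grid. The grid below (`n_estimators`, `max_depth`, and their values) is a hypothetical example, not a setting prescribed by this assignment.

```python
import itertools

# Hypothetical hyperparameter grid (names and values are assumptions for illustration).
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 4, 8],
}

# Take the Cartesian product of the value lists, then pair each
# tuple of values back up with the hyperparameter names.
names = list(param_grid.keys())
combinations = [
    dict(zip(names, values))
    for values in itertools.product(*param_grid.values())
]

print(len(combinations))  # 2 * 3 = 6 combinations
print(combinations[0])    # {'n_estimators': 50, 'max_depth': 2}
```

Each dictionary in `combinations` could then be passed to a model constructor as keyword arguments inside a search loop.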

## Import Packages
Run the next cell to import all necessary packages, dependencies, and custom utility functions.

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn
import random
# import lifelines
import itertools

plt.rcParams['figure.figsize'] = [10, 7]

<a name="1"></a>
## 1 Topic 1
<a name="1-1"></a>
### 1.1 Subtopic 1
In this section, we will ...
<a name="1-1-1"></a>
#### 1.1.1 Subsubtopic 1

<a name="1-2"></a>
### 1.2 Subtopic 2
In this next section, we will ...

In [8]:
# load data
data = pd.read_csv("levamisole_data.csv", index_col=0)

In [9]:
# explore data
print(f"Data Dimensions: {data.shape}")
data.head()

Data Dimensions: (607, 14)


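| Unnamed: 0 | sex | age | obstruct | perfor | adhere | nodes | node4 | outcome | TRTMT | differ_2.0 | differ_3.0 | extent_2 | extent_3 | extent_4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 43 | 0 | 0 | 0 | 5.0 | 1 | 1 | True | 1 | 0 | 0 | 1 | 0 |
| 2 | 1 | 63 | 0 | 0 | 0 | 1.0 | 0 | 0 | True | 1 | 0 | 0 | 1 | 0 |
| 3 | 0 | 71 | 0 | 0 | 1 | 7.0 | 1 | 1 | False | 1 | 0 | 1 | 0 | 0 |
| 4 | 0 | 66 | 1 | 0 | 0 | 6.0 | 1 | 1 | True | 1 | 0 | 0 | 1 | 0 |
| 5 | 1 | 69 | 0 | 0 | 0 | 22.0 | 1 | 1 | False | 1 | 0 | 0 | 1 | 0 |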


Below is a description of all the fields (one-hot means a different field for each level):
- `sex (binary): 1 if Male, 0 otherwise`
- `age (int): age of patient at start of the study`
- `obstruct (binary): obstruction of colon by tumor`
- `perfor (binary): perforation of colon`
- `adhere (binary): adherence to nearby organs`
- `nodes (int): number of lymph nodes with detectable cancer`
- `node4 (binary): more than 4 positive lymph nodes`
- `outcome (binary): 1 if died within 5 years`
- `TRTMT (binary): treated with levamisole + fluorouracil`
- `differ (one-hot): differentiation of tumor`
- `extent (one-hot): extent of local spread`
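
To make the one-hot layout concrete, here is a minimal sketch using `pandas.get_dummies` on a toy frame. The toy values are made up, and the use of `drop_first=True` is an assumption: it would be consistent with the columns shown above (no `differ_1.0` or `extent_1`), but the notebook does not say how the dataset was actually prepared.

```python
import pandas as pd

# Toy data: level values are assumptions chosen to mirror the column names above.
toy = pd.DataFrame({
    "differ": [1.0, 2.0, 3.0, 2.0],
    "extent": [1, 3, 2, 4],
})

# One indicator column per level. drop_first=True omits the first level of
# each field (assumed here; it matches the absence of differ_1.0 / extent_1).
encoded = pd.get_dummies(
    toy, columns=["differ", "extent"], drop_first=True, dtype=int
)

print(list(encoded.columns))
# ['differ_2.0', 'differ_3.0', 'extent_2', 'extent_3', 'extent_4']
```

Under this encoding, a row whose indicators are all 0 for a field belongs to that field's omitted first level.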

<a name='ex-01'></a>
### Exercise 01

Since this is a randomized controlled trial (RCT), treatment was assigned to patients at random. Let's warm up by finding the treatment probability.

$$p_{treatment} = \frac{n_{treatment}}{n}$$

- $n_{treatment}$ is the number of patients where `TRTMT = True`
- $n$ is the total number of patients.

In [10]:
# UNQ_C1 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def proportion_treated(df):
    """
    Compute proportion of trial participants who have been treated

    Args:
        df (dataframe): dataframe containing trial results. Column
                      'TRTMT' is 1 (True) if patient was treated, 0 (False) otherwise.

    Returns:
        proportion (float): proportion of patients who were treated
    """
    
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###

    proportion = len(df[(df['TRTMT'] == True)]) / len(df)
    
    ### END CODE HERE ###

    return proportion

**Test Case**

In [11]:
print("dataframe:\n")
example_df = pd.DataFrame(data =[[0, 0],
                                 [1, 1], 
                                 [1, 1],
                                 [1, 1]], columns = ['outcome', 'TRTMT'])
print(example_df)
print("\n")
treated_proportion = proportion_treated(example_df)
print(f"Proportion of patient treated: computed {treated_proportion}, expected: 0.75")

dataframe:

   outcome  TRTMT
0        0      0
1        1      1
2        1      1
3        1      1


Proportion of patient treated: computed 0.75, expected: 0.75


##### Expected output

```
Proportion of patient treated: computed 0.75, expected: 0.75
```
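
As a cross-check on the formula $p_{treatment} = n_{treatment} / n$, the same quantity can be computed as the mean of the treatment indicator, since averaging 0/1 (or `False`/`True`) values gives the fraction that are 1. The helper name `proportion_treated_mean` below is hypothetical and is not part of the graded cell; this is only a sketch of an equivalent approach.

```python
import pandas as pd

def proportion_treated_mean(df):
    """Hypothetical alternative: proportion treated as the mean of the indicator."""
    # Casting to bool handles both 0/1 and False/True encodings of 'TRTMT'.
    return df["TRTMT"].astype(bool).mean()

# Same toy dataframe as in the test case above: 3 of 4 patients treated.
example_df = pd.DataFrame(
    data=[[0, 0], [1, 1], [1, 1], [1, 1]],
    columns=["outcome", "TRTMT"],
)

print(proportion_treated_mean(example_df))  # 0.75
```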