# Week 13 - Earn-Back Points Assignment #3

These exercises are entirely optional, but they provide good practice, and you can use them to earn extra points toward your semester grade. Each problem in this notebook can earn you back up to 2 points. There are key requirements, though: if your code does not follow these rules, you will earn no points for your work.
* You MUST include docstrings that explain the purpose of your code.
* You MUST include at least 2 example tests in your docstrings for each function you write.
* You MUST run your docstrings within the notebook to show me your code and docstrings work correctly.
* You MUST submit your own individual work.  You may not collaborate with other students on these assignments.

There will be 4 assignments like this between now and the end of the semester, each with 4 problems, each worth 2 points, for a total of 32 points.

**If anything about the above rules is unclear, please message me on Canvas or via email**

---

## Earn-Back Part 1: Fetal Activity Data

Cardiotocograms are a useful tool for monitoring fetal health and assessing the risk of fetal and maternal mortality. This [dataset on Kaggle](https://www.kaggle.com/datasets/andrewmvd/fetal-health-classification?resource=download) is a collection of measurements from 2,126 test subjects. Let's do some interesting things with this data!

I've already downloaded it and put it on the Jupyter server at `/data/fetal_health.csv`.

Your first step is to write a Python function called **risk_score()** that takes three input parameters (shown below) and returns a total risk score computed using the following rules:
* If the histogram number of peaks is greater than 5, add 1 to the risk score
* If the number of accelerations per second is greater than 0.01, add 1 to the risk score
* If the number of light decelerations per second is greater than 0.005, add 1 to the risk score

*Note that these rules were made up by the instructor. They have no scientific basis behind them.*
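Because each rule uses a strict "greater than", a value sitting exactly at its threshold adds nothing to the score. A minimal sketch, assuming the three thresholds are kept in a dictionary keyed by the CSV column names (`RISK_THRESHOLDS` and `count_exceeded` are hypothetical names, not part of the assignment), makes that boundary behavior easy to check:

```python
# Hypothetical table of the three rules: column name -> threshold.
# Column names match the fetal_health.csv headers used later in this notebook.
RISK_THRESHOLDS = {
    'histogram_number_of_peaks': 5,
    'accelerations': 0.01,
    'light_decelerations': 0.005,
}

def count_exceeded(row):
    """(mapping) -> int
    Return how many of the RISK_THRESHOLDS the row strictly exceeds (0 to 3).

    >>> count_exceeded({'histogram_number_of_peaks': 6, 'accelerations': 0.017, 'light_decelerations': 0.0062})
    3
    >>> count_exceeded({'histogram_number_of_peaks': 5, 'accelerations': 0.01, 'light_decelerations': 0.005})
    0
    """
    # Each comparison is True or False; sum() counts the Trues.
    return sum(row[col] > limit for col, limit in RISK_THRESHOLDS.items())
```

Since a DataFrame row also supports `row[col]`, the same function could be passed straight to `df.apply(count_exceeded, axis=1)`.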


In [1]:
import pandas as pd
df = pd.read_csv('/data/fetal_health.csv')

In [5]:
def risk_score(peak, acceleration, deceleration):
    """(int, float, float) -> int
    Compute a risk score from three fetal cardiotocogram measurements: the number of
    histogram peaks, accelerations per second, and light decelerations per second.
    Each value that exceeds its threshold adds 1, so the score ranges from 0 to 3.
    
    >>> risk_score(6, 0.017, 0.0062)
    3
    
    >>> risk_score(1, 0.02, 0.0051)
    2
    """
    score = 0
    
    if peak > 5:
        score += 1
    
    if acceleration > 0.01:
        score += 1
        
    if deceleration > 0.005:
        score += 1
        
    return score

In [3]:
assert risk_score(2, 0.0, 0.0) == 0
assert risk_score(6, 0.006, 0.003) == 1
assert risk_score(5, 0.015, 0.0) == 1

In [6]:
import doctest
doctest.run_docstring_examples(risk_score, globals(), verbose=True)

Finding tests in NoName
Trying:
    risk_score(6, 0.017, 0.0062)
Expecting:
    3
ok
Trying:
    risk_score(1, 0.02, 0.0051)
Expecting:
    2
ok


---

## Earn-Back Part 2: Score the Data

In this next step, write a function called **score_subjects()** that takes your whole DataFrame as input and returns a new Series with a score for every record. I recommend doing this with the `apply()` function, similar to the BMI example below:

In [7]:
# 1. Create our sample dataframe
people = pd.DataFrame([
    ['Joe', 170, 85],
    ['Alex', 190, 110]
], columns=['Name','Height (cm)','Weight (kg)'])

# 2. Define our function just like we would a normal function to calculate BMI
def bmi(height_cm, weight_kg):
    return weight_kg / ((height_cm/100) ** 2)

# 3. Apply the bmi() function to each row using a lambda function
#    Set "axis=1" to apply the function to each row (instead of column)
#    Use an anonymous "lambda" function to call bmi function
#    passing the height and weight columns from our dataframe.
people.apply(lambda x: bmi(x['Height (cm)'],x['Weight (kg)']), axis=1)

0    29.411765
1    30.470914
dtype: float64
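The same `bmi()` function also works on whole columns, because pandas arithmetic operates element by element on Series. This sketch (a vectorized alternative shown for comparison, not something the assignment requires) computes the BMI both ways on the sample data:

```python
import pandas as pd

# Same sample data as the BMI cell above.
people = pd.DataFrame([
    ['Joe', 170, 85],
    ['Alex', 190, 110]
], columns=['Name', 'Height (cm)', 'Weight (kg)'])

def bmi(height_cm, weight_kg):
    return weight_kg / ((height_cm / 100) ** 2)

# Row by row: apply() calls bmi() once per row (axis=1).
row_wise = people.apply(lambda x: bmi(x['Height (cm)'], x['Weight (kg)']), axis=1)

# Column-wise: / and ** act element by element on Series,
# so bmi() can take the two columns directly.
column_wise = bmi(people['Height (cm)'], people['Weight (kg)'])

print(column_wise.round(6).tolist())  # [29.411765, 30.470914]
```

Both give the same values; the column-wise form avoids a Python-level call per row, which tends to matter more as the DataFrame grows.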

#### Start Your Code Here

In [25]:
def score_subjects(dataframe):
    """(DataFrame) -> Series
    Apply risk_score() to every row of the given DataFrame, using its
    histogram_number_of_peaks, accelerations, and light_decelerations columns,
    and return the resulting Series of risk scores.
    
    >>> round(score_subjects(df).mean(),2)
    0.47
    
    >>> score_subjects(df).median()
    0.0
    """
    return dataframe.apply(
        lambda row: risk_score(row['histogram_number_of_peaks'],
                               row['accelerations'],
                               row['light_decelerations']),
        axis=1)

In [26]:
assert len(score_subjects(df).value_counts()) == 4
assert max(score_subjects(df)) == 3
assert min(score_subjects(df)) == 0
assert score_subjects(df).value_counts()[0] == 1341
assert score_subjects(df).value_counts()[1] == 574

In [27]:
import doctest
doctest.run_docstring_examples(score_subjects, globals(), verbose=True)

Finding tests in NoName
Trying:
    round(score_subjects(df).mean(),2)
Expecting:
    0.47
ok
Trying:
    score_subjects(df).median()
Expecting:
    0.0
ok
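The scoring rules can also be applied to whole columns at once instead of row by row. A sketch of that alternative, assuming the same three column names (`score_subjects_vectorized` is a hypothetical name, and `sample` holds made-up values, not rows from fetal_health.csv):

```python
import pandas as pd

def score_subjects_vectorized(dataframe):
    """(DataFrame) -> Series
    Hypothetical vectorized alternative to score_subjects(): compare each
    column to its threshold and add up the resulting 1/0 Series.

    >>> score_subjects_vectorized(pd.DataFrame({'histogram_number_of_peaks': [6], 'accelerations': [0.017], 'light_decelerations': [0.0062]})).tolist()
    [3]
    >>> score_subjects_vectorized(pd.DataFrame({'histogram_number_of_peaks': [5], 'accelerations': [0.0], 'light_decelerations': [0.0]})).tolist()
    [0]
    """
    # Each comparison yields a boolean Series; astype(int) turns True/False into 1/0.
    peaks = (dataframe['histogram_number_of_peaks'] > 5).astype(int)
    accel = (dataframe['accelerations'] > 0.01).astype(int)
    decel = (dataframe['light_decelerations'] > 0.005).astype(int)
    return peaks + accel + decel

# Small made-up sample.
sample = pd.DataFrame({
    'histogram_number_of_peaks': [6, 1, 2, 5],
    'accelerations': [0.017, 0.02, 0.0, 0.015],
    'light_decelerations': [0.0062, 0.0051, 0.0, 0.0],
})

print(score_subjects_vectorized(sample).tolist())  # [3, 2, 0, 1]
```

On the notebook's data, `(score_subjects_vectorized(df) == score_subjects(df)).all()` should return `True` if the two approaches agree.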


## Earn-Back Part 3: Add the Risk Score Column

Insert the results of your **score_subjects()** function as a new column in your dataframe called **Risk Score**.
You do not need to write a function to do this; just write the code directly in the cells below. Provide some minimal documentation to explain what you're doing.

In [35]:
# Add a 'Risk Score' column holding the Series returned by score_subjects().
# apply() keeps df's original index, so pandas lines each score up with its row on assignment.
df['Risk Score'] = score_subjects(df)
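Assigning a Series as a column works because pandas matches rows by index label, not by position, and `apply()` returns a Series with the same index as the DataFrame. A minimal sketch with made-up data (`frame` and `scores` are illustrative names only) shows that alignment even when the labels are out of order:

```python
import pandas as pd

frame = pd.DataFrame({'value': [10, 20, 30]}, index=[0, 1, 2])

# A Series whose labels are in a different order than the frame's rows.
scores = pd.Series([3, 1, 2], index=[2, 0, 1])

# Column assignment matches on index labels, not on position.
frame['score'] = scores
print(frame['score'].tolist())  # [1, 2, 3]
```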

## Earn-Back Part 4: Summarize by Risk Score

Summarize your dataframe as follows. For each **Risk Score**, provide these aggregates:
* Count
* Average **histogram_number_of_peaks**
* Average **accelerations**
* Average **light_decelerations**

In [36]:
# group the rows by Risk Score, then count the records in each group
risk_group = df.groupby('Risk Score')
risk_group['baseline value'].agg('count')

Risk Score
0    1341
1     574
2     210
3       1
Name: baseline value, dtype: int64

In [37]:
# average histogram_number_of_peaks
risk_group['histogram_number_of_peaks'].agg('mean')

Risk Score
0    2.384787
1    6.576655
2    7.952381
3    6.000000
Name: histogram_number_of_peaks, dtype: float64

In [38]:
# average accelerations
risk_group['accelerations'].agg('mean')

Risk Score
0    0.002371
1    0.004643
2    0.004295
3    0.011000
Name: accelerations, dtype: float64

In [39]:
# average light_decelerations
risk_group['light_decelerations'].agg('mean')

Risk Score
0    0.000675
1    0.003051
2    0.006448
3    0.007000
Name: light_decelerations, dtype: float64
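The four aggregates can also be produced by a single `groupby()` call using pandas named aggregation (available since pandas 0.25), which places them side by side in one table. A sketch on made-up rows that reuse the dataset's column names (the output names such as `avg_peaks` are arbitrary choices); on the notebook's data, `sample` would be replaced by `df`:

```python
import pandas as pd

# Made-up rows with the same column names as fetal_health.csv (not real data).
sample = pd.DataFrame({
    'Risk Score': [0, 0, 1, 2],
    'histogram_number_of_peaks': [2, 3, 7, 8],
    'accelerations': [0.0, 0.004, 0.006, 0.002],
    'light_decelerations': [0.0, 0.001, 0.003, 0.006],
})

# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation function).
summary = sample.groupby('Risk Score').agg(
    count=('histogram_number_of_peaks', 'count'),
    avg_peaks=('histogram_number_of_peaks', 'mean'),
    avg_accelerations=('accelerations', 'mean'),
    avg_light_decelerations=('light_decelerations', 'mean'),
)
print(summary)
```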

## Submit your work to GitHub in your week 11 folder by 11/18 11:59 PM