# Lab 5: Applying Functions and External Datasets

Welcome to Lab 5! This week, we'll get more practice with functions and the table method `apply` from [Section 8.1](https://www.inferentialthinking.com/chapters/08/1/applying-a-function-to-a-column.html). We'll also practice loading and working with external datasets to prepare for your course project.

First, set up the tests and imports by running the cell below.

In [None]:
import numpy as np
from datascience import *

# These lines set up graphing capabilities.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
warnings.simplefilter('ignore', RuntimeWarning)

from gofer.ok import check

## 1. Counting Calories Burned from Exercise


Suppose you'd like to count how many calories you've burned from exercise. You do 4 kinds of exercise: yoga, walking, sprinting, and volleyball. Every day in January, you record how many minutes of each kind of exercise you did that day. Those data are in a file called `exercise.csv`, which the cell below loads into a table named `exercise`.

In [None]:
exercise = Table.read_table('exercise.csv')
exercise

Different forms of exercise burn calories at different rates; for example, sprinting is more vigorous than walking. The table `calories_per_minute` contains estimates of the calories per minute burned by each activity.

In [None]:
calories_per_minute = Table.read_table('calories_per_minute.csv')
calories_per_minute

Let's start by finding the total number of minutes you spent exercising each day. 

**Question 1.1.** Write a function called `compute_exercise_time`.  It should take one argument, a row from the `exercise` table that contains the day and amounts of time (in minutes) spent on yoga, walking, sprinting, and volleyball. It should return the total time spent exercising.

*Hint:* You can use `tbl.row(n)` to get the row at index `n` of a table (the first row is `tbl.row(0)`). `row.item("column_name")` selects the element of that row that corresponds to `column_name`.

<!--
BEGIN QUESTION
name: q1_1
manual: false
-->

In [None]:
def compute_exercise_time(exercise_row):
    ...

compute_exercise_time(exercise.row(0))

In [None]:
check("tests/q1_1.py")

**Question 1.2.** Create a new table `exercise_time` that is a copy of the `exercise` table, with a new column called `Total Exercise Time` that describes the total time (in minutes) spent exercising on each day.

*Hint:* When you pass only a function name to `tbl.apply()`, the function is called once for every row in `tbl`, with that row as its argument.
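As a rough mental model of the hint above, here is a plain-Python sketch that does not use the `datascience` library at all. The list of dictionaries stands in for a table, and the column names `Reading` and `Writing` and the function `total_minutes` are hypothetical, chosen only for illustration. It is not how `apply` is implemented, only an approximation of what it computes.

```python
# Plain-Python stand-in for a table: one dict per row (hypothetical data).
rows = [
    {"Day": 1, "Reading": 20, "Writing": 15},
    {"Day": 2, "Reading": 0, "Writing": 40},
]

def total_minutes(row):
    """Add up the two activity columns for a single row."""
    return row["Reading"] + row["Writing"]

# Roughly what tbl.apply(total_minutes) computes: call the function once
# per row and collect the results in row order.
totals = [total_minutes(row) for row in rows]
print(totals)
```

The result has one entry per row, which is why it can be attached to the table as a new column.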

<!--
BEGIN QUESTION
name: q1_2
manual: false
-->

In [None]:
exercise_time = ...
exercise_time

In [None]:
check("tests/q1_2.py")

To compute the calories you've burned on a particular day, you multiply the time spent on each kind of exercise by the calories burned per minute by that exercise, then add up those 4 numbers.
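The arithmetic described above can be sketched for a single day as follows. The rates and minutes below are made-up numbers for illustration only; the lab's actual rates are the ones in `calories_per_minute.csv`.

```python
# Hypothetical calories-per-minute rates (NOT the values in the lab's CSV).
rates = {"yoga": 3.0, "walking": 4.0, "sprinting": 15.0, "volleyball": 6.0}

# Hypothetical minutes of each exercise on one day.
minutes = {"yoga": 30, "walking": 20, "sprinting": 5, "volleyball": 0}

# Multiply each activity's minutes by its rate, then add the 4 products:
# 30*3.0 + 20*4.0 + 5*15.0 + 0*6.0
total = sum(minutes[activity] * rates[activity] for activity in rates)
print(total)  # 245.0
```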

**Question 1.3.** Write a function called `compute_calories`.  It should take 4 arguments, the amounts of time (in minutes) spent on yoga, walking, sprinting, and volleyball, respectively.  It should return the total number of calories burned.

<!--
BEGIN QUESTION
name: q1_3
manual: false
-->

In [None]:
def compute_calories(..., ..., ..., ...):
    ...

In [None]:
check("tests/q1_3.py")

**Question 1.4.** Make a table called `exercise_with_totals` that's a copy of `exercise`, but with an additional column called "Total calories burned exercising".  That column should contain the total number of calories burned from exercise on each day.  Compute that column using `apply` and your `compute_calories` function.

*Hint:* To apply a function that takes multiple arguments, pass the function to `tbl.apply()` followed by one column name per argument, in the same order as the function's parameters. The function is then called once per row with the values from those columns.
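The multi-column case can be pictured with another plain-Python sketch. The parallel lists stand in for two table columns, and the names `width`, `height`, and `area` are hypothetical. Again, this only approximates what `apply` computes and does not use the `datascience` library.

```python
# Hypothetical columns, stored as parallel lists (one entry per row).
width = [2, 3, 5]
height = [4, 1, 2]

def area(w, h):
    """A two-argument function, called with one value from each column."""
    return w * h

# Roughly what tbl.apply(area, "Width", "Height") computes: pair up the
# values row by row and call the function on each pair.
areas = [area(w, h) for w, h in zip(width, height)]
print(areas)  # [8, 3, 10]
```

The order of the column names matters: the first column supplies the first argument, the second column supplies the second, and so on.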

<!--
BEGIN QUESTION
name: q1_4
manual: false
-->

In [None]:
exercise_with_totals = ...
exercise_with_totals

In [None]:
check("tests/q1_4.py")

**Question 1.5.** How many calories were burned from exercise in the 2nd week of January (that is, on days 8 through 14, inclusive)?  Call that number `calories_burned`.  Compute the answer with code, not by looking at the data.
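The "inclusive" part of the question is easy to get wrong, so here is a small plain-Python sketch of an inclusive range filter followed by a sum. The calorie values are invented for illustration. In the `datascience` library, the analogous choice is between the predicates `are.between` (which excludes the upper bound) and `are.between_or_equal_to` (which includes it).

```python
# Hypothetical data: days 1..31 and a made-up calorie total for each day.
days = list(range(1, 32))
calories = [100 + d for d in days]

# Keep only days 8 through 14, with BOTH endpoints included.
week_two = [c for d, c in zip(days, calories) if 8 <= d <= 14]

week_two_total = sum(week_two)
print(len(week_two), week_two_total)  # 7 777
```

A quick sanity check is that a full week should contain exactly 7 days.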

<!--
BEGIN QUESTION
name: q1_5
manual: false
-->

In [None]:
calories_burned = ...
calories_burned

In [None]:
check("tests/q1_5.py")

## 2. Working with External Datasets

To help you prepare for your course project later in the semester, this section gives you practice loading and working with external datasets.

Recall that in Quiz 1, we worked with a dataset on New York Energy Production. In fact, the full dataset contains much more information. Let's try manually uploading the `annualstate.csv` file to your folder and working with it.

**Question 2.1.** Download the `annualstate.csv` file from the "Climate datasets" section on Moodle. Upload this file to your lab05 folder here.

*Hint:* To get to your lab05 folder, click the Jupyter icon on the top left to go to your directory. Find your lab05 folder. Click the "Upload" button on the top right, and select the `annualstate.csv` file. Don't forget to click the blue "Upload" button to confirm the upload.

**Question 2.2.** Read in the `annualstate.csv` file using the appropriate table method and name the resulting table `fulldata`. Show the first 10 rows.

**Question 2.3.** Create a table called `annualNYtotal` that contains only the rows with "NY" in the "STATE" column and "Total" in the "ENERGY SOURCE" column. Show the first 10 rows of `annualNYtotal`.
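Keeping rows that satisfy two conditions at once can be sketched in plain Python as below. The rows and the `VALUE` column are hypothetical; only the "STATE" and "ENERGY SOURCE" column names come from the question. With a `datascience` table, one common way to get the same effect is to chain two `where` calls, one per column.

```python
# Hypothetical rows; "VALUE" is a made-up column for illustration.
rows = [
    {"STATE": "NY", "ENERGY SOURCE": "Total", "VALUE": 10},
    {"STATE": "NY", "ENERGY SOURCE": "Coal", "VALUE": 3},
    {"STATE": "CA", "ENERGY SOURCE": "Total", "VALUE": 7},
]

# Keep a row only if BOTH conditions hold.
ny_total = [
    r for r in rows
    if r["STATE"] == "NY" and r["ENERGY SOURCE"] == "Total"
]
print(ny_total)
```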

**Question 2.4.** Drop the "ENERGY SOURCE" column from `annualNYtotal`, and rename the remaining columns as: "Year", "State", "Producer", and "Generation (megawatthours)". Assign this new table to `annualNYtotal`, and show the first 10 rows of `annualNYtotal`.

**Question 2.5.** Convert the "Generation (megawatthours)" column into a "Generation (thousand megawatthours)" column by dividing each value by 1,000 and rounding to the nearest whole number. For example, 135,345,692 should become 135,346. Assign this new table to `annualNYtotal`, and show the first 10 rows of `annualNYtotal`.
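The unit conversion in the example above can be checked with a few lines of plain Python. The first value is the one from the question; the second is a made-up value added for illustration. This sketch assumes the values are already numbers; if a column was read in as text with commas, it would need to be converted first.

```python
# First value is from the question; the second is hypothetical.
generation_mwh = [135_345_692, 2_000_400]

# Divide by 1,000 and round to the nearest whole number.
generation_thousand_mwh = [round(x / 1000) for x in generation_mwh]
print(generation_thousand_mwh)  # [135346, 2000]
```

Note that Python's built-in `round` sends exact halves to the nearest even number (for example, `round(0.5)` is `0`), which can differ from the rounding rule you learned in school.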

## 3. Submission


Great job! :D You're finished with Lab 5! Be sure to:
- **run all the tests and verify that they all pass** (the next cell has a shortcut for that), and
- select **Save and Checkpoint** from the `File` menu.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import glob
from gofer.ok import grade_notebook
if not globals().get('__GOFER_GRADER__', False):
    display(grade_notebook('lab05.ipynb', sorted(glob.glob('tests/q*.py'))))