## Part 0. Setup Steps

- Create a repo on GitHub named `eds217-trypy-03`
- Clone to create a version-controlled project
- Create some subfolder infrastructure (docs, data)
- Create a new Python notebook
- Complete all tasks for Part 1 in this `.ipynb` notebook

## Part 1. Conditional statements & for loops


*Complete each of the following in a separate code cell.*

### Conditional statements

#### Task 1

Create an object called `pm2_5` with a value of 48, representing Particulate Matter 2.5 (PM2.5), an indicator of air quality, in $\frac{\mu g}{m^3}$ (see more about PM2.5 [here](https://www3.epa.gov/region1/airquality/pm-aq-standards.html)).

Write an `if - elif - else` statement that prints "Low to moderate risk" if `pm2_5` is less than 100, "Unhealthy for sensitive groups" if 100 <= `pm2_5` < 150, and "Health risk present" if `pm2_5` >= 150.

Test by changing the value of your `pm2_5` object and re-running your statement.

In [None]:
#| echo: false
#| eval: false

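pm2_5 = 48
if pm2_5 < 100:
    print("Low to moderate risk")
# Use a chained comparison (or `and`); `&` is a bitwise operator that
# binds tighter than `>=`, so `pm2_5 >= 100 & pm2_5 < 150` misbehaves
elif 100 <= pm2_5 < 150:
    print("Unhealthy for sensitive groups")
else:
    print("Health risk present")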
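
To check all three branches without editing `pm2_5` by hand, the same logic can be wrapped in a small function and run over several values. This is only a sketch: the function name `pm_risk` is hypothetical and not part of the assignment, and the thresholds are the ones given in the task.

```python
def pm_risk(pm2_5):
    """Return the risk category for a PM2.5 value (thresholds from Task 1)."""
    if pm2_5 < 100:
        return "Low to moderate risk"
    elif pm2_5 < 150:  # reached only when pm2_5 >= 100
        return "Unhealthy for sensitive groups"
    else:
        return "Health risk present"


# Try values on both sides of each threshold
for value in [48, 99.9, 100, 149.9, 150]:
    print(value, "->", pm_risk(value))
```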

#### Task 2

Store the string "blue whale" as an object called `species`. 

Write an if statement that returns "You found a whale!" if the string "whale" is detected in species, otherwise return nothing. 

Test by changing the species string & re-running to see output. 

In [None]:
#| echo: false
#| eval: false

species = "blue whale"
if ("whale" in species):
    print("You found a whale!")

#### Task 3

Create a variable called `max_airtemp_c` with a value of 24.1.

Write an `if - else` statement that will print "Temperature too high" if `max_airtemp_c` is greater than 27, or "Temperature OK" if it is less than or equal to 27.

In [None]:
#| echo: false
#| eval: false

max_airtemp_c = 24.1

if max_airtemp_c > 27:
    print("Temperature too high")
else:
    print("Temperature OK")

#### Task 4

Store the base price of a burrito as `base_burrito` with a value of 6.50. Store `main_ingredient` with a starting string of "veggie".

Write a statement that will print the price of a burrito based on what a user specifies as `main_ingredient` (either "veggie", "chicken" or "steak") given the following:

- A veggie burrito is the cost of a base burrito
- A chicken burrito costs 3.00 more than a base burrito
- A steak burrito costs 3.25 more than a base burrito

In [None]:
#| echo: false
#| eval: false

base_burrito = 6.50
main_ingredient = "veggie"
if main_ingredient == "veggie":
    price = base_burrito
elif main_ingredient == "chicken":
    price = base_burrito + 3.00
elif main_ingredient == "steak":
    price = base_burrito + 3.25
else:
    # Without this branch, `price` would be undefined for other inputs
    raise ValueError(f"Unknown main ingredient: {main_ingredient}")

print(f"Ingredient: {main_ingredient}, Price: {price:.2f}")
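
A dictionary lookup is one possible alternative to a chain of `elif` branches when each option only changes a number. In this sketch, `surcharge` is a hypothetical name that is not part of the task; the prices come from the task description.

```python
base_burrito = 6.50
main_ingredient = "chicken"

# Extra cost on top of the base price for each main ingredient
surcharge = {"veggie": 0.00, "chicken": 3.00, "steak": 3.25}

if main_ingredient in surcharge:
    price = base_burrito + surcharge[main_ingredient]
    print(f"Ingredient: {main_ingredient}, Price: {price:.2f}")
else:
    print(f"Unknown ingredient: {main_ingredient}")
```

Adding a new ingredient then means adding one dictionary entry rather than another branch.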


### For loops

*Complete each of the following in a separate code cell.*

#### Task 5

Create a new list called `fish` that contains the values `8, 10, 12, 23`, representing counts of different fish types in a fish tank (goldfish, tetras, guppies, and mollies, respectively).

Write a for loop that iterates through `fish` and prints the proportion of the total fish in the tank that each species makes up.

Assume that these counts represent all fish in the tank. 

In [None]:
#| echo: false
#| eval: false

fish = [8, 10, 12, 23]
sum_fish = sum(fish)
for i in fish:
    fish_prop = i / sum_fish
    print(round(fish_prop,2))
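
The loop above prints bare proportions. Assuming the species names are kept in a parallel list (here called `fish_names`, a name that is not in the original task), `zip()` can pair each count with its label. The dictionary `fish_props` is likewise only an illustrative addition.

```python
fish_names = ["goldfish", "tetras", "guppies", "mollies"]
fish = [8, 10, 12, 23]
sum_fish = sum(fish)

# Map each species name to its share of the tank
fish_props = {}
for name, count in zip(fish_names, fish):
    fish_props[name] = count / sum_fish
    print(f"{name}: {fish_props[name]:.2f}")
```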


#### Task 6

Python stores month names in the `calendar` module (part of the standard library). You can load them using `from calendar import month_name`. "January" is at `month_name[1]` because `month_name[0]` is an empty string, so each month's index matches its month number even though the sequence itself is still zero-indexed.

**Write a for loop** that iterates over all months in `month_name` and prints "January is month 1", "February is month 2", etc.

**Hint:** you can index values in `month_name` just like you would in a list (e.g., try running `month_name[5]`).

In [None]:
#| echo: false
#| eval: false

from calendar import month_name

for i in range(1,13):
    print(f"{month_name[i]} is month {i}")
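
A quick check of how `month_name` is laid out: index 0 holds an empty string, which is why the range in the solution starts at 1. The sketch also shows `enumerate()` as one alternative way to get the month number alongside the name.

```python
from calendar import month_name

print(len(month_name))       # a placeholder entry plus the 12 months
print(repr(month_name[0]))   # the placeholder at index 0

# enumerate() yields (index, name) pairs; skip the empty placeholder
for i, name in enumerate(month_name):
    if name:
        print(f"{name} is month {i}")
```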


## Part 2. Real data

*You will complete Part 2 in a separate notebook*

Explore this [data package](https://portal.edirepository.org/nis/mapbrowse?packageid=knb-lter-arc.10341.5) from EDI, which contains a "Data file describing the biogeochemistry of samples collected at various sites near Toolik Lake, North Slope of Alaska". Familiarize yourself with the metadata (particularly, View full metadata > expand 'Data entities' to learn more about the variables in the dataset). 

**Citation:** Kling, G. 2016. Biogeochemistry data set for soil waters, streams, and lakes near Toolik on the North Slope of Alaska, 2011. ver 5. Environmental Data Initiative. https://doi.org/10.6073/pasta/362c8eeac5cad9a45288cf1b0d617ba7 

1. Download the CSV containing the Toolik biogeochemistry data
2. Take a look at it - how are missing values stored? Keep that in mind. 
3. Drop the CSV into your data folder of your project
4. Create a new notebook, saved in docs as `toolik_chem.ipynb`
5. Import the `pandas` and `janitor` packages in your setup code cell (`janitor` is installed as `pyjanitor`).
6. Read in the data as `toolik_biochem`. Remember, you'll want to specify here how `NA` values are stored. Use the `clean_names()` function to convert all column names to lower case/underscore format.

In [None]:
#| echo: false
import pandas as pd
import janitor
toolik_biochem = pd.read_csv(
    '../data/2011_Kling_Akchem.csv',
    na_values=".").clean_names()
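
To see what `na_values` does, here is a minimal sketch that reads a small in-memory CSV rather than the real file. The column names and values are invented for illustration, and it assumes (as the solution above does) that missing values are stored as `.`. Column names are cleaned with plain pandas string methods as a rough stand-in for `clean_names()`, so the example does not depend on `janitor`.

```python
import io

import pandas as pd

# Made-up data: "." marks a missing value
raw = io.StringIO("Site,pH,DOC uM\nSite A,7.1,.\nSite B,.,410\n")

demo = pd.read_csv(raw, na_values=".")

# Rough stand-in for clean_names(): lower case, spaces to underscores
demo.columns = demo.columns.str.lower().str.replace(" ", "_")

print(demo)
print(demo.isna().sum())  # number of missing values per column
```

Without `na_values="."`, the `.` entries would be read as text and the numeric columns would not be usable for calculations such as means.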

7. Create a subset of the data that contains only observations from the "Toolik Inlet" site, and that only contains the variables (columns) for pH, dissolved organic carbon (DOC), and total dissolved nitrogen (TDN). Store this subset as `inlet_biochem`. Make sure to LOOK AT the subset you've created. 

In [None]:
#| echo: false
valid = toolik_biochem["site"] == "Toolik Inlet"
inlet_biochem = toolik_biochem[valid][[
    'ph','doc_um','tdn_um']]
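
The same kind of subset can also be taken in a single step with `.loc`, which accepts a row mask and a list of columns together. The sketch below uses a tiny invented data frame; the column names follow the solution above, and the values are made up.

```python
import pandas as pd

# Invented rows standing in for toolik_biochem
demo = pd.DataFrame({
    "site": ["Toolik Inlet", "Other Site", "Toolik Inlet"],
    "ph": [7.2, 6.9, None],
    "doc_um": [400.0, 380.0, 415.0],
    "tdn_um": [12.0, None, 14.0],
})

# Boolean mask selects rows; the list selects columns
is_inlet = demo["site"] == "Toolik Inlet"
inlet_demo = demo.loc[is_inlet, ["ph", "doc_um", "tdn_um"]]

print(inlet_demo)
```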

8. Find the mean value of each column in `inlet_biochem` two different ways:

a. Write a for loop from scratch to calculate the mean for each column.
b. Use *one other method* (e.g. `.mean()`, or `.apply()`) to find the mean for each column.

In [None]:
#| echo: false
import numpy as np

# Strategy a:
print("Using for loop:")
for col in inlet_biochem.columns:
    mean_val = np.nanmean(inlet_biochem[col])
    print(f"col {col}: {mean_val:.2f}")

# Strategy b: 
print("Using list comprehension:")
[print(
    f"col {col}:",
    f"{np.nanmean(inlet_biochem[col]):.2f}")
 for col in inlet_biochem.columns]

# Strategy c: 
print("Using df.mean()")
print(inlet_biochem.mean())

# Strategy d: 
print("Using .apply()")
print(inlet_biochem.apply(np.nanmean))
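
Strategy a above still relies on `np.nanmean` inside the loop. If "from scratch" is read more strictly, the mean can be accumulated by hand while skipping missing values. This sketch uses plain Python lists with invented values in place of the real columns; the names `columns` and `means` are illustrative only.

```python
import math

# Invented values; float("nan") marks a missing observation
columns = {
    "ph": [7.0, 7.5, float("nan")],
    "doc_um": [400.0, float("nan"), 420.0],
}

means = {}
for col, values in columns.items():
    total = 0.0
    count = 0
    for v in values:
        if not math.isnan(v):  # skip missing values
            total += v
            count += 1
    # Guard against a column with no valid observations
    means[col] = total / count if count else float("nan")
    print(f"col {col}: {means[col]:.2f}")
```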

### Save, stage, commit, pull, push!

## END activities