<div><img style="float: left; padding-right: 3em;" src="https://pbs.twimg.com/profile_images/1537109064093532160/mG03dW9G_400x400.jpg" width="150" /></div>

# It's another STARS 2023 Earth Data Science Workflow!
This notebook contains your next earth data science coding challenge! Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is readable and reproducible. 

## Don't get **caught** by these Jupyter notebook gotchas

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*o0HleR7BSe8W-pTnmucqHA.jpeg" height=300 style="padding: 1em; border-style: solid; border-color: grey;" />

  > *Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36*

These are the most common issues that will keep you from getting started and delay your code review:

1. When you try to run some code, you may be prompted to select a **kernel**.
   * The **kernel** refers to the version of Python you are using
   * You should use the **base** kernel, which should be the default option. 
   * You can also use the `Select Kernel` menu in the upper right to select the **base** kernel
2. Before you commit your work, make sure it runs **reproducibly** by clicking:
   1. `Restart` (this button won't appear until you've run some code), then
   2. `Run All`

## Check your code to make sure it's clean and easy to read

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSO1w9WrbwbuMLN14IezH-iq2HEGwO3JDvmo5Y_hQIy7k-Xo2gZH-mP2GUIG6RFWL04X1k&usqp=CAU" height=200 />

* Format all cells prior to submitting (right click on your code).
* Use expressive names for variables so you or the reader knows what they are. 
* Use comments to explain your code -- e.g. 
  ```python
  # This is a comment, it starts with a hash sign
  ```
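
As a small illustration of the last two points, the sketch below contrasts terse names with expressive ones. The elevation value and all variable names are made up for this example.

```python
# Hard to read: what do x and y stand for?
x = 3000.0
y = x * 0.3048

# Easier to read: the names and comments carry the meaning
elevation_ft = 3000.0  # made-up station elevation, in feet
# Convert feet to meters (1 foot = 0.3048 meters)
elevation_m = elevation_ft * 0.3048
print(elevation_m)
```

Both halves compute the same number, but only the second tells a reader what the number is and which units it uses.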

## Label and describe your plots

![Source: https://xkcd.com/833](https://imgs.xkcd.com/comics/convincing.png)

Make sure each plot has:
  * A title that explains where and when the data are from
  * x- and y- axis labels with **units** where appropriate
  * A legend where appropriate


## Icons: how to use this notebook
We use the following icons to let you know when you need to change something to complete the challenge:
  * <img src="https://static.thenounproject.com/png/4260107-200.png" width=20 style="float: left; padding: 3px" /> means you need to write or edit some code.
  
  * <img src="https://uxwing.com/wp-content/themes/uxwing/download/education-school/read-book-icon.png" width=20 style="float: left; padding: 3px" /> indicates recommended reading
  
  * <img src="https://static.thenounproject.com/png/5640527-200.png" width=20 style="float: left; padding: 3px" /> marks written responses to questions
  
  * <img src="https://static.thenounproject.com/png/3842781-200.png" width=20 style="float: left; padding: 3px" /> is an optional extra challenge
  

---

# Get started with open reproducible science!

[Open reproducible science](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/get-started-open-reproducible-science/) makes scientific methods, data and outcomes available to everyone. That means that *everyone* who wants should be able to **find**, **read**, **understand**, and **run** your workflows for themselves.

<img alt="Components of open science - accessible, reproducible, inclusive" src="https://www.earthdata.nasa.gov/s3fs-public/2021-11/Circle_Diagram_UPDATE_2.jpg?VersionId=pFRniRpjtgc_MEXUJKi9_sXLoMsSX.pB" width=500 />

 > Image from https://www.earthdata.nasa.gov/esds/open-science/oss-for-eso-workshops

Few if any science projects are 100% open and reproducible (yet!). However, members of the open science community have developed open source tools and practices that can help you move toward that goal. You will learn about many of those tools in [the Intro to Earth Data Science textbook](https://www.earthdatascience.org/courses/intro-to-earth-data-science/). Don't worry about learning all the tools at once -- we've picked a few for you to get started with.

## Your turn: what does open reproducible science mean to you?

<img src="https://uxwing.com/wp-content/themes/uxwing/download/education-school/read-book-icon.png" width=20 style="float: left; padding: 3px" /> First, read about some of our thoughts in [the textbook chapter on open reproducible science](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/get-started-open-reproducible-science/).

**Then, create a new Markdown cell below this one** using the `+ Markdown` button in the upper left.

<img src="https://static.thenounproject.com/png/5640527-200.png" width=20 style="float: left; padding: 3px" /> In the new cell, answer the following questions using a **numbered list** in Markdown:

  1. In 1-2 sentences, define open reproducible science.
  2. In 1-2 sentences, choose one of the open source tools that you have learned about (e.g. Shell, Git/GitHub, Jupyter Notebook, Python) and explain how it supports open reproducible science.
  3. In 1-2 sentences, does this Jupyter Notebook file have a machine-readable name? Explain your answer.


1.
2.
3.

---

## **Readable**, **well-documented** scientific workflows are easier to reproduce

As the comic below suggests, code that is hard to read is also hard to get working. We refer to code that is easy to read as **clean** code.

[![And because if you just leave it there, it's going to start contaminating things downstream even if no one touches it directly. (from https://xkcd.com/2138/)](https://imgs.xkcd.com/comics/wanna_see_the_code.png)](https://www.explainxkcd.com/wiki/index.php/2138:_Wanna_See_the_Code%3F)


<img src="https://static.thenounproject.com/png/5640527-200.png" width=20 style="float: left; padding: 3px" /> **In the prompt below, list 3 things you can do to write clean code, and then list 3 more advantages of doing so.**
  * Double click on the cell to edit
  * You can use examples from the textbook, or come up with your own. 
  * Use [**Markdown**](https://www.markdownguide.org/) to format your list.
  

I can write clean code by:
  * `YOUR ANSWER HERE`


Advantages of clean code include:
  * `YOUR ANSWER HERE`

---

## What the fork?! Who wrote this?

Below is a scientific Python workflow. But something's wrong -- the code won't run! Your task is to follow the instructions to **clean and debug** the Python code so that it runs.
 > Don't worry if you can't solve every bug right away. We'll get there! The most important thing is to identify problems with the code and write high-quality [**GitHub Issues**](https://docs.github.com/en/issues/tracking-your-work-with-issues/creating-an-issue#creating-an-issue-from-a-repository).

At the end, you'll **repeat the workflow** for a location and measurement of your choosing.

### Alright! Let's clean up this code. First things first...

<img src="https://static.thenounproject.com/png/4260107-200.png" width=20 style="float: left; padding: 3px" /> Rename this notebook if necessary with an [**expressive and machine-readable file name**](https://www.earthdatascience.org/courses/intro-to-earth-data-science/open-reproducible-science/get-started-open-reproducible-science/best-practices-for-organizing-open-reproducible-science/)
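
If it helps to make "machine-readable" concrete, here is a minimal sketch of one possible rule of thumb: only lowercase letters, digits, hyphens, and underscores, followed by a file extension. The rule, the function name `is_machine_readable`, and the sample file names are all assumptions made for this illustration, not a formal standard.

```python
import re


def is_machine_readable(file_name):
    """Check a file name against one simple naming rule.

    The rule (an assumption for this sketch): lowercase letters and
    digits, separated by single hyphens or underscores, followed by a
    file extension. No spaces or other special characters.
    """
    pattern = r"[a-z0-9]+(?:[-_][a-z0-9]+)*\.[a-z0-9]+"
    return re.fullmatch(pattern, file_name) is not None


# Hypothetical file names, for illustration only
print(is_machine_readable("My Notebook (final).ipynb"))     # False
print(is_machine_readable("01-rapid-city-max-temp.ipynb"))  # True
```

A name that passes a check like this is easy to sort, search, and type at the command line. Whether it is also *expressive* is still up to you.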

---

## Python **packages** let you use code written by experts around the world

Because Python is open source, lots of different people and organizations can contribute (including you!). Many contributions are in the form of **packages** which do not come with a standard Python download. Read more in your textbook: 
  * <img src="https://uxwing.com/wp-content/themes/uxwing/download/education-school/read-book-icon.png" width=20 style="float: left; padding: 3px" />  [Packages need to be installed and **imported**](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/use-python-packages/). 

  * <img src="https://uxwing.com/wp-content/themes/uxwing/download/education-school/read-book-icon.png" width=20 style="float: left; padding: 3px" /> In the cell below, someone was trying to import the **pandas package**, which helps us to work with [**tabular data** such as comma-separated value or csv files](https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-text-files/).

<img src="https://static.thenounproject.com/png/4260107-200.png" width=20 style="float: left; padding: 3px" /> Your task -- **uncomment** the code in the cell below by removing the `#` symbol on the left of line 2, and correct the typo to properly import the pandas package under its **alias** pd.

In [1]:
#can't get this to work :(
import pandas as pd



Once you have run the cell above and imported `pandas`, **run the cell below**. It is a test cell that will tell you if you completed the task successfully. If a test cell isn't working the way you expect, check that you ran your code **immediately before** running the test.

In [2]:
# DO NOT MODIFY THIS TEST CELL
points = 0
try:
    pd.DataFrame()
    points += 5
    print('\u2705 Great work! You correctly imported the pandas library.')
except:
    print('\u274C Oops - pandas was not imported correctly.')
print('You earned {} of 5 points for importing pandas'.format(points))

✅ Great work! You correctly imported the pandas library.
You earned 5 of 5 points for importing pandas


---

## There are more Earth Observation data online than any one person could ever look at

[NASA's Earth Observing System Data and Information System (EOSDIS) alone manages over 9 PB of data](https://www.earthdata.nasa.gov/learn/articles/getting-petabytes-people-how-eosdis-facilitates-earth-observing-data-discovery-and-use). 1 PB is roughly 100 times the entire Library of Congress (a good approximation of all the books available in the US). It's all available to **you** once you learn how to download what you want.

The following workflow looks at **maximum daily average temperatures** over time in Rapid City, South Dakota. This notebook uses data from the National Centers for Environmental Information (NCEI). [Check out the NCEI Climate at a Glance website where you can search for more data like this](https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/).
  > **Wait a second - what is maximum daily average temperature?** NCEI first takes the daily average temperature. Then, they take the annual maximum. You'll notice these temperatures are a bit lower than we would expect from maxima - that's because nighttime temperatures get incorporated into the daily average.

<img src="https://static.thenounproject.com/png/5640527-200.png" width=20 style="float: left; padding: 3px" /> Your task:
  1. Research the **[Climate at a Glance](https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/)** data source. 
  2. In the cell below, write a 2-3 sentence description of the data source. You should describe:
     - who takes the data
     - where the data were taken
     - what the maximum temperature units are
     - how the data are collected.
  3. Include a citation of the data (HINT: NCEI has a section for 'Citing this page', but you will have to select a particular dataset such as `City` > `Time Series`).



**YOUR DATA DESCRIPTION AND CITATION HERE**

## You can access NCEI Climate At a Glance Data from the internet using its URL

The cell below contains the URL for the data you will use in this part of the notebook. We got that URL by right-clicking on the blue `CSV` download button. You don't have to do that just yet -- this URL is correct! However, we still have a problem: we can't get the URL back later on because it isn't saved in a **variable**. In other words, we need to give the URL a name so that we can request it from Python later (sadly, Python has no 'hey what was that thingy I typed earlier?' function).

<img src="https://uxwing.com/wp-content/themes/uxwing/download/education-school/read-book-icon.png" width=20 style="float: left; padding: 3px" /> Check out the [textbook section on variables](https://www.earthdatascience.org/courses/intro-to-earth-data-science/python-code-fundamentals/get-started-using-python/variables/)

<img src="https://static.thenounproject.com/png/4260107-200.png" width=20 style="float: left; padding: 3px" /> **Your task:**
  1. Pick an expressive variable name for the URL
     > HINT: click on the `Variables` button up top to see all your variables. Your new url variable will not be there until you define it and run the code
  2. Reformat the URL so that it adheres to the [79-character PEP-8 line limit](https://peps.python.org/pep-0008/#maximum-line-length)
     > HINT: You should see two vertical lines in each cell - don't let your code go past the second line
  3. At the end of the cell where you define your url variable, **call your variable (type out its name)** so it can be tested.
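
One way to keep a long string within the line limit is to split it into adjacent string literals inside parentheses; Python joins them into a single string. The sketch below uses a hypothetical URL and a hypothetical variable name, so you will still need to choose your own.

```python
# Hypothetical URL, used only to demonstrate line wrapping
example_data_url = (
    "https://www.example.com/data/archive/station-observations"
    "/time-series/STATION0001/summary.csv"
)
# In a notebook, typing the variable name on the last line of the cell
# displays its value
example_data_url
```

Note that there is no comma between the two pieces. A comma would create a tuple of two strings instead of one joined string.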

In [3]:
("https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/city"
 "/time-series/USW00024090/tmax/ann/2/1949-2023.csv")

'https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/city/time-series/USW00024090/tmax/ann/2/1949-2023.csv'

In [4]:
# DO NOT MODIFY THIS TEST CELL
resp_url = _
points = 0

if type(resp_url)==str:
    points += 3
    print('\u2705 Great work! You correctly called your url variable.')
else:
    print('\u274C Oops - your url variable was not called correctly.')

if len(resp_url)==117:
    points += 3
    print('\u2705 Great work! Your url is the correct length.')
else:
    print('\u274C Oops - your url variable is not the correct length.')

print('You earned {} of 6 points for defining a url variable'.format(points))

✅ Great work! You correctly called your url variable.
✅ Great work! Your url is the correct length.
You earned 6 of 6 points for defining a url variable


---

## Download and get started working with NCEI data

The `pandas` library you imported can download data from the internet directly into a type of Python **object** called a `DataFrame`. In the code cell below, you can see an attempt to do just this. But there are some problems...

YOUR ANSWER HERE

<img src="https://static.thenounproject.com/png/4260107-200.png" width=20 style="float: left; padding: 3px" /> You're ready to fix some code! Your task is to:
  1. Make any changes needed to get this code to run. Here's a hint:
     > HINT: The my_url variable doesn't exist - you need to replace it with the variable name **you** chose.
  2. Modify the value of the `header` parameter so that only numeric data values are included in each column.
  3. Clean up the code by using **expressive variable names**, **expressive column names**, **PEP-8 compliant code**, and **descriptive comments**

**Make sure to call your `DataFrame` by typing its name as the last line of your code cell.** Then, you will be able to run the test cell below and find out if your answer is correct.
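
To see how the `header` and `names` parameters of `pd.read_csv` interact, it can help to experiment on a tiny file first. The sketch below reads made-up text from memory instead of a URL. The file layout (one line of metadata before the header row), the values, and the column names are all assumptions for illustration; the real NCEI file is laid out differently.

```python
import io

import pandas as pd

# Made-up file contents: one line of metadata, a header row, then data
example_csv = io.StringIO(
    "Example Town Annual Precipitation\n"
    "Date,Value\n"
    "200012,410.0\n"
    "200112,385.5\n"
    "200212,452.3\n"
)

# header=1 marks the second line (counting from 0) as the header row.
# Lines before it are skipped, and names= replaces the original labels.
example_df = pd.read_csv(
    example_csv, header=1, names=["year_month", "precip_mm"])
example_df
```

If `header` pointed at the wrong line here, text such as `Date` would end up in the data rows and the columns would no longer be numeric.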


In [5]:
#download
dataframe = pd.read_csv(my_url, header=2, names=['col_1', 'col_2'])
dataframe



NameError: name 'my_url' is not defined

In [None]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [198562.0, 58.89]:
    points += 4
    print('\u2705 Great work! You correctly downloaded data.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for downloading data'.format(points))

  > HINT: Check out the `type()` function below - you can use it to check that your data are now stored in a `DataFrame` object

In [None]:
# Check that the data was imported into a pandas DataFrame
type(dataframe)

---

## Cleaning up your `DataFrame`

Take a look at your data. Do you want to use it as is, or does it need to be modified? The original author of this code thought it needed some modification, but didn't document their work very well.

<img src="https://static.thenounproject.com/png/4260107-200.png" width=20 style="float: left; padding: 3px" /> Playing with code: your task

 1. Replace `dataframe` with the name of **your** dataframe whenever it appears.
 2. Run the code below.

In [None]:
# ncei has wacky years
dataframe.iloc[:,0] = dataframe.iloc[:,0] // 100
dataframe


In [None]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [1985.5, 58.89]:
    points += 4
    print('\u2705 Great work! You correctly cleaned up years.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for cleaning up years'.format(points))
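
The `// 100` step in the code above is **floor division**: it divides and then drops the remainder. Judging by the expected averages in the test cells (198562.0 before the step, 1985.5 after), the first column appears to hold a four-digit year followed by a two-digit month. That reading is an assumption, and the value below is made up for illustration.

```python
# Made-up value in the same year-then-month style as the first column
year_month = 202312

# Floor division by 100 drops the last two digits (the month)
year = year_month // 100
# The remainder after dividing by 100 is those last two digits
month = year_month % 100

print(year, month)  # prints: 2023 12
```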

<img src="https://static.thenounproject.com/png/3842781-200.png" width=20 style="float: left; padding: 3px" /> Want an EXTRA CHALLENGE? Modify the code to be **more expressive**.

Rewrite the code below to select columns by **name** instead of by **index**. You might find the [pandas User Guide section on slicing and dicing](https://pandas.pydata.org/docs/user_guide/indexing.html) to be useful. However - don't worry if you can't figure this out yet! We're going to talk a lot about how to use pandas `DataFrame`s. 

YOUR ANSWER HERE


In [None]:
#convert to Celsius
dataframe.iloc[:,1] = dataframe.iloc[:,1] - 32 * 5 / 9
dataframe


In [None]:
# DO NOT MODIFY THIS TEST CELL
tmax_df_resp = _
points = 0

if isinstance(tmax_df_resp, pd.DataFrame):
    points += 1
    print('\u2705 Great work! You called a DataFrame.')
else:
    print('\u274C Oops - make sure to call your DataFrame for testing.')
    
summary = [round(val, 2) for val in tmax_df_resp.mean().values]
if summary == [1985.5, 58.89, 14.94]:
    points += 4
    print('\u2705 Great work! You correctly converted to Celsius.')
else:
    print('\u274C Oops - your data are not correct.')
print('You earned {} of 5 points for converting to Celsius'.format(points))

<img src="https://static.thenounproject.com/png/3842781-200.png" width=20 style="float: left; padding: 3px" /> Want an **EXTRA CHALLENGE**?
  1. As you did above, rewrite the code to be more expressive
  2. Using the code below as a framework, write and apply a **function** that converts to Celsius.
     > **Functions** let you reuse code you have already written
  
  3. You should also rewrite this function name to be more expressive.
  
        ```python
        def convert(temperature):
            """Convert temperature to Celsius"""
            return temperature # Put your equation in here

        dataframe['temp_c'] = dataframe['temp_f'].apply(convert)
        ```
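
If the write-then-apply pattern is new to you, here is the same idea on a different unit conversion, so the temperature equation is still yours to work out. The data, column names, and function name are made up for this sketch.

```python
import pandas as pd


def inches_to_millimeters(length_in):
    """Convert a length from inches to millimeters."""
    return length_in * 25.4


# Made-up precipitation values, for illustration only
precip_df = pd.DataFrame({"precip_in": [0.0, 1.0, 2.0]})

# Apply the function to every value in one column and store the
# results in a new column, leaving the original column unchanged
precip_df["precip_mm"] = precip_df["precip_in"].apply(
    inches_to_millimeters)
precip_df
```

Storing the result in a new column keeps the original values available, which makes it easier to check the conversion by eye.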

## Plot the maximum annual temperature in Rapid City, SD, USA

Plotting in Python is easy, but not quite this easy! You'll always need to add some instructions on labels and how you want your plot to look.

  1. Change `dataframe` to **your** `DataFrame` name.
  2. Change `'col_1'` and `'col_2'` to **your** column names
  3. Use the `title`, `ylabel`, and `xlabel` parameters to add key text to your plot.
  
> HINT: labels have to be a type in Python called a **string**. You can make a string by putting quotes around your label, just like the column names in the sample code.
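
As a rough sketch of how the label parameters fit together, the example below plots a small made-up `DataFrame`. The data, column names, and label text are all hypothetical, and it assumes `matplotlib` is installed alongside `pandas`.

```python
import pandas as pd

# Made-up annual precipitation values, for illustration only
precip_df = pd.DataFrame({
    "year": [2001, 2002, 2003, 2004],
    "precip_mm": [410.0, 385.5, 452.3, 398.1],
})

# title, xlabel, and ylabel are all passed as strings
ax = precip_df.plot(
    x="year",
    y="precip_mm",
    title="Annual precipitation in Example Town (2001-2004)",
    xlabel="Year",
    ylabel="Precipitation (mm)",
)
```

Notice that the title says where and when the data are from, and the y-axis label includes units.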

In [None]:
dataframe.plot(x='col_1', y='col_2')



**THIS ISN'T THE END! Don't forget to complete the next task where you will describe your plot**
    
<img src="https://www.nps.gov/pais/learn/nature/images/NPS-KempsRidley-Hatchlings.JPG" height=150 style="padding: 1em; border-style: solid; border-color: grey;" />

> Image source: https://www.nps.gov/pais/learn/nature/hatchlingreleases.htm

<img src="https://static.thenounproject.com/png/3842781-200.png" width=20 style="float: left; padding: 3px" /> Want an **EXTRA CHALLENGE**?

There are many other things you can do to customize your plot. Take a look at the [pandas plotting galleries](https://pandas.pydata.org/docs/user_guide/visualization.html) and the [documentation of plot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) to see if there are other changes you want to make to your plot. Some possibilities include:
  * Remove the legend since there's only one data series
  * Increase the figure size
  * Increase the font size
  * Change the colors
  * Use a bar graph instead (usually we use lines for time series, but since this is annual it could go either way)
  * Add a trend line

## Describe your plot **in the Markdown cell below**

We like to use an approach called "Assertion-Evidence" for presenting scientific results. There are many video tutorials and example talks available on [the Assertion-Evidence web page](https://www.assertion-evidence.com/). The main thing you need to do now is to practice writing a **message** or **headline** rather than descriptions or topic sentences for the plot you just made (what they refer to as "visual evidence").

For example, it would be tempting to write something like "A plot of maximum annual temperature in Rapid City, South Dakota over time (1947-2023)". However, this doesn't give the reader anything to look for, or explain why we made this particular plot (we know, you made **this** one because we told you to).

Some alternatives that are more of a starting point for a presentation or conversation are:
  * Rapid City, SD, USA experienced extreme heat in 2013
  * Extreme temperatures in Rapid City, SD appear to have risen over the past 70 years
  * Maximum annual temperatures in Rapid City, SD have become more variable over the past 70 years
  
We could back up some of these claims with further analysis included later on, but we want to make sure that our audience has some guidance on what to look for in the plot.


## YOUR RAPID CITY PLOT HEADLINE HERE
Describe your plot in this cell in 2-3 sentences

**THIS ISN'T THE END EITHER! Don't forget to reproduce your analysis in a new location!**

<img src="https://static.independent.co.uk/s3fs-public/thumbnails/image/2008/12/26/20/107000.jpg" height=150 style="padding: 1em; border-style: solid; border-color: grey;" >

> Image source: https://www.independent.co.uk/climate-change/news/by-the-left-quick-march-the-emperor-penguins-migration-1212420.html

## Your turn: pick a new location and/or measurement to plot
Below, recreate the workflow you just did in a place that interests you OR with a different measurement. See the instructions above for how to get your URL. You will need to make your own new Markdown and Code cells below this one.

## Congratulations, you finished this coding challenge -- now make sure that your code is **reproducible**

1. If you didn't already, go back to the code you modified above and write more descriptive **comments** so the next person to use this code knows what it does.

2. Make sure to `Restart` and `Run All` up at the top of your notebook. This will clear all your variables and make sure that your code runs in the correct order. Because the last cell in this notebook runs `nbconvert`, it will also export your work in Markdown format, which you can put on your website.

<img src="https://dfwurbanwildlife.com/wp-content/uploads/2018/03/SnowGeese16.jpg" height=150 style="padding: 1em; border-style: solid; border-color: grey;" />

> Image source: https://dfwurbanwildlife.com/2018/03/25/chris-jacksons-dfw-urban-wildlife/snow-geese-galore/

In [None]:
!jupyter nbconvert --to markdown *.ipynb --TagRemovePreprocessor.remove_cell_tags='{"remove_cell"}'