Originally created for onboarding in the Computational Geomorphology Lab ([repository here](https://github.com/pycogss/getting-started/tree/main)). Adapted for 437.

# Data basics

# Python and `numpy`

Although there are **many** resources out there for Python and `numpy`, I will only direct you to a few:
1. The [official documentation](https://docs.python.org/3/tutorial/index.html) is always a great resource, but may overwhelm a novice and includes aspects that will be less important for your first tasks. 
2. [Software Carpentry](https://swcarpentry.github.io/python-novice-inflammation/01-intro.html) provides an excellent introduction for novice coders interested in scientific applications. 
3. A data-science oriented [cheat sheet](https://www.datacamp.com/cheat-sheet/getting-started-with-python-cheat-sheet). 

To complete the following tasks, I recommend working *backwards* through these resources (try the cheat sheet first, then Software Carpentry, etc.) and/or using Google searches.

## Mini-assignment 1

Let's say I wrote down some climate data and typed it up as a [list](https://docs.python.org/3/tutorial/datastructures.html), and now I want to convert the temperatures from Fahrenheit to Celsius.

In [None]:
import numpy as np  # Import NumPy for data manipulation

day = [1, 2, 3, 4, 5]
temp = [60, 65, 68, 61, 58]  # daily temperature in °F
humidity = [20, 25, 32, 28, 25]


In [None]:
temp_c = (temp - 32) * (5/9)

Why doesn't this work?

What if we convert our lists to [arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html)?

In [None]:
day_arr = np.array(day)

# Continue the script so that you can perform the following conversion:

temp_c = 
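
If you're stuck on why the list version fails, this short comparison (using made-up numbers, not the climate data) shows how Python lists and NumPy arrays respond differently to the same arithmetic operators:

```python
import numpy as np

values_list = [1, 2, 3]
values_arr = np.array(values_list)

# For lists, * means "repeat" and + means "concatenate"
print(values_list * 2)    # [1, 2, 3, 1, 2, 3]
print(values_list + [4])  # [1, 2, 3, 4]

# Lists do not support subtraction at all, so this raises a TypeError
try:
    values_list - 1
except TypeError as err:
    print("TypeError:", err)

# NumPy arrays apply arithmetic element by element
print(values_arr * 2)     # [2 4 6]
print(values_arr - 1)     # [0 1 2]
```

Arithmetic on a list either means something different (repetition, concatenation) or is not defined, while an array applies the operation to every element.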

## Mini-assignment 2

Print the **humidity** value that corresponds to the highest **temperature** value in one line of code. Do this once for the *list* version of the data and once for the *array* version of the data. 

In [None]:
# Your code here

# Using `numpy` functions to generate synthetic data

In Python, functions are reusable blocks of code that perform specific tasks. You can define a function using the `def` keyword, followed by the function's name and parentheses, which can include <b>parameters</b> (inputs) that the function will use. After defining the function, you can "call" it by using its name followed by parentheses, optionally passing in arguments (values) for the parameters. Functions can also return a value using the `return` statement, allowing you to capture the result and use it elsewhere in your code.

Here is an example of a super-simple function:

In [None]:
def add_value(data, value):
    return data + value

In [None]:
x = np.arange(0,20,1)

# Do you remember what np.arange() does?

x_new = add_value(x, 3)

print(x_new)
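
As a refresher, `np.arange(start, stop, step)` returns evenly spaced values beginning at `start` and stopping *before* `stop`. Because `x` is a NumPy array, `data + value` inside `add_value` adds the scalar to every element at once (NumPy calls this broadcasting). A quick check:

```python
import numpy as np

def add_value(data, value):
    return data + value

x = np.arange(0, 20, 1)     # 0, 1, ..., 19 (the stop value 20 is excluded)
print(len(x), x[0], x[-1])  # 20 0 19

x_new = add_value(x, 3)     # the scalar 3 is added to every element
print(x_new[0], x_new[-1])  # 3 22

# The same function also works on a single number
print(add_value(5, 3))      # 8
```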

Consider how you can use the `random` module within `numpy` to artificially generate noise in an existing signal

In [None]:
x = np.arange(0,20,1)
y = np.arange(10,30,1)

def add_noise(data, magnitude):
    # The * in front of np.shape unpacks the shape tuple into separate arguments (don't worry about that)
    return data + (np.random.rand(*np.shape(data)) * magnitude)

y_noise = add_noise(y, 1)

In [None]:
print(y_noise)

What happens if you change the `magnitude` argument in the `add_noise` function?

In [None]:
# Your answer here
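
One thing worth noticing: `np.random.rand` draws from a *uniform* distribution on [0, 1), so `add_noise` only ever adds positive values and shifts the signal up by `magnitude / 2` on average. If you want noise centered on zero for Mini-assignment 3, one option is NumPy's random `Generator`. The `add_gaussian_noise` function below is a hypothetical helper (not part of the original notebook) sketching that idea, where `magnitude` is the standard deviation of the noise:

```python
import numpy as np

def add_noise(data, magnitude):
    # Notebook version: uniform noise in [0, magnitude), always positive
    return data + (np.random.rand(*np.shape(data)) * magnitude)

def add_gaussian_noise(data, magnitude, rng=None):
    # Hypothetical alternative: normal noise with mean 0 and standard deviation `magnitude`
    if rng is None:
        rng = np.random.default_rng()
    return data + rng.normal(loc=0.0, scale=magnitude, size=np.shape(data))

rng = np.random.default_rng(seed=0)  # fixed seed so the results are reproducible
y = np.arange(10, 30, 1)
y_noise_gauss = add_gaussian_noise(y, 1, rng=rng)

# Compare the average offset each kind of noise adds, using many samples
flat = np.zeros(100_000)
print(np.mean(add_noise(flat, 1)))                    # about 0.5 (= magnitude / 2)
print(np.mean(add_gaussian_noise(flat, 1, rng=rng)))  # about 0
```

Uniform noise is not wrong; it is simply a different modeling choice from zero-mean noise.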

Now I am going to get tricky here and model annual temperature as a sine wave for us to play with. 

In [None]:
def annual_temperature(day, amplitude, avg_temp, hottest_day_of_year):
    """
    Here's some docs for this function:
    This simulates annual temperature! You can change these parameters:

    amplitude = 10  # Half the annual range: temperature swings ±10 °C around avg_temp
    avg_temp = 15  # In C
    hottest_day_of_year = 201  # July 20th is the hottest day of the year
    """
    return amplitude * np.sin((2 * np.pi / 365) * (day - hottest_day_of_year + 365/4)) + avg_temp
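
Writing the function as an equation makes it easier to see why it works. With $A$ = `amplitude`, $\bar{T}$ = `avg_temp`, and $d_{\text{peak}}$ = `hottest_day_of_year`:

$$T(d) = A \sin\!\left(\frac{2\pi}{365}\left(d - d_{\text{peak}} + \frac{365}{4}\right)\right) + \bar{T}$$

On the hottest day, $d = d_{\text{peak}}$, the argument of the sine is $\frac{2\pi}{365}\cdot\frac{365}{4} = \frac{\pi}{2}$, so the sine equals $1$ and $T = \bar{T} + A$. Half a year away ($d = d_{\text{peak}} \pm 365/2$) the argument is $\frac{\pi}{2} \pm \pi$, so the sine equals $-1$ and $T = \bar{T} - A$. So $A$ is the swing above and below the mean, and the full annual range is $2A$ (with the default values, 5 °C to 25 °C).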

Let's spin this neat "model" up!

In [None]:
day = np.arange(1, 366, 1)  # array of days of the year (1 to 365)

amplitude = 10  # Half the annual range: temperature swings ±10 °C around avg_temp
avg_temp = 15  # In °C
hottest_day_of_year = 201  # July 20th is the hottest day of the year

temperature_data = annual_temperature(day, amplitude, avg_temp, hottest_day_of_year)


Now you have a big long array of temperatures!

In [None]:
temperature_data

## Mini-assignment 3

Using the examples above, let's make the synthetic temperature data a little more realistic by introducing random noise to the daily temperature. Generate a `daily_temperature` array with realistic-looking variability!

In [None]:
# Your code here

And now, generate a year's worth of random precipitation data. Get creative - is there a seasonality to your synthetic data?

In [None]:
# Your code here
precipitation = add_noise(np.ones_like(day), 3)

# `matplotlib`

`pyplot` is the basic plotting interface of Matplotlib, Python's most common plotting library, and it deliberately resembles MATLAB's plotting commands. Good introductions are [here](https://matplotlib.org/stable/tutorials/pyplot.html) and [here](https://matplotlib.org/stable/users/explain/quick_start.html).

In [None]:
import matplotlib.pyplot as plt  # Import matplotlib for plotting

We can now use the `plt` module we loaded to make a simple plot. You can always [read the docs](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html) to understand the arguments that plotting functions take. (And you *find* the docs by searching "matplotlib [function]")

In [None]:
plt.plot(day, temperature_data)

## Mini-assignment 4

Create a [scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) plot of your realistic temperature data (with the noise) over time, with points colored by precipitation. `plt.scatter()` accepts a `c` keyword that maps a third variable to marker color, which lets you show a third dimension of data in the same plot. When you do that, add a [colorbar](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.colorbar.html) so your viewers know what the colors mean.

I'll get you started:

In [None]:
# Specify the "c" keyword to add precipitation data as colors!
plt.scatter(???, ???, c=???)


# Call the colorbar() function to add a colorbar!
plt.colorbar(label="precipitation")

# Set the title of the plot
plt.title('??')

# Label the x-axis
plt.xlabel('??')

# Label the y-axis
plt.ylabel('???')

## Using `plt.plot()`

In Matplotlib, both `plt.figure()` and `fig, ax = plt.subplots()` create figures, but they have different use cases and behaviors:

`plt.figure()`:

- `plt.figure()` is used to create a single figure object, and it returns a reference to that figure. This figure can contain one or more subplots (Axes).

- When you create plots using `plt.plot()`, `plt.scatter()`, etc., without explicitly specifying an Axes object, Matplotlib draws on the current Axes, creating one in the current figure if none exists yet.

- It is useful when you want a single plot and don't need to manage multiple Axes explicitly.

Here, we will create a `figure` object:

In [None]:
# Create a figure with a specific size (10x6 inches)
plt.figure(figsize=(10, 6))

# Create a line plot using time on the x-axis and temperature on the y-axis
# Customize the plot with blue color, circular markers, solid lines, and marker size
plt.plot(day, ???, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the plot
plt.title('Temperature Over Time')

# Label the x-axis
plt.xlabel('Day of year')

# Label the y-axis
plt.ylabel('Temperature (°C)')


Note we can do things like [set the limits of axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html):

In [None]:
# Create a figure with a specific size (10x6 inches)
plt.figure(figsize=(10, 6))

# Create a line plot using time on the x-axis and temperature on the y-axis
# Customize the plot with blue color, circular markers, solid lines, and marker size
plt.plot(day, ???, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the plot
plt.title('Temperature Over Time')

# Label the x-axis
plt.xlabel('Day of year')

# Set the y-axis limits from 0 to 100
plt.ylim(0, 100)

# Label the y-axis
plt.ylabel('Temperature (°C)')

## Using `ax` objects

In contrast, we can use `fig, ax = plt.subplots()`:

- Multiple Subplots: `plt.subplots()` creates a figure (Fig) and one or more subplots (Axes) within that figure. It returns the figure plus either a single Axes object (when there is one subplot, the default) or a NumPy array of Axes objects (when there are several).

- Explicit Axes: You explicitly create and specify the Axes objects when using `fig, ax = plt.subplots()`. This allows you to have more control over the placement and arrangement of subplots.

- Usage: It is useful when you need to create multiple subplots within a single figure, such as creating a grid of plots.

One visible difference is that the methods for customizing `ax` objects often start with `set_`, as in `ax.set_xlabel()`, where the `pyplot` equivalent is just `plt.xlabel()`.

But you don't need to create multiple plots if you don't want to:

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
ax.plot(day, ???, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the axis
ax.set_title('Temperature Over Time')

# Label the x-axis
ax.set_xlabel('Day of year')

# Label the y-axis
ax.set_ylabel('Temperature (°C)')
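
Before the next assignment, it helps to know exactly what `plt.subplots()` hands back, because the second return value changes with the number of subplots you ask for. This sketch just creates empty figures and inspects that value:

```python
import matplotlib.pyplot as plt
import numpy as np

# One subplot (the default): the second return value is a single Axes object
fig1, ax = plt.subplots()
print(type(ax))

# Several subplots in one row: a 1-D NumPy array of Axes
fig2, axs = plt.subplots(1, 2)
print(type(axs), axs.shape)  # shape (2,)

# A 2 x 2 grid: a 2-D array, indexed as axs_grid[row, col]
fig3, axs_grid = plt.subplots(2, 2)
print(axs_grid.shape)        # (2, 2)

# squeeze=False always returns a 2-D array, even for a single subplot
fig4, axs_2d = plt.subplots(squeeze=False)
print(axs_2d.shape)          # (1, 1)

plt.close("all")  # close the figures created above
```

This is why Mini-assignment 5 indexes `ax[0]` and `ax[1]`, while the single-plot cell above calls methods on `ax` directly.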


## Mini-assignment 5

Make two side-by-side plots: one of the simple (noise-free) temperature data and one of the noisy temperature data.

In [None]:
# Create a figure with two Axes side by side (1 row, 2 columns) that share a y-axis
fig, ax = plt.subplots(1,2, figsize=(10,6), sharey=True)

# Now `ax` is an array holding two Axes objects
# ax[0] is the first (zeroth-index) element, ax[1] is the second, etc.

# Plot something on the ax object
ax[0].plot(day, ???,  color='???', marker='o', linestyle='-', markersize=4)

# Plot something on the ax object
ax[1].plot(day, ???,  color='???', marker='o', linestyle='-', markersize=4)

# Set the title of each axis

# Label the x-axes

# Label the y-axis (with sharey=True, labeling ax[0] is enough)


In [None]:
# Note that they can be displayed on the same plot
# It just depends on what you want to show

# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
ax.plot(day, ???, color='???', marker='o', linestyle='-', markersize=4)
ax.plot(day, ???, color='???', marker='o', linestyle='-', markersize=4)

# Real data!

The following is a way of reading in real weather station data from January 1 to December 31, 2022. 

Note that after next week you will almost always use `pandas` for this, so bear with me (hah, pandas, bear...)

In [None]:
# usecols indices are zero-based; skip_header takes the number of header lines to skip
precipitation = np.genfromtxt('williamsburg_meteo.csv', delimiter=',', usecols=[4], skip_header=1)
tmax = np.genfromtxt('williamsburg_meteo.csv', delimiter=',', usecols=[5], skip_header=1)
tmin = np.genfromtxt('williamsburg_meteo.csv', delimiter=',', usecols=[6], skip_header=1)

Your in-class assignment will consist of plotting these data and creating <b>deliverables</b> from them.

In [None]:
# Code here!
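
Real station records sometimes have gaps. To see how `np.genfromtxt` handles that without touching the real file, here is a sketch using a tiny made-up CSV held in memory (its columns are hypothetical and do not match `williamsburg_meteo.csv`). By default, an empty field in a float column becomes `nan`, and ordinary functions like `np.max` then return `nan`, so NumPy's `nan`-aware versions (`np.nanmax`, `np.nanmin`, etc.) are worth knowing about:

```python
import io
import numpy as np

# A tiny, made-up CSV held in memory (hypothetical columns, not the station file)
fake_csv = io.StringIO(
    "date,precip,tmax\n"
    "2022-01-01,0.0,50\n"
    "2022-01-02,,48\n"   # precipitation is missing on this day
    "2022-01-03,1.2,45\n"
)

# usecols picks columns by zero-based index; skip_header=1 skips the header row
precip = np.genfromtxt(fake_csv, delimiter=",", usecols=[1], skip_header=1)

print(precip)             # the empty field was read as nan
print(np.max(precip))     # nan: one missing value makes the regular max nan
print(np.nanmax(precip))  # 1.2: the nan-aware version ignores missing values
```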

# Week 2 deliverables

Bringing all your knowledge together:
- Show in a single plot the maximum and minimum temperature throughout the year, colored in an intuitive way. 
- Create a figure that shows both a line plot (which cannot be colored by another variable) and a scatter plot (which can be colored) of time series data of your choice.
- Create a scatter plot of one dataset against another, where the `x` axis is <b>not</b> time/day of year. 
- Using `numpy`, print out the minimum and maximum temperatures of the five rainiest days of the record. 

Your `.ipynb` file must run from top to bottom on my computer without errors. All mini-assignments must be completed. Add any URLs that helped you as comments in your code cells. Your plots must be publication-worthy (axes and colorbars labeled).

