# Data basics

# Python and `numpy`

Although there are **many** resources out there for Python and `numpy`, I will only direct you to a few:
1. The [official documentation](https://docs.python.org/3/tutorial/index.html) is always a great resource, but may overwhelm a novice and includes aspects that will be less important for your first tasks. 
2. [Software Carpentry](https://swcarpentry.github.io/python-novice-inflammation/01-intro.html) provides an excellent introduction for novice coders interested in scientific applications. 
3. A data-science-oriented [cheat sheet](https://www.datacamp.com/cheat-sheet/getting-started-with-python-cheat-sheet).

To complete the following task, I recommend going *backwards* through these resources (try the cheat sheet first, then Software Carpentry, etc.) and/or searching Google.

## Mini-assignment 0.5:

Let's say I wrote down some climate data and am typing it up as a [list](https://docs.python.org/3/tutorial/datastructures.html). The temperatures are in Fahrenheit, and I want to convert them to Celsius:

In [None]:
import numpy as np  # Import NumPy for data manipulation

day = [1, 2, 3, 4, 5]
temp = [60, 65, 68, 61, 58]
humidity = [20, 25, 32, 28, 25]


In [None]:
temp_c = (temp - 32) * (5/9)

Why doesn't this work?

What if we convert our lists to [arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html)?
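
To see how the two containers differ, here is a minimal sketch using made-up wind speeds (the names `wind_mph`, `wind_arr`, and `wind_kmh` are hypothetical and not part of the assignment data). Arithmetic operators on a list act on the list as a whole, whereas on a NumPy array they apply to every element:

```python
import numpy as np

# Hypothetical wind speeds in miles per hour (not part of the assignment data)
wind_mph = [10, 20, 30]

# Multiplying a list by an integer repeats the list; it does not scale the values
repeated = wind_mph * 2

# Subtracting a number from a list is not defined, so Python raises a TypeError
try:
    wind_mph - 5
    list_error = None
except TypeError as err:
    list_error = err

# A NumPy array applies arithmetic to each element in turn
wind_arr = np.array(wind_mph)
wind_kmh = wind_arr * 1.609344  # 1 mile = 1.609344 km

print(repeated)
print(type(list_error).__name__)
print(wind_kmh)
```

The same element-by-element behavior holds for subtraction, division, and combinations of operators on an array.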

In [None]:
day_arr = np.array(day)

# Continue the script so that you can perform the following conversion:

temp_c = 

## Mini-assignment 1

Print the **humidity** value that corresponds to the highest **temperature** value in one line of code. Do this once for the *list* version of the data and once for the *array* version of the data. 

In [None]:
# Your code here

# `pandas`

`pandas` is an alternative to Excel for managing tabular data. An excellent introduction is [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html).

In [None]:
import pandas as pd  # Import Pandas for data handling

# Load the weather data from a CSV file.
# This works as long as the .csv file is in the current working directory.
weather_data = pd.read_csv('williamsburg_meteo.csv')

# Peek at the data and particularly the column names
weather_data.head()

We can create new columns and fill them with a single value or perform an operation on a column:

In [None]:
weather_data['QC'] = 'good' # a pretend "quality control column"

# pd.to_datetime() reads the dates as a datetime type that plots well for time series
weather_data['datetime'] = pd.to_datetime(weather_data['DATE'])

weather_data['PRCP_cm'] = weather_data['PRCP'] * 2.54 # convert inches to centimeters

weather_data.head()
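
The same pattern can be tried on a tiny table without the CSV file. The sketch below builds a made-up DataFrame (the `toy` name and its values are hypothetical; the column names mirror the ones used above) and applies the three operations: a constant column, a parsed date column, and a unit conversion.

```python
import pandas as pd

# A made-up two-row table; the values are illustrative only
toy = pd.DataFrame({
    'DATE': ['2020-01-01', '2020-01-02'],
    'PRCP': [0.0, 1.0],  # precipitation in inches
})

# Assigning a single value fills every row of the new column
toy['QC'] = 'good'

# Parse the date strings into pandas datetime values
toy['datetime'] = pd.to_datetime(toy['DATE'])

# Arithmetic on a column is applied to every row
toy['PRCP_cm'] = toy['PRCP'] * 2.54

# Compare the types of the original and the new columns
print(toy.dtypes)
print(toy)
```

Printing `dtypes` is a quick way to confirm that `DATE` is still plain text while `datetime` holds real dates.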

## Mini-assignment 2

Precipitation and temperature data were originally given in imperial units. Create new columns in which the temperature values are given in metric units (Celsius).

In [None]:
# your code here

## Mini-assignment 3

Print the **date** of the highest recorded daily rainfall in the record. Consult [the docs](https://pandas.pydata.org/docs/reference/frame.html#computations-descriptive-stats) or Google. Note that you will have to look at the value in one column to get the value in another column.

In [None]:
# your code here

# `matplotlib`

`matplotlib` is the standard plotting library for Python, and its `pyplot` interface strongly resembles MATLAB's plotting commands. Good introductions are [here](https://matplotlib.org/stable/tutorials/pyplot.html) and [here](https://matplotlib.org/stable/users/explain/quick_start.html).

In [None]:
import matplotlib.pyplot as plt  # Import matplotlib for plotting


In [None]:
# For convenience, we will assign columns of our DataFrame to separate variables.

date = weather_data['datetime']

temperature = weather_data['TOBS'] 

We can now use the `plt` module we loaded to make a simple plot. You can always [read the docs](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html) to understand the arguments that plotting functions take. (You can *find* the docs by searching for "matplotlib [function]".)

In [None]:
plt.plot(date, temperature)

## Mini-assignment 4

Create a [scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) plot of temperature over time with points colored by precipitation. Scatter plots accept a `c` argument that colors each data point by a third variable, which is useful for data-rich plots. When you do that, add a [colorbar](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.colorbar.html) so your viewers know what the colors mean.

I'll get you started:

In [None]:
# Specify the "c" keyword to add precipitation data as colors!
plt.scatter(???, ???, c=???)


# Call the "colorbar()" function to add a colorbar!
plt.colorbar(label="precipitation")

# Set the title of the plot
plt.title('??')

# Label the x-axis
plt.xlabel('??')

# Label the y-axis
plt.ylabel('???')

## Using `plt.plot()`

In Matplotlib, both `plt.figure()` and `fig, ax = plt.subplots()` create figures, but they have different use cases and behaviors:

`plt.figure()`:

- `plt.figure()` is used to create a single figure object, and it returns a reference to that figure. This figure can contain one or more subplots (Axes).

- When you create plots using `plt.plot()`, `plt.scatter()`, etc., without explicitly specifying an Axes object, Matplotlib will automatically create an Axes within the current figure.

- It is useful when you want to create a single plot without multiple subplots, and you are not concerned about creating multiple axes explicitly.
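
The automatic creation of an Axes can be checked directly. This is a minimal sketch with made-up numbers; the `Agg` backend line is an assumption that lets it run as a plain script without a display, and it can be dropped inside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# plt.figure() returns the Figure, which starts out with no Axes
fig = plt.figure(figsize=(4, 3))
axes_before = len(fig.axes)

# The first plt.plot() call creates an Axes inside the current figure
plt.plot([1, 2, 3], [2, 4, 8])
axes_after = len(fig.axes)

# plt.gcf() and plt.gca() return the "current" figure and Axes that pyplot draws on
same_figure = plt.gcf() is fig
current_ax = plt.gca()

print(axes_before, axes_after, same_figure)
```

Every later `plt.title()`, `plt.xlabel()`, and so on is applied to that current Axes.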

Here, we will create a `figure` object:

In [None]:
# Create a figure with a specific size (10x6 inches)
plt.figure(figsize=(10, 6))

# Create a line plot using time on the x-axis and temperature on the y-axis
# Customize the plot with blue color, circular markers, solid lines, and marker size
plt.plot(date, temperature, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the plot
plt.title('Temperature Over Time')

# Label the x-axis
plt.xlabel('Time')

# Label the y-axis
plt.ylabel('Temperature (°F)')


Note we can do things like [set the limits of axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html):

In [None]:
# Create a figure with a specific size (10x6 inches)
plt.figure(figsize=(10, 6))

# Create a line plot using time on the x-axis and temperature on the y-axis
# Customize the plot with blue color, circular markers, solid lines, and marker size
plt.plot(date, temperature, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the plot
plt.title('Temperature Over Time')

# Label the x-axis
plt.xlabel('Time')

# Set the y-axis limits from 0 to 100
plt.ylim(0, 100)

# Label the y-axis
plt.ylabel('Temperature (°F)')

## Using `ax` objects

In contrast, we can use `fig, ax = plt.subplots()`:

- Multiple Subplots: `plt.subplots()` creates a figure (`fig`) and one or more subplots (Axes) within that figure. It returns the figure together with either a single Axes object or, when you request more than one subplot, an array of Axes objects.

- Explicit Axes: You explicitly create and specify the Axes objects when using `fig, ax = plt.subplots()`. This allows you to have more control over the placement and arrangement of subplots.

- Usage: It is useful when you need to create multiple subplots within a single figure, such as creating a grid of plots.

A main difference is that the syntax for customizing `ax` objects often includes "`set_`", as in `ax.set_xlabel()` as opposed to just `plt.xlabel()`.
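
As a rough illustration of a grid of plots, the sketch below requests a 2x2 layout and customizes two of the panels through their own `set_` methods. The data and labels are made up, and the `Agg` backend line is an assumption for running it outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

# A 2x2 grid returns a 2-D NumPy array of Axes objects
fig, axes = plt.subplots(2, 2, figsize=(6, 5))

# Made-up data, for illustration only
x = [0, 1, 2, 3]

# Index the array as [row, column] to pick a panel
axes[0, 0].plot(x, [0, 1, 4, 9])
axes[0, 0].set_title('Top left')

axes[1, 1].plot(x, [3, 2, 1, 0])
axes[1, 1].set_xlabel('x')
axes[1, 1].set_ylabel('y')

print(axes.shape)
```

Panels that are never plotted on simply stay empty.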

But you don't need to create multiple plots if you don't want to:

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
ax.plot(date, temperature, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the axis
ax.set_title('Temperature Over Time')

# Label the x-axis
ax.set_xlabel('Time')

# Label the y-axis
ax.set_ylabel('Temperature (°F)')


## Mini-assignment 5

Make two separate plots for the highest daily temperature and the lowest daily temperature. 

In [None]:
high_temp = ???

low_temp = ???

# Create a fig and ax object
fig, ax = plt.subplots(1,2, figsize=(10,6), sharey=True)

# ax is now an array that holds two Axes objects
# ax[0] is the zeroth (first) element, ax[1] is the second element, etc.

# Plot something on the ax object
ax[0].plot(date, high_temp,  color='???', marker='o', linestyle='-', markersize=4)

# Plot something on the ax object
ax[1].plot(date, low_temp,  color='???', marker='o', linestyle='-', markersize=4)

# # Set the title of an axis (index into ax, since it now holds two Axes)
# ax[0].set_title('Temperature Over Time')

# # Label the x-axis
# ax[0].set_xlabel('Time')

# # Label the y-axis
# ax[0].set_ylabel('Temperature (°F)')


In [None]:
# Note that they can be displayed on the same plot
# It just depends on what you want to show

# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
ax.plot(date, high_temp, color='???', marker='o', linestyle='-', markersize=4)
ax.plot(date, low_temp, color='???', marker='o', linestyle='-', markersize=4)

## Using `pandas`' built-in plotting functions

`pandas` actually has its own [plotting functions](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) that use a slightly different syntax for quick visualization of data.

You can see below that the syntax is `[name of the data frame].plot.[type of plot]` for something like a [scatterplot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.scatter.html).

In [None]:
# Use the built-in plot.scatter() function to create a scatter plot
weather_data.plot.scatter(x='datetime', y='TMAX', c='TMIN', title='Example Plot', marker='o', cmap='viridis')

You can specify the `ax` object to plot on for maximum customization of axes!

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Use the built-in plot.scatter() function to create a scatter plot on our ax object
weather_data.plot.scatter(x='datetime', y='TMAX', c='TMIN', title='Example Plot', marker='o', cmap='viridis', ax=ax)

ax.set_ylim(0, 100)

ax.set_ylabel('Maximum temp (F)')

ax.set_xlabel('Date')
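
When no `ax` is passed, the pandas plotting methods create a figure themselves and return the Axes they drew on, so the same `set_` methods still apply afterwards. A minimal sketch with a made-up DataFrame (the `toy` name and its values are hypothetical, and the `Agg` backend line is an assumption for running it outside a notebook):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up temperatures, for illustration only
toy = pd.DataFrame({'TMAX': [50, 60, 70], 'TMIN': [30, 45, 55]})

# plot.line() draws each requested column against the index
# and returns the Axes object it drew on
ax = toy.plot.line(y=['TMAX', 'TMIN'], title='Made-up temperatures')

# The returned Axes can be customized like any other
ax.set_ylim(0, 100)
ax.set_ylabel('Temperature (F)')

print(ax.get_ylim())
```

Either route, passing `ax=ax` in or capturing the returned Axes, ends with an object you can keep customizing.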

## Mini-assignment 6

Bringing all your knowledge together, create a figure that shows some data as both a line plot (which cannot be colored by another variable) and a scatter plot (which can).

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
# zorder sets the drawing order: objects with a higher zorder are drawn on top
ax.plot(date, ???, color='???', linestyle='-', zorder=0)

# One way to do it is to assign the output of ax.scatter() to a variable
# I am also specifying "vmin" and "vmax", which are the minimum and maximum values for the colorbar
scatterplot = ax.scatter(date, ???, c=???, marker='o',
                         vmin=???,
                         vmax=???,
                         zorder=1)

# Create the colorbar from the scatter plot variable and attach it to the ax object
colorbar = plt.colorbar(scatterplot, ax=ax)
colorbar.set_label('???')  # Set the label for the colorbar

ax.set_ylabel('???')

ax.set_xlabel('???')
