# `pandas`

In this section, we'll use `pandas` to explore some weather data. If you feel lost in the following examples or just want more of the basics, check out the ["10 minutes to pandas"](https://pandas.pydata.org/docs/user_guide/10min.html) guide in the pandas docs.

In [None]:
import pandas as pd 

In [None]:
# Load temperature data from a CSV file
# (this assumes the .csv file is in the "data" directory)
weather_data = pd.read_csv('data/williamsburg_meteo.csv')

# Peek at the data and particularly the column names
weather_data.head()

We can create new columns and fill them with a single value, or perform an operation on an existing column:

In [None]:
# You can make any column and fill it with anything you want
weather_data['QC'] = 'good' # a pretend "quality control column"

# You can look at data in one column and make a new column with slightly different formatting
weather_data['datetime'] = pd.to_datetime(weather_data['DATE']) # pd.to_datetime() parses the date strings into datetime values, which plot well as time series

# You can do a calculation on a column! 
weather_data['PRCP_cm'] = weather_data['PRCP'] * 2.54 # convert inches to centimeters

weather_data.head()
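If you want to try these column operations without the CSV file, here is a minimal sketch on a tiny hand-made DataFrame. The column names `DATE` and `PRCP` mirror the ones used above, but the values are made up for illustration and are not taken from `williamsburg_meteo.csv`.

```python
import pandas as pd

# A tiny, made-up DataFrame that mimics two columns of the weather data
df = pd.DataFrame({
    'DATE': ['2023-01-01', '2023-01-02', '2023-01-03'],
    'PRCP': [0.0, 0.5, 1.0],  # precipitation in inches (hypothetical values)
})

# Fill a new column with a single value
df['QC'] = 'good'

# Parse the date strings into datetime64 values
df['datetime'] = pd.to_datetime(df['DATE'])

# Arithmetic on a column is applied element-wise to every row
df['PRCP_cm'] = df['PRCP'] * 2.54

print(df.dtypes)
print(df)
```

Notice in the `dtypes` output that `DATE` is still stored as plain text (`object`), while `datetime` has a true datetime type.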

The precipitation and temperature data are given in imperial units (inches and degrees Fahrenheit). Using the example above where I converted inches to centimeters (`weather_data['PRCP']` to `weather_data['PRCP_cm']`), create new columns that give the temperature values in metric units (degrees Celsius).

In [None]:
# your code here

Now, let's make some plots:

In [None]:
# For convenience, store the columns we plot most often in their own variables.
# This way you can type `date` instead of `weather_data['datetime']`

date = weather_data['datetime']

temperature = weather_data['TOBS']  # temperature at time of observation (°F)

# `matplotlib`

[`matplotlib`](https://matplotlib.org/stable/users/index.html) is a plotting package that helps you visualize your data. If you have worked with MATLAB, its `pyplot` interface will feel familiar.

In [None]:
import matplotlib.pyplot as plt

We can now use `plt`, the alias we gave `matplotlib.pyplot`, to make a simple plot. You can always [read the docs](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html) to understand the arguments that plotting functions take. (To *find* the docs, search for "matplotlib [function name]".)

In [None]:
plt.plot(date, temperature)

## Your turn!

Create a [scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html) plot of temperature over time with the points colored by precipitation. The `c` keyword of `plt.scatter()` maps a third variable onto the color of each point, so a single plot can show three dimensions of data at once. When you do that, you'll want to add a [colorbar](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.colorbar.html) so your viewers know what the colors mean.
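To see how `c` and the colorbar fit together without giving away the exercise, here is a minimal sketch on made-up numbers (the lists `x`, `y`, and `z` are hypothetical stand-ins, not the weather data). The `matplotlib.use('Agg')` line only selects a non-interactive backend so the snippet also runs outside a notebook; you can drop it in Jupyter.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# Made-up data: z is the third variable we want to show as color
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
z = [0.0, 0.2, 0.4, 0.8, 1.0]

fig = plt.figure()

# c=z maps each value of z onto a color from the default colormap
points = plt.scatter(x, y, c=z)

# With no mappable given, colorbar() uses the most recent one (the scatter points)
cbar = plt.colorbar(label='z value')
```

The colorbar lives in its own small Axes next to the plot, so the figure ends up holding two Axes.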

I'll get you started:

In [None]:
# Specify the "c" keyword to add precipitation data as colors!
plt.scatter(???, ???, c=???)


# Call the "colorbar()" function to add a colorbar!
plt.colorbar(label="precipitation")

# Set the title of the plot
plt.title('??')

# Label the x-axis
plt.xlabel('??')

# Label the y-axis
plt.ylabel('???')

## The nuances of plot making

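In Matplotlib, both `plt.figure()` and `fig, ax = plt.subplots()` create figures, but they have different use cases and behaviors. `plt.figure()` pairs naturally with the implicit `plt.*` functions, while `plt.subplots()` hands you Axes objects for object-oriented plotting: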

`plt.figure()`:

- `plt.figure()` is used to create a single figure object, and it returns a reference to that figure. This figure can contain one or more subplots (Axes).

- When you call `plt.plot()`, `plt.scatter()`, etc. without specifying an Axes object, Matplotlib draws on the current Axes of the current figure, creating one automatically if none exists yet.

- It is handy when you want a single plot and don't need to manage Axes objects yourself.
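You can check the second bullet directly: a new figure starts with no Axes, the first `plt.plot()` call adds one, and `plt.gca()` ("get current Axes") hands it back. A minimal sketch with made-up data:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

# A brand-new figure starts out with no Axes
fig = plt.figure(figsize=(10, 6))
n_before = len(fig.axes)

# plt.plot() draws on the current Axes, creating one because none exists yet
plt.plot([1, 2, 3], [3, 1, 2])  # made-up data
n_after = len(fig.axes)

# gca() returns that automatically created Axes
current_ax = plt.gca()

print(n_before, n_after)          # 0 1
print(current_ax is fig.axes[0])  # True
```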

Here, we will create a `figure` object and then plot on it with the `plt` functions:

In [None]:
# Create a figure with a specific size (10x6 inches)
plt.figure(figsize=(10, 6))

# Create a line plot using time on the x-axis and temperature on the y-axis
# Customize the plot with blue color, circular markers, solid lines, and marker size
plt.plot(date, temperature, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the plot
plt.title('Temperature Over Time')

# Label the x-axis
plt.xlabel('Time')

# Label the y-axis
plt.ylabel('Temperature (°F)')

Note that we can also do things like [set the limits of axes](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.ylim.html):

In [None]:
# Create a figure with a specific size (10x6 inches)
plt.figure(figsize=(10, 6))

# Create a line plot using time on the x-axis and temperature on the y-axis
# Customize the plot with blue color, circular markers, solid lines, and marker size
plt.plot(date, temperature, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the plot
plt.title('Temperature Over Time')

# Label the x-axis
plt.xlabel('Time')

# Set the y-axis limits from 0 to 100
plt.ylim(0, 100)

# Label the y-axis
plt.ylabel('Temperature (°F)')
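Called with no arguments, `plt.ylim()` returns the current limits instead of setting them, which is a quick way to confirm what a plot is showing. The object-oriented equivalents are `ax.set_ylim()` and `ax.get_ylim()`. A minimal sketch with made-up temperatures:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 6))
plt.plot([1, 2, 3], [55, 62, 48])  # hypothetical temperatures

# Set the y-axis limits, then read them back
plt.ylim(0, 100)
limits = plt.ylim()  # returns the (bottom, top) limits just set

# The same limits, read from the Axes object itself
ax_limits = plt.gca().get_ylim()
print(limits, ax_limits)
```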

In contrast, we can use `fig, ax = plt.subplots()`:

- Multiple Subplots: `plt.subplots()` creates a figure (`Figure`) and one or more subplots (`Axes`) within it. It returns the figure together with either a single Axes object (the default, one subplot) or a NumPy array of Axes when you ask for more than one.

- Explicit Axes: You explicitly create and specify the Axes objects when using `fig, ax = plt.subplots()`. This allows you to have more control over the placement and arrangement of subplots.

- Usage: It is useful when you need to create multiple subplots within a single figure, such as creating a grid of plots.

A main difference is that the methods for customizing `ax` objects often start with "`set_`", as in `ax.set_xlabel()`, where the implicit interface uses `plt.xlabel()`.
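The shape of `ax` depends on how many subplots you ask for, which is worth checking before you index into it. A minimal sketch (headless backend, no data needed):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

# One subplot: ax is a single Axes object, not an array
fig1, ax1 = plt.subplots()
print(isinstance(ax1, Axes))  # True

# One row of two subplots: ax is a 1-D NumPy array of Axes
fig2, ax2 = plt.subplots(1, 2, figsize=(12, 4))
print(ax2.shape)              # (2,)

# A 2x2 grid: ax is a 2-D array, indexed as ax[row, col]
fig3, ax3 = plt.subplots(2, 2)
print(ax3.shape)              # (2, 2)

# set_xlabel() on an Axes does what plt.xlabel() does for the current Axes
ax2[0].set_xlabel('Time')
print(ax2[0].get_xlabel())    # Time
```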

In [None]:
# Create a figure with one row of two Axes; `ax` is an array of length 2
fig, ax = plt.subplots(1, 2, figsize=(12, 4))

# Plot temperature on the first Axes
ax[0].plot(date, temperature, color='red', marker='o', linestyle='-', markersize=4)

# Set the title of the axis
ax[0].set_title('Temperature Over Time')

# Label the x-axis
ax[0].set_xlabel('Time')

# Label the y-axis
ax[0].set_ylabel('Temperature (°F)')

# Plot precipitation on the second Axes
ax[1].plot(date, weather_data['PRCP'], color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the axis
ax[1].set_title('Precipitation Over Time')

# Label the x-axis
ax[1].set_xlabel('Time')

# Label the y-axis
ax[1].set_ylabel('Precipitation (in)')
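When both panels get the same treatment, a loop over `ax` can replace the repeated calls. Here is a sketch of that pattern, with small made-up Series standing in for `temperature` and `weather_data['PRCP']`; the loop is a stylistic alternative, not something the two-panel code requires.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the real columns
date = pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-03'])
temperature = pd.Series([55, 62, 48])
precip = pd.Series([0.0, 0.5, 1.0])

fig, ax = plt.subplots(1, 2, figsize=(12, 4))

# Each tuple holds the data, color, title, and y-label for one panel
panels = [
    (temperature, 'red', 'Temperature Over Time', 'Temperature (°F)'),
    (precip, 'blue', 'Precipitation Over Time', 'Precipitation (in)'),
]

# zip() pairs each Axes with its panel settings
for a, (values, color, title, ylabel) in zip(ax, panels):
    a.plot(date, values, color=color, marker='o', linestyle='-', markersize=4)
    a.set_title(title)
    a.set_xlabel('Time')
    a.set_ylabel(ylabel)

fig.tight_layout()  # keep labels from overlapping
```

Keeping the per-panel settings in one list makes it easy to add a third panel: append a tuple and ask `plt.subplots()` for one more column.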

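But you don't need to create multiple plots if you don't want to. Called with no grid arguments, `plt.subplots()` returns a single Axes object, so you use `ax` directly instead of indexing into it: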

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
ax.plot(date, temperature, color='blue', marker='o', linestyle='-', markersize=4)

# Set the title of the axis
ax.set_title('Temperature Over Time')

# Label the x-axis
ax.set_xlabel('Time')

# Label the y-axis
ax.set_ylabel('Temperature (°F)')


## Plot with `pandas`

`pandas` actually has its own [plotting functions](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) for quick visualization of data.

You can see below that the syntax is `[name of the data frame].plot.[type of plot]` for something like a [scatterplot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.scatter.html).

In [None]:
# Use the DataFrame's built-in plot accessor to create a scatter plot
# (x = date, y = daily max temperature, colored by daily min temperature)
weather_data.plot.scatter(x='datetime', y='TMAX', c='TMIN', title='Example Plot', marker='o', cmap='viridis')
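`DataFrame.plot.scatter()` draws with matplotlib and returns the Axes it used, so you can keep customizing the plot with the `set_` methods from the previous section. A minimal sketch on a made-up DataFrame (the `day`, `TMAX`, and `TMIN` values are hypothetical):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed inside Jupyter
import pandas as pd
from matplotlib.axes import Axes

df = pd.DataFrame({
    'day': [1, 2, 3, 4],
    'TMAX': [60, 65, 58, 70],  # hypothetical daily highs (°F)
    'TMIN': [40, 45, 38, 50],  # hypothetical daily lows (°F)
})

# pandas draws the scatter with matplotlib and returns the Axes it used
ax = df.plot.scatter(x='day', y='TMAX', c='TMIN', cmap='viridis', title='Example Plot')
print(isinstance(ax, Axes))  # True

# From here on it's ordinary matplotlib
ax.set_xlabel('Day')
ax.set_ylabel('Max temperature (°F)')
```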

Whew! Congrats, you've made it to the end of the introduction to Python, data, and plotting!!

You can experiment with plotting in the cells above. If the above material was a bit of a stretch for you and you need to digest it a little, take the rest of the time to read over the content and convince yourself that you understand what each line of code does.