# Dictionaries, but mostly `pandas`

`pandas` is an alternative to Excel for managing tabular data. An excellent introduction is [here](https://pandas.pydata.org/pandas-docs/stable/user_guide/10min.html).

The two most likely ways you will turn data that is not yet a dataframe into a dataframe are (1) turning a dictionary into a dataframe or (2) loading a delimited text file into `pandas`.

# Dataframes from dictionaries

## But what's a dictionary?

Remember from two weeks ago that [dictionaries](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) (`dict`) are used for holding metadata attributes (e.g., instrument specifications, geographic coordinates) associated with environmental measurements. They are also used for organizing and accessing information, such as looking up a list of properties of a data point by its name.

Let us return to that fake weather data:

In [None]:
day = [1, 2, 3, 4, 5]
temp = [60, 65, 68, 61, 58]
humidity = [20, 25, 32, 28, 25]

If we wanted to store these lists of numbers in a dictionary, we would (1) create a dictionary, (2) create keys for the dictionary (e.g., `'day'`), and (3) assign values to those keys.

In [None]:
site_data = {}

site_data['day'] = day
site_data['temp'] = temp
site_data['humidity'] = humidity

site_data


The most important things you need to know right now about dictionaries are that every key has to be unique and that values are looked up by key, not by position. Unlike a list or array, where you can access the 2nd or 3rd element, you cannot ask a dictionary for its 2nd or 3rd entry; you have to give it a key.
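As a minimal sketch of those two points, the cell below rebuilds the fake weather dictionary and tries a few lookups. The replacement humidity values and the helper names (`second_temp`, `n_keys`, `lookup_failed`) are made up for illustration.

```python
# Rebuild the fake weather dictionary from the lists above
site_data = {
    'day': [1, 2, 3, 4, 5],
    'temp': [60, 65, 68, 61, 58],
    'humidity': [20, 25, 32, 28, 25],
}

# Look up a value by its key
temps = site_data['temp']
print(temps)

# The value stored under 'temp' is a list, so positional indexing works on the value
second_temp = site_data['temp'][1]
print(second_temp)

# Keys are unique: assigning to an existing key overwrites the old value
# (these replacement numbers are made up)
site_data['humidity'] = [21, 25, 32, 28, 25]
n_keys = len(site_data)
print(n_keys)

# Positional indexing on the dictionary itself fails, because 0 is not a key
lookup_failed = False
try:
    site_data[0]
except KeyError:
    lookup_failed = True
print(lookup_failed)
```

Note that the number of keys stays the same after the second assignment to `'humidity'`: the old list is replaced rather than a second `'humidity'` entry being added.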

## Example dictionary to pandas DataFrame

In [None]:
import pandas as pd

In [None]:
pd.DataFrame.from_dict(site_data, orient='columns')

Pandas knows to take your data and make the dictionary keys the names of columns in a DataFrame. Great!
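To check what `pandas` did with the dictionary, it can help to look at the column names and the shape of the result. The sketch below also shows that, as far as I know, passing the dictionary straight to `pd.DataFrame()` gives the same table. The names `df_from_dict` and `df_direct` are made up for this example.

```python
import pandas as pd

# Same fake weather dictionary as above
site_data = {
    'day': [1, 2, 3, 4, 5],
    'temp': [60, 65, 68, 61, 58],
    'humidity': [20, 25, 32, 28, 25],
}

# Each key becomes a column name; each list becomes that column's values
df_from_dict = pd.DataFrame.from_dict(site_data, orient='columns')

# The DataFrame constructor also accepts a dictionary of lists directly
df_direct = pd.DataFrame(site_data)

print(list(df_from_dict.columns))      # column names come from the keys
print(df_from_dict.shape)              # (number of rows, number of columns)
print(df_from_dict.equals(df_direct))  # do both approaches give the same table?
```

Every list in the dictionary has to be the same length for this to work, since each one fills a full column.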

# Pandas from delimited data

In [None]:
# Load temperature data from a CSV file

# I'm using the two dots before the 0909 folder to say "Hey, go up a directory to find 0909"
weather_data = pd.read_csv('../0909/williamsburg_meteo.csv')

# Peek at the data and particularly the column names
weather_data.head()
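"Delimited" does not have to mean comma-separated. As a rough sketch, the cell below reads tab-delimited text by passing `sep='\t'` to `pd.read_csv()`. The text is held in an `io.StringIO` object so the example runs without a file on disk. The column names mirror the ones used in this notebook, but every value is made up.

```python
import io

import pandas as pd

# Made-up, tab-delimited text standing in for a file on disk
fake_file = io.StringIO(
    "DATE\tTMAX\tTMIN\tPRCP\n"
    "2024-01-01\t50\t35\t0.00\n"
    "2024-01-02\t48\t30\t0.25\n"
    "2024-01-03\t55\t41\t0.10\n"
)

# sep tells pandas which character separates the columns
example_data = pd.read_csv(fake_file, sep='\t')

# Peek at the data and at the type pandas assigned to each column
print(example_data.head())
print(example_data.dtypes)
```

With a real file, you would pass the file path instead of `fake_file`, as in the cell above.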

We can create new columns and fill them with a single value or with the result of an operation on another column:

In [None]:
weather_data['QC'] = 'good' # a pretend "quality control column"

weather_data['datetime'] = pd.to_datetime(weather_data['DATE']) # the pd.to_datetime() just reads the dates as a specific type of data that plots well for time series

weather_data['PRCP_cm'] = weather_data['PRCP'] * 2.54 # convert inches to centimeters

weather_data.head()

## Mini-assignment 1

Precipitation and temperature data were originally given in imperial units. Create new columns where temperature values are given in the metric system (Celsius).

In [None]:
# your code here

## Mini-assignment 2

Print the **date** of the highest recorded daily rainfall in the record. Consult [the docs](https://pandas.pydata.org/docs/reference/frame.html#computations-descriptive-stats) or Google. Note that you will have to use the value in one column to find the value in another column.

In [None]:
# your code here

# `pandas` and `matplotlib`

Like in Excel, data organization is only half the fun - we want to make plots too!

In [None]:
import matplotlib.pyplot as plt

## Mini-assignment 3

Remembering all your `matplotlib` lessons from last week, make two separate plots for the highest daily temperature and the lowest daily temperature. 

In [None]:
date = ???

high_temp = ???

low_temp = ???

# Create a fig and ax object
fig, ax = plt.subplots(1, 2, figsize=(10, 6), sharey=True)

# Now ax is an array that holds two Axes objects
# ax[0] is the zeroth (first) element, ax[1] is the second element

# Plot something on the first ax object
ax[0].plot(date, high_temp, color='???', marker='o', linestyle='-', markersize=4)

# Plot something on the second ax object
ax[1].plot(date, low_temp, color='???', marker='o', linestyle='-', markersize=4)

# Each element of ax gets its own title and labels, e.g. for ax[0]:

# # Set the title of the axis
# ax[0].set_title('Temperature Over Time')

# # Label the x-axis
# ax[0].set_xlabel('Time')

# # Label the y-axis
# ax[0].set_ylabel('Temperature (°C)')


In [None]:
# Note that they can be displayed on the same plot
# It just depends on what you want to show

# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
ax.plot(date, high_temp, color='???', marker='o', linestyle='-', markersize=4)
ax.plot(date, low_temp, color='???', marker='o', linestyle='-', markersize=4)

## Using `pandas`' built-in plotting functions

`pandas` actually has its own [plotting functions](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) that use a slightly different syntax for quick visualization of data.

You can see below that the syntax is `[name of the data frame].plot.[type of plot]` for something like a [scatterplot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.scatter.html).

In [None]:
# Use the built-in plot.scatter() function to create a scatter plot
weather_data.plot.scatter(x='datetime', y='TMAX', c='TMIN', title='Example Plot', marker='o', cmap='viridis')
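The same `[name of the data frame].plot.[type of plot]` pattern should work for other plot types. As a small sketch, the cell below draws a line plot from the fake weather data used earlier. The names `fake_weather` and `line_ax` and the title text are made up for this example.

```python
import pandas as pd

# Fake weather data from the dictionary section
fake_weather = pd.DataFrame({
    'day': [1, 2, 3, 4, 5],
    'temp': [60, 65, 68, 61, 58],
    'humidity': [20, 25, 32, 28, 25],
})

# Passing a list to y draws one line per column
# The call returns the matplotlib Axes object that the lines were drawn on
line_ax = fake_weather.plot.line(
    x='day',
    y=['temp', 'humidity'],
    title='Fake weather data',
    marker='o',
)

print(len(line_ax.get_lines()))  # number of lines drawn on the axes
```

Because the call returns an ordinary `matplotlib` Axes object, you can keep customizing it with the methods you already know, such as `line_ax.set_ylabel()`.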

You can specify the `ax` object to plot on for maximum customization of axes!

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Use the built-in plot.scatter() function to create a scatter plot on our ax object
weather_data.plot.scatter(x='datetime', y='TMAX', c='TMIN', title='Example Plot', marker='o', cmap='viridis', ax=ax)

ax.set_ylim(0, 100)

ax.set_ylabel('Maximum temp (F)')

ax.set_xlabel('Date')

## Mini-assignment 4

Bringing all your knowledge together, create a figure that shows some data as both a line plot (which cannot be colored by another variable) and a scatter plot (which can be colored by another variable).

In [None]:
# Create a fig and ax object
fig, ax = plt.subplots(figsize=(10, 6))

# Plot something on the ax object
# zorder sets the drawing order: objects with lower values are drawn first, so they sit underneath
ax.plot(date, ???, color='???', linestyle='-', zorder=0)

# One way to do it is to assign the result of ax.scatter() to a variable
# I am also specifying "vmin" and "vmax", which are the minimum and maximum values for the colorbar
scatterplot = ax.scatter(date, ???, c=???, marker='o', linestyle='-',
                         vmin=???,
                         vmax=???,
                         zorder=1)

# Create the colorbar from the scatter plot variable and attach it to the ax object
colorbar = plt.colorbar(scatterplot, ax=ax)
colorbar.set_label('???')  # Set the label for the colorbar

ax.set_ylabel('???')

ax.set_xlabel('???')
