## Welcome to the precipitation trends notebook! 

#### **Audience:** Anybody with a computer and access to at least 4GB of memory.
#### **Intent:** Build familiarity with NOAA's Climate at a Glance dataset and the analysis of climate trends. 
#### **Outcome:** Statistics, graphs, and plots of precipitation trends.           
          
This is a Jupyter Notebook meant to facilitate the analysis of precipitation data over time, using historical records stored at NOAA's National Centers for Environmental Information (NCEI). Users will download and analyze real scientific data and see the precipitation trends in their desired region of the United States or the globe. Because users will be using real local data, some data records may be incomplete. As such, it is important to use scientific analysis skills to assess the usability of the chosen dataset and resulting analyses. 

#### **Read and follow the steps below before beginning the notebook.**   
1. **Go to https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/ and:**
    1. Click on the tab of your desired region (e.g. Nation, State, Division, etc.), then launch "Time Series"
    2. Choose precipitation as the Parameter.
    3. Choose a Time Scale of 12-month. 
    4. Choose the month of December, so that each 12-month value covers January through December. 
    5. Choose whichever start and end year you'd like. I'd recommend downloading the full range of data. 
    6. Uncheck the "Display Base Period" box to the right of the drop-down menus. 
    7. On the website, click "Plot" to visualize the data you've chosen. Does it look like what you want?
    8. Below the plot, there's an option to Download the data. Download it as a CSV. 
2. **(***If using Binder***) Return to your Binder window and:**
    1. Click the "Upload Files" button just below the "Run" tab. It looks like an up arrow with a line underneath. Upload the csv file you just downloaded in step 1, and make sure it appears in your list of files.
    2. Then, right-click the csv file and click the option "Rename." Change the name of the csv file to "precip_data.csv"
2. **(***If not using Binder***) Return to the program you're using to run the notebook and:** 
    1. Save the csv file you just downloaded in the same directory as your Jupyter notebook. 
    2. Change the name of your csv file to "precip_data.csv"
3. **You may begin the notebook!**

DISCLAIMER: if you choose a region with missing data points, the program will not calculate the long-term linear trendline. 
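
Before running the analysis, it can help to check whether your record has gaps. The sketch below is one possible way to do that and is not part of the original notebook: the helper name `find_missing_years` is hypothetical, the sample values are made up, and the `-99` placeholder for missing records is an assumption you should confirm against the header of your own csv file. It expects the "Date" column to already hold years, as it does after Step 2.

```python
import numpy as np
import pandas as pd

MISSING_SENTINEL = -99  # assumption: placeholder used for missing records; check your csv header

def find_missing_years(df, sentinel=MISSING_SENTINEL):
    """Return years whose Value is NaN or the sentinel, plus years absent from the record."""
    values = df["Value"].replace(sentinel, np.nan)
    flagged = df.loc[values.isna(), "Date"].tolist()
    # Every year between the first and last year should be present exactly once.
    expected = set(range(int(df["Date"].min()), int(df["Date"].max()) + 1))
    absent = sorted(expected - set(df["Date"].tolist()))
    return sorted(flagged + absent)

# Made-up sample: 2001 holds the sentinel and 2003 is absent entirely.
sample = pd.DataFrame({"Date": [2000, 2001, 2002, 2004],
                       "Value": [30.1, -99, 28.4, 31.0]})
missing_years = find_missing_years(sample)
print(missing_years)
```

If the returned list is not empty, treat the trend calculations later in the notebook with caution.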

### Step 1: Import Python modules
This section of code won't produce any output. The "import" statements below make sure the code throughout this entire notebook can be executed correctly. 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import datetime

### Step 2: Read and format the csv file to prepare it for analysis. 
This section allows you to preview the first few rows of your dataset and make sure it's been read in correctly. If the code below produces an error, there may be an issue with the csv file. Make sure it's in the same directory as this notebook, and that it's named "precip_data.csv". 

In [None]:
# Getting the geographical area from the csv.
name = pd.read_csv("precip_data.csv", usecols = [0])
if (name.columns[0])[-5:] == "ation": 
    region = (name.columns[0])[:-37]
else: 
    region = name.columns[0]
    
# Getting the variable
variable = "precipitation"

# Reading in the csv file. 
remove_header_lines = [0,1,2] # lines 0, 1, and 2 are not data, and we need to remove them before opening the CSV
precip_data = pd.read_csv("precip_data.csv", skiprows = lambda x: x in remove_header_lines)

# Now let's change the "Date" column from a string to datetime. This way, Python knows we're working with dates and will organize them chronologically during plotting.
precip_data["Date"] = pd.to_datetime(precip_data["Date"], format='%Y%m') 

# We'll also change the date to "Year" only, because this data represents a yearly average. 
precip_data["Date"] = pd.DatetimeIndex(precip_data["Date"]).year

# Let's see a preview of what's inside. 
print(precip_data.head())
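
If you'd like to see what the reading and date-formatting steps do without touching your own file, here is a minimal sketch on a made-up csv. The file contents below are invented for illustration; the layout (three non-data lines, then a "Date,Value" header, then dates written as year and month) is only what the code above assumes, so your downloaded file may differ.

```python
import io
import pandas as pd

# Hypothetical file contents, invented for illustration only.
sample_csv = """Example Region Precipitation
Units: Inches
Missing: -99
Date,Value
189512,29.52
189612,31.07
189712,30.48
"""

# Skip the three non-data lines, then read the "Date,Value" table.
sample = pd.read_csv(io.StringIO(sample_csv), skiprows=3)

# Dates arrive as numbers like 189512 (year + month); keep only the year.
sample["Date"] = pd.to_datetime(sample["Date"].astype(str), format="%Y%m").dt.year
print(sample)
```

The "Date" column now holds plain years, which is what the rest of the notebook expects.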

### Step 3: Calculate statistics from the dataset
This section helps you understand the breadth and extent of the dataset. The output allows you to learn specific, quantitative facts about the climate data in your chosen location. Although many of these statistics will be plotted later, you may find it interesting to have written facts. 

In [None]:
# Exploring the beginning and end of the dataset. 
data_start = (precip_data["Date"])[0] #here's the first row in our file.
data_end = (precip_data["Date"])[len(precip_data)-1] #and the last row in our file!
print(f"This {region} dataset begins in {data_start} and ends in {data_end}.")

# Wettest & driest years on record 
wettest = precip_data.loc[precip_data['Value'].idxmax()] #maximum precip amount in the dataset
driest = precip_data.loc[precip_data['Value'].idxmin()] #minimum precip amount in the dataset

print() #empty line for better formatting of the print statements.
print(f"The wettest year on record is {round(wettest.Date)} with an annual precipitation amount of {wettest.Value} in")
print(f"The driest year on record is {round(driest.Date)} with an annual precipitation amount of {driest.Value} in")

# Ranking the most recent year as the nth wettest on record.
precip_data["Rank"] = precip_data["Value"].rank(axis=0, method='max', ascending=False)
recent_rank = ((precip_data["Rank"])[len(precip_data)-1])

print()
print(f"The most recent year in your dataset, {data_end}, ranks number {int(recent_rank)} out of {len(precip_data)} in the wettest years on record.")

# Rate of precipitation change since beginning of record and since 1981 
z1 = np.polyfit(precip_data["Date"], precip_data['Value'], 1) #making a linear trend line with all data!
p1 = np.poly1d(z1)

index_of_1981 = int(precip_data[precip_data['Date']==1981].index[0]) #finding the location of the 1981 data so we can refine our dataset 
z2 = np.polyfit((precip_data["Date"])[index_of_1981:], (precip_data['Value'])[index_of_1981:], 1) #limiting the dataset to 1981-most recent, then making a trendline
p2 = np.poly1d(z2)

print()
print("The average rate of change of precipitation is",round((z1[0])*10,2),"in", "per decade since",data_start)
print("The average rate of change of precipitation is",round((z2[0])*10,2),"in", "per decade since 1981")


# ENSO years on record
nino = [1897,1900,1903,1906,1915,1919,1926,1931,1941,1942,1958,1966,1973,1978,1980,1983,1987,1988,1992,1995,1998,2003,2007,2010,2016] #ENSO dates from https://psl.noaa.gov/enso/past_events.html
nina = [1904,1909,1910,1911,1917,1918,1925,1934,1939,1943,1950,1951,1955,1956,1962,1971,1974,1976,1989,1999,2000,2008,2011,2012,2021,2022]

ninoprecip = []
ninaprecip = []
for y in nino:
    if data_start <= int(y) <= data_end: #skip years that fall outside our dataset
        ninoyear = precip_data.loc[(precip_data['Date']) == y] #matching El Nino years to lines in our dataset
        ninoprecip.append(ninoyear.Value.values) #and adding El Nino annual precip to their own list
for z in nina:
    if data_start <= int(z) <= data_end: #same check as above, using the La Nina year
        ninayear = precip_data.loc[(precip_data['Date']) == z] #same as above, for La Nina
        ninaprecip.append(ninayear.Value.values)

print()
print("The average precipitation amount during El Nino years is",round((sum(ninoprecip)/len(ninoprecip))[0],2),"in")
print("The average precipitation amount during La Nina years is",round((sum(ninaprecip)/len(ninaprecip))[0],2),"in")
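
The El Nino and La Nina averages can also be computed without a loop. The sketch below is an alternative, not the method used above: it uses pandas' `isin` on a small made-up table, and the two short year lists are only a subset chosen for illustration. Years that are not in the table (2003 here) are simply ignored.

```python
import pandas as pd

# Made-up annual precipitation values, for illustration only.
sample = pd.DataFrame({"Date": [1996, 1997, 1998, 1999, 2000],
                       "Value": [30.0, 28.0, 36.0, 26.0, 24.0]})

nino_years = [1998, 2003]  # 2003 is deliberately outside the sample table
nina_years = [1999, 2000]

# Keep only the rows whose year appears in each list, then average them.
nino_mean = sample.loc[sample["Date"].isin(nino_years), "Value"].mean()
nina_mean = sample.loc[sample["Date"].isin(nina_years), "Value"].mean()

print(f"El Nino mean: {nino_mean:.2f} in, La Nina mean: {nina_mean:.2f} in")
```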

### Step 4: Plot all precipitation data, including the range of precipitation amounts. 
This section allows you to visualize the wettest and driest recorded years for your location and understand how the annual precipitation amount changes from year to year.

In [None]:
# Plotting
fig, ax = plt.subplots() 
ax.plot(precip_data["Date"], precip_data['Value'], color="black")
plt.xlabel("Year")
plt.grid()
plt.ylabel("Annual Precipitation (in)") #change the y-axis label to reflect the data product you chose.
plt.title("Annual Precipitation in " + region + "\n ("+str(data_start)+" to "+str(data_end)+")") #change the plot title to reflect the data product you chose.

# Putting the wettest/driest year stats on the plot
plt.axhline(y=wettest.Value,color="blue",linestyle = 'dashed', label = f"Wettest year ({wettest.Value} in)")
plt.axhline(y=driest.Value,color="orange",linestyle = 'dashed', label = f"Driest year ({driest.Value} in)")
plt.legend()

As you can see from the above plot, the annual precipitation amount changes a lot from year-to-year, sometimes increasing and sometimes decreasing. Keep in mind that there are numerous meteorological variables that affect the precipitation that falls within a region, and that large variability over the course of a couple years is normal and expected. 

When scientists analyze precipitation amounts in the context of the climate, there are a couple of strategies we use to look past the year-to-year variability and see the larger picture.

### Step 5: Plot a rolling average to visualize general trends
This section uses a rolling average to visualize precipitation data. The "rolling average" calculation in this notebook smooths the year-to-year variability shown in the previous section so the precipitation trends become more apparent. In the graph below, each data point is the average of that year and the four years before it, so the first four years of the record have no plotted value. 

In [None]:
# Plot
fig, ax = plt.subplots()
ax.plot(precip_data["Date"], precip_data['Value'].rolling(5).mean(), color="black") 
plt.xlabel("Year")
plt.grid()
plt.ylabel("Average Annual Precipitation (in), \n averaged over 5 years") 
plt.title("Rolling Mean Average Annual Precipitation: \n" + region + " ("+str(data_start)+" to "+str(data_end)+")") 

This plot contains the same data and information as the first one, but the year-to-year variation caused by meteorological variation has mostly disappeared. Now, the larger-scale precipitation trends are more easily visualized.
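
To see exactly what `rolling(5).mean()` does, here is a small sketch on made-up numbers. Each result averages the current year and the four years before it, which is why the first four entries come back empty (NaN).

```python
import pandas as pd

# Made-up values indexed by year, for illustration only.
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
                   index=[2000, 2001, 2002, 2003, 2004, 2005])

# A window of 5 needs five values, so 2000-2003 have no result.
rolled = values.rolling(5).mean()
print(rolled)
```

The value for 2004 is the mean of 2000 through 2004, and the value for 2005 is the mean of 2001 through 2005.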

### Step 6: Plotting linear trend lines (across the whole record and since 1981)
This section also analyzes trends in the precipitation data by calculating linear trend line values. Using the annual precip data located in the csv file, this code calculates a trend over the entire recorded period, then a trend since 1981 (a common start date for climate trend calculations, as global temperatures began to show a notable increase at this time).

In [None]:
# Let's make a linear trend line!
fig, ax = plt.subplots()
ax.plot(precip_data["Date"], precip_data['Value'], color="black")
plt.xlabel("Year")
plt.grid()
plt.ylabel(f"Annual {variable} (in)") 
plt.title(f"Annual {variable} trends in {region}")

#add trendline to plot (entire range of dates)
plt.plot(precip_data["Date"], p1(precip_data["Date"]), color = "darkturquoise", label = f"Linear trendline: {data_start} to {data_end}", linewidth=3) # we already calculated the trendline above, so now we just have to plot it!
plt.legend()

#add trendline to plot (since 1981)
plt.plot((precip_data["Date"])[index_of_1981:], (p2(precip_data["Date"]))[index_of_1981:], color = "orange", label = f"Linear trendline: 1981 to {data_end}", linewidth=3) # we already calculated the trendline above, so now we just have to plot it!
plt.legend()

print(f"Equation for {data_start} to {data_end} trendline: {p1}")
print(f"Equation for 1981 to {data_end} trendline: {p2}")

Depending on the region you chose, these trendlines may look very different. Unlike temperature trends, which consistently show increases across the globe, precipitation patterns change regionally. Some places are experiencing precipitation decreases, and others are experiencing increases. In addition, precipitation events, rates, and type are also undergoing unique changes that these annual averages cannot show us. So, this trendline plot can give us a general idea of precipitation changes, but does not show the full picture. 
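
As a sanity check on the trend calculation, the sketch below fits a line to made-up data whose slope is known in advance (0.02 in per year, or 0.2 in per decade). It is an illustration rather than the notebook's own code, and it selects the recent period with a comparison (`>= 1981`) instead of a row index, so it does not need 1981 to appear in the record.

```python
import numpy as np
import pandas as pd

# Made-up record: 30 in at the start, rising by exactly 0.02 in each year.
years = np.arange(1950, 2021)
sample = pd.DataFrame({"Date": years, "Value": 30.0 + 0.02 * (years - 1950)})

# Degree-1 fit over the whole record: polyfit returns (slope, intercept).
slope_all, intercept_all = np.polyfit(sample["Date"], sample["Value"], 1)

# Same fit, restricted to 1981 onward with a boolean filter.
recent = sample[sample["Date"] >= 1981]
slope_recent, intercept_recent = np.polyfit(recent["Date"], recent["Value"], 1)

# Multiply the per-year slope by 10 to express it per decade.
print(round(slope_all * 10, 2), round(slope_recent * 10, 2))
```

Because the made-up data is perfectly linear, both fits should recover roughly the same slope. With real data the two numbers will usually differ.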

### Step 7: Plot precipitation anomalies across the whole record
The visualization of precipitation anomalies is another way that scientists analyze rainfall/drought behavior under climate change. In this scenario, an anomaly is the difference between the annual average precipitation amount (calculated for the 20th century) and the precipitation amount from each year. The plot produced will show precipitation anomalies, or the deviation from "normal" precipitation amounts, for your selected region. 

In [None]:
# Formatting dates
try:
    index_of_1901 = int(precip_data[precip_data['Date']==1901].index[0])
except IndexError: # 1901 is not in the dataset, so start the baseline at the first available year
    index_of_1901 = int(precip_data[precip_data['Date']==data_start].index[0])
index_of_2001 = int(precip_data[precip_data['Date']==2001].index[0]) 
mean_dataset = (precip_data["Value"])[index_of_1901:index_of_2001]
print(f"The average annual {variable} in {region} during the 20th century was {round(mean_dataset.mean(),2)} in")

# Calculating anomalies
precip_data["Anomaly"] = precip_data["Value"] - mean_dataset.mean() 
print()
print(precip_data.head()) 

# Plot
x = precip_data["Date"].to_numpy()
y = precip_data["Anomaly"].to_numpy()

mask1 = y > 0
mask2 = y <= 0 

fig, ax = plt.subplots()
ax.bar(x[mask1], y[mask1], color = 'royalblue') 
ax.bar(x[mask2], y[mask2], color = 'orange')
plt.xlabel("Year")
plt.grid()
plt.ylabel("Difference from 1901 - 2000 average (in)") 
plt.title(f"Annual {variable} anomalies in {region}")

In this plot, the blue bars show years that are wetter than average, and the orange bars show drier years. The taller the bar, the further the precipitation amount was from average. 
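
The anomaly arithmetic itself is a single subtraction. Here is a minimal sketch on made-up numbers, assuming the same 1901 to 2000 baseline. Only the two sample years that fall inside that period (1999 and 2000) contribute to the baseline average.

```python
import pandas as pd

# Made-up annual precipitation values, for illustration only.
sample = pd.DataFrame({"Date": [1999, 2000, 2001, 2002],
                       "Value": [28.0, 32.0, 35.0, 27.0]})

# Baseline: average of the years that fall within 1901-2000 (inclusive).
baseline = sample.loc[sample["Date"].between(1901, 2000), "Value"].mean()

# Anomaly: each year's value minus the baseline average.
sample["Anomaly"] = sample["Value"] - baseline
print(sample)
```

Positive anomalies correspond to the blue (wetter) bars and negative or zero anomalies to the orange (drier) bars.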

### References and continued reading:
EPA: https://www.epa.gov/climate-indicators/climate-change-indicators-us-and-global-precipitation

NASA: https://gpm.nasa.gov/resources/faq/how-does-climate-change-affect-precipitation

NOAA: https://www.noaa.gov/education/resource-collections/climate/climate-change-impacts

Climate.gov: https://www.climate.gov/tags/extreme-rain