# **Harmful Algal Bloom Occurrence Rate Over Time**

## Introduction: 
Harmful algal blooms (HABs) form when excess nutrient loading (phosphorus and nitrogen) in a water body combines with slow-moving water, allowing blue-green algae (cyanobacteria) to multiply into dense blooms. There are many different types of blooms. Almost all have a negative impact on the surrounding environment, and many can be harmful to humans.


## Hypothesis:
One of the main factors in HAB formation is energy availability. In response to climate change, nations have taken steps to curb emissions and limit global warming. Are the effects of these steps visible in the HAB occurrence rate? Evapotranspiration (ET) is another process that depends on water and energy: has its rate changed over time, and how do any changes compare to those in HAB occurrence? Because energy is needed to drive both ET and HAB formation, how do the two rates compare over the years?

## Studied Site:
This project looks at the state of New York. To estimate ET, a USGS stream gauge and a NOAA precipitation gauge near Little Falls, NY provide discharge and precipitation data, respectively.

## Datasets: 
1. Harmful_Algal_Bloom_Statewide_Occurrence_Summary__2012-2018.csv - Compiled by the State of New York. Provides information on HAB occurrences throughout the state from 2012 to 2018, including where each HAB occurred and how long it was on the DEC watchlist.

2. 3263069.csv - A precipitation file from NOAA. Gives snow and precipitation data for the gauge near Little Falls, NY.

3. Littlefalls.txt - A text file from the USGS. Gives discharge data for the Mohawk River near Little Falls, NY.

### Load in the data files
Load and initially clean the data 

In [None]:
# Load in libraries
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np
#%% Load in HAB data
dfhab = pd.read_csv('HAB data.csv')

#%% Load in discharge file
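# Drainage area flowing into gauge point (assumed to be in square miles)
drainage = 1342

# Load in the file, set Date as index. Skipping '#' comment lines and using the
# second remaining row as the header names the columns by their format codes:
# '20d' is the date and '14n' is the discharge.
dfdis = pd.read_csv('Littlefalls.txt', comment="#", delimiter='\t', header=1,
                    parse_dates=['20d'], index_col=['20d'])

# Drop unused columns
dfdis = dfdis.drop(columns=["5s", "15s", "10s"])

# Fill in missing values (interpolate returns a new frame, so reassign it)
dfdis = dfdis.interpolate(method='linear')

# Convert discharge to a basin-average depth in mm/day, assuming ft^3/s:
# divide by the drainage area in ft^2 (27,878,400 ft^2 per mi^2) to get ft/s,
# then multiply by 86,400 s/day * 304.8 mm/ft = 26,334,720.
dfdis["Discharge_mm"] = dfdis["14n"] / (drainage * 27878400) * 26334720

# Drop old column
dfdis = dfdis.drop(columns=["14n"])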

#%% Load in precip file
# Load in the file, set DATE as index
dfp = pd.read_csv('3263069.csv', parse_dates=['DATE'], index_col=['DATE'])

# Fill missing values with 0
dfp = dfp.fillna(0)

# Add multi-day precipitation totals (MDPR) to daily precipitation if present
if 'MDPR' in dfp.columns:
    dfp['Combined'] = dfp['MDPR'] + dfp['PRCP']
else:
    dfp['Combined'] = dfp['PRCP']

# Convert to mm (assumes the file reports inches)
dfp["Combined_Precip_mm"] = dfp["Combined"] * 25.4

# Keep only the converted column
dfp = dfp[['Combined_Precip_mm']]

#%% Initial Plots

fig1, ax1 = plt.subplots()
ax1.bar(dfhab.index, dfhab['Number of Weeks on DEC Notification List'])
ax1.set_title('Weeks on DEC Notification List')
ax1.set_ylabel('Weeks on DEC List')

fig2, ax2 = plt.subplots()
ax2.bar(dfdis.index, dfdis['Discharge_mm'])
ax2.set_title('Discharge')
ax2.set_ylabel('Discharge (mm/day)')

fig3, ax3 = plt.subplots()
ax3.bar(dfp.index, dfp['Combined_Precip_mm'])
ax3.set_title('Combined Precipitation')
ax3.set_ylabel('Precip (mm)')

plt.show()
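
Expressing discharge as a depth in mm/day means dividing the volumetric flow by the drainage area and then converting units. The sketch below works through that arithmetic under the assumption that discharge is in cubic feet per second and the drainage area is in square miles (the units USGS typically reports). The function name `cfs_to_mm_per_day` is illustrative and is not part of the notebook.

```python
# Unit constants (exact by definition)
FT2_PER_MI2 = 5280 ** 2      # square feet in one square mile
SECONDS_PER_DAY = 86_400
MM_PER_FT = 304.8


def cfs_to_mm_per_day(discharge_cfs, drainage_mi2):
    """Convert discharge in ft^3/s to a basin-average depth in mm/day.

    Assumes the drainage area is given in square miles.
    """
    # Flow divided by area gives a depth rate in ft/s
    depth_ft_per_s = discharge_cfs / (drainage_mi2 * FT2_PER_MI2)
    # Scale seconds to days and feet to millimetres
    return depth_ft_per_s * SECONDS_PER_DAY * MM_PER_FT


# Hypothetical example: 1,342 ft^3/s from a 1,342 mi^2 basin,
# i.e. exactly 1 ft^3/s per square mile
depth = cfs_to_mm_per_day(1342.0, 1342.0)
print(round(depth, 4))  # 0.9446
```

A useful sanity check is that 1 ft^3/s per square mile is a little under 1 mm/day, so daily values for a large river basin should land in the range of a few mm/day or less, not in the thousands.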

## Analysis:
- Write a function (or similar) to separate the HAB data by lake, then create a figure with multiple subplots showing HAB occurrence over time for various lakes.

- Compare the total number of weeks HABs were on DEC watchlists per year and look at year-to-year increases.

- Plot discharge, precipitation, and DEC watchlist data together and compare trends. In theory, increased precipitation leads to an increased HAB rate; discharge is less indicative of HABs but helps show water movement in the area.

- Plot monthly HAB occurrence by lake and overall to identify seasonality and any changes in it.

- Use precipitation and discharge to estimate ET, which is indicative of energy availability, another major component of HAB rate.

- Add ET to any of the plots and look for overlaps in the rates.
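
The annual comparisons listed above can be sketched as a single table. The snippet below is a minimal sketch, not the project's final method. It assumes the HAB file has a `Year` column (a hypothetical name, to be adjusted to the actual header), and it estimates ET with a simple water balance, ET ≈ P − Q. That balance ignores changes in storage (snowpack, soil moisture, groundwater), so it is more defensible at annual than at daily scales. The helper name `annual_summary` and the demo data are illustrative only.

```python
import pandas as pd


def annual_summary(dfhab, dfp, dfdis, year_col="Year",
                   weeks_col="Number of Weeks on DEC Notification List"):
    """Return one row per year: HAB weeks, precip, discharge, and ET = P - Q (mm)."""
    # Total weeks on the DEC list across all waterbodies, per year
    hab_weeks = dfhab.groupby(year_col)[weeks_col].sum().rename("HAB_weeks")
    # Annual sums of the daily precipitation and discharge depths
    precip = dfp["Combined_Precip_mm"].groupby(dfp.index.year).sum().rename("Precip_mm")
    discharge = dfdis["Discharge_mm"].groupby(dfdis.index.year).sum().rename("Discharge_mm")
    # Keep only the years present in all three series
    out = pd.concat([hab_weeks, precip, discharge], axis=1, join="inner")
    # Water-balance ET estimate (assumes negligible storage change over a year)
    out["ET_mm"] = out["Precip_mm"] - out["Discharge_mm"]
    out.index.name = "Year"
    return out


# Synthetic stand-in data (not real observations) to show the expected shapes
days = pd.date_range("2012-01-01", "2013-12-31", freq="D")
dfp_demo = pd.DataFrame({"Combined_Precip_mm": 3.0}, index=days)
dfdis_demo = pd.DataFrame({"Discharge_mm": 1.0}, index=days)
dfhab_demo = pd.DataFrame({
    "Year": [2012, 2012, 2013],
    "Number of Weeks on DEC Notification List": [2, 3, 4],
})

summary = annual_summary(dfhab_demo, dfp_demo, dfdis_demo)
print(summary)
```

With the real data, `summary.plot(subplots=True)` would give one panel per column for comparing trends, and replacing the yearly grouping with a monthly one would serve the seasonality question.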
    

## Discussion:
HABs can be toxic to humans, and at a minimum their occurrence in drinking water sources can cause shutdowns and inconveniences to the public, along with closing local recreational areas. Expound on the impacts (economic, social, environmental).

Then go into detail discussing the results. If positive, identify any measures taken in NY or nationally that could explain them (either measures targeting HABs directly, or sources such as energy influx or nutrient influx); these measures should probably also be referenced first in the hypothesis or introduction. If negative, explain why any such measures aren't working. If no measures were found, try to identify why the results appear negative, and brainstorm measures that could address the identified apparent cause. If the results are inconclusive (for example, varying or differing rates in HAB occurrence, precipitation, or discharge), explain what is explainable and suggest reasons why the remaining results may be inconclusive.

Regardless of the results, the limitations will need to be discussed. HABs throughout the state of NY were analyzed, but precipitation and discharge data were only looked at for a single site. These data are intended to characterize ET and water movement, but using one site to represent the whole state is problematic. The HAB data only go back to 2012, and the practices that led to the increase in HAB occurrence very likely began before that, so the complete picture of HAB occurrence is likely not available.