# Coral Bleaching
## Predicting the Likelihood and Severity of Coral Reef Bleaching Events Based on Various Environmental Factors

Authors: Jaimie Chin & Maro Aboelwafa  
Course: DS.UA.301 - Advanced Topics in Data Science: Machine Learning for Climate Change  
Date: 22 March 2023  

## Background
The climate issue we are tackling is predicting the likelihood and severity of coral reef bleaching events based on various environmental factors. Coral reefs are sensitive to environmental changes such as temperature, salinity, nutrient levels, and water acidity. When these factors cross certain thresholds, they can trigger coral bleaching, a process in which the coral expels the symbiotic algae living in its tissues, causing the coral to turn white and possibly die.

To address this, we aim to apply machine learning to environmental datasets, such as water temperature records, to develop models that predict the likelihood and severity of coral bleaching events.

### Data 
We will be using the Bleaching and environmental data for global coral reef sites (1980-2020) dataset from the Biological & Chemical Oceanography Data Management Office (BCO-DMO). The data records the presence and absence of coral bleaching, which allows comparative analyses and the determination of geographical bleaching thresholds. It also includes site exposure, distance to land, mean turbidity, cyclone frequency, and a suite of sea-surface temperature metrics at the time of each survey.

Data Server: [Bleaching and Environmental Data](http://dmoserv3.whoi.edu/jg/info/BCO-DMO/Coral_Reef_Brightspots/bleaching_and_env_data%7Bdir=dmoserv3.whoi.edu/jg/dir/BCO-DMO/Coral_Reef_Brightspots/,data=dmoserv3.bco-dmo.org:80/jg/serv/BCO-DMO/Coral_Reef_Brightspots/global_bleaching_environmental.brev0%7D?)


## Import Packages

```python
# Import packages & libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
```

## Load Dataset 

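```python
# Import Global Bleaching & Environmental Data
data_path = "data/global_bleaching_environmental.csv"
types = {'Distance_to_Shore': float, 'Turbidity': float, 'Percent_Bleaching': float}
# The regex separator strips whitespace around each comma; a raw string passes "\s"
# to the regex engine instead of relying on an invalid string escape (which warns)
bleach_df = pd.read_csv(data_path, sep=r'\s*,\s*', engine='python', na_values=['nd'], dtype=types)
```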

```python
# View sample of the data
bleach_df.sample(5)
```

| Unnamed: 0 | Site_ID | Sample_ID | Data_Source | Latitude_Degrees | Longitude_Degrees | Ocean_Name | Reef_ID | Realm_Name | Ecoregion_Name | Country_Name | ... | TSA_FrequencyMax | TSA_FrequencyMean | TSA_DHW | TSA_DHW_Standard_Deviation | TSA_DHWMax | TSA_DHWMean | Date | Site_Comments | Sample_Comments | Bleaching_Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 529 | 6400 | 10325615 | Donner | 20.017 | -87.462 | Atlantic | | Tropical Atlantic | Belize and west Caribbean | Mexico | ... | 8.0 | 1.0 | 2.46 | 2.06 | 19.71 | 0.68 | 2005-08-15 | | | |
| 2906 | 6485 | 10326802 | Donner | 18.66 | -87.717 | Atlantic | | Tropical Atlantic | Belize and west Caribbean | Mexico | ... | 5.0 | 0.0 | 0.0 | 0.82 | 6.48 | 0.25 | 2005-08-15 | | | |
| 36549 | 3859 | 10310555 | Reef_Check | -8.7175 | 116.7714 | Indian | 116.46.17E.8.43.03S | Central Indo-Pacific | Lesser Sunda Islands and Savu Sea | Indonesia | ... | 20.0 | 2.0 | 3.67 | 1.91 | 14.46 | 0.72 | 2003-05-06 | | | |
| 39296 | 6975 | 10321442 | Reef_Check | 9.5628 | -79.6842 | Atlantic | 9N79W2 | Tropical Atlantic | Belize and west Caribbean | Panama | ... | 13.0 | 1.0 | 1.04 | 1.7 | 12.31 | 0.52 | 1998-09-26 | | | |
| 30134 | 5999 | 10307969 | Reef_Check | 5.2322 | 103.2603 | Pacific | 103.15.37E.5.13.56N | Central Indo-Pacific | Sunda Shelf south-east Asia | Malaysia | ... | 7.0 | 1.0 | 0.0 | 1.0 | 7.65 | 0.33 | 2010-09-28 | | | |


## Data Cleaning & Exploration
* Attribute Information & Null values 
* Distributions of each attribute 
* Visualizations
* Correlation Matrix 
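A minimal sketch of these exploration steps with pandas, assuming the cleaned frame has columns like those in the sample above. The `demo_df` frame is hypothetical, filled with random values so the snippet runs without the CSV; on the real data, substitute `bleach_df`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for bleach_df: a few columns named as in the
# BCO-DMO file, filled with random values purely for illustration.
rng = np.random.default_rng(0)
n = 200
demo_df = pd.DataFrame({
    "Ocean_Name": rng.choice(["Atlantic", "Indian", "Pacific"], size=n),
    "Distance_to_Shore": rng.uniform(0, 20000, size=n),
    "Turbidity": rng.uniform(0.0, 0.3, size=n),
    "TSA_DHW": rng.gamma(1.0, 1.5, size=n),
    "Percent_Bleaching": rng.uniform(0, 100, size=n),
})
# Blank out 30 labels to mimic 'nd' entries that read_csv turns into NaN
demo_df.loc[rng.choice(n, size=30, replace=False), "Percent_Bleaching"] = np.nan

# 1. Attribute information (dtypes, non-null counts) and null values per column
demo_df.info()
null_counts = demo_df.isna().sum()
print(null_counts)

# 2. Distributions: summary statistics for each numeric attribute
print(demo_df.describe())

# 3. Correlation matrix over the numeric columns only (pairwise, NaNs skipped)
corr = demo_df.select_dtypes(include="number").corr()
print(corr.round(2))
```

On the real data, `sns.heatmap(corr, annot=True, cmap="coolwarm")` would turn the correlation matrix into one of the planned visualizations.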


## Feature Selection
* Adding & Dropping Features by Importance
* Encoding Categorical Features
* Discarding Unnecessary Features
* Imputing Missing Values if Necessary
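One possible shape for these steps, sketched under several assumptions: identifier and free-text columns are treated as uninformative, rows with no `Percent_Bleaching` value are dropped rather than imputed (they have no label), remaining numeric gaps get the column median, and a survey counts as a bleaching event whenever `Percent_Bleaching > 0`. The rows are hypothetical, and the zero threshold is a placeholder rather than a documented bleaching threshold.

```python
import numpy as np
import pandas as pd

# Hypothetical rows using a subset of the dataset's column names
raw = pd.DataFrame({
    "Site_ID": [6400, 6485, 3859, 6975],
    "Ocean_Name": ["Atlantic", "Atlantic", "Indian", "Atlantic"],
    "Turbidity": [0.05, np.nan, 0.12, 0.08],
    "TSA_DHW": [2.46, 0.0, 3.67, 1.04],
    "Percent_Bleaching": [12.5, 0.0, 40.0, np.nan],
    "Site_Comments": [None, None, None, None],
})

# Discard the identifier and free-text column (assumed to carry no signal)
features = raw.drop(columns=["Site_ID", "Site_Comments"])

# Drop rows without a bleaching observation: they cannot serve as labels
features = features.dropna(subset=["Percent_Bleaching"])

# Impute remaining numeric gaps with the column median
# (in a full pipeline, compute the median on the training split only)
features["Turbidity"] = features["Turbidity"].fillna(features["Turbidity"].median())

# One-hot encode the categorical ocean basin
features = pd.get_dummies(features, columns=["Ocean_Name"], prefix="Ocean")

# Assumed targets: any bleaching = event (likelihood); percent = severity
y_likelihood = (features["Percent_Bleaching"] > 0).astype(int)
y_severity = features["Percent_Bleaching"]
X = features.drop(columns=["Percent_Bleaching"])
print(X)
```

Keeping `Percent_Bleaching` as `y_severity` leaves room for a regression model of severity alongside the likelihood classifier, while dropping it from `X` keeps the label out of the features.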

## Model Implementation
* Random Forest
* Gradient Boosting 
* Artificial Neural Network (ANN)
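A hedged sketch of the three planned models, assuming scikit-learn (the notebook has not yet chosen a library): `RandomForestClassifier`, `GradientBoostingClassifier`, and an `MLPClassifier` standing in for the ANN. `make_classification` generates a synthetic stand-in for the encoded features and binary bleaching label, and the hyperparameters are illustrative, not tuned.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix X and binary label y
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    # Neural nets are sensitive to feature scale, so standardize first
    "ANN": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0),
    ),
}

fitted = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    fitted[name] = model
    # .score() reports mean accuracy on the held-out split
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")
```

Tree ensembles are insensitive to feature scale, which is why only the ANN is wrapped in a `StandardScaler` pipeline.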

## Evaluation & Analysis 
* Accuracy
* Precision
* Recall
* F1 score 
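All four metrics derive from confusion-matrix counts. The sketch below computes them from scratch for a binary label where 1 means bleached; `binary_metrics` is a hypothetical helper, and scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` should return the same values. If bleached surveys turn out to be the minority class, precision, recall, and F1 say more than accuracy alone.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = bleached)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # bleaching correctly flagged
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false alarms
    fn = np.sum((y_pred == 0) & (y_true == 1))  # missed bleaching events
    accuracy = np.mean(y_pred == y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": float(accuracy),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
    }


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

For these toy labels TP = 3, FP = 2, and FN = 1, so accuracy is 5/8 = 0.625, precision 3/5 = 0.6, recall 3/4 = 0.75, and F1 is about 0.667.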

## Optimization
* Cross-Validation
* Grid-Search
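A sketch of combining the two steps with scikit-learn's `GridSearchCV`, which runs k-fold cross-validation for every combination in a parameter grid. The data is synthetic, the grid is illustrative, and scoring on F1 is an assumption that follows from the metrics listed above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the encoded features and binary bleaching label
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid: 2 x 2 x 2 = 8 combinations, each scored on 5 folds
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, scoring="f1", cv=cv
)
search.fit(X_train, y_train)  # cross-validation uses the training split only

print(search.best_params_)
print(f"mean CV F1 = {search.best_score_:.3f}")
```

Stratified folds keep the class ratio similar across folds, and `X_test` stays untouched so a later evaluation is not biased by the search.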

## Final Evaluation & Analysis
* Accuracy
* Precision
* Recall
* F1 score 
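For the final pass, the tuned model would be scored once on the held-out test split. This sketch assumes scikit-learn and synthetic data; `final_model` is a hypothetical stand-in for the `best_estimator_` of a fitted `GridSearchCV`, and the class names passed to `classification_report` are illustrative labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded features and binary bleaching label
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical tuned model; in practice, reuse the grid search's best estimator
final_model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
final_model.fit(X_train, y_train)
y_pred = final_model.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["not bleached", "bleached"]))
test_f1 = f1_score(y_test, y_pred)
```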

## Final Visualizations 
* Plot coral bleaching severity?
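One way to plot severity, assuming survey coordinates and `Percent_Bleaching` are the quantities of interest: a scatter of sites colored by percent bleaching. The coordinates and values below are random placeholders, and the `Agg` backend is set only so the sketch runs headless; in the notebook, drop that line and plot `bleach_df` directly.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical survey points; on the real data, use bleach_df's columns
rng = np.random.default_rng(1)
n = 300
sites = pd.DataFrame({
    "Longitude_Degrees": rng.uniform(-180, 180, size=n),
    "Latitude_Degrees": rng.uniform(-30, 30, size=n),
    "Percent_Bleaching": rng.uniform(0, 100, size=n),
})

fig, ax = plt.subplots(figsize=(10, 4))
# Color each site by its bleaching percentage on a fixed 0-100 scale
points = ax.scatter(
    sites["Longitude_Degrees"], sites["Latitude_Degrees"],
    c=sites["Percent_Bleaching"], cmap="YlOrRd", s=12, vmin=0, vmax=100,
)
fig.colorbar(points, ax=ax, label="Percent bleaching (%)")
ax.set_xlabel("Longitude (degrees)")
ax.set_ylabel("Latitude (degrees)")
ax.set_title("Bleaching severity by survey site")
fig.savefig("bleaching_severity_map.png", dpi=150, bbox_inches="tight")
```

Fixing `vmin` and `vmax` keeps the color scale comparable if the same plot is drawn for different years or ocean basins.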