# <span style="color:#E888BB; font-size: 0%;">Introduction</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Introduction</span></b>
</div>

## <b>1 <span style='color:#FF6F61'>|</span> Problem Background</b>

Coral reefs are among the most **diverse** and **valuable ecosystems** on the planet, providing essential services such as supporting **marine biodiversity**, protecting **coastlines**, and sustaining millions of people who rely on them for **food**, **tourism**, and **livelihoods**. Despite their importance, coral reefs are increasingly threatened by a phenomenon known as **coral bleaching**. Coral bleaching occurs when corals, stressed by changes in environmental conditions—most notably increased **sea surface temperatures (SSTs)**—expel the symbiotic algae (*zooxanthellae*) that live within their tissues. These algae are crucial to the coral's survival, as they provide energy through **photosynthesis** and contribute to the coral's vibrant colors. The loss of these algae causes the corals to turn white or "bleach" and significantly increases their susceptibility to **disease** and **mortality**.

<br>

<figure style="text-align: center; margin: 20px 0;">
    <img src="./image/image5.webp" alt="Coral Bleaching" style="width:100%; height:auto; border-radius:15px; object-fit:cover; aspect-ratio: 3/1;">
    <figcaption style="font-size: 14px; color: #555; margin-top: 10px;">
        Figure 1: An example of coral bleaching caused by thermal stress.
    </figcaption>
</figure>


<br>

The primary driver of coral bleaching is **thermal stress** due to elevated SSTs. Even a small increase, such as **1°C above the normal maximum temperature**, can trigger widespread bleaching, particularly when it persists for an extended period. Additional stressors, such as **pollution**, **overfishing**, and **ocean acidification**, can compound the effects of thermal stress and make it harder for corals to recover from bleaching events.

Over the past few decades, the **frequency** and **severity** of coral bleaching events have increased significantly, largely driven by **global climate change**. As ocean temperatures continue to rise, the future of coral reefs is increasingly uncertain, with projections suggesting that most coral reefs could experience **annual severe bleaching** by the mid-21st century. The potential loss of coral reefs would have profound implications not only for **marine ecosystems** but also for human communities that depend on the goods and services provided by healthy coral reefs.

Understanding the environmental conditions that lead to coral bleaching, such as **sea surface temperature anomalies (SSTA)** and **thermal stress anomalies (TSA)**, is crucial for predicting and mitigating the impacts of future bleaching events. By studying these factors, researchers can develop **early warning systems**, inform **conservation strategies**, and guide efforts to reduce the **anthropogenic stressors** that further endanger coral reefs. Addressing coral bleaching is not only a matter of preserving **biodiversity** but is also critical for maintaining the **resilience** and **sustainability** of marine environments and the human economies that depend on them.
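
The two anomaly metrics named above can be sketched in a few lines. This is a minimal illustration, assuming the definitions given in this notebook's data dictionary: SSTA is the weekly SST minus the climatological SST for the same week, and TSA is the weekly SST minus the maximum weekly climatological SST. The function names and temperature values are hypothetical and are not taken from the dataset.

```python
def ssta(weekly_sst, weekly_clim):
    """SSTA: weekly SST minus the climatological SST for the same week."""
    return [round(s - c, 2) for s, c in zip(weekly_sst, weekly_clim)]


def tsa(weekly_sst, weekly_clim):
    """TSA: weekly SST minus the maximum weekly climatological SST."""
    clim_max = max(weekly_clim)
    return [round(s - clim_max, 2) for s in weekly_sst]


# Hypothetical values in degrees Celsius for four consecutive weeks.
weekly_sst = [28.0, 29.5, 30.5, 31.0]
weekly_clim = [27.5, 28.5, 29.0, 29.5]

ssta_values = ssta(weekly_sst, weekly_clim)
tsa_values = tsa(weekly_sst, weekly_clim)
print(ssta_values)
print(tsa_values)
```

Because TSA is measured against the warmest climatological week, it stays at or below zero until the water is warmer than the site's usual annual peak, whereas SSTA can be positive in any season.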

<br>

<img src="./image/image6.webp" alt="Coral Bleaching" style="width:100%; height:auto; border-radius:15px; object-fit:cover; aspect-ratio: 3/1;">

<img src="https://i.ibb.co/bNj4ZTZ/image5.webp" alt="Notebook Cover Image" style="width:100%; height:auto; border-radius:15px; object-fit:cover; aspect-ratio: 3/1;">
<br>

## <b>2 <span style='color:#FF6F61'>|</span> Objectives</b> 

This notebook analyzes the factors contributing to coral bleaching through a structured workflow of data exploration, preprocessing, model training, and evaluation. Its primary goals are as follows:

1. **Data Exploration**:
   - **Initial Dataset Inspection**: Load the dataset and conduct an initial exploration to understand its structure, including the types of data, the range of values, and potential anomalies.
   - **Descriptive Analysis**: Provide a summary of key variables related to coral bleaching and environmental conditions, such as sea surface temperature anomalies (SSTA) and thermal stress anomalies (TSA).
   - **Visualization**: Utilize various plots (e.g., histograms, scatter plots, and maps) to visualize distributions, relationships, and geographical patterns in the data.

2. **Data Preprocessing**:
   - **Data Cleaning**: Identify and handle missing values, outliers, and inconsistencies within the dataset to ensure the quality and reliability of the analysis.
   - **Feature Engineering**: Create new features or transform existing ones (e.g., normalizing temperature values, calculating additional stress metrics) to enhance the dataset's utility for modeling.
   - **Data Transformation**: Convert categorical data into numeric formats, scale numerical features, and prepare the data for model training.

3. **Model Training, Tuning, and Evaluation**:
   - **Model Selection**: Choose appropriate machine learning models to predict coral bleaching events based on environmental factors, focusing on models that are well-suited to the nature of the data (e.g., regression, classification models).
   - **Model Training**: Train the selected models using the processed dataset, applying techniques such as cross-validation to ensure robust performance.
   - **Hyperparameter Tuning**: Optimize the models by tuning hyperparameters to improve predictive accuracy.
   - **Model Evaluation**: Assess model performance using relevant metrics (e.g., accuracy, precision, recall, RMSE) and compare the results to select the best-performing model.

4. **Conclusion**:
   - **Summary of Findings**: Summarize the key insights gained from the data analysis and modeling, particularly in relation to the environmental drivers of coral bleaching.
   - **Implications for Conservation**: Discuss the potential applications of the findings for coral reef conservation efforts, including predictive modeling and risk assessment.
   - **Future Directions**: Provide suggestions for further research or analysis, particularly in areas that could enhance the understanding of coral reef health and resilience.

This structured approach is intended to give a thorough analysis of the dataset, highlight the environmental factors that influence coral bleaching, and provide a foundation for predictive modeling in coral reef conservation.
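
The evaluation metrics listed under Model Evaluation can be written out directly. The sketch below is only an illustration of the formulas. It assumes a binary target (1 = bleaching observed) for accuracy, precision, and recall, and a continuous target such as percent bleaching for RMSE. All labels, predictions, and function names are hypothetical.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def rmse(y_true, y_pred):
    """Root mean squared error for a continuous target."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical classification labels and predictions.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(labels, predicted)

# Hypothetical percent-bleaching values and predictions.
error = rmse([10.0, 0.0, 40.0, 20.0], [13.0, 4.0, 40.0, 20.0])
print(accuracy, precision, recall, error)
```

Precision and recall are worth reporting alongside accuracy when bleaching events are rare in the data, since a model that never predicts bleaching can still score a high accuracy.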

| **Supplied Name**                  | **Supplied Description**                                                                                     | **Supplied Units**                   | **Standard Name**          |
|------------------------------------|--------------------------------------------------------------------------------------------------------------|--------------------------------------|----------------------------|
| **Site_ID**                        | Unique identifier for each site                                                                               | unitless                             | site                       |
| **Sample_ID**                      | Unique identifier for each sampling event                                                                     | unitless                             | sample                     |
| **Data_Source**                    | Source of data set                                                                                            | unitless                             | sampling_method            |
| **Latitude_Degrees**               | Latitude coordinates (positive values = North; negative values = South)                                       | degrees North                        | lat                        |
| **Longitude_Degrees**              | Longitude coordinates (positive values = East; negative values = West)                                        | degrees East                         | lon                        |
| **Ocean_Name**                     | The ocean in which the sampling took place                                                                    | unitless                             | region                     |
| **Reef_ID**                        | Unique identifier from Reef Check data                                                                        | unitless                             | Site_ID                    |
| **Realm_Name**                     | Identification of realm as defined by the Marine Ecoregions of the World (MEOW; Spalding et al. 2007)        | unitless                             | region                     |
| **Ecoregion_Name**                 | Identification of the Ecoregions (150) as defined by Veron et al.                                             | unitless                             | region                     |
| **Country_Name**                   | The country where sampling took place                                                                         | unitless                             | region                     |
| **State_Island_Province_Name**     | The state, territory (e.g., Guam) or island group (e.g., Hawaiian Islands) where sampling took place          | unitless                             | region                     |
| **City_Town_Name**                 | The region, city, or nearest town, where sampling took place                                                  | unitless                             | region                     |
| **Site_Name**                      | The accepted name of the site or the name given by the team that sampled the reef                             | unitless                             | site                       |
| **Distance_to_Shore**              | The distance of the sampling site from the nearest land                                                       | meters (m)                           | length                     |
| **Exposure**                       | The site's exposure to fetch, considering factors such as fetch distance, winds, and geographical conditions   | unitless                             | site_descrip               |
| **Turbidity**                      | Kd490 with a 100-km buffer, related to the diffuse attenuation coefficient of light at the 490 nm wavelength   | reciprocal meters (m⁻¹)              | turbidity                  |
| **Cyclone_Frequency**              | Number of cyclone events from 1964 to 2014                                                                    | unitless                             | site_descrip               |
| **Date_Day**                       | The day of the sampling event                                                                                 | unitless                             | day                        |
| **Date_Month**                     | The month of the sampling event                                                                               | unitless                             | month                      |
| **Date_Year**                      | The year of the sampling event                                                                                | unitless                             | year                       |
| **Depth_m**                        | Depth of sampling site                                                                                        | meters (m)                           | depth_w                    |
| **Substrate_Name**                 | Type of substrate from Reef Check data                                                                        | unitless                             | site_descrip               |
| **Percent_Cover**                  | Average cover value (percent)                                                                                 | percent                              | percent_cover              |
| **Bleaching_Level**                | Level at which bleaching was recorded (Reef Check data): coral population or coral colony                     | unitless                             | site_descrip               |
| **Percent_Bleaching**              | An average of four transect segments (Reef Check) or average of a bleaching code                              | percent                              | bleach_percent             |
| **ClimSST**                        | Climatological sea surface temperature (SST) based on weekly SSTs for the study time frame                    | degrees Celsius                      | SST                        |
| **Temperature_Kelvin**             | Temperature in Kelvin                                                                                         | Kelvin                               | temp                       |
| **Temperature_Mean**               | Mean Temperature                                                                                              | degrees Celsius                      | temp                       |
| **Temperature_Minimum**            | Minimum Temperature                                                                                           | degrees Celsius                      | temp                       |
| **Temperature_Maximum**            | Maximum Temperature                                                                                           | degrees Celsius                      | temp                       |
| **Temperature_Kelvin_Standard_Deviation** | Standard deviation of temperature                                                                          | Kelvin                               | temp                       |
| **Windspeed**                      | Windspeed                                                                                                     | meters per hour                      | wind_speed                 |
| **SSTA**                           | Sea Surface Temperature Anomaly: weekly SST minus weekly climatological SST                                   | degrees Celsius                      | SST                        |
| **SSTA_Standard_Deviation**        | The Standard Deviation of weekly SST Anomalies over the entire time period                                    | degrees Celsius                      | SST                        |
| **SSTA_Mean**                      | The mean SSTA over the entire time period                                                                     | degrees Celsius                      | SST                        |
| **SSTA_Minimum**                   | The minimum SSTA over the entire time period                                                                  | degrees Celsius                      | SST                        |
| **SSTA_Maximum**                   | The maximum SSTA over the entire time period                                                                  | degrees Celsius                      | SST                        |
| **SSTA_Frequency**                 | Sea Surface Temperature Anomaly Frequency: number of times over the previous 52 weeks that SSTA ≥1 degree C   | SSTA per time period                 | SST                        |
| **SSTA_Frequency_Standard_Deviation** | The standard deviation of SSTA_Frequency over the entire time period                                        | SSTA per time period                 | SST                        |
| **SSTA_FrequencyMax**              | The maximum SSTA_Frequency over the entire time period                                                        | SSTA per time period                 | SST                        |
| **SSTA_FrequencyMean**             | The mean SSTA_Frequency over the entire time period                                                           | SSTA per time period                 | SST                        |
| **SSTA_DHW**                       | Sea Surface Temperature Degree Heating Weeks: sum of previous 12 weeks when SSTA ≥1 degree C                 | weeks                                | time_elapsed               |
| **SSTA_DHW_Standard_Deviation**    | The standard deviation of SSTA_DHW over the entire time period                                                | weeks                                | SST                        |
| **SSTA_DHWMax**                    | The maximum SSTA_DHW over the entire time period                                                              | weeks                                | SST                        |
| **SSTA_DHWMean**                   | The mean SSTA_DHW over the entire time period                                                                 | weeks                                | SST                        |
| **TSA**                            | Thermal Stress Anomaly: Weekly sea surface temperature minus the maximum of weekly climatological SST          | degrees Celsius                      | temp                       |
| **TSA_Standard_Deviation**         | The standard deviation of TSA over the entire time period                                                     | degrees Celsius                      | temp                       |
| **TSA_Minimum**                    | The minimum TSA over the entire time period                                                                   | degrees Celsius                      | temp                       |
| **TSA_Maximum**                    | The maximum TSA over the entire time period                                                                   | degrees Celsius                      | temp                       |
| **TSA_Mean**                       | The mean TSA over the entire time period                                                                      | degrees Celsius                      | temp                       |
| **TSA_Frequency**                  | Thermal Stress Anomaly Frequency: number of times over previous 52 weeks that TSA ≥1 degree C                 | TSA per time period                  | temp                       |
| **TSA_Frequency_Standard_Deviation** | The standard deviation of frequency of thermal stress anomalies over the entire time period                  | TSA per time period                  | temp                       |
| **TSA_FrequencyMax**               | The maximum TSA_Frequency over the entire time period                                                         | TSA per time period                  | temp                       |
| **TSA_FrequencyMean**              | The mean TSA_Frequency over the entire time period                                                            | TSA per time period                  | temp                       |
| **TSA_DHW**                        | Thermal Stress Anomaly (TSA) Degree Heating Week (DHW): Sum of previous 12 weeks when TSA ≥1 degree C         | weeks                                | time_elapsed               |
| **TSA_DHW_Standard_Deviation**     | The standard deviation of TSA_DHW over the entire time period                                                 | weeks                                | time_elapsed               |
| **TSA_DHWMax**                     | The maximum TSA_DHW over the entire time period                                                               | weeks                                | time_elapsed               |
| **TSA_DHWMean**                    | The mean TSA_DHW over the entire time period                                                                  | weeks                                | time_elapsed               |
| **Date**                           | Date of sampling event in format YYYY-MM-DD                                                                   | unitless (format: %Y-%m-%d)          | date                       |
| **Site_Comments**                  | Comments on any issues with the site, or additional information about it                                      | unitless                             | comment                    |
| **Sample_Comments**                | Comments on any issues with the sampling event, or additional information about it                            | unitless                             | comment                    |
| **Bleaching_Comments**             | Comments on any issues with the bleaching value, or additional information about it                           | unitless                             | comment                    |
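
The frequency and degree heating week (DHW) columns in the table are rolling-window summaries of a weekly anomaly series. The sketch below shows one way to compute them with pandas. It rests on two assumptions that the table does not confirm: that "sum of previous 12 weeks when TSA ≥1 degree C" means the sum of the anomaly values in qualifying weeks, and that the window includes the current week. The function names and input values are hypothetical.

```python
import pandas as pd


def degree_heating_weeks(anomaly, window=12, threshold=1.0):
    """Rolling sum of anomalies at or above the threshold (assumed DHW reading)."""
    qualifying = anomaly.where(anomaly >= threshold, 0.0)
    return qualifying.rolling(window, min_periods=1).sum()


def anomaly_frequency(anomaly, window=52, threshold=1.0):
    """Rolling count of weeks at or above the threshold."""
    flags = (anomaly >= threshold).astype(int)
    return flags.rolling(window, min_periods=1).sum()


# Hypothetical weekly TSA values in degrees Celsius, oldest first.
tsa = pd.Series([0.2, 1.0, 1.5, 0.8, 2.0, 0.0, 1.2, 0.5])

# Short windows are used so the toy series shows values leaving the window;
# the defaults (12 and 52 weeks) follow the data dictionary.
dhw = degree_heating_weeks(tsa, window=4)
frequency = anomaly_frequency(tsa, window=6)
print(dhw.round(2).tolist())
print(frequency.tolist())
```

The same two functions apply to SSTA, since the SSTA and TSA summary columns differ only in the baseline used to compute the anomaly.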


# <span style="color:#E888BB; font-size: 0%;">Environment Setup</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Environment Setup</span></b>
</div>

In [2]:
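import pandas as pd

# Path to the bleaching dataset, relative to the notebook's working directory.
file_path = 'dataset/global_bleaching_environmental.csv'

# Load the CSV into a DataFrame.
df = pd.read_csv(file_path)

# Preview the first five rows.
df.head()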



Unnamed: 0,Site_ID,Sample_ID,Data_Source,Latitude_Degrees,Longitude_Degrees,Ocean_Name,Reef_ID,Realm_Name,Ecoregion_Name,Country_Name,...,TSA_FrequencyMax,TSA_FrequencyMean,TSA_DHW,TSA_DHW_Standard_Deviation,TSA_DHWMax,TSA_DHWMean,Date,Site_Comments,Sample_Comments,Bleaching_Comments
0,2501,10324336,Donner,23.163,-82.526,Atlantic,nd,Tropical Atlantic,Cuba and Cayman Islands,Cuba,...,5,0,0.0,0.74,7.25,0.18,2005-09-15,nd,nd,nd
1,3467,10324754,Donner,-17.575,-149.7833,Pacific,nd,Eastern Indo-Pacific,Society Islands French Polynesia,French Polynesia,...,4,0,0.26,0.67,4.65,0.19,1991-03-15,The bleaching does not appear to have gained ...,The bleaching does not appear to have gained ...,nd
2,1794,10323866,Donner,18.369,-64.564,Atlantic,nd,Tropical Atlantic,Hispaniola Puerto Rico and Lesser Antilles,United Kingdom,...,7,0,0.0,1.04,11.66,0.26,2006-01-15,nd,nd,nd
3,8647,10328028,Donner,17.76,-64.568,Atlantic,nd,Tropical Atlantic,Hispaniola Puerto Rico and Lesser Antilles,United States,...,4,0,0.0,0.75,5.64,0.2,2006-04-15,nd,nd,nd
4,8648,10328029,Donner,17.769,-64.583,Atlantic,nd,Tropical Atlantic,Hispaniola Puerto Rico and Lesser Antilles,United States,...,5,0,0.0,0.92,6.89,0.25,2006-04-15,nd,nd,nd
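
The preview shows the string `nd` in several columns, for example `Reef_ID` and the comment fields. A minimal sketch of handling this at load time follows, assuming `nd` always stands for "no data" (the notebook does not state this explicitly). It reads a small in-memory CSV with made-up values rather than the real file. `na_values` and `low_memory` are standard `pandas.read_csv` arguments.

```python
import io

import pandas as pd

# Tiny stand-in for the real file; all values are illustrative only.
sample_csv = io.StringIO(
    "Site_ID,Reef_ID,Depth_m,Site_Comments\n"
    "2501,nd,10.0,nd\n"
    "3467,nd,nd,Some bleaching observed\n"
    "1794,R-17,7.5,nd\n"
)

# Assumption: 'nd' marks missing data in every column.
# low_memory=False reads the file in one pass so each column gets one inferred type.
sample = pd.read_csv(sample_csv, na_values=["nd"], low_memory=False)

# Count missing entries per column after the placeholder is converted to NaN.
missing_per_column = sample.isna().sum()
print(sample.dtypes)
print(missing_per_column)
```

With the placeholder declared, `Depth_m` is parsed as a float column instead of text, and the missing counts reflect the actual gaps in the data.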


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 41361 entries, 0 to 41360
Data columns (total 62 columns):
 #   Column                                 Non-Null Count  Dtype  
---  ------                                 --------------  -----  
 0   Site_ID                                41361 non-null  int64  
 1   Sample_ID                              41361 non-null  int64  
 2   Data_Source                            41361 non-null  object 
 3   Latitude_Degrees                       41361 non-null  float64
 4   Longitude_Degrees                      41361 non-null  float64
 5   Ocean_Name                             41361 non-null  object 
 6   Reef_ID                                41361 non-null  object 
 7   Realm_Name                             41361 non-null  object 
 8   Ecoregion_Name                         41361 non-null  object 
 9   Country_Name                           41361 non-null  object 
 10  State_Island_Province_Name             41361 non-null  object 
 11  Ci

In [5]:
df['Bleaching_Level'].describe()

count          41361
unique             2
top       Population
freq           22531
Name: Bleaching_Level, dtype: object

In [7]:
df.describe(include='all')

Unnamed: 0,Site_ID,Sample_ID,Data_Source,Latitude_Degrees,Longitude_Degrees,Ocean_Name,Reef_ID,Realm_Name,Ecoregion_Name,Country_Name,...,TSA_FrequencyMax,TSA_FrequencyMean,TSA_DHW,TSA_DHW_Standard_Deviation,TSA_DHWMax,TSA_DHWMean,Date,Site_Comments,Sample_Comments,Bleaching_Comments
count,41361.0,41361.0,41361,41361.0,41361.0,41361,41361,41361,41361,41361,...,41361.0,41361.0,41361.0,41361.0,41361.0,41361.0,41361,41361,41361,41361
unique,,,9,,,5,5355,9,115,91,...,231.0,77.0,1192.0,417.0,1706.0,250.0,5212,862,1305,8
top,,,Reef_Check,,,Pacific,nd,Central Indo-Pacific,Bahamas and Florida Keys,United States,...,7.0,1.0,0.0,1.01,6.68,0.24,2005-08-15,nd,nd,nd
freq,,,28821,,,21896,12540,19101,4227,4611,...,5549.0,19080.0,28362.0,890.0,377.0,1373.0,519,39104,38403,38692
mean,74558.16,10128800.0,,7.558085,34.966127,,,,,,...,,,,,,,,,,
std,252041.8,1373151.0,,15.732185,103.404598,,,,,,...,,,,,,,,,,
min,1.0,9623.0,,-30.2625,-179.9743,,,,,,...,,,,,,,,,,
25%,3502.0,10311080.0,,-4.9025,-78.3856,,,,,,...,,,,,,,,,,
50%,5925.0,10316280.0,,10.7761,96.8433,,,,,,...,,,,,,,,,,
75%,8368.0,10321490.0,,20.0505,120.8804,,,,,,...,,,,,,,,,,
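
In the summary above, columns such as `TSA_DHW` report `unique`, `top`, and `freq` but no mean, which is how pandas summarizes non-numeric columns. The sketch below shows one way to check how much of a text column actually parses as numbers and to convert it when most of it does. It assumes placeholder strings such as `nd` are the reason these columns were read as text, which the output suggests but does not confirm. The frame, the helper name, and the 0.5 cut-off are all hypothetical.

```python
import pandas as pd

# Illustrative frame: a numeric measurement stored as text with an 'nd' marker.
demo = pd.DataFrame(
    {
        "TSA_DHW": ["0.0", "0.26", "nd", "1.5"],
        "Ocean_Name": ["Atlantic", "Pacific", "Atlantic", "Indian"],
    }
)


def numeric_share(column):
    """Fraction of entries that parse as numbers."""
    parsed = pd.to_numeric(column, errors="coerce")
    return float(parsed.notna().mean())


shares = {name: numeric_share(demo[name]) for name in demo.columns}

# Convert columns that are mostly numeric; unparseable entries become NaN.
# The 0.5 cut-off is an arbitrary choice for this sketch.
for name, share in shares.items():
    if share > 0.5:
        demo[name] = pd.to_numeric(demo[name], errors="coerce")

print(shares)
print(demo.dtypes)
```

After conversion, `describe()` on such a column reports a mean, standard deviation, and quartiles, and the former placeholders are counted as missing values.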


# <span style="color:#E888BB; font-size: 0%;">Exploratory Data Analysis (EDA)</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Exploratory Data Analysis (EDA)</span></b>
</div>

# <span style="color:#E888BB; font-size: 0%;">Data Preprocessing</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Data Preprocessing</span></b>
</div>

# <span style="color:#E888BB; font-size: 0%;">Modeling</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Modeling</span></b>
</div>

# <span style="color:#E888BB; font-size: 0%;">Hyperparameter Tuning</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Hyperparameter Tuning</span></b>
</div>

# <span style="color:#E888BB; font-size: 0%;">Final Model and Conclusion</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>Final Model and Conclusion</span></b>
</div>

<br>

![](./image/image7.webp)

<br>

# <span style="color:#E888BB; font-size: 0%;">References</span>
<div 
    style="
        padding: 30px; 
        color: white; 
        margin: 10px; 
        font-size: 170%; 
        text-align: left; 
        display: center; 
        border-radius: 10px; 
        background-color: rgba(0, 0, 0, 0.2);  /* Dark overlay */
        overflow: hidden; 
        background-image: url(https://i.ibb.co/1fn8BZj/image4.webp); 
        background-size: cover; 
        background-position: center;
        background-blend-mode: darken;  /* Blend the dark overlay with the image */
    "
>
    <b><span style='color: white;'>References</span></b>
</div>