# COGS 108 - Project Proposal

## Authors

- Paris Aguilar-Ulloa
- Kali Anchlia
- Julia Berdeski
- Ananya Kharya
- James Yi


## Research Question

Over the past 20 years, what distinct recovery trajectories in percent live coral cover are observed across U.S. coral reef sites following stress events, and how are thermal stress and algal cover associated with these differing pathways?

## Background and Prior Work


Coral reefs are highly sensitive marine ecosystems that provide important ecological, economic, and coastal protection benefits. In recent years, coral reef health has declined worldwide due to rising sea surface temperatures, more frequent marine heatwaves, and local stressors such as algal overgrowth. Thermal stress, often measured using Degree Heating Weeks (DHW), is strongly linked to coral bleaching events, which can lead to partial or complete loss of live coral cover. <a name="cite_ref-1"></a>[<sup>1</sup>](#cite_note-1) While many studies document coral decline following thermal stress, less is known about how coral reefs recover over time and whether recovery follows consistent patterns across different sites. <a name="cite_ref-2"></a>[<sup>2</sup>](#cite_note-2)

As students at the University of California, San Diego, we work in an academic environment shaped by the university’s strengths in marine biology, oceanography, and climate science, notably through the Scripps Institution of Oceanography. UC San Diego researchers have made major contributions to coral reef monitoring and the study of climate-driven marine stressors, which adds to our motivation for this project. Publicly available datasets from NOAA and the National Coral Reef Monitoring Program align well with UC San Diego’s emphasis on data-driven marine science and provide a strong foundation for this analysis.

Previous research has shown that coral recovery trajectories can vary widely depending on environmental conditions and local ecological dynamics. Studies using NOAA Coral Reef Watch data have found that higher DHW values are associated with more severe bleaching and increased coral mortality, often resulting in slow or incomplete recovery. <a name="cite_ref-3"></a>[<sup>3</sup>](#cite_note-3) More recently, monitoring efforts such as the National Coral Reef Monitoring Program (NCRMP) have provided long-term, site-level data on percent live coral and algal cover across U.S. reef systems. <a name="cite_ref-4"></a>[<sup>4</sup>](#cite_note-4)

While many studies using these data focus on overall trends in reef health, fewer explicitly examine differences in recovery pathways over time. Our project builds on this prior work by identifying coral recovery trajectories and analyzing how thermal stress and algal cover, both individually and together, are associated with differences in recovery outcomes across U.S. coral reef sites.


1. <a name="cite_note-1"></a> [^](#cite_ref-1) NOAA Coral Reef Watch. (n.d.). NOAA Coral Reef Watch Daily 5km Satellite Coral Bleaching Heat Stress Monitoring Products (Version 3.1). https://coralreefwatch.noaa.gov/product/5km/index.php
2. <a name="cite_note-2"></a> [^](#cite_ref-2) Hughes et al. (2017, March 16). Global warming and recurrent mass bleaching of corals. Nature News. https://www.nature.com/articles/nature21707
3. <a name="cite_note-3"></a> [^](#cite_ref-3) Liu, G. et al. (2014, November 20). Reef-scale thermal stress monitoring of coral ecosystems: New 5-km global products from NOAA Coral Reef Watch. MDPI. https://www.mdpi.com/2072-4292/6/11/11579
4. <a name="cite_note-4"></a> [^](#cite_ref-4) National Coral Reef Monitoring Program: Tracking Environmental Conditions. NCRMP | Environmental. (n.d.). https://coralreef.noaa.gov/topics/national-coral-reef-monitoring-program/environmental


## Hypothesis


We hypothesize that coral reef sites exposed to higher cumulative thermal stress will exhibit recovery trajectories characterized by slower increases or sustained declines in percent live coral cover. We also predict that sites with higher algal cover will show poorer recovery outcomes.
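
One way we might check this hypothesis is to summarize each site's trajectory as the slope of percent live coral cover over time and then compare those slopes with cumulative thermal stress. The sketch below is only illustrative: the site names, survey years, cover values, and the `cum_dhw` field are made-up toy numbers, not NCRMP or Coral Reef Watch data, and a single linear slope is an assumed simplification of a recovery trajectory.

```python
import math

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (cover change per year)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy sites: survey years, percent live coral cover,
# and an assumed cumulative DHW value over the same period.
sites = {
    "site_A": {"years": [2014, 2016, 2018, 2020],
               "cover": [10.0, 12.0, 14.0, 16.0], "cum_dhw": 4.0},
    "site_B": {"years": [2014, 2016, 2018, 2020],
               "cover": [20.0, 19.0, 18.0, 17.0], "cum_dhw": 12.0},
    "site_C": {"years": [2014, 2016, 2018, 2020],
               "cover": [15.0, 15.5, 16.0, 16.5], "cum_dhw": 7.0},
}

# One slope per site, in percentage points of cover per year.
slopes = {name: ols_slope(s["years"], s["cover"]) for name, s in sites.items()}

# Association between cumulative thermal stress and trajectory slope.
names = sorted(sites)
r = pearson_r([sites[n]["cum_dhw"] for n in names], [slopes[n] for n in names])

for name in names:
    print(f"{name}: slope = {slopes[name]:+.2f} pct points/year")
print(f"Pearson r (cum_dhw vs. slope) = {r:.2f}")
```

A negative correlation in real data would be consistent with the hypothesis, but it would remain an association rather than evidence of causation. The same comparison could be repeated with algal cover in place of `cum_dhw`.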

## Data

**Data:**

Our ideal dataset would contain the following variables, among others:

* Percent live coral cover
* Percent algal cover at the reef site
* Water temperature at the reef site
* A standardized measure of the surveyed area
<br>

Since we are trying to determine recovery trajectories, we would ideally have continuous data collected weekly or biweekly, if not daily. The data would ideally be available as CSV files and would be filtered to the relevant regions so they are easy to download and store.

We found two major existing datasets for this study. We will get percent live coral cover and algal cover from the National Coral Reef Monitoring Program (NCRMP) benthic data for Florida, the U.S. Virgin Islands, and Puerto Rico. Thermal stress data will come from the NOAA Coral Reef Watch Daily Global 5km Satellite Coral Bleaching Degree Heating Week product. These data were filtered to the coordinates of the Caribbean region, which includes the daily DHW logs for Florida, the U.S. Virgin Islands, and Puerto Rico. Both datasets are publicly available and can be easily accessed and downloaded.
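
Because the DHW logs are daily while the benthic cover data are recorded per survey, the two sources will need to be brought to a common time scale before they can be combined. The sketch below shows one possible approach with pandas: take the annual maximum DHW per region and left-join it onto the cover records. All column names (`region`, `date`, `dhw`, `year`, `coral_cover_pct`, `algal_cover_pct`) and all values are assumptions for illustration, not the actual schema of either dataset, and annual maximum is only one of several reasonable summaries.

```python
import pandas as pd

# Toy stand-in for daily DHW logs (hypothetical columns and values).
dhw_daily = pd.DataFrame({
    "region": ["Florida"] * 4 + ["Puerto Rico"] * 4,
    "date": pd.to_datetime([
        "2015-08-01", "2015-09-01", "2016-08-01", "2016-09-01",
        "2015-08-01", "2015-09-01", "2016-08-01", "2016-09-01",
    ]),
    "dhw": [2.0, 6.5, 1.0, 3.0, 4.0, 8.0, 0.5, 2.5],
})

# Toy stand-in for benthic cover records (hypothetical columns and values).
benthic = pd.DataFrame({
    "region": ["Florida", "Florida", "Puerto Rico", "Puerto Rico"],
    "year": [2015, 2016, 2015, 2016],
    "coral_cover_pct": [12.0, 9.5, 18.0, 15.0],
    "algal_cover_pct": [35.0, 41.0, 28.0, 33.0],
})

def annual_max_dhw(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily DHW values to one maximum per region and year."""
    with_year = daily.assign(year=daily["date"].dt.year)
    return (
        with_year.groupby(["region", "year"], as_index=False)["dhw"]
        .max()
        .rename(columns={"dhw": "max_dhw"})
    )

def merge_cover_with_dhw(cover: pd.DataFrame, daily: pd.DataFrame) -> pd.DataFrame:
    """Left-join annual DHW onto cover records so no survey row is dropped."""
    return cover.merge(annual_max_dhw(daily), on=["region", "year"], how="left")

merged = merge_cover_with_dhw(benthic, dhw_daily)
print(merged)
```

A left join keeps every cover record even when no thermal stress value matches, so gaps show up as missing values instead of silently disappearing.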



## Ethics 

Instructions: Keep the contents of this cell. For each item on the checklist
-  put an X there if you've considered the item
-  IF THE ITEM IS RELEVANT place a short paragraph after the checklist item discussing the issue.
  
Items on this checklist are meant to provoke discussion among good-faith actors who take their ethical responsibilities seriously. Your teams will document these discussions and decisions for posterity using this section.  You don't have to solve these problems, you just have to acknowledge any potential harm no matter how unlikely.

Here is a [list of real world examples](https://deon.drivendata.org/examples/) for each item in the checklist that can refer to.

[![Deon badge](https://img.shields.io/badge/ethics%20checklist-deon-brightgreen.svg?style=popout-square)](http://deon.drivendata.org/)

### A. Data Collection
 - [X] **A.1 Informed consent**: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?

> This project does not involve human subjects or individual-level data. All datasets used are publicly available environmental monitoring datasets collected by government agencies (e.g., NOAA) using standardized ecological survey methods.

 - [X] **A.2 Collection bias**: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?

> Collection bias is relevant because coral reef monitoring sites are not evenly distributed across regions or reef types. Some U.S. reef systems may be monitored more frequently or consistently than others due to accessibility, funding, or conservation priority. This could bias observed recovery pathways toward better-studied regions. We acknowledge this limitation and will avoid overgeneralizing results beyond the monitored sites.

 - [X] **A.3 Limit PII exposure**: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn't relevant for analysis?

> This project does not collect or use any personally identifiable information. All data are ecological and site-level, such as percent coral cover and thermal stress metrics.

 - [X] **A.4 Downstream bias mitigation**: Have we considered ways to enable testing downstream results for biased outcomes (e.g., collecting data on protected group status like race or gender)?

> Because this project does not involve human populations or protected groups, downstream bias related to demographic characteristics is not applicable.

### B. Data Storage
 - [X] **B.1 Data security**: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?

> The datasets used are publicly available and contain no sensitive information. Data will be stored locally for analysis using standard file protections. While advanced security measures are not required, care will be taken to avoid accidental modification or loss of data.

 - [X] **B.2 Right to be forgotten**: Do we have a mechanism through which an individual can request their personal information be removed?

> The project does not use personal or individual-level data. There are no individuals whose data could be removed upon request.

 - [X] **B.3 Data retention plan**: Is there a schedule or plan to delete the data after it is no longer needed?

> Data will be retained only for the duration of the course project and may be deleted afterward. Since the data are publicly available, long-term storage does not pose ethical concerns.

### C. Analysis
 - [X] **C.1 Missing perspectives**: Have we sought to address blindspots in the analysis through engagement with relevant stakeholders (e.g., checking assumptions and discussing implications with affected communities and subject matter experts)?

> Our analysis does not include input from reef managers or local communities. Our findings *could* influence policy or funding, e.g., reefs with faster recovery might get more attention, while slower-recovering reefs could be deprioritized. We will clarify that recovery trajectories are descriptive, not value judgments.

 - [X] **C.2 Dataset bias**: Have we examined the data for possible sources of bias and taken steps to mitigate or address these biases (e.g., stereotype perpetuation, confirmation bias, imbalanced classes, or omitted confounding variables)?

> Potential sources of bias include uneven temporal coverage across sites, missing years of data, and unmeasured confounding factors such as storms, pollution, or local management practices. These limitations may affect interpretation of recovery trajectories and will be discussed when presenting results.

 - [X] **C.3 Honest representation**: Are our visualizations, summary statistics, and reports designed to honestly represent the underlying data?

> Visualizations and summary statistics will be designed to accurately reflect the underlying data, including showing uncertainty, missing data, and variability across sites. We will avoid visual choices that exaggerate trends or imply causation.

 - [X] **C.4 Privacy in analysis**: Have we ensured that data with PII are not used or displayed unless necessary for the analysis?

> No data with personally identifiable information will be used or displayed, as all data are ecological and environmental in nature.

 - [X] **C.5 Auditability**: Is the process of generating the analysis well documented and reproducible if we discover issues in the future?

> The data cleaning, merging, and analysis process will be documented using reproducible code and clear descriptions of methods. This allows the analysis to be reviewed or revisited if issues are discovered later.


### D. Modeling
 - [X] **D.1 Proxy discrimination**: Have we ensured that the model does not rely on variables or proxies for variables that are unfairly discriminatory?

> This project does not involve predictive models that affect individuals, nor does it include demographic variables.

 - [X] **D.2 Fairness across groups**: Have we tested model results for fairness with respect to different affected groups (e.g., tested for disparate error rates)?

> Fairness across human groups is not applicable. However, we recognize that modeling choices may implicitly emphasize certain regions or reef types over others due to data availability.

 - [X] **D.3 Metric selection**: Have we considered the effects of optimizing for our defined metrics and considered additional metrics?

> Recovery metrics such as changes in percent live coral cover or trajectory slopes were chosen because they are commonly used in reef ecology. We acknowledge that no single metric fully captures reef health and will discuss this limitation.

 - [X] **D.4 Explainability**: Can we explain in understandable terms a decision the model made in cases where a justification is needed?

> The analytical methods used are interpretable and can be explained in clear terms. We will avoid complex models that obscure interpretation.

 - [X] **D.5 Communicate limitations**: Have we communicated the shortcomings, limitations, and biases of the model to relevant stakeholders in ways that can be generally understood?

> Limitations such as observational data, lack of causal inference, and incomplete coverage will be clearly communicated in the final report to avoid misinterpretation of results.


### E. Deployment
 - [X] **E.1 Monitoring and evaluation**: Do we have a clear plan to monitor the model and its impacts after it is deployed (e.g., performance monitoring, regular audit of sample predictions, human review of high-stakes decisions, reviewing downstream impacts of errors or low-confidence decisions, testing for concept drift)?

> This project is exploratory and academic in nature and will not be deployed in a production environment. Ongoing monitoring is therefore not applicable.

 - [X] **E.2 Redress**: Have we discussed with our organization a plan for response if users are harmed by the results (e.g., how does the data science team evaluate these cases and update analysis and models to prevent future harm)?

> Because this analysis does not produce decisions affecting individuals or communities directly, formal redress mechanisms are not required. However, we aim to present findings responsibly to avoid misuse.

 - [X] **E.3 Roll back**: Is there a way to turn off or roll back the model in production if necessary?

> There is no deployed system or model to roll back. If errors are discovered, analyses and conclusions can be revised.

 - [X] **E.4 Unintended use**: Have we taken steps to identify and prevent unintended uses and abuse of the model and do we have a plan to monitor these once the model is deployed?

> While the project is academic, results could potentially be misused if interpreted as causal or definitive. To mitigate this, we will clearly state the scope, assumptions, and limitations of the analysis.


## Team Expectations 

1. **Clear communication** - weekly meeting (Tuesday, 5:00-6:00 PM)
   - Assign roles for the week and decide what we want to accomplish by the next meeting
2. **GitHub**
   - Don't push without verifying with others
3. **Disagreements**
   - Decide by majority vote
   - Flip a coin if we can’t reach a conclusion


## Project Timeline Proposal


| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 2/3  |  5 PM | Think of some project ideas  | Decide on an idea and start working on the proposal | 
| 2/5  |  5 PM  |  Complete project proposal | Divvy up work and expectations on gathering info | 
| 2/10  | 5 PM  | Background research on topic  | Discuss ideal dataset(s) and ethics; draft project proposal + start data checkpoint 1   |
| 2/17  | 5 PM  | Work on data checkpoint 1 | Clean up and finalize checkpoint 1 + discuss first steps of EDA   |
| 2/24  | 5 PM  | Background info on different ways to analyze the data | Discuss how we want to deeply analyze our data and present it. Divvy up work on EDA |
| 3/3  | 5 PM  | Work on EDA| Finishing touches on EDA checkpoint + figure out what needs to be done for final submission |
| 3/10  | 5 PM  | Work on cleaning up + finishing up requirements for final submission | Turn in Final Project & Group Project Surveys |
| 3/13  | 5 PM  | Work on final submission | Discuss video (script, who says what, etc.) |
| 3/20  | Before 11:59 PM  | Practice for video, fine-tune submission details | Record video, Turn in Final Project & Group Project Surveys |