## DETECTION AND ATTRIBUTION

This notebook sketches a model for a "Detection and Attribution" study based on our findings.

It builds on the correlations between the climatic indices (SAM, ENSO, IOD, polar vortex) and the mooring data, and outlines a possible approach using a multiple linear regression model. This approach assumes a linear relationship between the climatic indices and the upwelling, an assumption that may not hold and still needs to be investigated.

### STEPS TO COMPLETE BEFORE GETTING HERE:

Step 1: Data Preparation
1. Collect historical data for the South Australian upwelling (mooring data from IMOS) and the climatic indices (SAM, ENSO, IOD, polar vortex, etc.) covering the same time period.
2. Preprocess the data to handle missing values, outliers, and inconsistencies.
- relevant notebooks: 


Step 2: Correlation Analysis
1. Calculate Pearson correlation coefficients between the upwelling data and each climatic index. This initial analysis gives a first sense of the strength and direction of each linear relationship.
- relevant notebooks: 
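
As a minimal sketch of the correlation step, the cell below computes Pearson coefficients with pandas. The data are synthetic placeholders, not mooring observations, and the column names (`SAM`, `ENSO`, `IOD`, `Polar_vortex`, `Upwelling`) are assumed to match the ones used in the code cells later in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT real observations); column names are assumptions
rng = np.random.default_rng(0)
n = 120
indices = pd.DataFrame(
    rng.normal(size=(n, 4)),
    columns=["SAM", "ENSO", "IOD", "Polar_vortex"],
)

# Hypothetical upwelling signal built to depend on SAM and ENSO plus noise
upwelling = 0.8 * indices["SAM"] - 0.5 * indices["ENSO"] + rng.normal(scale=0.5, size=n)
upwelling.name = "Upwelling"

# Pearson correlation between the upwelling series and each climatic index
pearson_r = indices.corrwith(upwelling, method="pearson")
print(pearson_r.sort_values())
```

Pearson's r only captures linear association, so a value near zero does not rule out a nonlinear relationship.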

### STEPS STILL TO DO: 
Step 1: Multiple Linear Regression Model
1. Organize the data into a single table with one column for the upwelling and one column for each climatic index.
2. Divide the dataset into training and testing sets for model validation.
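
Because the mooring data and the indices are time series, a random split can place neighbouring (and likely autocorrelated) samples in both the training and testing sets. One possible alternative, sketched here as an assumption rather than a settled choice, is a chronological split that trains on the earlier part of the record and tests on the later part. The dates and values are synthetic placeholders.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (NOT real observations); dates and values are placeholders
rng = np.random.default_rng(1)
features = ["SAM", "ENSO", "IOD", "Polar_vortex"]
dates = pd.date_range("2010-01-01", periods=100, freq="MS")
data = pd.DataFrame(
    rng.normal(size=(100, 5)),
    index=dates,
    columns=features + ["Upwelling"],
)

# Chronological split: first 80% of the record for training, last 20% for testing
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

X_train, y_train = train[features], train["Upwelling"]
X_test, y_test = test[features], test["Upwelling"]

print("Training period:", train.index.min().date(), "to", train.index.max().date())
print("Testing period: ", test.index.min().date(), "to", test.index.max().date())
```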

Step 2: Model Building and Training
1. Build the multiple linear regression model.
2. Train the model using the training dataset. The model will learn the relationships between the upwelling and the climatic indices.

Step 3: Model Evaluation
1. Evaluate the model's performance on the testing dataset. Metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared indicate how well the model predicts data it was not trained on.

Step 4: Interpretation
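1. Interpret the model coefficients. Each coefficient estimates the change in upwelling per unit change in that climatic index with the other indices held constant: a positive coefficient indicates that upwelling increases with the index, a negative coefficient that it decreases. Because the indices have different scales, coefficients are only comparable across indices if the predictors are standardized first.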
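
A minimal sketch of the interpretation step, assuming the predictors are standardized with scikit-learn's `StandardScaler` so that each coefficient reads as the change in upwelling per one standard deviation of the index. The data are synthetic, and the "true" effects (2.0 for SAM, -1.0 for ENSO) are invented purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT real observations)
rng = np.random.default_rng(2)
features = ["SAM", "ENSO", "IOD", "Polar_vortex"]
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=features)
# Hypothetical relationship: only SAM and ENSO influence the target
y = 2.0 * X["SAM"] - 1.0 * X["ENSO"] + rng.normal(scale=0.3, size=200)

# Standardize predictors so coefficients are comparable across indices
X_std = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_std, y)

# One coefficient per index, in units of upwelling per standard deviation
coefs = pd.Series(model.coef_, index=features)
print(coefs.round(2))
```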

Step 5: Scenario Analysis and Visualization
1. Simulate different scenarios by altering the values of the climatic indices in the model.
2. Plot the simulated upwelling values against the original observed values for visual comparison.
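
The scenario step could look roughly like the sketch below: fit the model, shift one index, and compare the predictions with the baseline. The data are synthetic, and the scenario itself (SAM raised by one unit everywhere) is a hypothetical example, not a physically motivated one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (NOT real observations)
rng = np.random.default_rng(3)
features = ["SAM", "ENSO", "IOD", "Polar_vortex"]
X = pd.DataFrame(rng.normal(size=(150, 4)), columns=features)
y = 1.5 * X["SAM"] - 0.7 * X["ENSO"] + rng.normal(scale=0.4, size=150)

model = LinearRegression().fit(X, y)
baseline = model.predict(X)

# Hypothetical scenario: SAM is one unit higher at every time step
scenario_X = X.copy()
scenario_X["SAM"] = scenario_X["SAM"] + 1.0
scenario = model.predict(scenario_X)

# Mean change in simulated upwelling under the scenario
shift = (scenario - baseline).mean()
print("Mean change in simulated upwelling:", round(shift, 3))
```

For a linear model this mean change equals the SAM coefficient, so scenario runs become more informative once nonlinear models or interactions between indices are involved.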

Step 8: Sensitivity Analysis
1. Perform sensitivity analysis by introducing variations in the climatic indices and observing their effects on the simulated upwelling.
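
One simple form of sensitivity analysis is a one-at-a-time perturbation: shift each index by a fixed number of standard deviations and record the mean change in the prediction. The helper `one_at_a_time_sensitivity` below is a hypothetical name introduced for this sketch, and the data are synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def one_at_a_time_sensitivity(model, X, n_std=1.0):
    """Mean change in prediction when each column is shifted by n_std standard deviations."""
    baseline = model.predict(X)
    effects = {}
    for col in X.columns:
        perturbed = X.copy()
        perturbed[col] = perturbed[col] + n_std * X[col].std()
        effects[col] = float((model.predict(perturbed) - baseline).mean())
    return pd.Series(effects)


# Synthetic stand-in data (NOT real observations)
rng = np.random.default_rng(4)
features = ["SAM", "ENSO", "IOD", "Polar_vortex"]
X = pd.DataFrame(rng.normal(size=(150, 4)), columns=features)
y = 1.5 * X["SAM"] - 0.7 * X["ENSO"] + rng.normal(scale=0.4, size=150)

model = LinearRegression().fit(X, y)
sensitivity = one_at_a_time_sensitivity(model, X)
print(sensitivity.round(3))
```

The helper only relies on `predict`, so the same call should work with other fitted scikit-learn regressors, where the response to a perturbation is no longer a fixed multiple of a coefficient.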

Step 9: Discussion and Implications
1. Discuss the results of your model in the context of the South Australian upwelling phenomenon and the role of the climatic indices in its suppression.
2. Highlight any interesting findings, interactions between indices, and potential implications for the marine ecosystem.

Step 10: Other Models
1. Depending on the complexity of the relationships and the data, more advanced modeling techniques may be needed, such as nonlinear regression, machine learning algorithms, or time series analysis.

### 2. Model Building and Training

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load your data into a Pandas DataFrame
data = pd.read_csv('your_data.csv')  # Replace 'your_data.csv' with your data file path

# Separate features (climatic indices) and target (upwelling) variables
X = data[['SAM', 'ENSO', 'IOD', 'Polar_vortex']]
y = data['Upwelling']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = model.predict(X_test)

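# Evaluate the model's performance on the held-out test set
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same units as the upwelling variable
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("R-squared:", r2)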

# Plot the predicted vs. actual upwelling values
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Upwelling")
plt.ylabel("Predicted Upwelling")
plt.title("Actual vs. Predicted Upwelling")
plt.show()


In [None]:
### 10. Other models 


In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt

# Load your data into a Pandas DataFrame
data = pd.read_csv('your_data.csv')  # Replace 'your_data.csv' with your data file path

# Separate features (climatic indices) and target (upwelling) variables
X = data[['SAM', 'ENSO', 'IOD', 'Polar_vortex']]
y = data['Upwelling']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Random Forest Regressor model
model = RandomForestRegressor(n_estimators=100, random_state=42)

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = model.predict(X_test)

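# Evaluate the model's performance on the held-out test set
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # RMSE is in the same units as the upwelling variable
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)
print("R-squared:", r2)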

# Plot the predicted vs. actual upwelling values
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Upwelling")
plt.ylabel("Predicted Upwelling")
plt.title("Actual vs. Predicted Upwelling")
plt.show()
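
A random forest has no coefficients to interpret, but scikit-learn exposes impurity-based `feature_importances_` on the fitted estimator. The sketch below shows how these could be read out; the data are synthetic, and the dominance of SAM is built into the fake signal rather than being a finding.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (NOT real observations)
rng = np.random.default_rng(5)
features = ["SAM", "ENSO", "IOD", "Polar_vortex"]
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=features)
# Hypothetical relationship: SAM dominates, ENSO is secondary
y = 2.0 * X["SAM"] - 1.0 * X["ENSO"] + rng.normal(scale=0.3, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=42).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across features
importances = pd.Series(forest.feature_importances_, index=features)
importances = importances.sort_values(ascending=False)
print(importances.round(3))
```

These importances rank predictors by how much they reduce prediction error in the trees; they carry no sign, so they do not show whether an index enhances or suppresses upwelling.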
