- Import library and read data with pandas 

### Spearman's Correlation

In [1]:
# Read in the data from the csv file

import pandas as pd

CO2Data = pd.read_csv("../data/Terminos_lagoon_TA_DIC_2023_RawData.csv")

In [7]:
from scipy import stats

def test_spearman(x, y, alpha=0.05):
    if len(x) != len(y):
        raise ValueError("Both variables must have the same length.")

    rho, pval = stats.spearmanr(x, y)

    print(f"Spearman's correlation coefficient (rho): {rho:.3f}")
    print(f"p-value: {pval:.4f}")

    if pval < alpha:
        print(f"✔️ Significant relationship (p < {alpha:.3f})")
    else:
        print(f"⚠️ No significant relationship (p ≥ {alpha:.3f})")

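    # Report the direction; rho == 0 (or an undefined rho) is neither positive nor negative
    if rho > 0:
        print("📈 Positive correlation")
    elif rho < 0:
        print("📉 Negative correlation")
    else:
        print("No correlation direction (rho is zero or undefined)")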

In [3]:
test_spearman(CO2Data["ta_micromol_kg"], CO2Data["dic_micromol_kg"])

Spearman's correlation coefficient (rho): 0.838
p-value: 0.0000
✔️ Significant relationship (p < 0.050)
📈 Positive correlation


### Exercise: Create a function to compute the Pearson correlation.
---



**When to Use Spearman's Correlation**

Spearman's correlation is a non-parametric measure that evaluates the strength and direction of the association between two variables based on their ranks. Consider using Spearman's correlation in the following scenarios:

1. **Non-Normal Data**: When your data do not follow a normal distribution, Spearman's correlation is appropriate because it does not assume normality.

2. **Ordinal Data**: If your variables are ordinal (i.e., they represent categories with a meaningful order but unknown intervals), Spearman's correlation is suitable. For example, rankings like first, second, and third place.

3. **Monotonic Relationships**: When the relationship between two variables is monotonic but not necessarily linear (as one variable increases, the other consistently increases or consistently decreases), Spearman's correlation can measure the strength of this association.

4. **Outliers Present**: Spearman's correlation is more robust to outliers than Pearson's correlation, because it works on ranks rather than raw values. This makes it a better choice when your data contain anomalies that could disproportionately influence the results.
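
The monotonic and outlier scenarios above can be illustrated with a minimal sketch. The data here are synthetic and purely hypothetical (they are not the lagoon measurements), and the variable names are chosen only for this example. It compares `scipy.stats.pearsonr` and `scipy.stats.spearmanr` on a monotonic but non-linear relationship, and then on a linear relationship with one extreme value injected.

```python
import numpy as np
from scipy import stats

# Hypothetical synthetic data: y increases monotonically but non-linearly with x
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3.0)

r_mono, _ = stats.pearsonr(x, y)
rho_mono, _ = stats.spearmanr(x, y)
print(f"Monotonic, non-linear: Pearson r = {r_mono:.3f}, Spearman rho = {rho_mono:.3f}")

# A perfectly linear relationship with one extreme outlier injected
y_out = 2.0 * x + 1.0
y_out[-1] = -500.0  # hypothetical anomalous measurement

r_out, _ = stats.pearsonr(x, y_out)
rho_out, _ = stats.spearmanr(x, y_out)
print(f"With one outlier:      Pearson r = {r_out:.3f}, Spearman rho = {rho_out:.3f}")
```

In this sketch Spearman's rho stays high in both cases because only the ordering of the values matters, while Pearson's r is pulled down by the curvature and, more severely, by the single outlier.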

In summary, use Spearman's correlation when your data are ordinal, not normally distributed, or when you suspect a monotonic relationship that isn't strictly linear. It's also a good choice when your data contain outliers that could affect the results of other correlation measures.
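
Because Spearman's correlation is defined on ranks, it should match Pearson's correlation computed on the ranked values. The sketch below checks this on synthetic data (hypothetical values, generated only for illustration), using `pandas.Series.rank` with its default average-rank handling of ties.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical synthetic data with a noisy, non-linear, increasing relationship
rng = np.random.default_rng(0)
a = pd.Series(rng.normal(size=50))
b = a ** 3 + rng.normal(scale=0.5, size=50)

# Spearman's rho computed directly
rho_scipy, _ = stats.spearmanr(a, b)

# Pearson's r computed on the ranks of each variable
r_on_ranks, _ = stats.pearsonr(a.rank(), b.rank())

print(f"Spearman rho:       {rho_scipy:.6f}")
print(f"Pearson r on ranks: {r_on_ranks:.6f}")
```

The two values should agree up to floating-point rounding.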

--- 

### Exercise

In [10]:
import pandas as pd

# Read the dataset
CO2Data = pd.read_csv("../data/Terminos_lagoon_TA_DIC_2023_RawData.csv")

# Define the Pearson correlation function
def pearson_correlation(x, y):
    if len(x) != len(y):
        raise ValueError("Both lists must have the same length")

    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of cross-deviations divided by the product of the root sums of squared deviations
    numerator = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    denominator = (sum((xi - mean_x) ** 2 for xi in x) ** 0.5) * \
                  (sum((yi - mean_y) ** 2 for yi in y) ** 0.5)

    # r is undefined when either variable is constant; this function returns 0 in that case
    if denominator == 0:
        return 0

    return numerator / denominator

# Drop rows where either variable is missing, so the (TA, DIC) pairs stay aligned
pairs = CO2Data[["ta_micromol_kg", "dic_micromol_kg"]].dropna()
x = pairs["ta_micromol_kg"].tolist()
y = pairs["dic_micromol_kg"].tolist()

# Compute Pearson's r
r = pearson_correlation(x, y)
print("Pearson correlation between TA and DIC:", r)

# Interpretation helper: maps |r| to a qualitative strength label
def interpret_correlation(r):
    if abs(r) < 0.2:
        return "very weak or nonexistent"
    elif abs(r) < 0.4:
        return "weak"
    elif abs(r) < 0.6:
        return "moderate"
    elif abs(r) < 0.8:
        return "strong"
    else:
        return "very strong"

# Print the result with its interpretation
print(f"Pearson correlation between TA and DIC: {r:.3f}")
print(f"Interpretation: the relationship is {interpret_correlation(r)} "
      f"and {'positive' if r > 0 else 'negative'} in direction.")

Pearson correlation between TA and DIC: 0.8822837984862447
Pearson correlation between TA and DIC: 0.882
Interpretation: the relationship is very strong and positive in direction.
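
As a cross-check on a hand-written Pearson function, the result can be compared with `scipy.stats.pearsonr`, which also returns a p-value. The sketch below is self-contained and uses synthetic data as a stand-in for the TA and DIC columns. The helper `pearson_manual` and the generated values are assumptions made for this example, not part of the original exercise.

```python
import numpy as np
from scipy import stats

def pearson_manual(x, y):
    """Pearson's r: sum of cross-deviations over the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical synthetic stand-in data (not the lagoon measurements)
rng = np.random.default_rng(42)
x_demo = rng.normal(loc=2400.0, scale=150.0, size=100)
y_demo = 0.9 * x_demo + rng.normal(scale=60.0, size=100)

r_manual = pearson_manual(x_demo, y_demo)
r_scipy, p_scipy = stats.pearsonr(x_demo, y_demo)

print(f"Manual Pearson r: {r_manual:.6f}")
print(f"SciPy Pearson r:  {r_scipy:.6f} (p-value: {p_scipy:.3g})")
```

If the two coefficients agree up to rounding, the manual formula is implemented consistently with the library version.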
