<a href="https://colab.research.google.com/github/rpezoa/mlvalpo/blob/main/regresion_MLvalpo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Example: Predicting the Effective Temperature of Massive Stars
***

**Activity objective**: Use Python tools to try a few (very simple) ML methods for predicting the effective temperature from spectra of massive stars.

**Data**: A subset of Be star spectra, generated by PhD (c) Daniela Turis (IFA-UV) with the ZPEKTR code.

In ML terms, this is a **regression** problem: the target, the effective temperature, is a continuous quantity.

More information on massive stars: https://massivestars.ifa.uv.cl/



## Python Libraries
***

In [1]:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

import numpy as np
import time

from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
import seaborn as sns

## Data Download and Pandas
***
* Data generated with ZPEKTR by PhD (c) Daniela Turis, IFA-UV.
* You need the set of spectra named df_ZPEKTR_limb_lineal.csv

In [None]:
# Dataset provided by Daniela Turis, IFA-UV.

!gdown https://drive.google.com/uc?id=1m_GajQqDRcKrH8_ExG_0Yp_sQ4MrhZbN

## Pandas dataframe

In [None]:
data=pd.read_csv('df_ZPEKTR_limb_lineal.csv')
data.head()

### Building the matrix $X$ and the vector $y$

In [None]:
X = data.iloc[:,0:170] # X matrix containing spectral lines (flux)
y_input = data.iloc[:,170:176] # input parameters
y_output = data.iloc[:,176:] # output parameters
y = data.iloc[:,176:177] # y matrix containing the values we want to predict
lambdas = X.columns.to_numpy().astype(float) # wavelengths are the column labels of X

plt.figure()
plt.plot(lambdas, X.iloc[2,:])
plt.title("Example of spectrum")
plt.xlabel("Wavelength")
plt.ylabel("Flux")
plt.grid()
plt.show()

In [18]:
# Hold out 20% of the data for testing, then 20% of the remainder for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)
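
The two consecutive calls to `train_test_split` leave 64% of the samples for training, 16% for validation and 20% for testing (0.8 × 0.8 = 0.64). The sketch below checks those proportions on synthetic data; the array names and sizes (`X_demo`, 1000 samples, 5 features) are made up for illustration and are not taken from the ZPEKTR dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the spectra: 1000 samples, 5 features (assumed sizes).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
y_demo = rng.normal(size=1000)

# First split: hold out 20% of all samples as the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
# Second split: hold out 20% of the remaining 80% as the validation set.
X_tr, X_va, y_tr, y_va = train_test_split(X_tr, y_tr, test_size=0.2, random_state=0)

n = len(X_demo)
fractions = {"train": len(X_tr) / n, "val": len(X_va) / n, "test": len(X_te) / n}
print(fractions)
```

The validation set is the natural place to compare models or tune hyperparameters, so that the test set is only used once for the final estimate.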

In [None]:
X_train.shape, y_train.shape

## Linear Regression

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import matplotlib.pyplot as plt

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

print("Linear Regression Results:")
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")
print(f"R²: {r2_score(y_test, y_pred):.3f}")

plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("True $T_{eff}$")
plt.ylabel("Predicted $T_{eff}$")
plt.title("Linear Regression")
plt.show()
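
To make the three reported metrics concrete, here is a minimal sketch that computes MSE, MAE and R² by hand with NumPy and compares them with the scikit-learn functions. The temperature values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical temperatures in kelvin, made up for illustration only.
y_true = np.array([15000.0, 20000.0, 25000.0, 30000.0])
y_hat = np.array([15500.0, 19000.0, 25500.0, 29000.0])

residuals = y_true - y_hat

mse = np.mean(residuals ** 2)        # mean squared error, in K^2
rmse = np.sqrt(mse)                  # root of the MSE, back in K
mae = np.mean(np.abs(residuals))     # mean absolute error, in K

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"MSE: {mse:.2f}  RMSE: {rmse:.2f}  MAE: {mae:.2f}  R²: {r2:.3f}")
```

Because the MSE is expressed in squared kelvin, the RMSE or the MAE is usually easier to read as a typical temperature error.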

## Decision Tree

In [None]:
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

print("Decision Tree Results:")
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")
print(f"R²: {r2_score(y_test, y_pred):.3f}")

plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("True $T_{eff}$")
plt.ylabel("Predicted $T_{eff}$")
plt.title("Decision Tree (max_depth=4)")
plt.show()
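
The value `max_depth=4` is fixed by hand above. One possible way to choose it, sketched here on synthetic data rather than on the spectra, is to fit one tree per candidate depth and keep the depth with the lowest error on a validation set. The data-generating function and the list of candidate depths are assumptions made for this example.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem (assumed, not the ZPEKTR data).
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(600, 4))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] + rng.normal(scale=0.1, size=600)

X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Fit one tree per candidate depth and record its validation MSE.
val_mse = {}
for depth in [1, 2, 4, 8, 16]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_tr, y_tr)
    val_mse[depth] = mean_squared_error(y_va, model.predict(X_va))

best_depth = min(val_mse, key=val_mse.get)
print(val_mse)
print("Best depth on the validation set:", best_depth)
```

In the notebook, the same loop could use `X_train`, `y_train`, `X_val` and `y_val`, leaving `X_test` untouched until the depth is chosen.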

## Random Forest

In [None]:
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=10, random_state=42)
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print("Random Forest Results:")
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")
print(f"R²: {r2_score(y_test, y_pred):.3f}")

plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("True $T_{eff}$")
plt.ylabel("Predicted $T_{eff}$")
plt.title("Random Forest")
plt.show()

## Gradient Boosting

In [None]:
from sklearn.ensemble import GradientBoostingRegressor

gb = GradientBoostingRegressor(random_state=42)
gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)

print("Gradient Boosting Results:")
print(f"MSE: {mean_squared_error(y_test, y_pred):.2f}")
print(f"MAE: {mean_absolute_error(y_test, y_pred):.2f}")
print(f"R²: {r2_score(y_test, y_pred):.3f}")

plt.scatter(y_test, y_pred, alpha=0.6)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel("True $T_{eff}$")
plt.ylabel("Predicted $T_{eff}$")
plt.title("Gradient Boosting")
plt.show()
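
The four model cells repeat the same fit, predict and score pattern. A compact alternative, sketched below on synthetic data, keeps the models in a dictionary and scores each one on a validation set, so every metric is tied to the model that produced it. The data, the dictionary keys and the reduced `n_estimators=50` for gradient boosting are assumptions for this example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, mostly linear regression problem (assumed, not the ZPEKTR data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 6))
y_demo = X_demo @ rng.normal(size=6) + rng.normal(scale=0.1, size=400)  # 1-D target

X_tr, X_va, y_tr, y_va = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=4, random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=10, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(n_estimators=50, random_state=42),
}

# Fit each model and store its validation R² under the model's own name.
val_r2 = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    val_r2[name] = r2_score(y_va, model.predict(X_va))

# Print the models from best to worst validation score.
for name, score in sorted(val_r2.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>18}: R² = {score:.3f}")
```

The ranking on this toy problem says nothing about the ranking on the spectra; the point is only the structure of the loop.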