# Intelligent Monitoring of Gas Mixtures
## Data Analysis for Industry - Group 05

This notebook provides an interactive version of the analysis.

## 1. Import Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import warnings
warnings.filterwarnings('ignore')

plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')

print('✓ Libraries imported successfully')

## 2. Load Data

In [None]:
# Define column names
columns = ['Time_s', 'CO_or_CH4_ppm', 'Ethylene_ppm'] + [f'Sensor_{i}' for i in range(1, 17)]

# Load the dataset (adjust the path to your setup)
filepath = 'ethylene_CO.txt'  # or 'ethylene_methane.txt'

# Fields are separated by runs of whitespace.
# If your copy of the file starts with a text header line, add skiprows=1.
df = pd.read_csv(filepath, sep=r'\s+', header=None, names=columns, engine='python')

print(f'Dataset loaded: {len(df):,} records')
print(f'Columns: {len(df.columns)}')
print(f"Duration: {df['Time_s'].max():.0f} seconds")

df.head()
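
As a quick check of the parsing step, the same `read_csv` arguments can be tried on an in-memory string before pointing them at the full file. This is only a sketch: the three rows and the reduced column list below are made-up values for illustration, not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical sample: time, two concentrations, two sensor readings.
# Column spacing is deliberately irregular to exercise the whitespace separator.
sample = (
    "0.00  0.0 0.0  1.5 2.5\n"
    "0.01  0.0 0.0  1.6   2.4\n"
    "0.02 10.0 0.0  1.7 2.3\n"
)
demo_cols = ['Time_s', 'CO_or_CH4_ppm', 'Ethylene_ppm', 'Sensor_1', 'Sensor_2']

# sep=r'\s+' treats any run of spaces or tabs as a single delimiter
demo = pd.read_csv(io.StringIO(sample), sep=r'\s+', header=None, names=demo_cols)

print(demo.shape)
print(demo.dtypes)
```

If every column comes back as a float dtype, the separator and column names line up with the data; an `object` dtype usually means a stray header row or a misparsed delimiter.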

## 3. Exploratory Analysis

In [None]:
# Concentration statistics
print('Concentration statistics:')
df[['CO_or_CH4_ppm', 'Ethylene_ppm']].describe()

In [None]:
# Plot the concentrations over time
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Plot only the first 10,000 rows to keep the figure readable
subset = df[:10000]

axes[0].plot(subset['Time_s'], subset['CO_or_CH4_ppm'], label='CO/CH₄', linewidth=0.8)
axes[0].set_ylabel('Concentration (ppm)')
axes[0].set_title('Concentrations over Time')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(subset['Time_s'], subset['Ethylene_ppm'], label='Ethylene', color='orange', linewidth=0.8)
axes[1].set_xlabel('Time (s)')
axes[1].set_ylabel('Concentration (ppm)')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
# Sensor analysis
sensor_cols = [col for col in df.columns if col.startswith('Sensor_')]

print('Sensor analysis:')
for col in sensor_cols[:4]:  # Show only the first 4
    mean_val = df[col].mean()
    std_val = df[col].std()
    print(f'{col}: μ={mean_val:.2f}, σ={std_val:.2f}')

## 4. Preprocessing

In [None]:
# Downsampling: keep every 100th row
downsample_factor = 100
df_processed = df.iloc[::downsample_factor].copy().reset_index(drop=True)

print(f'Records after downsampling: {len(df_processed):,}')

# Convert to kΩ
for col in sensor_cols:
    df_processed[col] = 40000 / df_processed[col]

print('✓ Sensors converted to kΩ')
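
The cell above downsamples by striding, which keeps one row in every `downsample_factor` and discards the rest. For comparison, the sketch below contrasts that with averaging each block of rows, an alternative this notebook does not use. The toy frame and the factor of 5 are assumptions chosen to keep the output small.

```python
import numpy as np
import pandas as pd

# Toy signal: 10 samples taken 0.01 s apart (illustrative values only)
toy = pd.DataFrame({
    'Time_s': np.arange(10) * 0.01,
    'Sensor_1': np.arange(10, dtype=float),
})
factor = 5

# Strided downsampling, as in the notebook: keep rows 0, 5, ...
strided = toy.iloc[::factor].reset_index(drop=True)

# Alternative: average each consecutive block of `factor` rows
block_mean = toy.groupby(toy.index // factor).mean()

print(strided['Sensor_1'].tolist())     # first sample of each block
print(block_mean['Sensor_1'].tolist())  # mean of each block
```

Striding is cheaper and keeps the original timestamps, while block averaging also reduces noise; which one fits better depends on how fast the sensors respond relative to the sampling rate.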

In [None]:
# Smoothing: centered rolling mean (each value averages neighbors on both sides)
window_size = 5
for col in sensor_cols:
    df_processed[f'{col}_smooth'] = df_processed[col].rolling(
        window=window_size, center=True, min_periods=1).mean()

print(f'✓ Smoothing applied (window={window_size})')
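
Because the smoothing uses `center=True`, each smoothed value depends on samples that come after it in time. The sketch below shows the effect on a toy step signal (an assumed example, not dataset values) by comparing a centered window with a trailing one.

```python
import pandas as pd

# Toy step signal: the jump happens at index 3
s = pd.Series([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])

# Centered window: averages one sample before and one after each point
centered = s.rolling(window=3, center=True, min_periods=1).mean()

# Trailing window: averages only the current and previous samples
trailing = s.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({'raw': s, 'centered': centered, 'trailing': trailing}))
```

The centered mean starts rising at index 2, one step before the jump, while the trailing mean stays at zero until index 3. That is harmless for offline plots, but if the smoothed columns feed a model meant to run in real time, a trailing window avoids using future readings.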

## 5. Feature Engineering

In [None]:
# Lag features
lag_steps = [1, 2, 5]
for col in sensor_cols:
    for lag in lag_steps:
        df_processed[f'{col}_lag{lag}'] = df_processed[col].shift(lag)

print(f'✓ Lags created: {lag_steps}')

# Rolling aggregates
window_sizes = [5, 10]
for col in sensor_cols:
    for win in window_sizes:
        df_processed[f'{col}_mean{win}'] = df_processed[col].rolling(window=win, min_periods=1).mean()
        df_processed[f'{col}_std{win}'] = df_processed[col].rolling(window=win, min_periods=1).std()

print(f'✓ Aggregates created: {window_sizes}')

# Drop rows with NaN (the first rows, where lags and rolling std are undefined)
df_processed = df_processed.dropna()
print(f'Final records: {len(df_processed):,}')
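
To see why `dropna()` removes the first few rows, the sketch below builds the same kinds of features on a five-value toy column. The values and the shorter lag and window sizes are assumptions for illustration.

```python
import pandas as pd

toy = pd.DataFrame({'Sensor_1': [1.0, 2.0, 4.0, 8.0, 16.0]})

# shift(k) moves values down by k rows, leaving NaN in the first k rows
toy['Sensor_1_lag1'] = toy['Sensor_1'].shift(1)
toy['Sensor_1_lag2'] = toy['Sensor_1'].shift(2)

# With min_periods=1 the rolling std is still NaN on the first row,
# because a sample standard deviation needs at least two observations
toy['Sensor_1_std3'] = toy['Sensor_1'].rolling(window=3, min_periods=1).std()

clean = toy.dropna()
print(toy)
print(f'Rows kept: {len(clean)} of {len(toy)}')
```

The number of rows lost equals the largest lag, so with `lag_steps = [1, 2, 5]` the notebook drops the first five rows of the downsampled data.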

## 6. Prepare Train/Test

In [None]:
# Define features and targets
feature_cols = [col for col in df_processed.columns if col.startswith('Sensor_')]
target_cols = ['CO_or_CH4_ppm', 'Ethylene_ppm']

X = df_processed[feature_cols].values
y = df_processed[target_cols].values

# Chronological split: the last 20% of the rows form the test set
test_size = 0.2
n_test = int(len(X) * test_size)

X_train, X_test = X[:-n_test], X[-n_test:]
y_train, y_test = y[:-n_test], y[-n_test:]

print(f'Train: {len(X_train):,} | Test: {len(X_test):,}')
print(f'Features: {X.shape[1]}')

# Scaling (fit on the training set only)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print('✓ Data scaled')
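
The split and scaling steps above can be wrapped in a small helper. The sketch below is one possible version: the function name `temporal_split` and the toy arrays are hypothetical, and the guard for `n_test == 0` is an addition, since `X[:-0]` would otherwise return an empty training set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def temporal_split(X, y, test_size=0.2):
    """Split arrays chronologically: the last `test_size` fraction is the test set."""
    n_test = int(len(X) * test_size)
    if n_test == 0:
        # X[:-0] is empty, so refuse to split instead of returning no training data
        raise ValueError('test_size is too small for this number of rows')
    return X[:-n_test], X[-n_test:], y[:-n_test], y[-n_test:]

# Toy data: 10 rows, 2 features, 1 target (illustrative values only)
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = temporal_split(X_demo, y_demo, test_size=0.2)

# Fit the scaler on the training rows only, then reuse it for the test rows
demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr.shape, X_te.shape)
print(X_tr_scaled.mean(axis=0))
```

Fitting the scaler before splitting would let test-set statistics influence the training features, which is why the fit happens on the training rows alone.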

## 7. Train Models

In [None]:
# Random Forest
print('Training Random Forest...')
rf = RandomForestRegressor(n_estimators=50, max_depth=15, random_state=42, n_jobs=-1)
rf.fit(X_train, y_train)

# Metrics are averaged over both targets
y_pred_rf = rf.predict(X_test)
mae_rf = mean_absolute_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)

print(f'Random Forest - MAE: {mae_rf:.4f} ppm, R²: {r2_rf:.4f}')
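
With two targets, `mean_absolute_error` and `r2_score` return a single value averaged over both columns by default, which can hide one gas being predicted much worse than the other. The sketch below reports each target separately using `multioutput='raw_values'`; the arrays are made-up numbers, not model output.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical true and predicted values for two targets (columns)
y_true = np.array([[0.0, 10.0], [10.0, 10.0], [20.0, 0.0], [30.0, 0.0]])
y_pred = np.array([[1.0, 10.0], [9.0, 10.0], [21.0, 4.0], [29.0, 4.0]])

# One score per target column instead of a single averaged score
mae_per_target = mean_absolute_error(y_true, y_pred, multioutput='raw_values')
r2_per_target = r2_score(y_true, y_pred, multioutput='raw_values')
mae_average = mean_absolute_error(y_true, y_pred)

for name, mae, r2 in zip(['target_0', 'target_1'], mae_per_target, r2_per_target):
    print(f'{name}: MAE={mae:.2f}, R²={r2:.3f}')
print(f'Averaged MAE: {mae_average:.2f}')
```

In the notebook, the same call with `y_test` and `y_pred_rf` and the names in `target_cols` would give one MAE and one R² per gas.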

## 8. Visualize Results

In [None]:
# Predicted vs. actual
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

for i, target in enumerate(target_cols):
    axes[i].scatter(y_test[:, i], y_pred_rf[:, i], alpha=0.5, s=10)
    # Diagonal reference line: points on it are predicted exactly
    axes[i].plot([y_test[:, i].min(), y_test[:, i].max()],
                 [y_test[:, i].min(), y_test[:, i].max()],
                 'r--', linewidth=2, label='Perfect')
    axes[i].set_xlabel('Actual (ppm)')
    axes[i].set_ylabel('Predicted (ppm)')
    axes[i].set_title(f'{target}')
    axes[i].legend()
    axes[i].grid(True, alpha=0.3)

plt.suptitle('Predicted vs. Actual - Random Forest')
plt.tight_layout()
plt.show()

## 9. Residual Analysis

In [None]:
# Compute residuals (actual minus predicted)
residuals = y_test - y_pred_rf

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

for i, target in enumerate(target_cols):
    axes[i].hist(residuals[:, i], bins=50, alpha=0.7, edgecolor='black')
    axes[i].axvline(0, color='red', linestyle='--', linewidth=2)
    axes[i].set_xlabel('Residual (ppm)')
    axes[i].set_ylabel('Frequency')
    axes[i].set_title(f'{target}')
    axes[i].grid(True, alpha=0.3, axis='y')

plt.suptitle('Residual Distribution')
plt.tight_layout()
plt.show()
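
The histograms can be complemented with two numbers per target: the mean residual, which indicates systematic over- or under-prediction, and the standard deviation, which indicates spread. The sketch below computes both on assumed toy arrays rather than on the model's output.

```python
import numpy as np

# Hypothetical true and predicted values for two targets (columns)
y_true_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred_demo = np.array([[1.5, 10.0], [2.5, 19.0], [3.5, 31.0]])

# Same sign convention as the notebook: actual minus predicted
residuals_demo = y_true_demo - y_pred_demo

# A negative mean means the model over-predicts on average
bias = residuals_demo.mean(axis=0)
# Sample standard deviation of the residuals for each target
spread = residuals_demo.std(axis=0, ddof=1)

for i, (b, s) in enumerate(zip(bias, spread)):
    print(f'target_{i}: bias={b:+.2f} ppm, spread={s:.2f} ppm')
```

Here the first target has a constant offset with no spread, while the second is unbiased but scattered, two patterns that look quite different in a histogram.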

## 10. Conclusions

In this notebook we have:
1. Loaded and explored the dataset
2. Preprocessed the sensor signals
3. Created temporal features
4. Trained a Random Forest model
5. Evaluated the predictions

For the complete analysis, run the main script `proyecto_sensores_gas.py`.