<span style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">An Exception was encountered at '<a href="#papermill-error-cell">In [5]</a>'.</span>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import mlflow
import mlflow.sklearn
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder
import os
import sys

# Configure MLflow
mlflow.set_tracking_uri("http://mlflow:5000")
mlflow.set_experiment("thermal_comfort_prediction")

<Experiment: artifact_location='s3://mlflow-artifacts/1', creation_time=1764625883917, experiment_id='1', last_update_time=1764625883917, lifecycle_stage='active', name='thermal_comfort_prediction', tags={}>
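
The training cell further down prints a long notice because no git executable is available in the execution environment. The notice itself names `GIT_PYTHON_REFRESH` as the variable that silences it. A minimal sketch, assuming the variable is set before the first MLflow run starts (putting it in the setup cell is this example's choice, not something the notebook does):

```python
import os

# Set the variable only if it is not already defined, so an explicit
# setting made outside the notebook is preserved.
# "quiet" is one of the values listed in the notice itself.
os.environ.setdefault("GIT_PYTHON_REFRESH", "quiet")
```

This only hides the message. Any run metadata that depends on git stays unavailable until a git executable is installed in the environment.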

## 1. Data Loading and Cleaning

In [2]:
# Load data
data_path = "/app/data/sample_thermal_data.csv"

# If the file does not exist, generate it (delete the file to force regeneration)
if not os.path.exists(data_path):
    print("Generating synthetic data (2023-2025)...")
    sys.path.append("/app/scripts")
    from generate_data import generate_thermal_data
    # Generate only 3 years of data
    df = generate_thermal_data(years_range=(2023, 2025))
    df.to_csv(data_path, index=False)
else:
    print("Loading existing data...")
    df = pd.read_csv(data_path)

print(f"Total records: {len(df)}")
print(f"Period: {df['timestamp'].min()} to {df['timestamp'].max()}")
df.head()

Loading existing data...
Total records: 26304
Period: 2023-01-01T00:00:00 to 2025-12-31T23:00:00


|   | timestamp | temperature | humidity | wind_velocity | pressure | solar_radiation | thermal_sensation | comfort_zone |
|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01T00:00:00 | 21.13 | 78.4 | 5.95 | 1009.75 | 0.0 | 21.37 | Confortável |
| 1 | 2023-01-01T01:00:00 | 20.95 | 82.2 | 5.69 | 1016.12 | 0.0 | 21.17 | Confortável |
| 2 | 2023-01-01T02:00:00 | 19.87 | 83.8 | 2.6 | 1011.37 | 0.0 | 20.47 | Confortável |
| 3 | 2023-01-01T03:00:00 | 19.75 | 50.7 | 3.03 | 1015.94 | 0.0 | 20.21 | Confortável |
| 4 | 2023-01-01T04:00:00 | 20.61 | 77.9 | 3.04 | 1010.98 | 0.0 | 21.24 | Confortável |
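
The record count reported above can be cross-checked against the reported period. The sketch below assumes one row per hour with no gaps, which is inferred from the first rows being one hour apart and is not stated anywhere in the notebook:

```python
from datetime import datetime, timedelta

# First and last timestamps as printed by the loading cell.
first = datetime(2023, 1, 1, 0)
last = datetime(2025, 12, 31, 23)

# Number of hourly steps between the endpoints, plus one to count both ends.
expected_rows = (last - first) // timedelta(hours=1) + 1
print(expected_rows)  # 26304
```

The result matches the 26304 records loaded, which is consistent with a complete hourly series (2024 is a leap year and contributes the extra 24 rows).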


In [3]:
# Check for nulls
print(df.isnull().sum())

# Basic handling (drop rows with nulls, if any)
df = df.dropna()

timestamp            0
temperature          0
humidity             0
wind_velocity        0
pressure             0
solar_radiation      0
thermal_sensation    0
comfort_zone         0
dtype: int64


## 2. Data Preparation

In [4]:
# Features and target
X = df[['temperature', 'humidity', 'wind_velocity', 'pressure', 'solar_radiation']]
y = df['comfort_zone']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Train: {X_train.shape}, Test: {X_test.shape}")

Train: (21043, 5), Test: (5261, 5)
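
The split sizes above follow from the 26304 records and `test_size=0.2`. As a small worked check, assuming scikit-learn rounds the test count up and gives the remainder to the training set when only `test_size` is passed (which is what the printed shapes suggest):

```python
import math

n_samples = 26304   # rows loaded in section 1
test_size = 0.2     # fraction passed to train_test_split

# 0.2 * 26304 = 5260.8, rounded up for the test set.
n_test = math.ceil(test_size * n_samples)
# The training set receives every remaining row.
n_train = n_samples - n_test

print(n_train, n_test)  # 21043 5261
```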


## 3. Training and Logging to MLflow

<span id="papermill-error-cell" style="color:red; font-family:Helvetica Neue, Helvetica, Arial, sans-serif; font-size:2em;">Execution using papermill encountered an exception here and stopped:</span>

In [5]:
with mlflow.start_run():
    # Model hyperparameters
    n_estimators = 100
    max_depth = 10

    mlflow.log_param("n_estimators", n_estimators)
    mlflow.log_param("max_depth", max_depth)

    # Train the model
    clf = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth, random_state=42)
    clf.fit(X_train, y_train)

    # Predictions
    y_pred = clf.predict(X_test)

    # Metrics
    accuracy = accuracy_score(y_test, y_pred)
    print(f"Accuracy: {accuracy:.4f}")

    mlflow.log_metric("accuracy", accuracy)

    # Save the model
    mlflow.sklearn.log_model(clf, "random_forest_model")

    # Confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
    plt.title("Confusion Matrix")
    plt.ylabel("Actual")
    plt.xlabel("Predicted")
    plt.savefig("confusion_matrix.png")
    mlflow.log_artifact("confusion_matrix.png")

    print("Experiment logged to MLflow successfully!")

The git executable must be specified in one of the following ways:
    - be included in your $PATH
    - be set via $GIT_PYTHON_GIT_EXECUTABLE
    - explicitly set via git.refresh(<full-path-to-git-executable>)

All git commands will error until this is rectified.

This initial message can be silenced or aggravated in the future by setting the
$GIT_PYTHON_REFRESH environment variable. Use one of the following values:
    - quiet|q|silence|s|silent|none|n|0: for no message or exception
    - error|e|exception|raise|r|2: for a raised exception

Example:
    export GIT_PYTHON_REFRESH=quiet

Accuracy: 0.9492

🏃 View run powerful-newt-875 at: http://mlflow:5000/#/experiments/1/runs/22a76ae990e5467bb156d51c56062592
🧪 View experiment at: http://mlflow:5000/#/experiments/1


S3UploadFailedError: Failed to upload /tmp/tmpt5q6sjqx/model/python_env.yaml to mlflow-artifacts/1/models/m-1fc3d98d7bf7487d99c395359eead8f3/artifacts/python_env.yaml: An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist
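
The `NoSuchBucket` error indicates that the bucket named in the experiment's `artifact_location` (`s3://mlflow-artifacts/1`) did not exist when `log_model` tried to upload. One possible remedy is to create the bucket before the run starts. The sketch below is an assumption-laden illustration, not the project's setup: `bucket_from_uri` and `ensure_bucket` are hypothetical helper names, and the client passed in is expected to behave like a boto3 S3 client (`list_buckets` and `create_bucket` are the only calls used).

```python
from urllib.parse import urlparse


def bucket_from_uri(artifact_uri):
    """Return the bucket name of an s3:// URI such as 's3://mlflow-artifacts/1'."""
    parsed = urlparse(artifact_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {artifact_uri!r}")
    return parsed.netloc


def ensure_bucket(s3_client, bucket_name):
    """Create bucket_name if it is missing; return True when it was created.

    s3_client is assumed to expose the boto3 S3 client methods
    list_buckets() and create_bucket(Bucket=...).
    """
    existing = {b["Name"] for b in s3_client.list_buckets().get("Buckets", [])}
    if bucket_name in existing:
        return False
    s3_client.create_bucket(Bucket=bucket_name)
    return True
```

With boto3 installed, a call might look like `ensure_bucket(boto3.client("s3", endpoint_url=...), bucket_from_uri("s3://mlflow-artifacts/1"))`, where the endpoint and credentials depend on the deployment and are not shown in this notebook. On AWS regions other than `us-east-1`, `create_bucket` also needs a `CreateBucketConfiguration` argument, which this sketch omits.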