<a href="https://colab.research.google.com/github/romariomamani/-portafolio/blob/main/OPTUNA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install optuna



In [1]:
# First, install required packages
!pip install optuna==3.2.0

# Import necessary libraries
from google.colab import files
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import optuna
import os

print("Please upload your 'Real estate valuation data set.xlsx' file when prompted")
uploaded = files.upload()

print("\nFiles in current directory:")
print(os.listdir())

file_path = 'Real estate valuation data set.xlsx'
data = pd.read_excel(file_path)

print("\nFirst few rows of the dataset:")
print(data.head())

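# Split features and target. Note: the first column, `No`, is a row index and
# is kept as a feature here; use data.iloc[:, 1:-1] to exclude it.
X = data.iloc[:, :-1]  # All columns except the last one
y = data.iloc[:, -1]   # Last column (Y house price of unit area)

# Hold out 20% of the rows as a validation set
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)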

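def objective(trial):
    """Fit a random forest with the trial's hyperparameters; return validation MSE."""
    # Search ranges: number of trees and maximum tree depth
    n_estimators = trial.suggest_int('n_estimators', 50, 400)
    max_depth = trial.suggest_int('max_depth', 2, 32)

    model = RandomForestRegressor(
        n_estimators=n_estimators,
        max_depth=max_depth,
        random_state=42
    )
    model.fit(X_train, y_train)

    y_pred = model.predict(X_valid)
    mse = mean_squared_error(y_valid, y_pred)

    return mse

# Minimize validation MSE over 100 trials. The sampler is not seeded,
# so the best hyperparameters can differ from run to run.
study = optuna.create_study(direction='minimize')
study.optimize(objective, n_trials=100)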

print("\nOptimization Results:")
print("Best hyperparameters:", study.best_params)
print("Best MSE:", study.best_value)

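# Train the final model with the best hyperparameters.
# Same data and seed as the best trial, so this reproduces that trial's model.
best_model = RandomForestRegressor(
    n_estimators=study.best_params['n_estimators'],
    max_depth=study.best_params['max_depth'],
    random_state=42
)
best_model.fit(X_train, y_train)

# Predict and evaluate. These metrics use the same validation split that
# guided the search, so they are optimistic estimates of generalization.
y_pred = best_model.predict(X_valid)
mse = mean_squared_error(y_valid, y_pred)
r2 = r2_score(y_valid, y_pred)
mae = mean_absolute_error(y_valid, y_pred)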

print("\nEvaluation Metrics:")
print("Mean Squared Error (MSE):", mse)
print("R-squared (R2):", r2)
print("Mean Absolute Error (MAE):", mae)

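# Optuna's plotting functions return Plotly figures; a missing or failing
# Plotly install is reported instead of stopping the cell.
try:
    from IPython.display import display  # built in to notebooks; explicit for scripts

    print("\nGenerating visualization plots...")

    history_plot = optuna.visualization.plot_optimization_history(study)
    display(history_plot)

    slice_plot = optuna.visualization.plot_slice(study)
    display(slice_plot)

    contour_plot = optuna.visualization.plot_contour(
        study,
        params=["n_estimators", "max_depth"]
    )
    display(contour_plot)

except Exception as e:
    print(f"Error generating plots: {e}")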


Saving Real estate valuation data set.xlsx to Real estate valuation data set.xlsx

Files in current directory:
['.config', 'Real estate valuation data set.xlsx', 'sample_data']


[I 2025-02-12 14:07:20,582] A new study created in memory with name: no-name-29d17084-7269-418d-8039-e4f401d9f94a



First few rows of the dataset:
   No  X1 transaction date  X2 house age  \
0   1          2012.916667          32.0   
1   2          2012.916667          19.5   
2   3          2013.583333          13.3   
3   4          2013.500000          13.3   
4   5          2012.833333           5.0   

   X3 distance to the nearest MRT station  X4 number of convenience stores  \
0                                84.87882                               10   
1                               306.59470                                9   
2                               561.98450                                5   
3                               561.98450                                5   
4                               390.56840                                5   

   X5 latitude  X6 longitude  Y house price of unit area  
0     24.98298     121.54024                        37.9  
1     24.98034     121.53951                        42.2  
2     24.98746     121.54391                        47.3 

[I 2025-02-12 14:07:21,935] Trial 0 finished with value: 32.472513135182595 and parameters: {'n_estimators': 308, 'max_depth': 16}. Best is trial 0 with value: 32.472513135182595.
[I 2025-02-12 14:07:22,674] Trial 1 finished with value: 32.97265596166094 and parameters: {'n_estimators': 207, 'max_depth': 20}. Best is trial 0 with value: 32.472513135182595.
[I 2025-02-12 14:07:23,392] Trial 2 finished with value: 31.75186404484451 and parameters: {'n_estimators': 251, 'max_depth': 7}. Best is trial 2 with value: 31.75186404484451.
[I 2025-02-12 14:07:24,638] Trial 3 finished with value: 31.90812584656686 and parameters: {'n_estimators': 364, 'max_depth': 12}. Best is trial 2 with value: 31.75186404484451.
[I 2025-02-12 14:07:25,077] Trial 4 finished with value: 33.32086292060482 and parameters: {'n_estimators': 125, 'max_depth': 14}. Best is trial 2 with value: 31.75186404484451.
[I 2025-02-12 14:07:25,903] Trial 5 finished with value: 33.090729558602725 and parameters: {'n_estimators':


Optimization Results:
Best hyperparameters: {'n_estimators': 56, 'max_depth': 8}
Best MSE: 30.95351444577708

Evaluation Metrics:
Mean Squared Error (MSE): 30.95351444577708
R-squared (R2): 0.8154890225047953
Mean Absolute Error (MAE): 3.795372029650292
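
MSE is in squared target units, which makes it awkward to compare with the MAE. A minimal sketch of the conversion to RMSE, using only the two values printed above (the variable names are illustrative, not part of the notebook):

```python
import math

# Values copied from the evaluation output above
mse = 30.95351444577708
mae = 3.795372029650292

# RMSE is the square root of MSE, so it is in the same units as the target column
rmse = math.sqrt(mse)

print(f"RMSE: {rmse:.2f}")
print(f"MAE:  {mae:.2f}")
```

The RMSE comes out at roughly 5.56, noticeably above the MAE of about 3.80. RMSE is never smaller than MAE, and a gap of this size usually suggests that a few large errors contribute disproportionately to the squared loss.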

Generating visualization plots...
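
The best MSE and the evaluation MSE reported above are identical because the final model is scored on the same validation split that guided the search. One common way to get a less optimistic estimate is to keep a third, held-out test split. The sketch below illustrates the idea on synthetic data, with a small loop over `max_depth` standing in for the Optuna study; the data, the 60/20/20 proportions, and the candidate depths are all assumptions chosen for illustration, not the notebook's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption: 400 rows, 6 features)
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=400)

# First carve off 20% as a test set that is only touched once, at the end
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Then split the remainder 75/25, giving 60% train and 20% validation overall
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42
)

# Hypothetical stand-in for the hyperparameter search: pick max_depth on validation MSE
best_depth, best_valid_mse = None, float("inf")
for depth in (2, 4, 8):
    model = RandomForestRegressor(n_estimators=50, max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    valid_mse = mean_squared_error(y_valid, model.predict(X_valid))
    if valid_mse < best_valid_mse:
        best_depth, best_valid_mse = depth, valid_mse

# Refit on train + validation, then score once on the untouched test set
final_model = RandomForestRegressor(
    n_estimators=50, max_depth=best_depth, random_state=42
)
final_model.fit(X_trainval, y_trainval)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))

print("Selected max_depth:", best_depth)
print("Validation MSE (used for selection):", best_valid_mse)
print("Test MSE (held out):", test_mse)
```

In the notebook itself, the same pattern would mean calling `train_test_split` twice before defining `objective`, and reporting the final metrics on the test split rather than on `X_valid`.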
