# Exercise: Hyperparameter optimization
- Using the model and validation data from the exercise [Exercise: Classification with XGBoost](../module_4/4_03.ipynb), optimize the max_depth parameter of XGBoost. Use HyperparameterTuner for this.
- For the max_depth parameter, use for example sagemaker.parameter.CategoricalParameter([2, 3, 4, 5, 6, 7, 8]).
- Use "validation:auc" as the objective metric of the HyperparameterTuner.
- Visualize the result.
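The objective metric "validation:auc" is the area under the ROC curve that XGBoost reports on the validation channel. The sketch below is only an illustration of what that number measures. It uses the standard library and the rank (Mann-Whitney) interpretation: AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counting as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and do not come from this exercise's data.

```python
from itertools import product


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of ROC AUC for binary 0/1 labels."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0  # positive ranked above the negative
        elif p == n:
            wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Toy example with hypothetical values
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect separation, which is why the tuner below is configured to maximize this metric.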


In [3]:
import sagemaker

role = sagemaker.get_execution_role()
sess = sagemaker.Session()
region = sess.boto_region_name

bucket = sess.default_bucket()
prefix = 'module_4/part_3'

print(role)
print(sess)
print(region)
print(bucket)
print(prefix)

arn:aws:iam::467432373215:role/service-role/AmazonSageMaker-ExecutionRole-20221206T164397
<sagemaker.session.Session object at 0x7f9125c3c0d0>
eu-west-1
sagemaker-eu-west-1-467432373215
module_4/part_3


In [4]:
image = sagemaker.image_uris.retrieve("xgboost", region, "1.5-1")
print(image)

141502667606.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-xgboost:1.5-1


In [5]:
s3_train_data = f's3://{bucket}/{prefix}/data/train.csv'
s3_validation_data = f's3://{bucket}/{prefix}/data/validation.csv'

print(s3_train_data)
print(s3_validation_data)


s3://sagemaker-eu-west-1-467432373215/module_4/part_3/data/train.csv
s3://sagemaker-eu-west-1-467432373215/module_4/part_3/data/validation.csv


In [6]:
train_input = sagemaker.TrainingInput(
    s3_train_data, 
    content_type="text/csv",
)
validation_input = sagemaker.TrainingInput(
    s3_validation_data,
    content_type="text/csv",
)

data_channels = {
    'train': train_input, 
    'validation': validation_input
}
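Both channels are declared as text/csv. For CSV input, the SageMaker built-in XGBoost algorithm expects the target variable in the first column and no header row. The CSV files in S3 were produced in the earlier exercise; the pandas sketch below only illustrates that layout. The DataFrame, its feature columns and the target column name "target" are hypothetical.

```python
import io

import pandas as pd

# Hypothetical data: two features and a binary target column named "target"
data = pd.DataFrame(
    {
        "feature_a": [0.5, 1.2, 3.3],
        "feature_b": [10, 20, 30],
        "target": [0, 1, 0],
    }
)

# Move the target to the first column, as built-in XGBoost expects for text/csv
columns = ["target"] + [c for c in data.columns if c != "target"]
ordered = data[columns]

# Write without a header row and without the index
buffer = io.StringIO()
ordered.to_csv(buffer, header=False, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to a file path instead of the buffer (same header=False, index=False arguments) gives a file in the shape expected by train.csv and validation.csv.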


In [22]:
s3_output_location = f's3://{bucket}/{prefix}/output'

hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "objective": "binary:logistic",
    "num_round": "50",
    "eval_metric": "auc",
}


estimator = sagemaker.estimator.Estimator(
    image_uri=image,
    role=role,
    instance_count=1,
    hyperparameters=hyperparameters,
    instance_type="ml.c4.xlarge",
    output_path=s3_output_location,
    sagemaker_session=sess,
)
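These fixed hyperparameters are not all used as-is once the estimator is handed to the tuner. As far as we can tell from the SageMaker Python SDK source, the tuner drops any fixed estimator hyperparameter whose name also appears in the tuning ranges, so the static "max_depth": "5" is replaced by a sampled value in every tuning job, while eta, gamma, num_round and the rest stay fixed. The plain-dict sketch below illustrates that composition; it is not the SDK's code. The names `fixed_hyperparameters`, `tuned_names`, `static` and `sampled` are hypothetical, and the sampled values are the ones of job xgboost-quiebras-opt-3-014 in the results table.

```python
# Fixed hyperparameters, same values as in the estimator cell
fixed_hyperparameters = {
    "max_depth": "5",
    "eta": "0.2",
    "gamma": "4",
    "min_child_weight": "6",
    "subsample": "0.7",
    "objective": "binary:logistic",
    "num_round": "50",
    "eval_metric": "auc",
}
# Names that the tuner samples (keys of hyperparameter_ranges)
tuned_names = {"max_depth", "alpha", "lambda"}

# Static part: every fixed value whose name is not being tuned
static = {k: v for k, v in fixed_hyperparameters.items() if k not in tuned_names}

# Values sampled for one tuning job (taken from job ...-014 in the results)
sampled = {"max_depth": 8, "alpha": 2.648858, "lambda": 9.637087}

# The training job sees the static values plus its own sampled ones, as strings
job_hyperparameters = {**static, **{k: str(v) for k, v in sampled.items()}}
print(job_hyperparameters["max_depth"], job_hyperparameters["eta"])  # 8 0.2
```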


In [30]:
# https://sagemaker.readthedocs.io/en/stable/api/training/parameter.html#sagemaker.parameter.ParameterRange
# https://sagemaker-examples.readthedocs.io/en/latest/hyperparameter_tuning/xgboost_random_log/hpo_xgboost_random_log.html
# Instead of the suggested CategoricalParameter([2, 3, 4, 5, 6, 7, 8]), max_depth is tuned here as an
# integer range from 2 to 10 (both inclusive), together with the L1 (alpha) and L2 (lambda)
# regularization terms, sampled on a logarithmic scale between 0.01 and 10.
hyperparameter_ranges = {
    "max_depth": sagemaker.parameter.IntegerParameter(2, 10),
    "alpha": sagemaker.parameter.ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
    "lambda": sagemaker.parameter.ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
}
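With strategy="Random" in the tuner cell, each training job draws its values independently from these ranges. scaling_type="Logarithmic" means alpha and lambda are sampled uniformly on a log scale, so the interval 0.01 to 0.1 is as likely as 1 to 10. The standard-library sketch below imitates that idea to show what kind of configurations 20 jobs receive; it is not SageMaker's sampler, and the helper names, the seed and the use of `random.randint` for the integer range are assumptions for illustration.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def sample_configuration(rng: random.Random) -> dict:
    """One random configuration mirroring hyperparameter_ranges."""
    return {
        "max_depth": rng.randint(2, 10),                # integer, both ends inclusive
        "alpha": sample_log_uniform(rng, 0.01, 10.0),   # logarithmic scaling
        "lambda": sample_log_uniform(rng, 0.01, 10.0),  # logarithmic scaling
    }


rng = random.Random(42)  # arbitrary seed, only for reproducibility
configs = [sample_configuration(rng) for _ in range(20)]  # one per job, max_jobs=20
for config in configs[:3]:
    print(config)
```

Because the range spans three decades, about one third of the log-uniform draws fall below 0.1, whereas a linear draw over [0.01, 10] would land there less than 1% of the time.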

In [33]:
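# https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html
tuner = sagemaker.tuner.HyperparameterTuner(
    estimator,
    objective_metric_name="validation:auc",  # AUC reported on the 'validation' channel
    hyperparameter_ranges=hyperparameter_ranges,
    objective_type="Maximize",  # higher AUC is better
    max_jobs=20,                # total number of training jobs
    max_parallel_jobs=10,       # training jobs running at the same time
    strategy="Random",          # each job samples its values independently
)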

In [34]:
jobname = 'xgboost-quiebras-opt-3'
# Launch the tuning job; by default fit() waits until all training jobs finish
tuner.fit(
    inputs=data_channels,
    job_name=jobname,
)

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


.................................................................!


- The results can be inspected with HyperparameterTuningJobAnalytics.
- They can also be viewed on the Experiments screen.

In [35]:
# One row per training job launched by the tuning job
df = sagemaker.HyperparameterTuningJobAnalytics(
    tuner.latest_tuning_job.job_name
).dataframe()
df

| | alpha | lambda | max_depth | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---:|---:|---:|---:|---|---|---:|---|---|---:|
| 0 | 6.766728 | 0.490861 | 5.0 | xgboost-quiebras-opt-3-020-9cc29ecf | Completed | 0.9298 | 2022-12-14 11:43:54+00:00 | 2022-12-14 11:44:21+00:00 | 27.0 |
| 1 | 0.508403 | 0.012132 | 10.0 | xgboost-quiebras-opt-3-019-6754a896 | Completed | 0.93 | 2022-12-14 11:43:53+00:00 | 2022-12-14 11:44:19+00:00 | 26.0 |
| 2 | 0.023224 | 2.260909 | 2.0 | xgboost-quiebras-opt-3-018-6d2046f7 | Completed | 0.92353 | 2022-12-14 11:44:38+00:00 | 2022-12-14 11:45:06+00:00 | 28.0 |
| 3 | 0.06197 | 5.324785 | 8.0 | xgboost-quiebras-opt-3-017-f376e7f9 | Completed | 0.9238 | 2022-12-14 11:43:39+00:00 | 2022-12-14 11:44:06+00:00 | 27.0 |
| 4 | 0.505031 | 6.377878 | 2.0 | xgboost-quiebras-opt-3-016-2b89745e | Completed | 0.9307 | 2022-12-14 11:43:33+00:00 | 2022-12-14 11:44:00+00:00 | 27.0 |
| 5 | 1.27238 | 0.106287 | 6.0 | xgboost-quiebras-opt-3-015-c53164a6 | Completed | 0.92041 | 2022-12-14 11:44:12+00:00 | 2022-12-14 11:44:39+00:00 | 27.0 |
| 6 | 2.648858 | 9.637087 | 8.0 | xgboost-quiebras-opt-3-014-c1d2372e | Completed | 0.93364 | 2022-12-14 11:43:28+00:00 | 2022-12-14 11:43:55+00:00 | 27.0 |
| 7 | 0.101433 | 3.635361 | 5.0 | xgboost-quiebras-opt-3-013-a8d961ba | Completed | 0.92551 | 2022-12-14 11:43:26+00:00 | 2022-12-14 11:43:53+00:00 | 27.0 |
| 8 | 0.521182 | 0.292266 | 5.0 | xgboost-quiebras-opt-3-012-86895bc6 | Completed | 0.91551 | 2022-12-14 11:43:16+00:00 | 2022-12-14 11:43:43+00:00 | 27.0 |
| 9 | 1.216255 | 1.681876 | 3.0 | xgboost-quiebras-opt-3-011-ba8e7ce2 | Completed | 0.92991 | 2022-12-14 11:43:13+00:00 | 2022-12-14 11:43:40+00:00 | 27.0 |


In [37]:
df.sort_values(by='FinalObjectiveValue', ascending=False)

| | alpha | lambda | max_depth | TrainingJobName | TrainingJobStatus | FinalObjectiveValue | TrainingStartTime | TrainingEndTime | TrainingElapsedTimeSeconds |
|---:|---:|---:|---:|---|---|---:|---|---|---:|
| 14 | 2.676781 | 5.136473 | 8.0 | xgboost-quiebras-opt-3-006-498c71a1 | Completed | 0.93428 | 2022-12-14 11:42:37+00:00 | 2022-12-14 11:43:44+00:00 | 67.0 |
| 6 | 2.648858 | 9.637087 | 8.0 | xgboost-quiebras-opt-3-014-c1d2372e | Completed | 0.93364 | 2022-12-14 11:43:28+00:00 | 2022-12-14 11:43:55+00:00 | 27.0 |
| 4 | 0.505031 | 6.377878 | 2.0 | xgboost-quiebras-opt-3-016-2b89745e | Completed | 0.9307 | 2022-12-14 11:43:33+00:00 | 2022-12-14 11:44:00+00:00 | 27.0 |
| 1 | 0.508403 | 0.012132 | 10.0 | xgboost-quiebras-opt-3-019-6754a896 | Completed | 0.93 | 2022-12-14 11:43:53+00:00 | 2022-12-14 11:44:19+00:00 | 26.0 |
| 9 | 1.216255 | 1.681876 | 3.0 | xgboost-quiebras-opt-3-011-ba8e7ce2 | Completed | 0.92991 | 2022-12-14 11:43:13+00:00 | 2022-12-14 11:43:40+00:00 | 27.0 |
| 0 | 6.766728 | 0.490861 | 5.0 | xgboost-quiebras-opt-3-020-9cc29ecf | Completed | 0.9298 | 2022-12-14 11:43:54+00:00 | 2022-12-14 11:44:21+00:00 | 27.0 |
| 16 | 0.201479 | 0.487739 | 2.0 | xgboost-quiebras-opt-3-004-643b1d6a | Completed | 0.92979 | 2022-12-14 11:42:35+00:00 | 2022-12-14 11:43:17+00:00 | 42.0 |
| 18 | 0.13572 | 5.012392 | 9.0 | xgboost-quiebras-opt-3-002-6fc81073 | Completed | 0.92787 | 2022-12-14 11:42:23+00:00 | 2022-12-14 11:43:25+00:00 | 62.0 |
| 10 | 0.066393 | 3.728523 | 9.0 | xgboost-quiebras-opt-3-010-0c04465d | Completed | 0.92597 | 2022-12-14 11:42:35+00:00 | 2022-12-14 11:43:28+00:00 | 53.0 |
| 7 | 0.101433 | 3.635361 | 5.0 | xgboost-quiebras-opt-3-013-a8d961ba | Completed | 0.92551 | 2022-12-14 11:43:26+00:00 | 2022-12-14 11:43:53+00:00 | 27.0 |
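The exercise also asks to visualize the result. A simple first step is to pick the best job and summarize the objective per max_depth. The pandas sketch below rebuilds the ten rows printed by the sorted cell (only the job name, max_depth and FinalObjectiveValue columns) into a hypothetical `results` DataFrame; with the full `df` from HyperparameterTuningJobAnalytics the same calls should apply. Because only these ten jobs are included, the summary describes that subset, not all 20 jobs.

```python
import pandas as pd

# The ten best jobs as printed above, sorted by FinalObjectiveValue
results = pd.DataFrame(
    {
        "TrainingJobName": [
            "xgboost-quiebras-opt-3-006-498c71a1",
            "xgboost-quiebras-opt-3-014-c1d2372e",
            "xgboost-quiebras-opt-3-016-2b89745e",
            "xgboost-quiebras-opt-3-019-6754a896",
            "xgboost-quiebras-opt-3-011-ba8e7ce2",
            "xgboost-quiebras-opt-3-020-9cc29ecf",
            "xgboost-quiebras-opt-3-004-643b1d6a",
            "xgboost-quiebras-opt-3-002-6fc81073",
            "xgboost-quiebras-opt-3-010-0c04465d",
            "xgboost-quiebras-opt-3-013-a8d961ba",
        ],
        "max_depth": [8.0, 8.0, 2.0, 10.0, 3.0, 5.0, 2.0, 9.0, 9.0, 5.0],
        "FinalObjectiveValue": [
            0.93428, 0.93364, 0.9307, 0.93, 0.92991,
            0.9298, 0.92979, 0.92787, 0.92597, 0.92551,
        ],
    }
)

# Best job according to validation:auc
best = results.loc[results["FinalObjectiveValue"].idxmax()]
print(best["TrainingJobName"], best["FinalObjectiveValue"])

# Best and mean AUC per max_depth, plus how many of these jobs used each depth
summary = (
    results.groupby("max_depth")["FinalObjectiveValue"]
    .agg(["max", "mean", "count"])
    .sort_values("max", ascending=False)
)
print(summary)
```

With matplotlib installed, `df.plot.scatter(x="max_depth", y="FinalObjectiveValue")` gives a quick plot of the same relationship over all jobs, and `tuner.best_training_job()` returns the name of the best job directly. In this subset all ten AUC values lie between 0.9255 and 0.9343, so no single depth clearly dominates.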
