Here’s the revised section with detailed information on setting up metrics for hyperparameter tuning jobs:

---

## Hyperparameter Tuning in SageMaker: Neural Network Example

To conduct efficient hyperparameter tuning with neural networks in SageMaker, we’ll leverage SageMaker’s **hyperparameter tuning jobs** while carefully managing parameter ranges and model count. Here’s an overview of the process, with a focus on both efficiency and cost-effectiveness.

### Key Steps for Hyperparameter Tuning

1. **Define Parameter Ranges**: SageMaker supports both `ContinuousParameter` and `CategoricalParameter` for tuning. 
   - **`ContinuousParameter`** is ideal when you want to evenly sample numeric values across a range without specifying each exact value. 
   - **`CategoricalParameter`** allows you to explicitly set the values SageMaker will test, ensuring only specific values are sampled.
   
2. **Set Up Objective Metrics**: SageMaker relies on objective metrics to assess models and choose the best configuration. Here’s how to set up metrics:
   - In the training script, **log the target metric** (e.g., validation accuracy or loss) using `print` statements formatted to match the expected regular expression.
   - **Define `metric_definitions`** in both the estimator and tuner configurations. This redundancy can help avoid common errors and ensure the metric is captured correctly.
   - Example of logging and regex setup:
     ```python
     print(f"validation:accuracy = {val_accuracy:.4f}", flush=True)
     ```
   - The `metric_definitions` parameter uses a regular expression to match the logged output, enabling SageMaker to track the desired metric across training jobs.

3. **Resource-Conscious Approach**: To control costs, choose efficient instance types and limit the search to impactful parameters, keeping resource consumption in check.

### Balancing Tuning Time and Environmental/Monetary Costs

Understanding the roles of `instance_count`, `max_jobs`, and `max_parallel_jobs` is key for optimizing tuning time and cost:

1. **`instance_count`** (Estimator):
   - Specifies the number of instances used per training job. 
   - Increasing `instance_count` can benefit large datasets or deep networks needing distributed power, but it raises per-job costs. 
   - **Start with `instance_count=1`** (especially when you're first testing your setup) and increase only if single-instance training is inefficient.

2. **`max_jobs`** (HyperparameterTuner):
   - Controls the total number of training jobs, representing distinct hyperparameter combinations.
   - Begin with a moderate range, such as **10–20 jobs**; expand if further performance improvements justify it.

3. **`max_parallel_jobs`** (HyperparameterTuner):
   - Sets the number of jobs to run concurrently, impacting overall tuning time but not total cost.
   - **Recommended range**: 2–4 for efficiency; increase only if time constraints and budget permit.

---

### Code Example for SageMaker Hyperparameter Tuning with Neural Networks

This setup provides:
- **Explicit control** over `epochs` using `CategoricalParameter`, allowing targeted testing of specific values.
- **Efficient sampling** for `learning_rate` using `ContinuousParameter`, covering a defined range for a balanced approach.
- **Cost control** by setting moderate `max_jobs` and `max_parallel_jobs`.

By managing these settings and configuring metrics properly, you can achieve a balanced and efficient approach to hyperparameter tuning for neural networks.

In [8]:
import sagemaker
from sagemaker.tuner import HyperparameterTuner, IntegerParameter, ContinuousParameter, CategoricalParameter
from sagemaker.pytorch import PyTorch
from sagemaker.inputs import TrainingInput
from sagemaker import get_execution_role

# Initialize SageMaker session and role
session = sagemaker.Session()
role = get_execution_role()
bucket = 'titanic-dataset-test'  # replace with your S3 bucket name

# Define the PyTorch estimator with entry script and environment details
pytorch_estimator = PyTorch(
    entry_point="test_AWS/train_nn.py",  # Your script for training
    role=role,
    instance_count=1,
    instance_type="ml.m5.large",
    framework_version="1.9",
    py_version="py38",
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "validation:accuracy = ([0-9\\.]+)"}],
    hyperparameters={
        "train": "/opt/ml/input/data/train/train_data.npz",  # SageMaker will mount this path
        "val": "/opt/ml/input/data/val/val_data.npz",        # SageMaker will mount this path
        "epochs": 100,
        "learning_rate": 0.001
    },
    sagemaker_session=session,
)

# Hyperparameter tuning ranges
hyperparameter_ranges = {
    "epochs": CategoricalParameter([100, 1000, 10000]),       # Adjust as needed
    "learning_rate": ContinuousParameter(0.001, 0.1),  # Range for continuous values
}

No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config
No finished training job found associated with this estimator. Please make sure this estimator is only used for building workflow config


.......................................................................................................................................................!
Hyperparameter tuning job launched.


Before running the full search, let's test our setup by setting max_jobs = 1. This will test just one possible hyperparameter configuration. 

In [None]:
# Tuner configuration
tuner = HyperparameterTuner(
    estimator=pytorch_estimator,
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "validation:accuracy = ([0-9\\.]+)"}],
    objective_metric_name="validation:accuracy",  # Ensure this matches the metric name exactly
    objective_type="Maximize",                   # Specify if maximizing or minimizing the metric
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=1,                # Adjust based on exploration needs (please keep below 30 to be kind to environment)
    max_parallel_jobs=1         # Adjust based on available resources and budget
)

# Define the input paths
train_input = TrainingInput(f"s3://{bucket}/train_data.npz", content_type="application/x-npz")
val_input = TrainingInput(f"s3://{bucket}/val_data.npz", content_type="application/x-npz")

# Launch the hyperparameter tuning job
tuner.fit({"train": train_input, "val": val_input})
print("Hyperparameter tuning job launched.")

If all goes well, we can scale up the experiment with the below code (20 hyperparameter configurations).

After running the below cell, we can check on the progress by visiting the SageMaker Console and finding the "Training" tab located on the left panel. Click "Hyperparmater tuning jobs" to view running jobs.

If you're seeing only "2/4 training completed" in the console, it may be because SageMaker initially schedules only a subset of the total jobs to run simultaneously, based on your setting for max_parallel_jobs. Here’s what’s happening:

* Initial Jobs: SageMaker starts by running only max_parallel_jobs (2 in this case) as the initial batch. As each job completes, new jobs from the remaining pool are triggered until max_jobs (20) is reached.
* Batch Scheduling: The "2/4" progress indicates that SageMaker has grouped the jobs into batches and is currently processing the first batch of four jobs (two of which are either in progress or completed). This behavior helps SageMaker optimize resources and allows monitoring to quickly identify high-performing configurations to prioritize in the subsequent batches.
* Job Completion: Once the first few jobs complete, SageMaker will continue to launch the remaining jobs up to the maximum of 20, but no more than two at a time.

This approach is designed to balance efficient exploration with resource constraints. If you want more jobs to start simultaneously, consider increasing max_parallel_jobs, but keep in mind the potential cost implications of running many jobs concurrently.

In [None]:
# Tuner configuration
tuner = HyperparameterTuner(
    estimator=pytorch_estimator,
    metric_definitions=[{"Name": "validation:accuracy", "Regex": "validation:accuracy = ([0-9\\.]+)"}],
    objective_metric_name="validation:accuracy",  # Ensure this matches the metric name exactly
    objective_type="Maximize",                   # Specify if maximizing or minimizing the metric
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,                # Adjust based on exploration needs (please keep below 30 to be kind to environment)
    max_parallel_jobs=4         # Adjust based on available resources and budget
)

# Define the input paths
train_input = TrainingInput(f"s3://{bucket}/train_data.npz", content_type="application/x-npz")
val_input = TrainingInput(f"s3://{bucket}/val_data.npz", content_type="application/x-npz")

# Launch the hyperparameter tuning job
tuner.fit({"train": train_input, "val": val_input})
print("Hyperparameter tuning job launched.")

### Can/should we run more instances in parallel?
Setting max_parallel_jobs to 20 (equal to max_jobs) will indeed launch all 20 jobs in parallel. This approach won’t affect the total cost (since cost is based on the number of total jobs, not how many run concurrently), but it can impact the final results and resource usage pattern due to SageMaker's ability to dynamically select hyperparameter values to test to maximize efficiency and improve model performance. This adaptability is especially useful for neural networks, which often have a large hyperparameter space with complex interactions. Here’s how SageMaker’s approach impacts typical neural network training:

### 1. **Adaptive Search Strategies**
   - SageMaker offers **Bayesian optimization** for hyperparameter tuning. Instead of purely random sampling, it learns from previous jobs to choose the next set of hyperparameters more likely to improve the objective metric.
   - For neural networks, this strategy can help converge on better-performing configurations faster by favoring promising areas of the hyperparameter space and discarding poor ones.

### 2. **Effect of `max_parallel_jobs` on Adaptive Tuning**
   - When using Bayesian optimization, a lower `max_parallel_jobs` (e.g., 2–4) can allow SageMaker to iteratively adjust and improve its choices. Each batch of jobs informs the subsequent batch, which may yield better results over time.
   - Conversely, if all jobs are run in parallel (e.g., `max_parallel_jobs=20`), SageMaker can’t learn and adapt within a single batch, making this setup more like a traditional grid or random search. This approach is still valid, especially for small search spaces, but it doesn’t leverage the full potential of adaptive tuning.

### 3. **Practical Impact on Neural Network Training**
   - **For simpler models** or smaller parameter ranges, running jobs in parallel with a higher `max_parallel_jobs` works well and quickly completes the search.
   - **For more complex neural networks** or large hyperparameter spaces, an adaptive strategy with a smaller `max_parallel_jobs` may yield a better model with fewer total jobs by fine-tuning hyperparameters over multiple iterations.

### Summary
- **For fast, straightforward tuning**: Set `max_parallel_jobs` closer to `max_jobs` for simultaneous testing.
- **For adaptive, refined tuning**: Use a smaller `max_parallel_jobs` (like 2–4) to let SageMaker leverage adaptive tuning for optimal configurations. 

This balance between exploration and exploitation is particularly impactful in neural network tuning, where training costs can be high and parameters interact in complex ways.


### Important Details and Best Practices
1. **Always Test Before Scaling Up**: Before tunning the full search, make sure to test your code setup with max_jobs set to 1.

1. **Parameter Ranges**:
   - **Learning Rate**: Cover a wide range (0.0001 to 0.01) as it often has a significant effect.
   - **Batch Size**: Typically smaller values are computationally lighter but may require more tuning. Ranging from 16 to 64 here.
   - **Hidden Units**: Adjust the number of units per layer within a moderate range (32 to 128) to optimize model complexity without excess.

2. **Objective Metric**: Choose `validation_loss` to minimize overfitting; it’s parsed from log outputs by the specified `Regex`.

3. **Instance Count for Tuning**:
   - **Parallel Jobs**: Run up to 5 jobs in parallel to utilize multiple instances and speed up the search.
   - **Instance Type**: `ml.m5.large` balances cost and performance for this job, though `ml.m5.xlarge` is a possible upgrade if needed.

4. **Environmental Considerations**:
   - By limiting hyperparameter jobs to 10-20, we reduce carbon footprint and resource usage. Only critical parameters are tuned, and a limited range helps avoid unnecessary jobs.
   - Encourage reusing efficient hyperparameters in similar tasks, reducing the need for repeated tuning.

5. **Monitoring and Results**:
   - SageMaker’s `tuner` job will report validation loss for each configuration, and you can select the optimal one based on these results.
   - Consider logging model performances and job configurations to guide future jobs, reducing redundancy.

This setup demonstrates a responsible yet effective approach to hyperparameter tuning with distributed SageMaker instances, allowing for both environmental and cost efficiency without sacrificing performance.

---

### 1. Retrieve the Best Training Job from the Tuning Job
First, let’s retrieve details about the best model from the tuning job. This will include the best hyperparameters and access to the model artifacts.


In [9]:
# Get the best training job from the completed tuning job
best_training_job = tuner.best_training_job()
print("Best training job name:", best_training_job)

# Retrieve best hyperparameters
best_job_desc = session.sagemaker_client.describe_training_job(TrainingJobName=best_training_job)
best_hyperparameters = best_job_desc["HyperParameters"]
print("Best hyperparameters:", best_hyperparameters)

Best training job name: pytorch-training-241029-2307-015-b999e972
Best hyperparameters: {'_tuning_objective_metric': 'validation:accuracy', 'epochs': '"100"', 'learning_rate': '0.002292278623021597', 'sagemaker_container_log_level': '20', 'sagemaker_estimator_class_name': '"PyTorch"', 'sagemaker_estimator_module': '"sagemaker.pytorch.estimator"', 'sagemaker_job_name': '"pytorch-training-2024-10-29-23-07-03-095"', 'sagemaker_program': '"train_nn.py"', 'sagemaker_region': '"us-east-1"', 'sagemaker_submit_directory': '"s3://sagemaker-us-east-1-183295408236/pytorch-training-2024-10-29-23-07-03-095/source/sourcedir.tar.gz"', 'train': '"/opt/ml/input/data/train/train_data.npz"', 'val': '"/opt/ml/input/data/val/val_data.npz"'}


In [12]:
import boto3

# # Initialize SageMaker client
sagemaker_client = boto3.client("sagemaker")

# Retrieve tuning job details
tuning_job_name = tuner.latest_tuning_job.name  # Replace with your tuning job name if needed
tuning_job_desc = sagemaker_client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)

# Retrieve all training jobs for the tuning job
training_jobs = sagemaker_client.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name, StatusEquals='Completed'
)["TrainingJobSummaries"]

# Calculate total training and billing time
total_training_time = 0
total_billing_time = 0

for job in training_jobs:
    job_name = job["TrainingJobName"]
    job_desc = sagemaker_client.describe_training_job(TrainingJobName=job_name)
    
    # Calculate training time (in seconds)
    training_time = job_desc["TrainingEndTime"] - job_desc["TrainingStartTime"]
    total_training_time += training_time.total_seconds()
    
    # Calculate billed time (in seconds, rounded up to the nearest second)
    billed_time = job_desc["ResourceConfig"]["InstanceCount"] * training_time.total_seconds()
    total_billing_time += billed_time

# Print total compute and billing time
print(f"Total training time across all jobs: {total_training_time / 3600:.2f} hours")
print(f"Estimated total billing time across all jobs: {total_billing_time / 3600:.2f} hours")


Total training time across all jobs: 0.12 hours
Estimated total billing time across all jobs: 0.12 hours




### 2. Deploy the Best Model as an Endpoint (Optional)
If you want to deploy the best model as a SageMaker endpoint for real-time inference, you can use the following code:

```python
# Deploy the best model to an endpoint
best_predictor = tuner.deploy(
    initial_instance_count=1, 
    instance_type="ml.m5.large"
)

# Example inference call
import numpy as np
test_data = np.array([[...]])  # Replace with sample test data
predictions = best_predictor.predict(test_data)
print("Predictions:", predictions)
```

### 3. Download the Best Model Artifacts for Local Evaluation
If you want to use the best model locally or load it for further analysis, you can download the model artifacts from S3:

```python
import boto3

# Specify where to save the model locally
local_model_path = "best_model.tar.gz"

# Get the model artifacts S3 path
model_s3_path = best_job_desc["ModelArtifacts"]["S3ModelArtifacts"]
print("Model artifact S3 path:", model_s3_path)

# Download the model
s3 = boto3.client("s3")
s3.download_file(
    Bucket=model_s3_path.split('/')[2],
    Key='/'.join(model_s3_path.split('/')[3:]),
    Filename=local_model_path
)
print("Best model downloaded to:", local_model_path)
```

### 4. Unpack and Load the Model Locally (for PyTorch example)
Assuming this is a PyTorch model, you can unpack and load it:

```python
import torch

# Extract and load the model (assuming the model is saved as `model.pth` in the archive)
import tarfile

with tarfile.open(local_model_path, "r:gz") as tar:
    tar.extractall("model_dir")

# Load the model
model = TitanicNet()  # Initialize your model class
model.load_state_dict(torch.load("model_dir/nn_model.pth"))
model.eval()  # Set to evaluation mode

# Now `model` is ready for inference or evaluation on test data
```

### 5. Evaluate the Best Model on a Test Set
Use your test dataset for final evaluation, as follows:

```python
# Assuming you have test data in numpy format
test_data = np.load("test_data.npz")
X_test = torch.tensor(test_data['X_test'], dtype=torch.float32)
y_test = torch.tensor(test_data['y_test'], dtype=torch.float32).unsqueeze(1)

# Inference and evaluation
with torch.no_grad():
    predictions = model(X_test)
    predictions = predictions.round()  # Round for binary classification
    accuracy = (predictions == y_test).float().mean().item()
    print(f"Test Set Accuracy: {accuracy:.4f}")
```

These steps provide a comprehensive approach to accessing, deploying, and evaluating the best model from your tuning job.