## Hyperparameter tuning of the server-side optimizer with Optuna

This notebook shows how to tune hyperparameters of the server-side optimizer, specifically the *learning rate* of *FedAdam*, using the Optuna package. Optuna supports Bayesian optimization for the selection of hyperparameter values. Other hyperparameters and other server-side optimizers can be tuned analogously. The notebook *Aggregators.ipynb* shows how to use different aggregators with the FEDn Python API.

For a complete list of implemented interfaces, please refer to the [FEDn APIs](https://fedn.readthedocs.io/en/latest/fedn.network.api.html#module-fedn.network.api.client). 

For details on how aggregators are implemented, we recommend reading [FEDn Framework Extensions](https://www.scaleoutsystems.com/post/fedn-framework-extensions).

Before starting this tutorial, make sure you have a project running in FEDn Studio and have created the compute package and the initial model. If you're not sure how to do this, please follow the instructions in sections 1, 2, and 3 of the [quickstart guide](https://fedn.readthedocs.io/en/latest/quickstart.html). 

In [36]:
from fedn import APIClient
import time
import json
import numpy as np

In this example, we assume the project is hosted on the public FEDn Studio. You can find the CONTROLLER_HOST address in the project dashboard.

**Note:** If you're using a local sandbox, the CONTROLLER_HOST will be `localhost` and the CONTROLLER_PORT will be `8092`.

Next, you'll need to generate an access token. To do this, go to the project page in FEDn Studio, click on "Settings", then "Generate token". Copy the access token from the Studio and paste it into the notebook. For further details, see the [FEDn ClientAPIs](https://fedn.readthedocs.io/en/latest/apiclient.html#).

In [None]:
CONTROLLER_HOST = 'fedn.scaleoutsystems.com/<your-project-name>'
ACCESS_TOKEN = '<your-access-token>'
client = APIClient(CONTROLLER_HOST, token=ACCESS_TOKEN, secure=True, verify=True)

Initialize FEDn with the compute package and seed model. Note that these files need to be created separately. If you're not sure how to do this, follow the instructions in section 3 of the [quickstart guide](https://fedn.readthedocs.io/en/latest/quickstart.html#create-the-compute-package-and-seed-model).

In [None]:
client.set_active_package('../mnist-pytorch/package.tgz', 'numpyhelper')
client.set_active_model('../mnist-pytorch/seed.npz')
seed_model = client.get_active_model()

### Using Optuna to tune the server-side learning rate of FedAdam
The Optuna framework expects the user to define an objective function, which evaluates the model for a given set of hyperparameter values. This notebook is based on an existing example on the [FEDn GitHub](https://github.com/scaleoutsystems/fedn/tree/master/examples/mnist-pytorch), which trains a simple PyTorch model on the MNIST handwritten digit dataset. To compare hyperparameter values, we treat the accuracy on the test set as the validation accuracy and look for the learning rate that maximizes this metric.

### Defining the objective function

For each choice of hyperparameter values, we start a new FEDn session with a given number of rounds and train the global model with those values. When the session has finished, we evaluate the performance attained in the session. This is where the objective function comes into play. The objective function should follow these steps:

1. Set a range for each hyperparameter to tune using the `trial` object in Optuna.
2. **Train the model**, using the hyperparameters suggested by Optuna.
3. Calculate and **return an evaluation metric**.

But before we define the objective function, we will create a function that defines how the evaluation metric shall be calculated (step 3) after each finished session. Below are two suggested methods for evaluating the performance attained in a session:

* **Highest score** - select the highest achieved test accuracy out of all rounds in the session.
* **Average final few rounds** - compute the average test accuracy over the final few (e.g., 5) rounds to account for the stochastic nature of the test accuracy score.
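
To see how the two methods can differ, here is a minimal sketch on a made-up accuracy trace. The numbers and the helper names `highest_score` and `smooth_score` are hypothetical and not part of FEDn; they only illustrate the two scoring rules.

```python
from statistics import mean


def highest_score(round_accuracies):
    """Best test accuracy reached in any round of the session."""
    return max(round_accuracies)


def smooth_score(round_accuracies, n_last=5):
    """Mean test accuracy over the final `n_last` rounds (all rounds if fewer)."""
    n = min(n_last, len(round_accuracies))
    return mean(round_accuracies[-n:])


# Made-up per-round accuracies with one lucky spike in the third round
accuracies = [0.80, 0.85, 0.97, 0.88, 0.89, 0.90, 0.91, 0.90]

print(highest_score(accuracies))  # picks the spike
print(smooth_score(accuracies))   # averages the last five rounds
```

Here the highest score rewards the single spike (0.97), while the smoothed score (about 0.896) reflects where the model settled in the final rounds.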

The function below implements both methods using FEDn. The `eval_method` parameter selects between them: `'highest'` for the highest score and `'smooth'` for the average over the final few rounds.


In [40]:
def get_test_accuracy(client, n_rounds_in_session, eval_method='highest'):
    
    # Set number of rounds to average for 'smooth' method
    if eval_method == 'smooth':
        n_rounds_to_eval = min(5, n_rounds_in_session)
    else:
        n_rounds_to_eval = n_rounds_in_session
    
    # Get models in session based on eval_method
    models_in_session = client.get_model_trail()[-n_rounds_to_eval:]

    session_test_accuracy_scores = []
    for model in models_in_session:
        model_id = model["model"]

        # Wait to receive validation data
        wait_time = 0
        while True:
            time.sleep(1)
            wait_time += 1
            validations = client.get_validations(model_id=model_id)
            if validations['count'] != 0 or wait_time == 60:
                break

        # Average test accuracy over all contributing clients
        model_test_accuracy_scores = []
        for validation in validations['result']:
            metrics = json.loads(validation['data'])
            model_test_accuracy_scores.append(metrics['test_accuracy'])
            
        session_test_accuracy_scores.append(model_test_accuracy_scores)

    client_avg_test_accuracy_scores = [np.mean(x) for x in session_test_accuracy_scores]

    if eval_method == 'highest':
        # Return the highest test accuracy
        return np.amax(client_avg_test_accuracy_scores)
    elif eval_method == 'smooth':
        # Return the calculated mean accuracy
        return np.mean(client_avg_test_accuracy_scores)
    else:
        raise ValueError("Invalid eval_method. Use 'highest' or 'smooth'.")
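
The wait loop inside `get_test_accuracy` polls once per second and gives up after 60 seconds. A minimal sketch of the same pattern as a reusable helper is shown below; `wait_until` and `fake_validation_count` are hypothetical names introduced for illustration, and the fake function merely stands in for a FEDn query.

```python
import time


def wait_until(condition, timeout_s=60.0, poll_s=1.0):
    """Poll `condition()` until it returns a truthy value or the timeout expires.

    Returns the last value returned by `condition` (falsy on timeout).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = condition()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(poll_s)


# Stand-in for a FEDn query: reports no validations for the first two polls
calls = {"n": 0}


def fake_validation_count():
    calls["n"] += 1
    return calls["n"] if calls["n"] >= 3 else 0


count = wait_until(fake_validation_count, timeout_s=5.0, poll_s=0.01)
print(count)
```

In the notebook's setting, the condition could be something like `lambda: client.get_validations(model_id=model_id)['count']`. If the timeout expires with no validations, `np.mean` over the empty list returns `nan`, so it is worth checking for that case explicitly before trusting the score.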


Now that we have created a function to use in step 3, we will define the objective function. The code below shows how we can complete the three steps of the objective function with FEDn. The range in which Optuna will look for hyperparameter values is defined in **step 1**. Note that we are only tuning the learning rate of FedAdam in this example to keep things simple. **Step 2** entails starting a session and waiting for it to finish before evaluating the resulting model. In **step 3**, we simply call the function that we defined above and return the result.

**Note:** We start from the seed model in each session to ensure that each trial has the same starting point.

In [41]:
import optuna

# Objective function which will be sent to Optuna to evaluate the selection of hyperparameter values
def objective(trial):
    # Number of rounds per session
    n_rounds = 50

    # 1. Suggest a learning rate from a log-scaled range
    learning_rate = trial.suggest_float("learning_rate", 1e-3, 1e-1, log=True)

    # 2. Train the model
    # Set session configurations (from seed model)
    session_config = {
        "helper": "numpyhelper",
        "aggregator": "fedopt",
        "aggregator_kwargs": {
            "serveropt": "adam",
            "learning_rate": learning_rate,
        },
        "model_id": seed_model['model'],
        "rounds": n_rounds,
    }

    # Run session and get session id
    result_fedadam = client.start_session(**session_config)
    session_id = result_fedadam['config']['session_id']
    
    # Wait for the session to finish
    while not client.session_is_finished(session_id):
        time.sleep(1)
    
    # 3. Return validation accuracy for session
    return get_test_accuracy(client=client, n_rounds_in_session=n_rounds, eval_method="smooth")
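
The `log=True` argument in step 1 asks Optuna to sample the learning rate in the log domain. The sketch below mimics that idea with the standard library only. It illustrates log-uniform sampling and is not Optuna's actual sampler, since the default TPE sampler adapts its suggestions based on earlier trials. The helper name `sample_log_uniform` is hypothetical.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_log_uniform(1e-3, 1e-1, rng) for _ in range(10_000)]

# Under log-uniform sampling, each decade receives about half of the draws
share_below_1e2 = sum(s < 1e-2 for s in samples) / len(samples)
print(f"share of samples below 1e-2: {share_below_1e2:.2f}")
```

A plain uniform draw on the same interval would place roughly 91% of the samples above 1e-2, which is why a log scale is a common choice when a hyperparameter spans several orders of magnitude.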

### Creating, running and analyzing an Optuna study

It’s time to create and run our study to let Optuna find the optimal server-side learning rate for FedAdam. All that remains is to tell Optuna in which direction to optimize the objective function and how many hyperparameter values to try. We create an Optuna `study` object and, since we evaluate with test accuracy, set the direction to `maximize`. We then call the `optimize()` method, passing the `objective` function defined earlier and the number of hyperparameter values to try via the `n_trials` parameter.

**Note:** Each trial starts a session, so the number of sessions will be `n_trials`.

In [42]:
# Create an Optuna study
study = optuna.create_study(direction="maximize")

# Optimize hyperparameters
study.optimize(objective, n_trials=20)

[I 2024-09-11 17:55:58,359] A new study created in memory with name: no-name-3523664a-7d8a-415c-953f-a97630ce9ef8
[I 2024-09-11 17:59:21,745] Trial 0 finished with value: 0.9595600068569183 and parameters: {'learning_rate': 0.012301378174433742}. Best is trial 0 with value: 0.9595600068569183.
[I 2024-09-11 18:02:47,210] Trial 1 finished with value: 0.9620999932289124 and parameters: {'learning_rate': 0.023368957405967835}. Best is trial 1 with value: 0.9620999932289124.
[I 2024-09-11 18:06:14,743] Trial 2 finished with value: 0.911080002784729 and parameters: {'learning_rate': 0.0013378005538716429}. Best is trial 1 with value: 0.9620999932289124.
[I 2024-09-11 18:09:36,215] Trial 3 finished with value: 0.9557799935340882 and parameters: {'learning_rate': 0.007007650689123562}. Best is trial 1 with value: 0.9620999932289124.
[I 2024-09-11 18:12:54,568] Trial 4 finished with value: 0.9598599970340729 and parameters: {'learning_rate': 0.042762352644500096}. Best is trial 1 with value: 0

Now we can easily access the results through the `study` object, for example the best learning rate:

In [43]:
opt_learning_rate = study.best_params['learning_rate']
opt_learning_rate

0.02945876449044553

…and visualize the optimization process:

In [44]:
optuna.visualization.plot_slice(study)

### Conclusion

In this notebook, we showed how to integrate Optuna with FEDn for hyperparameter tuning, using the learning rate of FedAdam as an example. By defining an objective function and letting Optuna drive the search, we automated the search for the server-side learning rate that maximizes test accuracy. The FEDn API also let us choose how to score a session, either by selecting the highest accuracy or by averaging the final rounds.