Authors: David E. Bernal Neira (david.e.bernalneira)<br>

Copyright © 2023, United States Government, as represented by the Administrator
of the National Aeronautics and Space Administration. All rights reserved.

*PySA*, a powerful tool for solving optimization problems, is licensed under
the Apache License, Version 2.0 (the "License"); you may not use this file
except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0. 

Unless required by applicable law or agreed to in writing, software distributed
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

# PySA Hyperparameter optimization via Hyperopt
This tutorial explains how to perform hyperparameter optimization to find good parameters for PySA when addressing a challenging problem.
The problem at hand is a Wishart instance. Problems of this kind can be generated automatically, have a known (or <em>planted</em>) solution, and can be very challenging for QUBO solvers such as PySA. For more details see the following *[paper](https://doi.org/10.1103/PhysRevE.101.052102)*.

This tutorial shows how to:
- Import an automatically generated problem from the generator **[Chook](https://github.com/dilinanp/chook)**
- Parameterize the PySA minimum and maximum temperatures automatically based on the coefficients of the problem
- Perform hyperparameter optimization via **[Hyperopt](https://hyperopt.github.io/hyperopt/)** for PySA

## Importing packages

Here we assume that you have PySA installed. Moreover, we will use the libraries **[numpy](https://numpy.org/)** and **[scipy](https://scipy.org/)** for matrix processing.
Finally, for hyperparameter optimization, we will use the library **[Hyperopt](https://hyperopt.github.io/hyperopt/)**.

In [1]:
# Import PySA
from pysa.sa import Solver

# Import Numpy and Scipy for the processing of (sparse) matrices
from scipy.sparse import csr_matrix
import numpy as np

# Import Hyperopt for the optimization of the hyperparameters
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt.fmin import generate_trials_to_calculate

# Import Matplotlib for the plotting of the results
import matplotlib.pyplot as plt

%matplotlib inline


## The Wishart problem and loading it from the library Chook
We will solve an instance of a Wishart problem. The problem is defined as follows:
$$
\min_{x\in \{-1,1\}^N} x^\top W x,
$$
where $W$ is constructed such that the optimal solution to the problem above, $x^*$, lies in the nullspace of a system of linear equations, i.e., $Ax^* = 0$, with $A \in \mathbb{R}^{M\times N}$.
The hardness of this problem can be controlled by the ratio of the number of rows to the number of columns of $A$, a parameter known as $\alpha$, i.e., $\alpha = M/N$.

More details of this problem can be found in the following *[paper](
https://doi.org/10.1103/PhysRevE.101.052102)*.
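
To make the planted-solution idea concrete, here is a minimal sketch. It is not Chook's generator and does not reproduce the exact ensemble from the paper: it assumes the simplest construction $W = A^\top A$, so that $x^\top W x = \|Ax\|^2 \geq 0$ and any $x^*$ with $Ax^* = 0$ is optimal. The helper name `planted_instance` and the sizes used are hypothetical.

```python
import itertools
import numpy as np

def planted_instance(n, m, seed=0):
    """Hypothetical helper: build W = A^T A with a planted +/-1 vector in the nullspace of A."""
    rng = np.random.default_rng(seed)
    x_star = rng.choice([-1.0, 1.0], size=n)
    A = rng.standard_normal((m, n))
    # Remove from every row its component along x_star, so that A @ x_star = 0
    A -= np.outer(A @ x_star, x_star) / n
    W = A.T @ A
    return W, x_star

n_spins = 8
W, x_star = planted_instance(n=n_spins, m=4)
planted_energy = x_star @ W @ x_star

# Brute-force check over all 2^n spin vectors (feasible only for tiny n)
best_energy = min(
    np.array(s) @ W @ np.array(s)
    for s in itertools.product([-1.0, 1.0], repeat=n_spins)
)
print("planted energy:", planted_energy, "| brute-force minimum:", best_energy)
```

Under this assumed construction the planted vector reaches the brute-force minimum (zero, up to floating-point error), which is what makes the ground state energy known in advance.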

One can create instances of the Wishart problem parameterized by $N$ and $\alpha$, which is one of the functions of the library **[Chook](https://github.com/dilinanp/chook)**. Besides generating planted Wishart files, it also provides generators for Tile planting (2D/3D), Deceptive cluster loops (DCL), Equation planting (k-regular k-XORSAT), k-local planting, and more to come!

For the sake of this example we will generate a single instance of the Wishart Problem of size $N=50$ and $\alpha=0.5$. Chook will create a text file with the data of the $W$ matrix and another file with the optimal objective function value, also known as the <em>ground state energy</em>.

The code below assumes that you have both files available in the same directory as this notebook. To create them, check the usage of **[Chook](https://github.com/dilinanp/chook)**.


In [None]:
# Get the matrix for the Ising model from the file
instance_filename = 'wishart_planting_N_50_alpha_0.50_inst_1.txt'

# Load the matrix
file_array = np.loadtxt(instance_filename, unpack=True)
rows = file_array[0, :].astype(int)
cols = file_array[1, :].astype(int)
vals = file_array[2, :]

# Create a sparse matrix representing the W in the Wishart model
ising = csr_matrix((vals, (rows, cols)), shape = (50, 50))
ising = (ising + ising.T) / 2. # Note that this is the symmetric form of the Ising problem
ising = ising.toarray() # Convert to a dense array (the `.A` shortcut is not available in recent SciPy versions)

# Get the ground state energy (first line of gs_energies.txt)
gs_filename = 'gs_energies.txt'
with open(gs_filename) as f:
    line = f.readline().strip().split('\t')
    gs_energy = float(line[1])

## Ising problem
PySA is an Ising solver: it attempts to find the solution to a problem of the form:
$$
\min_{\sigma \in \{ -1,+1 \}^N} H(\sigma) =\min_{\sigma \in \{ -1,+1 \}^N} \sum_{(ij) \in E(G)} J_{ij}\sigma_i\sigma_j + \sum_{i \in V(G)}h_i\sigma_i + c_I
$$
where we optimize over spins $\sigma \in \{ -1,+1 \}^N$, on a constrained graph $G(V,E)$, where the quadratic coefficients are $J_{ij}$ and the linear coefficients are $h_i$. We also include an arbitrary offset of the Ising model $c_I$.

We consider the problem to be already specified and omit the notation $(ij) \in E(G)$ and $i \in V(G)$ where it is apparent, mainly in summation indices.
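
As an illustration of the Hamiltonian above, the following sketch evaluates $H(\sigma)$ for a small example. The function name `ising_energy` and the coefficients are made up for illustration. It assumes that $J$ stores each edge $(ij)$ once (upper triangular); with a symmetric matrix every pair is counted twice, which is where the factor $1/2$ used later in this tutorial comes from.

```python
import numpy as np

def ising_energy(J, h, sigma, offset=0.0):
    """Evaluate H(sigma) = sum_(ij) J_ij s_i s_j + sum_i h_i s_i + offset.

    Assumes J holds each edge (i, j) exactly once (e.g., upper triangular).
    """
    sigma = np.asarray(sigma, dtype=float)
    return float(sigma @ J @ sigma + h @ sigma + offset)

# Made-up 3-spin example
J = np.array([[0.0, 1.0, -2.0],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
h = np.array([0.5, 0.0, -1.0])

energy = ising_energy(J, h, [1, -1, 1])
print(energy)  # quadratic part: -1 - 2 - 0.5 = -3.5, linear part: -0.5, total: -4.0
```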

## Specifying the temperatures for PySA
PySA receives several parameters, including the number of replicas and the number of sweeps to be executed. In addition, the minimum and maximum temperatures define the temperatures of the different replicas through a geometric series, where each value is assigned to one replica.
Determining the optimal temperatures is challenging: their values strongly affect the solver's performance, and the optimal values depend on the problem to be solved.
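
A geometric temperature ladder can be sketched with NumPy as follows. This only illustrates the idea of a geometric series between the two limits; whether PySA builds its ladder in exactly this way is an assumption, and `geometric_temperatures` is a hypothetical helper name.

```python
import numpy as np

def geometric_temperatures(min_temp, max_temp, num_replicas):
    """Hypothetical helper: temperatures with a constant ratio between neighbours."""
    return np.geomspace(min_temp, max_temp, num_replicas)

temps = geometric_temperatures(min_temp=0.3, max_temp=1.5, num_replicas=4)
ratios = temps[1:] / temps[:-1]
print("temperatures:", temps)
print("neighbour ratios:", ratios)  # all equal for a geometric series
```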

There is a way of determining these temperatures by parameterizing them using the nonzero coefficients in the Ising model $J$ and $h$.

In the cold limit, the temperature should allow only a small chance of a single flip in the variable values, as we want to avoid getting stuck moving between states separated by small energy changes.
Given a Metropolis update, the probability of changing a single variable value (exciting a single spin) becomes
$$
p^{cold} = \sum_{i \in V(G)} \exp \left( -\frac{\Delta E^{min}_i}{T^{cold}} \right) .
$$
Computing the minimum energy change $\Delta E^{min}_i$ that each spin can experience with a single flip would require solving a combinatorial problem,
$$
\Delta E^{min}_i = \min_{\delta\sigma \in \{ -1,0,1 \}^{N}, |\delta\sigma|_0 = 1} \sum_{j \mid (ij) \in E(G)} 2J_{ij} \sigma_i\delta\sigma_j + 2\sum_{j \in V(G)}h_j\delta\sigma_j ,
$$
hence an approximate solution is taken. Here the change of energy is given by two times the minimum effective field felt by each spin:
$$
\Delta E^{min}_i \sim \Delta E^{cold}_i = 2 \min \left[ \min_{j \mid J_{ij} \neq 0}|J_{ij}|, |h_i| \mid h_i \neq 0 \right].
$$
This allows us to approximate the probability at the cold limit as
$$
p^{cold} = N^{min gap}\exp \left( -\frac{\Delta E^{cold}}{T^{cold}} \right),
$$
where the $\Delta E^{cold}$ becomes
$$
\Delta E^{cold} = \min_i \Delta E^{cold}_i,
$$
and a correction for scaling the minimum energy can be given by the number of spins that attain that minimum effective field, i.e.,
$$
N^{min gap} = \sum_{i \mid \Delta E^{cold}_i = \Delta E^{cold}} 1.
$$

The cold transition probability is usually taken to be a small value, e.g., $p^{cold}=0.01$ (1%).
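
As a small worked example of the cold limit, the sketch below applies the formulas above to a made-up symmetric $3 \times 3$ coupling matrix with $h = 0$, and solves $p^{cold} = N^{min gap}\exp(-\Delta E^{cold}/T^{cold})$ for $T^{cold}$. The matrix and variable names are illustrative only.

```python
import numpy as np

# Made-up symmetric couplings, no linear terms
J = np.array([[0.0, 1.0, -0.5],
              [1.0, 0.0, 2.0],
              [-0.5, 2.0, 0.0]])
p_cold = 0.01

abs_J = np.abs(J)
# Smallest nonzero |J_ij| felt by each spin (zeros are ignored)
min_field_per_spin = np.where(abs_J > 0, abs_J, np.inf).min(axis=1)
delta_e_cold_i = 2 * min_field_per_spin             # per-spin gaps: 1, 2, 1
delta_e_cold = delta_e_cold_i.min()                 # global minimum gap: 1
n_min_gap = int(np.sum(delta_e_cold_i == delta_e_cold))  # two spins attain it

# Invert p_cold = n_min_gap * exp(-delta_e_cold / t_cold)
t_cold = -delta_e_cold / np.log(p_cold / n_min_gap)
p_check = n_min_gap * np.exp(-delta_e_cold / t_cold)
print("T_cold:", t_cold, "| recovered p_cold:", p_check)
```

Plugging the resulting temperature back into the approximation recovers the requested 1% probability, which is a quick sanity check on the sign and the role of $N^{min gap}$.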

For the hot limit, we want a temperature at which every variable flip has a large chance of happening, as we want to make large steps in that limit to enhance exploration.
We use a temperature that ensures that even the most unlikely transition, the one that must overcome the maximum energy difference, can happen.
Using a Metropolis update for this transition, its acceptance probability is bounded by
$$
p^{hot} = \exp \left( -\frac{\Delta E^{max}}{T^{hot}} \right) .
$$

The maximum energy difference can also be computed through a combinatorial problem, although here, we take an approximation as the worst case of the effective field
$$
\Delta E^{max} \sim \Delta E^{hot} = 2 \max_i \left[ \sum_{j}|J_{ij}| + |h_i| \right].
$$

The hot transition probability is usually taken to be a large value, e.g., $p^{hot}=0.5$ (50%).
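
The hot limit can be worked through in the same way. The sketch below uses a made-up symmetric $3 \times 3$ coupling matrix with $h = 0$, computes $\Delta E^{hot}$ as twice the largest effective field, and inverts the Metropolis acceptance probability to obtain $T^{hot}$. All names and numbers are illustrative.

```python
import numpy as np

# Made-up symmetric couplings and zero linear terms
J = np.array([[0.0, 1.0, -0.5],
              [1.0, 0.0, 2.0],
              [-0.5, 2.0, 0.0]])
h = np.zeros(3)
p_hot = 0.5

# Worst-case effective field: row sums of |J| plus |h|; the largest row sum is 3
delta_e_hot = 2 * np.max(np.abs(J).sum(axis=1) + np.abs(h))

# Invert p_hot = exp(-delta_e_hot / t_hot)
t_hot = -delta_e_hot / np.log(p_hot)
acceptance = np.exp(-delta_e_hot / t_hot)
print("delta_e_hot:", delta_e_hot, "| T_hot:", t_hot, "| acceptance:", acceptance)
```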

We present a function that computes these $\Delta E$ values, which are used later to compute the temperatures.

In [None]:
def get_delta_e_ising(J, h, scaling_correction = True):
    """
    This function computes the hot and cold energy deltas for a given Ising problem.
    For the cold delta, it also provides a count of the number of variables with the minimum mean field energy.
    This assumes the symmetric matrix form for J, with h represented as a vector.

    Parameters
    ----------
    J : np.ndarray
        The J matrix of the Ising problem
    h : np.ndarray
        The h vector of the Ising problem
    scaling_correction : bool, optional
        Whether to apply the scaling correction to the delta_e_cold, by default True

    Returns
    -------
    delta_e_hot : float
        The delta of Energy at the hot temperature limit
    delta_e_cold : float
        The delta of Energy at the cold temperature limit
    count_min_i : int
        The number of variables with the minimum mean field energy
    """
    max_mean_field = np.max(np.sum(np.abs(J), axis = 1) + np.abs(h))
    delta_e_hot = 2 * max_mean_field

    min_mean_field = np.minimum(np.min(np.abs(J[J != 0]), initial=np.inf), np.min(np.abs(h[h != 0]), initial=np.inf))
    min_mean_field_ji = np.where(abs(J) != 0, abs(J), np.inf).min(axis=1)
    min_mean_field_hi = np.where(abs(h) != 0, abs(h), min_mean_field_ji)
    if scaling_correction:
        count_min_i = np.sum(np.minimum(min_mean_field_ji, min_mean_field_hi) == min_mean_field)
    else:
        count_min_i = 1
    delta_e_cold = 2 * min_mean_field

    return delta_e_hot, delta_e_cold, count_min_i

## Quadratic Unconstrained Binary Optimization
Although PySA can handle Quadratic Unconstrained Binary Optimization (QUBO) problems natively using the argument `problem_type='qubo'`, for the $\Delta E$ computation, we instead transform the coefficients of the QUBO problem into an Ising model and then call the function previously designed.
This transformation is based on the problem definition:
$$
\min_{x \in \{0,1 \}^N} \sum_{(ij) \in E(G)} Q_{ij}x_i x_j + \sum_{i \in V(G)}Q_{ii}x_i + c_Q = \min_{x \in \{0,1 \}^N}  x^\top Q x + c_Q
$$
where we optimize over binary variables $x \in \{ 0,1 \}^N$, on a constrained graph $G(V,E)$ defined by an adjacency matrix $Q$. We also include an arbitrary offset $c_Q$.

The transformation is a linear mapping of $x \in \{ 0,1 \}^N \to \sigma \in \{ -1,1 \}^N$ obtained by setting $\sigma_i = 2x_i - 1$. The remaining coefficient mappings follow from this definition and are implemented below.
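
The coefficient mapping can be checked numerically. The sketch below assumes a symmetric $Q$ and writes the Ising quadratic term as the full double sum $\sigma^\top J \sigma$ over a symmetric $J$. Substituting $x_i = (1 + \sigma_i)/2$ gives $J_{ij} = Q_{ij}/4$ for $i \neq j$, $h_i = \frac{1}{2}\sum_j Q_{ij}$, and a constant offset. The helper name `qubo_to_ising` and the random test matrix are hypothetical.

```python
import itertools
import numpy as np

def qubo_to_ising(Q):
    """Hypothetical helper: map a symmetric QUBO matrix to (J, h, offset)."""
    Q = np.asarray(Q, dtype=float)
    off_diag = Q - np.diag(np.diag(Q))
    J = off_diag / 4.0                  # pairwise couplings (symmetric, zero diagonal)
    h = Q.sum(axis=1) / 2.0             # linear terms, diagonal of Q included
    offset = off_diag.sum() / 4.0 + np.trace(Q) / 2.0
    return J, h, offset

rng = np.random.default_rng(1)
M = rng.integers(-3, 4, size=(4, 4)).astype(float)
Q = (M + M.T) / 2.0                     # symmetric test matrix
J, h, offset = qubo_to_ising(Q)

# Compare both formulations on every binary assignment
max_error = 0.0
for bits in itertools.product([0.0, 1.0], repeat=4):
    x = np.array(bits)
    sigma = 2 * x - 1
    qubo_value = x @ Q @ x
    ising_value = sigma @ J @ sigma + h @ sigma + offset
    max_error = max(max_error, abs(qubo_value - ising_value))
print("largest mismatch:", max_error)
```

Only $J$ and $h$ matter for the $\Delta E$ computation, since a constant offset does not change energy differences.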

In [None]:
def get_delta_e_qubo(Q):
    """
    This function computes the hot and cold deltas of Energy for a given QUBO problem.
    It transforms the problem into an Ising problem and then uses the get_delta_e_ising function.
    This assumes the symmetric matrix form for Q.
    
    Parameters
    ----------
    Q : np.ndarray
        The Q matrix of the QUBO problem

    Returns
    -------
    delta_e_hot : float
        The delta of Energy at the hot temperature limit
    delta_e_cold : float
        The delta of Energy at the cold temperature limit
    count_min_i : int
        The number of variables with the minimum mean field energy
    """
    J = (Q - np.diag(np.diag(Q))) / 4.  # Q is an np.ndarray, which has no `.A` attribute
    h = np.ones(Q.shape[0]) @ Q / 2.
    return get_delta_e_ising(J, h)

## Temperature computation for Wishart problem
With these functions, the range for the temperature is defined as $T \in \left[ -\frac{\Delta E^{cold}}{\log(p^{cold}/N^{mingap})} ,  -\frac{\Delta E^{hot}}{\log(p^{hot})} \right]$.
We evaluate those temperatures using the functions above to provide them to PySA as parameters.

In [None]:
# Define transition probabilities (in percentages)
phot = 50.
pcold = 1.

delta_e_hot, delta_e_cold, count_min_i = get_delta_e_ising(ising, np.zeros(ising.shape[0]))
max_temp = - delta_e_hot / np.log(phot / 100.)
min_temp = - delta_e_cold / np.log(pcold / 100. / count_min_i)

## Time To Solution performance metric
The performance goals when evaluating PySA can conflict. On one hand, you would like a large probability of finding a right solution (where the definition of right comes from what you define as success). On the other hand, the time it takes to solve these cases should be as small as possible.
We are therefore interested in a metric that combines both, and settle on the Time To Solution (TTS), defined as
$$
TTS = t\frac{\log(1-s)}{\log(1-p)},
$$
where $t$ is the mean runtime, $s$ is the target success confidence, usually taken as $s = 99\%$, and $p$ is the success probability of a single run, usually estimated by the observed/empirical success probability.

One usually reads this as the time needed to find a solution at least once with 99% probability.

We provide a function to compute this $TTS_{99\%}$ given the mean runtime of the algorithm, the observed energies returned by PySA, the ground state energy, a tolerance for determining what we call success (in this case a relative difference with the ground state), the $s$ value, and a value to return when not a single observation satisfies the success threshold.
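
Before the full function, a short numeric illustration of the formula may help. The numbers are made up: a mean runtime of 0.1 s and an observed success probability of 20%. The helper `time_to_solution` is a stripped-down, hypothetical version of the formula only and does not handle the edge cases $p = 0$ or $p \geq s$.

```python
import math

def time_to_solution(mean_runtime, p_success, s=0.99):
    """TTS = t * log(1 - s) / log(1 - p), valid for 0 < p < 1."""
    if not 0.0 < p_success < 1.0:
        raise ValueError("p_success must lie strictly between 0 and 1")
    return mean_runtime * math.log(1.0 - s) / math.log(1.0 - p_success)

mean_runtime = 0.1   # seconds per run (made-up value)
p_success = 0.2      # observed success probability (made-up value)

repetitions = math.log(1.0 - 0.99) / math.log(1.0 - p_success)
tts = time_to_solution(mean_runtime, p_success)
print(f"expected repetitions: {repetitions:.2f}, TTS_99: {tts:.3f} s")
```

With these numbers roughly 20.6 repetitions are needed, giving a TTS of about 2.06 s. The ratio of logarithms is the number of independent runs required so that at least one succeeds with probability $s$.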

In [None]:
def tts_objective_fcn(energies, mean_runtime, gs_energy, s=0.99, opt_gap=0.05, fail_value=1e10):
    """
    This function computes the time-to-solution for a given set of energies given a mean-runtime and the ground state energy.
    It is based on the following paper:
    https://arxiv.org/pdf/1905.10876.pdf

    Parameters
    ----------
    energies : np.ndarray
        The energies of the samples
    mean_runtime : float
        The mean runtime of the samples
    gs_energy : float
        The ground state energy of the problem
    s : float, optional
        The target success confidence (the s in the TTS formula), by default 0.99
    opt_gap : float, optional
        The optimality gap, by default 0.05
    fail_value : float, optional
        The value to return if the success probability is 0, by default 1e10

    Returns
    -------
    float
        The time-to-solution value
    """
    p_succ = np.mean(energies <= gs_energy * (1. -  opt_gap))
    print("Probability of success:", p_succ)

    if p_succ == 0.:
        return fail_value
    elif p_succ >= s:
        return mean_runtime
    else:
        tts = mean_runtime *  np.log(1 - s) / np.log(1 - p_succ)
        print("Time-to-solution:", tts, "seconds")
        return tts


## Defining the PySA run
These functions now allow us to run PySA. We created a wrapper function such that we can compute the $TTS_{99\%}$ directly from the outputs and interface it with the hyperparameter optimization library Hyperopt.

In [None]:

def run_pysa(ising_model, fixed_params, tuned_params, gs_energy):
    """
    This function runs PySA with the given parameters and returns the time-to-solution value.

    Parameters
    ----------
    ising_model : np.ndarray
        The Ising matrix of the problem
    fixed_params : dict
        The fixed parameters for PySA
    tuned_params : dict
        The tuned parameters for PySA
    gs_energy : float
        The ground state energy of the problem

    Returns
    -------
    dict
        The result of the PySA run

    Note: the keys of fixed_params and tuned_params need to match the PySA arguments, e.g.,
        num_sweeps: int, 
        num_reads: int = 1,
        num_replicas: int = None,
        temps: np.ndarray = None,
        min_temp: float = 0.3,
        max_temp: float = 1.5,
        update_strategy: str = 'random',
        initialize_strategy: str = 'random',
        init_energies: List[float] = None,
        recompute_energy: bool = False,
        sort_output_temps: bool = False,
        return_dataframe: bool = True,
        parallel: bool = True,
        use_pt: bool = True,
        send_background: bool = False,
        verbose: bool = False
    """
    # Combine fixed and tuned parameters into a new dict so that fixed_params is not mutated
    joint_params = {**fixed_params, **tuned_params}

    # Fix types of num_sweeps and num_replicas
    joint_params['num_sweeps'] = int(joint_params['num_sweeps'])
    joint_params['num_replicas'] = int(joint_params['num_replicas']) 

    solver = Solver(problem=ising_model, problem_type='ising', float_type='float32')
    result = solver.metropolis_update(**joint_params)
    energies =  2 * result['best_energy'][1:] # Note: energy is 1/2 * state @ ising @ state
    mean_runtime = 1e-6 * result['runtime (us)'][1:].mean()
    print(tuned_params)
    tts = tts_objective_fcn(energies, mean_runtime, gs_energy)

    return {
        'loss': tts,
        'status': STATUS_OK,
        'num_sweeps': joint_params['num_sweeps'],
        'num_replicas': joint_params['num_replicas'],
        # -- store other results like this
        'result': result,
        }

## Executing PySA
We can fix certain parameters and allow others to be (later) modified. Moreover, we provide the ground state energy for the internal computation of $TTS$ within our wrapper.

In [None]:
fixed_params = {'min_temp' : min_temp,
                'max_temp' : max_temp,
                'num_reads' : 1001,
                'update_strategy' : 'random',
                'recompute_energy' : True,
                'sort_output_temps' : True,
                'parallel' : True,
                'use_pt' : True,
                'verbose' : False}

tuned_params = {'num_sweeps' : 100,
                'num_replicas' : 4}

run_pysa(ising_model=ising, fixed_params=fixed_params, tuned_params=tuned_params, gs_energy=gs_energy)

## Hyperparameter optimization
As can be seen, fixing the values of certain parameters yields a $TTS_{99\%}$, which we wish to minimize. Therefore, we use Hyperopt and its Tree-structured Parzen Estimator (TPE) algorithm to perform this hyperparameter optimization.
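
When reading the search space, it helps to recall that Hyperopt documents `hp.qloguniform(label, low, high, q)` as returning a value like `round(exp(uniform(low, high)) / q) * q`, so `low` and `high` are natural-log bounds. The sketch below emulates that rule with NumPy only (it does not call Hyperopt) to show which integer range the bounds `0` and `4` used for `num_sweeps` in this tutorial cover. The function name `emulate_qloguniform` is hypothetical.

```python
import numpy as np

def emulate_qloguniform(low, high, q, size, seed=0):
    """NumPy emulation of round(exp(uniform(low, high)) / q) * q."""
    rng = np.random.default_rng(seed)
    raw = np.exp(rng.uniform(low, high, size=size))
    return np.round(raw / q) * q

samples = emulate_qloguniform(low=0, high=4, q=1, size=10_000)
print("smallest sample:", samples.min())
print("largest sample:", samples.max())
print("upper bound exp(4):", np.exp(4))
```

Since $e^4 \approx 54.6$, the sampled sweep counts lie between 1 and 55. To cover a range up to a target value `n`, one would pass `np.log(n)` as the upper bound instead.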

In [None]:
# Define the hyperparameter search space
tuned_params_space = {
    'num_sweeps': hp.qloguniform('num_sweeps', 0, 4, 1), # log-uniform between exp(0) = 1 and exp(4) ≈ 55, rounded to integers
    'num_replicas': hp.quniform('num_replicas', 1, 16, 1), # uniform between 1 and 16
    }

# Define the Hyperopt objective function
objective = lambda tuned_params : run_pysa(ising, fixed_params, tuned_params, gs_energy)
trials = Trials()
best_params = fmin(fn = objective,
                space=tuned_params_space,
                algo=tpe.suggest,
                max_evals=50,
                trials=trials)
        


After performing the hyperparameter optimization, we report the best found values.

In [None]:
best_hyperparams = trials.argmin
best_tts = trials.best_trial['result']['loss']
best_iter = trials.best_trial['tid']
print("Best hyperparameters:", best_hyperparams)
print("Best time-to-solution:", best_tts, "seconds")
print("Best iteration:", best_iter)

Finally, we present a plot of the progress of the hyperparameter optimization algorithm.

In [None]:
losses = np.array(trials.losses())
f, ax = plt.subplots()
ax.plot(
    np.ma.masked_where(losses > 1e9, losses),
    label='TTS')
ax.axhline(y=best_tts, color='r', linestyle='--', label='Best TTS')
ax.set_xlabel('Hyperopt iteration')
ax.set_ylabel('TTS_99 (s)')
ax.legend()
ax.title.set_text('TTS_99 vs. Hyperopt iteration')
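
A complementary view of the same data is the best TTS found so far at each iteration. The sketch below is one way to compute it with NumPy; the loss values are made up, and the `1e9` threshold mirrors the mask used in the plot above to hide failed trials. The helper name `best_so_far` is hypothetical.

```python
import numpy as np

def best_so_far(losses, fail_threshold=1e9):
    """Running minimum of the losses, ignoring trials flagged as failures."""
    losses = np.asarray(losses, dtype=float)
    cleaned = np.where(losses > fail_threshold, np.inf, losses)
    return np.minimum.accumulate(cleaned)

# Made-up losses: 1e10 marks trials where no run reached the success threshold
example_losses = [1e10, 3.0, 5.0, 2.0, 1e10, 2.5]
running_best = best_so_far(example_losses)
print(running_best)  # inf until the first successful trial, then non-increasing
```

In the notebook, passing `trials.losses()` to such a helper and plotting the result would give a monotone curve that is easier to read than the raw per-trial losses.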

## Pretty plots
For more advanced plots, you need to import Pandas and Plotly. The cell below produces a contour plot of the experiments.

In [None]:
import plotly.graph_objects as go
import pandas as pd

trials_df = pd.DataFrame(trials.results)
trials_df["trial_number"] = trials_df.index
filter = (
    (trials_df['loss'] < 5e10)
)
# Plotly Express does not support contour plots, so we use `graph_objects` instead. `go.Contour`
# automatically interpolates the "z" values for our loss.
fig = go.Figure(
    data=[
    go.Contour(
        z=np.log10(trials_df.loc[filter, "loss"]),
        x=trials_df.loc[filter, "num_sweeps"],
        y=trials_df.loc[filter, "num_replicas"],
        contours=dict(
            showlabels=True,  # show labels on contours
            # label font properties
            labelfont=dict(size=12, color="white",),
        ),
        colorbar=dict(title=dict(text="log10(TTS (s))", side="right")),
        connectgaps=True,
        hoverinfo='skip',
        hoverongaps=False,
    ),
    go.Contour(
        name='Explored values',
        z=trials_df.loc[filter, "loss"],
        x=trials_df.loc[filter, "num_sweeps"],
        y=trials_df.loc[filter, "num_replicas"],
        connectgaps=False,
        showscale=False,
        colorbar=None,
        colorscale='greys',
        hoverongaps=False,
        showlegend=False,
        hovertemplate="TTS: %{z:.2r} s<br>sweeps: %{x}<br>replicas: %{y}<extra></extra>",
    ),
    ]
)
fig.update_layout(
    xaxis_title="sweeps",
    yaxis_title="replicas",
    title={
        "text": "TTS vs. sweeps and replicas | pcold == 1, phot == 50",
        "xanchor": "center",
        "yanchor": "top",
        "x": 0.5,
    },
)