# Tutorial for evaluating the experiments

In total, we provide three pre-trained models for the random 3-SAT dataset and six pre-trained models for the pseudo-industrial datasets. They are located in the folders `../../Data/models/random_3SAT/` and `../../Data/models/pseudo_industrial/`, respectively. The following table summarizes the characteristics of the models.

| Path of the pre-trained model        | Description of model                                                                                                    |
|----------------------------------|-----------------------------------------------------------------------------------------------------------------|
| `../../Data/models/random_3SAT/3_SAT_LLL.npy`| model trained on random 3-SAT using only the LLL-Loss     |
| `../../Data/models/random_3SAT/3_SAT_Gibbs.npy`|  model trained on random 3-SAT using only the Gibbs-Loss |
| `../../Data/models/random_3SAT/3_SAT_Gibbs_LLL.npy`|model trained on random 3-SAT using both the LLL-Loss and the Gibbs-Loss |
| `../../Data/models/pseudo_industrial/g4sat_easy_ca.npy`| model trained on easy CA instances using only the Gibbs-Loss     |
| `../../Data/models/pseudo_industrial/g4sat_medium_ca.npy`| model trained on medium CA instances using only the Gibbs-Loss     |
| `../../Data/models/pseudo_industrial/g4sat_hard_ca.npy`| model trained on hard CA instances using only the Gibbs-Loss     |
| `../../Data/models/pseudo_industrial/g4sat_easy_ps.npy`| model trained on easy PS instances using only the Gibbs-Loss     |
| `../../Data/models/pseudo_industrial/g4sat_medium_ps.npy`| model trained on medium PS instances using only the Gibbs-Loss     |
| `../../Data/models/pseudo_industrial/g4sat_hard_ps.npy`| model trained on hard PS instances using only the Gibbs-Loss     |

The hyperparameters used for training are specified in the corresponding config files in `../../experiments/configs/`.
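Before running the evaluation, it can help to confirm that the model files listed in the table are present. The following is a minimal sketch: the helper name `find_missing` is hypothetical and not part of the repository, and the paths are copied from the table above.

```python
import os

# Paths copied from the table above (relative to the notebook's folder).
MODEL_PATHS = [
    "../../Data/models/random_3SAT/3_SAT_LLL.npy",
    "../../Data/models/random_3SAT/3_SAT_Gibbs.npy",
    "../../Data/models/random_3SAT/3_SAT_Gibbs_LLL.npy",
    "../../Data/models/pseudo_industrial/g4sat_easy_ca.npy",
    "../../Data/models/pseudo_industrial/g4sat_medium_ca.npy",
    "../../Data/models/pseudo_industrial/g4sat_hard_ca.npy",
    "../../Data/models/pseudo_industrial/g4sat_easy_ps.npy",
    "../../Data/models/pseudo_industrial/g4sat_medium_ps.npy",
    "../../Data/models/pseudo_industrial/g4sat_hard_ps.npy",
]


def find_missing(paths):
    """Return the subset of `paths` that does not exist on disk (hypothetical helper)."""
    return [p for p in paths if not os.path.isfile(p)]


missing = find_missing(MODEL_PATHS)
if missing:
    print("Missing model files:")
    for p in missing:
        print("  " + p)
else:
    print("All pre-trained models found.")
```

If you train your own models, replace the entries of `MODEL_PATHS` with your own file names.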

## 1) Experiments for random 3-SAT

### 1.1) Load a model and run the MT algorithm and WalkSAT on it

Let us start with the full-oracle version, which uses the oracle both for initialization and for updating.

In [None]:
from evaluate_with_given_params import load_model_and_test

data_path = r"../../Data/random_sat_data/test/" # replace with the path of the evaluation dataset
!mkdir -p ../../Data/trajectories/random_3SAT/ # create the folder (and its parent) to save the trajectories

trajectories_path = r"../../Data/trajectories/random_3SAT/"

#put the paths of the models you want to evaluate
model_paths = [ # "uniform" #for uniform algorithm,
                # or path of the models you want to evaluate
                # we used the following:
                "uniform",
                "../../Data/models/random_3SAT/3_SAT_LLL.npy",
                "../../Data/models/random_3SAT/3_SAT_Gibbs_LLL.npy", 
                "../../Data/models/random_3SAT/3_SAT_Gibbs.npy",
                ]
# you can put multiple elements in this list! 
# in case you train your own models, please adjust the filename accordingly.

n_steps = 1000000 
n_runs = 100

# We start by running the evaluation using the MT algorithm

for model_path in model_paths:
    if model_path != "uniform":
        path_save = trajectories_path + "3SATmoser" + model_path.split("/")[-1][:-4]
    else:
        path_save = trajectories_path + "3SATmoser" + model_path
    total_array2 = load_model_and_test(
                    data_path,
                    model_path,
                    n_steps,
                    n_runs,
                    "moser", #moser for MT-algorithm or probsat for oracle WalkSAT
                    path_save=path_save,
                    keep_traj=True,
                    pre_compute_mapping=True,
                    prob_flip_best=0,
                )

# and then for the oracle WalkSAT algorithm

for model_path in model_paths:
    if model_path != "uniform":
        path_save = trajectories_path + "3SATprobsat" + model_path.split("/")[-1][:-4]
    else:
        path_save = trajectories_path + "3SATprobsat" + model_path
    total_array2 = load_model_and_test(
                    data_path,
                    model_path,
                    n_steps,
                    n_runs,
                    "probsat", #moser for MT-algorithm or probsat for oracle WalkSAT
                    path_save=path_save,
                    keep_traj=True,
                    pre_compute_mapping=True,
                    prob_flip_best=0,
                )

We can also use the oracle for initialization only, as follows:

In [None]:

from evaluate_with_given_params import load_model_and_test_two_models

data_path = r"../../Data/random_sat_data/test/" # replace with the path of the evaluation dataset
!mkdir -p ../../Data/trajectories/random_3SAT/ # create the folder to save the trajectories (no error if it already exists)
trajectories_path = r"../../Data/trajectories/random_3SAT/"

#put the paths of the models you want to evaluate
model_path_resample = "uniform"
model_path_initialize = "../../Data/models/random_3SAT/3_SAT_Gibbs_LLL.npy"

# in case you train your own models, please adjust the filename accordingly.

n_steps = 1000000
n_runs = 100
if model_path_initialize != "uniform":
        path_save = trajectories_path + "3SATmoser_initialize_only" + model_path_initialize.split("/")[-1][:-4]
else:
        path_save = trajectories_path + "3SATmoser_initialize_only" + model_path_initialize
total_array2 = load_model_and_test_two_models(
                        data_path,
                        model_path_initialize,
                        model_path_resample,
                        n_steps,
                        n_runs,
                        "moser", #moser for MT-algorithm or probsat for oracle WalkSAT
                        path_save=path_save,
                        keep_traj=True,
                        pre_compute_mapping=True,
                        prob_flip_best=0,
                    )


if model_path_initialize != "uniform":
        path_save = trajectories_path + "3SATprobsat_initialize_only" + model_path_initialize.split("/")[-1][:-4]
else:
        path_save = trajectories_path + "3SATprobsat_initialize_only" + model_path_initialize
total_array2 = load_model_and_test_two_models(
                        data_path,
                        model_path_initialize,
                        model_path_resample,
                        n_steps,
                        n_runs,
                        "probsat", #moser for MT-algorithm or probsat for oracle WalkSAT
                        path_save=path_save,
                        keep_traj=True,
                        pre_compute_mapping=True,
                        prob_flip_best=0,
                    )
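The cells above repeat the same rule for building `path_save`: the string `"uniform"` is appended as is, while for a model file the folder and the `.npy` extension are stripped first. A minimal sketch of that rule as a function follows; the name `trajectory_save_path` is hypothetical and not part of the repository.

```python
import os


def trajectory_save_path(trajectories_path, prefix, model_path):
    """Build the save path used in the cells above (hypothetical helper).

    "uniform" is kept as is; for a model file, the folder and the file
    extension are stripped, e.g. ".../3_SAT_LLL.npy" -> "3_SAT_LLL".
    """
    if model_path == "uniform":
        name = model_path
    else:
        name = os.path.splitext(os.path.basename(model_path))[0]
    return trajectories_path + prefix + name


print(trajectory_save_path(
    "../../Data/trajectories/random_3SAT/",
    "3SATmoser",
    "../../Data/models/random_3SAT/3_SAT_Gibbs_LLL.npy",
))
# ../../Data/trajectories/random_3SAT/3SATmoser3_SAT_Gibbs_LLL
```

The plotting cells in section 1.2.2 read files with exactly these names plus the `.npy` extension.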

### 1.2) Plot the trajectories from the evaluation above

#### 1.2.1) Define helper functions

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
import pandas as pd
import os
# Set global plot parameters
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['text.usetex'] = True
plt.rcParams["font.family"] = "Times New Roman"

def do_evaluation_multiple_models_NEURIPS_legend(saved_eval_path_list_multiple, color, legend, average_alpha = False, plot_save = False, min_max = False):
    """
    saved_eval_path_list_multiple: list of lists of paths to the saved evaluation files (i.e. trajectories.npy files from above) with the first list being for MT and the second for WalkSAT
    color: list of colors for the models (i.e. the colors of the models in the same order as the saved_eval_path_list that appear in the figure)
    average_alpha: if False, each instance is plotted as a point in a scatter plot. Otherwise, a tuple (x, x_vary): for each value in x, the median over all instances whose alpha lies within x_vary of that value is plotted.
    legend: list of names of the models (i.e. the names of the models in the same order as the saved_eval_path_list that appear in the figure legend)
    plot_save: if False, the plot is not saved. Otherwise, the path to save the plot.
    min_max: if False, only the median is plotted. Otherwise, the min and max are also plotted.
    """
    fig, ax = plt.subplots(2, 2, figsize=(7, 6))
    def plot_number_of_steps(idx, total_steps, alpha_array, color, average_alpha = False, legend = False):
        label = legend
        if average_alpha == False:
            ax[idx, 0].scatter([alpha_array],[total_steps], alpha = 0.8, label = label) #model_path.split("/")[-1])
        else:
            x, x_vary = average_alpha
            y = []
            for i in x:
                relevant_mask = np.where(abs(i-alpha_array) < x_vary, 1, 0)
                relevant_steps = total_steps*relevant_mask
                relevant_steps = relevant_steps[relevant_steps>0]
                relevant_alpha = alpha_array*relevant_mask
                relevant_alpha = relevant_alpha[relevant_alpha>0]
                y.append(np.median(relevant_steps))
            ax[idx, 0].plot(x,y, ".-", alpha = 0.8, color = color, label = label)

    def plot_trajectory_solved_instances(idx, total_steps, energies_array, color, legend = False):
        min_steps = np.amin(total_steps, axis = 1)
        median_steps = np.median(total_steps, axis = 1)
        max_steps = np.amax(total_steps, axis = 1)
        lines = []
        max_plot_steps = len(energies_array) - 1
        for steps, linewidth, alpha, linestyle in zip([min_steps, median_steps, max_steps], 
                                                    [1, 1.5,1], 
                                                    [1, 1, 1],
                                                    ['dotted', 'solid', 'dashed']):
            steps = np.array(steps, dtype=int)
            bins = np.logspace(0, np.log10(max_plot_steps), num=10000)
            counts, bins = np.histogram(steps, bins=bins)
            cumulative_counts = np.cumsum(counts)
            percentages = cumulative_counts / len(steps) * 100
            if linestyle == 'solid':
                line, = ax[idx, 1].plot(bins[1:], percentages, color=color, linewidth=linewidth, alpha=alpha, linestyle=linestyle, label=legend)
                lines.append(line)
            else:
                if min_max == True:
                    line, = ax[idx, 1].plot(bins[1:], percentages, color=color, linewidth=linewidth, alpha=alpha, linestyle=linestyle)
                    lines.append(line)


    for idx, saved_eval_path_list in enumerate(saved_eval_path_list_multiple):
        for i in range(len(saved_eval_path_list)):
            saved_eval_path = saved_eval_path_list[i]
            c = color[i]
            l = legend[i]
            _ ,_ , _, alpha_array, _, energies_array_median,total_steps = np.load(saved_eval_path, allow_pickle=True)
            total_steps = total_steps + np.ones(np.shape(total_steps))
            total_steps_median = np.median(total_steps, axis = 1)
            plot_number_of_steps(idx, total_steps_median, alpha_array, c, average_alpha=average_alpha, legend=l)
            if len(energies_array_median)!= 0:
                plot_trajectory_solved_instances(idx, total_steps, energies_array_median, c, legend = l)
            if len(saved_eval_path_list) -1 == i:
                max_steps = len(energies_array_median)
        ax[idx, 0].hlines(max_steps, np.min(alpha_array), np.max(alpha_array), color = "gray", linestyle = "dashed")
        ax[idx, 0].set_yscale("log")
        ax[idx, 0].set_ylabel(r"$\# \mathrm{steps}$")
        ax[idx, 0].set_xlabel(r"$\alpha$")

        ax[idx, 1].set_xscale("log")
        ax[idx, 1].set_ylabel(r"$\%_{\mathrm{model}}$")
        ax[idx, 1].set_xlabel(r"$\# \mathrm{steps}$")
        ax[idx, 1].hlines(100, 1, max_steps, color = "gray", linestyle = "dashed")

        title_sub = "WalkSAT" if idx == 1 else "MT"
        ax[idx,0].text(0.1, 0.90, title_sub, transform=ax[idx,0].transAxes, verticalalignment='top', fontsize=18)
        ax[idx,1].text(0.1, 0.90, title_sub, transform=ax[idx,1].transAxes, verticalalignment='top', fontsize=18)

    ax_legend = fig.add_axes([0.1, -0.12, 0.8, 0.1])
    ax_legend.axis('off')
    handles, labels = ax[0, 1].get_legend_handles_labels()
    min_line = mlines.Line2D([], [], color="gray", linewidth=1, alpha=1, linestyle='dotted', label="using minimum steps across runs")
    median_line = mlines.Line2D([], [], color="gray", linewidth=1.5, alpha=1, linestyle='solid', label='using median steps across runs')
    max_line = mlines.Line2D([], [], color="gray", linewidth=1, alpha=1, linestyle='dashed', label='using maximum steps across runs')
    if min_max == True:
        handles.extend([min_line, median_line, max_line])
        labels.extend([min_line.get_label(), median_line.get_label(), max_line.get_label()])
    else:
        handles.extend([median_line])
        labels.extend([median_line.get_label()])
    ax_legend.legend(handles, labels, loc='center', ncol = 2)
    plt.subplots_adjust(bottom=0.7)
    
    fig.tight_layout()
    if plot_save:
        plt.savefig(plot_save + ".pdf", dpi = 500, format = "pdf", bbox_inches='tight')
        print("saved")
    plt.show()

def get_statistics(model_paths, names):
        """
        model_paths: list of paths to the models (i.e. trajectories.npy files from above)
        names: list of names of the models (i.e. the names of the models in the same order as the model_paths that appear in the figure legend)
        """
        df = pd.DataFrame(columns=["Name", "Mean", "Median", "Percentage Median", "Percentage best", "Percentage worst"])
        for i in range(len(model_paths)):
            if os.path.isfile(model_paths[i]):
                array = np.load(model_paths[i], allow_pickle=True)
                alpha_list = array[3]
                max_steps_algo = len(array[5]) - 1
                total_steps = array[6] + np.ones(array[6].shape)
                median_steps = np.median(np.median(total_steps,axis = 1))
                mean_steps = np.mean(np.mean(total_steps,axis = 1))
                median_instances = np.median(total_steps,axis = 1)
                percentage_median = np.sum(np.where(np.median(total_steps,axis = 1) < max_steps_algo,1,0)) / len(np.median(total_steps,axis = 1)) * 100
                percentage_min = np.sum(np.where(np.amin(total_steps,axis = 1) < max_steps_algo,1,0)) / len(np.mean(total_steps,axis = 1)) * 100
                percentage_max = np.sum(np.where(np.amax(total_steps,axis = 1) < max_steps_algo,1,0)) / len(np.mean(total_steps,axis = 1)) * 100
                df.loc[i] = [names[i], mean_steps, median_steps, percentage_median, percentage_min, percentage_max]
        df[['Mean', 'Median']] = df[['Mean', 'Median']].applymap("{:.2e}".format)
        return df
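The statistics in `get_statistics` all derive from entry 6 of the saved file, the step counts per instance and run. The sketch below reproduces the three percentage columns on synthetic data. It assumes that rows are instances and columns are runs, as the `axis = 1` reductions above imply; the numbers and the budget of 100 steps are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for entry 6 of a saved file: steps per (instance, run).
# Assumption: rows are instances, columns are runs.
total_steps = np.array([
    [3, 5, 4],      # solved quickly in every run
    [10, 99, 50],   # one run exhausts the budget
    [99, 99, 99],   # never solved
]) + 1              # the helpers above add 1 to the stored step counts
max_steps_algo = 100  # assumed step budget for this toy example


def solved_percentage(per_instance_steps, budget):
    """Percentage of instances whose step count stays below the budget."""
    return np.mean(per_instance_steps < budget) * 100


# "Percentage Median", "Percentage best" and "Percentage worst" reduce the
# runs of each instance with the median, minimum and maximum, respectively.
pct_median = solved_percentage(np.median(total_steps, axis=1), max_steps_algo)
pct_best = solved_percentage(np.amin(total_steps, axis=1), max_steps_algo)
pct_worst = solved_percentage(np.amax(total_steps, axis=1), max_steps_algo)
print(pct_median, pct_best, pct_worst)
```

An instance counts towards "Percentage worst" only if every run solves it, and towards "Percentage best" as soon as a single run does.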

#### 1.2.2) Plot the results
We compare three different variants of both the MT algorithm and WalkSAT, namely
- the original version using the uniform oracle,
- a "hybrid" version that uses the trained oracle only for initialization and then switches to uniform updating, and
- the full oracle-based algorithm using the oracle for both initialization and updating.


In [None]:
import seaborn as sns
saved_eval_path_list_MT=  [
                        "../../Data/trajectories/random_3SAT/3SATmoseruniform.npy",
                        "../../Data/trajectories/random_3SAT/3SATmoser_initialize_only3_SAT_Gibbs_LLL.npy",
                        "../../Data/trajectories/random_3SAT/3SATmoser3_SAT_Gibbs_LLL.npy", 
                        ]

saved_eval_path_list_WalkSAT =  [
                        "../../Data/trajectories/random_3SAT/3SATprobsatuniform.npy",
                        "../../Data/trajectories/random_3SAT/3SATprobsat_initialize_only3_SAT_Gibbs_LLL.npy",
                        "../../Data/trajectories/random_3SAT/3SATprobsat3_SAT_Gibbs_LLL.npy", 
                        ]
legend = ["uniform algorithm", "``hybrid'' algorithm", "full-oracle algorithm"]

color = sns.color_palette("Set2")
x = np.linspace(1, 4.4, 71) # centers of the alpha windows over which the median is taken
x_vary = 0.1 # half-width of each alpha window, i.e. instances with |alpha - x| < x_vary are used
plot_save = "../../Data/plots/3SAT_main_plot" #specify here, where you want to save the plot
do_evaluation_multiple_models_NEURIPS_legend([saved_eval_path_list_MT, saved_eval_path_list_WalkSAT], color, average_alpha = (x,x_vary), plot_save = plot_save, legend = legend, min_max=True)


print("MT")
df = get_statistics(saved_eval_path_list_MT, legend)
print(df.to_string(index=False))

print("WalkSAT")
df = get_statistics(saved_eval_path_list_WalkSAT, legend)
print(df.to_string(index=False))
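The pair `(x, x_vary)` passed as `average_alpha` defines a sliding-window median: for every center in `x`, the median is taken over the instances whose alpha lies within `x_vary` of it. With 71 centers between 1 and 4.4, neighbouring centers are about 0.049 apart, so windows of half-width 0.1 overlap. The following is a minimal sketch of this computation on made-up data; `windowed_median` is a hypothetical name, and it uses a boolean mask where the helper above multiplies by a 0/1 mask.

```python
import numpy as np


def windowed_median(alpha_array, steps, centers, half_width):
    """Median of `steps` over instances with |alpha - center| < half_width."""
    alpha_array = np.asarray(alpha_array, dtype=float)
    steps = np.asarray(steps, dtype=float)
    medians = []
    for c in centers:
        mask = np.abs(alpha_array - c) < half_width
        # An empty window has no median; report NaN explicitly.
        medians.append(np.median(steps[mask]) if mask.any() else np.nan)
    return np.array(medians)


# Made-up example: five instances with their alpha and median step count.
alphas = np.array([1.0, 1.05, 2.0, 2.02, 3.9])
steps = np.array([10, 30, 100, 300, 5000])
print(windowed_median(alphas, steps, centers=[1.0, 2.0, 3.0], half_width=0.1))
```

Centers without any instance in their window yield NaN, which matplotlib leaves out of the line plot.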


Since our loss consists of two terms (which we weight equally in the experiments), an obvious question is whether both of them contribute to the improved performance of the algorithm. We compare the uniform variant with the full-oracle variant, where the oracle is trained with only the Gibbs-Loss, only the LLL-Loss, or both loss terms.

In [None]:
import seaborn as sns
saved_eval_path_list_MT=  [
                        "../../Data/trajectories/random_3SAT/3SATmoseruniform.npy",
                        "../../Data/trajectories/random_3SAT/3SATmoser3_SAT_Gibbs_LLL.npy", 
                        "../../Data/trajectories/random_3SAT/3SATmoser3_SAT_Gibbs.npy",
                        "../../Data/trajectories/random_3SAT/3SATmoser3_SAT_LLL.npy",
                        ]
saved_eval_path_list_WalkSAT=  [
                        "../../Data/trajectories/random_3SAT/3SATprobsatuniform.npy",
                        "../../Data/trajectories/random_3SAT/3SATprobsat3_SAT_Gibbs_LLL.npy", 
                        "../../Data/trajectories/random_3SAT/3SATprobsat3_SAT_Gibbs.npy",
                        "../../Data/trajectories/random_3SAT/3SATprobsat3_SAT_LLL.npy",
                        ]
legend = ["uniform oracle" , "oracle trained using Gibbs- + LLL-Loss", "oracle trained using only Gibbs-Loss", "oracle trained using only LLL-Loss"]

color = sns.color_palette("Set2")
x = np.linspace(1, 4.4, 71) # centers of the alpha windows over which the median is taken
x_vary = 0.1 # half-width of each alpha window, i.e. instances with |alpha - x| < x_vary are used
plot_save = "../../Data/plots/3SAT_different_loss_terms" #specify here, where you want to save the plot
do_evaluation_multiple_models_NEURIPS_legend([saved_eval_path_list_MT, saved_eval_path_list_WalkSAT], color, average_alpha = (x,x_vary), plot_save = plot_save, legend = legend, min_max = True)

print("MT")
df = get_statistics(saved_eval_path_list_MT, legend)
print(df.to_string(index=False))

print("WalkSAT")
df = get_statistics(saved_eval_path_list_WalkSAT, legend)
print(df.to_string(index=False))

## 2) Experiments on pseudo-industrial datasets

### 2.1) Load a model and run WalkSAT on it
As above, we start by running WalkSAT with the trained models on the datasets. The code below also runs all cross-evaluations, i.e. every model is evaluated on every dataset. To avoid this, restrict `traj_data` and `model_paths` to the combinations you need.

In [None]:

from evaluate_with_given_params import load_model_and_test

# create the folders to save the trajectories (-p also creates missing parent folders)
!mkdir -p ../../Data/trajectories/g4sat_easy/ca/
!mkdir -p ../../Data/trajectories/g4sat_easy/ps/
!mkdir -p ../../Data/trajectories/g4sat_medium/ca/
!mkdir -p ../../Data/trajectories/g4sat_medium/ps/
!mkdir -p ../../Data/trajectories/g4sat_hard/ca/
!mkdir -p ../../Data/trajectories/g4sat_hard/ps/

data_base_path = "../../Data/G4SAT/" # fill this with the path to the G4SAT datasets
traj_data = [
            ["../../Data/trajectories/g4sat_easy/ca/", data_base_path + "easy/ca/test/sat/"],
            ["../../Data/trajectories/g4sat_easy/ps/", data_base_path + "easy/ps/test/sat/"],
            ["../../Data/trajectories/g4sat_medium/ca/", data_base_path + "medium/ca/test/sat/"],
            ["../../Data/trajectories/g4sat_medium/ps/", data_base_path + "medium/ps/test/sat/"],
            ["../../Data/trajectories/g4sat_hard/ca/", data_base_path + "hard/ca/test/sat/"],
            ["../../Data/trajectories/g4sat_hard/ps/", data_base_path + "hard/ps/test/sat/"],
            ]
#put the paths of the models you want to evaluate
model_paths = [ 
                "uniform",
                "../../Data/models/pseudo_industrial/g4sat_easy_ca.npy",
                "../../Data/models/pseudo_industrial/g4sat_easy_ps.npy",
                "../../Data/models/pseudo_industrial/g4sat_medium_ca.npy", 
                "../../Data/models/pseudo_industrial/g4sat_medium_ps.npy",
                "../../Data/models/pseudo_industrial/g4sat_hard_ca.npy",
                "../../Data/models/pseudo_industrial/g4sat_hard_ps.npy",
                ]
# in case you train your own models, please adjust the filename accordingly.

n_steps = 10000
n_runs = 100

for i in range(len(traj_data)):
    data_path = traj_data[i][1]
    trajectories_path = traj_data[i][0]
    for j in range(len(model_paths)):
        model_path = model_paths[j]
        if model_path != "uniform":
            path_save = trajectories_path + "probsat" + model_path.split("/")[-1][:-4]
        else:
            path_save = trajectories_path + "probsat" + model_path
        total_array2 = load_model_and_test(
                                    data_path,
                                    model_path,
                                    n_steps,
                                    n_runs,
                                    "probsat", #moser for MT-algorithm or probsat for oracle WalkSAT
                                    path_save=path_save,
                                    keep_traj=False,
                                    pre_compute_mapping=True,
                                    prob_flip_best=0,
                                )
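To skip the cross-evaluations, one option is to enumerate only the matching combinations, i.e. each model together with the dataset it was trained on. The sketch below generates these combinations from the path patterns used in the cell above; the function name `in_distribution_pairs` is hypothetical.

```python
def in_distribution_pairs(difficulties=("easy", "medium", "hard"), families=("ca", "ps")):
    """Yield (trajectory folder, dataset subfolder, model path) for matching pairs only.

    The path patterns mirror the lists `traj_data` and `model_paths` above.
    """
    for difficulty in difficulties:
        for family in families:
            yield (
                f"../../Data/trajectories/g4sat_{difficulty}/{family}/",
                f"{difficulty}/{family}/test/sat/",
                f"../../Data/models/pseudo_industrial/g4sat_{difficulty}_{family}.npy",
            )


for traj_dir, data_subdir, model_path in in_distribution_pairs():
    print(traj_dir, data_subdir, model_path)
```

Each triple can then be passed to `load_model_and_test` as in the loop above, with `data_base_path + data_subdir` as the dataset path. The plots below additionally need the `"uniform"` baseline for every dataset.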


### 2.2) Plot the trajectories from the evaluation above

#### 2.2.1) Define helper functions

In [34]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from collections import Counter
import seaborn as sns
import matplotlib.lines as mlines
import os

# Set global plot parameters
plt.rcParams['axes.labelsize'] = 18
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['text.usetex'] = True
plt.rcParams["font.family"] = "Times New Roman"

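def get_evaluation_multiple(models, model_paths, names, title, max_steps, path_save = False):
    """
    models: list of trajectory folders, one per subplot row (the rows are labeled easy, medium, hard in this order)
    model_paths: list (one entry per row) of lists of paths to the saved evaluation files; the last path in each list must be the uniform baseline
    names: list of names of the models (i.e. the names of the models in the same order as the inner lists of model_paths that appear in the figure legend)
    title: title of the figure
    max_steps: number of steps below which an instance counts as solved
    path_save: if False, the plot is not saved. Otherwise, the path to save the plot.
    """
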
    def plot_steps_multiple(idx,min_steps, median_steps, max_steps, max_plot_steps, color, model_label):
        lines = []
        for steps, linewidth, alpha, linestyle in zip([min_steps, median_steps, max_steps], 
                                                    [1, 1.5,1], 
                                                    [1, 1, 1],
                                                    ['dotted', 'solid', 'dashed']):
            steps = np.array(steps, dtype=int)
            bins = np.logspace(0, np.log10(max_plot_steps), num=100)
            counts, bins = np.histogram(steps, bins=bins)
            
            cumulative_counts = np.cumsum(counts)
            percentages = cumulative_counts / len(steps) * 100
            if linestyle == 'solid':
                line, = axs[idx,0].plot(bins[:-1], percentages, color=color, linewidth=linewidth, alpha=alpha, linestyle=linestyle, label=model_label)
            else:
                line, = axs[idx,0].plot(bins[:-1], percentages, color=color, linewidth=linewidth, alpha=alpha, linestyle=linestyle)
            lines.append(line)

    def plot_percentage_median_multiple(idx,uniform_steps, model_steps, max_plot_steps, color, model_label):
        steps_uniform_median = np.array(np.median(uniform_steps,axis = 1), dtype=int)
        steps_model_median = np.array(np.median(model_steps, axis = 1), dtype=int)
        bins = np.logspace(0, np.log10(max_plot_steps), num=100)
        counts_uniform_median, bins_uniform_median = np.histogram(steps_uniform_median, bins=bins)
        counts_model_median, bins_model_median = np.histogram(steps_model_median, bins=bins)
        cumulative_counts_uniform_median = np.cumsum(counts_uniform_median)
        cumulative_counts_model_median = np.cumsum(counts_model_median)
        percentages_uniform_median = cumulative_counts_uniform_median / len(steps_uniform_median) * 100
        percentages_model_median = cumulative_counts_model_median / len(steps_uniform_median) * 100
        axs[idx,1].plot(bins[:-1], percentages_model_median - percentages_uniform_median, color=color, linewidth=1.5, alpha=1, linestyle="solid", label=model_label)

    fig, axs = plt.subplots(len(models), 2, figsize=(8, 3*len(models)))  # Create subplots for each model
    for idx, model in enumerate(models):
        colors = sns.color_palette("Set1")
        uniform_array = np.load(model_paths[idx][len(model_paths[idx])-1], allow_pickle=True)
        uniform_steps = uniform_array[6] + np.ones(uniform_array[6].shape)
        #max_steps = len(uniform_array[5]) - 1

        df = pd.DataFrame(columns=["Name", "Mean", "Median", "Percentage Median", "Percentage best", "Percentage worst"])
        for i in range(len(model_paths[idx])):
            if os.path.isfile(model_paths[idx][i]):
                array = np.load(model_paths[idx][i], allow_pickle=True)
                total_steps = array[6] + np.ones(array[6].shape)
                median = np.median(np.median(total_steps,axis = 1))
                mean = np.mean(np.mean(total_steps,axis = 1))
                percentage_median = np.sum(np.median(total_steps,axis = 1) < max_steps) / len(np.median(total_steps,axis = 1)) * 100
                percentage_min = np.sum(np.amin(total_steps, axis = 1) < max_steps) / len(np.mean(total_steps,axis = 1)) * 100
                percentage_max = np.sum(np.amax(total_steps, axis = 1) < max_steps) / len(np.mean(total_steps,axis = 1)) * 100
                df.loc[i] = [names[i], mean, median, percentage_median, percentage_min, percentage_max]    
                plot_steps_multiple(idx, np.amin(total_steps, axis = 1), np.median(total_steps, axis = 1), np.amax(total_steps, axis = 1), max_steps - 1, colors[i], names[i])
            else:
                print("Model not found")
        axs[idx,0].hlines(100,0, max_steps - 1, colors="black", linestyles="dashed")
        axs[idx,0].set_xlabel(r"$\# \mathrm{steps}$")
        axs[idx,0].set_ylabel(r"$\%_{\mathrm{model}}$")
        axs[idx,0].set_xscale("log")
        if idx == 0:
            title_sub = "easy"
        elif idx == 1:
            title_sub = "medium"
        else:
            title_sub = "hard"
        axs[idx,0].text(0.1, 0.90, title_sub, transform=axs[idx,0].transAxes, verticalalignment='top', fontsize = 18)

        for i in range(len(model_paths[idx])):
            if os.path.isfile(model_paths[idx][i]):
                array = np.load(model_paths[idx][i], allow_pickle=True)
                total_steps = array[6] + np.ones(array[6].shape)
                plot_percentage_median_multiple(idx, uniform_steps, total_steps, max_steps -1, colors[i], names[i])
            else:
                print("Model not found")
        axs[idx,1].set_ylabel(r"$\%_{\mathrm{model}} - \%_{\mathrm{uniform}}$")
        axs[idx,1].set_xlabel(r"$\# \mathrm{steps}$")
        axs[idx,1].set_xscale("log")
        axs[idx,1].text(0.1, 0.90, title_sub, transform=axs[idx,1].transAxes, verticalalignment='top', fontsize = 18)
        df[['Mean', 'Median']] = df[['Mean', 'Median']].applymap("{:.2e}".format)
        df_string = df.to_string(index=False)
        # df_string = df.to_latex(index=False)
        # print("/".join(model.split("/")[-3:]))
        print(df_string)

    ax_legend = fig.add_axes([0.1, -0.12, 0.8, 0.1])
    ax_legend.axis('off')
    handles, labels = axs[0, 0].get_legend_handles_labels()
    min_line = mlines.Line2D([], [], color="gray", linewidth=1, alpha=1, linestyle='dotted', label="using minimum steps across runs")
    median_line = mlines.Line2D([], [], color="gray", linewidth=1.5, alpha=1, linestyle='solid', label='using median steps across runs')
    max_line = mlines.Line2D([], [], color="gray", linewidth=1, alpha=1, linestyle='dashed', label='using maximum steps across runs')
    handles.extend([min_line, median_line, max_line])
    labels.extend([min_line.get_label(), median_line.get_label(), max_line.get_label()])
    ax_legend.legend(handles, labels, loc='center', ncol = 2)
    plt.subplots_adjust(bottom=0.7)
    
    fig.suptitle(title, fontsize=20)
    fig.tight_layout()
    if path_save:
        plt.savefig(path_save + ".pdf", dpi = 500, bbox_inches='tight')
    plt.show()

def get_statistics_relative_improvement(model_paths, names, max_steps_algo):
        """
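        model_paths: list of two paths to saved evaluation files (i.e. trajectories.npy files from above); the relative improvement of the first over the second is reported
        names: list of names of the models (i.e. the names of the models in the same order as the model_paths that appear in the figure legend)
        max_steps_algo: number of steps below which an instance counts as solved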
        """
        df = pd.DataFrame(columns=["Name", "Mean", "Median", "Percentage Median", "Percentage best", "Percentage worst"])
        error = 0
        for i in range(len(model_paths)):
            if os.path.isfile(model_paths[i]):
                array = np.load(model_paths[i], allow_pickle=True)
                alpha_list = array[3]
                # max_steps_algo = len(array[5]) - 1
                total_steps = array[6] + np.ones(array[6].shape)
                median_steps = np.median(np.median(total_steps,axis = 1))
                mean_steps = np.mean(np.mean(total_steps,axis = 1))
                median_instances = np.median(total_steps,axis = 1)
                percentage_median = np.sum(np.where(np.median(total_steps,axis = 1) < max_steps_algo,1,0)) / len(np.median(total_steps,axis = 1)) * 100
                percentage_min = np.sum(np.where(np.amin(total_steps,axis = 1) < max_steps_algo,1,0)) / len(np.mean(total_steps,axis = 1)) * 100
                percentage_max = np.sum(np.where(np.amax(total_steps,axis = 1) < max_steps_algo,1,0)) / len(np.mean(total_steps,axis = 1)) * 100
                df.loc[i] = [names[i], mean_steps, median_steps, percentage_median, percentage_min, percentage_max]
            else: 
                error = 1
        if error == 0:
            # relative change of the first entry with respect to the second entry, in percent
            cols = ["Mean", "Median", "Percentage Median", "Percentage best", "Percentage worst"]
            df.loc[2] = ["relative_improvement"] + [((df.loc[0][c] - df.loc[1][c]) / df.loc[1][c]) * 100 for c in cols]

        df[['Mean', 'Median']] = df[['Mean', 'Median']].applymap("{:.2e}".format)
        return df
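Both plotting helpers above rest on the same construction: a histogram of step counts on logarithmic bins, turned into a cumulative percentage of solved instances. The right-hand panels then show the difference between the curve of a model and the curve of the uniform baseline. The following is a minimal sketch on made-up data; `cumulative_solved` is a hypothetical name. It reports the right bin edges, as in section 1.2.1, whereas `get_evaluation_multiple` plots against the left edges `bins[:-1]`.

```python
import numpy as np


def cumulative_solved(steps, max_plot_steps, num_bins=100):
    """Return (right bin edges, % of instances solved within that many steps)."""
    bins = np.logspace(0, np.log10(max_plot_steps), num=num_bins)
    counts, edges = np.histogram(steps, bins=bins)
    return edges[1:], np.cumsum(counts) / len(steps) * 100


# Made-up per-instance median step counts for four instances.
uniform_median = np.array([50, 400, 900, 950])
model_median = np.array([5, 40, 300, 950])

edges, pct_uniform = cumulative_solved(uniform_median, max_plot_steps=1000)
_, pct_model = cumulative_solved(model_median, max_plot_steps=1000)

# Positive where the model has solved more instances than the baseline
# within the same number of steps.
gain = pct_model - pct_uniform
print(gain.max())
```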

#### 2.2.2) Plot the results

We plot the cross-benchmark results and the corresponding statistics on the CA datasets.

In [None]:
names = [
        "WalkSAT with oracle trained on CA (easy)",
        "WalkSAT with oracle trained on CA (medium)",
        "WalkSAT with oracle trained on CA (hard)",
        "WalkSAT with oracle trained on PS (easy)",
        "WalkSAT with oracle trained on PS (medium)",
        "WalkSAT with oracle trained on PS (hard)",
        "uniform WalkSAT"
        ]

traj_names = [
        "probsatg4sat_easy_ca.npy",
        "probsatg4sat_medium_ca.npy",
        "probsatg4sat_hard_ca.npy",
        "probsatg4sat_easy_ps.npy",
        "probsatg4sat_medium_ps.npy",
        "probsatg4sat_hard_ps.npy",
        "probsatuniform.npy",
]    

models = [
        "../../Data/trajectories/g4sat_easy/ca/",
        "../../Data/trajectories/g4sat_medium/ca/",
        "../../Data/trajectories/g4sat_hard/ca/",
]

model_paths = []
for model in models:
    model_paths.append([model + traj_name for traj_name in traj_names])
plot_save = "../../Data/plots/pseudo_industrial_CA_cross_benchmark"
title = "CA datasets"
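# max_steps is the threshold below which an instance counts as solved; it should correspond to the n_steps used in the evaluation (section 2.1)
get_evaluation_multiple(models, model_paths, names, title, path_save = plot_save, max_steps = 10**6 - 1)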

We do the same for the PS datasets.

In [None]:
names = [
        "WalkSAT with oracle trained on CA (easy)",
        "WalkSAT with oracle trained on CA (medium)",
        "WalkSAT with oracle trained on CA (hard)",
        "WalkSAT with oracle trained on PS (easy)",
        "WalkSAT with oracle trained on PS (medium)",
        "WalkSAT with oracle trained on PS (hard)",
        "uniform WalkSAT"
        ]

traj_names = [
        "probsatg4sat_easy_ca.npy",
        "probsatg4sat_medium_ca.npy",
        "probsatg4sat_hard_ca.npy",
        "probsatg4sat_easy_ps.npy",
        "probsatg4sat_medium_ps.npy",
        "probsatg4sat_hard_ps.npy",
        "probsatuniform.npy",
]    

models = [
        "../../Data/trajectories/g4sat_easy/ps/",
        "../../Data/trajectories/g4sat_medium/ps/",
        "../../Data/trajectories/g4sat_hard/ps/",
]

model_paths = []
for model in models:
    model_paths.append([model + traj_name for traj_name in traj_names])
plot_save = "../../Data/plots/pseudo_industrial_PS_cross_benchmark"
title = "PS datasets"
get_evaluation_multiple(models, model_paths, names, title, max_steps = 10**6 - 1, path_save = plot_save)

If you are only interested in the statistics of two models and in the relative improvement of one over the other, you can obtain them as follows:

In [None]:
traj_paths = [
        "probsatg4sat_easy_ca.npy",
        "probsatg4sat_medium_ca.npy",
        "probsatg4sat_hard_ca.npy",
        "probsatg4sat_easy_ps.npy",
        "probsatg4sat_medium_ps.npy",
        "probsatg4sat_hard_ps.npy",
        "probsatuniform.npy",
]    

models = [
        "../../Data/trajectories/g4sat_easy/ca/",
        "../../Data/trajectories/g4sat_medium/ca/",
        "../../Data/trajectories/g4sat_hard/ca/",
        "../../Data/trajectories/g4sat_easy/ps/",
        "../../Data/trajectories/g4sat_medium/ps/",
        "../../Data/trajectories/g4sat_hard/ps/",
]

for i in range(len(models)):
    print(models[i])
    saved_eval_path_list = [ 
        models[i] + traj_paths[i],
        models[i] + "probsatuniform.npy",
    ]

    legend = [
        traj_paths[i],
        "uniform WalkSAT"]

    df = get_statistics_relative_improvement(saved_eval_path_list, legend, max_steps_algo=10**6 - 1)
    print(df.to_string(index=False))
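The `relative_improvement` row printed above is the change of the first entry (the trained oracle) relative to the second (uniform WalkSAT), in percent. The sketch below shows the formula on made-up numbers; `relative_change_percent` is a hypothetical name.

```python
def relative_change_percent(model_value, baseline_value):
    """Change of `model_value` relative to `baseline_value`, in percent."""
    return (model_value - baseline_value) / baseline_value * 100


# Made-up mean step counts: the model needs a quarter of the baseline's steps.
print(relative_change_percent(250.0, 1000.0))  # -75.0
# Made-up solved percentages: 90 % with the model versus 60 % with the baseline.
print(relative_change_percent(90.0, 60.0))     # 50.0
```

With this sign convention, a negative value in the "Mean" and "Median" columns means fewer steps than the baseline, while a positive value in the percentage columns means more solved instances.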