# Loading Experiment Data

In this notebook, we load the data collected while running the different experiment-wares and preprocess it so that it can be used for further analysis in dedicated notebooks.

## Imports

We first need to import the modules required to load the data.
In particular, we must import *Metrics-Wallet*, which we will use to handle our data.

In [1]:
from itertools import product
from metrics.wallet import BasicAnalysis
import pandas as pd
# pd.set_option('display.max_columns', None)
# pd.set_option('display.width', None)
# pd.set_option('display.max_colwidth', None)

## Reading the data

The next step is to read the data from the log files produced by our different experiment-wares.
These logs are described in the configuration file [`config.yml`](config/config.yml), which *Metrics-Scalpel* parses automatically to create a `BasicAnalysis` object.

In [2]:
analysis = BasicAnalysis(input_file='config/config.yml', log_level='WARNING')

The `BasicAnalysis` object instantiated above provides elementary, general-purpose methods for preprocessing our data before actually analyzing the results (which requires more specific methods, as can be seen in the dedicated notebooks). All the data read from the logs is stored in its data-frame, whose columns are listed below.

In [3]:
analysis.data_frame.columns

Index(['input', 'experiment_ware', 'cpu_time', 'problem', 'status',
       'flatBoolConstraints', 'arch', 'flatBoolVars', 'num_solutions',
       'store_mem', 'eps_solved_subproblems', 'eliminatedImplications',
       'timeout_ms', 'flatIntVars', 'variables', 'flatIntConstraints', 'paths',
       'or_nodes', 'solveTime', 'propagator_mem', 'run', 'propagations',
       'peakDepth', 'objectiveBound', 'free_search', 'propagators',
       'objective', 'stack_size', 'problem_path', 'boolVariables', 'failures',
       'solver', 'nodes', 'method', 'memory_configuration',
       'fixpoint_iterations', 'restarts', 'and_nodes', 'initTime',
       'num_blocks_done', 'nSolutions', 'eps_skipped_subproblems',
       'eps_num_subproblems', 'flatTime', 'version', 'shared_mem',
       'evaluatedHalfReifiedConstraints', 'evaluatedReifiedConstraints',
       'solutions', 'model', 'data_file', 'timeout', 'success', 'user_success',
       'missing', 'consistent_xp', 'consistent_input', 'error'],
      dtype='object')

An important step now is to inspect the collected data, to make sure that everything was properly read.
This can be achieved by looking at the data-frame built inside the `BasicAnalysis` object; below, we display some of its columns for the experiments of `TurboGPU` whose status is `SATISFIED`.

In [4]:
analysis.data_frame[(analysis.data_frame['experiment_ware']=='TurboGPU') & (analysis.data_frame['status']=='SATISFIED')][['problem','success','consistent_xp','consistent_input','error','user_success','missing']]

| | problem | success | consistent_xp | consistent_input | error | user_success | missing |
|---|---|---|---|---|---|---|---|
| 175 | diameterc-mst | True | True | True | False | True | False |
| 290 | generalized-peacable-queens | True | True | True | False | True | False |
| 61 | accap | True | True | True | False | True | False |
| 336 | generalized-peacable-queens | True | True | True | False | True | False |
| 9 | roster-sickness | True | True | True | False | True | False |
| 319 | nfc | True | True | True | False | True | False |
| 343 | accap | True | True | True | False | True | False |
| 211 | triangular | True | True | True | False | True | False |
| 222 | generalized-peacable-queens | True | True | True | False | True | False |
| 30 | team-assignment | True | True | True | False | True | False |
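Beyond looking at a few rows, a quick way to check that the logs were read properly is to count how many experiments each experiment-ware has for each status: an experiment-ware with only `UNKNOWN` statuses, or with far fewer experiments than the others, usually points to a parsing problem. The sketch below illustrates this idea in pure Python on hypothetical records; the function name `status_overview`, the solver name `OtherSolver` and the sample rows are made up for illustration and are not taken from our data.

```python
from collections import Counter


def status_overview(rows):
    """Count experiments per (experiment_ware, status) pair.

    Each row is a dict with at least the keys 'experiment_ware' and 'status',
    mirroring two of the columns of the data-frame built by BasicAnalysis.
    """
    return Counter((row['experiment_ware'], row['status']) for row in rows)


# Hypothetical records, for illustration only.
rows = [
    {'experiment_ware': 'TurboGPU', 'status': 'SATISFIED'},
    {'experiment_ware': 'TurboGPU', 'status': 'OPTIMAL_SOLUTION'},
    {'experiment_ware': 'TurboGPU', 'status': 'SATISFIED'},
    {'experiment_ware': 'OtherSolver', 'status': 'UNKNOWN'},
]

# Printing one line per (experiment-ware, status) pair, in sorted order.
for (ware, status), count in sorted(status_overview(rows).items()):
    print(f'{ware:12} {status:18} {count}')
```

On the actual data-frame, `analysis.data_frame.groupby(['experiment_ware', 'status']).size()` should give the same overview directly with pandas.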


## Checking the success and consistency of the results

During our analysis, we will need to know whether a given experiment was successful. As an example, the code below checks the success of an optimization solver.


In [5]:
def is_success(xp):
    """
    This function considers an experiment successful when the solver reported
    a status other than 'UNKNOWN' (for instance, because it found at least one
    solution, proved the optimality of its best bound, or proved the input to
    be unsatisfiable within the time limit).

    :param xp: The experiment to check.
    """
    return xp['status'] != 'UNKNOWN'
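Note that `is_success` counts any status other than `UNKNOWN` as a success, so an experiment reporting `SATISFIED` (a solution found without a proof of optimality) is considered successful. If an analysis should only count *completed* searches of an optimization solver, a stricter predicate can be passed to `check_success` instead. The function below is a hypothetical variant (the names `is_proved_success` and `COMPLETE_STATUSES` are ours), assuming that the statuses are those appearing in our data.

```python
# Statuses meaning that the solver completed its search (assumed from the
# statuses appearing in our data).
COMPLETE_STATUSES = {'OPTIMAL_SOLUTION', 'UNSATISFIABLE'}


def is_proved_success(xp):
    """
    Hypothetical stricter variant of is_success: an experiment is successful
    only if the solver proved the optimality of its best bound, or proved the
    input to be unsatisfiable, within the time limit.

    :param xp: The experiment to check (any mapping with a 'status' key).
    """
    return xp['status'] in COMPLETE_STATUSES
```

For instance, `is_proved_success({'status': 'SATISFIED'})` returns `False`, whereas `is_success` returns `True` for the same experiment.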

To make sure that our experiments are consistent, we also need to compare the results obtained by the different experiment-wares. As an example, the code below checks that, when different optimization solvers claim to have found an optimal value, they all report the same value, and that no solver finds a solution for an input that another solver proved unsatisfiable.

In [6]:
def is_consistent_by_input(df_input):
    """
    This function checks that the results obtained by the different solvers on
    the same input are consistent: no solver may find a solution for an input
    that another solver proved unsatisfiable, all solvers claiming optimality
    must report the same optimal value, and no solver may report a bound
    better than this optimal value.

    :param df_input: The experiments run on a single input.
    """
    # Checking the decisions of the solvers.
    decisions = df_input['status'].unique()
    if 'UNSATISFIABLE' in decisions and ('OPTIMAL_SOLUTION' in decisions or 'SATISFIED' in decisions):
        # A solver has found a (possibly optimal) solution while another proved unsatisfiability.
        return False

    # Collecting the optimal values claimed by the solvers that completed their search.
    best_values_for_complete_search = df_input.loc[df_input['status'] == 'OPTIMAL_SOLUTION', 'objective'].dropna().unique()
    if len(best_values_for_complete_search) == 0:
        # No solver proved optimality: there is nothing more to compare.
        return True
    if len(best_values_for_complete_search) > 1:
        # The solvers claim different optimal values.
        return False

    # Checking that no other bound is better than the "proved" optimal value.
    if df_input['method'].unique()[0] == 'minimize':  # in the case of minimization
        best_global_value = df_input['objective'].min()
    else:  # in the case of maximization
        best_global_value = df_input['objective'].max()
    return best_values_for_complete_search[0] == best_global_value
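To see what such a consistency check is meant to detect, the following pure-Python sketch applies the same rules to a few hypothetical results, each given as a `(status, objective)` pair obtained by one solver on the same input. It does not use *Metrics-Wallet* or pandas; the function name `results_are_consistent` and all sample values are illustrative assumptions.

```python
def results_are_consistent(results, method):
    """
    Check the results obtained by several solvers on the same input.

    :param results: A list of (status, objective) pairs, one per solver;
                    objective is None when no solution was found.
    :param method: Either 'minimize' or 'maximize'.
    """
    statuses = {status for status, _ in results}
    # A solution cannot exist for an input proved to be unsatisfiable.
    if 'UNSATISFIABLE' in statuses and statuses & {'SATISFIED', 'OPTIMAL_SOLUTION'}:
        return False

    # All solvers claiming optimality must agree on the optimal value.
    optimal_values = {obj for status, obj in results
                      if status == 'OPTIMAL_SOLUTION' and obj is not None}
    if not optimal_values:
        return True
    if len(optimal_values) > 1:
        return False

    # No solver may report a bound better than the proved optimal value.
    values = [obj for _, obj in results if obj is not None]
    best = min(values) if method == 'minimize' else max(values)
    return best == next(iter(optimal_values))


# Two solvers agree on the optimum 12, a third one only found 15 (minimization).
print(results_are_consistent(
    [('OPTIMAL_SOLUTION', 12), ('OPTIMAL_SOLUTION', 12), ('SATISFIED', 15)],
    'minimize'))  # → True
# A non-proved bound of 10 beats the "proved" optimum 12: something is wrong.
print(results_are_consistent(
    [('OPTIMAL_SOLUTION', 12), ('SATISFIED', 10)], 'minimize'))  # → False
```

As its parameter name suggests, we assume that the function given to `check_input_consistency` receives the experiments run on a single input, so the grouping by input is left to *Metrics-Wallet* rather than done in the function itself.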

In [7]:
analysis.check_success(is_success)
analysis.check_input_consistency(is_consistent_by_input)

In [8]:
# Saving the error table, restricted to its most relevant columns, to an HTML file for inspection.
analysis.error_table()[['problem','experiment_ware','status','method','objective']].to_html("error.html")

## Summary and export of the analysis

We can now summarize the analysis with the following table.

In [9]:
analysis.description_table()

| | analysis |
|---|---|
| n_experiment_wares | 4 |
| n_inputs | 88 |
| n_experiments | 352 |
| n_missing_xp | 0 |
| n_inconsistent_xp | 0 |
| n_inconsistent_xp_due_to_input | 0 |
| more_info_about_variables | `<analysis>.data_frame.describe(include='all')` |
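As a quick sanity check of these figures: with no missing experiment (`n_missing_xp` is 0), and assuming that each experiment-ware is run once on each input, the number of experiments should equal the number of (experiment-ware, input) pairs, which is what we observe:

$$
n_{\text{experiments}} = n_{\text{experiment-wares}} \times n_{\text{inputs}} = 4 \times 88 = 352
$$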


Finally, we export the analysis, both to share the data so that the analysis can be reproduced, and to reuse it in the other notebooks dedicated to more specific analyses.

In [10]:
analysis.export('.cache')