# Overview

This is the second part of a two-part tutorial on running the PaRoutes benchmark.
It requires that part 2a has already been run, since it loads the results saved there.
In this half of the tutorial, we analyze the results of the search algorithms.

"Analysis" can mean many things. In this notebook we focus on two measures:
the time at which the first solution is found and the number of diverse solutions found.
We also visualize some of the routes.

In [1]:
from __future__ import annotations
import math
from pathlib import Path
import pickle
import pprint

In [2]:
from syntheseus.search.graph.molset import MolSetNode, MolSetGraph
from syntheseus.search.graph.and_or import AndNode, OrNode, AndOrGraph

## Step 1: load results from notebook 2a

In [3]:
# Load pickle files
alg_name_to_graph = dict()
for alg_name in ["retro star", "mcts"]:
    with open(f"./search-results-{alg_name}.pkl", "rb") as f:
        alg_name_to_graph[alg_name] = pickle.load(f)

## Step 2: time at which a solution is found

Nodes keep track of their own creation time,
so time-based measures can be computed retrospectively
under different notions of time
(e.g. wallclock time, number of calls to the reaction model).
Any analysis involving time requires choosing how time is measured.
This choice is left to the user, who sets the `analysis_time` field of each node's data.
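
To make the idea concrete, here is a minimal sketch that does not use syntheseus at all. `ToyNode` and its `completes_solution` flag are hypothetical stand-ins invented for illustration, and taking the minimum over flagged nodes is a simplification, not necessarily how `get_first_solution_time` works internally.

```python
import math
from datetime import datetime, timedelta


class ToyNode:
    """Hypothetical stand-in for a search-graph node (not a syntheseus class)."""

    def __init__(self, creation_time, completes_solution=False):
        self.creation_time = creation_time
        # Assumption for this sketch: True if creating this node completed a route.
        self.completes_solution = completes_solution
        self.data = {}


start = datetime(2023, 1, 1, 12, 0, 0)
root = ToyNode(start)
nodes = [
    root,
    ToyNode(start + timedelta(seconds=0.5)),
    ToyNode(start + timedelta(seconds=1.25), completes_solution=True),
    ToyNode(start + timedelta(seconds=2.0), completes_solution=True),
]

# Wallclock analysis time: seconds elapsed since the root node was created.
for node in nodes:
    node.data["analysis_time"] = (node.creation_time - root.creation_time).total_seconds()

# Earliest analysis time among solution-completing nodes (infinity if unsolved).
solved_times = [n.data["analysis_time"] for n in nodes if n.completes_solution]
first_solution_time = min(solved_times) if solved_times else math.inf
print(first_solution_time)
```

Because the time measure lives in `node.data`, switching to a different measure only means changing the one line that fills in `analysis_time`.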

In [4]:
for graph in alg_name_to_graph.values():
    for node in graph.nodes():
        
        # Wallclock time: difference between this node's creation time and that of the root node
        node.data["analysis_time"] = (node.creation_time - graph.root_node.creation_time).total_seconds()
        
        # Could alternatively use number of calls to reaction model
        # node.data["analysis_time"] = node.data["num_calls_rxn_model"]

In [5]:
# Now use a function to compute the first solution time
from syntheseus.search.analysis.solution_time import get_first_solution_time
for alg_name, graph in alg_name_to_graph.items():
    print(f"{alg_name} first solution: {get_first_solution_time(graph)}")

retro star first solution: 1.205841
mcts first solution: 1.267164


## Step 3: extract routes

We extract individual synthesis routes from the graph
in order to later calculate their diversity.
However, there are many possible routes in a graph,
possibly too many to exhaustively enumerate.
Therefore we only extract the _minimum cost_ routes,
where the cost of each route is the sum of `node.data["route_cost"]` for each
node in the route.
This cost could be anything: a constant,
something based on the policy, etc.
It is up to the user to set a route's cost.
Here we just assign a constant cost to each node which represents a reaction
(i.e. AndNodes and MolSetNodes).
This means that the lowest cost routes will be the shortest routes.

We also cap the number of routes extracted to keep computation time manageable.
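
The effect of a constant per-reaction cost can be seen in a small sketch that is independent of syntheseus. The route encoding (lists of `(kind, label)` pairs), `node_cost`, `route_cost` and the toy data are all assumptions made up for illustration.

```python
# Toy routes: each route is a list of (node_kind, label) pairs.
# "rxn" marks a reaction node, "mol" marks a molecule node. Purely illustrative.
toy_routes = {
    "A": [("mol", "target"), ("rxn", "r1"), ("mol", "m1"), ("rxn", "r2"), ("mol", "m2")],
    "B": [("mol", "target"), ("rxn", "r3"), ("mol", "m3")],
    "C": [
        ("mol", "target"), ("rxn", "r4"), ("mol", "m4"),
        ("rxn", "r5"), ("mol", "m5"), ("rxn", "r6"), ("mol", "m6"),
    ],
}


def node_cost(kind):
    """Constant cost of 1 per reaction node, 0 for every other node."""
    return 1.0 if kind == "rxn" else 0.0


def route_cost(route):
    """Cost of a route is the sum of the costs of its nodes."""
    return sum(node_cost(kind) for kind, _ in route)


# Keep only the cheapest routes, up to a fixed cap.
max_routes = 2
cheapest = sorted(toy_routes, key=lambda name: route_cost(toy_routes[name]))[:max_routes]
print(cheapest)
```

With this cost, a route's total is simply its number of reactions, so ordering by cost is the same as ordering by route length.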

In [6]:
for graph in alg_name_to_graph.values():
    for node in graph.nodes():
        
        if isinstance(node, (AndNode, MolSetNode)):
            node.data["route_cost"] = 1.0
        else:
            node.data["route_cost"] = 0.0

In [7]:
%%time
from syntheseus.search.analysis import route_extraction
alg_name_to_routes = dict()
for alg_name, graph in alg_name_to_graph.items():
    # Extract routes in order of increasing cost, stopping after at most 10,000
    routes = list(route_extraction.iter_routes_cost_order(graph, 10_000))
    print(f"Found {len(routes)} routes for {alg_name}", flush=True)
    alg_name_to_routes[alg_name] = routes
    del routes

Found 10000 routes for retro star
Found 422 routes for mcts
CPU times: user 11.3 s, sys: 54.8 ms, total: 11.4 s
Wall time: 11.4 s


In [8]:
# Visualize the lowest-cost route from each algorithm to get a sense of what routes look like
from syntheseus.search import visualization

visualization.visualize_andor(
    alg_name_to_graph["retro star"],
    filename="retro star route.pdf",
    nodes=alg_name_to_routes["retro star"][0]
)

visualization.visualize_molset(
    alg_name_to_graph["mcts"],
    filename="mcts route.pdf",
    nodes=alg_name_to_routes["mcts"][0]
)

## Step 4: calculate diversity

We measure diversity by estimating the _packing number_ of the route set,
i.e. the size of the largest subset of routes in which every pair of routes is more than a distance $r$ apart.
Here we use a stringent form of diversity: routes count as distinct only if they have no reactions in common,
which means that their Jaccard distance is 1.
To compare routes from different algorithms, we use the `to_synthesis_graph` method of each graph object,
which converts the routes into a common format.
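
As a rough illustration of the idea, the sketch below represents each route as a plain set of reaction labels and builds a packing set greedily. `jaccard_distance` and `greedy_packing_set` are hypothetical helpers written for this example; the estimator in `syntheseus.search.analysis.diversity` may work differently. A greedy pass like this yields a valid packing set, so its size is a lower bound on the true packing number.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0  # convention for two empty sets
    return 1.0 - len(a & b) / len(union)


def greedy_packing_set(routes, radius):
    """Keep a route only if it is more than `radius` away from every kept route."""
    packing = []
    for route in routes:
        if all(jaccard_distance(route, kept) > radius for kept in packing):
            packing.append(route)
    return packing


# Toy routes, each represented only by the set of reactions it uses.
toy_routes = [
    {"r1", "r2"},
    {"r2", "r3"},  # shares r2 with the first route
    {"r4"},        # shares nothing with the first route
    {"r1", "r5"},  # shares r1 with the first route
    {"r5", "r6"},  # shares nothing with any kept route
]

# A radius just below 1 with a strict ">" keeps only routes with no shared reactions.
packing_set = greedy_packing_set(toy_routes, radius=0.999)
print(len(packing_set))
```

The result of a greedy pass depends on the order in which routes are visited, which is one reason the computed value is an estimate rather than the exact packing number.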

In [9]:
from syntheseus.search.analysis import diversity
for alg_name, graph in alg_name_to_graph.items():
    route_objects = [graph.to_synthesis_graph(nodes) for nodes in alg_name_to_routes[alg_name]]
    packing_set = diversity.estimate_packing_number(
        routes=route_objects,
        distance_metric=diversity.reaction_jaccard_distance,
        radius=0.999  # because comparison is > not >=
    )
    print(f"{alg_name}: number of distinct routes = {len(packing_set)}")

retro star: number of distinct routes = 2
mcts: number of distinct routes = 4
