# Shorter Example

The branch-and-bound (BnB) method is a general algorithm for solving optimization problems that admit a sub-optimality oracle: a procedure which, given an instance of a problem $
    \min\{f(x) \colon x \in X\}
$, returns a non-trivial lower bound $\tau$ such that $\tau \leq f(x)$ for all $x \in X$. BnB cuts off subsets of the feasibility set $X$ in a methodical and structured manner, while making sure that no region which could contain an optimum is missed.
- In fact, BnB can be adapted to the case when the problem allows access only to a SAT oracle, which, given $(f, X, \tau)$, decides whether there exists $x \in X$ such that $f(x) \leq \tau$.
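To make the role of the oracle concrete, here is a minimal sketch of a depth-first BnB on a toy covering problem $\min\{c^\top x \colon a^\top x \geq b, x \in \{0, 1\}^n\}$. The problem, the data, and the helper names `relax` and `branch_and_bound` are made up for this illustration and are not part of `toybnb`. The oracle is the fractional relaxation, which for a single covering constraint with positive integer weights can be solved greedily.

```python
from math import inf


def relax(c, a, b, fixed):
    """Lower-bound oracle: the fractional relaxation, solved greedily.

    Returns (bound, x, j), where j is the fractional variable or None.
    """
    x = [float(fixed.get(i, 0)) for i in range(len(c))]
    value = sum(c[i] for i, v in fixed.items() if v == 1)
    need = b - sum(a[i] for i, v in fixed.items() if v == 1)

    # cover the remaining demand with the cheapest free items per unit
    free = [i for i in range(len(c)) if i not in fixed]
    frac = None
    for i in sorted(free, key=lambda i: c[i] / a[i]):
        if need <= 0:
            break
        if a[i] <= need:
            x[i], value, need = 1.0, value + c[i], need - a[i]
        else:
            x[i], value, need, frac = need / a[i], value + c[i] * need / a[i], 0, i

    if need > 0:
        return inf, None, None  # the sub-problem is infeasible
    return value, x, frac


def branch_and_bound(c, a, b):
    best_value, best_x = inf, None
    stack = [{}]  # depth-first: a node is a partial 0/1 assignment
    while stack:
        fixed = stack.pop()
        bound, x, j = relax(c, a, b, fixed)
        if bound >= best_value:
            continue  # cut off: infeasible, or cannot beat the incumbent
        if j is None:
            # the relaxed solution is integral: a new incumbent
            best_value, best_x = bound, [int(v) for v in x]
            continue
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_value, best_x


c, a, b = [4, 3, 5, 6], [3, 2, 4, 5], 7
value, x = branch_and_bound(c, a, b)
```

Every sub-problem is either cut off by its bound, closed by an integral relaxation, or split in two, so no part of the feasibility set is skipped without a certificate.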

In [None]:
import numpy as np
import networkx as nx
from scipy import sparse as sp

from numpy import ndarray
from numpy.random import default_rng

from tqdm import tqdm
from matplotlib import pyplot as plt

from time import monotonic

Invert control of our bnb solver at the branching-variable selection step, while still using the depth-first node selector.

In [None]:
from toybnb import search as bnb
from toybnb.milp import MILP
from toybnb.coro import Coroutine
from typing import Generator


def inverted_search(p: MILP, nodesel: callable = bnb.nodesel_dfs) -> ...:
    """Inverted variable branching as generator"""

    # need access to `co` through closure
    def branchrule(T: nx.DiGraph, node: int) -> int:
        return co.co_yield((T, node))

    # def nodesel(G: nx.DiGraph, *reschedule: int) -> int:
    #     # yields must use a tag for use in a state machine
    #     return co.co_yield(("nodesel", G, *reschedule))

    args = p, nodesel, branchrule
    co = Coroutine(bnb.search, args=args, kwargs={})
    return iter(co)


def branching(p: MILP, nodesel: callable = bnb.nodesel_dfs) -> tuple[nx.DiGraph, int]:
    """Branching that allows each node to be visited only once"""
    it, visited, var = inverted_search(p, nodesel), set(), None
    try:
        while True:
            T, node = it.send(var)
            while node in visited:
                T, node = it.throw(IndexError)

            visited.add(node)
            var = yield T, node
            # assert it.gi_frame.f_locals["self"].co_is_suspended

    except StopIteration as e:
        return e.value, None


def send(it: Generator, value: ...) -> ...:
    """Send a value to the generator and get a value from it in return"""
    try:
        return it.send(value), False

    except StopIteration as e:
        return e.value, True


class GeneratorEnv:
    """A wrapper to make generators into envs."""

    def reset(self, it: Generator) -> ...:
        self.it = iter(it)
        return send(self.it, None)

    def step(self, act: ...) -> ...:
        return send(self.it, act)
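The idea of control inversion can be sketched without `toybnb`. The helper `invert` below is a hypothetical, simplified stand-in for what a thread-based coroutine wrapper might do, and is not the `Coroutine` class used above: it runs a callback-driven function in a worker thread and turns each callback invocation into a value yielded to the caller. Unlike the code above, this sketch does not support `throw`.

```python
import queue
import threading


def invert(func):
    """Expose the calls that `func` makes to its callback as a generator."""
    requests, replies = queue.Queue(), queue.Queue()

    def callback(request):
        requests.put(("call", request))
        return replies.get()  # block the worker until the caller answers

    def worker():
        try:
            requests.put(("done", func(callback)))
        except Exception as e:
            requests.put(("error", e))

    # a daemon thread, so an abandoned generator does not block interpreter exit
    threading.Thread(target=worker, daemon=True).start()
    while True:
        kind, payload = requests.get()
        if kind == "done":
            return payload  # surfaces as StopIteration.value
        if kind == "error":
            raise payload
        replies.put((yield payload))


def search(choose):
    """A stand-in for a solver that asks a callback to pick an option."""
    return sum(choose(options) for options in ([1, 2, 3], [10, 20]))


it = invert(search)
first = request = next(it)
try:
    while True:
        request = it.send(max(request))  # the decision is made by the caller
except StopIteration as e:
    result = e.value
```

The caller now drives the loop and decides what the callback returns, which is the same pattern that `GeneratorEnv.reset` and `GeneratorEnv.step` wrap.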

<br>

#### A generic random MILP

A Mixed Integer Linear Program (MILP) is a linear program with integrality constraints. A MILP has the following generic form:

$$
\min\Bigl\{
    c^\top x
    \colon
    A x \leq b
    \,, x \in \bigl[l, u\bigr]
    \,,
    x \in \mathbb{Z}^m \times \mathbb{R}^{n-m}
\Bigr\}
    \,, $$

where $A \in \mathbb{R}^{r \times n}$, $b \in \mathbb{R}^r$, $
    l, u \in \mathbb{R}^n
$ with $l \leq u$ and $1 \leq m \leq n$.
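A minimal sketch of this form as a data structure, together with a feasibility check, may look as follows. The `ToyMILP` container and `is_feasible` are hypothetical names for this illustration and do not reproduce the `MILP` class from `toybnb.milp`.

```python
from dataclasses import dataclass


@dataclass
class ToyMILP:
    """min c'x s.t. Ax <= b, lo <= x <= up, the first m variables integer."""

    c: list
    A: list
    b: list
    lo: list
    up: list
    m: int


def is_feasible(p, x, tol=1e-9):
    """Check the bounding box, the inequalities, and the integrality of x."""
    in_box = all(l - tol <= v <= u + tol for v, l, u in zip(x, p.lo, p.up))
    rows_ok = all(
        sum(a * v for a, v in zip(row, x)) <= bi + tol for row, bi in zip(p.A, p.b)
    )
    integral = all(abs(v - round(v)) <= tol for v in x[: p.m])
    return in_box and rows_ok and integral


# x_1 is integer, x_2 is continuous
p_toy = ToyMILP(c=[-1, -1], A=[[2, 2]], b=[3], lo=[0, 0], up=[1, 1], m=1)
checks = [is_feasible(p_toy, x) for x in ([1, 0.5], [0.5, 1.0], [1, 1])]
```

Here only the first point is feasible: the second violates the integrality of $x_1$, and the third violates $A x \leq b$.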

In [None]:
from toybnb.milp import generate as generate_generic

#### A generator for MIS problems

For an undirected graph $G = (V, E)$ the maximum independent set (MIS) problem is
$$
\begin{aligned}
    & \underset{x\in \{0, 1\}^V}{\text{maximize}}
      & & \sum_{v \in V} x_v
          \\
    & \text{subject to}
      & & \forall uv \in E
          \colon x_u + x_v \leq 1
          \,.
\end{aligned}
$$

The generator implemented in `toybnb.milp` uses the Barabasi-Albert graph generator.
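As a sketch of how the MIS formulation maps onto the generic MILP data, the snippet below builds the edge constraints for a small graph and finds a maximum independent set by brute force. Both helpers are hypothetical and only meant for tiny graphs. In the minimization form used above one would take $c = -\mathbf{1}$.

```python
from itertools import product


def mis_constraints(n, edges):
    """One row `x_u + x_v <= 1` per edge uv."""
    A = [[1 if k in (u, v) else 0 for k in range(n)] for u, v in edges]
    return A, [1] * len(edges)


def brute_force_mis(n, edges):
    """Enumerate all 0/1 vectors and keep a largest independent one."""
    independent = (
        x
        for x in product((0, 1), repeat=n)
        if all(x[u] + x[v] <= 1 for u, v in edges)
    )
    return list(max(independent, key=sum))


# a cycle on five vertices
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
A_mis, b_mis = mis_constraints(5, edges)
x_mis = brute_force_mis(5, edges)
```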

In [None]:
from toybnb.milp import generate_mis_ba

<br>

### ToyBNB tree

Generate a simple problem and solve it with SCIP to get the reference

In [None]:
# it = generate_generic(100, 50, 10, seed=1458)
# it = generate_generic(150, 50, 15, seed=1458)
it = generate_generic(50, 50, 10, seed=1458)  # many infeasible nodes
# it = generate_mis_ba(210, seed=454)

# it = generate_generic(1500, 1200, 5, seed=53912)
# it = generate_generic(50, 38, 10, seed=1458)  # very slow ub decay

# it = generate_mis_ba(500, seed=69420)
p = next(it)  # x = it.gi_frame.f_locals["p"]

BnB for MILP is an exhaustive search which employs _a continuous relaxation of the MILP_ as the oracle in order to eliminate sub-regions of the feasibility set.
Let the integer feasibility set and its continuous relaxation be, respectively,
$$
\begin{align}
S
    &= \bigl\{
        x \in {\color{red}{\mathbb{Z}^m}} \times \mathbb{R}^{n-m}
        \colon A x \leq b
        \,, x \in [l, u] 
    \bigr\}
    \,, \\
\breve{S}
    &= \bigl\{
        x \in {\color{blue}{\mathbb{R}^m}} \times \mathbb{R}^{n-m}
        \colon A x \leq b
        \,, x \in [l, u] 
    \bigr\}
    \,.
\end{align}
$$
The linear problem
$$
\breve{f}
    = \min_x \{
        c^\top x
        \colon x \in \breve{S}
    \}
    \,, $$
offers a _lower bound on the achievable value_ of the objective in $S$, i.e. every integer-feasible $x \in S$ necessarily has $
    \breve{f} \leq c^\top x
$.
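As a tiny worked illustration (the instance is made up for this note and does not come from the generators above), consider
$$
\min\bigl\{
    - x_1 - x_2
    \colon
    2 x_1 + 2 x_2 \leq 3
    \,, x \in [0, 1]^2
    \,, x \in \mathbb{Z}^2
\bigr\}
    \,. $$
The integer-feasible points are $(0, 0)$, $(1, 0)$ and $(0, 1)$, so the best value over $S$ is $-1$. Dropping integrality leaves $x_1 + x_2 \leq \tfrac32$ inside the unit box, hence
$$
\breve{f}
    = -\tfrac32
    \leq -1
    = \min_{x \in S} c^\top x
    \,, $$
and the relaxed optimum, e.g. $\breve{x} = (1, \tfrac12)$, is not integer-feasible.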

In [None]:
from toybnb.tree import subproblem

Solve the MILP, generated above, with random variable branching.

In [None]:
times = []
with tqdm(ncols=70) as pb:
    rng = default_rng(671)
    env = GeneratorEnv()
    (T, node), fin = env.reset(branching(p))
    while not fin:
        times.append(monotonic())
        pb.update(1)

        # pick a random index
        _, _, mask = subproblem(T, node)
        var = rng.choice(np.flatnonzero(mask))

        # branch
        (T, node), fin = env.step(var)
    times.append(monotonic())

assert node is None
times_rng = np.array(times) - times[0]
nn_rng, pv_rng, dv_rng, lb_rng = map(np.array, zip(*T.graph["track"]))

Plot the primal and dual bound history

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

ax.plot(dv_rng)
ax.plot(pv_rng)
ax.plot(lb_rng)

ax.twinx().plot(times_rng[1:], c="k")

Sanity check

In [None]:
from toybnb.tree import Status

# every node in `duals` has incumbent >= lp value and is either closed
#  or integer-feasible
st = nx.get_node_attributes(T, "status")
assert all(st[n] in (Status.FEASIBLE, Status.CLOSED) for _, n in T.graph["duals"])

is_leaf = {n: not bool(T[n]) for n in T}
is_fathomed = {n: s != Status.OPEN for n, s in st.items()}
assert all(is_fathomed.values())

assert T.nodes[T.graph["root"]]["best"] is T.graph["incumbent"]

A random branching strategy is absolutely oblivious to the nature and geometry of the underlying problem.

<br>

### lower-bound aware depth-first node selection

A node selection strategy is as important as the variable selection heuristic: the latter's goal is to achieve subset cutoff as soon as possible, while the former's is to dive into the nodes with the loosest lower bound possible, so as to eat up the potential primal-dual margin in a sub-tree as fast as possible.

In [None]:
from toybnb.tree import Status


def nodesel_dfs_lb(G: nx.DiGraph, *reschedule: int) -> int:
    """Prioritize the child with the worst lp lower bound."""
    stack, dt = G.graph["queue"], G.nodes

    def lower_bound(n: int) -> float:
        return dt[n]["lp"].fun

    # prioritize the children by their lp lower bound, but
    #  schedule only OPEN nodes. The less tight the lower
    #  bound, intuitively, the more margin there is for an
    #  integer feasible solution.
    for n in sorted(reschedule, key=lower_bound, reverse=True):
        if dt[n]["status"] == Status.OPEN:
            stack.append(n)

    while stack:
        n = stack.pop()
        if dt[n]["status"] == Status.OPEN:
            return n

    raise IndexError
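The scheduling order of such a selector can be checked in isolation. The sketch below uses plain lists and a dictionary of made-up bounds instead of the search tree, and `schedule` is a hypothetical helper: children are pushed in descending order of their lower bound, so the child with the loosest (smallest) bound ends up on top of the stack and is popped first.

```python
def schedule(stack, children, bound):
    """Push the children so that the smallest bound ends up on top."""
    stack.extend(sorted(children, key=bound, reverse=True))


bounds = {"a": 3.0, "b": 1.5, "c": 2.0}
stack = ["older"]
schedule(stack, ["a", "b", "c"], bounds.get)

# nodes leave the stack in ascending order of their lower bound
order = [stack.pop() for _ in range(3)]
```

Previously scheduled nodes, such as `"older"` here, stay below the fresh children, which preserves the depth-first character of the search.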

Run random branching with a slightly smarter nodesel

In [None]:
times = []
with tqdm(ncols=70) as pb:
    rng = default_rng(671)
    env = GeneratorEnv()
    (T, node), fin = env.reset(branching(p, nodesel_dfs_lb))
    while not fin:
        times.append(monotonic())
        pb.update(1)

        # pick a random index
        _, _, mask = subproblem(T, node)
        var = rng.choice(np.flatnonzero(mask))

        # branch
        (T, node), fin = env.step(var)
    times.append(monotonic())

assert node is None
times_rng_dfs = np.array(times) - times[0]
nn_rng_dfs, pv_rng_dfs, dv_rng_dfs, lb_rng_dfs = map(np.array, zip(*T.graph["track"]))

The trace and the value function

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

ax.plot(dv_rng_dfs)
ax.plot(pv_rng_dfs)
ax.plot(lb_rng_dfs)

ax.twinx().plot(times_rng_dfs[1:], c="k")

<br>

### Strong branching

Strong branching selects the variable to split the MILP on by exhaustive look-ahead: for every candidate it evaluates the lower bound (the lp relaxation value, i.e. the dual bound) of both children and picks the most promising one, with the goal of cutting off half of the search space as quickly as possible.

In [None]:
# wrapper for scipy's LP solver for the relaxed problem w/o integrality
from toybnb.tree import lpsolve

Suppose we happen to have a candidate (incumbent) $x_*$ with $f^* = c^\top x_*$, which is integer-feasible in the __original problem's domain__, of which $S$ is a proper subset, but with $x_* \notin S$.
Then $f^* < \breve{f}$ __certifies__ that there is __no integer-feasible__ $x$ in $
    \breve{S} \supseteq S
$ with a lower objective value than $x_*$, because every $x \in \breve{S}$ has $c^\top x \geq \breve{f} > f^*$. This means that $\breve{S}$, including the entirety of $S$, may be excluded from the search.

The lp-gains $\Delta_\pm$ are computed based on the left and right relaxation of the integer problem.

If $\breve{f} \leq f^*$ and the lp solution $\breve{x} \notin S$, then there is some $j \in \{1, \ldots, m\}$ such that $\breve{x}^j \notin \mathbb{Z}$. Since it would be impossible for any integer-feasible solution $x \in S$ to have $
    x^j \in \bigl(
        \lfloor \breve{x}^j \rfloor,
        \lceil \breve{x}^j \rceil
    \bigr)
$, it is reasonable to split the original feasibility set $S$ by intersecting it with two non-overlapping half-spaces:
$$
    \underbrace{
        \bigl\{
            x \in \mathbb{R}^n \colon
            x^j \leq \lfloor \breve{x}^j \rfloor
        \bigr\}  % \times \mathbb{R}^{n-1}
    }_{R^j_-}
    \uplus
    \underbrace{
        \bigl\{
            x \in \mathbb{R}^n \colon
            \lceil \breve{x}^j \rceil \leq x^j
        \bigr\}  % \times \mathbb{R}^{n-1}
    }_{R^j_+}
    \,.
$$
Note that it is sufficient to split the region $S$ with respect to one variable only: by integer-feasibility, the excluded strip of the $j$-th variable is guaranteed __not__ to contain a feasible solution, regardless of any later splits on other variables.
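In terms of the problem data, the split only touches the bounds of the $j$-th variable. A minimal sketch with a hypothetical helper `split_bounds` (not the `split` function from `toybnb.tree`):

```python
from math import ceil, floor


def split_bounds(lo, up, j, xj):
    """Return the (lo, up) boxes of the down- and the up-child."""
    down_up = list(up)
    down_up[j] = floor(xj)  # x^j <= floor(xj) in the down-child

    up_lo = list(lo)
    up_lo[j] = ceil(xj)  # ceil(xj) <= x^j in the up-child

    return (list(lo), down_up), (up_lo, list(up))


down, up = split_bounds([0, 0], [5, 5], j=0, xj=2.4)
```

The open strip $(2, 3)$ of the first variable is excluded from both children, while every integer value in $[0, 5]$ remains in exactly one of them.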

In [None]:
# partition the feasibility set on a given variable with the specified threshold
from toybnb.tree import split

The sub-regions $
    S^j_\pm = S \cap R^j_\pm
$ bring about new lower bounds $
\breve{f}^j_\pm
    = \min_x \{
        c^\top x\colon x \in \breve{S}^j_\pm
    \}
$. The new sub-problems are obtained from the original by modifying the bounds of the $j$-th variable: $
    \bigl[l^j, \lfloor \breve{x}^j \rfloor\bigr]
$ and $
    \bigl[\lceil \breve{x}^j \rceil, u^j\bigr]
$. The amount of the objective value, by which each bound is tightened, is the _gain_:
$$
\Delta^j_\pm
    = \breve{f}^j_\pm - \breve{f}
    \geq 0
    \,. $$

In [None]:
# compute LP branching gains for each candidate in the binary mask
from toybnb.tree import lp_gains

Scoring functions
* additive $
    s_j = \mu \max\{\Delta^j_-, \Delta^j_+\}
        + (1 - \mu) \min\{\Delta^j_-, \Delta^j_+\}
$
* multiplicative $
    s_j = \Delta^j_- \Delta^j_+
$ -- the default scorefunc in SCIP

The strong branching rule enumerates all fractional variables and evaluates the lp lower bound changes on a split with respect to each one.
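The two scoring functions can be compared on made-up gains. The defaults $\mu = \tfrac16$ and the floor $\epsilon = 10^{-6}$ in the product are values I recall from the SCIP literature, so treat them as assumptions rather than as the settings of this notebook.

```python
def score_additive(down, up, mu=1 / 6):
    """Weighted sum of the larger and the smaller gain."""
    return mu * max(down, up) + (1 - mu) * min(down, up)


def score_product(down, up, eps=1e-6):
    """Product of the gains, floored at eps so that a zero gain does not tie."""
    return max(down, eps) * max(up, eps)


# hypothetical (down, up) gains of three candidate variables
gains = {0: (0.5, 4.0), 1: (1.5, 1.5), 2: (0.0, 10.0)}

best_additive = max(gains, key=lambda j: score_additive(*gains[j]))
best_product = max(gains, key=lambda j: score_product(*gains[j]))
```

On this data the additive score prefers the lopsided candidate `2`, whereas the product prefers the balanced candidate `1`, which improves the bound in both children.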

In [None]:
"""Choose a variable to split the problem with, based on exhaustive look-ahead."""

times = []
with tqdm(ncols=70) as pb:
    n_lpit = 0
    env = GeneratorEnv()
    (T, node), fin = env.reset(branching(p, nodesel_dfs_lb))
    while not fin:
        times.append(monotonic())
        pb.update(1)

        # get the up-lo branching gains
        p_, lp_, mask = subproblem(T, node)
        gains, nit = lp_gains(p_, lp_, mask)
        n_lpit += nit

        # scores = mu * gains.max(-1) + (1 - mu) * gains.min(-1)
        scores = gains[:, 0] * gains[:, 1]

        # branch
        var = np.nanargmax(scores)
        (T, node), fin = env.step(var)
    times.append(monotonic())

assert node is None
times_sb = np.array(times) - times[0]
nn_sb, pv_sb, dv_sb, lb_sb = map(np.array, zip(*T.graph["track"]))

Plot the track, the value paths, and the tree

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

ax.plot(dv_sb)
ax.plot(pv_sb)
ax.plot(lb_sb)

ax.twinx().plot(times_sb[1:], c="k")

Strong branching is [reportedly](#citation_needed) among the best branching rules in terms of search tree efficiency (the overall number of nodes). However, the exhaustive enumeration of all candidates, coupled with invoking an expensive lp solver twice for each, makes the SB rule very computationally expensive in practice.
* At the same time, [there are](https://arxiv.org/abs/2110.10754.pdf) MILPs whose lp relaxations fail to reflect the progress of the search

<br>

### Pseudocost branching

Whenever a fractional variable $j$ is picked for branching and both sub-problems have feasible lp relaxations, we can compute its so-called _branching pseudocosts_: $
    p^j_\pm = \frac{\Delta^j_\pm}{\phi^j_\pm}
$ where $
    \phi^j_- = \breve{x}^j - \lfloor \breve{x}^j \rfloor
$, and $
    \phi^j_+ = \lceil \breve{x}^j \rceil - \breve{x}^j
$. Conceptually, the $p^j_\pm$ measure the rate of degradation of the objective function value between the nested lp relaxations caused by splitting on a given variable.
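A small numeric sketch of these definitions, with made-up lp values and a hypothetical helper `pseudocosts`:

```python
from math import ceil, floor


def pseudocosts(parent, down, up, xj):
    """Per-unit objective degradation in the down- and the up-child."""
    phi_down = xj - floor(xj)
    phi_up = ceil(xj) - xj
    return (down - parent) / phi_down, (up - parent) / phi_up


# the parent lp value is 10.0, the children have 10.5 and 12.0, and x^j = 2.25
p_down, p_up = pseudocosts(10.0, 10.5, 12.0, xj=2.25)

# predict the gains at another node, where the same variable equals 4.5
est_down, est_up = p_down * 0.5, p_up * 0.5
```

The last line is how pseudocosts are put to use: they replace the two lp solves of strong branching with a multiplication by the current fractionality.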

Using the envelope theorem, it is possible to view the pseudocosts as crude finite-difference approximations to the sensitivity of the lp value function to a change of the bound constraint at the current solution: $
    p^j_\pm
        \approx \partial_{\theta_j} \breve{f}_\theta
$.

- the envelope theorem for the problem $
    f^*(\theta) = \min_x \{f(x, \theta) \colon g_k(x, \theta) \geq 0\}
$ states that, if in some neighbourhood of $\theta$ the solution $
    x^*(\theta)
$ is differentiable, then the value function $
    f^*(\theta) = f(x^*(\theta), \theta)
$ is differentiable with (writing $\lambda^*_k(\theta) \geq 0$ for the Lagrange multipliers of the constraints)
$$
\partial_\theta f^*(\theta)
    = \partial_2 f(x^*(\theta), \theta) - \sum_k \lambda^*_k(\theta) \partial_2 g_k(x^*(\theta), \theta)
    \,. $$
    - if a constraint is non-binding at $x^*(\theta)$, then it is non-binding in some open neighbourhood, and thus infinitesimal changes to it do not affect the solution and value
    - if $g_k(x, \theta) = h_k(x, \theta_1) - \theta_2$ for some $k$, and neither the objective nor the other constraints depend on $\theta_2$, then $
        \partial_{\theta_2} f^*(\theta)
            = \lambda^*_k(\theta)
    $.
    - p.5 of [these notes](http://www.u.arizona.edu/~mwalker/MathCamp2021/EnvelopeTheorem.pdf) seems like a good introductory reference
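The last special case can be checked numerically on a one-dimensional toy problem chosen for this note: $\min_x \{x^2 \colon x - \theta \geq 0\}$. For $\theta > 0$ the constraint is binding, $x^*(\theta) = \theta$, and stationarity of the Lagrangian $x^2 - \lambda (x - \theta)$ gives $\lambda^*(\theta) = 2 \theta$.

```python
def value(theta):
    """f*(theta) = min { x^2 : x >= theta } in closed form."""
    x = max(theta, 0.0)
    return x * x


def multiplier(theta):
    """Lagrange multiplier of the constraint x - theta >= 0."""
    return 2.0 * max(theta, 0.0)


theta, h = 1.5, 1e-6

# central finite difference of the value function w.r.t. the bound
fd = (value(theta + h) - value(theta - h)) / (2 * h)
```

The finite difference `fd` agrees with `multiplier(theta)`, i.e. the sensitivity of the optimal value to the bound equals the multiplier of that bound, which is the reading of pseudocosts suggested above.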

In [None]:
from collections import namedtuple

Acc = namedtuple("Acc", "n,v")


def fractionality(x: ndarray) -> ndarray:
    """Compute the fractionality of each variable"""
    return np.stack((x - np.floor(x), np.ceil(x) - x), axis=-1)


def acc_get_estimate(acc: Acc, *, min: float = 1e-4, epsilon: float = 1e-5) -> ndarray:
    # compute the Laplace-corrected average estimate
    n = acc.n.sum(0, keepdims=True)
    coef = n / (n + len(acc.n) * epsilon)
    size = (acc.n + epsilon) * coef

    return np.clip(acc.v / size, min, None)

If the split results in an infeasible sub-problem, it has been suggested to use a [_fake objective value_](http://www.or.deis.unibo.it/andrea/pscost-ISMP2009.pdf) instead, computed from the parent's lp value and the average pseudocost of all other variables multiplied by $\phi^j_\pm$:
$$
\tilde{p}^j_\pm
    = \frac{
        \overbrace{
            \breve{f} + A \bigl( \bar{p}_\pm \phi^j_\pm + \epsilon \bigr)
        }^{\text{fake objective value}}
        - \breve{f}
    }{\phi^j_\pm}
    \,, $$
for a small $\epsilon > 0$, a large $A > 0$, and $\bar{p}_\pm$ -- the average pseudocost across the integer variables.
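Since $\breve{f}$ cancels in the numerator, the fake pseudocost reduces to $A (\bar{p}_\pm \phi^j_\pm + \epsilon) / \phi^j_\pm$. A sketch with a hypothetical helper, where the defaults $A = 10^4$ and $\epsilon = 10^{-2}$ mirror the constants of the disabled fallback in this notebook's `update_pseudocosts`:

```python
def fake_pseudocost(avg_pcost, phi, large=1e4, eps=1e-2):
    """Pseudocost implied by the fake objective value of an infeasible child."""
    return large * (avg_pcost * phi + eps) / phi


# average pseudocost 2.0 and fractionality 0.5
pc_fake = fake_pseudocost(2.0, 0.5)
```

The result is several orders of magnitude above the average pseudocost, which is why mixing such values into the running averages can distort the estimates at other nodes.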

Whenever a pseudocost for a fractional variable is not available, we use partial strong branching to seed the initial up-lo gain estimates.

In [None]:
from math import isfinite


def update_pseudocosts(pc: Acc, T: nx.DiGraph, node: int) -> None:
    p = T.nodes[node]["p"]

    # compute pseudocosts
    for c, dt in T[node].items():
        # decide which pseudocost to update
        k = 0 if dt["key"] < 0 else 1
        j, g, f = dt["j"], dt["g"], dt["f"]

        # compute pseudocosts and handle infeasible lp gains
        pcost = g / f
        if not isfinite(pcost):
            # On the one hand, we want the pcosts to reflect the dual
            #  gain from splitting by a variable, and at the same time
            #  cut off a sub-problem as quickly as possible. On the
            #  other hand, spoiling the pcosts now with a really HIGH
            #  gain is bad for the pcost estimates used at other nodes.
            #  Hence we skip the update. The disabled alternative below
            #  substitutes a fake objective value:
            #    fake = lp.fun + (avg_pcost * f + eps) * LARGE
            #    pc_avg = pc.v[: p.m].sum(0) / pc.n[: p.m].sum(0)
            #    pcost = (fake - lp.fun) / f
            #          = (float(pc_avg[k]) * f + 1e-2) * 1e4 / f
            continue

        # bump the branching counter and update averages in-place
        pc.n[j, k] += 1
        pc.v[j, k] += pcost
        # pc.v[j, k] += (pcost - pc.v[j, k]) / pc.n[j, k]

Let's try the pseudocost branching approach

In [None]:
node, times = None, []
with tqdm(ncols=70) as pb:
    n_lpit = 0
    env = GeneratorEnv()

    last = node
    (T, node), fin = env.reset(branching(p, nodesel_dfs_lb))

    # get the up-lo pseudocosts
    n = T.graph["p"].n
    pc = Acc(np.zeros((n, 2)), np.zeros((n, 2)))
    while not fin:
        times.append(monotonic())

        pb.update(1)
        assert not np.isnan(pc.v).any()

        # get the up-lo branching gains
        p_, lp_, mask = subproblem(T, node)
        cands = np.flatnonzero(mask)
        assert len(cands) > 0

        frac = fractionality(lp_.x)

        # pick which pseudocosts need to be initialized
        mask = ((pc.n == 0.0).any(-1)) & (mask > 0)
        # pb.set_postfix_str(f"{mask.sum()} {bnb.bnb.gap(T):.2%}")  # XXX slow
        if mask.any():
            gains, nit = lp_gains(p_, lp_, mask)
            n_lpit += nit

            # get the pseudocosts (costs as in `c_j`)
            pcost = np.clip(gains / frac, 0, abs(p_.c).max())
            pc.v[mask] = pcost[mask]
            pc.n[mask] = 10  # XXX we can tweak this parameter

        # decide, which variable to branch on
        gains = acc_get_estimate(pc, min=1e-5) * frac
        # scores = mu * gains.max(-1) + (1 - mu) * gains.min(-1)
        scores = gains[:, 0] * gains[:, 1]
        var = cands[scores[cands].argmax()]

        # branch
        last = node
        (T, node), fin = env.step(var)
        update_pseudocosts(pc, T, last)

    times.append(monotonic())

assert node is None
times_pc = np.array(times) - times[0]
nn_pc, pv_pc, dv_pc, lb_pc = map(np.array, zip(*T.graph["track"]))

Plot the primal-dual bounds evolution, the value function paths, and the search tree for pseudocost branching

In [None]:
from matplotlib import pyplot as plt

fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

ax.plot(dv_pc)
ax.plot(pv_pc)
ax.plot(lb_pc)

ax.twinx().plot(times_pc[1:], c="k")

Plot the final pseudocost estimate

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

lo, up = acc_get_estimate(pc).T
ax.plot(up / up.max(), label="up")
ax.plot(-lo / lo.max(), label="lo")
ax.legend(fontsize="xx-small")

<br>

How fast does each method decrease the primal bound?

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

(l,) = ax.semilogx(pv_rng, label="rng", alpha=0.75)
# ax.semilogx(lb_rng, c=l.get_color(), alpha=0.75)

(l,) = ax.semilogx(pv_rng_dfs, label="rng-dfs", alpha=0.75)
# ax.semilogx(lb_rng_dfs, c=l.get_color(), alpha=0.75)

(l,) = ax.semilogx(pv_sb, label="sb", alpha=0.75)
# ax.semilogx(lb_sb, c=l.get_color(), alpha=0.75)

(l,) = ax.semilogx(pv_pc, label="pc", alpha=0.75)
# ax.semilogx(lb_pc, c=l.get_color(), alpha=0.75)

ax.legend(fontsize="xx-small", ncol=4)

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(5, 2), dpi=300)

ax.plot(times_rng, label="rng", alpha=0.75)
ax.plot(times_rng_dfs, label="rng-dfs", alpha=0.75)
ax.plot(times_sb, label="sb", alpha=0.75)
ax.plot(times_pc, label="pc", alpha=0.75)

ax.legend(fontsize="xx-small", ncol=4)

<br>

In [None]:
assert False

<br>

# Log

Reinventing the wheel gives one a better understanding of its inner workings and design.
So in this notebook I re-implement a BnB algorithm for Mixed Integer Linear Programs.

The following log is related to an earlier version and development of the toy bnb algorithm around the 8th of October, 2022.

When doing this study i proceeded through the following steps:
1. i started with defining a MILP specification format that is compatible with scipy's `linprog` API
  * initially i used list of tuples format for lower/upper bounds, but then, after digging through the code of `scipy.optimize` i discovered that passing $n \times 2$ numpy arrays with $\pm\infty$ works just fine (and is preferable, actually).

2. then i implemented the key feasibility checkers
  * bounding box $x_j \in \bigl[l_j, u_j\bigr]$
  * upper bound linear inequality constraints $A_\mathrm{ub} x \leq b_\mathrm{ub}$
  * linear equality constraints $A_\mathrm{eq} x = b_\mathrm{eq}$
  * integer feasibility $x \in \mathbb{Z}^m \times \mathbb{R}^{n-m}$

3. the `lpsolve` procedure, at first, dealt with list-of-tuples bounds, but was later reduced to an interface function

4. then `new` and `add` functions were added. Initially, the lp relaxation was computed outside of `add`, but it was reasonable to put the step inside

5. then i implemented the `bnb_begin` function (used to call `lpsolve`, before moving into `add`) and sketched the main loop of the bnb search: pruning by lower bound, picking the next node to process, and splitting (no infeasibility/integer feasibility fathoming yet). There i immediately started using the min-heap for pruning certifiably sub-optimal sub-problems.

6. with adding incumbent tracking, wild random branching, and variable splitting blocks the loop's body grew ever larger. So i factored three procedures out of it: `bnb_prune`, `bnb_update_incumbent`, and `bnb_branch`.
  - atm, we use a single global incumbent, but it is likely that the MILP has many solutions
    - [ ] implement storage for the case when there is solution multiplicity (18th of November, 2022)
  - [x] it would be structurally nice to make the stats more `recursive`, and let each node have its own set of incumbents that are best-so-far in that node's sub-problem.

7. variable picking logic was then also moved outside of the branching routine

8. i tinkered with a method to conveniently store branching rules' data and implemented node-local dunder attributes, that allowed the rules to be stateful
  - [ ] add a mask that indicates which variables were branched on, so that, if we ever want to mix branching rules on the fly, they can communicate their choices to each other

9. initially my bnb implementation __did not re-introduce__ not yet fully processed nodes into the dual bound max-heap, which resulted in sub-optimal solutions, since __not all branching alternatives were explored__.
  - [x] currently the dual bound max-heap also serves as the sub-problem prioritizer. This has to change.
  - this is the reason i decided to study bnb by _reinventing_ it: it was not clear to me, based on my tinkering with `ecole` and tree search retrieval by RETRO, how SCIP explored branching alternatives, if at all. Also i had a misconception about what gap it uses to gauge optimization progress.

10. after many attempts to track the status of each node (global fathomed/pruned sets, local flags) it finally dawned on me to introduce common node status codes, that allowed finer tracking of fathomed nodes:
  - __INFEASIBLE__: lp relaxation is infeasible, hence the node's sub-problem is also infeasible
  - __INTEGER FEASIBLE__: the relaxation produced a solution that satisfies the integrality constraints, hence a global optimum for the node's MILP sub-problem was found, and there is no need to process it further
  - __PRUNED__: the lp produced a lower bound on all possible solutions to this node's MILP that is higher than the value of some integer feasible solution found elsewhere. Thus we have a theoretical guarantee that it cannot contain a solution to the original root MILP.
  - __CLOSED__ and __OPEN__: fully processed nodes or nodes that still have some un-branched fractional variables, respectively.
  - status code greatly helped with debugging the search and understanding if it was operating correctly

11. I added gap tracking, `bnb_scip_gap` as defined by SCIP, to monitor progress, and finally put together the `bnb_search` procedure from the bare loop, that was running in a cell

12. (around 15th of October) Having become content with the slow but steady operation on the toy problems, i decided to apply this code to the crab allocation problem. This attempt failed miserably, since the algorithm worked way too slowly. Even with the problem $56$ times smaller, the speed was still a huge issue. And the `linprog` solver was very unhappy with the allocation problem, complaining about its ill-posedness.
  - Also strong branching performed much worse than random branching, for some reason. I suspect the issue is with incorrect node prioritization (indeed it was in hindsight)

13. I started experimenting with `presolve` in `scipy.optimize` sub-package, the code which i studied for hints at ways to improve the runtime. Serendipitously, i stumbled on the HiGHS linear solver, which tremendously sped up the procedure. Now at least the crab problem's gap started decreasing!

14. I suspected that backtracking to a still open node, branching on another variable, and then diving could potentially produce a sub-problem whose lp feasibility region is covered by the feasibility region of some other __already__ solved problem. I confirmed my suspicions by inspecting the bounding boxes of the sub-problems produced by the bnb.
  - [ ] ~~we need to implement a lookup for solved sub-MILP that matches by covering bounding boxes (see `bnb_find_solved` stub and `bnb_update_interval_trees`)~~
  - (18th of November, 2022) the problem of feasibility subset revisits was due to branching more than once from every open node (see below, and the newer description above)

15. Started taking node scheduling responsibility off the dual bound max-heap
  - [x] `bnb_schedule_node`, `bnb_select_node`
  - it used to be that the `duals` max heap was automatically purged at the end of the bnb loop (removing FEASIBLE nodes by updating the incumbent and never rescheduling nodes, that were CLOSED)
  
16. testing feasibility by comparing the slacks $\max\{Ax - b, 0\}$ to _zero_ is unnecessarily slow

17. migrate the code into a python package

18. change the order of node and branch selection calls, so as to make the search code more modular

19. realize that there is NEVER a need to revisit an earlier branched node: the excluded region cannot contain an integer-feasible solution, even under splits by other variables!
20. implemented control inversion (via python threading) for easier experimentation (around the 10th of November, 2022)
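
The node status codes from step 10 can be sketched as a small enumeration with a classification rule for a freshly solved node. The names below are hypothetical and need not match `Status` in `toybnb.tree`. The CLOSED status is only assigned later, once an OPEN node has been branched on.

```python
from enum import Enum
from math import inf


class NodeStatus(Enum):
    OPEN = 0
    CLOSED = 1
    PRUNED = 2
    FEASIBLE = 3
    INFEASIBLE = 4


def classify(lp_value, is_integral, incumbent):
    """Status of a node right after its lp relaxation was solved."""
    if lp_value == inf:
        return NodeStatus.INFEASIBLE  # an empty relaxation has value +inf
    if lp_value >= incumbent:
        return NodeStatus.PRUNED  # cannot improve on the incumbent
    if is_integral:
        return NodeStatus.FEASIBLE  # the lp optimum solves the sub-MILP
    return NodeStatus.OPEN  # has fractional variables to branch on


statuses = [
    classify(inf, False, 10.0),
    classify(12.0, False, 10.0),
    classify(8.0, True, 10.0),
    classify(8.0, False, 10.0),
]
```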


##### References
* a well written master thesis on learning to branch [Scavuzzo Montana (2020)](https://repository.tudelft.nl/islandora/object/uuid:e1c09189-0b8f-470f-be99-1e1cf04f805e)
* the phd thesis of the core developer of SCIP [Achterberg (2007)](https://depositonce.tu-berlin.de/items/9f46a10e-2f7b-4dea-8e27-9cae07de5258/full)
* the paper that introduced pseudocost branching [Benichou et al. (1971)](https://doi.org/10.1007/BF01584074)

<br>

### on the lp solver backend

`linprog` of `scipy.optimize` automatically presolves the problem and resolves redundancies.
- we could use it to simplify the search
- the parent-child sub-problems are nested __geometrically__, which __does not entail algebraic__ similarity

```python
from copy import deepcopy
from scipy.optimize._linprog_util import _presolve, _postsolve, _LPProblem
    
lp = _LPProblem(
    p.c, p.A_ub, p.b_ub, p.A_eq, p.b_eq, p.bounds, None
)
lp_o = deepcopy(lp)

# c0 is a constant term in the objective after presolve
lp, c0, x, undo, complete, status, message = _presolve(
    lp, True, None, tol=1e-9
)

x, fun, slack, con = _postsolve(
    x, (lp_o._replace(bounds=lp.bounds), undo, 1, 1), complete
)
```

When the solver is "highs", `linprog` from `scipy.optimize` passes the parsed (`_parse_linprog`) but otherwise unmodified problem directly to `_linprog_highs`.
* it appears that `_getAbc` and other problem transformations are currently undergoing a deprecation cycle

`_linprog_highs` transforms the problem from $
    A_\mathrm{ub} x \leq b_\mathrm{ub}
$ to $
    -\infty \leq A_\mathrm{ub} x \leq b_\mathrm{ub}
$ and $
    A_\mathrm{eq} x = b_\mathrm{eq}
$ to $
    b_\mathrm{eq} \leq A_\mathrm{eq} x \leq b_\mathrm{eq}
$.
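
The effect of this transformation can be illustrated with plain lists. The helper `to_row_bounds` is a hypothetical sketch of the idea and is not the code used inside scipy.

```python
from math import inf


def to_row_bounds(A_ub, b_ub, A_eq, b_eq):
    """Stack both constraint blocks into a single `lhs <= A x <= rhs`."""
    A = [list(row) for row in A_ub] + [list(row) for row in A_eq]
    lhs = [-inf] * len(b_ub) + list(b_eq)  # inequalities have no lower side
    rhs = list(b_ub) + list(b_eq)  # equalities have lhs == rhs
    return A, lhs, rhs


A_rows, lhs, rhs = to_row_bounds(
    A_ub=[[1, 1], [2, 0]], b_ub=[4, 3], A_eq=[[1, -1]], b_eq=[0]
)
```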

```python
from scipy.optimize._linprog_highs import _linprog_highs, _highs_wrapper
```

<br>