<div style="
    margin-bottom: 20px;
    border: 1px solid #DDDDDD;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    overflow: hidden;">
    <div style="background-color: #004d40; padding: 15px;">
        <h1 style="margin: 0; color: #ffffff;">What's This About</h1>
    </div>
    <div style="background-color: #FFFFFF; padding: 20px 30px;">
        <p style="color: #333333; margin-top: 0;">
            This notebook demonstrates a novel and simple approach to finding the shortest path from a scrambled state to the solved state by combining beam search with gradient boosting models. The process involves:
        </p>
        <ul style="list-style: circle; margin-left: 20px;">
            <li>Generating training data via random walks</li>
            <li>Training a gradient boosting model (e.g. XGBoost or CatBoost) to learn the relationship between a state (permutation) and the number of moves required to solve it</li>
            <li>Using the trained model to guide a beam search by selecting the most promising states at each step</li>
        </ul>
    </div>
</div>


<div style="
    padding: 20px;
    margin-bottom: 20px;
    font-family: Arial, sans-serif;">
    <h2 style="margin-top: 0;">Table of Contents</h2>
    <ul style="list-style: none; padding-left: 0;">
        <li><a href="#train_data" style="text-decoration: none; color: #004d40;">Generate training data using Random Walks</a></li>
        <li><a href="#modeling" style="text-decoration: none; color: #004d40;">Modeling</a></li>
        <li><a href="#beam_search" style="text-decoration: none; color: #004d40;">Beam Search with Model Guidance</a></li>
        <li><a href="#experiments" style="text-decoration: none; color: #004d40;">Run experiments</a></li>
        <li><a href="#history" style="text-decoration: none; color: #004d40;">Version history</a></li>
    </ul>
</div>

In [1]:
import pandas as pd
import random
import torch
import logging
import matplotlib.pyplot as plt
import os

from xgboost import XGBRegressor
from catboost import CatBoostRegressor

from time import time
from typing import List, Tuple, Dict, Optional

In [2]:
import sys
sys.version

'3.10.12 (main, Nov  6 2024, 20:22:13) [GCC 11.4.0]'

In [3]:
if torch.cuda.is_available():
    DEVICE = torch.device("cuda")
else:
    DEVICE = torch.device("cpu")

print(f"Using device: {DEVICE}")

Using device: cuda


<div style="
    margin-bottom: 20px;
    border: 1px solid #DDDDDD;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    overflow: hidden;">
    <div style="background-color: #004d40; padding: 15px;">
        <h1 id="train_data" style="margin: 0; color: #ffffff;">Training data</h1>
    </div>
    <div style="background-color: #FFFFFF; padding: 20px 30px;">
        <strong>Move Notation</strong>
        <ul style="list-style: circle; margin-left: 20px;">
            <li><strong>X</strong>: Swaps the first two elements</li>
            <li><strong>L</strong>: Moves the first element to the end (all other elements shift left)</li>
            <li><strong>R</strong>: Moves the last element to the start (all other elements shift right)</li>
        </ul>
        <strong>Random Walks Method (RM)</strong>
        <p style="color: #333333;">
            The random walks method generates training data by starting from a sorted permutation and applying a sequence of random moves. It tracks each state’s first occurrence by recording the number of steps taken to reach that state: if a permutation is encountered for the first time, its label is set to the current step count; if not, the earlier label is retained. This approach efficiently produces a large and diverse set of examples that capture varying levels of disorder.
        </p>
        <p style="color: #333333;">
            Both the number of walks and the number of steps per walk are tunable parameters, and can be adjusted based on available resources and task complexity.
        </p>
    </div>
</div>
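
The move notation and the first-visit labelling rule described above can be sketched in plain Python. This is only an illustration, not the notebook's GPU implementation: `apply_move` and `first_visit_labels` are hypothetical helper names, tuples stand in for tensors, and a fixed move sequence replaces random sampling so the result is reproducible.

```python
def apply_move(state, move):
    """Apply one LRX move to a tuple and return the new tuple."""
    if move == "X":  # swap the first two elements
        return (state[1], state[0]) + state[2:]
    if move == "L":  # first element goes to the end
        return state[1:] + state[:1]
    if move == "R":  # last element goes to the start
        return state[-1:] + state[:-1]
    raise ValueError(f"unknown move: {move}")


def first_visit_labels(moves, state_size):
    """Walk from the sorted state; label every visited state with the
    step at which it was first seen (hypothetical helper)."""
    state = tuple(range(state_size))
    first_seen = {state: 0}
    labels = [0]
    for step, move in enumerate(moves, start=1):
        state = apply_move(state, move)
        first_seen.setdefault(state, step)  # keep the earlier label if present
        labels.append(first_seen[state])
    return labels


print(apply_move((0, 1, 2, 3, 4), "L"))              # (1, 2, 3, 4, 0)
print(first_visit_labels(["X", "X", "L", "R"], 5))   # [0, 1, 0, 3, 0]
```

In the second call the walk returns to the sorted state at steps 2 and 4, so those visits keep the label 0 rather than receiving the current step count.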


In [4]:
def create_lrx_moves(state_size: int) -> list[list[int]]:
    """Create basic LRX moves as index lists, in the order [X (swap), L (left shift), R (right shift)]"""
    identity = list(range(state_size))
    
    X = identity.copy()
    X[0], X[1] = X[1], X[0]
    
    L = identity[1:] + [identity[0]]
    R = [identity[-1]] + identity[:-1]
    
    return [X, L, R]
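
Each generator returned above is a list of source indices, so applying a move means reading the state at those indices. The sketch below mirrors that with plain lists. `apply_generator` is a hypothetical helper, and `create_lrx_moves` is restated compactly so the block runs on its own.

```python
def create_lrx_moves(state_size):
    """Same construction as the cell above: each move is a list of source indices."""
    identity = list(range(state_size))
    X = [1, 0] + identity[2:]
    L = identity[1:] + identity[:1]
    R = identity[-1:] + identity[:-1]
    return [X, L, R]


def apply_generator(state, generator):
    """new_state[i] = state[generator[i]]: the list analogue of gathering along the state axis."""
    return [state[g] for g in generator]


X, L, R = create_lrx_moves(5)
state = [3, 0, 4, 1, 2]

print(apply_generator(state, X))  # [0, 3, 4, 1, 2]
print(apply_generator(state, L))  # [0, 4, 1, 2, 3]
print(apply_generator(state, R))  # [2, 3, 0, 4, 1]

# L and R undo each other, and X undoes itself
print(apply_generator(apply_generator(state, L), R) == state)  # True
print(apply_generator(apply_generator(state, X), X) == state)  # True
```

Because every move has an inverse within the same move set, a path from the sorted state to a scrambled state can be reversed into a path of equal length that solves it.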

In [5]:
def first_visit_random_walks(
    generators: list,
    n_steps: int,
    n_walks: int,
    device: torch.device = DEVICE
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Generate random walks from the sorted state and track when each state
    was first seen. Returns the visited states and, for each one, the step
    of its first occurrence.
    """
    state_size = len(generators[0])
    all_moves = torch.tensor(generators, device=device, dtype=torch.long)

    # initialize
    total_states = n_steps * n_walks
    X = torch.zeros((total_states, state_size), device=device, dtype=torch.long)
    y = torch.full((total_states,), float('inf'), device=device, dtype=torch.float)
    
    # starting states
    current_states = torch.arange(state_size, device=device).repeat(n_walks, 1)
    X[:n_walks] = current_states
    
    # hash vector for state tracking
    hash_vec = torch.randint(
        low=-(2**30), high=(2**30),
        size=(state_size,),
        device=device, dtype=torch.long
    )
    
    # initial hashing
    init_hashes = torch.sum(hash_vec * current_states, dim=1)
    unique_hashes = torch.unique(init_hashes)
    y[:n_walks] = 0
    
    # track seen states with first occurrence step
    state_hashes = {h.item(): 0 for h in unique_hashes}
    
    # random walks
    for step in range(1, n_steps):
        chosen_moves = torch.randint(0, len(generators), (n_walks,), device=device)
        current_states = torch.gather(current_states, 1, all_moves[chosen_moves])
        
        # store states
        idx_start = step * n_walks
        idx_end = (step + 1) * n_walks
        X[idx_start:idx_end] = current_states
        
        # hash and track states
        step_hashes = torch.sum(hash_vec * current_states, dim=1)
        for i, h_val in enumerate(step_hashes):
            h_val_item = h_val.item()
            if h_val_item not in state_hashes:
                state_hashes[h_val_item] = step
                y[idx_start + i] = step
            else:
                y[idx_start + i] = state_hashes[h_val_item]
    
    # filter valid states
    valid_mask = y != float('inf')
    return X[valid_mask], y[valid_mask].long()


def nbt_random_walks(
    generators: list,
    n_steps: int,
    n_walks: int,
    device: torch.device = DEVICE
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Generate non-backtracking random walks, optimized for GPU"""
    state_size = len(generators[0])
    all_moves = torch.tensor(generators, device=device, dtype=torch.long)
    
    # initialize starting states
    current_states = torch.arange(state_size, device=device).repeat(n_walks, 1)
    
    # create hash vector for state tracking
    hash_vec = torch.randint(
        low=-(2**30), high=(2**30),
        size=(state_size,),
        device=device, dtype=torch.long
    )
    
    # track valid walks and their lengths
    valid_states = []
    valid_lengths = []
    
    # initialize state tracking for each walk
    current_hashes = torch.sum(hash_vec * current_states, dim=1)
    seen_hashes = [{h.item()} for h in current_hashes]
    
    # store initial states
    valid_states.append(current_states.clone())
    valid_lengths.extend([0] * n_walks)
    
    # random walks
    for step in range(1, n_steps):
        # Try all possible moves for current states
        expanded_states = torch.cat([
            torch.gather(current_states, 1, move.repeat(current_states.size(0), 1))
            for move in all_moves
        ])
        expanded_hashes = torch.sum(hash_vec * expanded_states, dim=1)
        
        # reshape for per-walk processing
        expanded_hashes = expanded_hashes.view(len(generators), -1)
        expanded_states = expanded_states.view(len(generators), -1, state_size)
        
        # select valid moves for each walk
        new_states = []
        active_walks = []
        
        for walk_idx in range(current_states.size(0)):
            # find moves that lead to unseen states
            walk_hashes = expanded_hashes[:, walk_idx]
            valid_moves = [
                i for i in range(len(generators))
                if walk_hashes[i].item() not in seen_hashes[walk_idx]
            ]
            
            if valid_moves:
                # randomly choose one of the valid moves
                chosen_move = random.choice(valid_moves)
                new_state = expanded_states[chosen_move, walk_idx]
                new_hash = walk_hashes[chosen_move].item()
                
                new_states.append(new_state)
                seen_hashes[walk_idx].add(new_hash)
                active_walks.append(walk_idx)
        
        if not new_states:
            break
            
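        # update current states; keep seen_hashes aligned with the surviving walks,
        # since walks with no unseen neighbour are dropped from current_states
        current_states = torch.stack(new_states)
        seen_hashes = [seen_hashes[i] for i in active_walks]
        valid_states.append(current_states.clone())
        valid_lengths.extend([step] * len(new_states))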
    
    # combine all valid states and their lengths
    X = torch.cat(valid_states)
    y = torch.tensor(valid_lengths, device=device, dtype=torch.long)
    
    return X, y
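
Both walk functions identify a state by the dot product of its entries with a random integer vector. The plain-Python sketch below illustrates the idea. `make_hash_vec` and `state_hash` are hypothetical names, and the seed is used only to make the example reproducible.

```python
import random
from itertools import permutations


def make_hash_vec(state_size, seed=0):
    """Draw one random integer weight per position."""
    rng = random.Random(seed)
    return [rng.randrange(-(2**30), 2**30) for _ in range(state_size)]


def state_hash(state, hash_vec):
    """Hash a state as the dot product of its entries with the weight vector."""
    return sum(w * s for w, s in zip(hash_vec, state))


hash_vec = make_hash_vec(5)
hashes = {state_hash(p, hash_vec) for p in permutations(range(5))}

# Equal states always hash equally; distinct states can collide in principle,
# which is why the count below is at most 120.
print(len(hashes), "distinct hashes for 120 permutations")
```

A collision would make two different states look identical to the tracking logic, so a wide weight range is a reasonable precaution. This hash is a convenience for deduplication, not a guarantee of uniqueness.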

**Example for first_visit_random_walks:**

In [6]:
n_steps=5
n_walks=3
list_generators = create_lrx_moves(5)
X, y = first_visit_random_walks(generators=list_generators, n_steps=n_steps, n_walks=n_walks)

print("Generated States (X):")
print(f"Shape: {tuple(X.shape)}")
print(X)

print("\nStep Indices (y):")
print(y)

Generated States (X):
Shape: (15, 5)
tensor([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [1, 0, 2, 3, 4],
        [4, 0, 1, 2, 3],
        [1, 0, 2, 3, 4],
        [0, 2, 3, 4, 1],
        [3, 4, 0, 1, 2],
        [0, 1, 2, 3, 4],
        [2, 3, 4, 1, 0],
        [4, 3, 0, 1, 2],
        [1, 0, 2, 3, 4],
        [3, 4, 1, 0, 2],
        [2, 4, 3, 0, 1],
        [4, 1, 0, 2, 3]], device='cuda:0')

Step Indices (y):
tensor([0, 0, 0, 1, 1, 1, 2, 2, 0, 3, 3, 1, 4, 4, 4], device='cuda:0')


**Example for nbt_random_walks (non-backtracking):**

In [7]:
n_steps=5
n_walks=3
list_generators = create_lrx_moves(5)
X, y = nbt_random_walks(generators=list_generators, n_steps=n_steps, n_walks=n_walks)

print("Generated States (X):")
print(f"Shape: {tuple(X.shape)}")
print(X)

print("\nStep Indices (y):")
print(y)

Generated States (X):
Shape: (15, 5)
tensor([[0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [0, 1, 2, 3, 4],
        [1, 0, 2, 3, 4],
        [1, 0, 2, 3, 4],
        [1, 2, 3, 4, 0],
        [4, 1, 0, 2, 3],
        [4, 1, 0, 2, 3],
        [2, 3, 4, 0, 1],
        [3, 4, 1, 0, 2],
        [3, 4, 1, 0, 2],
        [3, 2, 4, 0, 1],
        [2, 3, 4, 1, 0],
        [4, 3, 1, 0, 2],
        [2, 4, 0, 1, 3]], device='cuda:0')

Step Indices (y):
tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], device='cuda:0')


<div style="
    margin-bottom: 20px;
    border: 1px solid #DDDDDD;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    overflow: hidden;">
    <div style="background-color: #004d40; padding: 15px;">
        <h1 id="modeling" style="margin: 0; color: #ffffff;">Modeling</h1>
    </div>
    <div style="background-color: #FFFFFF; padding: 20px 30px;">
        <p style="color: #333333;">
            Gradient boosting models (either XGBoost or CatBoost) are used to estimate the "distance" from any given permutation to the target (sorted) state.
        </p>
        <p style="color: #333333;">
            To train the model, examples are generated by performing random walks starting from the sorted state. Each training example consists of a pair <code>(X, y)</code>, where <code>X</code> represents a permutation obtained after a series of random moves, and <code>y</code> is the step count at which that permutation was first encountered.
        </p>
        <p style="color: #333333;">
            During inference, the trained model estimates the distance to the sorted state for any given permutation. These predictions are then used to guide a beam search algorithm.
        </p>
    </div>
</div>
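
Because a random walk can wander, the first-visit step count is an upper bound on the true distance rather than the distance itself. For a tiny state size this can be checked against exact distances from a breadth-first search. The sketch below is illustrative only: `neighbors`, `bfs_distances`, and `random_walk_labels` are hypothetical helpers, and tuples stand in for the tensors used in the notebook.

```python
import random
from collections import deque


def neighbors(state):
    """States reachable by one X, L, or R move."""
    return [
        (state[1], state[0]) + state[2:],  # X
        state[1:] + state[:1],             # L
        state[-1:] + state[:-1],           # R
    ]


def bfs_distances(state_size):
    """Exact number of moves between the sorted state and every reachable state."""
    start = tuple(range(state_size))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in neighbors(state):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


def random_walk_labels(state_size, n_steps, n_walks, seed=0):
    """Smallest step at which any walk reached each state."""
    rng = random.Random(seed)
    start = tuple(range(state_size))
    labels = {start: 0}
    for _ in range(n_walks):
        state = start
        for step in range(1, n_steps):
            state = rng.choice(neighbors(state))
            labels[state] = min(labels.get(state, step), step)
    return labels


dist = bfs_distances(5)
labels = random_walk_labels(5, n_steps=20, n_walks=50)
overestimated = sum(labels[s] > dist[s] for s in labels)
print(f"{len(labels)} labelled states, {overestimated} labelled above the true distance")
```

A label can never fall below the BFS distance, since reaching a state in `k` steps proves that its distance is at most `k`. The regression target is therefore a noisy overestimate, and using more walks tends to tighten it.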


In [8]:
class CatBoostModel:
    """CatBoost implementation for permutation solving"""
    
    def __init__(
        self,
        n_estimators: int = 2000,
        learning_rate: float = 0.05,
        device: torch.device=DEVICE
    ):
        self.model = CatBoostRegressor(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            task_type="GPU" if device.type == "cuda" else "CPU",
            thread_count=-1,
            verbose=0
        )
    
    def train(self, X: torch.Tensor, y: torch.Tensor) -> None:
        """Train the model on the given data"""
        self.model.fit(X.cpu().numpy(), y.cpu().numpy())
    
    def predict(self, X: torch.Tensor) -> torch.Tensor:
        """Predict distances for given states"""
        predictions = self.model.predict(X.cpu().numpy())
        return torch.tensor(predictions, device=X.device)

class XGBoostModel:
    """XGBoost implementation for permutation solving"""
    
    def __init__(
        self,
        n_estimators: int = 2000,
        learning_rate: float = 0.07,
        verbose: bool = False,
        device: torch.device=DEVICE
    ):
        self.model = XGBRegressor(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            tree_method='hist',
            n_jobs=-1,
            device='cuda' if device.type == "cuda" else 'cpu',
            verbosity=0
        )
    
    def train(self, X: torch.Tensor, y: torch.Tensor) -> None:
        """Train the model on the given data"""
        self.model.fit(X.cpu().numpy(), y.cpu().numpy())
    
    def predict(self, X: torch.Tensor) -> torch.Tensor:
        """Predict distances for given states"""
        predictions = self.model.predict(X.cpu().numpy())
        return torch.tensor(predictions, device=X.device)

In [9]:
model_classes = {
    "xgboost": XGBoostModel,
    "catboost": CatBoostModel
}

<div style="
    margin-bottom: 20px;
    border: 1px solid #DDDDDD;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    overflow: hidden;">
  <div style="background-color: #004d40; padding: 15px;">
    <h1 id="beam_search" style="margin: 0; color: #ffffff;">Beam Search with Model Guidance</h1>
  </div>
  <div style="background-color: #FFFFFF; padding: 20px 30px;">
    <p style="color: #333333; margin-top: 0;">
      To find a solution path from any scrambled permutation to the target state (a sorted sequence), we employ a beam search algorithm enhanced with machine learning heuristics. Our approach combines the systematic exploration of beam search with informed decisions from a trained model to prioritize promising states. Here’s how it works:
    </p>
    <ul style="list-style: circle; margin-left: 20px;">
      <li>
        The beam search maintains a limited set of the most promising states at each step, controlled by the <code>beam_width</code> parameter.
      </li>
      <li>
        At each search step:
        <ul style="list-style: none; margin-left: 0px;">
          <li>- All possible moves (X, L, and R) are applied to each current state.</li>
          <li>- States that have been visited before are discarded, and the algorithm checks if any new state matches the target (sorted) state.</li>
          <li>- If the number of new states exceeds the beam width, a trained model is used to evaluate and prioritize states based on their predicted distance to the target.</li>
          <li>- Once a solution is found, the complete path of moves leading to it is reconstructed.</li>
        </ul>
      </li>
    </ul>
    <p style="color: #333333;">
      By combining a learned distance function (from XGBoost or CatBoost) with beam search, this method can handle permutations of various sizes much more efficiently than a naive exhaustive search.
    </p>
  </div>
</div>
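
The search loop described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the `BeamSearchSolver` defined below. The names `MOVES`, `exact_distances`, and `beam_search` are hypothetical, and an exact BFS distance table (feasible only for tiny state sizes) stands in for the trained model's predictions.

```python
from collections import deque

MOVES = {
    "X": lambda s: (s[1], s[0]) + s[2:],  # swap the first two elements
    "L": lambda s: s[1:] + s[:1],         # first element to the end
    "R": lambda s: s[-1:] + s[:-1],       # last element to the start
}


def exact_distances(state_size):
    """Stand-in for the trained model: BFS distances from the sorted state."""
    start = tuple(range(state_size))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for move in MOVES.values():
            nxt = move(state)
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


def beam_search(start, score, beam_width=4, max_steps=50):
    """Return a dot-separated move string that sorts `start`, or None."""
    target = tuple(sorted(start))
    if start == target:
        return ""
    beam = [(start, [])]  # (state, moves taken so far)
    visited = {start}
    for _ in range(max_steps):
        candidates = []
        for state, path in beam:
            for name, move in MOVES.items():
                nxt = move(state)
                if nxt in visited:
                    continue  # discard states seen before
                visited.add(nxt)
                if nxt == target:
                    return ".".join(path + [name])
                candidates.append((nxt, path + [name]))
        if not candidates:
            return None
        # Prune only when the candidates exceed the beam width
        if len(candidates) > beam_width:
            candidates.sort(key=lambda item: score(item[0]))
            candidates = candidates[:beam_width]
        beam = candidates
    return None


dist = exact_distances(5)
start = (3, 0, 4, 1, 2)
solution = beam_search(start, score=dist.__getitem__)
print(solution, f"({len(solution.split('.'))} moves, true distance {dist[start]})")
```

With an exact scoring function the beam always retains a state one move closer to the target, so the path it returns is a shortest one. A learned model only approximates this ordering, which is why a wider beam is generally needed in practice.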


In [10]:
class BeamSearchSolver:
    def __init__(
        self,
        state_size: int,
        beam_width: int,
        max_steps: int,
        use_x_rule: bool = False,
        target_neighborhood_radius: int = 0,
        filter_batch_size: int = 100000,
        predict_batch_size: int = 10000000000,
        history_window_size: int = 5,  # Store only last N steps of hash history
        verbose: bool = False,
        device: torch.device=DEVICE
    ):
        """Initialize beam search solver."""
        self.state_size = state_size
        self.device = device
        self.beam_width = beam_width
        self.max_steps = max_steps
        self.use_x_rule = use_x_rule
        self.target_neighborhood_radius = target_neighborhood_radius
        self.filter_batch_size = filter_batch_size
        self.predict_batch_size = predict_batch_size
        self.history_window_size = history_window_size
        self.verbose = verbose
        
        self.move_names = ['X', 'L', 'R']  # Keep for logging only
        self.MOVE_X = 0
        self.MOVE_L = 1
        self.MOVE_R = 2

        # Pre-compute indices for state transformations in _bulk_state_transform
        self.idx_x = torch.tensor([1, 0] + list(range(2, state_size)), device=device)
        self.idx_l = torch.roll(torch.arange(state_size, device=device), -1)
        self.idx_r = torch.roll(torch.arange(state_size, device=device), 1)
        
        # Pre-compute hash vector for efficient state hashing
        max_int = int(2**62)
        self.hash_vec = torch.randint(
            low=-max_int,
            high=max_int + 1,
            size=(self.state_size,),
            dtype=torch.int64,
            device=device
        ).contiguous()
        
        # Get solved state as the sorted permutation
        self.solved_state = torch.arange(self.state_size, device=device, dtype=torch.long)
        
        # Configure logging: DEBUG if verbose=True, else INFO
        self.logger = logging.getLogger(self.__class__.__name__)
        self.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
        if not self.logger.handlers:
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter(
                '%(asctime)s - %(message)s'
            ))
            self.logger.addHandler(handler)

        # Precompute target neighborhood if radius > 0
        if target_neighborhood_radius > 0:
            self.target_neighborhood, self.target_paths = self._get_target_neighborhood(
                target_neighborhood_radius
            )
            self.log_info(
                f"Got target neighborhood with {len(self.target_neighborhood)} states"
            )
        else:
            self.target_neighborhood = None
            self.target_paths = None

    def reset(self) -> None:
        """Reset solver state for a new problem."""
        self.hash_history = []  # List of tensors, one per step
        self.total_hashes_seen = 0
        
        if hasattr(self, 'model'):
            del self.model
        torch.cuda.empty_cache()

    def solve(self, start_state: torch.Tensor, model) -> Tuple[bool, int, str]:
        """
        Perform beam search using ML guidance. For each step:
        1. Expand states with all possible moves
        2. Check if solution is found
        3. Filter states by hash and rules (in batches)
        4. Prune beam to keep only top beam_width states
        """
        self.reset()
        
        self.model = model
        self.start_state = start_state.clone()
        
        # Initialize with start state
        start_hash = self._compute_state_hashes(self.start_state.unsqueeze(0))
        
        # Add initial hash to history
        self.hash_history.append(start_hash)
        self.total_hashes_seen += 1
        
        current_states = self.start_state.unsqueeze(0)
        self.log_debug(f"Initial state: {start_state.cpu().numpy()}")
        
        parent_indices = []
        move_indices = []
        search_start = time()
        
        for step in range(1, self.max_steps + 1):
            self.log_debug(f"\n{'='*10} Step {step} {'='*10}")
            
            # 1. Expand all states at once
            next_states, next_moves = self._bulk_state_transform(current_states)
            parents = torch.arange(len(current_states), device=self.device).repeat(3)

            # Find unique states in this expansion to avoid duplicates
            unique_states, unique_indices = self._get_unique_states(next_states)
            if unique_states.size(0) < next_states.size(0):
                self.log_debug(
                    f"Removed {next_states.size(0) - unique_states.size(0)} "
                    "duplicate states in expansion"
                )
                
                next_states = unique_states
                parents = parents[unique_indices]
                next_moves = next_moves[unique_indices]

                # Free memory
                del unique_states, unique_indices
                torch.cuda.empty_cache()
            
            # 2. Check for solution
            found, solution_idx, target_path = self._check_solution(next_states)
            
            if found:
                parent_indices.append(parents)
                move_indices.append(next_moves)
                additional_steps = target_path.numel() if target_path is not None else 0
                
                if additional_steps > 0:
                    self.log_info(f"Solution found in target neighborhood after {step} steps")
                    self.log_debug(f"Additional {additional_steps} steps to reach solved state")

                self.log_info(f"Time taken: {time() - search_start}")
                return True, step + additional_steps, self.reconstruct_solution(
                    parent_indices, move_indices, solution_idx, target_path
                )
            
            # 3. Filter states in batches
            all_filtered_states = []
            all_filtered_parents = []
            all_filtered_moves = []
            all_new_hashes = []
            
            for batch_start in range(0, len(next_states), self.filter_batch_size):
                batch_end = min(batch_start + self.filter_batch_size, len(next_states))
                
                batch_states = next_states[batch_start:batch_end]
                batch_parents = parents[batch_start:batch_end]
                batch_moves = next_moves[batch_start:batch_end]
                
                filtered_states, filtered_parents, filtered_moves, new_hashes = self._filter_states(
                    batch_states,
                    batch_moves,
                    batch_parents,
                    current_states
                )
                
                if filtered_states.shape[0] > 0:
                    all_filtered_states.append(filtered_states)
                    all_filtered_parents.append(filtered_parents)
                    all_filtered_moves.append(filtered_moves)
                    if new_hashes.numel() > 0:
                        all_new_hashes.append(new_hashes)

            # Update hash history with new hashes
            if all_new_hashes:
                step_hashes = torch.cat(all_new_hashes)
                self.hash_history.append(step_hashes)
                self.total_hashes_seen += step_hashes.numel()
                
                # Maintain sliding window of history
                if len(self.hash_history) > self.history_window_size:
                    self.hash_history.pop(0)  # Remove oldest step's hashes
            
            # Free memory
            del next_states, next_moves, parents
            torch.cuda.empty_cache()

            # Combine filtered results
            if all_filtered_states:
                next_states = torch.cat(all_filtered_states)
                parents = torch.cat(all_filtered_parents)
                next_moves = torch.cat(all_filtered_moves)

                # Free memory
                del all_filtered_states, all_filtered_parents, all_filtered_moves
                torch.cuda.empty_cache()
            else:
                next_states = torch.empty((0, self.state_size), device=self.device, dtype=torch.long)
                parents = torch.empty(0, device=self.device, dtype=torch.int32)
                next_moves = torch.empty(0, device=self.device, dtype=torch.int8)

            # 4. Prune beam if needed
            if next_states.shape[0] > self.beam_width:
                current_states, parents, next_moves = self._prune_beam(
                    next_states, parents, next_moves, current_states
                )
            else:
                current_states = next_states
            
            parent_indices.append(parents)
            move_indices.append(next_moves)
            
            self.log_debug("End of step:")
            self.log_debug(f"  Number of states: {current_states.shape[0]}")
            self.log_debug(f"  Current hash history size: {sum(h.numel() for h in self.hash_history)}")
        
        self.log_info(f"\nSearch terminated after reaching max steps: {self.max_steps}")
        self.log_info(f"Time taken: {time() - search_start}")
        return False, self.max_steps, ""

    def _get_unique_states(self, states: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Return only unique states from the input tensor, deduplicated by hash.
        Typically faster than torch.unique(states, dim=0); distinct states whose
        hashes collide are treated as duplicates.
        
        Returns:
            Tuple of (unique_states, unique_indices)
            - unique_states: Tensor containing only unique states
            - unique_indices: Indices of the first occurrence of each unique state
        """
        if states.size(0) == 0:
            # Return a 2-tuple, matching the non-empty case and the caller's unpacking
            return (
                states,
                torch.empty(0, dtype=torch.long, device=self.device)
            )
        
        hashes = self._compute_state_hashes(states)
        
        # Sort hashes to identify unique elements
        sorted_hashes, sort_indices = torch.sort(hashes)
        
        # Create mask for unique elements (first element and elements different from previous)
        mask = torch.ones(sorted_hashes.size(0), dtype=torch.bool, device=self.device)
        mask[1:] = sorted_hashes[1:] != sorted_hashes[:-1]
        
        # Get indices of unique elements in original tensor
        unique_indices = sort_indices[mask]
        
        return states[unique_indices], unique_indices
        
    def _filter_states(
        self,
        next_states: torch.Tensor,
        next_moves: torch.Tensor,
        parents: torch.Tensor,
        current_states: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Filter states based on hash (already visited states) or rules (X rule, etc.).
        """
        # Calculate hashes for this batch
        state_hashes = self._compute_state_hashes(next_states)
        
        # Check against all hashes in our sliding window efficiently
        is_new = torch.ones(state_hashes.size(0), dtype=torch.bool, device=self.device)
        for history in self.hash_history:
            is_new &= ~torch.isin(state_hashes, history)

        # Extract new hashes (to be returned to solver for addition to history)
        new_hashes = state_hashes[is_new] if torch.any(is_new) else torch.empty(
            0, dtype=torch.int64, device=self.device
        )

        valid_moves = is_new

        # Filter states based on rules (X rule, etc.)
        if self.use_x_rule:
            is_x_move = next_moves == 0
            
            parent_indices = parents % current_states.size(0)
            first_vals = torch.gather(current_states[:, 0], 0, parent_indices)
            second_vals = torch.gather(current_states[:, 1], 0, parent_indices)
            first_smaller = first_vals < second_vals
            
            valid_moves &= ~(is_x_move & first_smaller)
            
        # Log filtering details if verbose
        if self.verbose:
            self._log_move_filtering(
                current_states, next_states, parents, 
                next_moves, valid=valid_moves, is_visited=~is_new
            )

        next_states = next_states[valid_moves]
        next_states = next_states.clone().contiguous()
        
        return next_states, parents[valid_moves], next_moves[valid_moves], new_hashes
    
    def _prune_beam(
        self,
        next_states: torch.Tensor,
        parents: torch.Tensor,
        moves: torch.Tensor,
        current_states: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Prune beam to keep only top states according to model predictions.
        Returns a tuple of (pruned_states, pruned_parents, pruned_moves).
        """
        num_states = next_states.shape[0]
        chunk_size = min(self.predict_batch_size, num_states)      
        
        model_distances = torch.empty(
            num_states, device=next_states.device, dtype=torch.float32
        )
        
        with torch.amp.autocast('cuda'):
            for i in range(0, num_states, chunk_size):
                chunk = next_states[i:i + chunk_size]
                distances_chunk = self.model.predict(chunk)
                model_distances[i:i + chunk_size] = distances_chunk
        
        top_indices = torch.topk(model_distances, k=self.beam_width, largest=False).indices
        
        torch.cuda.empty_cache()
        
        # Debug logging
        if self.verbose:
            self._log_pruning_decisions(
                next_states, parents, moves, model_distances, 
                top_indices, current_states
            )
        
        return next_states[top_indices], parents[top_indices], moves[top_indices]

    def reconstruct_solution(
        self,
        parent_indices: List[torch.Tensor],
        move_indices: List[torch.Tensor],
        solution_idx: int,
        target_path: Optional[torch.Tensor] = None
    ) -> str:
        """Reconstruct solution path from stored moves and verify it."""
        # Get moves in reverse order
        reverse_moves = []
        current_idx = solution_idx
        
        # Get the last move
        reverse_moves.append(move_indices[-1][current_idx].item())
        current_idx = parent_indices[-1][current_idx].item()
        
        # Get previous moves
        for step in range(len(parent_indices)-1, 0, -1):
            reverse_moves.append(move_indices[step-1][current_idx].item())
            current_idx = parent_indices[step-1][current_idx].item()
        
        # Convert to tensor and reverse to get chronological order
        moves = torch.tensor(reverse_moves[::-1], dtype=torch.int8, device=self.device)
        
        # If we found a state in target neighborhood, append its path
        if target_path is not None and target_path.numel() > 0:
            moves = torch.cat([moves, target_path])
        
        # Verify solution
        if self.verbose:
            current_state = self.start_state.clone()
            self.log_debug("\nVerifying solution:")
            self.log_debug(f"Start state: {current_state}")
            
            for i, move in enumerate(moves):
                move_name = self.move_names[move.item()]
                current_state = self._apply_moves(current_state, move)
                self.log_debug(f"After move {i+1} ({move_name}): {current_state}")
            
            if torch.all(current_state == self.solved_state):
                self.log_debug("Solution verified!")
            else:
                self.log_debug(
                    f"Invalid solution! Final state {current_state} != {self.solved_state}"
                )
        
        # Convert moves to string representation
        return '.'.join(self.move_names[m.item()] for m in moves)

    def _apply_moves(self, state: torch.Tensor, move: int) -> torch.Tensor:
        """Apply a single move to a state."""
        new_state = state.clone()
        
        if move == 0:  # X
            new_state[[0, 1]] = new_state[[1, 0]]
        elif move == 1:  # L
            new_state = torch.roll(new_state, shifts=-1)
        elif move == 2:  # R
            new_state = torch.roll(new_state, shifts=1)
    
        return new_state
 
    def _get_target_neighborhood(
        self, radius: int
    ) -> Tuple[torch.Tensor, Dict[int, Tuple[torch.Tensor, torch.Tensor]]]:
        """Precompute states within given radius of target state using BFS."""
        initial_state = self.solved_state.unsqueeze(0)
        initial_hash = self._compute_state_hashes(initial_state).item()
        
        # Track states and their paths TO solved state
        states_dict = {initial_hash: (initial_state[0], [])}
        frontier = [(initial_state[0], [])]
        
        self.log_debug(f"Starting BFS from solved state {initial_state[0]}")
            
        for depth in range(radius):
            if not frontier:
                break
            
            current_states = torch.stack([state for state, _ in frontier])
            current_paths = [path for _, path in frontier]
            
            # Generate all possible next states
            next_states, next_moves = self._bulk_state_transform(current_states)
            
            # Process states
            next_frontier = []
            for i, (state, move) in enumerate(zip(next_states, next_moves)):
                parent_idx = i % len(current_states)
                hash_val = self._compute_state_hashes(state.unsqueeze(0)).item()
                
                if hash_val not in states_dict:
                    # Create path TO solved state by prepending inverse move to parent's path
                    inverse_move = self._get_inverse_move(move.item())
                    new_path = [inverse_move] + current_paths[parent_idx].copy()
                    states_dict[hash_val] = (state, new_path)
                    next_frontier.append((state, new_path))
            
            frontier = next_frontier
            self.log_debug(
                f"Target neighborhood depth {depth + 1}: {len(states_dict)} states"
            )                
        
        # Convert paths to tensors
        final_states_dict = {}
        for hash_val, (state, path) in states_dict.items():
            path_tensor = torch.tensor(path, dtype=torch.int8, device=self.device)
            final_states_dict[hash_val] = (state, path_tensor)
        
        hashes = torch.tensor(
            list(final_states_dict.keys()), dtype=torch.int64, device=self.device
        )
        return hashes, final_states_dict

    def _get_inverse_move(self, move: int) -> int:
        """Get the inverse of a move (X->X, L->R, R->L)."""
        if move == 0:  # X is self-inverse
            return 0
        elif move == 1:  # L inverse is R
            return 2
        else:  # R inverse is L
            return 1

    def _check_solution(self, states: torch.Tensor) -> Tuple[bool, int, torch.Tensor]:
        """Check if any states match solution criteria."""
        if self.target_neighborhood is not None:
            # Check if any states are in target neighborhood using hash intersection
            state_hashes = self._compute_state_hashes(states)
            matches = torch.isin(state_hashes, self.target_neighborhood)
            found = torch.any(matches)
            
            if found:
                idx = torch.where(matches)[0][0].item()
                hash_val = state_hashes[idx].item()
                stored_state, stored_path = self.target_paths[hash_val]
                
                # Verify the state matches (in case of hash collision)
                if not torch.all(states[idx] == stored_state):
                    # Hash collision - search for actual matching state
                    for h, (s, p) in self.target_paths.items():
                        if torch.all(states[idx] == s):
                            return True, idx, p
                    return False, -1, torch.empty(0, dtype=torch.int8, device=self.device)
                
                return True, idx, stored_path
            
            return False, -1, torch.empty(0, dtype=torch.int8, device=self.device)
        else:
            # Check for exact match with solved state
            matches = torch.all(states == self.solved_state, dim=1)
            found = bool(torch.any(matches))
            idx = torch.where(matches)[0][0].item() if found else -1
            return found, idx, torch.empty(0, dtype=torch.int8, device=self.device)

    def _bulk_state_transform(
        self, states: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Apply all possible moves to states efficiently,
        return expanded states and move types.
        """
        n_states = states.size(0)
        
        # Pre-allocate result tensor and copy states
        result = states.repeat(3, 1)  # [N*3, state_size]
        
        # Apply moves using pre-computed indices
        result[:n_states] = result[:n_states].index_select(1, self.idx_x)  # X moves
        result[n_states:2*n_states] = result[n_states:2*n_states].index_select(1, self.idx_l)  # L moves
        result[2*n_states:] = result[2*n_states:].index_select(1, self.idx_r)  # R moves
        
        # Generate move types (0=X, 1=L, 2=R)
        move_types = torch.arange(3, device=states.device).repeat_interleave(n_states)
        
        return result.contiguous(), move_types

    def _compute_state_hashes(self, states: torch.Tensor) -> torch.Tensor:
        """Compute state hashes with vectorized operations (collisions are possible)."""
        states_int = states.contiguous().to(dtype=torch.int64)
        return torch.sum(states_int * self.hash_vec.unsqueeze(0), dim=1)
    
    def _log_move_filtering(
        self,
        current_states: torch.Tensor,
        new_states: torch.Tensor,
        parents: torch.Tensor,
        moves: torch.Tensor,
        valid: torch.Tensor,
        is_visited: torch.Tensor
    ) -> None:
        """Log detailed move filtering information."""
        self.log_debug(f"Filtering {len(new_states)} new states:")
        for j in range(len(new_states)):
            parent_idx = parents[j].item()
            parent_state = current_states[parent_idx % len(current_states)]
            move_code = moves[j].item()
            move_name = self.move_names[move_code]
            new_state = new_states[j]
            
            if not valid[j]:
                reason = "visited state" if is_visited[j] else "X rule"
                self.log_debug(
                    f"From parent {parent_idx} {parent_state.cpu().numpy()} → "
                    f"Move {move_name} → {new_state.cpu().numpy()} ❌ - {reason}"
                )
            else:
                self.log_debug(
                    f"From parent {parent_idx} {parent_state.cpu().numpy()} → "
                    f"Move {move_name} → {new_state.cpu().numpy()} ✅"
                )

    def _log_pruning_decisions(
        self,
        next_states: torch.Tensor,
        parents: torch.Tensor,
        moves: torch.Tensor,
        model_distances: torch.Tensor,
        top_indices: torch.Tensor,
        current_states: torch.Tensor
    ) -> None:
        """Log detailed information about pruning decisions efficiently."""
        with torch.no_grad():
            # Get min/max distances
            min_dist, max_dist = model_distances.min().item(), model_distances.max().item()
            
            # Convert indices to set
            kept_indices = top_indices.cpu().numpy()
            kept_set = set(kept_indices)
            
            self.log_debug("\nPruning decisions:")
            self.log_debug(f"Total states: {len(next_states)}, Keeping: {len(kept_indices)}")
            self.log_debug(f"Distance range: {min_dist:.1f} to {max_dist:.1f}")
            
            next_states_np = next_states.cpu().numpy()
            parents_np = parents.cpu().numpy()
            moves_np = moves.cpu().numpy()
            distances_np = model_distances.cpu().numpy()
            current_states_np = current_states.cpu().numpy()
    
            # Log details for each state
            for idx in range(len(next_states)):
                parent_idx = parents_np[idx]
                move_type = moves_np[idx]
                move = self.move_names[move_type]
                parent_state = current_states_np[parent_idx % len(current_states_np)]
                new_state = next_states_np[idx]
                pred = distances_np[idx]
                kept = idx in kept_set
    
                self.log_debug(
                    f"From parent {parent_idx} {parent_state} → "
                    f"Move {move} → {new_state} "
                    f"pred {pred:.1f} {'✅' if kept else '❌'}"
                )
                
    def log_debug(self, message: str) -> None:
        """Log a debug-level message."""
        self.logger.debug(message)

    def log_info(self, message: str) -> None:
        """Log an info-level message."""
        self.logger.info(message)
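
`reconstruct_solution` recovers the move sequence by following parent pointers from the final beam back to the first step. The sketch below shows the same backtracking on plain Python lists instead of tensors. The function name `backtrack` and the toy `parent_indices` / `move_indices` values are made up for illustration; only the move codes (0=X, 1=L, 2=R) come from the class above.

```python
MOVE_NAMES = ["X", "L", "R"]  # move codes used by the solver: 0=X, 1=L, 2=R


def backtrack(parent_indices, move_indices, solution_idx):
    """Follow parent pointers from the last step to the first, then reverse."""
    reverse_moves = []
    idx = solution_idx
    for step in range(len(parent_indices) - 1, -1, -1):
        reverse_moves.append(move_indices[step][idx])  # move that produced this state
        idx = parent_indices[step][idx]                # its position in the previous beam
    return reverse_moves[::-1]


# Toy history (hypothetical): one list per search step, one entry per kept state.
parent_indices = [[0, 0, 0], [2, 0], [1, 1, 0]]
move_indices = [[0, 1, 2], [0, 1], [2, 0, 1]]

moves = backtrack(parent_indices, move_indices, solution_idx=2)
path = ".".join(MOVE_NAMES[m] for m in moves)
print(path)  # R.X.L
```

Storing only a parent index and a move code per kept state keeps the history small: the full path is rebuilt once, at the end, rather than carried along with every beam entry.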

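`_check_solution` looks states up by hash and then compares the full state, so a hash collision is not reported as a hit. The sketch below illustrates that pattern in plain Python. How the solver builds `hash_vec` is not shown in this part of the notebook, so `make_hash_vec` (seeded random weights) is an assumption, and `state_hash` and `lookup` are hypothetical helper names.

```python
import random


def make_hash_vec(state_size, seed=0):
    """Hypothetical stand-in for the solver's hash_vec: one random weight per position."""
    rng = random.Random(seed)
    return [rng.randrange(1, 2**31) for _ in range(state_size)]


def state_hash(state, hash_vec):
    """Dot product of state and weights, the same idea as _compute_state_hashes."""
    return sum(s * w for s, w in zip(state, hash_vec))


def lookup(state, table, hash_vec):
    """Return the stored path for `state`, or None.

    The full state is compared after the hash matches, so two different
    states that share a hash are never confused.
    """
    entry = table.get(state_hash(state, hash_vec))
    if entry is None:
        return None
    stored_state, path = entry
    return path if stored_state == state else None


solved = (0, 1, 2, 3, 4)
one_swap = (1, 0, 2, 3, 4)  # solved state after a single X move

hash_vec = make_hash_vec(5)
table = {
    state_hash(solved, hash_vec): (solved, []),
    state_hash(one_swap, hash_vec): (one_swap, ["X"]),
}
print(lookup(one_swap, table, hash_vec))         # ['X']
print(lookup((4, 3, 2, 1, 0), table, hash_vec))  # None

# A deliberately weak weight vector makes every permutation collide.
weak_vec = [1] * 5
weak_table = {state_hash(solved, weak_vec): (solved, [])}
print(lookup(one_swap, weak_table, weak_vec))    # None: hash matches, state does not
```
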
**Example run with processing info in the logs**

In [11]:
state_size = 5
max_steps = 100
model_name = "xgboost"

# Initialize solver
solver = BeamSearchSolver(
    state_size=state_size,
    beam_width=5,
    max_steps=max_steps,
    use_x_rule=False,
    target_neighborhood_radius=2,
    filter_batch_size = 50_000,
    predict_batch_size = 100_000,
    history_window_size=100,
    verbose=True
)

# Train model
model = model_classes[model_name]()
generators = create_lrx_moves(state_size)
X, y = nbt_random_walks(generators, n_steps = 30, n_walks=10000)
model.train(X, y)

# Solve the test permutation
test_state = torch.arange(state_size - 1, -1, -1, device=DEVICE)
found, steps, solution_path = solver.solve(test_state, model)

# Output results
print("\n=== SOLVER RESULTS ===")
print(f"Solution Found: {found}")
print(f"Steps Taken: {steps}")
print(f"Solution Path: {solution_path if found else 'No solution'}")

2025-03-09 11:16:37,374 - Starting BFS from solved state tensor([0, 1, 2, 3, 4], device='cuda:0')
2025-03-09 11:16:37,480 - Target neighborhood depth 1: 4 states
2025-03-09 11:16:37,482 - Target neighborhood depth 2: 10 states
2025-03-09 11:16:37,484 - Got target neighborhood with 10 states
2025-03-09 11:17:01,923 - Initial state: [4 3 2 1 0]
2025-03-09 11:17:01,924 - 
2025-03-09 11:17:02,086 - Filtering 3 new states:
2025-03-09 11:17:02,088 - From parent 0 [4 3 2 1 0] → Move X → [3 4 2 1 0] ✅
2025-03-09 11:17:02,090 - From parent 0 [4 3 2 1 0] → Move L → [3 2 1 0 4] ✅
2025-03-09 11:17:02,091 - From parent 0 [4 3 2 1 0] → Move R → [0 4 3 2 1] ✅
2025-03-09 11:17:02,093 - End of step:
2025-03-09 11:17:02,093 -   Number of states: 3
2025-03-09 11:17:02,094 -   Current hash history size: 4
2025-03-09 11:17:02,094 - 
2025-03-09 11:17:02,097 - Removed 2 duplicate states in expansion
2025-03-09 11:17:02,099 - Filtering 7 new states:
2025-03-09 11:17:02,100 - From parent 2 [0 4 3 2 1] → Move X


=== SOLVER RESULTS ===
Solution Found: True
Steps Taken: 8
Solution Path: X.L.X.R.X.R.R.X
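
Two facts in the log above can be checked by hand: the target neighborhood sizes (4 states at depth 1, 10 at depth 2) and the reported solution path. The sketch below replays both on plain tuples. It is a simplified re-implementation for illustration, not the solver's tensor code; `apply_move`, `target_neighborhood`, and `INVERSE` are names chosen here, and the BFS visits moves per state rather than in bulk, which may pick a different (equally short) path for some states.

```python
INVERSE = {"X": "X", "L": "R", "R": "L"}


def apply_move(state, move):
    """Apply one LRX move to a tuple: X swaps the first two items, L/R rotate."""
    if move == "X":
        return (state[1], state[0]) + state[2:]
    if move == "L":
        return state[1:] + state[:1]
    if move == "R":
        return state[-1:] + state[:-1]
    raise ValueError(f"Unknown move: {move!r}")


def target_neighborhood(solved, radius):
    """BFS from the solved state; store for each state a path back TO solved."""
    paths = {solved: []}
    frontier = [solved]
    sizes = []
    for _ in range(radius):
        next_frontier = []
        for state in frontier:
            for move in "XLR":
                new_state = apply_move(state, move)
                if new_state not in paths:
                    # Undo `move` first, then follow the parent's path.
                    paths[new_state] = [INVERSE[move]] + paths[state]
                    next_frontier.append(new_state)
        frontier = next_frontier
        sizes.append(len(paths))
    return paths, sizes


solved = (0, 1, 2, 3, 4)
paths, sizes = target_neighborhood(solved, radius=2)
print(sizes)  # [4, 10]

# Replay the solution path printed by the solver.
state = (4, 3, 2, 1, 0)
for move in "X.L.X.R.X.R.R.X".split("."):
    state = apply_move(state, move)
print(state == solved)  # True
```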


<div style="
    margin-bottom: 20px;
    border: 1px solid #DDDDDD;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    overflow: hidden;">
  <div style="background-color: #004d40; padding: 15px;">
    <h1 id="experiments" style="margin: 0; color: #ffffff;">Run experiments</h1>
  </div>
</div>

In [12]:
test_path = 'test.csv'
test_df = pd.read_csv(test_path)

n_min = 25
n_max = 25
test_df = test_df[(test_df["n"] >= n_min) & (test_df["n"] <= n_max)]
test_df

Unnamed: 0,n,permutation
23,25,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,..."


In [13]:
# set experiment parameters
N_RUNS = 20
BEAM_WIDTH = 2**16
rw_type = "nbt"
model_name = "catboost"
history_window_sizes = [32, 600]
use_x_rule = False
target_neighborhood_radius = 15
verbose = False

In [14]:
script_start = time()

results = []
for _, row in test_df.iterrows():
    size = row["n"]
    print(f"\nProcessing state size: {size}")

    conj_steps = int(size * (size - 1) / 2) 
    MAX_STEPS = int(conj_steps * 2)
    print(f"MAX_STEPS: {MAX_STEPS}")

    perm = torch.tensor(
        [int(x) for x in row['permutation'].split(',')],
        device=DEVICE
    )
    if rw_type == "nbt":
        rw_fun = nbt_random_walks
    elif rw_type == "first_visit":
        rw_fun = first_visit_random_walks
    else:
        raise ValueError(f"Unknown rw_type: {rw_type!r}")
    
    for history_window_size in history_window_sizes:
        print(f"History window size: {history_window_size}")
        
        for run in range(1, N_RUNS + 1):
            torch.cuda.empty_cache()
            print(f"\nRun {run} for n={size}")
        
            generators = create_lrx_moves(size)
            model = model_classes[model_name]()
            X, y = rw_fun(generators, n_steps = conj_steps, n_walks=10000)
            start_time = time()
            model.train(X, y)
            train_time = time() - start_time
            torch.cuda.empty_cache()
            print(
                f"Training {model_name} on data generated by {rw_fun.__name__} "
                f"with n_steps={conj_steps}, n_walks=10000 took {train_time:.2f}s"
            )
            
            # Initialize solver
            solver = BeamSearchSolver(
                state_size=size, 
                beam_width=BEAM_WIDTH,
                max_steps=MAX_STEPS,
                use_x_rule=use_x_rule,
                target_neighborhood_radius=target_neighborhood_radius,
                filter_batch_size = 50_000,
                predict_batch_size = 100_000,
                history_window_size = history_window_size,
                verbose=verbose
            )           
            
            solve_start = time()
            found, steps, solution = solver.solve(start_state=perm, model=model)
            solve_time = time() - solve_start
            
            results.append({
                'random_walks': rw_type,
                'model': model_name,
                'permutation': row['permutation'],
                'n': size,
                'beam_width': BEAM_WIDTH,
                'history_window_size': history_window_size,
                'max_steps': MAX_STEPS,
                'run': run,
                'success': found,
                'solution': solution if found else "Not found",
                'steps': steps,
                'train_time': train_time,
                'solve_time': solve_time
            })
            
            print(
                f"Success: {found}, Steps: {steps}, "
                f"Time: {solve_time:.2f}s"
            )
    
# Save results
results_df = pd.DataFrame(results)
results_df.to_csv('solutions.csv', index=False)

total_time = time() - script_start
print(f"\nTotal execution time: {total_time:.2f} seconds")
print("Solutions saved to solutions.csv")


Processing state size: 25
MAX_STEPS: 600
History window size: 32

Run 1 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.06s


2025-03-09 11:21:34,462 - Got target neighborhood with 47747 states
2025-03-09 11:26:12,640 - Solution found in target neighborhood after 316 steps
2025-03-09 11:26:12,641 - Time taken: 278.1759066581726


Success: True, Steps: 331, Time: 278.19s

Run 2 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.86s


2025-03-09 11:30:45,860 - Got target neighborhood with 47747 states
2025-03-09 11:36:52,004 - Solution found in target neighborhood after 405 steps
2025-03-09 11:36:52,005 - Time taken: 366.0541341304779


Success: True, Steps: 420, Time: 366.09s

Run 3 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.01s


2025-03-09 11:41:22,230 - Got target neighborhood with 47747 states
2025-03-09 11:46:04,975 - Solution found in target neighborhood after 327 steps
2025-03-09 11:46:04,976 - Time taken: 282.6589753627777


Success: True, Steps: 342, Time: 282.68s

Run 4 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.10s


2025-03-09 11:50:35,963 - Got target neighborhood with 47747 states
2025-03-09 11:56:45,185 - Solution found in target neighborhood after 400 steps
2025-03-09 11:56:45,186 - Time taken: 369.10987305641174


Success: True, Steps: 415, Time: 369.16s

Run 5 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.93s


2025-03-09 12:01:15,564 - Got target neighborhood with 47747 states
2025-03-09 12:07:57,847 - Solution found in target neighborhood after 465 steps
2025-03-09 12:07:57,848 - Time taken: 402.19194436073303


Success: True, Steps: 480, Time: 402.23s

Run 6 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.91s


2025-03-09 12:12:27,928 - Got target neighborhood with 47747 states
2025-03-09 12:16:46,661 - Solution found in target neighborhood after 298 steps
2025-03-09 12:16:46,663 - Time taken: 258.6464102268219


Success: True, Steps: 313, Time: 258.67s

Run 7 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.82s


2025-03-09 12:21:16,679 - Got target neighborhood with 47747 states
2025-03-09 12:25:34,927 - Solution found in target neighborhood after 297 steps
2025-03-09 12:25:34,928 - Time taken: 258.13702416419983


Success: True, Steps: 312, Time: 258.18s

Run 8 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.76s


2025-03-09 12:30:04,767 - Got target neighborhood with 47747 states
2025-03-09 12:34:58,845 - Solution found in target neighborhood after 329 steps
2025-03-09 12:34:58,845 - Time taken: 293.9346363544464


Success: True, Steps: 344, Time: 293.99s

Run 9 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.73s


2025-03-09 12:39:27,803 - Got target neighborhood with 47747 states
2025-03-09 12:47:59,756 - 
Search terminated after reaching max steps: 600
2025-03-09 12:47:59,757 - Time taken: 511.85149788856506


Success: False, Steps: 600, Time: 511.88s

Run 10 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.71s


2025-03-09 12:52:29,787 - Got target neighborhood with 47747 states
2025-03-09 12:57:32,281 - Solution found in target neighborhood after 335 steps
2025-03-09 12:57:32,282 - Time taken: 302.40407490730286


Success: True, Steps: 350, Time: 302.43s

Run 11 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.11s


2025-03-09 13:02:03,489 - Got target neighborhood with 47747 states
2025-03-09 13:11:04,186 - 
Search terminated after reaching max steps: 600
2025-03-09 13:11:04,187 - Time taken: 540.5693790912628


Success: False, Steps: 600, Time: 540.64s

Run 12 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.46s


2025-03-09 13:15:34,031 - Got target neighborhood with 47747 states
2025-03-09 13:24:41,217 - 
Search terminated after reaching max steps: 600
2025-03-09 13:24:41,218 - Time taken: 547.0926358699799


Success: False, Steps: 600, Time: 547.11s

Run 13 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.41s


2025-03-09 13:29:11,609 - Got target neighborhood with 47747 states
2025-03-09 13:34:47,443 - Solution found in target neighborhood after 390 steps
2025-03-09 13:34:47,444 - Time taken: 335.61657428741455


Success: True, Steps: 405, Time: 335.77s

Run 14 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.69s


2025-03-09 13:39:16,965 - Got target neighborhood with 47747 states
2025-03-09 13:45:38,398 - Solution found in target neighborhood after 418 steps
2025-03-09 13:45:38,398 - Time taken: 381.33242535591125


Success: True, Steps: 433, Time: 381.36s

Run 15 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.79s


2025-03-09 13:50:09,010 - Got target neighborhood with 47747 states
2025-03-09 13:54:14,694 - Solution found in target neighborhood after 293 steps
2025-03-09 13:54:14,695 - Time taken: 245.56870937347412


Success: True, Steps: 308, Time: 245.62s

Run 16 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.02s


2025-03-09 13:58:45,114 - Got target neighborhood with 47747 states
2025-03-09 14:04:12,693 - Solution found in target neighborhood after 356 steps
2025-03-09 14:04:12,694 - Time taken: 327.46769547462463


Success: True, Steps: 371, Time: 327.51s

Run 17 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.07s


2025-03-09 14:08:43,144 - Got target neighborhood with 47747 states
2025-03-09 14:13:52,859 - Solution found in target neighborhood after 360 steps
2025-03-09 14:13:52,859 - Time taken: 309.6233808994293


Success: True, Steps: 375, Time: 309.65s

Run 18 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.02s


2025-03-09 14:18:22,415 - Got target neighborhood with 47747 states
2025-03-09 14:22:45,524 - Solution found in target neighborhood after 295 steps
2025-03-09 14:22:45,525 - Time taken: 262.98867297172546


Success: True, Steps: 310, Time: 263.04s

Run 19 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.80s


2025-03-09 14:27:15,119 - Got target neighborhood with 47747 states
2025-03-09 14:35:44,147 - 
Search terminated after reaching max steps: 600
2025-03-09 14:35:44,148 - Time taken: 508.9332273006439


Success: False, Steps: 600, Time: 508.95s

Run 20 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.07s


2025-03-09 14:40:14,161 - Got target neighborhood with 47747 states
2025-03-09 14:45:47,332 - Solution found in target neighborhood after 367 steps
2025-03-09 14:45:47,333 - Time taken: 333.06219148635864


Success: True, Steps: 382, Time: 333.13s
History window size: 600

Run 1 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.85s


2025-03-09 14:50:19,974 - Got target neighborhood with 47747 states
2025-03-09 14:57:20,190 - Solution found in target neighborhood after 341 steps
2025-03-09 14:57:20,191 - Time taken: 420.12772035598755


Success: True, Steps: 356, Time: 420.15s

Run 2 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.71s


2025-03-09 15:01:50,663 - Got target neighborhood with 47747 states
2025-03-09 15:09:06,491 - Solution found in target neighborhood after 338 steps
2025-03-09 15:09:06,492 - Time taken: 435.6065230369568


Success: True, Steps: 353, Time: 435.70s

Run 3 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.90s


2025-03-09 15:13:36,332 - Got target neighborhood with 47747 states
2025-03-09 15:20:16,945 - Solution found in target neighborhood after 321 steps
2025-03-09 15:20:16,947 - Time taken: 400.4016568660736


Success: True, Steps: 336, Time: 400.55s

Run 4 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.31s


2025-03-09 15:24:46,363 - Got target neighborhood with 47747 states
2025-03-09 15:41:37,586 - 
Search terminated after reaching max steps: 600
2025-03-09 15:41:37,586 - Time taken: 1011.0721714496613


Success: False, Steps: 600, Time: 1011.15s

Run 5 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.99s


2025-03-09 15:46:07,861 - Got target neighborhood with 47747 states
2025-03-09 15:54:29,215 - Solution found in target neighborhood after 387 steps
2025-03-09 15:54:29,216 - Time taken: 500.9484648704529


Success: True, Steps: 402, Time: 501.29s

Run 6 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.69s


2025-03-09 15:59:00,301 - Got target neighborhood with 47747 states
2025-03-09 16:05:23,846 - Solution found in target neighborhood after 309 steps
2025-03-09 16:05:23,847 - Time taken: 383.3862051963806


Success: True, Steps: 324, Time: 383.48s

Run 7 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.76s


2025-03-09 16:09:55,144 - Got target neighborhood with 47747 states
2025-03-09 16:18:19,252 - Solution found in target neighborhood after 383 steps
2025-03-09 16:18:19,253 - Time taken: 503.9661269187927


Success: True, Steps: 398, Time: 504.04s

Run 8 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.53s


2025-03-09 16:22:49,750 - Got target neighborhood with 47747 states
2025-03-09 16:28:52,905 - Solution found in target neighborhood after 290 steps
2025-03-09 16:28:52,906 - Time taken: 363.0398371219635


Success: True, Steps: 305, Time: 363.08s

Run 9 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.85s


2025-03-09 16:33:23,515 - Got target neighborhood with 47747 states
2025-03-09 16:39:17,034 - Solution found in target neighborhood after 300 steps
2025-03-09 16:39:17,035 - Time taken: 353.3366105556488


Success: True, Steps: 315, Time: 353.41s

Run 10 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.88s


2025-03-09 16:43:48,185 - Got target neighborhood with 47747 states
2025-03-09 16:49:33,877 - Solution found in target neighborhood after 289 steps
2025-03-09 16:49:33,878 - Time taken: 345.5232689380646


Success: True, Steps: 304, Time: 345.62s

Run 11 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.30s


2025-03-09 16:54:04,033 - Got target neighborhood with 47747 states
2025-03-09 17:10:39,282 - 
Search terminated after reaching max steps: 600
2025-03-09 17:10:39,283 - Time taken: 995.1085267066956


Success: False, Steps: 600, Time: 995.16s

Run 12 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.62s


2025-03-09 17:15:08,710 - Got target neighborhood with 47747 states
2025-03-09 17:21:07,375 - Solution found in target neighborhood after 296 steps
2025-03-09 17:21:07,376 - Time taken: 358.47523498535156


Success: True, Steps: 311, Time: 358.60s

Run 13 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.69s


2025-03-09 17:25:37,527 - Got target neighborhood with 47747 states
2025-03-09 17:31:38,486 - Solution found in target neighborhood after 305 steps
2025-03-09 17:31:38,487 - Time taken: 360.8149380683899


Success: True, Steps: 320, Time: 360.89s

Run 14 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.44s


2025-03-09 17:36:13,890 - Got target neighborhood with 47747 states
2025-03-09 17:44:02,219 - Solution found in target neighborhood after 354 steps
2025-03-09 17:44:02,220 - Time taken: 468.20451617240906


Success: True, Steps: 369, Time: 468.26s

Run 15 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.64s


2025-03-09 17:48:36,117 - Got target neighborhood with 47747 states
2025-03-09 17:55:43,453 - Solution found in target neighborhood after 339 steps
2025-03-09 17:55:43,454 - Time taken: 427.21518182754517


Success: True, Steps: 354, Time: 427.27s

Run 16 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.15s


2025-03-09 18:00:16,834 - Got target neighborhood with 47747 states
2025-03-09 18:12:17,930 - Solution found in target neighborhood after 479 steps
2025-03-09 18:12:17,931 - Time taken: 720.8809084892273


Success: True, Steps: 494, Time: 721.01s

Run 17 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.86s


2025-03-09 18:16:51,200 - Got target neighborhood with 47747 states
2025-03-09 18:30:27,221 - Solution found in target neighborhood after 534 steps
2025-03-09 18:30:27,222 - Time taken: 815.8812234401703


Success: True, Steps: 549, Time: 815.96s

Run 18 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 51.13s


2025-03-09 18:35:01,392 - Got target neighborhood with 47747 states
2025-03-09 18:45:50,706 - Solution found in target neighborhood after 446 steps
2025-03-09 18:45:50,707 - Time taken: 649.130268573761


Success: True, Steps: 461, Time: 649.25s

Run 19 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 50.78s


2025-03-09 18:50:24,532 - Got target neighborhood with 47747 states
2025-03-09 18:56:16,372 - Solution found in target neighborhood after 293 steps
2025-03-09 18:56:16,373 - Time taken: 351.66751742362976


Success: True, Steps: 308, Time: 351.77s

Run 20 for n=25
Training catboost on data generated by nbt_random_walks with n_steps=300, n_walks=10000 took 49.95s


2025-03-09 19:00:49,382 - Got target neighborhood with 47747 states
2025-03-09 19:07:47,984 - Solution found in target neighborhood after 335 steps
2025-03-09 19:07:47,985 - Time taken: 418.3132438659668


Success: True, Steps: 350, Time: 418.37s

Total execution time: 28245.44 seconds
Solutions saved to solutions.csv
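
Each successful run above reports two step counts: the step at which the beam reached the target neighborhood (log line) and the total solution length (`Steps:`). The short check below reproduces the step budget for n=25 and compares the two counts for the first five runs with window size 32. That the gap equals `target_neighborhood_radius` is an observation from the logs; one plausible reading is that the beam first touches the neighborhood at its outermost layer, so the stored path adds 15 moves.

```python
n = 25
conj_steps = n * (n - 1) // 2  # same value as int(size * (size - 1) / 2)
max_steps = 2 * conj_steps
target_neighborhood_radius = 15

# (beam-search steps, reported total steps), copied from runs 1-5, window size 32
logged = [(316, 331), (405, 420), (327, 342), (400, 415), (465, 480)]
offsets = {total - beam for beam, total in logged}

print(conj_steps, max_steps)  # 300 600
print(offsets)                # {15}
```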


In [15]:
results_df

Unnamed: 0,random_walks,model,permutation,n,beam_width,history_window_size,max_steps,run,success,solution,steps,train_time,solve_time
0,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,1,True,L.L.L.L.L.X.L.X.R.X.R.X.L.X.L.X.L.L.L.X.L.X.L....,331,51.06305,278.191108
1,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,2,True,L.L.L.X.L.L.L.X.R.X.L.X.L.L.X.L.L.L.L.X.L.X.L....,420,51.856498,366.085362
2,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,3,True,L.L.L.L.X.L.X.R.R.X.L.X.L.L.X.L.X.L.X.R.R.X.R....,342,51.006986,282.679768
3,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,4,True,R.R.R.X.R.X.L.X.L.X.R.X.R.X.R.X.L.X.L.X.L.X.L....,415,51.096294,369.160177
4,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,5,True,R.R.R.R.R.X.R.R.X.R.R.R.R.R.R.R.R.R.R.R.X.L.X....,480,50.934664,402.22683
5,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,6,True,L.L.L.L.L.X.L.X.R.X.R.X.L.X.L.X.L.L.L.L.L.X.R....,313,50.914995,258.673341
6,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,7,True,L.L.L.X.L.X.L.X.L.X.L.X.L.X.R.R.X.R.X.R.X.L.X....,312,50.81759,258.183956
7,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,8,True,R.R.X.R.R.X.L.X.L.X.L.X.L.L.L.L.L.L.L.X.R.X.L....,344,50.756999,293.986481
8,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,9,False,Not found,600,50.73473,511.87736
9,nbt,catboost,"1,0,24,23,22,21,20,19,18,17,16,15,14,13,12,11,...",25,65536,32,600,10,True,R.X.R.X.R.X.L.X.L.X.L.X.L.L.L.L.X.L.X.L.X.L.X....,350,50.706211,302.432559


In [16]:
# Settings shared by all runs in this experiment
exp_details = results_df[
    ['random_walks', 'model', 'n', 'beam_width', 'max_steps']
].drop_duplicates()
print("Experiment details:")
print(exp_details.to_string(index=False))

# Success rate over all runs (solved and unsolved), in percent
success_rate = results_df.groupby('history_window_size')['success'].mean() * 100

# Step and timing statistics are computed over solved runs only,
# so `runs` below is the number of solved runs per group
solved = results_df[results_df['success']]
stats = solved.groupby('history_window_size').agg(
    runs=('steps', 'size'),
    median_steps=('steps', 'median'),
    min_steps=('steps', 'min'),
    max_steps=('steps', 'max'),
    std_steps=('steps', 'std'),
    mean_solve_time=('solve_time', 'mean'),
    mean_train_time=('train_time', 'mean')
).T

stats.loc['success_rate (%)'] = success_rate.round(2)

print("\nStatistics per history_window_size:")
stats

Experiment details:
random_walks    model  n  beam_width  max_steps
         nbt catboost 25       65536        600

Statistics per history_window_size:


history_window_size,32,600
runs,16.0,18.0
median_steps,360.5,351.5
min_steps,308.0,304.0
max_steps,480.0,549.0
std_steps,51.234388,69.960703
mean_solve_time,312.981445,459.926807
mean_train_time,50.951242,50.801031
success_rate (%),80.0,90.0
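
Note that `runs` in the table above counts solved runs only, because the statistics are aggregated over the `solved` subset. A minimal sketch of a summary that reports total and solved runs side by side is shown below. The toy DataFrame and the column names `total_runs`, `solved_runs`, and `median_steps_solved` are hypothetical and stand in for `results_df`; the numbers are not taken from the experiment.

```python
import pandas as pd

# Hypothetical toy results, NOT the notebook's actual data.
toy = pd.DataFrame({
    "history_window_size": [32, 32, 32, 600, 600, 600],
    "success": [True, False, True, True, True, True],
    "steps": [331, 600, 342, 320, 350, 410],
})

# Count all runs and solved runs per group.
summary = toy.groupby("history_window_size").agg(
    total_runs=("success", "size"),
    solved_runs=("success", "sum"),
)
summary["success_rate (%)"] = (
    100 * summary["solved_runs"] / summary["total_runs"]
).round(2)

# Step statistics only make sense for solved runs, since failed runs
# simply stop at max_steps.
summary["median_steps_solved"] = (
    toy[toy["success"]].groupby("history_window_size")["steps"].median()
)

print(summary)
```

Keeping both counts in one table makes it easier to see how many runs each step statistic is based on.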


<div style="
    margin-bottom: 20px;
    border: 1px solid #DDDDDD;
    border-radius: 8px;
    box-shadow: 0 4px 8px rgba(0,0,0,0.1);
    overflow: hidden;">
  <div style="background-color: #004d40; padding: 15px;">
    <h1 id="history" style="margin: 0; color: #ffffff;">Version history</h1>
  </div>
</div>

- Versions 1-3: Setup and exploration of Kaggle limits
- Version 4: Solved "longest" permutation of size 15 with 105 steps (10 runs with different beam widths from 2^17 to 2^19)
- Version 5: Solved "longest" permutation of size 16 with 120 steps (10 runs with different beam widths from 2^17 to 2^19)
- Version 6: Added method description and working example for CPU and permutation of size 8
- Version 7: Added non-backtracking (nbt) random walks and beam search (run cancelled after a bug was found in the experiment code)
- Version 8: Run experiments with nbt random walks and catboost model: n=16 to 20, 3 runs with beam_width=100_000, GPU
- Version 9: Run experiments with nbt random walks and xgboost model: n=16 to 18, 3 runs with beam_width=100_000, GPU
- Versions 10-13: Run experiments with nbt random walks and xgboost model: n=19 to 25, 3 runs with beam_width=100_000, GPU (failed with out-of-memory, OOM)
- Version 14: Run experiments with nbt random walks and xgboost model: n=19 to 25, 3 runs with beam_width=100_000, GPU (same code run interactively, quick save)
- Version 15: Run experiments with nbt random walks and catboost model: n=19 to 25, 3 runs with beam_width=100_000, GPU
- Version 16: Run experiments with first-visit random walks and catboost model: n=16 to 25, 3 runs with beam_width=100_000, GPU
- Versions 17-19: Run experiments with nbt random walks and catboost model: n=26 to 35, 3 runs with beam_width=500_000, GPU (failed with out-of-memory, OOM)
- Versions 20-21: Run experiments with nbt random walks, catboost model, beam_width=2**16: n=25, 5 runs for each history_window_size (to test how the number of steps of state history kept for non-backtracking affects performance), GPU
- Version 22: Run experiments with nbt random walks, catboost model, beam_width=2**16: n=25, 20 runs for each history_window_size (to test how the number of steps of state history kept for non-backtracking affects performance), GPU