# Play with the Trained Durak Model

This notebook demonstrates how to use your trained AlphaZero-like model to:

1. Play from any position
2. Get model evaluations for positions
3. See the top recommended moves

## Requirements
1. You need a trained model checkpoint
2. You need the code from `src/utils/play_utils.py`

Let's get started!

In [1]:
%cd ../

import torch
import numpy as np
import pyspiel
import matplotlib.pyplot as plt

from src.durak.durak_game import DurakGame, card_to_string
from src.model.network import AlphaZeroNet
from src.utils.play_utils import get_model_move, print_state_info, play_from_position, create_custom_state, action_to_readable
from src.utils.checkpoint import load_checkpoint

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

/home/ivan/Projects/Neurodurak/AlphaZero-Durak
Using device: cuda


## Load the Trained Model

First, we need to load a checkpoint from our trained model.

In [2]:
# Create the same network architecture that was used for training
network = AlphaZeroNet(
    input_dim=158,      # from DurakObserver
    hidden_dim=256,     # bigger net
    num_actions=39,     # 36 cards + 3 extra actions
    num_layers=4,       # deeper
    use_history=True,   # Enable LSTM history
    history_dim=128     # Size of history embedding
).to(device)

# Load the checkpoint (adjust path as needed)
checkpoint_path = "checkpoint_reborn/1750.ckpt"  # Change to your best checkpoint
game_count, _ = load_checkpoint(checkpoint_path, network, device=device)

Loaded checkpoint from checkpoint_reborn/1750.ckpt




## Example 1: Play from a Random Initial Position

Let's start by playing from a random initial position to see the model's recommendations.

In [3]:
# Create a new game and play until we have an interesting position
game = DurakGame()
state = game.new_initial_state()

# Handle chance node for initial shuffling and dealing
while state.is_chance_node():
    outcomes = state.chance_outcomes()
    action = outcomes[0][0]  # Take first action (only one possible)
    state.apply_action(action)

# Print the current state and get model recommendations
player_viewpoint = 0  # The player whose perspective we're viewing from
print_state_info(state, player_viewpoint)

action, policy, win_prob = get_model_move(
    network, state, device=device, mcts_simulations=200, use_argmax=True
)

print(f"\nModel's evaluation: {win_prob:.2%} chance of winning")
print(f"Model's chosen action: {action_to_readable(action)}")

# Print top actions by probability
print("\nTop actions by probability:")
sorted_actions = sorted(policy.items(), key=lambda x: x[1], reverse=True)
for i, (act, prob) in enumerate(sorted_actions[:5]):
    print(f"  {i+1}. {action_to_readable(act)}: {prob:.2%}")


Player 0 viewpoint:
Trump suit: ♣ (card: 8♣)
Hand: ['6♠', 'Q♠', '6♣', '9♣', 'K♣', 'J♦']
Opponent has 6 cards
Deck has 24 cards remaining
Table: empty
Phase: ATTACK
Current player: 0
Legal actions: ['6♠', 'Q♠', '6♣', '9♣', 'K♣', 'J♦']

Model's evaluation: 0.11% chance of winning
Model's chosen action: Q♠

Top actions by probability:
  1. Q♠: 100.00%
  2. 6♠: 0.00%
  3. 6♣: 0.00%
  4. 9♣: 0.00%
  5. K♣: 0.00%
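
The "top actions" listing above is repeated in several cells of this notebook. A minimal sketch of a reusable helper follows, assuming (as the cells above do) that the policy is a dict mapping action ids to probabilities. The name `top_k_actions` and the example values are hypothetical and not part of `play_utils`.

```python
from typing import Dict, List, Tuple


def top_k_actions(policy: Dict[int, float], k: int = 5) -> List[Tuple[int, float]]:
    """Return the k (action, probability) pairs with the highest probability."""
    return sorted(policy.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical policy over four action ids (values made up for illustration)
example_policy = {6: 0.10, 0: 0.55, 9: 0.05, 36: 0.30}

for rank, (act, prob) in enumerate(top_k_actions(example_policy, k=3), start=1):
    print(f"  {rank}. action {act}: {prob:.2%}")
```

In the notebook, `action_to_readable(act)` could replace the raw action id in the print statement.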


## Example 2: Create and Play from a Custom Position

Here we'll set up a specific game position and get the model's recommendations.

In [4]:
# Define a custom position
# Let's create an interesting defensive scenario

# Card indices: 0-8 = ♠6-A, 9-17 = ♣6-A, 18-26 = ♦6-A, 27-35 = ♥6-A
trump_card = 27  # ♥6 (trump suit is hearts)

player0_hand = [0, 1, 9, 18, 27, 35]  # Mix of suits including trumps (note: 27 is also trump_card)
player1_hand = [2, 10, 19, 28, 29, 30]  # Mix with several trumps
hands = [player0_hand, player1_hand]

# Table cards: list of (attacking_card, defending_card_or_None)
table_cards = [(3, None), (12, None)]  # Two undefended cards

phase = 2  # DEFENSE phase
attacker = 0  # Player 0 is attacking
defender = 1  # Player 1 is defending
deck_size = 12  # Cards remaining in the deck

# Create the custom state
custom_state = create_custom_state(
    trump_card=trump_card,
    hands=hands,
    table_cards=table_cards,
    phase=phase,
    attacker=attacker,
    deck_size=deck_size
)

# Print state and get model recommendations
print_state_info(custom_state, defender)  # From defender's viewpoint

action, policy, win_prob = get_model_move(
    network, custom_state, device=device, mcts_simulations=200, use_argmax=True
)

print(f"\nModel's evaluation: {win_prob:.2%} chance of winning")
print(f"Model's chosen action: {action_to_readable(action)}")

# Print top actions by probability
print("\nTop actions by probability:")
sorted_actions = sorted(policy.items(), key=lambda x: x[1], reverse=True)
for i, (act, prob) in enumerate(sorted_actions[:5]):
    print(f"  {i+1}. {action_to_readable(act)}: {prob:.2%}")


Player 1 viewpoint:
Trump suit: ♥ (card: 6♥)
Hand: ['8♠', '7♣', '7♦', '7♥', '8♥', '9♥']
Opponent has 6 cards
Deck has 12 cards remaining
Table:
  1. 9♠ -> ?
  2. 9♣ -> ?
Phase: DEFENSE
Current player: 1
Legal actions: ['7♥', '8♥', '9♥', 'TAKE_CARDS']

Model's evaluation: 0.00% chance of winning
Model's chosen action: 7♥

Top actions by probability:
  1. 7♥: 100.00%
  2. 8♥: 0.00%
  3. 9♥: 0.00%
  4. TAKE_CARDS: 0.00%
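
The card index layout described in the comment above (0-8 = ♠6-A, 9-17 = ♣6-A, 18-26 = ♦6-A, 27-35 = ♥6-A) can be sketched as a pair of conversion helpers, which makes it easier to build custom hands without counting indices by hand. This is a standalone illustration: `index_to_card` and `card_to_index` are hypothetical names, and the project's own `card_to_string` may behave differently.

```python
SUITS = ["♠", "♣", "♦", "♥"]                             # suit order from the comment above
RANKS = ["6", "7", "8", "9", "10", "J", "Q", "K", "A"]   # 9 ranks per suit


def index_to_card(index: int) -> str:
    """Convert a card index in 0..35 to a string such as '8♠'."""
    if not 0 <= index < len(SUITS) * len(RANKS):
        raise ValueError(f"card index out of range: {index}")
    suit, rank = divmod(index, len(RANKS))
    return f"{RANKS[rank]}{SUITS[suit]}"


def card_to_index(card: str) -> int:
    """Convert a string such as '9♣' back to its card index."""
    rank, suit = card[:-1], card[-1]
    return SUITS.index(suit) * len(RANKS) + RANKS.index(rank)


# The defender's hand from the custom position above
print([index_to_card(i) for i in [2, 10, 19, 28, 29, 30]])
```

With these helpers a hand can be written as `[card_to_index(c) for c in ["8♠", "7♣"]]` instead of raw integers.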


## Example 3: Play Through a Game Step by Step

Now let's start a fresh game and step through it move by move, seeing the model's evaluations at each step.

In [5]:
game = DurakGame()
state = game.new_initial_state()

# Handle chance node for initial shuffling and dealing
while state.is_chance_node():
    outcomes = state.chance_outcomes()
    action = outcomes[0][0]
    state.apply_action(action)

# Play through 10 steps (or until game ends)
for step in range(10):
    if state.is_terminal():
        print("\nGame over!")
        returns = state.returns()
        print(f"Returns: Player 0: {returns[0]}, Player 1: {returns[1]}")
        break
        
    if state.is_chance_node():
        outcomes = state.chance_outcomes()
        action = outcomes[0][0]
        state.apply_action(action)
        continue
        
    print(f"\n\n--- Step {step+1} ---")
    current_player = state.current_player()
    print_state_info(state, current_player)
    
    # Get model recommendation
    action, policy, win_prob = get_model_move(
        network, state, device=device, mcts_simulations=200, use_argmax=True
    )
    
    print(f"\nModel's evaluation: {win_prob:.2%} chance of winning")
    print(f"Model's chosen action: {action_to_readable(action)}")
    
    # Print top 3 actions
    print("\nTop actions by probability:")
    sorted_actions = sorted(policy.items(), key=lambda x: x[1], reverse=True)
    for i, (act, prob) in enumerate(sorted_actions[:3]):
        print(f"  {i+1}. {action_to_readable(act)}: {prob:.2%}")
    
    # Apply the model's chosen action
    state.apply_action(action)



--- Step 1 ---

Player 0 viewpoint:
Trump suit: ♠ (card: 9♠)
Hand: ['J♠', '8♣', '10♣', 'A♣', '8♦', '6♥']
Opponent has 6 cards
Deck has 24 cards remaining
Table: empty
Phase: ATTACK
Current player: 0
Legal actions: ['J♠', '8♣', '10♣', 'A♣', '8♦', '6♥']

Model's evaluation: 0.17% chance of winning
Model's chosen action: 8♣

Top actions by probability:
  1. 8♣: 100.00%
  2. J♠: 0.00%
  3. 10♣: 0.00%


--- Step 2 ---

Player 0 viewpoint:
Trump suit: ♠ (card: 9♠)
Hand: ['J♠', '10♣', 'A♣', '8♦', '6♥']
Opponent has 6 cards
Deck has 24 cards remaining
Table:
  1. 8♣ -> ?
Phase: ATTACK
Current player: 0
Legal actions: ['8♦', 'FINISH_ATTACK']

Model's evaluation: 0.26% chance of winning
Model's chosen action: 8♦

Top actions by probability:
  1. 8♦: 100.00%
  2. FINISH_ATTACK: 0.00%


--- Step 3 ---

Player 0 viewpoint:
Trump suit: ♠ (card: 9♠)
Hand: ['J♠', '10♣', 'A♣', '6♥']
Opponent has 6 cards
Deck has 24 cards remaining
Table:
  1. 8♣ -> ?
  2. 8♦ -> ?
Phase: ATTACK
Current player: 0
Legal
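
The cells above resolve chance nodes by always taking the first listed outcome, which is fine if the game exposes only one outcome per chance node, as the comment in Example 1 suggests. In OpenSpiel, `chance_outcomes()` returns a list of `(action, probability)` pairs, so a more general sketch samples according to those probabilities. The helper name `sample_chance_action` is hypothetical, and whether `DurakGame` ever returns more than one outcome is an assumption not confirmed here.

```python
import random
from typing import List, Optional, Tuple


def sample_chance_action(
    outcomes: List[Tuple[int, float]],
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one action from (action, probability) pairs, weighted by probability."""
    rng = rng or random.Random()
    actions, probs = zip(*outcomes)
    return rng.choices(actions, weights=probs, k=1)[0]


# With a single outcome this is the same as taking outcomes[0][0]
print(sample_chance_action([(7, 1.0)]))
```

Inside the loops above, `action = sample_chance_action(state.chance_outcomes())` would be a drop-in replacement for `outcomes[0][0]`.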

## Example 4: Interactive Play Against the Model

In [6]:
def play_interactive_game(network, device, human_player=1, mcts_simulations=200):
    """
    Interactive play mode where the human (you) makes moves against the model.
    
    Args:
        network: The trained neural network
        device: The device to run inference on
        human_player: Which player you want to be (0 or 1)
        mcts_simulations: Number of MCTS simulations for model moves
    """
    from src.durak.durak_game import DurakGame, ExtraAction
    
    game = DurakGame()
    state = game.new_initial_state()
    model_player = 1 - human_player
    
    # Handle chance node for initial shuffling and dealing
    while state.is_chance_node():
        outcomes = state.chance_outcomes()
        action = outcomes[0][0]
        state.apply_action(action)
    
    move_number = 1
    
    while not state.is_terminal():
        print(f"\n\n--- Move {move_number} ---")
        
        # Print state info from human's perspective
        print_state_info(state, human_player)
        
        if state.is_chance_node():
            print("Handling chance node...")
            outcomes = state.chance_outcomes()
            action = outcomes[0][0]
            state.apply_action(action)
            continue
        
        current_player = state.current_player()
        
        # Get model evaluation for current position
        _, _, win_prob = get_model_move(
            network, state, device=device, mcts_simulations=100, 
            player_perspective=human_player
        )
        # The value is inverted when it is the model's turn, so it is always shown from your side
        human_win_prob = 1 - win_prob if current_player == model_player else win_prob
        print(f"\nModel thinks your win probability is: {human_win_prob:.2%}")
        
        if current_player == human_player:
            # Human's turn
            legal_actions = state.legal_actions()
            if not legal_actions:
                print("No legal actions available!")
                break
                
            print("\nYour legal moves:")
            for idx, action in enumerate(legal_actions):
                print(f"  {idx}: {action_to_readable(action)}")
                
            # Get user input
            while True:
                try:
                    choice = int(input("\nEnter the number of your chosen move: "))
                    if 0 <= choice < len(legal_actions):
                        break
                    else:
                        print("Invalid choice, try again.")
                except ValueError:
                    print("Please enter a valid number.")
            
            chosen_action = legal_actions[choice]
            print(f"\nYou chose: {action_to_readable(chosen_action)}")
            
        else:
            # Model's turn
            print("\nModel is thinking...")
            chosen_action, policy, _ = get_model_move(
                network, state, device=device, mcts_simulations=mcts_simulations, use_argmax=True
            )
            print(f"Model chooses: {action_to_readable(chosen_action)}")
            
            # Print top alternatives the model considered
            print("\nTop alternatives the model considered:")
            sorted_actions = sorted(policy.items(), key=lambda x: x[1], reverse=True)
            for act, prob in sorted_actions[1:3]:  # Skip the chosen action, which should be first
                print(f"  {action_to_readable(act)}: {prob:.2%}")
        
        # Apply the action
        state.apply_action(chosen_action)
        move_number += 1
        
        # Check if terminal after applying action
        if state.is_terminal():
            break
    
    # Game over
    print("\n=== Game Over ===")
    returns = state.returns()
    
    if returns[human_player] > 0:
        print("You win! 🎉")
    elif returns[human_player] < 0:
        print("You lose! 😢")
    else:
        print("It's a draw! 🤝")

Start an interactive game. Uncomment to play:

In [7]:
#human_player = 1  # Change to 0 if you want to play as player 0
#play_interactive_game(network, device, human_player=human_player)
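
The input loop in `play_interactive_game` mixes reading input with validating it. One way to make the validation testable without a keyboard is to pull it into a small pure function. This is only a sketch; `parse_choice` is a hypothetical helper, not part of the project.

```python
from typing import Optional


def parse_choice(text: str, num_options: int) -> Optional[int]:
    """Return the chosen index if text is an integer in [0, num_options), else None."""
    try:
        choice = int(text.strip())
    except ValueError:
        return None  # not a number
    return choice if 0 <= choice < num_options else None


# Examples with four legal moves
for raw in ["2", " 0 ", "4", "-1", "take"]:
    print(repr(raw), "->", parse_choice(raw, num_options=4))
```

The interactive loop could then keep calling `input()` until `parse_choice` returns something other than `None`.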

## Example 5: Simulate a Full Game Between Model and Rule Agent

In [8]:
def simulate_game_with_details(network, device, model_player=0, mcts_simulations=200):
    """
    Simulates a complete game between the model and a rule agent,
    printing detailed information at each step.
    
    Args:
        network: The trained neural network
        device: The device to run inference on
        model_player: Which player the model plays as (0 or 1)
        mcts_simulations: Number of MCTS simulations for model moves
    """
    from src.durak.durak_game import DurakGame
    from src.evaluation.rule_agent import RuleAgent
    
    game = DurakGame()
    state = game.new_initial_state()
    rule_agent = RuleAgent()  # Create rule-based agent
    
    # Handle chance node for initial shuffling and dealing
    while state.is_chance_node():
        outcomes = state.chance_outcomes()
        action = outcomes[0][0]
        state.apply_action(action)
    
    move_number = 1
    
    print("\n=== Game Start ===")
    
    while not state.is_terminal():
        print(f"\n\n--- Move {move_number} ---")
        
        if state.is_chance_node():
            print("Handling chance node...")
            outcomes = state.chance_outcomes()
            action = outcomes[0][0]
            state.apply_action(action)
            continue
        
        current_player = state.current_player()
        print(f"Current player: {'Model' if current_player == model_player else 'Rule Agent'}")
        
        # Print the state from the current player's perspective
        print_state_info(state, current_player)
        
        # Get model evaluation
        _, _, win_prob = get_model_move(
            network, state, device=device, mcts_simulations=50
        )
        print(f"\nModel evaluation: {win_prob:.2%} chance model will win")
        
        if current_player == model_player:
            # Model's turn
            chosen_action, policy, _ = get_model_move(
                network, state, device=device, mcts_simulations=mcts_simulations, use_argmax=True
            )
            print(f"\nModel chooses: {action_to_readable(chosen_action)}")
            
            # Print top alternatives
            print("Top alternatives considered:")
            sorted_actions = sorted(policy.items(), key=lambda x: x[1], reverse=True)
            for act, prob in sorted_actions[1:4]:  # Next 3 alternatives
                print(f"  {action_to_readable(act)}: {prob:.2%}")
        else:
            # Rule agent's turn
            chosen_action = rule_agent.step(state)
            print(f"\nRule agent chooses: {action_to_readable(chosen_action)}")
        
        # Apply the action
        state.apply_action(chosen_action)
        move_number += 1
        
        # Check if terminal after applying action
        if state.is_terminal():
            break
    
    # Game over
    print("\n=== Game Over ===")
    returns = state.returns()
    
    if returns[model_player] > 0:
        print("Model wins! 🎮")
    elif returns[model_player] < 0:
        print("Rule agent wins! 🤖")
    else:
        print("It's a draw! 🤝")
    
    print(f"Final score - Model: {returns[model_player]}, Rule Agent: {returns[1-model_player]}")

Run a simulated game:

In [22]:
model_player = 0  # Change to 1 if you want the model to play as player 1
simulate_game_with_details(network, device, model_player=model_player)


=== Game Start ===


--- Move 1 ---
Current player: Rule Agent

Player 1 viewpoint:
Trump suit: ♠ (card: A♠)
Hand: ['J♠', 'Q♣', 'A♣', '9♦', '10♦', 'Q♦']
Opponent has 6 cards
Deck has 24 cards remaining
Table: empty
Phase: ATTACK
Current player: 1
Legal actions: ['J♠', 'Q♣', 'A♣', '9♦', '10♦', 'Q♦']

Model evaluation: 1.01% chance model will win

Rule agent chooses: 9♦


--- Move 2 ---
Current player: Rule Agent

Player 1 viewpoint:
Trump suit: ♠ (card: A♠)
Hand: ['J♠', 'Q♣', 'A♣', '10♦', 'Q♦']
Opponent has 6 cards
Deck has 24 cards remaining
Table:
  1. 9♦ -> ?
Phase: ATTACK
Current player: 1
Legal actions: ['FINISH_ATTACK']

Model evaluation: 1.14% chance model will win

Rule agent chooses: FINISH_ATTACK


--- Move 3 ---
Current player: Model

Player 0 viewpoint:
Trump suit: ♠ (card: A♠)
Hand: ['K♠', '10♣', 'K♣', '7♦', '8♦', '9♥']
Opponent has 5 cards
Deck has 24 cards remaining
Table:
  1. 9♦ -> ?
Phase: DEFENSE
Current player: 0
Legal actions: ['K♠', 'TAKE_CARDS']

Model evaluation

## Bonus: Quick Evaluation Function

Let's also add a utility function to quickly evaluate the model's strength:

In [10]:
def evaluate_model_strength(network, device, num_games=10, mcts_simulations=100):
    """
    Evaluates model strength by playing multiple games against the rule agent
    and returns win rate statistics.
    """
    from src.evaluation.evaluator import evaluate_model_vs_rule_agent
    
    print("Evaluating model strength against rule agent...")
    
    # Evaluate as player 0
    win_rate_p0 = evaluate_model_vs_rule_agent(
        network=network,
        device=device,
        num_games=num_games,
        model_player=0,
        mcts_simulations=mcts_simulations,
        use_argmax=True
    )
    
    # Evaluate as player 1
    win_rate_p1 = evaluate_model_vs_rule_agent(
        network=network,
        device=device,
        num_games=num_games,
        model_player=1,
        mcts_simulations=mcts_simulations,
        use_argmax=True
    )
    
    print(f"\nResults from {num_games} games as each player:")
    print(f"- Win rate as player 0 (first player): {win_rate_p0:.2%}")
    print(f"- Win rate as player 1 (second player): {win_rate_p1:.2%}")
    print(f"- Average win rate: {(win_rate_p0 + win_rate_p1) / 2:.2%}")
    
    return win_rate_p0, win_rate_p1

In [11]:
evaluate_model_strength(network, device, num_games=5)

Evaluating model strength against rule agent...

Results from 5 games as each player:
- Win rate as player 0 (first player): 20.00%
- Win rate as player 1 (second player): 20.00%
- Average win rate: 20.00%


(0.2, 0.2)
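
A win rate measured over 5 games is a very noisy estimate. As a rough illustration, the sketch below computes a Wilson score interval for a binomial win rate; a 20% win rate over 5 games corresponds to 1 win. The helper `wilson_interval` is hypothetical and not part of the project's evaluator, and it assumes games are independent with no draws.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z=1.96 gives roughly 95% coverage)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z ** 2 / games
    center = (p + z ** 2 / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z ** 2 / (4 * games ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(wins=1, games=5)
print(f"1 win in 5 games: 95% interval roughly {low:.1%} to {high:.1%}")
```

The interval is wide at this sample size, so a larger `num_games` is worth the extra runtime before drawing conclusions about model strength.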