# Rollout Datasets

Each model trained on non-deterministic A* search is evaluated by prompting the model with a test prompt and then randomly generating 64 response token sequences.
The resulting rollouts are available as a [MongoDB archive](https://dl.fbaipublicfiles.com/searchformer/rolloutDB.gz) that can be imported with [`mongorestore`](https://www.mongodb.com/docs/database-tools/mongorestore/).
To download and import the archive, run

```
wget https://dl.fbaipublicfiles.com/searchformer/rolloutDB.gz
mongorestore --gzip --archive=rolloutDB.gz
```
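
To confirm that the import succeeded, one option is to count the documents in each collection of the restored database. The sketch below is a minimal example, not part of the `searchformer` package: the helper name `collection_counts` is hypothetical, and the database name `rolloutDB` and the address `localhost:27017` are assumptions taken from the notebook output in this document.

```python
from typing import Any, Dict


def collection_counts(client: Any, db_name: str = "rolloutDB") -> Dict[str, int]:
    """Return an approximate document count for every collection in db_name.

    `client` is expected to behave like a pymongo.MongoClient. The default
    database name is an assumption based on the notebook output.
    """
    db = client[db_name]
    return {
        name: db[name].estimated_document_count()
        for name in db.list_collection_names()
    }


# Usage against a local MongoDB instance (requires pymongo and a running server):
#   from pymongo import MongoClient
#   print(collection_counts(MongoClient("mongodb://localhost:27017")))
```

An empty dictionary would suggest that the archive was restored into a different database or that the import did not run.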

After importing, the included rollout datasets can be listed with the following code.

In [7]:
import sys
# Machine-specific path: point this at the local checkout of the searchformer repository.
sys.path.append("/mnt/d/work/searchformer/")

import logging
from searchformer.rollout import RolloutDataStore


logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s - %(asctime)s - %(name)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)

datastore = RolloutDataStore()


In [8]:
datastore.list_all()

INFO - 2025-05-06 08:25:09 - root - Connecting to mongodb://localhost:27017/mongo

| | _id | checkpoint_id | dataset_name | sampler_name | rollout_len | rollout_repeats | prefix_len | min_reasoning_len | max_reasoning_len |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 65ca57d67f455f390d05bf33 | sokoban-7722-m-trace-plan-100k-2-step-2 | sokoban.7-by-7-walls-2-boxes-2.with-box-40k | probability | 11000 | 32 | 0 | 0 | 10000 |



In [9]:
datastore.dataset_collection

Collection(Database(MongoClient(host=['localhost:27017'], document_class=dict, tz_aware=False, connect=True, sockettimeoutms=1800000, connecttimeoutms=1800000), 'rolloutDB'), 'dataset')

In [10]:
datastore.list_all().sort_values(by=["checkpoint_id"])


| | _id | checkpoint_id | dataset_name | sampler_name | rollout_len | rollout_repeats | prefix_len | min_reasoning_len | max_reasoning_len |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 65ca57d67f455f390d05bf33 | sokoban-7722-m-trace-plan-100k-2-step-2 | sokoban.7-by-7-walls-2-boxes-2.with-box-40k | probability | 11000 | 32 | 0 | 0 | 10000 |


## Loading generated response sequences

A rollout dataset can be loaded by its id (the `_id` column in the dataframe shown above).
One can then iterate over the stored rollouts to load each generated response sequence.

In [13]:
dataset = datastore.load_by_id("65ca57d67f455f390d05bf33")


In [None]:
rollout = next(iter(dataset.rollout_test_it()))

print("Dataset parameters:")
for k, v in dataset.params.to_doc().items():
    print(f"\t{k}: {v}")
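
Calling `next(iter(...))` on an iterator that yields nothing raises `StopIteration`. A minimal sketch of a guard is shown below; the helper `first_or_none` is hypothetical and not part of `searchformer`. It relies only on the built-in two-argument form of `next`, which returns a default instead of raising.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(items: Iterable[T]) -> Optional[T]:
    """Return the first element of `items`, or None if it yields nothing."""
    return next(iter(items), None)


# Usage with the dataset loaded above (assumes `dataset` is already defined):
#   rollout = first_or_none(dataset.rollout_test_it())
#   if rollout is None:
#       print("No test rollouts stored for this dataset id.")
```

Checking for `None` before formatting the prompt and response makes an empty dataset visible as an explicit message rather than an exception.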


In [3]:
prompt_str = " ".join(rollout.prompt).replace("start", "\n\tstart").replace("goal", "\n\tgoal ").replace("wall", "\n\twall ")
response_str = " ".join(rollout.rollouts[0]).replace("bos ", "\n\tbos").replace("eos", "\n\teos").replace("create", "\n\tcreate").replace("close", "\n\tclose ").replace("plan ", "\n\tplan   ")

print("prompt: " + prompt_str)
print("response: " + response_str)

prompt: 
	start 3 6 
	goal  4 2 
	wall  0 0 
	wall  3 0 
	wall  4 0 
	wall  2 1 
	wall  4 1 
	wall  5 1 
	wall  9 1 
	wall  0 2 
	wall  1 2 
	wall  2 2 
	wall  6 2 
	wall  7 2 
	wall  5 3 
	wall  6 3 
	wall  7 3 
	wall  8 3 
	wall  9 3 
	wall  1 4 
	wall  2 4 
	wall  3 4 
	wall  9 4 
	wall  3 5 
	wall  4 5 
	wall  6 5 
	wall  5 6 
	wall  6 6 
	wall  9 6 
	wall  0 7 
	wall  2 7 
	wall  4 7 
	wall  6 8 
	wall  9 8 
	wall  2 9 
	wall  3 9 
	wall  4 9 
	wall  6 9 
	wall  8 9
response: 
	bos
	create 3 6 c0 c5 
	close  3 6 c0 c5 
	create 3 7 c1 c6 
	create 4 6 c1 c4 
	create 2 6 c1 c6 
	close  4 6 c1 c4 
	close  3 7 c1 c6 
	create 3 8 c2 c7 
	close  2 6 c1 c6 
	create 2 5 c2 c5 
	create 1 6 c2 c7 
	close  2 5 c2 c5 
	create 1 5 c3 c6 
	close  1 5 c3 c6 
	create 0 5 c4 c7 
	close  1 6 c2 c7 
	create 1 7 c3 c8 
	create 0 6 c3 c8 
	close  3 8 c2 c7 
	create 2 8 c3 c8 
	create 4 8 c3 c6 
	close  4 8 c3 c6 
	create 5 8 c4 c7 
	close  5 8 c4 c7 
	create 5 9 c5 c8 
	create 5 7 c5 c6 
	close  5 7 c5
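
For further analysis it can help to turn the flat token sequence into structured records. The sketch below is a hypothetical helper, not part of `searchformer`. It assumes that each `create` or `close` token is followed by two coordinates and two `c`-prefixed integers, and that each `plan` token is followed by two coordinates. Reading the first `c` value as the cost from the start and the second as the heuristic is also an assumption, although it is consistent with the sample above, where the second value equals the Manhattan distance to the goal. Incomplete trailing steps, such as the last line of the printed response, are dropped.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TraceStep:
    op: str         # "create" or "close"
    x: int
    y: int
    cost: int       # assumed: cost accumulated from the start node
    heuristic: int  # assumed: heuristic estimate to the goal


def parse_response(
    tokens: Sequence[str],
) -> Tuple[List[TraceStep], List[Tuple[int, int]]]:
    """Split a response token sequence into trace steps and plan coordinates."""
    steps: List[TraceStep] = []
    plan: List[Tuple[int, int]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("create", "close"):
            if i + 5 > len(tokens):
                break  # truncated step at the end of the sequence
            x, y = int(tokens[i + 1]), int(tokens[i + 2])
            cost = int(tokens[i + 3][1:])       # strip the leading "c"
            heuristic = int(tokens[i + 4][1:])  # strip the leading "c"
            steps.append(TraceStep(tok, x, y, cost, heuristic))
            i += 5
        elif tok == "plan":
            if i + 3 > len(tokens):
                break  # truncated plan entry
            plan.append((int(tokens[i + 1]), int(tokens[i + 2])))
            i += 3
        else:
            i += 1  # skip "bos", "eos", and any unrecognized token
    return steps, plan


# Example on a short excerpt of the response printed above.
sample = "bos create 3 6 c0 c5 close 3 6 c0 c5 create 3 7 c1 c6 close 5 7 c5".split()
steps, plan = parse_response(sample)
for step in steps:
    print(step)
```

With a rollout loaded as above, the same function could be applied to `rollout.rollouts[0]` directly, since that object is already a sequence of string tokens.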