# Welcome to Modal notebooks!

Write Python code and collaborate in real time. Your code runs in Modal's
**serverless cloud**, and anyone in the same workspace can join.

This notebook comes with some common Python libraries installed. Run
cells with `Shift+Enter`.

In [1]:
# Keep uv itself current (uses the kernel's python)
!uv pip install --upgrade -q uv --python=$(which python)

# (Optional) pin numpy to the version already installed so the core install below doesn't change it
try:
    import numpy
    get_numpy = f"numpy=={numpy.__version__}"
except Exception:
    get_numpy = "numpy"

# PyTorch nightly (CUDA 12.8): install into the kernel's env
!uv pip install --upgrade --pre --python=$(which python) \
  torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128

# Core stack (into the kernel's env)
!uv pip install --python=$(which python) \
  "torch>=2.8.0" "triton>=3.4.0" $get_numpy torchvision bitsandbytes \
  "transformers>=4.55.3" \
  "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" \
  "unsloth[base] @ git+https://github.com/unslothai/unsloth" \
  git+https://github.com/triton-lang/triton.git@05b2c186c1b6c9a08375389d5efe9cb4c401c075#subdirectory=python/triton_kernels

# Pin transformers and upgrade tokenizers without touching their dependencies (still the kernel's env)
!uv pip install --upgrade --no-deps --python=$(which python) transformers==4.56.2 tokenizers

# trl without deps (to kernel's env)
!uv pip install --no-deps --python=$(which python) trl==0.22.2


Using Python 3.12.6 environment at: /usr/local
Resolving dependencies...
torch==2.10.0.dev20250929+cu128
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cuda

In [2]:
!uv pip list -v --python=$(which python) | head -20

DEBUG uv 0.8.22
DEBUG Acquired shared lock for `/root/.cache/uv`
DEBUG Checking for Python interpreter at path `/usr/local/bin/python`
DEBUG Released lock at `/root/.cache/uv/.lock`
Using Python 3.12.6 environment at: /usr/local
Package                   Version
------------------------- ------------------------
absl-py                   2.3.1
accelerate                1.10.1
aiofiles                  24.1.0
aiohappyeyeballs          2.4.3
aiohttp                   3.10.8
aiosignal                 1.3.1
altair                    5.5.0
annotated-types           0.7.0
anthropic                 0.66.0
anyio                     4.10.0
asttokens                 3.0.0
asyncpg                   0.30.0
attrs                     24.2.0
authlib                   1.6.3
awscrt                    0.27.6
basedpyright              1.31.4
beautifulsoup4            4.13.5
bitsandbytes              0.47.0


In [3]:
import sys
print(sys.executable)


/usr/local/bin/python
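
Because the install cell above mixes a nightly torch build, git checkouts, and `--no-deps` pins, it can help to confirm which versions the kernel actually resolves. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the choice of packages to check are illustrative assumptions, not part of the original notebook.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names taken from the install cell above.
checked = installed_versions(
    ["torch", "triton", "transformers", "tokenizers", "trl", "unsloth", "bitsandbytes"]
)
for name, version in checked.items():
    print(f"{name:<14} {version or 'not installed'}")
```

Reading the versions through `importlib.metadata` queries the same environment the kernel imports from, so it complements the `sys.executable` check without importing the heavy packages themselves.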


In [4]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 127000 # Can increase for longer RL output
lora_rank = 4 # Larger rank = smarter, but slower
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gpt-oss-20b",
    max_seq_length = max_seq_length,
    load_in_4bit = True, # False for LoRA 16bit
    offload_embedding = True, # Reduces VRAM by 1GB
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.9.9: Fast Gpt_Oss patching. Transformers: 4.56.2.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.494 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.10.0.dev20250929+cu128. CUDA: 8.0. CUDA Toolkit: 12.8. Triton: 3.5.0
\        /    Bfloat16 = TRUE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Unsloth: Offloading embeddings to RAM to save 1.08 GB.
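
The system prompt in the next cell asks the model to answer with comma-separated commands such as `guess 1a=red` and `delete 1a`. To check a reply programmatically, that format has to be parsed back into per-clue answers. The following is a minimal sketch, assuming case-insensitive matching and that later commands override earlier ones; `parse_guesses` and its regular expression are hypothetical helpers, not part of Unsloth or the original notebook.

```python
import re

# Matches "guess 1a=red" or "delete 1a"; a clue id is a number followed by a or d.
_COMMAND = re.compile(
    r"\b(?:guess\s+(\d+[ad])\s*=\s*([a-z]+)|delete\s+(\d+[ad]))\b",
    re.IGNORECASE,
)


def parse_guesses(response):
    """Return {clue_id: answer} after applying guesses and deletes in order."""
    answers = {}
    for match in _COMMAND.finditer(response):
        guess_id, word, delete_id = match.groups()
        if guess_id:
            answers[guess_id.lower()] = word.lower()
        else:
            # A delete for a clue that was never guessed is ignored.
            answers.pop(delete_id.lower(), None)
    return answers


print(parse_guesses("guess 1a=red, guess 2d=blue, delete 1a"))  # prints {'2d': 'blue'}
```

Scanning with `finditer` rather than splitting on commas lets the parser skip any surrounding prose in the model's final message and pick out only well-formed commands.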


In [5]:
messages = [
  {
    "role": "system",
    "content": """You're a crossword expert.
I will provide you with a 5x5 mini crossword and you should solve the entire puzzle in one go.

## Response format:
Provide all your guesses in a single message using the format: "guess 1a=red"
You can provide multiple guesses separated by commas, like: "guess 1a=red, guess 2d=blue, guess 3a=green"
You can also delete guesses you believe to be incorrect using "delete 1a"

DO NOT try to call a tool, simply respond with the response format. This is a multi-turn conversation.

## Important:
Try to solve the entire puzzle at once. Analyze all the clues together and provide your complete solution.
Think about how the across and down clues intersect and use those intersections to validate your answers.
Provide ALL your answers in ONE message to solve the puzzle as efficiently as possible."""
  },
  {
    "role": "user",
    "content": """# Crossword Puzzle Serialization
## Grid (5x5)
Legend:
- `black` = black square
- Number = clue label for the cell
- `.` = empty white square without a label
- Letter = filled entry
Grid layout:
Row1: col1 black, col2 01, col3 02, col4 03, col5 black.
Row2: col1 04, col2 ., col3 ., col4 ., col5 05.
Row3: col1 06, col2 ., col3 ., col4 ., col5 .
Row4: col1 07, col2 ., col3 ., col4 ., col5 .
Row5: col1 black, col2 08, col3 ., col4 ., col5 black.

## Clues

### Across
1A: Key above Caps Lock (3 letters)
4A: Biased sports fan (5 letters)
6A: What puts the "i" in Silicon Valley? (5 letters)
7A: Triangular road sign (5 letters)
8A: Items in a music library, for short (3 letters)

### Down
1D: Conversation subject (5 letters)
2D: Pumped up (5 letters)
3D: "Silver ___" (Christmas classic) (5 letters)
4D: Farm fodder (3 letters)
5D: Like pants in the classic Nantucket style (3 letters)"""
  }
]


text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True,
    reasoning_effort = "low",
)

from transformers import TextStreamer
import time

# Synchronize before and after so the timer covers all GPU work for this generation.
torch.cuda.synchronize()
t0 = time.perf_counter()
res = model.generate(
    **tokenizer(text, return_tensors = "pt").to("cuda"),
    temperature = 1.0,
    max_new_tokens = 100000,
    # skip_prompt = False streams the rendered prompt as well as the reply.
    streamer = TextStreamer(tokenizer, skip_prompt = False),
)
torch.cuda.synchronize()
t1 = time.perf_counter()
decoded = tokenizer.decode(res[0], skip_special_tokens=True)
print(decoded)
print(f"Took {t1-t0}")  # elapsed wall-clock time in seconds

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-09-30

Reasoning: low

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

You're a crossword expert.
I will provide you with a 5x5 mini crossword and you should solve the entire puzzle in one go.

## Response format:
Provide all your guesses in a single message using the format: "guess 1a=red"
You can provide multiple guesses separated by commas, like: "guess 1a=red, guess 2d=blue, guess 3a=green"
You can also delete guesses you believe to be incorrect using "delete 1a"

DO NOT try to call a tool, simply respond with the response format. This is a multi-turn conversation.

## Important:
Try to solve the entire puzzle at once. Analyze all the clues together and provide your complete solution.
Think abo