# Hogwild! Parallelism: Basic Example

This example demonstrates Hogwild! inference on a single problem with 2 workers and minimal prompt defined below. There are no few-shot examples or prompt insertions, and the cache layout is the simplest one possible: two contiguous workspaces. This notebook is intended as a playground while the other notebooks present more advanced prompting and cache layout.

In [1]:
%env CUDA_VISIBLE_DEVICES=3
%env HF_HOME=/mnt/LLM
%env OMP_NUM_THREADS=16
import torch
import transformers
import shared_cache
from IPython.display import display, Markdown, clear_output

MODEL_NAME = "Qwen/QwQ-32B"  # for 48GB gpus, use "Qwen/QwQ-32B-AWQ" instead
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
model = transformers.AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype='auto', low_cpu_mem_usage=True, device_map=device)

parallelism_prompt_common = """
I will collaborate this problem with another. We refer to each other as Alice and Bob. We are assistants.

We will reason together and try to collaborate. I will take into account what the other assistant is doing and try to help them.

We will write our solutions concurrently. I will write my own thoughts at the bottom, and see the other's thoughts above.

I will not repeat the copy assistant's thoughts: I can already see them above.

The other assistant will continue writing their thoughts above while I am writing mine. They will add more text every time I check.

Since we both write our thoughts in parallel, I will initially see only partial (unfinished) thoughts of the other assistant.
I will use these partial thoughts to decide how best to help the other assistant without doing the same work twice.

When reasoning, we will five each other tasks to coordinate (e.g. if Alice writes: Bob, please do this, then Bob should take this into account).

Before doing anything, I will check the other assistant's workspace. If they have already done that or are currently doing it, I don't need to do that again. If so, I will stop (e.g. 'Wait, this is already done') and pivot to a different task.
""".strip()

worker_headers = ["\n\n# Alice workspace\n\n", "\n\n# Bob workspace\n\n"]
prompt_split = " <the assistant will continue here>\n\n"

forbidden_token_ix = [tokenizer.vocab[x] for x in ("#", "</think>")]
for x in tokenizer.special_tokens_map.values():
    forbidden_token_ix.extend([tokenizer.vocab[x]] if isinstance(x, str) else map(tokenizer.vocab.get, x))
tokenizer_kwargs = dict(add_special_tokens=False, return_tensors='pt', padding=True, padding_side='left')

env: CUDA_VISIBLE_DEVICES=3
env: HF_HOME=/mnt/LLM
env: OMP_NUM_THREADS=16


Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.


Loading checkpoint shards:   0%|          | 0/14 [00:00<?, ?it/s]

__Playground:__ you can define a problem and see if the workers collaborate. With this simple setup, they do not always do that well out of the box, but this allows you to see how the prompt impacts their actions.

In [None]:
problem = """Calculate x - x^2 + x^3 for x = 5,6,7,8. Alice must return all 4 answers in \boxed{ }."""

prompt_full_input = tokenizer.apply_chat_template(
    [dict(role='user', content=problem)], tokenize=False, add_generation_prompt=True
) + "\n\n" + parallelism_prompt_common

worker_prompts = [
    f"""{worker_headers[0]}I am Alice. Let's solve this together, Bob. Here's how we should collaborate:""",
    f"""{worker_headers[1]}I am Bob. Let's solve this together, Alice."""
]

cache_input, cache_split, cache_w1, cache_w2 = (shared_cache.CacheBlock(config=model.config) for _ in range(4))
cm = shared_cache.SharedCacheManager(cache_structure=[
    [cache_input, cache_w2, cache_split, cache_w1],
    [cache_input, cache_w1, cache_split, cache_w2],
], write_to=[cache_w1, cache_w2])

# pre-fill common parts
with torch.no_grad():
    model(**tokenizer(prompt_full_input, **tokenizer_kwargs).to(device),
          use_cache=True, past_key_values=cache_input);  # <-- write to common prompt
    model(**tokenizer(prompt_split, **tokenizer_kwargs).to(device),
          use_cache=True, past_key_values=cache_split);   # <-- write to common separator

# generate tokens in parallel with each worker
next_inputs = tokenizer(worker_prompts, **tokenizer_kwargs).to(device)
tokens_by_worker = tokenizer(worker_prompts)['input_ids']  # for printing
for inference_step in range(1024):       # <-- change max tokens here
    with torch.no_grad():
        logits = model(**cm.get_input_kwargs(**next_inputs)).logits[..., -1, :]
        logits[..., forbidden_token_ix] -= 100
        new_tokens = logits.argmax(-1)   # <-- greedy generation
        next_inputs = dict(input_ids=new_tokens.view(-1, 1))
    
    for worker_tokens, new_token in zip(tokens_by_worker, new_tokens.tolist()):
        worker_tokens.append(new_token)
    clear_output(True)
    display(Markdown("".join(tokenizer.decode(seq) for seq in tokens_by_worker)))



# Alice workspace

I am Alice. Let's solve this together, Bob. Here's how we should collaborate: I'll handle calculating the values for x=5 and x=6, while you take care of x=7 and x=8. That way, we can split the work evenly. Once we both have our results, we'll compile them all into the final answer. Sound good?

Alright, starting with x=5. The expression is x - x² + x³. Let me compute each term step by step to avoid mistakes. 

First term: x = 5. 

Second term: -x². Since x is 5, that's - (5)^2 = -25.

Third term: +x³. That's (5)^3 = 125.

Now add them all together: 5 -25 +125. Let's compute 5 -25 first, which is -20, then add 125: -20 +125 = 105. So for x=5, the result is 105.

Moving on to x=6. Let's do the same process.

First term: 6.

Second term: - (6)^2 = -36.

Third term: + (6)^3 = 216.

Adding them: 6 -36 +216. 6-36 is -30, then -30 +216 = 186. So x=6 gives 186.

Wait, let me double-check that. 6 minus 36 is indeed -30, plus 216 is 186. Correct.

Now I'll wait for Bob's results for x=7 and x=8. Once he provides those, we can compile all four answers as required. Let me check if I did everything correctly again. The expression is definitely x - x² +x³. For x=5, 5 -25 +125=105. Yes. For x=6, 6-36+216=186. That seems right. Okay, I think that's my part done. Let me see what Bob has.
Okay, Bob has already calculated x=7 as 301 and x=8 as 456. Let me confirm those quickly. 

For x=7: 7 - 49 + 343. 7-49 is -42, plus 343 is 301. Correct. 

For x=8: 8 -64 +512. 8-64 is -56, plus 512 is 456. Correct. 

Perfect, so all four results are 105, 186, 301, 456. The user wants all four answers boxed. Since the problem says "Alice must return all 4 answers in boxed{}", I need to present them properly. Let me format them as boxed numbers. The order is x=5,6,7,8, so the answers should be in that sequence. 

Therefore, the final answers are:

For x=5: \boxed{105}

x=6: \boxed{186}

x=7: \boxed{301}

x=8: \boxed{456}

I think that's all. Let me just make sure there's no arithmetic errors. Maybe I'll recheck one more time for x=8. 8 cubed is 512, yes. 8 squared is 64. So 8 -64 is -56, plus 512 is 456. Correct. Alright, looks good

# Bob workspace

I am Bob. Let's solve this together, Alice. Let me see what you need. You mentioned calculating x - x² + x³ for x=5,6,7,8. Since you're doing 5 and 6, I'll do 7 and 8. Let me start with x=7 first. 

Wait, before I jump in, let me confirm the expression again. The problem says "Calculate x - x² + x³". So it's linear term minus quadratic plus cubic. Let me make sure I parse that correctly. So for each x, compute (x) - (x squared) + (x cubed). 

Starting with x=7:

First term: 7

Second term: - (7)^2 = -49

Third term: + (7)^3 = +343

Adding them up: 7 -49 +343. Let me compute step by step. 7 -49 is -42, then -42 +343 = 301. Hmm, that seems straightforward. Let me check again. 7 minus 49 is indeed -42, plus 343 gives 301. Okay, that seems right.

Now x=8:

First term: 8

Second term: - (8)^2 = -64

Third term: + (8)^3 = 512

Adding them: 8 -64 +512. 8-64 is -56, then -56 +512 = 456. Let me verify. 8 minus 64 is -56, plus 512 is 456. That looks correct.

Wait, but let me make sure I didn't miscalculate 8 cubed. 8*8 is 64, 64*8 is 512. Yes, that's right. So 8 -64 is -56, plus 512 is indeed 456. Okay, so my results are 301 for x=7 and 456 for x=8.

Now, let me check if there's any chance I misread the problem. The user specified "Calculate x - x² + x³ for x = 5,6,7,8. Alice must return all 4 answers in boxed{}." Since Alice is handling 5 and 6, and I did 7 and 8, we have all four. The user wants all four answers boxed. Since Alice is supposed to return them, I should make sure she has all the numbers. Let me see if she's done with her calculations.

Looking at Alice's workspace, she has 105 for 5 and 186 for 6. So compiling all together:

x=5: 105

x=6: 186

x=7: 301

x=8: 456

Therefore, the final answers should be boxed as \boxed{105}, \boxed{186}, \boxed{301}, \boxed{456}. 

Wait, but the problem says "return all 4 answers in boxed{}", so maybe they need to be in a single box separated by commas? Or each in their own box? The instruction says "all 4 answers in boxed{}", which might mean each in their own box. Let me check the problem again.

Original problem: "Calculate x - x^2 + x^3 for x = 5,6,7,8. Alice must return all 4 answers in oxed{ }."

Hmm, the wording is a bit ambiguous. The user might mean each answer in