Based on the paper [Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach](https://arxiv.org/pdf/2502.05171), the Huginn-0125 model uses a recurrent block of layers that can be iterated a variable number of times at inference time.

Let's look at exactly what this means:

![image](https://i.imgur.com/pTSofbN.png)

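So at inference time we can choose how many times the recurrent block (the green boxes in the figure) is applied. In this notebook that choice is the `num_steps` argument passed to `model.generate`.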
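To make the figure concrete, here is a toy pure-Python sketch of a recurrent-depth forward pass as we read the paper. A prelude embeds the input once, the latent state starts from random noise, the same core block is applied `num_steps` times with the embedded input re-injected at every step, and a coda decodes the final state. All names (`recurrent_depth_forward`, `toy_adapter`, `identity`) are hypothetical, and the toy "core" is a linear contraction chosen so the loop visibly converges; this is not Huginn's implementation. Concatenating the embedded input with the state and projecting it back mirrors the printed `adapter: Linear(in_features=10560, out_features=5280)` in the module dump, since 10560 = 2 × 5280.

```python
import random
from typing import Callable, List

Vector = List[float]


def recurrent_depth_forward(
    x: Vector,
    prelude: Callable[[Vector], Vector],
    adapter: Callable[[Vector], Vector],
    core: Callable[[Vector], Vector],
    coda: Callable[[Vector], Vector],
    num_steps: int,
    seed: int = 0,
) -> Vector:
    """Toy recurrent-depth forward pass (illustrative sketch, not Huginn's code)."""
    e = prelude(x)                        # embed the input once
    rng = random.Random(seed)
    s = [rng.gauss(0.0, 1.0) for _ in e]  # random initial latent state
    for _ in range(num_steps):            # the "green boxes": same weights every step
        s = core(adapter(e + s))          # list concat [e; s], projected back to len(e)
    return coda(s)                        # decode only the final latent state


# Hypothetical toy components so the sketch runs end to end.
def identity(v: Vector) -> Vector:
    return list(v)


def toy_adapter(z: Vector) -> Vector:
    d = len(z) // 2
    e, s = z[:d], z[d:]
    # Contraction s <- e + 0.5 * s, whose fixed point is s = 2e.
    return [ei + 0.5 * si for ei, si in zip(e, s)]


if __name__ == "__main__":
    x = [1.0, -2.0]
    for n in (4, 8, 16):
        print(n, recurrent_depth_forward(x, identity, toy_adapter, identity, identity, n))
```

With this toy contraction the distance to the fixed point halves on every iteration, so 16 steps land much closer to `2 * x` than 4 steps do. The paper's premise is analogous: more iterations spend more compute refining the latent state before anything is decoded.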

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig

# Huginn ships custom modeling code (RavenForCausalLM), so trust_remote_code=True is required.
model = AutoModelForCausalLM.from_pretrained("tomg-group-umd/huginn-0125", torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tomg-group-umd/huginn-0125")


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
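model.eval()
# Greedy decoding: do_sample=False and all sampling parameters explicitly unset.
# stop_strings only works if the tokenizer is passed to generate(), as done in the cells below.
config = GenerationConfig(max_length=256, stop_strings=["<|end_text|>", "<|end_turn|>"],
                          use_cache=True,
                          do_sample=False, temperature=None, top_k=None, top_p=None, min_p=None,
                          return_dict_in_generate=True,
                          eos_token_id=65505, bos_token_id=65504, pad_token_id=65509)
model.to(device)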

RavenForCausalLM(
  (transformer): ModuleDict(
    (wte): Embedding(65536, 5280)
    (prelude): ModuleList(
      (0-1): 2 x SandwichBlock(
        (norm_1): RMSNorm()
        (attn): CausalSelfAttention(
          (Wqkv): Linear(in_features=5280, out_features=15840, bias=False)
          (proj): Linear(in_features=5280, out_features=5280, bias=False)
        )
        (norm_2): RMSNorm()
        (mlp): GatedMLP(
          (fc): Linear(in_features=5280, out_features=35840, bias=False)
          (proj): Linear(in_features=17920, out_features=5280, bias=False)
          (nonlin): SiLU()
        )
        (norm_3): RMSNorm()
        (norm_4): RMSNorm()
      )
    )
    (adapter): Linear(in_features=10560, out_features=5280, bias=False)
    (core_block): ModuleList(
      (0-3): 4 x SandwichBlock(
        (norm_1): RMSNorm()
        (attn): CausalSelfAttention(
          (Wqkv): Linear(in_features=5280, out_features=15840, bias=False)
          (proj): Linear(in_features=5280, out_feature
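The printed shapes can be read off directly: `Wqkv` maps 5280 to 15840 = 3 × 5280 (fused query, key and value projections), and each `GatedMLP` has `fc` mapping 5280 to 35840 but `proj` taking only 17920 = 35840 / 2 inputs, with a `SiLU` nonlinearity. A plausible reading is a SwiGLU-style gate where the `fc` output is split into two halves and one half gates the other. The sketch below shows that pattern in pure Python on tiny matrices; which half acts as the gate is our assumption, and `gated_mlp` is a hypothetical name, not the module's actual code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    # Row-major matrix times vector, like a bias-free nn.Linear.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def silu(v: float) -> float:
    return v / (1.0 + math.exp(-v))


def gated_mlp(x: Vector, W_fc: Matrix, W_proj: Matrix) -> Vector:
    h = matvec(W_fc, x)                    # fc: d_model -> 2 * hidden (5280 -> 35840 in Huginn)
    hidden = len(h) // 2
    gate, value = h[:hidden], h[hidden:]   # assumed split order: gate first, value second
    a = [silu(g) * u for g, u in zip(gate, value)]
    return matvec(W_proj, a)               # proj: hidden -> d_model (17920 -> 5280 in Huginn)


if __name__ == "__main__":
    # Tiny example: d_model = 2, hidden = 1.
    print(gated_mlp([2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [1.0]]))
```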

## Clip Example

In [None]:
messages = []
messages.append({"role": "system", "content" : "You are a helpful assistant."})
messages.append({"role": "user", "content" : "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"})
chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_input)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

outputs = model.generate(input_ids, config, num_steps=4, tokenizer=tokenizer)
tokenizer.batch_decode(outputs[0], skip_special_tokens=True)

<|begin_text|><|begin_header|>system<|end_header|>

You are a helpful assistant.<|end_turn|><|begin_header|>user<|end_header|>

Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?<|end_turn|><|begin_header|>Huginn<|end_header|>




['system\n\nYou are a helpful assistant.user\n\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?Huginn\n\nNatalia sold 48 clips in April. In May, she sold half as many clips as she sold in April. In total, Natalia sold 48 + 48 = 96 clips in April and May.']
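The next two cells repeat the same call with `num_steps=8` and `num_steps=16`. To sweep several depths in one cell, a helper like the hypothetical `generate_at_depths` below can wrap the exact `model.generate(input_ids, config, num_steps=..., tokenizer=tokenizer)` call used here. Dropping the first `len(input_ids[0])` tokens so that only the answer is decoded is our addition; it assumes, as the outputs above suggest, that `outputs[0]` holds the prompt followed by the newly generated tokens.

```python
from typing import Any, Dict, Sequence


def generate_at_depths(model: Any, tokenizer: Any, input_ids: Any, config: Any,
                       steps: Sequence[int] = (4, 8, 16)) -> Dict[int, str]:
    """Hypothetical helper: decode one greedy answer per recurrence depth."""
    prompt_len = len(input_ids[0])  # number of prompt tokens in the (single) sequence
    answers: Dict[int, str] = {}
    for n in steps:
        outputs = model.generate(input_ids, config, num_steps=n, tokenizer=tokenizer)
        # Keep only the tokens generated after the prompt.
        new_tokens = [seq[prompt_len:] for seq in outputs[0]]
        answers[n] = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0]
    return answers
```

With the notebook's `model`, `tokenizer`, `input_ids` and `config`, `generate_at_depths(model, tokenizer, input_ids, config)` should return a dict mapping each step count to the decoded answer, without the system and user prompt text in front.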

In [None]:
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

outputs = model.generate(input_ids, config, num_steps=8, tokenizer=tokenizer)
tokenizer.batch_decode(outputs[0], skip_special_tokens=True)

['system\n\nYou are a helpful assistant.user\n\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?Huginn\n\nNatalia sold 48 clips in April.\nIn May, she sold half as many clips as in April, which is 48 / 2 = 24 clips.\nTo find the total number of clips sold in April and May, we add the number of clips sold in each month: 48 + 24 = 72 clips.\nTherefore, Natalia sold 72 clips altogether in April and May.']

In [None]:
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

outputs = model.generate(input_ids, config, num_steps=16, tokenizer=tokenizer)
tokenizer.batch_decode(outputs[0], skip_special_tokens=True)

['system\n\nYou are a helpful assistant.user\n\nNatalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?Huginn\n\nNatalia sold 48 clips in April. In May, she sold half as many clips as in April, which means she sold 48 / 2 = 24 clips in May. To find the total number of clips she sold in April and May, we add the number of clips sold in each month: 48 + 24 = 72 clips. Therefore, Natalia sold 72 clips altogether in April and May.']
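For reference, the expected answer is easy to check. With 4 recurrent steps the model treats May as another 48 clips (48 + 48 = 96); with 8 and 16 steps it halves April's count and reaches the correct total:

```python
april = 48
may = april // 2     # "half as many clips in May" -> 24
total = april + may  # 48 + 24
print(total)
```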

## Weight Example

In [None]:
messages = []
messages.append({"role": "system", "content" : "You are a helpful assistant."})
messages.append({"role": "user", "content" : "Ken created a care package to send to his brother, who was away at boarding school. Ken placed a box on a scale, and then he poured into the box enough jelly beans to bring the weight to 2 pounds. Then, he added enough brownies to cause the weight to triple. Next, he added another 2 pounds of jelly beans. And finally, he added enough gummy worms to double the weight once again. What was the final weight of the box of goodies, in pounds?"})
chat_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(chat_input)
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

outputs = model.generate(input_ids, config, num_steps=4, tokenizer=tokenizer)
tokenizer.batch_decode(outputs[0], skip_special_tokens=True)

<|begin_text|><|begin_header|>system<|end_header|>

You are a helpful assistant.<|end_turn|><|begin_header|>user<|end_header|>

Ken created a care package to send to his brother, who was away at boarding school. Ken placed a box on a scale, and then he poured into the box enough jelly beans to bring the weight to 2 pounds. Then, he added enough brownies to cause the weight to triple. Next, he added another 2 pounds of jelly beans. And finally, he added enough gummy worms to double the weight once again. What was the final weight of the box of goodies, in pounds?<|end_turn|><|begin_header|>Huginn<|end_header|>




['system\n\nYou are a helpful assistant.user\n\nKen created a care package to send to his brother, who was away at boarding school. Ken placed a box on a scale, and then he poured into the box enough jelly beans to bring the weight to 2 pounds. Then, he added enough brownies to cause the weight to triple. Next, he added another 2 pounds of jelly beans. And finally, he added enough gummy worms to double the weight once again. What was the final weight of the box of goodies, in pounds?Huginn\n\nThe final weight of the box of goodies is 10 pounds.\nThe weight of the box of jelly beans is 2 pounds.\nThe weight of the box of brownies is 1 pound.\nThe weight of the box of gummy worms is 2 pounds.\nThe weight of the box of double the weight of the box of gummy worms is 2 pounds.\nThe weight of the box of double the weight of the box of gummy worms is 2 pounds.\nThe weight of the box of double the weight of the box of gummy worms is 2 pounds.\nThe weight of the box of double the weight of the 

In [None]:
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

outputs = model.generate(input_ids, config, num_steps=8, tokenizer=tokenizer)
tokenizer.batch_decode(outputs[0], skip_special_tokens=True)

['system\n\nYou are a helpful assistant.user\n\nKen created a care package to send to his brother, who was away at boarding school. Ken placed a box on a scale, and then he poured into the box enough jelly beans to bring the weight to 2 pounds. Then, he added enough brownies to cause the weight to triple. Next, he added another 2 pounds of jelly beans. And finally, he added enough gummy worms to double the weight once again. What was the final weight of the box of goodies, in pounds?Huginn\n\nThe final weight of the box of goodies is 12 pounds.']

In [None]:
input_ids = tokenizer.encode(chat_input, return_tensors="pt", add_special_tokens=False).to(device)

outputs = model.generate(input_ids, config, num_steps=16, tokenizer=tokenizer)
tokenizer.batch_decode(outputs[0], skip_special_tokens=True)

["system\n\nYou are a helpful assistant.user\n\nKen created a care package to send to his brother, who was away at boarding school. Ken placed a box on a scale, and then he poured into the box enough jelly beans to bring the weight to 2 pounds. Then, he added enough brownies to cause the weight to triple. Next, he added another 2 pounds of jelly beans. And finally, he added enough gummy worms to double the weight once again. What was the final weight of the box of goodies, in pounds?Huginn\n\nLet's calculate the final weight of Ken's care package step by step:\n1. The box initially weighed 2 pounds.\n2. Ken added enough jelly beans to bring the weight to 2 pounds.\n3. Then, he added enough brownies to cause the weight to triple.\n4. The weight tripled to 3 x 2 = 6 pounds.\n5. Next, Ken added another 2 pounds of jelly beans.\n6. The weight now becomes 6 + 2 = 8 pounds.\n7. Finally, Ken added enough gummy worms to double the weight once again.\n8. The weight doubled to 2 x 8 = 16 pounds.
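The expected answer for this problem follows from four operations in order. With 4 recurrent steps the model answers 10 pounds and then falls into repetition, with 8 steps it answers 12 pounds, and with 16 steps its step-by-step trace reaches the correct value:

```python
weight = 2   # jelly beans bring the box to 2 pounds
weight *= 3  # brownies triple the weight -> 6
weight += 2  # another 2 pounds of jelly beans -> 8
weight *= 2  # gummy worms double the weight -> 16
print(weight)
```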