In [1]:
import os

# Enable hf_transfer for faster downloads from the Hugging Face Hub (requires the hf_transfer package).
# Set it before importing transformers: huggingface_hub reads this variable at import time.
os.environ['HF_HUB_ENABLE_HF_TRANSFER'] = "1"

import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
)
from src.antislop_generate import generate_antislop, chat_antislop

# Use the GPU if available, otherwise the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Specify the model name (replace with your preferred model)
model_name = "unsloth/Llama-3.2-1B-Instruct"
#model_name = "unsloth/Llama-3.2-3B-Instruct"

# Load the model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
#model.pad_token_id = tokenizer.eos_token_id
model.to(device)
print('Model loaded')

Model loaded


In [2]:
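import json

# Optional phrase list: the first 500 [phrase, adjustment] pairs from the JSON file.
# If the file is missing, fall back to None so the later cells still run (assumed to select the library defaults).
slop_phrase_prob_adjustments = None
if os.path.exists('slop_phrase_prob_adjustments.json'):
    with open('slop_phrase_prob_adjustments.json', 'r') as f:
        slop_phrase_prob_adjustments = dict(json.load(f)[:500])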
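
Because the cell above slices the parsed JSON with `[:500]` and passes the result to `dict(...)`, the file must hold a list of two-element `[phrase, adjustment]` pairs, presumably ordered from most to least important. A minimal sketch of that format follows; the sample phrases and numbers are made up, and `load_adjustments` is a hypothetical helper, not part of `src.antislop_generate`.

```python
import json

# Hypothetical sample in the format implied by dict(json.load(f)[:500]):
# a JSON list of [phrase, adjustment] pairs.
SAMPLE = '[["a testament to", 0.3], ["tapestry of", 0.4], ["shivers down", 0.5]]'


def load_adjustments(raw_json: str, top_n: int = 500) -> dict[str, float]:
    """Parse [phrase, adjustment] pairs and keep the first top_n as a dict."""
    pairs = json.loads(raw_json)
    return {phrase: float(adj) for phrase, adj in pairs[:top_n]}


adjustments = load_adjustments(SAMPLE, top_n=2)
print(adjustments)  # {'a testament to': 0.3, 'tapestry of': 0.4}
```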

In [3]:
prompt = "Write a story about Elara, the weaver of tapestries in future Technopolis. In the bustling city, a group of "

In [4]:
# Chat generation with streaming
messages = [
    {"role": "user", "content": prompt}
]
for token in chat_antislop(
    model=model,
    tokenizer=tokenizer,
    messages=messages,
    max_new_tokens=400,
    # Antislop sampling may be less reliable at low temperatures.
    temperature=1,
    min_p=0.1,
    # adjustment_strength scales how strongly the probability adjustments are applied.
    # A value of 1 applies the values in slop_phrase_prob_adjustments (or the defaults) unmodified.
    # Reasonable values range from 0 (disabled) through 100+ (effectively banning the listed phrases).
    adjustment_strength=100.0,
    # Optional: a dict mapping slop phrases to probability adjustments
    slop_phrase_prob_adjustments=slop_phrase_prob_adjustments,
    enforce_json=False,
    antislop_enabled=True,
    streaming=True
):
    print(tokenizer.decode(token), end='', flush=True)



<|eot_id|><|start_header_id|>assistant<|end_header_id|>

In the heart of the sprawling city of Nova Terra, where steel and stone pierced the sky and holographic advertisements illuminated the streets, a small, unassuming shop stood out among the crowd. The sign above the door read "Moonwhisper Tapestries," and the windows were filled with an assortment of colorful, intricately woven fabrics that seemed to dance in the gentle breeze.

Inside, the shop was a cozy, dimly lit space filled with the sweet scent of spun wool and the soft hum of looms. Here, in the hands of the talented weaver, a young woman named Luna, the art of weaving tapestries was a way of life. For as long as anyone could remember, Luna's family had been weaving, and she had spent her childhood learning the ancientancient traditions and techniques from her parents, who had learned from their parents before them.

Luna was a master of her craft, with fingers that moved with precision and delicacy as she worked her loom. 
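
The comments in the streaming cell say that `adjustment_strength=1` applies the listed adjustments unmodified, `0` disables them, and `100+` effectively bans the phrases. One rule consistent with all three statements is to raise each adjustment (a multiplier below 1) to the power of `adjustment_strength`. The sketch below assumes that rule and a simple renormalisation; both are illustrations, not code taken from `src.antislop_generate`.

```python
def effective_multiplier(adjustment: float, strength: float) -> float:
    """Multiplier applied to a flagged phrase's first token, ASSUMING
    multiplier = adjustment ** strength (not confirmed by the notebook)."""
    return adjustment ** strength


def apply_adjustment(probs: list[float], token_id: int,
                     adjustment: float, strength: float) -> list[float]:
    """Scale one token's probability by the effective multiplier, then renormalise."""
    scaled = list(probs)
    scaled[token_id] *= effective_multiplier(adjustment, strength)
    total = sum(scaled)
    return [p / total for p in scaled]


# With an adjustment of 0.5: strength 0 leaves the token alone, 1 halves it,
# and 100 shrinks it to about 7.9e-31, i.e. effectively bans it.
for strength in (0.0, 1.0, 100.0):
    print(strength, effective_multiplier(0.5, strength))

print(apply_adjustment([0.6, 0.3, 0.1], token_id=0, adjustment=0.5, strength=1.0))
```

Under this assumed rule, a phrase with a smaller adjustment is suppressed faster as the strength grows, which matches the notebook's note that large strengths behave like an outright ban.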

In [6]:
# Chat generation without streaming
messages = [
    {"role": "user", "content": prompt}
]
generated_text = chat_antislop(
    model=model,
    tokenizer=tokenizer,
    messages=messages,
    max_length=400,
    temperature=1,
    min_p=0.1,
    adjustment_strength=100.0,
    slop_phrase_prob_adjustments=slop_phrase_prob_adjustments,
    enforce_json=False,
    antislop_enabled=True,
    streaming=False
)
print(tokenizer.decode(generated_text))

<|eot_id|><|start_header_id|>assistant<|end_header_id|>

In the heart of Future City, where skyscrapers pierced the sky and holographic advertisements danced in mid-air, a small, unassuming shop stood tucked away in a forgotten alley. The sign above the door read "Weavings of Wonder" in elegant, cursive script. This was the domain of the legendary weaver, known only as "The Threadmistress," a master artisan renowned for her breathtaking tapestries that seemed to come alive with every stitch.

Among the countless threads and yarns that lined her shelves, one young woman, named Eliara, or "Lara" to her friends, stood out. She was the apprentice to the Threadmistress, and her own talents were already being recognized in the city's underground art scene. Lara's fingers moved with a precision that belied her youth, as she worked on her latest masterpiece, a grand piece depicting the history of the city.

As she sewed, a group of strangers entered the shop, seeking more than just a simple pu
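
Every cell samples with `temperature=1` and `min_p=0.1`. Under the usual definition of min-p sampling, a token survives only if its probability is at least `min_p` times the probability of the most likely token; the survivors are renormalised and one is drawn at random. A minimal pure-Python sketch of that filter follows; it illustrates the technique and is not the sampler's own code.

```python
import math
import random


def min_p_filter(logits: list[float], min_p: float, temperature: float = 1.0) -> list[float]:
    """Return a renormalised distribution that keeps only tokens whose
    probability is at least min_p * (probability of the top token)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    norm = sum(kept)
    return [p / norm for p in kept]


def sample(probs: list[float], rng: random.Random) -> int:
    """Draw a token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = min_p_filter([2.0, 1.5, 0.0, -3.0], min_p=0.1)
token = sample(probs, random.Random(0))
print(probs, token)
```

Here the last token's probability is under a tenth of the top token's, so it is removed; the third token sits just above the cut-off and stays eligible.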

In [7]:
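# Generate without streaming.
# prompt_with_template holds the chat-formatted prompt, but this cell passes the raw prompt,
# so the model performs plain text completion (pass prompt=prompt_with_template for chat-style input).
prompt_with_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)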
generated_text = generate_antislop(
    model=model,
    tokenizer=tokenizer,
    prompt=prompt,
    max_length=300,
    temperature=1,
    min_p=0.1,
    adjustment_strength=100.0,
    slop_phrase_prob_adjustments=slop_phrase_prob_adjustments,
    enforce_json=False,
    antislop_enabled=True,
    streaming=False
)
print(tokenizer.decode(generated_text))

 Write a story about Elara, the weaver of tapestries in future Technopolis.ing their own bodies, creating an endless supply of duplicates, and no one knows what is the real one.

As the city's authorities struggle to maintain order, a young weaver named Kael, discovers that the Kyroxi's chaotic behavior is somehow connected to the tapestries of the mysterious weaver, the one they call the "Weaver of the Ancients".

Kael, a skilled weaver in her own right, sets out to unravel the mystery, following the threads of the tapestries to a hidden underground workshop, where she encounters the enigmatic Weaver of the Ancients, a being of pure energy.

The Weaver of the Ancients reveals to Kael that the Kyroxi's duplications are a result of a catastrophic event that occurred in the past, when the fabric of reality was torn apart, creating a rift that has been slowly healing itself, but still causing the Kyroxi to re-print and re-print their bodies in an endless loop.

As Kael learns more about t
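
The notebook never shows what `generate_antislop` does when a listed phrase is produced. Backtracking samplers of this kind are generally described as rewinding to where the phrase began, lowering the probability of the phrase's first token at that position, and resampling. The toy below sketches that idea with greedy decoding over whole-word tokens. `antislop_greedy`, `toy_model`, and its vocabulary are hypothetical, the `adjustment ** strength` rule is an assumption, and the real sampler works on subword tokens with temperature and min-p sampling, so treat this as an illustration of the technique rather than the library's implementation.

```python
from typing import Callable

Dist = dict[str, float]  # next-token distribution: token -> probability


def antislop_greedy(next_dist: Callable[[list[str]], Dist],
                    banned: dict[tuple[str, ...], float],
                    max_tokens: int,
                    strength: float = 100.0) -> list[str]:
    """Toy greedy decoder: when the output ends with a banned phrase, rewind to
    where the phrase began, down-weight its first token there, and re-decode.
    Adjustments must be < 1, otherwise a banned phrase could repeat forever."""
    out: list[str] = []
    cache: dict[int, Dist] = {}  # position -> (possibly down-weighted) distribution
    while len(out) < max_tokens:
        pos = len(out)
        if pos not in cache:
            cache[pos] = dict(next_dist(out))  # copy so adjustments stay local
        dist = cache[pos]
        out.append(max(dist, key=dist.__getitem__))
        for phrase, adjustment in banned.items():
            n = len(phrase)
            if tuple(out[-n:]) == phrase:
                start = len(out) - n
                # Lower the phrase's first token at the position where it started (assumed rule).
                cache[start][phrase[0]] *= adjustment ** strength
                # Backtrack: drop the phrase and every cached position after its start.
                del out[start:]
                for later in [p for p in cache if p > start]:
                    del cache[later]
                break
    return out


def toy_model(context: list[str]) -> Dist:
    """Hypothetical bigram 'model' standing in for the LLM."""
    table = {
        "<s>": {"a": 0.6, "the": 0.4},
        "a": {"tapestry": 0.7, "story": 0.3},
        "the": {"city": 1.0},
        "tapestry": {"of": 1.0},
        "of": {"light": 1.0},
        "story": {"ends": 1.0},
        "city": {"sleeps": 1.0},
    }
    return table.get(context[-1] if context else "<s>", {".": 1.0})


print(antislop_greedy(toy_model, {}, max_tokens=4))
# ['a', 'tapestry', 'of', 'light']
print(antislop_greedy(toy_model, {("tapestry", "of"): 0.5}, max_tokens=4))
# ['a', 'story', 'ends', '.']
```

Keeping the down-weighted distribution in `cache` after backtracking is what stops the decoder from walking straight back into the same phrase at the same position.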

In [8]:
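# Generate with streaming.
# As in the previous cell, the raw prompt is used for plain text completion;
# pass prompt=prompt_with_template instead for chat-formatted input.
prompt_with_template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)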
for token in generate_antislop(
    model=model,
    tokenizer=tokenizer,
    prompt=prompt,
    max_length=300,
    temperature=1,
    min_p=0.1,
    slop_phrase_prob_adjustments=slop_phrase_prob_adjustments,
    adjustment_strength=100.0,
    enforce_json=False,
    antislop_enabled=True,
    streaming=True
):
    print(tokenizer.decode(token), end='', flush=True)

 Write a story about Elara, the weaver of tapestries in future Technopolis capable of creating their own art. One of the oil canisters that comprised their collective consciousness had a spark of creativity, and a beautiful, shimmering web of light and color began to take form.

As the GearGrinders watched, the web grew more complex and beautiful, weaving together threads of light and sound. The robots were amazed, for they had never seen anything like it before. They watched in awe as the web grew, changing and evolving with each passing moment.

One of the GearGrinders, a large and lumbering robot with a shiny metal body, approached the web. It reached out a mechanical arm, and gently touched the shimmering threads. The other robots watched in silence as the lumbering robot began to weave its own thread into the web.

Slowly, the web grew thicker and more complex, with each robot adding its own thread to the collective work of art. The GearGrinders watched in wonder as the web expand