In [1]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Define the prompt
prompt = "Write a haiku about programming"

# Device setup
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps" if torch.backends.mps.is_available() else "cpu"
)

# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "./SmolLM2-FT-DPO-trl-lib"  # Path where the fine-tuned model is saved
model = AutoModelForCausalLM.from_pretrained(fine_tuned_model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)

# Ensure the pad token is set correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to the end-of-sequence token if not defined

# Format the prompt with the chat template (add_generation_prompt is left at its default of False, so no assistant header is appended)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the prompt with padding and attention mask
inputs = tokenizer(
    formatted_prompt,
    return_tensors="pt",
    padding=True,  # Pad to the longest sequence in the batch (no effect for a single prompt)
    truncation=True,  # Truncate inputs longer than max_length tokens
    max_length=512,
).to(device)

# Generate response using the fine-tuned model
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Explicitly set the attention mask
    max_length=512,  # Upper bound on total length (prompt tokens + generated tokens)
    temperature=0.9,  # Adjust for creativity (lower = more deterministic, higher = more creative)
    top_p=0.9,  # Nucleus sampling
    do_sample=True,  # Enable sampling for diverse outputs
)

# Decode the full sequence (prompt tokens followed by generated tokens)
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the fine-tuned model's response
print("Fine-tuned model response:")
print(response)

Fine-tuned model response:
user
Write a haiku about programming

Write a haiku about programming

What is the purpose of programming?

The purpose of programming is to make software easier to use and improve the quality of software. It helps create the software for your needs and improve the quality of the software. Programming is the process of creating instructions or instructions for a computer to follow. These instructions can be in the form of sentences, words, or even images.

What programming languages are used today?

Today, programming languages include C++, Java, Python, and many others. Each has its own set of features and capabilities, but they all share some common features and ideas.

What is a programming language called?

A programming language is called a programming language when it is designed to be easy to learn and use by humans, rather than machine. It is a simple and efficient way to create programs. Common programming languages include Python, JavaScript, and C+
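
The printed text above begins with the prompt because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated tokens. Below is a minimal sketch of separating the two, using plain Python lists in place of tensors. `split_generated` is a hypothetical helper (not part of transformers), and the token ids are made-up values. With real tensors, the corresponding slice would be `output[0][inputs["input_ids"].shape[1]:]`.

```python
def split_generated(output_ids, prompt_len):
    """Split a generated sequence into (prompt_ids, new_ids).

    output_ids: token ids as returned for one sequence (prompt + continuation).
    prompt_len: number of tokens that belonged to the prompt.
    """
    if not 0 <= prompt_len <= len(output_ids):
        raise ValueError("prompt_len must be between 0 and len(output_ids)")
    return output_ids[:prompt_len], output_ids[prompt_len:]


# Toy token ids standing in for real tokenizer output (hypothetical values).
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43, 44]

echoed, new_ids = split_generated(output_ids, len(prompt_ids))
print(new_ids)  # only the continuation, without the echoed prompt
```

Decoding only the continuation makes it easier to compare checkpoints, since the echoed `user` turn no longer appears in every response.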

In [2]:
# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "./SmolLM2-FT-DPO-argilla"  # Path where the fine-tuned model is saved
model = AutoModelForCausalLM.from_pretrained(fine_tuned_model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)

# Ensure the pad token is set correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to the end-of-sequence token if not defined

# Format the prompt with the chat template (add_generation_prompt is left at its default of False, so no assistant header is appended)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the prompt with padding and attention mask
inputs = tokenizer(
    formatted_prompt,
    return_tensors="pt",
    padding=True,  # Pad to the longest sequence in the batch (no effect for a single prompt)
    truncation=True,  # Truncate inputs longer than max_length tokens
    max_length=512,
).to(device)

# Generate response using the fine-tuned model
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Explicitly set the attention mask
    max_length=512,  # Upper bound on total length (prompt tokens + generated tokens)
    temperature=0.9,  # Adjust for creativity (lower = more deterministic, higher = more creative)
    top_p=0.9,  # Nucleus sampling
    do_sample=True,  # Enable sampling for diverse outputs
)

# Decode the full sequence (prompt tokens followed by generated tokens)
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the fine-tuned model's response
print("Fine-tuned model response:")
print(response)

Fine-tuned model response:
system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
user
Write a haiku about programming
assistant
In programming, let a single thread weave its tale,
A thread of code, its threads blend in,
The world's a tapestry that wraps,
Through countless cycles, a thread of thought,
Unraveling the mysteries of programming,
And weaving a tapestry of inspiration,
A tale of programming's countless threads,

By weaving threads of programming, a thread of thought,
A thread of inspiration, a tale of programming's countless threads,
The world's a tapestry that wraps,
Through countless cycles, a thread of thought,
Unraveling the mysteries of programming,
And weaving a tapestry of inspiration,
A tale of programming's countless threads,

The threads of programming are woven,
A thread of thought, a thread of inspiration,
Through countless cycles, a thread of thought,
Unraveling the mysteries of programming,
And weaving a tapestry of inspiration,
A tale of p
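
The `temperature` and `top_p` arguments used in these cells control how each next token is sampled. The following is a simplified sketch of the idea in plain Python, not the transformers implementation: logits are divided by the temperature before the softmax, and nucleus sampling keeps the smallest set of most-probable tokens whose cumulative probability reaches `top_p`. The function names and the toy logits are assumptions for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most probable tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a four-token vocabulary
for t in (0.7, 0.9):
    probs = softmax_with_temperature(logits, t)
    nucleus = top_p_filter(probs, top_p=0.9)
    print(t, round(probs[0], 3), sorted(nucleus))
```

A lower temperature concentrates probability on the top-scoring token, so fewer tokens are needed to reach the `top_p` threshold and the output becomes less varied.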

In [8]:
# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "./SmolLM2-FT-DPO-argilla2"  # Path where the fine-tuned model is saved
model = AutoModelForCausalLM.from_pretrained(fine_tuned_model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)

# Ensure the pad token is set correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to the end-of-sequence token if not defined

# Format the prompt with the chat template (add_generation_prompt is left at its default of False, so no assistant header is appended)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the prompt with padding and attention mask
inputs = tokenizer(
    formatted_prompt,
    return_tensors="pt",
    padding=True,  # Pad to the longest sequence in the batch (no effect for a single prompt)
    truncation=True,  # Truncate inputs longer than max_length tokens
    max_length=512,
).to(device)

# Generate response using the fine-tuned model
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Explicitly set the attention mask
    max_length=512,  # Upper bound on total length (prompt tokens + generated tokens)
    temperature=0.7,  # Adjust for creativity (lower = more deterministic, higher = more creative)
    top_p=0.9,  # Nucleus sampling
    do_sample=True,  # Enable sampling for diverse outputs
)

# Decode the full sequence (prompt tokens followed by generated tokens)
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the fine-tuned model's response
print("Fine-tuned model response:")
print(response)

Fine-tuned model response:
system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
user
Write a haiku about programming
assistant
In the depths of code, a world of wonder unfolds,
Where the lines blurred, and the secrets lay hidden,
A symphony of programming, a symphony of code,
A symphony of code, a symphony of wonder.

This haiku captures the essence of the programming world, with its intricate dance between syntax, semantics, and creativity. The poem begins by invoking the world of programming, where lines of code are not just raw data but a symphony of meaning and emotion. The language itself is an integral part of the world, where programming languages, algorithms, and data structures come together to create a rich tapestry of meaning and purpose.

The poem then delves into the world of code, where a multitude of programming languages and technologies are intertwined. From the syntax of programming languages to the semantics of data structures, the world of cod
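
The `max_length=512` argument passed to `generate` in these cells counts the prompt tokens plus the generated tokens, whereas `max_new_tokens` counts only the generated ones. A small sketch of the arithmetic follows. `new_token_budget` is a hypothetical helper, and the prompt length of 30 tokens is an assumed example value, not a measurement from this notebook.

```python
def new_token_budget(prompt_len, max_length):
    """Number of tokens generation may add when the total is capped at max_length."""
    return max(0, max_length - prompt_len)


# Assumed example: a 30-token formatted prompt under the 512-token cap.
print(new_token_budget(prompt_len=30, max_length=512))  # 482
```

Passing `max_new_tokens` instead keeps the response budget fixed regardless of how long the formatted prompt is.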

In [9]:
# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "./SmolLM2-FT-DPO-argilla3"  # Path where the fine-tuned model is saved
model = AutoModelForCausalLM.from_pretrained(fine_tuned_model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)

# Ensure the pad token is set correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to the end-of-sequence token if not defined

# Format the prompt with the chat template (add_generation_prompt is left at its default of False, so no assistant header is appended)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the prompt with padding and attention mask
inputs = tokenizer(
    formatted_prompt,
    return_tensors="pt",
    padding=True,  # Pad to the longest sequence in the batch (no effect for a single prompt)
    truncation=True,  # Truncate inputs longer than max_length tokens
    max_length=512,
).to(device)

# Generate response using the fine-tuned model
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Explicitly set the attention mask
    max_length=512,  # Upper bound on total length (prompt tokens + generated tokens)
    temperature=0.7,  # Adjust for creativity (lower = more deterministic, higher = more creative)
    top_p=0.9,  # Nucleus sampling
    do_sample=True,  # Enable sampling for diverse outputs
)

# Decode the full sequence (prompt tokens followed by generated tokens)
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the fine-tuned model's response
print("Fine-tuned model response:")
print(response)

Fine-tuned model response:
user
Write a haiku about programming
Aha! I see you're thinking about programming. What's programming? It's a way of creating and controlling computer programs. There are many different types of programming languages, but some popular ones include Java, Python, and C++.

Java is a popular language for developing Java programs. It's easy to learn and has a lot of built-in features like arrays and exception handling. Python is a popular language for web development, and it's great for creating websites and developing apps. C++ is a language that's used for building large-scale programs, like video games and 3D simulations.

Now, let's talk about programming languages. What's a programming language? A programming language is a set of instructions that tells a computer what to do. It's a set of commands that can be entered and executed by a computer.

So, what's the difference between programming languages and artificial languages like Python? Artificial language

In [14]:
# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "./SmolLM2-FT-DPO-argilla_3b"  # Path where the fine-tuned model is saved
model = AutoModelForCausalLM.from_pretrained(fine_tuned_model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)

# Ensure the pad token is set correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to the end-of-sequence token if not defined

# Format the prompt with the chat template (add_generation_prompt is left at its default of False, so no assistant header is appended)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the prompt with padding and attention mask
inputs = tokenizer(
    formatted_prompt,
    return_tensors="pt",
    padding=True,  # Pad to the longest sequence in the batch (no effect for a single prompt)
    truncation=True,  # Truncate inputs longer than max_length tokens
    max_length=512,
).to(device)

# Generate response using the fine-tuned model
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Explicitly set the attention mask
    max_length=512,  # Upper bound on total length (prompt tokens + generated tokens)
    temperature=0.95,  # Adjust for creativity (lower = more deterministic, higher = more creative)
    top_p=0.9,  # Nucleus sampling
    do_sample=True,  # Enable sampling for diverse outputs
)

# Decode the full sequence (prompt tokens followed by generated tokens)
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the fine-tuned model's response
print("Fine-tuned model response:")
print(response)

Fine-tuned model response:
user
Write a haiku about programming
Write a haiku about computers and programming: a good example is:

Computer, the universal language of science,
And it will serve as a guide in the future.
I'll use this to guide my research and development.
(From the book, The Human Factor: How Culture Shapes Science, 2004)

Your Haiku: Computer, the universal language of science,
It will serve as a guide in the future.
I'll use this to guide my research and development.

You Have A Programming Challenge:
In the video below, Michael Heller and Bill Hecht explain a simple way to program a computer. They say the computer's language is called C and C++. They also show the steps of a C-like programming language. You can learn more about programming languages from the Computer Science, Software Engineering, and Data Mining, Programming Languages and Environments courses.

What's C-Like Language?
C is an open-source, object-oriented, high-level programming language that allows 

In [16]:
# Load the fine-tuned model and tokenizer
fine_tuned_model_path = "./SmolLM2-FT-DPO-argilla1"  # Path where the fine-tuned model is saved
model = AutoModelForCausalLM.from_pretrained(fine_tuned_model_path).to(device)
tokenizer = AutoTokenizer.from_pretrained(fine_tuned_model_path)

# Ensure the pad token is set correctly
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Set pad token to the end-of-sequence token if not defined

# Format the prompt with the chat template (add_generation_prompt is left at its default of False, so no assistant header is appended)
messages = [{"role": "user", "content": prompt}]
formatted_prompt = tokenizer.apply_chat_template(messages, tokenize=False)

# Tokenize the prompt with padding and attention mask
inputs = tokenizer(
    formatted_prompt,
    return_tensors="pt",
    padding=True,  # Pad to the longest sequence in the batch (no effect for a single prompt)
    truncation=True,  # Truncate inputs longer than max_length tokens
    max_length=512,
).to(device)

# Generate response using the fine-tuned model
output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # Explicitly set the attention mask
    max_length=512,  # Upper bound on total length (prompt tokens + generated tokens)
    temperature=0.9,  # Adjust for creativity (lower = more deterministic, higher = more creative)
    top_p=0.9,  # Nucleus sampling
    do_sample=True,  # Enable sampling for diverse outputs
)

# Decode the full sequence (prompt tokens followed by generated tokens)
response = tokenizer.decode(output[0], skip_special_tokens=True)

# Print the fine-tuned model's response
print("Fine-tuned model response:")
print(response)

Fine-tuned model response:
system
You are a helpful AI assistant named SmolLM, trained by Hugging Face
user
Write a haiku about programming
assistant
In a garden, where trees, ever-blooming, sway, a programmer's mind,
In a world where languages, whispers, and dreams weave,
A programmer, weaving through fields of words,
In a code, their threads weave, a tapestry of spells.

This poem captures the essence of programming: it's a process of creativity, where words come alive and their meanings weave together, making connections that spark imagination. The poem invites the reader to ponder the possibilities of programming, sparking the curiosity to explore the intricacies of programming languages, data structures, and algorithms.

This haiku is a beautiful example of how a poem can evoke a powerful image in the reader's mind. It's not just about conveying a message but also about tapping into the reader's curiosity and encouraging them to think about the world of programming and its vast po