## Idea

I want to build a chatbot that talks like Sheldon Cooper from The Big Bang Theory. I think it would be an interesting project.

This is still just an idea: I may use an LSTM or a smaller pre-trained language model, and I’m still researching the options.

## Test the Original Model

1. Install dependencies

In [1]:
# Install required packages
!pip install -q transformers torch pandas numpy scikit-learn tqdm


2. Import packages

In [2]:
# Import necessary libraries
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

print("Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

Libraries imported successfully!
PyTorch version: 2.6.0+cu124
CUDA available: True
CUDA device: NVIDIA A100-SXM4-40GB


3. Load the original DialoGPT model

In [3]:
# Load the base DialoGPT model and tokenizer
print("Loading DialoGPT-small model...")
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")
print("Model loaded successfully!")

# Set a pad token if none exists (GPT-2-based tokenizers have no pad token by default)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    print("Set pad_token to eos_token")

Loading DialoGPT-small model...
Model loaded successfully!
Set pad_token to eos_token


4. Test the original model

In [4]:
# Test the base model with some simple prompts
print("Testing base DialoGPT model...")
print("=" * 50)

prompts = [
    "Hello, how are you?",
    "What can you do?",
    "Tell me about yourself",
    "Do you like science?"
]

chat_history_ids = None

for i, prompt in enumerate(prompts):
    print(f"\nUser: {prompt}")

    # Encode the input
    new_user_input_ids = tokenizer.encode(prompt + tokenizer.eos_token, return_tensors='pt')

    # Concatenate with chat history if exists
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if chat_history_ids is not None else new_user_input_ids

    # Generate response
    chat_history_ids = model.generate(
        bot_input_ids,
        max_length=1000,
        pad_token_id=tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=50,
        top_p=0.9,
        temperature=0.7
    )

    # Decode the response
    response = tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)
    print(f"DialoGPT: {response}")
    print("-" * 30)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Testing base DialoGPT model...

User: Hello, how are you?
DialoGPT: hey guys
------------------------------

User: What can you do?
DialoGPT: I can do this!
------------------------------

User: Tell me about yourself
DialoGPT: What do you want to do with your time?
------------------------------

User: Do you like science?
DialoGPT: How can I become an astronaut?
------------------------------
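The loop above builds DialoGPT's input by appending `tokenizer.eos_token` to every turn and concatenating the IDs with the running history. The same convention at the string level, as a minimal sketch; `build_dialogpt_prompt` is a hypothetical helper, and it assumes GPT-2's EOS string `<|endoftext|>`, which DialoGPT's tokenizer reuses:

```python
from typing import List

# GPT-2's end-of-text token, reused by DialoGPT as eos_token (assumption noted above)
EOS = "<|endoftext|>"


def build_dialogpt_prompt(turns: List[str], eos: str = EOS) -> str:
    """Join alternating user/bot turns, terminating each one with EOS.

    The model is expected to continue the string with the next bot turn
    and to finish that turn with another EOS.
    """
    return "".join(turn + eos for turn in turns)


prompt = build_dialogpt_prompt(["Hello, how are you?", "hey guys", "What can you do?"])
print(prompt)
# Hello, how are you?<|endoftext|>hey guys<|endoftext|>What can you do?<|endoftext|>
```

A sequence built this way (or incrementally, as in the loop) is a single unpadded row, so passing `attention_mask=torch.ones_like(bot_input_ids)` to `model.generate` should be a correct mask and would likely silence the attention-mask warning printed above.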


## Configure the Model

5. Model Config

In [5]:
# Simple configuration for our Sheldon chatbot training
class TrainingConfig:
    def __init__(self):
        # Model settings
        self.model_name = 'microsoft/DialoGPT-small'
        self.output_dir = 'sheldon_model'

        # Training parameters
        self.learning_rate = 5e-5
        self.num_epochs = 1  # Reduced for faster training
        self.batch_size = 16   # Smaller batch size for Colab
        self.max_length = 512

        # Data settings
        self.context_length = 5  # Number of previous responses to use as context

        # Generation parameters
        self.temperature = 0.8
        self.top_k = 50
        self.top_p = 0.9

# Create config instance
config = TrainingConfig()
print("Training configuration created!")
print(f"Model: {config.model_name}")
print(f"Learning rate: {config.learning_rate}")
print(f"Epochs: {config.num_epochs}")
print(f"Batch size: {config.batch_size}")

Training configuration created!
Model: microsoft/DialoGPT-small
Learning rate: 5e-05
Epochs: 1
Batch size: 16
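The plain class works, but later cells mutate it in place (step 19 changes `batch_size` after the DataLoaders already exist). One alternative, sketched here as an assumption rather than what this notebook uses, is a frozen dataclass with the same fields and defaults: changes go through `dataclasses.replace`, which makes it more obvious that anything built from the old config has to be rebuilt.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen: assigning to a field raises FrozenInstanceError
class TrainingConfig:
    # Model settings
    model_name: str = "microsoft/DialoGPT-small"
    output_dir: str = "sheldon_model"
    # Training parameters
    learning_rate: float = 5e-5
    num_epochs: int = 1
    batch_size: int = 16
    max_length: int = 512
    # Data settings: number of previous lines used as context
    context_length: int = 5
    # Generation parameters
    temperature: float = 0.8
    top_k: int = 50
    top_p: float = 0.9


config = TrainingConfig()
# Derive a new config instead of mutating the old one
config_v2 = replace(config, num_epochs=3, batch_size=8)
print(asdict(config_v2)["num_epochs"], config.num_epochs)  # 3 1
```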


6. Import necessary packages

In [6]:
# Import only the libraries we actually need
import os
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torch.optim import AdamW  # PyTorch's AdamW (the transformers.AdamW class is deprecated)
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForCausalLM
from tqdm.notebook import tqdm

print("Libraries imported successfully!")

Libraries imported successfully!


7. Check CPU/GPU

In [7]:
# Check available device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
else:
    print("Using CPU - training will be slower")

Using device: cuda
GPU: NVIDIA A100-SXM4-40GB
GPU Memory: 42.5 GB


## Prepare the Dataset

8. Download The Big Bang Theory dataset

In [9]:
# Download the Big Bang Theory dataset from Kaggle
import kagglehub

print("Downloading Big Bang Theory dataset...")
the_big_bang_theory_series_transcript_path = kagglehub.dataset_download('mitramir5/the-big-bang-theory-series-transcript')
print('Dataset download complete!')
print(f"Dataset path: {the_big_bang_theory_series_transcript_path}")

Downloading Big Bang Theory dataset...
Dataset download complete!
Dataset path: /kaggle/input/the-big-bang-theory-series-transcript


9. Explore the dataset

In [10]:
# Explore the downloaded dataset
import os

print("Available files in the dataset:")
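# Hard-coded kagglehub cache path. It differs from the path printed in step 8,
# so `the_big_bang_theory_series_transcript_path` is the more portable choice.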
dataset_path = '/root/.cache/kagglehub/datasets/mitramir5/the-big-bang-theory-series-transcript/versions/4'
for dirname, _, filenames in os.walk(dataset_path):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Load the main transcript file from the correct path
path = os.path.join(dataset_path, '1_10_seasons_tbbt.csv')
df = pd.read_csv(path)

print(f"\nDataset shape: {df.shape}")
print("\nFirst few rows:")
print(df.head())

print("\nColumn names:")
print(df.columns.tolist())

Available files in the dataset:
/root/.cache/kagglehub/datasets/mitramir5/the-big-bang-theory-series-transcript/versions/4/sentences_sentiment_dicts.pkl
/root/.cache/kagglehub/datasets/mitramir5/the-big-bang-theory-series-transcript/versions/4/1_10_seasons_tbbt.csv

Dataset shape: (54406, 3)

First few rows:
                           episode_name  \
0  Series 01 Episode 01 – Pilot Episode   
1  Series 01 Episode 01 – Pilot Episode   
2  Series 01 Episode 01 – Pilot Episode   
3  Series 01 Episode 01 – Pilot Episode   
4  Series 01 Episode 01 – Pilot Episode   

                                            dialogue person_scene  
0                        A corridor at a sperm bank.        Scene  
1   So if a photon is directed through a plane wi...      Sheldon  
2                         Agreed, what’s your point?      Leonard  
3   There’s no point, I just think it’s a good id...      Sheldon  
4                                         Excuse me?      Leonard  

Column names:
['episod

10. Preprocess the dataset

In [11]:
# Convert dataset to conversation format with context
print("Creating conversation context...")

contexted = []
n = config.context_length  # Use 5 previous responses as context

for i in range(n, len(df['dialogue'])):
    row = []
    prev = i - 1 - n  # Get n previous responses
    for j in range(i, prev, -1):
        row.append(df['dialogue'][j])
    contexted.append(row)

print(f"Created {len(contexted)} conversation contexts")

# Create DataFrame with context columns
columns = ['response', 'context']
columns = columns + [f'context_{i}' for i in range(n-1)]

df_context = pd.DataFrame.from_records(contexted, columns=columns)
print(f"\nContext DataFrame shape: {df_context.shape}")
print("\nFirst few rows with context:")
print(df_context.head(3))

Creating conversation context...
Created 54401 conversation contexts

Context DataFrame shape: (54401, 6)

First few rows with context:
                                            response  \
0                                          Hang on.    
1   One across is Aegean, eight down is Nabakov, ...   
2                                    Can I help you?   

                                             context  \
0                                         Excuse me?   
1                                          Hang on.    
2   One across is Aegean, eight down is Nabakov, ...   

                                           context_0  \
0   There’s no point, I just think it’s a good id...   
1                                         Excuse me?   
2                                          Hang on.    

                                           context_1  \
0                         Agreed, what’s your point?   
1   There’s no point, I just think it’s a good id...   
2                    
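The windows above use every line as a response, including the `Scene` stage directions and lines from Leonard, Penny and everyone else, and they run across episode boundaries. Since the goal is a Sheldon-style bot, one option is to keep only windows whose response is spoken by Sheldon. A minimal sketch, assuming the `person_scene` labels are exactly `"Sheldon"` and `"Scene"` as in the preview above (worth checking with `df['person_scene'].value_counts()`); `build_sheldon_contexts` is a hypothetical helper:

```python
from typing import List, Sequence


def build_sheldon_contexts(
    dialogues: Sequence[str],
    speakers: Sequence[str],
    episodes: Sequence[str],
    n: int = 5,
    target: str = "Sheldon",
) -> List[List[str]]:
    """Return rows of [response, context, context_0, ...] (newest context first),
    keeping only rows whose response is spoken by `target`.

    Scene descriptions are dropped, and no window crosses an episode boundary.
    """
    # Drop stage directions ("Scene" rows) before building windows
    kept = [(d, s, e) for d, s, e in zip(dialogues, speakers, episodes) if s != "Scene"]
    rows = []
    for i in range(n, len(kept)):
        _, speaker, episode = kept[i]
        if speaker != target:
            continue
        window = kept[i - n : i + 1]  # n context lines followed by the response
        if any(e != episode for _, _, e in window):
            continue  # skip windows that straddle two episodes
        # Same order as `contexted`: response first, then newest to oldest context
        rows.append([d for d, _, _ in reversed(window)])
    return rows
```

It returns rows in the same `[response, context, context_0, ...]` order as `contexted`, so it could be called as `build_sheldon_contexts(df['dialogue'].tolist(), df['person_scene'].tolist(), df['episode_name'].tolist(), n=config.context_length)` and passed to the same `pd.DataFrame.from_records` call. The dataset becomes smaller, which is the trade-off for a more consistent voice.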

11. Split the dataset

In [12]:
# Split dataset into training and validation sets
print("Splitting dataset...")

trn_df, val_df = train_test_split(df_context, test_size=0.1, random_state=42)
print(f"Training set size: {len(trn_df)}")
print(f"Validation set size: {len(val_df)}")

# Show some examples
print("\nTraining set examples:")
print(trn_df.head(2))

Splitting dataset...
Training set size: 48960
Validation set size: 5441

Training set examples:
                                               response  \
42881                             Everything all right?   
33322   And we can just put this whole thing behind us.   

                                context  \
42881   Well, the night is still young.   
33322                             Fine.   

                                               context_0  \
42881   Thank you. Finally, there’s a Mrs. Hofstadter...   
33322   Okay, I found the, uh, court papers that you ...   

                                               context_1  \
42881                                       Nicely done.   
33322   No. But I did get tackled in the hallway once...   

                        context_2                               context_3  
42881                   I’m okay.   Okay, let’s do it. You gonna make it?  
33322   A little in college. You?                 You ever play football?  
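Neighboring rows overlap heavily (the response of one row is the `context` of the next, as the preview in step 10 shows), so a random row-level split puts almost the same text in both sets and the validation loss will likely look optimistic. A sketch of an episode-level split, assuming an `episode` label is kept for every row of `df_context` (the preprocessing cell does not keep one); `split_by_group` is hypothetical, and scikit-learn's `GroupShuffleSplit` is a ready-made alternative:

```python
import random
from typing import List, Sequence, Tuple


def split_by_group(
    groups: Sequence[str], test_size: float = 0.1, seed: int = 42
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) so each group (e.g. an episode) lands in one split only."""
    unique = sorted(set(groups))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_val = max(1, round(len(unique) * test_size))
    val_groups = set(unique[:n_val])
    train_idx = [i for i, g in enumerate(groups) if g not in val_groups]
    val_idx = [i for i, g in enumerate(groups) if g in val_groups]
    return train_idx, val_idx
```

With such a list, `train_idx, val_idx = split_by_group(episodes, test_size=0.1)` followed by `trn_df, val_df = df_context.iloc[train_idx], df_context.iloc[val_idx]` would replace the `train_test_split` call.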


12. Create the training dataset class

In [13]:
# Create a custom dataset class for training
class ConversationDataset(Dataset):
    def __init__(self, tokenizer, df, max_length=512):
        self.tokenizer = tokenizer
        self.df = df
        self.max_length = max_length

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]

        # Combine the context lines and the response into a single text.
        # Format: context [SEP] response. The context columns run newest-first, and no
        # EOS is appended explicitly (the padding, which reuses EOS, follows the text).
        context_text = " ".join([str(x) for x in row.iloc[1:] if pd.notna(x) and str(x).strip()])
        response_text = str(row['response']) if pd.notna(row['response']) else ""

        # Combine context and response
        full_text = f"{context_text} [SEP] {response_text}"

        # Tokenize
        encoding = self.tokenizer(
            full_text,
            truncation=True,
            max_length=self.max_length,
            padding='max_length',
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': encoding['input_ids'].flatten()  # padding is not masked to -100, so pad (EOS) tokens count toward the loss
        }

print("Dataset class created successfully!")

Dataset class created successfully!
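Three details of this class likely hurt the results: `[SEP]` is not a special token for the GPT-2/DialoGPT tokenizer, so it is learned as ordinary text (it shows up in the chat output in step 18); the context is joined newest-first; and the labels include the padding positions, so the loss is also computed on pad (EOS) tokens. A sketch of an alternative encoding at the token-ID level, assuming the turns are given oldest-first with the response last: turns are separated by EOS as in DialoGPT's format, and context labels are set to `-100`, the index the Hugging Face causal-LM loss ignores. `encode_dialogue` is a hypothetical helper, and padding is left to batch collation:

```python
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # label value skipped by the Hugging Face causal-LM loss


def encode_dialogue(
    turn_ids: Sequence[List[int]],
    eos_id: int,
    max_length: int,
    train_on_context: bool = False,
) -> Dict[str, List[int]]:
    """Encode oldest-first turns whose last element is the target response.

    Each turn is terminated by EOS (DialoGPT's turn separator). Context labels
    are IGNORE_INDEX unless train_on_context is True. No padding is added.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    last = len(turn_ids) - 1
    for k, ids in enumerate(turn_ids):
        piece = list(ids) + [eos_id]
        input_ids.extend(piece)
        if k == last or train_on_context:
            labels.extend(piece)  # the model is trained to produce these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(piece))  # context: no loss
    # Truncate from the left so the response at the end is kept
    input_ids, labels = input_ids[-max_length:], labels[-max_length:]
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "labels": labels,
    }
```

Inside `__getitem__`, the turns could be built as `[str(x) for x in reversed(row.iloc[1:].tolist()) if pd.notna(x) and str(x).strip()] + [response_text]` and encoded with `encode_dialogue([self.tokenizer.encode(t) for t in turns], self.tokenizer.eos_token_id, self.max_length)`. Setting `train_on_context=True` keeps the current behavior of learning the whole sequence.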


13. Prepare the training and validation datasets

In [14]:
# Prepare training and validation datasets
print("Preparing datasets...")

# Create datasets
train_dataset = ConversationDataset(tokenizer, trn_df, max_length=config.max_length)
val_dataset = ConversationDataset(tokenizer, val_df, max_length=config.max_length)

print(f"Training dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=config.batch_size, shuffle=False)

print("Data loaders created successfully!")

Preparing datasets...
Training dataset size: 48960
Validation dataset size: 5441
Data loaders created successfully!
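Every example is padded to `max_length=512` even though most transcript lines are short, so most of each batch is likely padding. A sketch of dynamic padding instead, assuming `__getitem__` returns unpadded Python lists for `input_ids`, `attention_mask` and `labels` (the class above returns padded tensors); `pad_batch` is a hypothetical helper:

```python
from typing import Dict, List

IGNORE_INDEX = -100  # ignored by the Hugging Face causal-LM loss


def pad_batch(examples: List[Dict[str, List[int]]], pad_id: int) -> Dict[str, List[List[int]]]:
    """Right-pad a list of encoded examples to the longest one in the batch."""
    width = max(len(ex["input_ids"]) for ex in examples)
    batch: Dict[str, List[List[int]]] = {"input_ids": [], "attention_mask": [], "labels": []}
    for ex in examples:
        gap = width - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * gap)
        batch["attention_mask"].append(ex["attention_mask"] + [0] * gap)
        # Padding positions get IGNORE_INDEX so they never contribute to the loss
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * gap)
    return batch
```

It could be plugged in with `DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True, collate_fn=lambda b: {k: torch.tensor(v) for k, v in pad_batch(b, tokenizer.eos_token_id).items()})`. Right-padding is fine for training because the padded label positions are ignored.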


## Training and Evaluation

14. Set up the optimizer and learning-rate scheduler

In [15]:
# Set up training parameters and optimizer
print("Setting up training...")

# Move model to device
model = model.to(device)
print(f"Model moved to {device}")

# Set up optimizer
optimizer = AdamW(model.parameters(), lr=config.learning_rate)
print(f"Optimizer created with learning rate: {config.learning_rate}")

# Set up learning rate scheduler
from transformers import get_linear_schedule_with_warmup
total_steps = len(train_loader) * config.num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=total_steps
)
print(f"Learning rate scheduler created for {total_steps} total steps")

Setting up training...
Model moved to cuda
Optimizer created with learning rate: 5e-05
Learning rate scheduler created for 3060 total steps
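With `num_warmup_steps=0` the learning rate starts at its peak of 5e-5 and decays linearly to zero at step 3060. A pure-Python sketch of the multiplier that `get_linear_schedule_with_warmup` applies, written from its documented behavior (linear warmup, then linear decay to 0) rather than copied from the library; `linear_warmup_decay` is a hypothetical name:

```python
def linear_warmup_decay(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """LR multiplier: rises linearly from 0 to 1 during warmup, then falls linearly to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr, total_steps = 5e-5, 3060
for step in (0, 1530, 3060):
    # Effective learning rate the optimizer would use at this step
    print(step, base_lr * linear_warmup_decay(step, 0, total_steps))
```

Setting `num_warmup_steps` to a few percent of `total_steps` is a common choice when fine-tuning, though whether it helps here is untested.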


15. Train

In [16]:
# Training loop
print("Starting training...")
print("=" * 50)

model.train()
total_loss = 0
best_loss = float('inf')

for epoch in range(config.num_epochs):
    print(f"\nEpoch {epoch + 1}/{config.num_epochs}")
    epoch_loss = 0

    # Training
    progress_bar = tqdm(train_loader, desc=f"Training Epoch {epoch + 1}")
    for batch_idx, batch in enumerate(progress_bar):
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )

        loss = outputs.loss
        epoch_loss += loss.item()

        # Backward pass
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

        # Update progress bar
        progress_bar.set_postfix({'loss': f'{loss.item():.4f}'})

        # Print loss every 1000 steps
        if (batch_idx + 1) % 1000 == 0:
            print(f"Step {batch_idx + 1}, Loss: {loss.item():.4f}")

    avg_epoch_loss = epoch_loss / len(train_loader)
    total_loss += avg_epoch_loss

    print(f"Epoch {epoch + 1} completed. Average loss: {avg_epoch_loss:.4f}")

    # Save model checkpoint
    if avg_epoch_loss < best_loss:
        best_loss = avg_epoch_loss
        model.save_pretrained(f"{config.output_dir}/best_model")
        tokenizer.save_pretrained(f"{config.output_dir}/best_model")
        print(f"New best model saved with loss: {best_loss:.4f}")

print(f"\nTraining completed! Total average loss: {total_loss/config.num_epochs:.4f}")

Starting training...

Epoch 1/1



`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.


Step 1000, Loss: 0.6513
Step 2000, Loss: 0.5505
Step 3000, Loss: 0.6166
Epoch 1 completed. Average loss: 0.6386
New best model saved with loss: 0.6386

Training completed! Total average loss: 0.6386
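The loop never touches `val_loader`, and the "best" checkpoint is chosen by training loss, which mostly keeps falling as training goes on. A minimal validation sketch using the same batch keys as `ConversationDataset`; `evaluate` is a hypothetical helper, and perplexity here is simply `exp` of the mean batch loss:

```python
import math

import torch


@torch.no_grad()  # no gradients are needed for evaluation
def evaluate(model, loader, device):
    """Return (mean batch loss, perplexity) over `loader` without updating weights."""
    model.eval()  # disable dropout
    total, num_batches = 0.0, 0
    for batch in loader:
        outputs = model(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
            labels=batch["labels"].to(device),
        )
        total += outputs.loss.item()
        num_batches += 1
    model.train()  # the training loop expects train mode afterwards
    mean_loss = total / max(1, num_batches)
    return mean_loss, math.exp(mean_loss)
```

Calling `val_loss, val_ppl = evaluate(model, val_loader, device)` at the end of each epoch and comparing `val_loss` (instead of `avg_epoch_loss`) against `best_loss` would make the checkpoint choice meaningful. With the current labels, padding tokens also count toward this loss, so the numbers are only comparable between runs that pad the same way.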


16. Load the Trained Model

In [17]:
# Load the trained model
print("Loading trained Sheldon model...")

# Load model and tokenizer from saved directory
trained_model = AutoModelForCausalLM.from_pretrained(f"{config.output_dir}/best_model")
trained_tokenizer = AutoTokenizer.from_pretrained(f"{config.output_dir}/best_model")

# Move to device
trained_model = trained_model.to(device)
trained_model.eval()  # Set to evaluation mode

print("Trained model loaded successfully!")

Loading trained Sheldon model...
Trained model loaded successfully!


17. Create a Chat Function

In [20]:
# Function to chat with Sheldon
def chat_with_sheldon(user_input, chat_history_ids=None, max_length=200):
    """
    Chat with the trained Sheldon model.

    Args:
        user_input (str): User's message
        chat_history_ids: Token IDs of the previous conversation (or None)
        max_length (int): Maximum total length in tokens (history + user input +
            response) passed to generate(), not the length of the response alone

    Returns:
        str: Sheldon's response
        tensor: Updated chat history
    """
    # Encode user input and move to device
    new_user_input_ids = trained_tokenizer.encode(user_input + trained_tokenizer.eos_token, return_tensors='pt')
    new_user_input_ids = new_user_input_ids.to(device)  # Move to device

    # Concatenate with chat history if exists
    if chat_history_ids is not None:
        chat_history_ids = chat_history_ids.to(device)  # Ensure chat history is on device
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids

    # Generate response
    chat_history_ids = trained_model.generate(
        bot_input_ids,
        max_length=max_length,
        pad_token_id=trained_tokenizer.eos_token_id,
        no_repeat_ngram_size=3,
        do_sample=True,
        top_k=50,
        top_p=0.9,
        temperature=0.8
    )

    # Decode the response
    response = trained_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids

print("Chat function created successfully!")

Chat function created successfully!
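`chat_with_sheldon` feeds the entire history back into `generate()` on every turn, and `max_length=200` caps history plus response, so once the history nears 200 tokens there is little or no room left for a reply (DialoGPT-small also has a hard limit of 1024 positions). A sketch that keeps only the last few EOS-terminated turns, assuming the history is a flat list of token IDs; `truncate_history` is hypothetical:

```python
from typing import List


def truncate_history(ids: List[int], eos_id: int, max_turns: int) -> List[int]:
    """Keep only the last `max_turns` turns of a flat, EOS-separated token list."""
    if max_turns <= 0:
        return []
    # A turn starts at index 0 and right after every EOS that is not the final token
    starts = [0] + [i + 1 for i, tok in enumerate(ids) if tok == eos_id and i + 1 < len(ids)]
    if len(starts) <= max_turns:
        return list(ids)
    return list(ids[starts[-max_turns]:])
```

Before concatenating, something like `ids = truncate_history(chat_history_ids[0].tolist(), trained_tokenizer.eos_token_id, max_turns=4)` followed by `chat_history_ids = torch.tensor([ids])` would keep the prompt length bounded.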


18. Let's Chat with Little Sheldon

In [21]:
# Test conversation with Sheldon
print("Let's chat with Sheldon!")
print("=" * 50)

# Test prompts
test_prompts = [
    "Hello Sheldon, how are you?",
    "What do you think about quantum physics?",
    "Tell me about your roommate Leonard",
    "Do you like comic books?",
    "What's your favorite spot on the couch?"
]

chat_history_ids = None

for prompt in test_prompts:
    print(f"\nUser: {prompt}")

    # Get Sheldon's response
    response, chat_history_ids = chat_with_sheldon(prompt, chat_history_ids)

    print(f"Sheldon: {response}")
    print("-" * 40)

Let's chat with Sheldon!

User: Hello Sheldon, how are you?
Sheldon: ‘Cause when I get back, it’ll be over.   No, you’re not.  That’s right, it will be.  Oh, that’d be a nice one. [SEP]  I think you‘re on to something.
----------------------------------------

User: What do you think about quantum physics?
Sheldon: 
----------------------------------------

User: Tell me about your roommate Leonard
Sheldon: 
----------------------------------------

User: Do you like comic books?
Sheldon: .
----------------------------------------

User: What's your favorite spot on the couch?
Sheldon: , uh, Leonard?
----------------------------------------
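The model reproduces the literal `[SEP]` marker and the runs of spaces from concatenated transcript lines. A small post-processing sketch that keeps only the text before the first marker; `clean_response` is hypothetical, and it only hides a symptom: the empty replies and the mismatch between the training format (`[SEP]`) and the inference format (EOS-separated turns) need a change to the training data instead.

```python
import re


def clean_response(text: str, marker: str = "[SEP]") -> str:
    """Keep the text before the first `marker` and normalize whitespace."""
    head = text.split(marker, 1)[0]
    # Collapse the runs of spaces left between concatenated transcript lines
    return re.sub(r"\s+", " ", head).strip()
```

In the test loop it would be applied as `print(f"Sheldon: {clean_response(response)}")`.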


## Improve the Model and Training Process

19. Improve the Hyperparameters

In [23]:
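# Improved config: more epochs and adjusted parameters
config.num_epochs = 3  # Increased from 1 to 3 (balances training time and quality)
config.learning_rate = 5e-5  # Keep the current learning rate
config.batch_size = 8  # Reduced from 16 to 8 (intended to be more stable)
# Note: train_loader was built with batch_size=16, so this change has no effect until
# the DataLoaders are recreated (hence 3060 steps per epoch and 9180 in total below)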

print("Updated config for better training:")
print(f"Epochs: {config.num_epochs}")
print(f"Learning rate: {config.learning_rate}")
print(f"Batch size: {config.batch_size}")
print(f"Total training steps: {len(train_loader) * config.num_epochs}")

Updated config for better training:
Epochs: 3
Learning rate: 5e-05
Batch size: 8
Total training steps: 9180
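The printed 9180 steps equal 3 × 3060, i.e. the loader still yields batches of 16. The arithmetic, as a sketch (`steps_per_epoch` is hypothetical; `DataLoader` defaults to `drop_last=False`):

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader yields per epoch."""
    if drop_last:
        return num_examples // batch_size  # the incomplete final batch is dropped
    return math.ceil(num_examples / batch_size)  # the final batch may be smaller


print(steps_per_epoch(48960, 16) * 3)  # 9180 (what the cell above printed)
print(steps_per_epoch(48960, 8) * 3)   # 18360 (what batch_size=8 would give)
```

For the new batch size to take effect, the loaders would need to be rebuilt before the scheduler is created, e.g. `train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True)` and likewise for `val_loader`.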


20. Set Up Training Again

In [24]:
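# Re-create the optimizer and scheduler with the updated config
print("Setting up training with improved config...")

# Move model to device. Note: `model` still holds the weights fine-tuned in step 15,
# so this run continues from that checkpoint rather than from the base DialoGPT model
model = model.to(device)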
print(f"Model moved to {device}")

# Set up optimizer
optimizer = AdamW(model.parameters(), lr=config.learning_rate)
print(f"Optimizer created with learning rate: {config.learning_rate}")

# Set up learning rate scheduler
total_steps = len(train_loader) * config.num_epochs
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,
    num_training_steps=total_steps
)
print(f"Learning rate scheduler created for {total_steps} total steps")

Setting up training with improved config...
Model moved to cuda
Optimizer created with learning rate: 5e-05
Learning rate scheduler created for 9180 total steps


21. Re-train

In [25]:
# Start training
print("Starting improved training...")
print("=" * 50)

model.train()
total_loss = 0
best_loss = float('inf')

for epoch in range(config.num_epochs):
    print(f"\nEpoch {epoch + 1}/{config.num_epochs}")
    epoch_loss = 0

    # Training
    progress_bar = tqdm(train_loader, desc=f"Training Epoch {epoch + 1}")
    for batch_idx, batch in enumerate(progress_bar):
        # Move batch to device
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        # Forward pass
        outputs = model(
            input_ids=input_ids,
            attention_mask=attention_mask,
            labels=labels
        )

        loss = outputs.loss
        epoch_loss += loss.item()

        # Backward pass
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()

        # Update progress bar
        progress_bar.set_postfix({'loss': f'{loss.item():.4f}'})

        # Print loss every 1000 steps
        if (batch_idx + 1) % 1000 == 0:
            print(f"Step {batch_idx + 1}, Loss: {loss.item():.4f}")

    avg_epoch_loss = epoch_loss / len(train_loader)
    total_loss += avg_epoch_loss

    print(f"Epoch {epoch + 1} completed. Average loss: {avg_epoch_loss:.4f}")

    # Save model checkpoint
    if avg_epoch_loss < best_loss:
        best_loss = avg_epoch_loss
        model.save_pretrained(f"{config.output_dir}/best_model_v2")
        tokenizer.save_pretrained(f"{config.output_dir}/best_model_v2")
        print(f"New best model saved with loss: {best_loss:.4f}")

print(f"\nTraining completed! Total average loss: {total_loss/config.num_epochs:.4f}")

Starting improved training...

Epoch 1/3



Step 1000, Loss: 0.5188
Step 2000, Loss: 0.5585
Step 3000, Loss: 0.5371
Epoch 1 completed. Average loss: 0.5450
New best model saved with loss: 0.5450

Epoch 2/3



Step 1000, Loss: 0.5097
Step 2000, Loss: 0.4595
Step 3000, Loss: 0.4291
Epoch 2 completed. Average loss: 0.4841
New best model saved with loss: 0.4841

Epoch 3/3



Step 1000, Loss: 0.4223
Step 2000, Loss: 0.4169
Step 3000, Loss: 0.3919
Epoch 3 completed. Average loss: 0.4549
New best model saved with loss: 0.4549

Training completed! Total average loss: 0.4946


22. Load the Model

In [26]:
# Load the improved trained model
print("Loading improved trained Sheldon model...")

# Load model and tokenizer from saved directory
improved_model = AutoModelForCausalLM.from_pretrained(f"{config.output_dir}/best_model_v2")
improved_tokenizer = AutoTokenizer.from_pretrained(f"{config.output_dir}/best_model_v2")

# Move to device
improved_model = improved_model.to(device)
improved_model.eval()  # Set to evaluation mode

print("Improved trained model loaded successfully!")

Loading improved trained Sheldon model...
Improved trained model loaded successfully!


23. Improved Chat Function for Little Sheldon

In [27]:
# Improved chat function with better generation parameters
def chat_with_sheldon_improved(user_input, chat_history_ids=None, max_length=150):
    """
    Improved chat function with better generation parameters
    """
    # Encode user input and move to device
    new_user_input_ids = improved_tokenizer.encode(user_input + improved_tokenizer.eos_token, return_tensors='pt')
    new_user_input_ids = new_user_input_ids.to(device)

    # Concatenate with chat history if exists
    if chat_history_ids is not None:
        chat_history_ids = chat_history_ids.to(device)
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids

    # Generate response with improved parameters
    chat_history_ids = improved_model.generate(
        bot_input_ids,
        max_length=max_length,
        pad_token_id=improved_tokenizer.eos_token_id,
        no_repeat_ngram_size=2,  # Reduce repetition
        do_sample=True,
        top_k=20,  # Lower top_k
        top_p=0.8,  # Lower top_p
        temperature=0.6,  # Lower temperature for more deterministic output
        repetition_penalty=1.2  # Add a repetition penalty
    )

    # Decode the response
    response = improved_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids

print("Improved chat function created successfully!")

Improved chat function created successfully!


24. Fixed Chat Function (max_new_tokens)

In [29]:
# Fixed chat function with better length handling
def chat_with_sheldon_fixed(user_input, chat_history_ids=None, max_new_tokens=100):
    """
    Fixed chat function with better length handling
    """
    # Encode user input and move to device
    new_user_input_ids = improved_tokenizer.encode(user_input + improved_tokenizer.eos_token, return_tensors='pt')
    new_user_input_ids = new_user_input_ids.to(device)

    # Concatenate with chat history if exists
    if chat_history_ids is not None:
        chat_history_ids = chat_history_ids.to(device)
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids

    # Generate response with fixed parameters
    chat_history_ids = improved_model.generate(
        bot_input_ids,
        max_new_tokens=max_new_tokens,  # Use max_new_tokens instead of max_length
        pad_token_id=improved_tokenizer.eos_token_id,
        no_repeat_ngram_size=2,
        do_sample=True,
        top_k=20,
        top_p=0.8,
        temperature=0.6,
        repetition_penalty=1.2
    )

    # Decode the response
    response = improved_tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)

    return response, chat_history_ids

print("Fixed chat function created successfully!")

Fixed chat function created successfully!


25. Chat

In [30]:
# Test conversation with fixed Sheldon
print("Let's chat with the fixed Sheldon!")
print("=" * 50)

# Test prompts
test_prompts = [
    "Hello Sheldon, how are you?",
    "What do you think about quantum physics?",
    "Tell me about your roommate Leonard",
    "Do you like comic books?",
    "What's your favorite spot on the couch?",
    "Bazinga!"
]

chat_history_ids = None

for prompt in test_prompts:
    print(f"\nUser: {prompt}")

    # Get Sheldon's response
    response, chat_history_ids = chat_with_sheldon_fixed(prompt, chat_history_ids)

    print(f"Sheldon: {response}")
    print("-" * 40)

Let's chat with the fixed Sheldon!

User: Hello Sheldon, how are you?
Sheldon: Knock on door.  Oh! I’m sorry to hear that… (entering) Hello Penny and Leonard. We just wanted a little privacy before we go in the hallway for an hour without my phone calling me back home from work today at eight o ‘clock: 00 pst., and no one has told us yet if there will be any surprises after nine o clock or ten ooh-oohs till dawn until then. And last but not least it would help
----------------------------------------

User: What do you think about quantum physics?
Sheldon:  The apartment building lobby is playing Rock Band with live music samples synced up through walls as they descend into song of Doom by Nine In Diamond. There goes your night cap again…. [SEP]  Hi guys what brings everybody here tonight so early this morning while Howard was out doing laundry he got bored driving around town like some kinda crazy person who lives across his hall singing Howie Mandelbrot backwards all over him. It was