<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_02_2_multi_prompt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 2: Code Generation**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 2 Material

* Part 2.1: Prompting for Code Generation [[Video]](https://www.youtube.com/watch?v=HVId6kYKKgQ) [[Notebook]](t81_559_class_02_1_dev.ipynb)
* **Part 2.2: Handling Revision Prompts** [[Video]](https://www.youtube.com/watch?v=APpV46tplXA) [[Notebook]](t81_559_class_02_2_multi_prompt.ipynb)
* Part 2.3: Using a LLM to Help Debug [[Video]](https://www.youtube.com/watch?v=VPqSNb38QK0) [[Notebook]](t81_559_class_02_3_llm_debug.ipynb)
* Part 2.4: Tracking Prompts in Software Development [[Video]](https://www.youtube.com/watch?v=oUFUuYfvXZU) [[Notebook]](t81_559_class_02_4_software_eng.ipynb)
* Part 2.5: Limits of LLM Code Generation [[Video]](https://www.youtube.com/watch?v=dKtRI0LZSyY) [[Notebook]](t81_559_class_02_5_code_gen_limits.ipynb)


# Google CoLab Instructions

The following code detects whether the notebook is running in Google CoLab. If so, it loads the OpenAI API key from the CoLab secrets and installs the needed libraries.

In [1]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai

Note: using Google CoLab
Collecting langchain_openai
  Downloading langchain_openai-0.3.30-py3-none-any.whl.metadata (2.4 kB)
Downloading langchain_openai-0.3.30-py3-none-any.whl (74 kB)
Installing collected packages: langchain_openai
Successfully installed langchain_openai-0.3.30


# 2.2: Handling Revision Prompts

Previously, we sent a single prompt to the LLM, which generated code. Code generation can also be conversational. In this part, we will see how to converse with the LLM to request changes to the generated code and even help the LLM produce a more accurate model.

We will also see that it can be beneficial to recreate your conversation as a single prompt that generates the final result. Tracking the one prompt that created your final code is more maintainable than tracking an entire conversation.

## Conversational Code Generation

We will introduce a more advanced code generation function that allows you to start the conversation to generate code and follow up with additional prompts if needed.

In future modules, we will see how to create chatbots similar to this one. For now, use the code provided below to generate code. This generator uses a system prompt that requests that the generated code conform to the following:

* Imports should be sorted
* Code should conform to PEP-8 formatting
* Notes that are not valid Python should not be mixed into the code
* Add comments

In [2]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_openai import ChatOpenAI
from langchain_core.prompts.chat import PromptTemplate
from IPython.display import display_markdown

MODEL = 'gpt-5-mini'
TEMPLATE = """The following is a friendly conversation between a human and an
AI to generate Python code. If you have notes about the code, place them before
the code. Any notes about execution should follow the code. If you do mix any
notes with the code, make them comments. Add proper comments to the code.
Sort imports and follow PEP-8 formatting.

Current conversation:
{history}
Human: {input}
Code Assistant:"""
PROMPT_TEMPLATE = PromptTemplate(input_variables=["history", "input"], template=TEMPLATE)

def start_conversation():
    # Initialize the OpenAI LLM with your API key
    llm = ChatOpenAI(
        model=MODEL,
        temperature=0.0,
        n=1
    )

    # Initialize memory and conversation
    memory = ConversationBufferWindowMemory()
    conversation = ConversationChain(
        prompt=PROMPT_TEMPLATE,
        llm=llm,
        memory=memory,
        verbose=False
    )

    return conversation

def generate_code(conversation, prompt):
    print("Model response:")
    output = conversation.invoke(prompt)
    display_markdown(output['response'], raw=True)
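
To make the role of `{history}` and `{input}` in the template concrete, here is a minimal sketch, in plain Python with no LangChain dependency, of how a windowed buffer might render earlier turns into a prompt. The names `SKETCH_TEMPLATE`, `WINDOW_TURNS`, and `render_prompt`, the shortened template, and the `Human:`/`AI:` prefixes are assumptions for illustration; this is not LangChain's actual implementation.

```python
# Illustrative only: mimics how a windowed conversation buffer could fill a
# prompt template. This is not LangChain's implementation.
SKETCH_TEMPLATE = """Current conversation:
{history}
Human: {input}
Code Assistant:"""

WINDOW_TURNS = 2  # hypothetical window size: keep only the last 2 exchanges


def render_prompt(turns, new_input, window=WINDOW_TURNS):
    """Render the template from (human, ai) turn pairs and a new prompt."""
    # Keep only the most recent `window` exchanges.
    recent = turns[-window:] if window > 0 else []
    history = "\n".join(f"Human: {h}\nAI: {a}" for h, a in recent)
    return SKETCH_TEMPLATE.format(history=history, input=new_input)


turns = [
    ("Write XOR code.", "<first code listing>"),
    ("Use nn.Sequential.", "<second code listing>"),
    ("The output looks wrong.", "<third code listing>"),
]
rendered = render_prompt(turns, "Suggest a single prompt.")
print(rendered)
```

With a windowed memory, the oldest exchanges eventually drop out of the rendered prompt. In this sketch the first turn is no longer visible to the model once the window of two is exceeded.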


## First Attempt at an XOR Approximator

We will construct a prompt that requests the LLM to generate a PyTorch neural network to approximate the [Exclusive Or](https://en.wikipedia.org/wiki/Exclusive_or). The truth table for the Exclusive Or (XOR) function is provided here:

```
0 XOR 0 = 0
1 XOR 0 = 1
0 XOR 1 = 1
1 XOR 1 = 0
```

If given data, neural networks can learn to approximate functions, so let's create a PyTorch neural network to approximate the XOR function.
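
As a quick check of the truth table, the same four rows can be produced with Python's bitwise XOR operator (`^`). This is a small sketch; the variable names `xor_rows`, `xor_inputs`, and `xor_labels` are chosen here for illustration and do not appear in the generated code.

```python
# Build the four XOR rows with Python's bitwise XOR operator (^).
xor_rows = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]

# Split into the float input/label lists a training script would start from.
xor_inputs = [[float(a), float(b)] for a, b, _ in xor_rows]
xor_labels = [[float(y)] for _, _, y in xor_rows]

for a, b, y in xor_rows:
    print(f"{a} XOR {b} = {y}")
```

The resulting lists have the same shape as the four-row input and label tensors that a PyTorch training script for XOR would typically build.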

In [3]:
conversation = start_conversation()
generate_code(conversation, """Write Python code to learn the XOR function with PyTorch.""")


Model response:


Notes about the code:
- This example trains a small feedforward network to learn XOR using PyTorch.
- It uses a 2-neuron hidden layer with tanh activation and BCEWithLogitsLoss (stable sigmoid+BCELoss).
- Matplotlib is used to optionally plot the decision boundary; it's not required to train.

# Python code (save as xor_pytorch.py and run)
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


# Small neural network for XOR: 2 -> 2 -> 1
class XORNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Use a hidden layer with 2 units and tanh activation (classic for XOR)
        self.net = nn.Sequential(
            nn.Linear(2, 2),  # input to hidden
            nn.Tanh(),        # nonlinearity
            nn.Linear(2, 1)   # hidden to output (logit)
        )

    def forward(self, x):
        return self.net(x)


def main():
    # Reproducibility
    torch.manual_seed(0)
    np.random.seed(0)

    # Use GPU if available (not required)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # XOR dataset: four inputs and labels
    inputs = torch.tensor([[0.0, 0.0],
                           [0.0, 1.0],
                           [1.0, 0.0],
                           [1.0, 1.0]],
                          device=device)
    labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]], device=device)

    # Model, loss and optimizer
    model = XORNet().to(device)
    loss_fn = nn.BCEWithLogitsLoss()  # stable combination of sigmoid + BCELoss
    optimizer = optim.Adam(model.parameters(), lr=0.1)

    # Training loop
    n_epochs = 5000
    for epoch in range(1, n_epochs + 1):
        model.train()
        optimizer.zero_grad()

        logits = model(inputs)               # raw scores (logits)
        loss = loss_fn(logits, labels)       # compute loss
        loss.backward()                      # backpropagate
        optimizer.step()                     # update weights

        # Print progress occasionally
        if epoch % 500 == 0 or epoch == 1:
            with torch.no_grad():
                probs = torch.sigmoid(logits)
                preds = (probs > 0.5).float()
                accuracy = (preds == labels).float().mean().item()
            print(f"Epoch {epoch:4d}  Loss: {loss.item():.6f}  Accuracy: {accuracy * 100:.1f}%")

    # Evaluation on the 4 XOR inputs
    model.eval()
    with torch.no_grad():
        logits = model(inputs)
        probs = torch.sigmoid(logits).cpu().numpy().flatten()
        preds = (probs > 0.5).astype(int)

    print("\nInputs | Probability | Pred")
    for inp, p, pr in zip(inputs.cpu().numpy(), probs, preds):
        print(f"{inp} -> {p:.4f} -> {pr}")

    # Optional: plot decision boundary (requires matplotlib)
    try:
        # Create a grid over input space
        grid_x = np.linspace(-0.5, 1.5, 200)
        grid_y = np.linspace(-0.5, 1.5, 200)
        xx, yy = np.meshgrid(grid_x, grid_y)
        grid = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(np.float32)

        # Evaluate model on grid
        model.eval()
        with torch.no_grad():
            grid_t = torch.from_numpy(grid).to(device)
            logits_grid = model(grid_t)
            probs_grid = torch.sigmoid(logits_grid).cpu().numpy().reshape(xx.shape)

        # Plot
        plt.figure(figsize=(5, 4))
        plt.contourf(xx, yy, probs_grid, levels=50, cmap="RdYlBu", alpha=0.8)
        plt.colorbar(label="P(y=1)")
        # Plot training points
        xs = inputs.cpu().numpy()
        ys = labels.cpu().numpy().flatten()
        plt.scatter(xs[:, 0], xs[:, 1], c=ys, cmap="bwr", edgecolor="k", s=80)
        plt.title("Learned XOR decision surface (probability)")
        plt.xlabel("x1")
        plt.ylabel("x2")
        plt.xlim(-0.5, 1.5)
        plt.ylim(-0.5, 1.5)
        plt.show()
    except Exception:
        # If plotting fails (e.g., matplotlib not available), skip silently
        pass


if __name__ == "__main__":
    main()

Execution notes:
- Requires PyTorch. Install with pip if needed, e.g.:
  - pip install torch torchvision    (or follow instructions at pytorch.org for the correct package for your platform)
- To get the plotting output ensure matplotlib is installed: pip install matplotlib
- Save the code to a file (e.g., xor_pytorch.py) and run: python xor_pytorch.py
- Expected outcome: the network should reach ~100% accuracy on the four XOR inputs and the printed probabilities should be near 0 or 1. Adjust learning rate, optimizer, or epochs if training fails to converge.

## Requesting a Change to Generated Code

If you've taken my other course, you will know that I prefer PyTorch sequences (`nn.Sequential`) over extending the `nn.Module` class, at least for simple neural networks like an XOR approximator. LLMs do not share this opinion. However, the LLM will gladly humor me and generate a sequence. Here, I send a follow-up prompt requesting this change rather than resubmitting a modified version of my first prompt.

In [4]:
generate_code(conversation, """Could you make use of a PyTorch sequence rather than a nn.Module class?""")

Model response:


Notes about the code:
- This example trains a small feedforward network to learn XOR using PyTorch.
- It uses nn.Sequential (a PyTorch sequence) instead of defining a custom nn.Module subclass.
- The network is 2 -> 2 -> 1 with Tanh activation in the hidden layer and BCEWithLogitsLoss.
- Matplotlib is optionally used to plot the learned decision surface.

# Python code (save as xor_pytorch_seq.py and run)
```python
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


# Train a small network for XOR using nn.Sequential (no custom class)
def main():
    # Reproducibility
    torch.manual_seed(0)
    np.random.seed(0)

    # Use GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # XOR dataset: four inputs and labels (float tensors)
    inputs = torch.tensor(
        [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
        dtype=torch.float32,
        device=device,
    )
    labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]],
                          dtype=torch.float32,
                          device=device)

    # Build the model using nn.Sequential (2 -> 2 -> 1)
    model = nn.Sequential(
        nn.Linear(2, 2),  # input to hidden
        nn.Tanh(),        # nonlinearity
        nn.Linear(2, 1)   # hidden to output (logit)
    ).to(device)

    # Loss and optimizer
    loss_fn = nn.BCEWithLogitsLoss()  # combines sigmoid + BCE in a numerically stable way
    optimizer = optim.Adam(model.parameters(), lr=0.1)

    # Training loop
    n_epochs = 5000
    for epoch in range(1, n_epochs + 1):
        model.train()
        optimizer.zero_grad()

        logits = model(inputs)            # raw output (logits)
        loss = loss_fn(logits, labels)    # compute loss
        loss.backward()                   # backpropagate
        optimizer.step()                  # update weights

        # Print progress occasionally
        if epoch % 500 == 0 or epoch == 1:
            with torch.no_grad():
                probs = torch.sigmoid(logits)
                preds = (probs > 0.5).float()
                accuracy = (preds == labels).float().mean().item()
            print(f"Epoch {epoch:4d}  Loss: {loss.item():.6f}  Accuracy: {accuracy * 100:.1f}%")

    # Evaluation on the 4 XOR inputs
    model.eval()
    with torch.no_grad():
        logits = model(inputs)
        probs = torch.sigmoid(logits).cpu().numpy().flatten()
        preds = (probs > 0.5).astype(int)

    print("\nInputs | Probability | Pred")
    for inp, p, pr in zip(inputs.cpu().numpy(), probs, preds):
        print(f"{inp} -> {p:.4f} -> {pr}")

    # Optional: plot decision boundary (requires matplotlib)
    try:
        # Create a grid over input space
        grid_x = np.linspace(-0.5, 1.5, 200)
        grid_y = np.linspace(-0.5, 1.5, 200)
        xx, yy = np.meshgrid(grid_x, grid_y)
        grid = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(np.float32)

        # Evaluate model on the grid
        model.eval()
        with torch.no_grad():
            grid_t = torch.from_numpy(grid).to(device)
            logits_grid = model(grid_t)
            probs_grid = torch.sigmoid(logits_grid).cpu().numpy().reshape(xx.shape)

        # Plot probability surface and training points
        plt.figure(figsize=(5, 4))
        plt.contourf(xx, yy, probs_grid, levels=50, cmap="RdYlBu", alpha=0.8)
        plt.colorbar(label="P(y=1)")
        xs = inputs.cpu().numpy()
        ys = labels.cpu().numpy().flatten()
        plt.scatter(xs[:, 0], xs[:, 1], c=ys, cmap="bwr", edgecolors="k", s=80)
        plt.title("Learned XOR decision surface (probability)")
        plt.xlabel("x1")
        plt.ylabel("x2")
        plt.xlim(-0.5, 1.5)
        plt.ylim(-0.5, 1.5)
        plt.show()
    except Exception as exc:  # If plotting fails, report and continue
        print("Plot skipped:", exc)


if __name__ == "__main__":
    main()
```

Execution notes:
- Requires PyTorch. Install with pip if needed:
  - pip install torch torchvision    (or follow instructions at pytorch.org for the correct package for your platform)
- To see the plotting output ensure matplotlib is installed: pip install matplotlib
- Save the code to a file (e.g., xor_pytorch_seq.py) and run: python xor_pytorch_seq.py
- Expected outcome: the network should reach ~100% accuracy on the four XOR inputs and the printed probabilities should be near 0 or 1. Adjust learning rate, optimizer, or epochs if training fails to converge.

## Testing the Generated Code

LLMs are not overachievers; they will implement the code you ask for and not much more. When we run the first version of the XOR approximator, the results are accurate only some of the time, which becomes apparent if we run the program multiple times.

In [5]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim

# Define the XOR network using a sequential container
model = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1),
    nn.Sigmoid()
)

# Initialize the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Training data for XOR
data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# Train the model
for epoch in range(10000):
    # Forward pass: Compute predicted y by passing x to the model
    pred = model(data)

    # Compute and print loss
    loss = criterion(pred, labels)
    if epoch % 1000 == 0:
        print(f'Epoch {epoch} Loss: {loss.item()}')

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Test the model
with torch.no_grad():
    test_pred = model(data)
    print("Predicted values:")
    print(test_pred)

Epoch 0 Loss: 0.2511032819747925
Epoch 1000 Loss: 0.24947082996368408
Epoch 2000 Loss: 0.2487175017595291
Epoch 3000 Loss: 0.24651208519935608
Epoch 4000 Loss: 0.24037021398544312
Epoch 5000 Loss: 0.22664645314216614
Epoch 6000 Loss: 0.2073187381029129
Epoch 7000 Loss: 0.19236287474632263
Epoch 8000 Loss: 0.18179592490196228
Epoch 9000 Loss: 0.1654074341058731
Predicted values:
tensor([[0.1578],
        [0.6766],
        [0.6521],
        [0.5166]])


If you receive an error, or the output is not what you expected, it is effective to provide that output and any errors to the LLM. Here, we paste the predicted values from a run and ask the LLM whether they look correct. Sometimes the LLM may insist that the output is correct, in which case you must "debate" the LLM by providing additional details.
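
If you find yourself sending program output back to the LLM repeatedly, a small helper can assemble the follow-up prompt. The sketch below is one possible approach; `build_feedback_prompt` is a hypothetical helper, not part of LangChain or of the code above, and the wording of the prompt is only an example.

```python
def build_feedback_prompt(output_text, error_text=None):
    """Assemble a follow-up prompt from captured output (hypothetical helper)."""
    parts = ["The output was:", "", output_text.strip()]
    if error_text:
        # Include the error or traceback only when one was captured.
        parts += ["", "The program also raised this error:", "", error_text.strip()]
    parts += ["", "Are you sure that is correct?"]
    return "\n".join(parts)


captured = """Predicted values:
tensor([[0.4843],
        [0.5800],
        [0.4278],
        [0.4623]])"""

feedback = build_feedback_prompt(captured)
print(feedback)
```

The resulting string could then be passed as the prompt argument to `generate_code` along with the existing `conversation` object.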

In [6]:
generate_code(conversation, """The output was:

Predicted values:
tensor([[0.4843],
        [0.5800],
        [0.4278],
        [0.4623]])

Are you sure that is correct?
""")

Model response:


Notes about the code:
- Your reported outputs (~0.4–0.58) mean the network didn't converge to the XOR solution. This can happen with small networks and unlucky initial weights or optimizer settings.
- Remedies included here:
  - Use a slightly larger hidden layer (4 units) so it's easier to train.
  - Reinitialize weights using Xavier/Glorot initialization (breaks symmetry).
  - Run a few short restarts with different seeds if a run gets stuck.
  - Lower the learning rate a bit and include early stopping when accuracy hits 100%.

# Python code (save as xor_pytorch_seq_retry.py and run)
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim


def build_model(hidden_size: int = 4, device: torch.device = torch.device("cpu")) -> nn.Sequential:
    """Create an nn.Sequential model for XOR (2 -> hidden_size -> 1)
    and initialize linear layers with Xavier uniform initialization.
    """
    model = nn.Sequential(
        nn.Linear(2, hidden_size),  # input -> hidden
        nn.Tanh(),                  # classic XOR hidden nonlinearity
        nn.Linear(hidden_size, 1),  # hidden -> logit output
    ).to(device)

    # Xavier/Glorot initialization for better starting weights
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
    return model


def train_model(
    model: nn.Sequential,
    inputs: torch.Tensor,
    labels: torch.Tensor,
    lr: float = 0.05,
    n_epochs: int = 5000,
) -> tuple[nn.Sequential, float]:
    """Train model and return (model, final_accuracy)."""
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)

    for epoch in range(1, n_epochs + 1):
        model.train()
        optimizer.zero_grad()

        logits = model(inputs)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()

        # Early stop if solved
        if epoch % 200 == 0 or epoch == 1:
            with torch.no_grad():
                probs = torch.sigmoid(logits)
                preds = (probs > 0.5).float()
                accuracy = (preds == labels).float().mean().item()
            if accuracy == 1.0:
                # solved; break early
                break
    # final accuracy
    with torch.no_grad():
        logits = model(inputs)
        probs = torch.sigmoid(logits)
        preds = (probs > 0.5).float()
        accuracy = (preds == labels).float().mean().item()
    return model, accuracy


def main():
    # Reproducibility base seed (we still try a few randomized restarts)
    base_seed = 0
    np.random.seed(base_seed)

    # Choose device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # XOR dataset
    inputs = torch.tensor(
        [[0.0, 0.0],
         [0.0, 1.0],
         [1.0, 0.0],
         [1.0, 1.0]],
        dtype=torch.float32,
        device=device,
    )
    labels = torch.tensor(
        [[0.0], [1.0], [1.0], [0.0]],
        dtype=torch.float32,
        device=device,
    )

    # Training hyperparameters
    attempts = 5           # number of random restarts
    n_epochs = 5000        # epochs per attempt
    lr = 0.05              # learning rate
    hidden_size = 4        # hidden units (increase from 2 to make convergence easier)

    final_model = None
    final_accuracy = 0.0

    # Try a few random restarts with different seeds if a run gets stuck
    for attempt in range(1, attempts + 1):
        seed = base_seed + attempt
        torch.manual_seed(seed)

        model = build_model(hidden_size=hidden_size, device=device)
        model, accuracy = train_model(model, inputs, labels, lr=lr, n_epochs=n_epochs)

        print(f"Attempt {attempt}/{attempts}  Seed {seed}  Accuracy: {accuracy * 100:.1f}%")

        final_model = model
        final_accuracy = accuracy
        if accuracy == 1.0:
            break  # solved

    # Show final predictions (probabilities and binary predictions)
    final_model.eval()
    with torch.no_grad():
        logits = final_model(inputs)
        probs = torch.sigmoid(logits).cpu().numpy().flatten()
        preds = (probs > 0.5).astype(int)

    print("\nInputs | Probability | Pred")
    for inp, p, pr in zip(inputs.cpu().numpy(), probs, preds):
        print(f"{inp} -> {p:.4f} -> {pr}")

    # Optional: plot decision boundary
    try:
        grid_x = np.linspace(-0.5, 1.5, 200)
        grid_y = np.linspace(-0.5, 1.5, 200)
        xx, yy = np.meshgrid(grid_x, grid_y)
        grid = np.stack([xx.ravel(), yy.ravel()], axis=1).astype(np.float32)

        final_model.eval()
        with torch.no_grad():
            grid_t = torch.from_numpy(grid).to(device)
            logits_grid = final_model(grid_t)
            probs_grid = torch.sigmoid(logits_grid).cpu().numpy().reshape(xx.shape)

        plt.figure(figsize=(5, 4))
        plt.contourf(xx, yy, probs_grid, levels=50, cmap="RdYlBu", alpha=0.8)
        plt.colorbar(label="P(y=1)")
        xs = inputs.cpu().numpy()
        ys = labels.cpu().numpy().flatten()
        plt.scatter(xs[:, 0], xs[:, 1], c=ys, cmap="bwr", edgecolors="k", s=80)
        plt.title("Learned XOR decision surface (probability)")
        plt.xlabel("x1")
        plt.ylabel("x2")
        plt.xlim(-0.5, 1.5)
        plt.ylim(-0.5, 1.5)
        plt.show()
    except Exception as exc:
        print("Plot skipped:", exc)


if __name__ == "__main__":
    main()

Execution notes:
- Requires PyTorch (pip install torch torchvision) and optionally matplotlib (pip install matplotlib).
- Save as xor_pytorch_seq_retry.py and run: python xor_pytorch_seq_retry.py
- If you still see probabilities near 0.5:
  - Try increasing hidden_size (e.g., to 8), increasing epochs, or running more restarts.
  - You can also lower the learning rate further (e.g., 0.01) or try SGD with momentum instead of Adam.

## Test the Improved Version

We now receive much more accurate output when we test the neural network provided.

In [7]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim

# Define the XOR network using a sequential container
model = nn.Sequential(
    nn.Linear(2, 4),  # Increased the number of neurons in the hidden layer
    nn.Sigmoid(),
    nn.Linear(4, 1),
    nn.Sigmoid()
)

# Initialize the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.1)  # Changed to Adam optimizer

# Training data for XOR
data = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
labels = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

# Train the model
for epoch in range(20000):  # Increased the number of epochs
    # Forward pass: Compute predicted y by passing x to the model
    pred = model(data)

    # Compute and print loss
    loss = criterion(pred, labels)
    if epoch % 1000 == 0:
        print(f'Epoch {epoch} Loss: {loss.item()}')

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Test the model
with torch.no_grad():
    test_pred = model(data)
    print("Predicted values:")
    print(test_pred)

Epoch 0 Loss: 0.2703022062778473
Epoch 1000 Loss: 7.112702587619424e-05
Epoch 2000 Loss: 2.0544111976050772e-05
Epoch 3000 Loss: 8.988646186480764e-06
Epoch 4000 Loss: 4.589944182953332e-06
Epoch 5000 Loss: 2.524477849874529e-06
Epoch 6000 Loss: 1.4446948171098484e-06
Epoch 7000 Loss: 8.453595228274935e-07
Epoch 8000 Loss: 5.010292056795151e-07
Epoch 9000 Loss: 2.991056078371912e-07
Epoch 10000 Loss: 1.7932310925061756e-07
Epoch 11000 Loss: 1.0772929925906283e-07
Epoch 12000 Loss: 6.481739944774745e-08
Epoch 13000 Loss: 3.9054452116715765e-08
Epoch 14000 Loss: 2.3564556528299363e-08
Epoch 15000 Loss: 1.4229989631076023e-08
Epoch 16000 Loss: 8.607841550656303e-09
Epoch 17000 Loss: 5.218104170978677e-09
Epoch 18000 Loss: 3.1716662629577286e-09
Epoch 19000 Loss: 1.9367694115146605e-09
Predicted values:
tensor([[1.5428e-05],
        [9.9997e-01],
        [9.9997e-01],
        [5.1830e-05]])
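
The two sets of predicted values shown in this part can be checked against the XOR labels by thresholding at 0.5. The following is a minimal sketch in plain Python; the helper name `matches_xor` and the 0.5 threshold are assumptions made for illustration.

```python
XOR_EXPECTED = [0, 1, 1, 0]  # labels for inputs (0,0), (0,1), (1,0), (1,1)


def matches_xor(predictions, threshold=0.5):
    """Return True if the thresholded predictions equal the XOR labels."""
    rounded = [1 if p > threshold else 0 for p in predictions]
    return rounded == XOR_EXPECTED


# Values copied from the two test runs shown in this part.
first_run = [0.1578, 0.6766, 0.6521, 0.5166]
improved_run = [1.5428e-05, 9.9997e-01, 9.9997e-01, 5.1830e-05]

print(matches_xor(first_run))     # False: the last value rounds to 1
print(matches_xor(improved_run))  # True
```

A check like this makes "accurate" explicit: the first run fails only on the last input, where 0.5166 falls just above the threshold.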


## Combining the Conversation into a Single Prompt

We should combine this entire conversation into a single prompt, especially if we wish to save the prompt along with the code. We can request the LLM to create this combined prompt for us.
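
One simple way to keep a prompt together with the code it produced is to write the prompt as a comment header at the top of the generated file. The sketch below illustrates the idea with the standard library only; `save_code_with_prompt`, the header wording, and the file name are hypothetical choices, not a convention established by this course.

```python
import tempfile
from pathlib import Path


def save_code_with_prompt(path, prompt, code):
    """Write generated code to `path`, preceded by the prompt as comments."""
    header = ["# Generated from the following prompt:"]
    header += [f"#   {line}".rstrip() for line in prompt.strip().splitlines()]
    text = "\n".join(header) + "\n\n" + code.rstrip() + "\n"
    Path(path).write_text(text, encoding="utf-8")


example_prompt = "Write Python code to learn the XOR function with PyTorch."
example_code = "print('generated code goes here')"  # placeholder for real output

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "xor_generated.py"
    save_code_with_prompt(target, example_prompt, example_code)
    saved_text = target.read_text(encoding="utf-8")

print(saved_text)
```

Storing a single prompt this way is only practical once the conversation has been consolidated, which is the motivation for the step that follows.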

In [8]:
generate_code(conversation, """Okay, that is great, can you suggest a single
prompt that would have resulted in this last code output?""")

Model response:


Write a single Python script (save as xor_pytorch_seq_retry.py) that trains a small feedforward network to learn XOR using PyTorch. Follow these exact requirements:

- Place brief notes about the code before the code block and place execution notes after the code block. If any notes must be mixed with the code, make them Python comments.
- Do NOT surround the code with Markdown fences in the output.
- Sort imports alphabetically and follow PEP 8 formatting.
- Use nn.Sequential (no custom nn.Module subclass) to build the model. The model should be 2 -> hidden_size -> 1, where hidden_size defaults to 4.
- Use nn.Tanh for the hidden activation and use BCEWithLogitsLoss for the loss (i.e., logits + stable BCE).
- Initialize all nn.Linear layers using Xavier/Glorot uniform initialization and zero biases.
- Provide these functions with type hints and clear comments:
  - build_model(hidden_size: int = 4, device: torch.device = torch.device("cpu")) -> nn.Sequential
  - train_model(model: nn.Sequential, inputs: torch.Tensor, labels: torch.Tensor, lr: float = 0.05, n_epochs: int = 5000) -> tuple[nn.Sequential, float]
  - main()
- In main():
  - Use a base_seed and attempt multiple random restarts (attempts = 5). For each attempt set torch.manual_seed(base_seed + attempt).
  - Create the XOR dataset as 4 float32 input vectors and labels.
  - For each attempt, build the model, train it, and print: Attempt i/N  Seed S  Accuracy: XX.X%
  - Stop early for an attempt if accuracy reaches 100% (early stopping).
  - After all attempts or when solved, print the final 4 inputs with probabilities and binary predictions.
- Training details:
  - Use Adam optimizer with lr default 0.05.
  - Train up to n_epochs per attempt (default 5000) but break early if solved.
  - During training occasionally check for accuracy (e.g., every 200 epochs) and break when accuracy == 1.0.
- Add an optional decision-boundary plot using matplotlib:
  - Evaluate the model on a grid in [-0.5, 1.5]^2 and contourf the probability surface.
  - Plot the 4 training points on top.
  - Wrap plotting in try/except and print "Plot skipped: <exception>" if it fails.
- After the code, include execution notes that mention how to install PyTorch and matplotlib (pip install torch torchvision, pip install matplotlib), how to run the script, and suggestions if the network still outputs probabilities near 0.5 (e.g., increase hidden_size, more restarts, lower lr, change optimizer).
- Use clear inline comments in the code and keep the code concise and readable.

Produce only the prompt above (no additional explanation).

The LLM's attempt at a consolidated prompt is incomplete. It skips several important details and does not provide precise requirements. I will manually make some improvements, which you can see here.

In [9]:
# Start a new conversation
conversation = start_conversation()
generate_code(conversation, """
Can you provide Python code using PyTorch to effectively learn the XOR function
with 4 hidden neurons, using the Adam optimizer, and 20K training epochs?
Use a sequence not a nn.Module class.""")

Model response:


Notes:
- The script trains a small feedforward neural network (nn.Sequential) to learn the XOR function.
- Network: Linear(2 -> 4) -> Sigmoid -> Linear(4 -> 1) -> Sigmoid (4 hidden neurons).
- Optimizer: Adam, 20_000 epochs, binary cross-entropy loss (BCELoss).
- A random seed is set for reproducibility.

```python
# Train a small feedforward network (nn.Sequential) to learn XOR using PyTorch.
# Network: 2 inputs -> 4 hidden neurons (sigmoid) -> 1 output (sigmoid).
# Optimizer: Adam. Loss: BCELoss. Training epochs: 20_000.

import torch
import torch.nn as nn
import torch.optim as optim

# Reproducibility
torch.manual_seed(0)

# Device (using CPU here)
device = torch.device("cpu")

# XOR dataset: inputs and targets
# Inputs shape: (4, 2), Targets shape: (4, 1)
inputs = torch.tensor(
    [[0.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0],
     [1.0, 1.0]],
    dtype=torch.float32,
    device=device,
)
targets = torch.tensor(
    [[0.0], [1.0], [1.0], [0.0]],
    dtype=torch.float32,
    device=device,
)

# Define the model using nn.Sequential (sequence, not a subclass of nn.Module)
model = nn.Sequential(
    nn.Linear(2, 4),  # input -> 4 hidden neurons
    nn.Sigmoid(),     # activation for hidden layer
    nn.Linear(4, 1),  # hidden -> output
    nn.Sigmoid(),     # output activation (probability)
).to(device)

# Loss function and optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Training loop
num_epochs = 20_000
print_every = 2_000

for epoch in range(1, num_epochs + 1):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Backward pass and optimization step
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Periodically print progress
    if epoch % print_every == 0 or epoch == 1:
        print(f"Epoch {epoch:5d}/{num_epochs} - Loss: {loss.item():.6f}")

# Evaluate trained model
with torch.no_grad():
    final_outputs = model(inputs)
    predicted_probs = final_outputs.view(-1)  # flatten to (4,)
    predicted_labels = (predicted_probs >= 0.5).long()
    targets_long = targets.view(-1).long()
    accuracy = (predicted_labels == targets_long).float().mean().item()

    print("\nFinal outputs (probabilities):")
    for inp, prob, lbl in zip(inputs, predicted_probs, predicted_labels):
        print(f"  Input: {inp.tolist()} -> Prob: {prob:.4f}  Pred: {int(lbl.item())}")

    print(f"\nAccuracy on XOR dataset: {accuracy * 100:.1f}%")
```

Execution notes:
- Requires PyTorch to be installed (pip install torch).
- Training on CPU for 20k epochs is quick for this tiny model (a few seconds).
- If you want faster convergence or different behavior, try changing the learning rate (lr) or using nn.Tanh activations. If you prefer to avoid explicit Sigmoid at the output, use nn.BCEWithLogitsLoss and remove the final Sigmoid.

## Test the Final Prompt

Now, we test the final prompt. My prompt produces an acceptable result, but there are some opportunities for improvement. You can specify the exact format for the output. For example, sometimes code is generated to round the results, but other times it is not.
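
As one illustration of pinning down the output format, the prompt could be extended with an explicit requirement. In the sketch below, `BASE_PROMPT` repeats the prompt used earlier in this part, while `FORMAT_REQUIREMENT` is a hypothetical addition; its exact wording is an assumption and has not been tested against the model.

```python
# The prompt used earlier in this part.
BASE_PROMPT = (
    "Can you provide Python code using PyTorch to effectively learn the XOR "
    "function with 4 hidden neurons, using the Adam optimizer, and 20K "
    "training epochs? Use a sequence not a nn.Module class."
)

# Hypothetical extra requirement that fixes the output format.
FORMAT_REQUIREMENT = (
    "After training, print one line per input in the form "
    "'a XOR b = y', where y is the prediction rounded to 0 or 1."
)

final_prompt = f"{BASE_PROMPT}\n{FORMAT_REQUIREMENT}"
print(final_prompt)
```

Stating the format in the prompt is intended to reduce the run-to-run variation between rounded and unrounded output.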

In [10]:
import torch
import torch.nn as nn
import torch.optim as optim

# Define the XOR inputs and outputs
inputs = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float)
targets = torch.tensor([[0], [1], [1], [0]], dtype=torch.float)

# Define the model using a sequential container
model = nn.Sequential(
    nn.Linear(2, 4),  # Input layer to hidden layer with 4 neurons
    nn.ReLU(),        # ReLU activation function
    nn.Linear(4, 1),  # Hidden layer to output layer
    nn.Sigmoid()      # Sigmoid activation function for binary output
)

# Define the loss function and the optimizer
criterion = nn.BCELoss()  # Binary Cross-Entropy Loss
optimizer = optim.Adam(model.parameters(), lr=0.01)  # Adam optimizer with learning rate of 0.01

# Training loop
for epoch in range(20000):  # 20,000 training epochs
    optimizer.zero_grad()   # Clear gradients for each training step
    outputs = model(inputs)  # Forward pass: compute predicted outputs by passing inputs to the model
    loss = criterion(outputs, targets)  # Compute loss
    loss.backward()  # Backward pass: compute gradient of the loss with respect to model parameters
    optimizer.step()  # Perform a single optimization step (parameter update)

    if (epoch + 1) % 1000 == 0:
        print(f'Epoch [{epoch + 1}/20000], Loss: {loss.item():.4f}')

# Testing the model
with torch.no_grad():  # Context manager that disables gradient calculation
    predicted = model(inputs).round()  # Forward pass and rounding off to get predictions
    print(f'Predicted tensor: {predicted}')
    print(f'Actual tensor: {targets}')

Epoch [1000/20000], Loss: 0.0034
Epoch [2000/20000], Loss: 0.0008
Epoch [3000/20000], Loss: 0.0003
Epoch [4000/20000], Loss: 0.0001
Epoch [5000/20000], Loss: 0.0001
Epoch [6000/20000], Loss: 0.0000
Epoch [7000/20000], Loss: 0.0000
Epoch [8000/20000], Loss: 0.0000
Epoch [9000/20000], Loss: 0.0000
Epoch [10000/20000], Loss: 0.0000
Epoch [11000/20000], Loss: 0.0000
Epoch [12000/20000], Loss: 0.0000
Epoch [13000/20000], Loss: 0.0000
Epoch [14000/20000], Loss: 0.0000
Epoch [15000/20000], Loss: 0.0000
Epoch [16000/20000], Loss: 0.0000
Epoch [17000/20000], Loss: 0.0000
Epoch [18000/20000], Loss: 0.0000
Epoch [19000/20000], Loss: 0.0000
Epoch [20000/20000], Loss: 0.0000
Predicted tensor: tensor([[0.],
        [1.],
        [1.],
        [0.]])
Actual tensor: tensor([[0.],
        [1.],
        [1.],
        [0.]])
