<a href="https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_02_1_dev.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-559: Applications of Generative Artificial Intelligence
**Module 2: Code Generation**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).

# Module 2 Material

* **Part 2.1: Prompting for Code Generation** [[Video]](https://www.youtube.com/watch?v=HVId6kYKKgQ) [[Notebook]](t81_559_class_02_1_dev.ipynb)
* Part 2.2: Handling Revision Prompts [[Video]](https://www.youtube.com/watch?v=APpV46tplXA) [[Notebook]](t81_559_class_02_2_multi_prompt.ipynb)
* Part 2.3: Using a LLM to Help Debug [[Video]](https://www.youtube.com/watch?v=VPqSNb38QK0) [[Notebook]](t81_559_class_02_3_llm_debug.ipynb)
* Part 2.4: Tracking Prompts in Software Development [[Video]](https://www.youtube.com/watch?v=oUFUuYfvXZU) [[Notebook]](t81_559_class_02_4_software_eng.ipynb)
* Part 2.5: Limits of LLM Code Generation [[Video]](https://www.youtube.com/watch?v=dKtRI0LZSyY) [[Notebook]](t81_559_class_02_5_code_gen_limits.ipynb)


# Google CoLab Instructions

The following code detects whether the notebook is running in Google CoLab. If it is, the code loads your OpenAI API key from the CoLab secrets and installs the needed libraries.

In [1]:
import os

try:
    from google.colab import drive, userdata
    COLAB = True
    print("Note: using Google CoLab")
except ImportError:
    print("Note: not using Google CoLab")
    COLAB = False

# OpenAI Secrets
if COLAB:
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

# Install needed libraries in CoLab
if COLAB:
    !pip install langchain langchain_openai

Note: not using Google CoLab


# 2.1: Prompting for Code Generation

## OpenAI for Code Generation

LLMs are adept at generating code and can considerably boost programmers' productivity. This technical course requires you to write programs for the assignments. You might wonder whether I consider it "cheating" to use LLMs to help write your homework. I do not; in this course, I expect you to use AI to help with assignments.

For code generation, you can use the same OpenAI LLMs that your OpenAI API key already gives you access to. Other options may offer even greater code generation capabilities, though OpenAI should be sufficient for this class.

Three widely used LLM-based code generation tools are listed below. Each may require additional fees, depending on the plan you choose.

* [GitHub Copilot](https://github.com/features/copilot)
* [ChatGPT](https://chat.openai.com/)
* [Amazon CodeWhisperer](https://aws.amazon.com/codewhisperer/)

You can use the code below to access OpenAI for code generation.

In [2]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
from IPython.display import display_markdown

MODEL = 'gpt-4o-mini'

def generate_code(prompt):
    """Send a code generation prompt to the model and display the reply as Markdown."""
    messages = [
        SystemMessage(
            content="You are a helpful assistant that writes reliable computer program code."
        ),
        HumanMessage(content=prompt),
    ]

    # Initialize the OpenAI chat model; the API key is read from the
    # OPENAI_API_KEY environment variable
    llm = ChatOpenAI(model=MODEL, temperature=0.0, n=1)

    print(MODEL)
    print("Model response:")
    output = llm.invoke(messages)
    display_markdown(output.content, raw=True)

With the above function defined, you can now generate code. The code below generates a Python function to create a Fibonacci sequence.

In [4]:
generate_code("""Write Python code to return a fibonacci sequence of a length specified by the parameter l.""")

gpt-4o-mini
Model response:


Certainly! Below is a Python function that generates a Fibonacci sequence of a specified length `l`:

```python
def fibonacci_sequence(l):
    if l <= 0:
        return []
    elif l == 1:
        return [0]
    elif l == 2:
        return [0, 1]
    
    sequence = [0, 1]
    for i in range(2, l):
        next_value = sequence[-1] + sequence[-2]
        sequence.append(next_value)
    
    return sequence

# Example usage:
length = 10
print(fibonacci_sequence(length))
```

This function works as follows:
- It first checks if the length `l` is less than or equal to 0, in which case it returns an empty list.
- If `l` is 1, it returns a list containing only the first Fibonacci number, which is 0.
- If `l` is 2, it returns the first two Fibonacci numbers: 0 and 1.
- For lengths greater than 2, it initializes the sequence with the first two Fibonacci numbers and then iteratively calculates the next numbers in the sequence until it reaches the desired length. 

You can call the function with any positive integer to get the corresponding Fibonacci sequence.
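
Generated code is worth a quick check before you rely on it. The sketch below repeats the function from the response above and runs a few sanity checks against it. The helper `check_fibonacci` and the particular checks are illustrative assumptions, not part of the model's output.

```python
def fibonacci_sequence(l):
    # Copied from the model response above
    if l <= 0:
        return []
    elif l == 1:
        return [0]
    elif l == 2:
        return [0, 1]

    sequence = [0, 1]
    for i in range(2, l):
        sequence.append(sequence[-1] + sequence[-2])
    return sequence


def check_fibonacci(func, max_length=20):
    """Hypothetical helper: return True if func passes a few basic checks."""
    # Non-positive lengths should produce an empty list
    if func(0) != [] or func(-3) != []:
        return False
    for n in range(1, max_length + 1):
        seq = func(n)
        # The result must have exactly the requested length
        if len(seq) != n:
            return False
        # The sequence must start with 0, 1
        if seq[:2] != [0, 1][:n]:
            return False
        # Every later term must be the sum of the two before it
        if any(seq[i] != seq[i - 1] + seq[i - 2] for i in range(2, n)):
            return False
    return True


print(check_fibonacci(fibonacci_sequence))
```

Checks like these do not prove the code correct, but they catch common slips such as an off-by-one length or a wrong starting value.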

## Generating Methods

In [5]:
generate_code("""
Write a Python function named loan_amortization that accepts these parameters.
loan_amount - The amount of the loan.
apr - The interest rate.
term - The number of months in the loan.
The function should return a Pandas dataframe that contains the following columns:
month - The current month.
amount - The amount left on the loan.
principal - The amount paid to the principal this month.
interest - The amount paid in interest this month.
payment - The total payment this month.
Additionally, build a dictionary of columns to create the Pandas dataframe.
""")

gpt-4o-mini
Model response:


To create a function named `loan_amortization` that calculates the loan amortization schedule and returns a Pandas DataFrame, you can follow the steps outlined below. This function will compute the monthly payment, the interest paid, the principal paid, and the remaining balance for each month of the loan.

Here's the complete code for the function:

```python
import pandas as pd

def loan_amortization(loan_amount, apr, term):
    # Convert annual percentage rate to a monthly interest rate
    monthly_interest_rate = apr / 100 / 12
    
    # Calculate the monthly payment using the formula for an amortizing loan
    if monthly_interest_rate > 0:
        monthly_payment = loan_amount * (monthly_interest_rate * (1 + monthly_interest_rate) ** term) / ((1 + monthly_interest_rate) ** term - 1)
    else:
        monthly_payment = loan_amount / term  # If interest rate is 0, just divide the loan amount by the term
    
    # Initialize lists to hold the amortization schedule data
    months = []
    amounts = []
    principals = []
    interests = []
    payments = []
    
    # Initialize the remaining balance
    remaining_balance = loan_amount
    
    for month in range(1, term + 1):
        # Calculate interest for the current month
        interest_payment = remaining_balance * monthly_interest_rate
        
        # Calculate principal payment for the current month
        principal_payment = monthly_payment - interest_payment
        
        # Update the remaining balance
        remaining_balance -= principal_payment
        
        # Append the data to the lists
        months.append(month)
        amounts.append(remaining_balance if remaining_balance > 0 else 0)
        principals.append(principal_payment)
        interests.append(interest_payment)
        payments.append(monthly_payment)
    
    # Create a dictionary to hold the data
    data = {
        'month': months,
        'amount': amounts,
        'principal': principals,
        'interest': interests,
        'payment': payments
    }
    
    # Create a Pandas DataFrame from the dictionary
    amortization_schedule = pd.DataFrame(data)
    
    return amortization_schedule

# Example usage:
# df = loan_amortization(10000, 5, 24)
# print(df)
```

### Explanation:
1. **Parameters**:
   - `loan_amount`: The total amount of the loan.
   - `apr`: The annual percentage rate (interest rate).
   - `term`: The total number of months for the loan.

2. **Monthly Payment Calculation**:
   - The monthly interest rate is calculated by dividing the APR by 100 and then by 12.
   - The monthly payment is calculated using the formula for an amortizing loan.

3. **Amortization Schedule Calculation**:
   - For each month, the function calculates the interest payment, principal payment, and updates the remaining balance.
   - The results for each month are stored in lists.

4. **DataFrame Creation**:
   - A dictionary is created to hold the lists, which is then converted into a Pandas DataFrame.

5. **Return Value**:
   - The function returns the DataFrame containing the amortization schedule.

You can test the function by calling it with appropriate parameters, as shown in the example usage.
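
Responses like the one above arrive as Markdown, with the code inside fenced blocks. If you want to pull the code out programmatically rather than copy it by hand, a small helper can do it. The sketch below is illustrative: `extract_code_blocks` and `sample_response` are hypothetical names, and the sample text stands in for `output.content` from a real call.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep this listing readable


def extract_code_blocks(markdown_text, language="python"):
    """Return the bodies of fenced code blocks tagged with the given language."""
    pattern = re.compile(
        rf"^{FENCE}{re.escape(language)}[ \t]*\n(.*?)^{FENCE}[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    return [match.group(1) for match in pattern.finditer(markdown_text)]


# A stand-in for output.content; a real response would come from llm.invoke
sample_response = (
    "Here is the function:\n\n"
    f"{FENCE}python\n"
    "def add(a, b):\n"
    "    return a + b\n"
    f"{FENCE}\n\n"
    "Call it with two numbers."
)

blocks = extract_code_blocks(sample_response)
print(blocks[0])
```

Review extracted code before you run it, just as you would with code copied by hand.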

In [6]:
import pandas as pd

def loan_amortization(loan_amount, apr, term):
    # Convert annual percentage rate to a monthly interest rate
    monthly_interest_rate = apr / 100 / 12
    
    # Calculate the monthly payment using the formula for an amortizing loan
    if monthly_interest_rate > 0:
        monthly_payment = loan_amount * (monthly_interest_rate * (1 + monthly_interest_rate) ** term) / ((1 + monthly_interest_rate) ** term - 1)
    else:
        monthly_payment = loan_amount / term  # If interest rate is 0, just divide the loan amount by the term
    
    # Initialize lists to hold the amortization schedule data
    months = []
    amounts = []
    principals = []
    interests = []
    payments = []
    
    # Initialize the remaining balance
    remaining_balance = loan_amount
    
    for month in range(1, term + 1):
        # Calculate interest for the current month
        interest_payment = remaining_balance * monthly_interest_rate
        
        # Calculate principal payment for the current month
        principal_payment = monthly_payment - interest_payment
        
        # Update the remaining balance
        remaining_balance -= principal_payment
        
        # Append the data to the lists
        months.append(month)
        amounts.append(remaining_balance if remaining_balance > 0 else 0)
        principals.append(principal_payment)
        interests.append(interest_payment)
        payments.append(monthly_payment)
    
    # Create a dictionary to hold the data
    data = {
        'month': months,
        'amount': amounts,
        'principal': principals,
        'interest': interests,
        'payment': payments
    }
    
    # Create a Pandas DataFrame from the dictionary
    amortization_schedule = pd.DataFrame(data)
    
    return amortization_schedule

# Example usage:
loan_amount = 100000  # $100,000 loan
apr = 5  # 5% annual interest rate
term = 360  # 30 years, 360 months
df = loan_amortization(loan_amount, apr, term)
print(df.head())  # Print the first few rows of the DataFrame

   month        amount   principal    interest     payment
0      1  99879.845044  120.154956  416.666667  536.821623
1      2  99759.189442  120.655602  416.166021  536.821623
2      3  99638.031108  121.158334  415.663289  536.821623
3      4  99516.367948  121.663160  415.158463  536.821623
4      5  99394.197858  122.170090  414.651533  536.821623
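
The printed schedule can be checked independently of Pandas. The generated code uses the standard fixed-payment formula $M = P \cdot \frac{r(1+r)^n}{(1+r)^n - 1}$, where $P$ is the loan amount, $r$ is the monthly rate (`apr / 100 / 12`), and $n$ is the term in months. The sketch below recomputes the payment and confirms that the balance reaches zero after the last payment. The function names `monthly_payment` and `final_balance` are illustrative assumptions.

```python
def monthly_payment(loan_amount, apr, term):
    """Fixed payment for a fully amortizing loan (same formula as the generated code)."""
    r = apr / 100 / 12
    if r == 0:
        return loan_amount / term
    growth = (1 + r) ** term
    return loan_amount * r * growth / (growth - 1)


def final_balance(loan_amount, apr, term):
    """Apply every scheduled payment and return what is left at the end."""
    r = apr / 100 / 12
    payment = monthly_payment(loan_amount, apr, term)
    balance = loan_amount
    for _ in range(term):
        interest = balance * r
        balance -= payment - interest
    return balance


payment = monthly_payment(100000, 5, 360)
print(round(payment, 2))  # should agree with the payment column above
print(abs(final_balance(100000, 5, 360)) < 0.01)  # paid off, up to rounding error
```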


```
Write a Python function named loan_amortization that accepts these parameters.
loan_amount - The amount of the loan.
apr - The interest rate.
term - The number of months in the loan.
The function should return a Pandas dataframe that contains the following columns:
month - The current month.
amount - The amount left on the loan.
principal - The amount paid to the principal this month.
interest - The amount paid in interest this month.
payment - The total payment this month.
```
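
A prompt with this structure, a function name followed by described parameters and described output columns, can also be assembled from a small specification. This is handy when you want to generate several similar functions. The sketch below is one possible approach; `build_function_prompt` is a hypothetical helper, not something the notebook defines.

```python
def build_function_prompt(name, parameters, columns):
    """Assemble a function-generation prompt from parameter and column descriptions."""
    lines = [f"Write a Python function named {name} that accepts these parameters."]
    lines += [f"{param} - {description}" for param, description in parameters.items()]
    lines.append(
        "The function should return a Pandas dataframe that contains the following columns:"
    )
    lines += [f"{column} - {description}" for column, description in columns.items()]
    return "\n".join(lines)


prompt = build_function_prompt(
    "loan_amortization",
    {
        "loan_amount": "The amount of the loan.",
        "apr": "The interest rate.",
        "term": "The number of months in the loan.",
    },
    {
        "month": "The current month.",
        "amount": "The amount left on the loan.",
        "principal": "The amount paid to the principal this month.",
        "interest": "The amount paid in interest this month.",
        "payment": "The total payment this month.",
    },
)
print(prompt)
```

The resulting string can be passed to `generate_code` in the same way as a handwritten prompt.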

## Generating Larger Programs



In [8]:
generate_code("""
Create a PyTorch GPU-enabled neural network for a Kaggle competition that asks me to predict the age of people in provided images.
The images are stored at this path: /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age.
The training data is in the file: /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/train.csv.
The training data has 3 columns, id, filename, and age. The field age is the target, to be predicted, numeric age in years of
the person. The file contains the filename of the image that corresponds to each row, the images are named 1.jpg, 2.jpg, etc,
which corresponds to both the id and the filename fields. There is also a test dataset that we must generate a submission
dataframe for. The test data is in the file /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/test.csv,
and has the id and filename columns, but we need to generate a submit dataframe with just id and age(the prediction). Train the neural network, use early stopping and generate the submit dataframe.
""")

gpt-4o-mini
Model response:


To create a PyTorch GPU-enabled neural network for predicting the age of people from images, we will follow these steps:

1. Load the training and test datasets.
2. Preprocess the images.
3. Create a neural network model.
4. Train the model with early stopping.
5. Generate predictions for the test dataset.
6. Create a submission DataFrame.

Below is a complete implementation of the above steps:

```python
import os
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, models
from PIL import Image  # needed by Image.open in AgeDataset
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from tqdm import tqdm

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the datasets
train_df = pd.read_csv('/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/train.csv')
test_df = pd.read_csv('/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/test.csv')

# Define the image dataset
class AgeDataset(Dataset):
    def __init__(self, dataframe, img_dir, transform=None):
        self.dataframe = dataframe
        self.img_dir = img_dir
        self.transform = transform

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        img_name = os.path.join(self.img_dir, self.dataframe.iloc[idx, 1])  # filename
        image = Image.open(img_name).convert('RGB')
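        # Test rows have no age column, so fall back to a placeholder target
        age = self.dataframe.iloc[idx, 2] if self.dataframe.shape[1] > 2 else 0.0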

        if self.transform:
            image = self.transform(image)

        return image, age

# Define transformations
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Split the training data into training and validation sets
train_data, val_data = train_test_split(train_df, test_size=0.2, random_state=42)

# Create datasets and dataloaders
train_dataset = AgeDataset(train_data, '/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age', transform)
val_dataset = AgeDataset(val_data, '/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age', transform)
test_dataset = AgeDataset(test_df, '/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age', transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Define the neural network model
class AgeNet(nn.Module):
    def __init__(self):
        super(AgeNet, self).__init__()
        self.model = models.resnet18(pretrained=True)
        self.model.fc = nn.Linear(self.model.fc.in_features, 1)  # Change the output layer to 1 for age prediction

    def forward(self, x):
        return self.model(x)

# Initialize the model, loss function, and optimizer
model = AgeNet().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training function with early stopping
def train_model(model, train_loader, val_loader, criterion, optimizer, num_epochs=25, patience=5):
    best_val_loss = float('inf')
    epochs_no_improve = 0
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, ages in tqdm(train_loader):
            images, ages = images.to(device), ages.float().to(device).view(-1, 1)

            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, ages)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()

        # Validation
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for images, ages in val_loader:
                images, ages = images.to(device), ages.float().to(device).view(-1, 1)
                outputs = model(images)
                loss = criterion(outputs, ages)
                val_loss += loss.item()

        val_loss /= len(val_loader)
        print(f'Epoch {epoch+1}/{num_epochs}, Train Loss: {running_loss/len(train_loader)}, Val Loss: {val_loss}')

        # Early stopping
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_no_improve = 0
            torch.save(model.state_dict(), 'best_model.pth')
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                print("Early stopping!")
                break

# Train the model
train_model(model, train_loader, val_loader, criterion, optimizer)

# Load the best model
model.load_state_dict(torch.load('best_model.pth'))

# Generate predictions for the test dataset
model.eval()
predictions = []
with torch.no_grad():
    for images, _ in test_loader:
        images = images.to(device)
        outputs = model(images)
        predictions.extend(outputs.cpu().numpy())

# Create submission DataFrame
submission_df = pd.DataFrame({
    'id': test_df['id'],
    'age': np.squeeze(predictions)
})

# Save the submission DataFrame
submission_df.to_csv('submission.csv', index=False)
```

### Explanation:
1. **Dataset Class**: The `AgeDataset` class loads images and their corresponding ages.
2. **Transformations**: Images are resized and normalized.
3. **Model**: A pre-trained ResNet-18 model is used, modified to output a single value (age).
4. **Training Loop**: The model is trained with early stopping based on validation loss.
5. **Predictions**: After training, predictions are made on the test dataset.
6. **Submission**: A DataFrame is created for submission, containing the IDs and predicted ages.

### Note:
- Ensure that you have the necessary libraries installed in your Kaggle environment.
- Adjust the batch size and learning rate as needed based on your GPU memory and performance.
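
The early stopping logic in the generated training loop can be studied on its own, without PyTorch. The sketch below isolates the same patience rule in a small class. `EarlyStopper` is a hypothetical helper, and the validation losses are made up for illustration.

```python
class EarlyStopper:
    """Track the best validation loss and signal when training should stop."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_no_improve = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when it is time to stop."""
        if val_loss < self.best_loss:
            # Improvement: remember it and reset the counter
            self.best_loss = val_loss
            self.epochs_no_improve = 0
            return False
        # No improvement: count the epoch and compare against the patience limit
        self.epochs_no_improve += 1
        return self.epochs_no_improve >= self.patience


# Made-up validation losses, for illustration only
losses = [9.0, 7.5, 7.8, 7.6, 7.7, 7.9, 6.0]
stopper = EarlyStopper(patience=3)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)
```

With a patience of 3, training stops after three consecutive epochs without a new best loss, so the later improvement to 6.0 is never reached. That is the trade-off a patience setting makes.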

# Module 2 Assignment

You can find the assignment for this module here: [assignment 2](https://github.com/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_class2.ipynb)