# Introduction to Automation with LangChain, Generative AI, and Python
**1.2: Prompting for Code Generation**
* Instructor: [Jeff Heaton](https://youtube.com/@HeatonResearch), WUSTL Center for Analytics and Business Insight (CABI), [Washington University in St. Louis](https://olin.wustl.edu/faculty-and-research/research-centers/center-for-analytics-and-business-insight/index.php)
* For more information visit the [class website](https://github.com/jeffheaton/cabi_genai_automation).

## Bedrock for Code Generation

LLMs are adept at generating code and can considerably boost programmers' productivity. This technical course requires you to write programs for the assignments. You might wonder whether I consider it "cheating" to use LLMs to help you write your homework assignments. I do not; in this course, I expect you to use AI to help with assignments.

You can use the same LLMs that Bedrock gives you access to for code generation. You also have other options, which may offer even greater code generation capabilities, though Bedrock should be sufficient for this class.

Three other LLM-based code generation tools are listed below. All three require additional fees for use.

* [GitHub Copilot](https://github.com/features/copilot)
* [ChatGPT](https://chat.openai.com/)
* [Amazon CodeWhisperer/Q Developer](https://aws.amazon.com/codewhisperer/)

You can use the code below to access Bedrock for code generation.

In [1]:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts.chat import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from IPython.display import display_markdown
from langchain_aws import ChatBedrock


MODEL = 'anthropic.claude-3-sonnet-20240229-v1:0'

def generate_code(prompt):
    """Send the prompt to a Bedrock model and display the reply as Markdown."""
    messages = [
        SystemMessage(
            content="You are a helpful assistant that writes reliable computer program code."
        ),
        HumanMessage(content=prompt),
    ]

    # Initialize Bedrock using the built-in role
    llm = ChatBedrock(
        model_id=MODEL,
        model_kwargs={"temperature": 0.1},
    )

    print("Model response:")
    output = llm.invoke(messages)
    # The reply mixes prose and fenced code, so render it as Markdown
    display_markdown(output.content, raw=True)

With the above function defined, you can now generate code. The code below generates a Python function to create a Fibonacci sequence.

In [2]:
generate_code("""Write Python code to return a fibonacci sequence of a length specified by the parameter l.""")

Model response:


Here's a Python function that returns a list containing the Fibonacci sequence up to a specified length `l`:

```python
def fibonacci(l):
    if l <= 0:
        return []
    elif l == 1:
        return [0]
    elif l == 2:
        return [0, 1]
    else:
        fib_seq = [0, 1]
        for i in range(2, l):
            next_num = fib_seq[i-1] + fib_seq[i-2]
            fib_seq.append(next_num)
        return fib_seq
```

To use this function, simply call it with the desired length as an argument:

```python
print(fibonacci(10))  # Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(fibonacci(1))   # Output: [0]
print(fibonacci(5))   # Output: [0, 1, 1, 2, 3]
```

Here's how the `fibonacci()` function works:

1. First, it checks for invalid inputs (`l <= 0`) and returns an empty list `[]`.
2. If `l` is 1, it returns `[0]` since the first Fibonacci number is 0.
3. If `l` is 2, it returns `[0, 1]` since the first two Fibonacci numbers are 0 and 1.
4. For `l > 2`, it initializes the `fib_seq` list with `[0, 1]`.
5. Then, it uses a `for` loop to generate the remaining Fibonacci numbers by adding the previous two numbers in the sequence.
6. Finally, it returns the `fib_seq` list containing the Fibonacci sequence up to length `l`.

Note that this implementation generates the Fibonacci sequence using the iterative approach, which is more efficient than the recursive approach for larger values of `l`.
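
Generated code is worth checking before you rely on it. As a minimal sketch, the block below copies the `fibonacci` function from the response above (slightly condensed) and tests it against the properties the prompt asked for. The helper name `check_fibonacci` and the `max_len` parameter are hypothetical, introduced here only for illustration.

```python
def fibonacci(l):
    # Copied from the model response above, slightly condensed.
    if l <= 0:
        return []
    elif l == 1:
        return [0]
    elif l == 2:
        return [0, 1]
    else:
        fib_seq = [0, 1]
        for i in range(2, l):
            fib_seq.append(fib_seq[i - 1] + fib_seq[i - 2])
        return fib_seq


def check_fibonacci(func, max_len=20):
    """Return True if func(n) has length n, starts 0, 1, and obeys the recurrence."""
    for n in range(max_len + 1):
        seq = func(n)
        if len(seq) != n:
            return False
        # The first (up to) two values must be 0 and 1.
        if seq[:2] != [0, 1][:n]:
            return False
        # Every later value must be the sum of the two before it.
        if any(seq[i] != seq[i - 1] + seq[i - 2] for i in range(2, n)):
            return False
    return True


print(check_fibonacci(fibonacci))  # True
```

A check like this does not prove the code correct, but it catches common slips such as an off-by-one length or a sequence that starts at 1 instead of 0.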

## Generating Methods

In [3]:
generate_code("""
Write a Python function named loan_amortization that accepts these parameters.
loan_amount - The amount of the loan.
apr - The interest rate.
term - The number of months in the loan.
The function should return a Pandas dataframe that contains the following columns:
month - The current month.
amount - The amount left on the loan.
principal - The amount paid to the principal this month.
interest - The amount paid in interest this month.
payment - The total payment this month.
Additionally, build a dictionary of columns to create the Pandas dataframe.
""")

Model response:


Here's a Python function named `loan_amortization` that calculates the loan amortization schedule and returns a Pandas DataFrame with the requested columns:

```python
import pandas as pd
import numpy as np

def loan_amortization(loan_amount, apr, term):
    # Convert APR to monthly interest rate
    monthly_rate = apr / (12 * 100)

    # Calculate the monthly payment
    monthly_payment = loan_amount * (monthly_rate * (1 + monthly_rate) ** term) / ((1 + monthly_rate) ** term - 1)

    # Create a dictionary to store the amortization schedule
    amortization_schedule = {
        'month': [],
        'amount': [],
        'principal': [],
        'interest': [],
        'payment': []
    }

    # Initialize the remaining loan amount
    remaining_loan = loan_amount

    # Loop through each month
    for month in range(1, term + 1):
        # Calculate the interest for this month
        interest = remaining_loan * monthly_rate

        # Calculate the principal for this month
        principal = monthly_payment - interest

        # Update the remaining loan amount
        remaining_loan -= principal

        # Append the values to the amortization schedule dictionary
        amortization_schedule['month'].append(month)
        amortization_schedule['amount'].append(remaining_loan)
        amortization_schedule['principal'].append(principal)
        amortization_schedule['interest'].append(interest)
        amortization_schedule['payment'].append(monthly_payment)

    # Create a Pandas DataFrame from the amortization schedule dictionary
    amortization_df = pd.DataFrame(amortization_schedule)

    return amortization_df
```

To use this function, you can call it with the loan amount, annual percentage rate (APR), and the term (number of months) as arguments:

```python
loan_amount = 200000
apr = 6.5
term = 360

amortization_df = loan_amortization(loan_amount, apr, term)
print(amortization_df)
```

This will output a Pandas DataFrame with the following columns:

- `month`: The current month
- `amount`: The amount left on the loan
- `principal`: The amount paid to the principal this month
- `interest`: The amount paid in interest this month
- `payment`: The total payment this month

The function first converts the APR to a monthly interest rate. It then calculates the monthly payment using the formula for the present value of an annuity. Next, it initializes a dictionary to store the amortization schedule.

The function then loops through each month, calculating the interest, principal, and remaining loan amount for that month. These values are appended to the corresponding lists in the amortization schedule dictionary.

Finally, the function creates a Pandas DataFrame from the amortization schedule dictionary and returns it.

Note that this implementation assumes a fixed monthly payment and does not account for any additional fees or charges that may be associated with the loan.
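
The model's reply mixes prose with fenced code, so running the code means copying it out by hand. One possible way to script that step is sketched below, assuming you have the reply text in a string (the `generate_code` function above displays the reply rather than returning it). The function name `extract_code_blocks` and the `sample` text are hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built this way so this example nests cleanly


def extract_code_blocks(markdown_text, language="python"):
    """Return the contents of every fenced block tagged with the given language."""
    pattern = re.compile(
        FENCE + re.escape(language) + r"[ \t]*\n(.*?)" + FENCE,
        re.DOTALL,
    )
    return [block.strip() for block in pattern.findall(markdown_text)]


# A small stand-in for a model reply.
sample = (
    "Here is a function:\n\n"
    + FENCE + "python\n"
    + "def add(a, b):\n    return a + b\n"
    + FENCE + "\n\nAnd some closing prose."
)

blocks = extract_code_blocks(sample)
print(blocks[0])
```

Review extracted code before executing it; the extraction step only saves the copy and paste.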

In [4]:
import pandas as pd
import numpy as np

def loan_amortization(loan_amount, apr, term):
    # Convert APR to monthly interest rate
    monthly_rate = apr / (12 * 100)

    # Calculate the monthly payment
    monthly_payment = loan_amount * (monthly_rate * (1 + monthly_rate) ** term) / ((1 + monthly_rate) ** term - 1)

    # Create a dictionary to store the amortization schedule
    amortization_schedule = {
        'month': [],
        'amount': [],
        'principal': [],
        'interest': [],
        'payment': []
    }

    # Initialize the remaining loan amount
    remaining_loan = loan_amount

    # Loop through each month
    for month in range(1, term + 1):
        # Calculate the interest for this month
        interest = remaining_loan * monthly_rate

        # Calculate the principal for this month
        principal = monthly_payment - interest

        # Update the remaining loan amount
        remaining_loan -= principal

        # Append the values to the amortization schedule dictionary
        amortization_schedule['month'].append(month)
        amortization_schedule['amount'].append(remaining_loan)
        amortization_schedule['principal'].append(principal)
        amortization_schedule['interest'].append(interest)
        amortization_schedule['payment'].append(monthly_payment)

    # Create a Pandas DataFrame from the amortization schedule dictionary
    amortization_df = pd.DataFrame(amortization_schedule)

    return amortization_df

# Example usage:
loan_amount = 100000  # $100,000 loan
apr = 5  # 5% annual interest rate
term = 360  # 30 years, 360 months

df = loan_amortization(loan_amount, apr, term)
print(df.head())  # Print the first few rows of the DataFrame

   month        amount   principal    interest     payment
0      1  99879.845044  120.154956  416.666667  536.821623
1      2  99759.189442  120.655602  416.166021  536.821623
2      3  99638.031108  121.158334  415.663289  536.821623
3      4  99516.367948  121.663160  415.158463  536.821623
4      5  99394.197858  122.170090  414.651533  536.821623
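
The output above looks plausible, but two quick checks add confidence: the payment should match the closed-form annuity formula, and the balance should reach zero after the final month. The sketch below recomputes both in plain Python, without Pandas. The helper names `monthly_payment` and `final_balance` are hypothetical, and the zero-interest branch is an added assumption; the generated function has no such branch and would divide by zero when `apr` is 0.

```python
def monthly_payment(loan_amount, apr, term):
    """Fixed monthly payment; apr is a percentage (5 means 5%)."""
    monthly_rate = apr / (12 * 100)
    if monthly_rate == 0:
        # Assumption: with no interest, the principal is split evenly.
        return loan_amount / term
    growth = (1 + monthly_rate) ** term
    return loan_amount * monthly_rate * growth / (growth - 1)


def final_balance(loan_amount, apr, term):
    """Walk the schedule month by month and return the ending balance."""
    monthly_rate = apr / (12 * 100)
    payment = monthly_payment(loan_amount, apr, term)
    balance = loan_amount
    for _ in range(term):
        interest = balance * monthly_rate
        balance -= payment - interest
    return balance


payment = monthly_payment(100000, 5, 360)
print(round(payment, 2))                          # 536.82, matching the payment column above
print(abs(final_balance(100000, 5, 360)) < 0.01)  # True: the loan is paid off
```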


```
Write a Python function named loan_amortization that accepts these parameters.
loan_amount - The amount of the loan.
apr - The interest rate.
term - The number of months in the loan.
The function should return a Pandas dataframe that contains the following columns:
month - The current month.
amount - The amount left on the loan.
principal - The amount paid to the principal this month.
interest - The amount paid in interest this month.
payment - The total payment this month.
```

## Generating Larger Programs

* [Kaggle competition to predict ages](https://www.kaggle.com/competitions/applications-of-deep-learning-wustl-fall-2023)



In [6]:
generate_code("""
Create a PyTorch GPU-enabled neural network for a Kaggle competition that asks me to predict the age of people in provided images.
The images are stored at this path: /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age.
The training data is in the file: /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/train.csv.
The training data has 3 columns, id, filename, and age. The field age is the target, to be predicted, numeric age in years of
the person. The file contains the filename of the image that corresponds to each row, the images are named 1.jpg, 2.jpg, etc,
which corresponds to both the id and the filename fields. There is also a test dataset that we must generate a submission
dataframe for. The test data is in the file /kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/test.csv,
and has the id and filename columns, but we need to generate a submit dataframe with just id and age(the prediction). Train the neural network, use early stopping and generate the submit dataframe.
""")

Model response:


Sure, I can help you with that. Here's a PyTorch GPU-enabled neural network code for the Kaggle competition you described:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import pandas as pd
from PIL import Image
from torchvision import transforms
import os

# Set device to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define the dataset
class FaceAgeDataset(Dataset):
    def __init__(self, data_dir, csv_file, transform=None):
        self.data_dir = data_dir
        self.data = pd.read_csv(csv_file)
        self.transform = transform

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        filename = os.path.join(self.data_dir, self.data.iloc[idx, 1])
        image = Image.open(filename)
        age = self.data.iloc[idx, 2]
        if self.transform:
            image = self.transform(image)
        return image, age

# Define the neural network
class FaceAgeNet(nn.Module):
    def __init__(self):
        super(FaceAgeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 8 * 8, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool(x)
        x = self.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 64 * 8 * 8)
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Define the data transformations
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Load the data
train_dataset = FaceAgeDataset('/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age', '/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/train.csv', transform=transform)
test_dataset = FaceAgeDataset('/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age', '/kaggle/input/applications-of-deep-learning-wustl-spring-2024/faces-age/test.csv', transform=transform)

# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# Define the model, loss function, and optimizer
model = FaceAgeNet().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 100
early_stopping_patience = 10
min_val_loss = float('inf')
best_model_state_dict = None

for epoch in range(num_epochs):
    train_loss = 0.0
    val_loss = 0.0
    model.train()
    for images, ages in train_loader:
        images, ages