# Introduction to Automation with LangChain, Generative AI, and Python
**1.4: Use LLM to help debug**
* Instructor: [Jeff Heaton](https://youtube.com/@HeatonResearch), WUSTL Center for Analytics and Business Insight (CABI), [Washington University in St. Louis](https://olin.wustl.edu/faculty-and-research/research-centers/center-for-analytics-and-business-insight/index.php)
* For more information visit the [class website](https://github.com/jeffheaton/cabi_genai_automation).

LLMs can help you debug both the code you write yourself and the code an LLM generates to fulfill your requests. In this part, you will see how to use an LLM as an assistant to help debug a Python program.

## Conversational Code Generation

We will continue to use the conversational code generation function provided in Module 1.3.



In [1]:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_aws import ChatBedrock
from langchain_core.prompts import PromptTemplate
from IPython.display import display_markdown

MODEL = 'meta.llama2-70b-chat-v1'
TEMPLATE = """The following is a friendly conversation between a human and an
AI to generate Python code. If you have notes about the code, place them before
the code. Any notes about execution should follow the code. If you do mix any
notes with the code, make them comments. Add proper comments to the code.
Sort imports and follow PEP-8 formatting.

Current conversation:
{history}
Human: {input}
Code Assistant:"""
PROMPT_TEMPLATE = PromptTemplate(input_variables=["history", "input"], template=TEMPLATE)

def start_conversation():
    # Initialize Bedrock, use built-in role
    llm = ChatBedrock(
        model_id=MODEL,
        model_kwargs={"temperature": 0.1},
    )

    # Initialize memory and conversation
    memory = ConversationBufferWindowMemory()
    conversation = ConversationChain(
        prompt=PROMPT_TEMPLATE,
        llm=llm,
        memory=memory,
        verbose=False
    )

    return conversation

def generate_code(conversation, prompt):
    print("Model response:")
    output = conversation.invoke(prompt)
    display_markdown(output['response'], raw=True)
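
The conversation keeps context through `ConversationBufferWindowMemory`, which, as I understand it, retains only the most recent `k` exchanges (5 by default) and substitutes them into the `{history}` slot of the template. The following is a minimal plain-Python sketch of that idea, not LangChain's implementation; `WindowMemorySketch`, `TEMPLATE_TAIL`, and the `Human:`/`AI:` labels are hypothetical names chosen for illustration.

```python
from collections import deque


class WindowMemorySketch:
    """Hypothetical stand-in that keeps only the last k exchanges."""

    def __init__(self, k=5):
        # A deque with maxlen silently drops the oldest exchange when full
        self.turns = deque(maxlen=k)

    def save(self, human, ai):
        """Record one human/AI exchange."""
        self.turns.append((human, ai))

    def history(self):
        """Format the retained exchanges as text for the prompt."""
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)


# Tail of a prompt template with the same two placeholders used above
TEMPLATE_TAIL = "Current conversation:\n{history}\nHuman: {input}\nCode Assistant:"

memory = WindowMemorySketch(k=2)
memory.save("Write hello world.", "print('hello world')")
memory.save("Add a comment.", "# Greet the world\nprint('hello world')")
memory.save("Wrap it in a function.", "def greet():\n    print('hello world')")

# Only the two most recent exchanges remain in the history
prompt = TEMPLATE_TAIL.format(history=memory.history(), input="Add a docstring.")
print(prompt)
```

With `k=2`, the first exchange has already been dropped by the time the prompt is built, which is why a long conversation eventually "forgets" its earliest requests.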


## A Buggy Pi Approximator

To see how LLM-enabled debugging works, consider the following code, which uses the [Monte Carlo](https://en.wikipedia.org/wiki/Monte_Carlo_method) method to estimate [Pi](https://en.wikipedia.org/wiki/Pi). The code has several issues, and we can ask the LLM to help us debug it. When executed, it produces the following error:

```
NameError: name 'xrange' is not defined
```

In [2]:
import random

def monte_carlo_pi(num_samples):
    inside_circle = 0

    for _ in xrange(num_samples):
        x, y = random.random(), random.random()  # Generate random point (x, y)
        if x*2 + y*2 <= 1:
            inside_circle += 1  # Check if the point is inside the quarter circle

    pi_approximation = 4 * inside_circle / num_samples  # Calculate approximation of Pi
    return pi_approximation

# Example usage
num_samples = 1000000  # Number of random points to generate
approximated_pi = monte_carlo_pi(num_samples)
print(f"Approximated Pi with {num_samples} samples: {approximated_pi}")

NameError: name 'xrange' is not defined

When we ask the LLM to help us debug this code, we should provide as much detail as possible. I usually like to produce a prompt in the following format:

```
I am trying to debug the following code:

... provide code here...

However, I am getting the following error:

... add the error here, provide stack trace ...

```
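
This prompt format can also be filled in programmatically. The following is a minimal sketch, assuming two hypothetical helpers, `capture_error` and `build_debug_prompt`, that are not part of the course code. It runs a snippet, captures the stack trace with the standard `traceback` module, and assembles the prompt text. The shortened `BUGGY_SOURCE` is illustrative only.

```python
import traceback

# Shortened, illustrative snippet that fails the same way under Python 3
BUGGY_SOURCE = '''\
def monte_carlo_pi(num_samples):
    inside_circle = 0
    for _ in xrange(num_samples):
        inside_circle += 1
    return 4 * inside_circle / num_samples

monte_carlo_pi(10)
'''


def capture_error(source):
    """Run the source and return the formatted traceback, or None on success."""
    try:
        # Only execute code you trust; exec runs it with full privileges
        exec(compile(source, "<snippet>", "exec"), {})
    except Exception:
        return traceback.format_exc()
    return None


def build_debug_prompt(source, error_text):
    """Fill the debugging prompt format with the code and the error report."""
    return (
        "I am trying to debug the following code:\n\n"
        f"{source.strip()}\n\n"
        "However, I am getting the following error:\n\n"
        f"{error_text.strip()}\n"
    )


error_text = capture_error(BUGGY_SOURCE)
debug_prompt = build_debug_prompt(BUGGY_SOURCE, error_text)
print(debug_prompt)
```

The resulting string could then be passed as the `prompt` argument of `generate_code`.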

In [3]:
conversation = start_conversation()
generate_code(conversation, """
I am trying to debug the following code:

import random

def monte_carlo_pi(num_samples):
    inside_circle = 0

    for _ in xrange(num_samples):
        x, y = random.random(), random.random()  # Generate random point (x, y)
        if x*2 + y*2 <= 1:
            inside_circle += 1  # Check if the point is inside the quarter circle

    pi_approximation = 4 * inside_circle / num_samples  # Calculate approximation of Pi
    return pi_approximation

# Example usage
num_samples = 1000000  # Number of random points to generate
approximated_pi = monte_carlo_pi(num_samples)
print(f"Approximated Pi with {num_samples} samples: {approximated_pi}")

However, I am getting the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-c7b4356f1718> in <cell line: 16>()
     14 # Example usage
     15 num_samples = 1000000  # Number of random points to generate
---> 16 approximated_pi = monte_carlo_pi(num_samples)
     17 print(f"Approximated Pi with {num_samples} samples: {approximated_pi}")

<ipython-input-10-c7b4356f1718> in monte_carlo_pi(num_samples)
      4     inside_circle = 0
      5
----> 6     for _ in xrange(num_samples):
      7         x, y = random.random(), random.random()  # Generate random point (x, y)
      8         if x*2 + y*2 <= 1:

NameError: name 'xrange' is not defined

""")

Model response:


  Hello! I'm happy to help you debug your code. It looks like you're getting a NameError for 'xrange'. This is because 'xrange' is not a built-in function in Python 3. Instead, you can use the 'range' function to achieve the same result.

Here's a modified version of your code that should work:
```python
import random

def monte_carlo_pi(num_samples):
    inside_circle = 0

    for _ in range(num_samples):
        x, y = random.random(), random.random()  # Generate random point (x, y)
        if x*2 + y*2 <= 1:
            inside_circle += 1  # Check if the point is inside the quarter circle

    pi_approximation = 4 * inside_circle / num_samples  # Calculate approximation of Pi
    return pi_approximation

# Example usage
num_samples = 1000000  # Number of random points to generate
approximated_pi = monte_carlo_pi(num_samples)
print(f"Approximated Pi with {num_samples} samples: {approximated_pi}")
```
I've also taken the liberty of reformatting your code to follow PEP-8 formatting guidelines. I hope this helps! Let me know if you have any further questions or issues.

In this case, the LLM was an overachiever: I asked only about the specific error I was getting, but it reported two issues, one of which was the error I encountered. The LLM identified these two issues:

* It looks like you're using Python 3, where xrange has been replaced by range.

* Also, there's a mistake in the condition to check if the point is inside the quarter circle. It should be `x ** 2 + y ** 2 <= 1` instead of `x*2 + y*2 <= 1`.

The LLM also provided corrected code for me to copy and paste.
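
A quick way to confirm the second issue yourself, whether or not a model's reply addresses it, is to run both conditions side by side. The sketch below is a minimal check that is not part of the original notebook; `estimate_pi`, the fixed seed, and the sample count are illustrative choices. Since `x*2 + y*2 <= 1` is the same as `x + y <= 0.5`, the buggy test covers a triangle of area 1/8, so its estimate should land near 0.5 rather than near Pi.

```python
import math
import random


def estimate_pi(num_samples, inside, seed=42):
    """Monte Carlo estimate of Pi using the given point-in-region test."""
    rng = random.Random(seed)  # Fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()  # Random point in the unit square
        if inside(x, y):
            hits += 1
    return 4 * hits / num_samples


# Buggy condition: doubles the coordinates instead of squaring them
buggy = estimate_pi(200_000, lambda x, y: x*2 + y*2 <= 1)

# Corrected condition: point lies inside the quarter circle of radius 1
fixed = estimate_pi(200_000, lambda x, y: x**2 + y**2 <= 1)

print(f"x*2 + y*2 condition:   {buggy:.4f}")
print(f"x**2 + y**2 condition: {fixed:.4f}")
print(f"math.pi:               {math.pi:.4f}")
```

Fixing only the `xrange` error makes the program run, but it still returns a wrong answer, which is why checking the result against a known value matters.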

## Testing the Corrected Code

Now, we can test the corrected code and see that it works properly.

In [None]:
import random

def monte_carlo_pi(num_samples):
    """
    Estimate the value of Pi using the Monte Carlo method.

    Args:
        num_samples (int): Number of random samples to generate.

    Returns:
        float: Approximated value of Pi.
    """
    inside_circle = 0

    for _ in range(num_samples):  # Use range instead of xrange for Python 3
        x, y = random.random(), random.random()  # Generate random point (x, y)
        if x**2 + y**2 <= 1:  # Correct formula to check if inside the quarter circle
            inside_circle += 1

    pi_approximation = 4 * inside_circle / num_samples  # Calculate approximation of Pi
    return pi_approximation

# Example usage
num_samples = 1000000  # Number of random points to generate
approximated_pi = monte_carlo_pi(num_samples)
print(f"Approximated Pi with {num_samples} samples: {approximated_pi}")

## LLMs Explaining Code

LLMs are also very adept at explaining code. As you work through this course, you will see that the assignments use a submission function I named "submit." This submission function uses HTTP and API calling techniques that are not covered by this course. However, if you are interested in what the "submit" function does, you can ask the LLM.

In [4]:
# Start a new conversation
conversation = start_conversation()
generate_code(conversation, """
Could you please explain what the following code does?

import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io

# This function submits an assignment.  You can submit an assignment as much as you like, only the final
# submission counts.  The paramaters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 1.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
# .             The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    payload = []
    for item in data:
        if type(item) is PIL.Image.Image:
            buffered = BytesIO()
            item.save(buffered, format="PNG")
            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif type(item) is pd.core.frame.DataFrame:
            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
    r= requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code==200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))
""")

Model response:


  This code defines a function called `submit` that submits an assignment to a server. The function takes four parameters:

* `data`: a list of pandas dataframes or images
* `key`: a student key that was emailed to the user
* `no`: the assignment class number, which should be between 1 and 1 (inclusive)
* `source_file`: the full path to the Python or IPYNB file that contains the assignment. This file must have a name that includes "_class1" followed by the assignment number (e.g. "_class2" for assignment #2). If `source_file` is not provided, the function will use the current file (i.e. the file that contains the `submit` function).

The function first checks if `source_file` is provided and if it exists. If not, it raises an exception. It then reads the contents of the file using the `open` function in binary mode (`"rb"`), encodes it using base64, and decodes the result to get a string representation of the file contents.

Next, the function checks the file extension of `source_file` and raises an exception if it is not either `.ipynb` or `.py`.

The function then creates a list called `payload` that contains either PNG image data or CSV data, depending on the type of the item in the `data` list. If the item is a PIL image, it saves it to a BytesIO buffer using the `PNG` format and encodes the resulting bytes using base64. If the item is a pandas DataFrame, it converts it to a CSV string using the `to_csv` method with `index=False` and encodes the resulting string using base64.

Finally, the function makes a POST request to the API endpoint `https://api.heatonresearch.com/assignment-submit` with the `json` parameter set to a dictionary containing the `payload`, `assignment` number, `ext` (extension of the source file), and `py` (encoded Python file contents) values. If the API response status code is 200 (OK), the function prints a success message. Otherwise, it prints a failure message.

Here are some notes about the code:

* The function uses the `os` module

As you can see, the LLM explained my "submit" function.
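
To make the encoding step in that explanation concrete, you can try it in isolation. The following is a minimal sketch, assuming a small hand-written CSV string in place of the output of `item.to_csv(index=False)`; the sample data is hypothetical. It encodes the text the same way `submit` does and then decodes it again to show that the round trip is lossless.

```python
import base64

# Stand-in for the CSV text a DataFrame would produce (illustrative data)
csv_text = "id,value\n1,3.14\n2,2.72\n"

# Same steps as submit: text -> ASCII bytes -> base64 bytes -> ASCII string
encoded = base64.b64encode(csv_text.encode("ascii")).decode("ascii")
payload = [{"CSV": encoded}]

# A receiver would reverse the process to recover the original text
decoded = base64.b64decode(payload[0]["CSV"]).decode("ascii")

print(f"Encoded: {encoded}")
print(f"Round trip matches: {decoded == csv_text}")
```

Base64 turns arbitrary bytes into plain ASCII characters, which is what lets binary content such as PNG images travel inside a JSON request body.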

## Improving Code with LLMs

You can also request that an LLM improve your code. You can mention specific improvements you seek, such as removing unused or redundant imports, sorting the imports, and adhering to PEP-8 for your code formatting. In the following code, I request that the LLM improve my submit function.

In [5]:
conversation = start_conversation()
generate_code(conversation, """
Could you please suggest and implement any improvements to the following code?

import base64
import os
import numpy as np
import pandas as pd
import requests
import PIL
import PIL.Image
import io

# This function submits an assignment.  You can submit an assignment as much as you like, only the final
# submission counts.  The paramaters are as follows:
# data - List of pandas dataframes or images.
# key - Your student key that was emailed to you.
# no - The assignment class number, should be 1 through 1.
# source_file - The full path to your Python or IPYNB file.  This must have "_class1" as part of its name.
# .             The number must match your assignment number.  For example "_class2" for class assignment #2.
def submit(data,key,no,source_file=None):
    if source_file is None and '__file__' not in globals(): raise Exception('Must specify a filename when a Jupyter notebook.')
    if source_file is None: source_file = __file__
    suffix = '_class{}'.format(no)
    if suffix not in source_file: raise Exception('{} must be part of the filename.'.format(suffix))
    with open(source_file, "rb") as image_file:
        encoded_python = base64.b64encode(image_file.read()).decode('ascii')
    ext = os.path.splitext(source_file)[-1].lower()
    if ext not in ['.ipynb','.py']: raise Exception("Source file is {} must be .py or .ipynb".format(ext))
    payload = []
    for item in data:
        if type(item) is PIL.Image.Image:
            buffered = BytesIO()
            item.save(buffered, format="PNG")
            payload.append({'PNG':base64.b64encode(buffered.getvalue()).decode('ascii')})
        elif type(item) is pd.core.frame.DataFrame:
            payload.append({'CSV':base64.b64encode(item.to_csv(index=False).encode('ascii')).decode("ascii")})
    r= requests.post("https://api.heatonresearch.com/assignment-submit",
        headers={'x-api-key':key}, json={ 'payload': payload,'assignment': no, 'ext':ext, 'py':encoded_python})
    if r.status_code==200:
        print("Success: {}".format(r.text))
    else: print("Failure: {}".format(r.text))
""")

Model response:


  Hello! I'm happy to help you with your code. Before we begin, I want to mention that the code you provided is quite long and complex, so it may take some time to go through it thoroughly. Additionally, I'll be suggesting improvements and optimizations to the code, but please feel free to ask me any questions or clarify any doubts you may have.

Firstly, I notice that you have a mix of both Python 2 and Python 3 syntax in your code. To avoid any potential issues, I suggest sticking to a single syntax throughout the code. Since you're using Python 3.x, let's go with that.

Here are my suggestions for improving the code:

1. Consistent indentation: Your code has inconsistent indentation in some places. It's essential to maintain consistent indentation throughout the code to make it easier to read and understand. I suggest using four spaces for each level of indentation.
2. Separate import statements: It's a good practice to separate import statements for different modules. This makes it easier to manage and maintain the code. For example, you can separate the import statements for `base64`, `os`, `numpy`, `pandas`, `requests`, `PIL`, and `io` into separate lines.
3. Use `requests` instead of `PIL` for image submission: Since you're already using `requests` to submit the assignment, it would be better to use it for submitting images as well. You can use the `requests.post` method to send the image data along with the other submission data.
4. Use a dictionary for submission data: Instead of creating a list of dictionaries for the submission data, you can create a single dictionary with all the necessary keys and values. This will make the code more concise and easier to manage.
5. Remove unnecessary variables: You don't need to store the `encoded_python` variable separately. You can directly use the `base64.b64encode` function in the `requests.post` method.
6. Use `os.path.join` instead of concatenating path strings: It's better to use the `os.path.join` method to concatenate path strings. This ensures that the paths are correctly joined, even if they contain separators.
7. Check if the submission file exists: Before submitting

As you can see, the LLM suggested several improvements that I will consider for future versions of this function.
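
Suggestions like these are worth verifying before you adopt them, and small experiments can also surface problems that a response does not list. As one example, the `submit` function above imports `io` but calls `BytesIO()` without the module prefix. The sketch below is a minimal check of that observation; the helper `runs_without_name_error` is a hypothetical name introduced for illustration.

```python
import io


def runs_without_name_error(statement, namespace):
    """Return True if the statement executes without raising NameError."""
    try:
        exec(statement, dict(namespace))  # Copy so each check starts clean
    except NameError:
        return False
    return True


# Mirror the function's imports: only the module name `io` is available
available = {"io": io}

bare_ok = runs_without_name_error("buffered = BytesIO()", available)
qualified_ok = runs_without_name_error("buffered = io.BytesIO()", available)

print(f"BytesIO() works:    {bare_ok}")
print(f"io.BytesIO() works: {qualified_ok}")
```

Under these assumptions, either `io.BytesIO()` or an added `from io import BytesIO` would let the image branch run.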