# Efficient Coding Assistant with simpleaichat

_Updated using `gpt-3.5-turbo-0613`_

Many coders use ChatGPT for coding help, but the web interface can be slow and pad its answers with unnecessary discussion when all you want is code. With some system prompt engineering and simpleaichat, you can cut code generation time and costs significantly.

**DISCLAIMER: Your mileage may vary in terms of code quality and accuracy in practice, but this is a good, hackable starting point.**



In [7]:
from localaichat.localaichat import AIChat, AsyncAIChat
from getpass import getpass

The following cell sets the API key. The value here is a placeholder token for a local vLLM server; if you use a real OpenAI key, read it with `getpass` instead so that **it is not saved to the notebook**.

In [8]:
api_key = "token-abc123"

In [9]:
# params = {"temperature": 0.0}  # for reproducibility
# model = "gpt-3.5-turbo"  # in production, may want to use model="gpt-4" if have access

ai = AIChat(client_type="vLLM", api_key="token-abc123", model="/home/server_runner/Desktop/CloserModels/llama3_70B_awq", system="You are a helpful assistant.", console=False)

Let's start with a simple `is_palindrome()` function in Python, and track how long it takes to run. The output of this should be similar to what is shown in the ChatGPT webapp.

In [10]:
%%time
response = ai("Write an is_palindrome() function in Python.")
print(response)
print("\n\n")  # separate time from generated text for readability

Here is a simple implementation of an `is_palindrome()` function in Python:
```
def is_palindrome(s):
    """
    Returns True if the input string is a palindrome, False otherwise.
    """
    return s == s[::-1]
```
Here's an explanation of how it works:

* `s[::-1]` is a slice that reverses the input string `s`. For example, if `s` is `"hello"`, then `s[::-1]` would be `"olleh"`.
* We then compare the original string `s` with its reversed version using the `==` operator. If they are the same, then the string is a palindrome, and we return `True`. Otherwise, we return `False`.

Here's an example usage:
```
>>> is_palindrome("madam")
True
>>> is_palindrome("hello")
False
```
Note that this implementation is case-sensitive and considers spaces and punctuation as part of the string. If you want to ignore case, spaces, and punctuation, you can modify the function accordingly. For example:
```
def is_palindrome(s):
    s = s.lower().replace(" ", "").replace(",", "").replace(".", "")
    re

That's the typical implementation. However, there's a trick to cut the processing time in half, well known by technical hiring managers who want to trip up prospective candidates.

ChatGPT outputs the statistically most common implementation, but it's not necessarily the best. A second pass allows ChatGPT to refine its output.

In [11]:
%%time
response = ai("Make it more efficient.")
print(response)
print("\n\n")

Here's a more efficient implementation of the `is_palindrome()` function:
```
def is_palindrome(s):
    """
    Returns True if the input string is a palindrome, False otherwise.
    """
    left, right = 0, len(s) - 1
    while left < right:
        if s[left] != s[right]:
            return False
        left += 1
        right -= 1
    return True
```
Here's what's changed:

* Instead of creating a reversed copy of the entire string (`s[::-1]`), we use two pointers, `left` and `right`, to iterate through the string from both ends.
* We compare the characters at the `left` and `right` indices, and if they don't match, we immediately return `False`.
* If the characters match, we increment `left` and decrement `right` to move towards the center of the string.
* If the loop completes without finding a mismatch, we return `True`, indicating that the string is a palindrome.

This implementation has a time complexity of O(n/2), where n is the length of the input string, because we only nee

In [12]:
ai.total_length

980

In all, it took ~6 seconds and used 980 tokens (the `ai.total_length` value above). But there's a lot of unnecessary natter in the output:

- The conversational preamble before the code
- Docstrings and code comments
- A long explanation of the code, which may be redundant given the code itself

All this natter adds latency and cost.

The easiest way to guide AI text generation is **prompt engineering**: give the model a new system prompt that says precisely what you want. As of June 27th 2023, the default ChatGPT API responds very well to commands.

Now, for the new `system` prompt:

In [15]:
system_optimized = """Write a Python function based on the user input.

You must obey ALL the following rules:
- Only respond with the Python function.
- Never put in-line comments or docstrings in your code."""

# ai_2 = AIChat(api_key=api_key, system=system_optimized, model=model, params=params)
ai_2 = AIChat(client_type="vLLM", api_key="token-abc123", model="/home/server_runner/Desktop/CloserModels/llama3_70B_awq", system=system_optimized, console=False)

In [16]:
%%time
response = ai_2("is_palindrome")
print(response)
print("\n\n")

def is_palindrome(s):
    return s == s[::-1]



CPU times: user 2.11 ms, sys: 89 μs, total: 2.2 ms
Wall time: 836 ms


In [17]:
%%time
response = ai_2("Make it more efficient.")
print(response)
print("\n\n")

def is_palindrome(s):
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True



CPU times: user 2.2 ms, sys: 0 ns, total: 2.2 ms
Wall time: 2.81 s


In [18]:
ai_2.total_length

214

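~3.6 seconds total with 214 tokens used, versus ~6 seconds and the 980 tokens reported by `ai.total_length` earlier: that's over 1.5x faster at less than a quarter of the tokens!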
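
The comparison can be reproduced from the numbers printed by the cells above. This is only a sketch of the arithmetic: the variable names are illustrative, and it assumes that cost scales with the token count reported by `total_length`, which holds for a per-token hosted API but not necessarily for a self-hosted server.

```python
# Figures copied from the cell outputs above.
baseline_tokens = 980    # ai.total_length (default system prompt)
optimized_tokens = 214   # ai_2.total_length (optimized system prompt)

# Wall times of the two ai_2 calls: 836 ms and 2.81 s.
optimized_seconds = 0.836 + 2.81

# Assumption: cost is proportional to the number of tokens.
token_ratio = optimized_tokens / baseline_tokens

print(f"Optimized run used {token_ratio:.0%} of the baseline tokens")
print(f"Optimized wall time: {optimized_seconds:.2f} s")
```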

## Create a Function

Now we can create a function to automate the two calls we did above for any arbitrary input.

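For each call, we'll create an independent temporary session within simpleaichat and then clean it up. We'll also import `re` so that unneeded backticks can be stripped with a regex, although `gen_code` as written returns the raw response, so some outputs below still include code fences.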

In [29]:
from uuid import uuid4
import re

ai_func = AIChat(client_type="vLLM", api_key="token-abc123", model="/home/server_runner/Desktop/CloserModels/llama3_70B_awq", console=False)
def gen_code(query):
    id = uuid4()
    # ai_func.new_session(api_key=api_key, id=id, system=system_optimized, params=params, model=model)
    ai_func.new_session(client_type="vLLM", api_key="token-abc123", model="/home/server_runner/Desktop/CloserModels/llama3_70B_awq", system=system_optimized, console=False, id=id)
    _ = ai_func(query, id=id)
    response_optimized = ai_func("Make it more efficient.", id=id)

    ai_func.delete_session(id=id)
    return response_optimized
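
The cell above imports `re` but never applies it. As a minimal sketch of the backtick stripping, the helper below removes one surrounding Markdown code fence if present. The name `strip_code_fences` is hypothetical and not part of simpleaichat; it assumes the model wraps the whole response in a single fence, as in some of the outputs in this notebook.

```python
import re

# Build the fence marker programmatically to keep this snippet easy to embed.
FENCE = "`" * 3

# Opening fence (with optional language tag), the body, then the closing fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[\w+-]*[ \t]*\n(.*?)\n?" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Return text without a surrounding Markdown code fence, if one is present."""
    match = _FENCE_RE.match(text)
    return match.group(1) if match else text.strip()
```

One way to use it would be to return `strip_code_fences(response_optimized)` from `gen_code`, so that fenced and unfenced responses come back in the same form.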

In [30]:
%%time
code = gen_code("is_palindrome")
print(code)
print("\n\n")

def is_palindrome(s):
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True



CPU times: user 0 ns, sys: 4.76 ms, total: 4.76 ms
Wall time: 3.62 s


In [31]:
%%time
code = gen_code("reverse string")
print(code)
print("\n\n")

def reverse_string(s):
    return ''.join(reversed(s))



CPU times: user 0 ns, sys: 2.66 ms, total: 2.66 ms
Wall time: 1.57 s


In [32]:
%%time
code = gen_code("pretty print dict")
print(code)
print("\n\n")

def pretty_print_dict(d, indent=0):
    for key, value in d.items():
        print('  ' * indent + str(key))
        if isinstance(value, dict):
            pretty_print_dict(value, indent + 1)
        else:
            print('  ' * (indent + 1) + str(value), end='\n\n')



CPU times: user 567 μs, sys: 2.11 ms, total: 2.68 ms
Wall time: 7.02 s


In [33]:
%%time
code = gen_code("load and flip image horizontally")
print(code)
print("\n\n")

```
import cv2

def load_and_flip_image_horizontally(image_path):
    return cv2.flip(cv2.imread(image_path), 1)
```



CPU times: user 3.01 ms, sys: 0 ns, total: 3.01 ms
Wall time: 3.96 s


In [34]:
%%time
code = gen_code("multiprocess hash")
print(code)
print("\n\n")

```
import hashlib
import multiprocessing

def hash_file(file_path):
    hash_object = hashlib.sha256()
    with open(file_path, 'rb') as f:
        for chunk in iter(lambda: f.read(4096), b""):
            hash_object.update(chunk)
    return file_path, hash_object.hexdigest()

def multiprocess_hash(file_paths):
    with multiprocessing.Pool() as pool:
        return pool.map(hash_file, file_paths)
```



CPU times: user 2.58 ms, sys: 0 ns, total: 2.58 ms
Wall time: 8.12 s
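
Since code quality can vary, it may be worth sanity-checking a generated function before relying on it. The sketch below is one possible approach, not part of simpleaichat: `check_generated` is a hypothetical helper that executes the generated source and runs a few hand-written cases against it. Because `exec` runs arbitrary code, this should only be done in an environment you are comfortable treating as disposable.

```python
def check_generated(code, func_name, cases):
    """Exec generated source, then assert func(*args) == expected for each case."""
    namespace = {}
    # compile() raises SyntaxError if the generated text is not valid Python.
    exec(compile(code, "<generated>", "exec"), namespace)
    func = namespace[func_name]
    for args, expected in cases:
        actual = func(*args)
        assert actual == expected, (
            f"{func_name}{args!r} returned {actual!r}, expected {expected!r}"
        )
    return func


# The is_palindrome output from earlier, as a string.
generated = (
    "def is_palindrome(s):\n"
    "    i, j = 0, len(s) - 1\n"
    "    while i < j:\n"
    "        if s[i] != s[j]:\n"
    "            return False\n"
    "        i += 1\n"
    "        j -= 1\n"
    "    return True\n"
)

is_palindrome = check_generated(
    generated,
    "is_palindrome",
    [(("madam",), True), (("hello",), False), (("",), True)],
)
```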


## Generating Optimized Code in a Single API Call w/ Structured Output Data

Sometimes you may not want to make two calls to OpenAI. One hack is to define an expected structured output that tells the model to generate the normal code first, then the optimized version.

This structure is essentially a different form of prompt engineering, but you can combine it with a system prompt if needed.

This will also further increase response speed, but may not necessarily result in fewer tokens used.

In [35]:
from pydantic import BaseModel, Field
import orjson

class write_python_function(BaseModel):
    """Writes a Python function based on the user input."""
    code: str = Field(description="Python code")
    efficient_code: str = Field(description="More efficient Python code than previously written")

In [36]:
# ai_struct = AIChat(api_key=api_key, console=False, model=model, params=params, save_messages=False)
ai_struct = AIChat(client_type="vLLM", api_key="token-abc123", model="/home/server_runner/Desktop/CloserModels/llama3_70B_awq", system="You are a helpful assistant.", console=False, save_messages=False)

In [25]:
%%time
response_structured = ai_struct("is_palindrome", output_schema=write_python_function)

# orjson.dumps preserves field order from the ChatGPT API
print(orjson.dumps(response_structured, option=orjson.OPT_INDENT_2).decode())
print("\n\n")

{
  "code": "def is_palindrome(s):\n    return s == s[::-1]",
  "efficient_code": "def is_palindrome(s):\n    n = len(s)\n    for i in range(n // 2):\n        if s[i] != s[n - i - 1]:\n            return False\n    return True"
}



CPU times: user 131 ms, sys: 1.39 ms, total: 133 ms
Wall time: 1.67 s


As shown, the output is a `dict`, so you'd just return the `efficient_code` field.

In [26]:
print(response_structured["efficient_code"])

def is_palindrome(s):
    n = len(s)
    for i in range(n // 2):
        if s[i] != s[n - i - 1]:
            return False
    return True
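
If the model ever omits or leaves the optimized field empty, indexing the `dict` directly would fail or return nothing useful. A small defensive sketch follows; `pick_code` is a hypothetical helper, and the fallback to the `code` field is an assumption about what you would want in that case.

```python
def pick_code(response: dict) -> str:
    """Prefer the optimized field; fall back to the first draft if it is missing or empty."""
    for key in ("efficient_code", "code"):
        value = response.get(key)
        if isinstance(value, str) and value.strip():
            return value
    raise KeyError("response has neither 'efficient_code' nor 'code'")


# Same shape as the structured response printed above.
response = {
    "code": "def is_palindrome(s):\n    return s == s[::-1]",
    "efficient_code": (
        "def is_palindrome(s):\n"
        "    n = len(s)\n"
        "    for i in range(n // 2):\n"
        "        if s[i] != s[n - i - 1]:\n"
        "            return False\n"
        "    return True"
    ),
}

print(pick_code(response))
```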


In [27]:
ai_struct.total_length

161

Token-wise it's about the same, but there's a significant speedup in generation for short queries such as these.

Trying the other examples:

In [28]:
%%time
response_structured = ai_struct("reverse string", output_schema=write_python_function)
print(response_structured["efficient_code"])
print("\n\n")

def reverse_string(s):
    return ''.join(reversed(s))



CPU times: user 24.6 ms, sys: 250 µs, total: 24.9 ms
Wall time: 1.37 s


In [29]:
%%time
response_structured = ai_struct("load and flip image horizontally", output_schema=write_python_function)
print(response_structured["efficient_code"])
print("\n\n")

from PIL import Image

def load_and_flip_image_horizontally(image_path):
    return Image.open(image_path).transpose(Image.FLIP_LEFT_RIGHT)



CPU times: user 15.4 ms, sys: 2.15 ms, total: 17.6 ms
Wall time: 1.79 s


## MIT License

Copyright (c) 2023 Max Woolf

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
