## Building a Code Assistant with OpenAI & MLflow

### Overview

Welcome to this tutorial on integrating OpenAI's language models with MLflow. We'll build a practical tool: a decorator that, once added to any function you declare, returns immediate feedback on that function's code from within an interactive environment while the code is still under active development.

### Learning Objectives

By the end of this tutorial, you will:

1. **Master OpenAI's GPT-4 for Code Assistance**: Understand how to leverage OpenAI's GPT-4 model for providing real-time coding assistance. Learn to harness its capabilities for generating code suggestions, explanations, and improving overall coding efficiency.
2. **Utilize MLflow for Enhanced Model Tracking**: Delve into MLflow's powerful tracking systems to manage machine learning experiments. Learn how to adapt MLflow's `PythonModel` to control how the output of an LLM is displayed within an interactive coding environment.
3. **Seamlessly Combine OpenAI and MLflow**: Discover the practical steps to integrate OpenAI's AI capabilities with MLflow's tracking and management systems. This integration exemplifies how combining these tools can streamline the development and deployment of intelligent applications.
4. **Develop and Deploy a Custom Python Code Assistant**: Gain hands-on experience in creating a Python-based code assistant using OpenAI's model. Then, actually see it in action as it is used within a Jupyter Notebook environment to give helpful assistance during development.
5. **Improve Code Quality with AI-driven Insights**: Apply AI-powered analysis to review and enhance your code. Learn how an AI assistant can provide real-time feedback on code quality, suggest improvements, and help maintain high coding standards.
6. **Explore Advanced Python Features for Robust Development**: Understand advanced Python features like decorators and functional programming. These are crucial for building efficient, scalable, and maintainable software solutions, especially when integrating AI capabilities.


### Key Concepts Covered

1. **OpenAI's GPT-4 Model**: Dive into the capabilities of the state-of-the-art GPT-4 model, understanding its role in generating human-like text.
2. **MLflow's Model Management**: Explore MLflow's features for tracking experiments, packaging code into reproducible runs, and managing and deploying models.
3. **Python Decorators and Functional Programming**: Learn about advanced Python concepts like decorators and functional programming for efficient code evaluation and enhancement.
4. **Regular Expressions and Error Handling**: Understand the implementation of regular expressions for pattern matching and effective error handling techniques in Python.

### MLflow's Significance

MLflow stands out in this tutorial as a pivotal tool for managing the lifecycle of machine learning projects. Its capabilities in model tracking, versioning, and deployment are essential for maintaining a robust and scalable machine learning workflow. The tutorial emphasizes MLflow's utility in enhancing productivity, ensuring reproducibility, and facilitating collaboration among data science teams.

With a blend of theoretical insights and practical coding examples, this tutorial is designed to offer a well-rounded learning experience, catering to both beginners and seasoned practitioners in the field of machine learning. Let's dive into the world of AI model management and optimization with OpenAI and MLflow!


In [None]:
import warnings

# Disable a few less-than-useful UserWarnings from setuptools and pydantic
warnings.filterwarnings("ignore", category=UserWarning)

In [None]:
import functools
import inspect
from IPython.display import HTML
import os
import shutil
import textwrap

import openai
import pandas as pd

import mlflow
from mlflow.models.signature import ModelSignature
from mlflow.pyfunc import PythonModel
from mlflow.types.schema import ColSpec, ParamSchema, ParamSpec, Schema


# Run a quick validation that we have an entry for the OPENAI_API_KEY within environment variables
assert "OPENAI_API_KEY" in os.environ, "OPENAI_API_KEY environment variable must be set"

In [None]:
mlflow.set_experiment("Code Helper")

In [None]:
instruction = [
    {
        "role": "system",
        "content": (
            "You are a helpful expert Software Engineer who is here to assist me with my code and teach me along the way. When I paste in code, you will "
            "provide a brief explanation of what the code is intending to accomplish and whether or not it is clearly understandable. If the code is incorrect, "
            "has logical flaws, or is estimated to be fairly difficult to read, you will provide a concise explanation of what to change and the reasoning behind "
            "these changes. Your primary goal is to provide explanations, justifications and suggestions in order that I learn a better way of writing code. "
            "Please focus on code simplicity, maintainability, readability, and conformance to widely accepted patterns within the language of code that I submit to you."
        ),
    },
    {
        "role": "user",
        "content": "Please check my code for errors and provide me with suggestions on how to improve it: {code}"
    }
]
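
The `{code}` placeholder in the user message is a template variable that gets filled with the submitted source at prediction time. The sketch below only mimics that substitution with `str.format` so the final prompt text is easy to picture; `user_template` and `render_user_message` are hypothetical names used for illustration and are not part of MLflow or this notebook.

```python
# Hypothetical copy of the user message template from the instruction set.
user_template = (
    "Please check my code for errors and provide me with suggestions "
    "on how to improve it: {code}"
)


def render_user_message(code: str) -> dict:
    """Fill the {code} placeholder and return a chat-style message dict."""
    # Only the template is parsed for placeholders, so braces that appear
    # inside the submitted code are inserted verbatim.
    return {"role": "user", "content": user_template.format(code=code)}


message = render_user_message("def add(a, b):\n    return a - b")
print(message["content"])
```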

In [2]:
model_path = "/tmp/code-helper"

In [3]:
# Remove the model path if it already exists, in case you need to re-run this notebook in its entirety.

if os.path.exists(model_path):
    shutil.rmtree(model_path)

In [None]:
# Define the model signature that will be used for both the base model and the eventual custom pyfunc implementation later.
signature = ModelSignature(
    inputs=Schema([ColSpec(type="string", name=None)]),
    outputs=Schema([ColSpec(type="string", name=None)]),
    params=ParamSchema(
        [
            ParamSpec(name="max_tokens", default=500, dtype="long"),
            ParamSpec(name="temperature", default=0, dtype="float"),
        ]
    ),
)

# Save the base OpenAI model with the included instruction set (prompt)
mlflow.openai.save_model(
    model="gpt-4",
    task=openai.ChatCompletion,
    path=model_path,
    messages=instruction,
    signature=signature,
)
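
The `ParamSpec` entries in the signature give `max_tokens` and `temperature` default values that apply whenever a caller does not pass them at prediction time. The following is a plain-Python illustration of that default-plus-override behavior only, not MLflow's implementation; `DEFAULT_PARAMS` and `resolve_params` are hypothetical names.

```python
# Defaults mirroring the ParamSpec entries declared in the signature.
DEFAULT_PARAMS = {"max_tokens": 500, "temperature": 0.0}


def resolve_params(overrides=None):
    """Hypothetical helper: start from the defaults, then apply overrides."""
    return {**DEFAULT_PARAMS, **(overrides or {})}


print(resolve_params())
print(resolve_params({"max_tokens": 200}))
```

With the real model, an override would be supplied through the `params` argument of `predict`, as the custom `predict` method later in this notebook does when it forwards `params` to the base model.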

In [None]:
# Custom pyfunc implementation that applies text and code formatting to the output results from the OpenAI model
class CodeHelper(PythonModel):

    def __init__(self):
        self.model = None
    
    def load_context(self, context):

        self.model = mlflow.pyfunc.load_model(context.artifacts["model_path"])

    @staticmethod
    def _format_response(response):
        formatted_output = ""
        in_code_block = False

        for item in response:
            lines = item.split('\n')
            for line in lines:
                # Check for the start/end of a code block
                if line.strip().startswith("```"):
                    in_code_block = not in_code_block
                    formatted_output += line + '\n'
                    continue

                if in_code_block:
                    # Don't wrap lines inside code blocks
                    formatted_output += line + '\n'
                else:
                    # Wrap lines outside of code blocks
                    wrapped_lines = textwrap.fill(line, width=80)
                    formatted_output += wrapped_lines + '\n'

        return formatted_output

    def predict(self, context, model_input, params=None):

        # Call the loaded OpenAI model instance to get the raw response
        raw_response = self.model.predict(model_input, params=params)

        # Return the formatted response so that it is easier to read
        return self._format_response(raw_response)
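
To see what the formatting step does without calling OpenAI, the same wrapping logic can be run on a hand-written response. This is a standalone sketch of the approach used in `_format_response`; the function name `format_response`, the `width` argument, and the sample text are assumptions made for the demonstration.

```python
import textwrap

FENCE = "`" * 3  # marker that opens and closes a code block in the response


def format_response(response, width=80):
    """Wrap prose lines to `width`; leave lines inside code blocks untouched."""
    formatted = []
    in_code_block = False
    for item in response:
        for line in item.split("\n"):
            if line.strip().startswith(FENCE):
                # Toggle code-block state and keep the marker line as-is
                in_code_block = not in_code_block
                formatted.append(line)
            elif in_code_block:
                formatted.append(line)
            else:
                formatted.append(textwrap.fill(line, width=width))
    return "\n".join(formatted) + "\n"


prose = "This sentence is deliberately long so that it exceeds the wrap width. " * 3
code_line = "value = compute(argument_one, argument_two)  # a long code line that should stay on one single line"
sample = ["\n".join([prose, FENCE + "python", code_line, FENCE])]

output = format_response(sample)
print(output)
```

The prose is re-wrapped at 80 characters, while the long code line passes through unchanged.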

In [None]:
# Define the location that we'll be using to save (and load) our custom pyfunc implementation
final_model_path = "/tmp/my_code_helper"

In [None]:
# As before, we're cleaning up the destination location for the serialized custom model, in case you want to run this notebook several times.
if os.path.exists(final_model_path):
    shutil.rmtree(final_model_path)

In [None]:
# Define the location of the base OpenAI model saved earlier, which the custom pyfunc loads as an artifact
artifacts = {"model_path": model_path}

with mlflow.start_run():
    mlflow.pyfunc.save_model(
        path=final_model_path,
        python_model=CodeHelper(),
        input_example=["x = 1"],
        signature=signature,
        artifacts=artifacts,
    )

In [None]:
loaded_helper = mlflow.pyfunc.load_model(final_model_path)

In [None]:
def code_inspector(model):
    """
    Function decorator that evaluates the implementation of any decorated function and prints feedback on it each time the function is called.
    
    Args:
        model: The MLflow pyfunc model that will be used to evaluate the code
    """
    def decorator_check_my_function(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                parsed_func = inspect.getsource(func)
                response = model.predict(parsed_func)
                # Print the response so that even if the code doesn't execute properly, we'll get feedback about what to change
                print(response)
            except Exception as e:
                print("Error during model prediction or formatting:", e)
            return func(*args, **kwargs)
        return wrapper
    return decorator_check_my_function
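
The decorator can be exercised without an API key by passing it any object that exposes a `predict` method. The sketch below repeats the decorator in condensed form and pairs it with `StubHelper`, a hypothetical stand-in for the loaded MLflow model that simply reports how much source it received.

```python
import functools
import inspect


class StubHelper:
    """Hypothetical stand-in for the loaded model; no API call is made."""

    def predict(self, source):
        return f"Received {len(source.splitlines())} line(s) of source to review."


def code_inspector(model):
    """Condensed copy of the decorator factory defined in this notebook."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                print(model.predict(inspect.getsource(func)))
            except Exception as e:
                # Feedback is best-effort; the wrapped function still runs
                print("Error during model prediction or formatting:", e)
            return func(*args, **kwargs)

        return wrapper

    return decorator


@code_inspector(StubHelper())
def add(a, b):
    return a + b


result = add(2, 3)
print(result)
```

Because of `functools.wraps`, the decorated function keeps its original name and docstring, and its return value is passed through unchanged.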

In [None]:
@code_inspector(loaded_helper)
def summing_function(n):
    sum_result = 0

    intermediate_sums = {}

    for i in range(1, n + 1):
        intermediate_sums[str(i)] = sum([x for x in range(1, i + 1)])
        for key in intermediate_sums:
            if key == str(i):
                sum_result = intermediate_sums[key]
    
    final_sum = sum([intermediate_sums[key] for key in intermediate_sums if int(key) == n])

    return int(str(final_sum))

In [None]:
summing_function(1000)
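
The model's feedback will vary from run to run, so no specific response is reproduced here. As a point of comparison, one simplification a reviewer might plausibly suggest is shown below; `sum_to_n` is a hypothetical name, and the rewrite returns the same values as `summing_function` for integer inputs.

```python
def sum_to_n(n):
    """Return 1 + 2 + ... + n (0 when n is less than 1)."""
    # range() already yields the integers to add, so no intermediate
    # dictionary or string conversion is needed.
    return sum(range(1, n + 1))


print(sum_to_n(1000))
```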

In [None]:
@code_inspector(loaded_helper)
def one_liner(n):
    return (lambda f, n: f(f, n))(lambda f, n: n * f(f, n - 1) if n > 1 else 1, n) if isinstance(n, int) and n >= 0 else "Invalid input"


In [None]:
one_liner(10)
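
`one_liner` computes a factorial through a self-applying lambda and returns the string "Invalid input" for anything that is not a non-negative integer. A more readable alternative, offered as a sketch rather than as the model's actual output, uses `math.factorial`. Note one deliberate behavior change: it raises `ValueError` instead of returning a string, which is an assumption about what callers would prefer.

```python
import math


def factorial(n):
    """Return n! for a non-negative integer n."""
    if not isinstance(n, int) or n < 0:
        # Behavior change from one_liner: raise instead of returning a string
        raise ValueError("n must be a non-negative integer")
    return math.factorial(n)


print(factorial(10))
```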

In [None]:
@code_inspector(loaded_helper)
def processData(data):
    a = []
    if len(data) == 0:
        print("No data")
        return
    else:
        for d in range(len(data)):
            if data[d] % 2 == 0:
                a.append(data[d] + 1)
            else:
                a.append(data[d])
        index = 0
        while index < len(a):
            if a[index] % 2 != 0:
                a[index] = a[index] * 2
            index = index + 1
        return a

In [None]:
processData([1,2,3,4,5,6,7,8,9,10])
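
Tracing `processData` by hand shows that every even input is first incremented to the next odd number, after which every element is doubled. A condensed sketch that produces the same results for lists of numbers follows; `process_data` is a hypothetical snake_case rename, not something defined elsewhere in this notebook.

```python
def process_data(data):
    """Double each value, first bumping even values up to the next odd number."""
    if not data:
        print("No data")
        return None
    return [(value + 1) * 2 if value % 2 == 0 else value * 2 for value in data]


print(process_data([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```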

In [None]:
@code_inspector(loaded_helper)
def find_phone_numbers(text):
    
    pattern = "(\d{3})-\d{2}-\d{4}"

    import re

    compiled_pattern = re.complie(pattern)

    phone_numbers = compiled_pattern.findall(text)
    first_number = phone_numbers[0]

    print(f"First found phone number: {first_number}")
    return phone_numbers

In [None]:
find_phone_numbers("Give us a call at 888-867-5309")
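
This last function fails when called: `re.complie` is a misspelling of `re.compile`, so the call raises `AttributeError`. Other issues are also visible in the source: the pattern is not a raw string, its middle group matches two digits rather than three, the capture group would make `findall` return only the first three digits, and indexing `phone_numbers[0]` would fail on text with no match. One possible corrected version is sketched below; `find_phone_numbers_fixed` and `PHONE_PATTERN` are hypothetical names, and the `ddd-ddd-dddd` format is an assumption based on the example input.

```python
import re

# Assumed format: three digits, three digits, four digits, separated by hyphens
PHONE_PATTERN = re.compile(r"\d{3}-\d{3}-\d{4}")


def find_phone_numbers_fixed(text):
    """Return every phone number found in `text` (an empty list if none)."""
    phone_numbers = PHONE_PATTERN.findall(text)
    if phone_numbers:
        print(f"First found phone number: {phone_numbers[0]}")
    else:
        print("No phone numbers found")
    return phone_numbers


numbers = find_phone_numbers_fixed("Give us a call at 888-867-5309")
```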