# Supervised Machine Learning & OpenAI: End-to-End Workflow

## 1. Dataset Preparation for Fine-Tuning (Code Generation Task)

In [5]:
!pip install pandas

import pandas as pd

# Example dataset for fine-tuning code generation (input: prompt, output: code)
data = [
    {
        "prompt": "Write a Python function to add two numbers.",
        "completion": "def add(a, b):\n    return a + b"
    },
    {
        "prompt": "Write a Python function to check if a number is even.",
        "completion": "def is_even(n):\n    return n % 2 == 0"
    },
    {
        "prompt": "Write a Python function to find factorial.",
        "completion": "def factorial(n):\n    return 1 if n == 0 else n * factorial(n-1)"
    },
]

df = pd.DataFrame(data)
df.to_csv("code_generation_dataset.csv", index=False)
df.head()



| | prompt | completion |
|---|---|---|
| 0 | Write a Python function to add two numbers. | `def add(a, b):\n return a + b` |
| 1 | Write a Python function to check if a number i... | `def is_even(n):\n return n % 2 == 0` |
| 2 | Write a Python function to find factorial. | `def factorial(n):\n return 1 if n == 0 else...` |
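Because every completion in this dataset is meant to be Python source, a cheap sanity check before training is to parse each one with `ast.parse`. The sketch below is one possible way to do that; `invalid_completions` is a hypothetical helper name, and the third row is a deliberately broken example added for the demo, not part of the dataset above.

```python
import ast

# Same shape as the dataset above; the last row is deliberately broken for the demo.
rows = [
    {"prompt": "Write a Python function to add two numbers.",
     "completion": "def add(a, b):\n    return a + b"},
    {"prompt": "Write a Python function to check if a number is even.",
     "completion": "def is_even(n):\n    return n % 2 == 0"},
    {"prompt": "Broken example (not in the original dataset).",
     "completion": "def broken(:\n    pass"},
]


def invalid_completions(rows):
    """Return indices of rows whose completion is not syntactically valid Python."""
    bad = []
    for i, row in enumerate(rows):
        try:
            ast.parse(row["completion"])
        except SyntaxError:
            bad.append(i)
    return bad


bad_rows = invalid_completions(rows)
print(bad_rows)
```

Parsing only checks syntax; it does not run the code or confirm that it does what the prompt asks.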


## 2. Supervised Model: Train-Test Split and Fine-Tuning Prep
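OpenAI fine-tuning expects training data as JSONL: one JSON object per line, each holding a `messages` list for chat models. The cell below makes an 80/20 split and writes both parts in that format. With only three rows, this leaves two for training and one for testing.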

In [2]:
import json

# Train-test split (80-20 for demo)
train_df = df.sample(frac=0.8, random_state=42)
test_df = df.drop(train_df.index)

# Save as JSONL for OpenAI fine-tuning
def save_jsonl(df, filename):
    # Explicit UTF-8 so the file does not depend on the platform's default encoding
    with open(filename, "w", encoding="utf-8") as f:
        for _, row in df.iterrows():
            entry = {
                "messages": [
                    {"role": "user", "content": row['prompt']},
                    {"role": "assistant", "content": row['completion']}
                ]
            }
            f.write(json.dumps(entry) + "\n")

save_jsonl(train_df, "train.jsonl")
save_jsonl(test_df, "test.jsonl")
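Before uploading a file like `train.jsonl`, it can help to confirm that every line parses and has the expected chat shape. The following is a minimal sketch, not OpenAI's own validation: `validate_chat_jsonl` is a hypothetical helper, and the rules it enforces (a non-empty `messages` list whose items carry string `role` and `content` fields) are assumed from the format written above.

```python
import json
import os
import tempfile


def validate_chat_jsonl(path):
    """Return a list of (line_number, problem) tuples; empty means no problems found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                problems.append((line_no, "blank line"))
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_no, f"invalid JSON: {exc.msg}"))
                continue
            messages = entry.get("messages") if isinstance(entry, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((line_no, "missing or empty 'messages' list"))
                continue
            for msg in messages:
                if not (
                    isinstance(msg, dict)
                    and isinstance(msg.get("role"), str)
                    and isinstance(msg.get("content"), str)
                ):
                    problems.append((line_no, "message lacks string 'role'/'content'"))
                    break
    return problems


# Demo on a throwaway file: one well-formed line, one malformed line.
good = {"messages": [{"role": "user", "content": "hi"},
                     {"role": "assistant", "content": "hello"}]}
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write(json.dumps(good) + "\n")
    tmp.write('{"messages": []}\n')
    demo_path = tmp.name

demo_problems = validate_chat_jsonl(demo_path)
os.remove(demo_path)
print(demo_problems)
```

In the notebook, the same helper could be pointed at `train.jsonl` and `test.jsonl`; an empty list means none of these particular checks failed.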

In [3]:
!pip install openai python-dotenv --quiet
from openai import OpenAI
from dotenv import load_dotenv
import os

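load_dotenv()  # Load OPENAI_API_KEY from a local .env file, if one exists
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=api_key)  # Pass the key explicitly instead of leaving it unused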




### A. Prompt Engineering for Code Generation

In [4]:
def generate_code(prompt):
    response = client.chat.completions.create(
        model="gpt-4o",  # Use "gpt-4-turbo" or "gpt-3.5-turbo" as fallback
        messages=[
            {"role": "system", "content": "You are a helpful Python code assistant."},
            {"role": "user", "content": prompt},
        ],
        max_tokens=150,
        temperature=0.2,
    )
    return response.choices[0].message.content

prompt = "Write a Python function to reverse a string."
print("Prompt:", prompt)
print("Generated Code:\n", generate_code(prompt))

Prompt: Write a Python function to reverse a string.


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
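The 429 above has type `insufficient_quota`, which is a billing problem that retrying will not fix. Transient rate limits use the same status code, though, and are commonly handled by retrying with exponential backoff. The sketch below shows that pattern with the standard library only; `call_with_backoff` and `TransientError` are hypothetical names for the demo, and with the `openai` package one would presumably pass `openai.RateLimitError` as the `retry_on` type.

```python
import random
import time


class TransientError(Exception):
    """Stand-in for a retryable API error (hypothetical, for the demo only)."""


def call_with_backoff(fn, retry_on=(TransientError,), max_attempts=4, base_delay=0.01):
    """Call fn(); on a retryable error, sleep with exponential backoff plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay))


# Demo: a fake call that fails twice, then succeeds on the third attempt.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TransientError("simulated rate limit")
    return "ok"


result = call_with_backoff(flaky_call)
print(result, attempts["count"])
```

The tiny `base_delay` keeps the demo fast; a real call would likely start at a second or more.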

### B. Code Optimization Using OpenAI

In [None]:
raw_code = """
def add_numbers(a, b):
    result = 0
    result = a + b
    return result
"""

opt_prompt = f"Optimize the following Python code for simplicity and efficiency:\n\n{raw_code}"

optimized_code = generate_code(opt_prompt)
print("Optimized Code:\n", optimized_code)


Optimized Code:
 The given function can be simplified by directly returning the sum of `a` and `b` without the need for an intermediate variable. Here's the optimized version:

```python
def add_numbers(a, b):
    return a + b
```

This version is both simpler and more efficient.


### C. Prompt Engineering for Debugging

In [None]:
buggy_code = """
def divide(a, b):
    return a / b
print(divide(10, 0))
"""

debug_prompt = f"Find and fix the bug in the following Python code:\n\n{buggy_code}"

fixed_code = generate_code(debug_prompt)
print("Fixed Code:\n", fixed_code)


Fixed Code:
 The code you provided will raise a `ZeroDivisionError` when attempting to divide by zero. To handle this situation gracefully, you can add a check to ensure that the divisor `b` is not zero before performing the division. Here's a modified version of the code that includes error handling:

```python
def divide(a, b):
    if b == 0:
        return "Error: Division by zero is not allowed."
    return a / b

print(divide(10, 0))
```

With this change, the function will return an error message instead of raising an exception when `b` is zero.
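One way to confirm a suggested fix is to run both the normal path and the previously failing path. The sketch below copies the function from the model's answer and exercises each; the variable names `normal` and `guarded` are only for illustration.

```python
def divide(a, b):
    # Same fix as in the model's answer above.
    if b == 0:
        return "Error: Division by zero is not allowed."
    return a / b


normal = divide(10, 2)   # Regular division still works.
guarded = divide(10, 0)  # No exception is raised any more.
print(type(normal).__name__, normal)
print(type(guarded).__name__, guarded)
```

The check also shows a side effect of this particular fix: the function now returns a float on success and a string on failure, so callers have to handle both types.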


## 4. [Optional] Evaluate Generated Code with Automated Testing
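You can run generated snippets with `exec`, but `exec` executes arbitrary code with the notebook's full privileges, so review LLM output first and run it only in an environment you can afford to break. For demonstration: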

In [None]:
import re
def extract_code_block(text):
    """
    Extracts the first python code block from LLM output.
    If no code block, returns raw text.
    """
    code_blocks = re.findall(r"```(?:python)?(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()

def test_reverse_string():
    prompt = "Write a Python function named `reverse_string` that takes a string and returns it reversed."
    code_response = generate_code(prompt)
    print("LLM Response:\n", code_response)
    code = extract_code_block(code_response)
    print("Extracted Code:\n", code)
    exec(code, globals())  # Define function in notebook/global scope
    try:
        assert reverse_string("OpenAI") == "IAnepO"
        assert reverse_string("") == ""
        assert reverse_string("a") == "a"
        print("Test passed!")
    except Exception as e:
        print("Test failed:", repr(e))  # repr() because a bare assert carries no message

test_reverse_string()

LLM Response:
 Certainly! Below is a Python function named `reverse_string` that takes a string as input and returns the reversed version of that string.

```python
def reverse_string(s):
    """
    Reverses the given string.

    Parameters:
    s (str): The string to be reversed.

    Returns:
    str: The reversed string.
    """
    return s[::-1]

# Example usage:
# reversed_str = reverse_string("hello")
# print(reversed_str)  # Output: "olleh"
```

This function uses Python's slicing feature to reverse the string. The slice `[::-1]` starts from the end of the string and moves backwards, effectively reversing it.
Extracted Code:
 def reverse_string(s):
    """
    Reverses the given string.

    Parameters:
    s (str): The string to be reversed.

    Returns:
    str: The reversed string.
    """
    return s[::-1]

# Example usage:
# reversed_str = reverse_string("hello")
# print(reversed_str)  # Output: "olleh"
Test passed!
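The test above defines the generated function in the notebook's global scope. A slightly tidier variant, sketched below, executes the snippet in a fresh dictionary and pulls out the function by name. `load_function` is a hypothetical helper, and `sample_code` is a hard-coded stand-in for the extracted LLM output so the example runs without an API call. This is not a sandbox: the code still runs with full privileges.

```python
def load_function(code, name):
    """Execute code in a fresh namespace and return the callable bound to name."""
    namespace = {}
    exec(code, namespace)  # Still runs arbitrary code; this only avoids touching globals().
    if name not in namespace or not callable(namespace[name]):
        raise ValueError(f"code did not define a callable named {name!r}")
    return namespace[name]


# Stand-in for extracted LLM output (hard-coded so the example runs offline).
sample_code = "def reverse_string(s):\n    return s[::-1]\n"

reverse_string_fn = load_function(sample_code, "reverse_string")

# Same cases as the assertions in the test above.
cases = [("OpenAI", "IAnepO"), ("", ""), ("a", "a")]
results = [reverse_string_fn(arg) == expected for arg, expected in cases]
print(results)
```

Keeping the namespace separate means a generated snippet cannot silently overwrite names such as `client` or `generate_code` that the notebook already defines.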


## Summary
*Dataset Preparation:* Created and exported a dataset for code generation tasks.

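*Modeling & Fine-Tuning:* Split the dataset into train and test sets and wrote both as chat-format JSONL for OpenAI fine-tuning; no fine-tuning job was submitted.

*Prompt Engineering:* Showed prompts for code generation, code optimization, and debugging.

*OpenAI API Usage:* Called the Chat Completions API (`gpt-4o`) through a single `generate_code` helper for each task.

*Automated Testing:* Extracted the generated function from the model's reply and checked it with assertions.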