# **INTRODUCTION**
<hr>

### **1. LIBRARIES**
- **Installing dependencies:** using `pip`, install `python-dotenv`
- **How to get an OpenAI API key:** <a href="https://platform.openai.com/">OpenAI Platform</a>
- **How to store your API keys SAFELY:** keep them in a `.env` file and list that file in `.gitignore`
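As a rough sketch of the storage pattern in the last bullet: the key lives in a `.env` file, and `.gitignore` lists that file so Git never commits it. The temporary directory, the `write_env_files` helper, and the placeholder key below are illustrative assumptions, not part of this notebook.

```python
import tempfile
from pathlib import Path


def write_env_files(project_dir: Path, api_key: str) -> None:
    """Hypothetical helper: create a .env holding the key and a .gitignore that hides it."""
    # One NAME=value pair per line is the format a .env file uses
    (project_dir / ".env").write_text(f"OPENAI_API_KEY={api_key}\n")
    # Listing .env here tells Git to leave the file out of commits
    (project_dir / ".gitignore").write_text(".env\n")


with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    write_env_files(project, "sk-placeholder")  # placeholder, not a real key
    env_text = (project / ".env").read_text()
    gitignore_lines = (project / ".gitignore").read_text().splitlines()

print(env_text.strip())
print(".env" in gitignore_lines)
```

In a real project both files sit in the repository root, and only `.gitignore` is committed.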

In [1]:
# Installing dependencies
%pip install python-dotenv

Note: you may need to restart the kernel to use updated packages.


In [1]:
# Importing Dependencies
from dotenv import load_dotenv
import os

In [None]:
# load_dotenv(), os.environ, os.environ.get()
load_dotenv()

# Getting a key by indexing
os.environ['OPENAI_API_KEY']
# Indexing a key that does not exist raises KeyError and stops the cell;
# uncomment the next line to see the error
# os.environ['KEY_DONT_EXIST']

# By using .get() you can add a fallback value
os.environ.get("OPENAI_API_KEY", None)

# Can help deal with errors:
KEY = os.environ.get('KEY_DONT_EXIST', None)
if not KEY:
    print("KEY NOT FOUND")
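The difference between indexing and `.get()` can be wrapped in a small check that fails early with a readable message. This is a minimal sketch using only the standard library; `require_env` and `DEMO_KEY` are hypothetical names introduced for illustration.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value or raise a clear error."""
    value = os.environ.get(name)  # .get() returns None instead of raising KeyError
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value


# Set a demo variable so the sketch runs without a real .env file
os.environ["DEMO_KEY"] = "demo-value"
os.environ.pop("KEY_DONT_EXIST", None)  # make sure the missing key really is missing

print(require_env("DEMO_KEY"))
try:
    require_env("KEY_DONT_EXIST")
except RuntimeError as err:
    print(err)
```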

In [None]:
# Assign OPENAI_API_KEY 
load_dotenv()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", None)

### **2. OpenAI Library**
- **OpenAI Responses API:** Refer to <a href="https://platform.openai.com/docs/api-reference/responses/create?lang=python">OpenAI Responses Docs</a>
- **OpenAI ChatCompletions API:** Refer to <a href="https://platform.openai.com/docs/api-reference/chat/create?lang=python">OpenAI Chat Completions Docs</a>

In [16]:
# Install openai
%pip install openai
from openai import OpenAI

### Initialising OpenAI Client and Using Responses API

In [18]:
# Call openai with your API key
client = OpenAI(api_key=OPENAI_API_KEY) # convention is to call it client

response = client.responses.create(
    model="gpt-4o-mini", # Try using restricted model to show restrictions
    input="Hi I'm a computer science student!"
)

Response(id='resp_68beee9c1f1481979a217d453732328c09be5d29cc5dfc3e', created_at=1757343388.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_68beee9d14b08197b51275e7cef55b9609be5d29cc5dfc3e', content=[ResponseOutputText(annotations=[], text="That's great to hear! What are you currently studying or working on in computer science? Do you have any specific topics or projects you’d like to discuss?", type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='completed', text=ResponseTextConfig(format

In [38]:
# Show the full response object
print(response)
# Extract the text step by step: first output item -> first content block -> text
response.output[0].content[0].text
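The indexing chain `response.output[0].content[0].text` can be read one level at a time. The sketch below walks the same nesting on a stand-in object built with `SimpleNamespace`, so it runs without an API call. The stand-in and the `extract_text` helper are assumptions for illustration; the `type` values `message` and `output_text` are the ones that appear in the printed object in this notebook, and the `other` item is hypothetical.

```python
from types import SimpleNamespace

# Stand-in for a Responses API result: output -> content -> text
fake_response = SimpleNamespace(
    output=[
        SimpleNamespace(type="other"),  # hypothetical non-message item, skipped below
        SimpleNamespace(
            type="message",
            content=[SimpleNamespace(type="output_text", text="Hello from the model")],
        ),
    ]
)


def extract_text(response) -> str:
    """Hypothetical helper: join the text of every output_text block in message items."""
    parts = []
    for item in response.output:  # level 1: output items
        if getattr(item, "type", None) != "message":
            continue
        for block in item.content:  # level 2: content blocks
            if getattr(block, "type", None) == "output_text":
                parts.append(block.text)  # level 3: the text itself
    return "".join(parts)


print(extract_text(fake_response))
```

Looping instead of hard-coding `[0]` avoids an `AttributeError` if the first output item is not a message.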

Response(id='resp_68beee9c1f1481979a217d453732328c09be5d29cc5dfc3e', created_at=1757343388.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-4o-mini-2024-07-18', object='response', output=[ResponseOutputMessage(id='msg_68beee9d14b08197b51275e7cef55b9609be5d29cc5dfc3e', content=[ResponseOutputText(annotations=[], text="That's great to hear! What are you currently studying or working on in computer science? Do you have any specific topics or projects you’d like to discuss?", type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort=None, generate_summary=None, summary=None), safety_identifier=None, service_tier='default', status='completed', text=ResponseTextConfig(format

"That's great to hear! What are you currently studying or working on in computer science? Do you have any specific topics or projects you’d like to discuss?"

#### Introducing ChatCompletions (What we're using)

In [None]:
# ChatCompletions gives direct control over the chain of messages and has a simpler response shape

# Message object roles: "system", "user", "assistant"
messages = []
messages.append({"role": "system", "content": "You are a helpful chat assistant"})

In [None]:
# Adding a user query
query = "Hi I'm a computer science student!"
messages.append({"role":"user", "content": query})

# How to call chat completions
response = client.chat.completions.create(
    model="gpt-4o-mini", # Try using restricted model to show restrictions
    messages=messages
)
response
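Because the `messages` list is the whole conversation, a follow-up turn is just two more appends: the assistant's reply, then the next user query. The sketch below shows that bookkeeping with a stand-in model so it runs offline; `add_turn` and `fake_chat_model` are hypothetical names, and `fake_chat_model` is not an OpenAI API.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def add_turn(messages: list, role: str, content: str) -> list:
    """Hypothetical helper: append one message dict, rejecting unknown roles."""
    if role not in ALLOWED_ROLES:
        raise ValueError(f"Unknown role: {role}")
    messages.append({"role": role, "content": content})
    return messages


def fake_chat_model(messages: list) -> str:
    """Stand-in for the real API call: reports how many user turns it received."""
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I have seen {len(user_turns)} user message(s)."


history = []
add_turn(history, "system", "You are a helpful chat assistant")
for query in ["Hi I'm a computer science student!", "What did I just tell you?"]:
    add_turn(history, "user", query)
    reply = fake_chat_model(history)      # the model sees the full history each time
    add_turn(history, "assistant", reply)  # store the reply so the next turn has context

print([m["role"] for m in history])
```

With the real client, the reply text would come from `response.choices[0].message.content` instead of the stand-in.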

In [46]:
# Inspect ChatCompletion object
print(response)

# ChatCompletion Object Text Extraction
response.choices[0].message.content

ChatCompletion(id='chatcmpl-CDXrn9z2RI0qYZxe4tqy0D3UiQfta', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Hi there! That's great to hear! What specific areas of computer science are you interested in, or do you have any questions or topics you'd like to discuss?", refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1757344483, model='gpt-4o-mini-2024-07-18', object='chat.completion', service_tier='default', system_fingerprint='fp_e665f7564b', usage=CompletionUsage(completion_tokens=32, prompt_tokens=24, total_tokens=56, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))


"Hi there! That's great to hear! What specific areas of computer science are you interested in, or do you have any questions or topics you'd like to discuss?"

### **3. CALLING LANGCHAIN**
- **Langchain OpenAI**: `llm = ChatOpenAI()`
- **Message Types:** `SystemMessage`, `HumanMessage`, `AIMessage`, Message chaining (mention later)
- **Invoking LLM**: `llm.invoke(messages)`
- **Prompting the AI:** Prompt engineering, giving AI instructions!
- **Langchain-OpenAI:** Documentation: <a href="https://python.langchain.com/docs/integrations/chat/openai/">Langchain Docs</a>
- **LangSmith Tracing *(Optional)*** 

In [None]:
# Install and import langchain
%pip install langchain langchain_openai
from langchain_openai import ChatOpenAI # LangChain's wrapper around OpenAI chat models

In [None]:
# Initialise our "llm" (ChatOpenAI); no api_key argument is needed, it reads OPENAI_API_KEY from the environment
"""
What is Temperature?
Low temperature (e.g., 0 or 0.1):
- The model is more deterministic and focused. 
- It will give more predictable, safe, and repetitive answers.
- *What we want for data cleaning agent*
High temperature (e.g., 0.7 or 1):
- The model is more creative and random. 
- It may generate more diverse or unexpected responses.
- Higher temperatures: Creative marketing content, storytelling, chatbots
"""
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")
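To see why low temperature gives more predictable output, here is a minimal sketch of the usual textbook mechanism: token scores (logits) are divided by the temperature before a softmax turns them into probabilities. The logits are made-up numbers, and how a given provider handles `temperature=0` internally is not covered here, so the sketch requires a positive value.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.1)
warm = softmax_with_temperature(logits, 1.0)

print([round(p, 3) for p in cold])
print([round(p, 3) for p in warm])
```

At the lower temperature almost all probability lands on the top-scoring token, which is why the output becomes repeatable; at the higher temperature the other tokens keep a real chance of being sampled.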

In [None]:
from langchain_core.messages import HumanMessage, SystemMessage # Import message classes

# Creating messages with Langchain's message schemas
messages=[]
# System Message (AI prompt/instructions)
messages.append(SystemMessage(content="You are a helpful chat assistant"))
# User Query
query = "Hi I'm a computer science student!"
messages.append(HumanMessage(content=query))

# inspect messages
print(messages)

In [None]:
# Invoking LLM with messages
response = llm.invoke(messages)

In [None]:
# Inspect Response
print(response)

# Extracting Response Text
response.content # Much simpler!!

### **4. RUNNING CODE WITH AI**
- **How to Prompt Engineer:** Prompt the AI to generate data-cleaning code
- **`exec()` Function:** Running the code, assume existing variables

In [None]:
# Install and import pandas, numpy
%pip install pandas numpy

import pandas as pd
import numpy as np

In [95]:
# Import in a sample df (Life Expectancy Data.csv)
df = pd.read_csv("../datasets/Life Expectancy Data.csv")

# Initialise llm
llm = ChatOpenAI(temperature=0, model="gpt-4o-mini")

In [None]:
messages=[]

# Create a datacleaning agent prompt (Explain each rule each line)
prompt = """ You are a data cleaning agent

Help generate pandas code to deal with user data cleaning queries.
Rules:
- Return only executable Python code, no explanations, no markdown blocks
- Assume dataframe is stored in variable 'df', modify it INPLACE
- In order for user to see your message you have to print your commands/messages
"""
messages.append(SystemMessage(content=prompt))

# Give the AI context (dataset info)
dataset_info = f"Dataset Info: {df.shape} Sample Data: {df.head(5)}"
messages.append(SystemMessage(content=dataset_info))

In [100]:
# User query
"""
Sample Queries:
- how many missing values in dataset? (Easy, minimal prompt engineering)
- impute missing values in dataset? (May error, modifying df, introduce rules)
"""

query="how many missing values in dataset?"
messages.append(HumanMessage(content=query))

In [None]:
# Call LLM and save output to response
response = llm.invoke(messages)

# Save output content as generated_code
generated_code = response.content
print(generated_code)
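Even with a "no markdown blocks" rule in the prompt, a model may still wrap its answer in a code fence, which would make `exec` fail with a `SyntaxError`. A defensive cleanup step might look like the sketch below; `strip_code_fences` is a hypothetical helper, not part of LangChain or OpenAI.

```python
FENCE = "`" * 3  # built this way so the sketch itself contains no literal fence


def strip_code_fences(text: str) -> str:
    """Hypothetical helper: drop a leading and trailing markdown fence line if present."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]  # removes the opening fence, with or without a language tag
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]  # removes the closing fence
    return "\n".join(lines)


raw = FENCE + "python\nprint(df.isna().sum().sum())\n" + FENCE
print(strip_code_fences(raw))
```

Text that has no fences passes through unchanged, so the cleanup is safe to apply to every response.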

In [None]:
# Run generated code (no safety net: any error stops the cell)
# exec(generated_code)

# Run with exception handling so an error does not stop the program
original_df = df.copy() # keep a copy so df can be restored
try:
    exec(generated_code)
except Exception as e:
    print(f"Error: {e}")
    print(f"Generated Code: {generated_code}")
    df = original_df # revert df; inside a function we would return original_df instead
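The same try/except idea can be packaged as a function that returns either the modified data or the untouched original. This is a sketch under assumptions: `run_generated_code` is a hypothetical name, and a plain dict stands in for the DataFrame so the example runs without pandas. With a real DataFrame, `df.copy()` would replace `copy.deepcopy`.

```python
import copy


def run_generated_code(code: str, data):
    """Run `code` with a copy of `data` bound to the name df.

    Returns (result, True) on success, or (original data, False) on any error.
    """
    working = copy.deepcopy(data)  # the generated code only ever touches the copy
    namespace = {"df": working}
    try:
        exec(code, namespace)
    except Exception as err:
        print(f"Error: {err}")
        print(f"Generated Code: {code}")
        return data, False
    return namespace["df"], True


data = {"age": [30, None, 50]}
good_code = "df['age'] = [a if a is not None else 0 for a in df['age']]"
bad_code = "df['age'].append(1); raise ValueError('boom')"

cleaned, ok = run_generated_code(good_code, data)
unchanged, failed_ok = run_generated_code(bad_code, data)
print(cleaned, ok)
print(unchanged, failed_ok)
```

Since the code runs against a copy, a failure halfway through leaves the original data exactly as it was.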

### **5. CREATING HELPER FUNCTIONS**
- **Running Helper Functions:** Creating "helper docs"
- **Importing Functions:** How to import functions from other notebooks using `import-ipynb`


In [116]:
# Import in a sample df (Life Expectancy Data.csv)
df = pd.read_csv("../datasets/Life Expectancy Data.csv")

# Creating a helper function to assist AI
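def impute_with_mean(series):
    # Return a new series rather than using inplace=True, which may not
    # update the parent DataFrame when called on a column selection
    return series.fillna(series.mean())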

In [110]:
messages=[]

# Copy Paste previous prompt
prompt = """ You are a data cleaning agent

Help generate pandas code to deal with user data cleaning queries.
Rules:
- Return only executable Python code, no explanations, no markdown blocks
- Assume dataframe is stored in variable 'df', modify it INPLACE
- In order for user to see your message you have to print your commands/messages
"""
messages.append(SystemMessage(content=prompt))

# Give the AI dataset info
dataset_info = f"Dataset Info: {df.shape} Sample Data: {df.head(5)}"
messages.append(SystemMessage(content=dataset_info))

In [111]:
# Helper Docs (introduces AI to your functions)
helper_docs = """ HELPER FUNCTIONS AVAILABLE:
impute_with_mean(series)
- Takes a pandas series and imputes with mean
- Returns imputed series
""" 
messages.append(SystemMessage(content=helper_docs))

# Sample query to test out function
query="impute numeric columns with mean"
messages.append(HumanMessage(content=query))
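Hand-written helper docs can drift out of date as functions change. One possible alternative is to build the same text from each function's signature and docstring with the standard `inspect` module. In this sketch `build_helper_docs` is a hypothetical name, and the `impute_with_mean` shown is a list-based stand-in so the example runs without pandas.

```python
import inspect


def impute_with_mean(series):
    """Takes a sequence of numbers and imputes None with the mean
    Returns imputed series"""
    known = [v for v in series if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in series]


def build_helper_docs(functions) -> str:
    """Hypothetical helper: describe each function as 'name(args)' plus docstring bullets."""
    lines = ["HELPER FUNCTIONS AVAILABLE:"]
    for fn in functions:
        lines.append(f"{fn.__name__}{inspect.signature(fn)}")
        doc = inspect.getdoc(fn) or "No description."
        for doc_line in doc.splitlines():
            lines.append(f"- {doc_line}")
    return "\n".join(lines)


generated_docs = build_helper_docs([impute_with_mean])
print(generated_docs)
```

The resulting string can be passed to a `SystemMessage` in the same way as the hand-written `helper_docs`.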

In [None]:
# Call LLM and save output to response
response = llm.invoke(messages)

# Save output content as generated_code
generated_code = response.content
print(generated_code)

In [None]:
# Try running the generated code with helper docs
try:
    original_df = df.copy()
    exec(generated_code)
except Exception as e:
    print(f"Error: {e}")
    print(f"Generated Code: {generated_code}")
    df = original_df

# Check remaining missing values per column:
print(df.isna().sum())
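Generated code can only call a helper if that name is visible when `exec` runs. In a notebook, a bare `exec(code)` sees the notebook's global names; passing an explicit namespace makes the dependency visible and also works inside a function or a `.py` module. The sketch below uses a dict in place of the DataFrame and a list-based stand-in for `impute_with_mean`; `run_with_helpers` is a hypothetical name.

```python
def impute_with_mean(values):
    """List-based stand-in: replace None with the mean of the known values."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def run_with_helpers(code: str, data, helpers):
    """Hypothetical helper: exec `code` with df and the helper functions in scope."""
    namespace = {"df": data}
    namespace.update({fn.__name__: fn for fn in helpers})  # expose helpers by name
    exec(code, namespace)
    return namespace["df"]


sample_code = "df['age'] = impute_with_mean(df['age'])"
result = run_with_helpers(sample_code, {"age": [20.0, None, 40.0]}, [impute_with_mean])
print(result)

# Without the helper in the namespace, the same code fails with NameError
try:
    run_with_helpers(sample_code, {"age": [20.0, None, 40.0]}, [])
except NameError as err:
    print(f"Error: {err}")
```

This version modifies the dict it is given; combine it with a copy-and-restore step if rollback on failure is needed.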

### **OTHER POINTS**
- Message Chaining
- Langchain Agents with Tool Calling
- `.py` vs `.ipynb`