## Objective: Building an AI Agent Using the OpenAI API
We will build an AI Agent that will:
1. Understand the data context.
2. Accept user queries in natural language.
3. Use OpenAI GPT models to analyze and respond intelligently.

In [1]:
from openai import OpenAI

client = OpenAI(api_key='YOUR_OPENAI_API_KEY')
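
Hard-coding the key in a notebook makes it easy to leak when the file is shared. A minimal sketch of reading it from the environment instead: the helper name `load_api_key` is hypothetical, while `OPENAI_API_KEY` is the variable the `openai` client also checks by default when `api_key` is omitted.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Hypothetical helper: it fails early with a clear message instead of
    failing later on the first API call.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before starting the notebook."
        )
    return key


# Usage (requires the openai package and a real key):
# client = OpenAI(api_key=load_api_key())
```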

## Importing the Data
We import a dataset on medical insurance.   
We will use this data to build an AI Agent that can answer questions about insurance charges.

In [2]:
import pandas as pd

df = pd.read_csv(r"C:\Users\NEELAM SUSHMITHA\OneDrive\Documents\RAMANASOFT INTERNSHIP\medical_insurance.csv")

Now, we will create a function to summarize the data:   
This summary gives the agent the structure of the data (row count, column names, and column types) without putting the entire dataset into the prompt, which could exceed the model's context limit.

In [3]:
def create_data_summary(df):
    summary = f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.\n"
    summary += "Columns:\n"
    for col in df.columns:
        summary += f"- {col} (type: {df[col].dtype})\n"
    return summary
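
The summary above lists only column names and types, so the model has no actual values to reason about. One possible extension, sketched here under the assumption that a few aggregate statistics still fit comfortably in the prompt, appends `describe()` output for numeric columns and the most frequent values of the other columns. The function name `create_rich_summary` and the `top_n` parameter are illustrative and not part of the original notebook.

```python
import pandas as pd


def create_rich_summary(df: pd.DataFrame, top_n: int = 3) -> str:
    """Describe the structure of df plus a few aggregate statistics."""
    lines = [
        f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.",
        "Columns:",
    ]
    for col in df.columns:
        lines.append(f"- {col} (type: {df[col].dtype})")

    # Count, mean, std, min, quartiles, and max for every numeric column
    numeric = df.select_dtypes(include="number")
    if not numeric.empty:
        lines.append("Numeric column statistics:")
        lines.append(numeric.describe().round(2).to_string())

    # The top_n most frequent values for every non-numeric column
    for col in df.select_dtypes(exclude="number").columns:
        top = df[col].value_counts().head(top_n)
        counts = ", ".join(f"{value} ({count})" for value, count in top.items())
        lines.append(f"Most frequent values in {col}: {counts}")

    return "\n".join(lines) + "\n"
```

Passing this string to the prompt in place of the shorter summary would let the model cite real ranges and averages, at the cost of a longer prompt.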

## Building the AI Agent Function
We define a function `ai_agent` that takes a user query and the dataset, summarizes the dataset structure, and builds a prompt combining that context with the question.   
The prompt is then sent to OpenAI's GPT-4o model using the `client.chat.completions.create()` method, and the model's step-by-step, natural-language response is returned to the user.   
Note that the model receives only the summary, not the rows themselves, so its answers describe how to analyze the data rather than report computed results.

In [4]:
def ai_agent(user_query, df):
    data_context = create_data_summary(df)

    prompt = f"""
You are a data expert AI agent.

You have been provided with this dataset summary:
{data_context}

Now, based on the user's question:
'{user_query}'

Think step-by-step. Assume you can access and analyze the dataset like a Data Scientist would using Pandas.

Give a clear, final answer.
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.2,
        max_tokens=500
    )

    answer = response.choices[0].message.content
    return answer
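
With `max_tokens=500`, longer answers are cut off, which is why both responses in the transcript further down stop mid-sentence. The Chat Completions API reports this through `finish_reason`, which is `"length"` when the token limit was hit. A minimal sketch of surfacing that to the user follows; the helper name `extract_answer` and the wording of the notice are illustrative.

```python
from types import SimpleNamespace


def extract_answer(response) -> str:
    """Return the first choice's text, flagging answers cut off by max_tokens."""
    choice = response.choices[0]
    answer = choice.message.content or ""
    if choice.finish_reason == "length":
        answer += "\n[Answer truncated: raise max_tokens or ask a narrower question.]"
    return answer


# Stand-in object with the same attributes as a real response, for a dry run
fake = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(content="This can be done usi"),
            finish_reason="length",
        )
    ]
)
print(extract_answer(fake))
```

Inside `ai_agent`, `return extract_answer(response)` could replace the last two lines.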

## Creating the Interactive Loop
We create a simple interactive loop that repeatedly prompts the user for a question.   
Each query is passed to the `ai_agent` function, which returns a natural-language answer based on the dataset summary.   
If the user types "exit", the loop breaks and the program ends.

In [5]:
print("Welcome to the Insurance Review AI Agent!")
print("You can ask anything about the medical insurance data.")
print("Type 'exit' to quit.")

while True:
    user_input = input("\nYour question: ")
    # Ignore surrounding whitespace and letter case when checking for "exit"
    if user_input.strip().lower() == "exit":
        break
    response = ai_agent(user_input, df)
    print("\nAI Agent Response:")
    print(response)

Welcome to the Insurance Review AI Agent!
You can ask anything about the medical insurance data.
Type 'exit' to quit.



Your question:  Which feature has the strongest impact on the insurance charges?



AI Agent Response:
To determine which feature has the strongest impact on the insurance charges (expenses), we can follow these steps:

1. **Understand the Dataset**: We have a dataset with 1338 rows and 8 columns. The column of interest for this analysis is `expenses`, which represents the insurance charges.

2. **Identify Potential Features**: The features available in the dataset that might impact `expenses` are:
   - `age` (int64)
   - `gender` (object)
   - `bmi` (float64)
   - `children` (int64)
   - `discount_eligibility` (object)
   - `region` (object)
   - `premium` (float64)

3. **Data Preprocessing**: Before analysis, ensure that categorical variables (`gender`, `discount_eligibility`, `region`) are encoded appropriately, such as using one-hot encoding, to be used in correlation or regression analysis.

4. **Correlation Analysis**: Calculate the correlation coefficients between `expenses` and the numerical features (`age`, `bmi`, `children`, `premium`). This can be done usi


Your question:  Can you profile a typical low-cost and high-cost insurance applicant?



AI Agent Response:
To profile a typical low-cost and high-cost insurance applicant, we need to analyze the dataset by examining the characteristics associated with lower and higher insurance expenses. Here's a step-by-step approach to achieve this:

1. **Data Segmentation**: 
   - Divide the dataset into two segments: low-cost and high-cost applicants. This can be done by determining a threshold for expenses. For simplicity, we can use the median or a specific percentile (e.g., 25th percentile for low-cost and 75th percentile for high-cost) to segment the data.

2. **Descriptive Statistics**:
   - Calculate descriptive statistics (mean, median, mode) for each segment to understand the central tendency and distribution of each feature within the low-cost and high-cost groups.

3. **Feature Analysis**:
   - Analyze each feature to identify patterns or characteristics that differentiate low-cost from high-cost applicants. This includes:
     - **Age**: Determine if there is a significant


Your question:  exit
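
Because the agent receives only column names and types, the first response above describes a correlation analysis without being able to run it. The same step can be done directly in pandas. A minimal sketch on a few made-up rows: the values are invented for illustration and are not taken from `medical_insurance.csv`, the column names follow those listed in the response, and `rank_numeric_features` is a hypothetical helper. Pearson correlation captures only linear relationships, and categorical columns such as `region` would need encoding first.

```python
import pandas as pd

# Invented sample rows; replace with the real df loaded earlier
sample = pd.DataFrame({
    "age": [19, 33, 45, 52, 61],
    "bmi": [27.9, 22.7, 30.1, 25.3, 29.0],
    "children": [0, 1, 2, 3, 0],
    "expenses": [1800.0, 4200.0, 8100.0, 10400.0, 13200.0],
})


def rank_numeric_features(df: pd.DataFrame, target: str) -> pd.Series:
    """Rank numeric columns by absolute Pearson correlation with target."""
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Order by strength of the relationship, keeping the sign in the result
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


print(rank_numeric_features(sample, "expenses"))
```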


## Conclusion:
We built an AI Agent using the OpenAI API.    
With the API, we can create agents that take in context about a dataset, reason through a question step by step, and respond to user queries in natural language.