# MetaPrompter

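A tool that generates system prompts for DataFrame interactions: it sends a CSV file's column names to a language model and asks it to write a prompt describing the dataset.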

In [1]:
system_prompt = """You are PrompterGPT
You design prompts by inferring what a dataset is about.
You will make the decision based on the first line of a CSV or similar file.
The output must look like the following. It is just an example; the topic will be different:
---
You have a dataset about electric cars registered in Washington state, USA in 2020. It is available as a pandas DataFrame named `electric_cars`.

Each row in the dataset represents the count of the number of cars registered within a city, for a particular model.

The dataset contains the following columns.

- `city` (character): The city in which the registered owner resides.
- `county` (character): The county in which the registered owner resides.
- `model_year` (integer): The [model year](https://en.wikipedia.org/wiki/Model_year#United_States_and_Canada) of the car.
- `make` (character): The manufacturer of the car.
- `model` (character): The model of the car.
- `electric_vehicle_type` (character): Either "Plug-in Hybrid Electric Vehicle (PHEV)" or "Battery Electric Vehicle (BEV)".
- `n_cars` (integer): The count of the number of vehicles registered.
---

Your goal is to provide a description of the dataset and a description of the kind of content you might encounter in each column.
"""

In [2]:
import pandas as pd
from openai import OpenAI
import os

In [3]:
def column_names(csv_name):
    # Create a DataFrame from the CSV file
    db = pd.read_csv(csv_name)
    # Join the column names into a single comma-separated string
    names = db.columns.tolist()
    return ','.join(names)
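
The system prompt says the model decides from the first line of the file, yet `column_names` loads the whole CSV through `pd.read_csv` before reading the header. A minimal sketch of a header-only reader follows, assuming the header really is the first record of the file. `header_line` is a hypothetical helper, not part of the notebook, and the sample column names are made up for illustration.

```python
import csv
import io

def header_line(csv_file):
    """Return the header fields of an open CSV file as one comma-separated string."""
    reader = csv.reader(csv_file)
    # next() consumes only the first record; the default covers an empty file
    first_row = next(reader, [])
    return ",".join(first_row)

# In-memory stand-in for a real file; these column names are invented
sample = io.StringIO("name,sector,last_close\nACME,Tech,12.5\n")
print(header_line(sample))
```

With a file on disk, the same helper could be called as `header_line(f)` inside `with open("FrenchStockMarket.csv", newline="") as f:`, which avoids parsing any data rows.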

In [4]:
def messages(system_prompt, user_prompt):
    # Build the chat payload: a system message followed by a user message
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]

In [5]:
def metaprompt(column_names, system_prompt=system_prompt):
    # Read the API key from the environment rather than hard-coding it
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
    client = OpenAI(api_key=OPENAI_API_KEY)

    # A low temperature makes the generated prompt more repeatable
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-1106",
        messages=messages(system_prompt, column_names),
        temperature=0.1)

    return response.choices[0].message.content

In [6]:
dublin_system_prompt = metaprompt(column_names("FrenchStockMarket.csv"))

In [9]:
dublin_user_prompt = "Suggest some data analysis questions that could be answered with this dataset."

# Use a distinct name so the `messages` helper defined earlier is not overwritten
dublin_messages = [
    {"role":"system" , "content":dublin_system_prompt},
    {"role":"user"   , "content":dublin_user_prompt}
]

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
client = OpenAI(api_key=OPENAI_API_KEY)

dublin_response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=dublin_messages,
    temperature=0.4
)
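
This cell repeats the client setup and request shape already used in `metaprompt`, changing only the prompts and the temperature. One way to avoid the duplication is sketched below: a hypothetical helper, `chat_request`, that builds the keyword arguments once. It only assembles a dictionary and makes no API call; the prompt strings are placeholders.

```python
def chat_request(system_prompt, user_prompt, temperature,
                 model="gpt-3.5-turbo-1106"):
    """Build keyword arguments for a chat completion request."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }

# Placeholder prompts, used only to show the resulting structure
request = chat_request("placeholder system prompt",
                       "placeholder user prompt",
                       temperature=0.4)
print(sorted(request))
```

With a configured client, the dictionary could then be passed as `client.chat.completions.create(**request)`.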

In [10]:
print(dublin_response.choices[0].message.content)

Certainly! Here are some data analysis questions that could be answered using this dataset:

1. What is the average market capitalization for each sector?
2. Which company has the highest 52-week high price?
3. How many companies have a 52-week low price below a certain threshold?
4. What is the distribution of market capitalization across different sectors?
5. Which sector has the highest average last closing price?
6. Is there a correlation between the 52-week high price and market capitalization?
7. How many companies have a last closing price higher than their 52-week high price?
8. What is the range of market capitalization for companies in each sector?
9. How many companies have a last closing price lower than their 52-week low price?
10. What is the percentage change in stock price for each company over the past 52 weeks?

These questions can provide valuable insights into the stock performance, market capitalization, and sector-wise trends within the dataset.
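
The reply mixes an introduction, a numbered list, and a closing sentence. If the questions are to be reused one at a time, for example as follow-up prompts, they can be pulled out of the text. The sketch below assumes each question sits on its own line in the form `<number>. <text>`, as in the output above; `numbered_items` is a hypothetical helper and the sample reply is shortened.

```python
import re

def numbered_items(text):
    """Return the text of every line that starts with '<number>. '."""
    pattern = re.compile(r"^[ \t]*\d+\.[ \t]+(.+?)[ \t]*$", re.MULTILINE)
    return pattern.findall(text)

# Shortened stand-in for the model reply printed above
reply = """Certainly! Here are some questions:

1. What is the average market capitalization for each sector?
2. Which company has the highest 52-week high price?

These questions can provide valuable insights."""

for question in numbered_items(reply):
    print(question)
```

Lines without a leading number, such as the introduction and the closing sentence, are ignored.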
