# Generate custom data using Azure OpenAI's ChatGPT
> In this session, I will demonstrate how Azure OpenAI's ChatGPT can generate custom data. Creating datasets tailored to a specific topic is useful both for your own learning and for hands-on exercises when teaching a class.

> Before we begin, a few setup notes. To follow along, update the **config** file before executing the notebook. In the **config file**, set the Azure OpenAI parameters **azure_oai_endpoint**, **azure_oai_key**, **azure_oai_model** (the name of your model deployment), **azure_openai_api_type**, and **azure_openai_api_version**. Also change the path in **pd.read_csv("F:\demos\mttcommunity\config.csv")** to the location of your own config file. These parameters are required to connect to the Azure OpenAI service.

### How do the ChatGPT API parameters work?
The examples below call openai.ChatCompletion.create() with several of these parameters. Here's what each one does:
- The **engine** parameter specifies which model to use. In Azure OpenAI, this is the name of the model deployment you created in your resource (read here from **azure_oai_model**), not a raw model name.
- The **messages** parameter is the conversation to respond to: a list of dictionaries, each with a **role** (system, user, or assistant) and **content**. The older Completion API takes a single **prompt** string instead.
- The **max_tokens** parameter sets the maximum number of tokens the model may generate. Tokens are sub-word units, so this is not a word count, and a reply that reaches the limit is cut off.
- The **temperature** parameter controls the level of randomness in the generated text; 0 gives the most deterministic output.
- The **stop** parameter specifies one or more strings that end the generated text when the model produces them.
- To generate multiple responses, set **n** to the number of responses you want returned.
- **strip()** is not an API parameter but a Python string method; applied to the returned text, it removes leading and trailing whitespace.
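To make the parameter list concrete, here is a minimal sketch, assuming the legacy (pre-1.0) openai SDK used in this notebook. `build_chat_request()` and `reply_texts()` are hypothetical helper names: the first gathers the keyword arguments for openai.ChatCompletion.create(), and the second applies strip() to every returned choice. A plain dictionary stands in for a real response in the test, since the legacy response object can also be indexed like a dictionary.

```python
def build_chat_request(deployment, system_context, user_prompt,
                       temperature=0, max_tokens=700, stop=None, n=1):
    """Collect keyword arguments for the legacy openai.ChatCompletion.create() (hypothetical helper)."""
    request = {
        "engine": deployment,  # Azure OpenAI deployment name
        "messages": [
            {"role": "system", "content": system_context},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,  # 0 = most deterministic
        "max_tokens": max_tokens,    # upper bound on generated tokens
        "n": n,                      # number of alternative replies
    }
    if stop is not None:
        request["stop"] = stop  # string or list of strings that end generation
    return request


def reply_texts(response):
    """Return the stripped text of every choice in a chat completion response."""
    return [choice["message"]["content"].strip() for choice in response["choices"]]
```

With the settings from the config file, a call might look like `response = openai.ChatCompletion.create(**build_chat_request(azure_oai_model, system_context, user_prompt, n=2))`, followed by `reply_texts(response)` to get both replies as clean strings.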

### Here, we are using the Python pandas library to read the config file. 

In [1]:
import pandas as pd

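# Raw string so the backslashes in the Windows path are not read as escape sequences;
# update this path to point at your own config file
az_openai_conf = pd.read_csv(r"F:\demos\mttcommunity\config.csv")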
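The notebook expects config.csv to hold a header row with the parameter names and a single row of values, which is why the next cell reads each setting with `[0]`. Here is a sketch of that layout. It reads from an in-memory string instead of a file, and every value (YOUR-RESOURCE, YOUR-KEY, and so on) is a placeholder, not a real setting:

```python
import io

import pandas as pd

# Hypothetical config.csv contents: a header row with the names the notebook reads,
# followed by one row of placeholder values.
sample_config = io.StringIO(
    "azure_oai_endpoint,azure_oai_key,azure_oai_model,"
    "azure_openai_api_type,azure_openai_api_version\n"
    "https://YOUR-RESOURCE.openai.azure.com/,YOUR-KEY,YOUR-DEPLOYMENT,"
    "azure,YOUR-API-VERSION\n"
)

az_openai_conf = pd.read_csv(sample_config)

# Fail early if a column the notebook relies on is missing.
required = {
    "azure_oai_endpoint", "azure_oai_key", "azure_oai_model",
    "azure_openai_api_type", "azure_openai_api_version",
}
missing = required - set(az_openai_conf.columns)
if missing:
    raise ValueError(f"config.csv is missing columns: {sorted(missing)}")

print(az_openai_conf.azure_oai_model[0])  # YOUR-DEPLOYMENT
```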

### Next, we assign the parameter values from the config file to variables, which the rest of the notebook uses to connect to Azure OpenAI.

In [2]:
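# Add OpenAI import. This notebook uses the legacy (pre-1.0) openai SDK
# (pip install "openai<1.0"); module-level calls such as
# openai.ChatCompletion.create() are not supported in openai>=1.0.
import openai

# Set OpenAI configuration settings from the first row of the config file
openai.api_type = az_openai_conf.azure_openai_api_type[0]
openai.api_base = az_openai_conf.azure_oai_endpoint[0]
openai.api_version = az_openai_conf.azure_openai_api_version[0]
openai.api_key = az_openai_conf.azure_oai_key[0]

# Deployment name and generation settings used in the calls below
azure_oai_model = az_openai_conf.azure_oai_model[0]
azure_openai_api_temperature = 0   # most deterministic output
azure_openai_api_max_tokens = 700  # upper bound on generated tokens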

### This step is optional: it only lists the models available through your Azure OpenAI resource.

In [None]:
models = openai.Model.list()
print(models)

### Next, we define the context with a system message. The system message assigns the model a role, which sets the perspective it takes in the conversation.

In [3]:
system_context = "You are a Data Engineer with expertise in Python. Your task is to create a sample dataset and use Python to load and analyze it."

### Next is the user prompt: the request or question that we want the model to respond to.

In [4]:
user_prompt = """
### Provide 10 rows of sales history data for bicycle accessories with details in the example format

# This data file should contain sales history data for at least 10 bicycle accessories.

Take this into consideration:
order quantity (25-70) and prices (50-150 purchase, 125-400 selling). 

Example:
ProductId,ProductName,ProductType,Color,OrderQuantity,Size,Category,Country,Date,PurchasePrice,SellingPrice
001,Bike Lock,Accessory,Red,50,Large,Hydration,France,2023-01-01,59.78,128.98

# Provide results in a tabular format.

# Don't provide any additional information.
"""

### In this code snippet, we use openai.ChatCompletion.create() to generate a response from Azure OpenAI ChatGPT. We pass several parameters to guide the generation:
- **System and user roles (messages):** The system message sets the model's role, and the user message carries the request; together they form the conversation the model responds to.

- **Azure OpenAI model (engine):** The name of the Azure OpenAI deployment that serves the request, which determines the underlying pre-trained model.

- **Temperature:** The temperature parameter controls the randomness of the generated response. Higher values like 0.8 make the response more diverse and creative, while lower values like 0.2 make it more focused and deterministic.

- **Max tokens:** This parameter sets the maximum number of tokens the model may generate. Tokens are sub-word units, not words, and a reply that reaches the limit is cut off.

- **Prompts:** The user prompt defined earlier, which provides the request and context for the response.

By passing these parameters to ChatCompletion.create(), we let the Azure OpenAI ChatGPT model generate a response based on the given prompts and settings.
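Because **max_tokens** caps the length of the reply, a long table can stop mid-row; the output below ends at `009 | Bi`, which is consistent with the 700-token limit being reached. In the legacy SDK each choice carries a `finish_reason`, which is `"length"` when the limit cut the reply short and `"stop"` when the model finished on its own. A minimal sketch with a hypothetical `reply_or_warn()` helper:

```python
import warnings


def reply_or_warn(response, max_tokens):
    """Return the first reply's text, warning if max_tokens cut it short (hypothetical helper)."""
    choice = response["choices"][0]
    if choice["finish_reason"] == "length":
        warnings.warn(
            f"The reply stopped at the max_tokens limit ({max_tokens}); "
            "raise max_tokens or ask for fewer rows."
        )
    return choice["message"]["content"]
```

After the call below, `table_text = reply_or_warn(response, azure_openai_api_max_tokens)` would return the same text that is printed, plus a warning when the reply was truncated.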

In [5]:
# Build the messages array
prompt = [
    {"role": "system", "content": system_context},
    {"role": "user", "content": user_prompt}
]


# Call the Azure OpenAI model
response = openai.ChatCompletion.create(
    engine=azure_oai_model,
    temperature=azure_openai_api_temperature,
    max_tokens=azure_openai_api_max_tokens,
    messages=prompt
)

print('\n' + response.choices[0].message.content + '\n')



ProductId | ProductName | ProductType | Color | OrderQuantity | Size | Category | Country | Date | PurchasePrice | SellingPrice
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
001 | Bike Lock | Accessory | Red | 50 | Large | Hydration | France | 2023-01-01 | 59.78 | 128.98
002 | Bike Bell | Accessory | Blue | 30 | Medium | Safety | Germany | 2023-01-02 | 72.45 | 175.60
003 | Bike Light | Accessory | Black | 45 | Small | Lighting | USA | 2023-01-03 | 89.99 | 225.00
004 | Bike Pump | Accessory | Silver | 60 | Large | Inflation | Canada | 2023-01-04 | 120.00 | 350.00
005 | Bike Helmet | Accessory | Yellow | 25 | Medium | Safety | Australia | 2023-01-05 | 99.99 | 199.99
006 | Bike Basket | Accessory | Brown | 70 | Large | Storage | UK | 2023-01-06 | 55.00 | 150.00
007 | Bike Gloves | Accessory | Green | 40 | Small | Apparel | France | 2023-01-07 | 45.99 | 99.99
008 | Bike Water Bottle | Accessory | Pink | 55 | Medium | Hydration | USA | 2023-01-08 | 65.00 | 149.99
009 | Bi
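The reply is a pipe-delimited table rather than CSV, so it needs parsing before pandas can use it. Here is a sketch, assuming the layout above (a header row, a `---` separator row, then data rows). `markdown_table_to_df()` is a hypothetical helper that drops rows with the wrong number of cells, such as the truncated last row:

```python
import pandas as pd


def markdown_table_to_df(text, keep_as_text=("ProductId",)):
    """Parse a pipe-delimited table into a DataFrame (hypothetical helper).

    Rows whose cell count differs from the header, such as a row cut off
    by max_tokens, are dropped.
    """
    lines = [line.strip() for line in text.strip().splitlines() if "|" in line]
    if not lines:
        return pd.DataFrame()
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if all(set(cell) <= set("-: ") for cell in cells):
            continue  # the --- | --- separator row
        if len(cells) == len(header):
            rows.append(cells)
    df = pd.DataFrame(rows, columns=header)
    # Convert columns that are entirely numeric; keep IDs like "001" as text
    for col in df.columns:
        if col in keep_as_text:
            continue
        converted = pd.to_numeric(df[col], errors="coerce")
        if converted.notna().all():
            df[col] = converted
    return df
```

Applied to the reply above, `markdown_table_to_df(response.choices[0].message.content)` would keep the eight complete rows and discard `009 | Bi`.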

In this code snippet, we modify the prompt so that, instead of returning the data itself, the model writes a Python script that generates the Bicycle Accessories dataset. The prompt includes an example row that shows the structure and format of the data.

Asking for a script rather than the rows themselves sidesteps the **max_tokens** limit: the model returns a short program, and running that program creates all 1,000 rows in the example format and saves them to the specified location for further use.

In [6]:
user_prompt = """
Write a Python pandas script that generates a sales history csv data file for bicycle accessories with details in the example format

The file should contain 1000 rows of data in the {example} format, representing sales over a period of 5 years.

Consideration: 
1. order quantity (25-70) and prices (50-150 purchase, 125-400 selling). Provide results in a tabular format.
2. save the csv file to F:\demos\mttcommunity\datafile\sales_history.csv

example:
ProductId,ProductName,ProductType,Color,OrderQuantity,Size,Category,Country,Date,PurchasePrice,SellingPrice
001,Bike Lock,Accessory,Red,50,Large,Hydration,France,2023-01-01,59.78,128.98

# Provide results in a tabular format.

# please do not provide any feedback before and after when providing the python script.
"""

In this code snippet, we define a Python function that sends the prompt, removes the Markdown code fences from the reply, and executes the returned script, which generates the dataset and saves it as a CSV file. Wrapping these steps in a function makes them easy to rerun with a different prompt or settings, and the script is also kept in the global variable **generate_sales_history_data** so it can be inspected afterwards.

Note that **exec()** runs whatever code the model returns, including file writes, so review the generated script before running this outside a demo.
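The function in the next cell strips code fences with `str.replace()`, which leaves behind any explanation the model adds before or after the script; that text would make `exec()` fail with a SyntaxError. A more defensive sketch, using a hypothetical `extract_python_code()` helper, keeps only the body of the first fenced block and falls back to the whole reply when there is no fence:

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built at runtime so it does not end this block


def extract_python_code(reply):
    """Return the code inside the first fenced block of a model reply (hypothetical helper).

    Falls back to the whole reply when no fence is present, which matches
    the prompt's request for a bare script.
    """
    pattern = re.escape(FENCE) + r"(?:python|py)?[ \t]*\n(.*?)" + re.escape(FENCE)
    match = re.search(pattern, reply, flags=re.DOTALL)
    return (match.group(1) if match else reply).strip()
```

Inside the function, `generate_sales_history_data = extract_python_code(response.choices[0].message.content)` could replace the chained `replace()` calls.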

In [7]:
# Build the messages array
prompt =[
    {"role": "system", "content": system_context},
    {"role": "user", "content": user_prompt}
]

# define a function to generate the sales data
def generate_bicycle_sales_data(
     azure_oai_model
    ,azure_openai_api_temperature
    ,azure_openai_api_max_tokens
    ,prompt 
):
  try:
    response = openai.ChatCompletion.create(
        engine=azure_oai_model,
        temperature=azure_openai_api_temperature,
        max_tokens=azure_openai_api_max_tokens,
        messages=prompt
    )
    # Create a global variable to store the response
    global generate_sales_history_data
    # Store the response in the global variable
    generate_sales_history_data = (response.choices[0].message.content).replace("```python","").replace("```","").strip()
    # execute the generated python script
    exec(generate_sales_history_data)
  except Exception as ex:
    print(ex)
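As we understand Python's `exec()` scoping, calling it inside a function with only the code argument puts the script's top-level names into a separate locals mapping, so functions defined by the script (and list comprehensions, on Python versions before 3.12) cannot see them. Here is a sketch of a hypothetical `run_generated_script()` helper that runs the code with a single dictionary as its namespace and returns it, so you can also pick up variables the script creates. This does not sandbox the code.

```python
def run_generated_script(code):
    """Execute generated code in its own namespace and return that namespace (hypothetical helper).

    Using one dict for globals keeps top-level names visible to functions and
    comprehensions in the script. It does NOT sandbox the code: the script can
    still read and write files, so review it first.
    """
    namespace = {}
    exec(code, namespace)
    return namespace
```

For example, `ns = run_generated_script(generate_sales_history_data)` followed by `ns.get("df")` would fetch the DataFrame, assuming the generated script names it `df`; check the printed script for the actual name.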

Next, we call the function defined above with the model settings and the prompt. This sends the request to Azure OpenAI and runs the returned script, which saves the CSV file; in this run, the script also printed a preview of the first rows.

In [8]:
generate_bicycle_sales_data(azure_oai_model, azure_openai_api_temperature, azure_openai_api_max_tokens, prompt)

  ProductId  ProductName ProductType   Color  OrderQuantity    Size  \
0       002    Bike Pump   Accessory   Black             47   Large   
1       004    Bike Bell   Accessory    Blue             65   Small   
2       001    Bike Lock   Accessory    Blue             25   Large   
3       003   Bike Light   Accessory  Yellow             67  Medium   
4       005  Bike Helmet      Safety   Black             56   Large   

    Category Country       Date  PurchasePrice  SellingPrice  
0      Tools   Spain 2026-09-08         105.57        204.07  
1      Bells   Italy 2023-10-27          50.63        159.32  
2  Hydration      UK 2026-02-12          88.79        297.57  
3     Lights   Spain 2026-11-14          83.38        392.76  
4    Helmets      UK 2027-12-28          84.61        133.83  
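The prompt asked for order quantities of 25-70, purchase prices of 50-150, and selling prices of 125-400, but the model is not guaranteed to honor these constraints, so it is worth checking the saved file. Here is a sketch with a hypothetical `check_sales_constraints()` helper, assuming the column names from the example format. It also reports the date range, which here reaches 2027 because the generated script counts five years forward from 2023.

```python
import pandas as pd


def check_sales_constraints(df, qty=(25, 70), purchase=(50, 150), selling=(125, 400)):
    """Check the prompt's numeric ranges on every row (hypothetical helper)."""
    dates = pd.to_datetime(df["Date"])
    return {
        "order_quantity_in_range": bool(df["OrderQuantity"].between(*qty).all()),
        "purchase_price_in_range": bool(df["PurchasePrice"].between(*purchase).all()),
        "selling_price_in_range": bool(df["SellingPrice"].between(*selling).all()),
        "selling_above_purchase": bool((df["SellingPrice"] > df["PurchasePrice"]).all()),
        "first_date": dates.min().date().isoformat(),
        "last_date": dates.max().date().isoformat(),
    }
```

To check the full file, load it with `pd.read_csv(r"F:\demos\mttcommunity\datafile\sales_history.csv", dtype={"ProductId": str})` and pass the result to `check_sales_constraints()`.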


This step is optional: printing **generate_sales_history_data** reveals the script that ChatGPT generated and that ran behind the scenes.

In [9]:
print(generate_sales_history_data)

import pandas as pd
import numpy as np
import random
from datetime import datetime, timedelta

# Define product details
product_ids = ['001', '002', '003', '004', '005']
product_names = ['Bike Lock', 'Bike Pump', 'Bike Light', 'Bike Bell', 'Bike Helmet']
product_types = ['Accessory', 'Accessory', 'Accessory', 'Accessory', 'Safety']
colors = ['Red', 'Blue', 'Green', 'Yellow', 'Black']
sizes = ['Small', 'Medium', 'Large']
categories = ['Hydration', 'Tools', 'Lights', 'Bells', 'Helmets']
countries = ['France', 'Germany', 'Spain', 'Italy', 'UK']

# Define date range
start_date = datetime(2023, 1, 1)
end_date = datetime(2027, 12, 31)

# Generate random sales data
data = []
for i in range(1000):
    product_id = random.choice(product_ids)
    product_name = product_names[product_ids.index(product_id)]
    product_type = product_types[product_ids.index(product_id)]
    color = random.choice(colors)
    order_quantity = random.randint(25, 70)
    size = random.choice(sizes)
    category = cate