# **GPT Demonstration**

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/sekhansen/text_algorithms_econ/blob/main/notebooks/7_gpt_demonstration.ipynb)


This notebook introduces how to use [OpenAI's API](https://openai.com/blog/openai-api) to interact with any GPT model. As an illustrative example, we will show how to use ChatGPT to classify a fragment of text from a job posting according to the extent to which it allows remote work.

In [8]:
%%capture
# the code below uses the pre-1.0 openai interface (openai.ChatCompletion), so pin the version
!pip3 install transformers "openai<1"

## Setup

In [10]:
import openai
import pandas as pd
import numpy as np
from transformers import GPT2TokenizerFast
import time

# add your OpenAI key
openai.api_key = "[YOUR-OPENAI-KEY]"

In [12]:
## list all available models
#openai.Model.list()

## Prompt generation

To obtain adequate results from the models, it's important to carefully craft prompts that define the task that we want the model to perform. In this case, we will create a prompt that gives a broad context related to the text, defines our four categories of interest, and specifies a format for the generated text.

In [19]:
def prompt_generator(text):
    wfh_prompt = f"""
    You are a data expert working for the Bureau of Labor Statistics specialized in analyzing job postings. 
    Your task is to read the text of fragments of job postings and classify them into one of four categories based on the degree of remote work they allow. 
    The four categories and their definitions are:
    
    1. No remote work: The text doesn’t offer the possibility of working any day of the week remotely.
    2. Hybrid work: The text offers the possibility of working one or more days per week remotely but not the whole week.
    3. Fully remote: The text offers the possibility of working all days of the week remotely.
    4. Unspecified remote: The text mentions the possibility of working remotely but doesn’t clearly specify the extent of this possibility.
    
    You always need to provide a classification. If classification is unclear, say “Could not classify”.
    Please provide the classification and an explanation for your choice in the following format:
    
    - Classification: [Category Number] . [Category Name]
    - Explanation: [Explanatory text]
    
    Text of job posting: "{text}"
    """

    return wfh_prompt

In [20]:
example_posting = "*This position is eligible for telework and other flexible work arrangements. Employee participation is at the discretion of the supervisor."
prompt = prompt_generator(example_posting)
print(prompt)


    You are a data expert working for the Bureau of Labor Statistics specialized in analyzing job postings. 
    Your task is to read the text of fragments of job postings and classify them into one of four categories based on the degree of remote work they allow. 
    The four categories and their definitions are:
    
    1. No remote work: The text doesn’t offer the possibility of working any day of the week remotely.
    2. Hybrid work: The text offers the possibility of working one or more days per week remotely but not the whole week.
    3. Fully remote: The text offers the possibility of working all days of the week remotely.
    4. Unspecified remote: The text mentions the possibility of working remotely but doesn’t clearly specify the extent of this possibility.
    
    You always need to provide a classification. If classification is unclear, say “Could not classify”.
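    Please provide the classification and an explanation for your choice in the following format:
    
    - Classification: [Category Number] . [Category Name]
    - Explanation: [Explanatory text]
    
    Text of job posting: "*This position is eligible for telework and other flexible work arrangements. Employee participation is at the discretion of the supervisor."
    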

## Generate answers with GPT models

Now that we have a prompt and an example to test it on, we will use [OpenAI's API](https://openai.com/blog/openai-api) to generate a response. Note that ChatGPT is accessed through a different class (`ChatCompletion`) than the other GPT models (`Completion`).
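The practical difference between the two interfaces is the shape of the input: the chat endpoint takes a list of role-tagged messages, while the completion endpoint takes a single prompt string. A minimal sketch of the two payloads, built without calling the API, might look like this. The helper names `chat_payload` and `completion_payload` are hypothetical and not part of the `openai` library.

```python
def chat_payload(system_text, user_text, model="gpt-3.5-turbo"):
    """Keyword arguments in the shape used with openai.ChatCompletion.create."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
    }


def completion_payload(prompt_text, model="text-davinci-003"):
    """Keyword arguments in the shape used with openai.Completion.create."""
    return {"model": model, "prompt": prompt_text}


# Hypothetical example inputs, only to show the two shapes side by side.
chat_kwargs = chat_payload("You are a data expert.", "Classify this posting.")
completion_kwargs = completion_payload("Classify this posting.")
print(list(chat_kwargs))        # ['model', 'messages']
print(list(completion_kwargs))  # ['model', 'prompt']
```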

In [18]:
# process prompt with ChatGPT
response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",          # ChatGPT
                    messages=[{"role": "system", "content": "You are a data expert working for the Bureau of Labor Statistics specialized in analyzing job postings."},
                              {"role": "user", "content": prompt}],
                    n=1,
                    temperature=0.1,                 # Higher values mean the model will take more risks.
                    max_tokens=50,
                    top_p=1,
                    frequency_penalty=0.0,
                    presence_penalty=0.0,
                    )

print(response['choices'][0]['message']['content'].strip())

Classification: 4. Unspecified remote
Explanation: The text mentions the possibility of telework and other flexible work arrangements, but it doesn't clearly specify the extent of this possibility. It only states that employee participation is at the discretion of the supervisor
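
Because the prompt asks for a fixed output format, the answer can be split into a category number, a category name, and an explanation. The following is a minimal sketch using regular expressions; the function name `parse_answer` and the patterns are assumptions for illustration, and the model is not guaranteed to follow the requested format every time (it may, for instance, answer "Could not classify").

```python
import re

# Matches e.g. "Classification: 4. Unspecified remote" or "Classification: 4 . Unspecified remote".
CLASSIFICATION_RE = re.compile(r"Classification:\s*(\d)\s*\.\s*([^\n]+)")
# Captures everything after "Explanation:" up to the end of the answer.
EXPLANATION_RE = re.compile(r"Explanation:\s*(.+)", re.DOTALL)


def parse_answer(answer):
    """Return (category_number, category_name, explanation); None where missing."""
    cls = CLASSIFICATION_RE.search(answer)
    exp = EXPLANATION_RE.search(answer)
    number = int(cls.group(1)) if cls else None
    name = cls.group(2).strip() if cls else None
    explanation = exp.group(1).strip() if exp else None
    return number, name, explanation


# Shortened, made-up answer in the requested format.
sample_answer = (
    "Classification: 4. Unspecified remote\n"
    "Explanation: The text mentions the possibility of telework."
)
print(parse_answer(sample_answer))
```

Answers that do not match the expected format come back as `None` values, which makes them easy to flag for manual review.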


In [15]:
# # process prompt with GPT-3
# response = openai.Completion.create(
#                     model="text-davinci-003",
#                     prompt=prompt,
#                     n=1,
#                     temperature=0,                  # Higher values mean the model will take more risks.
#                     max_tokens=20,
#                     top_p=1,
#                     frequency_penalty=0.0,
#                     presence_penalty=0.0,
#                     stop=["\n"]                     # Up to 4 sequences where the API will stop generating further tokens.
#                     )

# print(response["choices"][0]["text"])
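
To classify many postings rather than a single example, each request can be wrapped in a retry loop so that a temporary failure (for instance a rate limit) does not stop the whole run. The sketch below is one possible approach under stated assumptions: `classify_with_retry` and `fake_classifier` are hypothetical names, and `fake_classifier` is a placeholder standing in for a function that sends the prompt to the API and returns the answer text.

```python
import time


def classify_with_retry(classify_fn, text, max_attempts=3, base_delay=1.0):
    """Call classify_fn(text), retrying with exponential backoff on any exception."""
    for attempt in range(max_attempts):
        try:
            return classify_fn(text)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)


def fake_classifier(text):
    # Placeholder: a real version would build the prompt and call the API.
    return "4. Unspecified remote" if "telework" in text else "1. No remote work"


# Made-up postings used only to exercise the loop.
postings = [
    "This position is eligible for telework.",
    "Work is performed on site at our plant.",
]
labels = [classify_with_retry(fake_classifier, p, base_delay=0.0) for p in postings]
print(labels)
```

Passing the classifier in as an argument keeps the retry logic independent of the API client, so the same loop works for either the chat or the completion interface.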


