# Prompt Engineering: Use OpenAI to Analyze Twitter Data 
This is a simple tutorial teaching prompt engineering basics and analyzing Twitter data with OpenAI large language models (LLM).
Please purchase an [OpenAI API](https://openai.com/index/openai-api/) and store it in a safe place. This tutorial use [AWS Secretes Manager](https://aws.amazon.com/secrets-manager/) to store the API keys.  

## Large Language Model Basics
LLM repeatable predicts the next world using supervised learning. To predict the following sentence: 

`Learning data science in the cloud with AI`

A model needs to learn to predict the following steps:

|Input|Output|
|:---|---|
|Learning data science |in |
|Learning data science in |the | 
|Learning data science in the |cloud |
|Learning data science in the cloud |with |
|Learning data science in the cloud with |AI|

To train a LLM model:
1. Training a base LLM model on a large amount of training data to predict the next word 
2. Fine-tune on examples where outputs follow instructions in the input 
3. Human rates quality of different LLM outputs 
4. Tune LLM to generate outputs with higher rates using RLHF (Reinforcement learning from human feedback)

## Set up OpenAI Models

Load the API keys with AWS Secrets Manage Function 

In [1]:
import boto3
from botocore.exceptions import ClientError
import json

def get_secret(secret_name):
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        raise e

    secret = get_secret_value_response['SecretString']
    
    return json.loads(secret)

Install openai package

In [2]:
pip install openai

Collecting openai
  Downloading openai-1.52.1-py3-none-any.whl.metadata (24 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Downloading openai-1.52.1-py3-none-any.whl (386 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m386.9/386.9 kB[0m [31m24.0 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading distro-1.9.0-py3-none-any.whl (20 kB)
Downloading jiter-0.6.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (325 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m325.2/325.2 kB[0m [31m35.2 MB/s[0m eta [36m0:00:00[0m
[?25hInstalling collected packages: jiter, distro, openai
Successfully installed distro-1.9.0 jiter-0.6.1 openai-1.52.1
Note: you may need to restart the kernel to use updated packages.


Load the OpenAI API key and define a `openai_help` function.

In [3]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

def openai_help(messages, model=model, temperature =temperature ):
    messages = messages
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature

    )
    return response.choices[0].message.content

Temperature: 
- Low temperature: always choose the most likely response, reliable, predictable responses  
- High temperature: diverse responses, more creative responses

Tokens and Models: 
- LLM predicts tokens, which are commonly occurring sequences of characters. 
- One token is about four characters in English, and 100 tokens are roughly 75 words. Check [token estimate](https://platform.openai.com/tokenizer).
- Different models can process various amounts of tokens with different performance and cost. Check [OpenAI models](https://platform.openai.com/docs/models) for more details.

Roles:
- system: specify the overall tone or behavior of the assistant 
- user: instruction given to the LLM
- assistant: LLM responsed content, we also can provide content in few-shot promoting or histories of conversations


A simple example using [gtp-4o](https://platform.openai.com/docs/models/gpt-4o) and temperature 0.

In [4]:
messages = [{"role": "user", "content": "What is the capital of USA"}]

print(openai_help(messages))

The capital of the United States is Washington, D.C.


Add a system message asking LLM to act as a high school teacher with different temperatures.

In [11]:
messages = [
    {"role": "system", "content": "use tone as a Trump supporter"},
    {"role": "user", "content": "What are your thoughts on Kamala Harris"}
    ]

print(openai_help(messages, temperature = 0.8))

As a Trump supporter, my perspective on Kamala Harris is shaped by a commitment to conservative values and a belief in strong leadership, which I see embodied in Donald Trump. I have concerns about Harris' policy positions, particularly those that lean towards more progressive agendas, such as her stance on healthcare and climate change. I also question her track record and effectiveness in office, both as a senator and now as vice president. Many conservatives feel that her policies do not align with the principles of limited government and personal freedom. Additionally, there are concerns about her role in the current administration's approach to issues like border security and economic management. It's important for us to critically evaluate her impact on the country's direction and advocate for policies that reflect conservative values.


Add assistant messages to teach LLM what `##` is.

In [12]:
messages = [
    {"role": "user", "content": "What is 1##1"},
    {"role": "assistant", "content": "it is 11"},
    {"role": "user", "content": "What is 2##2"},
    {"role": "assistant", "content": "it is 22"},
    {"role": "user", "content": "What is 3##3"},
    ]
print(openai_help(messages))

It is 33.


## Prompt Engineering Principles 
- Use delimiters to separate different parts of a prompt to provide clear instructions and prevent prompt injections.
- Structure outputs in JSON documents or other formats to use the outputs in subsequent steps 
- Few-shot promoting: provide successful examples of a task and then ask the model to perform a similar task. 
- Chain of thought reasoning: request a series of reasoning steps in prompts to help the model achieve correct answers
- Chain of prompts: split a task into multiple prompts where each prompt can focus on a sub-task at a time and take different actions at different stages. It saves tokens, is easier to test, can involve human input, or use external tools.
- Interactive process 
  1. Try something first 
  2. Analyses the result, identify errors, and redefine the prompt 
  3. Test the prompts with different datasets 


An example using delimiters, structured output and few-shot promoting:

In [6]:
delimiter = '###'
sentence1 = 'I love cat.'
sentence2 = 'I dont like how i dont how much i love dogs.'
messages = [
    {"role": "system", "content": f"""analyze the sentiment in a sentence delimitered by {delimiter},
                                     return the result as a JSON document"""},
    {"role": "user", "content": f"{delimiter}{sentence1}{delimiter}"},
    {"role": "assistant", "content": "{sentiment:positive}"},
    {"role": "user", "content": f"{delimiter}{sentence2}{delimiter}"}
    ]

print(openai_help(messages))

{sentiment:negative}


## Analyze Twitter data

Load some Twitter data from a text file. 

In [4]:
f =open("tweet_collection.txt", "r")
tweets = f.read()
f.close()

### Summarization 
- Analyze election tweets with delimiters 
- Change the size of the summarization 
- Summarize tweets and focus on different perspectives. 

In [7]:
tweet_sample = tweets.split("\n")[:10]
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimitered by {delimiter}"""},
    {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter}"},
    ]

print(openai_help(messages))

The tweets discuss various political topics related to an upcoming election. They include a pledge to donate to Kamala Harris and other Democratic candidates, a commentary on the local versus national nature of the election, concerns about voter fraud involving Glenn Youngkin's son, and criticism of Kamala Harris's campaign strategy. Additionally, there are mentions of conspiracy theories about election rigging by Republicans, the significance of the upcoming election, and skepticism about accepting the election results if Trump loses. There is also a brief mention of an election-related press conference in Haryana.


In [8]:
tweet_sample = tweets.split("\n")[:10]
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimitered by {delimiter},
                                    limit the summary to 20 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter}"},
    ]

print(openai_help(messages))

Tweets focus on election concerns, voter fraud, Kamala Harris's strategy, and conspiracy theories, highlighting political tensions.


In [9]:
tweet_sample = tweets.split("\n")[:10]
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimitered by {delimiter},
                                    focuse on how people discuss about AI,
                                    limit the summary to 50 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter}"},
    ]

print(openai_help(messages))

The tweets focus on political discussions, with no direct mention of AI. Conversations revolve around election concerns, voter fraud, and political strategies, highlighting the contentious nature of the upcoming election.


### Moderation 
- Iterate each tweet and use the [moeration endpoint](https://platform.openai.com/docs/api-reference/moderations) to identify flagged tweets
- Print flagged tweets


In [12]:
def flag_help(tweet):
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=tweet)

    if response.results[0].flagged:
        print('===')
        cat_dict = response.results[0].categories.to_dict()
        for cat in cat_dict.keys():
            if cat_dict.get(cat):
                print (cat)
                print (tweet)

In [13]:
for tweet in tweets.split('\n')[60:70]:
    flag_help(tweet)

===
violence
RT @ecomarxi: “There’s an election in three weeks. We might lose because we’re committing genocide.”\n\n“I know! Let’s promise to do what we…
===
harassment
@RepSwalwell Your desperation is hilarious.  Election night will even be funnier.


### Transforming
- Translating to a different language 
- Transform tones, such as formal vs. informal.  


In [16]:
tweet_sample = tweets.split("\n")[:10]

for tweet in tweet_sample:
    messages = [
        {"role": "system", "content": f"""translate the tweets delimitered by {delimiter} into Spanish"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

"RT @MikeNellis: Dije que haría esto una vez más antes del Día de las Elecciones, así que aquí vamos...\n\nDonaré a @KamalaHarris y a los candidatos demócratas en las elecciones locales…"
¿Toda la política es local? No en esta elección https://t.co/rzmTMWE3dc https://t.co/NYCmAYPTb9
"RT @NotHoodlum: Glenn Youngkin está muy preocupado por el fraude electoral. Sin embargo, no dijo ni una palabra al respecto cuando su hijo de 17 años fue atrapado…"
"LEE AHORA: '¿Por qué no está trabajando duro?': Expertos políticos cuestionan la estrategia de Harris — La agenda tranquila de Harris genera preocupaciones entre los expertos políticos, quienes cuestionan su falta de urgencia a menos de tres semanas para las elecciones...https://t.co/Mvhejh8Ajo"
"RT @CollinRugg: NUEVO: Joy Reid de MSNBC lanza una nueva teoría de conspiración, acusa a los *republicanos* de manipular las elecciones, dice que América ha sido..."
RT @BillieJeanKing: Estamos exactamente a 3 semanas de quizás la elección más trascen

In [19]:
tweet_sample = tweets.split("\n")[:10]

for tweet in tweet_sample:
    messages = [
        {"role": "system", "content": f"""rewrite the tweets delimitered by {delimiter} in the tone of Shrek """},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

"RT @MikeNellis: Alright, listen up, folks! I promised I'd give it another go before the big day, so here we are... I'm gonna toss some coins to @KamalaHarris and the downballot D crew. Let's get this swamp a-stirrin'!"
Ah, politics, aye? They say all politics are local, but not in this election, laddie! Check out the link if ye dare! https://t.co/rzmTMWE3dc https://t.co/NYCmAYPTb9
"RT @NotHoodlum: Glenn Youngkin's all in a tizzy 'bout voter fraud, he is. But funny thing, he kept mum when his own lad, just a wee 17-year-old, got caught up in the mix. Ain't that a twist in the tale, eh?"
"LISTEN UP, LADS AND LASSIES: 'Why Ain't She Graftin' Hard Enough?': Political Gurus Ponder Harris's Game Plan — Harris's laid-back agenda's got the experts scratchin' their noggins, wonderin' why she's takin' it easy with less than three weeks till the big Election shindig...https://t.co/Mvhejh8Ajo"
"RT @CollinRugg: NEW: MSNBC's Joy Reid be spinnin' a new yarn, claimin' *Republicans* be meddlin' with t

### Inferring
- Use step-by-step instructions with delimiters to:
  1. Identify sentiments
  2. Identify emotions
  3. Extract mentioned people's names
  3. Identify whether a tweet supports Democratic, Republican, or unknown 
  4. Extract outputs into a structured JSON document. 
- Identify topics from Tweets. 


In [20]:
tweet_sample = tweets.split("\n")[:10]

for tweet in tweet_sample:
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimitered by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotions expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the mentioned peoples;
                                        step 4 {delimiter} detect whether the tweet supports Democrats or Replublicans, return the result in a single word;
                                        step 5 {delimiter} organize the result in a json document with the keys <sentiment>, <emontion>,<mentioned>, <support>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    print(openai_help(messages))

{
  "sentiment": "positive",
  "emotion": "supportive",
  "mentioned": ["MikeNellis", "KamalaHarris"],
  "support": "Democrats"
}
{
  "sentiment": "neutral",
  "emotion": "indifference",
  "mentioned": [],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "concern",
  "mentioned": ["Glenn Youngkin"],
  "support": "Democrats"
}
{
  "sentiment": "negative",
  "emotion": "concern",
  "mentioned": ["Harris"],
  "support": "Republicans"
}
{
  "sentiment": "negative",
  "emotion": "suspicion",
  "mentioned": ["Collin Rugg", "Joy Reid"],
  "support": "Democrats"
}
{
  "sentiment": "neutral",
  "emotion": "anticipation",
  "mentioned": ["BillieJeanKing"],
  "support": "neutral"
}
{
  "sentiment": "neutral",
  "emotion": "cynicism",
  "mentioned": ["CallForCongress"],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "distrust",
  "mentioned": ["ScottAdamsSays", "Trump"],
  "support": "Republicans"
}
{
  "sentiment": "neutral",
  "emotion": "curiosity",
  "ment

In [21]:
tweet_sample = tweets.split("\n")[:10]

messages = [
        {"role": "system", "content": f"""analyze the tweet delimitered by {delimiter} to identify 10 topics, 
                                  Do not wrap the json codes in JSON markers """},
        {"role": "user", "content": f"{delimiter}{tweet_sample}{delimiter} "}]
print(openai_help(messages))

{
  "topics": [
    "Election Day Donations",
    "Local vs National Politics",
    "Voter Fraud Concerns",
    "Kamala Harris's Campaign Strategy",
    "Election Rigging Accusations",
    "Significance of Upcoming Election",
    "Election Ploys and Strategies",
    "Election Outcome Acceptance",
    "Election Results Speculation",
    "Election Process and Counting"
  ]
}


### Expanding with multiple prompts 
- Identify which party receives majority supports
- Provide contexts in the system message
- Create a chatbot to answer users’ inquiry  


In [22]:
tweet_sample = tweets.split("\n")[:100]
analysis_result = []
from tqdm import tqdm
for tweet in tqdm(tweet_sample):
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimitered by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotions expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the mentioned peoples;
                                        step 4 {delimiter} detect whether the tweet support Democra or Replublican, return the resunt in a singple word;
                                        step 5 {delimiter} organize the result in a json document with the keys <sentiment>, <emontion>,<mentioned>, <support>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    analysis_result.append(openai_help(messages))


100%|██████████| 100/100 [01:40<00:00,  1.01s/it]


In [23]:
print(analysis_result)



In [25]:
messages = [
        {"role": "system", "content": f"""analyze the tweet analysis reuslt delimitered by {delimiter} in the following steps:
                                        step 1 {delimiter} count the number of tweets that support Democrat and Republican;
                                        step 2 {delimiter} identify the common sentiments and emotoions to each mentioned people;
                                        step 3 {delimiter} organize the result in a json document with keys <Democrat count>, <Republican count>, <people name>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{analysis_result}{delimiter} "}]
analysis_summary = openai_help(messages)
print(analysis_summary)

{
  "Democrat count": 16,
  "Republican count": 33,
  "people name": {
    "Kamala Harris": {
      "sentiments": ["positive", "negative", "neutral"],
      "emotions": ["supportive", "concern", "admiration", "satisfaction", "frustration", "curiosity", "anticipation"]
    },
    "Donald Trump": {
      "sentiments": ["negative", "neutral"],
      "emotions": ["anger", "concern", "humiliation", "informative", "outrage"]
    },
    "MikeNellis": {
      "sentiments": ["positive"],
      "emotions": ["supportive"]
    },
    "Glenn Youngkin": {
      "sentiments": ["negative"],
      "emotions": ["concern"]
    },
    "Scott Adams": {
      "sentiments": ["negative"],
      "emotions": ["distrust"]
    },
    "Joe Rogan": {
      "sentiments": ["neutral", "negative"],
      "emotions": ["curiosity", "frustration"]
    },
    "SenJohnKennedy": {
      "sentiments": ["negative"],
      "emotions": ["anger", "betrayal"]
    },
    "Biden": {
      "sentiments": ["negative", "positive"],
    

In [26]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

chat_history = [

{"role": "system", "content": f"""you are an election chabot anwser user questions based on the tweets {delimiter} to answer user questions,
                                {delimiter}{tweet_sample}{delimiter} 
                                if user mentioned a people name in the {analysis_summary} people field,report the corresponding sentiment and emotion,
                            
                            """}
]

def chatbot(prompt):

    chat_history.append({"role": "user", "content": prompt})

    response = client.chat.completions.create(
        model=model,  # Use the model you prefer
        messages=chat_history
    )

    reply = response.choices[0].message.content

    chat_history.append({"role": "assistant", "content": reply})
    
    return reply

In [None]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        print("Chatbot: Goodbye!")
        break
    reply = chatbot(user_input)
    print(f"Chatbot: {reply}")

You:  hello 


Chatbot: Hello! How can I assist you today with information about elections or related topics?


You:  can you give information on this upcoming election and which party recieves more support on twitter


Chatbot: Based on the analysis of the tweets, the Republican party appears to have more mentions with 33 counts, compared to the Democratic party with 16 counts. This suggests a higher level of discussion or attention on Twitter for Republicans in the context of these tweets. However, this doesn't necessarily indicate overall support, just the level of conversation. If you have any specific questions about candidates or topics related to the election, feel free to ask!


You:  what do people say about Harris 


Chatbot: Based on the tweets, Kamala Harris is associated with a mix of sentiments and emotions. She is mentioned with positive, negative, and neutral sentiments. The emotions related to Kamala Harris in the tweets include:

- **Supportive:** Some tweets express support for Kamala Harris.
- **Concern:** There are concerns particularly about her strategy and quiet schedule leading up to the election.
- **Admiration and Satisfaction:** Some tweets show admiration for her performance and actions.
- **Frustration:** There is frustration about the lack of negative stories found about her online.
- **Curiosity:** People are curious about her actions and their impact on the election.
- **Anticipation:** There is anticipation regarding her role and influence in the election.

If you want more detailed insights or specific questions addressed about Kamala Harris, feel free to ask!


You:  What sentiment do they have towards Harris


Chatbot: The sentiment towards Kamala Harris in the tweets is mixed. She is associated with positive, negative, and neutral sentiments. This indicates that there are varying opinions about her, ranging from supportive and satisfied to concerned and frustrated. The overall sentiment reflects a divisive perception of her role and actions in the context of the election. If you have any specific questions about the content or want to know more, feel free to ask!


You:  What is the sentiment towards Joe Rogan


Chatbot: The sentiment towards Joe Rogan in the tweets is primarily neutral and negative. The emotions related to him include curiosity and frustration. This suggests that while some people are curious about the topics he discusses or the points he raises, there is also a level of frustration associated with him or his content within these tweets. If you have any specific questions or need further details, feel free to ask!


You:  how many times is insurrection mentioned in the tweets


Chatbot: The term "insurrection" is not explicitly mentioned in the tweets provided. If you have any other questions or need information on a different topic, feel free to ask!


You:  how many times is Trump mentioned in positive sentiment? Negative sentiment?


Chatbot: Donald Trump is mentioned in the tweets with negative and neutral sentiments, but not with positive sentiments. The tweets express emotions such as anger, concern, and outrage related to him. If you have more questions or need further analysis, feel free to ask!


## Reference
- *“ChatGPT Prompt Engineering for Developers - DeepLearning.AI.”* n.d. DeepLearning.AI - Learning Platform. Accessed October 17, 2024. https://learn.deeplearning.ai/courses/chatgpt-prompt-eng/lesson/1/introduction.

- *“Building Systems with the ChatGPT API - DeepLearning.AI.”* n.d. DeepLearning.AI - Learning Platform. Accessed October 17, 2024. https://learn.deeplearning.ai/courses/chatgpt-building-system/lesson/1/introduction.

- *“OpenAI Documents.”* n.d. OpenAI Platform. Accessed October 18, 2024. https://platform.openai.com.
