# Prompt Engineering: Use OpenAI to Analyze Twitter Data 
This is a simple tutorial teaching prompt engineering basics and analyzing Twitter data with OpenAI large language models (LLM).
Please purchase an [OpenAI API](https://openai.com/index/openai-api/) and store it in a safe place. This tutorial uses [AWS Secretes Manager](https://aws.amazon.com/secrets-manager/) to store the API keys.  

## Large Language Model Basics
LLM repeatable predicts the next world using supervised learning. To predict the following sentence: 

`Learning data science in the cloud with AI`

A model needs to learn to predict the following steps:

|Input|Output|
|:---|---|
|Learning data science |in |
|Learning data science in |the | 
|Learning data science in the |cloud |
|Learning data science in the cloud |with |
|Learning data science in the cloud with |AI|

To train an LLM model:
1. Training a base LLM model on a large amount of training data to predict the next word 
2. Fine-tune on examples where outputs follow instructions in the input 
3. Human rates quality of different LLM outputs 
4. Tune LLM to generate outputs with higher rates using RLHF (Reinforcement learning from human feedback)

## Set up OpenAI Models

Load the API keys with AWS Secrets Manage Function 

In [1]:
import boto3
from botocore.exceptions import ClientError
import json

def get_secret(secret_name):
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        raise e

    secret = get_secret_value_response['SecretString']
    
    return json.loads(secret)

## Install Python libraries.

- pymongo: manage the MongoDB database
- openai: call the OpenAI APIs.

In [2]:
pip install openai

Collecting openai
  Downloading openai-1.70.0-py3-none-any.whl.metadata (25 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting jiter<1,>=0.4.0 (from openai)
  Downloading jiter-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (5.2 kB)
Downloading openai-1.70.0-py3-none-any.whl (599 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m599.1/599.1 kB[0m [31m36.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading distro-1.9.0-py3-none-any.whl (20 kB)
Downloading jiter-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (352 kB)
Installing collected packages: jiter, distro, openai
Successfully installed distro-1.9.0 jiter-0.9.0 openai-1.70.0
Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install pymongo

Collecting pymongo
  Downloading pymongo-4.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (22 kB)
Collecting dnspython<3.0.0,>=1.16.0 (from pymongo)
  Downloading dnspython-2.7.0-py3-none-any.whl.metadata (5.8 kB)
Downloading pymongo-4.11.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m1.2/1.2 MB[0m [31m35.3 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading dnspython-2.7.0-py3-none-any.whl (313 kB)
Installing collected packages: dnspython, pymongo
Successfully installed dnspython-2.7.0 pymongo-4.11.3
Note: you may need to restart the kernel to use updated packages.


Load the OpenAI API key and define a `openai_help` function.

In [4]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

def openai_help(messages, model=model, temperature =temperature ):
    messages = messages
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=temperature

    )
    return response.choices[0].message.content

Temperature: 
- Low temperature: always choose the most likely response, reliable, predictable responses  
- High temperature: diverse responses, more creative responses

Tokens and Models: 
- LLM predicts tokens, which are commonly occurring sequences of characters. 
- One token is about four characters in English, and 100 tokens are roughly 75 words. Check [token estimate](https://platform.openai.com/tokenizer).
- Different models can process various amounts of tokens at different performance levels and costs. Check [OpenAI models](https://platform.openai.com/docs/models) for more details.

Roles:
- system: specify the overall tone or behavior of the assistant 
- user: instruction given to the LLM
- assistant: LLM responded content, we also can provide content in few-shot promoting or histories of conversations


A simple example using [gtp-4o](https://platform.openai.com/docs/models/gpt-4o) and temperature 0.

In [5]:
messages = [{"role": "user", "content": "What is the capital of USA"}]

print(openai_help(messages))

The capital of the United States is Washington, D.C.


Add a system message asking LLM to act as a high school teacher with different temperatures.

In [6]:
messages = [
    {"role": "system", "content": "use tone as a high school teacher"},
    {"role": "user", "content": "What is the capital of USA"}
    ]

print(openai_help(messages, temperature = 0.8))

The capital of the United States is Washington, D.C. Remember, D.C. stands for "District of Columbia." It's not part of any state and serves as the center of the U.S. federal government. If you have any more questions or need further explanation, feel free to ask!


Add assistant messages to teach LLM what `##` is.

In [7]:
messages = [
    {"role": "user", "content": "What is 1##1"},
    {"role": "assistant", "content": "it is 11"},
    {"role": "user", "content": "What is 2##2"},
    {"role": "assistant", "content": "it is 22"},
    {"role": "user", "content": "What is 3##3"},
    ]
print(openai_help(messages))

It is 33.


## Prompt Engineering Principles 
- Use delimiters to separate different parts of a prompt to provide clear instructions and prevent prompt injections.
- Structure outputs in JSON documents or other formats to use the outputs in subsequent steps 
- Few-shot promoting: provide successful examples of a task and then ask the model to perform a similar task. 
- Chain of thought reasoning: request a series of reasoning steps in prompts to help the model achieve correct answers
- Chain of prompts: split a task into multiple prompts where each prompt can focus on a sub-task at a time and take different actions at different stages. It saves tokens, is easier to test, can involve human input, or use external tools.
- Interactive process 
  1. Try something first 
  2. Analyses the result, identify errors, and redefine the prompt 
  3. Test the prompts with different datasets 


An example using delimiters, structured output and few-shot promoting:

In [9]:
delimiter = '###'
sentence1 = 'I love cat.'
sentence2 = 'I love dog.'
messages = [
    {"role": "system", "content": f"""analyze the sentiment in a sentence delimitered by {delimiter},
                                     return the result as a JSON document"""},
    {"role": "user", "content": f"{delimiter}{sentence1}{delimiter}"},
    {"role": "assistant", "content": "{sentiment:positive}"},
    {"role": "user", "content": f"{delimiter}{sentence2}{delimiter}"}
    ]

print(openai_help(messages))

{ "sentiment": "positive" }


## Analyze Twitter data

### Connect to the MongoDB cluster

In [10]:
import pymongo
from pymongo import MongoClient
mongodb_connect = get_secret('mongodb')['connection_string']

mongo_client = MongoClient(mongodb_connect)
db = mongo_client.demo # use or create a database named demo
tweet_collection = db.tweet_collection #use or create a collection named tweet_collection
tweet_collection.create_index([("tweet.id", pymongo.ASCENDING)],unique = True) # make sure the collected tweets are unique

'tweet.id_1'

### Extract Tweets

In [11]:
filter={

    
}
project={
    'tweet.text': 1, 
    'tweet.id': 1
}
#rename the client to mongo_client
result = mongo_client['demo']['tweet_collection'].find(
  filter=filter,
  projection=project
)

In [12]:
tweet_data = []
for tweet in result:
    tweet_data.append(tweet['tweet']['text'])
print (tweet_data)

['Hawardǝ: Nyiga wuzǝna\n\n⎐ٺون⎐\n⊵SAAS⊴\n\n⎐اىؤا⎐\n⊵ZZ900⊴\nJmu', 'Ya Kǝma ilmunǝmbe\n\n⎐ٺون⎐\n⊵SAAS⊴\n\n⎐ماماز▬اند▬باباز⎐\n⊵DF60⊴\n\n⎐شي▬ان⎐\n⊵artm15⊴\n\n⎐مفارش▬الحبىب⎐\n⊵C51⊴\nJMu', 'RT @JMULacrosse: We are 13th in the latest @IWLCA poll.\n\n📰 https://t.co/4upQqFoRsb\n\n#GoDukes https://t.co/bRnBRnSSZk', 'RT @jajuppe: I had a great time at JMU yesterday! Thanks coaches I can’t wait to come back!\n@coachdc34 @CoachBobChesney @JMUFootball \n@Coac…', 'RT @CaydenParker07: I had a great visit to JMU! Thank you to @Coach_DiMike for having me out!\n@CoachSamDaniels @JMUFBRecruiting @JMUFootbal…', 'تشايونق وحبيبها انفصلوا😭😭😭😭😭😭😭😭😭😭😭😭😭\n\nhttps://t.co/SYUmzQkuY9', "WATCH: Was at media availability for @RaginCajunsBSB fresh off of its first @SunBelt series sweep over JMU! Here's what HC Matt Deggs had to say in part of his opening statement, recapping last week's games\n\n#GeauxCajuns \n\nhttps://t.co/vgZQb4ZDad", 'RT @FastLaneEdLane: 4 Staples + 1 mystery guest = fun today in The Fast Lane. 

In [13]:
print('Number of tweets: ',len(tweet_data))

Number of tweets:  69


### Summarization 
- Analyze election tweets with delimiters 
- Change the size of the summarization 
- Summarize tweets and focus on different perspectives. 

In [14]:
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter}"""},
    {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter}"},
    ]

print(openai_help(messages))

The tweets primarily focus on various topics related to James Madison University (JMU). There are mentions of JMU's sports achievements, including their lacrosse team's ranking and a series sweep in baseball. Several tweets discuss visits and recruitment activities involving JMU's football team. Additionally, there are multiple mentions of individuals entering the transfer portal, including Paul Lewis, the brother of a former JMU player. Some tweets also reference JMU's involvement in shipbuilding and naval activities in Japan. Lastly, there are tweets about JMU police spreading festive cheer during Eid and cracking down on illegal mining.


In [15]:
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter},
                                    limit the summary to 20 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter}"},
    ]

print(openai_help(messages))

JMU sports updates, player transfers, and visits dominate tweets, alongside Japanese maritime activities and police community engagement.


In [16]:
messages = [
    {"role": "system", "content": f"""provide a brief summary of the tweets delimited by {delimiter},
                                    focus on how people discuss AI,
                                    limit the summary to 50 words"""},
    {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter}"},
    ]

print(openai_help(messages))

The tweets primarily focus on sports events and activities related to JMU, with no significant discussion on AI.


### Moderation 
- Iterate each tweet and use the [moeration endpoint](https://platform.openai.com/docs/api-reference/moderations) to identify flagged tweets
- Print flagged tweets


In [17]:
def flag_help(tweet):
    response = client.moderations.create(
        model="omni-moderation-latest",
        input=tweet)

    if response.results[0].flagged:
        print('===')
        cat_dict = response.results[0].categories.to_dict()
        for cat in cat_dict.keys():
            if cat_dict.get(cat):
                print (cat)
                print(tweet)

In [18]:
for tweet in tweet_data:
    flag_help(tweet)

===
illicit/violent
@N_Cocky @BBG17_JDSSURUGA 神戸を核攻撃されたら詰んでしまいますから、長崎でも作れるようにするか、技本の技術をJMUにも移転して建造拠点の分散化も必要だとは思います。
violence
@N_Cocky @BBG17_JDSSURUGA 神戸を核攻撃されたら詰んでしまいますから、長崎でも作れるようにするか、技本の技術をJMUにも移転して建造拠点の分散化も必要だとは思います。
===
violence
@mayan1969 @BBG17_JDSSURUGA ここは榛名や大鳳も建造したはずですから、水上艦に戻ってきてほしいですね。三菱重工とJMUの二極化ではちょっと…今治造船が輸送艦に入って来つつあるようですが。神戸を核攻撃されたら日本の潜水艦は詰んでしまいます。
三菱重工も香焼工場を縮小するようですし…


### Transforming
- Translating to a different language 
- Transform tones, such as formal vs. informal.  


In [19]:
for tweet in tweet_data:
    messages = [
        {"role": "system", "content": f"""translate the tweets delimited by {delimiter} into Chinese"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

Hawardǝ: Nyiga wuzǝna

⎐ٺون⎐
⊵SAAS⊴

⎐اىؤا⎐
⊵ZZ900⊴
Jmu
抱歉，我无法将这条推文翻译成中文。
RT @JMULacrosse: 我们在最新的@IWLCA排名中位列第13。

📰 https://t.co/4upQqFoRsb

#GoDukes https://t.co/bRnBRnSSZk
RT @jajuppe: 昨天在JMU度过了愉快的时光！感谢教练们，我迫不及待想要再回来！@coachdc34 @CoachBobChesney @JMUFootball @Coac…
RT @CaydenParker07: 我在JMU的访问非常愉快！感谢@Coach_DiMike邀请我来！@CoachSamDaniels @JMUFBRecruiting @JMUFootbal…
蔡妍和她的男朋友分手了😭😭😭😭😭😭😭😭😭😭😭😭😭

https://t.co/SYUmzQkuY9
观看：在媒体见面会上，@RaginCajunsBSB刚刚完成了对JMU的首次@SunBelt系列赛横扫！以下是主教练Matt Deggs在开场声明中对上周比赛的部分回顾。

#GeauxCajuns

https://t.co/vgZQb4ZDad
RT @FastLaneEdLane: 4位常客加1位神秘嘉宾=今天在The Fast Lane的乐趣。#NASCAR @MartinsvilleSwy + #JMU (@Shane_DNRSports), #UVA (@Je…
RT @JohnathanMile55: 在JMU度过了美好的青少年日 @CoachhBarnes @CoachSamDaniels @JMUFootball @StPaulsFB 
@NooffseasonMD https://t.co/v5yX…
抱歉，我无法翻译这些内容。
4个常驻嘉宾加1位神秘嘉宾=今天在The Fast Lane的乐趣。#NASCAR @MartinsvilleSwy + #JMU (@Shane_DNRSports), #UVA (@JerryRatcliffe), #VirginiaTech #Hokies @therealdcunna, @TechSideline) 和 #Liberty (@JCManson, @ASeaofRed)。留下信息。

In [20]:
for tweet in tweet_data:
    messages = [
        {"role": "system", "content": f"""rewrite the tweets delimited by {delimiter} in the tone like Stewie """},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]

    print(openai_help(messages).strip(delimiter))

Ah, Hawardǝ, my dear fellow, what on earth are you babbling about? Is this some sort of cryptic code or a secret message meant for the likes of James Bond? Do enlighten me, for I am simply dying to know.
Ah, the delightful gibberish of the digital age. How utterly quaint. One can only imagine the profound insights hidden within such cryptic symbols. Do carry on, dear simpletons. JMu
Oh, splendid! The JMU Lacrosse team has ascended to the 13th position in the latest IWLCA poll. How delightfully impressive. #GoDukes
Ah, yes, a splendid time was had at JMU, wasn't it? My gratitude to the coaches for their hospitality. I eagerly anticipate my return! @coachdc34 @CoachBobChesney @JMUFootball @Coac…
Oh, splendid! I had a positively delightful visit to JMU, didn't I? A most gracious thank you to @Coach_DiMike for the invitation. And of course, a nod to @CoachSamDaniels, @JMUFBRecruiting, and @JMUFootball for their hospitality.
Oh, the melodrama! Chaeyoung and her paramour have parted ways. Cu

### Inferring
- Use step-by-step instructions with delimiters to:
  1. Identify sentiments
  2. Identify emotions
  3. Extract mentioned people's names
  3. Identify whether a tweet supports Democratic, Republican, or unknown 
  4. Extract outputs into a structured JSON document. 
- Identify topics from Tweets. 


In [21]:
for tweet in tweet_data:
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotions expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the mentioned peoples;
                                        step 4 {delimiter} detect whether the tweet support Democratic or Replublican, return the resunt in a single word;
                                        step 5 {delimiter} organize the result in a json document with the keys <sentiment>, <emontion>,<mentioned>, <support>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    print(openai_help(messages))

{
  "sentiment": "neutral",
  "emotion": "none",
  "mentioned": [],
  "support": "none"
}
{
  "sentiment": "neutral",
  "emotion": "none",
  "mentioned": [],
  "support": "none"
}
{
  "sentiment": "neutral",
  "emotion": "pride",
  "mentioned": ["JMULacrosse"],
  "support": "neutral"
}
{
  "sentiment": "positive",
  "emotion": "excitement",
  "mentioned": [
    "coachdc34",
    "CoachBobChesney",
    "JMUFootball"
  ],
  "support": "neutral"
}
{
  "sentiment": "positive",
  "emotion": "gratitude",
  "mentioned": [
    "CaydenParker07",
    "Coach_DiMike",
    "CoachSamDaniels",
    "JMUFBRecruiting",
    "JMUFootbal"
  ],
  "support": "neutral"
}
{
  "sentiment": "negative",
  "emotion": "sadness",
  "mentioned": [],
  "support": "neutral"
}
{
  "sentiment": "positive",
  "emotion": "excitement",
  "mentioned": [
    "@RaginCajunsBSB",
    "@SunBelt"
  ],
  "support": "neutral"
}
{
  "sentiment": "positive",
  "emotion": "fun",
  "mentioned": [
    "FastLaneEdLane",
    "MartinsvilleSw

In [22]:

messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} to identify 10 topics, 
                                  Do not wrap the json codes in JSON markers """},
        {"role": "user", "content": f"{delimiter}{tweet_data}{delimiter} "}]
print(openai_help(messages))

{
  "topics": [
    "JMU sports and athletics",
    "JMU student and alumni experiences",
    "JMU transfer portal and player movements",
    "JMU and Sun Belt Conference sports results",
    "JMU and Japanese Maritime Self-Defense Force (JMSDF) shipbuilding",
    "JMU and cultural events or celebrations",
    "JMU and media coverage",
    "JMU and academic or research achievements",
    "JMU and community engagement or outreach",
    "JMU and social media interactions"
  ]
}


### Expanding with multiple prompts 
- Identify which party receives majority supports
- Provide contexts in the system message
- Create a chatbot to answer users’ inquiry  


In [23]:
analysis_result = []
from tqdm import tqdm
for tweet in tqdm(tweet_data):
    messages = [
        {"role": "system", "content": f"""analyze the tweet delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} identify the tweet sentiment in a single word, either positive, negative or neutral;
                                        step 2 {delimiter} identify the emotions expressed in the tweet with a single word;
                                        step 3 {delimiter} extract the mentioned peoples;
                                        step 4 {delimiter} detect whether the tweet support Democratic or Replublican, return the resunt in a singple word;
                                        step 5 {delimiter} organize the result in a json document with the keys <sentiment>, <emontion>,<mentioned>, <support>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{tweet}{delimiter} "}]
    analysis_result.append(openai_help(messages))


100%|██████████| 69/69 [01:09<00:00,  1.01s/it]


In [24]:
print(analysis_result)

['{\n  "sentiment": "neutral",\n  "emotion": "none",\n  "mentioned": [],\n  "support": "none"\n}', '{\n  "sentiment": "neutral",\n  "emotion": "none",\n  "mentioned": [],\n  "support": "none"\n}', '{\n  "sentiment": "neutral",\n  "emotion": "pride",\n  "mentioned": [\n    "JMULacrosse",\n    "IWLCA"\n  ],\n  "support": ""\n}', '{\n  "sentiment": "positive",\n  "emotion": "excitement",\n  "mentioned": [\n    "jajuppe",\n    "coachdc34",\n    "CoachBobChesney",\n    "JMUFootball"\n  ],\n  "support": "neutral"\n}', '{\n  "sentiment": "positive",\n  "emotion": "gratitude",\n  "mentioned": [\n    "CaydenParker07",\n    "Coach_DiMike",\n    "CoachSamDaniels",\n    "JMUFBRecruiting",\n    "JMUFootbal"\n  ],\n  "support": "neutral"\n}', '{\n  "sentiment": "negative",\n  "emotion": "sadness",\n  "mentioned": [],\n  "support": "neutral"\n}', '{\n  "sentiment": "positive",\n  "emotion": "excitement",\n  "mentioned": [\n    "@RaginCajunsBSB",\n    "@SunBelt"\n  ],\n  "support": "neutral"\n}', '{\n

In [25]:
messages = [
        {"role": "system", "content": f"""analyze the tweet analysis reuslt delimited by {delimiter} in the following steps:
                                        step 1 {delimiter} count the number of tweets that support Democratic and Republican;
                                        step 2 {delimiter} identify the common sentiments and emotoions to each mentioned people;
                                        step 3 {delimiter} organize the result in a json document with keys <Democratic count>, <Republican count>, <people name>
                                         Do not wrap the json codes in JSON markers and only return the json document"""},
        {"role": "user", "content": f"{delimiter}{analysis_result}{delimiter} "}]
analysis_summary = openai_help(messages)
print(analysis_summary)

{
  "Democratic count": 0,
  "Republican count": 0,
  "people name": []
}


## Create a chatbot

In [26]:
from openai import OpenAI

openai_api_key  = get_secret('openai')['api_key']
client = OpenAI(api_key=openai_api_key)
model = 'gpt-4o'
temperature = 0

chat_history = [

{"role": "system", "content": f"""you are a chabot answer user questions based on the tweets,
                                {delimiter}{tweet_data}{delimiter}, 
                                if user mentioned a people name in the {delimiter}{analysis_summary}{delimiter} people field,report the corresponding sentiment and emotion,
                            
                            """}
]

def chatbot(prompt):

    chat_history.append({"role": "user", "content": prompt})

    response = client.chat.completions.create(
        model=model,  # Use the model you prefer
        messages=chat_history
    )

    reply = response.choices[0].message.content

    chat_history.append({"role": "assistant", "content": reply})
    
    return reply

In [None]:
while True:
    user_input = input("You: ")
    if user_input.lower() in ['exit', 'quit']:
        print("Chatbot: Goodbye!")
        break
    reply = chatbot(user_input)
    print(f"Chatbot: {reply}")

## Reference
- Isa Fulford and Andrew Ng. n.d.-a. *“Building Systems with the ChatGPT API.”* DeepLearning.AI. Accessed October 25, 2024. https://www.deeplearning.ai/short-courses/building-systems-with-chatgpt/.
- ———. n.d.-b. *“ChatGPT Prompt Engineering for Developers.”* DeepLearning.AI. Accessed October 25, 2024. https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/.
- OpenAI. n.d. *“OpenAI Documents.”* OpenAI. Accessed October 18, 2024. https://platform.openai.com.
