In [1]:
from openai import OpenAI
import os

In [2]:
model_name = 'gpt-3.5-turbo'
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

In [3]:
import pandas as pd
df = pd.read_csv('../EarthDayBlog/earth_day_tweets.csv')
df.head()

Unnamed: 0,text,hash_tags,account_tags,emoji_lists,sentiment,emotion
0,RT @POTUS: As we celebrate the progress we’ve ...,,POTUS,,POSITIVE,optimism
1,"This #EarthDay, I'm happy to be meeting with P...","EarthDay,GetTheLeadOut",PennEnvironment,,POSITIVE,optimism
2,RT @Khan__sir_patna: All of people wishes and ...,EarthDay,Khan__sir_patna,,POSITIVE,joy
3,RT @CapsCoalition: Biden Signs Executive Order...,EarthDay,CapsCoalition,,POSITIVE,optimism
4,RT @tamannaahspeaks: Animals source their food...,MyconnectwithSoil,"tamannaahspeaks,SadhguruJV,cpsavesoil",,POSITIVE,optimism


In [15]:
df_sample = df.sample(5)

In [16]:
pd.set_option('max_colwidth', 800)
df_sample

Unnamed: 0,text,hash_tags,account_tags,emoji_lists,sentiment,emotion
36053,"RT @420iloveweed: Happy earth day, remember if you shower with a friend, you can save water....",,420iloveweed,,POSITIVE,joy
31428,"RT @kucoincom: Happy Earth Day! 🌎\n\nAs we celebrate #EarthDay, #KuCoin is joining hands with @EarthFund_io to give away 20,000 1EARTH!\n\n🌍 Follow @kucoincom @EarthFund_io \n\n🌏 RT &amp; comment your thoughts on making Earth a better place\n\n🎁 5 winners will each receive 4,000 1EARTH! https://t.co/Mao0As00pD","EarthDay,KuCoin","kucoincom,EarthFund_io,kucoincom,EarthFund_io","['🌎', '🌍', '🌏', '🎁']",POSITIVE,joy
71038,Earth Day every day! https://t.co/nyC6hlPfuT,,,,NEGATIVE,joy
4534,"Climate change is very real but so is eco-anxiety. If you're feeling stressed this #EarthDay, try our @BBCRadio3 series Into the Wild: five city-dwelling writers head into the countryside to explore their relationship with nature and nature writing. https://t.co/4r8ds5Arl0",EarthDay,BBCRadio3,,POSITIVE,optimism
48403,"RT @TRF_Climate: OPINION: #EarthDay is a wake-up call for how our use of oil, coal and gas is driving a global #climate breakdown that violates the right to a clean and healthy environment, writes @RPearshouse of @hrw | @UN_HRC #humanrights #energy https://t.co/kwU2c54HGG","EarthDay,climate,humanrights,energy","TRF_Climate,RPearshouse,hrw,UN_HRC",,NEGATIVE,sadness


# Text Classification

## Sentiment Analysis

Let's start with the most common NLP task in social media analysis.

Data Source: https://www.kaggle.com/datasets/sbhatti/financial-sentiment-analysis

In [17]:
# ZERO SHOT

prompt = f"""Classify the following texts into positive, negative or neutral sentiments from the user's point of view:
---
{list(df_sample['text'])}
---

Provide explanations as well.
"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Minimizes sampling randomness; output is near-deterministic
    n=1  #default
)

print(response.choices[0].message.content)

1. Positive sentiment: 
- "Happy earth day, remember if you shower with a friend, you can save water...." - This tweet expresses a positive sentiment towards Earth Day and promotes water conservation in a light-hearted manner.

2. Positive sentiment: 
- "Happy Earth Day! 🌎 As we celebrate #EarthDay, #KuCoin is joining hands with @EarthFund_io to give away 20,000 1EARTH! Follow @kucoincom @EarthFund_io RT & comment your thoughts on making Earth a better place 5 winners will each receive 4,000 1EARTH!" - This tweet also expresses a positive sentiment towards Earth Day and promotes environmental awareness and action.

3. Neutral sentiment: 
- "Earth Day every day!" - This tweet simply states the importance of Earth Day without expressing a clear positive or negative sentiment.

4. Neutral sentiment: 
- "Climate change is very real but so is eco-anxiety. If you're feeling stressed this #EarthDay, try our @BBCRadio3 series Into the Wild: five city-dwelling writers head into the countryside 
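
The printed response above stops mid-sentence. One possible cause is the completion hitting a token limit, in which case the API reports `finish_reason == "length"` on the choice. The sketch below wraps the repeated `client.chat.completions.create` call in a small helper that also returns that flag. The helper name `ask` and the offline `fake_client` stub are assumptions made for illustration; they are not part of the original notebook.

```python
from types import SimpleNamespace


def ask(client, prompt, model="gpt-3.5-turbo", **kwargs):
    """Send a single-turn prompt and return (text, was_truncated)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        **kwargs,
    )
    choice = response.choices[0]
    # finish_reason == "length" means generation stopped at the token limit
    return choice.message.content, choice.finish_reason == "length"


# Hypothetical stand-in for the real client so the sketch runs offline
def _fake_create(**kwargs):
    choice = SimpleNamespace(
        message=SimpleNamespace(content="1. Positive sentiment"),
        finish_reason="length",
    )
    return SimpleNamespace(choices=[choice])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

text, truncated = ask(fake_client, "Classify ...")
print(text, truncated)
```

With the real `client`, a `True` flag would suggest raising `max_tokens` or sending fewer tweets per request.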

## (General) Text Classification

Texts taken from dataset: https://www.kaggle.com/datasets/dipankarmitra/natural-language-processing-with-disaster-tweets

- Class 1: Related to Natural Disasters
- Class 0: Not Related to Natural Disasters

In [50]:
# FEW SHOT

prompt = """ Following are a few examples of texts and corresponding classes to be used for text classification:
---
Text: #RockyFire Update => California Hwy. 20 closed in both directions due to Lake County fire - #CAfire #wildfires | Class: 1 \n
Text: What a goooooooaaaaaal!!!!!! | Class: 0 \n
Text: My car is so fast | Class: 0 \n
Text: London is cool ;) | Class: 0 \n
Text: #flood #disaster Heavy rain causes flash flooding of streets in Manitou, Colorado Springs areas | Class: 1 \n
Text: I'm afraid that the tornado is coming to our area. | Class: 1 \n
---
Classify the following and include an explanation:
Text: There's an emergency evacuation happening now in the building across the street | Class:  \n
"""

In [51]:
response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    max_tokens=100,
    n=1  #default
)

print(response.choices[0].message.content)

Class: 1

Explanation: The text mentions an emergency evacuation, which indicates a potential disaster or crisis situation. This falls under the category of text related to emergencies or disasters, which would be classified as Class 1.


**Observation**

In this example, I deliberately did not tell the model what the classes mean. Even so, it inferred their meaning from the labeled examples and assigned the correct class.
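
The few-shot prompt above is written by hand. A minimal sketch of building the same layout from a list of labeled examples is shown below, which can make it easier to swap examples in and out. The function name `build_few_shot_prompt` is a hypothetical helper, not something from the original notebook, and only two of the examples are reused here for brevity.

```python
def build_few_shot_prompt(examples, query):
    """Format (text, label) pairs plus one unlabeled query into a prompt."""
    lines = [
        "Following are a few examples of texts and corresponding classes "
        "to be used for text classification:",
        "---",
    ]
    for text, label in examples:
        lines.append(f"Text: {text} | Class: {label}")
    lines += [
        "---",
        "Classify the following and include an explanation:",
        f"Text: {query} | Class:",  # left open for the model to complete
    ]
    return "\n".join(lines)


examples = [
    ("What a goooooooaaaaaal!!!!!!", 0),
    ("I'm afraid that the tornado is coming to our area.", 1),
]
few_shot_prompt = build_few_shot_prompt(
    examples,
    "There's an emergency evacuation happening now in the building across the street",
)
print(few_shot_prompt)
```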

## Multiclass opinion mining

Texts taken from Multiclass Emotion Classification Dataset: https://www.kaggle.com/datasets/praveengovi/emotions-dataset-for-nlp

This is where I got the idea for my first blog post: [GenAI for Better NLP Systems I: A Tool for Generating Synthetic Data](https://medium.com/towards-data-science/genai-for-better-nlp-systems-i-a-tool-for-generating-synthetic-data-4b862ef3f88a)

In [19]:
prompt = f""" Assess the Texts and indicate the emotion from the following list:
- sadness, anger, fear, joy.

---
{list(df_sample['text'])}
---

Use this format:
text | emotion

"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

RT @420iloveweed: Happy earth day, remember if you shower with a friend, you can save water.... | joy
RT @kucoincom: Happy Earth Day! 🌎
As we celebrate #EarthDay, #KuCoin is joining hands with @EarthFund_io to give away 20,000 1EARTH!
🌍 Follow @kucoincom @EarthFund_io 
🌏 RT & comment your thoughts on making Earth a better place
🎁 5 winners will each receive 4,000 1EARTH! https://t.co/Mao0As00pD | joy
Earth Day every day! https://t.co/nyC6hlPfuT | joy
Climate change is very real but so is eco-anxiety. If you're feeling stressed this #EarthDay, try our @BBCRadio3 series Into the Wild: five city-dwelling writers head into the countryside to explore their relationship with nature and nature writing. https://t.co/4r8ds5Arl0 | sadness
RT @TRF_Climate: OPINION: #EarthDay is a wake-up call for how our use of oil, coal and gas is driving a global #climate breakdown that violates the right to a clean and healthy environment, writes @RPearshouse of @hrw | @UN_HRC #humanrights #energy https://t.co
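
The `text | emotion` format is easy to read but needs some care to parse: tweets can span several lines, and a tweet may itself contain a `|` character. A minimal sketch of one way to handle this is to split on the last ` | ` of each line and accept the tail only if it is one of the allowed labels. The function name `parse_emotions` is an assumption, and the sample below uses abbreviated lines from the output above.

```python
ALLOWED = {"sadness", "anger", "fear", "joy"}


def parse_emotions(output, allowed=ALLOWED):
    """Return (text, emotion) pairs; incomplete trailing records are dropped."""
    records, buffer = [], []
    for line in output.splitlines():
        buffer.append(line)
        # Split on the LAST separator so a '|' inside the tweet is preserved
        head, sep, tail = line.rpartition(" | ")
        label = tail.strip().lower()
        if sep and label in allowed:
            buffer[-1] = head
            records.append(("\n".join(buffer).strip(), label))
            buffer = []
    return records


sample_output = (
    "Earth Day every day! https://t.co/nyC6hlPfuT | joy\n"
    "RT @kucoincom: Happy Earth Day!\n"
    "5 winners will each receive 4,000 1EARTH! https://t.co/Mao0As00pD | joy\n"
    "writes @RPearshouse of @hrw | @UN_HRC #humanrights #energy https://t.co"
)
records = parse_emotions(sample_output)
for text, emotion in records:
    print(repr(text), "->", emotion)
```

The last sample line has no valid label after its final separator, so it is left out of the result instead of being mislabeled.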

## Text Summarization

Summarizing the last tweet of the sample in one sentence.

In [23]:
list(df_sample['text'])[-1]

'RT @TRF_Climate: OPINION: #EarthDay is a wake-up call for how our use of oil, coal and gas is driving a global #climate breakdown that violates the right to a clean and healthy environment, writes @RPearshouse of @hrw | @UN_HRC #humanrights #energy https://t.co/kwU2c54HGG'

In [27]:
prompt = f"""
{list(df_sample['text'])[-1]}
Provide the key concern of the above in one sentence:"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

The key concern is how the use of oil, coal, and gas is driving a global climate breakdown that violates the right to a clean and healthy environment.


## Topic Extraction

In [33]:
topic_list = [
    'climate change', 'eco-anxiety', 'give-away'
]

prompt = f""" List of topics: {", ".join(topic_list)}
Tweets: '''{list(df_sample['text'])}'''

Assign one or more topics to the given tweets and list the results per tweet.
"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

Tweet 1: 
- Topic: eco-anxiety

Tweet 2: 
- Topics: give-away, climate change

Tweet 3: 
- Topic: climate change

Tweet 4: 
- Topic: eco-anxiety

Tweet 5: 
- Topics: climate change, eco-anxiety
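
Nothing in the prompt forces the model to stay within `topic_list`, so it can be worth checking the response against the allowed topics. The sketch below parses the `Tweet N:` / `- Topics:` layout shown above and keeps only topics that appear in the list. The function name `parse_topics` is an assumption, and the off-list label `recycling` in the sample is a hypothetical value added to show the filtering.

```python
import re

topic_list = ['climate change', 'eco-anxiety', 'give-away']


def parse_topics(output, allowed):
    """Map tweet number -> list of topics, dropping topics not in `allowed`."""
    result, current = {}, None
    for line in output.splitlines():
        line = line.strip()
        header = re.match(r"Tweet (\d+):", line)
        if header:
            current = int(header.group(1))
            result[current] = []
            continue
        topics = re.match(r"- Topics?:\s*(.*)", line)
        if topics and current is not None:
            names = [t.strip() for t in topics.group(1).split(",")]
            result[current] = [t for t in names if t in allowed]
    return result


sample_output = """Tweet 1: 
- Topic: eco-anxiety

Tweet 2: 
- Topics: give-away, climate change, recycling
"""
parsed = parse_topics(sample_output, topic_list)
print(parsed)
```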


## Keyword Extraction

In [35]:
prompt = f"""Extract keywords from these tweets:
---
Tweets:
{list(df_sample['text'])}
---

List the keywords per tweet.

"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

1. earth day, shower, save water
2. earth day, kucoin, giveaway, 1EARTH, earthfund_io, winners, better place
3. earth day
4. climate change, eco-anxiety, BBCRadio3, into the wild, nature writing
5. earth day, oil, coal, gas, climate breakdown, human rights, energy, RPearshouse, HRW, UN_HRC
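
Extracted keywords can be sanity-checked against the source tweet. The sketch below flags keywords whose letters and digits do not appear in the tweet once case, spaces and punctuation are ignored, so `earth day` still matches `#EarthDay`. The function names are assumptions, and the keyword `recycling` is a hypothetical value added to show a miss. Because all separators are removed, the check can occasionally match across word boundaries, so it is a rough filter only.

```python
import re


def _normalize(s):
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", s.lower())


def ungrounded_keywords(tweet, keywords):
    """Return the keywords that cannot be found in the tweet text."""
    haystack = _normalize(tweet)
    return [k for k in keywords if _normalize(k) not in haystack]


tweet = ("RT @420iloveweed: Happy earth day, remember if you shower "
         "with a friend, you can save water....")
keywords = ["earth day", "shower", "save water", "recycling"]
missing = ungrounded_keywords(tweet, keywords)
print(missing)
```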


## Spell-Check

In [37]:
prompt = f"""Proofread and correct the following tweets and rewrite the corrected version per tweet.
If you find an error, write "Error corrected" after the corrected sentence.
If you don't find any errors, just say "No errors found" after the sentence for which no error was found:
---
Tweets:
{list(df_sample['text'])}
---
"""


response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

1. Happy Earth Day! Remember, if you shower with a friend, you can save water. Error corrected

2. Happy Earth Day! 🌎 As we celebrate #EarthDay, #KuCoin is joining hands with @EarthFund_io to give away 20,000 1EARTH! Follow @kucoincom @EarthFund_io RT & comment your thoughts on making Earth a better place. 5 winners will each receive 4,000 1EARTH! https://t.co/Mao0As00pD. Error corrected

3. Earth Day every day! https://t.co/nyC6hlPfuT. No errors found

4. Climate change is very real, but so is eco-anxiety. If you're feeling stressed this #EarthDay, try our @BBCRadio3 series Into the Wild: five city-dwelling writers head into the countryside to explore their relationship with nature and nature writing. https://t.co/4r8ds5Arl0. No errors found

5. OPINION: #EarthDay is a wake-up call for how our use of oil, coal, and gas is driving a global #climate breakdown that violates the right to a clean and healthy environment, writes @RPearshouse of @hrw | @UN_HRC #humanrights #energy https://t.

In [38]:
prompt = f"""Proofread and rewrite the corrected version per tweet without hashtags, URLs and account tags.
If you find an error, write "Error corrected" after the corrected sentence.
If you don't find any errors, just say "No errors found" after the sentence for which no error was found:
---
Tweets:
{list(df_sample['text'])}
---
"""


response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

Happy Earth Day, remember if you shower with a friend, you can save water.
Error corrected

Happy Earth Day! As we celebrate Earth Day, KuCoin is joining hands with EarthFund_io to give away 20,000 1EARTH! Follow KuCoin EarthFund_io RT & comment your thoughts on making Earth a better place 5 winners will each receive 4,000 1EARTH!

Earth Day every day!
Error corrected

Climate change is very real but so is eco-anxiety. If you're feeling stressed this Earth Day, try our BBCRadio3 series Into the Wild: five city-dwelling writers head into the countryside to explore their relationship with nature and nature writing.
Error corrected

OPINION: Earth Day is a wake-up call for how our use of oil, coal and gas is driving a global climate breakdown that violates the right to a clean and healthy environment, writes RPearshouse of HRW UN_HRC humanrights energy
Error corrected
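
Removing hashtags, URLs and account tags does not have to be left to the model. A minimal sketch of doing it with regular expressions before the tweets go into the prompt is shown below. The function name `clean_tweet` is an assumption, and so are the design choices: account tags are dropped entirely, while hashtags keep their word and lose only the `#`. This differs slightly from the model output above, which kept account names as plain words.

```python
import html
import re


def clean_tweet(text):
    """Strip URLs, retweet prefixes, account tags and '#' symbols from a tweet."""
    text = html.unescape(text)                    # e.g. &amp; -> &
    text = re.sub(r"https?://\S+", "", text)      # drop URLs
    text = re.sub(r"^RT\s+@\w+:\s*", "", text)    # drop a leading "RT @user:"
    text = re.sub(r"@\w+", "", text)              # drop remaining account tags
    text = text.replace("#", "")                  # keep hashtag word, drop symbol
    return re.sub(r"\s+", " ", text).strip()      # collapse whitespace


cleaned = [
    clean_tweet("Earth Day every day! https://t.co/nyC6hlPfuT"),
    clean_tweet("RT @420iloveweed: Happy earth day, remember if you shower "
                "with a friend, you can save water...."),
]
for c in cleaned:
    print(c)
```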


## Multilingual Applications

Data Source: https://www.kaggle.com/datasets/skylord/dutch-tweets/data

### Language Detection

In [46]:
prompt = f"""
What language is this:
```
1. De droom van D66 wordt werkelijkheid: COVID-19 superdodelijk voor ouderen
2. Mag hier licht op gegrinnikt worden? Of is dat niet toegestaan?
```
"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

This appears to be Dutch.


### Translation

In [47]:
prompt = f"""
Translate the following into English:
```
1. De droom van D66 wordt werkelijkheid: COVID-19 superdodelijk voor ouderen
2. Mag hier licht op gegrinnikt worden? Of is dat niet toegestaan?
```
"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

1. The dream of D66 is coming true: COVID-19 super deadly for the elderly
2. Can we chuckle about this? Or is that not allowed?


### Transliteration

Dataset: https://www.kaggle.com/datasets/parthplc/hindi-to-hinglish

In [49]:
# These texts are Bengali and Hindi written in Latin letters, based on phonetic similarity
# Romanized text like this is very common in social media analysis

prompt = f"""
Identify the language of the following texts and translate them to English:
1. tini niyonsign o signboard toirir byabsa korten.
2. Chobi mukti kom howar onyotomo karon chilo, nirbhorjogyo shilpir shonkote chilen projojokera.
3. beriliyam oxaid ek akaarbanik yaugik hai.
4. durdarshi kala - ernst fuchs, pole laifole, michael boven
5. prashikshan, pralekhan, prakashan evam antarashtriya sampark ityadi.

Use the following format:
text | language | translated text
"""

response = client.chat.completions.create(
    model = model_name,
    messages=[{
        "role": "user",
        "content": prompt
    }],
    temperature=0, # Deterministic Response
    n=1  #default
)

print(response.choices[0].message.content)

1. tini niyonsign o signboard toirir byabsa korten. | Bengali | They are doing business by making neon signs and signboards.
2. Chobi mukti kom howar onyotomo karon chilo, nirbhorjogyo shilpir shonkote chilen projojokera. | Bengali | The reason for the decline in the freedom of expression through images was the fear of artists in the totalitarian regime.
3. beriliyam oxaid ek akaarbanik yaugik hai. | Hindi | Beryllium oxide is a ceramic material.
4. durdarshi kala - ernst fuchs, pole laifole, michael boven | Hindi | Visionary art - Ernst Fuchs, Paul Laffoley, Michael Boven
5. prashikshan, pralekhan, prakashan evam antarashtriya sampark ityadi. | Hindi | Training, writing, publishing, and international communication, etc.
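
The three-column `text | language | translated text` format can be turned into records for further analysis, for example to count the detected languages. The sketch below is one way to do that under the assumption that neither the text nor the translation contains the ` | ` separator; lines that do not split into exactly three parts are skipped. The function name `parse_translation_rows` is hypothetical, and the sample reuses three rows from the output above.

```python
import re
from collections import Counter


def parse_translation_rows(output):
    """Parse 'text | language | translated text' lines into dictionaries."""
    rows = []
    for line in output.splitlines():
        parts = [p.strip() for p in line.split(" | ")]
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        text, language, translation = parts
        text = re.sub(r"^\d+\.\s*", "", text)  # drop the leading "1. " numbering
        rows.append({"text": text, "language": language, "translation": translation})
    return rows


sample_output = """1. tini niyonsign o signboard toirir byabsa korten. | Bengali | They are doing business by making neon signs and signboards.
3. beriliyam oxaid ek akaarbanik yaugik hai. | Hindi | Beryllium oxide is a ceramic material.
5. prashikshan, pralekhan, prakashan evam antarashtriya sampark ityadi. | Hindi | Training, writing, publishing, and international communication, etc.
"""
rows = parse_translation_rows(sample_output)
language_counts = Counter(row["language"] for row in rows)
print(language_counts)
```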
