## Text Classification

LLMs can take the place of traditional machine learning models for text classification. Unlike conventional NLP models, which require pre-labeled training data or a predefined vocabulary of words or n-grams (i.e., feature engineering), LLMs can label text-like data through zero-shot or few-shot learning.

Examples:
1) Sentiment analysis of Yelp reviews
2) Categorization of customer support requests (refund, complaint, login issues, etc.)
3) Spam filtering
4) Detection of hate speech or inappropriate speech on social media
5) Topic identification of articles

### Code walkthrough: labeling sentiment of Tweets regarding a specific event or hashtag

For this example, we'll assume that the Tweets (or x-eets?) have already been mined from the API. Obtaining or purchasing an API key and interacting with the Twitter/X API is beyond the scope of this example.

Below are some tweets containing the hashtags #GOT or #GameOfThrones, pulled from around the time the series finale of Game of Thrones aired on May 19, 2019. This eight-season HBO series was a defining part of the cultural zeitgeist of the last decade and, despite fantastic reviews for its first seven seasons, had a very controversial ending.



In [1]:
tweets = [
'''
Can’t believe #GameOfThrones is coming to an end 😭. 
This season will never take away how much love I have for this show man
''',
'''
Last #GameOfThrones episode tonight.  Nervous I’ll be disappointed. 
''',
'''
The more I ponder, the more I ADORE WITH A PASSION #LadyOlenna of House Tyrell.
This BADASSERY WILL NEVER BE SEEN AGAIN ON TV. #GameOfThronesFinale #GameofThrones
''',
'''
When you’re the only person at a GOT finale watch party 
that hasn’t seen one damn episode.  #me #GOT #sundaysareforwine
''',
'''
It wouldnt be so bad if they didnt make us wait an extra year. 
But they did and they fed us 6 episodes of TBS original programming quality poop! #GoT
''',
'''
Based on the uproar over the ending I'm glad I never watched #GameOfThrones
''',
'''
Rewatching #Gameofthrones finale. 
Danny’s speech was so awesome. So badass. 
And the unsullied with the Uruk-hai spear chant. Dope.
''',
'''
Bran=Dr Strange. Both knew what had to happen to save the world, 
but neither could interfere or it would disturb the timeline. 
They also used that knowledge to encourage certain situations to 
acquire the desired outcome. #GOT #AvengersEngame
''',
'''
“'Game of Thrones' and star Peter Dinklage are big wins for portrayal of little people.

Little people have always been stereotyped in movies and TV. "Game of Thrones," 
and the character Tyrion, is a breakthrough”
https://usatoday.com/story/life/2019/05/19/game-of-thrones-peter-dinklage-hero-little-people/3736538002/

#GameOfThrones  #RepresentationMatters
'''
]

What if we wanted to organize these tweets into different categories? For example, we can ask if the person writing the tweet is a die-hard fan of the show or someone who has never watched it. We can also ask if the person has a positive, negative, or neutral reaction to the series finale.

We'll use the [langchain](https://python.langchain.com/docs/get_started/introduction) library in Python to build our prompts. Langchain can interface with a number of different LLMs (including OpenAI's, which we'll use in this example) and provides ways to chain prompts together to get the desired output.

You can install the langchain module and the openai module it relies on using pip.

```bash
pip install langchain
pip install openai
```

You will also want to create an environment variable containing your OpenAI API token.

```bash
export OPENAI_API_KEY=YOUR-TOKEN-HERE
```

You can also do this in Python.
```python
import os
os.environ['OPENAI_API_KEY'] = 'YOUR TOKEN HERE'
```

In [2]:
## read in api key from a file and export it as an environment variable
import os
path_to_file = os.path.expanduser('~/openai-key.txt')
with open(path_to_file, 'r') as f:
    os.environ['OPENAI_API_KEY'] = f.read().strip()
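Before sending any requests, it can help to confirm that the key was actually loaded. The following is a minimal sketch; `require_api_key` is a hypothetical helper (not part of langchain or openai), and the mapping passed in the demonstration stands in for the real environment.

```python
import os

def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the key stored under `name`, or raise if it is missing or blank."""
    # `env` defaults to the real environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or load it from a file first.")
    return key

# Demonstration with a stand-in mapping (placeholder value, not a real token).
key = require_api_key(env={"OPENAI_API_KEY": "  placeholder-token\n"})
print("key loaded, length:", len(key))
```

Calling `require_api_key()` with no arguments checks `os.environ`, so a missing key fails early with a clear message rather than on the first API call.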


##### Prompt template

We want to send a prompt to the AI that explains the task and the desired output. The exact format of the output can be a bit unpredictable, so sometimes it can be beneficial to provide the AI with an example or two to show it what we want.

Langchain provides a nice way of designing a template for prompts with one or more variables to input into the chat.

In [3]:
from langchain.prompts import PromptTemplate

template = PromptTemplate(template='''
Classify the following tweet into one of the following categories:

1. Positive 
2. Negative
3. Neutral

Return the answer as a number 1, 2, or 3.

===
Example:
Tweet: The ending of Game of Thrones was so bad. I can't believe they did that to us.

Result: 2
===

Here is the Tweet:
{tweet}
''',
input_variables=['tweet'],output_parser=None)
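The template above carries a single example. If the output format still drifts, one option is to supply several labeled examples (few-shot prompting). The sketch below builds such a prompt with plain Python string formatting so the structure is easy to see; `EXAMPLES`, `INSTRUCTIONS`, and `build_few_shot_prompt` are hypothetical names, and the second and third example tweets are invented for illustration.

```python
# Hypothetical labeled examples; in practice these would be hand-picked tweets.
EXAMPLES = [
    ("The ending of Game of Thrones was so bad. I can't believe they did that to us.", 2),
    ("What a finale! I loved every minute of it.", 1),
    ("The Game of Thrones finale airs tonight at 9.", 3),
]

INSTRUCTIONS = (
    "Classify the following tweet into one of the following categories:\n\n"
    "1. Positive\n2. Negative\n3. Neutral\n\n"
    "Return the answer as a number 1, 2, or 3.\n"
)

def build_few_shot_prompt(tweet, examples=EXAMPLES):
    """Assemble the instructions, the labeled examples, and the tweet to classify."""
    parts = [INSTRUCTIONS, "==="]
    for text, label in examples:
        parts.append(f"Example:\nTweet: {text}\n\nResult: {label}\n===")
    parts.append(f"\nHere is the Tweet:\n{tweet.strip()}")
    return "\n".join(parts)

prompt = build_few_shot_prompt("Last #GameOfThrones episode tonight.")
print(prompt)
```

The resulting string can be sent to the model in the same way as the output of `template.format(tweet=tweet)`. Including one example per category is a design choice meant to show the model every valid answer.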

In [4]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(temperature=0,model_name='gpt-3.5-turbo')

In [5]:
sentiments = []
for tweet in tweets:
    prompt = template.format(tweet=tweet)
    result = llm.predict(prompt).strip()
    sentiments.append(result)
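Because the exact format of the model's reply can vary, comparing the raw string against `'1'` or `'2'` may miss replies such as `Result: 2` or `2. Negative`. The following is a minimal sketch of a more tolerant parser; `parse_sentiment` and `LABELS` are hypothetical names, and the sample replies are made up to show the kinds of variation one might see.

```python
import re

# Maps the numeric answer requested in the prompt to a readable label.
LABELS = {"1": "positive", "2": "negative", "3": "neutral"}

def parse_sentiment(raw):
    """Return 'positive', 'negative', 'neutral', or None if no label is found."""
    # Look for the first standalone digit 1, 2, or 3 in the reply.
    match = re.search(r"\b([123])\b", raw)
    return LABELS[match.group(1)] if match else None

# Made-up replies illustrating format drift.
for raw in ["2", " 1\n", "Result: 3", "2. Negative", "I cannot tell."]:
    print(repr(raw), "->", parse_sentiment(raw))
```

This sketch takes the first standalone 1, 2, or 3 it finds, so a reply that restates all three options before answering would be misread. Returning `None` for unparseable replies makes those cases easy to spot and review by hand.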

In [6]:
print("These tweets are positive:")

for s, t in zip(sentiments, tweets):
    if s == '1':
        print(t)
        
print("=========================================")
print("These tweets are negative:")
for s, t in zip(sentiments, tweets):
    if s == '2':
        print(t)

These tweets are positive:

Can’t believe #GameOfThrones is coming to an end 😭. 
This season will never take away how much love I have for this show man


The more I ponder, the more I ADORE WITH A PASSION #LadyOlenna of House Tyrell.
This BADASSERY WILL NEVER BE SEEN AGAIN ON TV. #GameOfThronesFinale #GameofThrones


Rewatching #Gameofthrones finale. 
Danny’s speech was so awesome. So badass. 
And the unsullied with the Uruk-hai spear chant. Dope.

=========================================
These tweets are negative:

Last #GameOfThrones episode tonight.  Nervous I’ll be disappointed. 


It wouldnt be so bad if they didnt make us wait an extra year. 
But they did and they fed us 6 episodes of TBS original programming quality poop! #GoT


Based on the uproar over the ending I'm glad I never watched #GameOfThrones
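The loops above print only the tweets labeled `1` or `2`, so any tweet with another label is not shown. A minimal sketch of grouping every tweet by its label in one pass follows; `group_by_label` is a hypothetical helper, and the labels and short tweets in the demonstration are stand-ins, not model output.

```python
from collections import defaultdict

def group_by_label(labels, texts):
    """Collect texts under the label the model returned for each one."""
    groups = defaultdict(list)
    for label, text in zip(labels, texts):
        groups[label.strip()].append(text.strip())
    return dict(groups)

# Stand-in labels and tweets for demonstration only.
demo_labels = ["1", "2", "3 "]
demo_tweets = ["Loved it", "Hated it", "It aired on Sunday"]
groups = group_by_label(demo_labels, demo_tweets)

names = {"1": "positive", "2": "negative", "3": "neutral"}
for key, name in names.items():
    print(f"These tweets are {name}:")
    for text in groups.get(key, []):
        print(text)
```

With the real data, `group_by_label(sentiments, tweets)` would do the same for the nine tweets, and any key outside `1`, `2`, and `3` would point to a reply that did not follow the requested format.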

