# Label Trump Tweets

Ingest a CSV of Trump's tweets, downloaded from Junkipedia.org, and use Ollama to identify which posts are text-based vs image/media-based.

After completing this basic exercise, you can expand the analysis to perform other classification tasks, such as:

* Assign sentiment (positive, negative, neutral)
* Assign one or more topics (free-form)
* Assign one of a set of pre-set categories (e.g. politics, business, etc.)
* Perform entity extraction
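
For the pre-set category task, it can help to check each reply against the allowed labels, since a model may wrap the label in extra words or return something unexpected. A minimal sketch, assuming single-word labels; the category list and the `normalize_label` helper are hypothetical and not part of the dataset or LangChain:

```python
import re

# Hypothetical category set; replace with the categories you care about.
CATEGORIES = ("politics", "business", "other")


def normalize_label(reply: str, allowed=CATEGORIES, fallback="unknown") -> str:
    """Map a raw model reply onto one of the allowed single-word labels."""
    # Split the reply into lowercase words so punctuation and casing are ignored.
    words = set(re.findall(r"[a-z]+", reply.lower()))
    matches = [label for label in allowed if label in words]
    # Accept the reply only when it names exactly one allowed label.
    return matches[0] if len(matches) == 1 else fallback
```

Replies that mention no label, or more than one, fall back to `"unknown"` so they can be reviewed by hand.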

## Ways to experiment

Experiment with different prompts to see how well Ollama can classify the tweets.

- One big prompt that tries to do everything
- Multiple smaller prompts that do one thing each
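
The two styles above can be sketched as plain string builders. The task wording, the `TASKS` dictionary, and both function names are illustrative assumptions, not prompts taken from this exercise:

```python
# Hypothetical task instructions; the wording is illustrative only.
TASKS = {
    "sentiment": "Classify the sentiment as positive, negative, or neutral.",
    "topics": "List one or more topics covered by the tweet.",
    "entities": "List the people, organizations, and places mentioned.",
}


def combined_prompt(tweet: str) -> str:
    """One big prompt: every task in a single request."""
    steps = "\n".join(f"{i}. {text}" for i, text in enumerate(TASKS.values(), 1))
    return f"Answer each numbered task for the tweet below.\n{steps}\n\nTweet: {tweet}"


def single_task_prompts(tweet: str) -> dict[str, str]:
    """Multiple smaller prompts: one request per task, keyed by task name."""
    return {name: f"{text}\n\nTweet: {tweet}" for name, text in TASKS.items()}
```

The combined version needs one model call per tweet, while the split version needs one call per task but keeps each reply tied to a single question.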

As you perform the above work, store the results in a way that lets you link the LLM results with the original data. You can do this using a simple dictionary (demonstrated below), or by adding new columns (e.g. `text_based`, `sentiment`, `topics`, `category`). 

You may need multiple versions of each column type to capture the different results from different prompts. For example, you might have `sentiment_v1`, `sentiment_v2`, etc.
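
One way to keep versioned results linked to the original rows is to key each result dictionary by `PostId` and map it onto a new column. This is a sketch on a toy DataFrame; the ids and labels are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the tweets DataFrame; only the PostId column matters here.
df = pd.DataFrame({"PostId": [101, 102, 103]})

# Hypothetical results from two prompt versions, keyed by PostId.
sentiment_v1 = {101: "positive", 102: "negative"}
sentiment_v2 = {101: "neutral", 102: "negative", 103: "positive"}

# Series.map looks up each PostId in the dictionary; ids with no result become NaN.
df["sentiment_v1"] = df["PostId"].map(sentiment_v1)
df["sentiment_v2"] = df["PostId"].map(sentiment_v2)

# Rows where both versions produced a label but the labels differ.
both = df["sentiment_v1"].notna() & df["sentiment_v2"].notna()
disagreements = df[both & (df["sentiment_v1"] != df["sentiment_v2"])]
```

Filtering for disagreements gives a short list of tweets to inspect when comparing prompt versions.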

## Prompt Templates

Use chat [Prompt Templates](https://python.langchain.com/docs/concepts/prompt_templates/) as part of your workflow: they let you inject the text of each post into a prompt, for example inside a `for` loop.

*See LangChain's [Prompt Templates guide](https://python.langchain.com/docs/how_to/#prompt-templates) for more details.*

## Structured output

As your prompts get more sophisticated and the resulting output more complex, you may also want to use LangChain's [structured output](https://python.langchain.com/docs/how_to/structured_output/) feature to parse the results into a structured format.
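
Before adopting LangChain's helper, the same idea can be tried by hand: ask the model to reply in JSON and parse that reply into a typed record. The sketch below uses only the standard library and is not LangChain's structured-output API. `TweetLabels`, `parse_labels`, and the JSON shape are assumptions about what you prompt the model to return:

```python
import json
from dataclasses import dataclass, field

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass
class TweetLabels:
    sentiment: str
    topics: list[str] = field(default_factory=list)


def parse_labels(reply: str) -> TweetLabels:
    """Parse a JSON reply such as {"sentiment": "negative", "topics": ["courts"]}."""
    payload = json.loads(reply)  # raises json.JSONDecodeError on malformed JSON
    sentiment = str(payload["sentiment"]).strip().lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment!r}")
    # A missing "topics" key is treated as an empty list.
    topics = [str(topic) for topic in payload.get("topics", [])]
    return TweetLabels(sentiment=sentiment, topics=topics)
```

Raising on malformed or unexpected replies makes failures visible during a batch run instead of silently storing bad labels.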


## Ingest the data

We'll start with some basic imports and then load the CSV file containing Trump's tweets.


In [15]:
import pandas as pd
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama import OllamaLLM

In [16]:
df = pd.read_csv("trump_tweets_sample.csv")

In [17]:
df.head()

Unnamed: 0,PostId,PostUrl,PostEngagement,Platform,ChannelID,ChannelName,ChannelUid,ChannelUrl,ChannelEngagement,post_body_text,...,published_at,post_data,post_media_urls,LikesCount,SharesCount,CommentsCount,ViewsCount,post_media_file,embedded_post_text,search_data
0,517096897,https://twitter.com/realDonaldTrump/status/192...,,Twitter,14001198,Donald J. Trump,blank_for_now,blank_for_now,"{""follower_count"":104924160,""following_count"":...",🇦🇪🇺🇸 https://t.co/7fRFm4zCL5,...,2025-05-17T17:31:12.000Z,post data removed,https://www.junkipedia.org/rails/active_storag...,160509,22170,9539,27827090,,,
1,517096863,https://twitter.com/realDonaldTrump/status/192...,,Twitter,14001198,Donald J. Trump,blank_for_now,blank_for_now,"{""follower_count"":104926630,""following_count"":...",🇶🇦🇺🇸 https://t.co/v1NwTQPWLO,...,2025-05-17T17:29:44.000Z,post data removed,https://www.junkipedia.org/rails/active_storag...,271165,37855,10462,39772942,,,
2,517096839,https://twitter.com/realDonaldTrump/status/192...,,Twitter,14001198,Donald J. Trump,blank_for_now,blank_for_now,"{""follower_count"":104926227,""following_count"":...",🇸🇦🇺🇸 https://t.co/i5cRnVmaFv,...,2025-05-17T17:27:58.000Z,post data removed,https://www.junkipedia.org/rails/active_storag...,289191,41948,9824,37475355,,,
3,516224256,https://twitter.com/realDonaldTrump/status/192...,,Twitter,14001198,Donald J. Trump,blank_for_now,blank_for_now,"{""follower_count"":104926494,""following_count"":...",THE SUPREME COURT IS BEING PLAYED BY THE RADIC...,...,2025-05-16T17:37:33.000Z,post data removed,,333190,52517,30389,47870948,,,
4,516211192,https://twitter.com/realDonaldTrump/status/192...,,Twitter,14001198,Donald J. Trump,blank_for_now,blank_for_now,"{""follower_count"":104924513,""following_count"":...","Republicans MUST UNITE behind, “THE ONE, BIG B...",...,2025-05-16T17:16:46.000Z,post data removed,,238192,42204,17507,33365066,,,
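
The preview suggests a cheap cross-check for the LLM's text-vs-media labels: `post_media_urls` is empty for the two long text posts and populated for the three emoji-plus-link posts. Below is a minimal sketch of a rule-based reference label. It assumes, without verification, that a non-empty `post_media_urls` value means the post carries media and that anything left after removing links counts as text. `reference_label` is a hypothetical helper, and the toy rows are made up:

```python
import re

import pandas as pd

URL_PATTERN = re.compile(r"https?://\S+")


def reference_label(row: pd.Series) -> str:
    """Label a post 'text', 'multimedia', or 'both' from its columns alone."""
    body = row["post_body_text"]
    # Treat missing bodies as empty, then drop links so a bare t.co URL is not text.
    text = URL_PATTERN.sub("", body if isinstance(body, str) else "").strip()
    has_media = pd.notna(row["post_media_urls"])
    if text and has_media:
        return "both"
    # Posts with neither text nor media fall through to 'text'.
    return "multimedia" if has_media else "text"


# Toy rows shaped like the preview above (values are made up).
sample = pd.DataFrame({
    "post_body_text": ["https://t.co/abc", "A long statement.", "Look! https://t.co/xyz"],
    "post_media_urls": ["https://example.org/a.jpg", None, "https://example.org/b.jpg"],
})
sample["reference"] = sample.apply(reference_label, axis=1)
```

Under these assumptions, the flag-emoji posts in the preview would come out as `both`, because the emoji remain after the link is removed.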


## Classify tweets by type

Now we'll demonstrate how to classify the tweets as text-based or image/media-based using an LLM served by Ollama.

Prepare a `ChatPromptTemplate` for classifying tweets as text-based or media-based. The template is used to generate a prompt for each tweet and lets you inject the tweet's text into that prompt.

In [18]:
prompt = ChatPromptTemplate.from_messages([
    ("system", (
        # Adjacent string literals are concatenated, so each line needs its own "\n"
        "You are a helpful assistant that classifies tweets. "
        "For each tweet, determine if it is:\n"
        "- Text-only - i.e. there is only text and no images or videos\n"
        "- Multimedia-only - i.e. there is no text but at least one image or video\n"
        "- Text and Multimedia - i.e. there is both text and at least one image or video\n"
        "\n"
        "Return a single word classification for each tweet: 'text', 'multimedia', or 'both'."
    )),
    ("user", "{tweet_text}"),
])

chat = OllamaLLM(model="gemma3")

Test a small subset of tweets to verify that the prompts are working as expected before running the full dataset.

In [19]:
data = {}
# Only the first five rows, to check the prompt before a full run
for _, row in df.head(5).iterrows():
    post_id = row['PostId']
    tweet_text = row['post_body_text']
    compiled_prompt = prompt.format(tweet_text=tweet_text)
    response = chat.invoke(compiled_prompt)
    # Store the result keyed by post ID so it can be linked back to the DataFrame
    data[post_id] = {
        'post_id': post_id,
        'post_body_text': tweet_text,
        'classification': response.strip()
    }

In [14]:
from pprint import pprint
pprint(data)

{516211192: {'classification': 'text',
             'post_body_text': 'Republicans MUST UNITE behind, “THE ONE, BIG '
                               'BEAUTIFUL BILL!” Not only does it cut Taxes '
                               'for ALL Americans, but it will kick millions '
                               'of Illegal Aliens off of Medicaid to PROTECT '
                               'it for those who are the ones in real need. '
                               'The Country will suffer greatly without this '
                               'Legislation, with their Taxes going up 65%. It '
                               'will be blamed on the Democrats, but that '
                               'doesn’t help our Voters. We don’t need '
                               '“GRANDSTANDERS” in the Republican Party. STOP '
                               'TALKING, AND GET IT DONE! It is time to fix '
                               'the MESS that Biden and the Democrats gave us. '
                    