<a href="https://colab.research.google.com/github/elifdonmez/disaster_tweet_analysis/blob/main/Disaster_Tweet_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install Libraries

**Pandas:** Used to load and clean the data in a DataFrame <br>
**Transformers:** Provides `BertTokenizer`, `BertForSequenceClassification`, and `pipeline` <br>
**Torch:** Tensors and dynamic neural networks in Python with GPU acceleration

In [1]:
pip install transformers pandas torch



In [2]:
# Import Libraries

import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
from datetime import datetime

In [3]:
# Load the CSV file
file_path = 'tweets/DisasterTweets.csv'
df = pd.read_csv(file_path)

Clean the Data

In [4]:
# Convert the Timestamp column to datetime
df['Timestamp'] = pd.to_datetime(df['Timestamp'])

# Rename the columns to match the original script
df.rename(columns={'Tweets': 'tweet', 'Timestamp': 'date'}, inplace=True)

# Remove duplicates
df = df.drop_duplicates()

# Remove retweets
df = df[~df['tweet'].str.startswith('RT')]
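The `startswith('RT')` filter above drops every tweet whose text begins with the letters `RT`, which can also catch ordinary tweets (for example one that starts with "RTX"). A stricter check, sketched below with only the standard library, looks for the conventional `RT @user` prefix instead. The helper name `is_retweet`, the pattern, and the sample strings are illustrative assumptions and not part of the original notebook.

```python
import re

# Conventional retweet prefix: "RT", whitespace, then an @handle.
# This pattern is an assumption about how retweets appear in the dataset.
RETWEET_PATTERN = re.compile(r"^RT\s+@\w+")


def is_retweet(tweet: str) -> bool:
    """Return True when the tweet text starts with an 'RT @user' prefix."""
    return bool(RETWEET_PATTERN.match(tweet))


# Made-up sample texts, used only to show the behaviour
samples = [
    "RT @example_user: aftershock reported",  # retweet
    "RTX cards sold out after the flood",     # starts with RT but is not a retweet
    "Flood warning issued for the coast",     # ordinary tweet
]
kept = [text for text in samples if not is_retweet(text)]
print(kept)
```

In the notebook, the same helper could be applied with `df = df[~df['tweet'].apply(is_retweet)]`, assuming every value in the `tweet` column is a string.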

Define hardcoded data

*   Current date-time
*   Current temperature

Define keywords

*   Help keywords
*   Disaster keywords
*   Cold keywords
*   Hot keywords

In [5]:
# Define current date and weather conditions
current_date = datetime(2024, 3, 4)  # Hardcoded for demonstration
current_date = current_date.replace(tzinfo=None)
current_temperature = 30  # Celsius degrees

# Keywords for classification
help_keywords = ["help", "needed", "require"]
disaster_keywords = ["drought", "flood", "wildfire", "hurricane", "disaster", "heatstroke", "earthquake",
                     "tornado", "aftershock", "hazard", "safe"]

# Define keywords for cold and hot weather help requests
cold_keywords = ['blanket', 'cold', 'freezing']
hot_keywords = ['drinkable water', 'heatstroke']

# **Preparation**

In [6]:
# Load BERT model and tokenizer for sentiment analysis
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Sentiment analysis pipeline
sentiment_analysis = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)



# **Classification**

In [7]:
# Function to classify tweet category
def classify_category(tweet):
    # Make all tweets lowercase
    tweet_lower = tweet.lower()
    # Check for help request keywords
    if any(keyword in tweet_lower for keyword in help_keywords):
        return "Help Request"

    # Check for disaster status information keywords
    if any(keyword in tweet_lower for keyword in disaster_keywords):
        return "Disaster Status Information"

    return "Irrelevant"


# Apply classification function
df['category'] = df['tweet'].apply(classify_category)

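# Filter relevant categories; take a copy so that the column assignments
# below modify an independent DataFrame instead of a slice of df
# (this avoids pandas' SettingWithCopyWarning)
relevant_categories = ["Help Request", "Disaster Status Information"]
df_relevant = df[df['category'].isin(relevant_categories)].copy()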
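The `keyword in tweet_lower` test is a substring check, so "safe" also matches "unsafe" and "help" also matches "helpless". One possible refinement, sketched below, compiles each keyword list into a regular expression with word boundaries. The names `build_pattern` and `classify_category_strict` are hypothetical, the keyword lists are based on the notebook's, and the sample tweets are made up for illustration.

```python
import re

help_keywords = ["help", "needed", "require"]
disaster_keywords = ["drought", "flood", "wildfire", "hurricane", "disaster",
                     "heatstroke", "earthquake", "tornado", "aftershock",
                     "hazard", "safe"]


def build_pattern(keywords):
    """Compile one case-insensitive regex matching any keyword as a whole word."""
    alternatives = "|".join(re.escape(keyword) for keyword in keywords)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


HELP_PATTERN = build_pattern(help_keywords)
DISASTER_PATTERN = build_pattern(disaster_keywords)


def classify_category_strict(tweet):
    """Same category order as classify_category, but with whole-word matching."""
    if HELP_PATTERN.search(tweet):
        return "Help Request"
    if DISASTER_PATTERN.search(tweet):
        return "Disaster Status Information"
    return "Irrelevant"


print(classify_category_strict("We need HELP after the flood"))  # Help Request
print(classify_category_strict("Flood waters are rising"))       # Disaster Status Information
print(classify_category_strict("That bridge looks unsafe"))      # Irrelevant
```

The trade-off is that whole-word matching no longer catches inflected forms such as "floods" or "required", so those would have to be listed explicitly if they matter.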

# **Prioritization**

In [8]:
# Function to calculate priority based on time difference
def calculate_priority_for_post_date(row):
    date = row['date'].replace(tzinfo=None)
    days_since_posted = (current_date - date).days

    if days_since_posted > 5:
        return "High Priority"
    elif days_since_posted > 2:
        return "Medium Priority"
    else:
        return "Low Priority"


# Apply priority calculation
df_relevant['priority'] = df_relevant.apply(calculate_priority_for_post_date, axis=1)


# Increase the priority based on the weather information
# of already prioritized tweets based on their post dates
def increase_priority_for_wheather(row):
    date_priority = row['priority']
    tweet = row['tweet']

    # If temperature is high, increase the priority of tweets that have hot keywords
    if current_temperature > 15:
        if date_priority == "Medium Priority" and any(keyword in tweet.lower() for keyword in hot_keywords):
            return "High Priority"
        elif date_priority == "Low Priority" and any(keyword in tweet.lower() for keyword in hot_keywords):
            return "Medium Priority"
        else:
            return date_priority
    # If temperature is low, increase the priority of tweets that have cold keywords
    elif current_temperature < 15:
        if date_priority == "Medium Priority" and any(keyword in tweet.lower() for keyword in cold_keywords):
            return "High Priority"
        elif date_priority == "Low Priority" and any(keyword in tweet.lower() for keyword in cold_keywords):
            return "Medium Priority"
        else:
            return date_priority
    # At exactly 15 degrees neither rule applies; keep the date-based priority
    # instead of implicitly returning None
    return date_priority


# Apply priority calculation
df_relevant['priority'] = df_relevant.apply(increase_priority_for_wheather, axis=1)

def decrease_priority_for_wheather(row):
    priority = row['priority']
    tweet = row['tweet']

    # If temperature is high, decrease the priority of tweets that have cold keywords
    if current_temperature > 15:
        if priority == "High Priority" and any(keyword in tweet.lower() for keyword in cold_keywords):
            return "Medium Priority"
        elif priority == "Medium Priority" and any(keyword in tweet.lower() for keyword in cold_keywords):
            return "Low Priority"
        else:
            return priority
    # If temperature is low, decrease the priority of tweets that have hot keywords
    elif current_temperature < 15:
        if priority == "High Priority" and any(keyword in tweet.lower() for keyword in hot_keywords):
            return "Medium Priority"
        elif priority == "Medium Priority" and any(keyword in tweet.lower() for keyword in hot_keywords):
            return "Low Priority"
        else:
            return priority
    # At exactly 15 degrees neither rule applies; keep the current priority
    # instead of implicitly returning None
    return priority

# Apply priority calculation
df_relevant['priority'] = df_relevant.apply(decrease_priority_for_wheather, axis=1)
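The prioritization logic in this cell can be condensed by treating the three labels as an ordered scale and moving one step up or down, clamped at the ends. The sketch below is one way to express that with only the standard library; `LEVELS`, `shift`, `date_priority`, and `adjust_for_weather` are hypothetical names, the thresholds and keyword lists are taken from the notebook, and the tweet texts are made up. It raises first and lowers second, mirroring the order in which the notebook applies its two weather functions.

```python
from datetime import datetime

LEVELS = ["Low Priority", "Medium Priority", "High Priority"]  # ordered low to high
cold_keywords = ["blanket", "cold", "freezing"]
hot_keywords = ["drinkable water", "heatstroke"]
TEMPERATURE_THRESHOLD = 15  # Celsius, the cut-off used in the notebook


def date_priority(posted, now):
    """Priority from tweet age; timedelta.days counts whole days only."""
    days = (now - posted).days
    if days > 5:
        return "High Priority"
    if days > 2:
        return "Medium Priority"
    return "Low Priority"


def shift(priority, steps):
    """Move a priority up or down the scale, clamped to the ends of LEVELS."""
    index = LEVELS.index(priority) + steps
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


def adjust_for_weather(priority, tweet, temperature):
    """Raise tweets matching the current weather, lower those matching the opposite."""
    text = tweet.lower()
    has_hot = any(keyword in text for keyword in hot_keywords)
    has_cold = any(keyword in text for keyword in cold_keywords)
    if temperature > TEMPERATURE_THRESHOLD:
        raise_it, lower_it = has_hot, has_cold
    elif temperature < TEMPERATURE_THRESHOLD:
        raise_it, lower_it = has_cold, has_hot
    else:
        return priority  # exactly at the threshold: no adjustment
    if raise_it:
        priority = shift(priority, +1)
    if lower_it:
        priority = shift(priority, -1)
    return priority


now = datetime(2024, 3, 4)
base = date_priority(datetime(2024, 2, 29, 13, 30, 7), now)
print(base)                                                             # Medium Priority
print(adjust_for_weather(base, "Heatstroke cases, water needed", 30))   # High Priority
print(adjust_for_weather(base, "Send a blanket, it is freezing", 30))   # Low Priority
```

Because `.days` truncates, a tweet posted 5 days and 18 hours before `now` still counts as 5 days old and stays at "Medium Priority".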


# **Sentiment Analysis**

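Use a BERT model for sentiment analysis. The `nlptown/bert-base-multilingual-uncased-sentiment` model loaded earlier predicts a rating from 1 to 5 stars, which the function below maps to a sentiment label.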



In [9]:
# Function to get sentiment
def get_sentiment(tweet):
    result = sentiment_analysis(tweet)
    star_rating = int(result[0]['label'].split()[0])

    if star_rating == 1:
        return "Very Negative"
    elif star_rating == 2:
        return "Negative"
    elif star_rating == 3:
        return "Neutral"
    elif star_rating == 4:
        return "Positive"
    elif star_rating == 5:
        return "Very Positive"


# Apply sentiment analysis to relevant tweets
df_relevant['sentiment'] = df_relevant['tweet'].apply(get_sentiment)
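The `if`/`elif` chain in `get_sentiment` is a fixed mapping from star rating to label, so it can also be written as a dictionary lookup. The sketch below works on label strings directly, so it runs without downloading the model; it assumes the labels have the form "1 star" through "5 stars", as the `split()[0]` parsing in the notebook implies. `STAR_TO_SENTIMENT` and `label_to_sentiment` are hypothetical names, and the "Unknown" fallback is an added assumption (the original function returns `None` for an unexpected rating).

```python
# Star rating to sentiment label, same mapping as get_sentiment
STAR_TO_SENTIMENT = {
    1: "Very Negative",
    2: "Negative",
    3: "Neutral",
    4: "Positive",
    5: "Very Positive",
}


def label_to_sentiment(label):
    """Convert a label such as '4 stars' into a sentiment name."""
    stars = int(label.split()[0])  # leading number of the label
    return STAR_TO_SENTIMENT.get(stars, "Unknown")


print(label_to_sentiment("1 star"))   # Very Negative
print(label_to_sentiment("4 stars"))  # Positive
```

Inside `get_sentiment`, the lookup would be called as `label_to_sentiment(result[0]['label'])`.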



# **Display Results**

In [10]:
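# Select the desired columns. df_relevant is a row subset of df, so it already
# carries 'Verified' and 'Disaster'; selecting them directly avoids the
# duplicate rows that a merge on a non-unique 'date' would create
df_result = df_relevant[['date', 'tweet', 'category', 'priority', 'sentiment', 'Verified', 'Disaster']]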

# Write the result to a new CSV file
output_file_path = 'tweets/DisasterTweets_Analyzed.csv'
df_result.to_csv(output_file_path, index=False)

print(f'Results have been saved to {output_file_path}')

Results have been saved to tweets/DisasterTweets_Analyzed.csv


In [11]:
# Display the cleaned, filtered, and analyzed DataFrame
print(df_relevant)

                                            Name          UserName  \
0                                 Drought Center    @DroughtCenter   
1                       Prabhakar Goud Kurmimdla  @PrabhakarGoud_K   
2                   Humanity First International          @HFI1995   
3     NCWQ Worldwide News And Disasters Explorer    @RTheExplorer1   
4                                  BestDealsEver  @MilwaukeeHotBuy   
...                                          ...               ...   
2553                                        JYHK         @JYHKeung   
2554                            Mark R. Sheridan  @DisasterLessons   
2556                           Earthquake Alerts      @QuakesToday   
2557                     Trader PhD Ag Marketing        @TraderPhD   
2558                             Giuseppe Forino     @G_leipheimer   

                          date  Verified  \
0    2024-02-29 13:30:07+00:00     False   
1    2024-02-27 05:20:43+00:00     False   
2    2024-03-03 07:03:34+00