# Overview

## Purpose

The Global Football Marketing Team asked us for insight into consumer sentiment toward football kit releases from NB and competitor teams. They requested two sentiment metrics: positive/negative/neutral ratings for the kit release across all platforms (Facebook, Instagram, and Twitter), and positive/negative/neutral ratings for only those comments that refer to the launch format.

## Limitations

Sprinklr, our social media tool, has the following limitations:
1. It does not collect Instagram comments.
2. It collects Facebook comments incompletely (comments are missing at random).
3. Queries must use specific keywords, and those keywords must appear in every response, including replies to a post. We therefore cannot get all replies and comments for a particular topic; we only get the posts that contain those keywords.
4. Comparative testing showed that its built-in sentiment analysis tool is far weaker than we would like. The model also struggles with multilingual input, and we require seven languages.

## Solutions

We complete the sentiment analysis and overcome limitations using the following steps:
1. We manually scrape all Instagram comments from announcement posts
2. We use a web scraper to pull all replies to Facebook posts found by Sprinklr
3. We perform two pulls for Twitter posts to ensure we get all replies. Pull 1 gets all posts, and pull 2 uses those post IDs to search for conversations related to those posts
4. We use a pre-trained multilingual transformer sentiment analysis model. We translate text when necessary for a holistic view of reviews

# Overall Sentiment Analysis

### Import modules

In [None]:
import pandas as pd
import warnings
from tqdm import tqdm
import yaml
from helpers import *

# display all pandas columns
pd.set_option("display.max_columns", None)

# Ignore warnings
warnings.filterwarnings("ignore")

# progress apply
tqdm.pandas()

In [None]:
# Read settings from the YAML config
with open("config.yaml", "r") as file:
    config = yaml.safe_load(file)

max_fb_comments = config["max_fb_comments"]
announcement_keywords = config["announcement_keywords"]

### Read Data

In [None]:
# Set the hardcoded constants of the current club
color = "green"
club = "chelsea"
twitter_post_1_link = "https://x.com/ChelseaFC/status/1812744086866088313"
twitter_post_2_link = "https://x.com/ChelseaFC/status/1817818267706155029"

In [None]:
# Read in and confirm all data
facebook_comments_link = f"./football_kits_data/{color}/{club}/facebook_comments.txt"
instagram_comments_link = f"./football_kits_data/{color}/{club}/instagram_comments.txt"
all_platforms = pd.read_excel(
    f"./football_kits_data/{color}/{club}/all_plats.xlsx"
)  # Sprinklr data

# Separate by platform
df_instagram = all_platforms[all_platforms["SocialNetwork"] == "INSTAGRAM"]
df_twitter = all_platforms[all_platforms["SocialNetwork"] == "TWITTER"]
df_facebook = all_platforms[all_platforms["SocialNetwork"] == "FACEBOOK"]

# check if the posts are in the data
platforms = {"Facebook": df_facebook, "Twitter": df_twitter, "Instagram": df_instagram}
for platform, df in platforms.items():
    if df.shape[0] == 0:
        print(f"No {platform} posts found")

### Format Twitter

In [None]:
# Get the replies query of the top tweets
replies = get_twitter_replies_query(
    df_twitter, twitter_post_1_link, twitter_post_2_link
)
print(replies)
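
The implementation of `get_twitter_replies_query` lives in `helpers` and is not shown here. As a rough illustration of the second pull, the sketch below extracts the status ID from each post link and joins one filter per post. The function names are hypothetical, and the `conversation_id:` operator is borrowed from the X/Twitter search syntax as an assumption; the syntax Sprinklr actually expects may differ.

```python
import re

# Matches the numeric status ID in twitter.com / x.com post links
_STATUS_ID = re.compile(r"/status/(\d+)")


def tweet_id_from_url(url: str) -> str:
    """Return the status ID embedded in a post URL."""
    match = _STATUS_ID.search(url)
    if match is None:
        raise ValueError(f"No status ID found in {url!r}")
    return match.group(1)


def build_replies_query(urls: list[str]) -> str:
    """Join one conversation filter per post with OR (assumed query syntax)."""
    return " OR ".join(f"conversation_id:{tweet_id_from_url(u)}" for u in urls)


print(
    build_replies_query(
        [
            "https://x.com/ChelseaFC/status/1812744086866088313",
            "https://x.com/ChelseaFC/status/1817818267706155029",
        ]
    )
)
```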

In [None]:
# Clean and analyze the sentiment of the replies
twitter_comments = pd.read_excel(
    f"./football_kits_data/{color}/{club}/twitter_comments.xlsx"
)
twitter_comments = clean_twitter_replies(twitter_comments)
twitter_comments["Sentiment_XLM"] = twitter_comments["Message"].progress_apply(
    get_sentiment
)

### Format Instagram

In [None]:
# Keep only Instagram posts from the Sprinklr export, then read the manually scraped comments
df_instagram = df_instagram[df_instagram["MessageType"] == "Instagram Post"]
with open(instagram_comments_link, "r") as file:
    instagram = file.readlines()

In [None]:
# Extract all comments
insta_comments_ann = set(format_instagram_comments(instagram))
insta_comments_up = set(df_instagram["Message"])
df_instagram_unique_comments = list(insta_comments_ann.union(insta_comments_up))

In [None]:
# Analyze the sentiment of the instagram comments
df_instagram_comments = pd.DataFrame(df_instagram_unique_comments, columns=["Comment"])
df_instagram_comments["Sentiment_XLM"] = df_instagram_comments[
    "Comment"
].progress_apply(get_sentiment)

### Format Facebook

In [None]:
# get cookie jar
cookie_jar = get_cookie_jar("cookies.txt")
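
`get_cookie_jar` comes from `helpers` and its implementation is not shown. Assuming `cookies.txt` is a browser export in the Netscape cookie-file format, a minimal sketch using only the standard library could look like this; `load_cookie_jar` is a hypothetical name.

```python
from http.cookiejar import MozillaCookieJar


def load_cookie_jar(path: str) -> MozillaCookieJar:
    """Load cookies exported in Netscape cookies.txt format (hypothetical helper)."""
    jar = MozillaCookieJar(path)
    # Keep session cookies and expired cookies instead of silently skipping them
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar
```

The returned jar can be handed to an HTTP client that accepts a `http.cookiejar.CookieJar`. Because the file holds session credentials, it should stay out of version control.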

In [None]:
# Get all facebook posts/messages
fb_post_comments = set(
    [
        comment
        for post_id in get_facebook_post_ids(df_facebook)
        for comment in get_facebook_post_comments(post_id, max_fb_comments, cookie_jar)
    ]
)
fb_posts = list(df_facebook["Message"].unique())
fb_comments = extract_facebook_comments(facebook_comments_link)
# Union of scraped comments, Sprinklr messages, and manually saved comments
df_facebook_unique_comments = list(fb_post_comments.union(fb_posts, fb_comments))

In [None]:
# Analyze the sentiment of the facebook comments
fb_comments_df = pd.DataFrame(df_facebook_unique_comments, columns=["Comment"])
fb_comments_df.drop_duplicates(subset=["Comment"], inplace=True)
fb_comments_df["Sentiment_XLM"] = fb_comments_df["Comment"].progress_apply(
    get_sentiment
)

### Aggregate Sources

In [None]:
# Combine all comments and sentiment scores
twitter_comments["SocialNetwork"] = "TWITTER"
df_instagram_comments["SocialNetwork"] = "INSTAGRAM"
fb_comments_df["SocialNetwork"] = "FACEBOOK"


all_comments = pd.concat(
    [twitter_comments, df_instagram_comments, fb_comments_df], ignore_index=True
)

In [None]:
# Print the sample size and sentiment breakdown for each platform and overall
print(f"\n\nTwitter Comments: {twitter_comments.shape[0]}")
print_stats_XLM(twitter_comments)

print(f"\n\nFacebook Comments: {fb_comments_df.shape[0]}")
print_stats_XLM(fb_comments_df)

print(f"\n\nInstagram Comments: {df_instagram_comments.shape[0]}")
print_stats_XLM(df_instagram_comments)

print(f"\n\nOverall Comments: {all_comments.shape[0]}")
print_stats_XLM(all_comments)
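
`print_stats_XLM` is defined in `helpers` and is not shown here. Presumably it reports the share of each sentiment label; the sketch below illustrates that calculation with the standard library only. The function name and the label strings are assumptions.

```python
from collections import Counter


def sentiment_breakdown(labels: list[str]) -> dict[str, float]:
    """Return each label's share of the total as a percentage."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.most_common()}


# Illustrative labels, not real results
sample = ["positive", "negative", "positive", "neutral"]
print(sentiment_breakdown(sample))
```

On a DataFrame, `df["Sentiment_XLM"].value_counts(normalize=True) * 100` gives the same percentages.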

# Announcement Posts

## Announcement post data

### Format Twitter

In [None]:
twitter = pd.read_excel(
    f"./football_kits_data/{color}/{club}/twitter_comments_announcement.xlsx"
)

twitter = clean_twitter_replies(twitter)

In [None]:
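# Clean the replies, then translate them for keyword matching.
# Sentiment is scored on the cleaned original-language text, since the model is multilingual.
twitter["Message"] = twitter["Message"].progress_apply(clean_text)
twitter["Message_translated"] = twitter["Message"].progress_apply(translate_text)
twitter["Sentiment_XLM"] = twitter["Message"].progress_apply(get_sentiment)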
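
`clean_text` is another `helpers` function whose body is not shown. As an illustration of the kind of normalization that usually precedes translation and sentiment scoring, the sketch below strips URLs and @-mentions and collapses whitespace. The name `clean_text_sketch` and the exact rules are assumptions; the real helper may do more or less.

```python
import re

_URL = re.compile(r"https?://\S+|www\.\S+")
_MENTION = re.compile(r"@\w+")
_WHITESPACE = re.compile(r"\s+")


def clean_text_sketch(text: str) -> str:
    """Remove URLs and @-mentions, then collapse runs of whitespace."""
    text = _URL.sub(" ", text)
    text = _MENTION.sub(" ", text)
    return _WHITESPACE.sub(" ", text).strip()


print(clean_text_sketch("@ChelseaFC love the new kit!  https://x.com/abc"))
```

Note that the mention pattern also matches the domain part of an email address, which is acceptable for social media comments but not for general text.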

### Format Instagram

In [None]:
# Read in insta comments
with open(
    f"./football_kits_data/{color}/{club}/instagram_comments_announcement.txt", "r"
) as file:
    instagram = file.readlines()

# Extract the comments from the instagram post
comments = format_instagram_comments(instagram)

In [None]:
# Translate and clean the comments
comments = list(
    set([clean_text(comment) for comment in tqdm(comments, desc="Cleaning text")])
)
translated_comments = [
    translate_text(comment) for comment in tqdm(comments, desc="Translating text")
]

In [None]:
# create a dataframe and get the sentiment
instagram_df = pd.DataFrame(comments, columns=["Message"])
instagram_df["Sentiment_XLM"] = instagram_df["Message"].progress_apply(get_sentiment)
instagram_df["Message_translated"] = translated_comments

### Format Facebook

In [None]:
facebook_comments_announcement_link = (
    f"./football_kits_data/{color}/{club}/facebook_comments_announcement.txt"
)
fb_comments = extract_facebook_comments(facebook_comments_announcement_link)
df_facebook_unique_comments = list(set(fb_comments))


df_facebook_unique_comments = [
    clean_text(comment)
    for comment in tqdm(df_facebook_unique_comments, desc="Cleaning text")
]

df_facebook_unique_comments_translated = [
    translate_text(comment)
    for comment in tqdm(df_facebook_unique_comments, desc="Translating text")
]

In [None]:
# Analyze the sentiment of the facebook comments
fb_comments_df = pd.DataFrame(df_facebook_unique_comments, columns=["Comment"])
fb_comments_df.drop_duplicates(subset=["Comment"], inplace=True)
fb_comments_df["Sentiment_XLM"] = fb_comments_df["Comment"].progress_apply(
    get_sentiment
)
fb_comments_df["Message_translated"] = df_facebook_unique_comments_translated

## Announcement Sentiment

In [None]:
# Twitter Sentiment
print(f"Twitter Volume: {twitter.shape[0]}")
print("Twitter Sentiment")
twitter["Sentiment_XLM"].value_counts() / twitter.shape[0] * 100

In [None]:
print(f"Insta Volume: {instagram_df.shape[0]}")
print("Insta Sentiment")
instagram_df["Sentiment_XLM"].value_counts() / instagram_df.shape[0] * 100

In [None]:
print(f"FB Volume: {fb_comments_df.shape[0]}")
print("FB Sentiment")
fb_comments_df["Sentiment_XLM"].value_counts() / fb_comments_df.shape[0] * 100

In [None]:
all_sentiment = pd.concat([twitter, instagram_df, fb_comments_df], ignore_index=True)
print(f"Aggregate Volume: {all_sentiment.shape[0]}")
print("Aggregate Sentiment")
all_sentiment["Sentiment_XLM"].value_counts() / all_sentiment.shape[0] * 100

In [None]:
# Drop rows without a translation; keyword matching below needs translated text
all_sentiment.dropna(subset=["Message_translated"], inplace=True)

## Aspect-Based Sentiment Analysis (ABSA)

In [None]:
# Segment announcement comments
all_sentiment["Announcement"] = all_sentiment["Message_translated"].progress_apply(
    lambda x: perform_absa(x, announcement_keywords)
)
print("Announcement Volume")
print(all_sentiment["Announcement"].value_counts())
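
`perform_absa` is defined in `helpers`. The cells that follow use its result as a boolean mask, so it appears to flag whether a comment mentions the announcement. A minimal keyword-matching sketch under that assumption is shown below; `mentions_any_keyword` is a hypothetical name and the real helper may use a different matching rule.

```python
import re


def mentions_any_keyword(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in the text as a whole word or phrase."""
    if not keywords:
        return False
    # Escape each keyword so punctuation in it is matched literally
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(mentions_any_keyword("What a Launch video!", ["launch", "reveal"]))
print(mentions_any_keyword("relaunching soon", ["launch"]))
```

Word boundaries keep "launch" from matching inside "relaunching". Whether that stricter behavior is wanted depends on the keyword list in `config.yaml`.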

In [None]:
# Announcement Sentiment
print("Announcement Sentiment")
all_sentiment[all_sentiment["Announcement"]][
    "Sentiment_XLM"
].value_counts() / all_sentiment[all_sentiment["Announcement"]].shape[0] * 100

In [None]:
# Non-Announcement Sentiment
print("Non-Announcement Sentiment")
all_sentiment[~all_sentiment["Announcement"]][
    "Sentiment_XLM"
].value_counts() / all_sentiment[~all_sentiment["Announcement"]].shape[0] * 100

In [None]:
# Save comments
all_sentiment[all_sentiment["Announcement"]]["Message_translated"].to_csv(
    f"./football_kits_data/{color}/{club}/announcement_comments.csv", index=False
)