I recently got familiar with the Apple ecosystem after receiving a new iPhone from my mom. (Thanks, mom!) I have had a MacBook Pro since college and now have an iPhone that works seamlessly with it. So watching Apple's WWDC 2024 was the natural next step to keep track of their newest offerings.

Goal: NLP/Sentiment Analysis on WWDC 2024 - what people liked, which releases were the most talked about, etc.

This analysis should give some insight into whether Apple is still in the game, since many people have seen it as behind on the AI front.

In [1]:
! pip install praw bertopic --quiet

In [2]:
import os
from pprint import pprint

# APIs
import googleapiclient.discovery #YouTube
import googleapiclient.errors
import praw # Reddit
from praw.models import MoreComments
from kaggle_secrets import UserSecretsClient

# Data Manipulations
import numpy as np 
import pandas as pd
pd.set_option('display.max_colwidth', None)
import json
from datetime import datetime 

## Generate different embeddings
import tensorflow
import tensorflow_hub as hub

# Topic Modeling
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer, util
from umap import UMAP
from hdbscan import HDBSCAN
from bertopic.vectorizers import ClassTfidfTransformer
from bertopic.representation import MaximalMarginalRelevance

from sklearn.feature_extraction.text import CountVectorizer

# Sentiment Analysis
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline


# Get the data (YouTube and Reddit)
If there is not enough data, augment data with synonyms

API Links:
* [YouTube](https://console.cloud.google.com/apis/credentials?project=festive-zoo-404500)
* [Reddit](https://ssl.reddit.com/prefs/apps/)

YouTube Videos:
1. https://www.youtube.com/watch?v=qkYoBNdcXBU
1. https://www.youtube.com/watch?v=p2dhZ3AoDDs


Reddit threads:
1. https://www.reddit.com/r/apple/comments/1dct23m/wwdc_2024_postevent_megathread/
1. https://www.reddit.com/r/iOSProgramming/comments/1dcmmsm/wwdc_2024_megathread/
1. https://www.reddit.com/r/apple/comments/1de4qkn/what_are_your_biggest_takeaways_from_wwdc_2024_so/

In [3]:
user_secrets = UserSecretsClient()

In [4]:
# YouTube credentials
api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = user_secrets.get_secret("youtube_apikey")

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY)

In [5]:
def getcomments(video):
    """Return every top-level comment of a YouTube video as a DataFrame."""
    comments = []
    page_token = None

    while True:
        # Request one page of comment threads (maxResults=100 per request).
        params = dict(part="snippet", videoId=video, maxResults=100)
        if page_token:
            params['pageToken'] = page_token
        response = youtube.commentThreads().list(**params).execute()

        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append([
                comment['authorDisplayName'],
                comment['publishedAt'],
                comment['likeCount'],
                comment['textOriginal'],
                comment['videoId']
            ])

        # A missing nextPageToken means this was the last page.
        page_token = response.get('nextPageToken')
        if page_token is None:
            break

    df0 = pd.DataFrame(comments, columns = ['author', 'published_at', 'like_count', 'text', 'video_id'])
    return df0
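
The paging logic can be checked without spending API quota. The sketch below is a stand-in, not the YouTube client: `FAKE_PAGES`, `fetch_page`, and `collect_all_items` are hypothetical names, and the only thing assumed about a response is that it carries an `items` list plus an optional `nextPageToken`.

```python
# Hypothetical stand-in for the API: each page has 'items' and maybe 'nextPageToken'.
FAKE_PAGES = {
    None: {"items": ["a", "b"], "nextPageToken": "p2"},
    "p2": {"items": ["c", "d"], "nextPageToken": "p3"},
    "p3": {"items": ["e"]},  # last page: no nextPageToken
}


def fetch_page(page_token=None):
    """Return one fake response page (assumed shape, not a real API call)."""
    return FAKE_PAGES[page_token]


def collect_all_items(fetch):
    """Follow nextPageToken until it disappears, gathering every item."""
    items = []
    page_token = None
    while True:
        response = fetch(page_token)
        items.extend(response["items"])
        page_token = response.get("nextPageToken")
        if page_token is None:
            break
    return items


all_items = collect_all_items(fetch_page)
print(all_items)
```

Keeping the loop separate from the request makes it easy to confirm that the last page is not dropped and that the loop stops once the token is missing.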

In [6]:
# Get all WWDC 2024 comments from YouTube

# vids = [MKBHD, The Verge]
vids = ["qkYoBNdcXBU", "p2dhZ3AoDDs"]

df0 = pd.DataFrame()

for v in vids:
    df1 = getcomments(v)
    df0 = pd.concat([df0, df1]) # Combine all df's together into one big dataframe

df0.sample(5)

Unnamed: 0,author,published_at,like_count,text,video_id
9416,@finminder2928,2024-06-11T07:38:20Z,0,"30K views in 9 minutes, bro fell off",qkYoBNdcXBU
6709,@amoryerenhouse5535,2024-06-11T10:59:13Z,1,"Nahh, they did not just seriously copy Pixel Material UI but do it worse, I bet the iOS users are gonna call ✨revolutionary✨",qkYoBNdcXBU
515,@swapnilsen451,2024-06-14T07:45:32Z,0,"But why does my iPhone 15 Plus need the latest silicon to ping ChatGPT? If it cannot run the on device model, just let us ping ChatGPT at least.",qkYoBNdcXBU
6731,@speedyf40,2024-06-11T10:56:17Z,0,"I'm an Android user who is rounding up my 40s rapidly, I'm stoked for AImojis. I can't wait till we get them in Android",qkYoBNdcXBU
4652,@Kaotix_music,2024-06-11T14:50:32Z,0,"5:17 - i swear that was my first thought with hiding apps and requiring face ID for certain apps 😂😂😂 Like, Apple - theres only one reason why someone would want those features.\nI did download both iOS 18 Developer Beta last night and Siqoia and i was upset they left out the iphone mirroring feature in the new mac OS",qkYoBNdcXBU


In [7]:
# Proper date formats
df0['published_at'] = pd.to_datetime(df0['published_at'], format='%Y-%m-%dT%H:%M:%SZ')
df0['published_at_date'] = df0['published_at'].dt.date
df0['published_at_month'] = df0['published_at'].dt.month

In [8]:
# Map video IDs to the channels that published them
df0['video_id'] = df0['video_id'].map({'qkYoBNdcXBU' : 'MKBHD', 'p2dhZ3AoDDs' : 'The Verge'})
df0.rename(columns = {'video_id':'youtube_channel'}, inplace=True)
df0.sample(5)

Unnamed: 0,author,published_at,like_count,text,youtube_channel,published_at_date,published_at_month
7572,@matthewsjardine,2024-06-11 09:25:54,0,"Possibly my favorite thing that came out this time is that Apple is getting over itself. Apple has a massive ""not invented here"" complex which for years would see them ignoring stuff that everyone knew they should do, but because it would be seen as them backing down, they wouldn't. Whether it be window snapping on Mac or the ability to move your icons on iOS, Apple is finally giving folks no brainer features.",MKBHD,2024-06-11,6
2708,@K.L.A.S,2024-06-11 21:10:53,0,They didn't want to until apple told them it yes you want to 😂😂😂,MKBHD,2024-06-11,6
297,@msp713,2024-06-10 21:18:03,2,"How the hell is the AI supposed to know who 7 year old ""Leo"" is? Either this is vaporware, or a step towards dystopia.",The Verge,2024-06-10,6
5408,@alexgreen9571,2024-06-11 13:21:53,0,"Really hate that you have to get a brand new phone for this to work, as I love my 13 pro in green. I now feel forced to upgrade, but honestly with how terrible Siri is, it’s worth the upgrade for all the AI stuff. Apple please make another green iPhone 🥲",MKBHD,2024-06-11,6
6208,@RichardMajor86,2024-06-11 11:53:13,0,“Do you want me to use ChatGPT to do that?” is going to be the new “here’s something I found on the web”,MKBHD,2024-06-11,6


In [9]:
df0['youtube_channel'].value_counts()

youtube_channel
MKBHD        9819
The Verge     483
Name: count, dtype: int64

In [10]:
df0['app'] = 'YouTube'

In [11]:
len(df0)

10302

In [12]:
# Reddit credentials
reddit = praw.Reddit(client_id=user_secrets.get_secret("reddit_client_id"),
                     client_secret=user_secrets.get_secret("reddit_client_secret"),
                     user_agent=user_secrets.get_secret("reddit_user_agent"))

In [13]:
# get all-level Reddit comments for all Reddit posts
def reddit_comments(url):

    all_level_comments = []

    submission = reddit.submission(url=url)

    submission.comments.replace_more(limit=None)
    for comment in submission.comments.list():
        # created_utc is a Unix timestamp in UTC; datetime.fromtimestamp() would use the local time zone
        published_at = pd.to_datetime(comment.created_utc, unit='s')
        all_level_comments.append({
            "author": comment.author,
            "published_at": published_at, 
            "published_at_date": published_at.date(),
            "published_at_month": published_at.month,
            "like_count": comment.score,
            "text": comment.body
        })

    all_comments_df = pd.DataFrame(all_level_comments)
    return all_comments_df
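
Reddit reports creation times as Unix epoch seconds. `datetime.fromtimestamp` without a `tz` argument converts to the machine's local time zone, so the result is only UTC when the host happens to run in UTC. The following is a minimal sketch of an explicit UTC conversion. The `created_utc` value is made up for illustration and is not taken from the scraped data.

```python
from datetime import datetime, timezone

# Hypothetical epoch value (seconds since 1970-01-01 UTC), chosen for illustration.
created_utc = 1718045113.0

# Passing tz=timezone.utc makes the conversion independent of the host time zone.
published_at = datetime.fromtimestamp(created_utc, tz=timezone.utc)

print(published_at.isoformat())
print(published_at.date(), published_at.month)
```

The same value converted on a laptop set to another time zone would otherwise shift by several hours, which matters when YouTube and Reddit timestamps are compared side by side.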

In [14]:
# Define function to get dataframe for all Reddit comments
def combo_dataframe(links):
    df = pd.DataFrame()
   
    for l in links:
        df2 = reddit_comments(l)
        df = pd.concat([df, df2]) # Combine all df's together into one big dataframe
    return df

In [15]:
# get WWDC 2024 posts' all-level Reddit comments

reddit_links = ['https://www.reddit.com/r/apple/comments/1dct23m/wwdc_2024_postevent_megathread/',
                 'https://www.reddit.com/r/iOSProgramming/comments/1dcmmsm/wwdc_2024_megathread/',
                 'https://www.reddit.com/r/apple/comments/1de4qkn/what_are_your_biggest_takeaways_from_wwdc_2024_so/'] 

reddit_df = combo_dataframe(reddit_links)

In [16]:
reddit_df.head()

Unnamed: 0,author,published_at,published_at_date,published_at_month,like_count,text
0,BCDragon3000,2024-06-10 18:45:13,2024-06-10,6,813,never forgetting the yay sound effect after the calculator app reveal 😭😭😭
1,Oulixonder,2024-06-10 18:45:23,2024-06-10,6,533,Every email and text in the future will just be AI talking to one another
2,hammerheadtiger,2024-06-10 18:44:49,2024-06-10,6,873,"Tim Cook is shouting from the roof, Craig and the exec team skydiving, Mike Rockwell in a Vision Pro, Apple video team budget at an all time high. Seat belts on folks, its a big one this year\n\nImpressions:\n\n**visionOS 2**\n\n- 2k native apps, 1.5m compatible apps\n- Photos: ML to turn 2D photos into 3D spatial photos, Shareplay\n- Quick access hand gesture menu\n- Mac Virtual Display supports different screen sizes including ultra wide which simulates two 4k monitors side by side\n- Travel mode adds train support\n- Cannon will sell a spatial lens for their cameras\n- Other - rearrange home screen, mouse support, new APIs, available in 8 more countries \n\n**iOS 18**\n\n- Home Screen\n\t- App icons and widgets can be placed anywhere\n\t- Adds ability to tint all apps by color, dark mode darkens app icons\n\t- Lock or hide apps\n- Control Center\n\t- Multiple pages including full screen home and music widgets\n\t- developers can build for control center too\n- Swap lock screen controls for other actions - finally!\n- Secure Bluetooth pairing for apps that looks like how AirPods pair\n- Messages: \n\t- Tapbacks support any emoji or sticker now\n\t- Scheduled messages\n\t- Text formatting\n\t- Text effects \n- iMessage and SMS via satellites - I feel like this would have been a huge feature any other year\n- Mail: \n\t- On device categorization\n\t- AI powered digest of emails\n- Photos App has been redesigned entirely with filters for screenshots, pinning, grouped people photos\n- Other\n\t- Maps - Topographic maps\n\t- Tap to Cash - pay each other by tapping phones\n\t- Better event tickets\n\t- Journal app adds features like stats and streaks\n\t- Game mode in iPhone\n\t- Reminders in calendar\n\t- RCS support launching\n\n**AirPods**\n\n- Nod yes or shake for no to respond to Siri\n- Voice isolation for Windy or loud environment \n\n**tvOS**\n\n- Insight in video to identify actors and music\n- Adjust 
voice in video to make them clear - now we can finally know what Michael Caine said in Interstellar\n- Supports 21:9 projectors for those of you with full theater set ups in this economy\n\n**watchOS 11**\n\n- Training Mode - measure intensity, duration, effort, training load\n- Fitness app - cards can be reorganized\n- Vitals App - check key metrics like heart rate and insights over time\n- Cycle Tracking app - supports pregnancy and gestational metrics - Now Apple can sell a million watches to anxious first time parents\n- Widget stack sorts itself based on context like weather changes and ongoing Uber rides\n- New watch faces created by an AI selecting your good photos and reframing them\n\n**iPad OS**\n\n- Redesigned apps with a more Vision Pro like animated tab bar \n- Shareplay - draw on or even remotely control other peoples screen - big day for \n- HOLD THE PRESSES - CALCULATOR APP IS HERE\n\t- Math Notes - handwrite expressions with variables with Apple Pencil and it will automatically solve them. supports graphs as well\n\t- Math in Notes app too\n\t- Now what will this subreddit complain about anymore?\n- Notes Smart Script \n\t- makes your unreadable handwriting look good. 
I know some people who need this badly\n\t- Spell check for handwriting\n\t- Automatically shifting words around as you write\n\n**macOS Sequoia**\n\n- Were not even halfway in and absolutely blowing through these platforms, hmmm I wonder what they are saving time for, what a mystery\n- Continuity\n\t- iPhone Mirroring - a lot of Android manufacturers have been doing this for a while now and its a very welcome addition here, sometimes you just want to quickly access a thin on your phone\n\t- iPhone notifications can go to mac now and automatically trigger iPhone Mirroring\n- Tiling and snapping to edges/corners - finally!\n- Presenter preview for screens sharing, background replacement\n- Passwords App - surfaces iCloud Keychain features - imo much needed as 1Password is one of my most used apps because it is surfaced\n- Safari\n\t- Highlights - identify and surface key info from a webpage\n\t- Reader - summarizes websites - still no mention of AI\n\t- Automatic picture in picture\n- Gaming\n\t- Metal 3\n\t- Game Porting Toolkit 2 with better support for Windows Games - we got a mention of MS Windows before AI\n\t- Coming to Mac: Frostpunk 2, Control, Assassins Creed Shadows, \n\n**Apple Intelligence**\n\n- ""AI for the rest of us"" - Craig\n- Writing tools available across the system\n\t- Proofreading\n\t- Write emails and notes\n\t- Summarization for emails as well as email snippets\n\t- Asks you questions to generate a response\n\t- Inbox summaries\n- Notification summaries, selective surface only important notifications to reduce interruptions\n- Can use your OS for you to do tasks like pulling up apps to play music and creating folders\n- Understands personal context from aggregated data across the system\n- GenMoji to create a custom emoji\n- Image Playground\n\t- Create images across the system including of people in iMessages\n\t- Makes it easier to create images based on traits and styles\n\t- Happens entirely on device\n\t- Image Wand can convert a rough 
sketch into a better image\n- Can create videos based on concepts and find photos from a long time period to tell a story\n- Record and transcribe audio in notes and phone\n- Privacy\n\t- On device processing on A17 and all M chips\n\t- Private Cloud Compute for large server based models\n\t- Data is never stored or shared with Apple\n\t- Verifiable software for independent researchers\n\n**Siri**\n\n- Better language understanding that understands corrections and context\n- Type to Siri\n- Siri can help with tech support for Apple products\n- On-screen awareness - can ask for things to be done based on info on the screen\n- Command Siri to take photos, take notes, or search for something across the system including actions in videos\n- Can do compound understanding like finding a drivers license number and typing it into a form for you or finding a persons flight number and finding its live status to determine if it works with existing lunch plans\n- Rest in peace standalone AI gadgets like Rabbit R1 and Humane Ai Pin, we hardly knew ye\n\n**ChatGPT 4o**\n- Siri can go ask ChatGPT for things like answer general knowledge questions and generate more complex images\n- Free and info not logged, chatGPT subscribers can access paid features\n- Other AI models will be added in the future - sure, throw Google a bone too I guess\n\nWhew! What a day, this was probably the most jam packed WWDC I have watched in a long time. They definitely went deeper on integration that I thought they would. Hats off to the developers working crazy hours to make the features announced a reality."
3,ConflictedRedbird186,2024-06-10 18:45:04,2024-06-10,6,134,I’m here to overreact. And also download a beta I have no business downloading.
4,BeefIsForDinner,2024-06-10 18:44:24,2024-06-10,6,89,Gimme the betaaaaaaaaaaaaa


In [17]:
len(reddit_df)

1344

In [18]:
reddit_df['app'] = 'Reddit'

In [19]:
# Before concatenating the YouTube and Reddit dataframes, check that their columns match.
list(reddit_df)

['author',
 'published_at',
 'published_at_date',
 'published_at_month',
 'like_count',
 'text',
 'app']

In [20]:
# drop() with inplace=True returns None, so take the returned copy instead
youtube_df = df0.drop(columns=['youtube_channel'])

In [21]:
list(youtube_df)

['author',
 'published_at',
 'like_count',
 'text',
 'published_at_date',
 'published_at_month',
 'app']

In [22]:
# Combine YouTube df and Reddit df together
df = pd.concat([youtube_df, reddit_df])
len(df)

11646

In [23]:
df.head()

Unnamed: 0,author,published_at,like_count,text,published_at_date,published_at_month,app
0,@Manny0404,2024-06-28 00:59:38,1,Seems like ios18 is finally getting a worthwhile update,2024-06-28,6,YouTube
1,@ThomasTsukiyama,2024-06-27 23:26:30,0,"The variety of textures in the kislux pack is impressive. From smooth leather to textured suede, there's something for everyone.",2024-06-27,6,YouTube
2,@VrbaShelvy,2024-06-27 23:19:04,0,I really love your taste and style always so chic. My faves would be the kislux leather backpack and the Swarovski pave diamond ring so gorg. Thank you for your recommendations.,2024-06-27,6,YouTube
3,@carmody90,2024-06-27 17:11:01,0,You could always type to siri. The feature was found in the accessibility menu in settings,2024-06-27,6,YouTube
4,@mogleymogley9587,2024-06-27 13:31:39,0,5:35 Bro .. Youre reading iPhone Specs from an Android,2024-06-27,6,YouTube


# With only 12,000 samples, augment data using synonyms?
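
One way the synonym idea could look is sketched below. This is an assumption about the approach, not something the notebook runs: `SYNONYMS` is a tiny hand-written table and `augment_with_synonyms` is a hypothetical helper. A real run would need a proper lexical resource.

```python
import random

# Toy synonym table, written by hand for illustration only (an assumption,
# not a real lexical resource such as WordNet).
SYNONYMS = {
    "great": ["excellent", "fantastic"],
    "boring": ["dull", "tedious"],
    "update": ["release", "upgrade"],
}


def augment_with_synonyms(text, rng, replace_prob=0.5):
    """Return a copy of text with some known words swapped for a synonym.

    Words are matched after lowercasing; words with attached punctuation
    are left untouched in this simple version.
    """
    augmented = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < replace_prob:
            augmented.append(rng.choice(options))
        else:
            augmented.append(word)
    return " ".join(augmented)


rng = random.Random(0)  # seeded so the sketch is repeatable
sample = "This update is great but the keynote was boring"
print(augment_with_synonyms(sample, rng))
```

Swapped words can shift the tone of a comment, so augmented rows would be worth spot-checking before they feed a sentiment model.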

# Perform BERTopic Modeling

In [24]:
# No text pre-processing needed
docs = df['text']

# Step 1 - Extract embeddings
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

# Step 2 - Reduce dimensionality
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0, metric='cosine')

# Step 3 - Cluster reduced embeddings
hdbscan_model = HDBSCAN(min_cluster_size=15, metric='euclidean', cluster_selection_method='eom', prediction_data=True)

# Step 4 - Tokenize topics
vectorizer_model = CountVectorizer(stop_words="english")

# Step 5 - Create topic representation
ctfidf_model = ClassTfidfTransformer()

# Step 6 - Diversify topic words with Maximal Marginal Relevance
representation_model = MaximalMarginalRelevance(diversity=0.5)

# All steps together
topic_model = BERTopic(
  embedding_model=embedding_model,            # Step 1 - Extract embeddings
  umap_model=umap_model,                      # Step 2 - Reduce dimensionality
  hdbscan_model=hdbscan_model,                # Step 3 - Cluster reduced embeddings
  vectorizer_model=vectorizer_model,          # Step 4 - Tokenize topics
  ctfidf_model=ctfidf_model,                  # Step 5 - Extract topic words
  representation_model=representation_model,  # Step 6 - Diversify topic words
  calculate_probabilities=True,
  verbose=True
)

topics, probs = topic_model.fit_transform(docs)
topic_model.get_topic_info()

2024-06-28 04:22:58,119 - BERTopic - Embedding - Transforming documents to embeddings.


2024-06-28 04:24:45,208 - BERTopic - Embedding - Completed ✓
2024-06-28 04:24:45,211 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-06-28 04:25:25,427 - BERTopic - Dimensionality - Completed ✓
2024-06-28 04:25:25,429 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-06-28 04:25:37,178 - BERTopic - Cluster - Completed ✓
2024-06-28 04:25:37,190 - BERTopic - Representation - Extracting topics from clusters using representation models.
2024-06-28 04:25:57,420 - BERTopic - Representation - Completed ✓


Unnamed: 0,Topic,Count,Name,Representation,Representative_Docs
0,-1,2756,-1_ai_apple_features_google,"[ai, apple, features, google, video, ios, years, really, devices, updates]","[So after like 10 years they finally do things other phones have been doing for years? Well done Apple 😂, Safari don’t load websites very annoying but with chrome straight away, every year people expecting multitasking capabilities and split screen features but never came out,again Android phones have it for years, basically iPhone is trying to coping Android features that they have for years like for example widgets home screen but they do it so poorly not even close…By the way color change icons is a joke is another desperate way to allow the iPhone user to do something on icons when you have it for years a possibility the chance to change the entire icons design on Android phones. Basically what they are doing is put ther phones with the same features but take ages for a consumer get does features on iPhone that’s why people move to Android. People tired to wait years for developments, basically Apple software customization and innovation departments in the last 10 years don’teven close what Android phones offer, even a Android phone released 10 years ago the software is more advanced compared to a brand new iPhone. All this years im still thinking IPhone can have the advanced software that Android phones have it, but Apple is doing this transition very slow so that away not being very noticeable. To be honest every person that have both phones straight away agree with me, basically if you only use Android phone you don’t understand that your phone Operating System is so blast advanced.\nOn the other hand a person that uses only IPhone don’t understand how basic the IOS iPhone operating system it is compared to Android Operating system software looks like living in a stone age, honestly just to put you in perspective when you use Android phone feels like the Android phone user is already in 2035 at least . 
But don’t forget when you buy a Android phone don’t buy the cheapest one because you need at least a medium performance processor so you good on average price don’t need to be a top line or a flagship Android phone but if you can affordit better., I think the icon redesign is great, let me do what I want . If I want a dinosaur theme for example let me be 😂. Of course this has existed on android for years but this is “ the apple way” as they call it. I never had an issue with Home Screen I thought it was fine but hey I might just a boring person I guess. Interesting changes, in all honesty I don’t know if this will change my experience on my phone including the Mac and iPad version but hey! It’s a software update I can’t say no]"
1,0,997,0_calculator_ipad_handwriting_maths,"[calculator, ipad, handwriting, maths, note, students, high, pencil, homework, lol]","[Math notes🤯, Calculator App to iPad. 😂😂, iPad calculator app!!!]"
2,1,462,1_siri_typing_accessibility_settings,"[siri, typing, accessibility, settings, voice, button, assistant, feature, updated, older]","[type to siri was already there..., That it, so we can type to Siri now. 😴, You could already type to Siri]"
3,2,378,2_samsung_mkbhd_event_updates,"[samsung, mkbhd, event, updates, android, reads, talking, review, ios, note]","[He was reading it off a samsung😂😂😂😂😭😭, Reading notes of a Samsung.., reading notes about apple on samsung😂😂]"
4,3,270,3_apple_catch_companies_boring,"[apple, catch, companies, boring, innovation, wow, customers, tech, say, feels]","[Only apple can do..., Apple ... what ?, well done Apple!]"
...,...,...,...,...,...
120,119,16,119_updates_underwhelming_huuuuuuuuugggggeeee_lamest,"[updates, underwhelming, huuuuuuuuugggggeeee, lamest, gimmicky, 22, excited, imo, innovative, boring]","[Most overhyped underwhelming set of updates till now., These updates are very underwhelming...., Huuuuuuuuugggggeeee update. I’m very impressed, if everything works as demonstrated this is the biggest update ever]"
121,120,16,120_thanks_summerize_neat_dope,"[thanks, summerize, neat, dope, awesome, hope, man, pretty, work, good]","[Thanks, Thanks!, Thanks!]"
122,121,16,121_tracking_segues_waveform_cube,"[tracking, segues, waveform, cube, tiktok, eyes, visionos, swipe, underwhelming, view]","[Have you tried using the new eye tracking, So no eye tracking update?😢, Where is the eye tracking thing please]"
123,122,16,122_jobs_steve_grave_slydell,"[jobs, steve, grave, slydell, paycheck, porter, lumbergh, antics, professor, department]","[Steve Jobs rolling around in his grave over the colour change feature., Steve Jobs is rolling in his grave on the colors, Steve Jobs turning in his grave.]"


* The first topic is -1 and contains the most records. This is the outliers topic and should typically be ignored during analysis.
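
Setting the outlier topic aside can be sketched as follows. `example_topics` is a made-up list of per-document topic assignments standing in for the `topics` list returned by `fit_transform`; the numbers are not from this run.

```python
from collections import Counter

# Hypothetical topic assignments, one per document; -1 marks an outlier.
example_topics = [-1, 0, 0, 1, -1, 2, 0, 1, -1, -1]

# Keep only documents that were assigned to a real topic.
assigned = [t for t in example_topics if t != -1]

# Share of documents that fell into the outlier topic.
outlier_share = 1 - len(assigned) / len(example_topics)

# Topic sizes, largest first, with the outlier topic excluded.
topic_sizes = Counter(assigned).most_common()

print(f"outlier share: {outlier_share:.0%}")
print(topic_sizes)
```

Tracking the outlier share alongside the topic sizes helps when comparing embedding models, because a model can yield cleaner topics simply by pushing more comments into topic -1.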

Let's see if different embeddings create clearer topics.


In [25]:
%%time

#load the universal sentence encoder model
use4 = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

#generate embeddings
use4_embeddings = use4(df['text'])
use= np.array(use4_embeddings)

#create list from np arrays to store the embeddings in the dataframe
df['use4'] = use.tolist()

#pass the embeddings into BERTopic
topic_model.fit_transform(docs, use)

#get topic info
topic_model.get_topic_info()

2024-06-28 04:26:27,939 - BERTopic - Dimensionality - Fitting the dimensionality reduction algorithm
2024-06-28 04:26:38,302 - BERTopic - Dimensionality - Completed ✓
2024-06-28 04:26:38,304 - BERTopic - Cluster - Start clustering the reduced embeddings
2024-06-28 04:26:46,930 - BERTopic - Cluster - Completed ✓
2024-06-28 04:26:46,940 - BERTopic - Representation - Extracting topics from clusters using representation models.
2024-06-28 04:27:03,788 - BERTopic - Representation - Completed ✓


CPU times: user 1min 47s, sys: 23.4 s, total: 2min 10s
Wall time: 1min 6s


Unnamed: 0,Topic,Count,Name,Representation,Representative_Docs
0,-1,4820,-1_apple_like_ai_features,"[apple, like, ai, features, new, users, thing, apps, calculator, make]","[So are all the Apple Intelligence features only for iPhone 15 pro, and all iPad and Mac with M chips? Is the regular 15 not even getting the new Siri? Or is the generative features only for the new devices., Okay so the reason why the The calculator is the coolest shit you've ever seen because it's the coolest shit apple's done in a really long time like that's game changing. You just essentially put a TI-84 plus in your phone like I can't think of a single job where this wouldn't be useful. Honestly if you can combine this with the notes app in general you have like the perfect notes The app like actually worth buying the whole thing just for that one app type good as someone who has to count down drawers and do a lot of like mental math, I see that paying for itself day one like my only request for now is let it be able to read a graph or anything else I make, If I want to use the AI stuff, can I buy an iPhone 15 or should I wait for the 16?\nAlso, I'm going to buy the newest current iPad and MacBook Pro. Will the AI stuff be available there? \nThanks!]"
1,0,683,0_samsung_s24_ultra_notes,"[samsung, s24, ultra, notes, marques, mkbhd, read, updates, ios, savage]","[Reading notes of a Samsung.., Reading IOS 18 new features from Samsung galaxy S24 Ultra's Notes app 😅😂😂, As Marques is reading his notes of a Samsung Ultra. 😂]"
2,1,567,1_siri_type_accessibility_voice,"[siri, type, accessibility, voice, chatgpt, carplay, bixby, languages, button, homepods]","[AI as in Siri?, Siri at last., What have they done to Siri...]"
3,2,311,2_ios_users_features_update,"[ios, users, features, update, years, basically, finaly, turning, lol, welcome]","[iOS 18: Android 7, So iOS 18 is.. Android?, iOS 18 = iOS + Android :)]"
4,3,278,3_genmoji_grammarly_iai_12,"[genmoji, grammarly, iai, 12, ipados, chatgpt, lol, homepod, hyped, boring]","[I’m kinda excited for genmoji lol, ""just basic useless stuff"" at 12:43 😂, 12:40 did he just say ""Just basic, useless stuff""? ]"
...,...,...,...,...,...
99,98,18,98_bed_4am_100k_renovating,"[bed, 4am, 100k, renovating, ranges, surgery, hell, score, jk, stayed]","[y so late i was going to sleep :(, Bro let me sleep 💀, bro its 4am. SLEEP!]"
100,99,17,99_tinder_hide_grindr_wdym,"[tinder, hide, grindr, wdym, closeted, spouse, cheating, download, cringe, seriously]","[i can finally hide tinder from my wife haha, All I need is to hide Tinder from my wife., Finally I can hide Tinder from my wife!]"
101,100,17,100_math_motion_notes_lol,"[math, motion, notes, lol, syllabus, chapter, revolutionizing, plane, need, kindergarten]","[Math notes looks like the death of effective math homework in school 😂 kids won't need to engage their brains anymore, Kids will never know how to math now, Kids are going to love Math notes.]"
102,101,16,101_notes_integrals_stochastic_simplify,"[notes, integrals, stochastic, simplify, stokes, logs, financial, mathematics, sophisticated, helped]","[8:50 OK, Math Notes is extremely impressive. If only I had had this during Calculus and Differential Equations in university 😢, Can the math thing do calculus?, Can math notes do calculus?]"


In [26]:
topic_model.get_topic(0) # top topic

[('samsung', 0.09494059549661003),
 ('s24', 0.05238044530252093),
 ('ultra', 0.0492850942874262),
 ('notes', 0.03717395562106634),
 ('marques', 0.03035262163875986),
 ('mkbhd', 0.030158972606124358),
 ('read', 0.02992618828244206),
 ('updates', 0.01959269878502213),
 ('ios', 0.013267978043708262),
 ('savage', 0.01101475449731044)]

These topics seem more interpretable with the Universal Sentence Encoder embeddings, although the outlier topic also grew from 2,756 to 4,820 comments.

#### BERTopic Visualizations 
Reference: https://maartengr.github.io/BERTopic/getting_started/visualization/visualization.html#visualize-probablities-or-distribution

In [27]:
topic_model.visualize_topics()

After the outlier topic (-1), the largest topics were topic 0 (comments about MKBHD reading his notes off a Samsung phone) and topic 1, which was about Siri. I used the slider to highlight the bubble. When you hover over the bubble, you see the words associated with the topic.

In [28]:
topic_model.visualize_barchart(top_n_topics=8)

Other large topics were criticisms that Apple is behind in development, including on the AI frontier. A big win, though, was the showcase of the interactive/predictive calculator.

In [29]:
topic_model.visualize_heatmap()

* I am not sure what the benchmark is for good separation between topics.
* I only considered cells that are off the diagonal and do not involve topic 0.
* Most topic pairs have a similarity score of 0.4 or below, which suggests that the new embeddings and the BERTopic model found reasonably distinct topics.
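
The "most pairs are 0.4 or below" reading can be turned into a number. The sketch below assumes the heatmap scores are cosine similarities between topic vectors. The vectors in `topic_vectors` are invented for illustration, and the 0.4 threshold is the cut-off from the notes above, not an established benchmark.

```python
import math
from itertools import combinations

# Made-up topic vectors for illustration; real ones would come from the model.
topic_vectors = {
    0: [0.9, 0.1, 0.0],
    1: [0.1, 0.9, 0.1],
    2: [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Off-diagonal pairs only: a topic is always perfectly similar to itself.
pair_scores = {
    (i, j): cosine_similarity(topic_vectors[i], topic_vectors[j])
    for i, j in combinations(sorted(topic_vectors), 2)
}

threshold = 0.4
share_below = sum(s <= threshold for s in pair_scores.values()) / len(pair_scores)

print(pair_scores)
print(f"share of pairs at or below {threshold}: {share_below:.0%}")
```

Reporting the share of pairs under the threshold gives a single figure that can be compared across embedding models instead of eyeballing two heatmaps.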

# Perform Sentiment and Emotion Classification

In [30]:
# Using a Hugging Face sentiment classifier distilled from a zero-shot natural language inference (NLI) model
# Info pg: https://huggingface.co/lxyuan/distilbert-base-multilingual-cased-sentiments-student

# Set up the inference pipeline using a model from the 🤗 Hub

sentiment_analysis = pipeline(model="lxyuan/distilbert-base-multilingual-cased-sentiments-student")

In [31]:
# %%time

# df = (
# df.assign(sentiment = lambda x: x['text'].apply(lambda s: sentiment_analysis(s)))
#     .assign(
#          label = lambda x: x['sentiment'].apply(lambda s: (s[0]['label'])),
#          score = lambda x: x['sentiment'].apply(lambda s: (s[0]['score']))
#     )
# )

# df.head()
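
The disabled cell calls the classifier once per row, which is slow on roughly 12,000 comments. Labeling in batches is one alternative, sketched below under stated assumptions. `chunked`, `label_in_batches`, and `fake_classifier` are hypothetical names, and `fake_classifier` is only a keyword rule that mimics a classifier returning one `{'label', 'score'}` dict per input text.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fake_classifier(texts):
    """Hypothetical stand-in for a model: takes a list of texts and returns
    one {'label', 'score'} dict per text, based on a simple keyword rule."""
    return [
        {"label": "positive" if "love" in t.lower() else "negative", "score": 1.0}
        for t in texts
    ]


def label_in_batches(classify, texts, batch_size=2):
    """Run classify over texts batch by batch and flatten the results."""
    results = []
    for batch in chunked(texts, batch_size):
        results.extend(classify(batch))
    return results


comments = ["I love Math Notes", "Boring update", "Love the new Siri"]
labels = [r["label"] for r in label_in_batches(fake_classifier, comments)]
print(labels)
```

Swapping the stand-in for the real `sentiment_analysis` pipeline is an assumption that would need checking. Very long comments, such as the megathread summary shown earlier, may also exceed the model's input limit and need truncating first.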

# Aspect-Based Sentiment Analysis (ABSA) Using PyABSA

# Named Entity Recognition in order to isolate products?

# Radar Graph between 2 similar, close in proximity restaurants 