# Data Storytelling on the 50 most-followed TikTok accounts

In this notebook, we focus on data visualization and on drawing insights from the 50 most-followed TikTok accounts. As e-commerce and digital marketing strategies keep evolving, social media reputation has proved to be a vital asset for every organization. In fact, most medium to large organizations now turn to social media influencers as their primary marketing strategy. Knowing which influencers to work with, and how their reach can help the organization, is therefore very important.

## 1.1 Import

Importing necessary libraries

In [1]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

The following table lists the 50 most-followed accounts on TikTok, with each follower total rounded down to the nearest hundred thousand, along with a description of each account and its country of origin.

In [2]:
tiktok_most_followed = pd.read_csv('List of most-followed TikTok accounts.csv')
tiktok_most_followed.head()

#Since rank represents the account ranking from 1-50, we cannot set it as the index

Unnamed: 0,Rank,Username,Owner,Followers\n(millions),Description,Country,Brand\nAccount
0,1,@khaby.lame,Khabane Lame,153.3,Social media personality,Italy Senegal,-
1,2,@charlidamelio,Charli D'Amelio,149.3,Dancer and social media personality,United States,-
2,3,@bellapoarch,Bella Poarch,92.5,Singer and social media personality,Philippines United States,-
3,4,@addisonre,Addison Rae,88.8,Social media personality and dancer,United States,-
4,5,@willsmith,Will Smith,72.9,Actor and film producer,United States,-


In [3]:
tiktok_hashtag = pd.read_excel('tiktok_hashtags.xlsx')
tiktok_hashtag.head()

Unnamed: 0,authorMeta/avatar,authorMeta/digg,authorMeta/fans,authorMeta/following,authorMeta/heart,authorMeta/id,authorMeta/name,authorMeta/nickName,authorMeta/signature,authorMeta/verified,...,musicMeta/playUrl,playCount,searchHashtag/name,searchHashtag/views,shareCount,text,videoMeta/duration,videoMeta/height,videoMeta/width,webVideoUrl
0,https://p16-sign-sg.tiktokcdn.com/aweme/720x72...,420,13200000,28,150400000,6713126981665686530,miso_ara,미소아라 Miso_Ara,soonent@soon-ent.co.kr\n.\n Miso Ara IG,True,...,https://sf16-ies-music-sg.tiktokcdn.com/obj/ti...,200600000,meme,556B,377700,Ara Woah #woah #woahchallenge #foryou #fyp...,17,1280,720,https://www.tiktok.com/@miso_ara/video/6797294...
1,https://p16-sign-va.tiktokcdn.com/tos-maliva-a...,10800,1300000,182,35100000,6929583089811522566,crinka11,Chris Rinker,insta: chrisrinker73,False,...,https://sf16-ies-music-va.tiktokcdn.com/obj/mu...,79600000,meme,556B,205400,#fyp #meme #funny #meme #vine,11,1024,576,https://www.tiktok.com/@crinka11/video/6958603...
2,https://p16-sign-va.tiktokcdn.com/tos-maliva-a...,251,696800,42,15900000,7083448802635596842,iampets_com,IamPéts,"Pet supplies, toys online store. All products ...",False,...,https://sf16-ies-music-va.tiktokcdn.com/obj/ie...,106100000,meme,556B,323000,The end #funny #funnyvideos #animals #haha #me...,25,1024,576,https://www.tiktok.com/@iampets_com/video/7083...
3,https://p16-sign-va.tiktokcdn.com/tos-maliva-a...,218,510200,56,32100000,7087287470497645573,dailydosevideos_,dailydosevideos,Daily dose of videos/memes \nsupport the page ...,False,...,https://sf16-ies-music-va.tiktokcdn.com/obj/mu...,72500000,meme,556B,133400,Try not to laugh hard #meme #trynottolaughtik...,62,1024,576,https://www.tiktok.com/@dailydosevideos_/video...
4,https://p19-sign.tiktokcdn-us.com/tos-useast5-...,47600,3300000,690,137300000,6621521206107717638,jakeypoov,Jake Sherman,-.-- --- ..- .-. . .-.. --- ...- . -.. ...,True,...,https://sf16-ies-music-va.tiktokcdn.com/obj/mu...,62700000,meme,556B,252900,HE DIDN’T HAVE HIS MASK ON @abbysherm (Follow...,36,1024,576,https://www.tiktok.com/@jakeypoov/video/681538...


### 1.1.2 Cleaning the Most-Followed Accounts Dataset

In [4]:
#Dropping the leftover index column, then renaming the remaining seven columns
tiktok = tiktok_most_followed.drop(columns=['Unnamed: 0'])
tiktok.columns = ['Rank', 'Username', 'Owner', 'Followers', 'Account_type', 'Country', 'Brand']

In [None]:
#As with any other dataset, checking for null values is a must
tiktok.info()

In [None]:
#Checking for unique values for each column
tiktok.nunique()
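
One thing `info()` does not reveal is a placeholder such as the `-` shown in the brand column of the preview above, because it is stored as an ordinary string. The sketch below uses a small hand-built frame (`sample` is a hypothetical stand-in, not the real dataset) and assumes `-` means "not applicable"; it counts both true nulls and placeholder values per column.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned table, using values from the preview above.
sample = pd.DataFrame({
    'Username': ['@khaby.lame', '@charlidamelio', '@bellapoarch'],
    'Followers': [153.3, 149.3, 92.5],
    'Brand': ['-', '-', '-'],
})

# True nulls per column: the '-' placeholder is not counted here.
nulls_raw = sample.isna().sum()

# Cells equal to the '-' placeholder (assumed to mean "not applicable").
placeholder_counts = sample.eq('-').sum()

print(nulls_raw)
print(placeholder_counts)
```

If the placeholder count is high for a column, it may be worth deciding up front whether to treat those cells as missing or as a category of their own.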

## 1.2 Exploration

If the Company can choose only one of these 50 TikTok accounts to market its product, which should it be, and why? The exploratory data analysis below answers a series of questions that can help stakeholders choose which accounts to contact.

### 1.2.1 What is the most popular account type among the 50 most-followed accounts?

To answer this, we plot a bar chart of account type against number of followers.

In [None]:
x_axis = tiktok['Account_type']
y_axis = tiktok['Followers']

plt.figure(figsize=(15,3))
plt.bar(x_axis, y_axis)
plt.title('Most popular account type')
plt.xlabel('Account type')
plt.ylabel('Followers (millions)')
plt.xticks(rotation=90)
plt.show()
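
When several accounts share the same account type, `plt.bar` draws their bars on top of each other, so the chart shows only one value per category rather than a total. A minimal sketch of one alternative is to aggregate first and plot the result. The frame below is a toy example: the first five rows come from the preview above, the last row is made up purely to create a repeated category, and summing (rather than averaging or counting) is an assumption about what "most popular" should mean.

```python
import pandas as pd

# Toy rows; the final row is hypothetical and exists only to show aggregation.
sample = pd.DataFrame({
    'Account_type': ['Social media personality',
                     'Dancer and social media personality',
                     'Singer and social media personality',
                     'Social media personality and dancer',
                     'Actor and film producer',
                     'Social media personality'],
    'Followers': [153.3, 149.3, 92.5, 88.8, 72.9, 50.0],  # millions
})

# Total followers per account type, largest first.
followers_by_type = (
    sample.groupby('Account_type')['Followers']
    .sum()
    .sort_values(ascending=False)
)
print(followers_by_type)
```

The resulting Series can be passed to `plt.bar(followers_by_type.index, followers_by_type.values)` so that each account type gets exactly one bar.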

We can see that "Social media personality" is the most popular account type, but that doesn't really tell us what makes these accounts so appealing. We can also see that the phrase "social media" appears in almost every category. To confirm this, let's generate a word cloud in the next sub-section to see which words are most common.

### 1.2.2 Does the number of followers correlate with the account identity?

In [None]:
text = " ".join(cat for cat in tiktok.Account_type)
word_cloud = WordCloud(collocations = False, background_color = 'white').generate(text)
plt.figure()
plt.imshow(word_cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

The word cloud above confirms that "social media" and "personality" are the most common descriptions for these accounts. Thus, an account type of "social media", "personality", or both does not give much insight into the account holder or their potential customer reach. The Company needs to find an account that occupies a specific niche: for example, a sports drink is better promoted by an athlete than by a comedian.
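
A word cloud conveys frequency only through font size, so it can help to look at the raw counts as well. The following is a minimal sketch using `collections.Counter` on the five descriptions shown in the preview earlier; the list `descriptions` is a hand-copied sample, not the full column, so the numbers are illustrative only.

```python
from collections import Counter

# Descriptions copied from the first five rows of the table preview.
descriptions = [
    'Social media personality',
    'Dancer and social media personality',
    'Singer and social media personality',
    'Social media personality and dancer',
    'Actor and film producer',
]

# Lowercase everything, split on whitespace, and count each word.
words = " ".join(descriptions).lower().split()
word_counts = Counter(words)

print(word_counts.most_common(5))
```

On the full dataset, the same idea would apply to the account type column, giving exact counts to set alongside the picture.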

In [None]:
#Dropping "social media" and "personality" from "Account type"
tiktok['Account_type'] = tiktok['Account_type'].str.lower()
tiktok['Account_type'] = tiktok['Account_type'].str.replace('social media','')
tiktok['Account_type'] = tiktok['Account_type'].str.replace('personality','')

#Dropping the stopword "and" as a whole word only, so that words containing "and" (e.g. "brand") stay intact
tiktok['Account_type'] = tiktok['Account_type'].str.replace(r'\band\b', '', regex=True)

tiktok.head()

Generating the word cloud again to see the next most common account types.

In [None]:
text = " ".join(cat for cat in tiktok.Account_type)
word_cloud = WordCloud(collocations = False, background_color = 'white').generate(text)
plt.figure()
plt.imshow(word_cloud, interpolation="bilinear")
plt.axis("off")
plt.show()

Now that we know singer, dancer, actress, and songwriter are the most common account types, let's find out where these accounts are based!

### 1.2.3 Which country has the most-followed accounts?

In [None]:
#Look at the countries that these account-holders live in
popular_country = pd.concat([tiktok.Country.value_counts(), 100 * tiktok.Country.value_counts()/len(tiktok.Country)], axis=1)
popular_country.columns=['count','%']
popular_country.sort_values(by=['%'], ascending=False) 
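
The same count-and-percentage table can also be built with `value_counts(normalize=True)`, which returns shares directly. The sketch below uses only the five country values visible in the preview earlier (the Series `countries` is a hand-copied sample), so the percentages are illustrative and not the real result.

```python
import pandas as pd

# Country values copied from the first five rows of the table preview.
countries = pd.Series(
    ['Italy Senegal', 'United States', 'Philippines United States',
     'United States', 'United States'],
    name='Country',
)

# Absolute counts and percentage shares, most frequent first.
counts = countries.value_counts()
shares = countries.value_counts(normalize=True) * 100

summary = pd.concat([counts, shares], axis=1)
summary.columns = ['count', '%']
print(summary)
```

Note that combined entries such as "Italy Senegal" are counted as their own category here, so an account linked to two countries does not add to either country's total.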