## BRI Social Network Analytics
2022 Fajar Krisna Jaya

Dataset:
Tweets mentioning @BANKBRI_ID, December 2022, scraped with snscrape


This Jupyter Notebook analyzes sentiment and related metrics for tweets about Bank Rakyat Indonesia (BRI).

It uses pandas and numpy to manipulate the data, seaborn and plotly to visualize it, and the Indonesia BERT Sentiment Classification model to classify the sentiment of each tweet.

The cells cover loading and previewing the data frame, checking the minimum and maximum dates, sentiment classification, exploratory data analysis (EDA), and topic modeling.

The code is organized into separate cells so that each step can be run independently, and most shared packages are imported in the first cell.




# Data Preprocessing 

In [1]:
#Import Packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from transformers import pipeline

#SNSCrape : 


In [84]:
df = pd.read_excel('Bri_dec_2022.xlsx')
df = df[['date', 'replyCount', 'retweetCount', 'likeCount','viewCount','coordinates','hashtags','rawContent',]]
df



Unnamed: 0,date,replyCount,retweetCount,likeCount,viewCount,coordinates,hashtags,rawContent
0,2022-12-31,1,1,0,930.0,,"['UMKM', 'bankbri']",Sejak dulu @BANKBRI_ID selalu konsisten menduk...
1,2022-12-31,0,0,0,800.0,,,@mrsyahri4 @danawallet @BANKBRI_ID Live Chat s...
2,2022-12-31,3,0,1,1370.0,,,@danawallet @danasupport @BANKBRI_ID tolong in...
3,2022-12-31,0,0,0,,,,@MissyRo95856010 @BANKBRI_ID Rispek banget pok...
4,2022-12-31,0,0,0,,,,@zulfaayuashary @BANKBRI_ID membantu banget ni...
...,...,...,...,...,...,...,...,...
11141,2022-12-01,0,0,0,,,,"@diverxent @Telkomsel kartu, kode CVC, EXP Dat..."
11142,2022-12-01,0,0,0,,,,"@diverxent @Telkomsel Hai Sobat BRI, terima ka..."
11143,2022-12-01,4,0,0,,,,Banyak banget yang gini @Telkomsel @BANKBRI_ID...
11144,2022-12-01,0,0,0,,,,@Nadyadwiamelia resmi BRI yang terdapat centan...


In [3]:
#max and min date
print("Max date: ", df['date'].max())
print("Min date: ", df['date'].min())


Max date:  2022-12-31 00:00:00
Min date:  2022-12-01 00:00:00


# IndoBERT Sentiment classification

In [8]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

# tweets to classify, as a plain list of strings
rawContent = df['rawContent'].tolist()

# load the pretrained Indonesian sentiment model and its tokenizer
pretrained_name = "mdhugol/indonesia-bert-sentiment-classification"
model = AutoModelForSequenceClassification.from_pretrained(pretrained_name)
tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
nlp = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    device=0 # use the first GPU
)

# classify every tweet; each result is a dict with 'label' and 'score'
label = nlp(rawContent)

#get the label only
label = [i['label'] for i in label]
label



In [9]:
label
#save label to dataframe and write it back to the Excel file
#df['label'] = label
#df.to_excel('Bri_dec_2022.xlsx', index=False)


[{'label': 'LABEL_1', 'score': 0.9002847075462341},
 {'label': 'LABEL_1', 'score': 0.9852083921432495},
 {'label': 'LABEL_1', 'score': 0.9845927357673645},
 {'label': 'LABEL_1', 'score': 0.6695913076400757},
 {'label': 'LABEL_1', 'score': 0.7107312679290771}]
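The pipeline returns generic `LABEL_n` identifiers, while the EDA section works with a `sentiment` column holding `positive`, `neutral`, and `negative`. The notebook does not show how that column was produced. A minimal sketch of the conversion, assuming the mapping given on the model card (`LABEL_0` positive, `LABEL_1` neutral, `LABEL_2` negative), might look like the following. The names `LABEL_TO_SENTIMENT` and `to_sentiment` are hypothetical, and the sample predictions are made up for illustration.

```python
# Assumed mapping (from the model card); verify it before relying on it.
LABEL_TO_SENTIMENT = {
    "LABEL_0": "positive",
    "LABEL_1": "neutral",
    "LABEL_2": "negative",
}

def to_sentiment(predictions):
    """Convert pipeline outputs (dicts with 'label' and 'score') to sentiment names."""
    return [LABEL_TO_SENTIMENT[pred["label"]] for pred in predictions]

# Made-up predictions in the same shape the pipeline returns
sample = [
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_1", "score": 0.67},
    {"label": "LABEL_2", "score": 0.88},
]
sentiments = to_sentiment(sample)
print(sentiments)
```

With the real data, the result of `to_sentiment(nlp(rawContent))` could then be assigned to `df['sentiment']`.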

# EDA

In [71]:
df = pd.read_excel('Bri_dec_2022.xlsx')
df = df[['date', 'replyCount', 'retweetCount', 'likeCount','viewCount','coordinates','hashtags','rawContent','sentiment']]
df

Unnamed: 0,date,replyCount,retweetCount,likeCount,viewCount,coordinates,hashtags,rawContent,sentiment
0,2022-12-31,1,1,0,930.0,,"['UMKM', 'bankbri']",Sejak dulu @BANKBRI_ID selalu konsisten menduk...,positive
1,2022-12-31,0,0,0,800.0,,,@mrsyahri4 @danawallet @BANKBRI_ID Live Chat s...,neutral
2,2022-12-31,3,0,1,1370.0,,,@danawallet @danasupport @BANKBRI_ID tolong in...,neutral
3,2022-12-31,0,0,0,,,,@MissyRo95856010 @BANKBRI_ID Rispek banget pok...,positive
4,2022-12-31,0,0,0,,,,@zulfaayuashary @BANKBRI_ID membantu banget ni...,positive
...,...,...,...,...,...,...,...,...,...
11141,2022-12-01,0,0,0,,,,"@diverxent @Telkomsel kartu, kode CVC, EXP Dat...",neutral
11142,2022-12-01,0,0,0,,,,"@diverxent @Telkomsel Hai Sobat BRI, terima ka...",neutral
11143,2022-12-01,4,0,0,,,,Banyak banget yang gini @Telkomsel @BANKBRI_ID...,neutral
11144,2022-12-01,0,0,0,,,,@Nadyadwiamelia resmi BRI yang terdapat centan...,neutral


In [58]:
#count of sentiment 
fig = px.histogram(df, x="sentiment", title='Sentiment Analysis', width=700, height=500)
fig.show()


In [76]:
#time series of sentiment: count tweets per date and sentiment
df_sentiment = df.groupby(['date','sentiment']).size().reset_index(name='counts')

#one line per sentiment class
fig = px.line(df_sentiment, x="date", y="counts", color='sentiment', title='December 2022 Sentiment', width=700, height=500, color_discrete_map={'negative': 'red', 'positive': 'green', 'neutral': 'blue'})

fig.show()
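Grouping by date and sentiment omits any date/sentiment pair that has no tweets, which can leave gaps in the line chart. A minimal sketch of one way to fill those pairs with zero, using a small made-up frame (`toy`) in place of `df`:

```python
import pandas as pd

# Tiny illustrative frame standing in for df (values are made up)
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-12-01", "2022-12-01", "2022-12-02"]),
    "sentiment": ["neutral", "negative", "neutral"],
})

counts = toy.groupby(["date", "sentiment"]).size().reset_index(name="counts")

# One row per date, one column per sentiment; absent pairs become 0
wide = (
    counts.pivot(index="date", columns="sentiment", values="counts")
    .fillna(0)
    .astype(int)
)

# Back to long form, ready for px.line(x="date", y="counts", color="sentiment")
long_counts = wide.reset_index().melt(
    id_vars="date", var_name="sentiment", value_name="counts"
)
print(long_counts)
```

The long frame now has one row for every date and sentiment combination, so each line in the chart covers the full date range.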


In [90]:
#plot hashtags
df_hashtags = df['hashtags'].value_counts().reset_index()
df_hashtags.columns = ['hashtags', 'counts']
df_hashtags = df_hashtags.sort_values(['counts'], ascending=False)
df_hashtags = df_hashtags.head(20)
df_hashtags

fig = px.bar(df_hashtags, x="counts", y="hashtags", orientation='h', title='Hashtags', color_discrete_sequence=['#D63230'])

fig.update_layout(
    width=1500,
    height=600
)

fig.show()
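`value_counts()` on the `hashtags` column counts each distinct list of hashtags, so `['UMKM', 'bankbri']` is tallied as a single value. If the goal is the frequency of individual hashtags, one option is to parse each cell and explode it first. The sketch below assumes the cells are stored as strings like those in the table above, as they would be after a round trip through Excel. `parse_hashtags` is a hypothetical helper and the toy data is made up.

```python
import ast
import pandas as pd

def parse_hashtags(cell):
    """Turn a cell such as "['UMKM', 'bankbri']" into a list; missing values become []."""
    if isinstance(cell, list):
        return cell
    if not isinstance(cell, str):
        return []  # NaN or None
    return ast.literal_eval(cell)

# Made-up stand-in for df['hashtags']
toy = pd.DataFrame({"hashtags": ["['UMKM', 'bankbri']", None, "['bankbri']"]})

tag_counts = (
    toy["hashtags"]
    .apply(parse_hashtags)
    .explode()      # one row per hashtag; empty lists become NaN
    .dropna()
    .str.lower()    # treat 'UMKM' and 'umkm' as the same tag
    .value_counts()
)
print(tag_counts)
```

The resulting series can be turned into a frame with `reset_index()` and passed to `px.bar` in the same way as `df_hashtags`.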



In [111]:
#coordinates
df_coordinates = df['coordinates'].value_counts().reset_index()
df_coordinates.columns = ['coordinates', 'counts']
df_coordinates = df_coordinates.sort_values(['counts'], ascending=False)
df_coordinates = df_coordinates.head(20)
df_coordinates


Unnamed: 0,coordinates,counts
0,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.857762, 'latitude': -6.218042}",4
1,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 134.505904, 'latitude': -9.1457534}",4
2,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 108.214309, 'latitude': -7.369047}",3
8,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 119.4320284, 'latitude': -5.1968268}",2
12,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.655517, 'latitude': -6.404858}",2
11,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 111.0577942, 'latitude': -7.402658}",2
10,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 110.306834, 'latitude': -7.728047}",2
9,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 108.15522, 'latitude': 3.6549446}",2
13,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.742423, 'latitude': -6.607452}",2
7,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 110.38546996894488, 'latitude': -7.755167032163128}",2
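Each `coordinates` cell is the string form of a Python dict, so one way to get numeric longitude and latitude values is to parse the cell with `ast.literal_eval`. This is a sketch under that assumption; `parse_coordinates` is a hypothetical helper, and the toy frame reuses the first row of the output above plus one missing value.

```python
import ast
import pandas as pd

def parse_coordinates(cell):
    """Return (longitude, latitude) as floats from a stringified coordinates dict."""
    if not isinstance(cell, str):
        return (float("nan"), float("nan"))  # tweet without coordinates
    data = ast.literal_eval(cell)
    return (float(data["longitude"]), float(data["latitude"]))

toy = pd.DataFrame({"coordinates": [
    "{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.857762, 'latitude': -6.218042}",
    None,
]})

# Build two float columns aligned with the original index
coords = pd.DataFrame(
    toy["coordinates"].apply(parse_coordinates).tolist(),
    columns=["longitude", "latitude"],
    index=toy.index,
)
toy = toy.join(coords)
print(toy[["longitude", "latitude"]])
```

Parsing keeps the sign of southern latitudes and yields floats, which is the form map functions such as `px.scatter_geo` expect.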


In [112]:
#get longitude and latitude from their own keys using regex (allowing a leading minus sign)
df_coordinates['longitude'] = df_coordinates['coordinates'].str.extract(r"'longitude': (-?\d+\.\d+)", expand=False)
df_coordinates['latitude'] = df_coordinates['coordinates'].str.extract(r"'latitude': (-?\d+\.\d+)", expand=False)
df_coordinates


Unnamed: 0,coordinates,counts,longitude,latitude
0,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.857762, 'latitude': -6.218042}",4,106.857762,-6.218042
1,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 134.505904, 'latitude': -9.1457534}",4,134.505904,-9.1457534
2,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 108.214309, 'latitude': -7.369047}",3,108.214309,-7.369047
8,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 119.4320284, 'latitude': -5.1968268}",2,119.4320284,-5.1968268
12,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.655517, 'latitude': -6.404858}",2,106.655517,-6.404858
11,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 111.0577942, 'latitude': -7.402658}",2,111.0577942,-7.402658
10,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 110.306834, 'latitude': -7.728047}",2,110.306834,-7.728047
9,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 108.15522, 'latitude': 3.6549446}",2,108.15522,3.6549446
13,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 106.742423, 'latitude': -6.607452}",2,106.742423,-6.607452
7,"{'_type': 'snscrape.modules.twitter.Coordinates', 'longitude': 110.38546996894488, 'latitude': -7.755167032163128}",2,110.38546996894488,-7.755167032163128


In [123]:
#plot viewCount
df_viewCount = df[['date', 'viewCount']]
df_viewCount = df_viewCount.groupby(['date']).sum().reset_index()
df_viewCount
fig = px.line(df_viewCount, x="date", y="viewCount", title='View Count', width=700, height=500)
fig



In [124]:
#plot tweet count
df_tweetCount = df[['date', 'rawContent']]
df_tweetCount = df_tweetCount.groupby(['date']).count().reset_index()
df_tweetCount
fig = px.line(df_tweetCount, x="date", y="rawContent", title='Tweet Count', width=700, height=500)
fig


# Topic Modelling


In [77]:
df = df[df['sentiment'] == 'negative']
df


Unnamed: 0,date,replyCount,retweetCount,likeCount,viewCount,coordinates,hashtags,rawContent,sentiment
52,2022-12-31,3,0,0,590.0,,,Brimo lagi maintenance kah ka? Mau tf gak bisa...,negative
94,2022-12-30,7,0,0,1600.0,,,@BANKBRI_ID Kemarin udah takut pake brimo lagi...,negative
108,2022-12-30,0,0,0,80.0,,,@BANKBRI_ID Baca DM lagi,negative
113,2022-12-30,8,0,0,5020.0,,,@danawallet @BANKBRI_ID minn ini kenapa saldo ...,negative
123,2022-12-30,0,0,0,70.0,,,@xygrth @BANKBRI_ID kebantu banget kalo lupa b...,negative
...,...,...,...,...,...,...,...,...,...
11077,2022-12-01,4,0,0,,,,Dua kali dikontak @PrivyID. Tengah malam menga...,negative
11097,2022-12-01,5,0,1,,,,@BANKBRI_ID @Anggi_Natuna Kenapa gagal terus m...,negative
11101,2022-12-01,13,0,3,,,,Kenapa saya kena limit? Padahal bulan ini saya...,negative
11126,2022-12-01,119,1,39,,,['ValidNoDebat'],lagi ribet kerjaan awal bulan malah dirusuhin ...,negative


In [82]:
import re
import pandas as pd
from gensim import corpora, models
#use tweet-preprocessor to clean the tweet
import preprocessor as p
from nltk.corpus import stopwords

# download the stopwords if not already downloaded
import nltk
nltk.download('stopwords')

# read the stopwords file
with open('id.stopwords.02.01.2016.txt', 'r') as f:
    stop_words = set(f.read().split())

# work on an explicit copy of the filtered frame so new columns can be assigned safely
df = df.copy()

# convert the content column to string data type
df['rawContent'] = df['rawContent'].astype(str)

# extra stopwords: brand terms, greetings, and informal Indonesian fillers
extra_stop_words = {'bri', 'bank', 'indonesia', 'resmi', 'min', 'silakan', 'mohon', 'semoga', 'pakai', 'sy',
                    'yg', 'nya', 'dan', 'nih', 'dengan', 'emang', 'banget', 'ga', 'sih', 'si', 'kalo', 'klo',
                    'kali', 'aja', 'media', 'sosial', 'tks', 'hai', 'gak', 'ya', 'sobat', 'pake', 'udah', 'kak'}

def clean_tweet(row):
    text = row['rawContent']
    #remove Twitter-specific tokens such as URLs and mentions
    text = p.clean(text)
    # remove the stopwords
    text = ' '.join([word for word in text.split() if word.lower() not in stop_words])
    #lowercase
    text = text.lower()
    #remove non-alphanumeric characters
    text = re.sub(r'[^a-zA-Z0-9]', ' ', text)
    #remove one character words
    text = re.sub(r'\b\w\b', '', text)
    #remove the extra stopwords
    text = ' '.join([word for word in text.split() if word not in extra_stop_words])
    return text

df['clean_tweet'] = df.apply(clean_tweet, axis=1)

# create a dictionary and corpus
texts = [[word for word in document.lower().split() if word not in stop_words] for document in df['clean_tweet']]
dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# train the LDA model
lda_model = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=3, passes=5)

# analyze the results
for idx, topic in lda_model.print_topics(-1):
    print('Topic: {} \nWords: {}'.format(idx, topic))

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\fajar\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Topic: 0 
Words: 0.022*"brimo" + 0.017*"atm" + 0.014*"kartu" + 0.010*"nasabah" + 0.008*"uang" + 0.008*"orang" + 0.005*"cs" + 0.005*"nomor" + 0.005*"bikin" + 0.005*"gagal"
Topic: 1 
Words: 0.011*"hati" + 0.006*"penipuan" + 0.006*"sampe" + 0.005*"kartu" + 0.005*"akun" + 0.004*"tunai" + 0.004*"iya" + 0.004*"modus" + 0.004*"kredit" + 0.004*"langsung"
Topic: 2 
Words: 0.011*"kartu" + 0.011*"atm" + 0.010*"brimo" + 0.010*"uang" + 0.008*"gimana" + 0.007*"saldo" + 0.007*"transaksi" + 0.007*"rekening" + 0.006*"cs" + 0.006*"masuk"


In [79]:
df['clean_tweet']

52                             brimo maintenance kah ka tf
94       kemarin takut brimo lagi tp temen kepaksa tf r...
108                                                baca dm
113      minn saldo danaku jugaa kepake padahal pembaya...
123                                kebantu lupa bawa kartu
                               ...                        
11077    dikontak malam mengakui kesalahan mendelete ak...
11097                               gagal bikin akun brimo
11101    kena limit transaksi pun dm slow respon kena l...
11126    ribet kerjaan dirusuhin adek suruh bayar inter...
11130    lambat menangani aduan nasabah nasabah unit ra...
Name: clean_tweet, Length: 1229, dtype: object

In [83]:
import plotly.graph_objs as go
from plotly.subplots import make_subplots

# visualize the topics
fig = make_subplots(rows=2, cols=2, subplot_titles=("Topic 0", "Topic 1", "Topic 2"))

for i in range(3):
    topic = lda_model.show_topic(i)
    x = [word[0] for word in topic]
    y = [word[1] for word in topic]
    fig.add_trace(go.Bar(x=y, y=x, orientation='h'), row=(i//2)+1, col=(i%2)+1)

fig.update_layout(height=800, width=800, showlegend=False)
fig.show()

