## **Social Network Analysis - Storytelling Similarities**

**Background:**

This notebook performs the next step of our project: plotting a similarity network based on each company's storytelling. To do that, we will use the tweets extracted and processed during the Sentiment Analysis.

**Dataset:**

We will start from the Market Similarities Dataset and then add Twitter Data for each Brand according to their Twitter Account. The Twitter Data contains the text of the tweets as well as additional features such as the location, the number of likes, etc.

**Resources:**

*   Pandas documentation: https://pandas.pydata.org/docs/#
*   DataCamp course: https://app.datacamp.com/learn/courses/analyzing-social-media-data-in-python
*   Data Science for Business Applications course - Copenhagen Business School
*   YouTube: https://www.youtube.com/watch?v=ujId4ipkBio&t=512s
*   Medium: https://towardsdatascience.com/nlp-part-3-exploratory-data-analysis-of-text-data-1caa8ab3f79d


## **Introduction: Libraries and Credentials** 

In [None]:
# Import all needed libraries
import tweepy                   # Python wrapper around Twitter API
from google.colab import drive  # to mount Drive to Colab notebook

import pandas as pd
pd.set_option('display.max_colwidth', None) #to see more text
import json 
import csv
from datetime import date
from datetime import datetime
import time
import numpy as np
import re
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import seaborn as sns
sns.set()

from textblob import TextBlob
from wordcloud import WordCloud
import string
import itertools
from collections import Counter
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
# Connect Google Drive to Colab
drive.mount('/content/gdrive')
# Create a variable to store the data path on your drive
path = './gdrive/My Drive/path/to/data'

Mounted at /content/gdrive


In [None]:
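# Twitter API credentials
# Read them from environment variables instead of hard-coding secrets in the notebook.
# The variable names below are placeholders (assumptions); use whatever names you set.
import os

api_key = os.environ['TWITTER_API_KEY']
api_secret_key = os.environ['TWITTER_API_SECRET_KEY']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']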

In [None]:
# Connect to Twitter API using the secrets
auth = tweepy.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

In [None]:
# Helper function to save data into a JSON file
# file_name: the file name of the data on Google Drive
# file_content: the data you want to save
def save_json(file_name, file_content):
  with open(path + file_name, 'w', encoding='utf-8') as f:
    json.dump(file_content, f, ensure_ascii=False, indent=4)

In [None]:
# Helper function to handle the Twitter API rate limit (tweepy v3 exception names)
def limit_handled(cursor, list_name):
  while True:
    try:
      yield cursor.next()
    # Stop cleanly once the cursor is exhausted; a StopIteration that escapes
    # a generator is turned into a RuntimeError by Python 3.7+
    except StopIteration:
      return
    # Catch Twitter API rate limit exception and wait for 15 minutes
    except tweepy.RateLimitError:
      print("\nData points in list = {}".format(len(list_name)))
      print('Hit Twitter API rate limit.')
      for i in range(3, 0, -1):
        print("Wait for {} mins.".format(i * 5))
        time.sleep(5 * 60)
    # Catch any other Twitter API exceptions
    except tweepy.error.TweepError:
      print('\nCaught TweepError exception' )

## **Get the account tweets**

First, we will import the previous dataset from the Market Similarities Analysis.

In [None]:
df = pd.read_csv('/content/gdrive/MyDrive/Final Project/Network Analysis/Market Similarities.csv')
df = df.iloc[:,[1,2,3,13,14]]
df.head()

Unnamed: 0,Brand,Brand Owner,Twitter,Partition,Centrality
0,Cartier,Richemont SA,Cartier,0,0.538462
1,Van Cleef & Arpels,Richemont SA,vancleefarpels,1,0.615385
2,Burberry,Richemont SA,Burberry,1,0.487179
3,Baume & Mercier,Richemont SA,baumeetmercier,1,0.487179
4,IWC,Richemont SA,IWC,1,0.487179


Here is the list of Twitter handles we will send to the API to retrieve each brand's tweets.

In [None]:
Brands = list(df['Twitter'])
Brands

['Cartier',
 'vancleefarpels',
 'Burberry',
 'baumeetmercier',
 'IWC',
 'jaegerlecoultre',
 'Piaget',
 'RalphLauren',
 'Roger_Dubuis',
 'Vacheron1755',
 'montblanc_world',
 'alfreddunhill',
 'chloefashion',
 'petermillar',
 'Gucci',
 'YSL',
 'McQueen',
 'BALENCIAGA',
 'ulysse_nardin',
 'Boucheron',
 'GirardPerregaux',
 'jrwatches',
 'sergiorossi',
 'LouisVuitton',
 'Dior',
 'LoeweOfficial',
 'kenzo',
 'givenchy',
 'marcjacobs',
 'Fendi',
 'EmilioPucci',
 'NKirkwoodLondon',
 'Guerlain',
 'BenefitBeauty',
 'Makeupforever',
 'TAGHeuer',
 'ZenithWatches',
 'Hublot',
 'Chaumet',
 'Bulgariofficial']

**The program below uses the Twitter API to retrieve as many tweets as possible for each brand.**

In [None]:
data = []

for Brand in Brands:
  # initialize a list to hold all the Tweets
  alltweets = []
  # make initial request for most recent tweets 
  # (200 is the maximum allowed count)
  new_tweets = api.user_timeline(screen_name=Brand, count=200)
  # save most recent tweets
  alltweets.extend(new_tweets)
  # skip accounts that return no tweets (alltweets[-1] would raise an IndexError)
  if not alltweets:
    continue
  # save the id of the oldest tweet less one to avoid duplication
  oldest = alltweets[-1].id - 1
  # keep grabbing tweets until there are no tweets left
  while len(new_tweets) > 0:
      print("getting tweets before %s" % (oldest))
      # all subsequent requests use the max_id param to prevent
      # duplicates
      new_tweets = api.user_timeline(screen_name=Brand, count=200, max_id=oldest)
      # save most recent tweets
      alltweets.extend(new_tweets)
      # update the id of the oldest tweet less one
      oldest = alltweets[-1].id - 1
      print("...%s tweets downloaded so far" % (len(alltweets)))
      ### END OF WHILE LOOP ###

  # transform the tweepy tweets into a list of rows, one per tweet
  outtweets = [[Brand, tweet.id_str, tweet.created_at, tweet.text, tweet.favorite_count,tweet.in_reply_to_screen_name, tweet.retweeted] for tweet in alltweets]
  data = data + outtweets

# build the DataFrame once, after all brands have been downloaded
dftweets = pd.DataFrame(data, columns=["brand","id","created_at","text","likes","in reply to","retweeted"])
print(dftweets["brand"].value_counts())

getting tweets before 1412441240122138627
...400 tweets downloaded so far
getting tweets before 1365576674172166146
...600 tweets downloaded so far
getting tweets before 1275885767064199167
...800 tweets downloaded so far
getting tweets before 1184016092827664383
...1000 tweets downloaded so far
getting tweets before 1092336559565750271
...1200 tweets downloaded so far
getting tweets before 989504953935908863
...1400 tweets downloaded so far
getting tweets before 921400482651832320
...1600 tweets downloaded so far
getting tweets before 865977593853620223
...1800 tweets downloaded so far
getting tweets before 819212291958636543
...2000 tweets downloaded so far
getting tweets before 761977776366882817
...2200 tweets downloaded so far
getting tweets before 703258914784092160
...2400 tweets downloaded so far
getting tweets before 641268058095415295
...2600 tweets downloaded so far
getting tweets before 575282862469087231
...2800 tweets downloaded so far
getting tweets before 49229245189876
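
The download loop pages backwards through a timeline: each request asks only for tweets whose id is at most `max_id`, and `max_id` is set to the oldest id seen so far minus one. The sketch below reproduces that pattern on a small in-memory list instead of the Twitter API. `FAKE_TIMELINE`, `fetch_page` and `fetch_all` are hypothetical names made up for this illustration; they are not part of tweepy.

```python
# Hypothetical stand-in for a timeline: tweet ids, newest first.
FAKE_TIMELINE = [110, 109, 107, 104, 103, 101, 100]


def fetch_page(count, max_id=None):
    """Return up to `count` ids that are <= max_id, newest first."""
    eligible = [t for t in FAKE_TIMELINE if max_id is None or t <= max_id]
    return eligible[:count]


def fetch_all(count=3):
    """Collect every id by repeatedly asking for tweets older than the oldest one seen."""
    collected = []
    page = fetch_page(count)
    while page:
        collected.extend(page)
        # oldest id seen so far, minus one, so the next page cannot repeat it
        oldest = collected[-1] - 1
        page = fetch_page(count, max_id=oldest)
    return collected


all_ids = fetch_all()
print(len(all_ids), "ids collected, oldest =", all_ids[-1])
```

Because the loop checks for an empty page before indexing into `collected`, an empty timeline simply yields an empty list.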

**Now let's have a look at our dataset!**

In [None]:
# Check the head of the dftweets
dftweets

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted
0,Cartier,1458479709046648839,2021-11-10 17:00:34,Immerse yourself in the studio of Damien Hirst with this 360° experience of his work on the “Cherry Blossoms” serie… https://t.co/C5R0nD0WOu,36,,False
1,Cartier,1458117376847880198,2021-11-09 17:00:47,"Cartier and Islamic Art: In Search of Modernity’, now open @madparisfr until 20th February 2022, highlights the inf… https://t.co/4SD8PihHC9",56,,False
2,Cartier,1458045858625302537,2021-11-09 12:16:36,"@Joeanglo Thank you for your response. May I kindly advise to contact the client relations center in Israel, so the… https://t.co/aD3OmaATRS",0,Joeanglo,False
3,Cartier,1457978062381932545,2021-11-09 07:47:12,"@tomoid Dear Mr. Thomas, thank you for taking the time to contact us and sincerely regret the disappointment you ha… https://t.co/6EpStIjZdV",0,tomoid,False
4,Cartier,1457766124997783556,2021-11-08 17:45:02,"Un voyage depuis les origines sacrées du #parfum, de l’Egypte Antique à Rome en passant par l’Arabie, jusqu’à ses u… https://t.co/czQr0qaShY",26,Cartier,False
...,...,...,...,...,...,...,...
106370,Bulgariofficial,22829316765,2010-09-02 20:57:41,Check out Jessica Alba wearing #Bulgari at the Uomo Vogue Dinner during the Venice Film Festival: http://ow.ly/i/3zBP,0,,False
106371,Bulgariofficial,22732640878,2010-09-01 19:00:18,@hautelivingmag loves the #Bulgari Skincare spa experience in Dallas: http://ow.ly/2xzd9,0,HauteLivingMag,False
106372,Bulgariofficial,22641123692,2010-08-31 19:00:21,@rzrachelzoe and @MRBRADGORESKI stay at the #Bulgari Hotel while taking on Milan Fashion Week. Check it out tonight at 10PM on Bravo!,0,RachelZoe,False
106373,Bulgariofficial,22523955102,2010-08-30 13:28:47,Gugu Mbatha-Raw dazzles at the Emmys in a #Bulgari platinum/diamond bracelet and pink quartz/diamond white gold earrings http://ow.ly/i/3vrR,0,,False




> More than 100000 rows of Tweets to analyze!



In [None]:
dftweets['brand'].value_counts()

Gucci              3250
IWC                3250
montblanc_world    3250
Dior               3250
McQueen            3250
TAGHeuer           3250
LouisVuitton       3250
Fendi              3250
Hublot             3249
marcjacobs         3249
Piaget             3248
baumeetmercier     3247
Guerlain           3245
LoeweOfficial      3244
jaegerlecoultre    3243
petermillar        3242
BenefitBeauty      3241
Bulgariofficial    3240
Burberry           3237
Vacheron1755       3226
RalphLauren        3225
vancleefarpels     3222
GirardPerregaux    3207
sergiorossi        3199
Makeupforever      3168
jrwatches          3139
ulysse_nardin      2917
Cartier            2914
Roger_Dubuis       1967
ZenithWatches      1950
Boucheron          1927
NKirkwoodLondon    1904
alfreddunhill      1825
YSL                1792
EmilioPucci        1675
givenchy           1518
Chaumet            1043
kenzo               737
chloefashion        121
BALENCIAGA           14
Name: brand, dtype: int64

Now let's clean our text column:

In [None]:
# Clean the text

# Create a function to clean the tweets

def cleanTxt(text):
  text = re.sub(r'@[A-Za-z0-9_]+', '', text) # removing @mentions
  text = re.sub(r'#', '', text) # removing the '#' symbol
  text = re.sub(r'RT[\s]+', '', text) # removing RT
  text = re.sub(r'https?:\/\/\S+', '', text) # removing hyperlinks
  text = text.lower() # make text lowercase
  text = re.sub(r'\[.*?\]', '', text) # removing text within brackets
  text = re.sub(r'\(.*?\)', '', text) # removing text within parentheses
  text = re.sub(r'\w*\d\w*', '', text) # removing every word that contains a digit
  text = re.sub(r'\s+', ' ', text) # collapse any run of whitespace (newlines included) into one space
  text = re.sub(r'\n', ' ', text) # no effect after the previous line, kept as a safeguard
  text = re.sub(r'"+', '', text) # removing any quotes
  text = re.sub(r'(&amp;)', '', text) # removing &amp;
  text = re.sub('[%s]' % re.escape(string.punctuation), '', text) # get rid of all ASCII punctuation
  text = re.sub(r'(httptco)', '', text) # getting rid of `httptco`
  text = re.sub(r'[^\w\s]', '',text) # remove remaining non-word symbols (e.g. curly quotes, °)

  return text

# Cleaning the text
dftweets['text'] = dftweets['text'].apply(cleanTxt)

# Show the cleaned text
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted
0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False
1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False
2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False
3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False
4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False
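
To see what the cleaning does to a single tweet, here is a reduced sketch that keeps only a few of the steps (mentions, `#`, links, lowercase, ASCII punctuation, whitespace). `clean_tweet` is a hypothetical helper written for this illustration; unlike the notebook's function it also strips leading and trailing spaces and skips the digit and bracket rules.

```python
import re
import string


def clean_tweet(text):
    """Reduced cleaning pipeline: mentions, '#', links, case, punctuation, whitespace."""
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)   # drop @mentions
    text = re.sub(r'#', '', text)                 # keep the hashtag word, drop '#'
    text = re.sub(r'https?://\S+', '', text)      # drop links
    text = text.lower()
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop ASCII punctuation
    text = re.sub(r'\s+', ' ', text).strip()      # collapse whitespace, trim the ends
    return text


sample = "@hautelivingmag loves the #Bulgari Skincare spa experience in Dallas: http://ow.ly/2xzd9"
cleaned = clean_tweet(sample)
print(cleaned)
```

The order matters: links are removed before punctuation, otherwise `http://ow.ly/2xzd9` would be reduced to a meaningless token such as `httpowly2xzd9` instead of disappearing.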


Time to compute the subjectivity and polarity of each tweet:

## **Sentiment Analysis**

### **Number of words and text length**

In [None]:
dftweets['text_len'] = dftweets['text'].apply(lambda x: len(str(x).split()))
dftweets

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,Subjectivity,Polarity,text_len
0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False,0.000000,0.00,19
1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False,0.500000,0.00,15
2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False,0.500000,0.25,19
3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False,0.450000,-0.05,19
4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False,0.000000,0.00,20
...,...,...,...,...,...,...,...,...,...,...
106370,Bulgariofficial,22829316765,2010-09-02 20:57:41,check out jessica alba wearing bulgari at the uomo vogue dinner during the venice film festival,0,,False,0.000000,0.00,16
106371,Bulgariofficial,22732640878,2010-09-01 19:00:18,loves the bulgari skincare spa experience in dallas,0,HauteLivingMag,False,0.000000,0.00,8
106372,Bulgariofficial,22641123692,2010-08-31 19:00:21,and stay at the bulgari hotel while taking on milan fashion week check it out tonight at on bravo,0,RachelZoe,False,0.000000,0.00,19
106373,Bulgariofficial,22523955102,2010-08-30 13:28:47,gugu mbatharaw dazzles at the emmys in a bulgari platinumdiamond bracelet and pink quartzdiamond white gold earrings,0,,False,0.150000,-0.05,17


In [None]:
# Rename text_len to nb_words, since that column holds the number of words
dftweets.rename(columns={"text_len": "nb_words"}, inplace=True)
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,Subjectivity,Polarity,nb_words
0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False,0.0,0.0,19
1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False,0.5,0.0,15
2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False,0.5,0.25,19
3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False,0.45,-0.05,19
4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False,0.0,0.0,20


In [None]:
# Set text_len to the number of characters in the cleaned text
dftweets['text_len'] = dftweets['text'].astype(str).apply(len)
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,Subjectivity,Polarity,nb_words,text_len
0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False,0.0,0.0,19,110
1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False,0.5,0.0,15,91
2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False,0.5,0.25,19,105
3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False,0.45,-0.05,19,107
4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False,0.0,0.0,20,110


### **Subjectivity and Polarity with TextBlob**

In [None]:
# Create a function to get the subjectivity
def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

# Create a function to get the polarity
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

# Create two new columns
dftweets['Subjectivity'] = dftweets['text'].apply(getSubjectivity)
dftweets['Polarity'] = dftweets['text'].apply(getPolarity)

# Show the new dataframe with the new columns
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,Subjectivity,Polarity
0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False,0.0,0.0
1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False,0.5,0.0
2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False,0.5,0.25
3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False,0.45,-0.05
4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False,0.0,0.0


In [None]:
dftweets.head()

Unnamed: 0.1,Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,Subjectivity,Polarity,nb_words,text_len
0,0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False,0.0,0.0,19,110
1,1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False,0.5,0.0,15,91
2,2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False,0.5,0.25,19,105
3,3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False,0.45,-0.05,19,107
4,4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False,0.0,0.0,20,110


### **Polarity and Intensity with VADER**

In [None]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


In [None]:
sentiment = dftweets['text'].apply(lambda x: analyzer.polarity_scores(str(x)))
dftweets = pd.concat([dftweets, sentiment.apply(pd.Series)], axis=1)  # pass axis by keyword; the positional form is deprecated

In [None]:
dftweets.rename(columns={'Subjectivity':'subjectivity', 'Polarity':'polarity', 'neg':'negative',
                         'neu':'neutral', 'pos':'positive'}, inplace=True)
dftweets.head()

Unnamed: 0.1,Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound
0,0,Cartier,1458479709046648839,2021-11-10 17:00:34,immerse yourself in the studio of damien hirst with this experience of his work on the cherry blossoms serie,36,,False,0.0,0.0,19,110,0.0,1.0,0.0,0.0
1,1,Cartier,1458117376847880198,2021-11-09 17:00:47,cartier and islamic art in search of modernity now open until february highlights the inf,56,,False,0.5,0.0,15,91,0.0,1.0,0.0,0.0
2,2,Cartier,1458045858625302537,2021-11-09 12:16:36,thank you for your response may i kindly advise to contact the client relations center in israel so the,0,Joeanglo,False,0.5,0.25,19,105,0.0,0.737,0.263,0.6908
3,3,Cartier,1457978062381932545,2021-11-09 07:47:12,dear mr thomas thank you for taking the time to contact us and sincerely regret the disappointment you ha,0,tomoid,False,0.45,-0.05,19,107,0.205,0.438,0.357,0.5423
4,4,Cartier,1457766124997783556,2021-11-08 17:45:02,un voyage depuis les origines sacrées du parfum de legypte antique à rome en passant par larabie jusquà ses u,26,Cartier,False,0.0,0.0,20,110,0.0,1.0,0.0,0.0
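
VADER returns the shares of negative, neutral and positive tokens plus a `compound` score between -1 and 1. A common convention, suggested in the VADER documentation, is to call a text positive when `compound >= 0.05` and negative when `compound <= -0.05`. The sketch below applies that rule to three compound values taken from the table above; `label_compound` is a hypothetical helper, and the notebook itself does not use such labels.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Compound scores of rows 0, 2 and 3 in the table above
scores = [0.0, 0.6908, 0.5423]
labels = [label_compound(s) for s in scores]
print(labels)
```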


### **Save CSV File**

In [None]:
dftweets.to_csv('Tweet_Posts.csv')

In [None]:
dftweets = pd.read_csv('/content/gdrive/MyDrive/Final Project/NLP/Tweet_Posts.csv')

## **Set the Social Network**

### **Set Network Dataset**

### **Install needed libraries**

In [None]:
%%capture
# import essential libraries (the %%capture cell magic has to be the first line of the cell)

import networkx as nx  #for the manipulation of networks 
import numpy as np  #for useful maths functions
import pandas as pd  #for the manipulation of dataframes 
import seaborn as sns  #for visualization
import matplotlib.pyplot as plt  #for visualization
from scipy import sparse  #for high-level functions
import community.community_louvain as community_louvain  #community detection inside networks
from sklearn.metrics.pairwise import cosine_distances  #cosine distance between two variables
sns.set(color_codes=True, rc={'figure.figsize':(10,8)})  #set seaborn
sns.set()

In [None]:
%%capture
# install datashader (the %%capture cell magic has to be the first line of the cell)

!pip install -qq datashader

In [None]:
# import the network visualization libraries and backend

import holoviews as hv
from holoviews import opts
hv.extension('bokeh')
from bokeh.plotting import show
kwargs = dict(width=800, height=800, xaxis=None, yaxis=None)
opts.defaults(opts.Nodes(**kwargs), opts.Graph(**kwargs))
from holoviews.operation.datashader import datashade, bundle_graph

Output hidden; open in https://colab.research.google.com to view.

### **Set the Dataset and Scale values**

In [None]:
# Set the Network Dataset
dftweets_net = dftweets.iloc[:,[1,8,9,10,11,12,13,14,15]]
dftweets_net.head()

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound
0,Cartier,0.0,0.0,19,110,0.0,1.0,0.0,0.0
1,Cartier,0.5,0.0,15,91,0.0,1.0,0.0,0.0
2,Cartier,0.5,0.25,19,105,0.0,0.737,0.263,0.6908
3,Cartier,0.45,-0.05,19,107,0.205,0.438,0.357,0.5423
4,Cartier,0.0,0.0,20,110,0.0,1.0,0.0,0.0
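
Selecting columns by position with `iloc` depends on the exact column order, which shifts whenever an index column is written to or read from the CSV. As a sketch of a more robust alternative, the columns can be selected by name. The toy frame below only mimics the column names of the tweet dataset; its values are made up.

```python
import pandas as pd

# Toy frame with the same column names as the tweet dataset (values are made up)
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "brand": ["Cartier", "Dior"],
    "id": [1, 2],
    "subjectivity": [0.5, 0.4],
    "polarity": [0.25, 0.1],
    "nb_words": [19, 15],
    "text_len": [105, 91],
    "negative": [0.0, 0.0],
    "neutral": [0.737, 1.0],
    "positive": [0.263, 0.0],
    "compound": [0.6908, 0.0],
})

feature_cols = ["subjectivity", "polarity", "nb_words", "text_len",
                "negative", "neutral", "positive", "compound"]

# Keep the brand plus the numeric features, whatever their position in the frame
toy_net = toy[["brand"] + feature_cols]
print(list(toy_net.columns))
```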


In [None]:
# Get the mean of each variable for all Brands
dftweets_net = dftweets_net.groupby(['brand']).mean()
dftweets_net.reset_index(inplace=True)
dftweets_net.head(5)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound
0,BALENCIAGA,0.0,0.0,8.785714,68.071429,0.0,1.0,0.0,0.0
1,BenefitBeauty,0.410326,0.232698,8.416538,48.190682,0.026861,0.616965,0.258678,0.310477
2,Boucheron,0.365159,0.179372,13.162429,83.83809,0.013025,0.826264,0.158117,0.281332
3,Bulgariofficial,0.37389,0.203347,13.314198,85.194136,0.00908,0.788926,0.201688,0.371378
4,Burberry,0.276546,0.107133,13.772938,89.389249,0.006502,0.912383,0.081116,0.158115


In [None]:
# Scale values

from sklearn.preprocessing import MinMaxScaler

scl = MinMaxScaler()

data_num = scl.fit_transform(dftweets_net.iloc[:,1:])
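
With its default settings, `MinMaxScaler` rescales each column to the range [0, 1] using x' = (x - min) / (max - min). Without this step, large-valued columns such as `text_len` would dominate the cosine distance computed next. The sketch below reimplements the formula with NumPy on made-up numbers, only to make the arithmetic visible; `min_max_scale` is a hypothetical helper, not the scikit-learn implementation.

```python
import numpy as np


def min_max_scale(X):
    """Rescale every column of X to [0, 1] with (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # constant column: avoid dividing by zero
    return (X - col_min) / col_range


# Two made-up features on very different scales
X = [[0.00, 68.0],
     [0.41, 48.0],
     [0.37, 84.0]]
scaled = min_max_scale(X)
print(scaled.round(3))
```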

### **Reiterate Market Network Methodology**

In [None]:
# Calculate distances into a square matrix
dist = cosine_distances(data_num,data_num)

dist

array([[0.        , 1.        , 0.57379533, ..., 0.77410171, 0.58051965,
        0.55349283],
       [1.        , 0.        , 0.2043349 , ..., 0.0349988 , 0.17694795,
        0.26548958],
       [0.57379533, 0.2043349 , 0.        , ..., 0.1044732 , 0.01399381,
        0.00964329],
       ...,
       [0.77410171, 0.0349988 , 0.1044732 , ..., 0.        , 0.10002923,
        0.14308051],
       [0.58051965, 0.17694795, 0.01399381, ..., 0.10002923, 0.        ,
        0.04477514],
       [0.55349283, 0.26548958, 0.00964329, ..., 0.14308051, 0.04477514,
        0.        ]])
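
Cosine distance is one minus the cosine of the angle between two feature vectors, so it compares the shape of two brand profiles rather than their magnitude. Since the scaled features are all non-negative, the values here fall between 0 (same direction) and 1 (orthogonal profiles). The sketch below computes the same quantity by hand with NumPy on toy vectors; `cosine_distance_matrix` is a hypothetical helper and assumes no row is all zeros.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / norms              # assumes no all-zero row
    return 1.0 - unit @ unit.T    # 1 - cosine similarity


X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
D = cosine_distance_matrix(X)
print(D.round(4))
```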

In [None]:
# calculate a cutoff (for a less crowded network)
perc = np.percentile(1-dist, 60)

perc

0.9617611045842915
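
Note that `np.percentile(1-dist, 60)` is taken over every entry of the matrix, so the diagonal (each brand's similarity of 1 with itself) and both copies of every pair are included. As a sketch of an alternative, not the method used in this notebook, the cutoff can be computed on the unique off-diagonal pairs only. `similarity_cutoff` is a hypothetical helper and the matrix is made up.

```python
import numpy as np


def similarity_cutoff(sim, q=60):
    """q-th percentile of the similarities over unique pairs (upper triangle, no diagonal)."""
    sim = np.asarray(sim, dtype=float)
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    return np.percentile(sim[rows, cols], q)


sim = np.array([[1.0, 0.2, 0.9],
                [0.2, 1.0, 0.5],
                [0.9, 0.5, 1.0]])

cutoff_pairs = similarity_cutoff(sim)   # pairs only
cutoff_all = np.percentile(sim, 60)     # all entries, diagonal included
print(cutoff_pairs, cutoff_all)
```

On this toy matrix the diagonal pulls the cutoff upwards, which means fewer edges survive than with the pairs-only version.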

In [None]:
# create NW
G = nx.from_numpy_array(1-dist)

In [None]:
# add names
attributes_dict=dftweets_net.T.to_dict()

attributes_dict

{0: {'brand': 'BALENCIAGA',
  'compound': 0.0,
  'nb_words': 8.785714285714286,
  'negative': 0.0,
  'neutral': 1.0,
  'polarity': 0.0,
  'positive': 0.0,
  'subjectivity': 0.0,
  'text_len': 68.07142857142857},
 1: {'brand': 'BenefitBeauty',
  'compound': 0.31047741437827736,
  'nb_words': 8.416538105522987,
  'negative': 0.026860536871335973,
  'neutral': 0.6169654427645802,
  'polarity': 0.2326975220461031,
  'positive': 0.2586781857451408,
  'subjectivity': 0.4103261988077637,
  'text_len': 48.19068188830608},
 2: {'brand': 'Boucheron',
  'compound': 0.2813320186818888,
  'nb_words': 13.162428645563052,
  'negative': 0.013025428126621703,
  'neutral': 0.8262641411520519,
  'polarity': 0.17937204734650783,
  'positive': 0.1581172807472756,
  'subjectivity': 0.365158959221436,
  'text_len': 83.83809029579658},
 3: {'brand': 'Bulgariofficial',
  'compound': 0.3713784876543216,
  'nb_words': 13.314197530864197,
  'negative': 0.00907962962962963,
  'neutral': 0.7889256172839482,
  'pola

In [None]:
# Set nodes attributes
nx.set_node_attributes(G, attributes_dict)

In [None]:
print(G)  # nx.info() was removed in NetworkX 3.0; printing the graph gives the same summary
# Get rid of low-weight edges
G_sub = nx.edge_subgraph(G, [(u,v) for u,v,d in G.edges(data=True) if d['weight'] > perc])
print(G_sub)

Graph with 40 nodes and 819 edges
Graph with 40 nodes and 340 edges


In [None]:
# Compute degree centrality and store it as a node attribute
centrality_dgr = nx.degree_centrality(G_sub)
nx.set_node_attributes(G_sub, centrality_dgr, 'centrality_dgr')

In [None]:
G_sub.nodes[0]['centrality_dgr']

0.05128205128205128
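
NetworkX defines degree centrality as a node's degree divided by n - 1, and a self-loop adds 2 to the degree. The value above equals 2/39, which would be consistent with node 0 having only a self-loop, since `from_numpy_array(1-dist)` turns the diagonal of ones into self-loops. That reading is an inference from the numbers, not something checked in the notebook. The sketch below recomputes the measure in plain Python on a made-up edge list; `degree_centrality` here is a hypothetical re-implementation, not the NetworkX function.

```python
def degree_centrality(n_nodes, edges):
    """Degree / (n - 1) for every node; a self-loop contributes 2 to the degree."""
    degree = {node: 0 for node in range(n_nodes)}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1  # when u == v this is the second increment for the same node
    return {node: d / (n_nodes - 1) for node, d in degree.items()}


# 40 nodes: node 0 has only a self-loop, nodes 1 and 2 share one ordinary edge
edges = [(0, 0), (1, 2)]
centrality = degree_centrality(40, edges)
print(centrality[0], centrality[1])
```

If self-loops are not wanted, they can be dropped before computing the metrics, for example with `G.remove_edges_from(nx.selfloop_edges(G))`.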

### **Plot and Analyze the Social Network**

In [None]:
# identify communities (optional)
partition = community_louvain.best_partition(G_sub)
nx.set_node_attributes(G_sub, partition, 'partition')

In [None]:
position = nx.spring_layout(G_sub)
graph = hv.Graph.from_networkx(G_sub, position).opts(
    tools=['hover'],
    edge_alpha=0.15,
    node_size=13,
    node_color='partition', cmap='Set1',
    legend_position='right'
)

labels = hv.Labels(graph.nodes, ['x', 'y'])

show(hv.render((graph * labels.opts(text_font_size='0pt', text_color='black', xoffset=-0.01, 
                                    yoffset=-0.04, bgcolor='white', padding=0.2))))

In [None]:
# Let's use bundle_graph for a better visual

from holoviews.operation.datashader import datashade, bundle_graph
bundled = bundle_graph(graph)
show(hv.render(bundled))

In [None]:
# Network structure metrics
print(nx.density(G_sub))
print(nx.transitivity(G_sub))

0.4358974358974359
0.7879387938793879
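
For an undirected graph, density is 2m / (n(n - 1)), where m is the number of edges and n the number of nodes. With n = 40 and m = 340 this gives the 0.436 printed above. The edge count includes self-loops, so the sketch below also shows the value without them, assuming each of the 40 nodes kept its self-loop after the cutoff (an assumption, not verified in the notebook). `graph_density` is a hypothetical helper.

```python
def graph_density(n_nodes, n_edges):
    """Density of an undirected graph: 2m / (n (n - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


with_loops = graph_density(40, 340)          # edge count as reported for G_sub
without_loops = graph_density(40, 340 - 40)  # assuming 40 of those edges are self-loops
print(round(with_loops, 4), round(without_loops, 4))
```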


## **Output Dataset with Partition and Centrality**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Get Partition and Centrality values
partition_values = [partition.get(node) for node in G_sub.nodes]

centrality_dgr = nx.degree_centrality(G_sub)
centrality_values = [centrality_dgr.get(node) for node in G_sub.nodes]

In [None]:
# Put Partition and Centrality values into the DataFrame
dftweets_net['Partition'] = partition_values
dftweets_net['Centrality'] = centrality_values

# Inspect
dftweets_net.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40 entries, 0 to 39
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   brand         40 non-null     object 
 1   subjectivity  40 non-null     float64
 2   polarity      40 non-null     float64
 3   nb_words      40 non-null     float64
 4   text_len      40 non-null     float64
 5   negative      40 non-null     float64
 6   neutral       40 non-null     float64
 7   positive      40 non-null     float64
 8   compound      40 non-null     float64
 9   Partition     40 non-null     int64  
 10  Centrality    40 non-null     float64
dtypes: float64(9), int64(1), object(1)
memory usage: 3.6+ KB
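
Assigning the partition and centrality lists directly works because the nodes of the graph are numbered 0 to 39 in the same order as the rows of `dftweets_net`. A sketch of a variant that does not rely on that ordering is to look each row up by its node id with `Index.map`. The dictionaries and the frame below are made up for the illustration.

```python
import pandas as pd

# Node id -> value, deliberately not in row order
partition = {2: 1, 0: 0, 1: 1}
centrality = {2: 0.5, 0: 1.0, 1: 0.5}

toy_net = pd.DataFrame({"brand": ["A", "B", "C"]})

# Row i receives the value stored under node id i, whatever the dict order
toy_net["Partition"] = toy_net.index.map(partition)
toy_net["Centrality"] = toy_net.index.map(centrality)
print(toy_net)
```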


In [None]:
dftweets_net

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
0,BALENCIAGA,0.0,0.0,8.785714,68.071429,0.0,1.0,0.0,0.0,0,0.051282
1,BenefitBeauty,0.410326,0.232698,8.416538,48.190682,0.026861,0.616965,0.258678,0.310477,3,0.076923
2,Boucheron,0.365159,0.179372,13.162429,83.83809,0.013025,0.826264,0.158117,0.281332,2,0.692308
3,Bulgariofficial,0.37389,0.203347,13.314198,85.194136,0.00908,0.788926,0.201688,0.371378,3,0.615385
4,Burberry,0.276546,0.107133,13.772938,89.389249,0.006502,0.912383,0.081116,0.158115,4,0.282051
5,Cartier,0.401336,0.212731,14.520247,87.800961,0.024072,0.780746,0.195185,0.364745,3,0.615385
6,Chaumet,0.401652,0.211375,15.055609,94.917546,0.005395,0.830103,0.163543,0.350687,2,0.589744
7,Dior,0.386153,0.16592,16.273231,100.400923,0.007811,0.878862,0.113327,0.255047,4,0.435897
8,EmilioPucci,0.271018,0.148328,11.801194,73.260299,0.009131,0.867853,0.122426,0.217212,2,0.538462
9,Fendi,0.448269,0.20065,14.635077,94.534462,0.009254,0.843227,0.147521,0.300425,2,0.615385


In [None]:
dftweets_net.groupby('Partition').mean()

Unnamed: 0_level_0,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Centrality
Partition,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
0,0.0,0.0,8.785714,68.071429,0.0,1.0,0.0,0.0,0.051282
1,0.220635,0.033463,15.78,102.542154,0.029714,0.922061,0.047606,0.041902,0.051282
2,0.360607,0.185352,13.209806,84.268049,0.010861,0.835212,0.152521,0.27695,0.597756
3,0.413201,0.215567,12.801953,77.055008,0.018193,0.744998,0.221479,0.359604,0.382051
4,0.284709,0.128334,14.166863,90.043504,0.009818,0.904502,0.083488,0.161358,0.32906


In [None]:
dftweets_net.sort_values('Centrality', ascending=False).head(10)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
21,RalphLauren,0.315262,0.162572,14.644651,91.149767,0.011778,0.855247,0.132668,0.256128,4,0.717949
2,Boucheron,0.365159,0.179372,13.162429,83.83809,0.013025,0.826264,0.158117,0.281332,2,0.692308
27,alfreddunhill,0.355883,0.185807,12.81589,77.623562,0.014635,0.846149,0.138667,0.239435,2,0.666667
13,Hublot,0.389002,0.21051,13.124654,87.212989,0.009817,0.827636,0.161932,0.290553,2,0.666667
20,Piaget,0.436399,0.22473,14.1367,90.006466,0.006919,0.802239,0.188685,0.365919,3,0.641026
14,IWC,0.356602,0.216171,14.195692,90.34,0.01128,0.835186,0.143994,0.277317,2,0.641026
26,ZenithWatches,0.386291,0.207516,14.952308,90.910769,0.012081,0.816235,0.169634,0.295175,2,0.641026
22,Roger_Dubuis,0.395678,0.214028,13.289273,83.902389,0.016767,0.824568,0.157142,0.273001,2,0.641026
3,Bulgariofficial,0.37389,0.203347,13.314198,85.194136,0.00908,0.788926,0.201688,0.371378,3,0.615385
5,Cartier,0.401336,0.212731,14.520247,87.800961,0.024072,0.780746,0.195185,0.364745,3,0.615385


In [None]:
dftweets_net.sort_values('compound', ascending=False).head(10)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
28,baumeetmercier,0.430847,0.243027,14.580536,91.480443,0.00789,0.781039,0.211071,0.421729,3,0.564103
23,TAGHeuer,0.461227,0.210255,15.181231,87.274462,0.035754,0.71058,0.252138,0.38457,3,0.128205
36,petermillar,0.401161,0.236951,12.291178,70.766811,0.015945,0.73112,0.23813,0.382898,3,0.25641
3,Bulgariofficial,0.37389,0.203347,13.314198,85.194136,0.00908,0.788926,0.201688,0.371378,3,0.615385
35,montblanc_world,0.450733,0.180192,14.572923,82.179385,0.023419,0.746284,0.228449,0.369975,3,0.384615
20,Piaget,0.436399,0.22473,14.1367,90.006466,0.006919,0.802239,0.188685,0.365919,3,0.641026
5,Cartier,0.401336,0.212731,14.520247,87.800961,0.024072,0.780746,0.195185,0.364745,3,0.615385
6,Chaumet,0.401652,0.211375,15.055609,94.917546,0.005395,0.830103,0.163543,0.350687,2,0.589744
17,Makeupforever,0.437421,0.234701,11.875316,70.996212,0.016478,0.75138,0.217934,0.343982,3,0.410256
12,Guerlain,0.298024,0.142528,14.602773,91.742681,0.008252,0.830799,0.160335,0.321182,2,0.615385


In [None]:
dftweets_net.sort_values('polarity', ascending=False).head(10)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
28,baumeetmercier,0.430847,0.243027,14.580536,91.480443,0.00789,0.781039,0.211071,0.421729,3,0.564103
36,petermillar,0.401161,0.236951,12.291178,70.766811,0.015945,0.73112,0.23813,0.382898,3,0.25641
17,Makeupforever,0.437421,0.234701,11.875316,70.996212,0.016478,0.75138,0.217934,0.343982,3,0.410256
1,BenefitBeauty,0.410326,0.232698,8.416538,48.190682,0.026861,0.616965,0.258678,0.310477,3,0.076923
20,Piaget,0.436399,0.22473,14.1367,90.006466,0.006919,0.802239,0.188685,0.365919,3,0.641026
14,IWC,0.356602,0.216171,14.195692,90.34,0.01128,0.835186,0.143994,0.277317,2,0.641026
22,Roger_Dubuis,0.395678,0.214028,13.289273,83.902389,0.016767,0.824568,0.157142,0.273001,2,0.641026
5,Cartier,0.401336,0.212731,14.520247,87.800961,0.024072,0.780746,0.195185,0.364745,3,0.615385
6,Chaumet,0.401652,0.211375,15.055609,94.917546,0.005395,0.830103,0.163543,0.350687,2,0.589744
13,Hublot,0.389002,0.21051,13.124654,87.212989,0.009817,0.827636,0.161932,0.290553,2,0.666667


In [None]:
dftweets_net.sort_values('subjectivity', ascending=False).head(10)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
23,TAGHeuer,0.461227,0.210255,15.181231,87.274462,0.035754,0.71058,0.252138,0.38457,3,0.128205
35,montblanc_world,0.450733,0.180192,14.572923,82.179385,0.023419,0.746284,0.228449,0.369975,3,0.384615
9,Fendi,0.448269,0.20065,14.635077,94.534462,0.009254,0.843227,0.147521,0.300425,2,0.615385
17,Makeupforever,0.437421,0.234701,11.875316,70.996212,0.016478,0.75138,0.217934,0.343982,3,0.410256
20,Piaget,0.436399,0.22473,14.1367,90.006466,0.006919,0.802239,0.188685,0.365919,3,0.641026
28,baumeetmercier,0.430847,0.243027,14.580536,91.480443,0.00789,0.781039,0.211071,0.421729,3,0.564103
1,BenefitBeauty,0.410326,0.232698,8.416538,48.190682,0.026861,0.616965,0.258678,0.310477,3,0.076923
16,LouisVuitton,0.403023,0.189163,15.308,98.551077,0.006528,0.885669,0.1078,0.226676,4,0.538462
6,Chaumet,0.401652,0.211375,15.055609,94.917546,0.005395,0.830103,0.163543,0.350687,2,0.589744
5,Cartier,0.401336,0.212731,14.520247,87.800961,0.024072,0.780746,0.195185,0.364745,3,0.615385


In [None]:
dftweets_net.sort_values('nb_words', ascending=False).head(10)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
7,Dior,0.386153,0.16592,16.273231,100.400923,0.007811,0.878862,0.113327,0.255047,4,0.435897
18,McQueen,0.220635,0.033463,15.78,102.542154,0.029714,0.922061,0.047606,0.041902,1,0.051282
11,Gucci,0.2994,0.118266,15.68,99.945846,0.011701,0.895176,0.093122,0.185475,4,0.282051
33,kenzo,0.348126,0.161114,15.609227,94.058345,0.014349,0.874418,0.105806,0.21527,4,0.461538
29,chloefashion,0.224872,0.067836,15.520661,98.132231,0.011165,0.940289,0.048554,0.075893,4,0.128205
16,LouisVuitton,0.403023,0.189163,15.308,98.551077,0.006528,0.885669,0.1078,0.226676,4,0.538462
23,TAGHeuer,0.461227,0.210255,15.181231,87.274462,0.035754,0.71058,0.252138,0.38457,3,0.128205
6,Chaumet,0.401652,0.211375,15.055609,94.917546,0.005395,0.830103,0.163543,0.350687,2,0.589744
26,ZenithWatches,0.386291,0.207516,14.952308,90.910769,0.012081,0.816235,0.169634,0.295175,2,0.641026
21,RalphLauren,0.315262,0.162572,14.644651,91.149767,0.011778,0.855247,0.132668,0.256128,4,0.717949


In [None]:
dftweets_net.sort_values('text_len', ascending=False).head(10)

Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
18,McQueen,0.220635,0.033463,15.78,102.542154,0.029714,0.922061,0.047606,0.041902,1,0.051282
7,Dior,0.386153,0.16592,16.273231,100.400923,0.007811,0.878862,0.113327,0.255047,4,0.435897
11,Gucci,0.2994,0.118266,15.68,99.945846,0.011701,0.895176,0.093122,0.185475,4,0.282051
16,LouisVuitton,0.403023,0.189163,15.308,98.551077,0.006528,0.885669,0.1078,0.226676,4,0.538462
29,chloefashion,0.224872,0.067836,15.520661,98.132231,0.011165,0.940289,0.048554,0.075893,4,0.128205
6,Chaumet,0.401652,0.211375,15.055609,94.917546,0.005395,0.830103,0.163543,0.350687,2,0.589744
9,Fendi,0.448269,0.20065,14.635077,94.534462,0.009254,0.843227,0.147521,0.300425,2,0.615385
33,kenzo,0.348126,0.161114,15.609227,94.058345,0.014349,0.874418,0.105806,0.21527,4,0.461538
12,Guerlain,0.298024,0.142528,14.602773,91.742681,0.008252,0.830799,0.160335,0.321182,2,0.615385
28,baumeetmercier,0.430847,0.243027,14.580536,91.480443,0.00789,0.781039,0.211071,0.421729,3,0.564103
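
The cells above repeat the same sort-and-head pattern once per metric. As a minimal sketch, the rankings could be produced in one pass with `DataFrame.nlargest`, which returns the same rows as `sort_values(..., ascending=False).head(n)`. The helper name `top_brands` and the `toy` frame are hypothetical stand-ins for illustration (values rounded from the tables above); with the real data you would pass `dftweets_net` instead.

```python
import pandas as pd

def top_brands(df, metrics, n=10):
    """Return a dict mapping each metric to the n rows with its largest values."""
    return {metric: df.nlargest(n, metric) for metric in metrics}

# Hypothetical toy stand-in for dftweets_net (values rounded from the tables above)
toy = pd.DataFrame({
    'brand': ['RalphLauren', 'Boucheron', 'baumeetmercier', 'TAGHeuer'],
    'compound': [0.256, 0.281, 0.422, 0.385],
    'Centrality': [0.718, 0.692, 0.564, 0.128],
})

rankings = top_brands(toy, ['Centrality', 'compound'], n=2)
for metric, table in rankings.items():
    # Show which brands lead on each metric
    print(metric, list(table['brand']))
```

Keeping the rankings in a dictionary makes it easy to compare which brands appear near the top on several metrics at once.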


In [None]:
# Mean of each numeric feature per partition (the column name is 'Partition')
dftweets_net.groupby('Partition').mean(numeric_only=True)

In [None]:
# Export the Dataset
dftweets_net.to_csv('NLP Similarities.csv')

In [None]:
ana = pd.read_csv('/content/drive/MyDrive/Final Project/3. Storytelling Network Analysis/NLP Similarities.csv')
ana.head()

Unnamed: 0.1,Unnamed: 0,brand,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Partition,Centrality
0,0,BALENCIAGA,0.0,0.0,8.785714,68.071429,0.0,1.0,0.0,0.0,0,0.051282
1,1,BenefitBeauty,0.410326,0.232698,8.416538,48.190682,0.026861,0.616965,0.258678,0.310477,3,0.076923
2,2,Boucheron,0.365159,0.179372,13.162429,83.83809,0.013025,0.826264,0.158117,0.281332,2,0.692308
3,3,Bulgariofficial,0.37389,0.203347,13.314198,85.194136,0.00908,0.788926,0.201688,0.371378,3,0.615385
4,4,Burberry,0.276546,0.107133,13.772938,89.389249,0.006502,0.912383,0.081116,0.158115,4,0.282051
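
The extra `Unnamed: 0` column in the reloaded table comes from the export step: by default `to_csv` writes the row index as a first column with no header, and `read_csv` then reads it back as a regular column. A minimal sketch of the difference, using an in-memory buffer and a hypothetical two-row `toy` frame instead of the real file:

```python
import io
import pandas as pd

# Hypothetical toy frame standing in for dftweets_net
toy = pd.DataFrame({'brand': ['BALENCIAGA', 'BenefitBeauty'], 'Partition': [0, 3]})

# Default export: the row index is written as an unnamed first column
buf_default = io.StringIO()
toy.to_csv(buf_default)
buf_default.seek(0)
with_index = pd.read_csv(buf_default)

# index=False: the row index is not written at all
buf_clean = io.StringIO()
toy.to_csv(buf_clean, index=False)
buf_clean.seek(0)
without_index = pd.read_csv(buf_clean)

print(list(with_index.columns))
print(list(without_index.columns))
```

If the file has already been written with the index, passing `index_col=0` to `pd.read_csv` is an alternative that restores it as the index rather than as a data column.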


In [None]:
ana.groupby('Partition').mean(numeric_only=True)

Partition,Unnamed: 0,subjectivity,polarity,nb_words,text_len,negative,neutral,positive,compound,Centrality
0,0.0,0.0,0.0,8.785714,68.071429,0.0,1.0,0.0,0.0,0.051282
1,18.0,0.220635,0.033463,15.78,102.542154,0.029714,0.922061,0.047606,0.041902,0.051282
2,19.5625,0.360607,0.185352,13.209806,84.268049,0.010861,0.835212,0.152521,0.27695,0.597756
3,20.5,0.413201,0.215567,12.801953,77.055008,0.018193,0.744998,0.221479,0.359604,0.382051
4,20.333333,0.284709,0.128334,14.166863,90.043504,0.009818,0.904502,0.083488,0.161358,0.32906
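
The means for partitions 0 and 1 match the individual rows for BALENCIAGA and McQueen exactly, which suggests those partitions contain a single brand each. Reporting the group size next to the mean makes this visible. The following is only a sketch using pandas named aggregation on a hypothetical `toy` frame (a few rows copied from the tables above); the output column names `n_brands` and `mean_centrality` are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical toy subset of the exported data
toy = pd.DataFrame({
    'brand': ['BALENCIAGA', 'McQueen', 'Boucheron', 'Hublot', 'Piaget'],
    'Partition': [0, 1, 2, 2, 3],
    'Centrality': [0.051282, 0.051282, 0.692308, 0.666667, 0.641026],
})

# Count brands and average centrality within each partition
summary = toy.groupby('Partition').agg(
    n_brands=('brand', 'count'),
    mean_centrality=('Centrality', 'mean'),
)
print(summary)
```

A partition with `n_brands` equal to 1 has a mean that is just that brand's own value, so it should be interpreted with more caution than the larger partitions.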
