## **Analyze Voice of Foundation**

In this notebook, we retrieve **L'Occitane Foundation tweets** and run sentiment analysis on them using the **TextBlob and VADER Python libraries**.

## **Introduction: Libraries and Credentials** 

In [None]:
# Import all needed libraries
import tweepy                   # Python wrapper around Twitter API
from google.colab import drive  # to mount Drive to Colab notebook

import pandas as pd
pd.set_option('display.max_colwidth', None) #to see more text
import json 
import csv
from datetime import date
from datetime import datetime
import time
import numpy as np
import re
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import seaborn as sns
sns.set()

from textblob import TextBlob
from wordcloud import WordCloud
import string
import itertools
from collections import Counter
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [None]:
# Connect Google Drive to Colab
drive.mount('/content/gdrive')
# Create a variable to store the data path on your drive
path = './gdrive/My Drive/path/to/data'

Mounted at /content/gdrive


In [None]:
# Twitter API credentials
# Replace the placeholders with your own keys; never share real secrets in a notebook
api_key = 'XXXXXXXXXXXXXXXX'
api_secret_key = 'XXXXXXXXXXXXXXXX'
access_token = 'XXXXXXXXXXXXXXXX'
access_token_secret = 'XXXXXXXXXXXXXXXX'
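
Keeping secrets out of the notebook itself avoids leaking them when the file is shared. A minimal sketch of one way to do that, assuming the four values are exported as environment variables; the variable names and the helper `load_twitter_credentials` are hypothetical and not part of Tweepy:

```python
import os

# Hypothetical environment variable names; adjust them to your own setup.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET_KEY",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)

def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned values can then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of string literals.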

In [None]:
# Connect to Twitter API using the secrets
auth = tweepy.OAuthHandler(api_key, api_secret_key)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

In [None]:
# Helper function to save data into a JSON file
# file_name: the file name of the data on Google Drive
# file_content: the data you want to save
def save_json(file_name, file_content):
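  # Add a '/' between the folder and the file name so the file lands inside the data folder
  with open(path.rstrip('/') + '/' + file_name, 'w', encoding='utf-8') as f: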
    json.dump(file_content, f, ensure_ascii=False, indent=4)

In [None]:
# Helper function to handle the Twitter API rate limit
# (RateLimitError and TweepError are Tweepy 3.x names; they were removed in Tweepy 4)
def limit_handled(cursor, list_name):
  while True:
    try:
      yield cursor.next()
    # Stop cleanly when the cursor is exhausted; a StopIteration that
    # escapes a generator becomes a RuntimeError in Python 3.7+
    except StopIteration:
      return
    # Catch Twitter API rate limit exception and wait for 15 minutes
    except tweepy.RateLimitError:
      print("\nData points in list = {}".format(len(list_name)))
      print('Hit Twitter API rate limit.')
      for i in range(3, 0, -1):
        print("Wait for {} mins.".format(i * 5))
        time.sleep(5 * 60)
    # Catch any other Twitter API exceptions
    except tweepy.error.TweepError:
      print('\nCaught TweepError exception' )
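
The wait-and-retry pattern used by this helper can be exercised without calling Twitter at all. The sketch below is an assumption-laden stand-in: `FakeRateLimitError` plays the role of `tweepy.RateLimitError`, and `limit_handled_sketch` and its `sleep` parameter are hypothetical names introduced so the pause can be replaced during testing.

```python
import time

class FakeRateLimitError(Exception):
    """Stand-in for tweepy.RateLimitError so the sketch runs without Tweepy."""

def limit_handled_sketch(iterator, rate_limit_exc=FakeRateLimitError,
                         wait_seconds=15 * 60, sleep=time.sleep):
    """Yield items from `iterator`, pausing and retrying on rate-limit errors."""
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # End the generator cleanly once the source is exhausted.
            return
        except rate_limit_exc:
            # Wait, then retry the same iterator.
            sleep(wait_seconds)
            continue
        yield item
```

Passing a no-op `sleep` makes it possible to check the retry logic in seconds rather than waiting out a real 15-minute window.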

In [None]:
# Define the Microsoft Translator helper function
import requests, uuid, json

def microsoft_translate(text):
  # Add your subscription key and endpoint
  subscription_key = "XXXXXXXXXXXXXXXX"
  endpoint = "https://api.cognitive.microsofttranslator.com"

  # Add your location, also known as region. The default is global.
  # This is required if using a Cognitive Services resource.
  location = "westeurope"

  path = '/translate'
  constructed_url = endpoint + path

  params = {
      'api-version': '3.0',
      'to': 'en'
  }

  headers = {
      'Ocp-Apim-Subscription-Key': subscription_key,
      'Ocp-Apim-Subscription-Region': location,
      'Content-type': 'application/json',
      'X-ClientTraceId': str(uuid.uuid4())
  }

  # You can pass more than one object in body.
  body = [{
      'text': str(text)
  }]

  request = requests.post(constructed_url, params=params, headers=headers, json=body)
  response = request.json()

  # print(json.dumps(response, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ': ')))

  return response[0]['translations'][0]['text']
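
The function above indexes `response[0]` directly, so an error reply from the service (for example, a bad subscription key) surfaces as an unhelpful `KeyError`. A minimal sketch of a more defensive parser, assuming a successful reply is a list of objects with a `translations` list and a failed reply is an object with an `error` key holding `code` and `message`; `extract_translation` is a hypothetical helper and both payloads are made up for illustration:

```python
def extract_translation(response):
    """Return the first translated string, or raise with the service's message."""
    if isinstance(response, dict) and "error" in response:
        err = response["error"]
        raise RuntimeError(
            f"Translation failed ({err.get('code')}): {err.get('message')}"
        )
    return response[0]["translations"][0]["text"]

# Illustrative payloads (made up for this sketch, not real API output).
ok_payload = [{"translations": [{"text": "Hello", "to": "en"}]}]
error_payload = {"error": {"code": 401000, "message": "Unauthorized"}}

print(extract_translation(ok_payload))
```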

## **Get the account tweets**

In [None]:
# List of Twitter accounts to collect (here, only the L'Occitane Foundation account)
Brands = ["Fdt_LOccitaneEN"]

In [None]:
data = []

for Brand in Brands:
  # initialize a list to hold all the Tweets
  alltweets = []
  # make initial request for most recent tweets 
  # (200 is the maximum allowed count)
  new_tweets = api.user_timeline(Brand,count=200)
  # save most recent tweets
  alltweets.extend(new_tweets)
  # save the id of the oldest tweet less one to avoid duplication
  oldest = alltweets[-1].id - 1
  # keep grabbing tweets until there are no tweets left
  while len(new_tweets) > 0:
      print("getting tweets before %s" % (oldest))
      # all subsequent requests use the max_id param to prevent
      # duplicates
      new_tweets = api.user_timeline(Brand,count=200,max_id=oldest)
      # save most recent tweets
      alltweets.extend(new_tweets)
      # update the id of the oldest tweet less one
      oldest = alltweets[-1].id - 1
      print("...%s tweets downloaded so far" % (len(alltweets)))
      ### END OF WHILE LOOP ###

  # transform the tweepy tweets into a list 
  # populate the list
  outtweets = [[Brand, tweet.id_str, tweet.created_at, tweet.text, tweet.favorite_count,tweet.in_reply_to_screen_name, tweet.retweeted] for tweet in alltweets]
  data = data + outtweets
  dftweets = pd.DataFrame(data, columns=["brand","id","created_at","text","likes","in reply to","retweeted"])
  print(dftweets["brand"].value_counts())

getting tweets before 1316321336042586113
...400 tweets downloaded so far
getting tweets before 1182321989542973440
...600 tweets downloaded so far
getting tweets before 996683247198330879
...734 tweets downloaded so far
getting tweets before 931562864413900799
...734 tweets downloaded so far
Fdt_LOccitaneEN    734
Name: brand, dtype: int64
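
The loop pages backwards through the timeline: each request asks only for tweets whose ID is at most `oldest`, and the loop ends when a page comes back empty, which is why the last two log lines report the same total. The sketch below reproduces that logic against an in-memory list of fake IDs; `fetch_page` and `collect_all` are hypothetical stand-ins for `api.user_timeline` and the loop, not Tweepy functions.

```python
def fetch_page(all_ids, count, max_id=None):
    """Return up to `count` IDs not greater than `max_id`, newest first."""
    eligible = [i for i in sorted(all_ids, reverse=True)
                if max_id is None or i <= max_id]
    return eligible[:count]

def collect_all(all_ids, count=200):
    """Page backwards through the fake timeline until a page comes back empty."""
    collected = []
    page = fetch_page(all_ids, count)
    while page:
        collected.extend(page)
        oldest = collected[-1] - 1          # step just below the oldest ID seen
        page = fetch_page(all_ids, count, max_id=oldest)
    return collected

timeline = list(range(1000, 1734))          # 734 fake tweet IDs
tweets = collect_all(timeline)
print(len(tweets))                          # 734
```

Checking the page before reading `collected[-1]` also means an account with no tweets yields an empty list instead of an `IndexError`.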


**Now let's have a look at our dataset!**

In [None]:
# Display the dftweets DataFrame
dftweets

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,The L'OCCITANE Foundation has been a member of the International Agency for the Prevention of Blindness since 2016.… https://t.co/cdSuKsWarz,2,,False
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,"In 2021-2022, @loccitaneusa and The L’OCCITANE Foundation are supporting the reforestation project of… https://t.co/PLLUyPDRqE",0,,False
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,"DAY 5 \nDiscover the testimony of Alexandra, founder of Apprendre Autrement, a company that works for the awakening… https://t.co/UbqJm2oBLU",0,,False
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,"DAY 4 \nDiscover the testimony of Sandrine, promoter of Doux goûts, a company that produces and markets fruit and ve… https://t.co/hR0eGGS525",0,,False
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,"DAY 3 Discover the testimony of Stéphanie, beneficiary and founder of Repère Magazine, a bilingual web platform for… https://t.co/DP3tEiwgwe",0,,False
...,...,...,...,...,...,...,...
729,Fdt_LOccitaneEN,933339871648911361,2017-11-22 14:22:20,Congratulations to @LOCCITANE_UK and @Sightsavers for this great partnership 🤝👁️! https://t.co/CIDzN9pHfe,3,,False
730,Fdt_LOccitaneEN,932604068484304896,2017-11-20 13:38:31,#WorldChildrensDay 😍🤟 https://t.co/ChrK7rV8Tc,1,,False
731,Fdt_LOccitaneEN,931568117167415298,2017-11-17 17:02:01,[#WorldPrematurityDay] @LOccitane_FR and the Foundation support #LionsClubsdeFrance to increase the number of eye… https://t.co/T7IGgmns8X,0,,False
732,Fdt_LOccitaneEN,931565272862986240,2017-11-17 16:50:43,#WorldPrematurityDay https://t.co/GuURfLgpoQ,0,,False


In [None]:
# How many tweets do we have by brand/market? 
dftweets['brand'].value_counts()

Fdt_LOccitaneEN    734
Name: brand, dtype: int64

Now let's **clean** our text column:

In [None]:
# Clean the text

# Create a function to clean the tweets

def cleanTxt(text):
  text = re.sub(r'@[A-Za-z0-9_]+', '', text) # Removing @mentions
  text = re.sub(r'#', '', text) # Removing the '#' symbol
  text = re.sub(r'RT[\s]+', '', text) # Removing RT
  text = re.sub(r'https?:\/\/\S+', '', text) # Removing hyperlinks
  text = text.lower() # make text lowercase
  text = re.sub(r'\[.*?\]', '', text) # removing text within brackets
  text = re.sub(r'\(.*?\)', '', text) # removing text within parentheses
  text = re.sub(r'\w*\d\w*', '', text) # removing words that contain digits
  text = re.sub(r'\s+', ' ', text) # collapse runs of whitespace into a single space
  text = re.sub(r'\n', ' ', text) # if there's a new line, then make it a whitespace
  text = re.sub(r'"+', '', text) # removing any quotes
  text = re.sub(r'(&amp;)', '', text) # removing &amp;
  text = re.sub('[%s]' % re.escape(string.punctuation), '', text) # Get rid of all punctuation
  text = re.sub(r'(httptco)', '', text) # getting rid of `httptco`
  text = re.sub(r'[^\w\s]', '', text) # remove remaining non-word characters (e.g. emoji)

  return text

# Cleaning the text
dftweets['text'] = dftweets['text'].apply(cleanTxt)

# Show the cleaned text
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,the loccitane foundation has been a member of the international agency for the prevention of blindness since,2,,False
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,in and the loccitane foundation are supporting the reforestation project of,0,,False
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,day discover the testimony of alexandra founder of apprendre autrement a company that works for the awakening,0,,False
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,day discover the testimony of sandrine promoter of doux goûts a company that produces and markets fruit and ve,0,,False
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,day discover the testimony of stéphanie beneficiary and founder of repère magazine a bilingual web platform for,0,,False
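
To see what the cleaning steps do to a single tweet, the sketch below applies the same kinds of substitutions, in the same order, to a made-up example. `clean_sketch` is a hypothetical, shortened copy of `cleanTxt` written only for this illustration.

```python
import re
import string

def clean_sketch(text):
    """Apply the main cleanTxt substitutions, in the same order."""
    text = re.sub(r'@[A-Za-z0-9_]+', '', text)      # mentions
    text = re.sub(r'#', '', text)                   # hash signs
    text = re.sub(r'RT[\s]+', '', text)             # retweet marker
    text = re.sub(r'https?:\/\/\S+', '', text)      # links
    text = text.lower()
    text = re.sub(r'\(.*?\)', '', text)             # parenthesised text
    text = re.sub(r'\w*\d\w*', '', text)            # tokens containing digits
    text = re.sub(r'\s+', ' ', text)                # collapse whitespace
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    return text

sample = "RT @user: Great #news (2021)! https://t.co/abc123"   # made-up tweet
cleaned = clean_sketch(sample)
print(repr(cleaned))       # repr() makes leftover spaces visible
print(cleaned.split())     # ['great', 'news']
```

Because whitespace is collapsed before punctuation is stripped, removing a punctuation mark can leave leading, trailing, or doubled spaces behind; calling `.strip()` or collapsing whitespace once more at the end would tidy these up.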


## **Sentiment Analysis**

### **Number of words and text length**

In [None]:
# Count the words in each cleaned tweet (stored as text_len for now; renamed to nb_words below)
dftweets['text_len'] = dftweets['text'].apply(lambda x: len(str(x).split()))
dftweets

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,text_len
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,the loccitane foundation has been a member of the international agency for the prevention of blindness since,2,,False,17
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,in and the loccitane foundation are supporting the reforestation project of,0,,False,11
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,day discover the testimony of alexandra founder of apprendre autrement a company that works for the awakening,0,,False,17
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,day discover the testimony of sandrine promoter of doux goûts a company that produces and markets fruit and ve,0,,False,19
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,day discover the testimony of stéphanie beneficiary and founder of repère magazine a bilingual web platform for,0,,False,17
...,...,...,...,...,...,...,...,...
729,Fdt_LOccitaneEN,933339871648911361,2017-11-22 14:22:20,congratulations to and for this great partnership,3,,False,7
730,Fdt_LOccitaneEN,932604068484304896,2017-11-20 13:38:31,worldchildrensday,1,,False,1
731,Fdt_LOccitaneEN,931568117167415298,2017-11-17 17:02:01,and the foundation support lionsclubsdefrance to increase the number of eye,0,,False,11
732,Fdt_LOccitaneEN,931565272862986240,2017-11-17 16:50:43,worldprematurityday,0,,False,1


In [None]:
# Rename the word-count column to nb_words
dftweets.rename(columns={"text_len": "nb_words"}, inplace=True)
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,nb_words
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,the loccitane foundation has been a member of the international agency for the prevention of blindness since,2,,False,17
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,in and the loccitane foundation are supporting the reforestation project of,0,,False,11
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,day discover the testimony of alexandra founder of apprendre autrement a company that works for the awakening,0,,False,17
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,day discover the testimony of sandrine promoter of doux goûts a company that produces and markets fruit and ve,0,,False,19
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,day discover the testimony of stéphanie beneficiary and founder of repère magazine a bilingual web platform for,0,,False,17


In [None]:
# Text length in characters (spaces included)
dftweets['text_len'] = dftweets['text'].astype(str).apply(len)
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,nb_words,text_len
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,the loccitane foundation has been a member of the international agency for the prevention of blindness since,2,,False,17,110
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,in and the loccitane foundation are supporting the reforestation project of,0,,False,11,77
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,day discover the testimony of alexandra founder of apprendre autrement a company that works for the awakening,0,,False,17,110
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,day discover the testimony of sandrine promoter of doux goûts a company that produces and markets fruit and ve,0,,False,19,111
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,day discover the testimony of stéphanie beneficiary and founder of repère magazine a bilingual web platform for,0,,False,17,112
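
The two columns measure different things: `nb_words` splits on whitespace, while `text_len` counts every character, spaces included. In the outputs of this notebook, `worldchildrensday` has 17 letters but a `text_len` of 19, which suggests that some whitespace survives cleaning. The sketch below illustrates the effect; `text_stats` is a hypothetical helper, and placing one space on each side of the word is an assumption.

```python
def text_stats(text):
    """Return (word count, character count) the way the two columns compute them."""
    return len(str(text).split()), len(str(text))

padded = " worldchildrensday "          # one space on each side (assumed)
print(text_stats(padded))               # (1, 19)
print(text_stats(padded.strip()))       # (1, 17)
```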


### **Subjectivity and Polarity with TextBlob**

In [None]:
# Create a function to get the subjectivity
def getSubjectivity(text):
  return TextBlob(text).sentiment.subjectivity

# Create a function to get the polarity
def getPolarity(text):
  return TextBlob(text).sentiment.polarity

# Create two new columns
dftweets['Subjectivity'] = dftweets['text'].apply(getSubjectivity)
dftweets['Polarity'] = dftweets['text'].apply(getPolarity)

# Show the new dataframe with the new columns
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,nb_words,text_len,Subjectivity,Polarity
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,the loccitane foundation has been a member of the international agency for the prevention of blindness since,2,,False,17,110,0.0,0.0
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,in and the loccitane foundation are supporting the reforestation project of,0,,False,11,77,0.25,0.25
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,day discover the testimony of alexandra founder of apprendre autrement a company that works for the awakening,0,,False,17,110,0.0,0.0
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,day discover the testimony of sandrine promoter of doux goûts a company that produces and markets fruit and ve,0,,False,19,111,0.0,0.0
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,day discover the testimony of stéphanie beneficiary and founder of repère magazine a bilingual web platform for,0,,False,17,112,0.0,0.0



### **Polarity and Intensity with VADER**

In [None]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...




In [None]:
# Score each tweet with VADER, then append the neg/neu/pos/compound columns
sentiment = dftweets['text'].apply(lambda x: analyzer.polarity_scores(str(x)))
dftweets = pd.concat([dftweets, sentiment.apply(pd.Series)], axis=1)

In [None]:
dftweets.rename(columns={'Subjectivity':'subjectivity', 'Polarity':'polarity', 'neg':'negative',
                         'neu':'neutral', 'pos':'positive'}, inplace=True)
dftweets.head()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,nb_words,text_len,subjectivity,polarity,negative,neutral,positive,compound
0,Fdt_LOccitaneEN,1504762089289756684,2022-03-18 10:10:13,the loccitane foundation has been a member of the international agency for the prevention of blindness since,2,,False,17,110,0.0,0.0,0.0,1.0,0.0,0.0
1,Fdt_LOccitaneEN,1503672746970132481,2022-03-15 10:01:34,in and the loccitane foundation are supporting the reforestation project of,0,,False,11,77,0.25,0.25,0.0,0.775,0.225,0.4404
2,Fdt_LOccitaneEN,1502945311219666952,2022-03-13 09:51:00,day discover the testimony of alexandra founder of apprendre autrement a company that works for the awakening,0,,False,17,110,0.0,0.0,0.0,1.0,0.0,0.0
3,Fdt_LOccitaneEN,1502582671624609794,2022-03-12 09:50:00,day discover the testimony of sandrine promoter of doux goûts a company that produces and markets fruit and ve,0,,False,19,111,0.0,0.0,0.0,1.0,0.0,0.0
4,Fdt_LOccitaneEN,1502215445570637826,2022-03-11 09:30:46,day discover the testimony of stéphanie beneficiary and founder of repère magazine a bilingual web platform for,0,,False,17,112,0.0,0.0,0.0,0.829,0.171,0.4767
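
The `compound` column is a single normalized score between -1 and 1, and it can disagree with TextBlob: row 4 above has a TextBlob polarity of 0.0 but a VADER compound of 0.4767. A common way to turn the compound score into a label is to use cut-offs of ±0.05, the convention suggested in the VADER documentation. The sketch below assumes those cut-offs; `label_compound` is a hypothetical helper, not part of NLTK.

```python
def label_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores taken from the outputs of this notebook.
for score in (0.0, 0.4404, 0.8682):
    print(score, label_compound(score))
```

Applied to the DataFrame, this would be `dftweets['compound'].apply(label_compound)`; the threshold is a tunable assumption rather than a fixed rule.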


### **Save CSV File**

In [None]:
dftweets.to_csv("/content/gdrive/MyDrive/Kedge Thesis: Voice of Stakeholders/Fondation/Foundation_Tweet_Posts.csv")

In [None]:
dftweets.tail()

Unnamed: 0,brand,id,created_at,text,likes,in reply to,retweeted,nb_words,text_len,subjectivity,polarity,negative,neutral,positive,compound
729,Fdt_LOccitaneEN,933339871648911361,2017-11-22 14:22:20,congratulations to and for this great partnership,3,,False,7,51,0.75,0.8,0.0,0.363,0.637,0.8682
730,Fdt_LOccitaneEN,932604068484304896,2017-11-20 13:38:31,worldchildrensday,1,,False,1,19,0.0,0.0,0.0,1.0,0.0,0.0
731,Fdt_LOccitaneEN,931568117167415298,2017-11-17 17:02:01,and the foundation support lionsclubsdefrance to increase the number of eye,0,,False,11,77,0.0,0.0,0.0,0.559,0.441,0.6486
732,Fdt_LOccitaneEN,931565272862986240,2017-11-17 16:50:43,worldprematurityday,0,,False,1,20,0.0,0.0,0.0,1.0,0.0,0.0
733,Fdt_LOccitaneEN,931562864413900800,2017-11-17 16:41:09,the loccitane foundation is now on twitter firsttweet,0,,False,8,53,0.0,0.0,0.0,1.0,0.0,0.0
