# **Voice of Customers**

We scrape **Trustpilot reviews** of the **L'Occitane en Provence** brand using the Requests and Beautiful Soup libraries:

Link: https://towardsdatascience.com/customer-reviews-identify-your-strengths-and-weaknesses-with-the-help-of-web-scraping-data-b87a3636ef55


Another interesting review website: https://www.resellerratings.com/store/L_Occitane/page/4

In [25]:
!pip install deep_translator -qq
!pip install googletrans -qq



In [30]:
#import the libraries
import os
import time
import re
import string

import numpy as np
import pandas as pd
import math

import requests
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

from deep_translator import GoogleTranslator
from googletrans import Translator
from textblob import TextBlob

In [32]:
summary = []
date = []
language = []
rating = []

for p in range(0,15):

    http = requests.get(f'https://ie.trustpilot.com/review/www.loccitane.co.uk?languages=all&page={p}')
    bsoup = BeautifulSoup(http.text, 'html.parser')

    for x in bsoup.find_all('p', class_ = "typography_typography__QgicV typography_body__9UBeQ typography_color-black__5LYEn typography_weight-regular__TWEnf typography_fontstyle-normal__kHyN3"):
      # Record the original (untranslated) review text
      language.append(x.text)
      # Translate text
      text_translated = GoogleTranslator('auto', 'en').translate(x.text)
      # Append final english text
      summary.append(text_translated)

    for x in bsoup.find_all('time', class_ = ""):
      date.append(x['datetime'])

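    for x in bsoup.find_all('img'):
      # Some <img> tags may have no alt attribute, so use .get() to avoid a KeyError
      alt = x.get('alt', '')
      if "Rated" in alt:
        # The alt text is assumed to look like "Rated 5 out of 5 stars"; index 6 is the star digit
        rating.append(alt[6])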

#putting everything together
reviews = pd.DataFrame(list(zip(language, summary, date, rating)), 
              columns = ['language','summary', 'date', 'rating'])

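# Keep only the calendar date (drop the time of day and the timezone)
reviews['date'] = pd.to_datetime(reviews['date'], utc=True).dt.tz_localize(None).dt.normalize()
reviews['rating'] = reviews['rating'].astype(int)
# sort_values returns a new DataFrame, so assign the result back
reviews = reviews.sort_values(by='date', ascending=False)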
reviews

|     | language | summary | date | rating |
| --- | --- | --- | --- | --- |
| 0 | I received polite and kind service from Vivien... | I received polite and kind service from Vivien... | 2022-05-21 | 5 |
| 1 | Vivienne helped me with an issue of a Gift cod... | Vivienne helped me with an issue of a Gift cod... | 2022-05-21 | 5 |
| 2 | I asked about purchasing a product that is exc... | I asked about purchasing a product that is exc... | 2022-05-21 | 5 |
| 3 | I contacted L'Occitane customer service as my ... | I contacted L'Occitane customer service as my ... | 2022-05-21 | 5 |
| 4 | Excellent customer service. Soni was extremely... | Excellent customer service. Soni was extremely... | 2022-05-20 | 5 |
| ... | ... | ... | ... | ... |
| 270 | First of all, I don’t know why all of you peep... | First of all, I don’t know why all of you peep... | 2019-06-21 | 1 |
| 271 | I got a sample of their hand cream 20 years ag... | I got a sample of their hand cream 20 years ag... | 2019-06-09 | 1 |
| 272 | Top products! Few days ago received promotion ... | Top products! Few days ago received promotion ... | 2019-05-27 | 5 |
| 273 | Ordered on a Saturday.Picked and packed on Tue... | Ordered on a Saturday.Picked and packed on Tue... | 2019-05-14 | 1 |
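
The scraping cell reads the star rating from a fixed character position of each image's `alt` text, which breaks if the wording ever changes. A minimal sketch of a more tolerant parser follows. It assumes the alt text has the shape "Rated 5 out of 5 stars" (inferred from the indexing used in the notebook, not confirmed against the live page), and `parse_rating` is a hypothetical helper name.

```python
import re
from typing import Optional

# Assumed alt-text shape: "Rated 5 out of 5 stars" (an assumption, see above).
RATING_PATTERN = re.compile(r"Rated\s+(\d)\s+out of\s+5")


def parse_rating(alt_text: Optional[str]) -> Optional[int]:
    """Return the star rating found in an image alt text, or None if absent."""
    match = RATING_PATTERN.search(alt_text or "")
    return int(match.group(1)) if match else None


print(parse_rating("Rated 5 out of 5 stars"))  # 5
print(parse_rating("Trustpilot logo"))         # None
```

Returning `None` for images that are not rating stars lets the caller skip them explicitly instead of relying on a substring test.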


In [33]:
# Check reviews info
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 275 entries, 0 to 274
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   language  275 non-null    object        
 1   summary   275 non-null    object        
 2   date      275 non-null    datetime64[ns]
 3   rating    275 non-null    int64         
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 8.7+ KB
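
The four lists are filled by separate loops and then combined with `zip`, so the row count reported above equals the length of the shortest list: `zip` drops surplus items without warning, and one missing element shifts every later row out of alignment. A minimal sketch of a length check, using made-up lists (`check_aligned` is a hypothetical helper, not part of the notebook):

```python
def check_aligned(**columns):
    """Raise ValueError if the named lists differ in length; return the common length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"column lengths differ: {lengths}")
    return next(iter(lengths.values()), 0)


# Made-up data: one rating is missing.
summary = ["great service", "late delivery", "lovely hand cream"]
rating = ["5", "1"]

print(len(list(zip(summary, rating))))  # 2: zip silently drops the third review

try:
    check_aligned(summary=summary, rating=rating)
except ValueError as err:
    print(err)  # reports the length of each column
```

Calling such a check right before building the DataFrame would surface a mismatch instead of hiding it.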


In [34]:
# Create a function to clean the review text

def cleanTxt(text):
  text = re.sub(r'@[A-Za-z0-9_]+', '', text) # removing @mentions
  text = re.sub(r'#', '', text) # removing the '#' symbol
  text = re.sub(r'RT[\s]+', '', text) # removing RT (retweet marker)
  text = re.sub(r'https?:\/\/\S+', '', text) # removing hyperlinks
  text = text.lower() # make text lowercase
  text = re.sub(r'\[.*?\]', '', text) # removing text within brackets
  text = re.sub(r'\(.*?\)', '', text) # removing text within parentheses
  text = re.sub(r'\w*\d\w*', '', text) # removing any word that contains a digit
  text = re.sub(r'\s+', ' ', text) # collapse runs of whitespace (including newlines) into one space
  text = re.sub(r'\n', ' ', text) # no-op after the previous step; newlines are already replaced
  text = re.sub(r'\"+', '', text) # removing double quotes
  text = re.sub(r'(\&amp\;)', '', text) # removing &amp;
  text = re.sub('[%s]' % re.escape(string.punctuation), '', text) # removing all ASCII punctuation
  text = re.sub(r'(httptco)', '', text) # removing leftover `httptco` fragments
  text = re.sub(r'[^\w\s]', '', text) # removing any remaining non-word, non-space characters

  return text

# Cleaning the text
reviews['summary'] = reviews['summary'].apply(cleanTxt)

# Show the cleaned text
reviews.head()

|     | language | summary | date | rating |
| --- | --- | --- | --- | --- |
| 0 | I received polite and kind service from Vivien... | i received polite and kind service from vivien... | 2022-05-21 | 5 |
| 1 | Vivienne helped me with an issue of a Gift cod... | vivienne helped me with an issue of a gift cod... | 2022-05-21 | 5 |
| 2 | I asked about purchasing a product that is exc... | i asked about purchasing a product that is exc... | 2022-05-21 | 5 |
| 3 | I contacted L'Occitane customer service as my ... | i contacted loccitane customer service as my s... | 2022-05-21 | 5 |
| 4 | Excellent customer service. Soni was extremely... | excellent customer service soni was extremely ... | 2022-05-20 | 5 |
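
The cleaning steps discard more than formatting: parenthesised remarks and any word containing a digit are removed entirely, which can drop sentiment-bearing text. The sketch below replays a condensed subset of the steps, in the same order, on a made-up sentence. `clean_sample` is a hypothetical name and is not the notebook's full `cleanTxt`.

```python
import re
import string


def clean_sample(text: str) -> str:
    """Condensed, illustrative subset of the cleaning steps (not the full function)."""
    text = text.lower()                    # lowercase
    text = re.sub(r'\(.*?\)', '', text)    # drop parenthesised text
    text = re.sub(r'\w*\d\w*', '', text)   # drop any word containing a digit
    text = re.sub(r'\s+', ' ', text)       # collapse whitespace
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # drop ASCII punctuation
    return text


# Made-up review sentence, for illustration only.
sample = "L'Occitane sent 2 creams (wrong ones) on 21st May!"
print(clean_sample(sample))  # loccitane sent creams on may
```

Here the complaint "(wrong ones)" disappears before the sentiment step ever sees it, so it may be worth checking whether the parenthesis rule suits review text.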


In [35]:
import nltk
nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...




In [38]:
sentiment = reviews['summary'].apply(lambda x: analyzer.polarity_scores(str(x)))
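# Expand each score dict into columns; pass axis by keyword (positional axis is not accepted by recent pandas)
reviews = pd.concat([reviews, sentiment.apply(pd.Series)], axis=1)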

  


In [39]:
# VADER returns the keys 'neg', 'neu', 'pos' and 'compound'; give the first three readable names
reviews.rename(columns={'neg':'negative', 'neu':'neutral', 'pos':'positive'}, inplace=True)
reviews.head()

|     | language | summary | date | rating | negative | neutral | positive | compound |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | I received polite and kind service from Vivien... | i received polite and kind service from vivien... | 2022-05-21 | 5 | 0.0 | 0.654 | 0.346 | 0.9231 |
| 1 | Vivienne helped me with an issue of a Gift cod... | vivienne helped me with an issue of a gift cod... | 2022-05-21 | 5 | 0.128 | 0.667 | 0.205 | 0.7902 |
| 2 | I asked about purchasing a product that is exc... | i asked about purchasing a product that is exc... | 2022-05-21 | 5 | 0.0 | 0.648 | 0.352 | 0.9469 |
| 3 | I contacted L'Occitane customer service as my ... | i contacted loccitane customer service as my s... | 2022-05-21 | 5 | 0.072 | 0.763 | 0.165 | 0.9486 |
| 4 | Excellent customer service. Soni was extremely... | excellent customer service soni was extremely ... | 2022-05-20 | 5 | 0.0 | 0.482 | 0.518 | 0.9537 |
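
The `compound` column is a single score between -1 and 1, which is often easier to work with once it is turned into a label. A minimal sketch, assuming the commonly used cut-offs of ±0.05 suggested in VADER's documentation; `label_from_compound` and the `threshold` parameter are hypothetical names, not part of the notebook.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to 'positive', 'negative' or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# 0.9231 is the compound score of the first review above; the other two are made up.
for score in (0.9231, 0.0, -0.4215):
    print(score, label_from_compound(score))
```

In the notebook this could be applied with `reviews['compound'].apply(label_from_compound)`, for example to compare the resulting labels against the star `rating` column.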


In [40]:
reviews.to_csv('/content/drive/MyDrive/Kedge Thesis: Voice of Stakeholders/4. Voice of Customers/TrustPilot reviews.csv')