# Scraping posts for all German energy suppliers at Trustpilot using BeautifulSoup 

Author: Matthias Isele

This notebook scrapes posts for all German energy suppliers at Trustpilot.

The goal is to extract, for each company, the following fields from every customer post:

- Customer nickname (nickname)
- Location of Customer (location)
- Number of stars (stars)
- Headline of post (headline)
- Date of post (dop)
- Date of experience (doe)
- If there is one: Comment of customer (comment)
- If there is one: Answer of the company (answer)
- If there is one: Date of answer (doa)
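The optional fields in this list (comment, answer, date of answer) are not present on every review card. A minimal sketch of how such fields can be read with BeautifulSoup, assuming a strongly simplified, hypothetical card; the tag and class names in `SAMPLE_CARD` and the helper `text_or_none` are illustrative only and do not reflect Trustpilot's real markup:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified review card. The real Trustpilot markup is more
# deeply nested and uses generated class names.
SAMPLE_CARD = """
<div class="card">
  <h2>Gut erreichbar</h2>
  <p class="comment">Alles korrekt.</p>
</div>
"""

def text_or_none(parent, name, attrs=None):
    """Return the text of the first matching tag, or None if there is none."""
    tag = parent.find(name, attrs or {})
    return tag.get_text(strip=True) if tag is not None else None

card = BeautifulSoup(SAMPLE_CARD, "html.parser").find("div", {"class": "card"})
headline = text_or_none(card, "h2")
comment = text_or_none(card, "p", {"class": "comment"})
answer = text_or_none(card, "p", {"class": "reply"})  # the sample has no reply
```

Returning `None` for a missing element keeps all per-card lists the same length, which matters when they are zipped into a DataFrame later.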

A random time delay is implemented to avoid the error 'HTTP Error 403: Forbidden'. This error appears because Trustpilot blocks users that request too much data in a short time.

In [69]:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as bs

import numpy as np
import pandas as pd

import time
import random


We will iterate over the following list of energy suppliers, which was automatically scraped by Stefanie's notebook.

In [71]:
energy_suppliers=pd.read_csv('C:/Users/isele/OneDrive/Desktop/Supply Chain - Customer Satisfaction/Data/ener_supplier_rankings_clean.csv')
display(energy_suppliers)

Unnamed: 0,supplier,city,country,cat,score,votes,comment
0,Octopus Energy Germany,München,Deutschland,Ökostromanbieter Energieanbieter Stromversorgu...,4.8,8042,https://de.trustpilot.com/review/octopusenergy.de
1,Ostrom,Berlin,Deutschland,Energieanbieter Ökostromanbieter Stromversorgu...,4.8,1598,https://de.trustpilot.com/review/ostrom.de
2,Rabot Charge,Hamburg,Deutschland,Energieversorger Energieanbieter Stromversorgu...,4.3,174,https://de.trustpilot.com/review/rabot-charge.de
3,MONTANA Group,Grünwald,Deutschland,Energieanbieter Mineralölunternehmen Kraftstof...,4.0,3146,https://de.trustpilot.com/review/montana-energ...
4,E.ON Energie Deutschland GmbH,München,Deutschland,Energieversorger Stromversorgungsunternehmen Ö...,3.7,13223,https://de.trustpilot.com/review/eon.de
5,Grünwelt Energie,Kaarst,Deutschland,Stromversorgungsunternehmen,3.6,1964,https://de.trustpilot.com/review/www.gruenwelt.de
6,RheinEnergie,Köln,Deutschland,Ökostromanbieter Energieanbieter Gasversorgung...,3.4,528,https://de.trustpilot.com/review/rheinenergie.com
7,badenova,Freiburg im Breisgau,Deutschland,Stromversorgungsunternehmen Energieanbieter Ga...,2.7,241,https://de.trustpilot.com/review/www.badenova.de
8,pricewise.de,Heidelberg,Deutschland,Gasversorgungsunternehmen Stromversorgungsunte...,4.8,119,https://de.trustpilot.com/review/www.prizewize.de
9,DFM-Select GmbH,Metzingen,Deutschland,Anbieter von Elektronikbauteilen Technischer K...,4.6,22,https://de.trustpilot.com/review/dfm-select.de


To simplify scraping we split the DataFrame based on number of votes.

In [72]:
#drop companies with zero votes 
es=energy_suppliers.drop(energy_suppliers[energy_suppliers['votes'] == 0].index)
es.reset_index(inplace=True, drop=True) 
#display(es)

In [73]:
#companies with at most 250 votes
es_leq250=es[es['votes']<=250].reset_index(drop=True)

#companies with more than 250 and at most 4000 votes
es_leq4000=es[(250<es['votes'])&(es['votes']<=4000)].reset_index(drop=True)

#companies with more than 4000 votes
es_geq4000=es[4000<es['votes']].reset_index(drop=True)
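The same three-way split can also be expressed with `pd.cut`. The following is only a sketch on made-up toy data (supplier names, vote counts and the labels `small`, `medium`, `large` are assumptions for illustration), not a replacement for the cell above:

```python
import numpy as np
import pandas as pd

# Toy data; supplier names and vote counts are made up for illustration.
toy = pd.DataFrame({
    "supplier": ["A", "B", "C", "D", "E"],
    "votes": [0, 119, 250, 1598, 13223],
})

# Right-closed bins: (0, 250], (250, 4000], (4000, inf).
# Zero votes fall outside every bin and become NaN.
labels = ["small", "medium", "large"]
toy["size"] = pd.cut(toy["votes"], bins=[0, 250, 4000, np.inf], labels=labels)

# One DataFrame per size class, each with a fresh index.
groups = {
    label: toy[toy["size"] == label].reset_index(drop=True)
    for label in labels
}
```

Because the bins are closed on the right, a company with exactly 250 votes lands in the first group, matching the `<=250` condition used in the notebook.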

In [20]:
display(es_leq250)
display(es_leq4000)
display(es_geq4000)

Unnamed: 0,supplier,city,country,cat,score,votes,comment
0,Rabot Charge,Hamburg,Deutschland,Energieversorger Energieanbieter Stromversorgu...,4.3,174,https://de.trustpilot.com/review/rabot-charge.de
1,badenova,Freiburg im Breisgau,Deutschland,Stromversorgungsunternehmen Energieanbieter Ga...,2.7,241,https://de.trustpilot.com/review/www.badenova.de
2,pricewise.de,Heidelberg,Deutschland,Gasversorgungsunternehmen Stromversorgungsunte...,4.8,119,https://de.trustpilot.com/review/www.prizewize.de
3,DFM-Select GmbH,Metzingen,Deutschland,Anbieter von Elektronikbauteilen Technischer K...,4.6,22,https://de.trustpilot.com/review/dfm-select.de
4,Erdgas Südwest GmbH,Ettlingen,Deutschland,Stromversorgungsunternehmen Heizungsanlagenanb...,4.4,46,https://de.trustpilot.com/review/erdgas-suedwe...
5,Zenstrom,Berlin,Deutschland,Energieanbieter Ökostromanbieter Stromversorgu...,4.0,89,https://de.trustpilot.com/review/zenstrom.de
6,VeganStrom,Berlin,Deutschland,Stromversorgungsunternehmen Ökostromanbieter E...,4.0,30,https://de.trustpilot.com/review/veganstrom.com
7,Fair Trade Power Deutschland GmbH,München,Deutschland,Stromversorgungsunternehmen Ökostromanbieter E...,4.0,3,https://de.trustpilot.com/review/fairtradepowe...
8,Lekker Energie GmbH,Berlin,Deutschland,Energieanbieter Stromversorgungsunternehmen En...,3.9,223,https://de.trustpilot.com/review/lekker.de
9,Paketsparer,Berlin,Deutschland,Telekommunikationsanbieter Internetanbieter Te...,3.7,49,https://de.trustpilot.com/review/paketsparer.de


Unnamed: 0,supplier,city,country,cat,score,votes,comment
0,Ostrom,Berlin,Deutschland,Energieanbieter Ökostromanbieter Stromversorgu...,4.8,1598,https://de.trustpilot.com/review/ostrom.de
1,MONTANA Group,Grünwald,Deutschland,Energieanbieter Mineralölunternehmen Kraftstof...,4.0,3146,https://de.trustpilot.com/review/montana-energ...
2,Grünwelt Energie,Kaarst,Deutschland,Stromversorgungsunternehmen,3.6,1964,https://de.trustpilot.com/review/www.gruenwelt.de
3,RheinEnergie,Köln,Deutschland,Ökostromanbieter Energieanbieter Gasversorgung...,3.4,528,https://de.trustpilot.com/review/rheinenergie.com
4,NEW Energie,Mönchengladbach,Deutschland,Stromversorgungsunternehmen,3.6,615,https://de.trustpilot.com/review/www.new-energ...
5,MEP Werke,Eckernförde,Deutschland,Stromversorgungsunternehmen,1.5,961,https://de.trustpilot.com/review/mep-werke.de
6,LichtBlick,Hamburg,Deutschland,Stromversorgungsunternehmen Energieanbieter Ök...,1.3,1708,https://de.trustpilot.com/review/lichtblick.de
7,EWE,,,Stromversorgungsunternehmen,1.2,2228,https://de.trustpilot.com/review/www.ewe.de


Unnamed: 0,supplier,city,country,cat,score,votes,comment
0,Octopus Energy Germany,München,Deutschland,Ökostromanbieter Energieanbieter Stromversorgu...,4.8,8042,https://de.trustpilot.com/review/octopusenergy.de
1,E.ON Energie Deutschland GmbH,München,Deutschland,Energieversorger Stromversorgungsunternehmen Ö...,3.7,13223,https://de.trustpilot.com/review/eon.de
2,Vattenfall Europe Sales GmbH,,,Stromversorgungsunternehmen Gasversorgungsunte...,4.4,10204,https://de.trustpilot.com/review/www.vattenfal...
3,eprimo GmbH,Neu-Isenburg,Deutschland,Energieanbieter Gasversorgungsunternehmen Stro...,2.6,8786,https://de.trustpilot.com/review/eprimo.de


In [6]:
#pd.set_option('display.max_colwidth', -1) 
#pd.reset_option('display.max_colwidth')
#df_final

We scrape, clean and save the result with the help of the following function. The boolean parameter 'sleep' activates or deactivates a random time delay of between three and five seconds per iteration.
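The random delay can also be combined with a retry after a 403 response. This is a sketch of one possible variant, not what the notebook does: the retry logic, the function name `fetch_politely` and its parameters are assumptions introduced here, and `fetch` stands for any callable that downloads a page.

```python
import random
import time
from urllib.error import HTTPError

def fetch_politely(fetch, url, retries=3, low=3.0, high=5.0, sleep=time.sleep):
    """Call fetch(url); after an HTTP 403, wait a random time and try again.

    `fetch` is any callable that returns the page content, for example a
    small wrapper around urllib.request.urlopen.
    """
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except HTTPError as err:
            # Give up on other errors, or once the retries are used up.
            if err.code != 403 or attempt == retries:
                raise
            # Wait between `low` and `high` seconds before the next attempt.
            sleep(random.uniform(low, high))
```

Passing `sleep` as a parameter makes the waiting behaviour easy to replace, for instance when trying the function out without actually pausing.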

In [23]:
def scrape_clean_save(company,url1,sleep):
    
    #SCRAPE------------------------------------------------------------------------------------------------------------------
    #Find number of pages for fixed energy supplier
    page1 = urlopen(url1)
    soup1 = bs(page1, "html.parser")

    navigation=soup1.find('nav',{'role':'navigation'}).findAll('span',{'class':'typography_heading-xxs__QKBS8 typography_appearance-inherit__D7XqR typography_disableResponsiveSizing__OuNP7'})
    n_pages=int(navigation[-2].text)#number of pages

    #Iterating over pages
    start_page=1 #included
    end_page=n_pages #included

    #The following list will collect a DataFrame for each page.
    df=[]

    for n in range(start_page,end_page+1):

        url = url1+'?page='+str(n)
        req = Request(url , headers={'User-Agent': 'Mozilla/5.0'})

        #Implement try and catch
        try:
            page = urlopen(req).read()
            soup = bs(page, 'html.parser')
            card_list=soup.findAll('div', {'class': 'styles_reviewCardInner__EwDq2'})

            #Create the DataFrame
            nickname=[]
            location=[]
            stars=[]
            headline=[]
            dop=[]
            doe=[]
            comment=[]
            answer=[]
            doa=[]

            for card in card_list:

                nickname.append(card.find('span', {'class': 'typography_heading-xxs__QKBS8 typography_appearance-default__AAY17'}).text)

                location.append(card.find('div', {'class': 'typography_body-m__xgxZ_ typography_appearance-subtle__8_H2l styles_detailsIcon__Fo_ua'}).find('span').text)

                stars.append(card.find('div', {'class': 'star-rating_starRating__4rrcf star-rating_medium__iN6Ty'}).find('img')['alt'])

                headline.append(card.find('h2', {'class': 'typography_heading-s__f7029 typography_appearance-default__AAY17'}).text)

                dop.append(card.find('time')['datetime'])

                doe.append(card.find('p', {'class': 'typography_body-m__xgxZ_ typography_appearance-default__AAY17'}).text)

                #Check, whether there is a comment before appending. If not, append 'None'.  
                if card.find('p', {'class': 'typography_body-l__KUYFJ typography_appearance-default__AAY17 typography_color-black__5LYEn'}):
                    comment.append(card.find('p', {'class': 'typography_body-l__KUYFJ typography_appearance-default__AAY17 typography_color-black__5LYEn'}).text)
                else: comment.append(None)

                #Check, whether there is an answer. If yes, append answer and doa. If no, append 'None' in both cases.                  
                if card.find('p', {'data-service-review-business-reply-text-typography': 'true'}):
                    answer.append(card.find('p', {'data-service-review-business-reply-text-typography': 'true'}).text)
                    doa.append(card.find('div',{'class':'styles_replyInfo__FYSje'}).find('time')['datetime'])
                else:
                    answer.append(None)
                    doa.append(None)



            column_names=['Nickname', 'Location', 'Stars','Headline','DoP','DoE','Comment','Answer','DoA']
            df_page=pd.DataFrame(list(zip(nickname, location, stars, headline, dop, doe, comment, answer, doa)), columns=column_names)
            df_page['Page']=n
            df.append(df_page)

        except Exception: #a bare 'except:' would also swallow KeyboardInterrupt
            print(f'iteration stopped at page {n} and company {company}')
            break

        if sleep:
            time.sleep(round(random.uniform(3, 5), 2)) # pause for a random duration between 3 and 5 seconds

    #Concatenate the list of DataFrames and reset index.
    df_concat=pd.concat(df)
    df_concat.reset_index(inplace=True, drop=True) 
        
    #CLEAN--------------------------------------------------------------------------------------------------------------
    df_cleaned=df_concat.copy()

    #Column 'Stars' should have values from 1 to 5
    df_cleaned['Stars']=df_cleaned['Stars'].apply(lambda x: x[13]).astype('int')


    #Split 'DoE' into 'DoE.day', 'DoE.month', 'DoE.Year' and convert 'DoE' to datetime.
    df_cleaned['DoE']=df_cleaned['DoE'].apply(lambda x: x[21:])
    df_cleaned[['DoE.day', 'DoE.month','DoE.year']] = df_cleaned['DoE'].str.split(' ', expand=True)
    months = {'Januar':'january', 'Februar':'february','März':'march','April':'april','Mai':'may','Juni':'june','Juli':'july','August':'august','September':'september','Oktober':'october','November':'november','Dezember':'december'}
    months_numbers = {'Januar':'1', 'Februar':'2','März':'3','April':'4','Mai':'5','Juni':'6','Juli':'7','August':'8','September':'9','Oktober':'10','November':'11','Dezember':'12'}
    df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '', regex=False) #literal dot, avoids the FutureWarning
    df_cleaned=df_cleaned.replace({'DoE.month': months_numbers})
    df_cleaned['DoE']=pd.to_datetime(df_cleaned['DoE.month']+'-'+df_cleaned['DoE.day']+'-'+df_cleaned['DoE.year'])
    df_cleaned=df_cleaned.astype({'DoE.day': 'int', 'DoE.month': 'int', 'DoE.year': 'int'})


    #Convert DoP to datetime and create 'DoP.day', 'DoP.month', 'DoP.Year' 
    df_cleaned['DoP']=pd.to_datetime(df_cleaned['DoP'])
    df_cleaned['DoP.day']=df_cleaned['DoP'].apply(lambda x: x.day)
    df_cleaned['DoP.month']=df_cleaned['DoP'].apply(lambda x: x.month)
    df_cleaned['DoP.year']=df_cleaned['DoP'].apply(lambda x: x.year)


    #Convert DoA to datetime and create 'DoA.day', 'DoA.month', 'DoA.Year' 
    df_cleaned['DoA']=pd.to_datetime(df_cleaned['DoA'])
    df_cleaned['DoA.day']=df_cleaned['DoA'].apply(lambda x: x.day)
    df_cleaned['DoA.month']=df_cleaned['DoA'].apply(lambda x: x.month)
    df_cleaned['DoA.year']=df_cleaned['DoA'].apply(lambda x: x.year)


    #Clean Comments and Answers
    df_cleaned['Comment']= df_cleaned['Comment'].str.replace('\n', '')
    df_cleaned['Comment']= df_cleaned['Comment'].str.replace('\r', '')
    df_cleaned['Answer']= df_cleaned['Answer'].str.replace('\n', '')
    df_cleaned['Answer']= df_cleaned['Answer'].str.replace('\r', '')


    #Flag whether there is a comment or an answer (1) or not (0)
    df_cleaned['Comment_TF']=df_cleaned['Comment'].notna().astype('int')
    df_cleaned['Answer_TF']=df_cleaned['Answer'].notna().astype('int')
    
    #Note company
    df_cleaned['Company']=company
    
    #SAVE
    df_cleaned.to_csv('C:/Users/isele/OneDrive/Desktop/Supply Chain - Customer Satisfaction/Data/'+company+'.csv')

    return df_cleaned
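The cleaning step of `scrape_clean_save` relies on fixed character positions (`x[13]` for the stars, `x[21:]` for the date of experience). A more defensive sketch with regular expressions is shown below. It assumes that the raw strings look like `'Bewertet mit 4 von 5 Sternen'` and `'Datum der Erfahrung: 14. Dezember 2022'`; these formats are inferred from the slicing in the function, not confirmed by the notebook, and the helper names are hypothetical.

```python
import re
from datetime import date

MONTHS = {'Januar': 1, 'Februar': 2, 'März': 3, 'April': 4, 'Mai': 5,
          'Juni': 6, 'Juli': 7, 'August': 8, 'September': 9,
          'Oktober': 10, 'November': 11, 'Dezember': 12}

def parse_stars(alt_text):
    """Return the first digit from 1 to 5 found in the star image's alt text."""
    match = re.search(r'[1-5]', alt_text)
    if match is None:
        raise ValueError(f'no star rating in {alt_text!r}')
    return int(match.group())

def parse_german_date(text):
    """Parse a date such as '14. Dezember 2022' from the end of the text."""
    match = re.search(r'(\d{1,2})\.\s+(\w+)\s+(\d{4})\s*$', text)
    if match is None:
        raise ValueError(f'no date in {text!r}')
    day, month_name, year = match.groups()
    return date(int(year), MONTHS[month_name], int(day))
```

Unlike the fixed slices, these helpers raise a clear error when the text has an unexpected shape instead of silently returning a wrong character.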

First we scrape companies with at most 250 votes. As these are not many votes, we can loop over the whole index range without sleep.

In [28]:
for k in range(len(es_leq250)):
    url1=(es_leq250['comment'])[k]
    company=(es_leq250['supplier'])[k]
    scrape_clean_save(company,url1,sleep=False)


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')

iteration stopped at page 7 and company voxenergie


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


Next come the companies with more than 250 and at most 4000 votes. No sleep is necessary, but they should be scraped individually.

In [40]:
#k ranges from 0 to 7
for k in range(2,len(es_leq4000)):
    url1=(es_leq4000['comment'])[k]
    company=(es_leq4000['supplier'])[k]
    scrape_clean_save(company,url1,sleep=False)

iteration stopped at page 5 and company Grünwelt Energie


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


iteration stopped at page 98 and company EWE


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


Finally, the companies with more than 4000 votes. Sleep is necessary, and they should also be scraped individually.

In [41]:
for k in range(len(es_geq4000)):
    url1=(es_geq4000['comment'])[k]
    company=(es_geq4000['supplier'])[k]
    scrape_clean_save(company,url1,sleep=True)

iteration stopped at page 97 and company Octopus Energy Germany


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


iteration stopped at page 312 and company E.ON Energie Deutschland GmbH


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


### Retrieving the missing pages.

The missing pages are retrieved by generalizing the above approach so that a specific page range can be selected.
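The page-range logic can be looked at in isolation. The sketch below uses a hypothetical helper `resolve_page_range`; it follows the convention that `end_page=0` means "up to the last page", while clamping `end_page` to the number of available pages is an extra assumption added here.

```python
def resolve_page_range(start_page=1, end_page=0, n_pages=1):
    """Return the inclusive range of pages to visit.

    end_page=0 means 'up to the last page'. An end_page beyond the last
    page is clamped to n_pages (an assumption of this sketch).
    """
    if start_page < 1:
        raise ValueError('start_page must be at least 1')
    if end_page == 0 or end_page > n_pages:
        end_page = n_pages
    return range(start_page, end_page + 1)
```

For example, `resolve_page_range(98, 0, 106)` covers pages 98 to 106, and a start page beyond the last page yields an empty range, so nothing is requested.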

In [74]:
def scrape_clean_save_v2(company,url1,sleep=False,start_page=1,end_page=0,suffix=""):
    
    #SCRAPE------------------------------------------------------------------------------------------------------------------
    #Find number of pages for fixed energy supplier
    page1 = urlopen(url1)
    soup1 = bs(page1, "html.parser")

    navigation=soup1.find('nav',{'role':'navigation'}).findAll('span',{'class':'typography_heading-xxs__QKBS8 typography_appearance-inherit__D7XqR typography_disableResponsiveSizing__OuNP7'})
    n_pages=int(navigation[-2].text)#number of pages

    #Iterating over pages
    if end_page==0:
        end_page=n_pages #included

    #The following list will collect a DataFrame for each page.
    df=[]

    for n in range(start_page,end_page+1):

        url = url1+'?page='+str(n)
        req = Request(url , headers={'User-Agent': 'Mozilla/5.0'})

        #Implement try and catch
        try:
            page = urlopen(req).read()
            soup = bs(page, 'html.parser')
            card_list=soup.findAll('div', {'class': 'styles_reviewCardInner__EwDq2'})

            #Create the DataFrame
            nickname=[]
            location=[]
            stars=[]
            headline=[]
            dop=[]
            doe=[]
            comment=[]
            answer=[]
            doa=[]

            for card in card_list:

                nickname.append(card.find('span', {'class': 'typography_heading-xxs__QKBS8 typography_appearance-default__AAY17'}).text)

                #Check, whether there is a location. If not, append 'None'.
                location_div=card.find('div', {'class': 'typography_body-m__xgxZ_ typography_appearance-subtle__8_H2l styles_detailsIcon__Fo_ua'})
                location_span=location_div.find('span') if location_div is not None else None
                location.append(location_span.text if location_span is not None else None)
                    
                stars.append(card.find('div', {'class': 'star-rating_starRating__4rrcf star-rating_medium__iN6Ty'}).find('img')['alt'])

                headline.append(card.find('h2', {'class': 'typography_heading-s__f7029 typography_appearance-default__AAY17'}).text)

                dop.append(card.find('time')['datetime'])

                doe.append(card.find('p', {'class': 'typography_body-m__xgxZ_ typography_appearance-default__AAY17'}).text)

                #Check, whether there is a comment before appending. If not, append 'None'.  
                if card.find('p', {'class': 'typography_body-l__KUYFJ typography_appearance-default__AAY17 typography_color-black__5LYEn'}):
                    comment.append(card.find('p', {'class': 'typography_body-l__KUYFJ typography_appearance-default__AAY17 typography_color-black__5LYEn'}).text)
                else: comment.append(None)

                #Check, whether there is an answer. If yes, append answer and doa. If no, append 'None' in both cases.                  
                if card.find('p', {'data-service-review-business-reply-text-typography': 'true'}):
                    answer.append(card.find('p', {'data-service-review-business-reply-text-typography': 'true'}).text)
                    doa.append(card.find('div',{'class':'styles_replyInfo__FYSje'}).find('time')['datetime'])
                else:
                    answer.append(None)
                    doa.append(None)



            column_names=['Nickname', 'Location', 'Stars','Headline','DoP','DoE','Comment','Answer','DoA']
            df_page=pd.DataFrame(list(zip(nickname, location, stars, headline, dop, doe, comment, answer, doa)), columns=column_names)
            df_page['Page']=n
            df.append(df_page)

        except Exception: #a bare 'except:' would also swallow KeyboardInterrupt
            print(f'iteration stopped at page {n} and company {company}')
            break

        if sleep:
            time.sleep(round(random.uniform(3, 5), 2)) # pause for a random duration between 3 and 5 seconds

    #Concatenate the list of DataFrames and reset index.
    df_concat=pd.concat(df)
    df_concat.reset_index(inplace=True, drop=True) 
        
    #CLEAN--------------------------------------------------------------------------------------------------------------
    df_cleaned=df_concat.copy()

    #Column 'Stars' should have values from 1 to 5
    df_cleaned['Stars']=df_cleaned['Stars'].apply(lambda x: x[13]).astype('int')


    #Split 'DoE' into 'DoE.day', 'DoE.month', 'DoE.Year' and convert 'DoE' to datetime.
    df_cleaned['DoE']=df_cleaned['DoE'].apply(lambda x: x[21:])
    df_cleaned[['DoE.day', 'DoE.month','DoE.year']] = df_cleaned['DoE'].str.split(' ', expand=True)
    months = {'Januar':'january', 'Februar':'february','März':'march','April':'april','Mai':'may','Juni':'june','Juli':'july','August':'august','September':'september','Oktober':'october','November':'november','Dezember':'december'}
    months_numbers = {'Januar':'1', 'Februar':'2','März':'3','April':'4','Mai':'5','Juni':'6','Juli':'7','August':'8','September':'9','Oktober':'10','November':'11','Dezember':'12'}
    df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '', regex=False) #literal dot, avoids the FutureWarning
    df_cleaned=df_cleaned.replace({'DoE.month': months_numbers})
    df_cleaned['DoE']=pd.to_datetime(df_cleaned['DoE.month']+'-'+df_cleaned['DoE.day']+'-'+df_cleaned['DoE.year'])
    df_cleaned=df_cleaned.astype({'DoE.day': 'int', 'DoE.month': 'int', 'DoE.year': 'int'})


    #Convert DoP to datetime and create 'DoP.day', 'DoP.month', 'DoP.Year' 
    df_cleaned['DoP']=pd.to_datetime(df_cleaned['DoP'])
    df_cleaned['DoP.day']=df_cleaned['DoP'].apply(lambda x: x.day)
    df_cleaned['DoP.month']=df_cleaned['DoP'].apply(lambda x: x.month)
    df_cleaned['DoP.year']=df_cleaned['DoP'].apply(lambda x: x.year)


    #Convert DoA to datetime and create 'DoA.day', 'DoA.month', 'DoA.Year' 
    df_cleaned['DoA']=pd.to_datetime(df_cleaned['DoA'])
    df_cleaned['DoA.day']=df_cleaned['DoA'].apply(lambda x: x.day)
    df_cleaned['DoA.month']=df_cleaned['DoA'].apply(lambda x: x.month)
    df_cleaned['DoA.year']=df_cleaned['DoA'].apply(lambda x: x.year)


    #Clean Comments and Answers
    df_cleaned['Comment']= df_cleaned['Comment'].str.replace('\n', '')
    df_cleaned['Comment']= df_cleaned['Comment'].str.replace('\r', '')
    df_cleaned['Answer']= df_cleaned['Answer'].str.replace('\n', '')
    df_cleaned['Answer']= df_cleaned['Answer'].str.replace('\r', '')


    #Flag whether there is a comment or an answer (1) or not (0)
    df_cleaned['Comment_TF']=df_cleaned['Comment'].notna().astype('int')
    df_cleaned['Answer_TF']=df_cleaned['Answer'].notna().astype('int')
    
    #Note company
    df_cleaned['Company']=company
    
    #SAVE
    df_cleaned.to_csv('C:/Users/isele/OneDrive/Desktop/Supply Chain - Customer Satisfaction/Data/'+company+suffix+'.csv')

    return df_cleaned

In [78]:
#voxenergie
k=17
url1=(es_leq250['comment'])[k]
company=(es_leq250['supplier'])[k]
scrape_clean_save_v2(company,url1,sleep=False,start_page=5,end_page=7,suffix='page_5_to_7')

iteration stopped at page 6 and company voxenergie


  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


Unnamed: 0,Nickname,Location,Stars,Headline,DoP,DoE,Comment,Answer,DoA,Page,...,DoE.year,DoP.day,DoP.month,DoP.year,DoA.day,DoA.month,DoA.year,Comment_TF,Answer_TF,Company
0,Uwe Müller,DE,1,Die fristlose Kündigung,2022-12-15 20:36:53+00:00,2022-12-14,Die fristlose Kündigung nach einer drastischen...,,NaT,5,...,2022,15,12,2022,,,,1,0,voxenergie
1,W Koch,DE,1,"Ein Verbrecherladen, mangelhafte Kundenbetreuu...",2019-10-24 17:03:56+00:00,2019-10-24,"Ein Verbrecherladen, bin im März umgezogen, ha...",,NaT,5,...,2019,24,10,2019,,,,1,0,voxenergie
2,Gabi Nickel,DE,1,Wir haben jetzt Oktober .seit April…,2019-10-16 08:27:34+00:00,2019-10-16,Wir haben jetzt Oktober .seit April wohne ich ...,,NaT,5,...,2019,16,10,2019,,,,1,0,voxenergie
3,Reinhold Bert,DE,1,Ich habe über Voxenergie meinen Strom…,2019-06-16 05:58:41+00:00,2019-06-16,Ich habe über Voxenergie meinen Strom bezogen....,,NaT,5,...,2019,16,6,2019,,,,1,0,voxenergie
4,Emanuel Büttner,DE,4,Gut erreichbar,2023-06-14 16:35:11+00:00,2023-06-14,Alles korrekt. Gute Beratung. Auch mit dem Ang...,Vielen Dank für Ihre 4/5 Sterne Bewertung. Es ...,2023-07-18 10:53:37+00:00,5,...,2023,14,6,2023,18.0,7.0,2023.0,1,1,voxenergie
5,Scarlett,DE,1,"Extrem schlechter Kundensupport, kein Stück en...",2022-11-03 10:06:25+00:00,2022-02-21,"Ich bin absolut unzufrieden mit Voxenergie, au...",,NaT,5,...,2022,3,11,2022,,,,1,0,voxenergie
6,Sasha Katharina,DE,4,Bisher sind wir zufrieden,2023-03-20 10:15:29+00:00,2023-03-20,Bisher sind wir zufrieden. Schnelle Bearbeitun...,Wir schätzen Ihre Bewertung und Ihr Feedback s...,2023-07-18 10:55:41+00:00,5,...,2023,20,3,2023,18.0,7.0,2023.0,1,1,voxenergie
7,Dagmar,DE,1,Schlimmer als die Hütchenspieler...FINGER WEG,2020-09-29 15:00:20+00:00,2020-09-29,Der Verein ist schlimmer als die Hütchenspiel...,,NaT,5,...,2020,29,9,2020,,,,1,0,voxenergie
8,Lee,DE,1,VORSICHT,2022-08-01 21:41:43+00:00,2022-08-01,VORSICHT: Korrupt und inkompetent. Eine gefäh...,,NaT,5,...,2022,1,8,2022,,,,1,0,voxenergie
9,Stephanie Luther,DE,1,Umzuziehen - eine Katastrophe!!,2019-08-24 15:26:10+00:00,2019-08-24,Umzuziehen und dabei den Stromlieferungs-Vertr...,,NaT,5,...,2019,24,8,2019,,,,1,0,voxenergie


In [80]:
#grünwelt energie
k=2
url1=(es_leq4000['comment'])[k]
company=(es_leq4000['supplier'])[k]
scrape_clean_save_v2(company,url1,sleep=False,start_page=1,end_page=0,suffix='page_1_to_infty')

  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


Unnamed: 0,Nickname,Location,Stars,Headline,DoP,DoE,Comment,Answer,DoA,Page,...,DoE.year,DoP.day,DoP.month,DoP.year,DoA.day,DoA.month,DoA.year,Comment_TF,Answer_TF,Company
0,T. Dohmen,DE,1,Endabrechnung nicht innerhalb der gesetzlichen...,2023-08-31 11:45:33+00:00,2023-08-31,Erfahrung mit Wärmestrom bei der Firma Grünwel...,"Hallo T. Dohmen, wir bedauern, dass Sie bisher...",2023-09-04 06:41:52+00:00,1,...,2023,31,8,2023,4.0,9.0,2023.0,1,1,Grünwelt Energie
1,H. Wiche,DE,5,Schneller und reibungsloser Vertragsabschluss,2023-08-31 13:01:37+00:00,2023-08-30,Sehr gute Internetpräsenz. Alle wichtigen Info...,"Hallo H. Wiche, vielen Dank für Ihre Bewertung...",2023-09-04 06:41:36+00:00,1,...,2023,31,8,2023,4.0,9.0,2023.0,1,1,Grünwelt Energie
2,Dadas,DE,1,Abschlussrechnung fehlt trotz mehreren Anfrage...,2023-08-31 13:09:27+00:00,2023-08-30,Ich warte jetzt seit über einen Monat auf mein...,"Hallo Dadas, wir bedauern sehr, dass Sie auf I...",2023-09-04 06:41:13+00:00,1,...,2023,31,8,2023,4.0,9.0,2023.0,1,1,Grünwelt Energie
3,A. U.,DE,4,Die Daten für den Anbieterwechsel sind…,2023-08-11 19:01:32+00:00,2023-08-10,Die Daten für den Anbieterwechsel sind schnell...,"Hallo A. U., vielen Dank, dass Sie sich die Mü...",2023-08-22 09:51:28+00:00,1,...,2023,11,8,2023,22.0,8.0,2023.0,1,1,Grünwelt Energie
4,Beisel,DE,1,Widerruf fast nicht möglich,2023-09-01 13:34:40+00:00,2023-08-25,Ich habe einen Gasvertrag abgeschlossen und wo...,"Hallo Beisel, wir bedauern sehr, dass Sie schl...",2023-08-30 07:15:04+00:00,1,...,2023,1,9,2023,30.0,8.0,2023.0,1,1,Grünwelt Energie
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1870,Andreas Haverkamp,DE,2,"Gruenwelt ist teuer, ineffizient und nicht kor...",2018-04-26 11:36:38+00:00,2018-04-26,2 nicht angekündigte Preiserhoehungen; in der ...,,NaT,94,...,2018,26,4,2018,,,,1,0,Grünwelt Energie
1871,Patrick Rittau,DE,1,Absolute FRECHHEIT unbedingt MEIDEN!,2018-04-09 13:45:46+00:00,2018-04-09,Absolute FRECHHEIT unbedingt MEIDEN!Als unsere...,,NaT,94,...,2018,9,4,2018,,,,1,0,Grünwelt Energie
1872,Kurt Fries,DE,1,Unglaubliche Geschäftspraxis!,2017-01-24 08:36:25.910000+00:00,2017-01-24,Habe Strom und Gas bei Gruenwelt. Ich schreibe...,,NaT,94,...,2017,24,1,2017,,,,1,0,Grünwelt Energie
1873,nexas,DE,1,Vorsicht! Vor Grünewelt und Stromio,2016-10-05 18:06:27+00:00,2016-10-05,Auch als Stromio bekannt: Lassen sich halt div...,,NaT,94,...,2016,5,10,2016,,,,1,0,Grünwelt Energie


In [58]:
#EWE
k=7
url1=(es_leq4000['comment'])[k]
company=(es_leq4000['supplier'])[k]
scrape_clean_save_v2(company,url1,sleep=False,start_page=98,end_page=0,suffix='page_98_to_infty')

  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


Unnamed: 0,Nickname,Location,Stars,Headline,DoP,DoE,Comment,Answer,DoA,Page,...,DoE.year,DoP.day,DoP.month,DoP.year,DoA.day,DoA.month,DoA.year,Comment_TF,Answer_TF,Company
0,Benjamin Sommer,DE,2,Technische Mängel,2016-10-10 14:15:44+00:00,2016-10-10,An sich eine gute Idee. Leider kommt es inbeso...,"Moin Benjamin Sommer,ist sehr ärgerlich, dass ...",2016-10-10 14:40:06.123000+00:00,98,...,2016,10,10,2016,10.0,10.0,2016.0,1,1,EWE
1,Klaus aus Stelle,DE,5,Stromantrag,2016-10-10 12:29:17+00:00,2016-10-10,der Antrag für Strom läßt sich superleicht aus...,,NaT,98,...,2016,10,10,2016,,,,1,0,EWE
2,Mike Daske,DE,5,"Einfache und Schnelle Antragstellung,übersicht...",2016-10-10 12:27:14+00:00,2016-10-10,Internet Präsenz sehr gut. Leichte Antragstell...,,NaT,98,...,2016,10,10,2016,,,,1,0,EWE
3,Dieter Homp,DE,4,"Wenn man erstmal verstanden hat, wie EWE sein...",2016-10-10 11:56:56+00:00,2016-10-10,"Wenn man erstmal verstanden hat, wie EWE seine...",,NaT,98,...,2016,10,10,2016,,,,1,0,EWE
4,Klach,DE,5,"Tippi, Toppi",2016-10-06 15:21:41+00:00,2016-10-06,Vertragsänderung ohne Probleme erledigt!!!!,,NaT,98,...,2016,6,10,2016,,,,1,0,EWE
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
160,Hella Struth,DE,5,Sehr gute Beratung. sehr freundlich alles bestens,2015-10-02 08:55:26.633000+00:00,2015-10-02,immer wieder arbeite ich gerne mit der EWE,,NaT,106,...,2015,2,10,2015,,,,1,0,EWE
161,Antje Ebert,DE,4,bewertung,2015-09-24 18:16:39.942000+00:00,2015-09-24,Alles gut und zu empfehlen,,NaT,106,...,2015,24,9,2015,,,,1,0,EWE
162,Gerhard Roggenkamp,DE,3,EWE ist im Vergleich zu anderen Anbietern zu t...,2015-09-24 09:23:57+00:00,2015-09-24,Ansonsten: Der Kundendiest ist freundlich und ...,Sehr geehrter Herr Roggenkamp. Vielen Dank für...,2015-09-30 10:53:19.218000+00:00,106,...,2015,24,9,2015,30.0,9.0,2015.0,1,1,EWE
163,Kemal Eker,DE,5,Ewe ist das beste strom,2015-09-08 08:05:45.081000+00:00,2015-09-08,Ewe ist das beste,,NaT,106,...,2015,8,9,2015,,,,1,0,EWE


In [84]:
#Octopus energy
k=0
url1=(es_geq4000['comment'])[k]
company=(es_geq4000['supplier'])[k]
scrape_clean_save_v2(company,url1,sleep=False,start_page=37,end_page=0,suffix='_page_37_to_infty')

iteration stopped at page 37 and company Octopus Energy Germany


ValueError: No objects to concatenate
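This `ValueError` most likely arises because the very first requested page already failed, so the list of per-page DataFrames was still empty when `pd.concat` was called. A minimal sketch of a guard, assuming a hypothetical helper `concat_pages` that is not part of the notebook:

```python
import pandas as pd

COLUMNS = ['Nickname', 'Location', 'Stars', 'Headline', 'DoP', 'DoE',
           'Comment', 'Answer', 'DoA', 'Page']

def concat_pages(frames, columns=COLUMNS):
    """Concatenate per-page DataFrames; return an empty frame if there are none."""
    if not frames:
        # pd.concat([]) raises 'ValueError: No objects to concatenate'.
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)
```

With such a guard the caller can check `result.empty` and skip cleaning and saving instead of aborting with an exception.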

In [67]:
#E.ON energy
k=1
url1=(es_geq4000['comment'])[k]
company=(es_geq4000['supplier'])[k]
scrape_clean_save_v2(company,url1,sleep=True,start_page=312,end_page=0,suffix='page_312_to_infty')

  df_cleaned['DoE.day']= df_cleaned['DoE.day'].str.replace('.', '')


Unnamed: 0,Nickname,Location,Stars,Headline,DoP,DoE,Comment,Answer,DoA,Page,...,DoE.year,DoP.day,DoP.month,DoP.year,DoA.day,DoA.month,DoA.year,Comment_TF,Answer_TF,Company
0,Wetzel,DE,5,Zählerstandsmeldung Es geht einfach und schnell,2023-05-18 14:17:48+00:00,2023-05-17,Es geht einfach und schnell. Bitte Anfrage zum...,"Lieber Trustpilot Nutzer, wir freuen uns sehr...",2023-05-19 11:03:33+00:00,312,...,2023,18,5,2023,19,5,2023,1,1,E.ON Energie Deutschland GmbH
1,güro,DE,1,Ein miserables Unternehmen,2023-05-18 13:26:46+00:00,2023-05-18,Ein miserables Unternehmen. Ich kann nur jedem...,"Lieber Trustpilot Nutzer, vielen Dank für de...",2023-05-19 11:03:52+00:00,312,...,2023,18,5,2023,19,5,2023,1,1,E.ON Energie Deutschland GmbH
2,Sch,DE,5,Der Stromwechsel ging völlig…,2023-05-18 12:55:56+00:00,2023-05-17,"Der Stromwechsel ging völlig problemlos, bis j...","Lieber Trustpilot Nutzer, wir freuen uns sehr...",2023-05-19 11:04:08+00:00,312,...,2023,18,5,2023,19,5,2023,1,1,E.ON Energie Deutschland GmbH
3,Roman,DE,1,Antworten nicht auf Anfragen über…,2023-05-18 12:22:22+00:00,2023-04-29,Antworten nicht auf Anfragen über derenOnline ...,"Lieber Trustpilot Nutzer, vielen Dank für de...",2023-05-19 11:04:23+00:00,312,...,2023,18,5,2023,19,5,2023,1,1,E.ON Energie Deutschland GmbH
4,Herr Rieß,DE,5,Sehr einfach in der Eingabe,2023-05-18 10:55:46+00:00,2023-05-15,Sehr einfach in der Eingabe. Alle Infos sofort...,"Lieber Trustpilot Nutzer, wir freuen uns sehr...",2023-05-19 11:04:41+00:00,312,...,2023,18,5,2023,19,5,2023,1,1,E.ON Energie Deutschland GmbH
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6835,Herbert Beenen,DE,1,Mir hat nicht gefallen,2019-01-02 16:30:31+00:00,2019-01-02,"Mir hat nicht gefallen, dass ich meiner Wechse...","Hallo Herr Beenen, Sie haben sich über uns geä...",2019-03-28 09:39:43.924000+00:00,653,...,2019,2,1,2019,28,3,2019,1,1,E.ON Energie Deutschland GmbH
6836,Mustafa Özgür,DE,1,Das der versprochene Bonus nicht dem…,2019-01-02 15:04:22+00:00,2019-01-02,Das der versprochene Bonus nicht dem gewährten...,"Hallo Herr Özgür, es tut uns sehr leid, dass S...",2019-03-28 09:51:35.824000+00:00,653,...,2019,2,1,2019,28,3,2019,1,1,E.ON Energie Deutschland GmbH
6837,Dirk Köster,DE,1,Aufgrund einer E.ON,2018-12-31 17:00:09+00:00,2018-12-31,Aufgrund einer E.ON - Vertragsklausel lief mei...,"Hallo Herr Köster,wir sind leider spät dran :(...",2019-04-01 10:56:10.236000+00:00,653,...,2018,31,12,2018,1,4,2019,1,1,E.ON Energie Deutschland GmbH
6838,Heiko H.,DE,1,Ein sehr schlechter Service .Hotline…,2018-12-31 09:33:44+00:00,2018-12-31,Ein sehr schlechter Service .Hotline kann man ...,"Hallo Heiko. H,wir sind leider spät dran :((En...",2019-04-01 10:56:00.523000+00:00,653,...,2018,31,12,2018,1,4,2019,1,1,E.ON Energie Deutschland GmbH


In [88]:
# MONTANA Group
k = 1
url1 = es_leq4000['comment'][k]
company = es_leq4000['supplier'][k]
scrape_clean_save_v2(company, url1, sleep=False, start_page=1, end_page=52, suffix='page_1_to_52')

iteration stopped at page 1 and company MONTANA Group


ValueError: No objects to concatenate

## Final Data Frame

In [89]:
display(es)

Unnamed: 0,supplier,city,country,cat,score,votes,comment
0,Octopus Energy Germany,München,Deutschland,Ökostromanbieter Energieanbieter Stromversorgu...,4.8,8042,https://de.trustpilot.com/review/octopusenergy.de
1,Ostrom,Berlin,Deutschland,Energieanbieter Ökostromanbieter Stromversorgu...,4.8,1598,https://de.trustpilot.com/review/ostrom.de
2,Rabot Charge,Hamburg,Deutschland,Energieversorger Energieanbieter Stromversorgu...,4.3,174,https://de.trustpilot.com/review/rabot-charge.de
3,MONTANA Group,Grünwald,Deutschland,Energieanbieter Mineralölunternehmen Kraftstof...,4.0,3146,https://de.trustpilot.com/review/montana-energ...
4,E.ON Energie Deutschland GmbH,München,Deutschland,Energieversorger Stromversorgungsunternehmen Ö...,3.7,13223,https://de.trustpilot.com/review/eon.de
5,Grünwelt Energie,Kaarst,Deutschland,Stromversorgungsunternehmen,3.6,1964,https://de.trustpilot.com/review/www.gruenwelt.de
6,RheinEnergie,Köln,Deutschland,Ökostromanbieter Energieanbieter Gasversorgung...,3.4,528,https://de.trustpilot.com/review/rheinenergie.com
7,badenova,Freiburg im Breisgau,Deutschland,Stromversorgungsunternehmen Energieanbieter Ga...,2.7,241,https://de.trustpilot.com/review/www.badenova.de
8,pricewise.de,Heidelberg,Deutschland,Gasversorgungsunternehmen Stromversorgungsunte...,4.8,119,https://de.trustpilot.com/review/www.prizewize.de
9,DFM-Select GmbH,Metzingen,Deutschland,Anbieter von Elektronikbauteilen Technischer K...,4.6,22,https://de.trustpilot.com/review/dfm-select.de


In [95]:
# Load the per-company CSV files and combine them into one data frame
data_dir = 'C:/Users/isele/OneDrive/Desktop/Supply Chain - Customer Satisfaction/Data/Concatenated Data/'
frames = []

for company in es['supplier']:
    frames.append(pd.read_csv(data_dir + company + '.csv', index_col=0))

# Drop exact duplicate rows first, then renumber the index from 0
df_concat = pd.concat(frames)
df_concat.drop_duplicates(inplace=True)
df_concat.reset_index(inplace=True, drop=True)
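
The loading step above can also be sketched with `pathlib`, skipping companies whose file is missing rather than failing on the first gap. The temporary directory, the toy CSV contents, and the helper name `load_company_frames` are illustrative assumptions; they stand in for the real per-company files.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_company_frames(data_dir, companies):
    """Read '<company>.csv' for each company that has a file; skip the rest."""
    frames = []
    for company in companies:
        path = Path(data_dir) / f"{company}.csv"
        if path.exists():
            frames.append(pd.read_csv(path, index_col=0))
        else:
            print(f"no file for {company}, skipping")
    return frames


# Toy files in a temporary directory (assumption: one CSV per company)
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({'Stars': [5, 4], 'Company': 'A'}).to_csv(Path(tmp) / 'A.csv')
    pd.DataFrame({'Stars': [1], 'Company': 'B'}).to_csv(Path(tmp) / 'B.csv')
    frames = load_company_frames(tmp, ['A', 'B', 'C'])
    # ignore_index=True renumbers the rows from 0 during concatenation
    combined = pd.concat(frames, ignore_index=True).drop_duplicates()
```

Passing `ignore_index=True` to `pd.concat` makes a separate `reset_index` call unnecessary as long as no duplicates are dropped afterwards.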


In [96]:
len(df_concat)

45180

In [97]:
display(df_concat)

Unnamed: 0,Nickname,Location,Stars,Headline,DoP,DoE,Comment,Answer,DoA,Page,...,DoE.year,DoP.day,DoP.month,DoP.year,DoA.day,DoA.month,DoA.year,Comment_TF,Answer_TF,Company
0,Paul,DE,5,Seriös und preiswert:,2023-08-29 15:00:29+00:00,2023-08-29,Seriös und preiswert:nach einer ziemlich unang...,,,1,...,2023,29,8,2023,,,,1,0,Octopus Energy Germany
1,Anton,DE,5,Tarif mit vernünftigem Preis,2023-08-30 11:20:02+00:00,2023-08-22,Tarif mit vernünftigem Preis bei niedrigem Ein...,,,1,...,2023,30,8,2023,,,,1,0,Octopus Energy Germany
2,Tobias,DE,5,Ringo Star,2023-08-29 10:00:40+00:00,2023-08-21,"Supi Anbieterwechsel,klappt alles mit dem vora...",,,1,...,2023,29,8,2023,,,,1,0,Octopus Energy Germany
3,Dirk Meinel,DE,5,Schritt für Schritt transparent,2023-08-30 09:18:07+00:00,2023-08-27,Erstmalig habe ich mich bei Octopus Engergy an...,,,1,...,2023,30,8,2023,,,,1,0,Octopus Energy Germany
4,Marianne Bäßler,DE,5,HOHE ZUFRIEDENHEIT,2023-08-29 17:30:02+00:00,2023-08-29,HOHE ZUFRIEDENHEIT BEIM NEUEN ANBIETER OCTOPU...,,,1,...,2023,29,8,2023,,,,1,0,Octopus Energy Germany
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
45175,Hella Struth,DE,5,Sehr gute Beratung. sehr freundlich alles bestens,2015-10-02 08:55:26.633000+00:00,2015-10-02,immer wieder arbeite ich gerne mit der EWE,,,106,...,2015,2,10,2015,,,,1,0,EWE
45176,Antje Ebert,DE,4,bewertung,2015-09-24 18:16:39.942000+00:00,2015-09-24,Alles gut und zu empfehlen,,,106,...,2015,24,9,2015,,,,1,0,EWE
45177,Gerhard Roggenkamp,DE,3,EWE ist im Vergleich zu anderen Anbietern zu t...,2015-09-24 09:23:57+00:00,2015-09-24,Ansonsten: Der Kundendiest ist freundlich und ...,Sehr geehrter Herr Roggenkamp. Vielen Dank für...,2015-09-30 10:53:19.218000+00:00,106,...,2015,24,9,2015,30.0,9.0,2015.0,1,1,EWE
45178,Kemal Eker,DE,5,Ewe ist das beste strom,2015-09-08 08:05:45.081000+00:00,2015-09-08,Ewe ist das beste,,,106,...,2015,8,9,2015,,,,1,0,EWE


In [98]:
df_concat.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 45180 entries, 0 to 45179
Data columns (total 22 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Nickname    45178 non-null  object 
 1   Location    45180 non-null  object 
 2   Stars       45180 non-null  int64  
 3   Headline    45180 non-null  object 
 4   DoP         45180 non-null  object 
 5   DoE         45180 non-null  object 
 6   Comment     38931 non-null  object 
 7   Answer      28578 non-null  object 
 8   DoA         28578 non-null  object 
 9   Page        45180 non-null  int64  
 10  DoE.day     45180 non-null  int64  
 11  DoE.month   45180 non-null  int64  
 12  DoE.year    45180 non-null  int64  
 13  DoP.day     45180 non-null  int64  
 14  DoP.month   45180 non-null  int64  
 15  DoP.year    45180 non-null  int64  
 16  DoA.day     28578 non-null  float64
 17  DoA.month   28578 non-null  float64
 18  DoA.year    28578 non-null  float64
 19  Comment_TF  45180 non-nul

In [102]:
df_concat['DoP'].nunique()

45081
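
With 45180 rows and 45081 distinct `DoP` values, 99 rows carry a posting timestamp that already occurs in an earlier row. Whether these are genuine coincidences or repeated posts is not visible from the count alone. The sketch below shows one way to pull out the affected rows for inspection; the toy frame and its values are made up for illustration.

```python
import pandas as pd

# Toy frame: rows 'a' and 'b' share the same posting timestamp
toy = pd.DataFrame({
    'Nickname': ['a', 'b', 'c', 'd'],
    'DoP': ['2023-05-18 14:17:48+00:00', '2023-05-18 14:17:48+00:00',
            '2023-05-18 13:26:46+00:00', '2023-05-18 12:55:56+00:00'],
})

# Number of rows whose DoP repeats an earlier row (rows minus distinct values)
n_repeats = len(toy) - toy['DoP'].nunique()

# All rows that share their DoP with at least one other row
shared = toy[toy['DoP'].duplicated(keep=False)]
```

Applied to `df_concat`, the same two expressions would give the count of repeated timestamps and the rows to review by hand.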

### Save

In [103]:
df_concat.to_csv('C:/Users/isele/OneDrive/Desktop/Supply Chain - Customer Satisfaction/Data/all_suppliers_data.csv')