# Data Scraping - Insurance App Ratings on Google Play Store

### Introduction 

Mobile applications give customers a convenient channel to buy and renew insurance policies, lodge claims quickly, and get prompt service. They also serve as communication tools for consultations and notifications. Through these applications, companies can build loyalty, increase customer engagement, and offer customized products, so it is important that an insurer's application lives up to these expectations.

Almost all General Insurance (GI) companies have a mobile application on the Play Store. We will use Play Store reviews to analyze customer feedback: what is working, what the challenges are, and how companies can address them.

In this notebook, we will scrape Play Store app details and reviews for Indian General Insurance companies.

In [1]:
from google_play_scraper import app, Sort, reviews_all

import pandas as pd
import numpy as np
import time
from tqdm import tqdm


In [2]:
app_ids = { 'icici_lombard' : 'icici.lombard.ghi',
            'bajaj_allianz' : 'com.ba.cp.controller',
            'hdfc_ergo' : 'com.pms.activity',
            'iffco_tokio' : 'com.iffcotokio.CustomerApp',
            'reliance_general' : 'com.rgi.customerapp.live',
            'sbi_general' : 'com.sbig.insurance',
            'tata_aig' : 'com.tataaig.android',
            'future_generali' : 'com.futuregenerali.fginsure',
            'kotak_mahindra' : 'io.cordova.myapp53513c',
            'universal_sompo' : 'com.universalsompo.meta',
            'royal_sundaram' : 'com.rssync',
            'shriram_general' : 'com.sgi.project.android.live',
            'liberty_general' : 'com.lvgi.livmobile',
            'magma_hdi' : 'net.fhpl.magmahealth',

            # insurtechs
            'digit' : 'com.godigit.digit',
            'acko' : 'com.acko.android',
            'navi' : 'com.navi.insurance',
           
          }

Getting app details with the google-play-scraper package.

Source: https://pypi.org/project/google-play-scraper/

In [3]:
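app_details = []
for name, app_id in tqdm(app_ids.items()):
    # Fetch the store listing (English, Indian store) for this app
    info = app(app_id, lang='en', country='in')
    # Drop the bulky sample comments; pop() avoids a KeyError if the key is absent
    info.pop('comments', None)
    app_details.append(info)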


100%|██████████████████████████████████████████████████████████████████████████████████| 17/17 [01:24<00:00,  4.99s/it]
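
The loop above stops at the first app whose lookup raises an error. A minimal sketch of one way to make it more tolerant is a small retry helper. The names `fetch_with_retry` and `fake_fetch` are hypothetical and not part of google-play-scraper; `fake_fetch` stands in for the network call so the sketch runs offline.

```python
import time


def fetch_with_retry(fetch, app_id, retries=3, delay_seconds=1.0):
    """Call fetch(app_id), retrying on any exception; return None if every attempt fails."""
    for attempt in range(1, retries + 1):
        try:
            return fetch(app_id)
        except Exception as exc:  # broad on purpose: no specific scraper error types are assumed
            print(f'{app_id}: attempt {attempt} failed ({exc})')
            if attempt < retries:
                time.sleep(delay_seconds)
    return None


# Hypothetical stand-in for a network call, used only to exercise the helper.
def fake_fetch(app_id):
    if app_id == 'com.example.missing':
        raise ValueError('app not found')
    return {'appId': app_id, 'title': 'Example'}


sample_ids = {'ok_app': 'com.example.ok', 'bad_app': 'com.example.missing'}
results = {}
for name, app_id in sample_ids.items():
    results[name] = fetch_with_retry(fake_fetch, app_id, retries=2, delay_seconds=0)
```

In the notebook, `fetch` could be something like `lambda i: app(i, lang='en', country='in')`, and entries that come back as `None` could be skipped before building the DataFrame.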


In [4]:
df_app_details = pd.DataFrame(app_details)
df_app_details.head()

Unnamed: 0,title,description,descriptionHTML,summary,summaryHTML,installs,minInstalls,score,ratings,reviews,...,adSupported,containsAds,released,updated,version,recentChanges,recentChangesHTML,editorsChoice,appId,url
0,ILTakeCare: Insurance & Wellness Needs,"Now, renew or buy bike, car, and health insura...","Now, renew or buy bike, car, and health insura...","An app to take care of all your health, car & ...","An app to take care of all your health, car &a...","500,000+",500000,4.009925,8035,4647,...,,False,"May 29, 2019",1629814871,2.0.40,Our latest app build is equipped with new feat...,Our latest app build is equipped with new feat...,False,icici.lombard.ghi,https://play.google.com/store/apps/details?id=...
1,Caringly Yours,Caringly Yours is a mobile app platform by Baj...,Caringly Yours is a mobile app platform by Baj...,Bajaj Allianz General Insurance Caringly Yours,Bajaj Allianz General Insurance Caringly Yours,"1,000,000+",1000000,4.034217,20468,7750,...,,False,"Jan 9, 2015",1629890013,16.2,,,False,com.ba.cp.controller,https://play.google.com/store/apps/details?id=...
2,HDFC ERGO Insurance App,Manage all your insurance needs at one place w...,Manage all your insurance needs at one place w...,Keep track of your insurance polices on your m...,Keep track of your insurance polices on your m...,"1,000,000+",1000000,3.820928,18368,8001,...,,False,"Dec 15, 2011",1627022095,7.12,Performance enhancements and bug fixes.,Performance enhancements and bug fixes.,False,com.pms.activity,https://play.google.com/store/apps/details?id=...
3,IFFCO Tokio - Customer,CUSTOMER APP\r\nFrom buying and renewing polic...,CUSTOMER APP<br>From buying and renewing polic...,Muskurate Raho,Muskurate Raho,"100,000+",100000,3.015038,1336,715,...,,False,"Feb 12, 2018",1621250042,4.3.5,Bug Fixes & enhancements.,Bug Fixes &amp; enhancements.,False,com.iffcotokio.CustomerApp,https://play.google.com/store/apps/details?id=...
4,Reliance Self-i,The Reliance Self-i App from Reliance General ...,The Reliance Self-i App from Reliance General ...,Simplify your Insurance Claims and Policy Rene...,Simplify your Insurance Claims and Policy Rene...,"100,000+",100000,3.542071,6166,2892,...,,False,"Apr 20, 2018",1629952199,1.0.73,,,False,com.rgi.customerapp.live,https://play.google.com/store/apps/details?id=...


In [5]:
df_app_details.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 49 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   title                     17 non-null     object 
 1   description               17 non-null     object 
 2   descriptionHTML           17 non-null     object 
 3   summary                   17 non-null     object 
 4   summaryHTML               17 non-null     object 
 5   installs                  17 non-null     object 
 6   minInstalls               17 non-null     int64  
 7   score                     17 non-null     float64
 8   ratings                   17 non-null     int64  
 9   reviews                   17 non-null     int64  
 10  histogram                 17 non-null     object 
 11  price                     17 non-null     int64  
 12  free                      17 non-null     bool   
 13  currency                  17 non-null     object 
 14  sale        

In [6]:
%%time
all_app_reviews = []

for idx in tqdm(app_ids) :
    # Get all reviews for the app
    app_reviews = reviews_all(
        app_ids[idx],
        sleep_milliseconds=2,
        lang='en', 
        country='in', 
        sort=Sort.NEWEST)
    # append appID to each review
    for review in app_reviews:
        review['appId'] = app_ids[idx]
        all_app_reviews.append(review)
    print('Company : {} Reviews: {}'.format(idx,len(app_reviews)))
    time.sleep(2)
    

  0%|                                                                                           | 0/17 [00:00<?, ?it/s]

Company : icici_lombard Reviews: 4624


  6%|████▉                                                                              | 1/17 [01:11<19:04, 71.52s/it]

Company : bajaj_allianz Reviews: 7716


 12%|█████████▋                                                                        | 2/17 [04:05<32:56, 131.76s/it]

Company : hdfc_ergo Reviews: 7967


 18%|██████████████▍                                                                   | 3/17 [07:02<35:34, 152.50s/it]

Company : iffco_tokio Reviews: 710


 24%|███████████████████▌                                                               | 4/17 [07:21<21:38, 99.87s/it]

Company : reliance_general Reviews: 2879


 29%|████████████████████████▍                                                          | 5/17 [08:28<17:33, 87.76s/it]

Company : sbi_general Reviews: 447


 35%|█████████████████████████████▎                                                     | 6/17 [08:42<11:30, 62.78s/it]

Company : tata_aig Reviews: 110


 41%|██████████████████████████████████▏                                                | 7/17 [08:50<07:30, 45.03s/it]

Company : future_generali Reviews: 121


 47%|███████████████████████████████████████                                            | 8/17 [08:57<04:56, 32.91s/it]

Company : kotak_mahindra Reviews: 360


 53%|███████████████████████████████████████████▉                                       | 9/17 [09:15<03:44, 28.05s/it]

Company : universal_sompo Reviews: 248


 59%|████████████████████████████████████████████████▏                                 | 10/17 [09:27<02:42, 23.22s/it]

Company : royal_sundaram Reviews: 432


 65%|█████████████████████████████████████████████████████                             | 11/17 [09:40<02:00, 20.02s/it]

Company : shriram_general Reviews: 72


 71%|█████████████████████████████████████████████████████████▉                        | 12/17 [09:50<01:25, 17.05s/it]

Company : liberty_general Reviews: 306


 76%|██████████████████████████████████████████████████████████████▋                   | 13/17 [10:09<01:10, 17.65s/it]

Company : magma_hdi Reviews: 67


 82%|███████████████████████████████████████████████████████████████████▌              | 14/17 [10:15<00:41, 13.96s/it]

Company : digit Reviews: 413


 88%|████████████████████████████████████████████████████████████████████████▎         | 15/17 [10:27<00:26, 13.45s/it]

Company : acko Reviews: 4219


 94%|█████████████████████████████████████████████████████████████████████████████▏    | 16/17 [11:43<00:32, 32.29s/it]

Company : navi Reviews: 459


100%|██████████████████████████████████████████████████████████████████████████████████| 17/17 [12:02<00:00, 42.48s/it]


In [7]:
df_reviews = pd.DataFrame(all_app_reviews)
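# Note: drop() returns a new DataFrame; the result is not assigned here,
# so df_reviews still contains userName and userImage (11 columns in total)
df_reviews.drop(columns=['userName','userImage'])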
df_reviews.shape

(31150, 11)

In [8]:
df_reviews.head()

Unnamed: 0,reviewId,userName,userImage,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt,appId
0,gp:AOqpTOF86De47sxTLcSz5dRWRlYNfO2IN5pk3baCjf8...,Nikki Tamboli,https://play-lh.googleusercontent.com/a/AATXAJ...,Excellent.. The service and reaction were exce...,5,0,,2021-08-26 22:29:04,,NaT,icici.lombard.ghi
1,gp:AOqpTOH09z1vHSV2KRUcILU1eSsx-5HzWABJDC2pCW6...,Sarita Chauhan,https://play-lh.googleusercontent.com/a/AATXAJ...,"The app is simple and tidy, with a focus on th...",5,0,,2021-08-26 22:28:35,,NaT,icici.lombard.ghi
2,gp:AOqpTOGYmB7z6aIyAFtbD4YheqYzuDrtrQcYBBJSkmZ...,Asish Mandal,https://play-lh.googleusercontent.com/a-/AOh14...,It is simple to select and purchase insurance....,5,0,,2021-08-26 22:27:21,,NaT,icici.lombard.ghi
3,gp:AOqpTOFEDySNZlgqJByDGyF8C8WMT6pJhqcgAhMokeI...,Jatt Zimidaar,https://play-lh.googleusercontent.com/a/AATXAJ...,I've used this app for my car insurance a few ...,5,0,,2021-08-26 22:26:59,,NaT,icici.lombard.ghi
4,gp:AOqpTOFE6WGoCMTXBlT57sZBjZIxwH0zsmyBH8D2C14...,V g,https://play-lh.googleusercontent.com/a/AATXAJ...,It is wrost insurance company . One of my frie...,1,0,2.0.40,2021-08-26 21:21:10,,NaT,icici.lombard.ghi


In [9]:
df_reviews.tail()

Unnamed: 0,reviewId,userName,userImage,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt,appId
31145,gp:AOqpTOGWW-s9kWpa4Lsz-4lAzZFa9grKuu0o1jZqELf...,Siddharth Shukla,https://play-lh.googleusercontent.com/a/AATXAJ...,"A first of its kind, completely digital produc...",5,61,0.0.8,2020-12-23 19:10:22,"Hi,\nThanks for your feedback. We are continuo...",2021-04-20 10:17:43,com.navi.insurance
31146,gp:AOqpTOEn-uWSimaiYMqInVa9iXgnku2CtuKRvAc9lQU...,Nimesh Agarwal,https://play-lh.googleusercontent.com/a-/AOh14...,Very quick and easy way to purchase health ins...,5,242,0.0.7,2020-12-21 15:35:00,It is delightful to hear such positive words a...,2021-04-20 10:17:54,com.navi.insurance
31147,gp:AOqpTOFEOCRfrsxOnzXBQIoERAdFSLc_Q1jlDo7QYxC...,Swapnil Pawar,https://play-lh.googleusercontent.com/a-/AOh14...,A unique interactive way to buy Health Insuran...,5,294,0.0.7,2020-12-21 00:37:24,"Hi,\nThanks for your feedback. We are continuo...",2021-04-20 10:18:02,com.navi.insurance
31148,gp:AOqpTOGMOuq2ohN_jzVxPiya09XSj_vM3Z5qiTF4WKw...,RISHI PRATAP SINGH SISODIYA,https://play-lh.googleusercontent.com/a/AATXAJ...,Easy and Fast!!,5,36,0.0.7,2020-12-20 23:35:24,"Hi, thank you very much for your feedback.",2021-04-20 10:18:14,com.navi.insurance
31149,gp:AOqpTOEt3PPJluXxGDbJxjjsJlLsrXq-Fxgvg3jQgqR...,Sumit Jhawar,https://play-lh.googleusercontent.com/a-/AOh14...,Best experience ever of buying a insurance pol...,5,414,0.0.7,2020-12-20 01:58:09,It is delightful to hear such positive words a...,2021-04-20 10:18:29,com.navi.insurance


In [10]:
df_reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31150 entries, 0 to 31149
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   reviewId              31150 non-null  object        
 1   userName              31150 non-null  object        
 2   userImage             31150 non-null  object        
 3   content               31146 non-null  object        
 4   score                 31150 non-null  int64         
 5   thumbsUpCount         31150 non-null  int64         
 6   reviewCreatedVersion  27294 non-null  object        
 7   at                    31150 non-null  datetime64[ns]
 8   replyContent          20849 non-null  object        
 9   repliedAt             20849 non-null  datetime64[ns]
 10  appId                 31150 non-null  object        
dtypes: datetime64[ns](2), int64(2), object(7)
memory usage: 2.6+ MB


We are mainly interested in the following fields:
1. reviewId - unique identifier for reviews
2. content - review text
3. score - customer rating from 1 to 5
4. at - time of review posting
5. repliedAt - time of reply to review
6. replyContent - text of reply from company
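
As a rough illustration of how the fields listed above might be used, the sketch below keeps only those columns and derives a reply flag and reply delay. The sample rows are made up for the example, `appId` is kept as an extra column (an assumption, so results can be grouped per app), and the column names `has_reply` and `reply_delay_hours` are hypothetical.

```python
import pandas as pd

# Hand-made sample with the same column names as df_reviews (values are illustrative only).
sample = pd.DataFrame({
    'reviewId': ['r1', 'r2', 'r3'],
    'userName': ['a', 'b', 'c'],
    'content': ['Good app', 'Crashes on login', None],
    'score': [5, 1, 3],
    'at': pd.to_datetime(['2021-08-01 10:00:00', '2021-08-02 09:00:00', '2021-08-03 12:00:00']),
    'replyContent': ['Thanks!', None, None],
    'repliedAt': pd.to_datetime(['2021-08-02 10:00:00', None, None]),
    'appId': ['com.example.a', 'com.example.a', 'com.example.b'],
})

# Keep the six fields of interest, plus appId for grouping (assumption).
fields = ['reviewId', 'content', 'score', 'at', 'repliedAt', 'replyContent', 'appId']
subset = sample[fields].copy()

# A review counts as answered when repliedAt is present.
subset['has_reply'] = subset['repliedAt'].notna()
# Hours between the review and the company's reply (NaN when there is no reply).
subset['reply_delay_hours'] = (subset['repliedAt'] - subset['at']).dt.total_seconds() / 3600

# Per-app review count, mean rating, and share of reviews that received a reply.
summary = subset.groupby('appId').agg(
    reviews=('reviewId', 'count'),
    mean_score=('score', 'mean'),
    reply_rate=('has_reply', 'mean'),
)
print(summary)
```

The same steps should apply to `df_reviews` directly, since `at` and `repliedAt` are already `datetime64[ns]` columns there.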

Save the data for the data wrangling step.

In [11]:
review_path = '../data/reviews.csv'
df_reviews.to_csv(review_path, index=False, header=True)

app_details_path = '../data/app_details.csv'
df_app_details.to_csv(app_details_path, index=False, header=True)
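
One thing to keep in mind for the next step is that CSV does not store dtypes, so the timestamp columns come back as plain text unless they are parsed on load. The sketch below shows this with a small made-up frame and an in-memory buffer instead of the real file; it is an illustration, not the notebook's actual wrangling code.

```python
import io

import pandas as pd

# Small illustrative frame using two of the timestamp columns from df_reviews.
frame = pd.DataFrame({
    'reviewId': ['r1', 'r2'],
    'score': [5, 1],
    'at': pd.to_datetime(['2021-08-26 22:29:04', '2021-08-26 21:21:10']),
    'repliedAt': pd.to_datetime([None, '2021-08-27 09:00:00']),
})

# Write to an in-memory buffer so the example leaves no files behind.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Without parse_dates the timestamps are read back as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# With parse_dates they are restored as datetimes; missing replies become NaT.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=['at', 'repliedAt'])

print(plain.dtypes)
print(parsed.dtypes)
```

For the saved file, the equivalent call would be `pd.read_csv(review_path, parse_dates=['at', 'repliedAt'])`.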

