**Importing Required Libraries**

In [1]:
from tqdm import tqdm
from google_play_scraper import Sort, reviews, app
import json

from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter 

import pandas as pd

import warnings
warnings.filterwarnings('ignore')

**We'll reuse the apps from the Productivity category, this time collecting reviews that fit our criteria.**

In [2]:
app_packages = [
  'com.jio.myjio',
  'share.sharekaro.pro',
  'com.speedifylink.fastest',
  'com.mventus.selfcare.activity',
  'com.document.reader.pdfreader.pdf',
  'com.downlood.sav.whmedia',
  'com.cambyte.okenscan',
  'com.digilocker.android',
  'com.nic.mparivahan',
  'com.taymay.pdf.scanner',
  'cn.wps.moffice_eng',
  'com.documentreader.documentapp.filereader',
  'com.lightning.safespeed',
  'com.cv.docscanner',
  'com.adobe.reader'
]

**Now let's scrape the reviews for each app. The first requirement is met with the package's `filter_score_with` option, which restricts results to a single star rating. We loop over ratings 1 to 5, requesting 200 reviews for 3-star ratings and 100 for the others. For the second requirement, we sort by `Sort.MOST_RELEVANT`, which returns the reviews Google Play considers most helpful. Just in case, we also fetch a subset of the newest reviews (`Sort.NEWEST`).**

In [3]:
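app_reviews = []
for app_link in tqdm(app_packages):
    for score in range(1, 6):
        for sort_order in [Sort.MOST_RELEVANT, Sort.NEWEST]:
            try:
                # Fetch reviews with exactly this star rating (200 for 3-star, 100 otherwise)
                rvs, _ = reviews(
                    app_link,
                    lang='en',
                    country='in',
                    sort=sort_order,
                    count=200 if score == 3 else 100,
                    filter_score_with=score,
                )
            except Exception as e:
                # Report the failure (e.g. a 404 for a removed app) and keep going
                print(f"Could not fetch reviews for {app_link} (score={score}, sort={sort_order.name}): {e}")
                continue
            # Tag each review with the app and sort order it came from
            for r in rvs:
                r['sortOrder'] = 'most_relevant' if sort_order == Sort.MOST_RELEVANT else 'newest'
                r['appId'] = app_link
            app_reviews.extend(rvs)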

100%|██████████████████████████████████████████████████████████████████████████████████| 15/15 [02:54<00:00, 11.63s/it]
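Because the same review can be returned by both `Sort.MOST_RELEVANT` and `Sort.NEWEST` for a given rating, `app_reviews` may contain duplicates. Below is a minimal sketch of removing them, assuming each scraped review dict carries a `reviewId` field (google-play-scraper includes one in its review dicts). The helper `dedupe_reviews` and the sample records are hypothetical. Keeping the first occurrence preserves the `most_relevant` copy, since the loop fetches that sort order first for each rating.

```python
from typing import Any, Dict, Iterable, List


def dedupe_reviews(reviews: Iterable[Dict[str, Any]], key: str = "reviewId") -> List[Dict[str, Any]]:
    """Hypothetical helper: drop reviews whose `key` was already seen, keeping the first copy."""
    seen = set()
    unique = []
    for review in reviews:
        review_id = review[key]
        if review_id in seen:
            continue  # an earlier copy (e.g. from MOST_RELEVANT) was already kept
        seen.add(review_id)
        unique.append(review)
    return unique


# Hypothetical toy records standing in for scraped reviews
sample = [
    {"reviewId": "r1", "appId": "com.example.app", "sortOrder": "most_relevant"},
    {"reviewId": "r2", "appId": "com.example.app", "sortOrder": "most_relevant"},
    {"reviewId": "r1", "appId": "com.example.app", "sortOrder": "newest"},
]
deduped = dedupe_reviews(sample)
print(len(deduped))  # 2
```

In the notebook this would be `app_reviews = dedupe_reviews(app_reviews)`; on the DataFrame, `app_reviews_df.drop_duplicates(subset='reviewId')` does the same job.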


**Reviews were fetched for all 15 apps. Next, let's write a helper function that pretty-prints JSON objects with syntax highlighting.**

In [4]:
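def print_json(json_object):
    # Serialize with sorted keys; default=str turns datetimes into strings
    json_str = json.dumps(
        json_object,
        indent=2,
        sort_keys=True,
        default=str,
    )
    # Colorize with Pygments for terminal output
    print(highlight(json_str, JsonLexer(), TerminalFormatter()))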
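`default=str` is what lets `json.dumps` handle the `datetime` values in fields such as `at` and `repliedAt`; without it, serialization raises a `TypeError`. Where ANSI colour codes are not rendered (some notebook exports print them as raw escape sequences), a plain-text variant is easier to read. The sketch below is a hypothetical helper, `to_pretty_json`, that keeps the same `json.dumps` options and drops the Pygments step; the sample record is made up.

```python
import json
from datetime import datetime
from typing import Any


def to_pretty_json(obj: Any) -> str:
    """Hypothetical plain-text counterpart of print_json: same dumps options, no ANSI colour."""
    # default=str is called for values json can't serialize natively, such as datetime
    return json.dumps(obj, indent=2, sort_keys=True, default=str)


# Hypothetical record shaped like a scraped review
record = {"score": 1, "at": datetime(2022, 4, 23, 11, 13, 17)}
print(to_pretty_json(record))
```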

In [5]:
print_json(app_reviews[0])

{
  "appId": "com.jio.myjio",
  "at": "2022-04-23 11:13:17",
  "content": "Whenever there is bad weather or a power cut, there is no network in my area. Some of us are doing WFH, and I have missed many deadlines because of this. This problem has always been there but it appears that they never try to fix it. I've raised this issue many times before, but I got the same old scripted reply stating that I should change my setting. Time to switch it to Airtel, at least they don't have any network issue.",
  "repliedAt": "2022-04-24 18:11:04",
  "replyContent": "Hi, we desire to build your experience better an

In [6]:
len(app_reviews)

15012
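As a sanity check on that number, the request plan in the scraping loop implies a ceiling: 15 apps × 5 star ratings × 2 sort orders = 150 requests, each returning at most 100 reviews (200 for 3-star), so at most 15 × 2 × (4 × 100 + 200) = 18,000 reviews. The 15,012 collected stays below that, presumably because some app and rating combinations have fewer reviews than requested. The snippet simply reproduces this arithmetic with the loop's constants.

```python
# Constants mirrored from the scraping loop
N_APPS = 15
SCORES = range(1, 6)
SORT_ORDERS = ("most_relevant", "newest")


def requested_count(score: int) -> int:
    # The loop asks for 200 reviews at 3 stars and 100 at every other rating
    return 200 if score == 3 else 100


n_requests = N_APPS * len(SCORES) * len(SORT_ORDERS)
max_reviews = N_APPS * len(SORT_ORDERS) * sum(requested_count(s) for s in SCORES)
print(n_requests, max_reviews)  # 150 18000
```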

**Let's save the reviews to a CSV file.**

In [7]:
app_reviews_df = pd.DataFrame(app_reviews)
app_reviews_df.to_csv('app_reviews.csv', index=False, header=True)
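One thing to keep in mind when loading the file back: CSV stores no types, so datetime columns such as `at` and `repliedAt` come back as strings unless pandas is asked to parse them. Here is a minimal round-trip sketch with `pd.read_csv(..., parse_dates=[...])`, using an in-memory buffer and a hypothetical two-row frame in place of `app_reviews.csv`.

```python
import io

import pandas as pd

# Hypothetical frame with a few of app_reviews_df's columns; the second review has no reply
df = pd.DataFrame({
    "reviewId": ["r1", "r2"],
    "at": pd.to_datetime(["2022-04-23 11:13:17", "2022-04-24 09:00:00"]),
    "repliedAt": pd.to_datetime(["2022-04-24 18:11:04", None]),
})

buf = io.StringIO()
df.to_csv(buf, index=False)  # same options as the notebook (header=True is the default)
buf.seek(0)

# Without parse_dates, "at" and "repliedAt" would be read back as plain strings
restored = pd.read_csv(buf, parse_dates=["at", "repliedAt"])
print(restored.dtypes)
```

For the real file, the equivalent would be `pd.read_csv('app_reviews.csv', parse_dates=['at', 'repliedAt'])`; missing replies come back as `NaT`.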