<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preprocessing,-exploration-and-export-of-app-reviews-[Plume---USA]" data-toc-modified-id="Preprocessing,-exploration-and-export-of-app-reviews-[Plume---USA]-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Preprocessing, exploration and export of app reviews [Plume - USA]</a></span><ul class="toc-item"><li><span><a href="#Load-data" data-toc-modified-id="Load-data-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Load data</a></span></li><li><span><a href="#Ratings" data-toc-modified-id="Ratings-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Ratings</a></span></li><li><span><a href="#Detect-language" data-toc-modified-id="Detect-language-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Detect language</a></span></li><li><span><a href="#Sort-data" data-toc-modified-id="Sort-data-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Sort data</a></span></li><li><span><a href="#Export-data" data-toc-modified-id="Export-data-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Export data</a></span></li></ul></li></ul></div>

# Preprocessing, exploration and export of app reviews [Plume - USA]

The reviews in this notebook come from the US App Store: [Plume](https://apps.apple.com/us/app/plume-air-report-pollution/id950289243). We proceed in the same way as for the reviews collected from the French App Store.

In [1]:
import pandas as pd
from langdetect import detect
import warnings
warnings.filterwarnings('ignore')
import os

## Load data

In [2]:
path = os.getcwd()
filename = 'app_reviews_plume_us.json'

In [3]:
# Build the input path portably instead of concatenating strings
df = pd.read_json(os.path.join(path, "..", "data", "0_scraped_data", filename))

In [4]:
df.head()

| | review_id | rating | title | review_date | user_name | review | response_id | dev_response | response_date |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1977613148 | 5 | Everyone should be using this APP | 2017-12-07T14:41:53Z | KC3M | I've been using Plume for almost a year and ha... | | | |
| 1 | 2030111284 | 4 | Look no further! | 2017-12-26T22:04:10Z | Jacques Matineau | This app does what is says and quite well. \n\... | 2145616.0 | Many thanks for your review and detailed feedb... | 2017-12-27T16:33:02Z |
| 2 | 3969994802 | 5 | So far so good! | 2019-04-04T21:48:24Z | littleapples | Just learned about this and got it on the basi... | | | |
| 3 | 2163867122 | 5 | Great if you have asthma | 2018-02-03T23:25:26Z | Much Kuler | I use this daily so I know the best time to go... | 2469302.0 | Thank you very much for your suggestion. We va... | 2018-02-01T14:29:16Z |
| 4 | 3677839043 | 5 | Phenomenal and Life Saving | 2019-01-21T14:37:57Z | MDTannen | Plume worked so well to protect my family memb... | | | |


In [5]:
df.columns

Index(['review_id', 'rating', 'title', 'review_date', 'user_name', 'review',
       'response_id', 'dev_response', 'response_date'],
      dtype='object')

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 130 entries, 0 to 129
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   review_id      130 non-null    int64  
 1   rating         130 non-null    int64  
 2   title          130 non-null    object 
 3   review_date    130 non-null    object 
 4   user_name      130 non-null    object 
 5   review         130 non-null    object 
 6   response_id    16 non-null     float64
 7   dev_response   16 non-null     object 
 8   response_date  16 non-null     object 
dtypes: float64(1), int64(2), object(6)
memory usage: 9.3+ KB


In [7]:
df['review']

0      I've been using Plume for almost a year and ha...
1      This app does what is says and quite well. \n\...
2      Just learned about this and got it on the basi...
3      I use this daily so I know the best time to go...
4      Plume worked so well to protect my family memb...
                             ...                        
125    The AQI is consistently reading 50-100 points ...
126    The app isn’t reporting the same stats as the ...
127    Plume is a great resource to have (I live in S...
128    Look assez vieux. Appli pas tres innovante et ...
129    Update (2015/12/22)\nLooks like Austin has bee...
Name: review, Length: 130, dtype: object

## Ratings

In [8]:
# assess the distribution of ratings
df['rating'].value_counts()

5    61
1    26
3    18
4    13
2    12
Name: rating, dtype: int64

In [9]:
# assess mean rating
df['rating'].mean()

3.546153846153846
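
As a quick cross-check of the mean, the counts above can be combined by hand into a weighted mean and per-rating shares. This is a minimal sketch using only the standard library; the `counts` dictionary is copied from the `value_counts()` output above, and the variable names are illustrative.

```python
# Rating counts copied from the value_counts() output above
counts = {5: 61, 1: 26, 3: 18, 4: 13, 2: 12}

# Total number of reviews
total = sum(counts.values())

# Weighted mean: sum of rating * count, divided by the number of reviews
mean_rating = sum(rating * n for rating, n in counts.items()) / total

# Share of each rating, ordered from 1 to 5 stars
shares = {rating: n / total for rating, n in sorted(counts.items())}

print(total, round(mean_rating, 3))
for rating, share in shares.items():
    print(f"{rating} stars: {share:.1%}")
```

The total matches the 130 rows reported by `df.info()`, and the weighted mean agrees with `df['rating'].mean()`.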

## Detect language

In [10]:
# Define a function to identify language and catch exceptions
from langdetect import DetectorFactory

# Fix the seed once so that language detection is deterministic
DetectorFactory.seed = 0

def lang_detect(text):
    try:
        return detect(text)
    except Exception:
        # detect() fails on inputs without usable text (e.g. empty strings)
        return "language not detected"

In [11]:
# Detect the language used in the reviews
df['lang-r'] = df['review'].apply(lang_detect)

In [12]:
# What are the detected languages?
df['lang-r'].unique()

array(['en', 'ro', 'af', 'es', 'fr'], dtype=object)

In [13]:
len(df[df['lang-r']!='en'])

5

In [14]:
df[df['lang-r']!='en']

| | review_id | rating | title | review_date | user_name | review | response_id | dev_response | response_date | lang-r |
|---|---|---|---|---|---|---|---|---|---|---|
| 55 | 1552482348 | 5 | Great information | 2017-02-25T18:32:59Z | Omar Jonguitud | Quite useful | | | | ro |
| 68 | 5182439168 | 5 | Stay safe with Plume | 2019-11-22T18:20:14Z | City Otter NYC | After | | | | af |
| 120 | 1576466746 | 5 | Muy útil | 2017-03-30T16:46:55Z | Cadavis94 | Siempre quise un app que brinde información so... | | | | es |
| 122 | 1311927439 | 5 | NO TE ASFIXIES! | 2016-01-06T02:36:47Z | «OpuS» | Excelente App para aminorar los posibles efect... | | | | es |
| 128 | 1362931952 | 2 | Bof | 2016-04-13T14:44:38Z | Jlex78 | Look assez vieux. Appli pas tres innovante et ... | | | | fr |
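
Two of the five rows above are short English reviews ("Quite useful" tagged `ro`, "After" tagged `af`), which suggests that detection is unreliable on very short texts. One possible guard, which is not what this notebook does, is to skip detection below a word-count threshold. The sketch below is an assumption-laden illustration: `guarded_lang`, `min_words`, `default` and the stand-in `fake_detector` are all hypothetical names, and in practice `detector` would be langdetect's `detect`.

```python
def guarded_lang(text, detector, min_words=3, default="en"):
    """Return `default` for texts shorter than `min_words` words, else detector(text)."""
    if len(text.split()) < min_words:
        return default
    try:
        return detector(text)
    except Exception:
        return "language not detected"


# Stand-in detector for illustration only; it is NOT a real language detector.
# In the notebook, langdetect's `detect` would be passed instead.
def fake_detector(text):
    return "fr" if "assez" in text else "en"


print(guarded_lang("Quite useful", fake_detector))   # too short: falls back to the default
print(guarded_lang("After", fake_detector))          # too short: falls back to the default
print(guarded_lang("Look assez vieux. Appli pas tres innovante", fake_detector))
```

The trade-off is that a genuinely non-English review of one or two words would also be labelled with the default, so the threshold is a judgment call.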


## Sort data

In [22]:
# Drop the Spanish and French reviews; keep 'ro' and 'af', which are misdetected short English reviews
dfout = df[~df['lang-r'].isin(['es', 'fr'])]

In [23]:
len(dfout)

127

In [24]:
# Sort by rating, then by review date, both in descending order
dfout = dfout.sort_values(by=['rating', 'review_date'], ascending=False)

In [25]:
dfout

| | review_id | rating | title | review_date | user_name | review | response_id | dev_response | response_date | lang-r |
|---|---|---|---|---|---|---|---|---|---|---|
| 34 | 5860156783 | 5 | Crashes | 2020-04-25T03:09:18Z | rivrrat | Update: Version 3.0.1 seems to have fixed the ... | 14890549.0 | Hi,\n\nThank you very much for your feedback. ... | 2020-04-24T12:45:36Z | en |
| 38 | 5543359500 | 5 | Depend upon it | 2020-02-17T12:15:35Z | Miss A Step | I use this app all day long. Living in a high... | | | | en |
| 50 | 5223823949 | 5 | Climate Change Is Here, Folks! | 2019-12-02T06:44:26Z | AarCox2019 | A great app and fun daily reminder of just how... | | | | en |
| 68 | 5182439168 | 5 | Stay safe with Plume | 2019-11-22T18:20:14Z | City Otter NYC | After | | | | af |
| 49 | 5155963285 | 5 | Refreshing | 2019-11-17T05:31:36Z | Not So Fearless Reviewer | Always nice to know when it's safe to go outside. | | | | en |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 115 | 1282718171 | 1 | Zero data for Austin? really? | 2015-11-06T12:44:36Z | Shaktiboi | Looks like a nice interface but says "no recen... | | | | en |
| 104 | 1264412663 | 1 | Inaccurate | 2015-09-29T11:40:32Z | Wash DC20009 | This App does not have good data and is inaccu... | | | | en |
| 114 | 1258985406 | 1 | Unrealiable | 2015-09-18T02:00:52Z | Asthma bro | Downloaded this app to help with my asthma. Fi... | | | | en |
| 117 | 1258985173 | 1 | Not good. | 2015-09-18T02:00:12Z | GreenFreak1994 | Poor USer experience and USer interface. Not r... | | | | en |
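
The `review_date` column is stored as strings (dtype `object`), yet the sort above still yields chronological order because ISO 8601 timestamps in a single fixed format sort the same way as text and as dates. A minimal standard-library sketch of that property, using a few dates copied from the table above (variable names are illustrative):

```python
from datetime import datetime

# A few review_date values copied from the table above
dates = [
    "2019-11-22T18:20:14Z",
    "2020-04-25T03:09:18Z",
    "2015-09-18T02:00:52Z",
    "2019-12-02T06:44:26Z",
]

# Plain string sort, newest first
by_string = sorted(dates, reverse=True)

# Sort on parsed datetimes, newest first
by_datetime = sorted(
    dates,
    key=lambda s: datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ"),
    reverse=True,
)

print(by_string == by_datetime)
print(by_string[0])
```

This only holds while every timestamp uses the same format and time zone suffix; mixed formats would need parsing before sorting.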


## Export data

In [26]:
# Swap the .json extension for .csv
export_filename = os.path.splitext(filename)[0] + '.csv'
print(export_filename)

app_reviews_plume_us.csv


In [27]:
# export to csv (semicolon-separated, UTF-8 with BOM)
dfout.to_csv(os.path.join(path, "..", "data", "1_preprocessed_data", export_filename), encoding='utf-8-sig', sep=';')
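
The export uses `;` as separator and the `utf-8-sig` encoding, so any reader of the file has to use the same settings. The sketch below illustrates that round trip with the standard library only; it writes a tiny sample to a temporary file rather than the real export, and the sample rows reuse a few values from the tables above.

```python
import csv
import os
import tempfile

# Small sample with a non-ASCII title and a title containing a comma
rows = [
    ["review_id", "rating", "title"],
    ["1576466746", "5", "Muy útil"],
    ["5223823949", "5", "Climate Change Is Here, Folks!"],
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "sample.csv")

    # Write with the same separator and encoding as the export above
    with open(csv_path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f, delimiter=";").writerows(rows)

    # utf-8-sig prepends a byte order mark (BOM) to the file
    with open(csv_path, "rb") as f:
        has_bom = f.read(3) == b"\xef\xbb\xbf"

    # Read back with matching settings; utf-8-sig strips the BOM again
    with open(csv_path, encoding="utf-8-sig", newline="") as f:
        read_back = list(csv.reader(f, delimiter=";"))

print(has_bom)
print(read_back == rows)
```

With `;` as the separator, commas inside titles and reviews need no quoting, and the BOM helps some spreadsheet tools recognize the file as UTF-8.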