# **PROJECT DESCRIPTION**
**Project Title**: Sentiment Analysis of SimCity BuildIt App Reviews

**Project Description**:<br>
This project aims to analyze the sentiment of user reviews for the SimCity BuildIt app on the Google Play Store, providing insights into users’ experiences and overall satisfaction with the app. The sentiment analysis will categorize reviews as either positive or negative, allowing us to identify prevalent themes and common user concerns.

**Data Collection**:<br>
The data will be collected by scraping reviews from the Google Play Store with the google-play-scraper Python library. Scraping captures the reviews available at collection time, so the dataset reflects recent user feedback.

**Methodology**:<br>
The project will involve the following components and methodologies:
1. **Sentiment Classification Models**:<br>
Three machine learning and deep learning models will be utilized for sentiment classification:
    * Naive Bayes
    * Decision Tree
    * LSTM (Long Short-Term Memory Neural Network)
2. **Feature Extraction Techniques**: <br>
    * TF-IDF (Term Frequency-Inverse Document Frequency) for transforming text data into numerical features.
    * Word2Vec Word Embeddings for capturing semantic relationships between words.
3. **Data Splitting Strategy**: <br>
To evaluate model performance, the data will be split into training and testing sets using two different ratios:
    * 80/20 split (80% training, 20% testing)
    * 70/30 split (70% training, 30% testing)
4. **Model Training Configurations**: <br>
The following combinations of models, feature extraction, and data splits will be tested:
    * `Naive Bayes` with `TF-IDF` and `80/20` split
    * `Decision Tree` with `TF-IDF` and `70/30` split
    * `LSTM` with `Word2Vec` and `80/20` split
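
As a rough illustration of items 2 and 3 above, the sketch below performs a shuffled hold-out split and computes textbook TF-IDF weights in plain Python. The helper names (`holdout_split`, `tfidf`), the toy reviews, and the fixed seed are assumptions made for this example. The project itself may rely on library implementations such as scikit-learn's `train_test_split` and `TfidfVectorizer`, which additionally smooth and normalize the weights.

```python
import math
import random
from collections import Counter


def holdout_split(items, test_ratio, seed=42):
    """Shuffle a copy of `items` and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy reviews (hypothetical, not taken from the scraped data).
toy_reviews = [
    "game bagus",
    "game lambat",
    "game bagus sekali",
    "iklan banyak",
    "update lambat",
]

train, test = holdout_split(toy_reviews, test_ratio=0.2)  # 80/20 split
features = tfidf(train)
```

Under this unsmoothed formula, a term that appears in every document receives a weight of 0, which is why very common words contribute little to the feature vectors.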

**Project Objective:** <br>
This analysis aims to uncover the user experience with the SimCity BuildIt app by analyzing sentiment patterns in reviews. Insights gained from the sentiment analysis will highlight common positive aspects as well as frequent areas of user dissatisfaction, which can inform app development, customer support strategies, and user engagement.

# **1. Import Library**

In [1]:
!pip install google-play-scraper

Collecting google-play-scraper
  Downloading google_play_scraper-1.2.7-py3-none-any.whl.metadata (50 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50.2/50.2 kB 2.6 MB/s eta 0:00:00
Downloading google_play_scraper-1.2.7-py3-none-any.whl (28 kB)
Installing collected packages: google-play-scraper
Successfully installed google-play-scraper-1.2.7


In [2]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import json

from google_play_scraper import reviews_all, Sort, app

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# **2. Scraping Dataset**

We scrape the review dataset with the google-play-scraper library, which was installed in section 1.

The code below prints the Play Store details of the SimCity BuildIt app.

In [3]:
app_id = 'com.ea.game.simcitymobile_row'
appdesc = app(
    app_id,
    lang='id', country='id'
)

appdesc

{'title': 'SimCity BuildIt',
 'description': 'Selamat datang, Wali Kota! Jadilah pahlawan kotamu sendiri sambil merancang dan menciptakan metropolis ramai yang indah. Setiap keputusan ada di tanganmu untuk membuat kotamu semakin besar dan lengkap. Buat pilihan cerdas agar wargamu tetap bahagia dan kaki langitmu meninggi. Setelah itu berdagang, mengobrol, bersaing, dan bergabung dengan klub dengan sesama wali kota. Bangun jalan menuju kejayaan!\r\n\r\nHIDUPKAN KOTAMU\r\nBangun gedung pencakar langit, taman, jembatan, dan banyak lagi! Tempatkan bangunan secara strategis agar pajak tetap mengalir dan kotamu berkembang. Tuntaskan tantangan kehidupan nyata, seperti lalu lintas dan polusi. Sediakan berbagai layanan, seperti pembangkit listrik dan kantor polisi. Jagalah agar lalu lintas lancar dengan jalanan lebar dan trem.\r\n\r\nTUMPAHKAN IMAJINASIMU DI PETA\r\nBangun lingkungan bergaya Tokyo, London, Paris, dan buka bangunan penting eksklusif, seperti Menara Eiffel dan Patung Liberty. Ungk
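
The full dictionary is long, so it can help to look at only a few fields. The sketch below is one way to do that. The helper name `summarize_app` is hypothetical, and every key other than `title` and `description` is an assumption about what the library returns, so missing keys simply map to `None`.

```python
def summarize_app(details, keys=('title', 'score', 'ratings', 'reviews', 'installs')):
    """Return only the selected fields; keys missing from `details` map to None."""
    return {key: details.get(key) for key in keys}


# Stand-in for the dict returned by app(); only 'title' is taken from the output above.
example_details = {'title': 'SimCity BuildIt', 'description': '...'}
summary = summarize_app(example_details)
print(summary)
```

In the notebook, the same helper could be applied to the real result with `summarize_app(appdesc)`.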

Let's start **scraping**!

In [4]:
# reviews_all pages through every available review, so a `count` argument
# has no effect here and is omitted.
scrap_app_reviews = reviews_all(
    app_id,
    lang='id',
    country='id',
    sort=Sort.MOST_RELEVANT
)
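
If only a bounded sample is wanted, the library's paginated `reviews` function, which returns a batch together with a continuation token, can be called in a loop. The sketch below is a minimal version of that loop. The name `fetch_limited_reviews`, its `fetch_page` parameter, and the batch size of 200 are hypothetical choices for illustration. The fetcher is injectable so the loop can be exercised without network access.

```python
def fetch_limited_reviews(app_id, limit, fetch_page=None, batch_size=200):
    """Collect at most `limit` reviews by following continuation tokens.

    `fetch_page(app_id, count, token)` must return (batch, next_token).
    When omitted, google_play_scraper.reviews is used (assumed installed).
    """
    if fetch_page is None:
        from google_play_scraper import Sort, reviews

        def fetch_page(app_id, count, token):
            return reviews(
                app_id,
                lang='id',
                country='id',
                sort=Sort.MOST_RELEVANT,
                count=count,
                continuation_token=token,
            )

    collected, token = [], None
    while len(collected) < limit:
        remaining = limit - len(collected)
        batch, token = fetch_page(app_id, min(batch_size, remaining), token)
        if not batch:
            break  # no more reviews available
        collected.extend(batch)
    # A page may return more than requested, so trim to the limit.
    return collected[:limit]
```

With network access, a call such as `fetch_limited_reviews(app_id, limit=1000)` would be expected to return up to 1,000 review dictionaries.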

# **3. Loading Dataset**

In [5]:
app_reviews_df = pd.DataFrame(scrap_app_reviews)
app_reviews_df.to_csv('simcity_reviews.csv', index=False)
num_of_reviews, num_of_columns = app_reviews_df.shape

print(f'Number of Reviews: {num_of_reviews}')
print(f'Number of Columns: {num_of_columns}')
app_reviews_df.head()

Number of Reviews: 144652
Number of Columns: 11


| | reviewId | userName | userImage | content | score | thumbsUpCount | reviewCreatedVersion | at | replyContent | repliedAt | appVersion |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 153cae38-9708-4e35-a4c4-92d3078ba8b0 | Pengguna Google | https://play-lh.googleusercontent.com/EGemoI2N... | Barangnya mahal2 dan sulit untuk dapet simleon... | 4 | 637 | 1.57.2.129660 | 2024-10-16 07:42:13 | | NaT | 1.57.2.129660 |
| 1 | ec85c1e0-ac51-4c18-958b-a90914a7e26f | Pengguna Google | https://play-lh.googleusercontent.com/EGemoI2N... | Yang paling disayangkan adalah saat ada tantan... | 4 | 4 | 1.57.1.129081 | 2024-10-16 10:08:37 | | NaT | 1.57.1.129081 |
| 2 | 807166ba-f53a-44f1-a6fe-bb4834ed7d42 | Pengguna Google | https://play-lh.googleusercontent.com/EGemoI2N... | Pendapatan uang nya terlalu kecil dan susah. b... | 2 | 16 | 1.57.1.129081 | 2024-10-03 13:05:17 | | NaT | 1.57.1.129081 |
| 3 | b64cc92b-ea52-4eef-84ef-34ec52326ccc | Pengguna Google | https://play-lh.googleusercontent.com/EGemoI2N... | Offline dari mana'nya coba, udah numpuk resour... | 3 | 19 | 1.57.1.129081 | 2024-10-07 09:01:38 | | NaT | 1.57.1.129081 |
| 4 | 1ae3aae2-ec03-473e-821d-741240df7e1a | Pengguna Google | https://play-lh.googleusercontent.com/EGemoI2N... | "untuk saran" 1.ringankan harga tanah 2.kenapa... | 3 | 32 | 1.57.1.129081 | 2024-09-21 11:38:00 | | NaT | 1.57.1.129081 |


In [6]:
app_reviews_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 144652 entries, 0 to 144651
Data columns (total 11 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   reviewId              144652 non-null  object        
 1   userName              144652 non-null  object        
 2   userImage             144652 non-null  object        
 3   content               144645 non-null  object        
 4   score                 144652 non-null  int64         
 5   thumbsUpCount         144652 non-null  int64         
 6   reviewCreatedVersion  102140 non-null  object        
 7   at                    144652 non-null  datetime64[ns]
 8   replyContent          2 non-null       object        
 9   repliedAt             2 non-null       datetime64[ns]
 10  appVersion            102140 non-null  object        
dtypes: datetime64[ns](2), int64(2), object(7)
memory usage: 12.1+ MB
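
The non-null counts above imply how much of each column is missing. The short sketch below works out those gaps from the numbers in the `info()` output. The variable names are chosen for this example only.

```python
TOTAL_ROWS = 144652

# Non-null counts copied from the info() output above
# (columns with no missing values are omitted).
non_null = {
    'content': 144645,
    'reviewCreatedVersion': 102140,
    'replyContent': 2,
    'repliedAt': 2,
    'appVersion': 102140,
}

missing = {col: TOTAL_ROWS - count for col, count in non_null.items()}
for col, n_missing in missing.items():
    print(f'{col}: {n_missing} missing ({n_missing / TOTAL_ROWS:.2%})')
```

On the live DataFrame, `app_reviews_df.isna().sum()` reports the same counts directly. Only 7 reviews lack text, while the reply columns are almost entirely empty, which suggests they carry little information for sentiment classification.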
