TASK 1: GATHERING REVIEWS

First we mount our Google drive to our Notebook.



In [16]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Here, the help of python's 'google-play-scraper' library is taken. First of all it is installed and then imported. And then the scraping is done for all the apps. As there are thousands of reviews for these apps, first 1000 reviews are being scraped from each app and they are stored in one csv file.

In [17]:
!pip install google-play-scraper

Collecting google-play-scraper
  Downloading google_play_scraper-1.2.4-py3-none-any.whl (28 kB)
Installing collected packages: google-play-scraper
Successfully installed google-play-scraper-1.2.4


In [28]:
from google_play_scraper import Sort, reviews
import csv
import os

# Function to scrape reviews and save to CSV
def scrape_reviews_to_csv(app_id, file_name, max_reviews=1000):
    result, _ = reviews(
        app_id,
        lang='en',
        country='us',
        sort=Sort.MOST_RELEVANT,
        count=max_reviews,
        filter_score_with=None
    )

    # Writing the reviews to the CSV file
    with open(file_name, 'w', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        writer.writerow(['Package Name', 'Reviewer Name', 'Review', 'Rating'])

        for review in result:
            writer.writerow([app_id, review['userName'], review['content'], review['score']])

    print(f'Scraped {len(result)} reviews.')

The applications and their google play id is given below.

Wikipedia = org.wikipedia
Amazon Kindle = com.amazon.kindle
Academia = com.academia.academia
Medium = com.medium.reader
Everand = com.scribd.app.reader0

In [29]:
file_paths = [
    'wikipedia1_reviews.csv',
    'wikipedia2_reviews.csv',
    'wikipedia3_reviews.csv',
    'wikipedia4_reviews.csv',
    'wikipedia5_reviews.csv'
]

# Scraping and saving reviews
scrape_reviews_to_csv('org.wikipedia', file_paths[0])
scrape_reviews_to_csv('com.amazon.kindle', file_paths[1])
scrape_reviews_to_csv('com.academia.academia', file_paths[2])
scrape_reviews_to_csv('com.medium.reader', file_paths[3])
scrape_reviews_to_csv('com.scribd.app.reader0', file_paths[4])

# Reading all the CSV files and merging them
merged_df = pd.concat([pd.read_csv(fp) for fp in file_paths])

# Saving the merged file
merged_file_path = 'wikipedia_reviews.csv'
merged_df.to_csv(merged_file_path, index=False)

# Deleting the original files
for fp in file_paths:
    os.remove(fp)

print("Merging and cleanup completed. Merged file stored at:", merged_file_path)

Scraped 1000 reviews.
Scraped 1000 reviews.
Scraped 1000 reviews.
Scraped 1000 reviews.
Scraped 1000 reviews.
Merging and cleanup completed. Merged file stored at: /content/drive/My Drive/wikipedia_reviews.csv


Task 2:PREPROCESS YOUR TEXT

Now, we install and import the required python libraries needed for this task.

In [30]:
!pip install nltk inflect

import nltk
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')

import string
import inflect
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
import pandas as pd



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


Now, we preprocess the reviews using these steps:
I. Remove punctuations,
II. Remove special characters and emojis,
III. Turn numbers into text,
IV. Remove extra white spaces,
V. Turn all words into lowercase,
VI. Remove stop words,
VII. Lemmatize the reviews

We also print a sample of 15 Pre-processed reviews outputted in the next console output.

In [31]:
# Reviews loaded from CSV file in Google Drive
file_path = '/content/drive/My Drive/wikipedia_reviews.csv'
df = pd.read_csv(file_path)
reviews = df['Review'].tolist()

# Function to remove punctuation
def remove_punctuation(text):
    return text.translate(str.maketrans('', '', string.punctuation))

# Function to convert numbers to words
def convert_numbers_to_words(text):
    p = inflect.engine()
    new_text = []
    for word in text.split():
        if word.isdigit():
            new_word = p.number_to_words(word)
            new_text.append(new_word)
        else:
            new_text.append(word)
    return ' '.join(new_text)

# Function to remove extra whitespace
def remove_extra_whitespace(text):
    return " ".join(text.split())

# Function to remove stopwords
def remove_stopwords(text):
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(text)
    filtered_text = [word for word in word_tokens if word not in stop_words]
    return ' '.join(filtered_text)

# Function to lemmatize text
def lemmatize_text(text):
    lemmatizer = WordNetLemmatizer()
    word_tokens = word_tokenize(text)
    lemmatized_text = [lemmatizer.lemmatize(word) for word in word_tokens]
    return ' '.join(lemmatized_text)

# Function to preprocess review text
def preprocess_text(text):
    text = remove_punctuation(text)
    text = text.encode('ascii', 'ignore').decode() # This removes special characters and emojis
    text = convert_numbers_to_words(text)
    text = text.lower()
    text = remove_extra_whitespace(text)
    text = remove_stopwords(text)
    text = lemmatize_text(text)
    return text

# Preprocess reviews
preprocessed_reviews = [preprocess_text(review) for review in reviews]

# Display the first 15 preprocessed reviews
print('First 15 Preprocessed Reviews:')
preprocessed_reviews[:15]


First 15 Preprocessed Reviews:


['even frequent user wikipedia totally unaware existence app discovering searched whim app overall excellent random article button offer dangerous game play plan productive ability easily curate personal collection article nice gripe far apparent lack return main page button fall rabbit hole ready click back button lot',
 'offline usage sd storage great interface one annoying flaw incredibly easy accidentally delete offline page several occasion attempted tap article ever slightly swiped little side time deletes page dosent ask confirmation rather remove page blink eye although allow undo page downloaded',
 'better update made clumsy difficulttonavigate word developer text run right rightside border leftright movement im constantly sliding apps page left read end sentence looking like wellcrafted new tabbing yuk somewhat poor term usability ive seen guy better know im donor maybe give listen',
 'truly love wikipedia app suddenly crowned feature place tab missing version feature placed 

Task 3: SENTIMENT ANALYSIS

First Textblob and VaderSentiment libraries are installed.

In [32]:
!pip install textblob
!pip install vaderSentiment



Then we import Textblob and VaderSentiment libraries.

In [33]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

Textblob reviews sentiment analysis is done at first and the sameple output of 250 reviews is printed here.


In [34]:
# Analyze sentiments using TextBlob
textblob_sentiments = []

#This is for printing sample
for review in preprocessed_reviews[:250]:
    blob = TextBlob(review)
    textblob_polarity = blob.sentiment.polarity
    textblob_sentiments.append(['org.wikipedia', review, textblob_polarity])

# Convert to DataFrame
textblob_df = pd.DataFrame(textblob_sentiments, columns=['Package Name', 'Review', 'Polarity'])

# Display the results
print("TextBlob Sentiments:")
print(textblob_df.to_string(index=True))

for review in preprocessed_reviews:
    blob = TextBlob(review)
    textblob_polarity = blob.sentiment.polarity
    textblob_sentiments.append(['org.wikipedia', review, textblob_polarity])

#This is for storing all the reviews
for review in preprocessed_reviews:
    blob = TextBlob(review)
    textblob_polarity = blob.sentiment.polarity
    textblob_sentiments.append(['org.wikipedia', review, textblob_polarity])

textblob_df = pd.DataFrame(textblob_sentiments, columns=['Package Name', 'Review', 'Polarity'])

TextBlob Sentiments:
      Package Name                                                                                                                                                                                                                                                                                                                                                                                                                                    Review  Polarity
0    org.wikipedia                                                                                                            even frequent user wikipedia totally unaware existence app discovering searched whim app overall excellent random article button offer dangerous game play plan productive ability easily curate personal collection article nice gripe far apparent lack return main page button fall rabbit hole ready click back button lot  0.076667
1    org.wikipedia                                                   

VaderSentiment reviews analysis is done then and the sameple output of 250 reviews is printed here.

In [36]:
# Initialize Vader
analyzer = SentimentIntensityAnalyzer()

# Analyze sentiments using Vader
vader_sentiments = []

#This is for printing sample
for review in preprocessed_reviews[:250]:
    vader_polarity = analyzer.polarity_scores(review)['compound']
    vader_sentiments.append(['org.wikipedia', review, vader_polarity])

# Convert to DataFrame
vader_df = pd.DataFrame(vader_sentiments, columns=['Package Name', 'Review', 'Polarity'])

# Display the results
print("\nVader Sentiments:")
print(vader_df.to_string(index=True))

for review in preprocessed_reviews:
    vader_polarity = analyzer.polarity_scores(review)['compound']
    vader_sentiments.append(['org.wikipedia', review, vader_polarity])

#This is for storing all the reviews
for review in preprocessed_reviews[:250]:
    vader_polarity = analyzer.polarity_scores(review)['compound']
    vader_sentiments.append(['org.wikipedia', review, vader_polarity])

vader_df = pd.DataFrame(vader_sentiments, columns=['Package Name', 'Review', 'Polarity'])



Vader Sentiments:
      Package Name                                                                                                                                                                                                                                                                                                                                                                                                                                    Review  Polarity
0    org.wikipedia                                                                                                            even frequent user wikipedia totally unaware existence app discovering searched whim app overall excellent random article button offer dangerous game play plan productive ability easily curate personal collection article nice gripe far apparent lack return main page button fall rabbit hole ready click back button lot    0.8228
1    org.wikipedia                                                     

Now, both the sentiment analysis tools are compared here in the code below.
At first the below output console will print the numerical comparison between them and then we will discuss their comparison after that.

The code looks at reviews and gives them a score between -1 (very negative) and 1 (very positive).
Since Play Store ratings are from 1 to 5, the code divides the range -1 to 1 into five parts. Each part matches a Play Store rating (1 to 5).

Here (map_polarity_to_rating) function in the code takes a score from TextBlob or Vader and changes it into a rating from 1 to 5, like in the Play Store. For example, a really negative score (-1 to -0.600001) turns into a rating of 1, and a really positive score (0.600000 to 1) becomes a rating of 5. Comparing TextBlob and Vader Scores:

If TextBlob's score is higher than Vader's, it counts one point in textblob_higher. If Vader's score is higher, it counts one point in vader_higher. If both scores are the same, it adds a point to same_polarity.
In simple terms, the code turns the review scores from TextBlob and Vader into Play Store-like ratings and then checks to see which tool gives higher or the same scores for the reviews.

And then the results are printed.

In [37]:
# Function to map polarity to Play Store rating equivalence
def map_polarity_to_rating(polarity):
    if -1 <= polarity <= -0.600001:
        return 1
    elif -0.600000 <= polarity <= -0.200001:
        return 2
    elif -0.200000 <= polarity <= 0.199999:
        return 3
    elif 0.200000 <= polarity <= 0.599999:
        return 4
    elif 0.600000 <= polarity <= 1:
        return 5

# Variables to count comparisons
textblob_higher, vader_higher, same_polarity = 0, 0, 0
textblob_close_to_rating, vader_close_to_rating = 0, 0

for index, row in df.iterrows():


    textblob_polarity = textblob_df.iloc[index]['Polarity']
    vader_polarity = vader_df.iloc[index]['Polarity']
    actual_rating = row['Rating']

    # Comparing polarity scores
    if textblob_polarity > vader_polarity:
        textblob_higher += 1
    elif vader_polarity > textblob_polarity:
        vader_higher += 1
    else:
        same_polarity += 1

    # Comparing polarity to actual rating
    if map_polarity_to_rating(textblob_polarity) == actual_rating:
        textblob_close_to_rating += 1
    if map_polarity_to_rating(vader_polarity) == actual_rating:
        vader_close_to_rating += 1

# Print results
print(f"TextBlob higher than Vader: {textblob_higher}")
print(f"Vader higher than TextBlob: {vader_higher}")
print(f"Same polarity: {same_polarity}")
print(f"TextBlob close to actual rating: {textblob_close_to_rating}")
print(f"Vader close to actual rating: {vader_close_to_rating}")


TextBlob higher than Vader: 1226
Vader higher than TextBlob: 3693
Same polarity: 81
TextBlob close to actual rating: 867
Vader close to actual rating: 1573


Here above we can see the numerical comparison between TextBlob and Vader. Now let's discuss them below:

TextBlob higher than Vader (1226 times): TextBlob gave higher scores to some reviews than Vader did. This might suggest TextBlob notices more positive things in these reviews than Vader.

Vader higher than TextBlob (3693 times): Vader gave higher scores to a lot more reviews than TextBlob. This might suggest Vader sees more positive stuff in these reviews or maybe it's a bit too positive compared to TextBlob.

Same score (81 times): In a few reviews, both TextBlob and Vader gave the same scores. Since this didn't happen a lot, it shows they usually don't agree on how positive or negative a review is.

TextBlob close to actual ratings (867 times): In these cases, TextBlob's scores were similar to what people actually rated on Google Play. This means TextBlob did a good job guessing how people felt in these reviews.

Vader close to actual ratings (1573 times): Vader's scores matched the actual Google Play ratings more often than TextBlob's did. This suggests that Vader might be better at understanding how people feel in their reviews, at least for the reviews we looked at.






TASK4: SUPERVISED LEARNING

First a Random Forest classifier is designed to identify whether a user review is a feature request or not. The console prints the accuracy first and then it also prints first 20 Reviews and Predictions on the Wikipedia SUD. It also prints 2 reviews Predicted as Feature Request so that we can use this data for later analysis.

In [40]:
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.metrics import accuracy_score
from scipy.sparse import hstack
import numpy as np

# Load labeled data
labeled_df = pd.read_csv('/content/drive/My Drive/reviews_classified_lemmatized.csv')

# Preprocess 'Review text' column
labeled_df['Processed Review'] = labeled_df['Review text'].apply(preprocess_text)

# Vectorize the preprocessed review text
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(labeled_df['Processed Review'])

# The 'Rating' as a feature included
X_rating = labeled_df[['Rating']].values

# Combined text features with the 'Rating'
X_combined = hstack([X_text, X_rating])

# Created binary labels for feature requests
labeled_df['is_feature'] = labeled_df['Review category'].apply(lambda x: 1 if 'f' in x else 0)

# Splitted data for training and testing
y = labeled_df['is_feature']
X_train, X_test, y_train, y_test = train_test_split(X_combined, y, test_size=0.3, random_state=42)

# Training Random Forest classifier
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train, y_train)

# Predict and evaluate
y_pred = rf_classifier.predict(X_test)
print("Accuracy for Feature Request Classifier:", accuracy_score(y_test, y_pred))

#Rating for the app_reviews
csv_rating = df['Rating'].tolist()

# Vectorize the new review texts using the same vectorizer
X_new_text = vectorizer.transform(preprocessed_reviews)

# Extract the ratings for the new reviews
X_new_rating = np.array(csv_rating).reshape(-1, 1)

# Combine text features with the ratings
X_new_combined = hstack([X_new_text, X_new_rating])

# Predict with the trained classifier
new_predictions = rf_classifier.predict(X_new_combined)

# Output the first 20 reviews and their predictions
print("First 20 Reviews and Predictions on the Wikipedia SUD:")
for i in range(20):
    category = 'Feature Request' if new_predictions[i] == 1 else 'Not Feature Request'
    print(f"Review: {preprocessed_reviews[i]}, Rating: {csv_rating[i]}, Predicted Category: {category}")

# Find and print the first review predicted as Feature Request
print("\ 2 Reviews Predicted as Feature Request:")
j = 0
for i in range(len(preprocessed_reviews)):
    if new_predictions[i] == 1:
        print(f"Review: {preprocessed_reviews[i]}, Rating: {csv_rating[i]}, Predicted Category: Feature Request")
        j = j + 1
        if j == 2:
          break


Accuracy for Feature Request Classifier: 0.9548022598870056
First 20 Reviews and Predictions on the Wikipedia SUD:
Review: even frequent user wikipedia totally unaware existence app discovering searched whim app overall excellent random article button offer dangerous game play plan productive ability easily curate personal collection article nice gripe far apparent lack return main page button fall rabbit hole ready click back button lot, Rating: 5, Predicted Category: Not Feature Request
Review: offline usage sd storage great interface one annoying flaw incredibly easy accidentally delete offline page several occasion attempted tap article ever slightly swiped little side time deletes page dosent ask confirmation rather remove page blink eye although allow undo page downloaded, Rating: 4, Predicted Category: Not Feature Request
Review: better update made clumsy difficulttonavigate word developer text run right rightside border leftright movement im constantly sliding apps page left re

Then a Random Forest classifier is designed to identify whether a user review is a bug review or not. The console prints the accuracy first and then it also prints first 20 Reviews and Predictions on the Wikipedia SUD. It also prints 2 reviews Predicted as Bug Review so that we can use this data for later analysis.

In [41]:
# Load labeled data
labeled_df = pd.read_csv('/content/drive/My Drive/reviews_classified_lemmatized.csv')

# Preprocess 'Review text'
labeled_df['Processed Review'] = labeled_df['Review text'].apply(preprocess_text)

# Vectorize the preprocessed review text
vectorizer = TfidfVectorizer()
X_text = vectorizer.fit_transform(labeled_df['Processed Review'])

# Include the 'Rating' as a feature
X_rating = labeled_df[['Rating']].values

# Combine text features with the 'Rating'
X_combined = hstack([X_text, X_rating])

# Create binary labels for feature requests
labeled_df['is_feature'] = labeled_df['Review category'].apply(lambda x: 1 if 'b' in x else 0)

# Split data for training and testing
y = labeled_df['is_feature']
X_train, X_test, y_train, y_test = train_test_split(X_combined, y, test_size=0.3, random_state=42)

# Train Random Forest classifier
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train, y_train)

# Predict and evaluate
y_pred = rf_classifier.predict(X_test)
print("Accuracy for Bug Review Classifier:", accuracy_score(y_test, y_pred))

# Predict with the trained classifier
new_predictions = rf_classifier.predict(X_new_combined)

# Output the first 20 reviews and their predictions
print("First 20 Reviews and Predictions on our SUDs:")
for i in range(20):
    category = 'Bug Review' if new_predictions[i] == 1 else 'Not Bug Review'
    print(f"Review: {preprocessed_reviews[i]}, Rating: {csv_rating[i]}, Predicted Category: {category}")

# Find and print the first review predicted as Feature Request
print("\2 Reviews Predicted as Bug Review:")
j = 0
for i in range(len(preprocessed_reviews)):
    if new_predictions[i] == 1:
        print(f"Review: {preprocessed_reviews[i]}, Rating: {csv_rating[i]}, Predicted Category: Bug Review")
        j = j + 1
        if j == 2:
          break


Accuracy for Bug Review Classifier: 0.8813559322033898
First 20 Reviews and Predictions on our SUDs:
Review: even frequent user wikipedia totally unaware existence app discovering searched whim app overall excellent random article button offer dangerous game play plan productive ability easily curate personal collection article nice gripe far apparent lack return main page button fall rabbit hole ready click back button lot, Rating: 5, Predicted Category: Not Bug Review
Review: offline usage sd storage great interface one annoying flaw incredibly easy accidentally delete offline page several occasion attempted tap article ever slightly swiped little side time deletes page dosent ask confirmation rather remove page blink eye although allow undo page downloaded, Rating: 4, Predicted Category: Bug Review
Review: better update made clumsy difficulttonavigate word developer text run right rightside border leftright movement im constantly sliding apps page left read end sentence looking like

FURTHER ANALYSIS:

Two recommended changes for the Wikipedia app are:

Improve Offline Page Management: The reviews above, 1st Bug Review to be more specific, highlighted issues with accidentally deleting offline pages, suggesting a need for better management and confirmation prompts before deletion. This change would enhance user experience by reducing accidental data loss and providing a more robust offline usage feature.

Optimize Table Navigation and UI Responsiveness: Users have reported, 2nd Bug Review to be more specific, difficulties with navigating tables and UI elements, especially on smaller screens. Improving table rendering and touch sensitivity would enhance the app's usability and accessibility.

Bug Reviews are given more importance than Feature Requests as feature requests are requests without which the app can function properly. They are just requests asking for more features. Whereas, Bug Reviews are important as they suggest not normal functioning of the app.

Risks and Uncertainties:

Implementing these changes may require significant development effort and testing to ensure they don't introduce new bugs or usability issues.
User preferences vary widely. Changes that benefit some users might not be well-received by others. It's important to balance feedback from a diverse user base.
The recommendations are based on a subset of reviews, which might not represent the broader user base's needs or preferences. Additional user research and testing would be valuable to validate these changes.

TASK 5: TOPIC MODELING Using LDA

In [42]:
import gensim
from gensim import corpora
from nltk.tokenize import word_tokenize

# Tokenizing the preprocessed reviews
tokenized_docs = [word_tokenize(doc) for doc in preprocessed_reviews]

# Creating a dictionary and corpus
dictionary = corpora.Dictionary(tokenized_docs)
corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

# Running LDA model
lda_model = gensim.models.LdaMulticore(corpus, num_topics=15, id2word=dictionary, passes=2, workers=2)

# Extract and display topics
for idx, topic in lda_model.print_topics(-1, num_words=7):
    print("Topic: {} \nWords: {}".format(idx, topic))




Topic: 0 
Words: 0.028*"app" + 0.018*"book" + 0.018*"love" + 0.017*"get" + 0.014*"read" + 0.010*"audiobooks" + 0.009*"great"
Topic: 1 
Words: 0.037*"book" + 0.017*"app" + 0.010*"scribd" + 0.010*"great" + 0.008*"many" + 0.008*"month" + 0.008*"audio"
Topic: 2 
Words: 0.044*"book" + 0.031*"app" + 0.013*"one" + 0.009*"audiobooks" + 0.008*"ive" + 0.008*"two" + 0.008*"im"
Topic: 3 
Words: 0.049*"book" + 0.023*"read" + 0.017*"app" + 0.011*"audiobooks" + 0.011*"month" + 0.010*"ive" + 0.009*"get"
Topic: 4 
Words: 0.051*"app" + 0.019*"book" + 0.011*"use" + 0.008*"good" + 0.007*"time" + 0.007*"reading" + 0.006*"issue"
Topic: 5 
Words: 0.020*"app" + 0.016*"page" + 0.010*"read" + 0.010*"many" + 0.010*"wikipedia" + 0.010*"information" + 0.007*"best"
Topic: 6 
Words: 0.043*"app" + 0.012*"like" + 0.012*"great" + 0.011*"book" + 0.008*"would" + 0.008*"one" + 0.007*"scribd"
Topic: 7 
Words: 0.021*"book" + 0.016*"app" + 0.014*"reading" + 0.012*"title" + 0.012*"read" + 0.011*"great" + 0.010*"listen"
Topic:

ANALYSIS OF TASK 5'S OUTPUT WITH TASK4'S OUTPUT:

Manual analysis is done here as it is easier in this case.

Comparison with Features: Topics about books and reading (like Topics 0, 1, 2, 3) likely correspond to reviews discussing the app's primary features, such as the availability of books, reading experience, and audiobook functionality. However, some topics might not directly align with specific features. For example, topics discussing usability or general praise (like Topics 4 and 6) are more about the overall app experience rather than specific features.

Comparison with Bug Reports: The LDA topics do not seem to focus specifically on bugs or technical issues. They are more about the general use and enjoyment of the app. Bug reports, on the other hand, are specific complaints or issues users have encountered, such as app crashes or functional errors, which might not be clearly reflected in the broader themes identified by LDA.

Differences in Outputs: The LDA model provides a broad overview of what users talk about, like what they like or what they find important in the app (e.g., reading experience, book availability). The classifiers are more precise, identifying specific types of feedback like feature requests or bug reports. They are more about "what users want" or "what problems users face" rather than what users generally discuss. The accuracy of the models and the amount of data they are trained on affect these insights. LDA can work with large datasets to find general patterns, while classifiers need specific labeled data to identify precise feedback types.

