# Google Play Store Multi-App Scraper

This notebook scrapes app information and user reviews from the Google Play Store for multiple apps and saves the data in CSV and XLSX formats.

## Target Apps:
- WhatsApp (`com.whatsapp`)
- Facebook (`com.facebook.katana`)
- Instagram (`com.instagram.android`)
- Snapchat (`com.snapchat.android`)
- Spotify (`com.spotify.music`)

In [1]:
# Import required libraries
import pandas as pd
import numpy as np
from google_play_scraper import app, reviews, Sort
import time
import json
from datetime import datetime
import os
from typing import List, Dict, Tuple
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

print("✅ Libraries imported successfully!")
print(f"📅 Current time: {datetime.now()}")

✅ Libraries imported successfully!
📅 Current time: 2025-07-03 11:16:34.123018


In [None]:
# Configuration
app_ids = [
    'com.whatsapp',
    'com.facebook.katana',
    'com.instagram.android',
    'com.snapchat.android',
    'com.spotify.music'
]

COUNTRY = 'id'  # Indonesia
LANG = 'id'     # Indonesian
REVIEWS_PER_APP = 1000  # Number of reviews to scrape per app

print(f"🎯 Target: {len(app_ids)} apps")
print(f"📊 Reviews per app: {REVIEWS_PER_APP}")
print(f"🌍 Country: {COUNTRY}, Language: {LANG}")
print(f"📱 Apps: {app_ids}")

🎯 Target: 5 apps
📊 Reviews per app: 1000
🌍 Country: id, Language: id
📱 Apps: ['com.whatsapp', 'com.facebook.katana', 'com.instagram.android', 'com.snapchat.android', 'com.spotify.music']
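
`COUNTRY` and `LANG` are both two-letter codes, so they are easy to mix up. A minimal sketch of a sanity check, assuming a small hand-maintained lookup table (the `KNOWN_LANGS` dictionary and the `check_lang` helper are hypothetical and not part of `google_play_scraper`):

```python
# Hypothetical lookup of a few ISO 639-1 language codes relevant to this notebook.
KNOWN_LANGS = {
    "id": "Indonesian",
    "jv": "Javanese",
    "en": "English",
}


def check_lang(code: str) -> str:
    """Return the language name for a code, or raise if it is not in the table."""
    try:
        return KNOWN_LANGS[code]
    except KeyError:
        raise ValueError(f"Unknown language code: {code!r}") from None


print(check_lang("id"))
```

Calling `check_lang(LANG)` right after the configuration cell would surface an unintended code before any requests are sent.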


## Step 1: Define Helper Functions

In [3]:
def get_app_info(app_id: str, country: str = 'id', lang: str = 'id') -> Dict:
    """
    Get detailed app information from Google Play Store.
    """
    try:
        app_info = app(app_id, lang=lang, country=country)
        
        return {
            'app_id': app_id,
            'title': app_info.get('title', ''),
            'developer': app_info.get('developer', ''),
            'developer_id': app_info.get('developerId', ''),
            'category': app_info.get('genre', ''),
            'rating': app_info.get('score', 0),
            'rating_count': app_info.get('ratings', 0),
            'installs': app_info.get('installs', ''),
            'price': app_info.get('price', 0),
            'free': app_info.get('free', True),
            'size': app_info.get('size', ''),
            'min_installs': app_info.get('minInstalls', ''),
            'content_rating': app_info.get('contentRating', ''),
            'description': app_info.get('description', ''),
            'summary': app_info.get('summary', ''),
            'updated': app_info.get('updated', ''),
            'version': app_info.get('version', ''),
            'recent_changes': app_info.get('recentChanges', ''),
            'scraped_at': datetime.now().isoformat()
        }
        
    except Exception as e:
        print(f"❌ Error getting app info for {app_id}: {e}")
        return {
            'app_id': app_id,
            'error': str(e),
            'scraped_at': datetime.now().isoformat()
        }

def scrape_app_reviews(app_id: str, count: int = 1000, country: str = 'id', lang: str = 'id') -> List[Dict]:
    """
    Scrape reviews for a specific app.
    """
    print(f"\n📱 Scraping {count} reviews for {app_id}...")
    
    try:
        all_reviews = []
        continuation_token = None
        batch_size = 200
        
        with tqdm(total=count, desc=f"Reviews for {app_id}") as pbar:
            while len(all_reviews) < count:
                try:
                    result, continuation_token = reviews(
                        app_id,
                        lang=lang,
                        country=country,
                        sort=Sort.NEWEST,
                        count=min(batch_size, count - len(all_reviews)),
                        continuation_token=continuation_token
                    )
                    
                    if not result:
                        print(f"⚠️ No more reviews available for {app_id}")
                        break
                    
                    # Add metadata to each review
                    for review in result:
                        review['app_id'] = app_id
                        review['scraped_at'] = datetime.now().isoformat()
                    
                    all_reviews.extend(result)
                    pbar.update(len(result))
                    
                    time.sleep(1)  # Rate limiting
                    
                except Exception as e:
                    print(f"❌ Error scraping batch for {app_id}: {e}")
                    break
        
        print(f"✅ Successfully scraped {len(all_reviews)} reviews for {app_id}")
        return all_reviews
        
    except Exception as e:
        print(f"❌ Error scraping reviews for {app_id}: {e}")
        return []

print("✅ Helper functions defined!")

✅ Helper functions defined!
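
The batching loop in `scrape_app_reviews` follows a common continuation-token pattern: request a batch, keep the token that comes back, and pass it to the next request. The sketch below isolates that pattern with a stand-in fetcher so it runs without network access. `collect_paginated` and `fake_fetch_page` are hypothetical names, and the integer token and the "`None` means no more pages" convention are assumptions made for illustration; the real library's token is an opaque object.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[dict], Optional[int]]


def collect_paginated(
    fetch_page: Callable[[int, Optional[int]], Page],
    total: int,
    batch_size: int = 200,
) -> List[dict]:
    """Collect up to `total` items by repeatedly calling `fetch_page`."""
    collected: List[dict] = []
    token: Optional[int] = None
    while len(collected) < total:
        want = min(batch_size, total - len(collected))
        batch, token = fetch_page(want, token)
        if not batch:
            break  # the source has nothing more to give
        collected.extend(batch)
        if token is None:
            break  # assumed convention: no token means this was the last page
    return collected


def fake_fetch_page(count: int, token: Optional[int]) -> Page:
    """Hypothetical stand-in for a review fetcher: serves 450 fake reviews."""
    available = 450
    start = token or 0
    end = min(start + count, available)
    batch = [{"reviewId": f"r{i}"} for i in range(start, end)]
    next_token = end if end < available else None
    return batch, next_token


print(len(collect_paginated(fake_fetch_page, total=1000)))
```

Asking for 1000 items from a source that only has 450 returns the 450 that exist instead of looping forever, which is the same reason the notebook's loop breaks on an empty batch.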


## Step 2: Scrape App Information

In [4]:
# Scrape app information
print("🔍 Scraping app information...")
apps_info = []

for app_id in tqdm(app_ids, desc="Getting app info"):
    info = get_app_info(app_id, COUNTRY, LANG)
    apps_info.append(info)
    
    if 'error' not in info:
        print(f"✅ {info.get('title', app_id)} - Rating: {info.get('rating', 'N/A')}⭐")
    else:
        print(f"❌ Failed to get info for {app_id}")
    
    time.sleep(1)  # Rate limiting

# Convert to DataFrame
apps_df = pd.DataFrame(apps_info)
print(f"\n📊 App information collected for {len(apps_df)} apps")

# Display results
if not apps_df.empty:
    display(apps_df[['app_id', 'title', 'developer', 'rating', 'rating_count', 'installs']].head())
else:
    print("❌ No app information collected")

🔍 Scraping app information...

Getting app info:   0%|          | 0/5 [00:00<?, ?it/s]

✅ WhatsApp Messenger - Rating: 4.362371⭐

Getting app info:  20%|██        | 1/5 [00:01<00:07,  1.98s/it]

✅ Facebook - Rating: 4.619018⭐

Getting app info:  40%|████      | 2/5 [00:03<00:05,  1.97s/it]

✅ Instagram - Rating: 4.117008⭐

Getting app info:  60%|██████    | 3/5 [00:05<00:03,  1.95s/it]

✅ Snapchat - Rating: 4.240867⭐

Getting app info:  80%|████████  | 4/5 [00:07<00:01,  1.93s/it]

✅ Spotify: Music dan Podcast - Rating: 4.471803⭐

Getting app info: 100%|██████████| 5/5 [00:09<00:00,  1.95s/it]

📊 App information collected for 5 apps

|   | app_id | title | developer | rating | rating_count | installs |
|---|--------|-------|-----------|--------|--------------|----------|
| 0 | com.whatsapp | WhatsApp Messenger | WhatsApp LLC | 4.362371 | 209299181 | 10.000.000.000+ |
| 1 | com.facebook.katana | Facebook | Meta Platforms, Inc. | 4.619018 | 169113269 | 10.000.000.000+ |
| 2 | com.instagram.android | Instagram | Instagram | 4.117008 | 163289953 | 5.000.000.000+ |
| 3 | com.snapchat.android | Snapchat | Snap Inc | 4.240867 | 37934342 | 1.000.000.000+ |
| 4 | com.spotify.music | Spotify: Music dan Podcast | Spotify AB | 4.471803 | 33682597 | 1.000.000.000+ |
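
The `installs` column holds a display string such as `10.000.000.000+`, with locale-specific thousands separators, so it cannot be sorted or compared numerically as is. A minimal sketch of one way to convert it, assuming every non-digit character is only a separator or the trailing `+` (the `parse_installs` helper is hypothetical):

```python
import re
from typing import Optional


def parse_installs(text: str) -> Optional[int]:
    """Turn an install-count label like '10.000.000.000+' into an integer.

    Assumes the label contains no decimal fraction, so every non-digit
    character can simply be dropped. Returns None when no digits are present.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


for label in ["10.000.000.000+", "5.000.000.000+", "1.000.000.000+"]:
    print(label, "->", parse_installs(label))
```

The result is a lower bound, since the `+` means "at least this many".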


## Step 3: Scrape User Reviews

In [5]:
# Scrape reviews for all apps
print("💬 Starting review scraping process...")
all_reviews = []

for app_id in app_ids:
    app_reviews = scrape_app_reviews(app_id, REVIEWS_PER_APP, COUNTRY, LANG)
    all_reviews.extend(app_reviews)
    print(f"📈 Total reviews collected so far: {len(all_reviews)}")
    
    # Longer pause between apps to avoid rate limiting
    time.sleep(3)

print(f"\n🎉 Completed! Total reviews scraped: {len(all_reviews)}")

💬 Starting review scraping process...

📱 Scraping 1000 reviews for com.whatsapp...

Reviews for com.whatsapp: 100%|██████████| 1000/1000 [00:07<00:00, 130.17it/s]

✅ Successfully scraped 1000 reviews for com.whatsapp
📈 Total reviews collected so far: 1000

📱 Scraping 1000 reviews for com.facebook.katana...

Reviews for com.facebook.katana: 100%|██████████| 1000/1000 [00:07<00:00, 129.62it/s]

✅ Successfully scraped 1000 reviews for com.facebook.katana
📈 Total reviews collected so far: 2000

📱 Scraping 1000 reviews for com.instagram.android...

Reviews for com.instagram.android: 100%|██████████| 1000/1000 [00:07<00:00, 136.12it/s]

✅ Successfully scraped 1000 reviews for com.instagram.android
📈 Total reviews collected so far: 3000

📱 Scraping 1000 reviews for com.snapchat.android...

Reviews for com.snapchat.android: 100%|██████████| 1000/1000 [00:08<00:00, 121.62it/s]

✅ Successfully scraped 1000 reviews for com.snapchat.android
📈 Total reviews collected so far: 4000

📱 Scraping 1000 reviews for com.spotify.music...

Reviews for com.spotify.music: 100%|██████████| 1000/1000 [00:07<00:00, 139.11it/s]

✅ Successfully scraped 1000 reviews for com.spotify.music
📈 Total reviews collected so far: 5000

🎉 Completed! Total reviews scraped: 5000
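
The scraping loop stops at the first failed batch, so a single transient error can cut a run short. One common alternative is to retry with exponential backoff. The sketch below is a generic illustration and not part of the notebook: `call_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on any exception with exponentially growing pauses.

    The delay doubles after each failure: base_delay, 2 * base_delay, ...
    The last exception is re-raised once `max_attempts` is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

In the notebook, the `reviews(...)` call could be wrapped in a `lambda` and passed as `func`. The `sleep` parameter exists only so the pauses can be replaced in tests.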


In [6]:
# Process reviews data
if all_reviews:
    reviews_df = pd.DataFrame(all_reviews)
    
    # Add additional features
    reviews_df['review_length'] = reviews_df['content'].str.len()
    reviews_df['word_count'] = reviews_df['content'].str.split().str.len()
    
    # Convert date columns
    reviews_df['at'] = pd.to_datetime(reviews_df['at'])
    
    print(f"📊 Reviews DataFrame created with {len(reviews_df)} rows")
    print(f"📋 Columns: {list(reviews_df.columns)}")
    
    # Show summary by app
    print("\n📱 Reviews by app:")
    app_review_counts = reviews_df['app_id'].value_counts()
    for app_id, count in app_review_counts.items():
        app_name = apps_df[apps_df['app_id'] == app_id]['title'].iloc[0] if not apps_df.empty else app_id
        print(f"  • {app_name}: {count} reviews")
    
    # Display sample
    display(reviews_df[['app_id', 'userName', 'score', 'content', 'at']].head())
else:
    print("❌ No reviews were scraped!")
    reviews_df = pd.DataFrame()

📊 Reviews DataFrame created with 5000 rows
📋 Columns: ['reviewId', 'userName', 'userImage', 'content', 'score', 'thumbsUpCount', 'reviewCreatedVersion', 'at', 'replyContent', 'repliedAt', 'appVersion', 'app_id', 'scraped_at', 'review_length', 'word_count']

📱 Reviews by app:
  • WhatsApp Messenger: 1000 reviews
  • Facebook: 1000 reviews
  • Instagram: 1000 reviews
  • Snapchat: 1000 reviews
  • Spotify: Music dan Podcast: 1000 reviews

|   | app_id | userName | score | content | at |
|---|--------|----------|-------|---------|----|
| 0 | com.whatsapp | Rafita Ali | 5 | Mantap aplikasi nya | 2025-07-02 11:15:53 |
| 1 | com.whatsapp | Semeon Atading (Meon) | 5 | sangat baik | 2025-07-02 11:15:32 |
| 2 | com.whatsapp | Mufida Hasna | 5 | Mufida hasna tolong dong kembali in ke WhatsAp... | 2025-07-02 11:13:06 |
| 3 | com.whatsapp | Septian Almahzumi | 5 | tolong dong pas orang lain nelepon pake NSP pa... | 2025-07-02 11:12:20 |
| 4 | com.whatsapp | Lepoh Indah | 5 | ok | 2025-07-02 11:11:40 |
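
Each scraped review carries a `reviewId`, which gives a natural key for removing duplicates before analysis, for example if batches overlap or the notebook is re-run and results are merged. A minimal sketch on plain dictionaries, assuming `reviewId` uniquely identifies a review (the `dedupe_reviews` helper is hypothetical):

```python
from typing import Dict, Iterable, List


def dedupe_reviews(reviews: Iterable[Dict]) -> List[Dict]:
    """Keep the first occurrence of each reviewId, preserving input order."""
    seen = set()
    unique: List[Dict] = []
    for review in reviews:
        review_id = review.get("reviewId")
        if review_id in seen:
            continue
        seen.add(review_id)
        unique.append(review)
    return unique


sample = [
    {"reviewId": "a", "score": 5},
    {"reviewId": "b", "score": 1},
    {"reviewId": "a", "score": 5},  # duplicate of the first entry
]
print(len(dedupe_reviews(sample)))
```

On the DataFrame itself, `reviews_df.drop_duplicates(subset='reviewId')` achieves the same effect.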


## Step 4: Data Analysis and Summary

In [7]:
# Create summary statistics
if not reviews_df.empty and not apps_df.empty:
    print("=" * 60)
    print("📊 GOOGLE PLAY STORE SCRAPING SUMMARY")
    print("=" * 60)
    
    print(f"\n🎯 Apps scraped: {len(apps_df)}")
    print(f"💬 Total reviews: {len(reviews_df)}")
    print(f"📅 Date range: {reviews_df['at'].min()} to {reviews_df['at'].max()}")
    
    print("\n📱 App Information Summary:")
    # Iterate as 'row' so the imported app() function is not shadowed
    for _, row in apps_df.iterrows():
        # Rows that failed to scrape have no title or rating (NaN), so skip them
        if pd.notna(row.get('title')) and pd.notna(row.get('rating')):
            rating_count = row.get('rating_count', 'N/A')
            installs = row.get('installs', 'N/A')
            print(f"  • {row['title']}: {row['rating']:.1f}⭐ ({rating_count} ratings, {installs} installs)")
    
    print("\n📊 Review Statistics by App:")
    review_stats = reviews_df.groupby('app_id').agg({
        'score': ['count', 'mean'],
        'review_length': 'mean',
        'word_count': 'mean'
    }).round(2)
    
    review_stats.columns = ['Review Count', 'Avg Rating', 'Avg Length', 'Avg Words']
    display(review_stats)
    
    print("\n⭐ Overall Rating Distribution:")
    rating_dist = reviews_df['score'].value_counts().sort_index()
    for rating, count in rating_dist.items():
        percentage = (count / len(reviews_df)) * 100
        stars = "⭐" * rating
        print(f"  {stars} ({rating}): {count} reviews ({percentage:.1f}%)")
else:
    print("❌ No data available for summary")

📊 GOOGLE PLAY STORE SCRAPING SUMMARY

🎯 Apps scraped: 5
💬 Total reviews: 5000
📅 Date range: 2025-05-13 15:31:37 to 2025-07-02 11:15:53

📱 App Information Summary:
  • WhatsApp Messenger: 4.4⭐ (209299181 ratings, 10.000.000.000+ installs)
  • Facebook: 4.6⭐ (169113269 ratings, 10.000.000.000+ installs)
  • Instagram: 4.1⭐ (163289953 ratings, 5.000.000.000+ installs)
  • Snapchat: 4.2⭐ (37934342 ratings, 1.000.000.000+ installs)
  • Spotify: Music dan Podcast: 4.5⭐ (33682597 ratings, 1.000.000.000+ installs)

📊 Review Statistics by App:

| app_id | Review Count | Avg Rating | Avg Length | Avg Words |
|--------|--------------|------------|------------|-----------|
| com.facebook.katana | 1000 | 4.25 | 36.78 | 5.97 |
| com.instagram.android | 1000 | 4.03 | 45.35 | 7.48 |
| com.snapchat.android | 1000 | 4.29 | 40.56 | 7.04 |
| com.spotify.music | 1000 | 4.39 | 38.74 | 6.5 |
| com.whatsapp | 1000 | 3.69 | 49.34 | 8.3 |

⭐ Overall Rating Distribution:
  ⭐ (1): 714 reviews (14.3%)
  ⭐⭐ (2): 175 reviews (3.5%)
  ⭐⭐⭐ (3): 259 reviews (5.2%)
  ⭐⭐⭐⭐ (4): 451 reviews (9.0%)
  ⭐⭐⭐⭐⭐ (5): 3401 reviews (68.0%)
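
The percentages above, and the overall mean score, follow directly from the five counts. As a worked check using only the figures printed in the output (plain Python, no pandas):

```python
# Star rating -> number of reviews, copied from the distribution above.
counts = {1: 714, 2: 175, 3: 259, 4: 451, 5: 3401}

total = sum(counts.values())

# Share of each rating, rounded to one decimal place as in the notebook output.
percentages = {stars: round(100 * n / total, 1) for stars, n in counts.items()}

# Mean score is the count-weighted average of the star values.
mean_score = sum(stars * n for stars, n in counts.items()) / total

print(total)
print(percentages)
print(round(mean_score, 2))
```

The weighted mean works out to 20650 / 5000 = 4.13, which matches the average of the five per-app means in the table, as expected when every app contributes the same number of reviews.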


## Step 5: Export Data to CSV and XLSX

In [8]:
# Create timestamp for file naming
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
print(f"📅 Export timestamp: {timestamp}")

# Export app information
if not apps_df.empty:
    # CSV export
    apps_csv_filename = f"google_play_apps_info_{timestamp}.csv"
    apps_df.to_csv(apps_csv_filename, index=False, encoding='utf-8')
    print(f"✅ App information saved to: {apps_csv_filename}")
    
    # XLSX export
    try:
        apps_xlsx_filename = f"google_play_apps_info_{timestamp}.xlsx"
        apps_df.to_excel(apps_xlsx_filename, index=False, engine='openpyxl')
        print(f"✅ App information saved to: {apps_xlsx_filename}")
    except ImportError:
        print("⚠️ openpyxl not installed. Install with: pip install openpyxl")
else:
    print("❌ No app information to export")

# Export reviews
if not reviews_df.empty:
    # CSV export
    reviews_csv_filename = f"google_play_reviews_{timestamp}.csv"
    reviews_df.to_csv(reviews_csv_filename, index=False, encoding='utf-8')
    print(f"✅ Reviews saved to: {reviews_csv_filename}")
    
    # XLSX export
    try:
        reviews_xlsx_filename = f"google_play_reviews_{timestamp}.xlsx"
        reviews_df.to_excel(reviews_xlsx_filename, index=False, engine='openpyxl')
        print(f"✅ Reviews saved to: {reviews_xlsx_filename}")
    except ImportError:
        print("⚠️ openpyxl not installed. Install with: pip install openpyxl")
else:
    print("❌ No reviews to export")

print(f"\n📁 All data exported with timestamp: {timestamp}")

📅 Export timestamp: 20250703_111737
✅ App information saved to: google_play_apps_info_20250703_111737.csv
✅ App information saved to: google_play_apps_info_20250703_111737.xlsx
✅ Reviews saved to: google_play_reviews_20250703_111737.csv
✅ Reviews saved to: google_play_reviews_20250703_111737.xlsx

📁 All data exported with timestamp: 20250703_111737
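
User-written review text can contain stray control characters, and openpyxl is known to reject some of them when writing cells, so an XLSX export can fail even though the CSV export succeeds. A minimal sketch of a pre-export cleanup, assuming the problematic set is the ASCII control range excluding tab, newline and carriage return (the `clean_for_excel` helper is hypothetical, and the exact set openpyxl rejects should be checked against its documentation):

```python
import re

# Assumed set of characters to strip: ASCII control codes other than
# tab (\x09), newline (\x0a) and carriage return (\x0d).
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_for_excel(value):
    """Remove control characters from strings; return other values unchanged."""
    if isinstance(value, str):
        return CONTROL_CHARS.sub("", value)
    return value


print(repr(clean_for_excel("bagus\x0b sekali")))
```

In the notebook this could be applied to the text column before calling `to_excel`, for example with `reviews_df['content'].map(clean_for_excel)`.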


In [9]:
# Create a combined Excel file with multiple sheets
if not apps_df.empty or not reviews_df.empty:
    try:
        combined_filename = f"google_play_complete_data_{timestamp}.xlsx"
        
        with pd.ExcelWriter(combined_filename, engine='openpyxl') as writer:
            if not apps_df.empty:
                apps_df.to_excel(writer, sheet_name='App_Information', index=False)
                print(f"✅ App information added to sheet 'App_Information'")
            
            if not reviews_df.empty:
                reviews_df.to_excel(writer, sheet_name='Reviews', index=False)
                print(f"✅ Reviews added to sheet 'Reviews'")
                
                # Create summary sheet
                summary_data = []
                
                # Overall statistics
                summary_data.append(['Metric', 'Value'])
                summary_data.append(['Total Apps', len(apps_df) if not apps_df.empty else 0])
                summary_data.append(['Total Reviews', len(reviews_df)])
                summary_data.append(['Date Range Start', reviews_df['at'].min().strftime('%Y-%m-%d')])
                summary_data.append(['Date Range End', reviews_df['at'].max().strftime('%Y-%m-%d')])
                summary_data.append(['Average Rating', round(reviews_df['score'].mean(), 2)])
                summary_data.append(['', ''])  # Empty row
                
                # Rating distribution
                summary_data.append(['Rating Distribution', ''])
                rating_dist = reviews_df['score'].value_counts().sort_index()
                for rating, count in rating_dist.items():
                    percentage = (count / len(reviews_df)) * 100
                    summary_data.append([f'{rating} Stars', f'{count} ({percentage:.1f}%)'])
                
                summary_df = pd.DataFrame(summary_data)
                summary_df.to_excel(writer, sheet_name='Summary', index=False, header=False)
                print(f"✅ Summary added to sheet 'Summary'")
        
        print(f"\n🎉 Combined Excel file created: {combined_filename}")
        print(f"   📊 Contains: App Information, Reviews, and Summary sheets")
        
    except ImportError:
        print("⚠️ openpyxl not installed. Combined Excel file not created.")
        print("   Install with: pip install openpyxl")
    except Exception as e:
        print(f"❌ Error creating combined Excel file: {e}")
else:
    print("❌ No data available to create combined file")

✅ App information added to sheet 'App_Information'
✅ Reviews added to sheet 'Reviews'
✅ Summary added to sheet 'Summary'

🎉 Combined Excel file created: google_play_complete_data_20250703_111737.xlsx
   📊 Contains: App Information, Reviews, and Summary sheets


## Step 6: Final Summary and Next Steps

In [10]:
# Display final summary and file information
print("=" * 70)
print("🎉 SCRAPING COMPLETED SUCCESSFULLY!")
print("=" * 70)

# List created files
created_files = []
for filename in os.listdir('.'):
    if timestamp in filename and (filename.endswith('.csv') or filename.endswith('.xlsx')):
        file_size = os.path.getsize(filename)
        created_files.append((filename, file_size))

if created_files:
    print("\n📁 Created files:")
    total_size = 0
    for filename, size in created_files:
        size_mb = size / (1024 * 1024)
        total_size += size
        print(f"  📄 {filename} ({size_mb:.2f} MB)")
    
    print(f"\n📊 Summary:")
    print(f"  • Total files: {len(created_files)}")
    print(f"  • Total size: {total_size / (1024 * 1024):.2f} MB")
    print(f"  • Timestamp: {timestamp}")

print("\n🚀 Next Steps:")
print("  1. 📊 Open Excel files to explore the data")
print("  2. 📈 Use CSV files for data analysis or machine learning")
print("  3. 🤖 Consider running sentiment analysis on review content")
print("  4. 📉 Create visualizations and trend analysis")
print("  5. 🔍 Perform competitive analysis between apps")

print("\n💡 Tips:")
print("  • Use the 'Reviews' sheet for sentiment analysis")
print("  • Check the 'Summary' sheet for quick insights")
print("  • Filter reviews by app_id for individual app analysis")
print("  • Use review_length and word_count for text analysis")

print("\n✨ Happy analyzing!")

🎉 SCRAPING COMPLETED SUCCESSFULLY!

📁 Created files:
  📄 google_play_apps_info_20250703_111737.csv (0.01 MB)
  📄 google_play_apps_info_20250703_111737.xlsx (0.01 MB)
  📄 google_play_complete_data_20250703_111737.xlsx (0.84 MB)
  📄 google_play_reviews_20250703_111737.csv (1.39 MB)
  📄 google_play_reviews_20250703_111737.xlsx (0.84 MB)

📊 Summary:
  • Total files: 5
  • Total size: 3.09 MB
  • Timestamp: 20250703_111737

🚀 Next Steps:
  1. 📊 Open Excel files to explore the data
  2. 📈 Use CSV files for data analysis or machine learning
  3. 🤖 Consider running sentiment analysis on review content
  4. 📉 Create visualizations and trend analysis
  5. 🔍 Perform competitive analysis between apps

💡 Tips:
  • Use the 'Reviews' sheet for sentiment analysis
  • Check the 'Summary' sheet for quick insights
  • Filter reviews by app_id for individual app analysis
  • Use review_length and word_count for text analysis

✨ Happy analyzing!
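
As a starting point for the sentiment and per-app analysis suggested above, the star score can serve as a crude sentiment proxy before any text model is involved. The sketch below is only an illustration: the thresholds (4 and above positive, 3 neutral, 2 and below negative) are an assumption, and `score_to_label` and `label_counts_by_app` are hypothetical helpers.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def score_to_label(score: int) -> str:
    """Map a 1-5 star score to a coarse label (assumed thresholds)."""
    if score >= 4:
        return "positive"
    if score == 3:
        return "neutral"
    return "negative"


def label_counts_by_app(reviews: Iterable[Dict]) -> Dict[str, Dict[str, int]]:
    """Count coarse labels per app_id from review dictionaries."""
    counts = defaultdict(Counter)
    for review in reviews:
        counts[review["app_id"]][score_to_label(review["score"])] += 1
    return {app_id: dict(counter) for app_id, counter in counts.items()}


sample_reviews = [
    {"app_id": "com.whatsapp", "score": 5},
    {"app_id": "com.whatsapp", "score": 1},
    {"app_id": "com.spotify.music", "score": 3},
    {"app_id": "com.spotify.music", "score": 4},
]
print(label_counts_by_app(sample_reviews))
```

The same function accepts `reviews_df.to_dict('records')`, since each record carries both `app_id` and `score`.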
