(mirror) Discover apps by different methods. Mass download app packages and metadata.


marzzzello/gplaycrawler


gplaycrawler

Discover apps by different methods. Mass download app packages and metadata.

Setup

Install protobuf:

Using apt:

$ sudo apt install protobuf-compiler

Using pacman:

$ sudo pacman -S protobuf

Check version:

$ protoc --version  # Ensure compiler version is 3+
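
The version check can also be scripted, for example in a setup script. The following is a minimal sketch, assuming `protoc --version` prints a line such as `libprotoc 3.21.12`. The function names `protoc_major_version` and `installed_protoc_major` are illustrative and not part of gplaycrawler.

```python
import re
import subprocess


def protoc_major_version(version_output: str) -> int:
    """Extract the major version number from `protoc --version` output."""
    # Assumption: the first run of digits in the output is the major version.
    match = re.search(r"\d+", version_output)
    if match is None:
        raise ValueError(f"unrecognized protoc output: {version_output!r}")
    return int(match.group(0))


def installed_protoc_major() -> int:
    """Run the installed protoc and return its major version.

    Raises FileNotFoundError if protoc is not on PATH.
    """
    result = subprocess.run(
        ["protoc", "--version"], capture_output=True, text=True, check=True
    )
    return protoc_major_version(result.stdout)
```

Calling `installed_protoc_major() >= 3` then mirrors the manual check above.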

Install gplaycrawler using pip:

$ pip install gplaycrawler

Usage

Set environment variables (optional):

export PLAYSTORE_TOKEN='ya29.fooooo'
export PLAYSTORE_GSFID='1234567891234567890'
export HTTP_PROXY='http://localhost:8080'
export HTTPS_PROXY='http://localhost:8080'
export CURL_CA_BUNDLE='/usr/local/myproxy_info/cacert.pem'

usage: gplaycrawler [-h] [-v {warning,info,debug}]
                    {help,usage,charts,search,related,metadata,packages} ...

Crawl the Google PlayStore

positional arguments:
  {help,usage,charts,search,related,metadata,packages}
                        Desired action to perform
    help                Print this help message
    usage               Print full usage
    charts              parallel downloading of all cross category app charts
    search              parallel searching of apps via search terms
    related             parallel searching of apps via related apps
    metadata            parallel scraping of app metadata
    packages            parallel downloading app packages

optional arguments:
  -h, --help            show this help message and exit
  -v {warning,info,debug}, --verbosity {warning,info,debug}
                        Set verbosity level (default: info)


All commands in detail:


Common optional arguments for related, search, metadata, packages:
  --locale LOCALE      (default: en_US)
  --timezone TIMEZONE  (default: UTC)
  --device DEVICE      (default: px_3a)
  --delay DELAY        Delay between every request in seconds (default: 0.51)
  --threads THREADS    Number of parallel workers (default: 2)


related:
usage: gplaycrawler related [-h] [--locale LOCALE] [--timezone TIMEZONE]
                            [--device DEVICE] [--delay DELAY]
                            [--threads THREADS] [--output OUTPUT]
                            [--level LEVEL]
                            input

parallel searching of apps via related apps

positional arguments:
  input                name of the input file (default: charts.json)

optional arguments:
  --output OUTPUT      base name of the output files (default: ids_related)
  --level LEVEL        How deep to crawl (default: 3)


search:
usage: gplaycrawler search [-h] [--locale LOCALE] [--timezone TIMEZONE]
                           [--device DEVICE] [--delay DELAY]
                           [--threads THREADS] [--output OUTPUT]
                           [--length LENGTH]

parallel searching of apps via search terms

optional arguments:
  --output OUTPUT      name of the output file (default: ids_search.json)
  --length LENGTH      length of strings to search (default: 2)


metadata:
usage: gplaycrawler metadata [-h] [--locale LOCALE] [--timezone TIMEZONE]
                             [--device DEVICE] [--delay DELAY]
                             [--threads THREADS] [--output OUTPUT]
                             input

parallel scraping of app metadata

positional arguments:
  input                name of the input file (json)

optional arguments:
  --output OUTPUT      directory name of the output files (default:
                       out_metadata)


packages:
usage: gplaycrawler packages [-h] [--locale LOCALE] [--timezone TIMEZONE]
                             [--device DEVICE] [--delay DELAY]
                             [--threads THREADS] [--output OUTPUT]
                             [--expansions] [--splits]
                             input

parallel downloading app packages

positional arguments:
  input                name of the input file (json)

optional arguments:
  --output OUTPUT      directory name of the output files (default:
                       out_packages)
  --expansions         also download expansion files (default: False)
  --splits             also download split files (default: False)
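

Example: chaining actions from a script

The actions above can be combined into a small pipeline. The sketch below only builds the command lines, using flags that appear in the usage text. The helper `gplaycrawler_cmd` is hypothetical, and the sketch assumes that the file written by `search` (default `ids_search.json`) is accepted as input by `metadata` and `packages`, which the usage text does not confirm.

```python
import shlex


def gplaycrawler_cmd(action, *positional, verbosity=None, **options):
    """Build the argument list for one gplaycrawler invocation.

    Keyword options become long flags: delay=1.0 -> --delay 1.0.
    Boolean flags take no value: splits=True -> --splits.
    """
    cmd = ["gplaycrawler"]
    if verbosity is not None:
        cmd += ["-v", verbosity]  # global flag goes before the action
    cmd.append(action)
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            cmd.append(flag)
        elif value is not False and value is not None:
            cmd += [flag, str(value)]
    cmd += [str(p) for p in positional]
    return cmd


# Assumed workflow: discover app ids, then fetch metadata and packages.
steps = [
    gplaycrawler_cmd("search", length=2, output="ids_search.json"),
    gplaycrawler_cmd("metadata", "ids_search.json", output="out_metadata", delay=1.0),
    gplaycrawler_cmd("packages", "ids_search.json", output="out_packages", splits=True),
]

for step in steps:
    print(shlex.join(step))  # shell-quoted command line for inspection
```

To execute the steps instead of printing them, each list can be passed to `subprocess.run(step, check=True)`, which stops the pipeline at the first failing command.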