Extracting URLs of a specific target based on the results of ""
si9int Merge pull request #7 from jgor/master
Allow to run outside of project directory
Latest commit c8f11f0 Mar 10, 2019

Updated to v0.3 | What's new:

  • 65% faster processing
  • Specify a year via -y/--year, e.g.: -y 2018
  • Specify an output file via -o/--out, e.g.: -o whatever.txt
  • Crawl all pages for a specific index via -i/--index, e.g.: -i CC-MAIN-2018-05
  • List all available indexes -l/--list, e.g.: -l
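
To illustrate how the year option and the index names relate, here is a minimal sketch that selects index IDs for a given year. It assumes every ID follows the `CC-MAIN-YYYY-WW` shape seen in the example above; the function name `indexes_for_year` and the regular expression are hypothetical and not taken from the tool itself, which may filter differently.

```python
import re

# Assumed shape of an index ID, based on the example "CC-MAIN-2018-05".
# IDs with a different layout are simply skipped by this sketch.
INDEX_PATTERN = re.compile(r"^CC-MAIN-(\d{4})-(\d{2})$")


def indexes_for_year(index_ids, year=None):
    """Return the index IDs that belong to `year` (all valid IDs if year is None)."""
    selected = []
    for index_id in index_ids:
        candidate = index_id.strip()
        match = INDEX_PATTERN.match(candidate)
        if match is None:
            continue  # not in the assumed CC-MAIN-YYYY-WW form
        if year is None or match.group(1) == str(year):
            selected.append(candidate)
    return selected


if __name__ == "__main__":
    sample = ["CC-MAIN-2018-05", "CC-MAIN-2018-09", "CC-MAIN-2017-51"]
    print(indexes_for_year(sample, year=2018))
```

Comparing the year as a string keeps the sketch tolerant of both `2018` and `"2018"` as input.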


  • Crawl for a specific index
  • Implementation of multithreading
  • Allowing a range of years as input
  • Implementing direct-grep
  • Temporary file-writing
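
The list above mentions accepting a range of years as input. As a rough sketch of what parsing such input could look like, the snippet below assumes a hypothetical `START-END` syntax (for example `2016-2018`); neither that syntax nor the helper name `parse_years` comes from the tool.

```python
def parse_years(value):
    """Parse "2018" or a hypothetical "2016-2018" range into a list of years."""
    start, separator, end = value.partition("-")
    first = int(start)
    # Without a "-" the input is a single year.
    last = int(end) if separator else first
    if last < first:
        raise ValueError("range end must not be earlier than range start")
    return list(range(first, last + 1))


if __name__ == "__main__":
    print(parse_years("2018"))
    print(parse_years("2016-2018"))
```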

Usage [-h] [-y YEAR] [-o OUT] [-l] [-i INDEX] [-u] domain

positional arguments:
  domain                domain which will be crawled for

optional arguments:
  -h, --help            show this help message and exit
  -y YEAR, --year YEAR  limit the result to a specific year (default: all)
  -o OUT, --out OUT     specify an output file (default: domain.txt)
  -l, --list            Lists all available indexes
  -i INDEX, --index INDEX
                        Crawl for a specific index (this will crawl all
  -u, --update          Update index file
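
For readers who want to see how a command line like this is typically wired up, the following is a minimal `argparse` sketch that mirrors the options listed above. It is a reconstruction from the help text, not the script's actual source: the function name `build_parser` is hypothetical, the help string for `-i` is shortened because the original is cut off, and `example.com` is only a placeholder domain.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the help output."""
    parser = argparse.ArgumentParser()
    parser.add_argument("domain", help="domain which will be crawled for")
    parser.add_argument("-y", "--year",
                        help="limit the result to a specific year (default: all)")
    parser.add_argument("-o", "--out",
                        help="specify an output file (default: domain.txt)")
    parser.add_argument("-l", "--list", action="store_true",
                        help="Lists all available indexes")
    parser.add_argument("-i", "--index",
                        help="Crawl for a specific index")
    parser.add_argument("-u", "--update", action="store_true",
                        help="Update index file")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without real CLI input.
    args = build_parser().parse_args(
        ["-y", "2018", "-o", "github_18.txt", "example.com"]
    )
    print(args.domain, args.year, args.out)
```

Options without a value (`-l`, `-u`) are modelled as `store_true` flags, which matches how they appear in the usage line.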


python3 -y 2018 -o github_18.txt
cat github_18.txt | grep user
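
The same filtering step can be done from Python when the results feed into another script. This is a small sketch under the assumption that the output file holds one URL per line; the helper name `lines_containing` and the sample URLs are made up for illustration.

```python
import os
import tempfile


def lines_containing(path, needle):
    """Yield each line of the file at `path` that contains `needle`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                yield line.rstrip("\n")


if __name__ == "__main__":
    # Write a throwaway file so the demo does not depend on real crawl output.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("https://example.com/user/1\nhttps://example.com/about\n")
    try:
        for url in lines_containing(tmp.name, "user"):
            print(url)
    finally:
        os.remove(tmp.name)
```

Reading line by line keeps memory use flat even when the result file is large.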


  • Python3