
Commonspeak is a wordlist generation tool that leverages public datasets from Google's BigQuery platform. By querying large datasets that are updated frequently, Commonspeak generates wordlists that are "evolutionary", in the sense that they reflect the newest trends on the internet.

Commonspeak was made to generate content discovery and subdomain wordlists for use in application security testing. More details about this tool can be found here.



  • Install jq (sudo apt-get install jq or brew install jq)

  • Clone the repository:

    git clone

  • Install Google Cloud SDK

  • Create a Google Cloud project to use with BigQuery (mine was named crunchbox-160315)

  • Change into the directory of the dataset you would like to pull down: cd commonspeak/hackernews

  • Run the bash script, specifying the project name as the first argument: bash crunchbox-160315

The output will be located in commonspeak/hackernews/output/compiled
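
Before running a dataset script, it can help to confirm that the tools from the steps above are installed. The following is a minimal sketch, not part of Commonspeak itself: the function name missing_tools is hypothetical, and the check assumes that the Google Cloud SDK puts the bq and gcloud commands on your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Commonspeak): print the name of every
# command in "$@" that cannot be found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# jq is installed in the first step; bq and gcloud are assumed to come
# with the Google Cloud SDK.
missing=$(missing_tools jq bq gcloud)
if [ -n "$missing" ]; then
  printf 'Missing prerequisites:\n%s\n' "$missing" >&2
fi
```

If nothing is printed, all three commands were found.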


Commonspeak currently supports the following datasets:

  • StackOverflow, HackerNews

    • Directories
    • Filenames
    • Subdomains
  • HTTPArchive

    • Directories
    • Filenames
    • Language based directories and filenames
    • Subdomains
  • Certificate Transparency Logs

    • Subdomains
  • Collection of bash scripts that can easily be automated by using cron jobs

  • Easy to modify SQL queries for each separate dataset
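
As a rough illustration of the cron automation mentioned above, the sketch below prints a crontab entry that changes into a dataset directory and runs its script with the project name as the first argument. The function name cron_entry, the schedule, and the file name SCRIPT.sh are placeholders chosen for this example and should be replaced with the actual script in the dataset directory. Paths containing spaces would need extra quoting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit one crontab entry.
# Arguments: schedule, dataset directory, script file name, project name.
cron_entry() {
  printf '%s cd %s && bash %s %s\n' "$1" "$2" "$3" "$4"
}

# Example: every Sunday at 03:00. "SCRIPT.sh" is a placeholder, not a real file name.
cron_entry '0 3 * * 0' "$HOME/commonspeak/hackernews" 'SCRIPT.sh' 'crunchbox-160315'
```

The printed line can be pasted into the editor opened by crontab -e. This assumes the cron user has already authenticated with the Google Cloud SDK.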


Extracting the top 1 million unique subdomains from certificate transparency logs:

⟩ bash crunchbox-160315
* Creating new dataset on BigQuery: crunchbox-160315:ctl_2017_12_02
* running bq mk crunchbox-160315:ctl_2017_12_02

Dataset 'crunchbox-160315:ctl_2017_12_02' successfully created.

* Running query to extract all_dns_names to ctl_2017_12_02.all_dns_names
Waiting on bqjob_r5535032cd1a736b2_000001601706601a_1 ... (139s) Current status: DONE
|                  dns_names                   |
| [...omitted for brevity...]                  |

* Cleaning subdomains from all all_dns_names to ctl_2017_12_02.top_1m_all_dns_names
Waiting on bqjob_r236f25aea0828b3a_00000160170897a0_1 ... (657s) Current status: DONE
* Parsing results and saving to output/compiled/ctl_2017_12_02.subdomains.txt

* Compiled top 1000000 subdomains
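
The dataset name in the log above (crunchbox-160315:ctl_2017_12_02) combines the project name, a short dataset prefix, and the run date. The sketch below shows one way such a name could be built. It is an assumption about the naming scheme inferred from the log, not the script's actual code, and the function name dataset_id is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a date-stamped BigQuery dataset ID of the form
# project:prefix_YYYY_MM_DD. An optional third argument overrides today's date.
dataset_id() {
  local project=$1
  local prefix=$2
  local stamp=${3:-$(date +%Y_%m_%d)}
  printf '%s:%s_%s\n' "$project" "$prefix" "$stamp"
}

# Reproduces the ID shown in the log when given the same date.
dataset_id crunchbox-160315 ctl 2017_12_02   # prints crunchbox-160315:ctl_2017_12_02
```

Stamping each run with its date keeps the results of earlier runs in separate datasets rather than overwriting them.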

Follow the team on Twitter