Mass IPv4 WHOIS Collection Tool


There is a blog post that accompanies this code base.

Running this code base locally

These commands were tested on a fresh installation of Ubuntu 14.04.3 LTS.

Various dependencies are needed; the following installs them via Debian packages.

$ sudo apt-get update
$ sudo apt-get install -y \
    default-jre \
    jq \
    python-dev \
    python-pip \
    python-virtualenv \
    redis-server \
    zip

The following will install and launch a known-good version of Kafka.

$ cd /tmp
$ curl -O
$ tar -xzf kafka_2.11-
$ cd kafka_2.11-
$ nohup bin/kafka-server-start.sh \
    config/server.properties \
    > ~/kafka.log 2>&1 &
$ export PATH="`pwd`/bin:$PATH"

The following will create the results and metrics topics in Kafka.

$ kafka-topics.sh \
    --zookeeper localhost:2181 \
    --create \
    --partitions 1 \
    --replication-factor 1 \
    --topic results

$ kafka-topics.sh \
    --zookeeper localhost:2181 \
    --create \
    --partitions 1 \
    --replication-factor 1 \
    --topic metrics

The following will launch Redis.

$ redis-server &

The following will create a virtual environment and install various Python-based dependencies.

$ virtualenv .ips
$ source .ips/bin/activate
$ pip install -r requirements.txt -r requirements-dev.txt

The following will bootstrap a local database.

$ cd ips
$ python migrate

The following will generate 4.7 million seed IP addresses that will be used by workers.

$ python gen_ips
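The actual seeding logic lives in the repository's `gen_ips` command. Purely as an illustration of the idea, the sketch below strides evenly across the IPv4 space and keeps only globally routable addresses; the stride and limit values here are made up for the example and are not the tool's real parameters:

```python
import ipaddress

def seed_ips(stride, limit):
    """Yield addresses spaced `stride` apart across the IPv4 space,
    skipping ranges that are not globally routable."""
    for n in range(0, 2 ** 32, stride):
        ip = ipaddress.ip_address(n)
        if ip.is_global:  # drops private, reserved and loopback ranges
            yield str(ip)
            limit -= 1
            if limit == 0:
                return

# One seed per /8 block, first four only:
print(list(seed_ips(stride=2 ** 24, limit=4)))
# → ['1.0.0.0', '2.0.0.0', '3.0.0.0', '4.0.0.0']
```

Spacing seeds evenly means early WHOIS responses already cover allocations across the whole address space, instead of walking it sequentially from 1.0.0.0.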

Set the coordinator IP address:

$ python set_config

The following launches the web interface for the coordinator.

$ python runserver &

The following launches the look-up workers, the telemetry reporter and the process that collects IP addresses from the coordinator.

$ python celeryd --concurrency=5 &
$ python celerybeat &
$ python get_ips_from_coordinator &
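How `get_ips_from_coordinator` actually talks to the coordinator is defined by the codebase itself. As a rough sketch, assuming the coordinator hands each worker a JSON array of addresses (both the payload shape and the `parse_ip_batch` helper are hypothetical):

```python
import json

def parse_ip_batch(payload, already_done):
    """Decode a hypothetical JSON batch from the coordinator and drop
    addresses this worker has already looked up."""
    return [ip for ip in json.loads(payload) if ip not in already_done]

payload = json.dumps(['1.2.3.4', '5.6.7.8'])
print(parse_ip_batch(payload, already_done={'5.6.7.8'}))  # → ['1.2.3.4']
```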

The following launches the process that collects the WHOIS records and stores unique CIDR blocks in Redis.

$ python collect_whois &
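Storing each newly seen CIDR block lets later lookups for addresses inside an already-covered block be skipped. A minimal sketch of that dedup idea, using a plain Python set where the tool uses Redis (the helper names are made up for the example):

```python
import ipaddress

seen_blocks = set()  # stands in for the Redis set of known CIDR blocks

def covered(ip_str):
    """True if this address falls inside a block we already collected."""
    ip = ipaddress.ip_address(ip_str)
    return any(ip in net for net in seen_blocks)

def record_block(cidr_str):
    """Store a CIDR block from a WHOIS response; True if it was new."""
    net = ipaddress.ip_network(cidr_str)
    if net in seen_blocks:
        return False
    seen_blocks.add(net)
    return True

record_block('8.8.8.0/24')
print(covered('8.8.8.8'))  # → True, inside an already-collected block
print(covered('9.9.9.9'))  # → False, still needs a WHOIS lookup
```

Skipping covered addresses is what makes collecting the whole IPv4 space tractable: one WHOIS response for a large allocation removes the need to query every address inside it.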


To see aggregated telemetry:

$ python telemetry
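The `telemetry` command aggregates the counters the workers publish to the metrics topic. As an illustrative sketch, assuming each metric message is a JSON object carrying a worker name and a count (these field names are invented for the example, not the tool's real schema):

```python
import json
from collections import Counter

def aggregate(messages):
    """Sum hypothetical per-worker counters into one total per worker."""
    totals = Counter()
    for raw in messages:
        metric = json.loads(raw)
        totals[metric['worker']] += metric['count']
    return dict(totals)

msgs = [
    '{"worker": "worker1", "count": 120}',
    '{"worker": "worker2", "count": 95}',
    '{"worker": "worker1", "count": 130}',
]
print(aggregate(msgs))  # → {'worker1': 250, 'worker2': 95}
```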

If you want to monitor Celery's activity, run the following:

$ watch 'python celery inspect stats'

To see the results of successful WHOIS queries:

$ kafka-console-consumer.sh \
    --zookeeper localhost:2181 \
    --topic results

To continuously dump results to a file:

$ kafka-console-consumer.sh \
    --zookeeper localhost:2181 \
    --topic results \
    --from-beginning > output &

To see per-minute metrics from the workers:

$ kafka-console-consumer.sh \
    --zookeeper localhost:2181 \
    --topic metrics


To run Ansible against a cloud service, first create an inventory file like the following.

$ vi devops/inventory
coord1 ansible_host=x.x.x.x ansible_user=ubuntu ansible_private_key_file=~/.ssh/ec2.pem

worker1 ansible_host=x.x.x.x ansible_user=ubuntu ansible_private_key_file=~/.ssh/ec2.pem
worker2 ansible_host=x.x.x.x ansible_user=ubuntu ansible_private_key_file=~/.ssh/ec2.pem
worker3 ansible_host=x.x.x.x ansible_user=ubuntu ansible_private_key_file=~/.ssh/ec2.pem

To provision and deploy run:

$ zip -r \
    ips/ *.txt \
    -x *.sqlite3 \
    -x *.pid \
    -x *.pyc

$ cd devops
$ ansible-playbook bootstrap.yml

