Popular Headers

This project takes a list of websites, records the HTTP headers each site returns, and looks for which headers are popular. It is based on a project from summer 2015; this repo contains the more interesting parts.
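As a rough illustration of the idea only (the repo's own collection is done in Perl and stored in MySQL), the sketch below tallies header names from raw response headers on stdin. The function name `count_header_names` is hypothetical and not part of this repo.

```shell
# Tally header names found in raw HTTP response headers read from stdin.
# Hypothetical helper for illustration; not part of this repo.
count_header_names() {
    # Keep "Name: value" lines only, lower-case the name, count each name,
    # then sort by count (descending) and name (ascending).
    awk -F':' '/^[A-Za-z0-9-]+:/ { print tolower($1) }' \
        | sort \
        | uniq -c \
        | awk '{ print $1, $2 }' \
        | sort -k1,1nr -k2,2
}

# Usage, assuming curl is installed:
#   curl -sI https://example.com | count_header_names
```

Status lines such as `HTTP/1.1 200 OK` are skipped because they do not start with a `Name:` token.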

Have a look at sql/schema.sql to get an idea of how the data is stored once collected. If you're interested in event loops, [this post is worth a read](https://falkus.co/2016/04/using-an-event-loop-for-multiple-http-requests/).

Running

Set up the database and user:

    mysql> CREATE DATABASE popular_headers;
    mysql> CREATE USER 'ph_user'@'127.0.0.1' IDENTIFIED BY 'your-secure-password';
    mysql> GRANT ALL PRIVILEGES ON popular_headers.* TO 'ph_user'@'127.0.0.1';

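Load the schema:

    mysql -u ph_user -p popular_headers < sql/schema.sql

If the client is refused access, try adding -h 127.0.0.1: the user above is created for host 127.0.0.1, while the mysql client may default to a local socket connection, which MySQL treats as host localhost.
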
Update the configuration file, etc/config.yaml, with your database credentials.

Pipe a list of sites into bin/gather-headers:

    wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
    unzip top-1m.csv.zip
    cat top-1m.csv | head -n 10000 | bin/gather-headers

Query the DB for results.
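
A minimal sketch of a "most common values" query follows. This README does not list the table or column names, so the helper takes them as arguments; look up the real names in sql/schema.sql. The function name `popular_query` and the placeholder names in the usage line are hypothetical.

```shell
# Print a query that counts how often each value of a column occurs.
# Arguments: table name, column name, optional row limit (default 20).
# Table and column names are placeholders: take the real ones from
# sql/schema.sql. Hypothetical helper; not part of this repo.
popular_query() {
    printf 'SELECT %s, COUNT(*) AS total FROM %s GROUP BY %s ORDER BY total DESC LIMIT %s;\n' \
        "$2" "$1" "$2" "${3:-20}"
}

# Usage (some_table and some_column are made-up names):
#   popular_query some_table some_column 10 | mysql -u ph_user -p popular_headers
```

Printing the SQL rather than running it lets you inspect the statement before sending it to the database.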

Requirements

On Ubuntu, the only extras you'll need are a few additional Perl libraries:

    # libmojolicious-perl: Mojo::UserAgent
    # libconfig-yaml-perl: YAML
    # libreadonly-perl:    Readonly
    sudo apt-get install \
        libmojolicious-perl \
        libconfig-yaml-perl \
        libreadonly-perl

Notes

Each script and Perl module has perldoc documentation, which should be up to date and contains more information than this README.

The default schema stores each header value in a VARCHAR(750), so any header value longer than 750 characters will be truncated. Bear this in mind when interpreting results for headers that tend to carry long values.
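
To get a feel for how often this limit matters, a sketch like the one below reports headers whose value exceeds a given length in raw response headers on stdin. The function name `long_header_values` is hypothetical and not part of this repo. Note that some awk implementations count bytes rather than characters, so results can differ for non-ASCII values.

```shell
# Print "Name length" for each header whose value is longer than a limit.
# Argument: optional limit, default 750 to match the VARCHAR(750) column.
# Hypothetical helper for illustration; not part of this repo.
long_header_values() {
    awk -v max="${1:-750}" '
        /^[A-Za-z0-9-]+:/ {
            colon = index($0, ":")
            name  = substr($0, 1, colon - 1)
            value = substr($0, colon + 1)
            sub(/^[ \t]+/, "", value)     # drop leading whitespace
            sub(/[ \t\r]+$/, "", value)   # drop trailing whitespace and CR
            if (length(value) > max) print name, length(value)
        }'
}

# Usage, assuming curl is installed:
#   curl -sI https://example.com | long_header_values
```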
