Merge commit 'd5df7183af359ea1ffab91dcbf956f7e07eb1616'
Alp Toker committed Nov 12, 2017
2 parents a2a8cc7 + d5df718 commit a0f93d1
Showing 2 changed files with 48 additions and 0 deletions.
44 changes: 44 additions & 0 deletions INSTALL.md
@@ -0,0 +1,44 @@
# Installing and running block-crawler

Prerequisites
-------------

You need Node.js (version >= 8) and the npm package manager.

Installing dependencies
-----------------------

    npm install

Running
-------

Simplest run:

    node index.js http://starting.point.example/

(On some systems the Node.js executable is named "nodejs" rather than "node".)

Running with a collector:

    node index.js --collector https://collector.example/ http://starting.point.example/

The collector must accept the results, which arrive as HTTP POST requests,
and store or process them. A very limited collector written as a Python
WSGI application is:

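    def store(environ, start_response):
        # WSGI passes the request environment first, then the start_response callable.
        length = int(environ.get('CONTENT_LENGTH') or 0)
        data = environ['wsgi.input'].read(length)
        # Append the raw POST body; binary mode because WSGI request bodies are bytes.
        with open("/var/storage/store.log", 'ab') as fileo:
            fileo.write(data)
        status = '200 OK'
        # The response body must be bytes under Python 3.
        output = ("Stored %i bytes\n" % len(data)).encode('ascii')
        response_headers = [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(output)))]
        start_response(status, response_headers)
        return [output]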
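
One way to try out a collector like the one above without deploying it is to call it directly as a WSGI callable with a hand-built POST request. The sketch below assumes Python 3 and the standard WSGI argument order (`environ`, then `start_response`). The names `memory_collector`, `post_to_app` and the in-memory `received` list are illustrative stand-ins and not part of block-crawler.

```python
import io
from wsgiref.util import setup_testing_defaults

received = []  # illustrative in-memory stand-in for /var/storage/store.log


def memory_collector(environ, start_response):
    """Hypothetical collector that keeps each POST body in memory."""
    # Read exactly as many bytes as the client announced.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    data = environ["wsgi.input"].read(length)
    received.append(data)
    output = ("Stored %i bytes\n" % len(data)).encode("ascii")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]


def post_to_app(app, body):
    """Call a WSGI app with a fake POST request; return (status, reply bytes)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys WSGI requires
    environ["REQUEST_METHOD"] = "POST"
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)

    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], b"".join(chunks)
```

Calling `post_to_app(memory_collector, body)` with a bytes body returns the status line and the reply the crawler would receive. For a manual trial over HTTP, the same callable can be passed to `wsgiref.simple_server.make_server` from the Python standard library.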

The results (only the HTTP errors) will appear in JSON format in
/var/storage/store.log, for instance:

    {"date":"2017-11-11T12:10:07.314Z","creator":"block-crawler","version":"0.1","url":"http://httpstat.us/451","status":451,"statusText":"Unavailable For Legal Reasons"}
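
A stored record can be read back with any JSON parser. The following is a minimal sketch in Python that parses the sample record shown above. It assumes each record is a single JSON object with the same fields as that sample; `parse_record` and `summarize` are hypothetical helper names, not part of block-crawler.

```python
import json
from datetime import datetime, timezone

# The sample record from this document, verbatim.
SAMPLE = ('{"date":"2017-11-11T12:10:07.314Z","creator":"block-crawler",'
          '"version":"0.1","url":"http://httpstat.us/451","status":451,'
          '"statusText":"Unavailable For Legal Reasons"}')


def parse_record(text):
    """Decode one JSON record and convert its date into an aware datetime."""
    record = json.loads(text)
    # The timestamp ends in "Z", meaning UTC; attach that timezone explicitly.
    record["date"] = datetime.strptime(
        record["date"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return record


def summarize(record):
    """Format the URL, status code and status text as one line."""
    return "%s %d %s" % (record["url"], record["status"], record["statusText"])
```

For the sample record, `summarize(parse_record(SAMPLE))` gives `http://httpstat.us/451 451 Unavailable For Legal Reasons`.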
4 changes: 4 additions & 0 deletions README.md
@@ -20,6 +20,10 @@ Because HTTP 451 is typically used to 'geoblock' content, it is expected that va

Results are produced in a simple streaming JSON annotation format which identifies the affected URL, the observed status code and status text, and an optional blocking entity. A single report entity identifies one HTTP request at a specific point in time, observed from a single IP address.

## Installing and running it

See INSTALL.md

## Status and contributor guidelines

This tool is under development and not yet recommended for use in production.