Rust Web Crawler saving pages on Redis

Maman

Maman is a Rust web crawler that saves pages to Redis.

Pages are sent to the Redis list <MAMAN_ENV>:queue:maman in the Sidekiq job format:

{
  "class": "Maman",
  "jid": "b4a577edbccf1d805744efa9",
  "retry": true,
  "created_at": 1461789979,
  "enqueued_at": 1461789979,
  "args": {
    "document": "<html><body><a href='#' /><a href='/new' /></html>",
    "urls": ["https://example.net/new"],
    "headers": {"content-type": "text/html"},
    "url": "https://example.net/"
  }
}
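The Rust snippet below is a rough sketch of the payload above. It builds the list key from MAMAN_ENV and serializes a job with the same fields, using only the standard library. The helper names (queue_key, json_str and job_json) are hypothetical and are not Maman's API. Maman may build the JSON differently, for example with a serialization crate. The actual Redis push is left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: the list key described above, <MAMAN_ENV>:queue:maman.
fn queue_key(maman_env: &str) -> String {
    format!("{}:queue:maman", maman_env)
}

/// Minimal JSON string encoder: wraps in quotes and escapes quotes,
/// backslashes and control characters.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical helper: serializes a job with the fields shown in the example.
/// created_at and enqueued_at both use the same timestamp here (an assumption).
fn job_json(
    jid: &str,
    now: u64,
    url: &str,
    document: &str,
    urls: &[&str],
    headers: &BTreeMap<&str, &str>,
) -> String {
    let urls_json: Vec<String> = urls.iter().map(|u| json_str(u)).collect();
    let headers_json: Vec<String> = headers
        .iter()
        .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
        .collect();
    format!(
        "{{\"class\":\"Maman\",\"jid\":{},\"retry\":true,\"created_at\":{},\"enqueued_at\":{},\
\"args\":{{\"document\":{},\"urls\":[{}],\"headers\":{{{}}},\"url\":{}}}}}",
        json_str(jid),
        now,
        now,
        json_str(document),
        urls_json.join(","),
        headers_json.join(","),
        json_str(url)
    )
}

fn main() {
    let mut headers = BTreeMap::new();
    headers.insert("content-type", "text/html");
    let payload = job_json(
        "b4a577edbccf1d805744efa9",
        1461789979,
        "https://example.net/",
        "<html><body><a href='#' /><a href='/new' /></html>",
        &["https://example.net/new"],
        &headers,
    );
    // Print the key and payload instead of pushing them to Redis.
    println!("{}", queue_key("development"));
    println!("{}", payload);
}
```

A BTreeMap is used for headers so that the key order in the output is deterministic.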

Dependencies

Installation

With cargo

cargo install maman

With make

PREFIX=~/.local make install

Usage

maman URL [LIMIT] [MIME_TYPES]

LIMIT must be an integer. The default, 0, means no limit.
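The argument rules can be sketched as follows. This is a hypothetical parser and is not Maman's own code. It treats a missing LIMIT as 0, rejects a non-integer LIMIT, and treats 0 as "no limit". MIME_TYPES is kept as a raw optional string because its expected format is not described here.

```rust
use std::env;

/// Hypothetical representation of: maman URL [LIMIT] [MIME_TYPES]
#[derive(Debug, PartialEq)]
struct Args {
    url: String,
    limit: usize,               // 0 means no limit
    mime_types: Option<String>, // kept raw; its format is an assumption left open
}

fn parse_args(argv: &[String]) -> Result<Args, String> {
    // argv[0] is the program name, as with std::env::args().
    let url = argv.get(1).ok_or("usage: maman URL [LIMIT] [MIME_TYPES]")?.clone();
    let limit = match argv.get(2) {
        Some(s) => s
            .parse::<usize>()
            .map_err(|_| format!("LIMIT must be an integer, got {:?}", s))?,
        None => 0,
    };
    let mime_types = argv.get(3).cloned();
    Ok(Args { url, limit, mime_types })
}

/// True once `crawled` pages reach a non-zero limit; a limit of 0 never stops the crawl.
fn limit_reached(crawled: usize, limit: usize) -> bool {
    limit != 0 && crawled >= limit
}

fn main() {
    let argv: Vec<String> = env::args().collect();
    match parse_args(&argv) {
        Ok(args) => println!("{:?}, stop now: {}", args, limit_reached(0, args.limit)),
        Err(e) => eprintln!("{}", e),
    }
}
```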

Environment variables

Defaults

  • MAMAN_ENV=development
  • REDIS_URL="redis://127.0.0.1/"

Others

  • RUST_LOG=maman=info
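Here is a minimal sketch of how the defaults above could be resolved. It assumes that unset variables fall back to the listed values. The Config struct and the config_from function are hypothetical. The function takes a lookup closure instead of reading the process environment directly, which keeps it easy to test. RUST_LOG is a log filter handled by the logging setup, so this sketch does not read it.

```rust
use std::env;

/// Hypothetical container for the two documented settings.
#[derive(Debug, PartialEq)]
struct Config {
    maman_env: String,
    redis_url: String,
}

/// Resolves each variable through `lookup`, falling back to the documented default.
fn config_from<F: Fn(&str) -> Option<String>>(lookup: F) -> Config {
    Config {
        maman_env: lookup("MAMAN_ENV").unwrap_or_else(|| "development".to_string()),
        redis_url: lookup("REDIS_URL").unwrap_or_else(|| "redis://127.0.0.1/".to_string()),
    }
}

fn main() {
    // Read the real process environment; unset or non-UTF-8 values use the defaults.
    let config = config_from(|key| env::var(key).ok());
    println!("{:?}", config);
    println!("queue: {}:queue:maman", config.maman_env);
}
```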

LICENSE

The MIT License

Copyright (c) 2016-2018 Laurent Arnoud laurent@spkdev.net

