Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
simple favicon crawler
branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.
bin
lib
t
test
Changes
LICENSE
MANIFEST
Makefile.PL
README.pod
ignore.txt

README.pod

fav_bot - simple command line util for fetching favicons

Features

  • Fetch favicon main page and search favicons in meta tags. If not found trying fetch standart /favicon.ico
  • Fetch robots.txt for favicon's domain. If favicon disallowed - skip this domain.
  • Store 16x16 .png in dir named as domain

Quick start

  git clone git@github.com:nordicdyno/fav_bot.git fav_bot
  cd fav_bot
  sh ./test/call_fetcher.sh

this fetch host's favicons from file ./test/uris.txt

result in dir:

  ./test/rslt

Installation

check that you have required system libs:

  lib-png
  lib-jpeg
  lib-gif

install CPAN-minus

  sudo cpan App::cpanminus

install code to local dir:

  mkdir ~/myCPAN
  cd fav_bot
  cpanm -n -L ~/myCPAN  .

run: perl -I ~/myCPAN/lib/perl5 ~/myCPAN/bin/fetch_favicons.pl

Usage

  fetch_favicons.pl --store-dir=<store-original-images-dir>  --out-dir=<result-dir>
                    --url-source=<file-with-grabed-domain-uris> 
                    [ --n-parallel=N ]
   options:
    --store-dir     - dir where to store original favicon's images (for simple debug)
                      (TODO : use /tmp dir by default)
    --out-dir       - dir where to store result png 16x16 files

    --n-parallel     - count of forks (default=4)

TODO

  • write POD
  • improve tests
  • better logging
Something went wrong with that request. Please try again.