Crawlie (the crawler)

Crawlie is a simple Elixir library for writing decently-performing crawlers with minimum effort.

Usage example

Inner workings

Crawlie uses Elixir's GenStage to parallelise the work. Most of the logic is handled by Crawlie.Stage.UrlManager, which consumes the url collection passed by the user, receives the urls extracted by the subsequent processing, makes sure no url is processed more than once, and keeps the "discovered urls" collection as small as possible by traversing the url tree in a roughly depth-first manner.
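
The deduplication and the roughly depth-first ordering can be illustrated with a small, self-contained sketch. This is not Crawlie's actual UrlManager: the module name FrontierSketch, its functions, and the idea of tagging each url with an integer depth are assumptions made for illustration only.

```elixir
defmodule FrontierSketch do
  # Hypothetical illustration only; not Crawlie's UrlManager implementation.
  defstruct [:seen, :pending]

  # Build a frontier from the user's seed urls, all at depth 0.
  def new(seed_urls) do
    empty = %__MODULE__{seen: MapSet.new(), pending: []}
    Enum.reduce(seed_urls, empty, &add(&2, &1, 0))
  end

  # Add a url at the given depth unless it has been seen before,
  # so that no url is handed out more than once.
  def add(%__MODULE__{seen: seen, pending: pending} = frontier, url, depth) do
    if MapSet.member?(seen, url) do
      frontier
    else
      %{frontier | seen: MapSet.put(seen, url), pending: [{depth, url} | pending]}
    end
  end

  # Hand out the deepest pending url first. Finishing deep branches before
  # opening new shallow ones tends to keep the pending collection small.
  def pop(%__MODULE__{pending: []}), do: :empty

  def pop(%__MODULE__{pending: pending} = frontier) do
    best = Enum.max_by(pending, fn {depth, _url} -> depth end)
    {best, %{frontier | pending: List.delete(pending, best)}}
  end
end
```

A linear scan over a list is enough to show the idea; a real implementation would more likely use a priority queue keyed by depth.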

The urls are requested from Crawlie.Stage.UrlManager by a GenStage Flow, which fetches the urls in parallel using HTTPoison and parses the responses using user-provided callbacks. Discovered urls are sent back to the UrlManager.
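
As a rough sketch of the fetch-and-parse step, the snippet below runs a fetch function and a parse callback concurrently over a batch of urls. To stay dependency-free it swaps in Task.async_stream from the standard library in place of Flow, and it takes the fetch function as an argument instead of calling HTTPoison directly. The module PipelineSketch and the `{data, links}` shape returned by the parse callback are assumptions, not Crawlie's API.

```elixir
defmodule PipelineSketch do
  # Hypothetical illustration only; Crawlie itself uses Flow and HTTPoison.
  #
  # `fetch` maps a url to a response body, and `parse` maps a body to
  # `{data, links}`. Returns `{all_data, discovered_links}`.
  def run(urls, fetch, parse, max_concurrency \\ 4) do
    results =
      urls
      # Fetch and parse each url in its own task, a few at a time.
      |> Task.async_stream(fn url -> url |> fetch.() |> parse.() end,
        max_concurrency: max_concurrency
      )
      |> Enum.map(fn {:ok, result} -> result end)

    data = Enum.map(results, fn {item, _links} -> item end)

    # The discovered links would be fed back to the url manager.
    links = results |> Enum.flat_map(fn {_item, links} -> links end) |> Enum.uniq()

    {data, links}
  end
end

# Usage with an in-memory stand-in for the network:
pages = %{"a" => "body-a", "b" => "body-b"}
fetch = fn url -> Map.fetch!(pages, url) end
parse = fn body -> {String.upcase(body), ["c", "d"]} end
PipelineSketch.run(["a", "b"], fetch, parse)
```

In a real crawler the fetch function would perform the HTTP request, for example through HTTPoison, and would need to handle failed requests rather than assume every task succeeds.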

Here's a rough diagram:

Statistics

If you're interested in the crawling statistics or want to track the progress in real time, see Crawlie.crawl_and_track_stats/3. It starts a Stats GenServer in Crawlie's supervision tree, which accumulates the statistics for the crawling session.
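
The general pattern of a GenServer that accumulates statistics can be sketched as follows. This is a minimal illustration, not Crawlie's Stats server: the module StatsSketch, its functions, and the event names are all hypothetical.

```elixir
defmodule StatsSketch do
  # Hypothetical illustration only; not Crawlie's Stats GenServer.
  use GenServer

  # Client API
  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, %{}, opts)

  # Record one occurrence of an event, e.g. :fetched or :failed.
  def record(server, event), do: GenServer.cast(server, {:record, event})

  # Read the counters accumulated so far; can be called while crawling.
  def snapshot(server), do: GenServer.call(server, :snapshot)

  # Server callbacks
  @impl true
  def init(counts), do: {:ok, counts}

  @impl true
  def handle_cast({:record, event}, counts) do
    {:noreply, Map.update(counts, event, 1, &(&1 + 1))}
  end

  @impl true
  def handle_call(:snapshot, _from, counts), do: {:reply, counts, counts}
end
```

Because the counters live in a separate process, the crawling stages can report events without blocking, and a caller can poll a snapshot to track progress in real time.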

Configuration

See the docs for supported options.

Installation

To install the package:

1. Add crawlie to your list of dependencies in mix.exs:

def deps do
  [{:crawlie, "~> 1.0.0"}]
end

2. Ensure crawlie is started before your application:

def application do
  [applications: [:crawlie]]
end
