Experimental UI

We believe that web scraping is a process. Extracting the first data items may seem easy, but delivering data reliably takes more effort, or a process that supports it.

Our aim is to provide you with the following services:

Schedule (start and stop) your spiders in the cloud
View running jobs (performance-based analysis)
View and validate scraped items for quality assurance and data analysis purposes
View individual items and compare them with the actual website

Setting it up

You can find setup examples here

At a high level, you need to:

Add the SendToUI pipeline to your list of item pipelines (before the encoder pipelines): {Crawly.Pipelines.Experimental.SendToUI, ui_node: :'ui@127.0.0.1'}
Organize an Erlang cluster so that Crawly nodes can find the CrawlyUI node. The example above uses the erlang-node-discovery application for this task, but any alternative would also work. To set up erlang-node-discovery:

add the following dependency to the deps section of mix.exs: {:erlang_node_discovery, git: "https://github.com/oltarasenko/erlang-node-discovery"}
add the following lines to config.exs:

          config :erlang_node_discovery,
            hosts: ["127.0.0.1", "crawlyui.com"],
            node_ports: [
              {:ui, 0}
            ]
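
To make the pipeline step above more concrete, here is a minimal sketch of how the item pipelines section of config.exs might look. Only the SendToUI entry and its position before the encoder come from this document; the surrounding pipelines (Validate, DuplicatesFilter, JSONEncoder, WriteToFile) and their options, including the field names, are assumptions chosen for illustration and should be adapted to your project.

```elixir
# config/config.exs (sketch; adjust to your own project)
import Config

config :crawly,
  pipelines: [
    # Assumed example pipelines; the field names are illustrative only.
    {Crawly.Pipelines.Validate, fields: [:title, :url]},
    {Crawly.Pipelines.DuplicatesFilter, item_id: :url},

    # From this document: send items to the CrawlyUI node.
    # It must come before any encoder pipeline, so that it receives
    # items as maps rather than as encoded strings.
    {Crawly.Pipelines.Experimental.SendToUI, ui_node: :"ui@127.0.0.1"},

    # Assumed encoder and writer pipelines, placed after SendToUI.
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, folder: "/tmp", extension: "jl"}
  ]
```

To check that the cluster is organized correctly, you can run Node.ping(:"ui@127.0.0.1") from an IEx session on a Crawly node. It returns :pong when the UI node is reachable and :pang otherwise.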

Testing it locally with a docker-compose

CrawlyUI ships with a docker-compose setup that brings up the UI, worker, and database nodes, so everything is ready for testing with a single command.

To try it:

clone crawly_ui repo: git clone git@github.com:oltarasenko/crawly_ui.git
build ui and worker nodes: docker-compose build
apply migrations: docker-compose run ui bash -c "/crawlyui/bin/ec eval \"CrawlyUI.ReleaseTasks.migrate\""
run it all: docker-compose up

Live demo

A live demo is available as well. It might be a bit unstable because of our continuous release process. Please give it a try and let us know what you think.

Live Demo

Items browser

One of the cool features of CrawlyUI is the items browser, which lets you compare extracted data with the target website loaded in an iframe. Because some sites block being loaded in an iframe, you may need a browser extension that ignores X-Frame-Options headers as a workaround. For example: Chrome extension
