Commit: Add CI badge
maiphuong-van committed Sep 4, 2020
1 parent 8a117ac commit d94f3db
Showing 1 changed file (README.md) with 12 additions and 9 deletions.
# CrawlyUI

[![Build Status](https://travis-ci.org/oltarasenko/crawly_ui.svg?branch=master)](https://travis-ci.org/github/oltarasenko/crawly_ui)
[![Coverage Status](https://coveralls.io/repos/github/oltarasenko/crawly_ui/badge.svg?branch=master)](https://coveralls.io/github/oltarasenko/crawly_ui?branch=master)

## Motivation for the project

Web scraping is simple. You can easily fetch many pages with curl or any other
good old client. However, we don't believe that :)!

We think that web scraping is complex if you do it commercially.
Most of the time we see that people can indeed extract pages with a simple curl-based
client (or Floki + something), but they normally fail when it comes to a clear
deliverable.

We think that crawling is hard when it comes to:
1. Scalability - imagine you have an SLA to deliver another million items before the EOD
2. Data quality - it's easy to get some pages, it's hard to extract the right data
3. JavaScript - most of the platforms would not even mention it :)
4. Coverage - what if you need to get literally all pages/products? Do you really
plan to check them all manually using the command line?

## Our approach

We have created a platform which allows you to manage jobs and to visualize items in the […]

And finally, it's important to be able to compare the extracted data with the data on
the target website.

We think that web scraping is a process! It involves development, debugging, QA
and finally maintenance. And that's what we're trying to achieve with the CrawlyUI
project.

## Trying it

[…]

You could run it locally using the following commands:

[…]

This should bring up the Crawly UI, a Crawly worker and a Postgres database for you.
Now you can access the server at localhost:80.
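Assuming the repository uses the same docker-compose setup that the testing section
below describes (an assumption, since the exact commands are not visible here), a
plausible way to bring the stack up locally is:

`docker-compose build`

`docker-compose up`

Check the repository's own instructions for the exact commands and compose file.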

## Gallery

1. Main page. Schedule jobs here!
![Main Page](gallery/main_page.png?raw=true)
[…]

You can test the full chain (crawly-ui, crawly and sample spiders) locally with:
`docker-compose build`
`docker-compose up`

The interface will be available at localhost:4000 for your tests.

## Item previews

If your item has a URL field, you will get nice preview capabilities with the
help of an iframe.
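As a minimal illustration (assuming items are flat maps and the field is literally
named `url`; neither is confirmed here), an item that would get an iframe preview
might look like this:

```elixir
# A hypothetical scraped item; the field names are purely illustrative.
# When an item carries a URL field, CrawlyUI can show the live page in an
# iframe next to the extracted data, so you can compare the two side by side.
item = %{
  "title" => "Example product",
  "price" => "9.99",
  "url" => "https://example.com/products/1"
}
```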

NOTE:

[…]
