pkakelas/cooper

cooper

Cooper is a small web crawler and search engine, written as an exercise in crawling URLs and searching the pages behind them.

It consists of:

  • A Golang crawler that fetches and parses pages, then stores the crawled data, weighted with TF-IDF, in an SQLite database.
  • A Golang backend that acts as the server for Cooper search.
  • A React.js frontend that implements the search site, called Cooper search.
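To make the TF-IDF step more concrete, here is a minimal sketch of the textbook weighting, tf(term, doc) × ln(N / df(term)). It is not taken from Cooper's source: the function names (`tokenize`, `termFrequency`, `inverseDocumentFrequency`, `tfIDF`) and the tokenization rule are assumptions made for illustration, and Cooper may use a different variant of the formula.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// tokenize lowercases the text and splits it on every rune that is
// neither a letter nor a digit. (Hypothetical helper, not Cooper's.)
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// termFrequency returns, for each term, its count divided by the
// total number of tokens in the document.
func termFrequency(tokens []string) map[string]float64 {
	tf := make(map[string]float64)
	for _, t := range tokens {
		tf[t]++
	}
	for t := range tf {
		tf[t] /= float64(len(tokens))
	}
	return tf
}

// inverseDocumentFrequency returns ln(N / df) for each term, where N is
// the number of documents and df the number of documents containing it.
func inverseDocumentFrequency(docs [][]string) map[string]float64 {
	df := make(map[string]int)
	for _, tokens := range docs {
		seen := make(map[string]bool)
		for _, t := range tokens {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	idf := make(map[string]float64, len(df))
	for t, n := range df {
		idf[t] = math.Log(float64(len(docs)) / float64(n))
	}
	return idf
}

// tfIDF returns one term-to-weight map per input document.
func tfIDF(docs []string) []map[string]float64 {
	tokenized := make([][]string, len(docs))
	for i, d := range docs {
		tokenized[i] = tokenize(d)
	}
	idf := inverseDocumentFrequency(tokenized)
	scores := make([]map[string]float64, len(docs))
	for i, tokens := range tokenized {
		scores[i] = make(map[string]float64)
		for term, tf := range termFrequency(tokens) {
			scores[i][term] = tf * idf[term]
		}
	}
	return scores
}

func main() {
	docs := []string{"Go crawler", "Go search"}
	for i, s := range tfIDF(docs) {
		fmt.Printf("doc %d: %v\n", i, s)
	}
}
```

A term that appears in every document (here "go") gets weight zero, while terms unique to one document are weighted highest, which is what makes the score useful for ranking search results.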

Crawling

Run the cooper crawler tool. Its help output lists the available flags:

Welcome to Cooper, a simple and lightweight crawler written in Golang!

           _=,_
        o_/6 /#\
        \__ |##/
        ='|--\
        /   #'-.
        \#|_   _'-. /
         |/ \_( # |"
        C/ ,--___/

Usage: 
  -base_url string 
      The base url where Cooper will start crawling.

  -include_query_params bool
        Should Cooper consider test.com?query and test.com as the same document?
        (default true)

  -limit int
        The maximum sites that Cooper should visit.
        (default 50)

  -load_existed_data
        Whether or not the existing crawled urls should be loaded.
        (default true)

  -server_mode
        Work in server mode for serving data to the cooper frontend.

  -threads int
        How many crawl threads should Cooper use.
        (default 2)
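The `-include_query_params` flag decides whether `test.com?query` and `test.com` count as the same document. One way a crawler could implement that kind of check is to reduce each URL to a key before comparing it against the set of visited pages. The sketch below is only an illustration: `documentKey` is a hypothetical name, and it assumes that a true value keeps the query string as part of the document's identity, which the help text does not state unambiguously.

```go
package main

import (
	"fmt"
	"net/url"
)

// documentKey returns the string used to decide whether two URLs point
// to the same document. The fragment is always dropped; the query string
// is dropped only when includeQueryParams is false.
// (Hypothetical helper; the flag semantics are an assumption.)
func documentKey(rawURL string, includeQueryParams bool) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	u.Fragment = ""
	u.RawFragment = ""
	if !includeQueryParams {
		u.RawQuery = ""
		u.ForceQuery = false
	}
	return u.String(), nil
}

func main() {
	for _, include := range []bool{true, false} {
		a, _ := documentKey("https://test.com?query", include)
		b, _ := documentKey("https://test.com", include)
		fmt.Printf("include=%v same document: %v\n", include, a == b)
	}
}
```

Keys produced this way can be stored in a `map[string]bool` of visited pages, so each document is fetched at most once regardless of how many links point to it.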

Cooper search

  • Start the backend with go run crawler -server_mode
  • Start the frontend with cd frontend && yarn start
  • If the browser does not open automatically, visit http://localhost:3000 and search.
