cooper

Cooper is a fancy program created as an exercise in crawling URLs and searching within the crawled pages.

It consists of:

A Golang crawler which crawls, parses and stores the crawled data using TF-IDF in an SQLite database.
A Golang backend which acts as a server for the Cooper search.
A React.JS frontend which implements the search site called Cooper search.

Crawling

Just use the cooper crawler tool

Welcome to Cooper, a simple and lightweight crawler written in Golang!

           _=,_
        o_/6 /#\
        \__ |##/
        ='|--\
        /   #'-.
        \#|_   _'-. /
         |/ \_( # |"
        C/ ,--___/

Usage:
  -base_url string
        The base url where Cooper will start crawling.

  -include_query_params bool
        Should Cooper consider test.com?query and test.com as the same document?
        (default true)

  -limit int
        The maximum number of sites that Cooper should visit.
        (default 50)

  -load_existed_data
        Whether or not the existing crawled urls should be loaded.
        (default true)

  -server_mode
        Work in server mode for serving data to the cooper frontend.

  -threads int
        How many crawl threads Cooper should use.
        (default 2)
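
The query-parameter option above comes down to how a URL is turned into a deduplication key. The following Go sketch shows one way to do that with the standard net/url package. The function canonicalKey and its dropQuery parameter are hypothetical names invented for this example; how Cooper actually maps -include_query_params onto this behaviour is not shown here.

```go
package main

import (
	"fmt"
	"net/url"
)

// canonicalKey (hypothetical helper) returns the string under which a
// crawled URL is deduplicated. When dropQuery is true, the query string
// is removed, so test.com?query and test.com map to the same key.
func canonicalKey(raw string, dropQuery bool) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Fragments are never sent to the server, so they are always dropped.
	u.Fragment = ""
	u.RawFragment = ""
	if dropQuery {
		u.RawQuery = ""
		u.ForceQuery = false
	}
	return u.String(), nil
}

func main() {
	seen := make(map[string]bool)
	urls := []string{
		"https://test.com/page",
		"https://test.com/page?query=1",
	}
	for _, raw := range urls {
		key, err := canonicalKey(raw, true)
		if err != nil {
			fmt.Println("skipping invalid url:", raw)
			continue
		}
		if seen[key] {
			fmt.Println("already visited:", raw)
			continue
		}
		seen[key] = true
		fmt.Println("visiting:", raw)
	}
}
```

Keeping the normalization in one function means the visited set and the stored documents use the same key, whichever way the option is set.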

Cooper search

Open the backend with go run crawler -server_mode
Open the frontend with cd frontend && yarn start
If the site does not open automatically, visit http://localhost:3000 and search.
