[WIP] GOPA, a spider written in Golang, for Elasticsearch. DEMO: http://index.elasticsearch.cn


What a Spider!

GOPA, A Spider Written in Go.

Join the chat at https://gitter.im/infinitbyte/gopa

Goal

  • Lightweight, low footprint: memory requirement should be under 100 MB
  • Easy to deploy: no runtime or dependencies required
  • Easy to use: no programming or scripting skills needed, features work out of the box

Screenshot

What a Spider! GOPA Spider!


How to use

Requirements

  • Elasticsearch v5.3+

Setup

First of all, get it. There are two options: download the pre-built package or compile it yourself.

Download the Pre-Built Package

Go to the Release page and download the right package for your platform.

Note: Darwin is for Mac

Compile The Package Manually

Requirements

  • Golang 1.9+

Supported platform

So far, we have:

gopa, the main program, a single binary.
gopa.yml, main configuration for gopa.

Required Config

Note: the Elasticsearch version should be >= v5.3

  • Enable the elastic module in gopa.yml and update the Elasticsearch settings:
- name: elastic
  enabled: true
  kv_enabled: true
  orm_enabled: true
  elasticsearch:
    endpoint: http://localhost:9200
    index_prefix: gopa-
    username: elastic
    password: changeme
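
The version requirement above can be checked before pointing Gopa at a cluster. The following is a minimal sketch, not part of Gopa itself: `meetsMinimum` is a hypothetical helper, and it assumes you have already obtained the version string yourself (for example from the `version.number` field that Elasticsearch returns at its root endpoint).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// meetsMinimum reports whether version (e.g. "5.3.0" or "v6.2.4") is at
// least major.minor. Only the first two numeric components are compared.
// This is an illustrative helper, not a Gopa API.
func meetsMinimum(version string, major, minor int) (bool, error) {
	parts := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected version format: %q", version)
	}
	gotMajor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, fmt.Errorf("bad major version in %q: %v", version, err)
	}
	gotMinor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, fmt.Errorf("bad minor version in %q: %v", version, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	// Sample version strings; replace with the value reported by your cluster.
	for _, v := range []string{"5.2.1", "5.3.0", "6.0.0"} {
		ok, err := meetsMinimum(v, 5, 3)
		if err != nil {
			fmt.Println(v, "error:", err)
			continue
		}
		fmt.Println(v, "meets >= 5.3:", ok)
	}
}
```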

Start

Besides Elasticsearch, Gopa doesn't require any other dependencies. Simply run ./gopa to start the program.

Gopa can be run as a daemon (note: only available on Linux and Mac):

Example
➜  gopa git:(master) ✗ ./bin/gopa --daemon
  ________ ________ __________  _____
 /  _____/ \_____  \\______   \/  _  \
/   \  ___  /   |   \|     ___/  /_\  \
\    \_\  \/    |    \    |  /    |    \
 \______  /\_______  /____|  \____|__  /
        \/         \/                \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///

[10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0 [gopa] started.

Run ./gopa -h to get the full list of command line options.

Example
➜  gopa git:(master) ✗ ./bin/gopa -h
  ________ ________ __________  _____
 /  _____/ \_____  \\______   \/  _  \
/   \  ___  /   |   \|     ___/  /_\  \
\    \_\  \/    |    \    |  /    |    \
 \______  /\_______  /____|  \____|__  /
        \/         \/                \/
[gopa] 0.10.0_SNAPSHOT
///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///

Usage of ./bin/gopa:
  -config string
        the location of config file (default "gopa.yml")
  -cpuprofile string
        write cpu profile to this file
  -daemon
        run in background as daemon
  -debug
        run in debug mode, gopa will quit with panic error
  -log string
        the log level,options:trace,debug,info,warn,error (default "info")
  -log_path string
        the log path (default "log")
  -memprofile string
        write memory profile to this file
  -pidfile string
        pidfile path (only for daemon)
  -pprof string
        enable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars
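
The usage text above is in the layout produced by Go's standard `flag` package. As a rough illustration of how options like these are typically declared, here is a small sketch covering three of them. It is not Gopa's actual source: the `options` struct and `parseOptions` function are hypothetical names, and only the flag names, defaults, and descriptions are taken from the help output.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options holds a subset of the command line settings shown in the help text.
type options struct {
	config   string
	daemon   bool
	logLevel string
}

// parseOptions parses args (without the program name) into options.
// It uses its own FlagSet so it can be called repeatedly, e.g. in tests.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("gopa-sketch", flag.ContinueOnError)
	fs.StringVar(&o.config, "config", "gopa.yml", "the location of config file")
	fs.BoolVar(&o.daemon, "daemon", false, "run in background as daemon")
	fs.StringVar(&o.logLevel, "log", "info", "the log level,options:trace,debug,info,warn,error")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		// The flag package has already printed the error and usage.
		os.Exit(2)
	}
	fmt.Printf("config=%s daemon=%v log=%s\n", o.config, o.daemon, o.logLevel)
}
```

Running this sketch with `-h` prints a usage listing in the same shape as the one above, which is why the two-line "flag, then indented description" layout is worth preserving.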

Stop

It's safe to press Ctrl+C to stop a running Gopa. Gopa will handle the rest, saving a checkpoint so that you can restore the job later. The world is still in your hand.

If you are running Gopa as daemon, you may stop it like this:

 kill -QUIT `pgrep gopa`

Configuration

UI

  • Search Console http://127.0.0.1:9000/
  • Admin Console http://127.0.0.1:9000/admin/

API

Architecture

What a Spider! GOPA Spider!

Contributing

You are sincerely and warmly welcome to play with this project. Contributions of any kind, from UI style to core features or just a piece of documentation, are welcome! Let's make it better.

License

Released under the Apache License, Version 2.0.
