Harvest

Portable log aggregation tool for middle-scale system operation/troubleshooting.

Harvest provides the hrv command with the following features.

  • Agentless.
  • Portable.
  • Only 1 config file.
  • Fetch various remote/local log data via SSH/exec/Kubernetes API. ( hrv fetch )
  • Output all fetched logs in the order of timestamp. ( hrv cat )
  • Stream various remote/local logs via SSH/exec/Kubernetes API. ( hrv stream )
  • Copy remote/local raw logs via SSH/exec. ( hrv cp )

Quick Start ( for Kubernetes )

$ hrv generate-k8s-config > cluster.yml
$ hrv stream -c cluster.yml --tag='kube_apiserver or coredns' --with-path --with-timestamp

Usage

🪲 Fetch and output remote/local log data

1. Set log sources (and log type) in config.yml

---
targetSets:
  -
    description: webproxy syslog
    type: syslog
    sources:
      - 'ssh://webproxy.example.com/var/log/syslog*'
    tags:
      - webproxy
      - syslog
  -
    description: webproxy NGINX access log
    type: combinedLog
    sources:
      - 'ssh://webproxy.example.com/var/log/nginx/access_log*'
    tags:
      - webproxy
      - nginx
  -
    description: app log
    type: regexp
    regexp: 'time:([^\t]+)'
    timeFormat: 'Jan 02 15:04:05' # Go time layout, or 'unixtime'
    timeZone: '+0900'
    sources:
      - 'ssh://app-1.example.com/var/log/ltsv.log*'
      - 'ssh://app-2.example.com/var/log/ltsv.log*'
      - 'ssh://app-3.example.com/var/log/ltsv.log*'
    tags:
      - app
  -
    description: db dump log
    type: regexp
    regexp: '"ts":"([^"]+)"'
    timeFormat: '2006-01-02T15:04:05.999-0700'
    sources:
      - 'ssh://db.example.com/var/log/tcpdp/eth0/dump*'
    tags:
      - db
      - query
  -
    description: PostgreSQL log
    type: regexp
    regexp: '^\[?(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w{3})'
    timeFormat: '2006-01-02 15:04:05 MST'
    multiLine: true
    sources:
      - 'ssh://db.example.com/var/log/postgresql/postgresql*'
    tags:
      - db
      - postgresql
  -
    description: local Apache access log
    type: combinedLog
    sources:
      - 'file:///path/to/httpd/access.log'
    tags:
      - httpd
  -
    description: api on Kubernetes
    type: k8s
    sources:
      - 'k8s://context-name/namespace/pod-name*'
    tags:
      - api
      - k8s

Use hrv configtest to validate the config file.

$ hrv configtest -c config.yml

2. Fetch target log data via SSH/exec/Kubernetes API ( hrv fetch )

$ hrv fetch -c config.yml --tag=webproxy,db

3. Output log data ( hrv cat )

$ hrv cat harvest-20181215T2338+900.db --with-timestamp --with-host --with-path | less -R

4. Count log data ( hrv count )

$ hrv count harvest-20191015T2338+900.db -g minute -g webproxy -b db
ts      webproxy db
2019-09-24 08:01:00     9618    5910
2019-09-24 08:02:00     9767    5672
2019-09-24 08:03:00     10815   7394
2019-09-24 08:04:00     11782   7109
2019-09-24 08:05:00     9896    6346
[...]
2019-09-24 08:24:00     11619   5646
2019-09-24 08:25:00     10541   6097
2019-09-24 08:26:00     11336   5264
2019-09-24 08:27:00     1102    5261
2019-09-24 08:28:00     1318    6660
2019-09-24 08:29:00     10362   5663
2019-09-24 08:30:00     11136   5373
2019-09-24 08:31:00     1748    1340

🪲 Stream remote/local logs

1. Set config.yml

2. Stream target logs via SSH/exec/Kubernetes API ( hrv stream )

$ hrv stream -c config.yml --with-timestamp --with-host --with-path --with-tag

🪲 Copy remote/local raw logs

1. Set config.yml

2. Copy remote/local raw logs to local directory via SSH/exec ( hrv cp )

$ hrv cp -c config.yml

--tag filter operators

The following operators can be used to filter targets by tag:

not, and, or, !, &&, ||

$ hrv stream -c config.yml --tag='webproxy or db' --with-timestamp --with-host --with-path

A comma (,) is converted to or.

$ hrv stream -c config.yml --tag='webproxy,db'

is converted to

$ hrv stream -c config.yml --tag='webproxy or db'

--source filter

Filter targets by matching their sources against a regular expression.

$ hrv fetch -c config.yml --source='app-[0-9].example'

Architecture

hrv fetch and hrv cat

[Diagram: hrv fetch and hrv cat architecture]

hrv stream

[Diagram: hrv stream architecture]

Installation

$ brew install k1LoW/tap/harvest

or

$ go get github.com/k1LoW/harvest/cmd/hrv

What is "middle-scale system"?

  • < 50 instances
  • < 1 million logs per hrv fetch

What if you are operating a large-scale/super-large-scale/hyper-large-scale system?

Consider an agent-based log collector/platform, a service mesh, or a distributed tracing platform instead.

Internal

Requirements

  • UNIX commands
    • date
    • find
    • grep
    • head
    • ls
    • tail
    • xargs
    • zcat
  • sudo
  • SQLite

WANT

  • tag DAG
  • Viewer / Visualizer

References

  • Hayabusa: A Simple and Fast Full-Text Search Engine for Massive System Log Data
    • Keeping things simple by combining commands.
    • Full-Text Search Engine using SQLite FTS.
  • stern: ⎈ Multi pod and container log tailing for Kubernetes
    • Multiple Kubernetes log streaming architecture.