Skip to content

elastic/go-grok

Repository files navigation

Grok

Grok is grok parsing library based on re2 regexp.

Usage

Basic usage:

g := grok.New()

// use custom patterns
patternDefinitions := map[string]string{
    // patterns can be nested
    "NGINX_HOST":         `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?`,
    // NGINX_NOTSEPARATOR is used in NGINX_HOST. IP and NUMBER are part of default pattern set
    "NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)

// compile grok before use, this will generate regex.Regex based on pattern and 
// subpatterns provided.
// this needs to be performed just once.
err := g.Compile("%{NGINX_HOST}", true)

res, err := g.ParseString("127.0.0.1:1234")

results in:

map[string]string {
    "destination.ip": "127.0.0.1", 
    "destination.port": "1234", 
}

Unnamed usage:

In this case we changed err := g.Compile("%{NGINX_HOST}", false) to err := g.Compile("%{NGINX_HOST}", true) allowing unnamed return matches. In case of unnamed match, definition name is used.

g := grok.New()

// use custom patterns
patternDefinitions := map[string]string{
    // patterns can be nested
    "NGINX_HOST":         `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port})?`,
    // NGINX_NOTSEPARATOR is used in NGINX_HOST. IP and NUMBER are part of default pattern set
    "NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)

// compile grok before use, this will generate regex.Regex based on pattern and 
// subpatterns provided
err := g.Compile("%{NGINX_HOST}", false)

res, err := g.ParseString("127.0.0.1:1234")

results in:

map[string]string {
    "NGINX_HOST": "127.0.0.1:1234", 
    "destination.ip": "127.0.0.1", 
    "IPV4": "127.0.0.1", 
    "destination.port": "1234", 
    "BASE10NUM": "1234", 
}

Typed arguments usage:

In this case we're marking destination.port as int using definition %{NUMBER:destination.port:int}.

g := grok.New()

// use custom patterns
patternDefinitions := map[string]string{
    "NGINX_HOST":         `(?:%{IP:destination.ip}|%{NGINX_NOTSEPARATOR:destination.domain})(:%{NUMBER:destination.port:int})?`,
    "NGINX_NOTSEPARATOR": `"[^\t ,:]+"`,
}
g.AddPatterns(patternDefinitions)

// compile grok before use, this will generate regex.Regex based on pattern and 
// subpatterns provided
err := g.Compile("%{NGINX_HOST}", true)

res, err := g.ParseTypedString("127.0.0.1:1234")

See type changed from map[string]string to map[string]interface{} and destination.port is now a number:

map[string]interface {} {
    "destination.ip": "127.0.0.1", 
    "destination.port": 1234, 
}

Benchmarks

Comparing to github.com/vjeantet/grok and more optimized version based on previous one github.com/trivago/grok

BenchmarkParseString-10                 	   15466	     76811 ns/op	    4578 B/op	       5 allocs/op
BenchmarkParseStringRegexp-10           	   15351	     77109 ns/op	    3840 B/op	       3 allocs/op
BenchmarkParseStringTrivago-10          	   15868	     76416 ns/op	    4593 B/op	       5 allocs/op
BenchmarkParseStringVjeanet-10          	   15548	     77111 ns/op	    5897 B/op	       6 allocs/op

BenchmarkNestedParseString-10           	   42201	     28908 ns/op	    3463 B/op	       4 allocs/op
BenchmarkNestedParseStringTrivago-10    	   41937	     28836 ns/op	    3449 B/op	       4 allocs/op
BenchmarkNestedParseStringVjeanet-10    	   41080	     29174 ns/op	    4045 B/op	       5 allocs/op

BenchmarkTypedParseString-10            	   39934	     29707 ns/op	    3851 B/op	       9 allocs/op
BenchmarkTypedParseStringTrivago-10     	   40146	     29238 ns/op	    3475 B/op	       6 allocs/op
BenchmarkTypedParseStringVjeanet-10     	   39931	     30616 ns/op	    4196 B/op	      14 allocs/op

Default set of patterns

This library comes with a default set of patterns defined in patterns/default.go file. You can include more predefined patterns from patterns/*.go like so

g := grok.New()
g.AddPatterns(patterns.Rails) // to include whole set
g.AddPattern(patterns.Ruby["RUBY_LOGLEVEL"]) // to include specific one

Default set consists of:

Name Example
WORD "hello", "world123", "test_data"
NOTSPACE "example", "text-with-dashes", "12345"
SPACE " ", "\t", " "
INT "123", "-456", "+789"
NUMBER "123", "456.789", "-0.123"
BOOL "true", "false", "true"
BASE10NUM "123", "-123.456", "0.789"
BASE16NUM "1a2b", "0x1A2B", "-0x1a2b3c"
BASE16FLOAT "0x1.a2b3", "-0x1A2B3C.D"
POSINT "123", "456", "789"
NONNEGINT "0", "123", "456"
GREEDYDATA "anything goes", "literally anything", "123 #@!"
QUOTEDSTRING ""This is a quote"", "'single quoted'"
UUID "123e4567-e89b-12d3-a456-426614174000"
URN "urn:isbn:0451450523", "urn:ietf:rfc:2648"

Network patterns

Name Example
IP "192.168.1.1", "2001:0db8:85a3:0000:0000:8a2e:0370:7334"
IPV6 "2001:0db8:85a3:0000:0000:8a2e:0370:7334", "
IPV4 "192.168.1.1", "10.0.0.1", "172.16.254.1"
IPORHOST "example.com", "192.168.1.1", "fe80::1ff:fe23:4567:890a"
HOSTNAME "example.com", "sub.domain.co.uk", "localhost"
EMAILLOCALPART "john.doe", "alice123", "bob-smith"
EMAILADDRESS "john.doe@example.com", "alice123@domain.co.uk"
USERNAME "user1", "john.doe", "alice_123"
USER "user1", "john.doe", "alice_123"
MAC "00:1A:2B:3C:4D:5E", "001A.2B3C.4D5E"
CISCOMAC "001A.2B3C.4D5E", "001B.2C3D.4E5F", "001C.2D3E.4F5A"
WINDOWSMAC "00-1A-2B-3C-4D-5E", "00-1B-2C-3D-4E-5F"
COMMONMAC "00:1A:2B:3C:4D:5E", "00:1B:2C:3D:4E:5F"
HOSTPORT "example.com:80", "192.168.1.1:8080"

Paths patterns

Name Example
UNIXPATH "/home/user", "/var/log/syslog", "/tmp/abc_123"
TTY "/dev/pts/1", "/dev/tty0", "/dev/ttyS0"
WINPATH "C:\Program Files\App", "D:\Work\project\file.txt"
URIPROTO "http", "https", "ftp"
URIHOST "example.com", "192.168.1.1:8080"
URIPATH "/path/to/resource", "/another/path", "/root"
URIQUERY "key=value", "search=query&active=true"
URIPARAM "?key=value", "?search=query&active=true"
URIPATHPARAM "/path?query=1", "/folder/path?valid=true"
PATH "/home/user/documents", "C:\Windows\system32", "/var/log/syslog"

Datetime patterns

Name Example
MONTH "January", "Feb", "March", "Apr", "May", "Jun", "Jul", "August", "September", "October", "Nov", "December"
MONTHNUM "01", "02", "03", ... "11", "12"
DAY "Monday", "Tuesday", ... "Sunday"
YEAR "1999", "2000", "2021"
HOUR "00", "12", "23"
MINUTE "00", "30", "59"
SECOND "00", "30", "60"
TIME "14:30", "23:59:59", "12:00:00", "12:00:60"
DATE_US "04/21/2022", "12-25-2020", "07/04/1999"
DATE_EU "21.04.2022", "25/12/2020", "04-07-1999"
ISO8601_TIMEZONE "Z", "+02:00", "-05:00"
ISO8601_SECOND "59", "30", "60.123"
TIMESTAMP_ISO8601 "2022-04-21T14:30:00Z", "2020-12-25T23:59:59+02:00", "1999-07-04T12:00:00-05:00"
DATE "04/21/2022", "21.04.2022", "12-25-2020"
DATESTAMP "04/21/2022 14:30", "21.04.2022 23:59", "12-25-2020 12:00"
TZ "EST", "CET", "PDT"
DATESTAMP_RFC822 "Wed Jan 12 2024 14:33 EST"
DATESTAMP_RFC2822 "Tue, 12 Jan 2022 14:30 +0200", "Fri, 25 Dec 2020 23:59 -0500", "Sun, 04 Jul 1999 12:00 Z"
DATESTAMP_OTHER "Tue Jan 12 14:30 EST 2022", "Fri Dec 25 23:59 CET 2020", "Sun Jul 04 12:00 PDT 1999"
DATESTAMP_EVENTLOG "20220421143000", "20201225235959", "19990704120000"

Syslog patterns

Name Example
SYSLOGTIMESTAMP "Jan 1 00:00:00", "Mar 15 12:34:56", "Dec 31 23:59:59"
PROG "sshd", "kernel", "cron"
SYSLOGPROG "sshd[1234]", "kernel", "cron[5678]"
SYSLOGHOST "example.com", "192.168.1.1", "localhost"
SYSLOGFACILITY "<1.2>", "<12345.13456>"
HTTPDATE "25/Dec/2024:14:33 4"