Skip to content

nekrassov01/access-log-parser

Repository files navigation

access-log-parser

CI codecov Go Reference Go Report Card

Simple access log parser utilities written in Go

Features

  • Flexible serialization of log lines
  • Streaming processing support
  • Line filtering by filter expressions like size < 100 method == GET remote_host =~ ^192.168.
  • Display column selection by field name
  • Line skipping by line number
  • Customization by handler functions
  • Various preset constructors for well-known log formats
  • LTSV format support

Supported log format

Note

Various AWS log formats are supported by default.

  • Apache common/combined log format
  • Apache common/combined log format with virtual host
  • Amazon S3 access log format
  • Amazon CloudFront access log format
  • AWS Application Load Balancer access log format
  • AWS Network Load Balancer access log format
  • AWS Classic Load Balancer access log format
  • LTSV format
  • TSV format

Usage

Example

Output format

Parsed log lines are sequentially output to Writer. After the parsing is finished, the total result is output.

// Result encapsulates the outcomes of parsing operations, detailing matched, unmatched, excluded,
// and skipped line counts, along with processing time and source information.
type Result struct {
	Total       int           `json:"total"`                // Total number of processed lines.
	Matched     int           `json:"matched"`              // Count of lines that matched the patterns.
	Unmatched   int           `json:"unmatched"`            // Count of lines that did not match any patterns.
	Excluded    int           `json:"excluded"`             // Count of lines excluded based on keyword search.
	Skipped     int           `json:"skipped"`              // Count of lines skipped explicitly.
	ElapsedTime time.Duration `json:"elapsedTime"`          // Processing time for the log data.
	Source      string        `json:"source"`               // Source of the log data.
	ZipEntries  []string      `json:"zipEntries,omitempty"` // List of processed zip entries, if applicable.
	Errors      []Errors      `json:"errors"`               // Collection of errors encountered during parsing.
	inputType   inputType     `json:"-"`                    // Type of input being processed.
}

// Errors stores information about log lines that couldn't be parsed
// according to the provided patterns. This helps in tracking and analyzing
// log lines that do not conform to expected formats.
type Errors struct {
	Entry      string `json:"entry,omitempty"` // Optional entry name if the log came from a zip file.
	LineNumber int    `json:"lineNumber"`      // Line number of the problematic log entry.
	Line       string `json:"line"`            // Content of the problematic log line.
}

The struct Result implements fmt.Stringer as follows:

/* SUMMARY */

+-------+---------+-----------+----------+---------+-------------+--------------------------------+
| Total | Matched | Unmatched | Excluded | Skipped | ElapsedTime | Source                         |
+-------+---------+-----------+----------+---------+-------------+--------------------------------+
|     5 |       4 |         1 |        0 |       0 | 1.16375ms   | sample_s3_contains_unmatch.log |
+-------+---------+-----------+----------+---------+-------------+--------------------------------+

Total     : Total number of log line processed
Matched   : Number of log line that successfully matched pattern
Unmatched : Number of log line that did not match any pattern
Excluded  : Number of log line that did not extract by filter expressions
Skipped   : Number of log line that skipped by line number

/* UNMATCH LINES */

+------------+------------------------------------------------------------------------------------------------------+
| LineNumber | Line                                                                                                 |
+------------+------------------------------------------------------------------------------------------------------+
|          4 | d45e67fa89b012c3a45678901b234c56d78a90f12b3456789a012345c6789d01 awsrandombucket89 [03/Feb/2019:03:5 |
|            | 4:33 +0000] 192.0.2.76 d45e67fa89b012c3a45678901b234c56d78a90f12b3456789a012345c6789d01 7B4A0FABBEXA |
|            | MPLE REST.GET.VERSIONING - "GET /awsrandombucket89?versioning HTTP/1.1" 200 - 113 - 33 - "-" "S3Cons |
|            | ole/0.4"                                                                                             |
+------------+------------------------------------------------------------------------------------------------------+

LineNumber : Line number of the log that did not match any pattern
Line       : Raw log line that did not match any pattern

Customize

The processing of each matched row can be overridden.

p := parser.NewRegexParser(ctx, os.Stdout, parser.Option{
	LineHandler: yourCustomLineHandler,
})

The following function type must be followed:

// LineHandler is a function type that processes each matched line.
type LineHandler func(labels, values []string, isFirst bool) (string, error)

Note

The reason we did not use maps is that the measured results were almost identical when the overhead of setting the order keep is taken into account. (However, we did not take a very rigorous benchmark.)

The following handlers are preset:

  • JSON (default): JSONLineHandler
  • Pretty JSON: PrettyJSONLineHandler
  • key=value pair: KeyValuePairLineHandler
  • LTSV: LTSVLineHandler
  • TSV: TSVLineHandler

Preset Constructors

Functions are provided by default to instantiate the following parsers:

  • Apache common/combined log format: NewApacheCLFRegexParser()
  • Apache common/combined log format with virtual host: NewApacheCLFWithVHostRegexParser()
  • Amazon S3 access log format: NewS3RegexParser()
  • Amazon CloudFront access log format: NewCFRegexParser()
  • AWS Application Load Balancer access log format: NewALBRegexParser()
  • AWS Network Load Balancer access log format: NewNLBRegexParser()
  • AWS Classic Load Balancer access log format: NewCLBRegexParser()

Sample

alpen is an application for parsing and encoding various access logs.

Todo

  • Support for time in filter expressions like: time < 1710141640
  • Refine the specification to allow KeyValuePairLineHandler to be used as logfmt

Author

nekrassov01

License

MIT