
Regexp parser plugin for Embulk

Overview

  • Plugin type: parser
  • Guess supported: no

Configuration

Obsolete configurations were removed in v1.2.0.

  • regexp: A Ruby regular expression with named captures. You can use Regexp::EXTENDED notation (the equivalent of /pat/x), which ignores whitespace and comments in the pattern.
  • use_timestamp: true or false. Default value is true. When true, an @timestamp field is added from the system time.
  • use_raw: true or false. Default value is true. When true, an @raw field is added containing the original line.
  • use_undefined: true or false. Default value is false. When true, @undefined is set to true if the regexp does not match the line, and to false otherwise. If you set use_undefined to true, set use_raw to true as well, because almost all other fields are empty for unmatched lines.
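
The options above can be illustrated with a small plain-Ruby sketch. This is not the plugin's actual implementation: `parse_line` and `SAMPLE_PATTERN` are hypothetical names introduced here, and filling unmatched fields with nil is an assumption made for illustration.

```ruby
# Hypothetical helper that mimics the documented options.
# It is NOT the plugin's real code, only a sketch of the described behavior.
def parse_line(line, pattern, use_timestamp: true, use_raw: true, use_undefined: false)
  # Regexp::EXTENDED lets the pattern span lines and contain comments.
  regexp = Regexp.new(pattern, Regexp::EXTENDED)
  match = regexp.match(line)

  record = {}
  # One field per named capture; nil when the line does not match (assumption).
  regexp.names.each { |name| record[name] = match ? match[name] : nil }

  record["@timestamp"] = Time.now if use_timestamp   # system time
  record["@raw"] = line if use_raw                   # original line
  record["@undefined"] = match.nil? if use_undefined # true when unmatched
  record
end

# Illustrative two-field pattern (not the one from the sample config).
SAMPLE_PATTERN = <<~'PATTERN'
  \A
  (?<client_ip>\S+)\s+    # whitespace and comments are ignored
  (?<response_code>\d+)
  \z
PATTERN

p parse_line("192.168.100.84 200", SAMPLE_PATTERN, use_undefined: true)
p parse_line("garbage", SAMPLE_PATTERN, use_undefined: true)
```

In the second call every named field is empty, which is why keeping @raw alongside @undefined is useful: the original line is the only way to inspect what failed to match.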

Example

in:
  type: file
  path_prefix: ./sample_data/sample

  parser:
    # line = %q!Jan 07 05:00:01 1696 851/05/ - 192.168.100.84 - MS%5Cadmin [07/Jan/2016:05:00:00 +0900] "GET http://hogehoge.example.com/time HTTP/1.1" - Mozilla%2F4.0+(compatible%3B+MSIE+7.0%3B+Windows+NT+6.1%3B+Win64%3B+x64%3B+Trident%2F5.0%3B+.NET+CLR+2.0.50727%3B+SLCC2) 200 244 997 allow -1 -1 -1 default text%2Fplain 0 - - 22D28522!
    charset: UTF-8
    newline: LF
    type: regexp

    regexp:
      \A
      (?<received_time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+
      (?<process_id>\d+)\s+
      (?<version>\S+)\s+
      (?<parent_proxy_server_ip>\S+)\s+
      (?<client_ip>\S+)\s+
      (?<computer_name>\S+)\s+
      (?<auth_user_name>\S+)\s+
      \[(?<connection_start_time>[^\]]+)\]\s+
      (?<method_refer>(?:[^\s"]+|"[^"]+"))\s+
      (?<request_info>\S+)\s+
      (?<user_agent>\S+)\s+
      (?<response_code>\S+)\s+
      (?<response_size>\d+)\s+
      (?<request_size>\d+)\s+
      (?<action>\S+)\s+
      (?<object_id>\S+)\s+
      (?<filter_reason_number>\S+)\s+
      (?<uri_category_number>\S+)\s+
      (?<group>\S+)\s+
      (?<content_type>\S+)\s+
      (?<ssl_internal_parameter>\S+)\s+
      (?<post_file_info>\S+)\s+
      (?<virus_name>\S+)\s+
      (?<checksum>\S+)\s*
      \Z
    
    use_raw: true
    use_timestamp: true
    use_undefined: true

out: {type: stdout}
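
Before running Embulk, it can help to try a pattern against a sample line in plain Ruby. The sketch below uses only the first eight groups of the pattern above and a shortened prefix of the sample line from the config comment; the constant names `LINE_PREFIX` and `PREFIX_REGEXP` are introduced here for illustration.

```ruby
# Shortened prefix of the sample line shown in the config comment.
LINE_PREFIX = 'Jan 07 05:00:01 1696 851/05/ - 192.168.100.84 - MS%5Cadmin [07/Jan/2016:05:00:00 +0900]'

# First eight groups of the example pattern.
# The /x flag (Regexp::EXTENDED) ignores the line breaks and indentation.
PREFIX_REGEXP = /
  \A
  (?<received_time>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+
  (?<process_id>\d+)\s+
  (?<version>\S+)\s+
  (?<parent_proxy_server_ip>\S+)\s+
  (?<client_ip>\S+)\s+
  (?<computer_name>\S+)\s+
  (?<auth_user_name>\S+)\s+
  \[(?<connection_start_time>[^\]]+)\]
/x

match = PREFIX_REGEXP.match(LINE_PREFIX)
if match
  # Print each named capture with its value.
  match.names.each { |name| puts "#{name}: #{match[name]}" }
else
  puts "no match"
end
```

Extending the pattern one group at a time in this way makes it easier to find which group stops matching.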

Install

$ embulk gem install embulk-parser-regexp

Preview

$ embulk preview sample_config/sample_config.yml

Customize

If you would like to customize this plugin, you can check its behavior with the following command, which loads the plugin from the local lib directory.

$ embulk preview sample_config/sample_config.yml -I lib