NAME

es-search.pl - Provides a CLI for quick searches of data in ElasticSearch daily indexes

VERSION

version 7.0

SYNOPSIS

es-search.pl [search string]

Options:

--help              print help
--manual            print full manual
--filter            Force filter context for all query elements
--show              Comma separated list of fields to display, default is ALL, switches to tab output
--tail              Continue the query until CTRL+C is sent
--top               Perform an aggregation on the fields, by a comma separated list of up to 2 items
--by                Perform an aggregation using the result of this, example: --by cardinality:src_ip
--with              Perform a sub aggregation on the query
--bg-filter         Only used if --top aggregation is significant_terms, applies a background filter
--match-all         Enables the ElasticSearch match_all operator
--interval          When running aggregations, wrap the aggregation in a date_histogram with this interval
--prefix            Takes "field:string" and enables the Lucene prefix query for that field
--exists            Field which must be present in the document
--missing           Field which must not be present in the document
--size              Result size, default is 20, aliased to -n and --limit
--all               Don't consider result size, just give me *everything*
--asc               Sort by ascending timestamp
--desc              Sort by descending timestamp (Default)
--sort              List of fields for custom sorting
--format            When --show isn't used, use this method for outputting the record, supported: json, jsonpretty, yaml
                    json assumes --no-decorators as we assume you're piping through jq
--pretty            Where possible, use JSON->pretty
--no-decorators     Do not show the header with field names in the query results
--no-header         Same as above
--no-implications   Don't attempt to imply filters from statistical aggregations
--fields            Display the field list for this index!
--bases             Display the index base list for this cluster.
--timestamp         Field to use as the date object, default: @timestamp

From App::ElasticSearch::Utilities:

--local         Use localhost as the elasticsearch host
--host          ElasticSearch host to connect to
--port          HTTP port for your cluster
--proto         Defaults to 'http', can also be 'https'
--http-username HTTP Basic Auth username
--http-password HTTP Basic Auth password (if not specified, and --http-username is, you will be prompted)
--password-exec Script to run to get the user's password
--noop          Any operations other than GET are disabled, can be negated with --no-noop
--timeout       Timeout to ElasticSearch, default 30
--keep-proxy    Do not remove any proxy settings from %ENV
--index         Index to run commands against
--base          For daily indexes, reference only those starting with "logstash"
                 (same as --pattern logstash-* or logstash-DATE)
--datesep       Date separator, default '.' (alias: --date-separator)
--pattern       Use a pattern to operate on the indexes
--days          If using a pattern or base, how many days back to go, default: 1

See also the "CONNECTION ARGUMENTS" and "INDEX SELECTION ARGUMENTS" sections from App::ElasticSearch::Utilities.
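
As a rough illustration of how --base, --days, and --datesep combine into daily index names, the following Python sketch builds such a list. It is not taken from App::ElasticSearch::Utilities: the function name expand_base is hypothetical, and the assumption that --days counts today as the first day is ours.

```python
from datetime import date, timedelta


def expand_base(base, days=1, datesep=".", today=None):
    """Return daily index names for `base`, newest first.

    Assumption (not confirmed by the manual): `days` counts today as
    the first day, so days=1 yields only today's index.
    """
    today = today or date.today()
    # Build a strftime pattern such as "%Y.%m.%d" from the separator.
    fmt = datesep.join(["%Y", "%m", "%d"])
    return [
        f"{base}-{(today - timedelta(days=offset)).strftime(fmt)}"
        for offset in range(days)
    ]


if __name__ == "__main__":
    # A fixed date keeps the example reproducible.
    for name in expand_base("logstash", days=3, today=date(2019, 1, 2)):
        print(name)
```

Passing a fixed date makes the result reproducible; the real tool works from the current date.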

From CLI::Helpers:

--data-file         Path to a file to write lines tagged with 'data => 1'
--tags              A comma separated list of tags to display
--color             Boolean, enable/disable color, default use git settings
--verbose           Incremental, increase verbosity (Alias is -v)
--debug             Show developer output
--debug-class       Show debug messages originating from a specific package, default: main
--quiet             Show no output (for cron)
--syslog            Generate messages to syslog as well
--syslog-facility   Default "local0"
--syslog-tag        The program name, default is the script name
--syslog-debug      Enable debug messages to syslog if in use, default false
--nopaste           Use App::Nopaste to paste output to configured paste service
--nopaste-public    Defaults to false, specify to use public paste services
--nopaste-service   Comma-separated App::Nopaste service, defaults to Shadowcat

DESCRIPTION

This tool takes a search string parameter and uses it to search the cluster. The search string uses the Lucene query string format.

Examples might include:

# Search for past 10 days vhost admin.example.com and client IP 1.2.3.4
es-search.pl --days=10 --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.4"

# Search for all apache logs with status 500
es-search.pl program:"apache" AND crit:500

# Search for all apache logs with status 500 show only file and out_bytes
es-search.pl program:"apache" AND crit:500 --show file,out_bytes

# Search a client IP subnet: 1.2.3.0 to 1.2.3.255, or 1.2.0.0 to 1.2.255.255
es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.3.0/24"
es-search.pl --size=100 dst:"admin.example.com" AND src_ip:"1.2.0/16"

# Show the top src_ip for 'www.example.com'
es-search.pl --base access dst:www.example.com --top src_ip

# Tail the access log for www.example.com 404's
es-search.pl --base access --tail --show src_ip,file,referer_domain dst:www.example.com AND crit:404

NAME

es-search.pl - Search a logging cluster for information

OPTIONS

  • help

    Print this message and exit

  • manual

    Print detailed help with examples

  • filter

    Forces filter context for all query parameters, the default is using query context.

  • show

    Comma separated list of fields to display in the dump of the data

      --show src_ip,crit,file,out_bytes
    
  • sort

    Use this option to sort your documents on fields other than the timestamp. Fields are given as a comma separated list:

      --sort field1,field2
    

    To specify per-field sort direction use:

      --sort field1:asc,field2:desc
    

    Using this option together with --asc, --desc or --tail is not possible.

  • format

    Output format to use when the full record is dumped. The default is 'yaml'; 'json' and 'jsonpretty' are also supported.

      --format json
    
  • tail

    Repeats the query every second until CTRL+C is hit, displaying new results. Due to the implementation, this mode enforces that only the most recent indices are searched. Also, given the output is continuous, you must specify --show with this option.

  • top

    Perform an aggregation returning the top field. Limited to a single field at this time. This option is not available when using --tail.

      --top src_ip
    

    You can override the default terms bucket aggregation by prefixing the parameter with the required bucket aggregation, e.g.:

      --top significant_terms:src_ip
    
  • by

    Perform a sub aggregation on the top terms aggregation and order by the result of this aggregation. Aggregation syntax is as follows:

      --by <type>:<field>
    

    A full example might look like this:

      $ es-search.pl --base access dst:www.example.com --top src_ip --by cardinality:acct
    

    This will show the top source IPs ordered by the cardinality (count of the distinct values) of accounts logging in from each source IP, instead of the source IPs with the most records.

    Supported sub aggregations and formats:

      cardinality:<field>
      min:<field>
      max:<field>
      avg:<field>
      sum:<field>
    
  • with

    Perform a subaggregation on the top terms and report that sub aggregation details in the output. The format is:

      --with <aggregation>:<field>:<size>
    

    The default size is 3. The default aggregation is 'terms'.

    field is the only required element.

    e.g.

      $ es-search.pl --base logstash error --top program --size 2 --by cardinality:host --with host:5
    

    This will show the top 2 programs with log messages containing the word error, ordered by the cardinality (distinct count) of hosts, and for each program list its top 5 hosts.

    Without the --with, the results might look like this:

      112314 sshd
      21224  ntp
    

    The --with option would expand that output to look like this:

      112314   host   bastion-804   12431   sshd
      112314   host   bastion-803   10009   sshd
      112314   host   bastion-805   9768    sshd
      112314   host   bastion-801   8789    sshd
      112314   host   bastion-802   4121    sshd
      21224    host   webapp-324    21223   ntp
      21224    host   mail-42       1       ntp
    

    This option may be specified multiple times. The result is more rows, not more columns, e.g.

      $ es-search.pl --base logstash error --top program --size 2 --by cardinality:host --with host:5 --with dc:2
    

    Produces:

      112314   dc     arlington     112314  sshd
      112314   host   bastion-804   12431   sshd
      112314   host   bastion-803   10009   sshd
      112314   host   bastion-805   9768    sshd
      112314   host   bastion-801   8789    sshd
      112314   host   bastion-802   4121    sshd
      21224    dc     amsterdam     21223   ntp
      21224    dc     la            1       ntp
      21224    host   webapp-324    21223   ntp
      21224    host   mail-42       1       ntp
    

    You may sub aggregate using any bucket aggregation as long as the aggregation provides a key element. Additionally, doc_count, score, and bg_count will be reported in the output.

    Other examples:

      --with significant_terms:crime
      --with cardinality:accts
      --with min:out_bytes
      --with max:out_bytes
      --with avg:out_bytes
      --with sum:out_bytes
      --with stats:out_bytes
      --with extended_stats:out_bytes
      --with percentiles:out_bytes
      --with percentiles:out_bytes:50,95,99
      --with histogram:out_bytes:1024
    
  • bg-filter

    Only used if the --top aggregation is significant_terms. Sets the background filter for the significant_terms aggregation.

      es-search.pl --top significant_terms:src_ip method:POST file:\/get\/sensitive_data --bg-filter method:POST
    
  • interval

    When performing aggregations, wrap those aggregations in a date_histogram of this interval. This helps flush out "what changed in the last hour."

  • match-all

    Apply the ElasticSearch "match_all" search operator to query on all documents in the index. This is the default with no search parameters.

  • prefix

    Takes a "field:string" combination. You can use multiple --prefix options; they will be "AND"'d together.

    Example:

      --prefix useragent:'Go '
    

    Will search for documents where the useragent field matches a prefix search on the string 'Go '

    JSON Equivalent is:

      { "prefix": { "useragent": "Go " } }
    
  • exists

    Filter results to those containing a valid, not null field

      --exists referer
    

    Only show records with a referer field in the document.

  • missing

    Filter results to those not containing a valid, not null field

      --missing referer
    

    Only show records without a referer field in the document.

  • bases

    Display a list of bases that can be used with the --base option.

    Use with --verbose to show age information on the indexes in each base.

  • fields

    Display a list of searchable fields

  • index

    Search only this index for data, may also be a comma separated list

  • days

    The number of days back to search, the default is 5

  • base

    Index base name, will be expanded using the days back parameter. The default is 'logstash' which will expand to 'logstash-YYYY.MM.DD'

  • timestamp

    The field in your documents that we'll treat as a "date" type in our queries.

    May also be specified in the ~/.es-utils.yaml file per index, or index base:

      ---
      host: es-readonly-01
      port: 9200
      meta:
        bro:
          timestamp: 'record_ts'
        mayans-2012.12.21:
          timestamp: 'end_of_the_world'
    

    Then running:

      # timestamp is set to '@timestamp', the default
      es-search.pl --base logstash --match-all
    
      # timestamp is set to 'record_ts', from ~/.es-utils.yaml
      es-search.pl --base bro --match-all
    
      # timestamp is set to '@timestamp', the default
      es-search.pl --base mayans --match-all
    
      # timestamp is set to 'end_of_the_world', from ~/.es-utils.yaml
      es-search.pl --index mayans-2012.12.21 --match-all
    
  • size

    The number of results to show, default is 20.

  • all

    If specified, ignore the --size parameter and show me everything within the date range I specified. In the case of --top, this limits the result set to 1,000,000 results.
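
The per-index timestamp lookup described under the timestamp option can be pictured as a simple keyed lookup with a default. The Python sketch below reproduces the four outcomes listed there. It is an illustration only: the dict stands in for the meta section of ~/.es-utils.yaml, and the name timestamp_field is hypothetical.

```python
DEFAULT_TIMESTAMP = "@timestamp"

# Mirrors the `meta` section of the ~/.es-utils.yaml example,
# written as a plain dict so the sketch needs no YAML parser.
META = {
    "bro": {"timestamp": "record_ts"},
    "mayans-2012.12.21": {"timestamp": "end_of_the_world"},
}


def timestamp_field(name, meta=META):
    """Return the timestamp field for an index name or index base.

    The lookup is by the exact name given to --index or --base;
    anything without an entry falls back to the default.
    """
    return meta.get(name, {}).get("timestamp", DEFAULT_TIMESTAMP)


if __name__ == "__main__":
    for name in ("logstash", "bro", "mayans", "mayans-2012.12.21"):
        print(f"{name} -> {timestamp_field(name)}")
```

Note that the base "mayans" does not pick up the setting for the index "mayans-2012.12.21", which matches the examples given for the timestamp option.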

Extended Syntax

The search string is pre-analyzed before being sent to ElasticSearch. The following plugins work to manipulate the query string and provide richer, more complete syntax for CLI applications.

App::ElasticSearch::Utilities::QueryString::AutoEscape

Provide an '=' prefix to a query string parameter to promote that parameter to a term filter.

This allows for exact matches of a field without worrying about escaping Lucene special characters.

E.g.:

user_agent:"Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"

Is evaluated into a weird query that doesn't do what you want. However:

=user_agent:"Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1"

Is translated into:

{ term => { user_agent => "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.0 Mobile/15E148 Safari/604.1" } }

Which provides an exact match to the term in the query.
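
A minimal Python sketch of the idea, assuming the parameter arrives as a single token. The function name auto_escape and the quote handling are illustrative and are not the plugin's actual code.

```python
def auto_escape(token):
    """Turn '=field:value' into a term filter; return None otherwise."""
    if not token.startswith("="):
        return None
    # Split on the first colon only, so colons in the value survive.
    field, sep, value = token[1:].partition(":")
    if not sep or not field:
        return None
    # Strip one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return {"term": {field: value}}


if __name__ == "__main__":
    print(auto_escape('=user_agent:"Mozilla/5.0 (iPhone; CPU iPhone OS 12_1_2 like Mac OS X)"'))
    print(auto_escape("user_agent:Mozilla"))  # no '=' prefix, so not promoted
```

Because the value is passed through as a literal term, characters such as "/", "(" and ";" need no escaping.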

App::ElasticSearch::Utilities::QueryString::Barewords

The following barewords are transformed:

or => OR
and => AND
not => NOT
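
In effect this is a lookup applied to each standalone token. A small Python sketch, assuming the search string has already been split into tokens (the name upcase_barewords is hypothetical):

```python
BAREWORDS = {"or": "OR", "and": "AND", "not": "NOT"}


def upcase_barewords(tokens):
    """Replace the standalone tokens or/and/not with Lucene operators."""
    # Only exact, whole-token matches are rewritten; 'field:and' is untouched.
    return [BAREWORDS.get(token, token) for token in tokens]


if __name__ == "__main__":
    print(" ".join(upcase_barewords(["program:apache", "and", "not", "crit:500"])))
```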

App::ElasticSearch::Utilities::QueryString::IP

If a field is an IP address that uses CIDR notation, it is expanded to a range query.

src_ip:10.0/8 => src_ip:[10.0.0.0 TO 10.255.255.255]
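
The expansion can be sketched in Python with the standard ipaddress module. This is an illustration under assumptions, not the plugin's code: it handles IPv4 only, pads shorthand such as 10.0 with zero octets, and the name expand_cidr is hypothetical.

```python
import ipaddress


def expand_cidr(field, value):
    """Expand 'a.b/len' style IPv4 CIDR notation into a Lucene range."""
    address, _, prefix = value.partition("/")
    octets = address.split(".")
    # Pad shorthand such as "10.0" to four octets ("10.0.0.0").
    octets += ["0"] * (4 - len(octets))
    network = ipaddress.ip_network(f"{'.'.join(octets)}/{prefix}", strict=False)
    return f"{field}:[{network.network_address} TO {network.broadcast_address}]"


if __name__ == "__main__":
    print(expand_cidr("src_ip", "10.0/8"))
    print(expand_cidr("src_ip", "1.2.3.0/24"))
```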

App::ElasticSearch::Utilities::QueryString::Ranges

This plugin translates some special comparison operators so you don't need to remember them anymore.

Example:

price:<100

Will translate into:

{ range: { price: { lt: 100 } } }

And:

price:>50,<100

Will translate to:

{ range: { price: { gt: 50, lt: 100 } } }

Supported Operators

gt via >, gte via >=, lt via <, lte via <=
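
The operator mapping above can be sketched in Python as follows. The sketch assumes numeric bounds separated by commas; parse_range is a hypothetical name and the plugin may accept more than this.

```python
import re

OPERATORS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}
# Two-character operators are listed first so ">=" is not read as ">".
_COMPARISON = re.compile(r"^(>=|<=|>|<)(-?\d+(?:\.\d+)?)$")


def parse_range(token):
    """Translate 'field:>50,<100' into a range query; None if not a range."""
    field, sep, spec = token.partition(":")
    if not sep:
        return None
    bounds = {}
    for part in spec.split(","):
        match = _COMPARISON.match(part)
        if match is None:
            return None
        operator, number = match.groups()
        bounds[OPERATORS[operator]] = float(number) if "." in number else int(number)
    return {"range": {field: bounds}}


if __name__ == "__main__":
    print(parse_range("price:<100"))
    print(parse_range("price:>50,<100"))
```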

App::ElasticSearch::Utilities::QueryString::Underscored

This plugin translates some special underscore surrounded tokens into the Elasticsearch Query DSL.

Implemented:

_prefix_

Example query string:

_prefix_:useragent:'Go '

Translates into:

{ prefix => { useragent => 'Go ' } }
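
A Python sketch of the same translation, assuming the whole expression arrives as one token with optional surrounding quotes on the value. The name parse_underscored is hypothetical and only _prefix_ is handled.

```python
MARKER = "_prefix_:"


def parse_underscored(token):
    """Translate "_prefix_:field:'value'" into a prefix query, else None."""
    if not token.startswith(MARKER):
        return None
    field, sep, value = token[len(MARKER):].partition(":")
    if not sep or not field:
        return None
    # Strip one pair of matching single or double quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]
    return {"prefix": {field: value}}


if __name__ == "__main__":
    print(parse_underscored("_prefix_:useragent:'Go '"))
```

The trailing space in 'Go ' is preserved, which is why the value is quoted in the example.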

App::ElasticSearch::Utilities::QueryString::FileExpansion

If the match ends in .dat, .txt, .csv, or .json then we attempt to read a file with that name and OR the condition:

$ cat test.dat
50  1.2.3.4
40  1.2.3.5
30  1.2.3.6
20  1.2.3.7

Or

$ cat test.csv
50,1.2.3.4
40,1.2.3.5
30,1.2.3.6
20,1.2.3.7

Or

$ cat test.txt
1.2.3.4
1.2.3.5
1.2.3.6
1.2.3.7

Or

$ cat test.json
{ "ip": "1.2.3.4" }
{ "ip": "1.2.3.5" }
{ "ip": "1.2.3.6" }
{ "ip": "1.2.3.7" }

We can source that file:

src_ip:test.dat      => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)
src_ip:test.json[ip] => src_ip:(1.2.3.4 1.2.3.5 1.2.3.6 1.2.3.7)

This makes it simple to use the --data-file output option and build queries based on previous queries. For .txt and .dat files, the delimiter for columns in the file must be either a tab or a null. For files ending in .csv, Text::CSV_XS is used for accurate parsing of the file format. Files ending in .json are considered to be newline-delimited JSON.

You can also specify the column of the data file to use; the default is the last column (-1). Column indexes are zero-based: the first column is index 0, the second is 1, and so on. The previous example can be rewritten as:

src_ip:test.dat[1]

or:

src_ip:test.dat[-1]
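
The column selection for .dat and .txt files can be sketched in Python as below. This is an approximation for illustration: the function names are hypothetical, the input is assumed to be tab or null delimited as stated above, and rows lacking the requested column are skipped, which is our choice rather than documented behaviour.

```python
import re


def column_values(lines, column=-1):
    """Pick one column (zero-based, default last) from delimited lines."""
    values = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cells = re.split(r"[\t\0]", line)
        try:
            values.append(cells[column])
        except IndexError:
            continue  # row has no such column; skipped in this sketch
    return values


def expand_file_match(field, lines, column=-1):
    """Build the OR'd match, e.g. 'src_ip:(1.2.3.4 1.2.3.5)'."""
    return f"{field}:({' '.join(column_values(lines, column))})"


if __name__ == "__main__":
    sample = ["50\t1.2.3.4\n", "40\t1.2.3.5\n", "30\t1.2.3.6\n", "20\t1.2.3.7\n"]
    print(expand_file_match("src_ip", sample))      # last column
    print(expand_file_match("src_ip", sample, 1))   # same column, by index
```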

For newline delimited JSON files, you need to specify the key path you want to extract from the file. If we have a JSON source file with:

{ "first": { "second": { "third": [ "bob", "alice" ] } } }
{ "first": { "second": { "third": "ginger" } } }
{ "first": { "second": { "nope":  "fred" } } }

We could search using:

actor:test.json[first.second.third]

Which would expand to:

{ "terms": { "actor": [ "alice", "bob", "ginger" ] } }

This option will iterate through the whole file and unique the elements of the list. They will then be transformed into an appropriate terms query.
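
The key path extraction can be sketched in Python as follows. The helper names are hypothetical, and sorting the unique values is an assumption made here so the output is deterministic; the sketch is not the plugin's implementation.

```python
import json


def values_at_path(document, path):
    """Follow a dotted key path; return the values found as a list."""
    node = document
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return []  # this record does not contain the path
        node = node[key]
    return node if isinstance(node, list) else [node]


def terms_from_ndjson(field, lines, path):
    """Collect unique values at `path` from newline-delimited JSON."""
    unique = set()
    for line in lines:
        if line.strip():
            unique.update(values_at_path(json.loads(line), path))
    return {"terms": {field: sorted(unique)}}


if __name__ == "__main__":
    source = [
        '{ "first": { "second": { "third": [ "bob", "alice" ] } } }',
        '{ "first": { "second": { "third": "ginger" } } }',
        '{ "first": { "second": { "nope":  "fred" } } }',
    ]
    print(json.dumps(terms_from_ndjson("actor", source, "first.second.third")))
```

The third record contributes nothing because it has no "third" key, so "fred" never reaches the terms list.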

App::ElasticSearch::Utilities::QueryString::Nested

Implement the proposed nested query syntax early. Example:

nested_path:"field:match AND string"

Meta-Queries

The --bases and --fields options, which list the index bases and fields, are helpful when building queries:

es-search.pl --bases

es-search.pl --fields

es-search.pl --base access --fields

AUTHOR

Brad Lhotsky brad@divisionbyzero.net

COPYRIGHT AND LICENSE

This software is Copyright (c) 2019 by Brad Lhotsky.

This is free software, licensed under:

The (three-clause) BSD License