Inaccurate results from Elasticsearch DSL query utilizing nested fields. #5041

Closed
esatterwhite opened this issue May 28, 2024 · 0 comments · Fixed by #5093
Labels: bug (Something isn't working), project:airmail

esatterwhite (Collaborator) commented May 28, 2024

Describe the bug
While converting from Elasticsearch to Quickwit, we encountered a query that returns a significant number of documents that do not appear to match it, whereas Elasticsearch returns zero (0) documents for the same query on an equivalent data set.

In many cases, the documents returned do not even include the field(s) specified by the query, let alone match the search query text.
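
One way to quantify this is to count the returned hits that lack the queried field altogether. The following is a minimal sketch, assuming the hits follow the Elasticsearch response shape (each hit carries the document under `_source`) and that nested fields are stored as nested objects; the helper names and the sample documents in the usage note are hypothetical:

```python
def has_path(doc, dotted_path):
    """Return True if the nested key path (e.g. 'o_err.message') exists in doc."""
    node = doc
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return True


def hits_missing_field(hits, dotted_path):
    """Return the hits whose _source does not contain the queried field at all."""
    return [hit for hit in hits if not has_path(hit.get("_source", {}), dotted_path)]
```

For example, given one hit with `{"o_err": {"message": "..."}}` and one with only `{"_line": "..."}`, `hits_missing_field(hits, "o_err.message")` returns just the second hit.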

Query Executed

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": [
              "o_err.message"
            ],
            "query": "Cannot\\ read\\ properties\\ of\\ null\\ \\(reading\\ exclusions\\)*",
            "lenient": true
          }
        }
      ]
    }
  }
}
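
For anyone reproducing this: the `query` string above backslash-escapes every space and parenthesis and ends with a `*` wildcard, presumably so that the whole message is matched as a single prefix term. The following is a minimal sketch of building the same request body programmatically, assuming only whitespace and parentheses need escaping (the full `query_string` reserved-character set is larger); `build_prefix_query` is a hypothetical helper, not part of any client library:

```python
import json
import re

# Assumption: only whitespace and parentheses are escaped, as in the query above.
_NEEDS_ESCAPE = re.compile(r"([\s()])")


def build_prefix_query(field, text):
    """Build a bool/must query_string body matching `text` as an escaped prefix."""
    escaped = _NEEDS_ESCAPE.sub(r"\\\1", text) + "*"
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "fields": [field],
                            "query": escaped,
                            "lenient": True,
                        }
                    }
                ]
            }
        }
    }


body = build_prefix_query(
    "o_err.message", "Cannot read properties of null (reading exclusions)"
)
print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` doubles each backslash, which yields the `\\ ` sequences seen in the query above.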

Expected behavior
Quickwit should return only documents that match the provided query.

Configuration:

{
  version: '0.8'
, indexing_settings: {
    commit_timeout_secs: 30
  , merge_policy: {
      type: 'stable_log'
    }
  }
, retention: {
    period: '30 days'
  , schedule: 'daily'
  }
, search_settings: {
    default_search_fields: [
      '_line'
    ]
  }
, doc_mapping: {
    mode: 'dynamic'
  , timestamp_field: '_ts'
  , dynamic_mapping: {
      tokenizer: 'raw'
    , fast: true
    }
  , field_mappings: [
      {
        name: '_account'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: '_app'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_bid'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_cluster'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: '_env'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_file'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_hash'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_host'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_ingester'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: '_ip'
      , type: 'ip'
      , fast: true
      }
    , {
        name: '_ipremote'
      , type: 'ip'
      , fast: true
      }
    , {
        name: '_label'
      , type: 'json'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: '_id'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: '_lid'
      , type: 'u64'
      , coerce: true
      , output_format: 'string'
      , fast: true
      }
    , {
        name: '_line'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'source_code_default'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_logtype'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_mac'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_meta'
      , type: 'json'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: 'line_size'
      , type: 'u64'
      , coerce: true
      , output_format: 'string'
      , fast: true
      }
    , {
        name: '_multiline'
      , type: 'bool'
      , fast: true
      }
    , {
        name: '_parsefail'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_parserids'
      , type: 'array<text>'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_rawline'
      , type: 'array<text>'
      , indexed: false
      }
    , {
        name: '_tag'
      , type: 'array<text>'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: '_ts'
      , type: 'datetime'
      , indexed: true
      , precision: 'milliseconds'
      , fast: true
      , input_formats: [
          'unix_timestamp'
        ]
      , output_format: 'unix_timestamp_millis'
      }
    , {
        name: 'action'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'agent'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'at'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'auth'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'bytes'
      , type: 'u64'
      , coerce: true
      , output_format: 'string'
      , fast: true
      }
    , {
        name: 'clientip'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'connect'
      , type: 'u64'
      , coerce: true
      , output_format: 'string'
      , fast: true
      }
    , {
        name: 'containerid'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'dyno'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'facility'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'fwd'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'hostname'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'httpversion'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'logsource'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'message'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'meta'
      , type: 'json'
      , tokenizer: 'raw'
      , fast: true
      }
    , {
        name: 'method'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'namespace'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'node'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'path'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'pid'
      , type: 'u64'
      , coerce: true
      , output_format: 'string'
      }
    , {
        name: 'pod'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'program'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'referrer'
      , type: 'text'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'request'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'response'
      , type: 'array<u64>'
      , coerce: true
      , output_format: 'string'
      , fast: true
      }
    , {
        name: 'space'
      , type: 'text'
      , record: 'position'
      , tokenizer: 'raw'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'status'
      , type: 'u64'
      , coerce: true
      , output_format: 'string'
      , fast: true
      }
    , {
        name: 'timestamp'
      , type: 'datetime'
      , indexed: true
      , precision: 'milliseconds'
      , fast: true
      , input_formats: [
          'unix_timestamp'
        , 'iso8601'
        , 'rfc3339'
        , 'rfc2822'
        , '%Y/%m/%d %H:%M:%S'
        , '%Y-%m-%d %H:%M' // 2024-04-02 14:28
        , '%Y-%m-%dT%H:%M' // 2024-04-02T14:28
        , '%Y-%m-%d %H:%M:S' // 2024-04-08 19:40:48
        , '%Y-%m-%dT%H:%M:%S.%f' // 2024-04-02T18:28:23.655961
        , '%Y-%m-%d %H:%M:%S.%f' // 2024-04-02 18:28:23.655961
        , '%Y-%m-%d %H:%M:%S:%f' // 2024-04-02 14:29:30:2930
        , '%Y-%m-%d %H:%M:%S %z' // 2024-05-02 09:20:12 -0700
        , '%d/%b/%Y:%H:%M:%S %z' // 02/Apr/2024:14:27:14 +0000
        , '%d/%b/%Y:%H:%M:%S.%f' // 02/Apr/2024:14:31:16.215
        , '%m-%d-%Y %H:%M:%S.%f' // 05-02-2024 16:22:25.065
        , '%b %e %H:%M:%S' // Apr 2 10:26:40
        ]
      , output_format: 'iso8601'
      }
    , {
        name: 'user'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'verb'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    , {
        name: 'k8sclusterkey'
      , type: 'text'
      , tokenizer: 'raw'
      , record: 'position'
      , fast: {
          normalizer: 'lowercase'
        }
      }
    ]
  }
}