[RFC] Query Visibility #11008

Open
deshsidd opened this issue Oct 30, 2023 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request RFC Issues requesting major changes Search:Query Insights Search Search query, autocomplete ...etc

Comments

@deshsidd
Contributor

Introduction

In the dynamic and ever-evolving realm of search and data retrieval, a deep understanding of search query patterns and system behavior during query executions is imperative. This knowledge serves as the foundation for enhancing query processing, optimizing the user experience, and bolstering the overall query performance.

We propose the implementation of "Query Visibility", an initiative designed to provide comprehensive visibility into user interactions with the OpenSearch search platform. This Request for Comments (RFC) outlines the problem we are trying to solve, along with the milestones for the major features we plan to deliver.

Problem Statement

Presently, OpenSearch offers little visibility into the performance of search queries. Without detailed insights, it is challenging to identify the specific areas within the query execution process where delays and bottlenecks occur. When query execution times are long, it is hard to pinpoint the root causes of the delays for users.

Proposal

In response to this challenge, we are proposing to introduce comprehensive metrics and tracing capabilities for the search queries. These measures will provide us with enhanced visibility into the execution of search queries, shedding light on various aspects of query performance. This newfound visibility will enable us to analyze query patterns, traffic volumes, popular query attributes, query structures, execution phases, and latency, among other critical metrics.

Through the introduction of Query Visibility, we aim to elevate our ability to gather, analyze, and derive actionable insights from user interactions with the search platform. This heightened visibility is set to drive improvements in query performance, search functionality, and system robustness, ultimately delivering an enhanced experience for OpenSearch users.

Roadmap/Features

This roadmap outlines the work-stream plan and its tentative target release versions. We will follow up with a detailed RFC for each feature, containing an in-depth proposal, and link those RFCs here.

1. Capturing Query Patterns

Target OS Release : 2.12/2.13
RFC Link :
Feature Highlights :

  • Extraction and categorization of query patterns from the search workload on the cluster.
  • Categorizing search queries by type.
  • Capturing the hierarchical structure of queries, including nested subqueries.
  • Extraction of field-related information, such as the number and types of fields.
  • Identifying the types of aggregations.
  • Capturing the number and types of fields as part of the response.
  • Utilizing the metrics framework to collect the above information.
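As a rough illustration of the pattern capture listed above, the sketch below walks a query DSL body and records the nested structure of query types, the fields referenced by leaf queries, and the aggregation types. The names `query_shape`, `aggregation_types`, and `COMPOUND_CLAUSES` are hypothetical and not part of OpenSearch. The sketch assumes leaf queries are keyed by field name (as in `term`, `match`, and `range`) and only descends into `bool` clauses, so it is not a complete DSL parser.

```python
from collections import Counter

# Hypothetical names for illustration only; none of this is OpenSearch code.
# bool clauses whose values hold nested sub-queries.
COMPOUND_CLAUSES = {"must", "should", "filter", "must_not"}


def query_shape(query, depth=0):
    """Return (indented shape lines, Counter of field names) for a query dict."""
    lines, fields = [], Counter()
    for query_type, body in query.items():
        lines.append("  " * depth + query_type)
        if query_type == "bool":
            for clause, subqueries in body.items():
                if clause not in COMPOUND_CLAUSES:
                    continue  # skip settings such as minimum_should_match
                if isinstance(subqueries, dict):
                    subqueries = [subqueries]  # a clause may hold a single query
                lines.append("  " * (depth + 1) + clause)
                for sub in subqueries:
                    sub_lines, sub_fields = query_shape(sub, depth + 2)
                    lines.extend(sub_lines)
                    fields.update(sub_fields)
        elif isinstance(body, dict):
            # Assumption: leaf queries (term, match, range) are keyed by field.
            fields.update(body.keys())
    return lines, fields


def aggregation_types(aggs):
    """Count aggregation types, descending into nested sub-aggregations."""
    types = Counter()
    for body in aggs.values():
        for key, value in body.items():
            if key in ("aggs", "aggregations"):
                types.update(aggregation_types(value))
            elif key != "meta":
                types[key] += 1
    return types


example_query = {
    "bool": {
        "must": [{"match": {"title": "search"}}],
        "filter": [
            {"term": {"status": "published"}},
            {"range": {"date": {"gte": "2023-01-01"}}},
        ],
    }
}
example_aggs = {
    "by_status": {
        "terms": {"field": "status"},
        "aggs": {"avg_length": {"avg": {"field": "length"}}},
    }
}

shape_lines, field_counts = query_shape(example_query)
print("\n".join(shape_lines))
print(dict(field_counts))
print(dict(aggregation_types(example_aggs)))
```

The shape deliberately drops the query values and keeps only types and field names, so two queries that differ only in their search terms map to the same pattern.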

2. Top N Queries

Release : 2.12/2.13

  • Provide the ability to identify the top N queries based on latency.
  • Capturing query execution traces facilitated by the Request Tracing Framework.
  • We aim to extend the tracing framework to capture resource utilization and add further dimensions for top N, such as memory and average CPU consumption. In the future, we will also include cache and disk usage.
  • Instrumentation in query execution phases (such as query & fetch) along with search operations (such as aggregations/filtering) to figure out the resources used by a query in various spans of its execution.
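One possible way to track the top N queries by latency with bounded memory is a fixed-size min-heap, sketched below. This is an illustration under assumptions, not the proposed design: `TopNQueries`, `QueryRecord`, and the `query_id` field are hypothetical names, and a real implementation would also need to handle concurrency and time windows.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueryRecord:
    # Hypothetical record type; ordering compares latency only.
    latency_ms: float
    query_id: str = field(compare=False)


class TopNQueries:
    """Keep the N slowest queries seen so far using a bounded min-heap."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n
        self._heap = []  # index 0 holds the fastest of the retained queries

    def record(self, query_id, latency_ms):
        entry = QueryRecord(latency_ms, query_id)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            # Evict the fastest retained query in favour of the slower one.
            heapq.heapreplace(self._heap, entry)

    def snapshot(self):
        """Return the retained queries, slowest first."""
        return sorted(self._heap, reverse=True)


tracker = TopNQueries(2)
for query_id, latency_ms in [("q1", 120.0), ("q2", 35.0), ("q3", 480.0), ("q4", 90.0)]:
    tracker.record(query_id, latency_ms)
print([(r.query_id, r.latency_ms) for r in tracker.snapshot()])
```

Each `record` call costs O(log N) and memory stays at N entries regardless of traffic volume. The same structure could be keyed on memory or CPU instead of latency once those dimensions are captured.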

3. Query Visibility Plugin (APIs and Dashboard)

Release : 2.12/2.13

  • Proposal to build a plugin, API, and dashboard to surface the top N queries, query latency in each query phase, the various query patterns, query shapes, and other resource information related to the query execution spans.
  • Create a dedicated, user-friendly query analytics dashboard within OpenSearch that provides real-time insights into query patterns, query performance, and resource consumption.
  • The real-time query monitoring dashboard will provide a live feed of incoming queries, their execution times, and resource consumption. This enables users to spot performance issues immediately.

Conclusion

In summary, "Query Visibility" is an initiative to enhance our understanding of query patterns and system behavior during query executions. By capturing query patterns, metrics, and traces, we aim to optimize query processing, improve user experience, and bolster system performance. This initiative promises to provide valuable insights for targeted enhancements, ultimately benefiting our users and the efficiency of our system.

@deshsidd deshsidd added enhancement Enhancement or improvement to existing feature or request untriaged labels Oct 30, 2023
@deshsidd
Contributor Author

cc @getsaurabh02

@deshsidd deshsidd added the RFC Issues requesting major changes label Oct 30, 2023
@getsaurabh02
Member

Thanks @deshsidd for putting out the proposal. "Query Visibility" is indeed of very high value to OpenSearch users, especially those running larger workloads who lack the ability to track down issues originating from performance and resource contention.

I believe the primary goal here should be to empower cluster administrators, OS users and community developers by offering them enhanced visibility and extended insights into search request interactions with the query engine of OpenSearch. Analyzing query traffic patterns and other essential performance (execution) metrics will empower system users to make informed decisions that optimize search results, mitigate system performance issues, and help drive community development for overall search experiences.

To attain this objective, I envision the convergence of three layers in alignment with the RFC plan:

1. Request and Resource Tracking Framework: The foundation layer, which allows efficient distributed request tracing in OS while providing an abstraction that makes tracing information available to generic frameworks such as OTel. Additionally, there is a need for comprehensive metric support to effectively monitor the cluster, allowing developers to instrument OpenSearch code paths with minimum overhead.

2. Instrumenting the Query Execution Engine: Since all search requests are executed in a distributed manner, across multiple phases (such as query/fetch), and support multiple operations (such as aggregations), there is a need for deeper instrumentation to surface the macro and micro visibility information. Macro visibility could provide high-level details of query patterns, query constructs used (such as bool), time taken, and resources utilized (such as memory) from the coordinator and data node perspective. Micro visibility will focus on granular insights into query execution, such as the breakdown of time/resource utilization across operation spans (such as top/leaf collector), field options, data structures touched (stored fields vs doc values), and segments visited.

3. One-stop point-in-time view with Query Visibility Plugin: While the instrumentation in point 2 above will surface the macro and micro visibility information to users in the form of traces and logs, which can be sent to another system for generating aggregations and insights, we would like to discuss the possibility of providing an end-to-end solution to OS users with a Query Visibility Plugin. This will allow users to view point-in-time or recent (let's say the last hour) information for real-time monitoring of incoming queries, their execution times, and resource consumption, so that they can spot performance issues immediately. The proposal to build a plugin and API along with a dashboard needs to be fleshed out separately, quantifying its need/wins for OS users. However, the high-level idea is to consume the metrics/trace information within the cluster to surface the top N queries, query latency in each query phase, and the various query patterns and query shapes in real time, based on need.
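To make the per-phase instrumentation from point 2 a little more concrete, here is a minimal sketch of timing spans around the query and fetch phases. It is an assumption-laden illustration, not the OpenSearch tracing framework or OTel API: `PhaseTimer` and `span` are hypothetical names, and only wall-clock time is recorded, not memory or CPU.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so tests can use a fake clock
        self.elapsed = defaultdict(float)

    @contextmanager
    def span(self, phase):
        start = self._clock()
        try:
            yield
        finally:
            # Record the time even if the wrapped phase raises.
            self.elapsed[phase] += self._clock() - start

    def slowest_phase(self):
        """Return the phase with the largest accumulated time."""
        return max(self.elapsed, key=self.elapsed.get)


timer = PhaseTimer()
with timer.span("query"):
    sum(range(100_000))  # stand-in for the query phase
with timer.span("fetch"):
    sum(range(1_000))  # stand-in for the fetch phase
print(dict(timer.elapsed))
```

Spans for finer-grained operations (for example, a collector inside the query phase) could be recorded the same way under separate phase names, giving the micro-level breakdown described above.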

cc : @backslasht @msfroh @nknize @dblock @rishabhmaurya @arjunkumargiri @Pallavi-AWS for feedback.

@ansjcy
Member

ansjcy commented Dec 1, 2023

I think there are 3 important things we need to think about more as part of the query insight vision.

The first aspect is about the overall framework. We want to design and build a robust framework that efficiently handles data collection, storage, processing, and export. We need to build this framework in a resource-efficient manner so that the impact on search performance is minimal. Also, we need to make the framework as extensible as possible so that new metrics, and the associated analysis and insights, can be added easily. I briefly discussed the framework design with the community in the top N queries RFC (#11186), and for better visibility, I created a separate issue to track the discussion around the generic query insight framework (#11429).

The second aspect is about identifying and adding metrics and features to query insight. Specifically, what metrics should we instrument to provide better "insights" into the queries? Those metrics could be about cache utilization, query rewrite, and any other stage of the query execution. In my opinion, we need to formulate a strategy for identifying those crucial metrics and come up with plans for their collection, processing, and export (and with the generic framework we created, adding those workflows should be easy). For example, we can profile certain concerning queries to learn which stage/workflow we should add datapoints to. I created #11431 to further discuss this topic.

With those metrics and the framework to aggregate and store that point-in-time data, we can think about what visualizations we can add to the query insight dashboard. IMO the dashboard should have multiple views based on different data categorizations, and for each type of data, it should provide an overview and a drill-down view for the search queries. The overview shows the trend (top queries by latency, query categorization) over time, and the drill-down view can provide a flow graph to show detailed data (like latency and resource distribution) across different stages of the query, using the data we added and identified in the second aspect. Issue opensearch-project/OpenSearch-Dashboards#5571 was created to further discuss query insight dashboards.

Projects
Status: In Progress
Status: 🆕 New

6 participants