
Queries in Pathivu can be used to aggregate log data and perform operations on it. Multiple queries can be piped up to perform a complex action.

Queries can be made through two media in Pathivu:

  • Pathivu Web: This is the web user interface for Pathivu. It provides a simple UI for querying Pathivu. Learn more about it here.
  • Katchi CLI: This is a command-line interface which allows seamless querying from the comfort of a terminal. Learn more about how to query using Katchi here.

Pathivu listens for query requests on port 5180 via an HTTP(S) server. This exposes a simple way of sending commands and queries to the Pathivu backend. It currently supports the following queries:



Index

  • Selection
  • Count
  • Average
  • Distinct
  • Limit
  • Pipe
  • Source

If you would like to see more queries, feel free to create an issue on our source repository, and we will get back to you.

Before getting started with the various query commands supported by Pathivu, let us enumerate the features available when calling a query (all of which appear in the request sketch below):

  • Ordering: Ascending or descending order of the log output can be decided at the query level.
  • Pagination: You can set a maximum number of logs to show in the output and start displaying logs from a particular offset.
  • Timestamping: You can declare a starting and an ending timestamp to only output the logs within a given time frame.
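
Below is a minimal Python sketch of sending such a request to the HTTP interface on port 5180. The /query path and all payload field names here are assumptions for illustration only; consult your Pathivu version for the exact API.

import requests

payload = {
    "query": 'count(level) as level_count by from',
    "count": 100,      # pagination: maximum number of logs in the output (assumed field name)
    "offset": 0,       # pagination: offset to start displaying logs from (assumed field name)
    "start_ts": 1,     # timestamping: starting timestamp (assumed field name)
    "end_ts": 3,       # timestamping: ending timestamp (assumed field name)
    "forward": False,  # ordering: ascending (True) or descending (False) (assumed field name)
}

resp = requests.post("http://localhost:5180/query", json=payload)
resp.raise_for_status()
print(resp.json())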

Selection


Consider the following JSON:

{
  "data": [
    {
      "ts": 3,
      "level": "warn",
      "details": {
        "message": "APIKEY not provided"
      },
      "from": "app"
    },
    {
      "ts": 2,
      "level": "fatal",
      "details": {
        "message": "Error connecting to database",
        "error_code": "500"
      },
      "error_code": "500",
      "from": "app"
    }
  ]
}

Pathivu supports two types of search queries, namely fuzzy search and structured query search.

Fuzzy Search

The message keyword is used for fuzzy searching in Pathivu. The fuzziness level is configurable. A simple example is given below:

message = "warn"

This query will give you the following output:

{
  "data": [
    {
      "ts": 3,
      "level": "warn",
      "details": {
        "message": "APIKEY not provided"
      },
      "from": "app"
    }
  ]
}
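
Conceptually, fuzzy search tolerates approximate matches up to the configured fuzziness level. The toy matcher below illustrates the idea with a similarity-ratio threshold; it is a sketch for intuition, not Pathivu's actual implementation.

from difflib import SequenceMatcher

def fuzzy_match(needle: str, line: str, fuzziness: float = 0.8) -> bool:
    # A token matches if its similarity ratio to the needle crosses the threshold.
    return any(
        SequenceMatcher(None, needle, token).ratio() >= fuzziness
        for token in line.split()
    )

print(fuzzy_match("warn", "warn APIKEY not provided"))  # True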

Go to index

Structured Search

Flattened JSON fields can be used for structured query searches and exact matches. In the following example, we query the logs that have error code 500 within an embedded structure.

details.error_code = "500"

This query will give you the following output:

{
  "data": [
    {
      "ts": 2,
      "level": "fatal",
      "details": {
        "message": "Error connecting to database",
        "error_code": "500"
      },
      "error_code": "500",
      "from": "app"
    }
  ]
}
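
The dotted path works because nested JSON can be addressed as flattened keys. The sketch below shows the intuition; Pathivu's real flattening logic may differ.

def flatten(obj: dict, prefix: str = "") -> dict:
    # Walk the object, joining nested keys with dots.
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

log = {"level": "fatal", "details": {"error_code": "500"}}
print(flatten(log))  # {'level': 'fatal', 'details.error_code': '500'}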

Go to index


Count


The count query can be used to get the total number of logs pertaining to a particular key in the log structure. Counts come in two types, namely base count and aggregated count.

Consider the following log JSON:

{
  "data": [
    {
      "ts": 3,
      "entry": {
        "details": {
          "error_code": "500",
          "message": "Error connecting to database"
        },
        "level": "fatal",
        "from": "backend"
      },
      "source": "demo"
    },
    {
      "ts": 2,
      "entry": {
        "details": {
          "error_code": "500",
          "message": "Error connecting to database"
        },
        "level": "fatal",
        "from": "app"
      },
      "source": "demo"
    },
    {
      "ts": 1,
      "entry": {
        "details": {
          "message": "APIKEY not provided"
        },
        "level": "warn",
        "from": "app"
      },
      "source": "demo"
    }
  ]
}

Base Count

Base count is a powerful command that can be used for counting the number of logs that exist for a particular field. For example, the query below gives the count of all logs with from defined.

count(from) as src

Running this command will give you the following output:

{
  "data": [
    {
      "src": "3"
    }
  ]
}

Aggregated Count

Aggregations can be added to the count query for grouping results according to a particular field. This can be achieved using the by keyword. For example, the following query will count all level fields and group them by the from field.

count(level) as level_count by from

The result will look like this:

{
  "data": [
    {
      "level_count": "2",
      "from": "app"
    },
    {
      "level_count": "1",
      "from": "backend"
    }
  ]
}
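
The grouping can be pictured as one counter per value of the by field. The Python analogy below reproduces the result above; it illustrates the semantics only and is not Pathivu's implementation.

from collections import Counter

entries = [
    {"level": "fatal", "from": "backend"},
    {"level": "fatal", "from": "app"},
    {"level": "warn", "from": "app"},
]

# count(level) as level_count by from
level_count = Counter(e["from"] for e in entries if "level" in e)
print(level_count)  # Counter({'app': 2, 'backend': 1})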

Structured JSON matching can also be used for counting. For example, the following command returns the count of all logs that have the error_code field inside the details sub-structure, grouped by from.

count(details.error_code) as error_code_count by from

The output looks like this:

{
  "data": [
    {
      "error_code_count": "1",
      "from": "backend"
    },
    {
      "error_code_count": "1",
      "from": "app"
    }
  ]
}

Go to index


Average


The avg keyword can be used to find the average of numerical fields in a structured logging scheme. It supports aggregations as well.

Let us consider the following log JSON:

{
  "data": [
    {
      "ts": 3,
      "entry": {
        "country": "Afghanistan",
        "details": {
          "latency": 9.82
        },
        "level": "info"
      },
      "source": "demo"
    },
    {
      "ts": 2,
      "entry": {
        "country": "Pakistan",
        "details": {
          "latency": 6.45
        },
        "level": "info"
      },
      "source": "demo"
    },
    {
      "ts": 1,
      "entry": {
        "country": "India",
        "details": {
          "latency": 3.26
        },
        "level": "info"
      },
      "source": "demo"
    }
  ]
}

Base Average

The following query will find the average latency of your service.

avg(details.latency) as average_latency

The output looks like this:

{
  "data": [
    {
      "average_latency": "6.51"
    }
  ]
}
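
Here 6.51 is simply (9.82 + 6.45 + 3.26) / 3 = 19.53 / 3, rounded to two decimal places.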

Aggregated Average

Average also supports aggregations. For example, the following query will compute the country-wise average latency.

avg(details.latency) as average_latency by country

The output looks like this:

{
  "data": [
    {
      "average_latency": "3.26",
      "country": "India"
    },
    {
      "average_latency": "6.45",
      "country": "Pakistan"
    },
    {
      "average_latency": "9.82",
      "country": "Afghanistan"
    }
  ]
}

Go to index


Distinct


Distinct elements can be found, aggregated and printed using the distinct keyword. The distinct command also provides a feature to count the number of distinct logs matched.

Base Distinct

Following the example from count, the command below will give you a list of all distinct levels in the logs.

distinct(level)

The output will look something like this:

{
  "data": [
    "fatal",
    "warn"
  ]
}

Count Distinct

In order to find distinct value counts, you can use the distinct_count keyword. The following command will give you a list of all distinct levels in the logs along with their count.

distinct_count(level)

The output will look something like this:

{
  "data": [
    {
      "fatal": 2
    },
    {
      "warn": 1
    }
  ]
}
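
As a mental model, distinct behaves like building a set and distinct_count like building a frequency table. A Python analogy, for illustration only:

from collections import Counter

levels = ["fatal", "fatal", "warn"]

print(sorted(set(levels)))  # distinct(level)       -> ['fatal', 'warn']
print(Counter(levels))      # distinct_count(level) -> Counter({'fatal': 2, 'warn': 1})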

Structured JSON matching can also be used here. For example, the following command will return a list of all distinct error codes along with their count.

distinct_count(details.error_code)

The output looks like this:

{
  "data": [
    {
      "500": 2
    }
  ]
}

Go to index


Limit


The limit command can be used to limit the number of responses returned from a Pathivu query. For example, on the logs provided in the Average section, the following query can be used to limit the number of responses:

limit 1

The output will look like this:

{
  "data": [
    {
      "ts": 3,
      "entry": {
        "country": "Afghanistan",
        "details": {
          "latency": 9.82
        },
        "level": "info"
      },
      "source": "demo"
    }
  ]
}

By default, limits are applied from the latest timestamp in Pathivu.
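
In other words, the behaviour resembles sorting by timestamp in descending order and then truncating, as in this illustration:

logs = [{"ts": 3}, {"ts": 2}, {"ts": 1}]
print(sorted(logs, key=lambda e: e["ts"], reverse=True)[:1])  # [{'ts': 3}]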

Go to index


Pipe


Pathivu supports piping as well. Here, you can combine two or more queries, one after the other. This gives Pathivu immense querying capabilities.

Below are a couple of examples of how piping can be used to make powerful and meaningful queries. Note that all of the queries are performed on the following JSON:

{
  "data": [
    {
      "ts": 3,
      "entry": {
        "country": "Afghanistan",
        "details": {
          "latency": 9.82
        },
        "level": "info",
        "transaction": "succeeded"
      },
      "source": "demo"
    },
    {
      "ts": 2,
      "entry": {
        "country": "Pakistan",
        "details": {
          "latency": 6.45
        },
        "level": "info",
        "transaction": "failed"
      },
      "source": "demo"
    },
    {
      "ts": 1,
      "entry": {
        "country": "India",
        "details": {
          "latency": 3.26
        },
        "level": "info",
        "transaction": "succeeded"
      },
      "source": "demo"
    }
  ]
}

  • The following query will give you the count of failed transactions grouped by country.
transaction="failed" | distinct_count(country) as failed_transaction_country_wise

So the output will look something like this:

{
  "data": [
    {
      "Pakistan": 1
    }
  ]
}

  • The following command will give you the count of all info-level logs.
level="info" | count(level) as level_count

The output looks like this:

{
  "data": [
    {
      "level_count": "3"
    }
  ]
}
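
Each stage of a pipe consumes the previous stage's output, much like a filter feeding an aggregation. The Python analogy below mirrors the first example; it illustrates the semantics only.

from collections import Counter

entries = [
    {"country": "Afghanistan", "transaction": "succeeded"},
    {"country": "Pakistan", "transaction": "failed"},
    {"country": "India", "transaction": "succeeded"},
]

# transaction="failed" | distinct_count(country)
failed = (e for e in entries if e["transaction"] == "failed")
print(Counter(e["country"] for e in failed))  # Counter({'Pakistan': 1})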

Go to index


Source


Users can specify the sources they would like to search on using the source keyword. Multiple sources are separated using commas.

source=master,slave

This will output all logs whose source is either master or slave.

Go to index