Permalink
Fetching contributors…
Cannot retrieve contributors at this time
481 lines (357 sloc) 13.4 KB

Getting started with YAQL

Introduction to YAQL

YAQL (Yet Another Query Language) is an embeddable and extensible query language that allows performing complex queries against arbitrary data structures. Embeddable means that you can easily integrate a YAQL query processor in your code. Queries come from your DSLs (domain specific language), user input, JSON, and so on. YAQL has a vast and comprehensive standard library of functions that can be used to query data of any complexity. Also, YAQL can be extended even further with user-specified functions. YAQL is written in Python and is distributed through PyPI.

YAQL was inspired by Microsoft LINQ for Objects and its first aim is to execute expressions on the data in memory. A YAQL expression has the same role as an SQL query to databases: search and operate the data. In general, any SQL query can be transformed to a YAQL expression, but YAQL can also be used for computational statements. For example, 2 + 3*4 is a valid YAQL expression.

Moreover, in YAQL, the following operations are supported out of the box:

  • Complex data queries
  • Creation and transformation of lists, dicts, and arrays
  • String operations
  • Basic math operations
  • Conditional expression
  • Date and time operations (will be supported in yaql 1.1)

An interesting thing in YAQL is that everything is a function and any function can be customized or overridden. This is true even for built-in functions. YAQL cannot call any function that was not explicitly registered to be accessible by YAQL. The same is true for operators.

YAQL can be used in two different ways: as an independent CLI tool, and as a Python module.

Installation

You can install YAQL in two different ways:

  1. Using PyPi:

    pip install yaql
  2. Using your system package manager (for example Ubuntu):

    sudo apt-get install python-yaql

HowTo: Use YAQL in Python

You can operate with YAQL from Python in three easy steps:

  • Create a YAQL engine
  • Parse a YAQL expression
  • Execute the parsed expression

Note

The engine should be created once for a set of operators and parser rules. It can be reused for all queries.

Here is an example how it can be done with the YAML file which looks like:

customers_city:
  - city: New York
    customer_id: 1
  - city: Saint Louis
    customer_id: 2
  - city: Mountain View
    customer_id: 3
customers:
  - customer_id: 1
    name: John
    orders:
      - order_id: 1
        item: Guitar
        quantity: 1
  - customer_id: 2
    name: Paul
    orders:
      - order_id: 2
        item: Banjo
        quantity: 2
      - order_id: 3
        item: Piano
        quantity: 1
  - customer_id: 3
    name: Diana
    orders:
      - order_id: 4
        item: Drums
        quantity: 1
import yaql
import yaml

data_source = yaml.load(open('shop.yaml', 'r'))

engine = yaql.factory.YaqlFactory().create()

expression = engine(
    '$.customers.orders.selectMany($.where($.order_id = 4))')

order = expression.evaluate(data=data_source)

Content of the order will be the following:

[{u'item': u'Drums', u'order_id': 4, u'quantity': 1}]

YAQL grammar

YAQL has a very simple grammar:

  • Three keywords as in JSON: true, false, null
  • Numbers, such as 12 and 34.5
  • Strings: 'foo' and "bar"
  • Access to the data: $variable, $
  • Binary and unary operators: 2 + 2, -1, 1 != 2, $list[1]

Data access

Although YAQL expressions may be self-sufficient, the most important value of YAQL is its ability to operate on user-passed data. Such data is placed into variables which are accessible in a YAQL expression as $<variable_name>. The variable_name can contain numbers, English alphabetic characters, and underscore symbols. The variable_name can be empty, in this case you will use $. Variables can be set prior to executing a YAQL expression or can be changed during the execution of some functions.

According to the convention in YAQL, function parameters, including input data, are stored in variables like $1, $2, and so on. The $ stands for $1. For most cases, all function parameters are passed in one piece and can be accessed using $, that is why this variable is the most used one in YAQL expressions. Besides, some functions are expected to get a YAQL expression as one of the parameters (for example, a predicate for collection sorting). In this case, passed expression is granted access to the data by $.

Strings

In YAQL, strings can be enclosed in " and '. Both types are absolutely equal and support all standard escape symbols including unicode code-points. In YAQL, both types of quotes are useful when you need to include one type of quotes into the other. In addition, ` is used to create a string where only one escape symbol ` is possible. This is especially suitable for regexp expressions.

If a string does not start with a digit or __ and contains only digits, _, and English letters, it is called identifier and can be used without quotes at all. An identifier can be used as a name for function, parameter or property in $obj.property case.

Functions

A function call has syntax of functionName(functionParameters). Brackets are necessary even if there are no parameters. In YAQL, there are two types of parameters:

  • Positional parameters

    foo(1, 2, someValue)

  • Named parameters

    foo(paramName1 => value1, paramName2 => 123)

Also, a function can be called using both positional and named parameters: foo(1, false, param => null). In this case, named arguments must be written after positional arguments. In name => value, name must be a valid identifier and must match the name of parameter in function definition. Usually, arguments can be passed in both ways, but named-only parameters are supported in YAQL since Python 3 supports them.

Parameters can have default values. Named parameters is a good way to pass only needed parameters and skip arguments which can be use default values, also you can simply skip parameters in function call: foo(1,,3).

In YAQL, there are three types of functions:

  • Regular functions: max(1,2)

  • Method-like functions, which are called by specifying an object for which the

    function is called, followed by a dot and a function call: stringValue.toUpper()

  • Extension methods, which can be called both ways: len(string), string.len()

YAQL standard library contains hundreds of functions which belong to one of these types. Moreover, applications can add new functions and override functions from the standard library.

Operators

YAQL supports the following types of operators out of the box:

  • Arithmetic: +. -, *, /, mod
  • Logical: =, !=, >=, <=, and, or, not
  • Regexp operations: =~, !~
  • Method call, call to the attribute: ., ?.
  • Context pass: ->
  • Indexing: [ ]
  • Membership test operations: in

Data structures

YAQL supports these types out of the box:

  • Scalars

    YAQL supports such types as string, int. boolean. Datetime and timespan will be available after yaql 1.1 release.

  • Lists

    List creation: [1, 2, value, true] Alternative syntax: list(1, 2, value, true) List elemenets can be accesessed by index: $list[0]

  • Dictionaries

    Dict creation: {key1 => value1, true => 1, 0 => false} Alternative syntax: dict(key1 => value1, true => 1, 0 => false) Dictionaries can be indexed by keys: $dict[key]. Exception will be raised if the key is missing in the dictionary. Also, you can specify value which will be returned if the key is not in the dictionary: dict.get(key, default).

    Note

    During iteration through the dictionary, key can be called like: $.key

  • (Optional) Sets

    Set creation: set(1, 2, value, true)

Note

YAQL is designed to keep input data unchanged. All the functions that look as if they change data, actually return an updated copy and keep the original data unchanged. This is one reason why YAQL is thread-safe.

Basic YAQL query operations

It is obvious that we can compare YAQL with SQL as they both are designed to solve similar tasks. Here we will take a look at the YAQL functions which have a direct equivalent with SQL.

We will use YAML from HowTo: use YAQL in Python as a data source in our examples.

Filtering

Note

Analog is SQL WHERE

The most common query to the data sets is filtering. This is a type of query which will return only elements for which the filtering query is true. In YAQL, we use where to apply filtering queries.

yaql> $.customers.where($.name = John)
- customer_id: 1
  name: John
  orders:
    - order_id: 1
      item: Guitar
      quantity: 1

Ordering

Note

Analog is SQL ORDER BY

It may be required to sort the data returned by some YAQL query. The orderBy clause will cause the elements in the returned sequence to be sorted according to the default comparer for the type being sorted. For example, the following query can be extended to sort the results based on the profession property.

yaql> $.customers.orderBy($.name)
- customer_id: 3
  name: Diana
  orders:
    - order_id: 4
      item: Drums
      quantity: 1
- customer_id: 1
  name: John
  orders:
    - order_id: 1
      item: Guitar
      quantity: 1
- customer_id: 2
  name: Paul
  orders:
    - order_id: 2
      item: Banjo
      quantity: 2
    - order_id: 3
      item: Piano
      quantity: 1

Grouping

Note

Analog is SQL GROUP BY

The groupBy clause allows you to group the results according to the key you specified. Thus, it is possible to group example json by gender.

yaql> $.customers.groupBy($.name)
- Diana:
  - customer_id: 3
    name: Diana
    orders:
      - order_id: 4
        item: Drums
        quantity: 1
- Paul:
  - customer_id: 2
    name: Paul
    orders:
      - order_id: 2
        item: Banjo
        quantity: 2
      - order_id: 3
        item: Piano
        quantity: 1
- John:
  - customer_id: 1
    name: John
    orders:
      - order_id: 1
        item: Guitar
        quantity: 1

So, here you can see the difference between groupBy and orderBy. We use the same parameter name for both operations, but in the output for groupBy name is located in additional place before everything else.

Selecting

Note

Analog is SQL SELECT

The select method allows building new objects out of objects of some collection. In the following example, the result will contain a list of name/orders pairs.

yaql> $.customers.select([$.name, $.orders])
- John:
  - order_id: 1
    item: Guitar
    quantity: 1
- Paul:
  - order_id: 2
    item: Banjo
    quantity: 2
  - order_id: 3
    item: Piano
    quantity: 1
- Diana:
  - order_id: 4
    item: Drums
    quantity: 1

Joining

Note

Analog is SQL JOIN

The join method creates a new collection by joining two other collections by some condition.

yaql> $.customers.join($.customers_city, $1.customer_id = $2.customer_id, {customer=>$1.name, city=>$2.city, orders=>$1.orders})
- customer: John
  city: New York
  orders:
    - order_id: 1
      item: Guitar
      quantity: 1
- customer: Paul
  city: Saint Louis
  orders:
    - order_id: 2
      item: Banjo
      quantity: 2
    - order_id: 3
      item: Piano
      quantity: 1
- customer: Diana
  city: Mountain View
  orders:
    - order_id: 4
      item: Drums
      quantity: 1

Take an element from collection

YAQL supports two general methods that can help you to take elements from collection skip and take.

yaql> $.customers.skip(1).take(2)
- customer_id: 2
  name: Paul
  orders:
    - order_id: 2
      item: Banjo
      quantity: 2
    - order_id: 3
      item: Piano
      quantity: 1
- customer_id: 3
  name: Diana
  orders:
    - order_id: 4
      item: Drums
      quantity: 1

First element of collection

The first method will return the first element of a collection.

yaql> $.customers.first()
- customer_id: 1
  name: John
  orders:
    - order_id: 1
      item: Guitar
      quantity: 1