<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://assets.vespa.ai/logos/Vespa-logo-green-RGB.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg">
  <img alt="#Vespa" width="200" src="https://assets.vespa.ai/logos/Vespa-logo-dark-RGB.svg" style="margin-bottom: 25px;">
</picture>

# pyvespa Query builder

The Query Builder in pyvespa provides a more 'pythonic' way of building YQL query strings. This guide goes through constructing queries for the [Covid-19 Vespa instance](https://cord19.vespa.ai/).

Simple queries can be built with string formatting, but complex queries that contain many dynamic parameters quickly become difficult to build that way. This is where the query builder comes in handy.
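To make the difficulty concrete, here is a minimal sketch of assembling a YQL string by hand with plain string formatting. The helper `build_yql_by_hand` and its parameters are hypothetical and exist only for this illustration; nothing in it uses pyvespa.

```python
from typing import Dict, List, Optional


def build_yql_by_hand(
    fields: List[str],
    source: str,
    must_contain: Dict[str, str],
    must_not_contain: Optional[Dict[str, str]] = None,
    limit: Optional[int] = None,
) -> str:
    """Hypothetical helper: assemble a YQL string with plain string formatting."""
    clauses = [f'{field} contains "{term}"' for field, term in must_contain.items()]
    # Negations, parentheses and separators all have to be tracked manually.
    for field, term in (must_not_contain or {}).items():
        clauses.append(f'!({field} contains "{term}")')
    yql = f"select {', '.join(fields)} from {source} where {' and '.join(clauses)}"
    if limit is not None:
        yql += f" limit {limit}"
    return yql


hand_built = build_yql_by_hand(
    fields=["title", "abstract"],
    source="doc",
    must_contain={"title": "vaccine"},
    must_not_contain={"abstract": "bad"},
    limit=2,
)
print(hand_built)
```

Each new requirement (nested OR groups, timeouts, escaping quotes inside terms) adds another branch to a helper like this, which is the kind of bookkeeping the query builder is meant to take over.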

The query builder is only responsible for building a YQL string, which can then be used for querying through various methods (see query notebook).

First, let's install a version of pyvespa that supports the query builder:

In [None]:
!pip install "pyvespa>=0.52.0"

In [2]:
from vespa.application import Vespa
from vespa import querybuilder as qb
from vespa.io import VespaQueryResponse

app = Vespa(url="https://api.cord19.vespa.ai")

At an abstract level, a query can be divided into two parts:
 - `QueryField`: a field that is part of the schema and that you want to query on. For example, in the Covid-19 schema, `title`, `abstract`, etc. can be used as fields.
 - `Condition`: logical operators used to combine the query fields (and, or, etc.).


Let's start with something simple: building a query that fetches documents with a specific keyword in the title.


In [3]:
title_field = qb.QueryField("title")
condition = title_field.contains("vaccine")
yql_query = qb.select(["title", "abstract"]).from_("doc").where(condition)
print(f"Built query: {yql_query}")
resp = app.query(yql=yql_query, hits=1)
resp.hits

Built query: select title, abstract from doc where title contains "vaccine"


[{'id': 'index:content/1/70a738d39b864060c2a36d4c',
  'relevance': 0.3941869377562002,
  'source': 'content',
  'fields': {'title': '<hi>Vaccine</hi> hesitancy due to <hi>vaccine</hi> country of origin, <hi>vaccine</hi> technology, and certification',
   'abstract': 'Vaccine hesitancy is a global health threat which may hinder the widespread acceptance of several COVID-19 vaccines. Following the collection of 2470 responses from an anonymous questionnaire distributed between October and November 2020 across Israel, we analyzed the responses of physicians, life science graduates (biology, virology, chemistry, etc.), and the general public to whether they would obtain a COVID-19 vaccine with particular vaccine characteristics such as vaccine country of origin, technology, side effect profile, efficacy, and other attributes. Physicians and life science graduates were least likely to accept a vaccine based on mRNA technology (30%) while the general<sep />'}}]

While `select()` accepts a list of fields, you can also pass them as a single comma-separated string.

In [4]:
yql_query = qb.select("title, abstract").from_("doc").where(condition)
print(f"Built query: {yql_query}")
resp = app.query(yql=yql_query, hits=1)
resp.hits

Built query: select title, abstract from doc where title contains "vaccine"


[{'id': 'index:content/1/70a738d39b864060c2a36d4c',
  'relevance': 0.3941869377562002,
  'source': 'content',
  'fields': {'title': '<hi>Vaccine</hi> hesitancy due to <hi>vaccine</hi> country of origin, <hi>vaccine</hi> technology, and certification',
   'abstract': 'Vaccine hesitancy is a global health threat which may hinder the widespread acceptance of several COVID-19 vaccines. Following the collection of 2470 responses from an anonymous questionnaire distributed between October and November 2020 across Israel, we analyzed the responses of physicians, life science graduates (biology, virology, chemistry, etc.), and the general public to whether they would obtain a COVID-19 vaccine with particular vaccine characteristics such as vaccine country of origin, technology, side effect profile, efficacy, and other attributes. Physicians and life science graduates were least likely to accept a vaccine based on mRNA technology (30%) while the general<sep />'}}]

The same query can be extended by chaining it with `set_timeout()` to change the default timeout:

In [5]:
yql_query = qb.select("title, abstract").from_("doc").where(condition).set_timeout(70)
print(f"Built query: {yql_query}")
resp = app.query(yql=yql_query, hits=1)
resp.hits

Built query: select title, abstract from doc where title contains "vaccine" timeout 70


[{'id': 'index:content/1/70a738d39b864060c2a36d4c',
  'relevance': 0.3941869377562002,
  'source': 'content',
  'fields': {'title': '<hi>Vaccine</hi> hesitancy due to <hi>vaccine</hi> country of origin, <hi>vaccine</hi> technology, and certification',
   'abstract': 'Vaccine hesitancy is a global health threat which may hinder the widespread acceptance of several COVID-19 vaccines. Following the collection of 2470 responses from an anonymous questionnaire distributed between October and November 2020 across Israel, we analyzed the responses of physicians, life science graduates (biology, virology, chemistry, etc.), and the general public to whether they would obtain a COVID-19 vaccine with particular vaccine characteristics such as vaccine country of origin, technology, side effect profile, efficacy, and other attributes. Physicians and life science graduates were least likely to accept a vaccine based on mRNA technology (30%) while the general<sep />'}}]

This can be chained further with `set_limit()`. You don't have to worry about the order of the calls in the chain, because the query builder handles the ordering and generates a valid YQL string.

In [6]:
yql_query = (
    qb.select("title, abstract")
    .from_("doc")
    .where(condition)
    .set_timeout(70)
    .set_limit(2)
)
print(f"Built query: {yql_query}")
resp = app.query(yql=yql_query)
resp.hits

Built query: select title, abstract from doc where title contains "vaccine" limit 2 timeout 70


[{'id': 'index:content/1/70a738d39b864060c2a36d4c',
  'relevance': 0.3941869377562002,
  'source': 'content',
  'fields': {'title': '<hi>Vaccine</hi> hesitancy due to <hi>vaccine</hi> country of origin, <hi>vaccine</hi> technology, and certification',
   'abstract': 'Vaccine hesitancy is a global health threat which may hinder the widespread acceptance of several COVID-19 vaccines. Following the collection of 2470 responses from an anonymous questionnaire distributed between October and November 2020 across Israel, we analyzed the responses of physicians, life science graduates (biology, virology, chemistry, etc.), and the general public to whether they would obtain a COVID-19 vaccine with particular vaccine characteristics such as vaccine country of origin, technology, side effect profile, efficacy, and other attributes. Physicians and life science graduates were least likely to accept a vaccine based on mRNA technology (30%) while the general<sep />'}},
 {'id': 'index:content/1/93277
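One way a builder can make the call order irrelevant is to store each setting and decide the clause order only when the string is rendered. The sketch below illustrates that idea with a toy class; `ToyQuery` is hypothetical and is not pyvespa's implementation.

```python
from typing import Optional


class ToyQuery:
    """Toy builder (not pyvespa's implementation): settings are stored, then rendered in a fixed order."""

    def __init__(self, fields: str, source: str) -> None:
        self.fields = fields
        self.source = source
        self.limit: Optional[int] = None
        self.timeout: Optional[int] = None

    def set_limit(self, limit: int) -> "ToyQuery":
        self.limit = limit
        return self  # returning self is what makes chaining possible

    def set_timeout(self, timeout: int) -> "ToyQuery":
        self.timeout = timeout
        return self

    def __str__(self) -> str:
        # The rendering order is decided here, not by the order of the method calls.
        yql = f"select {self.fields} from {self.source}"
        if self.limit is not None:
            yql += f" limit {self.limit}"
        if self.timeout is not None:
            yql += f" timeout {self.timeout}"
        return yql


timeout_first = str(ToyQuery("title, abstract", "doc").set_timeout(70).set_limit(2))
limit_first = str(ToyQuery("title, abstract", "doc").set_limit(2).set_timeout(70))
print(timeout_first)
print(timeout_first == limit_first)
```

Both chains render the same string because `__str__` always emits `limit` before `timeout`, whichever setter was called first.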

You can also invert a condition after building it by placing the `~` operator in front of it.

In [7]:
condition = ~condition
yql_query = qb.select("title, abstract").from_("doc").where(condition)
print(f"Built query: {yql_query}")
resp = app.query(yql=yql_query, hits=1)
resp.hits

Built query: select title, abstract from doc where !(title contains "vaccine")


[{'id': 'index:content/0/002ce120e92209b150769c39',
  'relevance': 0.0,
  'source': 'content',
  'fields': {'title': 'Novel functions of the alphavirus nonstructural protein nsP3 C-terminal region.',
   'abstract': 'The functions of the alphavirus-encoded nonstructural protein nsP3 during infection are poorly understood. In contrast, nsP1, nsP2, and nsP4 have known enzymatic activities and functions. A functional analysis of the C-terminal region of nsP3 of Semliki Forest virus revealed the presence of a degradation signal that overlaps with a sequence element located between nsP3 and nsP4 that is required for proteolytic processing. This element was responsible for the short half-life (1 h) of individually expressed nsP3, and it also was functionally transferable to other proteins. Inducible<sep />'}}]
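In Python, the unary `~` operator is dispatched to an object's `__invert__` method, which is how a condition object can return a negated copy of itself. The following is a minimal sketch of that mechanism; `ToyCondition` is a hypothetical class written for this illustration and is not pyvespa's own condition class.

```python
class ToyCondition:
    """Toy condition (not pyvespa's class): wraps a YQL expression string."""

    def __init__(self, expression: str) -> None:
        self.expression = expression

    def __invert__(self) -> "ToyCondition":
        # Python calls __invert__ when the unary ~ operator is applied.
        return ToyCondition(f"!({self.expression})")

    def __str__(self) -> str:
        return self.expression


toy_condition = ToyCondition('title contains "vaccine"')
toy_negated = ~toy_condition
print(toy_negated)
```

Returning a new object instead of mutating the original keeps the un-negated condition available for reuse in other queries.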

Now let's say you want to query over multiple fields and combine them, mixing and matching logical operators:
 - `all()` for logical AND.
 - `any()` for logical OR.

In [8]:
title = qb.QueryField("title")
abstract = qb.QueryField("abstract")
source = qb.QueryField("source")

condition_1 = title.contains("vaccine")
condition_2 = abstract.contains("bad")
condition_3 = source.contains("PMC")
condition_4 = title.contains("anti")
condition = qb.all(condition_1, ~condition_2, qb.any(condition_3, ~condition_4))

yql_query = qb.select(["title", "abstract"]).from_("doc").where(condition)
print(f"Built query: {yql_query}")
resp = app.query(yql=yql_query, hits=1)
resp.hits

Built query: select title, abstract from doc where title contains "vaccine" and !(abstract contains "bad") and (source contains "PMC" or !(title contains "anti"))


[{'id': 'index:content/1/bbe62ff50f44a1295c3328a2',
  'relevance': 0.3959298571461784,
  'source': 'content',
  'fields': {'title': "<hi>Vaccines</hi>: Kids' <hi>vaccine</hi> guards adults too, for now"}}]

The query builder is most useful when you build each condition separately and combine them afterwards, which keeps long queries readable and easier to maintain.