# Database Querying with a Uniform Syntax

In my academic days, before Google Scholar came along, I was a big user of [PubMed](https://pubmed.ncbi.nlm.nih.gov/) for searching through the scientific literature. I became very familiar with PubMed's query syntax, with queries like:

``` parsons sp[au] and (intestine[title] or channel[title]) ```

When I was writing a search engine at my first industry job, I thought this would make a great interface. The UI would only need a single text box for inputting the query. I had too often seen the following pattern in UIs: each field is queried with its own UI element(s), e.g. a text box for each string field, a check box for each boolean field, a text box and drop-down comparator for each numeric field. The fields can only be combined in limited ways (all OR or all AND), or else there has to be a bunch more UI elements for selecting AND, OR and NOT. Boolean expressions certainly cannot be grouped with parentheses.

Implementing a simple PubMed-style query requires a parser. I came across the [Pyparsing package](https://pyparsing-docs.readthedocs.io/en/latest/index.html), which was bundled with some other package I was using (I can't remember which), and from there the booklet "Getting Started with Pyparsing" by Paul McGuire (2007, O'Reilly). A section of that book ("Parsing a search string") essentially gave me the code I was looking for. Perfect!
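As a rough illustration of what such a parser has to do, here is a minimal sketch using only the standard library. It is not the Pyparsing code from McGuire's booklet and not the code in this repository: it is a hand-written recursive-descent parser, and the names (`TOKEN_RE`, `parse`) and the nested-tuple output format are assumptions made up for this example. It also assumes the usual precedence of NOT over AND over OR.

```python
import re

# A token is a parenthesis, a [field] tag, or a run of other non-space characters.
TOKEN_RE = re.compile(r"\(|\)|\[\w+\]|[^\s()\[\]]+")


def parse(text):
    """Parse 'term[field] and (term[field] or not term[field])' into nested tuples."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():  # lowest precedence
        node = parse_and()
        while peek() is not None and peek().lower() == "or":
            advance()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() is not None and peek().lower() == "and":
            advance()
            node = ("and", node, parse_not())
        return node

    def parse_not():  # highest precedence
        if peek() is not None and peek().lower() == "not":
            advance()
            return ("not", parse_not())
        return parse_atom()

    def parse_atom():
        if peek() == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("expected ')'")
            advance()
            return node
        # Otherwise: one or more words followed by a [field] tag.
        words = []
        while (peek() is not None and not peek().startswith("[")
               and peek() not in ("(", ")")):
            words.append(advance())
        tag = peek()
        if not words or tag is None or not tag.startswith("["):
            raise ValueError(f"expected term[field] near token {pos}")
        advance()
        return ("term", " ".join(words), tag[1:-1].lower())

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


tree = parse("parsons sp[au] and (intestine[title] or channel[title])")
```

For the PubMed example above, `tree` comes out as an `"and"` node whose left child is the term `("term", "parsons sp", "au")` and whose right child is an `"or"` of the two title terms. A grammar library like Pyparsing replaces most of this hand-written control flow with a declarative grammar.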

In this repository I have written a class (`Query`) that does the parsing outlined in McGuire. This is an abstract base class. It can be subclassed to implement the same PubMed-like query interface for different kinds of database. `ObjectListQuery(Query)` is used to query a list of objects of any type (class objects, dictionaries, etc.). Fields are added simply with a one-line specification. `SQLQuery(Query)` is used to generate SQL queries. You can write your own subclass!
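To give a feel for why one parser can serve several back ends, here is a small self-contained sketch of the pattern. It does not reproduce the real `Query` API: the class names (`QuerySketch`, `ListBackend`, `SqlBackend`), the hook methods, and the nested-tuple query tree are all hypothetical, and terms are matched by exact string equality only.

```python
from abc import ABC, abstractmethod


class QuerySketch(ABC):
    """Hypothetical base class: walks a parsed query, defers to backend hooks."""

    def evaluate(self, node):
        kind = node[0]
        if kind == "term":
            return self.term(node[1], node[2])
        if kind == "not":
            return self.negate(self.evaluate(node[1]))
        if kind in ("and", "or"):
            left, right = self.evaluate(node[1]), self.evaluate(node[2])
            return self.conjoin(left, right) if kind == "and" else self.disjoin(left, right)
        raise ValueError(f"unknown node type {kind!r}")

    @abstractmethod
    def term(self, value, field): ...

    @abstractmethod
    def negate(self, operand): ...

    @abstractmethod
    def conjoin(self, left, right): ...

    @abstractmethod
    def disjoin(self, left, right): ...


class ListBackend(QuerySketch):
    """Answers with the set of matching row indices."""

    def __init__(self, rows):
        self.rows = rows

    def term(self, value, field):
        return {i for i, row in enumerate(self.rows) if str(row.get(field)) == value}

    def negate(self, operand):
        return set(range(len(self.rows))) - operand

    def conjoin(self, left, right):
        return left & right

    def disjoin(self, left, right):
        return left | right


class SqlBackend(QuerySketch):
    """Answers with a (WHERE clause, parameters) pair using ? placeholders."""

    def term(self, value, field):
        # A real implementation must check `field` against known column names
        # before interpolating it; only the value is parameterised here.
        return (f"{field} = ?", [value])

    def negate(self, operand):
        return (f"NOT ({operand[0]})", operand[1])

    def conjoin(self, left, right):
        return (f"({left[0]} AND {right[0]})", left[1] + right[1])

    def disjoin(self, left, right):
        return (f"({left[0]} OR {right[0]})", left[1] + right[1])


# The same query tree, answered by two different back ends.
tree = ("and", ("term", "zebra", "name"), ("not", ("term", "4", "legs")))
rows = [{"name": "zebra", "legs": 4}, {"name": "zebra", "legs": 2}]
matches = ListBackend(rows).evaluate(tree)
where, params = SqlBackend().evaluate(tree)
```

The point of the design is that the tree walk lives in one place, while each subclass only decides what a term, a negation, a conjunction and a disjunction mean for its own storage.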


#### References

McGuire, P. (2007). Getting Started with Pyparsing. O'Reilly.

In [1]:
import datetime
from dataclasses import dataclass
import sys
import os

sys.path.append(os.path.abspath(".."))
from resume.query.objectlistquery import ObjectListQuery

## Querying a list of objects

The `ObjectListQuery` class lets you query any list of objects. The objects can be of any type.

Say we have a list of dictionaries, each dictionary describing a specimen in a zoological museum.

In [2]:
museum_specimens = [
    {
        "name": "zebra", "caught": datetime.datetime(2010, 3, 15),
        "dimensions": {"height": 1.4, "length": 1.8}, 
        "appendages": {"legs": 4}
    },
    {
        "name": "monkey", "caught": datetime.datetime(2002, 4, 11),
        "dimensions": {"height": 1.2}, 
        "appendages": {"legs": 2, "arms": 2}
    },
    {
        "name": "duck", "caught": datetime.datetime(1987, 11, 3),
        "dimensions": {"height": 0.15, "length": 0.24}, 
        "appendages": {"legs": 2, "wings": 2}, 
        "abilities": {"flys": True}
    },
    {
        "name": "whale", "caught": datetime.datetime(1910, 1, 15),
        "dimensions": {"height": 2.1, "length": 5.6}, 
        "appendages": {"legs": 2}
    },
    {
        "name": "millipede", "caught": datetime.datetime(1950, 7, 21),
        "dimensions": {"height": 0.005, "length": 0.04}, 
        "appendages": {"legs": 1000}
    },
]

We create an instance of `ObjectListQuery`, specifying the attributes ('fields') we want to search by.

Each field is specified as:

`(<full name>, <abbreviation>, <type>) : <function that returns its value from an object>`

Either the field's full name or its abbreviation can be used in the search query, case-insensitively. The type is used for type checking and for interpreting search operands. The function is used to retrieve values from the objects and to index them.
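To make the field specification concrete, here is one way such a dictionary could be turned into a case-insensitive lookup. This is only a sketch of the idea, not how `ObjectListQuery` is actually implemented; `build_lookup` is a hypothetical helper named for this example.

```python
def build_lookup(fields):
    """Map every lower-cased full name and abbreviation to (type, getter)."""
    lookup = {}
    for (full_name, abbreviation, field_type), getter in fields.items():
        for key in (full_name, abbreviation):
            lookup[key.lower()] = (field_type, getter)
    return lookup


# Same shape as the specification format described above.
fields = {
    ("name", "nm", str): (lambda x: x["name"]),
    ("legs", "lg", int): (lambda x: x["appendages"]["legs"]),
}
lookup = build_lookup(fields)

zebra = {"name": "zebra", "appendages": {"legs": 4}}

# "LEGS", "legs" and "lg" all resolve to the same type and getter.
field_type, getter = lookup["LEGS".lower()]
legs = getter(zebra)
```

Here `field_type` is `int` and `legs` is `4`; the type tells the querier how to interpret an operand such as `<10`, and the getter pulls the value to compare against.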

In [3]:
querier = ObjectListQuery(fields={
    ("name", "nm", str): (lambda x: x["name"]),
    ("caught", "cg", datetime.datetime): (lambda x: x["caught"]),
    ("height", "hg", float): (lambda x: x["dimensions"]["height"]),
    ("legs", "lg", int): (lambda x: x["appendages"]["legs"]),
    ("arms", "ar", int): (lambda x: x["appendages"]["arms"]),
    ("flys", "fy", bool): (lambda x: x["abilities"]["flys"]),
})

Now add the specimens to the querier.

In [4]:
querier.add_objects(museum_specimens)

Now make some queries.

Search terms must be formatted as:

- String fields: can include a wildcard asterisk.

- Boolean fields: true is indicated by 't' or 'T'; anything else is treated as false.

- Numeric (int or float) and datetime fields: must start with one of the comparators >, < or =.
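The rules above can be sketched as a function that turns a search term and a field type into a test on a value. This is an illustration under assumptions, not the library's code: `make_predicate` is a hypothetical name, wildcard matching is delegated to the standard `fnmatch` module (which also treats `?` and `[...]` as special), string matching is case-sensitive and against the whole value, and datetimes are assumed to be written in ISO format.

```python
import datetime
import fnmatch
import operator

COMPARATORS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def make_predicate(term, field_type):
    """Turn one search term into a function: value -> bool."""
    if field_type is str:
        # '*' matches any run of characters; the whole value must match.
        return lambda value: fnmatch.fnmatchcase(value, term)
    if field_type is bool:
        # Only 't' or 'T' means true; any other term means false.
        wanted = term in ("t", "T")
        return lambda value: bool(value) == wanted
    # Numeric and datetime terms: a comparator followed by the operand.
    compare = COMPARATORS.get(term[:1])
    if compare is None:
        raise ValueError(f"term {term!r} must start with >, < or =")
    raw = term[1:]
    if field_type is datetime.datetime:
        operand = datetime.datetime.fromisoformat(raw)  # assumed ISO format
    else:
        operand = field_type(raw)  # int or float
    return lambda value: compare(value, operand)


starts_with_z = make_predicate("z*", str)
caught_before_2000 = make_predicate("<2000-01-01", datetime.datetime)
fewer_than_ten = make_predicate("<10", int)
```

With these, `starts_with_z("zebra")` is true, `caught_before_2000` accepts the duck caught in 1987, and `fewer_than_ten(1000)` rejects the millipede.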

In [5]:
querier.query("(<2000-01-01[caught] or z*[nm]) AND <10[LEGS]")

[{'name': 'zebra',
  'caught': datetime.datetime(2010, 3, 15, 0, 0),
  'dimensions': {'height': 1.4, 'length': 1.8},
  'appendages': {'legs': 4}},
 {'name': 'whale',
  'caught': datetime.datetime(1910, 1, 15, 0, 0),
  'dimensions': {'height': 2.1, 'length': 5.6},
  'appendages': {'legs': 2}},
 {'name': 'duck',
  'caught': datetime.datetime(1987, 11, 3, 0, 0),
  'dimensions': {'height': 0.15, 'length': 0.24},
  'appendages': {'legs': 2, 'wings': 2},
  'abilities': {'flys': True}}]

In [6]:
querier.query(">1.2[HEIGHT] AND >2[legs]")

[{'name': 'zebra',
  'caught': datetime.datetime(2010, 3, 15, 0, 0),
  'dimensions': {'height': 1.4, 'length': 1.8},
  'appendages': {'legs': 4}}]

In [7]:
querier.query(">1.2[HEIGHT] AND NOT >2[legs]")

[{'name': 'whale',
  'caught': datetime.datetime(1910, 1, 15, 0, 0),
  'dimensions': {'height': 2.1, 'length': 5.6},
  'appendages': {'legs': 2}}]

The objects can be of any type - we only need to specify the appropriate function for retrieving a field's value from the object.

For example,

In [8]:
@dataclass
class Specimen:
    name: str
    caught: datetime.datetime
    height: float
    legs: int
    arms: int

specimens_as_class_objects = [
    Specimen(name="horse", caught=datetime.datetime(1897, 10, 10), height=1.4, legs=4, arms=0),
    Specimen(name="cat", caught=datetime.datetime(1967, 5, 21), height=0.4, legs=4, arms=0),
    Specimen(name="gorilla", caught=datetime.datetime(1978, 2, 10), height=0.8, legs=2, arms=2),
    Specimen(name="spider", caught=datetime.datetime(1942, 10, 1), height=0.03, legs=8, arms=0),
    Specimen(name="lobster", caught=datetime.datetime(2005, 6, 12), height=0.1, legs=6, arms=2),
]

querier_2 = ObjectListQuery(fields={ 
    ("name", "nm", str): (lambda x: x.name),
    ("caught", "ct", datetime.datetime): (lambda x: x.caught),
    ("height", "ht", float): (lambda x: x.height),
    ("legs", "lg", int): (lambda x: x.legs),
    ("arms", "ar", int): (lambda x: x.arms),
})

querier_2.add_objects(specimens_as_class_objects)

querier_2.query(">2[legs] AND *er[name]")

[Specimen(name='lobster', caught=datetime.datetime(2005, 6, 12, 0, 0), height=0.1, legs=6, arms=2),
 Specimen(name='spider', caught=datetime.datetime(1942, 10, 1, 0, 0), height=0.03, legs=8, arms=0)]

## SQL databases