Home

Neil O'Toole edited this page Nov 27, 2017 · 34 revisions

sq - simple queryer for structured data - provides uniform access to data in many common formats. This includes traditional SQL-style databases, and document formats such as Excel or CSV.

Download

Where's the source code? sq is still in private alpha testing, it will be open-sourced at some point. In the meantime, you can still open issues here.


A typical session might look like:

sq session

sq is developing rapidly, the commands in the above image may be out of date. Use sq help for the latest usage details.

In the simplest narrative, you use sq to query a datasource and output JSON.

> sq '.user | .uid, .username, .email'
[
  {
    "uid": 1,
    "username": "neilotoole",
    "email": "neilotoole@apache.org"
  },
  {
    "uid": 2,
    "username": "ksoze",
    "email": "kaiser@soze.org"
  },
  {
    "uid": 3,
    "username": "kubla",
    "email": "kubla@khan.mn"
  }
]

sq defines its own query language (formally known as SLQ), which takes much of its inspiration from the excellent jq utility. However, for SQL-based datasources, you can also revert to DB-native SQL if you prefer. There are many output formats available in addition to JSON.

> sq --native --table --header 'SELECT uid, username, email FROM user'
uid  username    email
1    neilotoole  neilotoole@apache.org
2    ksoze       kaiser@soze.org
3    kubla       kubla@khan.mn

sq aims for terseness; in typical usage, that command would use shorthand flags:

> sq -nth 'select uid, username, email from user'
uid  username    email
1    neilotoole  neilotoole@apache.org
2    ksoze       kaiser@soze.org
3    kubla       kubla@khan.mn

Use sq inspect to get schema information.

 > sq inspect -th @pq1
HANDLE   NAME   FQ NAME       SIZE   TABLES  LOCATION
@pq1     pqdb1  pqdb1.public  7.1MB  4       postgres://sq:sq@localhost/pqdb1?sslmode=disable

TABLE       ROWS  SIZE     NUM COLS  COL NAMES                                                                     COL TYPES
tblall      2     104.0KB  7         col_id, col_int, col_int_n, col_varchar, col_varchar_n, col_blob, col_blob_n  integer, integer, integer, varchar(255), varchar(255), bytea, bytea
tbladdress  2     48.0KB   7         address_id, street, city, state, zip, country, uid                            integer, varchar(255), varchar(255), varchar(255), varchar(255), varchar(255), integer
tbluser     6     64.0KB   3         uid, username, email                                                          integer, varchar(255), varchar(255)
tblorder    0     16.0KB   6         order_id, uid, item_id, address_id, quantity, description                     integer, integer, integer, integer, integer, varchar(255)

Omit the -th (i.e. default JSON output) to get fuller schema information.

Use sq --help to see the available commands.

Basic usage

# Add a datasource... yeah, the mysql driver URL could be prettier
> sq add 'mysql://root:root@tcp(localhost:33306)/sq_my1' @my1

# Set the active datasource
> sq use @my1

# List datasources
> sq ls
HANDLE     DRIVER    LOCATION
@pq1       postgres  postgres://pqgotest:password@localhost/sq_pq1
@my1       mysql     mysql://root:root@tcp(localhost:3306)/sq_my1


# Execute a query
> sq '.user'                  # get all rows and cols from table "user"
> sq '.user | .uid, .email'   # get cols "uid" and "email" from table "user"

Use sq ping to check on the health of your datasources.

> sq ping         # ping active datasource
> sq ping @my1    # ping @my1 datsource
> sq ping --all   # ping all datasources

Output of sq ping --all:

sq ping

SLQ syntax

SLQ is the formal name of SQ's query language, heavily inspired by the excellent jq utility. Some basic usage examples below.

Join

Currently (sq 0.55.2) only one join per query.

> sq '.user, .address | join(.uid) | .username, .city, .zip'
> sq '.user, .order | join(.uid == .order_uid) | .email, .order_id'

Cross data source join

You can join across data sources. Note that the current implementation is very naive (slow and resource-intensive), basically it results in full-table copy of each joined table. This can and will be improved.

Given some datasources like this:

> sq ls
HANDLE            DRIVER    LOCATION
@my1              mysql     mysql://root:root@tcp(localhost:3306)/sq_my1
@pq1              postgres  postgres://sq:sq@localhost/sq_pq1?sslmode=disable
@excel1           xlsx      /Users/neilotoole/testdata/test.xlsx

You can do joins like this:

> sq '@my1.user, @pq1.tbladdress | join(.uid) | .username, .email, .city'
[
  {
    "username": "ksoze",
    "email": "kaiser@soze.org",
    "city": "Washington"
  },
  {
    "username": "kubla",
    "email": "kubla@khan.mn",
    "city": "Ulan Bator"
  }
]

Or join an Excel spreadsheet with a database:

> sq '@excel1.user_sheet, @pq1.tbladdress | join(.A == .uid)'
[
  {
    "A": 2,
    "B": "ksoze",
    "C": "kaiser@soze.org",
    "address_id": 1,
    "street": "1600 Penn",
    "city": "Washington",
    "state": "DC",
    "zip": "12345",
    "country": "US",
    "uid": 2
  },
  {
    "A": 3,
    "B": "kubla",
    "C": "kubla@khan.mn",
    "address_id": 2,
    "street": "999 Coleridge St",
    "city": "Ulan Bator",
    "state": "UB",
    "zip": "888",
    "country": "MN",
    "uid": 3
  }
]

conditional select (WHERE clause)

> sq '@my1.tbluser | .uid > 2 && .username == "tesla" || .username == "julius"'

range / row select

Select specific rows from the result set (similar to SQL's LIMIT X OFFSET Y).

> sq '.user | .[0]'      # get first row
> sq '.user | .[3:7]     # get rows 3 thru 7
> sq '.user | .[5:]      # get all rows from 5 onwards