# Using the Druid SQL API
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

This tutorial works through several examples of using [request parameters](https://druid.apache.org/docs/latest/querying/sql-api/) on the Druid SQL API, including:

* How to change the format of the results returned.
* How to supply parameters to the SQL query.
* How to affect query execution.

## Prerequisites

This tutorial works with Druid 27.0.0 or later.

### Run with Docker

Launch this tutorial and all prerequisites using the `druid-jupyter` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see the Learn Druid repository [readme](https://github.com/implydata/learn-druid).

## Initialization

The following cells prepare the notebook and the learning environment.

### Set up and connect to the learning environment

Run the next cell to set up the Druid Python client's connection to Apache Druid.

If successful, the Druid version number will be shown in the output.

In [None]:
import druidapi
import os

# Fall back to localhost when the DRUID_HOST environment variable is not set.
druid_host = f"http://{os.environ.get('DRUID_HOST', 'localhost')}:8888"
    
print(f"Opening a connection to {druid_host}.")
druid = druidapi.jupyter_client(druid_host)

display = druid.display
sql_client = druid.sql
status_client = druid.status

status_client.version

### Load example data

Run the following cell to create a table called `example-wikipedia-queryapi`. The statement ingests only the columns used in this notebook: the parsed timestamp and `channel`.

When completed, you'll see a description of the final table.

In [None]:
sql='''
REPLACE INTO "example-wikipedia-queryapi" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://druid.apache.org/data/wikipedia.json.gz"]}',
    '{"type":"json"}'
  )
) EXTEND ("isRobot" VARCHAR, "channel" VARCHAR, "timestamp" VARCHAR, "flags" VARCHAR, "isUnpatrolled" VARCHAR, "page" VARCHAR, "diffUrl" VARCHAR, "added" BIGINT, "comment" VARCHAR, "commentLength" BIGINT, "isNew" VARCHAR, "isMinor" VARCHAR, "delta" BIGINT, "isAnonymous" VARCHAR, "user" VARCHAR, "deltaBucket" BIGINT, "deleted" BIGINT, "namespace" VARCHAR, "cityName" VARCHAR, "countryName" VARCHAR, "regionIsoCode" VARCHAR, "metroCode" BIGINT, "countryIsoCode" VARCHAR, "regionName" VARCHAR))
SELECT
  TIME_PARSE("timestamp") AS "__time",
  "channel"
FROM "ext"
PARTITIONED BY DAY
'''

display.run_task(sql)
sql_client.wait_until_ready('example-wikipedia-queryapi')
display.table('example-wikipedia-queryapi')

### Import additional modules

Run the following cell to import the additional Python modules that you will use to send HTTP requests to the Druid API and to parse the JSON it returns.

In [None]:
import json
import requests

Also run the following cell, which defines a Python function you will use to send raw requests to the Druid API:

In [None]:
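def postRequest(definition):
    # Send the request definition as a JSON body to the Druid SQL endpoint.
    x = requests.post(druid_host + '/druid/v2/sql', json=definition)

    # Check the HTTP status rather than searching the body for "error",
    # which would also match query results that contain that word.
    if not x.ok:
        raise Exception('Not able to complete the request. \n\n'+x.text)
    return x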

## Submit a SQL query

The `query` parameter accepts a SQL query as a string.

Run the following cell to set a variable, `sql`, that contains a simple query.

In [None]:
sql='''
SELECT
  COUNT(*) AS "events"
FROM "example-wikipedia-queryapi"
WHERE TIME_IN_INTERVAL("__time",'2016-06-27T04/PT1H')
'''

Set up a basic JSON object, `query_definition`, that contains the string above by running the following cell.

In [None]:
query_definition = {
    "query": sql
}

Now call the `postRequest` function with the JSON object as a parameter.

The text body of the response from Druid is then printed.

In [None]:
print(postRequest(query_definition).text)
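
For reference, passing `json=definition` to `requests.post` serializes the dictionary into a JSON request body. The following sketch shows roughly what that body looks like for the query above, using only the standard library. It does not contact Druid, and the exact whitespace that `requests` emits may differ.

```python
import json

sql = '''
SELECT
  COUNT(*) AS "events"
FROM "example-wikipedia-queryapi"
WHERE TIME_IN_INTERVAL("__time",'2016-06-27T04/PT1H')
'''

query_definition = {"query": sql}

# json.dumps produces text equivalent to what travels in the HTTP request body.
# Newlines and double quotes inside the SQL are escaped automatically.
request_body = json.dumps(query_definition)
print(request_body)

# Decoding the body recovers the original SQL string unchanged.
decoded = json.loads(request_body)
print(decoded["query"] == sql)
```

Building the dictionary in Python and letting the library serialize it avoids hand-escaping the quotes that Druid SQL uses around identifiers.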

## Set the format of the query results

Use the `resultFormat` property to control the [format](https://druid.apache.org/docs/latest/querying/sql-api/#result-formats) of the results.

First, run the cell below to set a more complex query in the `sql` variable.

In [None]:
sql='''
SELECT
  TIME_FLOOR("__time",'PT10M'),
  COUNT(*)
FROM "example-wikipedia-queryapi"
WHERE TIME_IN_INTERVAL("__time",'2016-06-27T04/PT1H')
GROUP BY 1
'''

The default `resultFormat` is `object`, which returns a JSON array containing one JSON object per row.

In the following cell, the result of `postRequest` is parsed as JSON so that it can be printed in a pretty format.

In [None]:
sql='''
SELECT
  TIME_FLOOR("__time",'PT10M') AS "period",
  COUNT(*) AS "events"
FROM "example-wikipedia-queryapi"
WHERE TIME_IN_INTERVAL("__time",'2016-06-27T04/PT1H')
GROUP BY 1
'''

query_definition = {
    "query": sql
}

print(json.dumps(json.loads(postRequest(query_definition).text), indent=2))

In the following cell, the `resultFormat` is explicitly set to `csv`.

Run this to see the result.

In [None]:
query_definition = {
    "query": sql,
    "resultFormat": "csv"
}

print(postRequest(query_definition).text)
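
CSV text like this can be turned back into Python values with the standard `csv` module. The following is a minimal sketch: `sample_csv` is made-up illustrative text standing in for `postRequest(query_definition).text`, not real query output, and the two-column layout assumes the query above.

```python
import csv
import io

# Illustrative text only: these rows are invented, not returned by Druid.
# A blank line is included to show how empty records are skipped.
sample_csv = (
    "2016-06-27T04:00:00.000Z,120\n"
    "2016-06-27T04:10:00.000Z,95\n"
    "\n"
)

rows = []
for record in csv.reader(io.StringIO(sample_csv)):
    if not record:  # csv.reader yields an empty list for a blank line
        continue
    period, events = record
    rows.append((period, int(events)))  # CSV values arrive as strings

print(rows)
```

Because no header is requested here, the column order has to be known from the SQL statement itself.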

## Adding a header to results

Take a look at the following cell, where the definition has been updated to include a `header` property.

Run it to see the result from the query API.

In [None]:
query_definition = {
    "query": sql,
    "resultFormat": "csv",
    "header": "true"
}

print(postRequest(query_definition).text)

Run the next cell to send another request to the API.

Notice that:

* `resultFormat` has been removed so that the result is in JSON format.
* The `typesHeader` and `sqlTypesHeader` have been set explicitly to `true`.

In [None]:
query_definition = {
    "query": sql,
    "typesHeader": "true",
    "sqlTypesHeader": "true",
    "header": "true"
}

print(json.dumps(json.loads(postRequest(query_definition).text), indent=2))
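
With the `object` format, the header is expected to arrive as the first element of the JSON array, ahead of the data rows. The sketch below separates the two. `sample_response` is a made-up stand-in whose shape is an assumption based on the SQL API documentation, so compare it with the output of the cell above before relying on it.

```python
# Illustrative data only: the shape is assumed and the values are invented.
sample_response = [
    {
        "period": {"type": "LONG", "sqlType": "TIMESTAMP"},
        "events": {"type": "LONG", "sqlType": "BIGINT"},
    },
    {"period": "2016-06-27T04:00:00.000Z", "events": 120},
    {"period": "2016-06-27T04:10:00.000Z", "events": 95},
]

# Split the header element from the data rows that follow it.
header, *rows = sample_response

# Map each column name to its SQL type for later use.
sql_types = {name: info["sqlType"] for name, info in header.items()}

print(sql_types)
print(len(rows), "data rows")
```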

## Sending query parameters

The `parameters` property supplies values for the `?` placeholders in the SQL statement, in the order that the placeholders appear.

Running the next cell will update `sql` to a query that contains three parameters, "`?`", to filter the COUNT of events on `channel` in three ways.

In [None]:
sql='''
SELECT
  TIME_FLOOR("__time",'PT10M') AS "period",
  COUNT(*) FILTER (WHERE "channel" LIKE CAST(? AS VARCHAR)) AS "events-1",
  COUNT(*) FILTER (WHERE "channel" LIKE CAST(? AS VARCHAR)) AS "events-2",
  COUNT(*) FILTER (WHERE "channel" LIKE CAST(? AS VARCHAR)) AS "events-3"
FROM "example-wikipedia-queryapi"
WHERE TIME_IN_INTERVAL("__time",'2016-06-27T04/PT1H')
GROUP BY 1
'''

Now run the cell below to update `query_definition` and execute the query.

Notice that each entry in `parameters` declares a type that can be successfully CAST to VARCHAR.

In [None]:
query_definition = {
    "query": sql,
    "resultFormat": "csv",
    "header": "true",
    "parameters": [
        { "type" : "VARCHAR", "value": "#en%" },
        { "type" : "VARCHAR", "value": "#fr%" },
        { "type" : "VARCHAR", "value": "%" }
    ]
}

print(postRequest(query_definition).text)
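
Because parameters are matched to placeholders by position, a mismatch in count is an easy mistake to make. The following sketch builds the `parameters` list and checks the count before a request is sent. `varchar_parameters` and `build_definition` are hypothetical helpers written for this illustration and are not part of the Druid Python client; the placeholder count is naive and would miscount a `?` that appears inside a string literal.

```python
def varchar_parameters(*values):
    """Build a `parameters` list of VARCHAR entries (hypothetical helper)."""
    return [{"type": "VARCHAR", "value": str(value)} for value in values]


def build_definition(sql, parameters, **properties):
    """Combine the SQL, its parameters, and other request properties (hypothetical helper)."""
    # Naive check: counts every "?" in the statement, including any in string literals.
    placeholders = sql.count("?")
    if placeholders != len(parameters):
        raise ValueError(
            f"SQL has {placeholders} placeholders but {len(parameters)} parameters were given"
        )
    return {"query": sql, "parameters": parameters, **properties}


sql = '''
SELECT
  COUNT(*) FILTER (WHERE "channel" LIKE CAST(? AS VARCHAR)) AS "events-1",
  COUNT(*) FILTER (WHERE "channel" LIKE CAST(? AS VARCHAR)) AS "events-2"
FROM "example-wikipedia-queryapi"
'''

definition = build_definition(
    sql,
    varchar_parameters("#en%", "#fr%"),
    resultFormat="csv",
)
print(definition["parameters"])
```

The resulting dictionary can be passed to `postRequest` in the same way as the hand-written `query_definition`.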

## Sending query context

Using SQL [query context parameters](https://druid.apache.org/docs/latest/querying/sql-query-context), various aspects of query execution can be controlled, including the use of approximation for COUNT DISTINCT and TopN-type queries. Open the notebooks on [COUNT DISTINCT](./03-approxCountDistinct.ipynb) and [TopN](./02-approx-ranking.ipynb)-type queries for examples.

Run the following cell to store the result of a query in `query_result` where a context parameter, `sqlQueryId`, has been used.

In [None]:
query_definition = {
    "query": sql,
    "resultFormat": "csv",
    "header": "true",
    "parameters": [
        { "type" : "VARCHAR", "value": "#en%" },
        { "type" : "VARCHAR", "value": "#fr%" },
        { "type" : "VARCHAR", "value": "%" }
    ],
    "context":
        { "sqlQueryId" : "dashboard-panel6-userquery" }
}

query_result = postRequest(query_definition)

Run the following cell to see the headers returned in the response. The `X-Druid-SQL-Query-Id` header contains the value you set in `sqlQueryId`.

In [None]:
print(json.dumps(dict(query_result.headers), indent=2))
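
If the same panel issues its query repeatedly, a fixed `sqlQueryId` makes individual runs hard to tell apart. One option, sketched below, is to append a random suffix to a readable prefix. `with_query_id` is a hypothetical helper written for this illustration, and the prefix-plus-UUID scheme is an assumption rather than a Druid convention.

```python
import uuid


def with_query_id(definition, prefix):
    """Return a copy of `definition` with a unique sqlQueryId in its context (hypothetical helper)."""
    # Copy any existing context so the caller's dictionary is left untouched.
    context = dict(definition.get("context", {}))
    context["sqlQueryId"] = f"{prefix}-{uuid.uuid4()}"
    return {**definition, "context": context}


base = {"query": "SELECT 1", "resultFormat": "csv"}

first = with_query_id(base, "dashboard-panel6-userquery")
second = with_query_id(base, "dashboard-panel6-userquery")

print(first["context"]["sqlQueryId"])
print(second["context"]["sqlQueryId"])
```

Each call yields a different identifier, while the shared prefix still shows which panel issued the query.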

## Clean up

Run the following cell to remove the table used in this notebook from the database.

In [None]:
druid.datasources.drop("example-wikipedia-queryapi")

## Summary

* The Druid SQL API endpoint takes a JSON object that contains a SQL query.
* Additional properties control the result format, enable parameterization, and change how the query is executed.

## Learn more

* Try out other [result formats](https://druid.apache.org/docs/latest/querying/sql-api/#result-formats)
* See how context parameters can be used to control approximation in the notebooks on [COUNT DISTINCT](./03-approxCountDistinct.ipynb) and [TopN](./02-approx-ranking.ipynb)