# Querying Data in DQ0 Demo

To query data, you will need:
* The DQ0 SDK installed. Install it with `pip install dq0-sdk`.
* The DQ0 CLI installed.
* A proxy running and registered from the DQ0 CLI with `dq0 proxy add ...`
* A valid DQ0 session. Log in with `dq0 user login`.
* A running instance of the DQ0 CLI server: `dq0 server start`
* A DQ0 project with data attached to it, ideally using a database backend such as PostgreSQL.

Keep in mind that a query is always executed within the context of a project.

Start by importing the core classes.

In [None]:
%cd ../

In [None]:
# import dq0-sdk api
from dq0.sdk.cli import Project, Data, Query

## 1. Create or load a project
Let's create a new project for our query use case.

In [None]:
# create a project with name 'query_1'. Automatically creates the 'query_1' directory and changes to this directory.
project = Project(name='query_1', project_type='query')

Alternatively, you can load an existing project by first changing into its directory and then calling Project.load().
This reads the .meta file in that directory.

In [None]:
# %cd ../dq0-cli/MyNewProject

In [None]:
# Alternative: load a project from the current model directory
# project = Project.load()

Check whether the project was set up successfully by inspecting its UUID. If this field is empty, something went wrong.

In [None]:
project.project_uuid

## 2. Check the data source

All datasets should already be attached. You will need a valid dataset for this demo, ideally using a database backend.

In [None]:
# first get some info about available data sources
sources = Data.get_available_data_sources()

# get info about the first source
info = Data.get_data_info(sources[0])
info

In [None]:
# print information about column types and values, description. This may be helpful for creating your queries.
info['data_name']

In [None]:
info['data_type']

In [None]:
info['data_description']

In [None]:
# set data
data = sources[0]

# alternatively, if you already know the name of the dataset:
# data = Data('name_of_dataset')

## 3. Create Query

Once we have a project with data attached to it, we can create our query. Think of this object as a query manager that can create multiple query runs.

In [None]:
query = Query(project)

Now we can use this Query instance to start the actual query runs. But first we must specify which datasets we want to query:

In [None]:
query.for_data(data)

Prepare your query statement.

In [None]:
stmt = """SELECT SUM(active_complaint),
    COUNT(*) as tx_count, c.loyalty_tiers
    FROM LR.cpg_segments as c
    WHERE c.loyalty_tiers = 'silver'
    AND c.active_complaint > 0
    GROUP BY loyalty_tiers
    ORDER BY tx_count DESC
    LIMIT 600"""
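
Before submitting a statement, it can help to check that the SQL itself parses and groups as intended. The sketch below runs the same statement against a throwaway in-memory SQLite database. The table layout, the column types, and the sample rows are assumptions made for illustration; only the table and column names come from the statement above, and `idl` is borrowed from the private column used later in this notebook. SQLite runs the query without any differential privacy, so its output is not comparable to DQ0 results, and dialect differences with your actual backend may apply.

```python
import sqlite3

# Same statement as in the cell above, repeated so this sketch runs on its own.
stmt = """SELECT SUM(active_complaint),
    COUNT(*) as tx_count, c.loyalty_tiers
    FROM LR.cpg_segments as c
    WHERE c.loyalty_tiers = 'silver'
    AND c.active_complaint > 0
    GROUP BY loyalty_tiers
    ORDER BY tx_count DESC
    LIMIT 600"""

conn = sqlite3.connect(":memory:")

# Attach a second in-memory database under the schema name used in the statement.
conn.execute("ATTACH DATABASE ':memory:' AS LR")

# Assumed table layout; the real column types may differ.
conn.execute(
    "CREATE TABLE LR.cpg_segments ("
    "idl TEXT, loyalty_tiers TEXT, active_complaint INTEGER)"
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO LR.cpg_segments VALUES (?, ?, ?)",
    [
        ("a", "silver", 1),
        ("b", "silver", 2),
        ("c", "gold", 3),
        ("d", "silver", 0),
    ],
)

rows = conn.execute(stmt).fetchall()
print(rows)
conn.close()
```

Only the two silver rows with a positive `active_complaint` value pass the filter, so the local check returns a single group.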

## 4. Execute query
We can now pass this statement to the execute() method, which returns a new QueryRunner instance. We will use this instance to check the query's progress, state, and results. Keep in mind that queries are executed asynchronously.

In [None]:
args = {
    'entry-point': 'execute',
    'epsilon': '100',
    'private-column': 'idl',
    'tau': '0'
}
run = query.execute(stmt, args)

In [None]:
# check status
run.get_state()

# Or wait for the query to finish - careful, this may take a while!
run.wait_for_completion(verbose=True)

# Once it's finished, we can get the results
result = run.get_results()
print(result)
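
Because queries run asynchronously, a bounded polling loop is one alternative to blocking in wait_for_completion(). The sketch below is generic: `poll_until_terminal` and `StubRun` are hypothetical helpers and not part of the DQ0 SDK, and the state names 'running' and 'finished' are assumptions (only 'error' is mentioned in this notebook). Check what get_state() actually returns in your installation before reusing these names.

```python
import time


def poll_until_terminal(get_state, terminal_states, timeout_s=60.0, interval_s=1.0):
    """Call get_state() until it returns a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state in terminal_states:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still in state {state!r} after {timeout_s} s")
        time.sleep(interval_s)


class StubRun:
    """Stand-in for a query run: reports 'running' twice, then 'finished'.

    The state names are made up for this sketch.
    """

    def __init__(self):
        self._states = iter(["running", "running", "finished"])
        self._last = "running"

    def get_state(self):
        # Advance through the scripted states; repeat the last one when exhausted.
        self._last = next(self._states, self._last)
        return self._last


stub = StubRun()
final_state = poll_until_terminal(
    stub.get_state, {"finished", "error"}, timeout_s=5.0, interval_s=0.01
)
print(final_state)
```

With a real run, the call might look like `poll_until_terminal(run.get_state, {...})`, assuming get_state() returns a plain string that can be compared against the terminal states.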

In [None]:
run.get_error()

#### Displaying Errors

Of course, not all of your queries will finish without errors. If get_state() returns 'error', call the get_error() method to show more details:

In [None]:
run2 = query.execute('foo', args)
run2.wait_for_completion()
run2.get_error()

#### Setting query parameters
The execute() method accepts the following query parameters. Any parameter that is not passed falls back to its default:

* `epsilon`: float. Epsilon value for the differentially private query. Default: 1.0
* `tau`: float. Tau threshold value for the private query. Default: 0.0
* `private_column`: string. Private column for this query. Leave empty or omit to use the default value from the metadata.

Naturally we can adjust these:

In [None]:
run3 = query.execute(stmt, epsilon=1.5, tau=100, private_column='idl')
run3.wait_for_completion(verbose=True)
run3.get_results()
# the results are now also stored in run3.state.results
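
This notebook passes query parameters in two shapes: as keyword arguments, and earlier as a dict with hyphenated keys and string values. The sketch below builds the dict form from typed values and rejects obviously invalid input first. `make_query_args` is a hypothetical helper, not an SDK function; the key names and the string conversion simply mirror the earlier `args` cell, and the rule that tau must be non-negative is an assumption.

```python
def make_query_args(epsilon=1.0, tau=0.0, private_column=None):
    """Build an args dict shaped like the one passed to query.execute(stmt, args)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if tau < 0:
        # Assumption: a negative threshold is not meaningful.
        raise ValueError("tau must be non-negative")

    args = {
        'entry-point': 'execute',
        'epsilon': str(epsilon),
        'tau': str(tau),
    }
    # Omit the key entirely so the default from the metadata applies.
    if private_column:
        args['private-column'] = private_column
    return args


example_args = make_query_args(epsilon=1.5, tau=100, private_column='idl')
print(example_args)
```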

#### Visualizing results
The get_results() method returns the raw result payload as a string. Usually, this payload comes in CSV format. Here we use pandas to display this data.

In [None]:
import pandas as pd
from io import StringIO

result_str = run3.state.results

df = pd.read_csv(StringIO(result_str))
df
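
If pandas is not available, the standard library can parse the same kind of payload. This is a minimal sketch, assuming the payload is plain comma-separated text with a header row. The sample payload and its column names are made up for illustration, and `parse_csv_payload` is a hypothetical helper; the real columns depend on your query.

```python
import csv
from io import StringIO

# Hypothetical payload; a real one would come from run3.state.results.
sample_payload = "sum_active_complaint,tx_count,loyalty_tiers\n3,2,silver\n"


def parse_csv_payload(payload):
    """Parse a CSV string into a list of dicts, one per row.

    An empty or whitespace-only payload yields an empty list.
    """
    if not payload or not payload.strip():
        return []
    return list(csv.DictReader(StringIO(payload)))


parsed_rows = parse_csv_payload(sample_payload)
for row in parsed_rows:
    print(row)
```

Note that `csv.DictReader` keeps every value as a string, so numeric columns need an explicit conversion, whereas `pd.read_csv` infers column types.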