# Gaia Data Workshop - Heidelberg, November 21-24, 2016 
## The Gaia Service at AIP
Gal Matijevic // gmatijevic@aip.de
### Hands-on Tutorial

This notebook shows how to access AIP's Gaia service through the UWS (Universal Worker Service) interface.

First, let us import the packages we will need in this tutorial:

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

import uws.UWS.client as client

The connection to the database is easily established through the `Client` object. We need to supply the URL and the user credentials.

In [3]:
url = 'https://gaia.aip.de/uws/query'
username = ''
password = ''
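To avoid typing credentials directly into a shared notebook, they could be read from the environment instead. This is only a minimal sketch: the helper `load_credentials()` and the variable names `GAIA_AIP_USER` and `GAIA_AIP_PASSWORD` are hypothetical and not part of the service or the `uws` package.

```python
import os

def load_credentials(env=None):
    """Return (username, password) read from environment variables.

    GAIA_AIP_USER and GAIA_AIP_PASSWORD are hypothetical names chosen for
    this sketch. Missing variables fall back to empty strings, matching
    the blank placeholders used in the cell above.
    """
    if env is None:
        env = os.environ
    return env.get('GAIA_AIP_USER', ''), env.get('GAIA_AIP_PASSWORD', '')
```

Calling `username, password = load_credentials()` before creating the client would then replace the hard-coded values.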

In [6]:
cli = client.Client(url, username, password)

To list all `PENDING` or `COMPLETED` jobs we can use the `get_job_list()` function (it might take a second or two):

In [None]:
filters = {'phases': ['PENDING', 'COMPLETED']}
job_list = cli.get_job_list(filters)
for ref in job_list.job_reference:
    # A formatted string prints identically under Python 2 and Python 3.
    print('{} {} {}'.format(ref.ownerId, ref.creationTime, ref.phase[0]))

A similar job list can also be shown for other `phases` such as `ABORTED`, `QUEUED` and so on. Jobs can also be listed based on their creation time or their consecutive number.

Adding a new job to the stack is done with the `new_job()` function. It requires a query and a queue to be passed to it. We wrap both into a dictionary called `parameters`:

In [None]:
parameters = {'query': 'SELECT ra,`dec` FROM GDR1.tgas_source LIMIT 10', 'queue': 'long'}
job = cli.new_job(parameters)
print(job.phase)

We can now run it with `run_job()`. To follow the phase of the job we call `get_job()` with `wait='10'` and `phase='QUEUED'`: the call returns as soon as the phase changes from `QUEUED`, or after 10 seconds at the latest.

In [None]:
run = cli.run_job(job.job_id)
job = cli.get_job(run.job_id, wait='10', phase='QUEUED')
print(job.phase[0])

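If it is still `EXECUTING` we can fetch the job again and re-check the phase with:

In [None]:
# Printing the old job object would only show the phase fetched earlier,
# so query the service again, waiting up to 10 seconds for EXECUTING to end.
job = cli.get_job(job.job_id, wait='10', phase='EXECUTING')
print(job.phase[0])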
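Re-checking by hand can be automated with a small polling loop. The helper below is a sketch, not part of the `uws` package: `wait_for_phase()` and its parameters are hypothetical names, and it takes any function that returns the current phase as a string, so it does not depend on the exact client API.

```python
import time

def wait_for_phase(fetch_phase, active=('QUEUED', 'EXECUTING'),
                   interval=10, max_checks=30, sleep=time.sleep):
    """Call fetch_phase() until it returns a phase that is not in `active`.

    Returns the last phase seen. Gives up after `max_checks` calls, so the
    returned phase may still be an active one if the job is slow.
    """
    phase = fetch_phase()
    checks = 1
    while phase in active and checks < max_checks:
        sleep(interval)          # pause between two consecutive checks
        phase = fetch_phase()
        checks += 1
    return phase
```

In this notebook, `fetch_phase` would be a function that fetches the job with `cli.get_job()` and returns `phase[0]`. The cap on the number of checks keeps a stuck job from blocking the notebook indefinitely.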

If we check the job list in the web interface we will see the submitted job in the list on the left.

Now we need to fetch the results that the query has generated. The data can be downloaded in various formats. We will use the `csv` format as it is easily parsed by the `pandas` package:

In [None]:
if job.phase[0] == 'COMPLETED':
    fileurl = str(job.results[0].reference)
    cli.connection.download_file(fileurl, username, password, file_name='res.csv')

In [None]:
data = pd.read_csv('res.csv')

In [None]:
data

Let us delete the job from the stack so it does not hog our limited user space. We do that using the `delete_job()` function:

In [None]:
cli.delete_job(job.job_id)

Submitting a query, downloading the result file, converting it to a `pandas` data frame, and deleting the job is something we will do again, so let us wrap this procedure into a couple of functions:

In [2]:
def submit_query(client, query, queue):
    # Create the job from the query and the queue, then start it.
    parameters = {'query': query, 'queue': queue}
    job = client.new_job(parameters)
    run = client.run_job(job.job_id)

    return run

def get_data(client, run, username, password, wait='10'):
    # Wait up to `wait` seconds for the job to leave the QUEUED phase.
    job = client.get_job(run.job_id, wait=wait, phase='QUEUED')

    if job.phase[0] == 'COMPLETED':
        # Download the first result as CSV, load it, then delete the job.
        fileurl = str(job.results[0].reference)
        client.connection.download_file(fileurl, username, password, file_name='res.csv')
        data = pd.read_csv('res.csv')
        client.delete_job(job.job_id)
        return data

    # The job has not completed yet; the caller can call get_data() again.
    return None

Executing a query and fetching the results can now be done in a couple of lines:

In [7]:
query = 'SELECT ra,`dec` FROM `GDR1`.`tgas_source` WHERE FLOOR(source_id / (POW(2, 35) * POW(4, 12 - 6))) = 824;'
run = submit_query(cli, query, queue='long')
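The `WHERE` clause in the query selects sources by HEALPix pixel: a Gaia `source_id` carries the level-12 nested HEALPix index in its upper bits (`source_id / 2^35`), and each coarser level divides that index by a further factor of 4. The same arithmetic can be sketched in Python; `healpix_index()` is a hypothetical helper written for illustration only.

```python
def healpix_index(source_id, level=6):
    """Nested HEALPix pixel index at `level` encoded in a Gaia source_id.

    Mirrors FLOOR(source_id / (POW(2, 35) * POW(4, 12 - level))) from the
    query, using integer division instead of FLOOR.
    """
    if not 0 <= level <= 12:
        raise ValueError('level must be between 0 and 12')
    return source_id // (2**35 * 4**(12 - level))
```

With the default `level=6`, a source matches the query above when `healpix_index(source_id) == 824`.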

In [10]:
data = get_data(cli, run, username, password)

In [11]:
data.shape

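If the job has not reached `COMPLETED` by the time `get_data()` returns, `data` is `None` and the cell above fails with `AttributeError: 'NoneType' object has no attribute 'shape'`. In that case, run the `get_data()` cell again after a short while.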

Let us plot them:

In [None]:
ax = plt.subplot(111, projection="aitoff")
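# The aitoff projection expects angles in radians, so convert from degrees.
ax.scatter(np.radians(data['ra'] - 180), np.radians(data['dec']), s=2, lw=0)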
plt.show()