
# Downloading Query Results Programmatically

In [103]:
import requests
import pandas as pd

In [104]:
token = "p7uqevYeWBkWNA2X6WLWUEZJAysvnjTktNiDMqEQGX4"
host = "https://beta-sandbox.alationproserv.com"

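def generic_get_text(api):
  # GET the endpoint and return the raw response body as a string
  return requests.get(host+api, headers=dict(token=token)).text

def generic_get_json(api):
  # GET the endpoint and return the response body parsed as JSON
  return requests.get(host+api, headers=dict(token=token)).json()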
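
Hard-coding the token as above is convenient in a throwaway notebook, but it exposes the credential to anyone who can read the file. A minimal sketch of one alternative, assuming the token has been exported in an environment variable. The variable name `ALATION_TOKEN` and the helpers `load_token` and `auth_headers` are hypothetical and not part of any Alation tooling:

```python
import os


def load_token(env_var="ALATION_TOKEN"):
    """Return the API token stored in the given environment variable.

    The variable name is an assumption made for this sketch.
    """
    value = os.environ.get(env_var, "").strip()
    if not value:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    return value


def auth_headers(api_token):
    """Build the same header dict that the helpers above pass to requests.get."""
    return dict(token=api_token)
```

With this in place, `token = load_token()` could replace the literal string in the cell above.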

We hard-code the query ID used in this example; all other parameters are obtained via the API.

Double-check the text of the query (this is the most recently published version):

In [105]:
query_id = 44

query_text = generic_get_text(f"/integration/v1/query/{query_id}/sql/")
query_text

'SELECT *  FROM public.rdbms_columns '

When was the query last executed? This call also returns the ID of the latest result, which we need later for the download.

In [106]:
exec = generic_get_json(f"/integration/v1/query/{query_id}/result/latest/")
exec

{'id': 51,
 'truncated': False,
 'execution_event': {'executed_at': '2023-01-23T04:29:00.560603Z',
  'canceled': False,
  'execution_error': None,
  'finished': True,
  'finished_at': '2023-01-23T04:29:01.092257Z',
  'seconds_taken': 0.531654}}
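
Before downloading, it can be worth confirming that the latest run actually completed. The sketch below works on a copy of the dictionary shown above, so it runs without a network call. The helper names `parse_ts` and `is_downloadable` are introduced here for illustration, and treating `finished`, `canceled` and `execution_error` as the deciding fields is an assumption based on their names:

```python
from datetime import datetime

# Copy of the response shown above
latest = {
    "id": 51,
    "truncated": False,
    "execution_event": {
        "executed_at": "2023-01-23T04:29:00.560603Z",
        "canceled": False,
        "execution_error": None,
        "finished": True,
        "finished_at": "2023-01-23T04:29:01.092257Z",
        "seconds_taken": 0.531654,
    },
}


def parse_ts(ts):
    """Parse a timestamp such as '2023-01-23T04:29:00.560603Z' into an aware datetime."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_downloadable(result):
    """Return True if the run finished, was not canceled, and reported no error."""
    event = result["execution_event"]
    return bool(event["finished"] and not event["canceled"]
                and event["execution_error"] is None)


event = latest["execution_event"]
elapsed = (parse_ts(event["finished_at"]) - parse_ts(event["executed_at"])).total_seconds()
print(is_downloadable(latest), elapsed)
```

For the response above, the difference between the two timestamps agrees with the reported `seconds_taken`.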

If we want even more details, we could use this API instead:

In [107]:
execs = generic_get_json(f"/integration/v1/query/{query_id}/execution_event/")

Alation calls each run an execution event. One is recorded whenever a user clicks "Run" or the query runs on a schedule.

In [108]:
pd.json_normalize(execs).sort_values('result.id', ascending=False).set_index('id')

id,num_result_rows,execution_error,canceled,ts_executed,elapsed_seconds,index_in_batch,batch_id,session_id,query_id,db_username,...,result.deleted,result.url,result.storage_status,result.data_schema,result.query.id,result.query.title,result.query.description,datasource.id,datasource.title,datasource.url
148,1000,,False,2023-01-23T04:29:00.560603Z,0.531654,0,130,49,44,matthias.funke@alation.com,...,False,/execution_result/51/,STORED_ALL,"[{'original_name': 'dim_load_id', 'name': 'dim...",44,All RDBMS Columns Exported,<p>For test purposes</p>\n,1,Alation Analytics,/data/1/
147,1000,,False,2023-01-23T04:24:59.322473Z,0.240971,0,129,48,44,matthias.funke@alation.com,...,False,/execution_result/50/,STORED_ALL,"[{'original_name': 'dim_load_id', 'name': 'dim...",44,All RDBMS Columns Exported,<p>For test purposes</p>\n,1,Alation Analytics,/data/1/
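
If only the newest execution event is needed, pandas is optional. A minimal sketch on a trimmed-down copy of the two events shown above. The nested `result` / `id` shape is inferred from the `result.id` column that `json_normalize` produces, and the values 51 and 50 are taken from the `result.url` column, so treat both as assumptions:

```python
# Trimmed-down stand-in for the list returned by the execution_event API
execs_sample = [
    {"id": 147, "ts_executed": "2023-01-23T04:24:59.322473Z",
     "num_result_rows": 1000, "result": {"id": 50}},
    {"id": 148, "ts_executed": "2023-01-23T04:29:00.560603Z",
     "num_result_rows": 1000, "result": {"id": 51}},
]

# ISO 8601 timestamps in the same format and time zone sort correctly as strings
latest_event = max(execs_sample, key=lambda e: e["ts_executed"])

# Build the download path from the result ID of the newest event
result_id = latest_event["result"]["id"]
download_path = f"/integration/v1/result/{result_id}/csv/"
print(latest_event["id"], download_path)
```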


To download the result of the latest execution, we don't need the dataframe; we can use the response from the earlier API call (exec).

In [109]:
download = f"/integration/v1/result/{exec['id']}/csv/"

In [110]:
r = generic_get_text(download)

To create a dataframe, we wrap the string in a StringIO object, which makes it look like a file to the pandas API.

In [112]:
import io

df = pd.read_csv(io.StringIO(r))
df

,dim_load_id,dim_ts_created,dim_checksum,id,user_id,ts_created,ts_updated,ts_deleted,deleted,column_id,...,expr_title,norm_type,is_type_derived,position,sensitive,excluded,is_transformed,attribute_url,sensitivity,classification
0,1,2022-06-16 20:57:16.973927+00,310054800,1,,2022-06-15 15:34:11.692637+00,2022-06-15 15:34:19.837969+00,,False,1,...,,STRING,False,1,False,False,,/attribute/1/,,
1,1,2022-06-16 20:57:19.514006+00,512560782,2,,2022-06-15 15:34:11.692637+00,2022-06-15 15:35:31.787642+00,,False,2,...,,STRING,False,2,False,False,,/attribute/2/,,
2,1,2022-06-16 20:57:19.496668+00,981339989,3,,2022-06-15 15:34:11.692637+00,2022-06-15 15:34:18.613225+00,,False,3,...,,STRING,False,3,False,False,,/attribute/3/,,
3,1,2022-06-16 20:57:16.974033+00,3105227569,4,,2022-06-15 15:34:11.692637+00,2022-06-15 15:35:33.230795+00,,False,4,...,,,False,4,False,False,,/attribute/4/,,
4,1,2022-06-16 20:57:19.496882+00,1322979056,5,,2022-06-15 15:34:11.692637+00,2022-06-15 15:34:19.510486+00,,False,5,...,,,False,5,False,False,,/attribute/5/,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
15732,97,2022-12-27 18:01:50.486377+00,464785358,7950,,2022-09-07 20:13:09.600692+00,2022-09-07 20:13:09.600692+00,,False,7950,...,,STRING,False,2,False,False,,/attribute/7950/,,
15733,97,2022-12-27 18:01:50.486581+00,1460079435,7951,,2022-09-07 20:13:09.600692+00,2022-09-07 20:13:09.600692+00,,False,7951,...,,STRING,False,1,False,False,,/attribute/7951/,,
15734,97,2022-12-27 18:01:50.486814+00,1886981156,7952,,2022-09-07 20:13:09.600692+00,2022-09-07 20:13:09.600692+00,,False,7952,...,,STRING,False,1,False,False,,/attribute/7952/,,
15735,97,2022-12-27 18:01:50.486697+00,2507154400,7953,,2022-09-07 20:13:09.600692+00,2022-09-07 20:13:09.600692+00,,False,7953,...,,STRUCT,False,5,False,False,,/attribute/7953/,,


Note that the resulting dataframe can contain many more rows than the 1000 reported by the API in num_result_rows: here the index runs from 0 to 15735, that is, 15,736 rows.
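
One way to make this discrepancy visible is to count the rows in the downloaded text and compare the count with `num_result_rows`. A sketch using only the standard library. The three-row CSV string is toy data standing in for the real download `r`, and `count_csv_rows` is a hypothetical helper:

```python
import csv
import io


def count_csv_rows(csv_text):
    """Count data rows (excluding the header) in a CSV string."""
    # StringIO makes the string look like an open file, as with pd.read_csv above
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, if there is one
    return sum(1 for _ in reader)


toy_csv = "id,norm_type\n1,STRING\n2,STRING\n3,STRUCT\n"  # toy data, not real output
reported_rows = 2  # stand-in for num_result_rows from the execution event
actual_rows = count_csv_rows(toy_csv)
if actual_rows > reported_rows:
    print(f"Download has {actual_rows} rows, API reported {reported_rows}")
```

Using `csv.reader` rather than counting newline characters keeps the count correct when a quoted field contains a line break.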