# Example: How to cross-match data with SkyQuery

SciServer Compute can talk to other components of SciServer through a series of <em>modules</em>, one for each component. This example notebook shows how to use the <strong><code>SciServer.SkyQuery</code></strong> module to perform sophisticated cross-matches among astronomical datasets.

You are welcome (encouraged!) to copy these examples into another folder and modify them to meet your needs. You can use them as a starting point to create your own scripts. Please do not edit this notebook directly, because your edits may be overwritten if changes to the SciServer modules require changes to these example notebooks.

To run the example Python scripts in this notebook, click in any of the Code cells below (the ones with the gray backgrounds). Click the play button at the top of the window (just below the menubar) to run the script, or press Shift-Enter. The output of each cell's script will appear directly below the cell.

## Import modules

Like any Python modules, the SciServer modules must be imported before being used. The next code block first imports the SciServer modules you will need for this example notebook, then imports some other required modules. Comments in the code block explain what each module does. To learn how to import other modules, see the Python 3.5 import documentation (https://docs.python.org/3.5/reference/import.html), or the documentation of the module(s) you are trying to import.

In [None]:
from SciServer import SkyQuery          # work with SkyQuery through Compute
print('Imported SciServer modules')

from pprint import pprint               # print data structures in readable format (https://docs.python.org/3.5/library/pprint.html)
print('Imported other needed modules')

## List available datasets

The function <strong><code>SkyQuery.listAllDatasets()</code></strong> (takes no parameters) lists all the datasets available to you through SkyQuery.

The function returns a list of dictionaries, one for each available dataset. You can include any of these datasets in a cross-match by referring to its <code>name</code>.

In [None]:
datasets = SkyQuery.listAllDatasets()
pprint(datasets)
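
Because the result is a list of dictionaries, a short comprehension is enough to pull out just the dataset names. The sketch below is a minimal illustration that runs on hand-written sample data rather than a live SkyQuery call. The helper `dataset_names` and the sample entries are hypothetical; it assumes only that each dictionary carries a `name` key, as described above.

```python
def dataset_names(datasets):
    """Return the sorted 'name' values from a list of dataset dictionaries."""
    # Skip any entry without a 'name' key instead of raising KeyError.
    return sorted(d["name"] for d in datasets if "name" in d)


# Hypothetical sample shaped like the list described above (not real SkyQuery output).
sample_datasets = [{"name": "SDSSDR12"}, {"name": "MyDB"}, {"description": "no name"}]
print(dataset_names(sample_datasets))
```

In a live session you could pass the `datasets` list from the previous cell instead of `sample_datasets`.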

## List the contents of your MyDB

The function <strong><code>SkyQuery.getDatasetInfo(datasetName)</code></strong> returns the contents of a specified dataset. You can use it to show the contents of any of the datasets you found in the last code cell; here, we run it with <code>MyDB</code> to show the contents of your MyDB.

The function returns a list of dictionaries, one for each table in the specified dataset. Each table has a <strong><code>name</code></strong> by which it can be referred to.

In [None]:
info = SkyQuery.getDatasetInfo("MyDB")
print(info)
print('\n')

tables = SkyQuery.listDatasetTables("MyDB")
pprint(tables)

## Manage jobs

Like CasJobs, SkyQuery manages its requests (both queries and cross-match requests) as <strong>jobs</strong>. You can submit a request using the <strong><code>SkyQuery.submitJob(query, queue)</code></strong> function.

When you submit a query or a cross-identification request to SkyQuery, it creates a job and assigns a <em>jobID</em> that you can use to check on the status of the query.

In [None]:
#Define query

SkyQuery_Query = "select 4.5 as Column1, 5.5 as Column2"

#submit a query as a job
jobId = SkyQuery.submitJob(query=SkyQuery_Query, queue="quick")
print(jobId)

In [None]:
#get status of a submitted job

jobId = SkyQuery.submitJob(query=SkyQuery_Query, queue="quick")
jobDescription = SkyQuery.getJobStatus(jobId=jobId)
pprint(jobDescription)

In [None]:
# wait for a job to be finished and then get the status

jobId = SkyQuery.submitJob(query=SkyQuery_Query, queue="quick")
jobDescription = SkyQuery.waitForJob(jobId=jobId, verbose=True)

print('\n')
print("jobDescription:")

pprint(jobDescription)
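
`SkyQuery.waitForJob` handles the waiting for you. To illustrate the general idea of waiting on a job, the sketch below polls a status function until it reports a finished state. Everything in it is an assumption made for illustration: `wait_until_done`, the `get_status` callable, and the status strings are hypothetical and do not reflect SkyQuery's actual job description format.

```python
import time


def wait_until_done(get_status, poll_seconds=1.0, max_polls=60,
                    done_states=("completed", "canceled", "failed")):
    """Call get_status() repeatedly until it returns one of done_states.

    get_status is any zero-argument callable returning a status string
    (hypothetical; not the SkyQuery API). Raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in done_states:
            return status
        time.sleep(poll_seconds)  # pause between polls to avoid hammering the service
    raise TimeoutError("job did not finish within the allowed number of polls")


# Stand-in for a real status call: reports 'executing' twice, then 'completed'.
fake_statuses = iter(["executing", "executing", "completed"])
final_status = wait_until_done(lambda: next(fake_statuses), poll_seconds=0.01)
print(final_status)
```

Capping the number of polls keeps a stuck job from blocking the notebook indefinitely.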

In [None]:
# cancel a job that is running, and then get its status to prove it is cancelled

jobId = SkyQuery.submitJob(query=SkyQuery_Query, queue="long")
SkyQuery.cancelJob(jobId)

print("job status:")
pprint(SkyQuery.getJobStatus(jobId=jobId))

In [None]:
#list available job queues

queueList = SkyQuery.listQueues()
pprint(queueList)

In [None]:
# Get a list of all the jobs you have run. Returns two dictionaries: one for the Quick queue and one for the Long queue.

quickJobsList = SkyQuery.listJobs('quick')
longJobsList = SkyQuery.listJobs('long')
print('Quick Jobs:')
pprint(quickJobsList)
print('\n')
print('Long jobs:')
pprint(longJobsList)

In [None]:
#define a csv table to be uploaded into your MyDB in SkyQuery

SkyQuery_TestTableName = "TestTable_SkyQuery"
SkyQuery_TestTableCSV = u"Column1,Column2\n15.5,16.5\n"

print('Test case variables set.')
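
The test table above is written out by hand as a CSV string. For larger tables it may be easier to build that string from Python rows. The following is a minimal sketch using only the standard library; `rows_to_csv` is a hypothetical helper, not part of the SciServer modules.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header row and data rows to CSV text with newline line endings."""
    buffer = io.StringIO()
    # csv.writer defaults to '\r\n'; use '\n' to match the hand-written string above.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(["Column1", "Column2"], [[15.5, 16.5]])
print(repr(csv_text))
```

The resulting string has the same form as `SkyQuery_TestTableCSV` and could be passed as `uploadData` in the same way.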

In [None]:
# Upload your sample CSV data. Returns True.
# This works only if the table you specified does not already exist.

result = SkyQuery.uploadTable(uploadData=SkyQuery_TestTableCSV, tableName=SkyQuery_TestTableName, datasetName="MyDB", format="csv")
print(result)
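
Since the upload works only when the table does not already exist, it can help to check a table listing first. The sketch below runs on hand-written sample data; `table_exists` is a hypothetical helper. It assumes each entry is a dictionary with a `name` key, as described earlier. Whether those names carry a schema prefix such as `webuser.` is also an assumption, so the helper compares only the last dot-separated part.

```python
def table_exists(tables, table_name):
    """Return True if table_name matches the 'name' of any entry in tables.

    Names are compared on their last dot-separated part, so 'webuser.T'
    and 'T' are treated as the same table (an assumption for illustration).
    """
    wanted = table_name.split(".")[-1]
    return any(t.get("name", "").split(".")[-1] == wanted for t in tables)


# Hypothetical sample shaped like a table listing (not real SkyQuery output).
sample_tables = [{"name": "webuser.TestTable_SkyQuery"}, {"name": "webuser.Other"}]
print(table_exists(sample_tables, "TestTable_SkyQuery"))
print(table_exists(sample_tables, "Missing"))
```

In a live session, the list returned by `SkyQuery.listDatasetTables("MyDB")` would take the place of `sample_tables`, provided its entries match the assumed shape.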

In [None]:
# Downloading the contents of a table in your MyDB (returns a pandas dataframe)

table = SkyQuery.getTable(tableName=SkyQuery_TestTableName, datasetName="MyDB", top=10)
table

In [None]:
#list tables inside dataset

tables = SkyQuery.listDatasetTables("MyDB")
pprint(tables)

In [None]:
#get info for a specified table:

info = SkyQuery.getTableInfo(tableName="webuser." + SkyQuery_TestTableName, datasetName="MyDB")
pprint(info)

In [None]:
#get dataset table columns info

columns = SkyQuery.listTableColumns(tableName="webuser." + SkyQuery_TestTableName, datasetName="MyDB")
pprint(columns)

In [None]:
#drop (or delete) table from dataset.

result = SkyQuery.dropTable(tableName=SkyQuery_TestTableName, datasetName="MyDB")
print(result)