### Query SDSS DR12 for interesting objects, build URLs for retrieving JPG cutouts, display the images, and store them on SciDrive, from which they can be served to the world via public URLs.

This demo showcases various components of SciServer and, in particular, how SciServer/Compute (i.e. Jupyter notebooks) allows one to communicate with them. Specifically:
1. Single-sign-on authentication through Keystone tokens
1. Importing special-purpose libraries written for SciServer actions
1. Querying relational databases registered in CasJobs (SciServer's database frontend and batch query engine)
1. Manipulating query results in python code (visualization)
1. Storing results on local scratch disk as an HDF5 file for later reuse
1. Alternatively, storing query result in one's private database, MyDB
1. Retrieving images from the persistent store based on the query result and showing them on screen. The store is available to the notebook because the Docker container was created with a link to the corresponding volume container.
1. Writing images to the shareable, Dropbox-like SciDrive, where they can be found through the UI and shared with colleagues.

# 2. Import SciServer libraries 
The SciServer team has written a number of libraries, generally prefixed by <tt>SciServer</tt>, that assist in various functions. The next block imports those, together with some standard python libraries. 

In [None]:
# SciServer libraries
import SciServer.CasJobs as cj
import SciServer.SciDrive as scid
import SciServer.SkyServer as skys
import SciServer.Authentication as auth
# external libraries
import numpy as np
import pandas
import matplotlib.pyplot as plt
import skimage.io
import json

In [None]:
# some special settings
# ensure column contents are shown in full in the notebook
# (None means "no limit"; pandas versions before 1.0 expected -1 here instead)
pandas.set_option('display.max_colwidth', None)
# do *not* show python warnings 
import warnings
warnings.filterwarnings('ignore')

# 3. Query an astronomy database (SDSS/DR12)
Write a SQL statement and send it to the CasJobs REST API. This example uses synchronous mode because the query is quite small. An asynchronous mode is also available: it submits the job to a queue and stores the result in a table in MyDB or MyScratch/DB.

TODO make example with batch query mode.
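
The asynchronous mode mentioned above follows a submit-then-poll pattern. The block below is only a minimal sketch of that pattern, not the CasJobs API: `FakeQueue`, `submit_job`, `get_status`, `wait_for_job` and the table name `GalaxyIds` are hypothetical stand-ins, chosen so the example runs without a SciServer login. The SciServer library is assumed to provide its own calls for submitting a job and waiting for it; consult its documentation for the actual names.

```python
import time


class FakeQueue:
    """Hypothetical stand-in for a batch query service (not the CasJobs API)."""

    def __init__(self, polls_until_done=3):
        self._polls_until_done = polls_until_done
        self._remaining = {}
        self._next_id = 1

    def submit_job(self, sql, context):
        # Register the job and return an identifier immediately,
        # without waiting for the query to finish.
        job_id = self._next_id
        self._next_id += 1
        self._remaining[job_id] = self._polls_until_done
        return job_id

    def get_status(self, job_id):
        # Report "running" a fixed number of times, then "finished".
        if self._remaining[job_id] > 0:
            self._remaining[job_id] -= 1
            return "running"
        return "finished"


def wait_for_job(queue, job_id, polling_interval=0.01, timeout=5.0):
    """Poll until the job finishes; raise TimeoutError if it takes too long."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = queue.get_status(job_id)
        if status == "finished":
            return status
        time.sleep(polling_interval)
    raise TimeoutError("job {} did not finish within {} s".format(job_id, timeout))


queue = FakeQueue()
# In batch mode the result is written INTO a table instead of being returned directly.
job_id = queue.submit_job(
    "SELECT TOP 16 p.objId INTO MyDB.GalaxyIds FROM galaxy AS p", "dr12"
)
final_status = wait_for_job(queue, job_id)
print(job_id, final_status)
```

The timeout keeps a stuck job from blocking the notebook indefinitely; for long-running queries both the polling interval and the timeout would need to be much larger.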

In [None]:
# query obtained from SkyServer interface
# Queries the Sloan Digital Sky Survey's Data Release 12.
# For schema and documentation see http://skyserver.sdss.org
#
# This query finds galaxies in the SDSS database that have a spectrum taken and which have a size (petror90_r)
# larger than 10 arcsec.
# We return the object ID, position (ra, dec) and petror90_r of at most 16 matching galaxies.
query="""
SELECT TOP 16 p.objId,p.ra,p.dec,p.petror90_r
  FROM galaxy AS p
   JOIN SpecObj AS s ON s.bestobjid = p.objid
WHERE p.u BETWEEN 0 AND 19.6
  AND p.g BETWEEN 0 AND 17
  AND p.petror90_r > 10
"""

# query CasJobs table. Using DR12 as context. I.e. a connection is made to DR12 when running the query.
gals = cj.executeQuery(query, "dr12")

In [None]:
# show the table
gals

# 4. Simple Plot

In [None]:
plt.scatter(gals['ra'], gals['dec'])
plt.show() 

# 5. Storing results on local scratch disk as an HDF5 file for later reuse
After running the next cell, view the folder <tt>persistent/science demos/</tt> in Jupyter to see the files.

In [None]:
# store result as HDF5 file 
h5store = pandas.HDFStore('GalaxyThumbSample.h5')
h5store['galaxies']=gals
h5store.close()

# store result as CSV file
gals.to_csv('GalaxyThumbSample.csv')

# 6. Retrieve cut-outs/thumbnails of galaxies, show them on screen, write them to SciDrive and retrieve public URLs
SkyServer has a service that produces a JPG cut-out of given dimensions around a specified position. It uses a pre-defined image pyramid for this. We can construct the URL of the service from the query results and retrieve the images using standard Python functions. We use the Petrosian radius, together with the pixel size and the desired width, to set an appropriate scale for each cut-out so as to produce a postage-stamp image of each object.
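
To make the scale arithmetic concrete, here is a minimal sketch that computes the scale the same way as the loop in the cell further down and assembles a cutout URL by hand. The base URL and the query parameter names are assumptions about the SkyServer ImgCutout service and should be checked against its documentation. `cutout_scale` and `cutout_url` are hypothetical helper names, and the coordinates and radius are illustrative values, not query results.

```python
from urllib.parse import urlencode

# Assumed endpoint of the DR12 cutout service; verify before relying on it.
CUTOUT_BASE = "http://skyserver.sdss.org/dr12/SkyserverWS/ImgCutout/getjpeg"
PIXEL_SIZE = 0.396  # same value as `pixelsize` in this notebook


def cutout_scale(petror90_r, width, pixelsize=PIXEL_SIZE):
    """Scale used in this notebook: 2 * radius / pixelsize / width."""
    return 2 * petror90_r / pixelsize / width


def cutout_url(ra, dec, petror90_r, width=200, height=200):
    """Build a cutout URL for one object (parameter names are assumptions)."""
    params = {
        "ra": ra,
        "dec": dec,
        "scale": round(cutout_scale(petror90_r, width), 4),
        "width": width,
        "height": height,
    }
    return CUTOUT_BASE + "?" + urlencode(params)


# Illustrative values: a galaxy with petror90_r = 19.8 arcsec.
scale = cutout_scale(19.8, 200)
url = cutout_url(ra=180.0, dec=0.5, petror90_r=19.8)
print(scale)
print(url)
```

Because the scale grows linearly with the Petrosian radius, each galaxy fills a similar fraction of its 200-pixel stamp regardless of its angular size.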

In [None]:
# create a container for the images in SciDrive
container = 'thumbnails_20171210e'
try:  # if the container does not exist, this request raises an exception; a bit of a hack...
    url = scid.publicUrl(container)
except Exception:
    scid.createContainer(container)

In [None]:
# get thumbnail cutout, write to scidrive and plot on screen.
# store public URLs for the scidrive images
width=200
height=200
pixelsize=0.396
plt.figure(figsize=(15, 15))
subPlotNum = 1


puburls=[]
tempfile='/home/idies/workspace/scratch/TMP.jpg'
for index,gal in gals.iterrows():
    scale=2*gal['petror90_r']/pixelsize/width

    # perform image cutout using skyserver 
    img=skys.getJpegImgCutout(ra=gal['ra'], dec=gal['dec'], scale=scale, width=width, height=height)
    
    # save the image to a JPG file in SciDrive
    # TODO find out what data type to use to avoid having to write to/read from a scratch file.
    # Preferable would be something like the next line, but this particular solution does not work:
    #    scid.upload(scidrivename_name, data=img) 
    skimage.io.imsave(tempfile,img)
    scidrivename_name = container+"/new_"+str(index)+".jpg"
    scid.upload(scidrivename_name, localFilePath=tempfile) 
        
    # store the public URL of the newly uploaded SciDrive file
    puburls.append(scid.publicUrl(scidrivename_name))

    plt.subplot(4,4,subPlotNum)
    subPlotNum += 1
    plt.imshow(img)
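    # show the object identifier (objId) above the image.
    # iterrows() may upcast the row to float, which would round the 64-bit objId,
    # so read the value from the DataFrame instead.
    plt.title(str(gals.at[index, 'objId']))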

gals['pubURL']=puburls


### Check after ... http://www.scidrive.org

In [None]:
print(puburls[0])

# 7. Store result in MyDB table
### Check before ...  http://skyserver.sdss.org/CasJobs/MyDB.aspx

In [None]:
# add column with public urls to the galaxies table ...
gals['pubURL']=puburls
gals

In [None]:
# create table in MyDB. Delete it first if it already exists
thumbsTable='GalaxyThumbs'
ddl = """
IF EXISTS (select * from information_schema.tables where table_name='{0}') 
BEGIN
    DROP TABLE {0}
END
CREATE TABLE {0}(objId bigint, ra real, dec real, petror90_r real, pubURL varchar(128))""".format(thumbsTable)

response = cj.executeQuery(ddl)

In [None]:
# upload directly from the DataFrame
response=cj.uploadPandasDataFrameToTable(gals,thumbsTable)


### Check after ...  http://skyserver.sdss.org/CasJobs/MyDB.aspx
