# SDSS SQL Tutorial
---

### Names: [Insert names here]

**Before you do anything else, go to *File -> Save As* and change the filename to include your name or initials. Make any requested edits to that copy.**

**New Code**

* Simple SQL query.
* Select objects
  * within a given range of RA and Dec.
  * based on object classification.
  * based on redshift.
  * based on numerical constraints (e.g., the color is above a certain value)
* Extract
  * RA, Dec
  * Magnitude at different bands
  * redshift
  * Object ID
* Return a sorted list of objects.
* Join tables.
* Access the data returned by an SQL query within Python.


In the first part of this lab, you were (briefly) introduced to the world of online astronomical databases and shown how to perform simple searches. The functionality of these databases extends well beyond what you saw in the previous lab. One of their most powerful features is the ability to select data that fit a set of criteria beyond a simple location on the sky. For example, you could select all of the bright, well-resolved, nearby galaxies in order to make your own Hubble classification diagram, or select all of the stars within a cluster (based on RA, Dec, and distance) in order to create an HR diagram of the cluster and hence measure its age.

These complex queries are not possible with the simple query tools introduced earlier. To perform these tasks you need to learn Structured Query Language (SQL), a programming language for querying databases. To learn SQL, you will work through the SDSS SQL tutorial:
([http://skyserver.sdss.org/dr14/en/help/howto/search/searchhowtohome.aspx](http://skyserver.sdss.org/dr14/en/help/howto/search/searchhowtohome.aspx)). 

The tutorial can be completed entirely in the pages linked above, but SQL queries can also be run from within Python. This notebook demonstrates how to submit SQL queries from Python. As you work through the online activity, record and execute your answers in the notebook below.

It can help to enter the commands in the SDSS pages first, and then copy them here once they are correct, because the SDSS pages return more helpful error messages than Python does.

### SQL Queries in Python

The next two cells install and load in the necessary packages.

In [None]:
!pip install git+https://github.com/astropy/astroquery.git#egg=astroquery

In [None]:
# First, suppress some warnings and import useful packages
import warnings
warnings.filterwarnings('ignore',module='astropy.io.votable.tree')
warnings.filterwarnings('ignore',message='.*unclosed..socket')

import astropy.units as u
from astropy.coordinates import SkyCoord, ICRS
from astroquery.sdss import SDSS #package that allows queries of the SDSS database


The SQL query is recorded in a string variable, and then this string is sent to SDSS using the `query_sql` function.

In [None]:
# First Query

# input the query as a string, and then submit the string to SDSS
query_string = '''select ra,dec
from specObj
where ra BETWEEN 140 and 141 AND
dec BETWEEN 20 and 21'''

data = SDSS.query_sql(query_string,verbose=False)

# We can now print the results
print(data)
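Because the query is an ordinary Python string, it can also be assembled from variables, which is handy when you want to repeat the same search over different patches of sky. The following is a minimal sketch of that idea; the helper name `box_query` and its arguments are made up for illustration and are not part of astroquery or the SDSS tutorial.

```python
def box_query(ra_min, ra_max, dec_min, dec_max,
              columns=('ra', 'dec'), table='specObj'):
    """Build an SQL string selecting `columns` inside an RA/Dec box.

    Hypothetical helper for illustration; coordinates are in degrees.
    """
    column_list = ', '.join(columns)
    return (f'select {column_list}\n'
            f'from {table}\n'
            f'where ra between {ra_min} and {ra_max} and\n'
            f'dec between {dec_min} and {dec_max}')


# Reproduce the first query, then inspect the string before submitting it
query_string = box_query(140, 141, 20, 21)
print(query_string)
```

The resulting string can be passed to `SDSS.query_sql` exactly like a hand-written one. Printing it first makes it easy to paste into the SDSS pages if the query fails.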

The result is an astropy `Table` object ([http://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table](http://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table)), which is similar to a dictionary.

In [None]:
# The column names
print(data.keys())

# Access individual columns
print(data['ra'][:5])

From here you can complete the rest of the tutorial, recording your answers in the cells below. The examples have also been copied here for your records. The tutorial covers a lot of material related to SQL, but we will only cover the beginning of the tutorial.


> **Practice 1:** What objects has the SDSS seen in a smaller area of the sky near ra = 140.5, dec = 20.5 (the same area you searched in the previous query)?
>
> Modify the previous query so it will return ra and dec of objects where the ra is between 140.25 and 140.75 and dec is between 20.25 and 20.75. How many objects did the query return?

In [None]:
# Practice 1
query_string = '''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 2:** Which of the objects you found in Practice 1 are galaxies? Modify your query so that it returns the ra, dec, and the best object ID for galaxies (and only galaxies) whose ra is between 140.25 and 140.75 and whose dec is between 20.25 and 20.75.

In [None]:
# Practice 2
query_string = '''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Logical Operators
query_string = '''select top 10
    z, ra, dec, bestObjID
from
    specObj
where
    class = 'galaxy'
    and z > 0.3
    and zWarning = 0
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Logical Operators
query_string = '''select top 10
    z, ra, dec, bestObjID
from 
    specObj
where
    (class = 'galaxy' or class = 'qso')
    and z > 0.3
    and zWarning = 0
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Mathematical Operators
# (in photoObj, type = 6 selects stars; u - g < 0.5 keeps only blue objects)
query_string = '''select top 10
    ra, dec, modelMag_u, modelMag_g, modelMag_r, modelMag_i, modelMag_z, objID
from 
    photoObj
where
    type = 6
    and modelMag_u - modelMag_g < 0.5
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)
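The sample queries above return rows in whatever order the database happens to produce them. To get a sorted list, SQL provides an `order by` clause, with `desc` for descending order. The sketch below adapts the first logical-operators sample; combined with `top 10`, the sort is applied first, so the query should return the ten highest-redshift matches rather than an arbitrary ten. The submission is wrapped in a small helper (a name invented here) so that the cell can be read and run without a network connection.

```python
# Same selection as the first sample query, but sorted by redshift
sorted_query = '''select top 10
    z, ra, dec, bestObjID
from
    specObj
where
    class = 'galaxy'
    and z > 0.3
    and zWarning = 0
order by
    z desc
'''


def run_query(query):
    """Submit a query to SDSS (requires astroquery and network access)."""
    from astroquery.sdss import SDSS
    return SDSS.query_sql(query, verbose=False)


# Uncomment to submit the query:
# data = run_query(sorted_query)
# print(data)
print(sorted_query)
```

Leaving out `desc` (or writing `asc`) sorts from smallest to largest instead.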


> **Practice 3:** What are the reddest galaxies in the area of the sky near ra=141?
>
> Write a query to search for galaxies between ra = 140.9 and ra = 141.1 brighter than g=18.0 for which u-g>2.2. Retrieve the Object ID, ra, dec, and the five final magnitudes.

In [None]:
# Practice 3
query_string = '''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 4:** What are the highest-redshift quasars in the SDSS database?
>
> Write a query to search for quasars for which we have obtained spectra (search the specObj table) with redshifts greater than 4.5 and good measurements (zWarning=0). Retrieve each quasar's Photo ID, ra, dec, and redshift.

[Note: Redshift, indicated by the symbol $z$, refers to the Doppler shift of distant objects. Due to the expansion of the universe, nearly all galaxies and distant objects appear to be moving away from us, causing their light to be redshifted, with larger redshifts corresponding to larger distances.]
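To get a feel for the numbers, the sketch below converts a few redshifts to velocities. It shows the low-redshift approximation $v = cz$ next to the special-relativistic Doppler formula, $v/c = [(1+z)^2 - 1]/[(1+z)^2 + 1]$. Both function names are invented for this illustration. For the quasars in Practice 4, $v = cz$ would exceed the speed of light, which is a sign that the approximation only holds for $z \ll 1$; at such redshifts the shift is really a cosmological effect, so even the Doppler value should be read as illustrative.

```python
C_KM_S = 299792.458  # speed of light in km/s


def velocity_low_z(z):
    """Velocity in km/s from v = c z (reasonable only for z << 1)."""
    return C_KM_S * z


def velocity_doppler(z):
    """Velocity in km/s from the special-relativistic Doppler formula."""
    ratio = (1.0 + z) ** 2
    return C_KM_S * (ratio - 1.0) / (ratio + 1.0)


# Compare the two estimates for a nearby galaxy, a z = 0.3 galaxy,
# and a high-redshift quasar
for z in (0.004, 0.3, 4.5):
    print(z, round(velocity_low_z(z)), round(velocity_doppler(z)))
```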

In [None]:
# Practice 4
query_string = '''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Joining
query_string = '''select top 100
    x.plate, x.mjd,
    s.fiberID,
    p.modelMag_u, p.modelMag_g, p.modelMag_r, p.modelMag_i, p.modelMag_z,
    p.ra, p.dec,
    s.z, p.objID
from photoObj p
join specObj s on s.bestObjID = p.objID
join plateX x on x.plateID = s.plateID
where
    s.class = 'qso'
    and s.zWarning = 0
    and s.z between 0.3 and 0.4
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 5:** How can you look up image data, plates, and spectra of moderately bright galaxies?
>
> Write a query to find 100 galaxies for which we have spectra that have g magnitude between 17 and 17.4 and redshift less than 0.05. For each galaxy retrieve the object ID, the five magnitudes, the redshift, the plate/MJD number, and the fiber number.

In [None]:
# Practice 5
query_string = '''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

### Putting it all together

We have seen how to collect information from different databases. Now let's put it together.

> **Q:** Let's examine the Virgo galaxy cluster by estimating its total mass from its velocity dispersion and by looking at the colors of its galaxies. To calculate the mass, use the virial theorem, $M=2Rv^2/G$, where $R$ is the radius of the cluster (assume $R$ = 7.5 million light years), $G$ is the gravitational constant, and $v$ is the standard deviation of the velocities, which you can derive from the standard deviation of the redshifts $z$ ($z=v/c$, where $c$ is the speed of light). Also, make a plot of g-i vs i for the galaxies within the Virgo cluster to show how galaxy color changes with brightness.
>
> Query the SDSS database for all galaxies within 5 degrees of the center of the Virgo cluster, with a redshift within 50% of the average value for the Virgo cluster. You will need to look up the position of the Virgo cluster and its average redshift. From the SDSS database, extract the values you need to calculate the cluster mass and make the color-magnitude diagram.

In cgs units, 1 light year = 9.5x10<sup>17</sup>cm, G = 6.67x10<sup>-8</sup> cm<sup>3</sup>/g/s<sup>2</sup>, c=2.99x10<sup>10</sup>cm/s, and the mass of the Sun is 1.99x10<sup>33</sup> g.
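The unit bookkeeping is the easiest place to slip, so here is a minimal sketch of the calculation on a handful of made-up redshifts (not real Virgo data). It assumes the low-redshift relation $v = cz$, so that the velocity dispersion is $c$ times the standard deviation of the redshifts, and uses the constants quoted above. The function name `virial_mass_solar` is invented for this example.

```python
import statistics

# Constants in cgs units, as quoted above
LIGHT_YEAR_CM = 9.5e17   # cm per light year
G_CGS = 6.67e-8          # cm^3 / g / s^2
C_CGS = 2.99e10          # cm / s
M_SUN_G = 1.99e33        # g


def virial_mass_solar(redshifts, radius_ly):
    """Virial mass M = 2 R v^2 / G in solar masses.

    redshifts : sequence of redshifts for the cluster members
    radius_ly : cluster radius in light years
    """
    sigma_z = statistics.stdev(redshifts)   # sample standard deviation
    sigma_v = C_CGS * sigma_z               # velocity dispersion in cm/s
    radius_cm = radius_ly * LIGHT_YEAR_CM
    mass_g = 2.0 * radius_cm * sigma_v ** 2 / G_CGS
    return mass_g / M_SUN_G


# Made-up redshifts purely to exercise the function
fake_redshifts = [0.0030, 0.0035, 0.0040, 0.0045, 0.0050]
mass = virial_mass_solar(fake_redshifts, radius_ly=7.5e6)
print(f'{mass:.2e} solar masses')
```

With real data, the list of redshifts would come from the `z` column of your query result, for example `list(data['z'])`.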

In [None]:
query_string ='''

'''
data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
### Calculate the mass of the cluster here.

In [None]:
### Plot g-i vs i here.

#### To turn in this lab, either email me (kmf4) a copy of this notebook, or place your copy in the Shared/Astro211_F25/Lab1 folder on the Astro server.

SQL is a common language for querying all kinds of databases (e.g., the office of campus relations uses SQL when querying the Williams alumni database). The International Virtual Observatory Alliance supports the Astronomical Data Query Language (ADQL), which is used by, e.g., Gaia ([https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook](https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook)). Since ADQL is built on SQL, much of the syntax is the same.
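As a rough illustration of how close the two languages are, the sketch below places the first SDSS query from this notebook next to an ADQL query over the same patch of sky. The table name `gaiadr1.gaia_source` and the column `phot_g_mean_mag` are assumptions based on the Gaia DR1 archive; check the archive's schema before relying on them. Neither query is submitted here.

```python
# SDSS SQL, as used earlier in this notebook (limited to 10 rows)
sdss_sql = '''select top 10 ra, dec
from specObj
where ra between 140 and 141 and
dec between 20 and 21'''

# ADQL for the Gaia archive; table and column names are assumptions
gaia_adql = '''SELECT TOP 10 ra, dec, phot_g_mean_mag
FROM gaiadr1.gaia_source
WHERE ra BETWEEN 140 AND 141 AND
dec BETWEEN 20 AND 21'''

# The select / from / where structure is the same in both;
# only the table and column names change.
for label, query in (('SDSS SQL', sdss_sql), ('Gaia ADQL', gaia_adql)):
    print(label)
    print(query)
    print()
```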