# SDSS SQL Tutorial
---

In the first part of this lab, you were (briefly) introduced to online astronomical databases and shown how to perform simple searches. These databases can do far more than what you saw in the previous lab. One of their most powerful features is the ability to select data that fit a set of criteria beyond a simple location on the sky. For example, you could select all of the bright, well-resolved, nearby galaxies to make your own Hubble classification diagram, or select all of the stars within a cluster (based on RA, Dec, and distance) to create an HR diagram of the cluster and hence measure its age.

These complex queries are not possible with the simple query tools introduced earlier. To perform them you need to learn the Structured Query Language (SQL), a language designed for querying databases. To learn SQL, you will work through the SDSS SQL tutorial ([http://skyserver.sdss.org/dr14/en/help/howto/search/searchhowtohome.aspx](http://skyserver.sdss.org/dr14/en/help/howto/search/searchhowtohome.aspx)).

The tutorial can be completed entirely online, but SQL queries can also be run from within Python, and this notebook demonstrates how. As you work through the online tutorial, record and execute your answers in the notebook below.

### SQL Queries in Python

The next two cells show how to execute the initial SQL query in the tutorial. 

In [None]:
# First, suppress some warnings and import useful packages
import warnings
warnings.filterwarnings('ignore',module='astropy.io.votable.tree')
warnings.filterwarnings('ignore',message='.*unclosed..socket')

import astropy.units as u
from astropy.coordinates import SkyCoord, ICRS
from astroquery.sdss import SDSS #package that allows queries of the SDSS database. 

The SQL query is stored in a string variable, and this string is then sent to SDSS using the `query_sql` function.

In [None]:
# First Query

# input the query as a string, and then submit the string to SDSS
query_string='''select ra,dec
from specObj
where ra BETWEEN 140 and 141 AND
dec BETWEEN 20 and 21'''

data = SDSS.query_sql(query_string,verbose=False)

# We can now print the results
print(data)
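
Because the query is just a Python string, it can also be assembled from variables, which is convenient when you want to rerun the same search over a different patch of sky. The helper below is a minimal sketch: `box_query` is a hypothetical name introduced here (it is not part of `astroquery`), and it only builds the string without contacting SDSS.

```python
def box_query(ra_min, ra_max, dec_min, dec_max, table="specObj"):
    """Return an SQL string selecting ra, dec inside a rectangular box.

    Hypothetical helper for illustration; it does not contact SDSS.
    """
    return (
        f"select ra,dec\n"
        f"from {table}\n"
        f"where ra BETWEEN {ra_min} and {ra_max} AND\n"
        f"dec BETWEEN {dec_min} and {dec_max}"
    )


# Rebuild the first query of this notebook from its four limits
query_string = box_query(140, 141, 20, 21)
print(query_string)
```

The resulting string can be passed to `SDSS.query_sql` in the same way as the hand-written query above.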

The result is an astropy `Table` object [(http://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table)](http://docs.astropy.org/en/stable/api/astropy.table.Table.html#astropy.table.Table), which behaves much like a dictionary: each column can be accessed by its name.

In [None]:
# The column names
print(data.keys())

# Access individual columns
print(data['ra'][:5])

From here you can complete the rest of the tutorial, recording your answers in the cells below. The examples have also been copied here for your records. While this tutorial does cover a lot of material related to SQL, you only need to go as far as you can before the lab ends (you can save the rest for another time). 

> **Practice 1**: What objects has the SDSS seen in a smaller area of the sky near ra = 140.5, dec = 20.5 (the same area you searched in the previous query)?
>
> Modify the previous query so it will return ra and dec of objects where the ra is between 140.25 and 140.75 and dec is between 20.25 and 20.75. How many objects did the query return?

In [None]:
# Practice 1
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 2**: Which of the objects you found in Practice 1 are galaxies? Modify your query so that it returns the ra, dec, and the best object ID for galaxies (and only galaxies) whose ra is between 194.25 and 194.75 and whose dec is between 2.25 and 2.75.

In [None]:
# Practice 2
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Logical Operators
query_string='''select top 10
    z, ra, dec, bestObjID
from
    specObj
where
    class = 'galaxy' 
    and z > 0.3 
    and zWarning = 0
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Logical Operators
query_string='''select top 10
    z, ra, dec, bestObjID
from
    specObj
where
    (class = 'galaxy' or class = 'qso')
    and z > 0.3
    and zWarning = 0
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)
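
The parentheses in the query above matter because SQL evaluates `and` before `or`. The sketch below illustrates this on a small made-up table using Python's built-in `sqlite3` module rather than SDSS itself. The table `toy_spec` and its rows are invented for illustration, and SQLite differs from the SDSS server in some details (for example, it has no `top` keyword).

```python
import sqlite3

# In-memory toy database; nothing here talks to SDSS
conn = sqlite3.connect(":memory:")
conn.execute("create table toy_spec (class text, z real, zWarning integer)")
conn.executemany("insert into toy_spec values (?, ?, ?)", [
    ("galaxy", 0.10, 0),
    ("galaxy", 0.45, 0),
    ("qso", 0.20, 0),
    ("qso", 1.50, 0),
    ("qso", 2.10, 4),
    ("star", 0.00, 0),
])

# Parentheses group the two classes before the other conditions apply
with_parens = conn.execute("""select class, z from toy_spec
where (class = 'galaxy' or class = 'qso') and z > 0.3 and zWarning = 0
""").fetchall()

# Without parentheses, the redshift and zWarning cuts apply to quasars only
without_parens = conn.execute("""select class, z from toy_spec
where class = 'galaxy' or class = 'qso' and z > 0.3 and zWarning = 0
""").fetchall()

print(len(with_parens), len(without_parens))
```

On this toy table the first query returns two rows and the second returns three, because the low-redshift galaxy slips through when the parentheses are dropped.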

In [None]:
# Sample Query: Logical Operators
query_string='''select top 10
    ra, dec, modelMag_u, modelMag_g, modelMag_r, modelMag_i, modelMag_z, objID
from
    photoObj
where
    type = 6
    and modelMag_u - modelMag_g < 0.5
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 3**: What are the reddest galaxies in the area of the sky near ra=141?
> 
> Write a query to search for galaxies between ra = 140.9 and ra=141.1 brighter than g=18.0 for which u-g>2.2. Retrieve the Object ID, ra, dec, and the five final magnitudes.

In [None]:
# Practice 3
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 4**: What are the highest-redshift quasars in the SDSS database?
> 
> Write a query to search for quasars for which we have obtained spectra (search the specObj table) with redshifts greater than 4.5 and good measurements (zWarning = 0). Retrieve each quasar's Photo ID, ra, dec, and redshift.

In [None]:
# Practice 4
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Joining
query_string='''select top 100
    x.plate, x.mjd,
    s.fiberID,
    p.modelMag_u, p.modelMag_g, p.ModelMag_r, p.ModelMag_i, p.ModelMag_z,
    p.ra, p.dec,
    s.z, p.ObjID
from photoObj p
join specObj s on s.bestobjid = p.objid
join plateX x on x.plateID = s.plateID
where
    s.class = 'qso'
    and s.zwarning = 0
    and s.z between 0.3 and 0.4
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)
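
A join pairs up rows from two tables whenever the `on` condition is true, so only objects that appear in both tables survive. The sketch below shows this on two invented tables with Python's built-in `sqlite3` module. The names `toy_photo` and `toy_spec` and all of the values are made up, and the example stands in for SDSS rather than querying it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table toy_photo (objID integer, modelMag_g real)")
conn.execute("create table toy_spec (bestObjID integer, z real)")

# Three imaged objects, but only objects 1 and 3 have a spectrum
conn.executemany("insert into toy_photo values (?, ?)",
                 [(1, 17.2), (2, 18.9), (3, 16.5)])
conn.executemany("insert into toy_spec values (?, ?)",
                 [(1, 0.031), (3, 0.350)])

# Match each spectrum to its image through the shared object ID
joined = conn.execute("""select p.objID, p.modelMag_g, s.z
from toy_photo p
join toy_spec s on s.bestObjID = p.objID
order by p.objID
""").fetchall()
print(joined)
```

Object 2 is absent from the result because it has no matching row in `toy_spec`.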

> **Practice 5**: How can you look up image data, plates, and spectra of moderately bright galaxies?
> 
> Write a query to find 100 galaxies for which we have spectra that have g magnitude between 17 and 17.4 and redshift less than 0.05. For each galaxy retrieve the object ID, the five magnitudes, the redshift, the plate/MJD number, and the fiber number.

In [None]:
# Practice 5
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

---
You can **stop here**, or continue on if you have time.

---

In [None]:
# Sample Query: Aggregate functions
query_string='''select 
    min(dec) as min_dec, max(dec) as max_dec, avg(dec) as avg_dec
from
    photoObj
where
    run = 5112
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: The Group By command
query_string='''select 
    class, count(z) as num_redshift
from 
    specObj
where
    z between 0.5 and 1
group by
    class
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: The Order By command
query_string='''select mjd,plate
from
    plateX
where
    plate <= 275
order by mjd
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)
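
The three sample queries above can be tried offline on a toy table to see what each clause does. This is a minimal sketch using Python's built-in `sqlite3` module. The table `toy_spec` and its contents are invented, and the results say nothing about the real SDSS data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table toy_spec (class text, z real)")
conn.executemany("insert into toy_spec values (?, ?)", [
    ("galaxy", 0.6),
    ("galaxy", 0.8),
    ("qso", 0.7),
    ("qso", 2.5),
    ("star", 0.0),
])

# Aggregate functions collapse all selected rows into a single row
z_min, z_max, z_avg = conn.execute(
    "select min(z), max(z), avg(z) from toy_spec").fetchone()

# group by applies the aggregate once per class; order by sorts the output
counts = conn.execute("""select class, count(z) as num_redshift
from toy_spec
where z between 0.5 and 1
group by class
order by num_redshift desc
""").fetchall()

print(z_min, z_max, z_avg)
print(counts)
```

Only rows that pass the `where` clause are counted, so the star and the high-redshift quasar do not contribute to `counts`.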

> **Practice 7**: What are the northernmost and southernmost objects with spectra measured by the SDSS?

In [None]:
# Practice 7
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)

# We can now print the results
print(data)

> **Practice 8**: What is the redshift of the nearest galaxy whose spectrum was measured by the SDSS with high confidence (zWarning=0)? 
>
> Compare the distance you found to the distance to the Andromeda Galaxy (2 million light-years) and the Whirlpool Galaxy (37 million light-years). Does the distance you found seem reasonable?

In [None]:
# Practice 8
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
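# Assumes your query names the redshift column 'redshift' (e.g. "... as redshift").
# Dividing z by 7.11e-11 gives an approximate distance in light-years.
print(data['redshift'][0],data['redshift'][0]/(7.11e-11))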
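
The factor 7.11e-11 in the cell above is not explained in the tutorial. It appears to be the Hubble constant divided by the speed of light, expressed per light-year, so that dividing a small redshift by it gives a distance from Hubble's law, d = cz/H0. The sketch below makes that conversion explicit. The function name and the value H0 = 69.5 km/s/Mpc are assumptions, chosen because they roughly reproduce the factor.

```python
C_KM_S = 299792.458      # speed of light in km/s
LY_PER_MPC = 3.26156e6   # light-years in one megaparsec


def redshift_to_lightyears(z, h0=69.5):
    """Approximate distance in light-years from Hubble's law, d = c*z/H0.

    h0 is the Hubble constant in km/s/Mpc; 69.5 is an assumed value.
    Only meaningful for small redshifts (z much less than 1).
    """
    distance_mpc = C_KM_S * z / h0
    return distance_mpc * LY_PER_MPC


# H0/c expressed per light-year, for comparison with the 7.11e-11 divisor
h0_over_c_per_ly = 69.5 / C_KM_S / LY_PER_MPC
print(h0_over_c_per_ly)
print(redshift_to_lightyears(0.01))
```

Hubble's law is only an approximation, so treat the resulting distances as rough estimates.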

> **Practice 9**: What field has galaxies with the highest average redshifts in run=5112, camcol=1?
>
> Be sure you are searching fields (as run-camcol-field) for galaxies. Also look at how many spectrally measured galaxies are in the field - make sure you don't pick a field with only one or two galaxies! Also note that this query will probably take a long time to execute.

In [None]:
# Practice 9
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

---
You can **stop here**, or continue on if you have time.

---

In [None]:
# Sample Query: Functions
# Search for all galaxies (type = 3) within 5 arcminutes of ra=140, dec = 20
query_string='''SELECT
    p.ObjID, p.ra, p.dec, p.u, p.g, p.r, p.i, p.z
FROM photoObj p
JOIN dbo.fGetNearbyObjEq(140,20,5) n ON n.objID = p.objID
WHERE
    p.type = 3
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)
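
Conceptually, a search like the one above is a cone search: it keeps objects whose angular separation from the chosen position is smaller than the radius. The sketch below computes that separation in plain Python. The function `angular_separation_arcmin` is a hypothetical helper, and it illustrates the selection criterion only, not how SkyServer implements `fGetNearbyObjEq`.

```python
import math


def angular_separation_arcmin(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcminutes; all inputs are in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays accurate for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a))) * 60


# An offset of 5 arcminutes in dec sits right on the edge of the search cone
sep_dec = angular_separation_arcmin(140, 20, 140, 20 + 5 / 60)

# An offset of 0.1 degrees in ra is shortened by cos(dec) on the sky
sep_ra = angular_separation_arcmin(140, 20, 140.1, 20)

print(sep_dec, sep_ra)
```

The second case shows why a simple `between` box in ra and dec is not the same as a circular search: a fixed step in ra covers less sky at higher declination.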

In [None]:
# Sample Query: Flags
# Find English names for flags of all stars within 5 arcminutes of ra=140, dec=20
query_string='''SELECT
    p.ObjID, p.flags, dbo.fPhotoFlagsN(p.flags)
FROM photoObj p
JOIN dbo.fGetNearbyObjEq(140,20,5) n ON n.objID = p.objID
WHERE
    p.type = 6
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
# Sample Query: Flags
query_string='''SELECT
    p.ObjID, p.ra, p.dec, dbo.fPhotoFlagsN(p.flags)
FROM photoObj p
JOIN dbo.fGetNearbyObjEq(140,20,5) n ON n.objID = p.objID
WHERE
    (p.flags & dbo.fPhotoFlags('SATURATED')) = 0
'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)
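
The `flags` column packs many yes/no properties into one integer, one bit per property, and the `&` operator tests whether a given bit is set. The sketch below mimics that logic in plain Python. The bit assignments in `TOY_FLAGS` and the four objects are made up for illustration and are not the real SDSS values.

```python
# Made-up bit assignments for illustration only, not the real SDSS values
TOY_FLAGS = {"SATURATED": 1 << 0, "EDGE": 1 << 1, "BLENDED": 1 << 2}


def flag_names(flags):
    """Return the names of all toy flags set in the integer `flags`."""
    return [name for name, bit in TOY_FLAGS.items() if flags & bit]


# Four hypothetical objects and their packed flag values
objects = {"A": 0b000, "B": 0b001, "C": 0b110, "D": 0b101}

# Keep objects whose SATURATED bit is NOT set, mirroring (flags & bit) = 0
unsaturated = [name for name, flags in objects.items()
               if (flags & TOY_FLAGS["SATURATED"]) == 0]

print(unsaturated)
print(flag_names(objects["D"]))
```

Here `flag_names` plays the role that `dbo.fPhotoFlagsN` plays in the queries above, and the list comprehension plays the role of the `where` clause.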

> **Practice 10**: In the field 5112-6-119, what percentage of all objects detected by SDSS are too close to the edge of their fields to be trusted?
>
> Hint: Use two searches, one with a flag and one without. Search run=5112, camcol=6, field=119

In [None]:
#Practice 10
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

In [None]:
#Practice 10
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)
print(data)

> **Practice 11**: Choose a galaxy cluster from SkyServer's Famous Places tool. Write a query to select galaxies in the cluster, and only galaxies in the cluster.
>
> Hint: After you pick a cluster, use the Navigation Tool to examine the cluster. Guess which galaxies belong to the cluster - you should be able to tell just by looking. Click on 5-10 galaxies and save them in your online notebook. Open the notebook to look for features that the cluster galaxies have in common. Guess the center position and radius of the galaxies. Then, write a query that uses what you have learned to search for the cluster galaxies.

In [None]:
# Practice 11
query_string='''

'''

data = SDSS.query_sql(query_string,verbose=False)

# We can now print the results
print(data)

- Challenge 1: What percentage of galaxies have spectral redshifts measured? What percentage have photometric redshifts taken? What are the advantages of using photometric redshifts? Try to compare photometric and spectral redshifts; how accurate are photometric redshifts?
- Challenge 2: What are the limits in ra and dec of stripes 42 and 43, two of the SDSS's diagonal stripes?
- Challenge 3: Look at colors and spectra of stars, and find stars consistent with white dwarfs. Create a list of white dwarfs in the SDSS database.
- Challenge 4: What are the largest galaxies in the SDSS, in terms of size? Hint: Look at surface brightness and ellipticity.
- Challenge 5: Find all objects with spectra classified as unknown.
- Challenge 6: Find the broad absorption line (BAL) quasars in the SDSS database. At what redshift are most BAL quasars found?
- Challenge 7: Find variable stars in the SDSS (stars imaged more than once whose magnitude changed by more than 0.1 between observations). How variable are the stars?

SQL is a common language for querying databases. The International Virtual Observatory Alliance (IVOA) supports the Astronomical Data Query Language (ADQL), which is used by e.g. Gaia [https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook](https://www.gaia.ac.uk/data/gaia-data-release-1/adql-cookbook). Since ADQL is built on SQL, much of the syntax is the same.