### Intro to SQL Workshop Queries

This notebook contains the queries we wrote during the Intro to SQL workshop. 

In the workshop, we ran these through DB Browser for SQLite. Here, I'm running them through Python and pandas, but the queries themselves are transferable: you can copy and paste the SQL and it should run fine in DB Browser or other SQL tools (this portability is one of the benefits of SQL).

If you want to run this notebook, you'll need to create a `data` directory (at the same level as this notebook) containing a copy of the portal_mammals.sqlite file (see http://datacarpentry.org/sql-ecology-lesson/setup.html for instructions on getting this file).

In [None]:
import sqlite3
import pandas as pd

In [None]:
conn = sqlite3.connect("data/portal_mammals.sqlite")

In [None]:
# select columns
pd.read_sql_query("""
    SELECT species, genus, taxa FROM species 
""", conn)

In [None]:
## select rows 
pd.read_sql_query("""
    SELECT * FROM species WHERE taxa = 'Bird'
""", conn)

In [None]:
pd.read_sql_query("""
    SELECT DISTINCT taxa FROM species
""", conn)

### exercise:
- select everything from the surveys table
- select only the species_id and weight from the surveys table
- select the distinct species_id from the surveys table
- are there any species in the species table that aren't in the surveys table?
- select only rows with month = 12

In [None]:
pd.read_sql_query("""
    SELECT * FROM surveys
""", conn)

In [None]:
pd.read_sql_query("""
    SELECT species_id, weight FROM surveys
""", conn)

In [None]:
pd.read_sql_query("""
    SELECT * FROM surveys WHERE month = 12
""", conn)

In [None]:
pd.read_sql_query("""
    SELECT DISTINCT taxa FROM species
""", conn)

In [None]:
pd.read_sql_query("""
    SELECT DISTINCT species_id FROM surveys
""", conn)

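The exercise question about species that never appear in the surveys table has no answer cell above. One possible approach, sketched here on a tiny in-memory database with made-up rows (the table contents and the `demo` connection name are assumptions for illustration, not the Portal data), is to subtract one set of IDs from the other with EXCEPT:

```python
import sqlite3

# Hypothetical miniature versions of the species and surveys tables.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE species (species_id TEXT, taxa TEXT);
    CREATE TABLE surveys (record_id INTEGER, species_id TEXT);
    INSERT INTO species VALUES ('DM', 'Rodent'), ('DO', 'Rodent'), ('ZZ', 'Bird');
    INSERT INTO surveys VALUES (1, 'DM'), (2, 'DO'), (3, 'DM'), (4, NULL);
""")

# IDs that are present in species but absent from surveys.
missing = demo.execute("""
    SELECT species_id FROM species
    EXCEPT
    SELECT species_id FROM surveys
""").fetchall()
print(missing)
```

Against the workshop database, the same SQL string could be passed to `pd.read_sql_query` together with `conn`.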
In [None]:
## compound WHERE clause
pd.read_sql_query("""
    SELECT * 
    FROM surveys
    WHERE 
    (species_id = 'DS' OR species_id = 'DO' ) AND (hindfoot_length > 50 OR weight < 100)
    ORDER BY hindfoot_length DESC
""", conn)

### exercise:

- return all rows where month is 1 or 2 and plot_id is 8 or 9


In [None]:
## compound WHERE clause
pd.read_sql_query("""
    SELECT * 
    FROM surveys
    WHERE
        (month = 1 OR month = 2) AND (plot_id = 8 OR plot_id = 9)
""", conn)

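The parentheses in a compound WHERE clause matter because SQL evaluates AND before OR. The sketch below shows the difference on a small in-memory table with made-up rows (the data and the `demo` connection are assumptions for illustration only), and also shows IN as a shorter way to write the same filter:

```python
import sqlite3

# Hypothetical miniature surveys table.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE surveys (record_id INTEGER, month INTEGER, plot_id INTEGER);
    INSERT INTO surveys VALUES
        (1, 1, 8), (2, 1, 3), (3, 2, 9), (4, 2, 5), (5, 7, 8);
""")

# Grouped as intended: (month 1 or 2) and (plot 8 or 9).
with_parens = demo.execute("""
    SELECT record_id FROM surveys
    WHERE (month = 1 OR month = 2) AND (plot_id = 8 OR plot_id = 9)
    ORDER BY record_id
""").fetchall()

# Without parentheses, AND binds first, so this is read as:
# month = 1 OR (month = 2 AND plot_id = 8) OR plot_id = 9
without_parens = demo.execute("""
    SELECT record_id FROM surveys
    WHERE month = 1 OR month = 2 AND plot_id = 8 OR plot_id = 9
    ORDER BY record_id
""").fetchall()

# IN expresses the parenthesised version more compactly.
with_in = demo.execute("""
    SELECT record_id FROM surveys
    WHERE month IN (1, 2) AND plot_id IN (8, 9)
    ORDER BY record_id
""").fetchall()

print(with_parens, without_parens, with_in)
```

On this toy data the unparenthesised query also returns record 2 (month 1, plot 3), which the exercise does not ask for.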
In [None]:
# aggregations
# GROUP BY
# HAVING

# note: I commented out the requirement that the average hindfoot_length > 30
# remove the -- to include the HAVING clause
pd.read_sql_query("""
    SELECT species_id, AVG(hindfoot_length)
    FROM surveys
    WHERE hindfoot_length IS NOT NULL
    GROUP BY species_id
    --HAVING AVG(hindfoot_length) > 30
    ORDER BY AVG(hindfoot_length) DESC
""", conn)

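To see what the HAVING clause does once it is enabled, here is a minimal sketch on an in-memory table with made-up measurements (the rows and the `demo` connection are assumptions, not workshop data). WHERE filters individual rows before grouping; HAVING filters whole groups after the aggregate has been computed:

```python
import sqlite3

# Hypothetical miniature surveys table.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE surveys (species_id TEXT, hindfoot_length REAL);
    INSERT INTO surveys VALUES
        ('DO', 36), ('DO', 34), ('PF', 15), ('PF', 17), ('NL', 32), ('NL', NULL);
""")

rows = demo.execute("""
    SELECT species_id, AVG(hindfoot_length) AS avg_hindfoot
    FROM surveys
    WHERE hindfoot_length IS NOT NULL      -- row filter, applied before grouping
    GROUP BY species_id
    HAVING AVG(hindfoot_length) > 30       -- group filter, applied after aggregation
    ORDER BY avg_hindfoot DESC
""").fetchall()
print(rows)
```

Here the PF group (average 16) is dropped by HAVING, while DO and NL are kept.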



In [None]:
## aggregations
# GROUP BY
# SUM, AVG, MAX, MIN, COUNT
# HAVING

pd.read_sql_query("""
    SELECT species_id, COUNT(species_id)
    FROM surveys
    GROUP BY species_id
    ORDER BY COUNT(species_id) DESC
""", conn)



### exercise
- count the number of species in each taxa in the species table
- order by count in descending order

In [None]:
# note - using an alias for one of the column names

pd.read_sql_query("""
    SELECT taxa, COUNT(taxa) AS n_taxa
    FROM species
    GROUP BY taxa
    ORDER BY n_taxa DESC
""", conn)

### JOINS

- we can get the average hindfoot_length for each species_id
- what if I want it by genus, or taxa?

- start by JOINing
- then we'll aggregate


In [None]:
pd.read_sql_query("""
    SELECT * 
    FROM species
    JOIN surveys
    ON species.species_id = surveys.species_id
    JOIN plots
    ON surveys.plot_id = plots.plot_id
""", conn)

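Once the tables are joined, the aggregation step can group by a column from the species table. The following is a sketch of averaging hindfoot_length by genus, run on small in-memory tables with made-up rows (the data and the `demo` connection are assumptions for illustration):

```python
import sqlite3

# Hypothetical miniature species and surveys tables.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE species (species_id TEXT, genus TEXT, taxa TEXT);
    CREATE TABLE surveys (record_id INTEGER, species_id TEXT, hindfoot_length REAL);
    INSERT INTO species VALUES
        ('DM', 'Dipodomys', 'Rodent'),
        ('DO', 'Dipodomys', 'Rodent'),
        ('PF', 'Perognathus', 'Rodent');
    INSERT INTO surveys VALUES
        (1, 'DM', 36), (2, 'DO', 34), (3, 'PF', 16), (4, 'PF', 14);
""")

# Join first, then group the joined rows by genus.
by_genus = demo.execute("""
    SELECT species.genus, AVG(surveys.hindfoot_length) AS avg_hindfoot
    FROM surveys
    JOIN species
    ON surveys.species_id = species.species_id
    GROUP BY species.genus
    ORDER BY avg_hindfoot DESC
""").fetchall()
print(by_genus)
```

Swapping `species.genus` for `species.taxa` in both the SELECT and GROUP BY lines would give the average by taxa instead.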
### inner vs outer joins

Inner joins only include rows where both tables have a matching row.
Left outer joins include all rows from the left table, even if there is no match in the right table.

Recall there were a few species_id values in the species table that weren't in the surveys table.
We can use a LEFT OUTER JOIN to include them in the result, even though there is no match in the surveys table.
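The difference between the two join types is easiest to see on a tiny example. This sketch uses in-memory tables with made-up rows (the 'ZZ' species and the `demo` connection are hypothetical) and compares an inner join, a left join, and a left join filtered to the unmatched rows:

```python
import sqlite3

# Hypothetical miniature tables: 'ZZ' has no survey records.
demo = sqlite3.connect(":memory:")
demo.executescript("""
    CREATE TABLE species (species_id TEXT, genus TEXT);
    CREATE TABLE surveys (record_id INTEGER, species_id TEXT);
    INSERT INTO species VALUES ('DM', 'Dipodomys'), ('ZZ', 'Hypothetical');
    INSERT INTO surveys VALUES (1, 'DM'), (2, 'DM');
""")

# Inner join: only species with at least one matching survey row.
inner_rows = demo.execute("""
    SELECT species.species_id, surveys.record_id
    FROM species
    JOIN surveys ON species.species_id = surveys.species_id
""").fetchall()

# Left join: every species row, with NULL (None) where no survey matches.
left_rows = demo.execute("""
    SELECT species.species_id, surveys.record_id
    FROM species
    LEFT JOIN surveys ON species.species_id = surveys.species_id
""").fetchall()

# Keeping only the NULL side isolates species that were never surveyed.
unmatched = demo.execute("""
    SELECT species.species_id
    FROM species
    LEFT JOIN surveys ON species.species_id = surveys.species_id
    WHERE surveys.record_id IS NULL
""").fetchall()

print(len(inner_rows), len(left_rows), unmatched)
```

Note that the WHERE clause has to come before any ORDER BY clause in the query.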

In [None]:
pd.read_sql_query("""
    SELECT *
    FROM species
    LEFT JOIN surveys
    ON species.species_id = surveys.species_id
    --WHERE record_id IS NULL
    ORDER BY record_id
""", conn)