# SQL | Subqueries
---
## Concepts
- SQL is a declarative programming language. Its designers want users to focus on expressing computations rather than on naming variables.
- A subquery is a query nested within another query and must always be contained within parentheses.
- A subquery can appear in the __`SELECT`__, __`FROM`__, or __`WHERE`__ clause.
- We can use __`IN` to specify a list of values__ we want to match against. The list can be written out literally or returned by a subquery.
- When writing queries that have subqueries, we'll want to write our inner queries first.
- Conceptually, a subquery like the ones in this notebook is evaluated first, and the outer query then uses its result.
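
The placements listed above can be compared side by side in a small sketch. It assumes an in-memory SQLite database holding a made-up four-row `recent_grads` table (not the real `jobs.db` data). It also uses the standard-library `sqlite3` cursor directly instead of pandas. The rows and the `cat_total`, `cat_sums`, and `diff` aliases are illustrative only.

```python
import sqlite3

# In-memory database with a tiny, made-up table (not the real jobs.db data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("A", "Engineering", 100),
        ("B", "Engineering", 300),
        ("C", "Arts", 200),
        ("D", "Health", 600),
    ],
)

# WHERE: the inner query returns a single value (the average Total, 300.0 here).
where_rows = conn.execute(
    """SELECT Major FROM recent_grads
       WHERE Total > (SELECT AVG(Total) FROM recent_grads)
       ORDER BY Major"""
).fetchall()

# FROM: the inner query's result set is queried like a table.
from_rows = conn.execute(
    """SELECT Major_category, cat_total
       FROM (SELECT Major_category, SUM(Total) AS cat_total
             FROM recent_grads
             GROUP BY Major_category) AS cat_sums
       ORDER BY cat_total DESC"""
).fetchall()

# SELECT: a scalar subquery supplies the same value to every output row.
select_rows = conn.execute(
    """SELECT Major, Total - (SELECT AVG(Total) FROM recent_grads) AS diff
       FROM recent_grads
       ORDER BY Major"""
).fetchall()

# IN with a literal list, then IN with a subquery that returns one column.
in_literal = conn.execute(
    """SELECT Major FROM recent_grads
       WHERE Major_category IN ('Arts', 'Health')
       ORDER BY Major"""
).fetchall()
in_subquery = conn.execute(
    """SELECT Major FROM recent_grads
       WHERE Major_category IN (SELECT Major_category FROM recent_grads WHERE Total >= 300)
       ORDER BY Major"""
).fetchall()

conn.close()
print(where_rows, from_rows, select_rows, in_literal, in_subquery, sep="\n")
```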

In [1]:
```python
# Import libraries and connect to the database
import sqlite3
import pandas as pd

conn = sqlite3.connect("jobs.db")
```

In [2]:
```python
q1 = "SELECT * FROM sqlite_master WHERE type='table'"
pd.read_sql_query(q1, conn)
```

| | type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|---|
| 0 | table | recent_grads | recent_grads | 2 | `CREATE TABLE "recent_grads" (\n"index" INTEGER...` |
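
The `sqlite_master` query lists each table together with the `CREATE` statement stored in its `sql` column. The sketch below shows one way to also list a table's column names, using SQLite's `PRAGMA table_info`. It assumes a throwaway in-memory database with a hypothetical two-column table.

```python
import sqlite3

# Throwaway in-memory database with a hypothetical two-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, ShareWomen REAL)")

# sqlite_master holds one row per schema object; its sql column stores the CREATE statement.
tables = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) for each column.
columns = [row[1] for row in conn.execute("PRAGMA table_info(recent_grads)")]

conn.close()
print(tables)
print(columns)
```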


### Instructions

Try to write a query that answers the following question using the SQL you've learned so far:
- Which rows are above the average for the __`ShareWomen`__ column?

In [3]:
```python
q1 = ''' SELECT * FROM recent_grads
    WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)
    LIMIT 5
    '''
pd.read_sql_query(q1, conn)
```

| | index | Rank | Major_code | Major | Major_category | Total | Sample_size | Men | Women | ShareWomen | ... | Part_time | Full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th | College_jobs | Non_college_jobs | Low_wage_jobs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 7 | 6202 | ACTUARIAL SCIENCE | Business | 3777 | 51 | 832 | 960 | 0.535714 | ... | 296 | 2482 | 308 | 0.095652 | 62000 | 53000 | 72000 | 1768 | 314 | 259 |
| 1 | 20 | 21 | 2102 | COMPUTER SCIENCE | Computers & Mathematics | 128319 | 1196 | 1837 | 2524 | 0.578766 | ... | 18726 | 70932 | 6884 | 0.063173 | 53000 | 39000 | 70000 | 68622 | 25667 | 5144 |
| 2 | 30 | 31 | 2410 | ENVIRONMENTAL ENGINEERING | Engineering | 4047 | 26 | 2639 | 3339 | 0.558548 | ... | 930 | 1951 | 308 | 0.093589 | 50000 | 42000 | 56000 | 2028 | 830 | 260 |
| 3 | 34 | 35 | 6107 | NURSING | Health | 209394 | 2554 | 21773 | 187621 | 0.896019 | ... | 40818 | 122817 | 8497 | 0.044863 | 48000 | 39000 | 58000 | 151643 | 26146 | 6193 |
| 4 | 38 | 39 | 2503 | INDUSTRIAL PRODUCTION TECHNOLOGIES | Engineering | 4631 | 73 | 528 | 1588 | 0.750473 | ... | 597 | 3242 | 129 | 0.028308 | 46000 | 35000 | 65000 | 1394 | 2454 | 480 |
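
The subquery is needed because `WHERE` filters individual rows before any aggregation happens, so an aggregate cannot appear there directly. The minimal sketch below assumes a made-up three-row table in an in-memory database. It shows that SQLite rejects `AVG()` placed directly in `WHERE`. The same aggregate wrapped in a subquery, however, becomes a single value that each row can be compared against.

```python
import sqlite3

# Made-up rows in an in-memory database; the mean ShareWomen is 0.5.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, ShareWomen REAL)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [("A", 0.2), ("B", 0.4), ("C", 0.9)],
)

# An aggregate used directly in WHERE is rejected when the statement is prepared.
try:
    conn.execute("SELECT Major FROM recent_grads WHERE ShareWomen > AVG(ShareWomen)").fetchall()
    aggregate_in_where_failed = False
except sqlite3.OperationalError:
    aggregate_in_where_failed = True

# Wrapped in a subquery, the aggregate becomes one value to compare each row against.
above_avg = conn.execute(
    """SELECT Major FROM recent_grads
       WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)"""
).fetchall()

conn.close()
print(aggregate_in_where_failed, above_avg)
```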


### Instructions

1. Write a query that returns the majors that are below the average for __`Unemployment_rate`__. The results should:
- only contain the __`Major`__ and __`Unemployment_rate`__ columns
- be sorted in ascending order by __`Unemployment_rate`__

In [4]:
```python
q1 = ''' SELECT Major, Unemployment_rate
    FROM recent_grads
    WHERE Unemployment_rate < (SELECT AVG(Unemployment_rate) FROM recent_grads)
    ORDER BY Unemployment_rate
    '''
pd.read_sql_query(q1, conn)
```

| | Major | Unemployment_rate |
|---|---|---|
| 0 | MATHEMATICS AND COMPUTER SCIENCE | 0.000000 |
| 1 | BOTANY | 0.000000 |
| 2 | SOIL SCIENCE | 0.000000 |
| 3 | EDUCATIONAL ADMINISTRATION AND SUPERVISION | 0.000000 |
| 4 | ENGINEERING MECHANICS PHYSICS AND SCIENCE | 0.006334 |
| ... | ... | ... |
| 82 | NATURAL RESOURCES MANAGEMENT | 0.066619 |
| 83 | MICROBIOLOGY | 0.066776 |
| 84 | FAMILY AND CONSUMER SCIENCES | 0.067128 |
| 85 | ADVERTISING AND PUBLIC RELATIONS | 0.067961 |
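
The concepts section says SQL favors expressing a computation over naming intermediate variables. The sketch below illustrates that contrast on made-up rows in an in-memory database. The two-step version stores the average in a Python variable (`avg_rate`, an illustrative name) and passes it back through a `?` placeholder. The subquery version does the same work in one statement. Both return the same rows.

```python
import sqlite3

# Made-up rows; the mean Unemployment_rate is 0.04.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Unemployment_rate REAL)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?)",
    [("A", 0.02), ("B", 0.05), ("C", 0.08), ("D", 0.01)],
)

# Two-step version: name the average, then feed it back in as a parameter.
(avg_rate,) = conn.execute("SELECT AVG(Unemployment_rate) FROM recent_grads").fetchone()
two_step = conn.execute(
    "SELECT Major, Unemployment_rate FROM recent_grads "
    "WHERE Unemployment_rate < ? ORDER BY Unemployment_rate",
    (avg_rate,),
).fetchall()

# One-step version: the subquery expresses the same computation inline.
one_step = conn.execute(
    """SELECT Major, Unemployment_rate FROM recent_grads
       WHERE Unemployment_rate < (SELECT AVG(Unemployment_rate) FROM recent_grads)
       ORDER BY Unemployment_rate"""
).fetchall()

conn.close()
print(two_step == one_step, one_step)
```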


### Instructions

1. Write a SQL statement that computes the proportion (as a float value) of rows whose __`ShareWomen`__ value is above the column's average.
2. The results should only return the proportion, aliased as __`proportion_abv_avg`__, like so (with a different value):
    
|__proportion_abv_avg__|
|---|
|__0.000__|



In [5]:
```python
q1 = ''' SELECT CAST(COUNT(*) AS float) / (SELECT CAST(COUNT(*) AS float) FROM recent_grads)
    AS proportion_abv_avg
    FROM recent_grads
    WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)
    '''
pd.read_sql_query(q1, conn)
```

| | proportion_abv_avg |
|---|---|
| 0 | 0.526012 |
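
The proportion can also be written entirely with scalar subqueries in the `SELECT` clause, with no outer `FROM`. The sketch below compares both shapes on a made-up five-row table in an in-memory database. The values are chosen so that 3 of the 5 rows sit above the mean of 0.54.

```python
import sqlite3

# Made-up ShareWomen values; 0.6, 0.8 and 0.9 are above the mean of 0.54.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (ShareWomen REAL)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?)",
    [(0.1,), (0.3,), (0.6,), (0.8,), (0.9,)],
)

# Same shape as the notebook's query: the outer COUNT(*) sees only the
# filtered rows, while the subquery in SELECT counts every row.
(p_outer,) = conn.execute(
    """SELECT CAST(COUNT(*) AS float) / (SELECT COUNT(*) FROM recent_grads)
       AS proportion_abv_avg
       FROM recent_grads
       WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)"""
).fetchone()

# Equivalent form built only from scalar subqueries (no outer FROM clause).
(p_scalar,) = conn.execute(
    """SELECT CAST((SELECT COUNT(*) FROM recent_grads
                    WHERE ShareWomen > (SELECT AVG(ShareWomen) FROM recent_grads)) AS float)
              / (SELECT COUNT(*) FROM recent_grads)
       AS proportion_abv_avg"""
).fetchone()

conn.close()
print(p_outer, p_scalar)
```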


### Instructions

1. Write a query that returns the __`Major`__ and __`Major_category`__ columns for the rows where:
- __`Major_category`__ is one of the 5 categories with the highest group-level sums of the __`Total`__ column


In [6]:
```python
q1 = ''' SELECT Major, Major_category
    FROM recent_grads
    WHERE Major_category IN (SELECT Major_category FROM recent_grads
                             GROUP BY Major_category
                             ORDER BY SUM(Total) DESC
                             LIMIT 5)
    '''
pd.read_sql_query(q1, conn)
```

| | Major | Major_category |
|---|---|---|
| 0 | PETROLEUM ENGINEERING | Engineering |
| 1 | MINING AND MINERAL ENGINEERING | Engineering |
| 2 | METALLURGICAL ENGINEERING | Engineering |
| 3 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering |
| 4 | CHEMICAL ENGINEERING | Engineering |
| ... | ... | ... |
| 77 | ANTHROPOLOGY AND ARCHEOLOGY | Humanities & Liberal Arts |
| 78 | EARLY CHILDHOOD EDUCATION | Education |
| 79 | OTHER FOREIGN LANGUAGES | Humanities & Liberal Arts |
| 80 | COMPOSITION AND RHETORIC | Humanities & Liberal Arts |
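
Following the advice to write the inner query first, the sketch below runs the two steps on a made-up table. First, it runs the `GROUP BY` query alone and checks which categories it returns. Then it pastes that query into `IN (...)`. The sketch assumes an in-memory database and keeps the top 2 categories instead of 5 so the toy data stays small.

```python
import sqlite3

# Made-up rows; summed Total per category: Health 1000, Engineering 900, Arts 300, Law 50.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Total INTEGER)")
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?)",
    [
        ("A", "Engineering", 500),
        ("B", "Engineering", 400),
        ("C", "Arts", 300),
        ("D", "Health", 1000),
        ("E", "Law", 50),
    ],
)

# Step 1: write and check the inner query on its own (top 2 here, not top 5).
inner = """SELECT Major_category FROM recent_grads
           GROUP BY Major_category
           ORDER BY SUM(Total) DESC
           LIMIT 2"""
top_categories = [row[0] for row in conn.execute(inner)]

# Step 2: embed the verified inner query in IN (...). Building SQL with an
# f-string is acceptable here only because `inner` is a fixed string, not user input.
majors = conn.execute(
    f"""SELECT Major, Major_category FROM recent_grads
        WHERE Major_category IN ({inner})
        ORDER BY Major"""
).fetchall()

conn.close()
print(top_categories)
print(majors)
```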


### Instructions

1. Write a query that returns the average ratio (__`Sample_size/Total`__) for all of the majors.
- You'll need to cast both columns to the float type.
- Use the alias __`avg_ratio`__ for the average ratio.

In [7]:
```python
q1 = ''' SELECT AVG(CAST(Sample_size AS float) / CAST(Total AS float))
    AS avg_ratio
    FROM recent_grads
    '''
pd.read_sql_query(q1, conn)
```

| | avg_ratio |
|---|---|
| 0 | 0.009086 |
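
The cast matters because SQLite performs integer division when both operands are integers, so every `Sample_size / Total` ratio below 1 truncates to 0. The minimal sketch below uses two made-up rows in an in-memory database. It also shows that casting a single operand already switches SQLite to floating-point division.

```python
import sqlite3

# Two made-up rows with ratios 0.01 and 0.015.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recent_grads (Sample_size INTEGER, Total INTEGER)")
conn.executemany("INSERT INTO recent_grads VALUES (?, ?)", [(1, 100), (3, 200)])

# INTEGER / INTEGER is integer division, so both ratios become 0 before averaging.
(int_avg,) = conn.execute("SELECT AVG(Sample_size / Total) FROM recent_grads").fetchone()

# Casting both columns, as the instructions ask, gives floating-point division.
(both_cast,) = conn.execute(
    "SELECT AVG(CAST(Sample_size AS float) / CAST(Total AS float)) FROM recent_grads"
).fetchone()

# Casting one operand is enough: INTEGER / REAL is also floating-point division.
(one_cast,) = conn.execute(
    "SELECT AVG(CAST(Sample_size AS float) / Total) FROM recent_grads"
).fetchone()

conn.close()
print(int_avg, both_cast, one_cast)
```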


### Instructions

Write a query that:
- selects the __`Major`__, __`Major_category`__, and the computed __`ratio`__ columns
- filters to just the rows where __`ratio`__ is greater than __`avg_ratio`__

In [8]:
```python
# SQLite lets WHERE refer to the SELECT alias `ratio`; standard SQL does not.
q1 = ''' SELECT Major, Major_category, CAST(Sample_size AS float) / CAST(Total AS float) AS ratio
    FROM recent_grads
    WHERE ratio > (SELECT AVG(CAST(Sample_size AS float) / CAST(Total AS float)) AS avg_ratio
                   FROM recent_grads)
    '''
pd.read_sql_query(q1, conn)
```

| | Major | Major_category | ratio |
|---|---|---|---|
| 0 | PETROLEUM ENGINEERING | Engineering | 0.015391 |
| 1 | MINING AND MINERAL ENGINEERING | Engineering | 0.009259 |
| 2 | NAVAL ARCHITECTURE AND MARINE ENGINEERING | Engineering | 0.012719 |
| 3 | ACTUARIAL SCIENCE | Business | 0.013503 |
| 4 | MECHANICAL ENGINEERING | Engineering | 0.011280 |
| ... | ... | ... | ... |
| 77 | THEOLOGY AND RELIGIOUS VOCATIONS | Humanities & Liberal Arts | 0.010263 |
| 78 | STUDIO ARTS | Arts | 0.010720 |
| 79 | COSMETOLOGY SERVICES AND CULINARY ARTS | Industrial Arts & Consumer Services | 0.011132 |
| 80 | MISCELLANEOUS AGRICULTURE | Agriculture & Natural Resources | 0.016129 |
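
SQLite accepts the `ratio` alias inside `WHERE`, but standard SQL does not, so that query may fail on other databases such as PostgreSQL. The sketch below shows a more portable shape on a made-up three-row table in an in-memory database. A subquery in `FROM` computes and names `ratio`, and the outer `WHERE` then filters on it. The derived-table alias `r` is illustrative.

```python
import sqlite3

# Made-up rows with ratios 0.05, 0.01 and 0.06 (mean 0.04).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recent_grads (Major TEXT, Major_category TEXT, Sample_size INTEGER, Total INTEGER)"
)
conn.executemany(
    "INSERT INTO recent_grads VALUES (?, ?, ?, ?)",
    [
        ("A", "Engineering", 5, 100),
        ("B", "Arts", 1, 100),
        ("C", "Health", 12, 200),
    ],
)

# The subquery in FROM computes ratio once per row and names it, so the outer
# WHERE can filter on ratio without relying on SQLite's alias-in-WHERE behavior.
rows = conn.execute(
    """SELECT Major, Major_category, ratio
       FROM (SELECT Major, Major_category,
                    CAST(Sample_size AS float) / Total AS ratio
             FROM recent_grads) AS r
       WHERE ratio > (SELECT AVG(CAST(Sample_size AS float) / Total) FROM recent_grads)
       ORDER BY Major"""
).fetchall()

conn.close()
print(rows)
```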


In [9]:
```python
# Close the sqlite3 connection
conn.close()
```