# SQL | basics
---
- `SELECT * FROM db LIMIT x`
- `SELECT * FROM db WHERE column > < = value LIMIT x`
- `AND`, `OR` 
- `ORDER BY`, `DESC`
- `COUNT()`, `MIN()`, `MAX()`, `SUM()`, `AVG()` 
- `SELECT column AS new_column_name`

In [1]:
# importing libraries and database

import sqlite3
import pandas as pd

conn = sqlite3.connect("jobs.db")

In [2]:
q1 = "SELECT * FROM sqlite_master WHERE type='table'"
pd.read_sql_query(q1, conn)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,recent_grads,recent_grads,2,"CREATE TABLE ""recent_grads"" (\n""index"" INTEGER..."


---
### Instructions

Write a SQL query that returns:

- All majors with majority `female` and All majors had a median salary greater than `50000`.

- Only include the following columns in the results and in this order:

    - `Major`
    - `Major_category`
    - `Median`
    - `ShareWomen`

In [3]:
q1 = '''SELECT Major, Major_category, Median, ShareWomen 
    FROM recent_grads
    WHERE ShareWomen > 0.5 AND Median > 50000'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major,Major_category,Median,ShareWomen
0,ACTUARIAL SCIENCE,Business,62000,0.535714
1,COMPUTER SCIENCE,Computers & Mathematics,53000,0.578766


---

### Instructions

- Write a SQL query that returns the first 20 majors that either:

    - Have a `Median` salary greater than or equal to `10,000`, `or`
    - Have less than or equal to `1,000` `Unemployed` people

- Only include the following columns in the results and in this order:

    - `Major`
    - `Median`
    - `Unemployed`

In [4]:
q1 = '''SELECT Major, Median, Unemployed 
    FROM recent_grads
    WHERE Median >= 10000 OR Unemployed <= 1000 
    LIMIT 20'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major,Median,Unemployed
0,PETROLEUM ENGINEERING,110000,37
1,MINING AND MINERAL ENGINEERING,75000,85
2,METALLURGICAL ENGINEERING,73000,16
3,NAVAL ARCHITECTURE AND MARINE ENGINEERING,70000,40
4,CHEMICAL ENGINEERING,65000,1672
5,NUCLEAR ENGINEERING,65000,400
6,ACTUARIAL SCIENCE,62000,308
7,ASTRONOMY AND ASTROPHYSICS,62000,33
8,MECHANICAL ENGINEERING,60000,4650
9,ELECTRICAL ENGINEERING,60000,3895


---
## Instructions

- Run the query we explored above, which returns all majors that:

    - Fell under the category of __`Engineering` and either__

        - Had mostly women graduates
        - Or had an unemployment rate below `5.1%`, which was the rate in August 2015
- Only include the following columns in the results and in this order:

    - `Major`
    - `Major_category`
    - `ShareWomen`
    - `Unemployment_rate`

In [5]:
q1 = '''SELECT Major, Major_category, ShareWomen, Unemployment_rate 
    FROM recent_grads
    WHERE Major_category = 'Engineering' 
    AND (ShareWomen > 0.5 OR Unemployment_rate < 0.051)'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major,Major_category,ShareWomen,Unemployment_rate
0,PETROLEUM ENGINEERING,Engineering,0.120564,0.018381
1,METALLURGICAL ENGINEERING,Engineering,0.153037,0.024096
2,NAVAL ARCHITECTURE AND MARINE ENGINEERING,Engineering,0.107313,0.050125
3,MATERIALS SCIENCE,Engineering,0.31082,0.023043
4,ENGINEERING MECHANICS PHYSICS AND SCIENCE,Engineering,0.183985,0.006334
5,INDUSTRIAL AND MANUFACTURING ENGINEERING,Engineering,0.343473,0.042876
6,MATERIALS ENGINEERING AND MATERIALS SCIENCE,Engineering,0.292607,0.027789
7,ENVIRONMENTAL ENGINEERING,Engineering,0.558548,0.093589
8,INDUSTRIAL PRODUCTION TECHNOLOGIES,Engineering,0.750473,0.028308
9,ENGINEERING AND INDUSTRIAL MANAGEMENT,Engineering,0.174123,0.033652


---
### Instructions

- Write a query that returns all majors where:

    - `ShareWomen` is greater than `0.3`
    - And `Unemployment_rate` is less than `.1`

- Only include the following columns in the results and in this order:
    - `Major`,
    - `ShareWomen`,
    - `Unemployment_rate`
- Order the results in descending order by the `ShareWomen` column.

In [6]:
q1 = '''SELECT Major, ShareWomen, Unemployment_rate 
    FROM recent_grads
    WHERE ShareWomen > 0.3 AND Unemployment_rate < 0.1
    ORDER BY ShareWomen DESC'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major,ShareWomen,Unemployment_rate
0,EARLY CHILDHOOD EDUCATION,0.967998,0.040105
1,MATHEMATICS AND COMPUTER SCIENCE,0.927807,0.000000
2,ELEMENTARY EDUCATION,0.923745,0.046586
3,ANIMAL SCIENCES,0.910933,0.050862
4,PHYSIOLOGY,0.906677,0.069163
...,...,...,...
116,OPERATIONS LOGISTICS AND E-COMMERCE,0.322222,0.047859
117,TRANSPORTATION SCIENCES AND TECHNOLOGIES,0.321296,0.072725
118,BIOLOGICAL ENGINEERING,0.320784,0.087143
119,MATERIALS SCIENCE,0.310820,0.023043


---
### Instructions

- Write a query that returns the `Engineering` or `Physical Sciences` majors in ascending order of unemployment rates.
- The results should only contain the `Major_category`, `Major`, and `Unemployment_rate` columns.

In [7]:
q1 = '''SELECT Major_category, Major, Unemployment_rate 
    FROM recent_grads
    WHERE Major_category = 'Engineering' OR Major_category = 'Physical Sciences'
    ORDER BY Unemployment_rate
    LIMIT 7'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major_category,Major,Unemployment_rate
0,Engineering,ENGINEERING MECHANICS PHYSICS AND SCIENCE,0.006334
1,Engineering,PETROLEUM ENGINEERING,0.018381
2,Physical Sciences,ASTRONOMY AND ASTROPHYSICS,0.021167
3,Physical Sciences,ATMOSPHERIC SCIENCES AND METEOROLOGY,0.022229
4,Engineering,MATERIALS SCIENCE,0.023043
5,Engineering,METALLURGICAL ENGINEERING,0.024096
6,Physical Sciences,GEOSCIENCES,0.024374


### Instructions

- Write a query that computes the average of the `Total` column, the minimum of the `Men` column, and the maximum of the `Women` column, in that specific order.
- Make sure that all of the aggregate functions are capitalized (`SUM()` not `sum()`, etc), so our results match yours.

In [8]:
q1 = '''SELECT AVG(Total), MIN(Men), MAX(Women) FROM recent_grads'''
pd.read_sql_query(q1, conn)

Unnamed: 0,AVG(Total),MIN(Men),MAX(Women)
0,39167.716763,119,307087


### Instructions

- Write a query that returns, in the following order:
    - the number of rows as `Number of Students`
    - the maximum value of `Unemployment_rate` as `Highest Unemployment Rate`

In [9]:
q1 = '''SELECT COUNT(*) AS 'Number of Students', 
    MAX(Unemployment_rate) AS 'Highest Unemployment Rate' 
    FROM recent_grads'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Number of Students,Highest Unemployment Rate
0,173,0.177226


### Instructions

- Write a query that returns the number of unique values in the `Major`, `Major_category`, and `Major_code` columns. Use the following aliases in the following order:
    - For the unique value count of the `Major column`, use the alias `unique_majors`.
    - For the unique value count of the `Major_category` column, use the alias `unique_major_categories`.
    - For the unique value count of the `Major_code` column, use the alias `unique_major_codes`.

In [10]:
q1 = '''SELECT COUNT(DISTINCT(Major)) unique_majors, 
    COUNT(DISTINCT(Major_category)) unique_major_categories, 
    COUNT(DISTINCT(Major_code)) unique_major_codes 
    FROM recent_grads'''
pd.read_sql_query(q1, conn)

Unnamed: 0,unique_majors,unique_major_categories,unique_major_codes
0,173,16,173


## Instructions

- Write a query that computes the difference between the `25th` and `75th` percentile of salaries for all majors.
    - Return the `Major` column first, using the default column name.
    - Return the `Major_category` column second, using the default column name.
    - Return the compute difference between the `25th` and `75th` percentile third, using the alias `quartile_spread`.
    - Order the results from lowest to highest and only return the first `20` results.

In [11]:
q1 = '''SELECT Major, 
    Major_category, 
    P75th - P25th quartile_spread 
    FROM recent_grads
    ORDER BY quartile_spread 
    LIMIT 20'''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major,Major_category,quartile_spread
0,MILITARY TECHNOLOGIES,Industrial Arts & Consumer Services,0
1,SCHOOL STUDENT COUNSELING,Education,2000
2,LIBRARY SCIENCE,Education,2000
3,COURT REPORTING,Law & Public Policy,4000
4,PHARMACOLOGY,Biology & Life Science,5000
5,EDUCATIONAL ADMINISTRATION AND SUPERVISION,Education,6000
6,COUNSELING PSYCHOLOGY,Psychology & Social Work,6800
7,SPECIAL NEEDS EDUCATION,Education,10000
8,MATHEMATICS TEACHER EDUCATION,Education,10000
9,SOCIAL WORK,Psychology & Social Work,10000


In [12]:
#Closing a sqlite3 connection
conn.close()