# SQL | Group Summary Statistics
--- 
## Concepts

- The __`GROUP BY`__ clause computes summary statistics by group.
- The __`HAVING`__ clause filters on the aggregated (virtual) columns that __`GROUP BY`__ generates.
- __`WHERE`__ filters rows before the aggregation, whereas __`HAVING`__ filters groups after the aggregation.
- The __`ROUND`__ function rounds results to the desired number of decimal places.
- __`PRAGMA TABLE_INFO()`__ returns the declared type, along with other information, for each column of a table.
- The __`CAST`__ function in SQL converts data from one data type to another. For example, we can use the __`CAST`__ function to convert numeric data into character string data.
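
The difference between __`WHERE`__ and __`HAVING`__ is easiest to see side by side. The following is a minimal sketch on a hypothetical in-memory table (`majors`, with made-up values), not on the `recent_grads` data used in the rest of this notebook.

```python
import sqlite3

# Hypothetical toy table, used only for illustration (not the recent_grads data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE majors (category TEXT, total INTEGER)")
demo.executemany(
    "INSERT INTO majors VALUES (?, ?)",
    [("A", 100), ("A", 300), ("B", 50), ("B", 70), ("C", 500)],
)

# WHERE removes individual rows first, so each average is computed
# only over the rows with total > 60.
where_rows = demo.execute(
    """SELECT category, AVG(total) AS avg_total
       FROM majors
       WHERE total > 60
       GROUP BY category
       ORDER BY category"""
).fetchall()

# HAVING runs after grouping, so it keeps or drops whole groups
# based on the aggregated value.
having_rows = demo.execute(
    """SELECT category, AVG(total) AS avg_total
       FROM majors
       GROUP BY category
       HAVING avg_total > 60
       ORDER BY category"""
).fetchall()

print(where_rows)   # all three groups remain; B is averaged over one row only
print(having_rows)  # group B (average exactly 60) is dropped entirely
demo.close()
```

With `WHERE`, group `B` survives but its average changes because one of its rows was removed. With `HAVING`, every average is computed over all rows and group `B` is then dropped as a whole.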

In [1]:
# importing libraries and database

import sqlite3
import pandas as pd

conn = sqlite3.connect("jobs.db")

In [2]:
q1 = '''SELECT * FROM sqlite_master WHERE type='table' '''
pd.read_sql_query(q1, conn)

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,recent_grads,recent_grads,2,"CREATE TABLE ""recent_grads"" (\n""index"" INTEGER..."


---
### Instructions
1. Use the __`SELECT`__ statement to select the following columns and aggregates in a query: __`Major_category, AVG(ShareWomen)`__
2. Use the __`GROUP BY`__ statement to group the query by the `Major_category` column.

In [3]:
q1 = ''' SELECT Major_category, AVG(ShareWomen) 
    FROM recent_grads
    GROUP BY Major_category '''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major_category,AVG(ShareWomen)
0,Agriculture & Natural Resources,0.617938
1,Arts,0.561851
2,Biology & Life Science,0.584518
3,Business,0.405063
4,Communications & Journalism,0.643835
5,Computers & Mathematics,0.512752
6,Education,0.674986
7,Engineering,0.257158
8,Health,0.616857
9,Humanities & Liberal Arts,0.676193


### Instructions

1. For each major category, find the percentage of graduates who are employed.
    - Use the __`SELECT`__ statement to select the following columns and aggregates in your query: __`Major_category, AVG(Employed) / AVG(Total) AS share_employed`__
    - Use the __`GROUP BY`__ statement to group the query by the __`Major_category`__ column.

In [4]:
q1 = ''' SELECT Major_category, 
    AVG(Employed) / AVG(Total) AS share_employed 
    FROM recent_grads
    GROUP BY Major_category '''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major_category,share_employed
0,Agriculture & Natural Resources,0.836986
1,Arts,0.806748
2,Biology & Life Science,0.667157
3,Business,0.835966
4,Communications & Journalism,0.842229
5,Computers & Mathematics,0.795611
6,Education,0.85819
7,Engineering,0.781967
8,Health,0.803374
9,Humanities & Liberal Arts,0.762638
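
Within a group, `AVG(Employed) / AVG(Total)` gives the group's overall employment share, because both averages divide by the same row count and that count cancels out. This is not the same as averaging each major's individual share. The following is a small sketch on a hypothetical two-row table with made-up numbers, assuming neither column contains `NULL` values (`AVG` skips `NULL`s, so the row counts could otherwise differ).

```python
import sqlite3

# Hypothetical toy data: one small major and one large major in the same category.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE grads (category TEXT, Employed REAL, Total REAL)")
demo.executemany(
    "INSERT INTO grads VALUES (?, ?, ?)",
    [("A", 25, 100), ("A", 750, 1000)],
)

ratio_of_avgs, ratio_of_sums, avg_of_ratios = demo.execute(
    """SELECT AVG(Employed) / AVG(Total),   -- what the query above computes
              SUM(Employed) / SUM(Total),   -- same value: the row count cancels
              AVG(Employed / Total)         -- different: every major weighted equally
       FROM grads
       GROUP BY category"""
).fetchone()

print(ratio_of_avgs, ratio_of_sums, avg_of_ratios)
demo.close()
```

The first two values agree (775 employed out of 1100 graduates), while the third treats the small and the large major as equally important and lands on 0.5.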


### Instructions

1. Find all of the major categories where the share of graduates with low-wage jobs is greater than __`.1`__.
    - Use the __`SELECT`__ statement to select the following columns and aggregates in a query: __`Major_category, AVG(Low_wage_jobs) / AVG(Total) AS share_low_wage`__
    - Use the __`GROUP BY`__ statement to group the query by the __`Major_category`__ column.
    - Use the __`HAVING`__ statement to restrict the selection to rows where __`share_low_wage`__ is greater than __`.1`__.

In [5]:
q1 = ''' SELECT Major_category, 
    AVG(Low_wage_jobs) / AVG(Total) AS share_low_wage 
    FROM recent_grads
    GROUP BY Major_category
    HAVING share_low_wage > 0.1 '''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major_category,share_low_wage
0,Arts,0.168331
1,Communications & Journalism,0.126324
2,Humanities & Liberal Arts,0.132087
3,Industrial Arts & Consumer Services,0.115713
4,Law & Public Policy,0.115685
5,Psychology & Social Work,0.116934
6,Social Science,0.102233


### Instructions

1. Write a SQL query that returns the following columns of __`recent_grads`__ (in the same order): __`ShareWomen`__ rounded to __`4`__ decimal places, __`Major_category`__
2. Limit the results to __`10`__ rows.

In [6]:
q1 = ''' SELECT ROUND(ShareWomen, 4), Major_category  
    FROM recent_grads LIMIT 10'''
pd.read_sql_query(q1, conn)

Unnamed: 0,"ROUND(ShareWomen, 4)",Major_category
0,0.1206,Engineering
1,0.1019,Engineering
2,0.153,Engineering
3,0.1073,Engineering
4,0.3416,Engineering
5,0.145,Engineering
6,0.5357,Business
7,0.4414,Physical Sciences
8,0.1398,Engineering
9,0.4378,Engineering


### Instructions

- Use the `SELECT` statement to select the following columns and aggregates in a query: <br>
`Major_category, AVG(College_jobs) / AVG(Total)` as `share_degree_jobs`
- Use the `ROUND` function to round share_degree_jobs to `3` decimal places.
- Group the query by the `Major_category` column.
- Only select rows where `share_degree_jobs` is less than `.3`.

In [7]:
q1 = ''' SELECT Major_category, 
    ROUND(AVG(College_jobs) / AVG(Total), 3) AS share_degree_jobs 
    FROM recent_grads
    GROUP BY Major_category
    HAVING share_degree_jobs < 0.3 '''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major_category,share_degree_jobs
0,Agriculture & Natural Resources,0.248
1,Arts,0.265
2,Business,0.114
3,Communications & Journalism,0.22
4,Humanities & Liberal Arts,0.27
5,Industrial Arts & Consumer Services,0.249
6,Law & Public Policy,0.163
7,Social Science,0.215


### Instructions

1. Write a query that
    - Divides the sum of the __`Women`__ column by the sum of the __`Total`__ column, aliased as __`SW`__.
    - Groups the results by __`Major_category`__ and orders them by __`SW`__.
    - Contains only the __`Major_category`__ and __`SW`__ columns, in that order.
    

In [8]:
q1 = ''' SELECT Major_category, 
    SUM(CAST(Women AS Float)) / SUM(CAST(Total AS Float)) AS SW  
    FROM recent_grads
    GROUP BY Major_category
    ORDER BY SW '''
pd.read_sql_query(q1, conn)

Unnamed: 0,Major_category,SW
0,Law & Public Policy,0.030585
1,Business,0.084743
2,Industrial Arts & Consumer Services,0.160249
3,Computers & Mathematics,0.209356
4,Engineering,0.219596
5,Communications & Journalism,0.250325
6,Arts,0.393327
7,Humanities & Liberal Arts,0.490051
8,Health,0.673588
9,Interdisciplinary,0.800911
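
The __`CAST`__ calls in the query above matter when both columns are stored as integers, because SQLite truncates integer division. The sketch below uses a hypothetical in-memory table with made-up values. That `Women` and `Total` are integer columns in `recent_grads` is an assumption here, which __`PRAGMA TABLE_INFO()`__ can confirm on the real table.

```python
import sqlite3

# Hypothetical toy table; the column names mirror the query above,
# but the values are made up and the INTEGER types are an assumption.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE grads (Women INTEGER, Total INTEGER)")
demo.executemany("INSERT INTO grads VALUES (?, ?)", [(30, 100), (45, 100)])

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
column_types = {row[1]: row[2] for row in demo.execute("PRAGMA table_info(grads)")}

# Dividing one integer by another in SQLite discards the fractional part.
int_ratio = demo.execute("SELECT SUM(Women) / SUM(Total) FROM grads").fetchone()[0]

# Casting the operands to a floating-point type keeps the fraction.
float_ratio = demo.execute(
    "SELECT SUM(CAST(Women AS FLOAT)) / SUM(CAST(Total AS FLOAT)) FROM grads"
).fetchone()[0]

print(column_types)
print(int_ratio, float_ratio)
demo.close()
```

Running the same `PRAGMA` against `recent_grads` on the `conn` connection, before it is closed, would show which columns actually need the cast.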


In [9]:
# closing the sqlite3 connection
conn.close()