# SQL - Intro P2

In [1]:
import pandas as pd
import sqlite3

conn = sqlite3.connect('../data/mtcars.sqlite')
df = pd.read_sql_query("SELECT * FROM results", conn)
df

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
392,27.0,4,140.0,86.0,2790.0,15.6,82,1,ford mustang gl
393,44.0,4,97.0,52.0,2130.0,24.6,82,2,vw pickup
394,32.0,4,135.0,84.0,2295.0,11.6,82,1,dodge rampage
395,28.0,4,120.0,79.0,2625.0,18.6,82,1,ford ranger


## GROUP BY

GROUP BY summarizes values in a table (sum, average, count, etc.). To use it correctly, the SELECT statement should list the columns you want to group by, plus the column to be aggregated, wrapped in the function that does the aggregation (for example, SUM or AVG). The rest of the query follows the standard form: FROM names the table to retrieve data from.

Any filters on individual rows go in the WHERE clause. After that, add GROUP BY, listing the one or more columns, separated by commas, that you are grouping the data by.

<code> SELECT var1, var2, sum(var3) as sum_var3 FROM table GROUP BY var1, var2 </code>

Keep in mind that the syntax below will return an error or incorrect results, depending on the database:

<code> SELECT var1, var2, sum(var3) as sum_var3 FROM table GROUP BY var1 </code> **var2 needs to be in the GROUP BY**
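
As a minimal sketch of the pattern above, the following builds a small in-memory SQLite table and groups by two columns after filtering rows with WHERE. The `cars` table, its rows, and the `demo` connection are made up for illustration and are not part of `mtcars.sqlite`. It uses the standard-library `sqlite3` module directly; the same query string could be passed to `pd.read_sql_query`.

```python
import sqlite3

# Hypothetical in-memory table; the rows are invented for this example.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE cars (cylinders INTEGER, origin INTEGER, year INTEGER, weight REAL)"
)
demo.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?)",
    [
        (4, 1, 70, 2000.0),
        (4, 1, 75, 2200.0),
        (4, 1, 80, 2100.0),
        (4, 2, 75, 1800.0),
        (8, 1, 70, 3500.0),
        (8, 1, 75, 3700.0),
    ],
)

# WHERE removes individual rows first; GROUP BY then aggregates what is left.
# Every non-aggregated column in SELECT (cylinders, origin) is also in GROUP BY.
grouped = sorted(
    demo.execute(
        """SELECT cylinders, origin, SUM(weight) AS sum_weight
           FROM cars
           WHERE year >= 75
           GROUP BY cylinders, origin"""
    ).fetchall()
)
print(grouped)  # → [(4, 1, 4300.0), (4, 2, 1800.0), (8, 1, 3700.0)]
```

The result is sorted in Python only to make the printed order predictable; without an ORDER BY clause, SQL does not guarantee row order.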

In [2]:
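# No GROUP BY here: SUM collapses the whole table into a single row,
# and the cylinders value shown alongside it is arbitrary.
pd.read_sql_query("""SELECT cylinders, SUM(weight) AS agg_weight 
                  FROM results""", conn)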

Unnamed: 0,cylinders,agg_weight
0,8,1179194.0


In [3]:
pd.read_sql_query("""SELECT cylinders, SUM(weight) AS agg_weight 
                  FROM results
                  GROUP BY cylinders""", conn)

Unnamed: 0,cylinders,agg_weight
0,3,9594.0
1,4,467823.0
2,5,9310.0
3,6,268651.0
4,8,423816.0


What if I want to select only the cylinder groups whose agg_weight is at least 10,000?

In [4]:
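# WHERE is evaluated row by row, before grouping. No single car weighs
# 10,000 or more, so every row is filtered out and the result is empty.
pd.read_sql_query("""SELECT cylinders, SUM(weight) AS agg_weight 
                  FROM results
                  WHERE weight >= 10000
                  GROUP BY cylinders""", conn)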

Unnamed: 0,cylinders,agg_weight


In [6]:
# This will break! WHERE cannot filter on the aggregate alias agg_weight.

# pd.read_sql_query("""SELECT cylinders, SUM(weight) AS agg_weight 
#                   FROM results
#                   WHERE agg_weight >= 10000
#                   GROUP BY cylinders""", conn)
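
To see the failure without breaking the notebook, the sketch below runs the same kind of query against a small in-memory table and catches the exception. The `cars` table and its rows are hypothetical, and the exact error text depends on the SQLite version, so the code only records that an error was raised.

```python
import sqlite3

# Hypothetical in-memory table; the rows are invented for this example.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cars (cylinders INTEGER, weight REAL)")
demo.executemany(
    "INSERT INTO cars VALUES (?, ?)",
    [(4, 2000.0), (4, 2200.0), (8, 3500.0)],
)

error_message = None
try:
    # agg_weight is an alias for SUM(weight), and aggregates are not
    # allowed in WHERE, which is evaluated before any grouping happens.
    demo.execute(
        """SELECT cylinders, SUM(weight) AS agg_weight
           FROM cars
           WHERE agg_weight >= 4000
           GROUP BY cylinders"""
    ).fetchall()
except sqlite3.Error as exc:
    error_message = str(exc)

print(error_message)  # the wording varies between SQLite versions
```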

## HAVING

HAVING filters on aggregated results, which WHERE cannot do because WHERE is applied to individual rows before grouping. HAVING must be included **AFTER** GROUP BY.



In [7]:
pd.read_sql_query("""SELECT cylinders, SUM(weight) AS agg_weight 
                  FROM results
                  GROUP BY cylinders
                  HAVING SUM(weight) >= 10000""", conn)

Unnamed: 0,cylinders,agg_weight
0,4,467823.0
1,6,268651.0
2,8,423816.0
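
WHERE and HAVING can also appear in the same query. The sketch below, using a made-up in-memory `cars` table (not the `results` table from the notebook), first drops rows with WHERE and then drops whole groups with HAVING.

```python
import sqlite3

# Hypothetical in-memory table; the rows are invented for this example.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cars (cylinders INTEGER, year INTEGER, weight REAL)")
demo.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        (4, 70, 2000.0),
        (4, 80, 2200.0),
        (6, 80, 3000.0),
        (8, 70, 3500.0),
        (8, 80, 3700.0),
        (8, 80, 3900.0),
    ],
)

# Step 1 (WHERE): keep only rows from year 80 onward.
# Step 2 (GROUP BY): sum the remaining weights per cylinder count.
# Step 3 (HAVING): keep only groups whose summed weight is at least 3000.
query = """SELECT cylinders, SUM(weight) AS agg_weight
           FROM cars
           WHERE year >= 80
           GROUP BY cylinders
           HAVING SUM(weight) >= 3000"""
heavy_groups = sorted(demo.execute(query).fetchall())
print(heavy_groups)  # → [(6, 3000.0), (8, 7600.0)]
```

The 4-cylinder group survives the WHERE step with a single 2200.0 row but is then removed by HAVING, because its total falls below the threshold.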


## ORDER BY

To sort the rows returned by a SELECT statement, use ORDER BY. This doesn't change the order of records in the table; it only affects the output of the statement. The default order is ascending. To sort in descending order, add the DESC keyword after the name of the variable you want to sort by. You can also sort by multiple variables, separated by commas.

In [8]:
pd.read_sql_query("""SELECT *
                  FROM results
                  ORDER BY cylinders""", conn)

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,19.0,3,70.0,97.0,2330.0,13.5,72,3,mazda rx2 coupe
1,18.0,3,70.0,90.0,2124.0,13.5,73,3,maxda rx3
2,21.5,3,80.0,110.0,2720.0,13.5,77,3,mazda rx-4
3,23.7,3,70.0,100.0,2420.0,12.5,80,3,mazda rx-7 gs
4,24.0,4,113.0,95.0,2372.0,15.0,70,3,toyota corona mark ii
...,...,...,...,...,...,...,...,...,...
392,19.2,8,267.0,125.0,3605.0,15.0,79,1,chevrolet malibu classic (sw)
393,18.5,8,360.0,150.0,3940.0,13.0,79,1,chrysler lebaron town @ country (sw)
394,23.0,8,350.0,125.0,3900.0,17.4,79,1,cadillac eldorado
395,23.9,8,260.0,90.0,3420.0,22.2,79,1,oldsmobile cutlass salon brougham


In [9]:
pd.read_sql_query("""SELECT *
                  FROM results
                  ORDER BY cylinders DESC""", conn)

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
392,31.0,4,119.0,82.0,2720.0,19.4,82,1,chevy s-10
393,19.0,3,70.0,97.0,2330.0,13.5,72,3,mazda rx2 coupe
394,18.0,3,70.0,90.0,2124.0,13.5,73,3,maxda rx3
395,21.5,3,80.0,110.0,2720.0,13.5,77,3,mazda rx-4


In [10]:
pd.read_sql_query("""SELECT *
                  FROM results
                  ORDER BY cylinders DESC, year DESC""", conn)

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,26.6,8,350.0,105.0,3725.0,19.0,81,1,oldsmobile cutlass ls
1,17.0,8,305.0,130.0,3840.0,15.4,79,1,chevrolet caprice classic
2,17.6,8,302.0,129.0,3725.0,13.4,79,1,ford ltd landau
3,16.5,8,351.0,138.0,3955.0,13.2,79,1,mercury grand marquis
4,18.2,8,318.0,135.0,3830.0,15.2,79,1,dodge st. regis
...,...,...,...,...,...,...,...,...,...
392,26.0,4,121.0,113.0,2234.0,12.5,70,2,bmw 2002
393,23.7,3,70.0,100.0,2420.0,12.5,80,3,mazda rx-7 gs
394,21.5,3,80.0,110.0,2720.0,13.5,77,3,mazda rx-4
395,18.0,3,70.0,90.0,2124.0,13.5,73,3,maxda rx3
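
Sort directions can be mixed, and ORDER BY can also sort the output of a GROUP BY query by its aggregate alias. The sketch below illustrates both on a made-up in-memory `cars` table; the table, its rows, and the `n_cars` alias are assumptions for illustration only.

```python
import sqlite3

# Hypothetical in-memory table; the rows are invented for this example.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cars (name TEXT, cylinders INTEGER, year INTEGER)")
demo.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [("a", 4, 70), ("b", 8, 70), ("c", 4, 82), ("d", 8, 79), ("e", 6, 75)],
)

# cylinders descending; within equal cylinders, year ascending (the default).
ordered = demo.execute(
    """SELECT name, cylinders, year
       FROM cars
       ORDER BY cylinders DESC, year"""
).fetchall()
print(ordered)
# → [('b', 8, 70), ('d', 8, 79), ('e', 6, 75), ('a', 4, 70), ('c', 4, 82)]

# ORDER BY runs after grouping, so it can sort by the aggregate alias.
# cylinders is a second sort key to break ties between equal counts.
by_count = demo.execute(
    """SELECT cylinders, COUNT(*) AS n_cars
       FROM cars
       GROUP BY cylinders
       ORDER BY n_cars DESC, cylinders"""
).fetchall()
print(by_count)  # → [(4, 2), (8, 2), (6, 1)]
```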
