# Introduction to SQL

[SQL](https://en.wikipedia.org/wiki/SQL) stands for Structured Query Language (you can pronounce it as 'sequel' or 'ess-cue-elle'). It allows you to retrieve data stored in relational databases and manipulate it in a variety of ways.

If the concept of a database is new to you, you can think of it as a collection of tables. For instance, you can compare it to an Excel workbook with multiple worksheets where each worksheet is somehow related to another. In this comparison the worksheets are SQL tables and the entire workbook is the database. The key is to have some sort of relationship between the tables that allows you to bring all that data together (this is just an analogy; an Excel workbook is not a database!).

Each table in SQL consists of rows and columns where columns represent different data attributes and rows represent observations or data records. Tables can contain different types of data but the main types are integer, decimal, character or string, date, and time.

There are many DB systems available, such as MySQL, PostgreSQL, Oracle Database, MS SQL Server, Amazon Redshift, etc. However, all of them speak SQL (each with its own dialect), so once you've got the hang of the basic SQL syntax you'll be able to work with any of them after minor adjustments.

To keep things simple, we will be using a database system called [SQLite](https://www.sqlite.org/about.html) to practice SQL syntax.
Unlike most other SQL databases, SQLite does not have a separate server. SQLite reads and writes directly to ordinary files on your computer. A complete SQL database with multiple tables, etc., is contained in a single file.

### Resources

Since SQL is the main database language used worldwide, there are plenty of resources to master it:

- https://sqlzoo.net/

- https://sqlbolt.com

- https://leetcode.com

- https://www.sql-ex.ru/learn_exercises.php

# Using Raw SQL cursor connections

In [1]:
import pandas as pd
import sqlite3
conn = sqlite3.connect('../data/mtcars.sqlite') # create a connection
c = conn.cursor() # create a cursor (a mechanism that enables traversal over the records in a database)

## SELECT

SELECT is the most important command because it retrieves data from a table. SELECT on its own won't do anything, so you need to specify what data you want to retrieve and from where.

Data lives in tables, so the SELECT statement must name the table to read from. SELECT picks columns, not rows, so you also need to list the columns you want. Here are a few things to remember:

- To select a column, you will simply need to type its name
- You can select all columns by typing each of their names, or you can use * to select all columns without typing them
- You can select as many or as few columns from a table as you want
- You can even create new columns in the SELECT statement


SQL supports mathematical operations and has a variety of built-in functions (see https://www.sqlitetutorial.net/).
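
To make the idea of computed columns concrete, here is a minimal sketch. It assumes a small in-memory stand-in for the `results` table (three rows copied from this notebook's data, with only some of the columns); the aliases `weight_thousands`, `name_upper`, and `mpg_per_cylinder` are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the `results` table used in this notebook.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (mpg REAL, cylinders INTEGER, weight REAL, name TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?)",
    [
        (18.0, 8, 3504.0, "chevrolet chevelle malibu"),
        (15.0, 8, 3693.0, "buick skylark 320"),
        (17.0, 8, 3449.0, "ford torino"),
    ],
)

# Each expression after SELECT becomes a column in the result; AS names it.
rows = conn.execute(
    """
    SELECT name,
           weight / 1000.0           AS weight_thousands,  -- arithmetic on a column
           UPPER(name)               AS name_upper,        -- built-in string function
           ROUND(mpg / cylinders, 2) AS mpg_per_cylinder   -- built-in numeric function
    FROM results
    """
).fetchall()

for r in rows:
    print(r)

conn.close()
```

The new columns exist only in the query result; the underlying table is unchanged.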


In [2]:
# OPTION 1
# we can use sqlite3 directly
cursor = c.execute("SELECT * FROM results")  # no f-string needed: nothing is interpolated
row = cursor.fetchall()

In [3]:
row[0:2] # gets data back as a list of tuples

[(18.0, 8, 307.0, 130.0, 3504.0, 12.0, 70, 1, 'chevrolet chevelle malibu'),
 (15.0, 8, 350.0, 165.0, 3693.0, 11.5, 70, 1, 'buick skylark 320')]

In [4]:
# get column names
column_names = list(map(lambda x: x[0], cursor.description))

print(column_names)

['mpg', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin', 'name']
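
As an alternative to reading `cursor.description`, the standard library's `sqlite3.Row` row factory lets each row carry its own column names. The sketch below assumes a one-row in-memory table standing in for `results`; it is an illustration, not the approach this notebook uses elsewhere.

```python
import sqlite3

# Hypothetical in-memory stand-in for the `results` table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can now be indexed by column name
conn.execute("CREATE TABLE results (mpg REAL, cylinders INTEGER, name TEXT)")
conn.execute("INSERT INTO results VALUES (18.0, 8, 'chevrolet chevelle malibu')")

first = conn.execute("SELECT * FROM results").fetchone()
column_names = first.keys()  # column names for this row
first_name = first["name"]   # access by name instead of position
first_mpg = first[0]         # positional access still works

print(column_names)
print(first_name, first_mpg)

conn.close()
```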


In [5]:
# OPTION 2
# we can also use pandas
df = pd.read_sql_query("SELECT * FROM results", conn) # pass query & connection
df.head()

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino


In [6]:
# select only mpg, cylinders & name
pd.read_sql_query("SELECT mpg, cylinders, name FROM results", conn)

Unnamed: 0,mpg,cylinders,name
0,18.0,8,chevrolet chevelle malibu
1,15.0,8,buick skylark 320
2,18.0,8,plymouth satellite
3,16.0,8,amc rebel sst
4,17.0,8,ford torino
...,...,...,...
392,27.0,4,ford mustang gl
393,44.0,4,vw pickup
394,32.0,4,dodge rampage
395,28.0,4,ford ranger


## WHERE

There will be cases where you don't want all of the observations to be returned; that's when you use the WHERE clause. It is typically specified after a table name (unless you do a join, more on this to come). The typical syntax is <code> WHERE column_name operator value </code>. Filters can take several forms:


- Mathematical comparisons with the following operators: =, <, >, <=, >=, <>. Note that <> means not equal. = and <> can be used for both strings and numbers
- To search for a pattern in a string, you can use the LIKE operator
- You can use the IN operator to specify multiple values in a column
- You can have multiple filters in the WHERE clause separated by AND or OR. Don't forget to specify the column name for every filter you pass in the WHERE clause


In [7]:
pd.read_sql_query("""SELECT * 
                        FROM results
                        WHERE mpg < 20 AND cylinders in (6,8) AND year <> 70 AND
                              name LIKE '%ford%'
                   """, conn) # I like to use """ instead of ' or "

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,19.0,6,250.0,88.0,3302.0,15.5,71,1,ford torino 500
1,14.0,8,351.0,153.0,4154.0,13.5,71,1,ford galaxie 500
2,13.0,8,400.0,170.0,4746.0,12.0,71,1,ford country squire (sw)
3,18.0,6,250.0,88.0,3139.0,14.5,71,1,ford mustang
4,14.0,8,351.0,153.0,4129.0,13.0,72,1,ford galaxie 500
5,13.0,8,302.0,140.0,4294.0,16.0,72,1,ford gran torino (sw)
6,14.0,8,302.0,137.0,4042.0,14.5,73,1,ford gran torino
7,13.0,8,351.0,158.0,4363.0,13.0,73,1,ford ltd
8,18.0,6,250.0,88.0,3021.0,16.5,73,1,ford maverick
9,12.0,8,400.0,167.0,4906.0,12.5,73,1,ford country


In [8]:
pd.read_sql_query("""SELECT * 
                        FROM results
                        WHERE cylinders = 6 OR 8
                   """, conn) # Notice it doesn't work as expected: this parses as (cylinders = 6) OR 8, and 8 is always true, so every row is returned

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
392,27.0,4,140.0,86.0,2790.0,15.6,82,1,ford mustang gl
393,44.0,4,97.0,52.0,2130.0,24.6,82,2,vw pickup
394,32.0,4,135.0,84.0,2295.0,11.6,82,1,dodge rampage
395,28.0,4,120.0,79.0,2625.0,18.6,82,1,ford ranger
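
To see why `cylinders = 6 OR 8` keeps every row, it helps to evaluate the condition as a column. This is a minimal sketch on a hypothetical three-row in-memory table (cylinder counts taken from cars in this notebook); the alias `keep` is made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the `results` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (cylinders INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(8, "ford torino"), (6, "ford maverick"), (4, "vw pickup")],
)

# The condition is parsed as (cylinders = 6) OR 8. A non-zero number counts
# as true, so the whole expression is true (1) for every row.
flags = conn.execute(
    "SELECT cylinders, cylinders = 6 OR 8 AS keep FROM results"
).fetchall()
print(flags)

# IN expresses the intended filter without repeating the column name.
wanted = conn.execute(
    "SELECT cylinders FROM results WHERE cylinders IN (6, 8)"
).fetchall()
print(wanted)

conn.close()
```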


In [9]:
pd.read_sql_query("""SELECT * 
                        FROM results
                        WHERE cylinders = 6 OR cylinders = 8
                   """, conn) # We need to be specific: repeat the column name in every comparison.

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
182,20.2,6,200.0,88.0,3060.0,17.1,81,1,ford granada gl
183,17.6,6,225.0,85.0,3465.0,16.6,81,1,chrysler lebaron salon
184,25.0,6,181.0,110.0,2945.0,16.4,82,1,buick century limited
185,38.0,6,262.0,85.0,3015.0,17.0,82,1,oldsmobile cutlass ciera (diesel)


In [10]:
pd.read_sql_query("""SELECT * 
                        FROM results
                        WHERE cylinders = 6 OR cylinders = 8 AND year <> 70
                   """, conn) # AND binds tighter than OR, so 6-cylinder cars from year 70 are still returned

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,22.0,6,198.0,95.0,2833.0,15.5,70,1,plymouth duster
1,18.0,6,199.0,97.0,2774.0,15.5,70,1,amc hornet
2,21.0,6,200.0,85.0,2587.0,16.0,70,1,ford maverick
3,21.0,6,199.0,90.0,2648.0,15.0,70,1,amc gremlin
4,19.0,6,232.0,100.0,2634.0,13.0,71,1,amc gremlin
...,...,...,...,...,...,...,...,...,...
164,20.2,6,200.0,88.0,3060.0,17.1,81,1,ford granada gl
165,17.6,6,225.0,85.0,3465.0,16.6,81,1,chrysler lebaron salon
166,25.0,6,181.0,110.0,2945.0,16.4,82,1,buick century limited
167,38.0,6,262.0,85.0,3015.0,17.0,82,1,oldsmobile cutlass ciera (diesel)
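
The result above still contains cars from year 70 because AND is evaluated before OR. The sketch below compares the query with and without parentheses on a hypothetical four-row in-memory table (rows taken from cars in this notebook, with only three of the columns).

```python
import sqlite3

# Hypothetical in-memory stand-in for the `results` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (cylinders INTEGER, year INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        (6, 70, "plymouth duster"),
        (8, 70, "ford torino"),
        (6, 71, "amc gremlin"),
        (8, 71, "ford galaxie 500"),
    ],
)

# Read as: cylinders = 6 OR (cylinders = 8 AND year <> 70)
unparenthesized = conn.execute(
    "SELECT name FROM results WHERE cylinders = 6 OR cylinders = 8 AND year <> 70"
).fetchall()

# Parentheses apply the year filter to both cylinder counts.
parenthesized = conn.execute(
    "SELECT name FROM results WHERE (cylinders = 6 OR cylinders = 8) AND year <> 70"
).fetchall()

print(unparenthesized)
print(parenthesized)

conn.close()
```

When AND and OR are mixed in one WHERE clause, explicit parentheses make the intent visible to the next reader as well.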


## CASE WHEN

You can use conditional logic to create new columns by following this syntax.

<code> CASE WHEN condition THEN result1 ELSE result2 END AS new_variable </code>

You can also have multiple conditions.

<code> CASE WHEN condition1 THEN result1 WHEN condition2 THEN result2 ELSE result3 END AS new_variable  </code>

In [11]:
pd.read_sql_query("""
                    SELECT *,
                    CASE WHEN mpg < 20 THEN '<20' ELSE '20+' END AS mpg_gp
                    FROM results""", conn)

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name,mpg_gp
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu,<20
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320,<20
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite,<20
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst,<20
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino,<20
...,...,...,...,...,...,...,...,...,...,...
392,27.0,4,140.0,86.0,2790.0,15.6,82,1,ford mustang gl,20+
393,44.0,4,97.0,52.0,2130.0,24.6,82,2,vw pickup,20+
394,32.0,4,135.0,84.0,2295.0,11.6,82,1,dodge rampage,20+
395,28.0,4,120.0,79.0,2625.0,18.6,82,1,ford ranger,20+
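
The multi-condition form can be sketched the same way. The example assumes a hypothetical four-row in-memory table (mpg values taken from cars in this notebook); the thresholds and the labels 'low', 'medium', and 'high' are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the `results` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (mpg REAL, name TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [
        (15.0, "buick skylark 320"),
        (13.0, "ford ltd"),
        (44.0, "vw pickup"),
        (27.0, "ford mustang gl"),
    ],
)

# Conditions are checked top to bottom and the first match wins,
# so the second WHEN only sees rows with mpg >= 15.
rows = conn.execute(
    """
    SELECT name,
           CASE WHEN mpg < 15 THEN 'low'
                WHEN mpg < 30 THEN 'medium'
                ELSE 'high'
           END AS mpg_gp
    FROM results
    """
).fetchall()

groups = dict(rows)
print(groups)

conn.close()
```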


## SQL TABLE

### CREATE
The CREATE TABLE command creates a new table in your database. You can create either temporary or permanent tables. You can also create a table based on an existing table, or you can create an empty table and populate it.

Temporary table syntax varies from one DB system to another. We'll look at SQLite syntax, but here is an example of MS SQL Server syntax:

<code> SELECT * INTO #temp_tbl_nm FROM existing_tbl </code>

SQLite syntax:

<code> CREATE TEMPORARY TABLE temp_table AS SELECT * FROM existing_tbl </code>

Syntax to create an empty table:

<code> CREATE TABLE new_table ( column1 int, column2 varchar(255)) </code>
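
A minimal sketch of the SQLite syntax, run through `sqlite3` against an in-memory database. The table names follow the snippets above; the single row of data is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create an empty permanent table, then put one illustrative row in it.
conn.execute("CREATE TABLE existing_tbl (column1 int, column2 varchar(255))")
conn.execute("INSERT INTO existing_tbl VALUES (1, 'Canada')")

# Create a temporary table from the result of a query.
conn.execute("CREATE TEMPORARY TABLE temp_table AS SELECT * FROM existing_tbl")
temp_rows = conn.execute("SELECT * FROM temp_table").fetchall()

# SQLite records temporary tables in sqlite_temp_master, not sqlite_master.
temp_names = [n for (n,) in conn.execute(
    "SELECT name FROM sqlite_temp_master WHERE type='table'")]
main_names = [n for (n,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")]

print(temp_rows)
print(temp_names, main_names)

conn.close()  # the temporary table disappears with the connection
```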

### INSERT INTO

To add rows into a table, we can use the INSERT INTO statement. 

<code> INSERT INTO tbl_name (column1, column2) VALUES (1, 'Canada') </code>

### ALTER
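If we want to modify the columns of a table (add, delete, change data type, etc.), we can use the ALTER TABLE command. Which changes are supported varies by DB system; SQLite, for example, can add or rename columns (and drop them in recent versions) but cannot change a column's data type.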

<code> ALTER TABLE new_table ADD column3 varchar(5) </code>

### DROP 

The DROP TABLE command deletes a table in a database. It can be applied to both permanent and temporary tables.

<code> DROP TABLE tbl_name </code>
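
The INSERT INTO, ALTER, and DROP statements above can be tried end to end on a throwaway table. This is a sketch against an in-memory SQLite database, reusing the table and column names from the snippets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE new_table (column1 int, column2 varchar(255))")

# Add one row.
conn.execute("INSERT INTO new_table (column1, column2) VALUES (1, 'Canada')")

# Add a column; rows that already exist get NULL (None in Python) for it.
conn.execute("ALTER TABLE new_table ADD column3 varchar(5)")
cur = conn.execute("SELECT * FROM new_table")
columns = [d[0] for d in cur.description]
after_alter = cur.fetchall()
print(columns)
print(after_alter)

# Delete the table and confirm that it is gone.
conn.execute("DROP TABLE new_table")
remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
print(remaining)

conn.close()
```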


## Making a new table in Python

You can create a new SQL table from an existing data frame in Python.

In [12]:
df

Unnamed: 0,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino
...,...,...,...,...,...,...,...,...,...
392,27.0,4,140.0,86.0,2790.0,15.6,82,1,ford mustang gl
393,44.0,4,97.0,52.0,2130.0,24.6,82,2,vw pickup
394,32.0,4,135.0,84.0,2295.0,11.6,82,1,dodge rampage
395,28.0,4,120.0,79.0,2625.0,18.6,82,1,ford ranger


In [13]:
df.to_sql(
    name="test_output",
    con=conn, 
    schema=None, 
    if_exists='replace', 
    index=True, 
)

397

In [14]:
pd.read_sql("SELECT * FROM test_output LIMIT 10", con=conn)
# limit 10 works just like .head(10)
# in MS SQL Server, SELECT top 10 * FROM test_output
# in Oracle, SELECT * FROM test_output WHERE rownum <=10

Unnamed: 0,index,mpg,cylinders,displacement,horsepower,weight,acceleration,year,origin,name
0,0,18.0,8,307.0,130.0,3504.0,12.0,70,1,chevrolet chevelle malibu
1,1,15.0,8,350.0,165.0,3693.0,11.5,70,1,buick skylark 320
2,2,18.0,8,318.0,150.0,3436.0,11.0,70,1,plymouth satellite
3,3,16.0,8,304.0,150.0,3433.0,12.0,70,1,amc rebel sst
4,4,17.0,8,302.0,140.0,3449.0,10.5,70,1,ford torino
5,5,15.0,8,429.0,198.0,4341.0,10.0,70,1,ford galaxie 500
6,6,14.0,8,454.0,220.0,4354.0,9.0,70,1,chevrolet impala
7,7,14.0,8,440.0,215.0,4312.0,8.5,70,1,plymouth fury iii
8,8,14.0,8,455.0,225.0,4425.0,10.0,70,1,pontiac catalina
9,9,15.0,8,390.0,190.0,3850.0,8.5,70,1,amc ambassador dpl


### List tables in a database

Table and index names can be listed by doing a **SELECT** on a special table named "***SQLITE_MASTER***". Every SQLite database has an SQLITE_MASTER table that defines the schema for the database. For tables, the ***type*** field will always be '***table***' and the ***name*** field will be the name of the table (see more at https://www.sqlite.org/faq.html#q7). To get a list of all tables in the database, use the following SELECT command:

In [15]:
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", con=conn)

Unnamed: 0,name
0,results
1,auto_2
2,test_2
3,song_db
4,test_output
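
Besides `type` and `name`, `sqlite_master` also stores the statement that created each object in its `sql` column. The sketch below assumes a hypothetical in-memory database with one table and one index (`idx_results_name` is a made-up name).

```python
import sqlite3

# Hypothetical in-memory database with one table and one index.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (mpg REAL, cylinders INTEGER, name TEXT)")
conn.execute("CREATE INDEX idx_results_name ON results (name)")

# One row per schema object; ORDER BY type DESC lists the table before the index.
schema = conn.execute(
    "SELECT type, name, tbl_name, sql FROM sqlite_master ORDER BY type DESC"
).fetchall()

for obj_type, name, tbl_name, sql in schema:
    print(obj_type, name, tbl_name)
    print("   ", sql)

conn.close()
```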


# Making a new DB

Making a new database is as simple as connecting to a new file name: if the file does not exist yet, SQLite creates it.

In [16]:
song = pd.read_csv('../data/song_data.csv')

In [17]:
conn = sqlite3.connect('song.sqlite')
song.to_sql('song_data', con=conn, index=False, if_exists='replace')

18835

In [18]:
pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'",conn)

Unnamed: 0,name
0,song_data
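
The same steps can be sketched with `sqlite3` alone to show that the database file appears on disk and survives a reconnect. The file name `demo.sqlite` and the temporary directory are assumptions that keep the example self-contained; the row is taken from the song data shown in this notebook.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.sqlite")  # hypothetical file name
    existed_before = os.path.exists(db_path)

    # Connecting to a path that does not exist yet creates a new database.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE song_data (song_name TEXT, song_popularity INTEGER)")
    conn.execute("INSERT INTO song_data VALUES ('In The End', 66)")
    conn.commit()  # write the pending insert to disk
    conn.close()
    existed_after = os.path.exists(db_path)

    # Reconnecting to the same file finds the table and its data.
    conn = sqlite3.connect(db_path)
    reopened_rows = conn.execute("SELECT * FROM song_data").fetchall()
    conn.close()

print(existed_before, existed_after)
print(reopened_rows)
```

Closing a connection before opening another one to a different file, as done here, avoids leaving the earlier database open.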


In [19]:
pd.read_sql_query("""SELECT * FROM song_data""",conn)

Unnamed: 0,song_name,song_popularity,song_duration_ms,acousticness,danceability,energy,instrumentalness,key,liveness,loudness,audio_mode,speechiness,tempo,time_signature,audio_valence
0,Boulevard of Broken Dreams,73,262333,0.005520,0.496,0.682,0.000029,8,0.0589,-4.095,1,0.0294,167.060,4,0.474
1,In The End,66,216933,0.010300,0.542,0.853,0.000000,3,0.1080,-6.407,0,0.0498,105.256,4,0.370
2,Seven Nation Army,76,231733,0.008170,0.737,0.463,0.447000,0,0.2550,-7.828,1,0.0792,123.881,4,0.324
3,By The Way,74,216933,0.026400,0.451,0.970,0.003550,0,0.1020,-4.938,1,0.1070,122.444,4,0.198
4,How You Remind Me,56,223826,0.000954,0.447,0.766,0.000000,10,0.1130,-5.065,1,0.0313,172.011,4,0.574
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
18830,Let It Breathe,60,159645,0.893000,0.500,0.151,0.000065,11,0.1110,-16.107,1,0.0348,113.969,4,0.300
18831,Answers,60,205666,0.765000,0.495,0.161,0.000001,11,0.1050,-14.078,0,0.0301,94.286,4,0.265
18832,Sudden Love (Acoustic),23,182211,0.847000,0.719,0.325,0.000000,0,0.1250,-12.222,1,0.0355,130.534,4,0.286
18833,Gentle on My Mind,55,352280,0.945000,0.488,0.326,0.015700,3,0.1190,-12.020,1,0.0328,106.063,4,0.323
