# SQL in Python
### Packages

 - [Pandas.read_sql](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_sql.html)
 - [SQLite3](https://docs.python.org/3.6/library/sqlite3.html)
 
### Tutorials
- https://www.freecodecamp.org/news/connect-python-with-sql/
- https://www.tutorialspoint.com/sqlite/sqlite_python.htm
- https://www.pythoncentral.io/introduction-to-sqlite-in-python/
- https://medium.com/swlh/reproducing-sql-queries-in-python-codes-35d90f716b1a
- http://www.sqlitetutorial.net/sqlite-python/

### Create a SQL database connection to a sample SQL database, and read records from that database
Structured Query Language (SQL) is an [ANSI-specified](https://docs.oracle.com/database/121/SQLRF/ap_standard_sql001.htm#SQLRF55514) language for defining, querying, and modifying data in relational databases. **SQLite** is a lightweight, serverless database engine that stores an entire database in a single file and supports most, but not all, of standard SQL.

In [1]:
# Imports
import sqlite3 as sq3
import pandas.io.sql as pds
import pandas as pd

### Database connections

Our first step will be to create a connection to our SQL database. A few common SQL databases used with Python include:

 - Microsoft SQL Server
 - Postgres
 - MySQL
 - AWS Redshift
 - AWS Aurora
 - Oracle DB
 - Teradata
 - Db2 Family
 - Many, many others
 
Each of these databases will require a slightly different setup, and may require credentials (username & password), tokens, or other access requirements. We'll be using `sqlite3` to connect to our database, but other connection packages include:

 - [`SQLAlchemy`](https://www.sqlalchemy.org/) (most common)
 - [`psycopg2`](http://initd.org/psycopg/)
 - [`MySQLdb`](http://mysql-python.sourceforge.net/MySQLdb.html)

## Classic Rock Database

In [2]:
# Initialize path to SQLite database
path = 'databases/classic_rock.db'
con = sq3.connect(path)

# We now have a live connection to our SQL database
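
If the `classic_rock.db` file is not at hand, the same connection pattern can be tried against a throwaway in-memory database. The sketch below is only a stand-in: the `demo_songs` table and the `demo_con` name are hypothetical, and the two rows are borrowed from the sample output in this notebook purely for illustration.

```python
import sqlite3

# ":memory:" creates a temporary database that lives only as long as the connection
demo_con = sqlite3.connect(":memory:")

# Hypothetical table for illustration; not the actual schema of classic_rock.db
demo_con.execute(
    "CREATE TABLE demo_songs (Song TEXT, Artist TEXT, Release_Year INTEGER, PlayCount INTEGER)"
)
demo_con.executemany(
    "INSERT INTO demo_songs VALUES (?, ?, ?, ?)",
    [
        ("Caught Up in You", ".38 Special", 1982, 82),
        ("Hold On Loosely", ".38 Special", 1981, 85),
    ],
)
demo_con.commit()

# A cursor-level query returns plain tuples rather than a DataFrame
rows = demo_con.execute(
    "SELECT Song, PlayCount FROM demo_songs ORDER BY PlayCount DESC"
).fetchall()
print(rows)

demo_con.close()
```

Passing a file path instead of `":memory:"` opens (or creates) a database file on disk, which is what the cell above does.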

### Reading data

Now that we have a connection to our database, we can run queries and load their results as Pandas DataFrames.

In [3]:
# Write the query
query = '''
SELECT * 
FROM rock_songs;
'''

# Execute the query
observations = pds.read_sql(query, con)

observations.head()

Unnamed: 0,Song,Artist,Release_Year,PlayCount
0,Caught Up in You,.38 Special,1982.0,82
1,Hold On Loosely,.38 Special,1981.0,85
2,Rockin' Into the Night,.38 Special,1980.0,18
3,Art For Arts Sake,10cc,1975.0,1
4,Kryptonite,3 Doors Down,2000.0,13


In [4]:
# We can also run any supported SQL query
# Write the query
query = '''
SELECT Artist, Release_Year, COUNT(*) AS num_songs, AVG(PlayCount) AS avg_plays  
    FROM rock_songs
    GROUP BY Artist, Release_Year
    ORDER BY num_songs desc;
'''

# Execute the query
observations = pds.read_sql(query, con)

observations.head()

Unnamed: 0,Artist,Release_Year,num_songs,avg_plays
0,The Beatles,1967.0,23,6.565217
1,Led Zeppelin,1969.0,18,21.0
2,The Beatles,1965.0,15,3.8
3,The Beatles,1968.0,13,13.0
4,The Beatles,1969.0,13,15.0


### Common parameters

There are a number of common parameters that can be used to control formatting when reading in SQL data:

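 - **coerce_float**: Attempt to convert non-string, non-numeric values (such as `decimal.Decimal`) to floats
 - **parse_dates**: List of columns to parse as dates
 - **chunksize**: Number of rows to include in each chunk; when set, an iterator of DataFrames is returned instead of a single DataFrame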
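
As a rough illustration of `chunksize`, the sketch below streams a small table in pieces and accumulates a running total, so the full result never has to sit in memory at once. The `plays` table, its rows, and the `con_demo` name are made up for this example.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory table with five rows, for illustration only
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE plays (Artist TEXT, PlayCount INTEGER)")
con_demo.executemany(
    "INSERT INTO plays VALUES (?, ?)",
    [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)],
)
con_demo.commit()

# With chunksize set, read_sql returns an iterator of DataFrames
chunks = pd.read_sql("SELECT * FROM plays", con_demo, chunksize=2)

total_plays = 0
chunk_sizes = []
for chunk in chunks:
    chunk_sizes.append(len(chunk))                 # rows in this chunk
    total_plays += int(chunk["PlayCount"].sum())   # aggregate incrementally

con_demo.close()
print(chunk_sizes, total_plays)
```

The last chunk is simply shorter when the row count is not a multiple of `chunksize`.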
 

In [5]:
query='''
SELECT Artist, Release_Year, COUNT(*) AS num_songs, AVG(PlayCount) AS avg_plays  
    FROM rock_songs
    GROUP BY Artist, Release_Year
    ORDER BY num_songs desc;
'''

# Execute the query
observations_generator = pds.read_sql(query,
                            con,
                            coerce_float=True, # Doesn't affect this dataset, because floats were already parsed correctly
                            parse_dates=['Release_Year'], # Parse `Release_Year` as a date; numeric years are read as seconds since the Unix epoch, hence the 1970 timestamps below
                            chunksize=5 # Allows for streaming results as a series of shorter tables
                           )

for index, observations in enumerate(observations_generator):
    if index < 5:
        print(f'Observations index: {index}')
        display(observations)

Observations index: 0


Unnamed: 0,Artist,Release_Year,num_songs,avg_plays
0,The Beatles,1970-01-01 00:32:47,23,6.565217
1,Led Zeppelin,1970-01-01 00:32:49,18,21.0
2,The Beatles,1970-01-01 00:32:45,15,3.8
3,The Beatles,1970-01-01 00:32:48,13,13.0
4,The Beatles,1970-01-01 00:32:49,13,15.0


Observations index: 1


Unnamed: 0,Artist,Release_Year,num_songs,avg_plays
0,Led Zeppelin,1970-01-01 00:32:50,12,13.166667
1,Led Zeppelin,1970-01-01 00:32:55,12,14.166667
2,Pink Floyd,1970-01-01 00:32:59,11,41.454545
3,Pink Floyd,1970-01-01 00:32:53,10,29.1
4,The Doors,1970-01-01 00:32:47,10,28.9


Observations index: 2


Unnamed: 0,Artist,Release_Year,num_songs,avg_plays
0,Fleetwood Mac,1970-01-01 00:32:57,9,35.666667
1,Jimi Hendrix,1970-01-01 00:32:47,9,24.888889
2,The Beatles,1970-01-01 00:32:43,9,2.444444
3,The Beatles,1970-01-01 00:32:44,9,3.111111
4,Elton John,1970-01-01 00:32:53,8,18.5


Observations index: 3


Unnamed: 0,Artist,Release_Year,num_songs,avg_plays
0,Led Zeppelin,1970-01-01 00:32:51,8,47.75
1,Led Zeppelin,1970-01-01 00:32:53,8,34.125
2,Boston,1970-01-01 00:32:56,7,69.285714
3,Rolling Stones,1970-01-01 00:32:49,7,36.142857
4,Van Halen,1970-01-01 00:32:58,7,51.142857


Observations index: 4


Unnamed: 0,Artist,Release_Year,num_songs,avg_plays
0,Bruce Springsteen,1970-01-01 00:32:55,6,7.666667
1,Bruce Springsteen,1970-01-01 00:33:04,6,11.5
2,Creedence Clearwater Revival,1970-01-01 00:32:49,6,23.833333
3,Creedence Clearwater Revival,1970-01-01 00:32:50,6,18.833333
4,Def Leppard,1970-01-01 00:33:07,6,32.0
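
Every timestamp above falls on 1970-01-01 because `parse_dates`, when given a plain list, treats numeric columns as seconds since the Unix epoch, so a year such as 1967 becomes 32 minutes and 47 seconds past midnight. One possible workaround, sketched here on a made-up `demo_years` table, is to read the column as a number and then convert it explicitly with `pd.to_datetime` and a `%Y` format.

```python
import sqlite3
import pandas as pd

# Hypothetical table mimicking a float-valued year column
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE demo_years (Artist TEXT, Release_Year REAL)")
con_demo.executemany(
    "INSERT INTO demo_years VALUES (?, ?)",
    [("The Beatles", 1967.0), ("Led Zeppelin", 1969.0)],
)
con_demo.commit()

df = pd.read_sql("SELECT * FROM demo_years", con_demo)
con_demo.close()

# Convert 1967.0 -> 1967 -> "1967" and parse it as a four-digit year.
# Assumption for this sketch: no missing years; NULL values would need
# to be dropped or filled before astype(int).
df["Release_Date"] = pd.to_datetime(
    df["Release_Year"].astype(int).astype(str), format="%Y"
)
years = df["Release_Date"].dt.year.tolist()
print(df)
```

Each year is mapped to January 1 of that year, which is usually enough for grouping or plotting by year.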


### Baseball Database Example

In [6]:
# Create a variable, `path`, containing the path to `baseball.db` in `databases/`
path = 'databases/baseball.db'

# Create a connection, `con`, to the database at `path`
con = sq3.connect(path)

In [7]:
# Create a variable, `all_tables`, which reads in all data from the table `sqlite_master`
all_tables = pd.read_sql('SELECT * FROM sqlite_master', con)
all_tables

Unnamed: 0,type,name,tbl_name,rootpage,sql
0,table,allstarfull,allstarfull,2,"CREATE TABLE ""allstarfull"" (\n""index"" INTEGER,..."
1,index,ix_allstarfull_index,allstarfull,3,"CREATE INDEX ""ix_allstarfull_index""ON ""allstar..."
2,table,schools,schools,26,"CREATE TABLE ""schools"" (\n""index"" INTEGER,\n ..."
3,index,ix_schools_index,schools,31,"CREATE INDEX ""ix_schools_index""ON ""schools"" (""..."
4,table,batting,batting,99,"CREATE TABLE ""batting"" (\n""index"" INTEGER,\n ..."
5,index,ix_batting_index,batting,100,"CREATE INDEX ""ix_batting_index""ON ""batting"" (""..."


In [8]:
# Displaying all tables in database
pd.read_sql("select name from sqlite_master where type = 'table';", con)

Unnamed: 0,name
0,allstarfull
1,schools
2,batting
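
Once the table names are known, SQLite's `PRAGMA table_info` can list the columns of a single table. The sketch below runs it against a hypothetical `demo_players` table in an in-memory database; against the real file, the same statement could be pointed at `allstarfull`, `schools`, or `batting`.

```python
import sqlite3

# Hypothetical table, used only to demonstrate the PRAGMA
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE demo_players (playerID TEXT, yearID INTEGER, GP REAL)")

# Each returned row has the form (cid, name, type, notnull, dflt_value, pk)
columns = con_demo.execute("PRAGMA table_info(demo_players)").fetchall()
column_names = [col[1] for col in columns]
column_types = [col[2] for col in columns]
con_demo.close()

print(list(zip(column_names, column_types)))
```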


In [9]:
# Create a variable, `query`, containing a SQL query which reads in all data from the `allstarfull` table
query = """
SELECT *
    FROM allstarfull;
"""

allstar_observations = pd.read_sql(query, con)
allstar_observations.head()

Unnamed: 0,index,playerID,yearID,gameNum,gameID,teamID,lgID,GP,startingPos
0,0,gomezle01,1933,0,ALS193307060,NYA,AL,1.0,1.0
1,1,ferreri01,1933,0,ALS193307060,BOS,AL,1.0,2.0
2,2,gehrilo01,1933,0,ALS193307060,NYA,AL,1.0,3.0
3,3,gehrich01,1933,0,ALS193307060,DET,AL,1.0,4.0
4,4,dykesji01,1933,0,ALS193307060,CHA,AL,1.0,5.0


In [10]:
best_query = """
SELECT playerID, SUM(GP) AS num_games_played, AVG(startingPos) AS avg_starting_position
    FROM allstarfull
    GROUP BY playerID
    ORDER BY num_games_played DESC, avg_starting_position ASC
    LIMIT 3
"""
best = pd.read_sql(best_query, con)
best.head()

Unnamed: 0,playerID,num_games_played,avg_starting_position
0,musiast01,24.0,6.357143
1,mayswi01,24.0,8.0
2,aaronha01,24.0,8.470588
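
For comparison, roughly the same aggregation can be written in Pandas after the rows have been loaded. This is a sketch on a tiny made-up DataFrame (the `demo` frame and its values are hypothetical), not a replacement for running the query in the database.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the allstarfull table
demo = pd.DataFrame({
    "playerID": ["a", "a", "b", "b", "c"],
    "GP": [1.0, 1.0, 1.0, 0.0, 1.0],
    "startingPos": [1.0, 3.0, 2.0, None, 5.0],
})

summary = (
    demo.groupby("playerID")                                    # GROUP BY playerID
    .agg(
        num_games_played=("GP", "sum"),                         # SUM(GP)
        avg_starting_position=("startingPos", "mean"),          # AVG(startingPos), skips missing values
    )
    .reset_index()
    .sort_values(
        ["num_games_played", "avg_starting_position"],
        ascending=[False, True],                                # ORDER BY ... DESC, ... ASC
    )
    .head(3)                                                    # LIMIT 3
)
print(summary)
```

Aggregating in SQL is generally preferable when the table is large, since only the summarized rows are transferred into Python.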


### Artists Database Example

In [11]:
conn = sq3.connect("databases/artists.sqlite")

# Displaying all tables in database
pd.read_sql("select name from sqlite_master where type = 'table';", conn)

Unnamed: 0,name
0,reviews
1,artists
2,genres
3,labels
4,years
5,content


In [12]:
query="SELECT * FROM artists"
music_reviews = pd.read_sql_query(query, conn)
music_reviews.tail()

Unnamed: 0,reviewid,artist
18826,1535,coldcut
18827,1341,cassius
18828,5376,mojave 3
18829,2413,don caballero
18830,3723,neil hamburger


In [13]:
query1="SELECT * FROM artists WHERE artist='kleenex'"
result=pd.read_sql_query(query1, conn)
result

Unnamed: 0,reviewid,artist
0,22661,kleenex
1,14056,kleenex
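
When the value being filtered on comes from a variable or from user input, it is safer to pass it through the `params` argument than to paste it into the SQL string. The sketch below assumes SQLite's `?` placeholder style and uses a hypothetical `demo_artists` table whose rows are copied from the outputs above.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory copy of a few artist rows
con_demo = sqlite3.connect(":memory:")
con_demo.execute("CREATE TABLE demo_artists (reviewid INTEGER, artist TEXT)")
con_demo.executemany(
    "INSERT INTO demo_artists VALUES (?, ?)",
    [(22661, "kleenex"), (14056, "kleenex"), (1535, "coldcut")],
)
con_demo.commit()

artist_name = "kleenex"

# The driver substitutes the value for "?", so no manual quoting is needed
result_demo = pd.read_sql_query(
    "SELECT * FROM demo_artists WHERE artist = ?",
    con_demo,
    params=(artist_name,),
)
con_demo.close()
print(result_demo)
```

Other database drivers use different placeholder styles (for example `%s`), so the query string may need adjusting when moving away from SQLite.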


In [14]:
# No conn.commit() is needed here because we only read data
conn.close()
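
To avoid forgetting the `close()` call, the connection can be wrapped in `contextlib.closing`. Note that using a `sqlite3` connection directly in a `with` statement only manages transactions (commit or rollback) and does not close it. The `demo` table below is a hypothetical example.

```python
import sqlite3
from contextlib import closing

# closing() guarantees con_demo.close() runs when the block exits
with closing(sqlite3.connect(":memory:")) as con_demo:
    con_demo.execute("CREATE TABLE demo (x INTEGER)")
    con_demo.execute("INSERT INTO demo VALUES (1)")
    con_demo.commit()
    row_count = con_demo.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

# Using the connection after the block raises sqlite3.ProgrammingError
try:
    con_demo.execute("SELECT 1")
    is_closed = False
except sqlite3.ProgrammingError:
    is_closed = True

print(row_count, is_closed)
```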

|![head.png](imgs/head.png)|![head.png](imgs/head.png)|
|---|---|
|![reading_nosql.png](imgs/reading_nosql.png)|![reading_online.png](imgs/reading_online.png)|