# Introduction to SQL Using Python

In [1]:
import sqlite3
import pandas as pd

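Download the dataset from Kaggle: https://www.kaggle.com/laudanum/footballdelphi. The code below expects the file `database.sqlite` to be in the notebook's working directory.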

In [2]:
# Connect to the SQLite database file (path relative to the working directory)
conn = sqlite3.connect('database.sqlite')

# Create cursor object
cur = conn.cursor()
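One pitfall worth checking right away: `sqlite3.connect` does not fail when the file is missing. It silently creates a new, empty database, and the first query then fails with a "no such table" error. A minimal sketch of a check, using SQLite's built-in `sqlite_master` catalog to list the tables in an open connection; `list_tables` is a hypothetical helper name, and the demonstration uses a throwaway in-memory database rather than the real file.

```python
import sqlite3

def list_tables(conn):
    """Return the names of all tables in an open SQLite connection (hypothetical helper)."""
    # sqlite_master is SQLite's built-in catalog; rows with type='table' describe tables
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name;"
    ).fetchall()
    # Each row is a 1-tuple such as ('Matches',), so keep only the first element
    return [row[0] for row in rows]

# Demonstration on a throwaway in-memory database with two empty tables
demo = sqlite3.connect(':memory:')
demo.execute('CREATE TABLE Matches (Match_ID INTEGER)')
demo.execute('CREATE TABLE Unique_Teams (TeamName TEXT, Unique_Team_ID INTEGER)')
print(list_tables(demo))  # ['Matches', 'Unique_Teams']
demo.close()
```

With the real connection, `list_tables(conn)` should include the tables queried below, such as `Matches` and `Unique_Teams`. An empty list usually means the path was wrong and an empty database was created; checking `os.path.exists('database.sqlite')` before connecting avoids that.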

In [None]:
# Template: replace the placeholder text with a real SQL query before running
cur.execute('''Enter SQL query here;''')
df = pd.DataFrame(cur.fetchall())
# Label the columns with the names reported by the cursor
df.columns = [x[0] for x in cur.description]
df.head()
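Every cell below repeats the same steps: execute the query, fetch the rows, and label the columns. A minimal sketch of wrapping them in one function, under the hypothetical name `query_to_df`. It passes the column names to the `DataFrame` constructor instead of assigning `.columns` afterwards, because for a query that returns no rows `pd.DataFrame([])` has zero columns, and assigning a non-empty list to `.columns` then raises a length-mismatch `ValueError`. The demonstration table is an in-memory stand-in holding two rows copied from the __Unique_Teams__ output.

```python
import sqlite3
import pandas as pd

def query_to_df(cur, sql, params=()):
    """Run a query and return the result as a DataFrame (hypothetical helper)."""
    cur.execute(sql, params)
    rows = cur.fetchall()
    # cur.description holds one 7-tuple per result column; element 0 is the column name
    columns = [col[0] for col in cur.description]
    # Passing columns= to the constructor also works when the query returns no rows
    return pd.DataFrame(rows, columns=columns)

# Demonstration on an in-memory stand-in for Unique_Teams
demo_conn = sqlite3.connect(':memory:')
demo_cur = demo_conn.cursor()
demo_cur.execute('CREATE TABLE Unique_Teams (TeamName TEXT, Unique_Team_ID INTEGER)')
demo_cur.executemany('INSERT INTO Unique_Teams VALUES (?, ?)',
                     [('Bayern Munich', 1), ('Dortmund', 2)])
print(query_to_df(demo_cur, 'SELECT * FROM Unique_Teams;'))

# An empty result still keeps its column labels
empty = query_to_df(demo_cur, 'SELECT * FROM Unique_Teams WHERE Unique_Team_ID = ?;', (99,))
print(empty.columns.tolist())  # ['TeamName', 'Unique_Team_ID']
```

With the real database open, `query_to_df(cur, 'SELECT * FROM Matches;')` would stand in for the four-line pattern used in the cells that follow.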

For the first query, we will preview the contents of the __Unique_Teams__ table. To do this, we can run the following query:

In [7]:
# View Unique_Teams dataframe

cur.execute('''SELECT * 
               FROM Unique_Teams;''')
Unique_Teams_df = pd.DataFrame(cur.fetchall())
Unique_Teams_df.columns = [x[0] for x in cur.description]
Unique_Teams_df

|   | TeamName | Unique_Team_ID |
|---|---|---|
| 0 | Bayern Munich | 1 |
| 1 | Dortmund | 2 |
| 2 | Leverkusen | 3 |
| 3 | RB Leipzig | 4 |
| 4 | Schalke 04 | 5 |
| 5 | M'gladbach | 6 |
| 6 | Wolfsburg | 7 |
| 7 | FC Koln | 8 |
| 8 | Hoffenheim | 9 |
| 9 | Hertha | 10 |


The __SELECT__ clause tells the database which information we want to pull from the table. The asterisk, (*), means that we want every column in the table.
The __FROM__ clause tells the database which table to read the data from, in this case __Unique_Teams__.
The semicolon, (;), marks the end of the SQL query, much as a period ends a sentence.
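To see what `SELECT *` actually hands back to Python, here is a minimal sketch on a throwaway in-memory table whose three rows are copied from the output above (the table is a stand-in, not the real database). It shows the raw tuples from `fetchall()`, the `cur.description` entries that the notebook turns into column names, and that Python's `sqlite3` module also accepts a single statement without the trailing semicolon.

```python
import sqlite3

# Throwaway in-memory database; the rows are copied from the output above
conn_demo = sqlite3.connect(':memory:')
cur_demo = conn_demo.cursor()
cur_demo.execute('CREATE TABLE Unique_Teams (TeamName TEXT, Unique_Team_ID INTEGER)')
cur_demo.executemany('INSERT INTO Unique_Teams VALUES (?, ?)',
                     [('Bayern Munich', 1), ('Dortmund', 2), ('Leverkusen', 3)])

# SELECT * returns every column, in the order the table defines them
cur_demo.execute('SELECT * FROM Unique_Teams;')
rows = cur_demo.fetchall()
print(rows)  # [('Bayern Munich', 1), ('Dortmund', 2), ('Leverkusen', 3)]

# cur.description holds one 7-tuple per result column; only item 0 (the name) is set
names = [d[0] for d in cur_demo.description]
print(names)  # ['TeamName', 'Unique_Team_ID']

# With a single statement, sqlite3 also accepts the query without the trailing semicolon
cur_demo.execute('SELECT * FROM Unique_Teams')
print(len(cur_demo.fetchall()))  # 3
```

This is why the notebook builds column labels with `[x[0] for x in cur.description]`: the remaining six items of each tuple are always `None` in `sqlite3`.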

You can practice this on your own by trying to query all the contents of the __Matches__ table and comparing your query to the answer below.

In [8]:
# View Matches dataframe

cur.execute('''SELECT * 
               FROM Matches;''')
Matches_df = pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

|   | Match_ID | Div | Season | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | D2 | 2009 | 2010-04-04 | Oberhausen | Kaiserslautern | 2 | 1 | H |
| 1 | 2 | D2 | 2009 | 2009-11-01 | Munich 1860 | Kaiserslautern | 0 | 1 | A |
| 2 | 3 | D2 | 2009 | 2009-10-04 | Frankfurt FSV | Kaiserslautern | 1 | 1 | D |
| 3 | 4 | D2 | 2009 | 2010-02-21 | Frankfurt FSV | Karlsruhe | 2 | 1 | H |
| 4 | 5 | D2 | 2009 | 2009-12-06 | Ahlen | Karlsruhe | 1 | 3 | A |
| 5 | 6 | D2 | 2009 | 2010-04-03 | Union Berlin | Karlsruhe | 1 | 1 | D |
| 6 | 7 | D2 | 2009 | 2009-08-14 | Paderborn | Karlsruhe | 2 | 0 | H |
| 7 | 8 | D2 | 2009 | 2010-03-08 | Bielefeld | Karlsruhe | 0 | 1 | A |
| 8 | 9 | D2 | 2009 | 2009-09-26 | Kaiserslautern | Karlsruhe | 2 | 0 | H |
| 9 | 10 | D2 | 2009 | 2009-11-21 | Hansa Rostock | Karlsruhe | 2 | 1 | H |


There are 9 columns in the __Matches__ table. Perhaps we only want to select the __Match_ID__ column. To do this, we write __Match_ID__ in the __SELECT__ clause instead of the asterisk, as the query below shows:

In [9]:
# View Match_ID from Matches

cur.execute('''SELECT Match_ID
               FROM Matches;''')
Matches_df = pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

|   | Match_ID |
|---|---|
| 0 | 1 |
| 1 | 2 |
| 2 | 3 |
| 3 | 4 |
| 4 | 5 |
| 5 | 6 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 10 |
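Even when the query names a single column, `fetchall()` still returns a list of tuples, one per row, each holding just one value. A minimal sketch on a small in-memory stand-in for __Matches__ (three columns and three rows copied from the outputs above, not the real table) shows that shape and how to flatten it into a plain Python list:

```python
import sqlite3

# In-memory stand-in for Matches with three of its columns
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE Matches (Match_ID INTEGER, HomeTeam TEXT, AwayTeam TEXT)')
conn_demo.executemany('INSERT INTO Matches VALUES (?, ?, ?)', [
    (1, 'Oberhausen', 'Kaiserslautern'),
    (2, 'Munich 1860', 'Kaiserslautern'),
    (3, 'Frankfurt FSV', 'Kaiserslautern'),
])

# Selecting one column still yields one tuple per row
rows = conn_demo.execute('SELECT Match_ID FROM Matches;').fetchall()
print(rows)  # [(1,), (2,), (3,)]

# Unpack each 1-tuple to get a flat list of IDs
match_ids = [row[0] for row in rows]
print(match_ids)  # [1, 2, 3]
```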


Try selecting only the __HomeTeam__ column from the __Matches__ table and compare your code to the query below:

In [10]:
# View HomeTeam from Matches

cur.execute('''SELECT HomeTeam
               FROM Matches;''')
Matches_df = pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

|   | HomeTeam |
|---|---|
| 0 | Oberhausen |
| 1 | Munich 1860 |
| 2 | Frankfurt FSV |
| 3 | Frankfurt FSV |
| 4 | Ahlen |
| 5 | Union Berlin |
| 6 | Paderborn |
| 7 | Bielefeld |
| 8 | Kaiserslautern |
| 9 | Hansa Rostock |


You can select multiple columns by listing each column name in the __SELECT__ clause, separated by commas, (,). The query below returns both the home team and the away team for each match:

In [12]:
# View HomeTeam and AwayTeam from Matches

cur.execute('''SELECT HomeTeam,
                      AwayTeam
               FROM Matches;''')
Matches_df = pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

|   | HomeTeam | AwayTeam |
|---|---|---|
| 0 | Oberhausen | Kaiserslautern |
| 1 | Munich 1860 | Kaiserslautern |
| 2 | Frankfurt FSV | Kaiserslautern |
| 3 | Frankfurt FSV | Karlsruhe |
| 4 | Ahlen | Karlsruhe |
| 5 | Union Berlin | Karlsruhe |
| 6 | Paderborn | Karlsruhe |
| 7 | Bielefeld | Karlsruhe |
| 8 | Kaiserslautern | Karlsruhe |
| 9 | Hansa Rostock | Karlsruhe |
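Two details are worth knowing when selecting several columns: the result columns follow the order in which they are listed after __SELECT__, not the order in the table, and pandas can run the query and label the columns in one step with `pd.read_sql_query`, which accepts a `sqlite3` connection directly. A minimal sketch on a small in-memory stand-in for __Matches__ (two rows copied from the output above):

```python
import sqlite3
import pandas as pd

# In-memory stand-in for Matches with three of its columns
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE Matches (Match_ID INTEGER, HomeTeam TEXT, AwayTeam TEXT)')
conn_demo.executemany('INSERT INTO Matches VALUES (?, ?, ?)', [
    (4, 'Frankfurt FSV', 'Karlsruhe'),
    (5, 'Ahlen', 'Karlsruhe'),
])

# Columns come back in the order listed after SELECT, not the table's order
df = pd.read_sql_query('SELECT AwayTeam, HomeTeam FROM Matches;', conn_demo)
print(df.columns.tolist())  # ['AwayTeam', 'HomeTeam']
print(df)
```

Against the real database, `pd.read_sql_query('SELECT HomeTeam, AwayTeam FROM Matches;', conn)` should produce the same DataFrame as the cell above without the separate `fetchall()` and `cur.description` steps.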


Practice selecting multiple columns by selecting __Match_ID__ and __Date__ from the __Matches__ table and compare your query to the code below:

In [14]:
# View Match_ID and Date from Matches

cur.execute('''SELECT Match_ID,
                      Date
               FROM Matches;''')
Matches_df = pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

|   | Match_ID | Date |
|---|---|---|
| 0 | 1 | 2010-04-04 |
| 1 | 2 | 2009-11-01 |
| 2 | 3 | 2009-10-04 |
| 3 | 4 | 2010-02-21 |
| 4 | 5 | 2009-12-06 |
| 5 | 6 | 2010-04-03 |
| 6 | 7 | 2009-08-14 |
| 7 | 8 | 2010-03-08 |
| 8 | 9 | 2009-09-26 |
| 9 | 10 | 2009-11-21 |


There are 24,625 rows in the __Matches__ table, which makes it difficult to find specific information. Suppose we only want information from the 2015 season. To do this, we can add a __WHERE__ clause to our SQL query, which keeps only the rows that meet a given condition. To query only the 2015 season from the __Matches__ table, we can use the query below:

In [15]:
# View all matches from the 2015 season

cur.execute('''SELECT *
               FROM Matches
               WHERE Season = 2015;''')
Matches_df = pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

|   | Match_ID | Div | Season | Date | HomeTeam | AwayTeam | FTHG | FTAG | FTR |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 3540 | D1 | 2015 | 2015-10-17 | Werder Bremen | Bayern Munich | 0 | 1 | A |
| 1 | 3541 | D1 | 2015 | 2016-03-05 | Dortmund | Bayern Munich | 0 | 0 | D |
| 2 | 3542 | D1 | 2015 | 2016-03-19 | FC Koln | Bayern Munich | 0 | 1 | A |
| 3 | 3543 | D1 | 2015 | 2016-01-22 | Hamburg | Bayern Munich | 1 | 2 | A |
| 4 | 3544 | D1 | 2015 | 2015-12-19 | Hannover | Bayern Munich | 0 | 1 | A |
| 5 | 3545 | D1 | 2015 | 2016-02-27 | Wolfsburg | Bayern Munich | 0 | 2 | A |
| 6 | 3546 | D1 | 2015 | 2016-04-09 | Stuttgart | Bayern Munich | 1 | 3 | A |
| 7 | 3547 | D1 | 2015 | 2015-09-26 | Mainz | Bayern Munich | 0 | 3 | A |
| 8 | 3548 | D1 | 2015 | 2016-02-06 | Leverkusen | Bayern Munich | 0 | 0 | D |
| 9 | 3549 | D1 | 2015 | 2016-02-14 | Augsburg | Bayern Munich | 1 | 3 | A |


Now the result has only 992 rows, which makes the 2015 season data much easier to inspect than scrolling through the entire table as we did before.
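Rather than counting rows by eye, the database can count them with the `COUNT(*)` aggregate. The sketch below runs on a small in-memory stand-in for __Matches__ (four rows of sample values taken from the outputs above, not the full dataset) and passes the season through a `?` placeholder, which is how the `sqlite3` module supplies values to a query safely instead of pasting them into the SQL string. `count_season` is a hypothetical helper name.

```python
import sqlite3

# In-memory stand-in for Matches with a few sample rows
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE Matches (Match_ID INTEGER, Season INTEGER, HomeTeam TEXT)')
conn_demo.executemany('INSERT INTO Matches VALUES (?, ?, ?)', [
    (1, 2009, 'Oberhausen'),
    (2, 2009, 'Munich 1860'),
    (3540, 2015, 'Werder Bremen'),
    (3541, 2015, 'Dortmund'),
])

def count_season(conn, season):
    """Count the matches in one season (hypothetical helper)."""
    # The ? placeholder is filled from the parameter tuple by sqlite3 itself
    (n,) = conn.execute('SELECT COUNT(*) FROM Matches WHERE Season = ?;',
                        (season,)).fetchone()
    return n

print(count_season(conn_demo, 2015))  # 2
print(count_season(conn_demo, 2020))  # 0
```

With the notebook's real connection, `count_season(conn, 2015)` should return the 992 rows reported above.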

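Every table in the database can be previewed the same way. In the cells below, `.head()` limits each preview to the first five rows of the DataFrame:

In [4]: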
# View Teams dataframe

cur.execute('''SELECT * FROM Teams;''')
Teams_df = pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df.head()

|   | Season | TeamName | KaderHome | AvgAgeHome | ForeignPlayersHome | OverallMarketValueHome | AvgMarketValueHome | StadiumCapacity |
|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | Bayern Munich | 27 | 26 | 15 | 597950000 | 22150000 | 75000 |
| 1 | 2017 | Dortmund | 33 | 25 | 18 | 416730000 | 12630000 | 81359 |
| 2 | 2017 | Leverkusen | 31 | 24 | 15 | 222600000 | 7180000 | 30210 |
| 3 | 2017 | RB Leipzig | 30 | 23 | 15 | 180130000 | 6000000 | 42959 |
| 4 | 2017 | Schalke 04 | 29 | 24 | 17 | 179550000 | 6190000 | 62271 |
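Note that `.head()` trims the DataFrame only after `fetchall()` has already pulled every row into Python. For a quick preview of a large table, SQL's `LIMIT` clause asks SQLite to return only the first few rows in the first place. A minimal sketch on an in-memory stand-in for __Teams__ filled with 20 made-up placeholder rows (not data from the real table):

```python
import sqlite3
import pandas as pd

# In-memory stand-in for Teams with 20 placeholder rows
conn_demo = sqlite3.connect(':memory:')
conn_demo.execute('CREATE TABLE Teams (Season INTEGER, TeamName TEXT)')
conn_demo.executemany('INSERT INTO Teams VALUES (?, ?)',
                      [(2017, f'Team {i}') for i in range(20)])

# .head() fetches every row into Python first, then keeps five
all_rows = pd.read_sql_query('SELECT * FROM Teams;', conn_demo)
print(len(all_rows), len(all_rows.head()))  # 20 5

# LIMIT makes SQLite return only five rows
preview = pd.read_sql_query('SELECT * FROM Teams LIMIT 5;', conn_demo)
print(len(preview))  # 5
```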


In [5]:
# View Teams_in_Matches dataframe

cur.execute('''SELECT * FROM Teams_in_Matches;''')
Teams_in_Matches_df = pd.DataFrame(cur.fetchall())
Teams_in_Matches_df.columns = [x[0] for x in cur.description]
Teams_in_Matches_df.head()

|   | Match_ID | Unique_Team_ID |
|---|---|---|
| 0 | 1 | 26 |
| 1 | 1 | 46 |
| 2 | 2 | 26 |
| 3 | 2 | 42 |
| 4 | 3 | 26 |


In [6]:
# View Unique_Teams dataframe

cur.execute('''SELECT * FROM Unique_Teams;''')
Unique_Teams_df = pd.DataFrame(cur.fetchall())
Unique_Teams_df.columns = [x[0] for x in cur.description]
Unique_Teams_df.head()

|   | TeamName | Unique_Team_ID |
|---|---|---|
| 0 | Bayern Munich | 1 |
| 1 | Dortmund | 2 |
| 2 | Leverkusen | 3 |
| 3 | RB Leipzig | 4 |
| 4 | Schalke 04 | 5 |
