# Introduction to SQL Using Python: Computing Statistics & Aggregating Data

In my last blog, I discussed how to filter data in SQL using the __WHERE__ statement. This blog is a tutorial on how to compute statistics and aggregate data in SQL. We will discuss the following:
 - COUNT Function
 - AS Command
 - SELECT DISTINCT
 - MAX & MIN Functions
 - AVG & SUM Functions
 - Using the WHERE Statement with Computed Statistics
 - ORDER BY Statement
 - GROUP BY Statement
 - HAVING Statement

In this tutorial we will continue working with the same database from my last blog, which can be downloaded here (https://www.kaggle.com/laudanum/footballdelphi). We will be using the Matches table and the Teams table from this database. If you haven't seen my last blog post, I recommend reading it before this tutorial (https://medium.com/analytics-vidhya/introduction-to-sql-using-python-filtering-data-with-the-where-statement-80d89688f39e). It covers filtering data in SQL with the __WHERE__ statement and also reviews the various columns in the Matches table. The columns in the Teams table are described as follows (from https://www.kaggle.com/laudanum/footballdelphi):

 - Season (str): Football season for which the data is valid
 - TeamName (str): Name of the team the data concerns
 - KaderHome (str): Number of Players in the squad
 - AvgAgeHome (str): Average age of players
 - ForeignPlayersHome (str): Number of foreign players (non-German, non-English respectively) playing for the team
 - OverallMarketValueHome (str): Overall market value of the team pre-season in EUR (based on data from transfermarkt.de)
 - AvgMarketValueHome (str): Average market value (per player) of the team pre-season in EUR (based on data from transfermarkt.de)
 - StadiumCapacity (str): Maximum stadium capacity of the team's home stadium

To begin, we will import the necessary libraries, __sqlite3__ and __pandas__.

In [1]:
# Import necessary libraries

import sqlite3
import pandas as pd

Next, you will need to connect to the database and create a cursor object.

In [2]:
# Connect to database
conn = sqlite3.connect('''database.sqlite''')

# Create cursor object
cur = conn.cursor()

The following is the format we will be using to run our SQL queries in Python.

In [None]:
cur.execute('''Enter SQL query here;''') # Runs SQL query
data = pd.DataFrame(cur.fetchall()) # Converts SQL query results into dataframe format
data.columns = [x[0] for x in cur.description] # Labels the columns of the dataframe
data # View SQL results dataframe
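Since the same three steps (execute, fetch, label the columns) repeat for every query in this tutorial, they can be wrapped in a small helper. The sketch below is only an illustration: `run_query`, `demo_conn`, `demo_cur` and the tiny in-memory `Demo_Teams` table are hypothetical names that stand in for the real database, and the helper returns plain Python lists rather than a dataframe so that it runs with the standard library alone.

```python
import sqlite3


def run_query(cursor, sql):
    """Run a SQL query and return (column_names, rows)."""
    cursor.execute(sql)
    # cursor.description holds one tuple per returned column; item 0 is its name
    column_names = [col[0] for col in cursor.description]
    rows = cursor.fetchall()
    return column_names, rows


# Hypothetical in-memory table standing in for the Teams table
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Teams (Season INTEGER, TeamName TEXT)")
demo_cur.executemany(
    "INSERT INTO Demo_Teams VALUES (?, ?)",
    [(2017, "Dortmund"), (2016, "Hertha")],
)

columns, rows = run_query(demo_cur, "SELECT COUNT(*) AS Num_Rows FROM Demo_Teams;")
print(columns, rows)
```

If you prefer a dataframe, the two returned values can be passed to `pd.DataFrame(rows, columns=columns)`, which also keeps the column labels when a query returns no rows.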

# COUNT Function

The first function we will look at is the __COUNT__ function. Previously we have either put a single asterisk in the __SELECT__ statement or we had listed the names of the columns we wanted to be returned. The __COUNT__ function can also be listed in the __SELECT__ statement and will count the number of rows specified. The query below uses the function __COUNT(*)__ in the __SELECT__ statement. __COUNT(*)__ will count all the rows from the table specified. The below query counts all the rows in the __Teams__ dataset. 

In [9]:
# Return the number of rows from Teams dataframe

cur.execute('''SELECT COUNT(*) 
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,COUNT(*)
0,468


There are a total of 468 rows included in the __Teams__ dataset. We can also insert a column name instead of an asterisk between the parentheses of the __COUNT__ function, for example, __COUNT(Season)__. The function will then count all the rows in the __Season__ column where there is a non-null value. If a value of __Season__ is null for any row, then that row will not be counted. If there are no null values in the __Season__ column then the following function will return 468, the same number as the total number of rows in the __Teams__ dataset.

In [90]:
# Return the number of Season rows in the Teams Dataframe

cur.execute('''SELECT COUNT(Season) 
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,COUNT(Season)
0,468


Great, the number of rows with non-null values is 468, the same as the total number of rows included in the __Teams__ dataset. That means there are no null values in the __Season__ column. To practice using the __COUNT__ function, count the number of rows in the __Matches__ dataset and compare your query to the one below:

In [11]:
# Return the number of rows in the Matches dataframe

cur.execute('''SELECT COUNT(*) 
               FROM Matches;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,COUNT(*)
0,24625


There should be 24,625 rows in the __Matches__ dataset. Check to see if there are any missing values in the column __Season__ from the __Matches__ dataset and compare your query to the one below:

In [12]:
# Count the number of Season rows in the Matches Dataframe

cur.execute('''SELECT COUNT(Season) 
               FROM Matches;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,COUNT(Season)
0,24625


There are 24,625 rows with non-null values in the __Season__ column in the __Matches__ dataset. Since this is the same as the total number of rows in the __Matches__ dataset, there are no missing/null values in the __Season__ column.
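Both tables in this tutorial happen to have no null values in __Season__, so __COUNT(*)__ and __COUNT(Season)__ return the same number. To see the two diverge, here is a minimal sketch on a hypothetical in-memory table, `Demo_Teams`, that contains one null __Season__ value. The table and its three rows are made up for illustration and are not part of the football database.

```python
import sqlite3

# Hypothetical in-memory table; the second row has a NULL Season
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Teams (Season INTEGER, TeamName TEXT)")
demo_cur.executemany(
    "INSERT INTO Demo_Teams VALUES (?, ?)",
    [(2016, "Team A"), (None, "Team B"), (2017, "Team C")],
)

# COUNT(*) counts every row; COUNT(Season) skips rows where Season is NULL
demo_cur.execute('''SELECT COUNT(*) AS All_Rows,
                           COUNT(Season) AS NonNull_Season_Rows
                    FROM Demo_Teams;''')
all_rows, non_null_rows = demo_cur.fetchone()
print(all_rows, non_null_rows)  # 3 2
```

The difference between the two numbers is the number of rows with a missing __Season__.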

# AS Command

When we get our returned results using the __COUNT__ function, the name of our returned column is __COUNT(Season)__. This column name isn't very clear, so this is a good time to rename the column. We can rename a column using the __AS__ command. The __AS__ command is used to rename a column or even a table with an alias. To see how the __AS__ command is used, view the query below:

In [91]:
# Count the number of Season rows in the Matches Dataframe

cur.execute('''SELECT COUNT(Season) AS Num_of_NonNull_Season_Rows
               FROM Matches;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,Num_of_NonNull_Season_Rows
0,24625


We used the query you had just practiced on your own. The difference is that the column __COUNT(Season)__ was renamed to the alias __Num_of_NonNull_Season_Rows__ using the __AS__ command. To practice using the __AS__ command on your own, write a query that counts all the rows where __TeamName__ is not a null value from the __Teams__ dataset and rename the column to __Num_of_NonNull_TeamName_Rows__. Compare your query to the one below:

In [92]:
# Count the number of TeamName rows in the Teams Dataframe

cur.execute('''SELECT COUNT(TeamName) AS Num_of_NonNull_TeamName_Rows
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Num_of_NonNull_TeamName_Rows
0,468
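The __AS__ command can alias a table as well as a column. The sketch below, which assumes a hypothetical in-memory table called `Demo_Matches`, gives the table the short alias `m` and the counted column a readable alias, then reads the column label back from `cursor.description`, the same attribute this tutorial uses to label its dataframes.

```python
import sqlite3

# Hypothetical in-memory table standing in for Matches
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Matches (Season INTEGER, HomeTeam TEXT)")
demo_cur.executemany(
    "INSERT INTO Demo_Matches VALUES (?, ?)",
    [(2016, "Team A"), (2017, "Team B")],
)

# "AS m" aliases the table, "AS Num_of_Season_Rows" aliases the column
demo_cur.execute('''SELECT COUNT(m.Season) AS Num_of_Season_Rows
                    FROM Demo_Matches AS m;''')
column_names = [x[0] for x in demo_cur.description]
num_rows = demo_cur.fetchone()[0]
print(column_names, num_rows)
```

Table aliases matter little with a single table, but they keep queries short once several tables are involved.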


# SELECT DISTINCT

With the __COUNT__ function, we were able to see how many rows were listed in both the __Teams__ and __Matches__ datasets. We also saw the number of rows that did not have null values in the __Season__ column for both datasets. What if we want to know the distinct, unique values in the __Season__ column in the __Teams__ dataset? We can't use __COUNT(Season)__, because that just gives us the number of rows that have a non-null value in the __Season__ column. Instead we will use the __DISTINCT__ clause in the __SELECT__ statement. The below query will show us all of the unique values that are included in the __Season__ column in the __Teams__ dataset.

In [94]:
# Return the unique Season values in the Teams Dataframe

cur.execute('''SELECT DISTINCT Season
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Season
0,2017
1,2016
2,2015
3,2014
4,2013
5,2012
6,2011
7,2010
8,2009
9,2008


Practice selecting all the unique __AwayTeam__ values in the __Matches__ dataset and compare your query to the one below:

In [22]:
# Show all the unique AwayTeams in the Matches Dataframe

cur.execute('''SELECT DISTINCT AwayTeam
               FROM Matches;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,AwayTeam
0,Kaiserslautern
1,Karlsruhe
2,Leverkusen
3,Nurnberg
4,Schalke 04
5,Stuttgart
6,Werder Bremen
7,Bochum
8,Hannover
9,Hansa Rostock


# Using COUNT & DISTINCT Together

We just saw how to get all the unique __AwayTeam__ names from the Matches dataset above. What happens when we want to know __how many__ unique values are returned? To get the count of the unique __AwayTeam__ names, we can use the __COUNT__ function together with the __DISTINCT__ clause. The below query will find all the distinct __AwayTeam__ names in the __Matches__ dataset and then return how many of them there are.

In [23]:
# Return the number of unique AwayTeams in the Matches Dataframe

cur.execute('''SELECT COUNT (DISTINCT AwayTeam)
               FROM Matches;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,COUNT (DISTINCT AwayTeam)
0,128


To practice using the __COUNT__ function and __DISTINCT__ clause together, query the number of distinct __Season__ values included in the __Teams__ dataset and rename the column to __Num_of_Seasons__. Compare your query to the one below:

In [27]:
# Return the number of unique Seasons in the Teams dataset

cur.execute('''SELECT COUNT(DISTINCT Season) AS Num_of_Seasons
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Num_of_Seasons
0,13
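To make the difference between __COUNT__, __SELECT DISTINCT__ and __COUNT(DISTINCT ...)__ concrete, the following sketch runs all three on a hypothetical in-memory table, `Demo_Matches`, with three rows and two different away teams. The table and its contents are invented for illustration.

```python
import sqlite3

# Hypothetical in-memory table: three matches, two different away teams
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Matches (AwayTeam TEXT)")
demo_cur.executemany(
    "INSERT INTO Demo_Matches VALUES (?)",
    [("Bochum",), ("Hannover",), ("Bochum",)],
)

# COUNT(AwayTeam) counts every non-null row, duplicates included
demo_cur.execute("SELECT COUNT(AwayTeam) FROM Demo_Matches;")
row_count = demo_cur.fetchone()[0]

# SELECT DISTINCT returns each different value once
demo_cur.execute("SELECT DISTINCT AwayTeam FROM Demo_Matches;")
distinct_teams = {row[0] for row in demo_cur.fetchall()}

# COUNT(DISTINCT AwayTeam) returns how many different values there are
demo_cur.execute("SELECT COUNT(DISTINCT AwayTeam) AS Num_of_AwayTeams FROM Demo_Matches;")
distinct_count = demo_cur.fetchone()[0]

print(row_count, distinct_teams, distinct_count)
```

The distinct values are collected into a Python set here because, without an __ORDER BY__ statement, the order in which they come back is not guaranteed.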


# MAX & MIN Functions

We will continue to look at how data can be computed in the __SELECT__ statement. We will now look at how to use the __MAX__ function and __MIN__ function. The __MAX__ function will return the largest value from a specified column and the __MIN__ function will return the smallest value from a specified column. Perhaps we want to know what is the largest stadium capacity that is included in the __Teams__ dataset. To do this, we can use the query below:

In [96]:
# Return the maximum StadiumCapacity in the Teams dataset

cur.execute('''SELECT MAX(StadiumCapacity) AS Largest_StadiumCapacity
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Largest_StadiumCapacity
0,81359


In the example above, 81,359 was the largest value in the __StadiumCapacity__ column in the __Teams__ dataset. To see what the smallest value is in the __StadiumCapacity__ column, we will use the __MIN__ function.

In [97]:
# Return the minimum StadiumCapacity in the Teams dataset

cur.execute('''SELECT MIN(StadiumCapacity) AS Smallest_StadiumCapacity
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Smallest_StadiumCapacity
0,15000


We can see that the smallest stadium capacity included in the __Teams__ dataset is 15,000. To practice using the __MAX__ and __MIN__ functions, query the largest number of players in a squad, __KaderHome__, and rename the column to __Max_Players__. In the same query, return the smallest number of players in a squad and rename that column to __Min_Players__. Compare your query to the one below:

In [98]:
# Return the maximum and minimum squad count from the Teams table

cur.execute('''SELECT MAX(KaderHome) AS Max_Players,
                      MIN(KaderHome) AS Min_Players
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Max_Players,Min_Players
0,44,23
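One thing to keep in mind with __MAX__ and __MIN__: the column list above marks the Teams columns as `str`. Whether the real table actually stores these numbers as text has not been checked here, but if a column does hold text, SQLite compares the values character by character rather than numerically. The sketch below uses a hypothetical in-memory table, `Demo_Teams`, whose `StadiumCapacity` column is deliberately declared as `TEXT`, and shows how `CAST(... AS INTEGER)` restores numeric comparison.

```python
import sqlite3

# Hypothetical table where the capacity is stored as TEXT on purpose
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Teams (TeamName TEXT, StadiumCapacity TEXT)")
demo_cur.executemany(
    "INSERT INTO Demo_Teams VALUES (?, ?)",
    [("Team A", "81359"), ("Team B", "9000"), ("Team C", "15000")],
)

# On text, '9000' sorts after '81359' because '9' > '8';
# casting to INTEGER makes MAX compare the values as numbers
demo_cur.execute('''SELECT MAX(StadiumCapacity) AS Text_Max,
                           MAX(CAST(StadiumCapacity AS INTEGER)) AS Numeric_Max
                    FROM Demo_Teams;''')
text_max, numeric_max = demo_cur.fetchone()
print(text_max, numeric_max)  # 9000 81359
```

If the results of __MAX__ or __MIN__ ever look suspicious, comparing them against the cast version is a quick sanity check.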


# AVG Function & SUM Function

Now we will look at two more functions that can be used in the __SELECT__ statement. The __AVG__ function will return the average value of a specified, numeric column. The __SUM__ function will return the sum of a specified, numeric column. We will continue to look at the __StadiumCapacity__ column in the __Teams__ dataset. The below query shows the average stadium capacity in the __Teams__ dataset.

In [38]:
# Return the average stadium capacity from the Teams table

cur.execute('''SELECT AVG(StadiumCapacity) AS Average_StadiumCapacity
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Average_StadiumCapacity
0,47728.094017


There are a lot of numbers following the decimal. If we want to round to the hundredth place, we can wrap our __AVG__ function in the __ROUND__ function, like the query below:

In [39]:
# Return the average stadium capacity from the Teams table

cur.execute('''SELECT ROUND(AVG(StadiumCapacity), 2) AS Average_StadiumCapacity
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Average_StadiumCapacity
0,47728.09


A quick note on the __ROUND__ function. The format is as follows: __ROUND(number to be rounded, number of decimal places to round number to)__.
The number we wanted to round was the __AVG(StadiumCapacity)__ and we wanted to round to the nearest hundredth, or __2__ numbers following the decimal, therefore the format we used was __ROUND(AVG(StadiumCapacity), 2)__.
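To read more about the __ROUND__ function, visit https://www.w3schools.com/sql/func_sqlserver_round.asp. Note that this page documents the SQL Server version of the function; SQLite's __ROUND__ takes the same two arguments used here.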

The __SUM__ function will return the sum of a specified column. For example, if we wanted to see the total number of home goals scored in the __Matches__ dataset, we could use the following query:

In [40]:
# Return the total number of home goals included in the Matches Dataframe

cur.execute('''SELECT SUM(FTHG) AS Total_HomeGoals
               FROM Matches;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,Total_HomeGoals
0,37357
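The three functions in this section can be tried side by side on a table small enough to check by hand. The sketch below assumes a hypothetical in-memory table, `Demo_Matches`, with three games in which the home team scored 1, 2 and 2 goals.

```python
import sqlite3

# Hypothetical in-memory table with three home-goal values: 1, 2 and 2
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Matches (FTHG INTEGER)")
demo_cur.executemany("INSERT INTO Demo_Matches VALUES (?)", [(1,), (2,), (2,)])

# AVG gives 5/3, ROUND(..., 2) trims it to two decimals, SUM adds the values
demo_cur.execute('''SELECT AVG(FTHG) AS Avg_HomeGoals,
                           ROUND(AVG(FTHG), 2) AS Rounded_Avg_HomeGoals,
                           SUM(FTHG) AS Total_HomeGoals
                    FROM Demo_Matches;''')
avg_goals, rounded_avg_goals, total_goals = demo_cur.fetchone()
print(avg_goals, rounded_avg_goals, total_goals)
```

Here the total is 5, the raw average is 5/3 (roughly 1.6667) and the rounded average is 1.67.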


To practice using the __AVG__ and __SUM__ functions on your own, query from the __Teams__ dataset the average overall market value of the team pre-season in EUR, __OverallMarketValueHome__, and round the value to the nearest hundredth and then label that column, __Avg_Market_Val__. In the same query, return the sum of the average age of players,  __AvgAgeHome__, and rename that column to __Avg_Age__. Compare your query to the one below:

In [43]:
# Return the average overall market value of the team pre-season and round to the nearest hundredth,
# also return the sum of the average age of players column

cur.execute('''SELECT ROUND(AVG(OverallMarketValueHome), 2) AS Avg_Market_Val,
                      SUM(AvgAgeHome) AS Avg_Age
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Avg_Market_Val,Avg_Age
0,60808525.64,11492


# Using the WHERE Statement with Computed Statistics

If you did not read my last blog post about filtering data in the __WHERE__ statement, now may be a good time to take a look. We are now going to start computing statistics for filtered data.

What if you wanted to know how many goals Man United scored during home games on average? To get this answer, we could use the following query:

In [45]:
# Show average number of home goals scored by Man United in the Matches Dataframe

cur.execute('''SELECT ROUND(AVG(FTHG), 2) AS Avg_HomeGoals
               FROM Matches
               WHERE HomeTeam = 'Man United';''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,Avg_HomeGoals
0,2.16


By adding the __WHERE__ statement, we are telling our query to use only the rows that have a HomeTeam of 'Man United'. To understand this better, we will briefly discuss the SQL order of operations in relation to this query. Although the __SELECT__ statement is the first statement in our SQL query, the first statement to be executed is the __FROM__ statement. Our __FROM__ statement above tells SQL that we are using the __Matches__ dataset as our table. The next statement executed is the __WHERE__ statement. The __WHERE__ statement filters our dataset to include only the information we want; in this case, we only wanted the rows where __HomeTeam = 'Man United'__ to be included. Lastly, the __SELECT__ statement is executed and returns the data in the form we specify. In this case, we want to return the rounded average of home goals scored, computed only from the rows that passed the filter.

Practice aggregating and using the __WHERE__ statement by writing a query that shows the sum of squad players for all teams during the 2014 season from the __Teams__ dataset. Rename the sum of squad players column to __Total_Soccer_Players__. When you are done, compare your query to the one below.

In [56]:
# Query the sum of squad players for all teams during the 2014 season from the Teams dataset

cur.execute('''SELECT SUM(KaderHome) AS Total_Soccer_Players
               FROM Teams
               WHERE Season = 2014;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Total_Soccer_Players
0,1164
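The effect of filtering before aggregating is easiest to see when the same aggregate is computed with and without the __WHERE__ statement. The sketch below does this on a hypothetical in-memory table, `Demo_Matches`, holding two Man United home games and one Arsenal home game. The goal values are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory table with three home games
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Matches (HomeTeam TEXT, FTHG INTEGER)")
demo_cur.executemany(
    "INSERT INTO Demo_Matches VALUES (?, ?)",
    [("Man United", 3), ("Man United", 1), ("Arsenal", 5)],
)

# Without WHERE, AVG sees all three rows: (3 + 1 + 5) / 3
demo_cur.execute("SELECT AVG(FTHG) FROM Demo_Matches;")
avg_all_teams = demo_cur.fetchone()[0]

# With WHERE, the rows are filtered first, so AVG sees only (3 + 1) / 2
demo_cur.execute('''SELECT AVG(FTHG) AS Avg_HomeGoals
                    FROM Demo_Matches
                    WHERE HomeTeam = 'Man United';''')
avg_man_united = demo_cur.fetchone()[0]

print(avg_all_teams, avg_man_united)  # 3.0 2.0
```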


# ORDER BY Statement

The __ORDER BY__ statement is used to sort returned data in either ascending or descending order given a specified column. Without an __ORDER BY__ statement, the data will not be sorted in any particular order. If you look at the query below, the distinct average age of players for teams during the 2015 season is not returned in any specific order:

In [101]:
# Query the Distinct AvgAgeHome from the 2015 Season from the Teams dataset

cur.execute('''SELECT DISTINCT AvgAgeHome
               FROM Teams
               WHERE Season = 2015;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,AvgAgeHome
0,25
1,24
2,23
3,26
4,22


We will now add an __ORDER BY__ statement to the same query. The returned results will now be in ascending order, with the lowest distinct average age of players at the top of the returned results and the highest distinct average age of players at the bottom of our returned results.

In [102]:
# Query the Distinct AvgAgeHome from the 2015 Season from the Teams dataset

cur.execute('''SELECT DISTINCT AvgAgeHome
               FROM Teams
               WHERE Season = 2015
               ORDER BY AvgAgeHome;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,AvgAgeHome
0,22
1,23
2,24
3,25
4,26


We can also return the results of this query in DESCENDING order by __AvgAgeHome__. The following query is the same query as the one above but this one includes a __DESC__ statement after the __ORDER BY AvgAgeHome__ statement. This will sort the returned results in DESCENDING order. The highest distinct average age of players on a team will be returned at the top of the results and the lowest distinct average age of players will be returned at the bottom of the returned results. To see the __DESC__ clause in action, view the query below:

In [104]:
# Query the Distinct AvgAgeHome from the 2015 Season from the Teams dataset

cur.execute('''SELECT DISTINCT AvgAgeHome
               FROM Teams
               WHERE Season = 2015
               ORDER BY AvgAgeHome DESC;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,AvgAgeHome
0,26
1,25
2,24
3,23
4,22


Our results were now returned in DESCENDING order by the distinct average age of players on teams during the 2015 season. If we do not put the __DESC__ clause at the end of an __ORDER BY__ statement, the results will be returned in ASCENDING order by default. You can put __ASC__ at the end of an __ORDER BY__ statement to return results in ascending order but it is not necessary.

Practice using the __ORDER BY__ statement by writing a query that returns the __HomeTeam__, __FTHG__ (number of home goals scored in a game) and __FTAG__ (number of away goals scored in a game) from the __Matches__ table. Only include data from the __2010__ season and where __Aachen__ is the name of the home team. Sort the results first by the number of home goals scored in a game in descending order. For results that share the same number of home goals scored, order those results by the number of away goals scored in a game in ascending order. Compare your query with the one below:

In [112]:
cur.execute('''SELECT HomeTeam, FTHG, FTAG
               FROM Matches
               WHERE HomeTeam = 'Aachen' AND Season = 2010
               ORDER BY FTHG DESC, FTAG ASC;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,HomeTeam,FTHG,FTAG
0,Aachen,4,0
1,Aachen,4,2
2,Aachen,2,0
3,Aachen,2,1
4,Aachen,2,1
5,Aachen,2,1
6,Aachen,2,1
7,Aachen,2,2
8,Aachen,2,2
9,Aachen,2,2
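Sorting by two columns works like sorting a spreadsheet by a primary and a secondary key: the second column only decides the order among rows that tie on the first. A minimal sketch on a hypothetical in-memory table, `Demo_Matches`, with four invented scorelines:

```python
import sqlite3

# Hypothetical in-memory table with four scorelines (home goals, away goals)
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Matches (FTHG INTEGER, FTAG INTEGER)")
demo_cur.executemany(
    "INSERT INTO Demo_Matches VALUES (?, ?)",
    [(2, 1), (4, 2), (2, 0), (4, 0)],
)

# Home goals descending first; ties are broken by away goals ascending
demo_cur.execute('''SELECT FTHG, FTAG
                    FROM Demo_Matches
                    ORDER BY FTHG DESC, FTAG ASC;''')
sorted_rows = demo_cur.fetchall()
print(sorted_rows)  # [(4, 0), (4, 2), (2, 0), (2, 1)]
```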


# GROUP BY Statement

Finally we have reached the __GROUP BY__ statement. The __GROUP BY__ statement will group together the rows that have matching values in a specified column. To understand how the __GROUP BY__ statement works, we will discuss the query below.

In [114]:
# Return the total home games each team won during the 2016 
# season from the Matches dataset

cur.execute('''SELECT HomeTeam, COUNT(FTR) AS Total_Home_Wins
               FROM Matches
               WHERE FTR = 'H' AND Season = '2016'
               GROUP BY HomeTeam
               ORDER BY COUNT(FTR) DESC;''')
Matches_df =pd.DataFrame(cur.fetchall())
Matches_df.columns = [x[0] for x in cur.description]
Matches_df

Unnamed: 0,HomeTeam,Total_Home_Wins
0,Chelsea,17
1,Tottenham,17
2,Arsenal,14
3,Hannover,14
4,Bayern Munich,13
5,Braunschweig,13
6,Dortmund,13
7,Everton,13
8,Stuttgart,13
9,Hertha,12


This query returns the total number of home games each team won during the 2016 season in descending order of number of home games won. There are two variables in our __SELECT__ statement, __HomeTeam__ and __COUNT(FTR)__. The first variable is an unaggregated column from the __Matches__ dataset, the second variable is an aggregated column from the __Matches__ dataset. The __WHERE__ statement says to only use data where __FTR = 'H'__ (outcome of a game is a home win) and __Season = '2016'__. The __GROUP BY__ statement tells the query to combine all the rows with the same home team name together. The result is one row per home team, and every home game that team won is counted toward its total in the __Total_Home_Wins__ column. Finally, the __ORDER BY__ statement sorts the teams from the most home wins to the fewest.
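To watch the grouping happen on data small enough to count by hand, here is a sketch on a hypothetical in-memory table, `Demo_Matches`, with six invented games. Chelsea wins two of three home games, Arsenal wins one of two, and Everton loses its only home game.

```python
import sqlite3

# Hypothetical in-memory table: FTR is 'H' (home win), 'D' (draw) or 'A' (away win)
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute("CREATE TABLE Demo_Matches (HomeTeam TEXT, FTR TEXT)")
demo_cur.executemany(
    "INSERT INTO Demo_Matches VALUES (?, ?)",
    [
        ("Chelsea", "H"), ("Chelsea", "H"), ("Chelsea", "D"),
        ("Arsenal", "H"), ("Arsenal", "A"),
        ("Everton", "A"),
    ],
)

# WHERE keeps only home wins, GROUP BY makes one group per team,
# COUNT counts the rows left in each group
demo_cur.execute('''SELECT HomeTeam, COUNT(FTR) AS Total_Home_Wins
                    FROM Demo_Matches
                    WHERE FTR = 'H'
                    GROUP BY HomeTeam
                    ORDER BY Total_Home_Wins DESC;''')
home_wins = demo_cur.fetchall()
print(home_wins)  # [('Chelsea', 2), ('Arsenal', 1)]
```

Everton does not appear in the result at all: its only row was removed by the __WHERE__ statement before the grouping took place, so no Everton group was ever formed.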

__GROUP BY__ statements are typically used with aggregate functions like the ones we covered earlier, __COUNT__, __MAX__, __MIN__, __AVG__, and __SUM__. Practice using a __GROUP BY__ statement by writing a query that shows the average number of foreign players for each team from the __Teams__ dataset. Rename the column of average number of foreign players to __Avg_Num_Foreign_Players__ and round the column to the nearest hundredth. Return your results by average number of foreign players in ascending order. Compare your query to the one below:

In [115]:
cur.execute('''SELECT TeamName, ROUND(AVG(ForeignPlayersHome),2) AS Avg_Num_Foreign_Players
               FROM Teams
               GROUP BY TeamName
               ORDER BY ROUND(AVG(ForeignPlayersHome),2) ASC;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,TeamName,Avg_Num_Foreign_Players
0,Kiel,3.0
1,Heidenheim,4.0
2,Oberhausen,6.33
3,Osnabruck,6.33
4,St Pauli,6.82
5,Aalen,7.0
6,Regensburg,7.5
7,Erzgebirge Aue,7.6
8,Sandhausen,8.33
9,Ahlen,8.67


# HAVING Statement

The __HAVING__ statement filters groups. It is written right after the __GROUP BY__ statement. It is like the __WHERE__ statement, but while __WHERE__ filters individual rows before they are grouped, __HAVING__ filters the groups after they have been aggregated. To see how it works, we will run the same query we did before but this time we only want to show the teams where the average number of foreign players is greater than or equal to 15.

In [118]:
cur.execute('''SELECT TeamName, ROUND(AVG(ForeignPlayersHome),2) AS Avg_Num_Foreign_Players
               FROM Teams
               GROUP BY TeamName
               HAVING ROUND(AVG(ForeignPlayersHome),2) >= 15
               ORDER BY ROUND(AVG(ForeignPlayersHome),2) ASC;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,TeamName,Avg_Num_Foreign_Players
0,Saarbrucken,15.0
1,Dortmund,15.15
2,Freiburg,15.31
3,Nurnberg,15.31
4,Cottbus,15.33
5,FC Koln,15.38
6,Stuttgart,15.69
7,Ein Frankfurt,15.85
8,Hannover,15.92
9,M'gladbach,16.0


Practice using the __HAVING__ statement by querying the sum of the average market value (per player) of the team pre-season in EUR (__AvgMarketValueHome__) for each season where that sum is less than or equal to 65,000,000. Compare your query to the one below:

In [119]:
cur.execute('''SELECT Season, SUM(AvgMarketValueHome)
               FROM Teams
               GROUP BY Season
               HAVING SUM(AvgMarketValueHome) <= 65000000;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Season,SUM(AvgMarketValueHome)
0,2005,42514000
1,2006,46188000
2,2007,51711000
3,2008,61026000
4,2009,61764000
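__WHERE__ and __HAVING__ can appear in the same query, and which rows __WHERE__ removes changes what __HAVING__ sees. The sketch below illustrates this on a hypothetical in-memory table, `Demo_Teams`, with invented numbers of foreign players for three teams. It runs the same grouped query twice, once with and once without a __WHERE__ filter on the season.

```python
import sqlite3

# Hypothetical in-memory table with invented foreign-player counts
demo_conn = sqlite3.connect(":memory:")
demo_cur = demo_conn.cursor()
demo_cur.execute(
    "CREATE TABLE Demo_Teams (TeamName TEXT, Season INTEGER, ForeignPlayersHome INTEGER)"
)
demo_cur.executemany(
    "INSERT INTO Demo_Teams VALUES (?, ?, ?)",
    [
        ("Dortmund", 2015, 10), ("Dortmund", 2016, 16), ("Dortmund", 2017, 18),
        ("Kiel", 2016, 3), ("Kiel", 2017, 5),
        ("Hertha", 2016, 14), ("Hertha", 2017, 16),
    ],
)

# WHERE drops the 2015 row first; HAVING then keeps groups averaging >= 15
demo_cur.execute('''SELECT TeamName, AVG(ForeignPlayersHome) AS Avg_Foreign
                    FROM Demo_Teams
                    WHERE Season >= 2016
                    GROUP BY TeamName
                    HAVING AVG(ForeignPlayersHome) >= 15
                    ORDER BY TeamName;''')
with_where = demo_cur.fetchall()

# Without WHERE, Dortmund's average falls to 44/3 and the group is filtered out
demo_cur.execute('''SELECT TeamName, AVG(ForeignPlayersHome) AS Avg_Foreign
                    FROM Demo_Teams
                    GROUP BY TeamName
                    HAVING AVG(ForeignPlayersHome) >= 15
                    ORDER BY TeamName;''')
without_where = demo_cur.fetchall()

print(with_where)     # [('Dortmund', 17.0), ('Hertha', 15.0)]
print(without_where)  # [('Hertha', 15.0)]
```

Kiel is removed by __HAVING__ in both runs because its group average is only 4.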


We have reached the end of this tutorial. We have covered the following topics:
    
 - COUNT Function
 - AS Command
 - SELECT DISTINCT
 - MAX & MIN Functions
 - AVG & SUM Functions
 - Using the WHERE Statement with Computed Statistics
 - ORDER BY Statement
 - GROUP BY Statement
 - HAVING Statement

I encourage you to keep playing around with the functions and statements we have covered in this tutorial to gain a deeper understanding of how they work.

In [None]:
When using a __GROUP BY__ statement, every variable in the __SELECT__ statement must either be used to group the data together in the __GROUP BY__ statement, or must be an aggregate function. To see how this works, we will run the same query, but add another variable.

In [78]:
# View Teams dataframe

cur.execute('''SELECT * 
               FROM Teams;''')
Teams_df =pd.DataFrame(cur.fetchall())
Teams_df.columns = [x[0] for x in cur.description]
Teams_df

Unnamed: 0,Season,TeamName,KaderHome,AvgAgeHome,ForeignPlayersHome,OverallMarketValueHome,AvgMarketValueHome,StadiumCapacity
0,2017,Bayern Munich,27,26,15,597950000,22150000,75000
1,2017,Dortmund,33,25,18,416730000,12630000,81359
2,2017,Leverkusen,31,24,15,222600000,7180000,30210
3,2017,RB Leipzig,30,23,15,180130000,6000000,42959
4,2017,Schalke 04,29,24,17,179550000,6190000,62271
5,2017,M'gladbach,31,25,17,154400000,4980000,54014
6,2017,Wolfsburg,31,24,14,124430000,4010000,30000
7,2017,FC Koln,24,26,9,118550000,4940000,49968
8,2017,Hoffenheim,31,24,14,107330000,3460000,30164
9,2017,Hertha,26,26,12,86800000,3340000,74475
