## If you weren't here last time, make sure the necessary installations are made
If you were, you can skip down to the cell containing the connection string for the database instance and run it.

In [None]:
!pip install pymssql

#### For Mac users:
You will need to install the following program from the terminal; otherwise the notebook will throw an error when importing pymssql:

       brew install freetds

Non-Mac users may not need this install at all, but if they do, an Ubuntu version can be found here:
    https://packages.ubuntu.com/search?keywords=FreeTDS

## Run this cell to connect to the database and set up to make queries

In [None]:
import pymssql
import scipy.stats as stats
import matplotlib.pyplot as plt
%matplotlib inline

import sys
sys.path.append('../../')
from src.pySQL_funcs import pretty_query

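with open('../../src/pw2') as pw_file:
    # The file holds one line: server,user,password,database
    # strip() removes the trailing newline so it doesn't end up in the database name
    server, user, pw, database = pw_file.readline().strip().split(',')

conn = pymssql.connect(host=server, user=user, password=pw, database=database)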
cur = conn.cursor()

## Answering questions:

Once everyone is ready, we'll dive into the following questions. The table schema can be found at the bottom of this notebook; however, you may find it easier to open the project's GitHub README in another window, since it also contains the schema tables as an appendix at the end. Link below:

https://github.com/dougtheeconomist/flag-on-the-play/blob/master/README.md

#### This time, the primary goal is to do a little data exploration with regard to penalties and coaching.

## Task A
To warm up, perform a simple query on the guest.teams table that returns the names and average penalties per season of the 10 coaches with the highest average penalties per season for the teams they coached.

In [None]:
conn.rollback()
query = """

Your code here

;
"""
cur.execute(query)
cur.fetchall() 

<details><summary>
Possible solution:
</summary>

SELECT TOP 10 AVG(against_count), coach_name
    
FROM guest.teams
    
GROUP BY coach_name
    
ORDER BY AVG(against_count) DESC
    ;
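To make the GROUP BY / AVG / TOP logic concrete, here is a minimal pure-Python sketch of what the query computes. The function name `top_coaches_by_avg` and the (coach_name, against_count) sample rows are hypothetical, for illustration only. One difference worth knowing: SQL Server's AVG on an INT column such as against_count returns an INT (the fractional part is dropped), whereas Python's `statistics.mean` keeps it; using `AVG(CAST(against_count AS float))` in the query would make the two agree.

```python
from collections import defaultdict
from statistics import mean

def top_coaches_by_avg(rows, n=10):
    """Return (average, coach_name) pairs for the n coaches with the highest
    average penalties per season, highest first.

    rows: iterable of (coach_name, against_count) tuples, one per season.
    """
    seasons = defaultdict(list)                  # GROUP BY coach_name
    for coach, against in rows:
        seasons[coach].append(against)
    averages = [(mean(counts), coach)            # AVG(against_count), fraction kept
                for coach, counts in seasons.items()]
    averages.sort(reverse=True)                  # ORDER BY AVG(against_count) DESC
    return averages[:n]                          # TOP n

# Hypothetical sample rows, not real guest.teams data
sample = [("Coach A", 120), ("Coach A", 131), ("Coach B", 98),
          ("Coach B", 104), ("Coach C", 140)]
print(top_coaches_by_avg(sample, n=2))
```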

## Task B
### Part 1
Now that we've seen which coaches manage the most-penalized teams, we want to get a sense of the overall distribution of per-coach average penalties per season, and we want to do this visually. First, write a query that returns the average penalties per season for every coach, and save the output of this query to a list or array.

In [None]:
conn.rollback()
query = """

Your code here

;
"""
cur.execute(query)
cur.fetchall() 

<details><summary>
Possible solution:
</summary>
conn.rollback()
    
query = """
    
SELECT AVG(against_count)
    
FROM guest.teams
    
GROUP BY coach_name  

    ;"""

cur.execute(query)
    
list_for_graphing = cur.fetchall() 

### Part 2
Now that you have your list of averages, we want to graph them as a histogram to get a sense of the overall distribution. However, if we print the results of this query, we will see that each row is returned as a tuple, such as `(12,)`, rather than as an individual value, and matplotlib plots a list of tuples in a really unhelpful way (it treats each tuple as a separate dataset). So we need to isolate the first (and only) element of each tuple and graph just those values. The trailing comma in `(12,)` is simply how Python writes a one-element tuple, not an empty second element.

Is it normally distributed?
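Because every row comes back as a tuple, a small helper that runs a query and keeps only the first column saves repeating the unpacking loop in each task. This is a sketch, not part of the project's `src.pySQL_funcs` module: `fetch_column` is a hypothetical name, and it relies only on the DB-API `execute`/`fetchall` cursor methods that pymssql provides. The demo uses Python's built-in sqlite3 with a made-up table purely so the example runs without the SQL Server instance.

```python
import sqlite3

def fetch_column(cursor, query):
    """Run query and return the first value of each row as a flat list.

    Works with any DB-API 2.0 cursor (e.g. a pymssql or sqlite3 cursor).
    """
    cursor.execute(query)
    return [row[0] for row in cursor.fetchall()]

# Demo against an in-memory SQLite table (hypothetical data, not guest.teams)
demo = sqlite3.connect(":memory:")
demo_cur = demo.cursor()
demo_cur.execute("CREATE TABLE teams (coach_name TEXT, against_count INTEGER)")
demo_cur.executemany("INSERT INTO teams VALUES (?, ?)",
                     [("A", 100), ("A", 110), ("B", 90)])

query = "SELECT against_count FROM teams ORDER BY against_count"
print(demo_cur.execute(query).fetchall())  # rows as one-element tuples
print(fetch_column(demo_cur, query))       # flat list, ready for plt.hist
```

In the notebook itself, `list_for_graphing = fetch_column(cur, query)` followed by `plt.hist(list_for_graphing)` would plot the values directly.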

In [None]:
# Use your favorite graphing library to generate your histogram here


<details><summary>
Possible solution:
</summary>
for i in range(len(list_for_graphing)):
    
    list_for_graphing[i] = list_for_graphing[i][0]

plt.hist(list_for_graphing)
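The histogram gives a visual answer to "Is it normally distributed?". For a rough numeric check, sample skewness and excess kurtosis are both near 0 for normal data. The sketch below computes the standard moment-based estimates with only the standard library; the notebook already imports `scipy.stats`, whose `stats.skew` and `stats.kurtosis` return the same estimates with their default settings, and whose `stats.normaltest` or `stats.shapiro` give formal tests. The function name and sample values are hypothetical.

```python
from statistics import fmean

def skew_and_excess_kurtosis(values):
    """Return (g1, g2): moment-based sample skewness and excess kurtosis.

    Both are near 0 for roughly normal data; values far from 0 point to a
    lopsided distribution (g1) or unusually heavy/light tails (g2).
    """
    n = len(values)
    m = fmean(values)
    m2 = sum((x - m) ** 2 for x in values) / n   # 2nd central moment
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    m3 = sum((x - m) ** 3 for x in values) / n   # 3rd central moment
    m4 = sum((x - m) ** 4 for x in values) / n   # 4th central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]              # hypothetical, perfectly symmetric
right_skewed = [1, 1, 1, 2, 2, 3, 10]    # hypothetical, long right tail
print(skew_and_excess_kurtosis(symmetric))
print(skew_and_excess_kurtosis(right_skewed))
```

Once `list_for_graphing` has been flattened, `skew_and_excess_kurtosis(list_for_graphing)` can be run on it directly.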

## Task C
Another measurement worth looking at is how much each coach's penalty count varies from season to season around his average. Repeat both parts of Task B using the standard deviation instead of the mean, restricted to coaches who have coached for more than 4 seasons.

In [None]:
conn.rollback()
query = """

Your code here

;
"""
cur.execute(query)
cur.fetchall() 

<details><summary>
Possible solution:
</summary>
conn.rollback()
    
query = """
    
SELECT STDEV(against_count)
    
FROM guest.teams

GROUP BY coach_name
    
HAVING COUNT(year) > 4

    ;"""

cur.execute(query)
    
list_for_graphing = cur.fetchall() 
  
for i in range(len(list_for_graphing)):
    
    list_for_graphing[i] = list_for_graphing[i][0]
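SQL Server's STDEV is the sample standard deviation (it divides by n - 1; STDEVP is the population version), which is also what Python's `statistics.stdev` computes. Below is a minimal pure-Python sketch of the Task C query, including the `HAVING COUNT(year) > 4` filter; `stdev_by_coach` and the (coach_name, year, against_count) rows are hypothetical, for illustration only.

```python
from collections import defaultdict
from statistics import stdev

def stdev_by_coach(rows, min_seasons=5):
    """Map coach_name -> sample standard deviation of against_count.

    rows: iterable of (coach_name, year, against_count) tuples, one per season.
    Coaches with fewer than min_seasons seasons are dropped; min_seasons=5
    mirrors HAVING COUNT(year) > 4.
    """
    by_coach = defaultdict(list)                   # GROUP BY coach_name
    for coach, _year, against in rows:
        by_coach[coach].append(against)
    return {coach: stdev(counts)                   # STDEV(against_count), n - 1 denominator
            for coach, counts in by_coach.items()
            if len(counts) >= min_seasons}         # HAVING COUNT(year) > 4

# Hypothetical rows: Coach A has 5 seasons, Coach B only 3
sample = [("Coach A", year, count)
          for year, count in zip(range(2009, 2014), [100, 102, 98, 101, 99])]
sample += [("Coach B", 2009, 90), ("Coach B", 2010, 130), ("Coach B", 2011, 110)]
print(stdev_by_coach(sample))   # only Coach A survives the season filter
```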

In [None]:
# Use your favorite graphing library to generate your histogram here


## Task D
### Part 1
There is clearly an outlier among the standard deviations by coach. Write a query to identify which coach this is.

In [None]:
conn.rollback()
query = """

Your code here

;
"""
cur.execute(query)
cur.fetchall() 

<details><summary>
Possible solution:
</summary>
WITH coaches(name, deviation) AS 
    
(SELECT DISTINCT coach_name AS name, STDEV(against_count) AS deviation
    
FROM guest.teams
    
GROUP BY coach_name
    
HAVING COUNT(year)>4)
    

SELECT deviation, name
    
FROM coaches
    
WHERE deviation = (SELECT MAX(deviation) FROM coaches)

    ;
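The query picks out the single largest standard deviation. A complementary check, sketched below in plain Python, expresses each coach's value as a z-score relative to all coaches and flags anything beyond a cutoff. The cutoff of 2, the function name `flag_outliers`, and the sample values are assumptions for illustration, not part of the exercise.

```python
from statistics import mean, stdev

def flag_outliers(values_by_name, cutoff=2.0):
    """Return names whose value lies more than `cutoff` standard deviations
    from the mean of all values (a simple z-score rule)."""
    vals = list(values_by_name.values())
    mu, sd = mean(vals), stdev(vals)
    return [name for name, v in values_by_name.items()
            if abs(v - mu) / sd > cutoff]

# Hypothetical per-coach standard deviations
deviations = {"Coach A": 10.0, "Coach B": 11.0, "Coach C": 12.0,
              "Coach D": 10.5, "Coach E": 11.5, "Coach F": 9.5,
              "Coach G": 30.0}
print(max(deviations, key=deviations.get))  # same idea as WHERE deviation = MAX(deviation)
print(flag_outliers(deviations))
```

With only a handful of coaches, a single extreme value also inflates the standard deviation used for the z-scores, so this rule is conservative on small groups.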

### Part 2
Now that you've identified *which* coach has this abnormal variation in penalties per season, let's take a closer look and see why. Write a query that returns the year, the number of penalties, and the city of the team he coached for each season that he coached.

In [None]:
conn.rollback()
query = """

Your code here

;
"""
cur.execute(query)
cur.fetchall() 

<details><summary>
Possible solution:
</summary>
SELECT against_count, year, team_city
    
FROM guest.teams

WHERE coach_name = 'Jack Del Rio'

    ;
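The solution hard-codes the coach's name in the SQL text. When the name comes from an earlier result, it is usually safer to pass it as a query parameter: pymssql's `cursor.execute` accepts a tuple of parameters alongside a query that uses `%s` placeholders. A minimal sketch, where `coach_seasons` is a hypothetical helper name and the `ORDER BY year` is an addition so the seasons come back in order:

```python
COACH_SEASONS_SQL = """
SELECT year, against_count, team_city
FROM guest.teams
WHERE coach_name = %s
ORDER BY year;
"""

def coach_seasons(cursor, coach_name):
    """Return (year, against_count, team_city) rows for one coach.

    The name is passed as a parameter (pymssql uses %s placeholders)
    instead of being pasted into the SQL string.
    """
    cursor.execute(COACH_SEASONS_SQL, (coach_name,))
    return cursor.fetchall()
```

In the notebook this would be called as `coach_seasons(cur, 'Jack Del Rio')`.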

## Task E
Putting coaching aside, let's take a final look at the Seahawks in particular; we know from previous explorations that they are the 5th most penalized team since the 2009 season. Write a query that returns each year along with the percentage of all penalties accrued around the league that year that were committed by Seattle. You can output the results as a percentage using SQL Server's built-in FORMAT() function. For example:

    SELECT FORMAT((37.0/38.0),'P') AS [Percentage] -- 97.37 %
    
Or, to specify a different number of decimal places, include an integer after the 'P':

    SELECT FORMAT((37.0/38.0),'P0') AS [WholeNumberPercentage] -- 97 %
    SELECT FORMAT((37.0/38.0),'P3') AS [ThreeDecimalsPercentage] -- 97.368 %

(Hint: the numeric columns in the relevant table are integers, and SQL Server keeps arithmetic on integers in integer form, so dividing one integer by another truncates the result; 37/38, for example, returns 0. It will help to convert the relevant numbers to float first, which can be done with SQL's CAST() function.)
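The hint is easy to see in miniature in Python, which draws a similar line between `//` (floor division, which for non-negative integers drops the fraction just like INT / INT in SQL Server) and `/` (true division, like dividing after a CAST to float). Python's `%` format specifier plays roughly the role of FORMAT(..., 'P'); unlike the SQL Server output shown above, it puts no space before the percent sign. The numbers reuse the FORMAT example and are not real penalty counts.

```python
seattle_flags = 37   # hypothetical counts, reusing the FORMAT example's numbers
league_flags = 38

# Floor division drops the fraction, like INT / INT in SQL Server
print(seattle_flags // league_flags)
# True division keeps it, like CAST(... AS float) / ...
share = seattle_flags / league_flags
# Percentage formatting, similar to FORMAT(x, 'P2') and FORMAT(x, 'P0')
print(f"{share:.2%}", f"{share:.0%}")
```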

In [None]:
conn.rollback()
query = """

Your code here

;
"""
cur.execute(query)
cur.fetchall() 

<details><summary>
Possible solution:
</summary>
WITH SUMMARY(y, total) AS (SELECT year AS y, CAST(SUM(against_count) AS float) AS total FROM guest.teams GROUP BY year)

SELECT a.year, FORMAT((a.ac_hawks / b.total),'P') AS [Percentage] 
    
FROM (SELECT year, (CAST(against_count AS float)) AS ac_hawks FROM guest.teams WHERE team_city = 'Seattle') a 
    
JOIN SUMMARY b 
    
ON a.year = b.y


    ;
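As a cross-check on the SQL result, the same per-year share can be computed in plain Python from (year, team_city, against_count) rows, for example the output of `SELECT year, team_city, against_count FROM guest.teams`. The function name `city_share_by_year` and the sample rows are hypothetical; like the SQL join, it assumes one row per city per year.

```python
from collections import defaultdict

def city_share_by_year(rows, city):
    """Return {year: fraction of that year's league penalties committed by city}.

    rows: iterable of (year, team_city, against_count) tuples,
    assumed to contain one row per team per year.
    """
    league_total = defaultdict(int)   # like SUM(against_count) ... GROUP BY year
    city_count = {}
    for year, team_city, against in rows:
        league_total[year] += against
        if team_city == city:
            city_count[year] = against
    # Python 3's / is true division, so no CAST is needed
    return {year: city_count[year] / league_total[year] for year in sorted(city_count)}

# Hypothetical rows, not real guest.teams data
sample = [(2009, "Seattle", 30), (2009, "Denver", 50), (2009, "Dallas", 20),
          (2010, "Seattle", 25), (2010, "Denver", 75)]
for year, share in city_share_by_year(sample, "Seattle").items():
    print(year, f"{share:.2%}")
```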

# guest.teams Table Schema

| Column        | Description                                 | Type    |
|---------------|:--------------------------------------------|:--------|
| year          | Year of football season                     | INT     |
| team_city     | City where team is located                  | VARCHAR |
| team_id       | ID number unique to team                    | INT     |
| coach_name    | Name of team coach                          | VARCHAR |
| coach_id      | ID number unique to coach                   | INT     |
| ranking       | Rank from most to least penalized           | INT     |
| games         | Games played that season                    | INT     |
| plays         | Number of plays that season                 | INT     |
| against_count | Number of flags against team                | INT     |
| agnst_yrds    | Total yards penalized in season             | INT     |
| ben_count     | Number of flags on opposing team            | INT     |
| ben_yrds      | Yards given for opposing flags              | INT     |
| net_count     | Team flags less opposing flags              | INT     |
| net_yrds      | Yards lost plus yards gained from penalties | INT     |
| total_flags   | Total flags thrown in team's games          | INT     |


