## Module 2 Project 

### Executive Summary

This notebook is split into the following sections:

1) Connect to the SQL database
2) Extract, transform, load
3) 


Upon completion of this lab, each unique team in this dataset should have a record in the MongoDB instance containing the following information:

- The name of the team
- The total number of goals scored by the team during the 2011 season
- The total number of wins the team earned during the 2011 season
- A histogram visualization of the team's wins and losses for the 2011 season (store the visualization directly)
- The team's win percentage on days when it rained during games in the 2011 season
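The record described above could be assembled as a plain Python dict before it is written to MongoDB. This is only a sketch: the field names, the `build_team_record` helper, the base64 encoding of the image, and the sample values are all assumptions made for illustration, not something the brief prescribes.

```python
import base64


def build_team_record(name, goals, wins, histogram_png, rain_win_pct):
    """Assemble one team's summary as a dict (hypothetical field names)."""
    return {
        "name": name,
        "goals_2011": goals,
        "wins_2011": wins,
        # Assumption: keep the image as base64 text so the record is JSON-friendly
        "histogram_png_b64": base64.b64encode(histogram_png).decode("ascii"),
        "rain_win_pct_2011": rain_win_pct,
    }


# Placeholder values only; real numbers come from the SQL and weather steps
record = build_team_record("Example FC", 50, 20, b"\x89PNG placeholder", 0.5)
```

A dict of this shape can then be handed to a MongoDB collection, for example with PyMongo's `insert_one`.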

### Inputs

- All team names
- All match results for 2011
- All goals scored for 2011
- All match dates
- Weather

- SQL DB: name of the team, goals scored during the 2011 season, total number of wins
- Weather: from the weather API, Berlin only
- Join the two sources by date

List the distinct match days: at most 365, likely fewer than 200.
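Listing the match days and joining results to weather by date might look like the sketch below. It runs against an in-memory table with made-up rows, and the `rained` dict stands in for whatever the weather API returns; both, along with the `rain_win_pct` helper, are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for database.sqlite, filled with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Matches (Match_ID INTEGER, Season INTEGER, Date TEXT, "
             "HomeTeam TEXT, AwayTeam TEXT, FTR TEXT)")
conn.executemany(
    "INSERT INTO Matches VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2011, "2011-08-13", "Team A", "Team B", "H"),
        (2, 2011, "2011-08-13", "Team C", "Team D", "D"),
        (3, 2011, "2011-08-20", "Team B", "Team A", "H"),
        (4, 2010, "2010-08-21", "Team A", "Team C", "A"),
    ],
)

# One weather lookup is needed per distinct match day of the 2011 season
match_days = [row[0] for row in conn.execute(
    "SELECT DISTINCT Date FROM Matches WHERE Season = 2011 ORDER BY Date")]

# Hypothetical weather lookup keyed by date: True means it rained in Berlin
rained = {"2011-08-13": True, "2011-08-20": False}


def rain_win_pct(team):
    """Share of a team's rainy-day 2011 matches that it won, or None if none."""
    rows = conn.execute(
        "SELECT Date, HomeTeam, FTR FROM Matches "
        "WHERE Season = 2011 AND (HomeTeam = ? OR AwayTeam = ?)", (team, team))
    # A win is FTR 'H' for the home side and FTR 'A' for the away side
    results = [(ftr == "H") if home == team else (ftr == "A")
               for date, home, ftr in rows if rained.get(date)]
    return sum(results) / len(results) if results else None
```

Querying the distinct dates first keeps the number of weather requests equal to the number of match days rather than the number of matches.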

### Tables:

The database contains the following tables and views: Matches, Teams, Unique Teams, Teams_in_Matches, FlatView, FlatView_Advanced, FlatView_Chrono_TeamOrder_Reduced.

#### Table: Matches

- Match_ID (int): Unique ID per match
- Div (str): Identifies the division the match was played in (D1 = Bundesliga, D2 = Bundesliga 2, E0 = English Premier League)
- Season (int): Season the match took place in (usually covering the period from August to May of the following year)
- Date (str): Date of the match
- HomeTeam (str): Name of the home team
- AwayTeam (str): Name of the away team
- FTHG (int) (Full Time Home Goals): Number of goals scored by the home team
- FTAG (int) (Full Time Away Goals): Number of goals scored by the away team
- FTR (str) (Full Time Result): 3-way result of the match (H = Home Win, D = Draw, A = Away Win)
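Given these columns, a team's goals and wins for a season can be totalled by stacking its home rows and away rows. The sketch below does this in SQL against an in-memory table with invented rows; the team names and scores are placeholders, and the query is one possible formulation rather than the notebook's own.

```python
import sqlite3

# In-memory copy of the Matches schema with a few invented rows
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Matches (Match_ID INTEGER, Div TEXT, Season INTEGER, Date TEXT, "
    "HomeTeam TEXT, AwayTeam TEXT, FTHG INTEGER, FTAG INTEGER, FTR TEXT)")
conn.executemany(
    "INSERT INTO Matches VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "D1", 2011, "2011-08-13", "Team A", "Team B", 2, 1, "H"),
        (2, "D1", 2011, "2011-08-20", "Team B", "Team A", 0, 3, "A"),
        (3, "D1", 2011, "2011-08-27", "Team A", "Team C", 1, 1, "D"),
    ],
)

# Home rows contribute FTHG and a win when FTR = 'H';
# away rows contribute FTAG and a win when FTR = 'A'.
QUERY = """
SELECT team, SUM(goals) AS goals, SUM(win) AS wins
FROM (
    SELECT HomeTeam AS team, FTHG AS goals, FTR = 'H' AS win
    FROM Matches WHERE Season = 2011
    UNION ALL
    SELECT AwayTeam, FTAG, FTR = 'A'
    FROM Matches WHERE Season = 2011
)
GROUP BY team
ORDER BY team;
"""

totals = {team: (goals, wins) for team, goals, wins in conn.execute(QUERY)}
```

`UNION ALL` is used instead of `UNION` so that identical rows (same team, goals, and result in two different matches) are not collapsed before summing.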

#### Table: Teams

- Season (str): Football season for which the data is valid
- TeamName (str): Name of the team the data concerns
- KaderHome (str): Number of players in the squad
- AvgAgeHome (str): Average age of players
- ForeignPlayersHome (str): Number of foreign players (non-German, non-English respectively) playing for the team
- OverallMarketValueHome (str): Overall market value of the team pre-season in EUR (based on data from transfermarkt.de)
- AvgMarketValueHome (str): Average market value (per player) of the team pre-season in EUR (based on data from transfermarkt.de)
- StadiumCapacity (str): Maximum stadium capacity of the team's home stadium

#### Table: Unique Teams

- TeamName (str): Name of a team
- Unique_Team_ID (int): Unique identifier for each team

#### Table: Teams_in_Matches

- Match_ID (int): Unique match ID
- Unique_Team_ID (int): Unique team ID (this table is used to easily retrieve each match a given team has played in)

Based on these tables, I created a couple of views which I used as input for my machine learning models:

#### View: FlatView

Combination of all matches with the respective additional data from Teams table for both home and away team.

#### View: FlatView_Advanced

Same as FlatView, but also includes Unique_Team_ID and Unique_Team in order to easily retrieve all matches played by a team in chronological order.

#### View: FlatView_Chrono_TeamOrder_Reduced

Similar to FlatView_Advanced, but without the additional attributes from the Teams table, which allows a longer history that includes the years 1993 - 2004. Especially useful if one is only interested in analyzing winning/losing streaks.
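The winning/losing streak analysis mentioned for FlatView_Chrono_TeamOrder_Reduced can be illustrated with a small pure-Python sketch. The `team_results` and `longest_streak` helpers and the sample rows are hypothetical; they only assume rows carrying a date, home team, away team, and FTR value.

```python
from itertools import groupby


def team_results(matches, team):
    """Map (date, home, away, FTR) rows to W/D/L from `team`'s point of view."""
    out = []
    for date, home, away, ftr in sorted(matches):  # ISO dates sort chronologically
        if team not in (home, away):
            continue
        if ftr == "D":
            out.append("D")
        else:
            # The team won if the winning side (home or away) is the side it played
            won = (ftr == "H") == (home == team)
            out.append("W" if won else "L")
    return out


def longest_streak(results, outcome="W"):
    """Length of the longest consecutive run of `outcome` in a result list."""
    return max(
        (sum(1 for _ in run) for key, run in groupby(results) if key == outcome),
        default=0,
    )


# Invented sample rows, deliberately out of date order
matches = [
    ("2011-08-20", "Team B", "Team A", "A"),
    ("2011-08-13", "Team A", "Team B", "H"),
    ("2011-08-27", "Team A", "Team C", "D"),
    ("2011-09-03", "Team C", "Team A", "A"),
]
results_a = team_results(matches, "Team A")
```

`longest_streak(results_a)` then gives the longest winning run, and passing `outcome="L"` gives the longest losing run.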

In [9]:
import sqlite3

# Open the project's SQLite file and create a cursor for running queries
conn = sqlite3.connect('database.sqlite')
cur = conn.cursor()

In [16]:
cur.execute("""SELECT * FROM MATCHES LIMIT 5;""")
cur.fetchall()

[(1, 'D2', 2009, '2010-04-04', 'Oberhausen', 'Kaiserslautern', 2, 1, 'H'),
 (2, 'D2', 2009, '2009-11-01', 'Munich 1860', 'Kaiserslautern', 0, 1, 'A'),
 (3, 'D2', 2009, '2009-10-04', 'Frankfurt FSV', 'Kaiserslautern', 1, 1, 'D'),
 (4, 'D2', 2009, '2010-02-21', 'Frankfurt FSV', 'Karlsruhe', 2, 1, 'H'),
 (5, 'D2', 2009, '2009-12-06', 'Ahlen', 'Karlsruhe', 1, 3, 'A')]

In [21]:
# Season is an integer column, so compare with = 2011 rather than == '2011'
cur.execute("""SELECT * FROM MATCHES WHERE Season = 2011 LIMIT 50;""")

<sqlite3.Cursor at 0x21086b63730>

In [22]:
cur.fetchall()

[(1092, 'D1', 2011, '2012-03-31', 'Nurnberg', 'Bayern Munich', 0, 1, 'A'),
 (1093, 'D1', 2011, '2011-12-11', 'Stuttgart', 'Bayern Munich', 1, 2, 'A'),
 (1094, 'D1', 2011, '2011-08-13', 'Wolfsburg', 'Bayern Munich', 0, 1, 'A'),
 (1095, 'D1', 2011, '2011-11-27', 'Mainz', 'Bayern Munich', 3, 2, 'H'),
 (1096, 'D1', 2011, '2012-02-18', 'Freiburg', 'Bayern Munich', 0, 0, 'D'),
 (1097, 'D1', 2011, '2012-01-20', "M'gladbach", 'Bayern Munich', 3, 1, 'H'),
 (1098, 'D1', 2011, '2012-02-04', 'Hamburg', 'Bayern Munich', 1, 1, 'D'),
 (1099, 'D1', 2011, '2012-04-21', 'Werder Bremen', 'Bayern Munich', 1, 2, 'A'),
 (1100, 'D1', 2011, '2011-09-18', 'Schalke 04', 'Bayern Munich', 0, 2, 'A'),
 (1101, 'D1', 2011, '2011-10-23', 'Hannover', 'Bayern Munich', 2, 1, 'H'),
 (1102, 'D1', 2011, '2011-10-01', 'Hoffenheim', 'Bayern Munich', 0, 0, 'D'),
 (1103, 'D1', 2011, '2012-03-03', 'Leverkusen', 'Bayern Munich', 2, 0, 'H'),
 (1104,
  'D1',
  2011,
  '2011-08-27',
  'Kaiserslautern',
  'Bayern Munich',
  0,
  3,


In [11]:
ls

 Volume in drive C is Windows
 Volume Serial Number is 0463-426F

 Directory of C:\Users\User1\python\Flatiron_Data_Science_Bootcamp\student_versions\london-ds-010620\Projects\mod_2

29/01/2020  16:47    <DIR>          .
29/01/2020  16:47    <DIR>          ..
29/01/2020  16:10             2,349 .gitignore
29/01/2020  15:45    <DIR>          .ipynb_checkpoints
23/01/2020  10:35             7,833 BRIEF.md
29/01/2020  16:44                 0 data.sqlite
23/01/2020  10:35         6,279,168 database.sqlite
23/01/2020  10:35             8,078 index.ipynb
29/01/2020  16:47             1,499 module_2_project.ipynb
29/01/2020  15:55                32 README.md
               7 File(s)      6,298,959 bytes
               3 Dir(s)  29,426,835,456 bytes free


In [12]:
conn

<sqlite3.Connection at 0x21086ac7e30>