# Do you win by scoring or defending?

In this post I'm going to try to figure out the answer to the following question:

> Do the teams that win the most games win by scoring more goals, or by conceding fewer?

To do this we will look at the data for the 2016/2017 Champions League. Luckily, all the
data is available through the awesome [football-data.org](http://api.football-data.org/) API!

In [74]:
%pylab inline
import requests
import pandas
from bokeh.charts import Bar
from bokeh.io import output_notebook, show
from bokeh.layouts import layout, gridplot
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource


output_notebook()

api_key = 'f4466371787c4fd89d1c0f7f56c0728f'
headers = { 'X-Auth-Token': api_key }

champ_leq = requests.get('http://api.football-data.org/v1/competitions/440', headers=headers).json()
fixtures = requests.get(champ_leq['_links']['fixtures']['href'], headers=headers).json()
leagueTable = requests.get(champ_leq['_links']['leagueTable']['href'], headers=headers).json()

Populating the interactive namespace from numpy and matplotlib




In [2]:
def remix(data): 
    o = {}
    o['awayTeam'] = data['awayTeamName']
    o['homeTeam'] = data['homeTeamName']
    
    o['homeTeamGoals'] = data['result']['goalsHomeTeam']    
    o['awayTeamGoals'] = data['result']['goalsAwayTeam']
    
    o['winner'] = None
    o['loser'] = None
    o['winnerGoals']= 0
    o['loserGoals']= 0
    
    if data['status'] == 'FINISHED' or data['status'] == 'FT':
        if o['homeTeamGoals'] > o['awayTeamGoals']:
            o['winner'] = o['homeTeam']
            o['winnerGoals'] = o['homeTeamGoals']
            o['loser'] = o['awayTeam']
            o['loserGoals'] = o['awayTeamGoals']

        if o['awayTeamGoals'] > o['homeTeamGoals']:
            o['winner'] = o['awayTeam']
            o['winnerGoals'] = o['awayTeamGoals']
            o['loser'] = o['homeTeam']
            o['loserGoals'] = o['homeTeamGoals']
    
    o['goalDelta'] = o['winnerGoals'] - o['loserGoals']
    
    return o

games = [remix(d) for d in fixtures['fixtures']]
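
To make the reshaping concrete, here is a minimal sketch of the same winner/loser logic applied to one hand-written fixture. The sample payload is invented and only mirrors the fields read above; `summarise` is a hypothetical name, not something the API or this notebook provides.

```python
def summarise(fixture):
    """Condensed winner/loser logic for a single fixture dict."""
    home, away = fixture['homeTeamName'], fixture['awayTeamName']
    home_goals = fixture['result']['goalsHomeTeam']
    away_goals = fixture['result']['goalsAwayTeam']

    # Draws and unfinished games have no winner and a goal delta of 0.
    summary = {'winner': None, 'loser': None, 'goalDelta': 0}
    if fixture['status'] == 'FINISHED' and home_goals != away_goals:
        home_won = home_goals > away_goals
        summary['winner'] = home if home_won else away
        summary['loser'] = away if home_won else home
        summary['goalDelta'] = abs(home_goals - away_goals)
    return summary

# Hypothetical fixture, shaped like the fields read above (not real API output).
sample = {
    'homeTeamName': 'Team A',
    'awayTeamName': 'Team B',
    'status': 'FINISHED',
    'result': {'goalsHomeTeam': 3, 'goalsAwayTeam': 1},
}
result = summarise(sample)
print(result)
```

Note that a draw leaves `winner` and `loser` as `None`, which matters for the win counts further down.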

# Number of wins per team

In [3]:
group = pandas.DataFrame(games)[['winner']]
p = Bar(group.apply(pandas.value_counts), legend=False)
show(p)
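
Draws are worth a quick check here, since they carry `None` as the winner. A small sketch with made-up results (not the real fixtures) suggests they simply drop out of the count, because `value_counts()` skips missing values by default:

```python
import pandas

# Toy results with one draw (winner is None), invented for illustration.
games = [
    {'winner': 'Team A'},
    {'winner': 'Team B'},
    {'winner': None},
    {'winner': 'Team A'},
]

# value_counts() ignores missing values unless dropna=False is passed,
# so the draw does not show up as its own bar.
wins = pandas.DataFrame(games)['winner'].value_counts()
print(wins)
```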

# Number of goals per away team

In [64]:
p1 = Bar(pandas.DataFrame(games)[['awayTeam', 'awayTeamGoals']].groupby('awayTeam').sum(), legend=False, title="Away Goals Scored")
p2 = Bar(pandas.DataFrame(games)[['awayTeam', 'homeTeamGoals']].groupby('awayTeam').sum(), legend=False, title="Away Goals Conceded")

l = layout([
  [p1, p2]
], sizing_mode='scale_width')

show(l)

# Number of goals per home team

In [66]:
p1 = Bar(pandas.DataFrame(games)[['homeTeam', 'homeTeamGoals']].groupby('homeTeam').sum(), legend=False, title="Home Goals Scored")
p2 = Bar(pandas.DataFrame(games)[['homeTeam', 'awayTeamGoals']].groupby('homeTeam').sum(), legend=False, title="Home Goals Conceded")

l = layout([
  [p1, p2]
], sizing_mode='scale_width')

show(l)

# Total number of goals per team

In [67]:
df_away = pandas.DataFrame(games)[['awayTeam', 'awayTeamGoals', 'homeTeamGoals']].groupby('awayTeam').sum()
df_home = pandas.DataFrame(games)[['homeTeam', 'homeTeamGoals', 'awayTeamGoals']].groupby('homeTeam').sum()
df_away.index.names = ['Team']
df_home.index.names = ['Team']
df_away.columns = ['scored', 'conceded']
df_home.columns = ['scored', 'conceded']

df_total = df_home + df_away
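
One thing to keep in mind when adding two frames like this: pandas aligns them on the index, so a team present in only one frame would end up with `NaN` totals. That should not happen when every team plays both home and away, but a sketch with made-up numbers shows the behaviour, along with `DataFrame.add(..., fill_value=0)` as one possible safeguard:

```python
import pandas

# Made-up tallies: 'Team C' has no home games in this toy data.
df_home = pandas.DataFrame(
    {'scored': [5, 2], 'conceded': [1, 4]},
    index=pandas.Index(['Team A', 'Team B'], name='Team'),
)
df_away = pandas.DataFrame(
    {'scored': [3, 1, 2], 'conceded': [2, 3, 0]},
    index=pandas.Index(['Team A', 'Team B', 'Team C'], name='Team'),
)

# Plain addition leaves NaN where a team is missing from one frame.
naive_total = df_home + df_away

# add() with fill_value=0 treats the missing side as zero instead.
safe_total = df_home.add(df_away, fill_value=0)
print(safe_total)
```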

In [77]:
p1 = Bar(df_total['scored'], legend=False, title="Scored")
p2 = Bar(df_total['conceded'], legend=False, title="Conceded")

g = gridplot([[p1, p2]], toolbar_location=None, sizing_mode='scale_width')

show(g)
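
The charts show totals, but the opening question is about how wins relate to scoring versus conceding. One rough way to put a number on that is to line up wins, goals scored and goals conceded per team and compare the correlations. The sketch below uses invented figures rather than the real table; the `wins` column and every value in it are assumptions for illustration.

```python
import pandas

# Invented per-team numbers, standing in for the win counts and goal totals.
teams = pandas.DataFrame(
    {
        'wins': [6, 5, 3, 1],
        'scored': [20, 14, 9, 4],
        'conceded': [6, 5, 10, 12],
    },
    index=pandas.Index(['Team A', 'Team B', 'Team C', 'Team D'], name='Team'),
)

# Pearson correlation of wins against each goal column.
corr_scored = teams['wins'].corr(teams['scored'])
corr_conceded = teams['wins'].corr(teams['conceded'])

print('wins vs scored:   %.2f' % corr_scored)
print('wins vs conceded: %.2f' % corr_conceded)
```

If the correlation with `scored` turned out to be stronger in magnitude than the one with `conceded`, that would hint that attack matters more. With only one season of data it is suggestive at best.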