# A Visualization of the San Francisco Giants' Statistics over the Years

In this notebook, we will use seaborn and plotly to visualize some of the San Francisco Giants' statistics. The data is stored in a local SQLite database, which we will query with the Python package "sqlite3". Let's begin!

In [4]:
import pandas as pd
import numpy as np
import sqlite3
conn = sqlite3.connect('/Users/robinphetsavong/Documents/robinphetsa.github.io/data_science/data-projects/Projects/Baseball/baseball/database.sqlite')
c = conn.cursor()

We'll run a query to get the number of wins per year for the San Francisco franchise (team ID 'SFN'), that is, the Giants from 1958 onward.

In [6]:
giants_win_df = pd.read_sql_query("SELECT Year, w AS 'Wins' FROM team WHERE team_id = 'SFN';", conn)
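To make the query pattern above easier to try without the full database, here is a minimal sketch that runs the same kind of lookup against an in-memory SQLite database. The table layout is an assumption that mirrors only the columns used in this notebook, the helper name `wins_by_year` is hypothetical, and the non-Giants row is made up for the demo. The Giants rows are the 1958 to 1960 values shown later in this notebook.

```python
import sqlite3

# In-memory stand-in for the notebook's database. The schema is an assumption
# that mirrors only the columns this notebook queries.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE team (team_id TEXT, year INTEGER, w INTEGER, l INTEGER)"
)
demo_conn.executemany(
    "INSERT INTO team VALUES (?, ?, ?, ?)",
    [
        ("SFN", 1960, 79, 75),  # Giants rows as shown in this notebook's output
        ("SFN", 1958, 80, 74),
        ("SFN", 1959, 83, 71),
        ("XXX", 1958, 70, 84),  # hypothetical non-Giants row, made up for the demo
    ],
)


def wins_by_year(connection, team_id):
    """Return (year, wins) tuples for one team, oldest season first."""
    # The ? placeholder lets sqlite3 bind the team ID safely, and ORDER BY
    # makes the row order explicit instead of relying on storage order.
    query = "SELECT year, w FROM team WHERE team_id = ? ORDER BY year"
    return connection.execute(query, (team_id,)).fetchall()


giants_wins = wins_by_year(demo_conn, "SFN")
print(giants_wins)  # [(1958, 80), (1959, 83), (1960, 79)]
```

The same parameterized query string can also be passed to `pd.read_sql_query` through its `params` argument, for example `pd.read_sql_query(query, conn, params=("SFN",))`.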

In [7]:
giants_win_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58 entries, 0 to 57
Data columns (total 2 columns):
year    58 non-null int64
Wins    58 non-null int64
dtypes: int64(2)
memory usage: 1000.0 bytes


In [9]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
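# Note: plotly.plotly is available in plotly 3.x only; from plotly 4 onward it
# lives in the separate chart_studio package (import chart_studio.plotly as py).
import plotly.plotly as py
import plotly.graph_objs as go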

In [23]:

data = [
    go.Bar(
        x=giants_win_df['year'],  # seasons on the x-axis
        y=giants_win_df['Wins'],  # wins on the y-axis
        marker=dict(color="rgb(255, 153, 51)")  # orange bars
    )
]

layout = go.Layout(
    title="SF Giants Wins Through the Years",
    xaxis=dict(title='Year'),
    yaxis=dict(title='Wins')
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

### Now we run a query to get the number of wins and losses for each year

In [24]:
wins_loss_df = pd.read_sql_query("SELECT Year AS 'Year', w AS 'Wins', l AS 'Losses' FROM team WHERE team_id = 'SFN';", conn)

In [25]:
wins_loss_df

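|   | Year | Wins | Losses |
|---|------|------|--------|
| 0 | 1958 | 80 | 74 |
| 1 | 1959 | 83 | 71 |
| 2 | 1960 | 79 | 75 |
| 3 | 1961 | 85 | 69 |
| 4 | 1962 | 103 | 62 |
| 5 | 1963 | 88 | 74 |
| 6 | 1964 | 90 | 72 |
| 7 | 1965 | 95 | 67 |
| 8 | 1966 | 93 | 68 |
| 9 | 1967 | 91 | 71 |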


In [32]:
data = [
    go.Bar(
        x=wins_loss_df['Year'],
        y=wins_loss_df['Wins'],
        opacity=0.75,
        name = "Wins",
        marker=dict(            
            color="rgb(255, 153, 51)")
    ),
    go.Bar(
        x=wins_loss_df['Year'],
        y=wins_loss_df['Losses'],
        opacity=0.75,
        name = "Losses",
        marker=dict(            
            color="rgb(0, 0, 0)")
    )

]

layout = go.Layout(
    barmode='stack',
    title='SF Giants Wins and Losses throughout the Years',
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

# League ERA vs SF Giants team ERA by year

Now let's compare the Giants' team ERA (earned run average) against the league average (both NL and AL) for each season. Should be interesting!
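For readers new to the statistic, ERA is the number of earned runs allowed per nine innings pitched, so a lower value is better. A minimal sketch of the calculation follows; the function name and the season totals are hypothetical, chosen only to illustrate the arithmetic.

```python
def earned_run_average(earned_runs, innings_pitched):
    """Return ERA: earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical season totals, made up for illustration only.
print(round(earned_run_average(600, 1450), 2))  # 3.72
```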

First, we run a query for the league-wide average ERA (the mean of every team's ERA) for each season from 1958 onward. Then we run another SQL query to get the Giants' ERA.

Finally, we concatenate the two dataframes column-wise, which assumes both results list the seasons in the same order.

In [34]:
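# ORDER BY year keeps both results in the same season order for the positional concat below
league_era_df = pd.read_sql_query("SELECT year, AVG(era) AS League_ERA_Avg FROM team WHERE year >= 1958 GROUP BY year ORDER BY year;", conn)
sf_era_df = pd.read_sql_query("SELECT era AS 'SF ERA' FROM team WHERE team_id = 'SFN' ORDER BY year;", conn)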

In [35]:
league_era_df

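|   | year | League_ERA_Avg |
|---|------|----------------|
| 0 | 1958 | 3.86 |
| 1 | 1959 | 3.908125 |
| 2 | 1960 | 3.818125 |
| 3 | 1961 | 4.028889 |
| 4 | 1962 | 3.958 |
| 5 | 1963 | 3.46 |
| 6 | 1964 | 3.584 |
| 7 | 1965 | 3.501 |
| 8 | 1966 | 3.522 |
| 9 | 1967 | 3.3055 |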


In [36]:
era_df = pd.concat([league_era_df, sf_era_df], axis=1)
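The column-wise concat pairs rows purely by position, so it is only correct when both query results happen to list the same seasons in the same order. As one possible alternative, a single query can compute the league average and pick out the Giants' ERA within the same `GROUP BY year`, which keeps each pair of values tied to its season. The sketch below runs against an in-memory database; the schema is an assumption covering only the columns used here, the 'AAA' rows are hypothetical, and the two 'SFN' values are the 1958 and 1959 figures from this notebook.

```python
import sqlite3

# In-memory stand-in for the notebook's database (assumed, simplified schema).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE team (team_id TEXT, year INTEGER, era REAL)")
demo_conn.executemany(
    "INSERT INTO team VALUES (?, ?, ?)",
    [
        ("SFN", 1958, 3.98),  # Giants values as shown in this notebook
        ("SFN", 1959, 3.47),
        ("AAA", 1958, 4.00),  # hypothetical team, made up for the demo
        ("AAA", 1959, 3.50),
        ("AAA", 1957, 3.90),  # filtered out by the year >= 1958 condition
    ],
)

# AVG(era) averages every team's ERA within a season. The CASE expression is
# NULL for non-Giants rows, and MAX ignores NULLs, so SF_ERA is the Giants'
# value for that season.
ERA_QUERY = """
SELECT year,
       AVG(era) AS League_ERA_Avg,
       MAX(CASE WHEN team_id = 'SFN' THEN era END) AS SF_ERA
FROM team
WHERE year >= 1958
GROUP BY year
ORDER BY year;
"""

era_rows = demo_conn.execute(ERA_QUERY).fetchall()
for year, league_avg, sf_era in era_rows:
    print(year, round(league_avg, 2), sf_era)
```

Passing `ERA_QUERY` to `pd.read_sql_query` would return the same three columns as a single dataframe, with no concat step needed.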

In [37]:
era_df

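|   | year | League_ERA_Avg | SF ERA |
|---|------|----------------|--------|
| 0 | 1958 | 3.86 | 3.98 |
| 1 | 1959 | 3.908125 | 3.47 |
| 2 | 1960 | 3.818125 | 3.44 |
| 3 | 1961 | 4.028889 | 3.77 |
| 4 | 1962 | 3.958 | 3.79 |
| 5 | 1963 | 3.46 | 3.35 |
| 6 | 1964 | 3.584 | 3.19 |
| 7 | 1965 | 3.501 | 3.2 |
| 8 | 1966 | 3.522 | 3.24 |
| 9 | 1967 | 3.3055 | 2.92 |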


In [40]:
data = [
    go.Bar(
        x=era_df['year'],
        y=era_df['SF ERA'],
        opacity=0.75,
        name = "SF ERA",
        marker=dict(            
            color="rgb(255, 153, 51)")
    ),
    go.Bar(
        x=era_df['year'],
        y=era_df['League_ERA_Avg'],
        opacity=0.75,
        name = "League ERA",
        marker=dict(            
            color="rgb(0, 0, 0)")
    )

]

layout = go.Layout(
    barmode='group',  # valid barmode values include 'stack', 'group', and 'overlay'
    title='SF Giants ERA vs. League Average ERA by Year',
    xaxis=dict(title='Year'),
    yaxis=dict(title='Earned Run Average')
)

fig = go.Figure(data=data, layout=layout)
py.iplot(fig)

To be continued