## Step 1
In this first cell we read in our input csv data, create a home dataframe, and an away dataframe. The dataframe content contains the following columns:
- `State` - Win, loss, tie
- `State_Next` - Win, loss, tie (the next games result after this current game)
- `Observation` - Home, Away (did the team play at home or away)


In [463]:
import pandas
import numpy

home_csv='csv/austin.csv'
away_csv='csv/minnesota.csv'

# Define our states and observations
# W=Win, L=Loss, T=Tie
states = ['W', 'L', 'T']
# H=Home game, A=Away game
observations = ['H', 'A']


def get_df(csv):
  """Given an input csv, return a dataframe"""
  # Read csv
  df = pandas.read_csv(csv, sep=',')
  # Determine Home/Away based on opponent name
  df['Observation'] = df.apply(lambda x: 'A' if x['Opponent'].startswith('@') else 'H', axis=1)
  # Drop unneeded columns
  df.drop(columns=["Opponent", "Score", "Season"], inplace=True)
  # Bring up next row state to this row as "StateNext" which represents the HMM state transition
  df['StateNext'] = df['State'].shift(-1)
  return df


# Home and away df
home_df = get_df(home_csv)
away_df = get_df(away_csv)


## Step 2
In this cell we calculate the home team and away teams hidden state transition matrix. The hidden state transition matrix includes the following transitions:
```
  W L T
W x x x
L x x x
T x x x
```

Some of the code may look a little hacky, because some transitions don't exist in the data set, so we have to clean it up / explicitly set things to 0 if that's the case.

In [464]:
def get_transition_matrix(df, label):
  # Count total state values (total wins, total losses, total ties)
  state_series = df['State'].value_counts()
  print(f"{label}: Original total wins, losses, ties")
  print(state_series)
  print()

  # Subtract the last state value from the counts since there is no transition to go to for it
  last_result = df[-1:]["State"].values[0]
  state_series[last_result] = state_series[last_result] - 1
  for state in states:
    if state not in state_series:
      # If a team never won/lost/tied (like Charlotte) set the count to 0
      state_series[state] = 0
  print(f"{label}: Fixed total wins, losses, ties")
  print(state_series)
  print()

  # Get a matrix of all the State -> StateNext combinations
  matrix_states = df.groupby(['State', 'StateNext']).size()
  # Fix any missing transitions
  for state1 in states:
    if state1 not in matrix_states:
      # If a team never won/lost/tied (like Charlotte) just set it to the initial probability
      matrix_states.at[state1] = pandas.Series()
      matrix_states.at[state1, 'W'] = 1/3
      matrix_states.at[state1, 'L'] = 1/3
      matrix_states.at[state1, 'T'] = 1/3
    for state2 in states:
      if state2 not in matrix_states[state1]:
        # If a team never went from one state to another (like Win --> Tie) set it to 0 probability 
        # (because the other states like Win --> Win and Win --> Loss will sum to 1)
        matrix_states.at[state1, state2] = 0
  print(f"{label}: Total number of transitions from one state to another (like Win --> Win, Win --> Loss, etc...)")
  print(matrix_states)
  print()

  # Stupid hack to prevent division by 0, only happens because a team might have never won/lost/tied a game (like Charlotte)
  divisor_w = 1 if state_series['W'] == 0 else state_series['W']
  divisor_l = 1 if state_series['L'] == 0 else state_series['L']
  divisor_t = 1 if state_series['T'] == 0 else state_series['T']
  transition_matrix = numpy.array([[matrix_states['W']['W'] / divisor_w, matrix_states['W']['L'] / divisor_w, matrix_states['W']['T'] / divisor_w],
                                  [matrix_states['L']['W'] / divisor_l, matrix_states['L']['L'] / divisor_l, matrix_states['L']['T'] / divisor_l],
                                  [matrix_states['T']['W'] / divisor_t, matrix_states['T']['L'] / divisor_t, matrix_states['T']['T'] / divisor_t]
                                  ])
  print(f"{label}: Transition matrix")
  print('Format')
  print('   W L T')
  print(' W')
  print(' L')
  print(' T')
  print(transition_matrix)
  print()

  return transition_matrix

home_start_probability = numpy.array([1/3, 1/3, 1/3])
home_transition_matrix = get_transition_matrix(home_df, "Home")

away_start_probability = numpy.array([1/3, 1/3, 1/3])
away_transition_matrix = get_transition_matrix(away_df, "Away")


Home: Original total wins, losses, ties
L    22
W    11
T     6
Name: State, dtype: int64

Home: Fixed total wins, losses, ties
L    22
W    11
T     5
Name: State, dtype: int64

Home: Total number of transitions from one state to another (like Win --> Win, Win --> Loss, etc...)
State  StateNext
L      L            10
       T             4
       W             8
T      L             2
       T             1
       W             2
W      L             9
       T             1
       W             1
dtype: int64

Home: Transition matrix
Format
   W L T
 W
 L
 T
[[0.09090909 0.81818182 0.09090909]
 [0.36363636 0.45454545 0.18181818]
 [0.4        0.4        0.2       ]]

Away: Original total wins, losses, ties
W    15
T    12
L    12
Name: State, dtype: int64

Away: Fixed total wins, losses, ties
W    15
T    12
L    11
Name: State, dtype: int64

Away: Total number of transitions from one state to another (like Win --> Win, Win --> Loss, etc...)
State  StateNext
L      L            4
    

## Step 3
In this cell we calculate the home team and away teams observation transition matrix. The observation transition matrix includes the following transitions:
```
  H A
W x x
L x x
T x x
```

Some of the code may look a little hacky, because some transitions don't exist in the data set, so we have to clean it up / explicitly set things to 0 if that's the case.

In [465]:
def get_observation_matrix(df, label):
  # Count total observation values (total home, away games)
  observation_series = df['Observation'].value_counts()
  # Count total state values (total wins, total losses, total ties)
  observation_state_series = df['State'].value_counts()
  for state in states:
    if state not in observation_state_series:
      # If a team never won/lost/tied (like Charlotte) set the count to 0
      observation_state_series[state] = 0
  print(f"{label}: Total home, away games")
  print(observation_series)
  print()
  print(f"{label}: Total wins, losses, ties")
  print(observation_state_series)
  print()

  # Get a matrix of all the State -> Observation combinations
  matrix_observations = df.groupby(['State', 'Observation']).size()
  # Fix any missing transitions
  for state in states:
    if state not in matrix_observations:
      # If a team never won/lost/tied (like Charlotte) just set it to the initial probability
      matrix_observations.at[state] = pandas.Series()
      matrix_observations.at[state, 'H'] = .5
      matrix_observations.at[state, 'A'] = .5
    for observation in observations:
      if observation not in matrix_observations[state]:
        # If a team never went from one state to another (like Win --> Home) set it to 0 probability
        # (because the other states like Win --> Away will sum to 1)
        matrix_observations.at[state, observation] = 0
  print(f"{label}: Total number of transitions from one state to observation (like Win --> Away etc...)")
  print(matrix_observations)
  print()

  # Stupid hack to prevent division by 0, only happens because a team might have never won/lost/tied a game (like Charlotte)
  divisor_w = 1 if observation_state_series['W'] == 0 else observation_state_series['W']
  divisor_l = 1 if observation_state_series['L'] == 0 else observation_state_series['L']
  divisor_t = 1 if observation_state_series['T'] == 0 else observation_state_series['T']
  observation_matrix = numpy.array([[matrix_observations['W']['H'] / divisor_w, matrix_observations['W']['A'] / divisor_w],
                                    [matrix_observations['L']['H'] / divisor_l, matrix_observations['L']['A'] / divisor_l],
                                    [matrix_observations['T']['H'] / divisor_t, matrix_observations['T']['A'] / divisor_t]
                                  ])
  print(f"{label}: Observation matrix")
  print('Format')
  print('   H A')
  print(' W')
  print(' L')
  print(' T')
  print(observation_matrix)
  print()

  return observation_matrix

home_observation_matrix = get_observation_matrix(home_df, 'Home')
away_observation_matrix = get_observation_matrix(home_df, 'Away')


Home: Total home, away games
H    20
A    19
Name: Observation, dtype: int64

Home: Total wins, losses, ties
L    22
W    11
T     6
Name: State, dtype: int64

Home: Total number of transitions from one state to observation (like Win --> Away etc...)
State  Observation
L      A              14
       H               8
T      A               3
       H               3
W      A               2
       H               9
dtype: int64

Home: Observation matrix
Format
   H A
 W
 L
 T
[[0.81818182 0.18181818]
 [0.36363636 0.63636364]
 [0.5        0.5       ]]

Away: Total home, away games
H    20
A    19
Name: Observation, dtype: int64

Away: Total wins, losses, ties
L    22
W    11
T     6
Name: State, dtype: int64

Away: Total number of transitions from one state to observation (like Win --> Away etc...)
State  Observation
L      A              14
       H               8
T      A               3
       H               3
W      A               2
       H               9
dtype: int64

Away: O

## Step 4
In this cell we calculate the home team and away teams posterior probability for each hidden state, based on whether the next game they play is at home or away.

Once we obtain the posterior probability for each teams hidden states, take the max of the following:
- Sum home team `W` posterior probability with away team `L` posterior probability, divide by 2
- Sum away team `W` posterior probability with home team `L` posterior probability, divide by 2
- Sum home and away team `T` posterior probability, divide by 2

In [466]:
from hmmlearn import hmm
import numpy as np

NEXT_GAME_HOME = np.array([[0]])
NEXT_GAME_AWAY = np.array([[1]])

def get_model(start_probability, transition_matrix, observation_matrix, next_game):
  model = hmm.MultinomialHMM(n_components=3, init_params = '')
  model.startprob_ = start_probability
  model.transmat_ = transition_matrix
  model.emissionprob_ = observation_matrix
  predict_proba = model.predict_proba(next_game)
  predict = model.predict(next_game)
  decode = model.decode(next_game)
  return predict_proba, predict, decode

predict_proba_home, predict_home, decode_home = get_model(home_start_probability, home_transition_matrix, home_observation_matrix, NEXT_GAME_HOME)
predict_proba_away, predict_away, decode_away = get_model(away_start_probability, away_transition_matrix, away_observation_matrix, NEXT_GAME_AWAY)

print(f"Individual predictions...")
print(f"Home: win, loss, tie prediction probability:\n{predict_proba_home}")
print(f"Home prediction:\n{states[predict_home[0]]}")
print(f"Log probability with home prediction:\n{decode_home[0]}, {states[decode_home[1][0]]}")
print()
print(f"Away: win, loss, tie prediction probability:\n{predict_proba_away}")
print(f"Away prediction:\n{states[predict_away[0]]}")
print(f"Log probability with away prediction:\n{decode_away[0]}, {states[decode_away[1][0]]}")
print()
print()

home_win_away_loss_pct = (predict_proba_home[0][0] + predict_proba_away[0][1]) / 2
home_loss_away_win_pct = (predict_proba_home[0][1] + predict_proba_away[0][0]) / 2
tie_pct = (predict_proba_home[0][2] + predict_proba_away[0][2]) / 2

print(f"Final predictions...")
print(f"Home team win and away team loses:\n{home_win_away_loss_pct}")
print(f"Home team loses and away team wins:\n{home_loss_away_win_pct}")
print(f"Tie:\n{tie_pct}")

Individual predictions...
Home: win, loss, tie prediction probability:
[[0.48648649 0.21621622 0.2972973 ]]
Home prediction:
W
Log probability with home prediction:
-1.2992829841302609, W

Away: win, loss, tie prediction probability:
[[0.13793103 0.48275862 0.37931034]]
Away prediction:
L
Log probability with away prediction:
-1.550597412411167, L


Final predictions...
Home team win and away team loses:
0.48462255358807094
Home team loses and away team wins:
0.1770736253494874
Tie:
0.33830382106244183
