# Lab 1
## State Space Models
## Opening Files


##############################################################################################################

What is a "State Space Model"?

A State is a description of the world or an outcome or a part of a system.

A State might be any of the following:

    "It is raining on the Cal campus today."

    "The last die roll was a 3."

    "The Raiders won the 1980 Superbowl."

    "James Harden will be the 2019-2020 NBA MVP."

So, a "State Space Model" is a model that describes States and the transitions between them.

Many areas of analytic interest will be approachable using State Space modeling methods.


##############################################################################################################

Let's `import` some important `libraries`

What do we mean by **import** and **libraries**?

A "library" is essentially a collection of code. It is usually divided into individual files each containing a specific set of functionality.

The "import" command tells the Python system to read in that library and make the elements ("objects" and "methods") available to us.

Common libraries we'll be using are:

*Datascience*, a Data8-specific data tools library,

*Numpy*, a Python numeric methods and analytic library, and

*Pandas*, a richer toolset for data analysis.


We can assign new names for these libraries when we import them, to make it easier to call their features.



In [1]:
# It's easy to import libraries!!

# Note: To "execute" a block of Python code, press <Shift> + <Enter>.
# Notice that this code doesn't produce any output

import numpy as np
import datascience as ds
import pandas as pd

Let's build a simple State Space Model.

This Model has two States: "Sunny" and "Raining"

If it's "Sunny" today, then tomorrow it will still be Sunny 80% of the time.
    Since there are only two States, tomorrow it will be Raining the remaining 20% of the time.
    
If it's "Raining" today, then tomorrow it will still be Raining 60% of the time (and, correspondingly, Sunny 40% of the time)

                        TOMORROW
                    Sunny          Raining

        Sunny        0.80            0.20

TODAY

        Raining      0.40            0.60


# This is called the "State Transition Matrix"



# Here's what this looks like graphically

<img src="sunny-raining-state-space.png"/>

## 80% of the time when it's Sunny, it stays Sunny
## And 60% of the time when it's Raining, it stays Raining




In [2]:
# We start by creating the simple 2 x 2 transition matrix as described


state_space_1 = np.array([[.8, .2], [.4, .6]])

state_space_1

array([[0.8, 0.2],
       [0.4, 0.6]])
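
Each row of a transition matrix lists every possible outcome for tomorrow given one State today, so each row should add up to 1. A minimal sketch of that sanity check (the names `row_sums` and `rows_are_valid` are just illustrative):

```python
import numpy as np

# Same 2 x 2 transition matrix as above: rows are TODAY, columns are TOMORROW
state_space_1 = np.array([[.8, .2], [.4, .6]])

# Sum across each row (axis=1); every total should be 1
row_sums = state_space_1.sum(axis=1)

# allclose tolerates tiny floating-point rounding differences
rows_are_valid = np.allclose(row_sums, 1.0)

print(row_sums, rows_are_valid)
```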

### There's clearly a tendency for current conditions to "persist"

### It turns out multiplying the transition matrix by itself gives you a window into the future 2 changes ahead...




In [4]:
state_space_1 * state_space_1

array([[0.64, 0.04],
       [0.16, 0.36]])

### OOPS! It also turns out there are multiple ways of multiplying matrices together!

 As we see above, the `*` operator on NumPy arrays performs an `element-wise` multiplication

 That means that the result is:
 
 [state_space(1,1) * state_space(1,1),   state_space(1,2) * state_space(1,2)]
 
 [state_space(2,1) * state_space(2,1),   state_space(2,2) * state_space(2,2)]

<br>
<br>

### What we _want_ is a matrix multiplication, which is what `np.dot` computes for 2-D arrays

### Here, with `I` as the input matrix and States `a` and `b`:

 Result[a,a] = I(a,a) * I(a,a) + I(a,b) * I(b,a)

 Result[a,b] = I(a,a) * I(a,b) + I(a,b) * I(b,b)

 Result[b,a] = I(b,a) * I(a,a) + I(b,b) * I(b,a)

 Result[b,b] = I(b,a) * I(a,b) + I(b,b) * I(b,b)


In [5]:
np.dot(state_space_1, state_space_1)

array([[0.72, 0.28],
       [0.56, 0.44]])
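
As a side note, NumPy arrays also support the `@` operator for matrix multiplication. The sketch below compares the three approaches on the same matrix; the variable names are only illustrative.

```python
import numpy as np

state_space_1 = np.array([[.8, .2], [.4, .6]])

# Element-wise: each entry is simply squared
elementwise = state_space_1 * state_space_1

# Matrix multiplication, written two equivalent ways
via_dot = np.dot(state_space_1, state_space_1)
via_at = state_space_1 @ state_space_1

same_result = np.allclose(via_dot, via_at)          # the two matrix products agree
differs = not np.allclose(elementwise, via_dot)     # element-wise is a different operation

print(via_at)
print(same_result, differs)
```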

In [8]:
# NumPy also has a generalized ".matrix_power" feature that is equivalent to using ".dot" multiple times
# This feature is part of the "linalg" module of NumPy

np.linalg.matrix_power(state_space_1, 2)

array([[0.72, 0.28],
       [0.56, 0.44]])

<h2> What does this matrix tell us? </h2>


As stated before, this matrix shows us the probabilities of each state the day after tomorrow. 

For example, what is the probability that it will be sunny the day after tomorrow given that it's sunny today?

Given that it's <b>sunny</b> today there is a .8 chance of being sunny tomorrow and then another .8 chance of being sunny the day after that. However, there is also a .2 chance that it rains tomorrow, and then if it's raining there's a .4 chance of being sunny the day after tomorrow. 

So we get `(.8 * .8) + (.2 * .4) = .72` 

This is the same value as the top-left entry of the matrix product computed above.
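
The same reasoning can be written as a sum over every possible State for tomorrow. This is a small sketch; the index names `SUNNY` and `RAINING` are assumptions introduced here for readability.

```python
import numpy as np

state_space_1 = np.array([[.8, .2], [.4, .6]])

# Hypothetical names for the row/column positions of each State
SUNNY, RAINING = 0, 1

# P(Sunny in two days | Sunny today) =
#   sum over tomorrow of P(today -> tomorrow) * P(tomorrow -> Sunny)
p_sunny_in_two_days = sum(
    state_space_1[SUNNY, tomorrow] * state_space_1[tomorrow, SUNNY]
    for tomorrow in (SUNNY, RAINING)
)

# The matrix power performs this sum for every pair of States at once
two_step = np.linalg.matrix_power(state_space_1, 2)

print(p_sunny_in_two_days, two_step[SUNNY, SUNNY])
```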





In [9]:
#Finally, we won't get into the math here, but as we multiply more and more times, the rows of the result
#approach the long-run probability of finding yourself in each state, no matter which state you started in...

np.linalg.matrix_power(state_space_1, 4)

array([[0.6752, 0.3248],
       [0.6496, 0.3504]])

## We see that our simple weather system converges on roughly 67% Sunny, 33% Raining

In [7]:
np.linalg.matrix_power(state_space_1, 10)

array([[0.66670162, 0.33329838],
       [0.66659676, 0.33340324]])
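
Another way to see the convergence is to track a row vector of State probabilities one day at a time. The sketch below assumes a hypothetical helper named `forecast` (not part of NumPy) and starts once from a certainly-Sunny day and once from a certainly-Raining day.

```python
import numpy as np

state_space_1 = np.array([[.8, .2], [.4, .6]])

def forecast(start, transition, days):
    """Advance a row vector of State probabilities by `days` steps."""
    distribution = np.asarray(start, dtype=float)
    for _ in range(days):
        # One day forward: row vector times the transition matrix
        distribution = distribution @ transition
    return distribution

from_sunny = forecast([1.0, 0.0], state_space_1, 50)    # today is Sunny for sure
from_raining = forecast([0.0, 1.0], state_space_1, 50)  # today is Raining for sure

print(from_sunny, from_raining)
```

Both starting points end up at essentially the same long-run mix of Sunny and Raining days.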

# Review

### If we knew nothing, what would we guess the odds of it being Rainy tomorrow?

### If we knew it was Sunny today? Rainy?

# We can make this a little more complex

 Let's imagine 3 states now: Sunny, Cloudy, and Raining.



In [12]:
state_space_2 = np.array([[.7, .2, .1], [.2, .6, .2], [.1, .2, .7]])

In [13]:
state_space_2

array([[0.7, 0.2, 0.1],
       [0.2, 0.6, 0.2],
       [0.1, 0.2, 0.7]])

In [14]:
np.linalg.matrix_power(state_space_2, 2)

array([[0.54, 0.28, 0.18],
       [0.28, 0.44, 0.28],
       [0.18, 0.28, 0.54]])

In [15]:
# these symmetric probabilities yield equal time in all states

np.linalg.matrix_power(state_space_2, 15)

array([[0.3335686 , 0.33333298, 0.33309842],
       [0.33333298, 0.33333405, 0.33333298],
       [0.33309842, 0.33333298, 0.3335686 ]])
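
One way to see why the time is split equally: in `state_space_2` the columns also sum to 1, and for such a matrix an equal split across the States is left unchanged by one more step. A minimal sketch (variable names are illustrative):

```python
import numpy as np

state_space_2 = np.array([[.7, .2, .1], [.2, .6, .2], [.1, .2, .7]])

# Sum down each column (axis=0); here every column also totals 1
column_sums = state_space_2.sum(axis=0)

# Start with equal probability in all three States and step forward one day
uniform = np.full(3, 1 / 3)
next_day = uniform @ state_space_2

print(column_sums)
print(next_day)
```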

In [16]:
# changing it up a little changes the steady-state probabilities

state_space_3 = np.array([[.7, .2, .1], [.3, .6, .1], [.1, .3, .6]])
state_space_3

array([[0.7, 0.2, 0.1],
       [0.3, 0.6, 0.1],
       [0.1, 0.3, 0.6]])

In [17]:
np.linalg.matrix_power(state_space_3, 12)

array([[0.43343379, 0.36661504, 0.19995117],
       [0.43341701, 0.36663182, 0.19995117],
       [0.43296228, 0.36684241, 0.20019531]])
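
Instead of raising the matrix to a high power, the steady-state probabilities can also be estimated directly. The sketch below is one possible approach, not the only one: it looks for the row vector that the transition matrix leaves unchanged, using `np.linalg.eig` on the transposed matrix and rescaling so the entries sum to 1.

```python
import numpy as np

state_space_3 = np.array([[.7, .2, .1], [.3, .6, .1], [.1, .3, .6]])

# A steady state pi satisfies pi @ P = pi, so pi is an eigenvector
# of the transposed matrix with eigenvalue 1
eigenvalues, eigenvectors = np.linalg.eig(state_space_3.T)

# Pick the eigenvalue closest to 1 and take its eigenvector (a column)
index = np.argmin(np.abs(eigenvalues - 1))
steady_state = np.real(eigenvectors[:, index])

# Rescale so the probabilities sum to 1 (this also fixes the sign)
steady_state = steady_state / steady_state.sum()

print(steady_state)
```

The result should be close to each row of the 12-step matrix shown above.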

## Quick Review

### If today is rainy, what is the probability that the day after tomorrow is sunny?

# Let's start to think about what a State Space Model of a baseball inning might look like

## What are the possible States of the "system"?

 Every inning starts with 0 outs and empty bases.
 
 Every inning ends with 3 outs.
 
 Each new batter creates a transition event
 
## Are there other non-batter transitions?

 ### Let's consider the combinations of outs and base-runner situations...
 
 Outs go over the set {0, 1, 2, 3}
 
 Base-runners go over the set {None, 1st, 2nd, 3rd, 1st & 2nd, 1st & 3rd, 2nd & 3rd, Loaded}

## What can we say about these combinations?
## Not all paths are possible...
 You cannot go "backwards" from 1 out to 0 or 2 outs to 1.
 
 Similarly, you cannot go from {0, None} to {0, Loaded}
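
To make the combinations concrete, here is a rough sketch that enumerates the States as `(outs, runners)` pairs. It rests on two assumptions not spelled out above: the base-runner list includes the 1st & 3rd situation, and every 3-out situation is collapsed into a single end State. The helper `outs_allow` is hypothetical and checks only the "no going backwards on outs" rule, which is necessary but not sufficient for a legal transition.

```python
from itertools import product

# Outs while the inning is still live; 3 outs is handled as one end State
OUTS = (0, 1, 2)

# Assumed list of base-runner situations (includes 1st & 3rd)
RUNNERS = ("None", "1st", "2nd", "3rd",
           "1st & 2nd", "1st & 3rd", "2nd & 3rd", "Loaded")

live_states = list(product(OUTS, RUNNERS))
END_STATE = (3, "None")
all_states = live_states + [END_STATE]

def outs_allow(current, nxt):
    """Necessary (not sufficient) check: the number of outs never decreases."""
    return nxt[0] >= current[0]

print(len(all_states))
print(outs_allow((1, "1st & 2nd"), (0, "1st")))   # going back to 0 outs
print(outs_allow((1, "1st & 2nd"), END_STATE))    # double play ends the inning
```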

## Let's consider the available paths from one State, {1, 1st & 2nd}
 `{0, any} not possible`
 
 `{3, none} Hit into DP, Inning over`

## The available paths with 1 and 2 outs are more complicated...

 `{1, None} Home Run! [3 runs in]`
 
 `{1, 1st}  1BH + 2 runs score`
 
 `{1, 2nd}  2BH + 2 runs score`
 
 `{1, 3rd}  3BH + 2 runs score`
 
 `{1, 1st & 2nd}  1BH + 1 run scores`
 
 `{1, 2nd & 3rd}  Double Steal(!)` 
 
 `{1, Loaded}  1BH, Walk, Error`

 `{2, None} 2 runs score + 1 out made on bases`
 
 `{2, 1st}  1BH + 1 run scores + 1 out made on bases`
 
 `{2, 2nd}  2BH + 1 run scores + 1 out made on bases`
 
 `{2, 3rd}  3BH + 1 run scores + 1 out made on bases [other?]`
 
 `{2, 1st & 2nd}  Batter Out, Fielder's Choice`
 
 `{2, 2nd & 3rd}  Sacrifice, Fielder's Choice, Hit + Out on Bases`
 
 `{2, Loaded}  not possible`
 



## There are various "Events" that act over the system to change the State Space.

### Above we reference several, things like "1BH" (Single), "2BH" (Double), "Double Steal", "Sacrifice"

### We can imagine the State Space of an inning evolving along a path like:
 `1) {0, None} The default start condition of the system`
 
 `2) {0, None} --> {0, 2nd}: Leadoff batter gets 2BH`
 
 `3) {0, 2nd} --> {1, 2nd}:  2nd batter makes an out`
 
 `4) {1, 2nd} --> {1, 1st}: 3rd batter gets a 1BH, run scores`
 
 `5) {1, 1st} --> {3, None}: 4th batter hits into DP...All innings end with 3 outs.`
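
The example path above can be written down as a list of `(outs, runners)` pairs and checked against the rules stated earlier. This is only a sketch; the tuple representation and variable names are assumptions made for illustration.

```python
# The five steps of the example inning, as (outs, runners) pairs
inning_path = [
    (0, "None"),   # 1) default start condition
    (0, "2nd"),    # 2) leadoff batter gets a 2BH
    (1, "2nd"),    # 3) 2nd batter makes an out
    (1, "1st"),    # 4) 3rd batter gets a 1BH, run scores
    (3, "None"),   # 5) 4th batter hits into a DP
]

outs = [state[0] for state in inning_path]

# Compare each step's outs with the next step's outs
outs_never_decrease = all(a <= b for a, b in zip(outs, outs[1:]))
starts_empty = inning_path[0] == (0, "None")
ends_with_three_outs = outs[-1] == 3

print(outs_never_decrease, starts_empty, ends_with_three_outs)
```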


### Note: the State of the system doesn't tell us if any runs have scored, only describes the current Outs / Runners State

### We're not going to concern ourselves with specifically how the system evolves right now

### You can imagine a much more detailed State Space Model that includes, for example, the Ball-Strikes count on the batter.

### Problem 1
`Create a 4-state transition matrix with the following probabilities...`<br>
#### State 1: 90% to 1, 10% to 2, 0% to 3, 0% to 4
#### State 2: 25% to 1, 70% to 2, 5% to 3, 0% to 4
#### State 3: 0% to 1, 5% to 2, 70% to 3, 25% to 4
#### State 4: 0% to 1, 0% to 2, 10% to 3, 90% to 4

<br>
<br>

### Observe this state space over different forward periods <br>
### Describe the behavior over short- and long-run periods <br>
### Are there any real-world state phenomena like this?


### Problem 2

#### Describe a potential State and Event space for a basketball or football possession<br>