## Introduction

This notebook runs a Monte Carlo simulation of a Markov chain.

I once heard something to the effect that if you have a given set of states with given transition probabilities, then the long-run average time spent in each state, and the likelihood of being in a given state at the end of a long walk, are independent of which state you start in.

In this toy example, there is a house in which each room is adjacent to some of the other rooms.  I run a simulation with many "chains", each chain representing a walk through the house of X room-to-room moves.  For each chain I track the list of rooms visited and also the last room visited.  Each chain starts in a randomly chosen room.

### Hypothesis 1:

No matter which room you start in, the proportion of visits to each room will be roughly the same.

### Hypothesis 2:

If you look at the last room in the chain, the proportions of those last rooms will be roughly equal to the overall proportions of visits to each room.

This notebook tests both hypotheses.
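
Both hypotheses correspond to standard results about the stationary distribution of a Markov chain. As a sketch, in notation that is not used elsewhere in this notebook: let $P$ be the matrix of transition probabilities, $X_t$ the room occupied at step $t$, and $\pi$ a row vector with one probability per room satisfying

$$
\pi P = \pi, \qquad \sum_i \pi_i = 1 .
$$

Hypothesis 1 says the fraction of time spent in room $i$ approaches $\pi_i$ whatever the starting room $j$, and Hypothesis 2 says the same of the probability of being in room $i$ at the last step:

$$
\frac{1}{n} \sum_{t=1}^{n} \mathbf{1}[X_t = i] \;\to\; \pi_i,
\qquad
\Pr(X_n = i \mid X_0 = j) \;\to\; \pi_i
\quad \text{as } n \to \infty .
$$

The first limit holds when every room can be reached from every other room; the second additionally requires that the chain is not periodic.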

In [1]:
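from prob_dist import *
import numpy as np  # np.random.choice is used below; imported explicitly rather than relying on the star import
import pandas as pd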

In [2]:
DEFAULT_CHAIN_LENGTH = 100000
DEFAULT_NUM_CHAINS = 1000

Rooms in the house

In [14]:
#rooms in house 1
PANTRY = 'pantry'
KITCHEN = 'kitchen'
SCHOOL_ROOM = 'school room'
DEN = 'den'
ENTRY_HALL = 'entry hall'
OFFICE = 'office'
HALL = 'hall'
HALL_BATHROOM = 'hall bathroom'
BOY_BEDROOM = 'boy bedroom'
GIRL_BEDROOM = 'girl bedroom'
MASTER_BEDROOM = 'master bedroom'
MASTER_BATHROOM = 'master bathroom'

In [15]:
kitchen_probs = ProbDist({SCHOOL_ROOM: 1, PANTRY: 3, DEN: 6}, id=KITCHEN)
pantry_probs = ProbDist({KITCHEN: 1}, id=PANTRY)
school_room_probs = ProbDist({KITCHEN: 2, OFFICE: 1}, id=SCHOOL_ROOM)
office_probs = ProbDist({SCHOOL_ROOM: 2, ENTRY_HALL: 3}, id=OFFICE)
entry_hall_probs = ProbDist({OFFICE: 1, DEN: 1}, id=ENTRY_HALL)
den_probs = ProbDist({KITCHEN: 4, ENTRY_HALL: 2, HALL: 4}, id=DEN)
hall_probs = ProbDist({MASTER_BEDROOM: 5, DEN: 5}, id=HALL)
master_bedroom_probs = ProbDist({HALL: 6, MASTER_BATHROOM: 4}, id=MASTER_BEDROOM)
master_bathroom_probs = ProbDist({MASTER_BEDROOM: 1}, id=MASTER_BATHROOM)

In [16]:
dist_map = {
    KITCHEN: kitchen_probs,
    PANTRY: pantry_probs,
    SCHOOL_ROOM: school_room_probs,
    OFFICE: office_probs,
    ENTRY_HALL: entry_hall_probs,
    DEN: den_probs,
    HALL: hall_probs,
    MASTER_BEDROOM: master_bedroom_probs,
    MASTER_BATHROOM: master_bathroom_probs
}

In [43]:
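def make_one_chain(chain_length=DEFAULT_CHAIN_LENGTH):
    """
    Make a chain, meaning a path from one room to the next, with chain_length transitions.
    Basically, simulate a person walking through the house, going into room after room
    chain_length times (100000 by default).
    When in a given room, the room they go into next is determined by that room's transition probabilities.
    :param chain_length: number of room-to-room transitions to simulate
    :return: pd.Series of the rooms visited (chain_length + 1 entries, counting the starting room),
             with the Series name set to the last room visited
    """
    # Start in a room chosen uniformly at random.
    current_room = np.random.choice(list(dist_map))

    chain = [current_room]
    for _ in range(chain_length):
        # get_random_value() comes from prob_dist; the first element of its result is used as the next room.
        current_room = dist_map[current_room].get_random_value()[0]
        chain.append(current_room)

    # The Series name carries the ending room so callers can read it back via chain.name.
    return pd.Series(chain, name=current_room)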
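
Since `prob_dist` is not shown here, the sampling step can be sketched with only the standard library. This is a minimal stand-in rather than the notebook's actual code: it assumes the integer weights passed to `ProbDist` act as relative probabilities, it uses `random.choices` in place of `get_random_value`, and `TRANSITION_WEIGHTS`, `next_room`, and `walk` are hypothetical names.

```python
import random
from collections import Counter

# Relative transition weights copied from the ProbDist definitions above.
# Assumption: ProbDist treats these integers as relative probabilities.
TRANSITION_WEIGHTS = {
    "kitchen": {"school room": 1, "pantry": 3, "den": 6},
    "pantry": {"kitchen": 1},
    "school room": {"kitchen": 2, "office": 1},
    "office": {"school room": 2, "entry hall": 3},
    "entry hall": {"office": 1, "den": 1},
    "den": {"kitchen": 4, "entry hall": 2, "hall": 4},
    "hall": {"master bedroom": 5, "den": 5},
    "master bedroom": {"hall": 6, "master bathroom": 4},
    "master bathroom": {"master bedroom": 1},
}


def next_room(current_room, rng):
    """Sample the next room in proportion to the current room's weights."""
    options = TRANSITION_WEIGHTS[current_room]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def walk(chain_length, rng):
    """Return the rooms visited: a random starting room plus chain_length moves."""
    room = rng.choice(list(TRANSITION_WEIGHTS))
    chain = [room]
    for _ in range(chain_length):
        room = next_room(room, rng)
        chain.append(room)
    return chain


rng = random.Random(0)  # fixed seed so the sketch is repeatable
chain = walk(10_000, rng)
visit_counts = Counter(chain)
print(visit_counts.most_common())
print("ended in:", chain[-1])
```

Passing the random generator in explicitly keeps each walk reproducible, which helps when comparing chains side by side.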

In [56]:
def do_sim_comparison(num_chains=DEFAULT_NUM_CHAINS, chain_length=DEFAULT_CHAIN_LENGTH):
    """
    Make a bunch of chains and print the room visit counts of each.
    :param num_chains: number of independent chains to simulate
    :param chain_length: number of transitions in each chain
    :return: None; results are printed
    """
    for _ in range(num_chains):
        print()
        chain = make_one_chain(chain_length)
        # The printed Series name is the room the chain ended in.
        print(chain.value_counts())

If Hypothesis 1 is correct, the number of visits to each room should not change a whole lot from chain to chain.

In [57]:
do_sim_comparison(num_chains=10, chain_length=10000)


den                2275
hall               1795
kitchen            1640
master bedroom     1476
entry hall          789
master bathroom     596
office              552
pantry              482
school room         396
Name: hall, dtype: int64

den                2328
hall               1795
kitchen            1738
master bedroom     1439
entry hall          774
master bathroom     549
pantry              532
office              488
school room         358
Name: den, dtype: int64

den                2266
hall               1768
kitchen            1636
master bedroom     1470
entry hall          819
master bathroom     591
office              574
pantry              470
school room         407
Name: office, dtype: int64

den                2295
hall               1859
kitchen            1550
master bedroom     1526
entry hall          801
master bathroom     618
office              538
pantry              450
school room         364
Name: pantry, dtype: int64

den                2234
hall

Yes, the proportion of visits to each room does not change a whole lot from chain to chain.  We aren't doing any fancy statistical tests here, but on visual examination Hypothesis 1 is looking good.

In [58]:
def do_sim_aggregate(num_chains=DEFAULT_NUM_CHAINS, chain_length=DEFAULT_CHAIN_LENGTH):
    """
    Make a bunch of chains and aggregate the results.  Report the aggregates of both the number of times
    each room was visited and the number of times it was the last room visited.  Ideally, the two stats
    should generally agree.
    :param num_chains: number of independent chains to simulate
    :param chain_length: number of transitions in each chain
    :return: None; results are printed
    """
    ending_rooms = []
    all_stats = None
    for _ in range(num_chains):
        chain = make_one_chain(chain_length)
        # make_one_chain stores the last room visited as the Series name.
        ending_rooms.append(chain.name)
        if all_stats is None:
            all_stats = chain.value_counts()
        else:
            # Assumes every room appears in every chain; a room missing from one chain would give NaN here.
            all_stats += chain.value_counts()

    print()
    print("\ntotal time in rooms:")
    print(all_stats)
    print("\ntotal percentage of time in rooms:")
    # Each chain holds chain_length + 1 rooms (the start counts too), so these sum to slightly over 100.
    print(all_stats / (num_chains * chain_length) * 100)
    ending_rooms_series = pd.Series(ending_rooms)
    print("\ntotal numbers of ending rooms:")
    print(ending_rooms_series.value_counts())
    print("\ntotal percentages of ending rooms:")
    print(ending_rooms_series.value_counts() / ending_rooms_series.shape[0] * 100)

If Hypothesis 2 is correct, the proportion of chains ending in each room should roughly match the proportion of total visits to that room.

In [59]:
do_sim_aggregate(num_chains=100, chain_length=10000)



total time in rooms:
den                227636
hall               181946
kitchen            165352
master bedroom     151428
entry hall          76002
master bathroom     60586
pantry              49623
office              50527
school room         37000
Name: den, dtype: int64

total percentage of time in rooms:
den                22.7636
hall               18.1946
kitchen            16.5352
master bedroom     15.1428
entry hall          7.6002
master bathroom     6.0586
pantry              4.9623
office              5.0527
school room         3.7000
Name: den, dtype: float64

total numbers of ending rooms:
den                22
hall               17
master bedroom     15
kitchen            12
entry hall         10
office             10
master bathroom     7
pantry              4
school room         3
dtype: int64

total percentages of ending rooms:
den                22.0
hall               17.0
master bedroom     15.0
kitchen            12.0
entry hall         10.0
office         

In [60]:
do_sim_aggregate(num_chains=1000, chain_length=10000)



total time in rooms:
den                2278750
hall               1825374
kitchen            1649251
master bedroom     1523072
entry hall          754590
master bathroom     610091
office              499026
pantry              495507
school room         365339
Name: den, dtype: int64

total percentage of time in rooms:
den                22.78750
hall               18.25374
kitchen            16.49251
master bedroom     15.23072
entry hall          7.54590
master bathroom     6.10091
office              4.99026
pantry              4.95507
school room         3.65339
Name: den, dtype: float64

total numbers of ending rooms:
den                232
hall               196
master bedroom     143
kitchen            142
entry hall          87
master bathroom     68
office              56
pantry              42
school room         34
dtype: int64

total percentages of ending rooms:
den                23.2
hall               19.6
master bedroom     14.3
kitchen            14.2
entry hall  

The ranking of rooms by how often a chain ended there is not exactly the same as the ranking by total time spent in each room, but it is similar.  The proportions are close, and the order differs between rooms with similar proportions and in rooms with less time spent, where the sample of ending rooms is smaller.

Hypothesis 2 is looking good, but it needs more simulations before I will say with certainty that it is confirmed.

As the number of chains increases, I would expect the proportion of chains ending in each room to converge to the proportion of total visits to that room.
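
One way to go beyond eyeballing Hypothesis 2 is a chi-square goodness-of-fit check, which is reasonable here because each chain contributes one independent ending room. The sketch below is an illustration, not part of the original analysis. It uses the counts printed by the `In [60]` run, treats the observed visit proportions as the expected distribution (an approximation, since they are themselves estimates), and hard-codes 15.507, the standard 5% critical value for 8 degrees of freedom.

```python
# Counts copied from the In [60] output above (1000 chains of 10000 steps).
total_visits = {
    "den": 2278750, "hall": 1825374, "kitchen": 1649251,
    "master bedroom": 1523072, "entry hall": 754590,
    "master bathroom": 610091, "office": 499026,
    "pantry": 495507, "school room": 365339,
}
ending_counts = {
    "den": 232, "hall": 196, "master bedroom": 143, "kitchen": 142,
    "entry hall": 87, "master bathroom": 68, "office": 56,
    "pantry": 42, "school room": 34,
}

num_chains = sum(ending_counts.values())
visit_total = sum(total_visits.values())

# Expected ending counts if ending rooms follow the visit proportions.
expected = {room: num_chains * n / visit_total for room, n in total_visits.items()}

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum(
    (ending_counts[room] - expected[room]) ** 2 / expected[room]
    for room in expected
)

# Assumption: 5% significance level; 9 rooms give 9 - 1 = 8 degrees of freedom.
CRITICAL_5PCT_DF8 = 15.507
print(f"chi-square statistic: {chi_square:.2f}")
print("consistent with Hypothesis 2 at the 5% level:", chi_square < CRITICAL_5PCT_DF8)
```

For these figures the statistic falls below the critical value, so this check does not contradict Hypothesis 2. The same test would not be appropriate for the visit counts within a single chain, because consecutive rooms in a chain are not independent.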


## For further research:
<ul>
    <li>Run the simulation for hypothesis 2 with more chains</li>
    <li>Calculate the theoretical proportions of time spent in each room and check against the experimental values</li>
    <li>Create a visual of the chain, similar to the diagrams used for Bayesian Networks</li>
    <li>Set up a Bayesian Network and a scenario of a short chain and compute the probability of ending in each room given a certain starting room</li>
</ul>
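
The second and fourth items can be approached without simulation by pushing a probability distribution through the transition probabilities one step at a time. The sketch below is one possible way to do that, under stated assumptions: the weights are copied from the `ProbDist` definitions and read as relative probabilities, and `TRANSITION_WEIGHTS`, `step`, and `distribution_after` are hypothetical names. For a short chain it gives the probability of ending in each room from a chosen starting room; for a long one it approximates the theoretical long-run proportions.

```python
# Relative transition weights copied from the ProbDist definitions above.
# Assumption: ProbDist treats these integers as relative probabilities.
TRANSITION_WEIGHTS = {
    "kitchen": {"school room": 1, "pantry": 3, "den": 6},
    "pantry": {"kitchen": 1},
    "school room": {"kitchen": 2, "office": 1},
    "office": {"school room": 2, "entry hall": 3},
    "entry hall": {"office": 1, "den": 1},
    "den": {"kitchen": 4, "entry hall": 2, "hall": 4},
    "hall": {"master bedroom": 5, "den": 5},
    "master bedroom": {"hall": 6, "master bathroom": 4},
    "master bathroom": {"master bedroom": 1},
}


def step(dist):
    """Advance a {room: probability} distribution by one transition."""
    new_dist = {room: 0.0 for room in TRANSITION_WEIGHTS}
    for room, prob in dist.items():
        options = TRANSITION_WEIGHTS[room]
        total = sum(options.values())
        for target, weight in options.items():
            new_dist[target] += prob * weight / total
    return new_dist


def distribution_after(start_room, num_steps):
    """Probability of being in each room after num_steps moves from start_room."""
    dist = {room: 0.0 for room in TRANSITION_WEIGHTS}
    dist[start_room] = 1.0
    for _ in range(num_steps):
        dist = step(dist)
    return dist


# Short chain: the starting room still matters.
short = distribution_after("pantry", 3)
print({room: round(p, 3) for room, p in short.items() if p > 0})

# Long chain: two different starting rooms, printed side by side.
from_pantry = distribution_after("pantry", 2000)
from_bathroom = distribution_after("master bathroom", 2000)
for room in sorted(from_pantry, key=from_pantry.get, reverse=True):
    print(f"{room:16s} {from_pantry[room]:.4f} {from_bathroom[room]:.4f}")
```

Comparing the long-chain columns with the simulated percentages above is the check suggested in the second item.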
    