## Problem 1: BST Traversal
This problem builds on Problem 1 of Homework 7, in which you wrote a binary search tree.

### Part 1

As discussed in lecture, there are three common ways to do a depth-first traversal: preorder, inorder, and postorder. For reference, see [Tree Traversal](https://en.wikipedia.org/wiki/Tree_traversal#Depth-first_search).
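For intuition, here is a minimal sketch of the three orders on the tree built from `[3, 9, 2, 11]`. It uses nested `(left, value, right)` tuples instead of `TreeNode` objects; the tuple representation and the standalone `preorder`/`inorder`/`postorder` functions are illustrative assumptions, not part of the assignment.

```python
# Each node is a (left, value, right) tuple; None marks an empty subtree.
# This is the BST produced by inserting 3, 9, 2, 11 in that order.
TREE = ((None, 2, None), 3, (None, 9, (None, 11, None)))

def preorder(node):
    """Visit the node, then its left subtree, then its right subtree."""
    if node is None:
        return []
    left, val, right = node
    return [val] + preorder(left) + preorder(right)

def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is None:
        return []
    left, val, right = node
    return inorder(left) + [val] + inorder(right)

def postorder(node):
    """Visit both subtrees before the node itself."""
    if node is None:
        return []
    left, val, right = node
    return postorder(left) + postorder(right) + [val]

print(preorder(TREE))   # [3, 2, 9, 11]
print(inorder(TREE))    # [2, 3, 9, 11]
print(postorder(TREE))  # [2, 11, 9, 3]
```

For a BST, the inorder result is always sorted, which is why it matches the expected output in the usage example.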

Write an iterator class called `DFSTraversal` with the following specifications:

* `__init__(self, tree, traversalType)`: The constructor takes a `BinaryTree` object and one of the members of the `DFSTraversalTypes` enum, defined as follows:

```python
from enum import Enum

class DFSTraversalTypes(Enum):
    PREORDER = 1
    INORDER = 2
    POSTORDER = 3
```

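* `changeTraversalType(self, traversalType)`: Changes the traversal type used by subsequent iterations
* `__iter__(self)`: Initializes the iteration and returns the iterator object; called implicitly when a `for` loop begins
* `__next__(self)`: Returns the next value in the traversal, raising `StopIteration` when the tree is exhausted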

Here's how you might use your `DFSTraversal` class:

```python
input_array = [3, 9, 2, 11]
bt = BinaryTree()
for val in input_array:
    bt.insert(val)
traversal = DFSTraversal(bt, DFSTraversalTypes.INORDER)
for val in traversal:
    print(val)
# Output:
# 2
# 3
# 9
# 11
```
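The spec lists both `__iter__` and `__next__`. One way to satisfy that protocol without generators is an explicit stack, sketched below for the inorder case only. The `Node` class and the `InorderIterator` name are hypothetical stand-ins (with the same `left`/`right`/`val` attributes as the homework's `TreeNode`), not the required `DFSTraversal` API.

```python
class Node:
    """Minimal stand-in for TreeNode: a value plus left/right children."""
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

class InorderIterator:
    """Inorder traversal driven by an explicit stack instead of recursion."""
    def __init__(self, root):
        self.root = root
        self.stack = []

    def _push_left_spine(self, node):
        # Push node and all of its left descendants; the smallest value ends on top
        while node is not None:
            self.stack.append(node)
            node = node.left

    def __iter__(self):
        # Restart from the root whenever a new loop begins
        self.stack = []
        self._push_left_spine(self.root)
        return self

    def __next__(self):
        if not self.stack:
            raise StopIteration
        node = self.stack.pop()
        # The values that follow `node` in inorder live in its right subtree
        self._push_left_spine(node.right)
        return node.val

root = Node(3, Node(2), Node(9, right=Node(11)))
print(list(InorderIterator(root)))  # [2, 3, 9, 11]
```

The stack never holds more than one root-to-leaf path, so memory is proportional to the tree's height rather than its size.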

### Part 2
Put your `BinaryTree` class (from Homework 7) and your `DFSTraversal` class (from Part 1 of this homework) in a file named `TreeTraversal.py`.

In [1]:
# from HW7
class TreeNode:
    def __init__(self, val):
        self.left = None
        self.right = None
        self.val = val
        self.parent = None

class BinaryTree:
    def __init__(self):
        self.root = None
    
    def insert(self, val):     
        if self.root is None:
            self.root = TreeNode(val)
        else:
            self._insert(val, self.root)
    
    def _insert(self, val, node):
        if val <= node.val:
            if node.left is not None:
                self._insert(val, node.left)
            else:
                node.left = TreeNode(val)
                node.left.parent = node
        else:
            if node.right is not None:
                self._insert(val, node.right)
            else:
                node.right = TreeNode(val)
                node.right.parent = node
    
    def find(self, val):
        if self.root is None:
            return None
        else:
            return self._find(val, self.root)
    
    def _find(self, val, node):
        if val == node.val:
            return node
        elif val < node.val and node.left is not None:
            return self._find(val, node.left)
        elif val > node.val and node.right is not None:
            return self._find(val, node.right)
        else:
            return None
              
    def getValues(self, depth):
        if self.root is None:
            return []
        else:
            val_list = []
            self._getValues(depth, self.root, val_list)
            return val_list
            
    def _getValues(self, depth, node, vals=None):
        # Use a fresh list per call instead of a shared mutable default
        if vals is None:
            vals = []
        if depth == 0:
            vals.append(node.val)
        else:
            if node.left is not None:
                self._getValues(depth-1, node.left, vals)
            else:
                for i in range(int(2**(depth-1))):
                    vals.append(None)
            if node.right is not None:
                self._getValues(depth-1, node.right, vals)
            else:
                for i in range(int(2**(depth-1))):
                    vals.append(None)
        return vals
    
    def max_depth(self, root):
        if root is None:
            return 0
        else:
            return max(self.max_depth(root.left), self.max_depth(root.right)) + 1
    def __len__(self):
        return self.max_depth(self.root)

In [2]:
from enum import Enum

class DFSTraversalTypes(Enum):
    PREORDER = 1
    INORDER = 2
    POSTORDER = 3

In [3]:
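class DFSTraversal:
    """Depth-first iterator over the values stored in a BinaryTree."""

    def __init__(self, tree, traversalType):
        self.tree = tree
        self.changeTraversalType(traversalType)

    def changeTraversalType(self, traversalType):
        # Keep the DFSTraversalTypes member itself (not its .name string)
        # and restart the traversal from the root
        self.traversalType = traversalType
        self._gen = self._traverse()

    def preorder(self, node):
        # node, then left subtree, then right subtree
        if node is not None:
            yield node.val
            yield from self.preorder(node.left)
            yield from self.preorder(node.right)

    def inorder(self, node):
        # left subtree, then node, then right subtree (sorted order for a BST)
        if node is not None:
            yield from self.inorder(node.left)
            yield node.val
            yield from self.inorder(node.right)

    def postorder(self, node):
        # left subtree, then right subtree, then node
        if node is not None:
            yield from self.postorder(node.left)
            yield from self.postorder(node.right)
            yield node.val

    def _traverse(self):
        # Pick the generator that matches the current traversal type
        orders = {
            DFSTraversalTypes.PREORDER: self.preorder,
            DFSTraversalTypes.INORDER: self.inorder,
            DFSTraversalTypes.POSTORDER: self.postorder,
        }
        return orders[self.traversalType](self.tree.root)

    def __iter__(self):
        # Each new loop starts a fresh traversal from the root
        self._gen = self._traverse()
        return self

    def __next__(self):
        # Delegates to the generator; raises StopIteration when it is exhausted
        return next(self._gen)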

In [4]:
input_array = [3, 9, 2, 11]
bt = BinaryTree()
for val in input_array:
    bt.insert(val)

traversal = DFSTraversal(bt, DFSTraversalTypes.PREORDER)
for val in traversal:
    print(val)

3
2
9
11


In [5]:
traversal.changeTraversalType(DFSTraversalTypes.INORDER)
for val in traversal:
    print(val)

2
3
9
11


In [6]:
traversal.changeTraversalType(DFSTraversalTypes.POSTORDER)
for val in traversal:
    print(val)

2
11
9
3


In [7]:
traversal.changeTraversalType(DFSTraversalTypes.INORDER)
for val in traversal:
    print(val)

2
3
9
11


In [21]:
bt2 = BinaryTree()
arr = [20, 10, 17, 14, 3, 0]
for i in arr:
    bt2.insert(i)
print("Height of binary tree is {}.\n".format(len(bt2)))
for i in range(len(bt2)):
    print("Level {0} values: {1}".format(i, bt2.getValues(i)))   
traversal2 = DFSTraversal(bt2, DFSTraversalTypes.INORDER)
for val in traversal2:
    print(val)

Height of binary tree is 4.

Level 0 values: [20]
Level 1 values: [10, None]
Level 2 values: [3, 17, None, None]
Level 3 values: [0, None, 14, None, None, None, None, None]
0
3
10
14
17
20


In [20]:
%%file TreeTraversal.py
class TreeNode:
    def __init__(self, val):
        self.left = None
        self.right = None
        self.val = val
        self.parent = None

class BinaryTree:
    def __init__(self):
        self.root = None
    
    def insert(self, val):     
        if self.root is None:
            self.root = TreeNode(val)
        else:
            self._insert(val, self.root)
    
    def _insert(self, val, node):
        if val <= node.val:
            if node.left is not None:
                self._insert(val, node.left)
            else:
                node.left = TreeNode(val)
                node.left.parent = node
        else:
            if node.right is not None:
                self._insert(val, node.right)
            else:
                node.right = TreeNode(val)
                node.right.parent = node
    
    def find(self, val):
        if self.root is None:
            return None
        else:
            return self._find(val, self.root)
    
    def _find(self, val, node):
        if val == node.val:
            return node
        elif val < node.val and node.left is not None:
            return self._find(val, node.left)
        elif val > node.val and node.right is not None:
            return self._find(val, node.right)
        else:
            return None
              
    def getValues(self, depth):
        if self.root is None:
            return []
        else:
            val_list = []
            self._getValues(depth, self.root, val_list)
            return val_list
            
    def _getValues(self, depth, node, vals=None):
        # Use a fresh list per call instead of a shared mutable default
        if vals is None:
            vals = []
        if depth == 0:
            vals.append(node.val)
        else:
            if node.left is not None:
                self._getValues(depth-1, node.left, vals)
            else:
                for i in range(int(2**(depth-1))):
                    vals.append(None)
            if node.right is not None:
                self._getValues(depth-1, node.right, vals)
            else:
                for i in range(int(2**(depth-1))):
                    vals.append(None)
        return vals
    
    def max_depth(self, root):
        if root is None:
            return 0
        else:
            return max(self.max_depth(root.left), self.max_depth(root.right)) + 1
    def __len__(self):
        return self.max_depth(self.root)
    
from enum import Enum

class DFSTraversalTypes(Enum):
    PREORDER = 1
    INORDER = 2
    POSTORDER = 3
    

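class DFSTraversal:
    """Depth-first iterator over the values stored in a BinaryTree."""

    def __init__(self, tree, traversalType):
        self.tree = tree
        self.changeTraversalType(traversalType)

    def changeTraversalType(self, traversalType):
        # Keep the DFSTraversalTypes member itself (not its .name string)
        # and restart the traversal from the root
        self.traversalType = traversalType
        self._gen = self._traverse()

    def preorder(self, node):
        # node, then left subtree, then right subtree
        if node is not None:
            yield node.val
            yield from self.preorder(node.left)
            yield from self.preorder(node.right)

    def inorder(self, node):
        # left subtree, then node, then right subtree (sorted order for a BST)
        if node is not None:
            yield from self.inorder(node.left)
            yield node.val
            yield from self.inorder(node.right)

    def postorder(self, node):
        # left subtree, then right subtree, then node
        if node is not None:
            yield from self.postorder(node.left)
            yield from self.postorder(node.right)
            yield node.val

    def _traverse(self):
        # Pick the generator that matches the current traversal type
        orders = {
            DFSTraversalTypes.PREORDER: self.preorder,
            DFSTraversalTypes.INORDER: self.inorder,
            DFSTraversalTypes.POSTORDER: self.postorder,
        }
        return orders[self.traversalType](self.tree.root)

    def __iter__(self):
        # Each new loop starts a fresh traversal from the root
        self._gen = self._traverse()
        return self

    def __next__(self):
        # Delegates to the generator; raises StopIteration when it is exhausted
        return next(self._gen)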

Overwriting TreeTraversal.py


## Problem 2: Markov Chains

[Markov chains](https://en.wikipedia.org/wiki/Markov_chain) are widely used to model and predict sequences of discrete events. Underlying a Markov chain is a Markov process, which assumes that the outcome of a future event depends only on the event immediately preceding it. In this exercise, we will assume that weather has the Markov property (i.e., today's weather depends only on yesterday's weather) and use that assumption to build a basic model for predicting weather.
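Stated formally, with $X_t$ denoting the weather on day $t$ and $P = (p_{ij})$ the transition matrix (notation introduced here, not in the original assignment), the Markov assumption says that conditioning on the whole history gives the same prediction as conditioning on the previous day alone. For a chain whose transition probabilities do not change over time, as assumed here, the $k$-day-ahead probabilities follow from matrix powers:

$$
\Pr(X_{t+1} = j \mid X_t = i,\, X_{t-1},\, \ldots,\, X_0) = \Pr(X_{t+1} = j \mid X_t = i) = p_{ij},
\qquad
\Pr(X_{t+k} = j \mid X_t = i) = \left(P^k\right)_{ij}
$$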

To begin, let's categorize weather into 6 types: ['sunny', 'cloudy', 'rainy', 'snowy', 'windy', 'hailing'].

In the `weather.csv` file accompanying this homework, each row corresponds to one type of weather (in the order given above) and each column is the probability of one type of weather occurring the following day (also in the order given above).

The $ij$th element is the probability that the $j$th weather type occurs after the $i$th weather type, counting from 1. For example, element (1,2) is the probability that a cloudy day follows a sunny day.

Take a look at the data. Make sure you see that if the previous day was sunny, the following day has a 0.4 probability of also being sunny. If the previous day was rainy (index $i = 3$), then the following day (index $j$) has a 0.05 probability of being windy ($j = 5$).
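The prose counts rows and columns from 1, whereas NumPy indexes from 0. A short sketch, with the matrix values copied from the `weather.csv` contents shown in the DataFrame output that follows and a hypothetical `STATES` list, performs the two lookups from the paragraph in zero-based form:

```python
import numpy as np

# Row i = previous day's weather, column j = following day's weather (0-based)
STATES = ["sunny", "cloudy", "rainy", "snowy", "windy", "hailing"]
P = np.array([
    [0.40, 0.30, 0.10, 0.05, 0.10, 0.05],
    [0.30, 0.40, 0.10, 0.10, 0.08, 0.02],
    [0.20, 0.30, 0.35, 0.05, 0.05, 0.05],
    [0.10, 0.20, 0.25, 0.30, 0.10, 0.05],
    [0.15, 0.20, 0.10, 0.15, 0.30, 0.10],
    [0.10, 0.20, 0.35, 0.10, 0.05, 0.20],
])

sunny, rainy, windy = STATES.index("sunny"), STATES.index("rainy"), STATES.index("windy")
print(P[sunny, sunny])  # 0.4   (1-based element (1,1))
print(P[rainy, windy])  # 0.05  (1-based element (3,5))
```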

In [9]:
import pandas as pd
df = pd.read_csv('weather.csv', header=None)
df

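|   | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | 0.4 | 0.3 | 0.1 | 0.05 | 0.1 | 0.05 |
| 1 | 0.3 | 0.4 | 0.1 | 0.1 | 0.08 | 0.02 |
| 2 | 0.2 | 0.3 | 0.35 | 0.05 | 0.05 | 0.05 |
| 3 | 0.1 | 0.2 | 0.25 | 0.3 | 0.1 | 0.05 |
| 4 | 0.15 | 0.2 | 0.1 | 0.15 | 0.3 | 0.1 |
| 5 | 0.1 | 0.2 | 0.35 | 0.1 | 0.05 | 0.2 |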


### Part 1: Parse the `.csv` file into a `NumPy` array

In [10]:
# Load the CSV file -- hint: you can use np.genfromtxt()
import numpy as np
data = np.genfromtxt("weather.csv", delimiter=",")
data

array([[ 0.4 ,  0.3 ,  0.1 ,  0.05,  0.1 ,  0.05],
       [ 0.3 ,  0.4 ,  0.1 ,  0.1 ,  0.08,  0.02],
       [ 0.2 ,  0.3 ,  0.35,  0.05,  0.05,  0.05],
       [ 0.1 ,  0.2 ,  0.25,  0.3 ,  0.1 ,  0.05],
       [ 0.15,  0.2 ,  0.1 ,  0.15,  0.3 ,  0.1 ],
       [ 0.1 ,  0.2 ,  0.35,  0.1 ,  0.05,  0.2 ]])

In [11]:
data.shape


(6, 6)
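Because every row is a probability distribution over the next day's weather, each row should sum to 1. A quick sanity check is sketched below on a made-up two-state CSV held in a string (the `csv_text` contents and the `toy` name are illustrative only); the same `np.allclose(data.sum(axis=1), 1.0)` test can be applied to the `data` array loaded above.

```python
import io
import numpy as np

# Hypothetical two-state matrix in the same headerless, comma-separated layout
csv_text = "0.9,0.1\n0.5,0.5\n"
toy = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# A transition matrix should be square, non-negative, and have rows summing to 1
is_square = toy.ndim == 2 and toy.shape[0] == toy.shape[1]
is_stochastic = bool(np.all(toy >= 0) and np.allclose(toy.sum(axis=1), 1.0))
print(toy.shape, is_square, is_stochastic)  # (2, 2) True True
```

`np.allclose` is used instead of `==` because values such as 0.3 and 0.1 are not exactly representable in binary floating point.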

### Part 2: Create a class called `Markov` that has the following methods:

* `load_data(array)`: loads the NumPy 2D array and stores it as an instance attribute.
* `get_prob(previous_day, following_day)`: returns the probability of `following_day` weather given `previous_day` weather.

**Note:** `previous_day` and `following_day` should be passed in string form (e.g. "sunny"), as opposed to an index (e.g. 0). 
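One way to accept weather names rather than indices is to derive the lookup table from a single ordered list, so names and row order cannot drift apart. The sketch below is a hedged variant, not the required solution: the `WEATHER_TYPES` list, the `MarkovSketch` class name, and raising `ValueError` for unknown names or a wrongly shaped matrix are illustrative choices.

```python
import numpy as np

WEATHER_TYPES = ["sunny", "cloudy", "rainy", "snowy", "windy", "hailing"]

class MarkovSketch:
    """Hypothetical variant of Markov that validates its inputs."""

    def __init__(self):
        # Build name -> index from the ordered list so the two cannot drift apart
        self.index = {name: i for i, name in enumerate(WEATHER_TYPES)}
        self.array = None

    def load_data(self, array):
        array = np.asarray(array, dtype=float)
        n = len(WEATHER_TYPES)
        if array.shape != (n, n):
            raise ValueError(f"expected a {n}x{n} matrix, got shape {array.shape}")
        self.array = array

    def get_prob(self, previous_day, following_day):
        try:
            i = self.index[previous_day]
            j = self.index[following_day]
        except KeyError as err:
            raise ValueError(f"unknown weather type: {err.args[0]!r}") from None
        return float(self.array[i, j])

demo = MarkovSketch()
demo.load_data(np.eye(len(WEATHER_TYPES)))
print(demo.get_prob("rainy", "rainy"))  # 1.0
```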

In [12]:
class Markov:
    def __init__(self):
        # Map each weather name to its row/column index in the CSV
        self.weather = {"sunny":0, "cloudy":1, "rainy":2, "snowy":3, "windy":4, "hailing":5}

    def load_data(self, array):
        # Store the 6x6 transition matrix (rows: previous day, columns: following day)
        self.array = array

    def get_prob(self, previous_day, following_day):
        # Translate the weather names to indices and look up P(following | previous)
        i = self.weather[previous_day]
        j = self.weather[following_day]
        return self.array[i, j]

In [13]:
m = Markov()
m.load_data(data)
m.get_prob("sunny", "rainy")

0.10000000000000001

In [14]:
m.get_prob("rainy", "windy") 

0.050000000000000003

## Problem 3: Iterators

Iterators are a convenient way to walk along your Markov chain.

#### Part 1: Using your `Markov` class from Problem 2, write `Markov` as an iterator by implementing the `__iter__()` and `__next__()` methods.

Remember:
* `__iter__()` should return the iterator object; it is called implicitly when the loop begins.
* `__next__()` should return the next value; it is called implicitly at each step of the loop.

Each 'next' step should be stochastic (i.e., randomly selected according to the transition probabilities for the following day) and should return the next day's weather as a string (e.g. "sunny") rather than an index (e.g. 0).
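A stochastic step amounts to sampling the next index from the current row of the matrix. The notebook's `MarkovIterator` builds a cumulative distribution and compares it against a uniform draw; NumPy's `Generator.choice` with the `p=` argument performs the same kind of draw in one call. This sketch assumes a seeded `numpy.random.default_rng`, a hypothetical `next_weather` helper, and a made-up two-state matrix for brevity.

```python
import numpy as np

TOY_STATES = ["sunny", "rainy"]
TOY_P = np.array([[0.9, 0.1],
                  [0.5, 0.5]])  # illustrative values, not weather.csv

rng = np.random.default_rng(0)  # seeded so the run is reproducible

def next_weather(current, P=TOY_P, states=TOY_STATES):
    """Sample tomorrow's weather name given today's weather name."""
    i = states.index(current)
    # Draw column j with probability P[i, j]
    j = rng.choice(len(states), p=P[i])
    return states[j]

# Rough check: from "sunny", about 90% of sampled next days should be "sunny"
draws = [next_weather("sunny") for _ in range(10_000)]
frac_sunny = draws.count("sunny") / len(draws)
print(f"{frac_sunny:.3f}")
```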

In [15]:
class MarkovIterator:
    """Walks the chain one day at a time; the first value is the starting weather."""
    def __init__(self, markov):
        self.markov = markov
        self.current_idx = markov.current_idx
        self.table = markov.array

    def __next__(self):
        # Cumulative distribution over tomorrow's weather given today's
        cdf = np.cumsum(self.table[self.current_idx])
        # Inverse-CDF sampling: first index whose cumulative probability exceeds the draw;
        # min() guards against a draw landing past cdf[-1] because of rounding
        rand_num = np.random.random()
        next_idx = min(int(np.searchsorted(cdf, rand_num, side="right")), len(cdf) - 1)

        # Return today's weather, then advance; the walk never ends, so no StopIteration
        current_str = self.markov.idx2str[self.current_idx]
        self.current_idx = next_idx
        return current_str

    def __iter__(self):
        return self

class Markov:
    def __init__(self, current_weather = "sunny"):
        self.idx2str = ["sunny", "cloudy", "rainy", "snowy", "windy", "hailing"]
        self.weather = {"sunny":0, "cloudy":1, "rainy":2, "snowy":3, "windy":4, "hailing":5}
        self.current_idx = self.weather[current_weather]

    def load_data(self, array):
        self.array = array
    
    def get_prob(self, previous_day, following_day):
        i = self.weather[previous_day]
        j = self.weather[following_day]
        return self.array[i,j]
    
    def set_current(self, current_weather):
        self.current_idx = self.weather[current_weather]
    
    def __iter__(self):
        return MarkovIterator(self)

In [16]:
m2 = Markov()
m2.load_data(data)
print(m2.current_idx)
m2.set_current("hailing")
print(m2.current_idx)

iter1 = iter(m2)
print(next(iter1))  # the first value is the starting weather
print(next(iter1))
print(next(iter1))
print(next(iter1))

0
5
hailing
rainy
rainy
rainy


#### Part 2: We want to predict what the weather will be like in a week for 7 different cities.

Now that we have our `Markov` iterator, we can try to predict what the weather will be like seven days from now.

Given each city's current weather in the dictionary `city_weather` (see below), simulate what the weather will be like 7 days from now. Rather than producing just one prediction per city, simulate 100 such predictions per city and store the most commonly occurring prediction.

In your submission, print a dictionary `city_weather_predictions` that has each city as a key and the most commonly predicted weather as the corresponding value.

**Note**: Don't worry if your values don't seem to make intuitive sense. We made up the weather probabilities.
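Since the chain is small, the simulation can be cross-checked exactly: under the Markov assumption, the distribution of the weather $k$ days after a day in state $i$ is row $i$ of $P^k$. The sketch below computes this with `numpy.linalg.matrix_power`; the helpers `forecast` and `most_likely` are hypothetical, and the two-state matrix is illustrative rather than taken from `weather.csv`.

```python
import numpy as np

def forecast(P, start, k, states):
    """Exact distribution of the weather k days after a day with weather `start`."""
    P = np.asarray(P, dtype=float)
    i = states.index(start)
    row = np.linalg.matrix_power(P, k)[i]  # row i of P^k
    return dict(zip(states, row.tolist()))

def most_likely(P, start, k, states):
    """Most probable weather k days ahead (what the simulated mode should approach)."""
    dist = forecast(P, start, k, states)
    return max(dist, key=dist.get)

# Illustrative two-state chain (not weather.csv)
toy_states = ["sunny", "rainy"]
toy_P = [[0.9, 0.1],
         [0.5, 0.5]]
print(forecast(toy_P, "rainy", 7, toy_states))
print(most_likely(toy_P, "rainy", 7, toy_states))  # sunny
```

Once `data` is loaded, `most_likely(data, "rainy", 7, ["sunny", "cloudy", "rainy", "snowy", "windy", "hailing"])` gives the exact 7-day answer for a city that starts rainy. With only 100 simulations per city, the simulated mode may differ from it when two outcomes have similar probabilities.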

In [17]:
city_weather = {
    'New York': 'rainy',
    'Chicago': 'snowy',
    'Seattle': 'rainy',
    'Boston': 'hailing',
    'Miami': 'windy',
    'Los Angeles': 'cloudy',
    'San Fransisco': 'windy'
}

In [18]:
from collections import Counter
total_predictions = {}
nPredictions = 7     # days to simulate ahead
nSimulations = 100   # independent simulated weeks per city

m1 = Markov()
m1.load_data(data)

for city, weather in city_weather.items():
    m1.set_current(weather)
    simulations = []
    for i in range(nSimulations):
        predictor = iter(m1)
        next(predictor)  # discard day 0, which is just today's weather
        predictions = []
        for j in range(nPredictions):
            predictions.append(next(predictor))
        simulations.append(predictions)
    total_predictions[city] = simulations

# For each city, keep the most common weather on each of days 1-7;
# the last entry is the prediction for 7 days from now
city_weather_predictions = {}
for city, simulations in total_predictions.items():
    most_common = []
    for i in range(nPredictions):
        each_day = [sim[i] for sim in simulations]
        most_common.append(Counter(each_day).most_common(1)[0][0])
    city_weather_predictions[city] = most_common

In [19]:
city_weather_predictions

{'Boston': ['rainy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'sunny'],
 'Chicago': ['snowy', 'cloudy', 'sunny', 'sunny', 'sunny', 'sunny', 'cloudy'],
 'Los Angeles': ['sunny',
  'cloudy',
  'sunny',
  'cloudy',
  'sunny',
  'cloudy',
  'sunny'],
 'Miami': ['cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'sunny',
  'cloudy'],
 'New York': ['rainy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy'],
 'San Fransisco': ['windy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy'],
 'Seattle': ['rainy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy',
  'cloudy']}
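The cells above store the most common weather for every day from 1 to 7, whereas the Part 2 instructions ask for a single value per city. Because the last entry of each list is the mode for day 7, one hedged way to produce exactly one prediction per city is sketched below; `day_k_mode` is a hypothetical helper, and the three hand-written paths are toy data, not simulation output.

```python
from collections import Counter

def day_k_mode(simulations, k=7):
    """Most common weather on day k across simulated paths (day 1 is index 0)."""
    return Counter(path[k - 1] for path in simulations).most_common(1)[0][0]

# Toy input: three simulated weeks for one city
toy_simulations = [
    ["rainy", "cloudy", "cloudy", "sunny", "sunny", "cloudy", "sunny"],
    ["snowy", "snowy", "cloudy", "cloudy", "rainy", "rainy", "cloudy"],
    ["rainy", "rainy", "sunny", "sunny", "sunny", "cloudy", "sunny"],
]
print(day_k_mode(toy_simulations))  # sunny

# With the notebook's variables, either of these yields one value per city:
# {city: day_k_mode(sims) for city, sims in total_predictions.items()}
# {city: days[-1] for city, days in city_weather_predictions.items()}
```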