# Homework 8
## Due Date:  Tuesday, October 31st at 11:59 PM

## Problem 1: BST Traversal
This problem builds on Problem 1 of Homework 7 in which you wrote a binary search tree.

### Part 1

As discussed in lecture, there are three ways to perform a depth-first traversal: preorder, inorder, and postorder. Here is a reference: [Tree Traversal](https://en.wikipedia.org/wiki/Tree_traversal#Depth-first_search).
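
To make the three orders concrete, here is a minimal sketch using recursive generators. The `Node` class and the `preorder`, `inorder`, and `postorder` helper names are hypothetical and exist only for this illustration; they are not the interface this assignment asks for.

```python
class Node:
    """Hypothetical minimal tree node, used only for this illustration."""

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def preorder(node):
    """Visit the node first, then its left subtree, then its right subtree."""
    if node is not None:
        yield node.val
        yield from preorder(node.left)
        yield from preorder(node.right)


def inorder(node):
    """Visit the left subtree, then the node, then the right subtree."""
    if node is not None:
        yield from inorder(node.left)
        yield node.val
        yield from inorder(node.right)


def postorder(node):
    """Visit the left subtree, then the right subtree, then the node."""
    if node is not None:
        yield from postorder(node.left)
        yield from postorder(node.right)
        yield node.val


# A small search tree: 3 at the root, 2 on the left, 9 on the right,
# and 11 as the right child of 9.
root = Node(3, Node(2), Node(9, None, Node(11)))

print(list(preorder(root)))   # [3, 2, 9, 11]
print(list(inorder(root)))    # [2, 3, 9, 11]
print(list(postorder(root)))  # [2, 11, 9, 3]
```

Note that an inorder traversal of a binary search tree yields the values in sorted order.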

Write an iterator class called `DFSTraversal` with the following specifications:

* `__init__(self, tree, traversalType)`: Constructor takes a `BinaryTree` object and one of the members of the `DFSTraversalTypes` enum:

```python
from enum import Enum

class DFSTraversalTypes(Enum):
    PREORDER = 1
    INORDER = 2
    POSTORDER = 3
```

* `changeTraversalType(self, traversalType)`: Change the traversal type
* `__iter__(self)`: Initializes the iteration and returns the iterator object
* `__next__(self)`: Returns the next value in the traversal and raises `StopIteration` once every value has been visited

Here's how you might use your `DFSTraversal` class:

```python
input_array = [3, 9, 2, 11]
bt = BinaryTree()
for val in input_array:
    bt.insert(val)
traversal = DFSTraversal(bt, DFSTraversalTypes.INORDER)
for val in traversal:
    print(val)
# Output:
# 2
# 3
# 9
# 11
```
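
One possible shape for such a class is sketched below. It is not a reference solution: the `BinaryTree` and `_Node` classes here are hypothetical stand-ins for your Homework 7 code, and the sketch assumes the tree exposes a `root` attribute whose nodes have `val`, `left`, and `right` attributes. Adapt the attribute names to your own implementation.

```python
from enum import Enum


class DFSTraversalTypes(Enum):
    PREORDER = 1
    INORDER = 2
    POSTORDER = 3


class _Node:
    """Hypothetical node type; your Homework 7 node may look different."""

    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None


class BinaryTree:
    """Hypothetical stand-in for the Homework 7 binary search tree."""

    def __init__(self):
        self.root = None

    def insert(self, val):
        if self.root is None:
            self.root = _Node(val)
            return
        node = self.root
        while True:
            # Smaller values go left, everything else goes right.
            side = 'left' if val < node.val else 'right'
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(val))
                return
            node = child


class DFSTraversal:
    def __init__(self, tree, traversalType):
        self.tree = tree
        self.traversalType = traversalType
        self._values = iter(())  # Empty until __iter__ is called.

    def changeTraversalType(self, traversalType):
        self.traversalType = traversalType

    def _walk(self, node, order):
        """Generator that yields values of the subtree at node in order."""
        if node is None:
            return
        if order is DFSTraversalTypes.PREORDER:
            yield node.val
        yield from self._walk(node.left, order)
        if order is DFSTraversalTypes.INORDER:
            yield node.val
        yield from self._walk(node.right, order)
        if order is DFSTraversalTypes.POSTORDER:
            yield node.val

    def __iter__(self):
        # Restart the traversal from the root with the current type.
        self._values = self._walk(self.tree.root, self.traversalType)
        return self

    def __next__(self):
        # next() raises StopIteration when the generator is exhausted,
        # which is exactly what the iterator protocol expects.
        return next(self._values)
```

Delegating to a generator keeps `__next__` short; an explicit stack would also work and avoids recursion on very deep trees.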

### Part 2
Put your `BinaryTree` class (from Homework 7) and your `DFSTraversal` class (from Part 1 of this homework) in a file titled `TreeTraversal.py`.

---

## Problem 2: Markov Chains

[Markov Chains](https://en.wikipedia.org/wiki/Markov_chain) are widely used to model and predict discrete events. Underlying Markov chains are Markov processes, which assume that the outcome of a future event depends only on the event immediately preceding it. In this exercise, we will assume that weather has the Markov property (i.e., today's weather depends only on yesterday's weather). We will use the Markov assumption to create a basic model for predicting weather.

To begin, let's categorize weather into 6 types: ['sunny', 'cloudy', 'rainy', 'snowy', 'windy', 'hailing'].

In the `weather.csv` file accompanying this homework, each row corresponds to one type of weather (in the order given above) and each column is the probability of one type of weather occurring the following day (also in the order given above).

The $(i, j)$th element is the probability that the $j$th weather type occurs after the $i$th weather type. Indices in this description are 1-based, so for example, (1,2) is the probability that a cloudy day occurs after a sunny day.

Take a look at the data. Make sure you see that if the previous day was sunny, the following day has a 0.4 probability of being sunny as well. If the previous day was rainy (index $i = 3$), then the following day (index $j$) has a 0.05 probability of being windy ($j = 5$).
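
The two lookups above can be checked directly. The sketch below hard-codes the matrix (values copied from the `weather.csv` data printed in Part 1) instead of reading the file; the names `P` and `weather` are illustrative. Because NumPy indexing is 0-based, the 1-based entry $(i, j)$ in the text corresponds to `P[i - 1, j - 1]`.

```python
import numpy as np

# Transition matrix copied from the weather.csv data shown in Part 1.
P = np.array([
    [0.40, 0.30, 0.10, 0.05, 0.10, 0.05],
    [0.30, 0.40, 0.10, 0.10, 0.08, 0.02],
    [0.20, 0.30, 0.35, 0.05, 0.05, 0.05],
    [0.10, 0.20, 0.25, 0.30, 0.10, 0.05],
    [0.15, 0.20, 0.10, 0.15, 0.30, 0.10],
    [0.10, 0.20, 0.35, 0.10, 0.05, 0.20],
])
weather = ['sunny', 'cloudy', 'rainy', 'snowy', 'windy', 'hailing']

# sunny -> sunny is entry (1, 1) in the text, i.e. P[0, 0].
print(P[weather.index('sunny'), weather.index('sunny')])  # 0.4

# rainy -> windy is entry (3, 5) in the text, i.e. P[2, 4].
print(P[weather.index('rainy'), weather.index('windy')])  # 0.05

# Each row is a probability distribution over tomorrow's weather,
# so every row should sum to 1.
print(np.allclose(P.sum(axis=1), 1.0))  # True
```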

### Part 1:  Parse the `.csv` file into a NumPy array

```python
import numpy as np
data = np.genfromtxt(r'weather.csv', delimiter=',')
print('Weather Data:\n\n{}'.format(data))
```

```
Weather Data:

[[ 0.4   0.3   0.1   0.05  0.1   0.05]
 [ 0.3   0.4   0.1   0.1   0.08  0.02]
 [ 0.2   0.3   0.35  0.05  0.05  0.05]
 [ 0.1   0.2   0.25  0.3   0.1   0.05]
 [ 0.15  0.2   0.1   0.15  0.3   0.1 ]
 [ 0.1   0.2   0.35  0.1   0.05  0.2 ]]
```


### Part 2:  Create a class called `Markov` that has the following methods:

* `load_data(array)`: loads the NumPy 2D array and stores it as an instance attribute.
* `get_prob(previous_day, following_day)`: returns the probability of `following_day` weather given `previous_day` weather. 

**Note:** `previous_day` and `following_day` should be passed in string form (e.g. "sunny"), as opposed to an index (e.g. 0). 




```python
class Markov:
    """Markov process to predict weather.

    Attributes
    ----------
    data : 2D array
        data[i, j] is the probability that the jth weather type occurs
        after the ith weather type.
    weather_types : Dict[str, int]
        Maps string representing weather types to the row (column) number
        it is stored in within self.data. Defaults to class attribute
        DEFAULT_WEATHER_TYPES.
    """
    DEFAULT_WEATHER_TYPES = {'sunny': 0, 'cloudy': 1, 'rainy': 2, 'snowy': 3,
                             'windy': 4, 'hailing': 5}

    def __init__(self):
        self.data = None
        self.weather_types = self.DEFAULT_WEATHER_TYPES

    def load_data(self, array):
        """Ensures array is correctly dimensioned before storing it in data
        attribute.
        """
        data = np.array(array)
        rows, cols = data.shape
        if rows != cols:
            raise ValueError('Data should contain same number of rows as '
                             'columns.')
        if rows != len(self.weather_types):
            raise ValueError(
                'Data dimensions not equal to number of expected weather '
                'types.')
        # Store the converted array (not the raw input) so that 2D indexing
        # in get_prob() also works when a nested list is passed in.
        self.data = data

    def get_prob(self, previous_day, following_day):
        """Returns probability of following_day weather given previous_day
        weather. Both arguments are weather strings, e.g. 'sunny'.
        """
        row = self.weather_types[previous_day]
        col = self.weather_types[following_day]
        return self.data[row, col]


def test_get_prob():
    markov = Markov()
    data = np.genfromtxt(r'weather.csv', delimiter=',')
    markov.load_data(data)
    expected = 0.15
    actual = markov.get_prob('windy', 'snowy')
    np.testing.assert_almost_equal(actual, expected)
    print('get_prob() method works successfully.')


test_get_prob()
```

```
get_prob() method works successfully.
```


---

## Problem 3: Iterators

Iterators are a convenient way to walk along your Markov chain.

#### Part 1: Using your `Markov` class from Problem 2, turn `Markov` into an iterator by implementing the `__iter__()` and `__next__()` methods.

Remember:  
* `__iter__()` should return the iterator object and is implicitly called when the loop begins.
* `__next__()` should return the next value and is implicitly called at each step in the loop.

Each 'next' step should be stochastic (i.e. randomly selected based on the relative probabilities of the following day weather types) and should return the next day's weather as a string (e.g. "sunny") rather than an index (e.g. 0).
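
As an illustration of what "stochastic" means here, the sketch below draws one next-day weather type given that today is rainy. It uses NumPy's `Generator.choice` with the `p` argument, which is one of several ways to sample from a discrete distribution. The names `weather_types`, `rainy_row`, and `rng` are illustrative, the row values are the rainy row of `weather.csv`, and the seed is fixed only to make the sketch repeatable.

```python
import numpy as np

weather_types = ['sunny', 'cloudy', 'rainy', 'snowy', 'windy', 'hailing']

# Next-day probabilities given that today is rainy (third row of the data).
rainy_row = [0.2, 0.3, 0.35, 0.05, 0.05, 0.05]

# Seeded generator, used here only so repeated runs give the same draws.
rng = np.random.default_rng(seed=0)

# Draw a single weather type, weighted by the row's probabilities, and
# convert the NumPy string to a plain Python str.
next_day = str(rng.choice(weather_types, p=rainy_row))
print(next_day)

# Over many draws, the observed frequencies approach the row's values.
draws = rng.choice(weather_types, size=10000, p=rainy_row)
print(np.mean(draws == 'rainy'))  # close to 0.35
```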

```python
class Markov:
    """Predicts weather using day-by-day Markov process.

    Attributes
    ----------
    data : 2D array
        data[i, j] is the probability that the jth weather type occurs
        after the ith weather type.
    weather_types : Dict[str, int]
        Maps string representing weather types to the row (column) number
        it is stored in within self.data. Defaults to class attribute
        DEFAULT_WEATHER_TYPES.
    current_weather : str
        Used when iterating to store the predicted weather. Must be
        initialized before iteration can occur or RuntimeError is raised.
    """
    DEFAULT_WEATHER_TYPES = {'sunny': 0, 'cloudy': 1, 'rainy': 2, 'snowy': 3,
                             'windy': 4, 'hailing': 5}

    def __init__(self):
        self.data = None
        self.weather_types = self.DEFAULT_WEATHER_TYPES
        self.current_weather = None

    def load_data(self, array):
        """Ensures array is correctly dimensioned before storing it in data
        attribute.
        """
        data = np.array(array)
        rows, cols = data.shape
        if rows != cols:
            raise ValueError('Data should contain same number of rows as '
                             'columns.')
        if rows != len(self.weather_types):
            raise ValueError(
                'Data dimensions not equal to number of expected weather '
                'types.')
        # Store the converted array (not the raw input) so that 2D indexing
        # also works when a nested list is passed in.
        self.data = data

    def get_prob(self, previous_day, following_day):
        """Returns probability of following_day weather given previous_day
        weather. Both arguments are weather strings, e.g. 'sunny'.
        """
        row = self.weather_types[previous_day]
        col = self.weather_types[following_day]
        return self.data[row, col]

    def __iter__(self):
        if self.current_weather is None:
            raise RuntimeError('current_weather needs to be set before '
                               'iterating.')
        return self

    def __next__(self):
        """Returns simulated next-day weather.

        Never raises StopIteration, so the caller decides when to stop.
        """
        self.current_weather = self.simulate_next_day(self.current_weather)
        return self.current_weather

    def simulate_next_day(self, current_weather):
        """Returns random next-day weather given current_weather and the
        associated conditional probabilities from self.data.
        """
        # Get relative next-day weather type probabilities.
        row_index = self.weather_types[current_weather]
        next_day_prob = self.data[row_index, :]

        # Get random number from [0, 1) and see where it lies along the
        # next-day probability continuum.
        random_prob = np.random.uniform(0, 1)
        cumulative_prob = 0
        # col represents both the index of our next-day probabilities list as
        # we iterate through the list as well as the corresponding column in
        # self.data.
        for col, prob in enumerate(next_day_prob):
            cumulative_prob += prob
            if random_prob <= cumulative_prob:
                return self.weather_str_for_index(col)
        raise RuntimeError('Failed to simulate next-day weather.')

    def weather_str_for_index(self, index):
        """Returns string representation of weather for row/column index
        using weather_types attribute.
        """
        for key, value in self.weather_types.items():
            if value == index:
                return key
        raise RuntimeError('Unable to map index {} to a weather '
                           'type.'.format(index))
```

#### Part 2: We want to predict what the weather will be like in a week for 7 different cities.

Now that we have our `Markov` iterator, we can try to predict what the weather will be like seven days from now.

Given each city's current weather in the dictionary `city_weather` (see below), simulate what the weather will be like 7 days from now.  Rather than producing just one prediction per city, simulate 100 such predictions per city and store the most commonly occurring prediction.

In your submission, print a dictionary `city_weather_predictions` that has each city as a key and the most commonly predicted weather as the corresponding value.

**Note**: Don't worry if your values don't seem to make intuitive sense.  We made up the weather probabilities.
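
Picking the most common prediction out of a list of simulated outcomes can be done in several ways. One option from the standard library is `collections.Counter`, sketched below on a made-up list; the `simulations` values are illustrative and are not real simulation output.

```python
from collections import Counter

# Made-up simulation results for a single city (illustration only).
simulations = ['cloudy', 'sunny', 'cloudy', 'rainy', 'cloudy', 'sunny']

# most_common(1) returns a one-element list of (item, count) pairs.
counts = Counter(simulations)
prediction, occurrences = counts.most_common(1)[0]

print(prediction, occurrences)  # cloudy 3
```

`Counter` tallies the list in a single pass, whereas calling `lst.count` once per distinct value rescans the list each time. For 100 simulations either approach is fast enough.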

```python
def predict_weather_in_x_days(markov, x):
    """Returns simulated weather in x days by iterating over markov object x
    times.
    """
    if x < 1:
        raise ValueError('x must be >= 1')
    day = 0
    for weather in markov:
        day += 1
        if day == x:
            return weather


def most_common(lst):
    """Returns most common item in list. From stackoverflow.com"""
    return max(set(lst), key=lst.count)


# Init Markov object.
markov = Markov()
data = np.genfromtxt(r'weather.csv', delimiter=',')
markov.load_data(data)

city_weather = {
    'New York': 'rainy',
    'Chicago': 'snowy',
    'Seattle': 'rainy',
    'Boston': 'hailing',
    'Miami': 'windy',
    'Los Angeles': 'cloudy',
    'San Francisco': 'windy'
    }

# Create dict to store simulation results.
city_weather_sim = {}
for city in city_weather:
    city_weather_sim[city] = []

# Run 100 simulations of forecasting weather in 7 days.
fcst_days = 7
for i in range(100):
    for city, init_weather in city_weather.items():
        # Reset the starting weather before each simulated week.
        markov.current_weather = init_weather
        prediction = predict_weather_in_x_days(markov, fcst_days)
        city_weather_sim[city].append(prediction)

# Create dict to store most common forecast for each city.
city_weather_predictions = {}
for city, simulations in city_weather_sim.items():
    city_weather_predictions[city] = most_common(simulations)

print('Weather predictions:')
print(city_weather_predictions)
```

```
Weather predictions:
{'Seattle': 'cloudy', 'Boston': 'cloudy', 'Chicago': 'cloudy', 'Los Angeles': 'cloudy', 'San Francisco': 'cloudy', 'Miami': 'cloudy', 'New York': 'cloudy'}
```
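
It may look suspicious that every city ends up with the same prediction, but this is what the model implies. For a Markov chain with transition matrix $P$, row $i$ of $P^n$ gives the exact distribution of the weather $n$ days after weather type $i$. The sketch below computes $P^7$ with `np.linalg.matrix_power`; the matrix is hard-coded from the `weather.csv` values shown earlier, and the names `P`, `P7`, and `weather` are illustrative.

```python
import numpy as np

# Transition matrix copied from the weather.csv data shown earlier.
P = np.array([
    [0.40, 0.30, 0.10, 0.05, 0.10, 0.05],
    [0.30, 0.40, 0.10, 0.10, 0.08, 0.02],
    [0.20, 0.30, 0.35, 0.05, 0.05, 0.05],
    [0.10, 0.20, 0.25, 0.30, 0.10, 0.05],
    [0.15, 0.20, 0.10, 0.15, 0.30, 0.10],
    [0.10, 0.20, 0.35, 0.10, 0.05, 0.20],
])
weather = ['sunny', 'cloudy', 'rainy', 'snowy', 'windy', 'hailing']

# Row i of P7 is the distribution of the weather 7 days after type i.
P7 = np.linalg.matrix_power(P, 7)

for start, row in zip(weather, P7):
    most_likely = weather[int(np.argmax(row))]
    print('{:8s} -> {:8s} {}'.format(start, most_likely, np.round(row, 3)))
```

After seven steps the rows of $P^7$ are nearly identical, so the starting weather has little influence on the forecast, and the most likely type is the same for every starting point. With only 100 simulations per city, an individual run can still occasionally report a different most common value.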
