# IS602: IPython Part 2  
### Author: Mauricio Alarcon
### November 25, 2015


* Take your solution from Homework 11 and complete the Monte Carlo step (step 6) in parallel. There are many ways you can go about doing this, and I'm not looking for anything too complicated. If you can get multiple processes crunching the data together, that is great. Using IPython’s built-in tools would be a great method.

First, let's pre-process the dataset and obtain the parameters for the simulation:

In [1]:
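import pandas as pd
apple_2011 = pd.read_csv('apple.2011.csv')

apple_2011.columns = ['date', 'last_price', 'pct_change']
# Coerce non-numeric entries to NaN; convert_objects() is no longer available in current pandas.
apple_2011['pct_change'] = pd.to_numeric(apple_2011['pct_change'], errors='coerce')
apple_2011.head()

# Parameters for the simulation: mean and standard deviation of the daily change, and the last price.
apple_mean = apple_2011['pct_change'].mean()
apple_sd = apple_2011['pct_change'].std()
apple_last_price = float(apple_2011['last_price'].iloc[-1])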



In [2]:
(apple_mean,apple_sd,apple_last_price)

(0.0009573552071713143, 0.016520556298411322, 405.0)
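As a quick sanity check on these parameters, the expected price after 20 trading days can be worked out in closed form. This is only a sketch: it assumes independent daily changes (the same assumption the simulation makes), copies the values from the output above, and uses the 20-day horizon that the simulation in this notebook uses.

```python
# Values copied from the output of the previous cell.
apple_mean = 0.0009573552071713143
apple_sd = 0.016520556298411322
apple_last_price = 405.0
days = 20  # horizon used by the simulation in this notebook

# With independent daily changes, E[prod(1 + r_i)] = (1 + mean) ** days.
expected_price = apple_last_price * (1 + apple_mean) ** days
print("Expected price after {} days: {:.2f}".format(days, expected_price))
```

The average of the simulated prices should land close to this figure, which makes it a cheap check that the serial and parallel runs are computing the same thing.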

Let's now define the simulation code that needs to be executed:

In [None]:
simulation_functions = '''
from functools import reduce
import operator
import numpy


# Draw one Gaussian daily percentage change per trading day.
def gaussian_daily_change(mean, sd, days):
    return numpy.random.normal(mean, sd, days)

# Compound the daily changes into a single growth factor.
def cumulative_pct_change(mean, sd, days):
    daily_pct_change = gaussian_daily_change(mean, sd, days)
    total_pct_change = reduce(operator.mul, daily_pct_change + 1, 1)
    return total_pct_change

# Repeat the experiment n times; a list forces every run to execute.
def repeat_price_est_n_times(price, mean, sd, days, n):
    return [float(cumulative_pct_change(mean, sd, days) * price) for _ in range(n)]
'''

simulation_code = simulation_functions + '''

output = repeat_price_est_n_times({}, {}, {}, {}, {})

'''
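For comparison, the same experiment can be sketched in vectorized NumPy, drawing all daily changes at once instead of looping over repetitions. This is not the Homework 11 solution: the function name `simulate_final_prices` and the `seed` parameter are assumptions made for illustration, and the numeric parameters are rounded from the output above.

```python
import numpy as np

def simulate_final_prices(price, mean, sd, days, n, seed=None):
    """Return n simulated prices after `days` Gaussian daily changes."""
    rng = np.random.default_rng(seed)
    # One row per repetition, one column per trading day.
    daily_changes = rng.normal(mean, sd, size=(n, days))
    # Compound the daily changes along each row.
    growth = np.prod(1.0 + daily_changes, axis=1)
    return price * growth

prices = simulate_final_prices(405.0, 0.00095736, 0.01652056, days=20, n=4000, seed=0)
print(prices.mean(), np.percentile(prices, 1))
```

A vectorized version like this tends to be much faster per repetition, which is worth keeping in mind when judging how much of a speed-up comes from parallelism alone.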

* Compare the timing for your solution in homework 11 and this parallel solution. This is similar to what you did in homeworks 6 and 7. Ideally, you'll see some speed improvement. The amount you see will largely be based on the capabilities of your hardware, and less on the software implementation. There is additional overhead for running an operation in parallel, so speed gains will be more obvious with a larger number of calculations.

Let's time 4,000 repetitions of the experiment in a single process. With `number=100`, `timeit` reports the total time for 100 such runs:

In [None]:
import timeit
execute_this = simulation_code.format(apple_last_price,apple_mean,apple_sd,20,4000)
timeit.timeit(execute_this, number=100)
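Timing a code string as above re-executes the function definitions on every pass. An alternative, sketched here under assumptions, is to hand `timeit` a callable so that only the simulation itself is measured. The function `one_run` is a hypothetical stand-in for the simulation, with parameters rounded from the output above.

```python
import timeit
import numpy

def one_run(price=405.0, mean=0.00095736, sd=0.01652056, days=20, n=4000):
    # Hypothetical stand-in for the simulation: n compounded paths of `days` steps.
    changes = numpy.random.normal(mean, sd, (n, days))
    return price * (1.0 + changes).prod(axis=1)

# repeat() returns one total per round; the minimum is usually the least noisy figure.
timings = timeit.repeat(one_run, number=10, repeat=3)
print("best of 3: {:.4f} s per 10 runs".format(min(timings)))
```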

Now, let's see how long it takes to run the same 4,000 experiments over a cluster of four parallel processes, at 1,000 experiments per process. Note that the cluster was started by executing `ipcluster start -n 4`:

In [None]:
parallelized_code = '''
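import ipyparallel as ipp

# Connect to the engines started with ipcluster.
clients = ipp.Client()

# Blocking mode: calls wait until the engines have finished.
clients.block = True

print(clients.ids)

# A direct view addresses all engines at once.
dview = clients.direct_view()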

simulation_code = \'\'\'
{}
\'\'\'

dview.execute(simulation_code.format({}, {}, {}, {}, {}))
'''

In [None]:
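# Fill the parallel template: the simulation source first, then the parameters (1,000 runs per engine).
execute_this = parallelized_code.format(simulation_code, apple_last_price, apple_mean, apple_sd, 20, 1000)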
timeit.timeit(execute_this, number=100)
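For readers without an `ipcluster` running, a comparable split can be sketched with the standard library's `concurrent.futures` in place of `ipyparallel`. This is a different technique from the one used in this notebook, shown only to illustrate the idea of dividing the repetitions among processes. All function names here (`split_repetitions`, `simulate_chunk`, `run_parallel`) are hypothetical, and each worker receives its own seed so that the processes do not repeat each other's random draws.

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def split_repetitions(n, workers):
    """Split n repetitions into `workers` chunk sizes that sum to n."""
    base, extra = divmod(n, workers)
    return [base + (1 if i < extra else 0) for i in range(workers)]

def simulate_chunk(args):
    """Run one chunk of repetitions with its own random stream."""
    price, mean, sd, days, n, seed_seq = args
    rng = np.random.default_rng(seed_seq)
    changes = rng.normal(mean, sd, size=(n, days))
    return price * np.prod(1.0 + changes, axis=1)

def run_parallel(price, mean, sd, days, n, workers=4, seed=0):
    """Distribute n repetitions over `workers` processes and merge the results."""
    sizes = split_repetitions(n, workers)
    # Independent child seeds keep the workers from sharing a random stream.
    seeds = np.random.SeedSequence(seed).spawn(workers)
    tasks = [(price, mean, sd, days, size, s) for size, s in zip(sizes, seeds)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(simulate_chunk, tasks))
    return np.concatenate(chunks)
```

A call such as `run_parallel(405.0, apple_mean, apple_sd, 20, 4000)` would give each of the four workers 1,000 repetitions. In a script, that call belongs under an `if __name__ == "__main__":` guard, and on platforms that spawn rather than fork worker processes the functions may need to live in an importable module.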

**As we can see from the results, spreading the load over 4 processes results in a roughly proportional reduction in execution time.**