# Data 602: Homework 12
#### Author: Aaron Grzasko
#### Date: 5/8/2017

# Assignment Overview
This homework will get your feet wet with some basic parallel computing approaches.  
  
 Do both of the following:

# Part 1

(1) Take your solution from Homework 11 and complete the Monte Carlo step (step 6) in parallel.  There are many ways you can go about doing this, and I'm not looking for anything too complicated.  If you can get multiple processes crunching the data together, that is great.  Using IPython's built-in tools would be a great method.

*Note that the following command was entered in the terminal to start four engines before launching the Jupyter notebook:*  

ipcluster start -n 4

*Let's import the relevant modules and run setup scripts for parallel processing:*

In [1]:
# set up for parallel processing
from ipyparallel import Client
clients = Client()
clients.block = True
dview = clients.direct_view()

# import modules
import pandas as pd
import numpy as np  # numpy is also used on each engine

*Now, read in the data:*

In [2]:
# read apple stock data
apple_url = 'https://raw.githubusercontent.com/spitakiss/DATA602_Work/master/Homework11/apple.2011.csv'
apple = pd.read_csv(apple_url, na_values="XXXXX")

# rename third column 
apple.rename(columns={'Unnamed: 2':'pchg'}, inplace=True)

*Initialize the relevant constants, parameters, and functions from the Homework 11 Monte Carlo simulation:*  

In [3]:
# mean daily return
mean_ret = apple['pchg'].mean()

# std deviation of daily return
std_ret = apple['pchg'].std()

# starting price: the last value in the "Last" column, as a scalar
p0 = float(apple["Last"].iloc[-1])

# stochastically generated stock price after n days
# inputs: initial price, number of days, mean, std
def price_gen(beg_price, days, mean, std):
    return beg_price * np.prod(np.random.normal(loc=mean,scale=std,size=days)+1)

# Simulate stock price generation x times (e.g. 10,000)
# inputs: number of simulations, price generating function passed as string
def sim_stock(num_sims,price_func):
    sim_result = []
    for i in range(num_sims):
        sim_result.append(round(eval(price_func),2))
        
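    return sim_result

# make numpy, the constants, and price_gen available in each engine's namespace,
# because eval() inside sim_stock resolves these names on the engine
dview.execute('import numpy as np')
dview.push(dict(p0=p0, mean_ret=mean_ret, std_ret=std_ret, price_gen=price_gen))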

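*For reference, `price_gen` computes an ending price of the form P_n = P_0 · ∏(1 + r_i), with each daily return r_i drawn from a normal distribution. The sketch below is not the Homework 11 solution: it is a vectorized variant that draws every return at once instead of calling `eval` in a loop. The name `sim_stock_vectorized`, the `seed` argument, and the inputs in the usage line (100.0, 0.001, 0.02) are illustrative assumptions, not values taken from the Apple data.*

```python
import numpy as np

def sim_stock_vectorized(num_sims, beg_price, days, mean, std, seed=None):
    """Simulate num_sims ending prices after `days` days (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # one row per simulation, one column per simulated daily return
    daily_returns = rng.normal(loc=mean, scale=std, size=(num_sims, days))
    # compound the returns along each row, then scale by the starting price
    end_prices = beg_price * np.prod(1.0 + daily_returns, axis=1)
    return np.round(end_prices, 2)

# placeholder inputs, NOT the values estimated from the Apple data
prices = sim_stock_vectorized(10000, 100.0, 20, 0.001, 0.02, seed=0)
```

*With the notebook's own values, the call would presumably look like `sim_stock_vectorized(10000, p0, 20, mean_ret, std_ret)`.*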

*Run the Monte Carlo simulation 10,000 times, with each engine processing 2,500 simulations.*  

In [4]:
# distribute 10,000 simulations evenly among four engines (i.e. 2,500 each) 
my_sims = dview.apply_sync(sim_stock,2500,"price_gen(p0,20,mean_ret,std_ret)")

# flatten list of lists into one list
my_sims = [item for sublist in my_sims for item in sublist]

# verify that a total of 10,000 simulations were run
print(len(my_sims))

10000

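*The split above works out evenly because 10,000 is divisible by four. As a minimal sketch of the more general case, the helpers below divide any number of simulations as evenly as possible among the engines and then flatten the per-engine results. The names `split_evenly` and `flatten` are hypothetical and do not come from ipyparallel.*

```python
from itertools import chain

def split_evenly(total, parts):
    """Return `parts` counts that sum to `total` and differ by at most one."""
    base, extra = divmod(total, parts)
    # the first `extra` engines each take one additional simulation
    return [base + 1 if i < extra else base for i in range(parts)]

def flatten(list_of_lists):
    """Concatenate the per-engine result lists into one list."""
    return list(chain.from_iterable(list_of_lists))

counts = split_evenly(10000, 4)         # per-engine simulation counts
combined = flatten([[1.0, 2.0], [3.0]])  # toy per-engine results
```

*Each engine could then be handed its own count, for example by looping over `counts` and submitting one task per engine.*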

# Part 2

(2) Compare the timing for your solution in homework 11 and this parallel solution.  This is similar to what you did in homeworks 6 and 7.  Ideally, you'll see some speed improvement.  The amount you see will largely be based on the capabilities of your hardware, and less on the software implementation.  There is additional overhead for running an operation in parallel, so speed gains will be more obvious with a larger number of calculations.

*Let's time ten loops of the non-parallel and parallel solutions, respectively.*

In [5]:
%%timeit -n 10

# stock price simulation, non-parallel solution. Run 10,000 simulations
my_sims = sim_stock(10000,"price_gen(p0,20,mean_ret,std_ret)")

10 loops, best of 3: 302 ms per loop


In [6]:
%%timeit -n 10

# stock price simulation, parallel solution.  Run 2,500 sims per engine
my_sims = dview.apply_sync(sim_stock,2500,"price_gen(p0,20,mean_ret,std_ret)")

# flatten results into one list
my_sims = [item for sublist in my_sims for item in sublist]


10 loops, best of 3: 120 ms per loop
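*To put the two timings side by side, the short sketch below computes the speedup and the parallel efficiency from the figures reported above (302 ms and 120 ms, four engines). The helper name `speedup_and_efficiency` is hypothetical. These are single "best of 3" measurements, so the ratios should be read as rough estimates that will vary from run to run and machine to machine.*

```python
def speedup_and_efficiency(serial_time, parallel_time, n_workers):
    """Return (speedup, efficiency) for one serial and one parallel timing."""
    speedup = serial_time / parallel_time
    # efficiency of 1.0 would mean a perfect n_workers-fold speedup
    efficiency = speedup / n_workers
    return speedup, efficiency

# timings reported by %%timeit above, in milliseconds
speedup, efficiency = speedup_and_efficiency(302.0, 120.0, 4)
print("speedup: %.2fx, efficiency: %.0f%%" % (speedup, 100 * efficiency))
```

*A speedup of roughly 2.5x on four engines, rather than 4x, is consistent with the communication overhead mentioned in the assignment text.*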
