# Parallel Python

## Question 1

Answer: If I ran first_process.py, A1 would print, followed by B1, B2, and B3. This is because, as the documentation states, os.execvp will "execute a new program, replacing the current process." Since the new program replaces the first process and os.execvp never returns, no code after the call runs, so A2 would never print from the first process.
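The replacement behavior can be checked with a small experiment. The sketch below does not use the actual first_process.py (which is not reproduced here); `first_script` is a hypothetical stand-in that prints A1, calls os.execvp, and then has a line that should never run. It assumes a POSIX system, where the new program takes over the same process.

```python
import subprocess
import sys
import textwrap

# Hypothetical stand-in for first_process.py: prints A1, then replaces itself
# with a second program that prints B1.
first_script = textwrap.dedent("""
    import os, sys
    print("A1", flush=True)   # flush before exec, or the buffered text is lost
    os.execvp(sys.executable, [sys.executable, "-c", "print('B1')"])
    print("A2")               # never reached: execvp does not return
""")

# Run the stand-in script in a child interpreter and capture what it prints.
completed = subprocess.run(
    [sys.executable, "-c", first_script],
    capture_output=True, text=True, check=True,
)
printed = completed.stdout.split()
print(printed)
```

The `flush=True` matters: text still sitting in the first program's output buffer is discarded when the process image is replaced.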

## Question 2

Answer: The second script will run slower than the first script in part because of Python's Global Interpreter Lock (GIL), a mutex that allows only one thread at a time to execute Python bytecode within a process. Because counting down is CPU-bound work, the four threads cannot run in parallel and instead take turns holding the lock. Each additional thread in the second script also adds overhead for creating the thread and for switching between threads as they contend for the GIL, which slows down the execution of the script.

The gap between the two scripts also becomes more pronounced with increasing values of COUNT. Specifically, the second script involves taking COUNT, dividing it by 4, and having each of the four threads count down its share. The division itself is not the cause: it is performed once per thread, and for integers of this size its cost does not grow meaningfully with COUNT, so it is negligible next to the countdown loop. (For background on why division is a comparatively expensive operation in general, I referenced StackOverflow: "Division is an iterative algorithm where the result from the quotient must be shifted to the remainder using a Euclidean measure, whereas, multiplication can be reduced to a (fixed) series of bit manipulation tricks.") What grows with COUNT is the time the threads spend handing the GIL back and forth, because that overhead is paid repeatedly for as long as the countdown runs.
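A rough way to see the effect is to time both variants side by side. This is a minimal sketch, assuming the scripts follow the usual countdown pattern described above; the `countdown` function and the value of `COUNT` are illustrative, and the timings depend on the machine and on the interpreter (a free-threaded build without the GIL would behave differently).

```python
import threading
import time

def countdown(n):
    # Pure-Python, CPU-bound loop: the thread holds the GIL while it runs
    while n > 0:
        n -= 1

COUNT = 1_000_000

# Variant 1: one thread counts down the whole range
start = time.perf_counter()
countdown(COUNT)
sequential_seconds = time.perf_counter() - start

# Variant 2: four threads each count down a quarter of the range
threads = [threading.Thread(target=countdown, args=(COUNT // 4,)) for _ in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s, four threads: {threaded_seconds:.3f}s")
```

On a standard CPython build the threaded variant is typically no faster than the sequential one, since only one thread can execute bytecode at a time.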




## Question 3


### 3A
Answer: I would choose pool.map for this scenario because it returns the results in the same order as the inputs. We also want pool.map instead of pool.map_async because it blocks until all of the results are ready, so the next lines of code do not run before the results exist. Further, pool.map is the best option here because it applies a function that takes a single argument to each item, with everything else in the function held constant.
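As a small illustration of the single-argument pattern, the sketch below fixes the constant argument with `functools.partial` and lets `map` vary the remaining one. The `scale` function is hypothetical, and `multiprocessing.pool.ThreadPool` is used only as a stand-in so the snippet runs without a `__main__` guard; it exposes the same `map` method as `multiprocessing.Pool`.

```python
from functools import partial
from multiprocessing.pool import ThreadPool

def scale(value, factor):
    # Hypothetical worker: only `value` varies between calls
    return value * factor

with ThreadPool(3) as pool:
    # partial() freezes the constant argument, leaving one argument for map();
    # map() blocks here until every result is ready
    doubled = pool.map(partial(scale, factor=2), [1, 2, 3, 4])

print(doubled)  # [2, 4, 6, 8], in input order
```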


### 3B
Answer: I would choose pool.starmap_async because it allows for multiple arguments. Further, it does not block, and its results come back in the same order as the inputs once they are retrieved. I would not choose pool.map_async instead of pool.starmap_async: it is also non-blocking and also returns ordered results, but it passes only a single argument to the function, so it does not meet the multiple-argument requirement.
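The non-blocking, multi-argument behavior can be sketched as follows. The `power` function and its inputs are made up for illustration, and a thread-backed pool from `multiprocessing.pool` again stands in for `multiprocessing.Pool`, whose `starmap_async` method has the same interface.

```python
from multiprocessing.pool import ThreadPool

def power(base, exponent):
    # Hypothetical worker that needs two arguments per call
    return base ** exponent

with ThreadPool(3) as pool:
    # Each tuple is unpacked into the function's arguments
    pending = pool.starmap_async(power, [(2, 3), (3, 2), (10, 0)])
    # starmap_async returns immediately, so other work could happen here
    results = pending.get()   # blocks only now, until all results are ready

print(results)  # [8, 9, 1], in input order
```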

## Question 4

import time
import multiprocessing as mp

### 1)
def adjusted_salary(current_salary, percent_increase, years): 
    resulting_salary = current_salary * (1 + percent_increase)**years
    return resulting_salary
      
### 2)
def calc_pension():
    income_dict = {}
    current_income = 50000

    for age in range(36,76):
        income_dict[age] = adjusted_salary(current_income,.04,1)
        current_income = income_dict[age]

    base_pension_income = {}
    retirement_pension = {}

    for retirement_age in range(45,76):
        # Base pension income is the average salary over the seven years
        # up to and including the retirement age
        _sum = 0
        for step in range(retirement_age,retirement_age-7, -1):
            _sum += income_dict[step]
        base_pension_income[retirement_age] = _sum/7
### 3)
        for death_age in range(retirement_age,91):
            if death_age == retirement_age:
                total_pension_income = 0
                print("retirement age: ", retirement_age, " death age ", death_age, " total pension is ZERO because retirement and death year are same")
            elif (retirement_age >=45 and retirement_age<55) and (death_age >= retirement_age+1 and death_age>55):
                new_current_income = base_pension_income[retirement_age]
                total_pension_income = .7*adjusted_salary(new_current_income, .02, death_age-55)
                print("retirement age: ", retirement_age, " death age ", death_age, " pension income ", total_pension_income)
            elif death_age <=55:
                print("retirement age: ", retirement_age, " death age ", death_age, "pension income is 0 because of death at or before 55")
            elif retirement_age >=55 and death_age >= retirement_age+1:
                new_current_income = base_pension_income[retirement_age]
                total_pension_income = .7*adjusted_salary(new_current_income, .02, death_age-retirement_age)
                print("retirement age: ", retirement_age, " death age ", death_age, " pension income ", total_pension_income)

            time.sleep(.1)

### 4)
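# The __main__ guard keeps worker processes from re-running this block on
# platforms that start workers by re-importing the script
if __name__ == "__main__":
    start_time = time.time()
    p = mp.Pool(3)
    # apply() runs calc_pension once in a single worker and blocks until it finishes;
    # calc_pension prints its own results and returns None, so nothing is printed here
    p.apply(calc_pension)
    p.close()
    p.join()
    print("run time:" + str(time.time() - start_time))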

### 5) mp.Pool lets me pass in the number of worker processes to create, and I passed
### in 3 to get three workers.  Note that p.apply takes no chunk size and sends the
### single calc_pension call to one worker, so the loop above is not actually divided
### among the three processes.  Balancing the workload would require submitting one
### task per retirement age, for example with p.map and chunksize=1.


retirement age:  45  death age  45  total pension is ZERO because retirement and death year are same
retirement age:  45  death age  46 pension income is 0 because of death at or before 55
retirement age:  45  death age  47 pension income is 0 because of death at or before 55
retirement age:  45  death age  48 pension income is 0 because of death at or before 55
retirement age:  45  death age  49 pension income is 0 because of death at or before 55
retirement age:  45  death age  50 pension income is 0 because of death at or before 55
retirement age:  45  death age  51 pension income is 0 because of death at or before 55
retirement age:  45  death age  52 pension income is 0 because of death at or before 55
retirement age:  45  death age  53 pension income is 0 because of death at or before 55
retirement age:  45  death age  54 pension income is 0 because of death at or before 55
retirement age:  45  death age  55 pension income is 0 because of death at or before 55
...

retirement age:  47  death age  65  pension income  60912.31140407446
retirement age:  47  death age  66  pension income  62130.55763215596
retirement age:  47  death age  67  pension income  63373.16878479907
retirement age:  47  death age  68  pension income  64640.63216049506
retirement age:  47  death age  69  pension income  65933.44480370496
retirement age:  47  death age  70  pension income  67252.11369977906
retirement age:  47  death age  71  pension income  68597.15597377464
retirement age:  47  death age  72  pension income  69969.09909325014
retirement age:  47  death age  73  pension income  71368.48107511515
retirement age:  47  death age  74  pension income  72795.85069661745
retirement age:  47  death age  75  pension income  74251.7677105498
retirement age:  47  death age  76  pension income  75736.8030647608
retirement age:  47  death age  77  pension income  77251.53912605601
retirement age:  47  death age  78  pension income  78796.56990857713
...

retirement age:  50  death age  52 pension income is 0 because of death at or before 55
retirement age:  50  death age  53 pension income is 0 because of death at or before 55
retirement age:  50  death age  54 pension income is 0 because of death at or before 55
retirement age:  50  death age  55 pension income is 0 because of death at or before 55
retirement age:  50  death age  56  pension income  57332.85274647776
retirement age:  50  death age  57  pension income  58479.509801407316
retirement age:  50  death age  58  pension income  59649.09999743547
retirement age:  50  death age  59  pension income  60842.08199738417
retirement age:  50  death age  60  pension income  62058.92363733186
retirement age:  50  death age  61  pension income  63300.10211007849
retirement age:  50  death age  62  pension income  64566.104152280066
retirement age:  50  death age  63  pension income  65857.42623532566
retirement age:  50  death age  64  pension income  67174.57476003219
...

retirement age:  52  death age  87  pension income  114571.22457240037
retirement age:  52  death age  88  pension income  116862.6490638484
retirement age:  52  death age  89  pension income  119199.90204512535
retirement age:  52  death age  90  pension income  121583.90008602788
retirement age:  53  death age  53  total pension is ZERO because retirement and death year are same
retirement age:  53  death age  54 pension income is 0 because of death at or before 55
retirement age:  53  death age  55 pension income is 0 because of death at or before 55
retirement age:  53  death age  56  pension income  64491.66207181399
retirement age:  53  death age  57  pension income  65781.49531325027
retirement age:  53  death age  58  pension income  67097.12521951529
retirement age:  53  death age  59  pension income  68439.06772390558
retirement age:  53  death age  60  pension income  69807.84907838369
retirement age:  53  death age  61  pension income  71204.00605995137
...

retirement age:  56  death age  56  total pension is ZERO because retirement and death year are same
retirement age:  56  death age  57  pension income  72544.34896474896
retirement age:  56  death age  58  pension income  73995.23594404393
retirement age:  56  death age  59  pension income  75475.14066292482
retirement age:  56  death age  60  pension income  76984.64347618331
retirement age:  56  death age  61  pension income  78524.33634570698
retirement age:  56  death age  62  pension income  80094.82307262112
retirement age:  56  death age  63  pension income  81696.71953407355
retirement age:  56  death age  64  pension income  83330.65392475502
retirement age:  56  death age  65  pension income  84997.26700325013
retirement age:  56  death age  66  pension income  86697.21234331514
retirement age:  56  death age  67  pension income  88431.15659018143
retirement age:  56  death age  68  pension income  90199.77972198506
...

retirement age:  59  death age  72  pension income  103491.73471753486
retirement age:  59  death age  73  pension income  105561.56941188556
retirement age:  59  death age  74  pension income  107672.80080012328
retirement age:  59  death age  75  pension income  109826.25681612574
retirement age:  59  death age  76  pension income  112022.78195244826
retirement age:  59  death age  77  pension income  114263.23759149722
retirement age:  59  death age  78  pension income  116548.50234332717
retirement age:  59  death age  79  pension income  118879.47239019373
retirement age:  59  death age  80  pension income  121257.06183799758
retirement age:  59  death age  81  pension income  123682.20307475755
retirement age:  59  death age  82  pension income  126155.8471362527
retirement age:  59  death age  83  pension income  128678.96407897775
retirement age:  59  death age  84  pension income  131252.54336055732
retirement age:  59  death age  85  pension income  133877.59422776848
...

retirement age:  63  death age  69  pension income  105399.32303353417
retirement age:  63  death age  70  pension income  107507.30949420485
retirement age:  63  death age  71  pension income  109657.45568408897
retirement age:  63  death age  72  pension income  111850.60479777073
retirement age:  63  death age  73  pension income  114087.61689372617
retirement age:  63  death age  74  pension income  116369.36923160068
retirement age:  63  death age  75  pension income  118696.75661623268
retirement age:  63  death age  76  pension income  121070.69174855735
retirement age:  63  death age  77  pension income  123492.1055835285
retirement age:  63  death age  78  pension income  125961.94769519908
retirement age:  63  death age  79  pension income  128481.18664910305
retirement age:  63  death age  80  pension income  131050.81038208512
retirement age:  63  death age  81  pension income  133671.8265897268
retirement age:  63  death age  82  pension income  136345.26312152136
...

retirement age:  67  death age  82  pension income  147357.6627455009
retirement age:  67  death age  83  pension income  150304.8160004109
retirement age:  67  death age  84  pension income  153310.91232041916
retirement age:  67  death age  85  pension income  156377.13056682752
retirement age:  67  death age  86  pension income  159504.67317816408
retirement age:  67  death age  87  pension income  162694.7666417274
retirement age:  67  death age  88  pension income  165948.66197456192
retirement age:  67  death age  89  pension income  169267.63521405315
retirement age:  67  death age  90  pension income  172652.98791833423
retirement age:  68  death age  68  total pension is ZERO because retirement and death year are same
retirement age:  68  death age  69  pension income  116145.83996758012
retirement age:  68  death age  70  pension income  118468.7567669317
retirement age:  68  death age  71  pension income  120838.13190227035
...

retirement age:  73  death age  73  total pension is ZERO because retirement and death year are same
retirement age:  73  death age  74  pension income  141309.17329824233
retirement age:  73  death age  75  pension income  144135.35676420716
retirement age:  73  death age  76  pension income  147018.06389949133
retirement age:  73  death age  77  pension income  149958.42517748114
retirement age:  73  death age  78  pension income  152957.59368103076
retirement age:  73  death age  79  pension income  156016.74555465137
retirement age:  73  death age  80  pension income  159137.08046574442
retirement age:  73  death age  81  pension income  162319.82207505932
retirement age:  73  death age  82  pension income  165566.21851656048
retirement age:  73  death age  83  pension income  168877.54288689172
retirement age:  73  death age  84  pension income  172255.09374462953
retirement age:  73  death age  85  pension income  175700.19561952213
...
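
For part 5 of Question 4, one way to actually spread the work over three processes is to make each retirement age its own task. The following is a minimal sketch under that assumption, not the submitted solution: the helper names `build_income_by_age`, `pension_for_retirement_age`, and `run_parallel` are hypothetical, the pension rules are copied from the branches in calc_pension, and the `time.sleep(.1)` delay is left out for brevity.

```python
import multiprocessing as mp

def adjusted_salary(current_salary, percent_increase, years):
    return current_salary * (1 + percent_increase) ** years

def build_income_by_age():
    """Salary at ages 36..75, growing 4% per year from 50000 (as in calc_pension)."""
    income, current = {}, 50000
    for age in range(36, 76):
        current = adjusted_salary(current, .04, 1)
        income[age] = current
    return income

INCOME_BY_AGE = build_income_by_age()

def pension_for_retirement_age(retirement_age):
    """Return (retirement_age, death_age, pension) rows for one retirement age."""
    # Average salary over the seven years up to and including retirement
    last_seven = range(retirement_age - 6, retirement_age + 1)
    base = sum(INCOME_BY_AGE[age] for age in last_seven) / 7
    rows = []
    for death_age in range(retirement_age, 91):
        if death_age == retirement_age or death_age <= 55:
            pension = 0
        else:
            # Growth at 2% counts from age 55 for early retirees, else from retirement
            start_age = max(retirement_age, 55)
            pension = .7 * adjusted_salary(base, .02, death_age - start_age)
        rows.append((retirement_age, death_age, pension))
    return rows

def run_parallel(n_workers=3):
    """Hand out one retirement age at a time to a pool of worker processes."""
    with mp.Pool(n_workers) as pool:
        # chunksize=1: a worker that finishes early picks up the next age,
        # which helps because early retirement ages produce more rows
        return pool.map(pension_for_retirement_age, range(45, 76), chunksize=1)
```

Calling `run_parallel()` under an `if __name__ == "__main__":` guard returns one list of rows per retirement age, in order from 45 to 75. Returning rows instead of printing inside the workers keeps the output ordered regardless of which process finishes first.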

## Question 5


import pandas as pd
import numpy as np
import time
import multiprocessing as mp

#Referenced https://machinelearningmastery.com/implement-decision-tree-algorithm-scratch-python/
#to inform approach for building decision tree

# Function for splitting a dataset based on an attribute's index
# and the value of that attribute
def split_dataset(feature_index, val, data):
    left_split_group = []
    right_split_group = []
    for row in data:
        if row[feature_index] < val:
            left_split_group.append(row)
        else:
            right_split_group.append(row)
    return left_split_group, right_split_group

# Function for calculating the mean square error of the split dataset
# Using MSE for the splitting criteria of my decision tree implementation
def calc_mean_squared_error(a, b):
    mse = np.square(a - b).mean()
    return mse

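# Function for finding the best threshold for one feature
# Every observed value of the feature is tried as a threshold, and each candidate
# is scored by the size-weighted MSE of the two groups around their own means
def best_split_for_feature(feature_index, data):
    best = None
    for row in data:
        left, right = split_dataset(feature_index, row[feature_index], data)
        if not left or not right:
            continue   # This threshold does not separate the data
        score = 0.0
        for group in (left, right):
            outcomes = np.array([r[-1] for r in group])   # Assuming that the last column in dataset is outcome var
            score += calc_mean_squared_error(outcomes, outcomes.mean()) * len(group) / len(data)
        if best is None or score < best["score"]:
            best = {"index": feature_index, "value": row[feature_index],
                    "score": score, "groups": (left, right)}
    return best

# Function for evaluating the optimal place to split the dataset
# Each feature is scored in a separate pool task, so the search runs in parallel
def evaluate_split(data, pool):
    n_features = len(data[0]) - 1
    tasks = [(feature_index, data) for feature_index in range(n_features)]
    candidates = [c for c in pool.starmap(best_split_for_feature, tasks) if c is not None]
    if not candidates:
        return None   # No threshold separates the data
    return min(candidates, key=lambda c: c["score"])

def stopping_criterion(data, depth, max_depth, min_size=2):
    return depth >= max_depth or len(data) < min_size

# A leaf predicts the mean outcome of the rows that reach it
def leaf_value(data):
    return float(np.mean([row[-1] for row in data]))

def tree_builder(data, pool, max_depth, depth=0):
    if stopping_criterion(data, depth, max_depth):
        return leaf_value(data)
    split = evaluate_split(data, pool)   # multiprocessing happens here, at the point where each split occurs
    if split is None:
        return leaf_value(data)
    left, right = split["groups"]
    # recursively split each side
    return {"index": split["index"], "value": split["value"],
            "left": tree_builder(left, pool, max_depth, depth + 1),
            "right": tree_builder(right, pool, max_depth, depth + 1)}

#Note: this is a minimal sketch of the approach I tried rather than a tuned implementation.
#Create the pool once under `if __name__ == "__main__":`, for example
#`with mp.Pool(3) as pool: tree = tree_builder(data, pool, max_depth=3)`,
#where data is a list of rows whose last column is the outcome.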


## Question 5 Response

### You may realize that there are multiple approaches to parallelizing a single decision tree. 
#### Consider and state (3-5 sentences) a few of the tradeoffs of your approach.

I chose to approach my decision tree implementation by parallelizing the split operations in the recursive building of the tree, where the function calculates the optimal feature and threshold at which to split each node. To find the optimal split point, my code calculates the mean squared error of the groups produced by each candidate split at a node and evaluates those candidates in parallel, keeping the split with the lowest error. There are some tradeoffs associated with my approach. For example, there may be load-balance issues, because parallel tasks of different complexity finish at different times and leave some processes idle. Another tradeoff is relatively high communication overhead: work is handed to separate processes at every node regardless of the node's size, so for small nodes near the leaves the cost of sending data between processes can outweigh the time saved.
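
To make the splitting criterion concrete, here is a small worked example of scoring candidate thresholds by mean squared error. It is a sketch on made-up toy data with a single feature, and the helper names `mean_squared_error` and `split_score` are hypothetical; it runs serially and only illustrates how one candidate is scored.

```python
def mean_squared_error(values):
    """MSE of a group around its own mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def split_score(rows, threshold):
    """Size-weighted MSE of the two groups made by splitting the feature at threshold."""
    left = [y for x, y in rows if x < threshold]
    right = [y for x, y in rows if x >= threshold]
    weighted = len(left) * mean_squared_error(left) + len(right) * mean_squared_error(right)
    return weighted / len(rows)

# Toy data (made up for illustration): (feature, outcome) pairs
rows = [(1, 1.0), (2, 1.0), (3, 1.0), (10, 5.0), (11, 5.0), (12, 5.0)]

# Score every observed feature value as a candidate threshold
scores = {x: split_score(rows, x) for x, _ in rows}
best_threshold = min(scores, key=scores.get)
print(best_threshold, scores[best_threshold])  # 10 0.0
```

Splitting at 10 separates the two clusters of outcomes exactly, so both groups have zero error. Each candidate is scored independently of the others, which is what makes this step a natural place to parallelize.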
    