# Using ML techniques to infer a multiplier

### Scenario

You discover that the number of apple seeds is directly tied to the overall height of the fruit, the seed count just needs to be multiplied by some fixed number. Create a model such that given the number of seeds, you can predict the height of the fruit. ***Use an iterative guessing approach to estimate the value of the multiplier.***

### We use two packages for this
1. random - to generate random numbers
2. numpy - this package handles matricies (or more technically arrays, which may have more dimensions than a matrix)

In [5]:
import numpy as np
import random 

## Part 1 - Set up data

### Randomly select the multiplier
This will be the value the seed count is multiplied by, and the number we're trying to discover
* Select a random number between 10 and 100 (uniform distribution) and set it equal to a variable named "actual_multiplier"

In [19]:
actual_multiplier = random.uniform(10,100)
print(actual_multiplier)

73.35399422268213


### Collect some apple seeds
Collect some samples of apple seeds, and measure the associated fruits
* To start we'll use 10 samples with different numbers of seeds in each sample. Here we'll use numbers 1, 2, ..., 9, 10
    * Make a numpy array named seed_count_array with these values
* For obvious reasons, we will not be measuring any apples right now. We're going to cheat a bit and say that the height of the associated apples were the number of seeds times our multiplier value plus noise
    * Make a numpy array called apple_height_array that is length 10, and equal to the seed_count_array times the actual_multiplier
    * Use the np.random.random method to create an array of length 10, and name it noise_array
    * Add the values from the elements of the noise array to the elements of apple_height_array
* Print out the actual_multiplier, seed_count_array, and apple_height_array

In [7]:
seed_count_array = np.array(range(1,11)) # alternate - np.arange(1,11) or [x+1 for x in range(10)]
seed_count_array

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [8]:
apple_height_array = seed_count_array * actual_multiplier
apple_height_array

array([ 11,  22,  33,  44,  55,  66,  77,  88,  99, 110])

In [9]:
noise_array = np.array(object = np.random.random(size = 10))
noise_array

array([0.39123305, 0.46918944, 0.50868444, 0.95361529, 0.70135775,
       0.78243863, 0.50317105, 0.3545299 , 0.47419082, 0.81559455])

In [10]:
apple_height_array = apple_height_array+noise_array

### Sidenote - How contrived is this exercise?

This is toy problem where we know the answer before we start. The point of this example is understand overall process of iterative improvement. Relationships typically being modelled with ML are more complicated than a simple multiplier, but suprisingly little changes for more complex problems. Here we're modelling a single parameter, but many models used in biology have 10s of millions, but are built out of many simple calculations like our exercise. The math is more advanced (but maybe not as much as you might think) and beyond our scope, but wouldn't serve much practical use anyway since these calculations are never ever done by hand, and a comprehensive understanding of them is not strictly necessary unless researching novel algorithm designs.

### Steps (Add steps from slides here)
1. write nested for-loops to 1) make a random prediction for each sample and 2) go through 10 epochs (function name - predict_multiplier - guesses a value from -100 to 100)
2. function name - calculate_loss that substracts the prediction from true value (at each step we will print loss, predicted multiplier, actual multiplier, predicted target, actual target (height) and no. of seeds)
3. create a variable - keeps track of the best loss value (best_loss) - make a list (called best_param_list) that appends to another list loss,predicted multiplier, actual multiplier, predicted target, actual target, and no. of seeds whenever a new loss is found. 2) try increasing no. of epochs (comment out print statement)
4. update the predict function to take in previous step's prediction and loss to make the output more accurate - add a step before your loop to initialise these values
5.Multiply the loss function with a learning rate of 0.001

In [11]:
def predict_multiplier_1(predict, loss, min = -100, max = 100):
    if loss < 0:
        multiplier = random.uniform(min < predict, max)
    elif loss > 0:
        multiplier = random.uniform(min, max > predict) 
    else:
        multiplier = random.uniform(min, max)
    return multiplier

In [20]:
def predict_multiplier(predict, loss):
    if loss < 0:
        multiplier = predict - (0.1 * predict)
    elif loss > 0:
        multiplier = predict + (0.1 * predict)
    else:
        multiplier = random.uniform(-100, 100)
    return multiplier       

In [None]:
# alternate way - 4 function
# def predict_multiplier(predict, loss):
#    return predict + (loss*0.001) # 0.001 - learning rate

In [13]:
def calculate_loss(true_value, predicted_value):
    loss = true_value - predicted_value
    return loss

In [None]:
#2
epoch_count = 10
for epoch in range(epoch_count):
    print('loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]')
    for sample in range(len(seed_count_array)):
        guess = predict_multiplier()
        predicted_target = seed_count_array[sample] * guess
        loss = calculate_loss(apple_height_array[sample], predicted_target)
        print(loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample])
    print()

In [82]:
#3
epoch_count = 1000
best_loss = 1000
best_param_list = []
for epoch in range(epoch_count):
    #print('best_loss, loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]')
    for sample in range(len(seed_count_array)): #could have used enumerate instead of length
        guess = predict_multiplier()
        predicted_target = seed_count_array[sample] * guess
        loss = calculate_loss(apple_height_array[sample], predicted_target)
        output = [loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]]
        if abs(best_loss) > abs(loss):
            best_loss = loss
            best_param_list.append(output)
        #print(best_loss, loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample])
    #print()

In [156]:
print(best_loss)
for row in best_param_list:
    print(row)

47.33990993231572
[72.70146681999887, -48.70146681999886, 24, -48.70146681999886, 24, 1]
[47.33990993231572, 0.6600900676842798, 48, 0.3300450338421399, 24, 2]


In [31]:
#4
epoch_count = 100000
best_loss = 1000
best_param_list = []
loss = 1000
guess = 100
for epoch in range(epoch_count):
    #print('best_loss, loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]')
    for sample in range(len(seed_count_array)): #could have used enumerate instead of length
        guess = predict_multiplier(predict = guess, loss = loss)
        predicted_target = seed_count_array[sample] * guess
        loss = calculate_loss(apple_height_array[sample], predicted_target)
        output = [loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]]
        if abs(best_loss) > abs(loss):
            best_loss = loss
            best_param_list.append(output)
        #print(best_loss, loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample])
    #print()

In [68]:
print(best_loss)
for row in best_param_list:
    print(row)

-0.10364583929604976
[-75.65011535384168, 87.04134840731308, 11.391233053471396, 87.04134840731308, 73.35399422268213, 1]
[-18.958208533391137, 30.349441586862532, 11.391233053471396, 30.349441586862532, 73.35399422268213, 1]
[-18.105764179790384, 117.57995500459218, 99.4741908248018, 13.064439444954687, 73.35399422268213, 9]
[-6.764360455744551, 117.57995500459218, 110.81559454884763, 11.757995500459218, 73.35399422268213, 10]
[0.809037103058099, 10.582195950413297, 11.391233053471396, 10.582195950413297, 73.35399422268213, 1]
[0.32935071728425314, 22.139838720857263, 22.469189438141516, 11.069919360428631, 73.35399422268213, 2]
[-0.3058593520430488, 11.697092405514445, 11.391233053471396, 11.697092405514445, 73.35399422268213, 1]
[0.26741456306629985, 11.123818490405096, 11.391233053471396, 11.123818490405096, 73.35399422268213, 1]
[-0.2666205564037085, 11.657853609875104, 11.391233053471396, 11.657853609875104, 73.35399422268213, 1]
[-0.2627039237627642, 11.65393697723416, 11.391233

In [223]:
# alternate way - 4
epoch_count = 10000
best_loss = 1000
best_param_list = []
guess = random.uniform(-100, 100)
loss = calculate_loss(apple_height_array[0], guess*seed_count_array[0])

for epoch in range(epoch_count):
    #print('best_loss, loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]')
    for sample in range(len(seed_count_array)): #could have used enumerate instead of length
        guess = predict_multiplier(predict = guess, loss = loss)
        predicted_target = seed_count_array[sample] * guess
        loss = calculate_loss(apple_height_array[sample], predicted_target)
        output = [loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample]]
        if abs(best_loss) > abs(loss):
            best_loss = loss
            best_param_list.append(output)
        #print(best_loss, loss, predicted_target, apple_height_array[sample], guess, actual_multiplier, seed_count_array[sample])
    #print()

  predicted_target = seed_count_array[sample] * guess


In [224]:
print(best_loss)
for row in best_param_list:
    print(row)

94.97271352785023
[94.97271352785023, -70.97271352785023, 24, -70.97271352785023, 24, 1]
