### Problem Statement

To start, let us write an agent that solves a simple optimization problem.
1. Data Generator: Write a piece of code that produces our ground truth data as
y = f(x), where f(x) = −0.2 + 0.4 ∗ x and x is in the range [-1, 1]. Produce a
plot of f(x).
2. Function: The function we'd like to fit is f(x; α, β) = α + β ∗ x. Write a piece of
code that produces ŷ = f(x; α, β). Produce a plot with all parameters equal to one.
3. Discretized Space: Let α and β also be in the range [-1, 1]. Their values should
be discretized with step size 0.2. Plot this space as squares in a plane where each
discrete pair is denoted by a dot in the middle of one of the squares (should be 64
squares in the end).
4. Rewards: Use the RMSE value at a given point to come up with rewards for this
space, i.e., lower RMSE should give more reward than higher RMSE. Add the
rewards to your plot from 3.
5. Q-Learning: Write an agent that is able to walk this discrete space to find the
parameters of the true data generator.

In [25]:

%load_ext autoreload
%autoreload 2
# interactive notebook backend for the plots below
%matplotlib nbagg

# core imports
import os
import sys
import random
import numpy as np
import matplotlib.pyplot as plt  # required by the plotting cells below

sys.path.append(os.path.abspath(os.path.join('..', 'src')))
# local imports
from autonetwork.environments.single_parameter import TwoParaEnv
from autonetwork.agents.qlearning_agent import QLearningAgent
from autonetwork.simulation import Run



interactive = True



The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [26]:
def data_generator(input_data):
    """
    Generate ground truth data (y) based on
    y=f(x) where f(x) = −0.2 + 0.4 ∗ x
    """
    return -0.2 + 0.4 * input_data

def fit_function(x, alpha, beta):
    """
    Return predicted value (y_bar)
    """
    return alpha + beta * x

def discrete_parameter_space(input_range, step_size=0.2):
    """
    Return the discrete values for one model parameter (alpha or beta).
    The grid starts at input_range[0] + step_size and excludes input_range[1],
    so (-1, 1) with step_size 0.2 yields the 9 values -0.8, -0.6, ..., 0.8.
    """
    return np.arange(input_range[0]+step_size, input_range[1], step_size)

def rmse_func(targets, predictions):
    """
    Calculate the root mean square error (RMSE) between targets and predictions.
    """
    return np.sqrt(np.mean((predictions-targets)**2))

def reward_func(current_input):
    """
    Map an RMSE value to a discrete reward band (a continuous reward
    based directly on the RMSE value would also work).
    """
    if current_input < 0.1:
        return 20
    elif current_input < 0.5:
        return 10
    elif current_input < 1:
        return 5
    else:
        # making sure higher rmse gets lower reward
        return -1 * current_input
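
To make the reward bands concrete, the following sketch evaluates the RMSE and the resulting reward for two parameter pairs: the true pair (α = −0.2, β = 0.4) and the all-ones pair from the problem statement. The helper functions are repeated in compact form so the snippet runs on its own; the 20-point grid on [-1, 1] is the one used in the next section.

```python
import numpy as np

def data_generator(x):
    return -0.2 + 0.4 * x

def fit_function(x, alpha, beta):
    return alpha + beta * x

def rmse_func(targets, predictions):
    return np.sqrt(np.mean((predictions - targets) ** 2))

def reward_func(rmse):
    # same bands as in the cell above
    if rmse < 0.1:
        return 20
    elif rmse < 0.5:
        return 10
    elif rmse < 1:
        return 5
    return -1 * rmse

x = np.linspace(-1, 1, 20)
y = data_generator(x)

# RMSE and reward for the true parameters and for alpha = beta = 1
rmse_true = rmse_func(y, fit_function(x, -0.2, 0.4))
rmse_ones = rmse_func(y, fit_function(x, 1.0, 1.0))
reward_true = reward_func(rmse_true)
reward_ones = reward_func(rmse_ones)

print("true pair:", rmse_true, reward_true)
print("all ones: ", rmse_ones, reward_ones)
```

The true pair falls in the top band, while the all-ones pair has an RMSE above 1 and therefore receives a negative reward equal to minus its RMSE.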
    

### 1. Generate Ground Truth

In [6]:
input_range = (-1, 1)
data_length = 20
input_data = np.linspace(input_range[0], input_range[1], data_length)
ground_truth = data_generator(input_data)

In [7]:
ground_truth

array([-0.6       , -0.55789474, -0.51578947, -0.47368421, -0.43157895,
       -0.38947368, -0.34736842, -0.30526316, -0.26315789, -0.22105263,
       -0.17894737, -0.13684211, -0.09473684, -0.05263158, -0.01052632,
        0.03157895,  0.07368421,  0.11578947,  0.15789474,  0.2       ])
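
As a sanity check on the generated data, an ordinary least-squares line fit should recover the generator's parameters, which gives a reference for what the agent is expected to find. This sketch uses `np.polyfit`, which is not part of the original notebook, and rebuilds the data inline so it runs on its own.

```python
import numpy as np

x = np.linspace(-1, 1, 20)
y = -0.2 + 0.4 * x  # same ground truth as data_generator

# np.polyfit returns coefficients from highest degree to lowest,
# so for degree 1 the order is [slope, intercept] = [beta, alpha]
beta_hat, alpha_hat = np.polyfit(x, y, deg=1)
print("alpha:", alpha_hat, "beta:", beta_hat)
```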

### 2. Plot Ground Truth

In [8]:
index = np.linspace(input_range[0], input_range[1], data_length)
plt.plot(index, ground_truth, 'r.', label='ground truth(y)') # x
plt.legend(loc="upper left")
plt.xlabel('x', fontsize=16)
plt.ylabel('y(ground truth)', fontsize=16)
plt.show()

### 3. Fit model (f(x; α, β) = α + β ∗ x) with α, β = 1

In [9]:
alpha = 1
beta = 1  # all parameters equal to one, as stated in the heading
fitted_values = fit_function(input_data, alpha, beta)

In [10]:
plt.plot(index, fitted_values, 'b.', label='predicted values(y_bar)') # x

plt.legend(loc="upper left")
plt.xlabel('x', fontsize=10)
plt.ylabel('y_bar(predicted_values)', fontsize=10)
plt.show()

### 4. Discretize the parameter space and create a square maze

In [11]:
discrete_alphas = discrete_parameter_space(input_range)
discrete_betas = discrete_parameter_space(input_range)

In [12]:
discrete_alphas, discrete_betas

(array([-8.00000000e-01, -6.00000000e-01, -4.00000000e-01, -2.00000000e-01,
        -2.22044605e-16,  2.00000000e-01,  4.00000000e-01,  6.00000000e-01,
         8.00000000e-01]),
 array([-8.00000000e-01, -6.00000000e-01, -4.00000000e-01, -2.00000000e-01,
        -2.22044605e-16,  2.00000000e-01,  4.00000000e-01,  6.00000000e-01,
         8.00000000e-01]))
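
The middle entry prints as `-2.22044605e-16` rather than `0` because `np.arange` with a floating-point step accumulates rounding error. This is harmless for the RMSE computation, but it can matter when grid values are compared or used as labels. A minimal sketch of one way to clean it up is shown below; `discrete_parameter_space_rounded` and its `decimals` argument are hypothetical names, not part of the notebook.

```python
import numpy as np

def discrete_parameter_space_rounded(input_range, step_size=0.2, decimals=10):
    """Hypothetical variant of discrete_parameter_space that rounds off float noise."""
    values = np.arange(input_range[0] + step_size, input_range[1], step_size)
    # adding 0.0 turns a rounded -0.0 into a plain 0.0
    return np.round(values, decimals) + 0.0

grid = discrete_parameter_space_rounded((-1, 1))
print(grid)
```

An alternative under the same assumptions is `np.linspace(-0.8, 0.8, 9)`, which fixes the number of points instead of the step and so avoids the accumulated step error.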

In [13]:
fig, ax = plt.subplots()
ax.set_xlim((-0.8,0.8))
ax.set_ylim((-0.8,0.8))
x0,x1 = ax.get_xlim()
y0,y1 = ax.get_ylim()
ax.grid(True, which='major', color='k', linestyle='--')  # the 'b' keyword was removed in newer matplotlib
# fig.savefig('test.png', dpi=600)
# plt.close(fig)
plt.xlabel('beta', fontsize=10)
plt.ylabel('alpha', fontsize=10)
plt.show()
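
The problem statement asks for squares with a dot in the middle of each one. The sketch below is one way to draw that, assuming each discrete (β, α) pair sits at the center of a square of side 0.2, so the square borders lie half a step away from the grid values. The variable names are illustrative, and the `Agg` backend line is only there so the snippet also runs outside a notebook; it can be dropped inside one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line inside the notebook
import matplotlib.pyplot as plt
import numpy as np

step = 0.2
centers = np.linspace(-0.8, 0.8, 9)   # the 9 discrete parameter values
edges = np.linspace(-0.9, 0.9, 10)    # square borders, half a step around each center

# every (beta, alpha) combination: 9 x 9 = 81 dots
beta_grid, alpha_grid = np.meshgrid(centers, centers)

fig, ax = plt.subplots()
ax.set_xticks(edges)
ax.set_yticks(edges)
ax.grid(True, color="k", linestyle="--")            # grid lines form the squares
ax.scatter(beta_grid.ravel(), alpha_grid.ravel(), c="k", s=15)
ax.set_xlim(edges[0], edges[-1])
ax.set_ylim(edges[0], edges[-1])
ax.set_aspect("equal")
ax.set_xlabel("beta")
ax.set_ylabel("alpha")
```

With the grid used in this notebook this gives 81 squares, one per discrete pair.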

### 5. Plot the rewards on the space based on RMSE value

In [14]:
# apply alpha, beta pair from discrete_alphas, discrete_betas to calculate the rewards
reward_matrix = []
positive_reward_20_spaces = []
positive_reward_10_spaces = []
positive_reward_5_spaces = []
negative_reward_spaces = []
for alpha in discrete_alphas:
    reward_row = []
    for beta in discrete_betas:
        predictions = fit_function(input_data, alpha, beta)
        rmse = rmse_func(ground_truth, predictions)
        reward = reward_func(rmse)
        if reward == 20:
            positive_reward_20_spaces.append([beta + 0.1, alpha+0.1])
        elif reward == 10:
            positive_reward_10_spaces.append([beta + 0.1, alpha+0.1])
        elif reward == 5:
            positive_reward_5_spaces.append([beta + 0.1, alpha+0.1])
        else:
            negative_reward_spaces.append([beta + 0.1, alpha+0.1])
        reward_row.append(reward)
    
    reward_matrix.append(reward_row)

fig, ax = plt.subplots()
# '+' marks cells with a positive reward, '_' marks cells with a negative reward
# marker opacity (the scatter alpha argument) increases with the reward band
ax.scatter(*zip(*positive_reward_20_spaces), c='green', marker='+', s=200)
ax.scatter(*zip(*positive_reward_10_spaces), c='green', marker='+', s=200, alpha=0.7)
ax.scatter(*zip(*positive_reward_5_spaces), c='green', marker='+', s=200, alpha=0.4)
ax.scatter(*zip(*negative_reward_spaces), c='red', marker='_', s=200)
ax.set_xlim((-0.8,1))
ax.set_ylim((-0.8,1))
x0,x1 = ax.get_xlim()
y0,y1 = ax.get_ylim()
# ax.set_aspect(abs(x1-x0)/abs(y1-y0))
ax.grid(True, which='major', color='k', linestyle='--')  # the 'b' keyword was removed in newer matplotlib
# fig.savefig('test.png', dpi=600)
# plt.close(fig)
plt.xlabel('beta', fontsize=10)
plt.ylabel('alpha', fontsize=10)
plt.show()
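
The nested loops above can also be written with NumPy broadcasting, which makes it easy to check which cell carries the highest reward. This is a sketch under the same reward bands as `reward_func`; it uses `np.linspace` for the grid instead of `discrete_parameter_space`, and the variable names are illustrative.

```python
import numpy as np

x = np.linspace(-1, 1, 20)
y = -0.2 + 0.4 * x
alphas = np.linspace(-0.8, 0.8, 9)
betas = np.linspace(-0.8, 0.8, 9)

# predictions[i, j, :] = alphas[i] + betas[j] * x, built by broadcasting
predictions = alphas[:, None, None] + betas[None, :, None] * x[None, None, :]
rmse = np.sqrt(np.mean((predictions - y) ** 2, axis=-1))  # shape (9, 9)

# same bands as reward_func, applied element-wise
rewards = np.where(rmse < 0.1, 20.0,
          np.where(rmse < 0.5, 10.0,
          np.where(rmse < 1.0, 5.0, -rmse)))

# locate the highest-reward cell and map it back to parameter values
row, col = np.unravel_index(np.argmax(rewards), rewards.shape)
best_alpha, best_beta = alphas[row], betas[col]
print("best cell:", (row, col), "alpha:", best_alpha, "beta:", best_beta)
```

Rows index α and columns index β, matching the loop order used to build `reward_matrix`.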

### 6. Q-learning 

In [27]:
# rewards table
R = np.array(reward_matrix)
interactive = True
max_number_of_episodes = 10
env = TwoParaEnv(R)
agent = QLearningAgent(range(env.action_space.n))
run = Run(env, agent)
run.run_qlearning(max_number_of_episodes, interactive)

i am in step function
inside step function : previous state 0
inside step function : current action 3
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -1
insde update state: max maze size 80
inside step fuction : updated state 0
inside step function: reward 0
i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step function: reward 0
i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step

i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step function: reward 0
i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step function: reward 0
i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step

i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step function: reward 0
i am in step function
inside step function : previous state 0
inside step function : current action 0
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map -9
insde update state: max maze size 80
inside step fuction : updated state 0
inside step function: reward 0
i am in step function
inside step function : previous state 0
inside step function : current action 2
inside update state: initial idx, idy 0 0
inside update sate: around_map [-9, 1, 9, -1]
inside update state : selected round map 9
insde update state: max maze size 80
inside update sate: next idx, idy 1 0
inside step 

i am in step function
inside step function : previous state 1
inside step function : current action 2
inside update state: initial idx, idy 0 1
inside update sate: around_map [-8, 2, 10, 0]
inside update state : selected round map 10
insde update state: max maze size 80
inside update sate: next idx, idy 1 1
inside step fuction : updated state 10
inside step function: reward 5.0
i am in step function
inside step function : previous state 10
inside step function : current action 0
inside update state: initial idx, idy 1 1
inside update sate: around_map [1, 11, 19, 9]
inside update state : selected round map 1
insde update state: max maze size 80
inside update sate: next idx, idy 0 1
inside step fuction : updated state 1
inside step function: reward 5.0
i am in step function
inside step function : previous state 1
inside step function : current action 2
inside update state: initial idx, idy 0 1
inside update sate: around_map [-8, 2, 10, 0]
inside update state : selected round map 10
insde

i am in step function
inside step function : previous state 2
inside step function : current action 0
inside update state: initial idx, idy 0 2
inside update sate: around_map [-7, 3, 11, 1]
inside update state : selected round map -7
insde update state: max maze size 80
inside step fuction : updated state 2
inside step function: reward 0
i am in step function
inside step function : previous state 2
inside step function : current action 0
inside update state: initial idx, idy 0 2
inside update sate: around_map [-7, 3, 11, 1]
inside update state : selected round map -7
insde update state: max maze size 80
inside step fuction : updated state 2
inside step function: reward 0
i am in step function
inside step function : previous state 2
inside step function : current action 0
inside update state: initial idx, idy 0 2
inside update sate: around_map [-7, 3, 11, 1]
inside update state : selected round map -7
insde update state: max maze size 80
inside step fuction : updated state 2
inside step

i am in step function
inside step function : previous state 3
inside step function : current action 0
inside update state: initial idx, idy 0 3
inside update sate: around_map [-6, 4, 12, 2]
inside update state : selected round map -6
insde update state: max maze size 80
inside step fuction : updated state 3
inside step function: reward 0
i am in step function
inside step function : previous state 3
inside step function : current action 0
inside update state: initial idx, idy 0 3
inside update sate: around_map [-6, 4, 12, 2]
inside update state : selected round map -6
insde update state: max maze size 80
inside step fuction : updated state 3
inside step function: reward 0
i am in step function
inside step function : previous state 3
inside step function : current action 0
inside update state: initial idx, idy 0 3
inside update sate: around_map [-6, 4, 12, 2]
inside update state : selected round map -6
insde update state: max maze size 80
inside step fuction : updated state 3
inside step

i am in step function
inside step function : previous state 10
inside step function : current action 0
inside update state: initial idx, idy 1 1
inside update sate: around_map [1, 11, 19, 9]
inside update state : selected round map 1
insde update state: max maze size 80
inside update sate: next idx, idy 0 1
inside step fuction : updated state 1
inside step function: reward 5.0
i am in step function
inside step function : previous state 1
inside step function : current action 2
inside update state: initial idx, idy 0 1
inside update sate: around_map [-8, 2, 10, 0]
inside update state : selected round map 10
insde update state: max maze size 80
inside update sate: next idx, idy 1 1
inside step fuction : updated state 10
inside step function: reward 5.0
i am in step function
inside step function : previous state 10
inside step function : current action 0
inside update state: initial idx, idy 1 1
inside update sate: around_map [1, 11, 19, 9]
inside update state : selected round map 1
insde

i am in step function
inside step function : previous state 4
inside step function : current action 3
inside update state: initial idx, idy 0 4
inside update sate: around_map [-5, 5, 13, 3]
inside update state : selected round map 3
insde update state: max maze size 80
inside update sate: next idx, idy 0 3
inside step fuction : updated state 3
inside step function: reward 5.0
i am in step function
inside step function : previous state 3
inside step function : current action 2
inside update state: initial idx, idy 0 3
inside update sate: around_map [-6, 4, 12, 2]
inside update state : selected round map 12
insde update state: max maze size 80
inside update sate: next idx, idy 1 3
inside step fuction : updated state 12
inside step function: reward 5.0
i am in step function
inside step function : previous state 12
inside step function : current action 1
inside update state: initial idx, idy 1 3
inside update sate: around_map [3, 13, 21, 11]
inside update state : selected round map 13
insd

i am in step function
inside step function : previous state 10
inside step function : current action 1
inside update state: initial idx, idy 1 1
inside update sate: around_map [1, 11, 19, 9]
inside update state : selected round map 11
insde update state: max maze size 80
inside update sate: next idx, idy 1 2
inside step fuction : updated state 11
inside step function: reward 5.0
i am in step function
inside step function : previous state 11
inside step function : current action 0
inside update state: initial idx, idy 1 2
inside update sate: around_map [2, 12, 20, 10]
inside update state : selected round map 2
insde update state: max maze size 80
inside update sate: next idx, idy 0 2
inside step fuction : updated state 2
inside step function: reward 5.0
i am in step function
inside step function : previous state 2
inside step function : current action 3
inside update state: initial idx, idy 0 2
inside update sate: around_map [-7, 3, 11, 1]
inside update state : selected round map 1
insd

i am in step function
inside step function : previous state 1
inside step function : current action 1
inside update state: initial idx, idy 0 1
inside update sate: around_map [-8, 2, 10, 0]
inside update state : selected round map 2
insde update state: max maze size 80
inside update sate: next idx, idy 0 2
inside step fuction : updated state 2
inside step function: reward 5.0
i am in step function
inside step function : previous state 2
inside step function : current action 3
inside update state: initial idx, idy 0 2
inside update sate: around_map [-7, 3, 11, 1]
inside update state : selected round map 1
insde update state: max maze size 80
inside update sate: next idx, idy 0 1
inside step fuction : updated state 1
inside step function: reward 5.0
i am in step function
inside step function : previous state 1
inside step function : current action 2
inside update state: initial idx, idy 0 1
inside update sate: around_map [-8, 2, 10, 0]
inside update state : selected round map 10
insde up

KeyboardInterrupt: 
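
The log shows the agent staying in place when a move would leave the grid, and later alternating between states 1 and 10, collecting a reward of 5 on each step, until the run was interrupted. The internals of `TwoParaEnv` and `QLearningAgent` are not shown here, so what follows is a stand-alone sketch of tabular Q-learning on the same 9 x 9 parameter grid, not the `autonetwork` implementation. The state encoding (`row * 9 + col`) and the action offsets (-9, +1, +9, -1) are read off the log. Everything else is an assumption: the per-step reward is the continuous value `-rmse` of the cell the agent ends up in, reaching the lowest-RMSE cell ends the episode with a bonus of 20, and the learning rate, discount, exploration rate, and episode counts are illustrative. The continuous reward is chosen because, with positive rewards on every step and discounting, circling among positive cells can be worth more than stopping at the goal.

```python
import numpy as np

N = 9                                        # 9 x 9 grid, states 0..80
MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]   # flat-index offsets -9, +1, +9, -1

grid = np.linspace(-0.8, 0.8, N)             # discrete alpha (rows) and beta (cols)
x = np.linspace(-1, 1, 20)
y = -0.2 + 0.4 * x
pred = grid[:, None, None] + grid[None, :, None] * x
rmse = np.sqrt(np.mean((pred - y) ** 2, axis=-1))
goal = int(np.argmin(rmse))                  # flat index of the best cell

def env_step(state, action):
    """Move on the grid; moves off the grid leave the agent in place."""
    row, col = divmod(state, N)
    new_row = min(max(row + MOVES[action][0], 0), N - 1)
    new_col = min(max(col + MOVES[action][1], 0), N - 1)
    new_state = new_row * N + new_col
    if new_state == goal:
        return new_state, 20.0, True         # assumed terminal bonus
    return new_state, -rmse[new_row, new_col], False

def train(episodes=3000, max_steps=100, lr=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N * N, len(MOVES)))
    for _ in range(episodes):
        state = int(rng.integers(N * N))     # random start for better coverage
        if state == goal:
            continue
        for _ in range(max_steps):
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))
            else:
                action = int(np.argmax(q[state]))
            new_state, reward, done = env_step(state, action)
            # Q-learning update: bootstrap from the best next action
            target = reward if done else reward + gamma * np.max(q[new_state])
            q[state, action] += lr * (target - q[state, action])
            state = new_state
            if done:
                break
    return q

def greedy_path(q, start=0, max_steps=N * N):
    """Follow the learned greedy policy from a start state."""
    path, state = [start], start
    for _ in range(max_steps):
        state, _, done = env_step(state, int(np.argmax(q[state])))
        path.append(state)
        if done:
            break
    return path

q_table = train()
path = greedy_path(q_table)
final_row, final_col = divmod(path[-1], N)
print("greedy path:", path)
print("alpha =", grid[final_row], "beta =", grid[final_col])
```

Unlike the run logged above, a blocked move here still pays `-rmse` of the current cell rather than 0, so bumping into the border is never free.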