# Grounded Rewards

In [None]:
import ipywidgets as widgets
from traitlets import Dict, Unicode, validate

class Display(widgets.DOMWidget):
    _view_name = Unicode('Display').tag(sync=True)
    _view_module = Unicode('display').tag(sync=True)
    _view_module_version = Unicode('0.1.0').tag(sync=True)
    state = Dict({"values": []}).tag(sync=True)

In [None]:
%%javascript
// Reset the loader's internal state to forget about the previous definition of the module
require.undef('display');

require.config({
  //Define 3rd party plugins dependencies
  paths: {
    fabric: "https://cdnjs.cloudflare.com/ajax/libs/fabric.js/2.7.0/fabric.min"
  }
});

define('display', ["@jupyter-widgets/base", "fabric"], function(widgets, fabric) {

    var Display = widgets.DOMWidgetView.extend({

        render: function() {
            const canvas = document.createElement('canvas');
            canvas.id = 'canvas';
            canvas.width = 1000;
            canvas.height = 500;
            var ctx = canvas.getContext("2d");
            ctx.fillStyle = "blue";
            ctx.fillRect(0, 0, canvas.width, canvas.height);
            this.el.appendChild(canvas);

            const fabricCanvas = new fabric.Canvas(canvas);

            // Create a starting circle (useful to see something is working)
            const shape = new fabric.Circle({
                top : 100,
                left : 100,
                radius : 20,
                fill : '#5BC8F7'
            });

            fabricCanvas.add(shape);
            
            // Create a list of objects to re-use
            const shapes = [shape];
            
            // Set up our listener
            const onStateChanged = this.handleStateChanged.bind(this, fabricCanvas, shapes);
            this.model.on('change:state', onStateChanged);
        },
        
        handleStateChanged: function(fabricCanvas, shapes) {
            const vals = this.model.get('state').values;
            
            for (let i=0; i < vals.length; i++) {
                let shape;
                // Create a new shape if our list of vals has increased in length
                if (shapes[i] === undefined) {
                    shape = new fabric.Circle({
                        top : 100,
                        left : 100,
                        radius : 20,
                        fill : '#F6C7BE'
                    });
                    fabricCanvas.add(shape);
                    shapes.push(shape);
                } else {
                    shape = shapes[i];
                }
                // Update the shape to the correct location
                const top = vals[i];
                const left = 500 - (i * (shape.radius * 3));
                shape.set({
                    top,
                    left
                });
                
            }
            
            // Render all shapes
            fabricCanvas.renderAll();
        }
    });

    return {
        Display
    };
});

In [None]:
# Instantiate the Python widget.
display = Display()

In [None]:
# Evaluate the widget as the last expression in the cell so Jupyter displays it, which causes the JavaScript view to be drawn.
display

In [None]:
# Add some new values to our state and watch our display update live!
# Note: this loop runs until the kernel is interrupted.
import math
from time import sleep
from collections import deque

i = 0
q = deque(maxlen=9)
while True:
    new_state = display.state.copy()
    new_val = (math.sin(i) + 1) * 200
    q.appendleft(new_val)
    new_state['values'] = list(q)
    display.state = new_state
    i += 0.05
    sleep(0.02)

# Full Puzzle

https://miro.com/app/board/o9J_kwPmu2M=/

Set Point -> Value -> Error -> Perception -> Controller -> Feedback -> Control Element Action
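
One possible reading of the chain above is a closed feedback loop. The following is a minimal sketch, assuming a proportional controller acting on a toy process; the names `simulate_loop`, `gain`, and `dt` are illustrative and do not come from the notes or the linked board.

```python
def simulate_loop(set_point, initial_value, gain=0.5, steps=50, dt=0.1):
    """Run a hypothetical closed loop and return the process value at each step."""
    value = initial_value
    history = [value]
    for _ in range(steps):
        perceived = value              # Perception: read the current value (noise-free here)
        error = set_point - perceived  # Error: signed distance from the set point
        action = gain * error          # Controller: proportional response (an assumption)
        value += action * dt           # Control element action nudges the process value
        history.append(value)          # Feedback: the new value is perceived on the next step
    return history


history = simulate_loop(set_point=20.0, initial_value=15.0)
print(f"start={history[0]:.2f} end={history[-1]:.2f}")
```

With these assumed parameters the error shrinks by a constant factor on every step, so the value approaches the set point without overshooting.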

observation ->


# Evolve a controller

We want to use either an Evolution Strategies (ES) or a Genetic Algorithm (GA) approach to create a controller for a continuous dynamic process.
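
As a rough illustration of what "evolving a controller" could look like, the sketch below searches for a single proportional gain with a very simple evolution strategy. Everything here is an assumption made for the example: the toy process, the cost function `control_cost`, and the helper `evolve_gain` are hypothetical, and a real controller would likely have more parameters.

```python
import random


def control_cost(gain, set_point=1.0, steps=40, dt=0.1):
    """Sum of squared errors for a proportional controller on a toy process."""
    value, cost = 0.0, 0.0
    for _ in range(steps):
        error = set_point - value
        cost += error ** 2
        value += gain * error * dt
    return cost


def evolve_gain(generations=30, pop_size=20, sigma=0.5, seed=0):
    """Sample gains around a mean, then move the mean to the best sample."""
    rng = random.Random(seed)
    mean = 0.1  # deliberately poor starting gain
    for _ in range(generations):
        candidates = [mean + sigma * rng.gauss(0, 1) for _ in range(pop_size)]
        mean = min(candidates, key=control_cost)  # lower cost is better
    return mean


best_gain = evolve_gain()
print(f"cost at start: {control_cost(0.1):.2f}, cost after evolution: {control_cost(best_gain):.2f}")
```

The population here is one-dimensional, which keeps the example short; the same sample, evaluate, select loop applies when the controller has several parameters.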

# Citations

## Paper Links

"Another distinguishing feature of drives are their homeostatic nature. For animals to survive, they must maintain a variety of critical parameters (such as temperature, energy level, amount of fluids, etc.) within a bounded range. As such, the drives keep changing in intensity to reflect the ongoing needs of the robot and the urgency for tending to them.  There is a desired operational point for each drive and an acceptable bounds of operation around that point.  We call this range the homeostatic regime. As long as a drive is within the homeostatic regime, the robot’s “needs” are being adequately met."

http://robotic.media.mit.edu/wp-content/uploads/sites/7/2015/01/Breazeal-AAAI-98.pdf


Figure 1

https://www.researchgate.net/profile/Oscar_Deniz/publication/228709943_A_Proposal_of_a_Homeostatic_Regulation_Mechanism_for_a_Vision_System/links/5419ded50cf25ebee98882c9/A-Proposal-of-a-Homeostatic-Regulation-Mechanism-for-a-Vision-System.pdf

Figure 2

https://ieeexplore.ieee.org/document/8408296

## Web Links (Need to Find Papers)

"To survive, animals must maintain certain critical parameters within a bounded range. For instance, an animal must regulate its temperature, energy level, amount of fluids, etc. Maintaining each critical parameter requires that the animal come into contact with the corresponding satiatory stimulus (shelter, food, water, etc.) at the right time. The process by which these critical parameters are maintained is generally referred to as homeostatic regulation.

In a simplified view, each satiatory stimulus can be thought of as an innately specified need. In broad terms, there is a desired fixed point of operation for each parameter, and an allowable bounds of operation around that point (see figure). As the critical parameter moves away from the desired point of operation, the animal becomes more strongly motivated to behave in ways that will restore that parameter. The physiological mechanisms that serve to regulate these needs, driving the animal into contact with the needed stimulus at the appropriate time, are quite complex and distinct."

http://www.ai.mit.edu/projects/sociable/homeostatic-regulation.html
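
The idea of a desired operating point with allowable bounds can be sketched as a small reward function. This is only one interpretation of the quoted passages: the linear growth outside the bounds, and the names `drive_intensity`, `homeostatic_reward`, and `tolerance`, are assumptions for illustration and are not taken from the cited sources.

```python
def drive_intensity(value, set_point, tolerance):
    """Return 0 inside the homeostatic regime, growing linearly outside it."""
    deviation = abs(value - set_point)
    return max(0.0, deviation - tolerance)


def homeostatic_reward(value, set_point, tolerance):
    """Reward is highest (zero) while the need is met and falls as the drive grows."""
    return -drive_intensity(value, set_point, tolerance)


# Hypothetical temperature need: set point 37.0 with bounds of +/- 0.5
for temperature in (37.0, 37.4, 39.0):
    print(temperature, homeostatic_reward(temperature, set_point=37.0, tolerance=0.5))
```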

## Github

https://github.com/hardmaru/estool

# Evolution Strategies

If we want to visualize the progress of a population toward a global optimum we need

1. A fitness function
2. A visualization of that function

Plotting libraries such as matplotlib and plotly generally work with arrays of discrete values, whereas a tool like Desmos plots equations directly. To *draw* the function we therefore need to sample it at a set of points and plot those samples. In our case we want a contour plot, which shows a 3D surface in 2D by drawing lines of equal height. The 3D function is our fitness function, and we will also use it directly when *evaluating* each member of our population.
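
Sampling a function on a grid can be sketched as follows. The bowl-shaped `fitness` function and the 5-point axes are placeholders chosen for the example, not the fitness function used later in the notebook.

```python
import numpy as np


def fitness(x, y):
    """Hypothetical bowl-shaped fitness function, used only for illustration."""
    return x**2 + y**2


xs = np.linspace(-2.0, 2.0, 5)  # sample positions along x
ys = np.linspace(-2.0, 2.0, 5)  # sample positions along y

# meshgrid expands the two 1D axes into 2D coordinate arrays,
# so Z[i, j] is the fitness at the point (xs[j], ys[i]).
X, Y = np.meshgrid(xs, ys)
Z = fitness(X, Y)

print(Z.shape)
```

The resulting `Z` array, together with `xs` and `ys`, is the kind of data a contour trace expects for its `z`, `x`, and `y` arguments.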

In [None]:
import numpy as np
import plotly.graph_objects as go

# z is a hard-coded 5x5 grid of sample heights; each inner list is one row (constant y) of the plot.

fig = go.Figure(data =
    go.Contour(
        z=[[10, 10.625, 12.5, 15.625, 20],
           [5.625, 6.25, 8.125, 11.25, 15.625],
           [2.5, 3.125, 5., 8.125, 12.5],
           [0.625, 1.25, 3.125, 6.25, 10.625],
           [0, 0.625, 2.5, 5.625, 10]]
    ))
fig.show()

# Plotly as a JS library

In [1]:
import numpy as np
import ipywidgets as widgets
from ipywidgets import Layout
from traitlets import Dict, Unicode, validate

class PlotlyDisplay(widgets.DOMWidget):
    _view_name = Unicode('PlotlyDisplay').tag(sync=True)
    _view_module = Unicode('plotly-display').tag(sync=True)
    _view_module_version = Unicode('0.1.0').tag(sync=True)
    _state = {
        "fitness": {"x": [], "y": [], "z": []},
        "pop": {"x": [], "y": [], "z": []},
        "pop_mean": {"x": 0.0, "y": 0.0, "z": 0.0},
        "best": {"x": [], "y": [], "z": []}
    }
    state = Dict(_state).tag(sync=True)

In [21]:
%%javascript
// Reset the loader's internal state to forget about the previous definition of the module
require.undef('plotly-display');

require.config({
  //Define 3rd party plugins dependencies
  paths: {
    plotly: "https://cdn.plot.ly/plotly-latest"
  }
});

define('plotly-display', ["@jupyter-widgets/base", "plotly"], function(widgets, plotly) {

    var PlotlyDisplay = widgets.DOMWidgetView.extend({

        render: function() {
            // Create target div
            this.mydiv = document.createElement('div');
            this.mydiv.id = "myDiv"
            this.el.appendChild(this.mydiv);
            
            // Bind functions
            this.draw = this.draw.bind(this);
            this.step = this.step.bind(this);
            this.handleStateChanged = this.handleStateChanged.bind(this);
            
            // Set up our listener
            this.model.on('change:state', this.handleStateChanged);
            
            // Draw initial state
            const firstDraw = true;
            this.draw(this.mydiv, firstDraw);
            
            // Start raf loop
            this.newData = false;
            window.requestAnimationFrame(this.step);
            
            this.ts = null;
        },

        handleStateChanged: function() {
            this.newData = true;
        },
        
        step: function(ts) {
            if (this.newData) {
                this.draw(this.mydiv);
                this.newData = false;
                if (!this.ts) {
                    this.ts = ts;
                }
//                 console.log(ts - this.ts);
                this.ts = ts;
            }
            window.requestAnimationFrame(this.step);
        },

        draw: function(mydiv, firstDraw) {
            const {fitness, pop, pop_mean, best } = this.model.get('state');
            
            const contour = {
                x: fitness.x,
                y: fitness.y,
                z: fitness.z,
                type: 'surface',
                colorscale: 'YlOrRd'
            };
            
            const scatter = {
                x: pop.x,
                y: pop.y,
                z: pop.z,
                mode: "markers",
                marker: {
                    color: "blue",
                    size: 2,
                },
                type: 'scatter3d'
            };
            
            const pop_mean_scatter = {
                x: [pop_mean.x],
                y: [pop_mean.y],
                z: [pop_mean.z],
                mode: "markers",
                marker: {
                    color: "green",
                    size: 4
                },
                type: 'scatter3d',
            };
            
            const best_scatter = {
                x: best.x,
                y: best.y,
                z: best.z,
                mode: "markers",
                marker: {
                    color: "red",
                    size: 4
                },
                type: 'scatter3d'
            };
            
            const data = [
                contour,
                scatter,
                pop_mean_scatter,
                best_scatter
            ];

            const camera = {
                eye: {
                    x: 0.0,
                    y: 0.0, 
                    z: 0.8
                }
            };
            
            const layout = {
                title: 'Surface + Scatter Plot',
                showlegend: false,
                scene: {
                    zaxis: {
                        range: [0, 80]
                    },
                    camera
                },
                height: 700
            };

            if (firstDraw) {
                plotly.newPlot(mydiv, data, layout);
            } else {
                plotly.react(mydiv, data, layout);
            }
        }
    });

    return {
        PlotlyDisplay
    };
});


In [22]:
# Define our functions
def rastrigin(X, Y):
    Z = (X**2 - 10 * np.cos(2 * np.pi * X)) + \
        (Y**2 - 10 * np.cos(2 * np.pi * Y)) + 20
    return Z / 10.

def normal_sample(count, sigma, mu):
    samples = (sigma * np.random.randn(count) + mu)
    return samples

# Instantiate the Python widget.
pdisplay = PlotlyDisplay()

# Initial State
init_state = pdisplay.state.copy()

# Rastrigin function
X_1D = np.linspace(-5.12, 5.12, 100) # Create a list of 100 numbers
Y_1D = np.linspace(-5.12, 5.12, 100) # Create a list of 100 numbers

# Turn that into the x's and y's of a 2D set of points
X_2D, Y_2D = np.meshgrid(X_1D, Y_1D)

# Compute the Z values for each of those grid points.
Z = rastrigin(X_2D, Y_2D)

# Set up the initial state of the graph
init_state['fitness'] = {
    "x": X_1D.tolist(),
    "y": Y_1D.tolist(),
    "z": Z.tolist()}

sigma = 0.3
x_mu = -3
y_mu = 2
pop_count = 300
point_height = 8.0

pop_X = normal_sample(pop_count, sigma, x_mu)
pop_Y = normal_sample(pop_count, sigma, y_mu)
pop_Z = [point_height] * pop_count
init_state['pop'] = {
    "x": pop_X.tolist(),
    "y": pop_Y.tolist(),
    "z": pop_Z
}
pdisplay.state = init_state

In [23]:
import math
from time import sleep
from collections import deque
import numpy as np


def mean_adaptation():
    x_mu = -3
    y_mu = 2
    for i in np.arange(100):
        new_state = pdisplay.state.copy()
        X = normal_sample(pop_count, sigma, x_mu)
        Y = normal_sample(pop_count, sigma, y_mu)
        Z = rastrigin(X, Y) + 1.0

        min_ind = np.argmin(Z)
        new_state['pop'] = {
            "x": X.tolist(),
            "y": Y.tolist(),
            "z": Z.tolist()
        }
        new_state['pop_mean'] = {
            "x": np.mean(X),
            "y": np.mean(Y),
            "z": point_height
        }
        
        # Move the sample mean to the location of the best previous sample
        x_mu = X[min_ind]
        y_mu = Y[min_ind]
        new_state['best'] = {
            "x": [x_mu],
            "y": [y_mu],
            "z": [point_height]
        }
        pdisplay.state = new_state
        sleep(0.1)

In [42]:
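def cma_es():
    # Simplified, CMA-ES-inspired loop: it adapts a separate mean and standard
    # deviation per axis and does not estimate the x-y covariance.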
    x_mu = -3
    y_mu = 2
    x_sigma = y_sigma = 1.0
    top_n = 25
    for i in np.arange(40):
        new_state = pdisplay.state.copy()
        X = normal_sample(pop_count, x_sigma, x_mu)
        Y = normal_sample(pop_count, y_sigma, y_mu)
        Z = rastrigin(X, Y) + 1.0
        
        new_state['pop'] = {
            "x": X.tolist(),
            "y": Y.tolist(),
            "z": Z.tolist()
        }
        new_state['pop_mean'] = {
            "x": np.mean(X),
            "y": np.mean(Y),
            "z": point_height
        }
        
        # Find the indices of the `top_n` lowest z values
        # Note: `ind` is not sorted
        ind = np.argpartition(Z, top_n)[:top_n]
        
        # Find the X and Y values with the lowest z values
        X_best = X[ind]
        Y_best = Y[ind]
        Z_best = Z[ind]
        
        new_state['best'] = {
            "x": X_best.tolist(),
            "y": Y_best.tolist(),
            "z": Z_best.tolist()
        }
        pdisplay.state = new_state
        
        # Calculate our new std-deviations using the CMA-ES hack of using
        # the means from the previous generation.
        # Note: as we're now *estimating* variance we need the -1
        X_var = np.sum(np.power(X_best - x_mu, 2)) / (top_n - 1)
        Y_var = np.sum(np.power(Y_best - y_mu, 2)) / (top_n - 1)
        x_sigma = np.sqrt(X_var)
        y_sigma = np.sqrt(Y_var)

        # Calculate our new means
        x_mu = np.mean(X_best)
        y_mu = np.mean(Y_best)
        
        sleep(0.2)
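
The update at the end of each generation in `cma_es()` can be isolated as a pure function, which makes it easier to check on a small input. This is a sketch of the same per-axis arithmetic; the helper name `update_distribution` and its signature are hypothetical and not part of the notebook.

```python
import numpy as np


def update_distribution(samples, fitness, prev_mu, top_n):
    """Return (new_mu, new_sigma) for one axis from the top_n lowest-fitness samples."""
    # Indices of the top_n smallest fitness values (unsorted)
    ind = np.argpartition(fitness, top_n)[:top_n]
    best = samples[ind]
    # Spread is measured around the previous generation's mean, as in cma_es()
    new_sigma = np.sqrt(np.sum((best - prev_mu) ** 2) / (top_n - 1))
    # The new mean is the average of the selected samples
    new_mu = np.mean(best)
    return new_mu, new_sigma


samples = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
fitness = samples**2  # lower is better, so 0, 1 and 2 are selected
new_mu, new_sigma = update_distribution(samples, fitness, prev_mu=3.0, top_n=3)
print(new_mu, new_sigma)
```

Measuring the spread around the previous mean rather than the new one keeps the standard deviation large while the selected samples are still far from where the distribution was centred.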

In [25]:
# Evaluate the widget as the last expression so Jupyter displays the plot
pdisplay


In [45]:
cma_es()

1. Firefox profiler

A large amount of time is being spent in plotly.react()

2. Find a dynamic graphic library for JS (d3.js)
3. Reduce data fetched on each frame
4. Plotly method to update just data?

The .update() method doesn't appear to work as expected. It does not redraw data and it doesn't appear to be any faster even without a successful redraw.

5. Log timings

In [None]:
# Evolution Strategies


# Draw a fitness function in 2D space
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [None]:
fig = plt.figure(figsize=(6,5))
left, bottom, width, height = 0.1, 0.1, 0.8, 0.8
ax = fig.add_axes([left, bottom, width, height]) 

start, stop, n_values = -8, 8, 800

x_vals = np.linspace(start, stop, n_values)
y_vals = np.linspace(start, stop, n_values)
X, Y = np.meshgrid(x_vals, y_vals)


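# Fitness surface: Euclidean distance from the origin
Z = np.sqrt(X**2 + Y**2)

# Draw on the axes created above rather than relying on pyplot's current axes
cp = ax.contourf(X, Y, Z)
fig.colorbar(cp, ax=ax)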

ax.set_title('Contour Plot')
ax.set_xlabel('x (cm)')
ax.set_ylabel('y (cm)')
plt.show()