# Random Search

We implement a random search for the LED configuration whose measured spectrum most
closely matches a target light spectrum. After importing the relevant modules and setting up the
spectrophotometer sensor, we define helper functions for measuring data from the sensor,
generating random inputs, and evaluating the objective function (i.e., mean absolute error). From there,
we run an experiment with 100 iterations and visualize the results.

## Setup

https://raspberrypi.stackexchange.com/a/108041/137101
https://github.com/numpy/numpy/issues/16012#issuecomment-615927988

In [27]:
from time import sleep
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from blinkt import set_pixel, set_brightness, show, clear
import board
from adafruit_as7341 import AS7341


In [30]:
i2c = board.I2C()  # uses board.SCL and board.SDA
sensor = AS7341(i2c)

## Helper Functions

We define four helper functions:
- `observe_sensor_data`
- `get_random_inputs`
- `get_target_inputs`
- `evaluate` (the objective function which measures MAE)

### Sensor Data
We define a function that takes brightness and values for red, green, and blue as inputs and then returns
the measured intensities at 8 distinct wavelengths.

In [29]:
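def observe_sensor_data(I, R, G, B):
    """Light one LED at brightness I and color (R, G, B); return the 8 channel readings."""
    set_brightness(I)
    clear()
    set_pixel(3, R, G, B)  # light only the pixel at index 3
    show()
    sleep(0.5)  # wait before reading the sensor
    data = sensor.all_channels
    clear()  # turn the LED off again
    show()
    return data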

Notice that RGB = (255, 0, 0) favors the long-wavelength (red) end of the spectrum, with most of the signal in the 590 nm and 630 nm channels.

In [32]:
observe_sensor_data(0.1, 255, 0, 0)

(169, 78, 192, 206, 302, 5605, 10204, 183)

In contrast, RGB = (255, 255, 255), or white, produces large values across the spectrum.

In [31]:
observe_sensor_data(0.1, 255, 255, 255)

(666, 10325, 9908, 14797, 3369, 6181, 10820, 1002)
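
To compare the shape of the two readings independently of their overall magnitude, one option is to normalize each reading by its total signal. The sketch below copies the two outputs above and assumes the channel order matches the labels used later in `evaluate`; `channel_fractions` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Channel labels, assumed to follow the keys used later in `evaluate`.
channels = ["ch415_violet", "ch445_indigo", "ch480_blue", "ch515_cyan",
            "ch555_green", "ch590_yellow", "ch630_orange", "ch680_red"]

# Readings copied from the two example outputs above.
red_reading = np.array([169, 78, 192, 206, 302, 5605, 10204, 183])
white_reading = np.array([666, 10325, 9908, 14797, 3369, 6181, 10820, 1002])


def channel_fractions(reading):
    """Return each channel's share of the total measured signal."""
    reading = np.asarray(reading, dtype=float)
    return reading / reading.sum()


red_frac = channel_fractions(red_reading)
white_frac = channel_fractions(white_reading)

for name, r, w in zip(channels, red_frac, white_frac):
    print(f"{name:>13}: red={r:.3f}  white={w:.3f}")

# Combined share of the 590 nm and 630 nm channels for each reading.
red_long = red_frac[5] + red_frac[6]
white_long = white_frac[5] + white_frac[6]
print(f"590+630 nm share: red={red_long:.3f}, white={white_long:.3f}")
```

For the red reading, the 590 nm and 630 nm channels carry over 90% of the total signal, whereas for white they carry roughly 30%.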

### Random Inputs

Next, we define a function that will return random brightness, red, green, and blue values.

In [38]:
rng = np.random.default_rng(42)

def get_random_inputs(rng=rng):
    I = 0.5 * rng.random()  # 1.0 is really bright, so no more than 0.5
    RGB = 255 * rng.random(3)
    R, G, B = RGB.astype(int)
    return I, R, G, B

Calling this function twice in a row returns different values, since each call advances the shared generator.

In [95]:
print(get_random_inputs())
print(get_random_inputs())

(0.028615970076630193, 204, 236, 196)
(0.3490603919893186, 213, 10, 51)
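
One detail worth noting: `rng.random()` draws from the half-open interval [0, 1), so `255 * rng.random(3)` truncated with `astype(int)` can reach at most 254. If the full 0 to 255 range matters, a possible alternative is to draw integers directly. The function below is a hypothetical variant, not the one used in the rest of this notebook.

```python
import numpy as np

# Truncation means a draw just below 1.0 still maps to 254, never 255.
print((255 * np.array([0.999999])).astype(int))  # [254]


def get_random_inputs_inclusive(rng):
    """Hypothetical variant: sample R, G, B uniformly from 0..255 inclusive."""
    I = 0.5 * rng.random()  # brightness in [0, 0.5), as in the original
    # `high` is exclusive by default, so 256 makes 255 reachable.
    R, G, B = (int(v) for v in rng.integers(0, 256, size=3))
    return I, R, G, B


rng_demo = np.random.default_rng(0)
samples = [get_random_inputs_inclusive(rng_demo) for _ in range(1000)]
print(samples[0])
```

Because the outputs shown in this notebook were produced with the original `get_random_inputs`, swapping in this variant would change the sampled values.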


### Fixed Target Inputs

Next, we define an arbitrary set of target inputs that we'll use to measure a target spectrum.

In [43]:
def get_target_inputs(seed=604523):
    targ_rng = np.random.default_rng(seed)
    return get_random_inputs(rng=targ_rng)

Notice how the outputs are identical, since we're using a fixed seed to define these values.

In [96]:
print(get_target_inputs())
print(get_target_inputs())

(0.22770586934673454, 7, 89, 169)
(0.22770586934673454, 7, 89, 169)


However, if we change the seed, we get a new set of target inputs. Again, the two calls
return identical outputs because they share the same seed.

In [98]:
print(get_target_inputs(seed=10001))
print(get_target_inputs(seed=10001))

(0.35740321158073246, 160, 86, 92)
(0.35740321158073246, 160, 86, 92)


### Target Parameters and Spectrum

Using our helper functions, we retrieve the target inputs and then measure a spectrum
with the LED set to the target input parameters. Finally, we define an objective
function: the mean absolute error (MAE) between the target spectrum and a
yet-to-be-measured spectrum.

#### Input Parameters

These are the fixed target inputs for the default seed, 604523.

In [47]:
I_true, R_true, G_true, B_true = get_target_inputs()
print(I_true, R_true, G_true, B_true)

0.22770586934673454 7 89 169


#### Target Spectrum (Stochastic)

Note that running the spectrum measurement repeatedly will produce slightly different results.

In [49]:
target_data = observe_sensor_data(I_true, R_true, G_true, B_true)
print(target_data)

(627, 16176, 12060, 11001, 2241, 1069, 1566, 1030)


### Objective Function (evaluate)

In addition to returning the MAE between the target and the yet-to-be-measured
data, we also track the data for the individual channels to make analysis and
visualization easier later on. The `evaluate` function is defined based on the fixed,
observed target data from the cell above.

In [55]:
from sklearn.metrics import mean_absolute_error

def evaluate(I, R, G, B):
    data = observe_sensor_data(I, R, G, B)
    return {
        "ch415_violet": data[0],
        "ch445_indigo": data[1],
        "ch480_blue": data[2],
        "ch515_cyan": data[3],
        "ch555_green": data[4],
        "ch590_yellow": data[5],
        "ch630_orange": data[6],
        "ch680_red": data[7],
        "mae": mean_absolute_error(target_data, data),
    }


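Calling `evaluate` on a new set of random input parameters produces a different result
each time. Evaluating the target inputs again yields a small but nonzero MAE (row 1 below),
because repeated measurements are noisy.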

In [99]:
pd.DataFrame([evaluate(*get_random_inputs()), evaluate(*get_target_inputs())])

| | ch415_violet | ch445_indigo | ch480_blue | ch515_cyan | ch555_green | ch590_yellow | ch630_orange | ch680_red | mae |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 144 | 2073 | 2241 | 3820 | 1016 | 1053 | 1700 | 250 | 4217.625 |
| 1 | 634 | 16383 | 12197 | 11155 | 2257 | 1082 | 1588 | 1042 | 71.0 |
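
As a worked check of the objective, the MAE here is the mean of the absolute per-channel differences, $\frac{1}{8}\sum_{i=1}^{8} |t_i - y_i|$. The sketch below recomputes it with NumPy for the target spectrum printed earlier and the repeat measurement at the target inputs (row 1 above); the variable names are illustrative.

```python
import numpy as np

# Target spectrum and the repeat measurement at the same inputs,
# both copied from the outputs shown in this notebook.
target = np.array([627, 16176, 12060, 11001, 2241, 1069, 1566, 1030])
repeat = np.array([634, 16383, 12197, 11155, 2257, 1082, 1588, 1042])

abs_err = np.abs(target - repeat)  # per-channel absolute error
mae = abs_err.mean()               # average over the 8 channels

print(abs_err.tolist())  # [7, 207, 137, 154, 16, 13, 22, 12]
print(mae)               # 71.0
```

The channels with the largest raw counts contribute most of the error, so an unnormalized MAE weights those channels most heavily.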


## Experiment

We set a budget of 100 iterations and track the input parameters and observations at
each iteration. Since this is random search, no adaptive optimization is performed: each
trial is sampled independently of the previous ones.

In [58]:
from tqdm import trange

num_iter = 100

inputs = []
observed_data = []
for _ in trange(num_iter):
    params = get_random_inputs()  # avoid shadowing the built-in `input`
    observed_datum = evaluate(*params)

    inputs.append(params)
    observed_data.append(observed_datum)

100%|██████████| 100/100 [02:01<00:00,  1.21s/it]
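
The loop above needs the LED and sensor hardware. To try the same random-search pattern without it, the sketch below swaps in a simulated sensor. `simulated_spectrum` is a hypothetical stand-in (three channels, each linear in brightness times color value) and the target inputs are illustrative, so the numbers it produces are not comparable to the real measurements.

```python
import numpy as np


def simulated_spectrum(I, R, G, B):
    """Hypothetical stand-in for the sensor: 3 channels linear in I * color."""
    return np.array([I * R, I * G, I * B], dtype=float)


def run_random_search(num_iter, seed=0):
    """Sample inputs independently and record (mae, best_so_far) per trial."""
    rng = np.random.default_rng(seed)
    target = simulated_spectrum(0.25, 7, 89, 169)  # illustrative target inputs
    history = []
    best = float("inf")
    for _ in range(num_iter):
        # Same sampling scheme as get_random_inputs above.
        I = 0.5 * rng.random()
        R, G, B = (255 * rng.random(3)).astype(int)
        mae = np.abs(simulated_spectrum(I, R, G, B) - target).mean()
        best = min(best, mae)  # trials never use earlier results, only the record does
        history.append((mae, best))
    return history


history = run_random_search(100)
print(f"best simulated MAE after 100 trials: {history[-1][1]:.3f}")
```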


### Exploratory Data Analysis

In [61]:
# np.array casts the mixed float/int tuples to float, so R, G, and B display as floats
input_df = pd.DataFrame(np.array(inputs), columns=["I", "R", "G", "B"])
input_df

| | I | R | G | B |
|---|---|---|---|---|
| 0 | 0.077145 | 174.0 | 189.0 | 246.0 |
| 1 | 0.162913 | 94.0 | 119.0 | 48.0 |
| 2 | 0.064961 | 121.0 | 57.0 | 170.0 |
| 3 | 0.218576 | 212.0 | 178.0 | 79.0 |
| 4 | 0.416130 | 205.0 | 98.0 | 73.0 |
| ... | ... | ... | ... | ... |
| 95 | 0.053210 | 254.0 | 169.0 | 165.0 |
| 96 | 0.045220 | 228.0 | 7.0 | 61.0 |
| 97 | 0.071511 | 198.0 | 50.0 | 232.0 |
| 98 | 0.328135 | 9.0 | 1.0 | 13.0 |


In [62]:
observe_df = pd.DataFrame(observed_data)
observe_df

| | ch415_violet | ch445_indigo | ch480_blue | ch515_cyan | ch555_green | ch590_yellow | ch630_orange | ch680_red | mae |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 357 | 6377 | 6039 | 7447 | 1833 | 2763 | 4729 | 583 | 3169.500 |
| 1 | 357 | 3310 | 4186 | 10574 | 2174 | 3893 | 6846 | 504 | 3766.750 |
| 2 | 204 | 4397 | 3774 | 2429 | 710 | 1860 | 3250 | 358 | 4217.250 |
| 3 | 741 | 6538 | 7922 | 18513 | 3796 | 10107 | 18362 | 974 | 6105.875 |
| 4 | 1189 | 12047 | 11373 | 18372 | 3844 | 19462 | 36989 | 1466 | 8575.500 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | 159 | 2098 | 2258 | 3382 | 949 | 1732 | 3041 | 262 | 4520.625 |
| 96 | 69 | 794 | 736 | 265 | 186 | 1461 | 2637 | 121 | 5303.375 |
| 97 | 274 | 5983 | 5041 | 2265 | 735 | 2867 | 5205 | 465 | 4226.125 |
| 98 | 80 | 1796 | 1090 | 300 | 173 | 808 | 1461 | 143 | 4989.875 |


Next, let's track the worst, average, and best performance, along with the parameters
associated with the worst and best trials. Notice that the best parameters match the
target parameters more closely, while the worst parameters show a much bigger mismatch.
This behavior is also reflected in the MAE.

In [77]:
input_names = ["I", "R", "G", "B"]
worst_mae = observe_df["mae"].max()
avg_mae = observe_df["mae"].mean()
best_mae = observe_df["mae"].min()
worst_observed_iter = observe_df["mae"].idxmax()
best_observed_iter = observe_df["mae"].idxmin()
# could track median as well, see e.g. https://stackoverflow.com/questions/46411507/get-corresponding-index-of-median
worst_parameters = input_df.loc[worst_observed_iter, input_names].to_dict() 
best_parameters = input_df.loc[best_observed_iter, input_names].to_dict()
print(f"worst_mae: {worst_mae}, avg_mae: {avg_mae}, best_mae: {best_mae}")
print(f"best_parameters:\n{best_parameters}")
target_parameters = dict(I=I_true, R=R_true, G=G_true, B=B_true)
print(f"target_parameters:\n{target_parameters}")
print(f"worst_parameters:\n{worst_parameters}")

worst_mae: 18171.625, avg_mae: 6166.1525, best_mae: 1359.5
best_parameters:
{'I': 0.1809063052766952, 'R': 22.0, 'G': 30.0, 'B': 245.0}
target_parameters:
{'I': 0.22770586934673454, 'R': 7, 'G': 89, 'B': 169}
worst_parameters:
{'I': 0.48978534029329635, 'R': 204.0, 'G': 198.0, 'B': 163.0}
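
The code comment above mentions that the median could be tracked as well. One way to do this, sketched here on the first five MAE values from the table of observations, is to pick the trial whose MAE is closest to the median. With an even number of trials the median is the average of the two middle values, so no single trial may match it exactly, and the nearest one is used instead. The variable names are illustrative.

```python
import pandas as pd

# First five MAE values from the observations table above.
mae = pd.Series([3169.500, 3766.750, 4217.250, 6105.875, 8575.500])

median_mae = mae.median()
# Index of the trial whose MAE is closest to the median.
median_iter = (mae - median_mae).abs().idxmin()

print(median_mae, median_iter)  # 4217.25 2
```

The resulting index can then be used with `input_df.loc[...]` in the same way as the worst and best iterations.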


## Visualization

To get an idea of the efficiency of random search in this context, we can take a look at
the "best objective so far" as a function of the iteration number. In addition to
plotting the best so far, we also plot the raw, observed MAE values at each iteration.

In [92]:
import plotly.express as px
import plotly.graph_objects as go

df = pd.concat((input_df, observe_df), axis=1)

df["iteration"] = np.arange(1, len(df) + 1)
df["best_mae"] = df["mae"].cummin()
best_objective_plot = px.line(df, x="iteration", y=["best_mae"])
objective_plot = px.scatter(df.rename(columns=dict(mae="observed_mae")), x="iteration", y=["observed_mae"])
fig = go.Figure(data=best_objective_plot.data + objective_plot.data, layout=best_objective_plot.layout)
fig
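
The "best so far" trace relies on `cummin`, which replaces each value with the smallest value seen up to and including that position. A minimal illustration with made-up MAE values (not measurements from this experiment):

```python
import pandas as pd

# Made-up objective values for five trials.
observed = pd.Series([5.0, 3.0, 4.0, 2.0, 6.0])

best_so_far = observed.cummin()  # running minimum
print(best_so_far.tolist())  # [5.0, 3.0, 3.0, 2.0, 2.0]
```

The trace only steps down when a trial beats every earlier one, which is why the line plot is flat between improvements.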

### Example Visualization

![random search](random_search.png)

## Code Graveyard

In [None]:
# def initialize_target_data(seed = 42):
#     # while the inputs are fixed, the measurement is stochastic
#     return observe_sensor_data(*initialize_target_inputs(seed=seed))