# User Study 02 - RL Audio Notebook

Before starting this survey, please click the following two links to read the explanatory statement and answer the pre-study questionnaire.

<span style="color:yellow">**Explanatory Statement:**</span> https://drive.google.com/file/d/1-8npbW1wg_ABzBnnGa1dgEgCaYjDED8o/view?usp=sharing

<span style="color:yellow">**Pre-study Questionnaire:**</span> https://forms.gle/GAU8xzekWKkTMDLVA   (Participant ID Required)

# Setup

## Imports & Args

In [None]:
%cd ~/Documents/PHD/repos/RL_audio/notebooks

In [None]:
PWD = %pwd

In [None]:
# IMPORTS
import os
import shutil
import time
import numpy as np
import argparse
from scripts import audio_control
from scripts import ucb1_algorithm as ucb1
from scripts import misc_helpers as mischelp

import sys
from termcolor import colored, cprint
# Termcolor guide: https://pypi.org/project/termcolor/

#  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#  ARGUMENTS & PARSER (Save this code for scripts working with CLI)

# argParser = argparse.ArgumentParser()

# # Enter any valid integer value
# argParser.add_argument("-b", "--budg", required=False, help="select the budget value (dtype=int)")

# # Enter a valid parameter discretization integer (must match sound library size)
# argParser.add_argument("-d", "--disc", required=False, help="select discretization size (dtype=int)")

# # Enter true if you would like to see hidden print log, including Q-tables
# argParser.add_argument("-p", "--prnt", required=False, help="show hidden print log (dtype=bool)")

# # To load and save, simply enter in the base filename such as "lastsave" or "set_A", system takes care of rest
# argParser.add_argument("-s", "--save", required=False, help="filename to save Q-table on exit (dtype=str)") 
# argParser.add_argument("-l", "--load", required=False, help="load Q-table from filename (dtype=str)") 	

## Initializations

In [None]:
# Parameter discretization (number of discretized regions per parameter)
param_disc = 3 

state_descriptions = ["Stuck	  \t- robot needs your help", "Successful \t- robot has completed its task", "Progressing \t- robot is working and doesn't need help", "None of the above"]
num_of_states = len(state_descriptions) - 1 # Adding a minus 1 since the last state in "state_descriptions" is "none of the above"
state_range = np.arange(num_of_states)


# CREATE SOUND LIBRARY A
# For library A, setup the array using libA
library_A = "libA"

# Create an array of size (N x N x N) where N = number of discretized regions
# number of discretized regions for each param --> i.e. if equals 3 then (0, 1, 2)
# ** must align with the discretization for selected sound library
sound_obj_array_A = np.ndarray((param_disc, param_disc, param_disc),dtype=object)

for param_1_range in range(param_disc):
	for param_2_range in range(param_disc):
		for param_3_range in range(param_disc):
			sound_obj_array_A[param_1_range, param_2_range, param_3_range] = audio_control.audio_object(param_1=param_1_range, param_2=param_2_range, param_3=param_3_range, sound_library=library_A)
			
			
# CREATE SOUND LIBRARY B
# For library B, setup the array using libB
library_B = "libB"

# Create an array of size (N x N x N) where N = number of discretized regions
# number of discretized regions for each param --> i.e. if equals 3 then (0, 1, 2)
# ** must align with the discretization for selected sound library
sound_obj_array_B = np.ndarray((param_disc, param_disc, param_disc),dtype=object)

for param_1_range in range(param_disc):
	for param_2_range in range(param_disc):
		for param_3_range in range(param_disc):
			sound_obj_array_B[param_1_range, param_2_range, param_3_range] = audio_control.audio_object(param_1=param_1_range, param_2=param_2_range, param_3=param_3_range, sound_library=library_B)
			

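The two library cells above repeat the same triple loop. As a minimal sketch, the construction could be wrapped in one helper. The name `build_sound_library` and its `factory` argument are hypothetical and not part of the `scripts` package. In this notebook the factory would be `audio_control.audio_object`; the example below uses a stand-in so it runs on its own.

```python
import itertools

import numpy as np


def build_sound_library(param_disc, library_name, factory):
    """Return a (N x N x N) object array with one sound object per parameter triple.

    `factory` is any callable accepting param_1, param_2, param_3 and
    sound_library keyword arguments (hypothetical interface, mirroring the
    keyword arguments used in the cells above).
    """
    sound_objs = np.empty((param_disc,) * 3, dtype=object)
    for p1, p2, p3 in itertools.product(range(param_disc), repeat=3):
        sound_objs[p1, p2, p3] = factory(
            param_1=p1, param_2=p2, param_3=p3, sound_library=library_name
        )
    return sound_objs


# Stand-in factory for illustration only: it just records its arguments.
def fake_audio_object(**kwargs):
    return kwargs


demo_library = build_sound_library(3, "libA", fake_audio_object)
print(demo_library.shape)
```

With the real class, the call would presumably look like `build_sound_library(param_disc, library_A, audio_control.audio_object)`, assuming `audio_object` accepts exactly those keyword arguments.
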
# MAIN STUDY

Welcome to this study's <span style="color:yellow">**Jupyter notebook**</span>. In this work, we are developing strategies for improving human-robot interaction with nonverbal sounds (<span style="color:yellow">**_beeps & boops_**</span>).

While a robot is working on a task, it can have many different internal states... 

If the robot gets stuck behind an obstacle, the robot's internal state is: <span style="color:Red">**Stuck**</span>

Similarly, if the robot was able to reach its goal, the robot's internal state is: <span style="color:green">**Successful**</span>

If the robot is actively working on the task but has neither gotten stuck nor completed the task, the robot's internal state is: <span style="color:blue">**Progressing**</span>

In this notebook, you will be asked to run through <span style="color:yellow">**3 sections**</span>. In each section, a virtual robot will play a sound. After listening to the sound, you will be asked to select which state you think the virtual robot is in. Your options are: <span style="color:Red">**Stuck**</span>, <span style="color:green">**Successful**</span>, <span style="color:blue">**Progressing**</span>, and <span style="color:purple">**Not Sure**</span>.

With each answer, you will also rate how confident you are in your response, on a scale from 1 to 10.

This process will repeat several times while a learning algorithm runs in the background. <span style="color:yellow">**If you have any questions, simply ask your study moderator**</span>. Have fun!

## SECTION 1

Start by entering your user ID. 
	
<span style="color:yellow">**Click on the first cell below & hit 'shift + enter'...**</span>

In [None]:
current_user_ID_str = mischelp.get_user_ID(parent_dir=PWD, num_of_states=num_of_states)

In [None]:
mischelp.get_user_accuracy(sound_obj_array=sound_obj_array_A, lib_str=library_A, sect_str="sect1", user_ID_str=current_user_ID_str, num_of_states=num_of_states, states_array=np.ndarray(num_of_states, dtype=object), 
					  state_descriptions=state_descriptions, param_disc=param_disc, load_file="pilotset")

In [None]:
mischelp.get_user_accuracy(sound_obj_array=sound_obj_array_B, lib_str=library_B, sect_str="sect1", user_ID_str=current_user_ID_str, num_of_states=num_of_states, states_array=np.ndarray(num_of_states, dtype=object), 
					  state_descriptions=state_descriptions, param_disc=param_disc, load_file="pilotset")

## Section 2

### Section 2X

<span style="color:yellow">**Click on the first cell below & hit 'shift + enter'...**</span>

### Section 2O

<span style="color:yellow">**Click on the first cell below & hit 'shift + enter'...**</span>

In [None]:
# Initializations:
time_step = 0	 		# Initialize time_step to zero
budget = 50	   			# Max number of total iterations 

save_file = current_user_ID_str + "_sect2O"	# Filename to save Q-table on exit
load_file = None  		                    # None starts from a flat Q-table (set to "pilotset" to load the pilot Q-table instead)

printer = True			# Either set to True or None (prints hidden statements for debug)



# Initialize to center of mapping
param_1_idx = 1 
param_2_idx = 1
param_3_idx = 1


# Re-Initialize states array. Each state is initialized with a Q-table based on load_file
states_array = np.ndarray(num_of_states, dtype=object)
for state_idx in range(num_of_states):
	states_array[state_idx] = ucb1.robot_state(state_idx=state_idx, description=state_descriptions[state_idx], param_disc=param_disc, 
											   load_file=load_file, user_ID_str=current_user_ID_str)

	
for i in range(0, budget):
	
	current_state_index = np.random.randint(0, num_of_states) 		# Current actual state of the robot - change this to fluctuate during study

	if time_step == 0 and load_file is None:
		# First step with a flat Q-table: start from the center of the mapping
		param_1_idx = 1 
		param_2_idx = 1
		param_3_idx = 1
	else:
		# Select new params
		param_1_idx, param_2_idx, param_3_idx = states_array[current_state_index].action_selection()

	time_step_str = f"{time_step:02}"
	print(f"Time Step: {time_step_str}")
	time_step += 1
	
	print("\n----------------------------------------------------------------")
	print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
	print("----------------------------------------------------------------\n")

	if printer:
		print("(Hidden):")
		print(f"Current actual state of robot: {current_state_index}\n")
		print(f"New Param INDICES (not direct values): \nP1: {param_1_idx} (Beats per Minute - BPM) \nP2: {param_2_idx} (Beeps per Loop - BPL) \nP3: {param_3_idx} (Amplitude of Pitch Change)\n")


	# Play the desired mp3 file, probe user based on sound, then update the Q-Value look-up table...

	# Probe user for perceived state & confidence in their response
	probed_state_index, probed_confidence = sound_obj_array_A[param_1_idx, param_2_idx, param_3_idx].probe(state_descriptions)

	# Update N for audio obj
	sound_obj_array_A[param_1_idx, param_2_idx, param_3_idx].update()

	# Calculate uncertainty signal (U_t) based on N and time_step
	uncertainty_signal = sound_obj_array_A[param_1_idx, param_2_idx, param_3_idx].uncertainty(time_step)

	# For each state, calculate the respective reward signal (R)
	for state_idx in range(num_of_states):

		if probed_state_index == len(state_descriptions) - 1:
			reward_signal = 0.0

		else:
			# +1 if the user's answer matches this state, -1 otherwise
			if probed_state_index == state_idx:
				correct_multiplier = 1.0
			else:
				correct_multiplier = -1.0

			# This is the reward signal R
			reward_signal = correct_multiplier * probed_confidence

		# Calculate new Q_t = {[(1 - 1/n) * Q_t-1] + [(1/n) * R]} + U_t   ~  UCB1 algorithm update equation
		# Takes the mean of previously observed reward and new reward, adding on an uncertainty term
		Q_value = ((1 - 1.0/sound_obj_array_A[param_1_idx, param_2_idx, param_3_idx].n) * states_array[state_idx].action_value_lookup[param_1_idx, param_2_idx, param_3_idx] + (1.0/sound_obj_array_A[param_1_idx, param_2_idx, param_3_idx].n) * reward_signal) + uncertainty_signal

		# Update value in lookup table for state S with new Q_t
		# Clip with np.clip so that Q-values in the table stay within [-10, +10]
		states_array[state_idx].action_value_lookup[param_1_idx, param_2_idx, param_3_idx] = np.clip(Q_value, -10, 10)

		if printer:
			print("\n\n----------------------------------------------------------------\n")
			print("(Hidden):")
			print(f"Uncertainty_signal (U):\t {uncertainty_signal}")
			print(f"Reward_signal (R):\t {reward_signal}")
			print(f"New action value (Q):\t {Q_value}")
			print(f"Q-table after update for state {state_idx}:\n")
			print(states_array[state_idx].action_value_lookup)

		
		np.save("user_data/user_" + current_user_ID_str + "/arrays/" + save_file + "_step" + time_step_str + "_st" + str(state_idx) + ".npy", states_array[state_idx].action_value_lookup)

		time.sleep(1) # Put here to make UI a bit nicer 

## Closing Survey

Thank you for completing this Jupyter Notebook. Please click the following link to answer a short post-study questionnaire.

<span style="color:yellow">**Post-study Questionnaire:**</span> https://forms.gle/K6RnncY82vSVdyE38   (Participant ID Required)

### NOTES & DEBUG 

In [None]:
%whos

Creating buttons and widgets: https://medium.com/@technologger/how-to-interact-with-jupyter-33a98686f24e
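
For debugging, the per-state update from the Section 2O loop can be checked in isolation. The sketch below restates the reward rule and the update `Q_t = (1 - 1/n) * Q_{t-1} + (1/n) * R + U_t` as plain functions. The function names are hypothetical. The bonus `sqrt(c * ln(t) / n)` is the textbook UCB1 form and is only an assumption here, since the actual `audio_object.uncertainty` implementation is not shown in this notebook.

```python
import math

import numpy as np


def ucb1_bonus(n, t, c=2.0):
    """Textbook UCB1 exploration bonus (assumed form, may differ from audio_object.uncertainty)."""
    return math.sqrt(c * math.log(t) / n)


def reward_signal(probed_state, state_idx, confidence, none_idx):
    """Reward R for one state: 0 for 'None of the above', else +/- the user's confidence."""
    if probed_state == none_idx:
        return 0.0
    return float(confidence) if probed_state == state_idx else -float(confidence)


def update_q(q_prev, reward, n, bonus, q_min=-10.0, q_max=10.0):
    """Running mean of rewards plus an uncertainty bonus, clipped to [q_min, q_max]."""
    q_new = (1.0 - 1.0 / n) * q_prev + (1.0 / n) * reward + bonus
    return float(np.clip(q_new, q_min, q_max))


# Example: the sound has been played twice (n=2), the user answered state 1 with confidence 8.
r = reward_signal(probed_state=1, state_idx=1, confidence=8, none_idx=3)
print(update_q(q_prev=4.0, reward=r, n=2, bonus=0.0))
```

Because the bonus is added to the stored value on every update, as in the loop above, the table holds the mean reward plus the bonus rather than the mean alone. Keep that in mind when reading the printed Q-tables.
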