english correction in planning
jgrizou committed Dec 4, 2014
1 parent a8a4229 commit bfefb10
Showing 2 changed files with 21 additions and 21 deletions.
30 changes: 15 additions & 15 deletions chapters/planning/planning.tex
@@ -10,7 +10,7 @@ \chapter{Planning upon Uncertainty}

In the previous chapter, we presented our algorithm for solving a task from unlabeled human instruction signals. We have seen that the performance of our system is affected by the action selection method used by our robot. In this section, we investigate how the agent should plan its actions to improve its learning efficiency. To do so, our agent will look for actions that disambiguate between hypotheses, i.e. actions that reduce the uncertainty about which hypothesis is the correct one.

We start by explaining what methods and measures of uncertainty are used by a system that has access the meanings of the teaching signals. We then provide an intuitive explanation of the additional sources of uncertainty inherent to our problem. We will see that this problem is linked to the symmetry properties described in chapter~\ref{chapter:lfui:symmetries}. We then propose two ways of estimating the uncertainty, one on the signal space and one projected on the meaning space. We finally present simulated experiments showing that our measure of uncertainty allows the robot to plan its actions in order to disambiguate faster between hypotheses. These results consider datasets of different qualities and dimensionalities; we will see that the performance of the system is affected more by the quality of the data than by their dimensionality.
We start by explaining what methods and measures of uncertainty are used by a system that has access to the meanings of the teaching signals. We then provide an intuitive explanation of the additional sources of uncertainty inherent to our problem. We will see that this problem is linked to the symmetry properties described in chapter~\ref{chapter:lfui:symmetries}. We then propose two ways of estimating the uncertainty, one on the signal space and one projected on the meaning space. We finally present simulated experiments showing that our measure of uncertainty allows the robot to plan its actions in order to disambiguate faster between hypotheses. These results consider datasets of different qualities and dimensionalities; we will see that the performance of the system is affected more by the quality of the data than by their dimensionality.

% An important feature of our algorithm is its ability to detect when the teaching signals are not of good enough quality; in such cases our algorithm will not select between hypotheses, as it cannot discriminate between tasks.

@@ -25,7 +25,7 @@ \section{Uncertainty for known signal to meaning mapping}

% and that it still has to identify the correct task among a finite set of tasks and has access to the interaction frame,

If the mapping between instruction signals and their meanings is provided to the machine, the learning process is rather trivial. The robot should only compares, for each task, whether the meaning received from the human matches the meaning predicted by the frame. If the meanings match, the probability of the task is increased; if they do not match, it is decreased.
If the mapping between instruction signals and their meanings is provided to the machine, the learning process is rather trivial. The robot should only compare, for each task, whether the meaning received from the human matches the meaning predicted by the frame. If the meanings match, the probability of the task is increased; if they do not match, it is decreased.

To accelerate its learning progress, the robot must therefore seek state-action pairs that maximally disambiguate between hypotheses. For example, if for one given state-action pair, half of the hypotheses expect a signal meaning ``correct'' while the other half expect a signal meaning ``incorrect'', there is high uncertainty on that action. By performing this action in that state, once the user provides feedback, the system can rule out half of the hypotheses.
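
To make this concrete, here is a minimal sketch of such an update; the \texttt{frame()} accessor and the noise handling are illustrative assumptions, not the thesis implementation:
\begin{verbatim}
def update_beliefs(beliefs, frame, state, action, received_meaning,
                   noise=0.1):
    # Scale each task hypothesis by whether the received meaning
    # matches the meaning predicted by the interaction frame.
    for task in beliefs:
        expected = frame(task, state, action)  # 'correct' / 'incorrect'
        match = (expected == received_meaning)
        beliefs[task] *= (1.0 - noise) if match else noise
    total = sum(beliefs.values())
    return {task: p / total for task, p in beliefs.items()}
\end{verbatim}
With \texttt{noise} set to zero, a single mismatch rules a hypothesis out entirely, which is how half of the hypotheses are eliminated in the example above.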

@@ -37,7 +37,7 @@ \section{Uncertainty for known signal to meaning mapping}

% by for example finding the optimal policy based on the uncertainty map (using for example reinforcement learning methods); in that scenario the agent selects the action that maximizes uncertainty reduction on the task in the long term.

Measuring uncertainty on the task is the basic principle of active learning for inverse reinforcement learning problems \cite{macl09airl}. The idea is to take a query-by-committee approach, where each member of the committee, i.e. each task hypothesis $\xi_k$, votes according to its weight in the committee, i.e. its respective probability $p(\xi_k)$..
Measuring uncertainty on the task is the basic principle of active learning for inverse reinforcement learning problems \cite{macl09airl}. The idea is to take a query-by-committee approach, where each member of the committee, i.e. each task hypothesis $\xi_k$, votes according to its weight in the committee, i.e. its respective probability $p(\xi_k)$.
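
As a rough sketch of this committee vote (the \texttt{optimal\_action()} interface is an assumption for illustration):
\begin{verbatim}
import numpy as np

def action_votes(state, hypotheses, probs, n_actions):
    # Each task hypothesis votes for its optimal action in this state,
    # weighted by its probability p(xi_k); a flat vote vector means the
    # committee disagrees, i.e. the state is informative to visit.
    votes = np.zeros(n_actions)
    for xi, p_xi in zip(hypotheses, probs):
        votes[xi.optimal_action(state)] += p_xi
    return votes / votes.sum()
\end{verbatim}
The vector defined next formalizes this accumulation.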

We can define a vector that accumulates the weighted optimal actions of each hypothesis:
%
@@ -73,7 +73,7 @@ \section{Where is the uncertainty?}

In order to exemplify the specificity of the uncertainty for our problem, we rely again on our T world scenario and compare the effects of different action selection strategies. We recall that the teacher wants the robot to reach the left edge of the T (G1).

If the agent knew how to interpret the teaching signals, i.e. which signal corresponds to ``correct'' or ``incorrect'' feedback, the optimal actions to discriminate G1 and G2 is to move from right to left in the top part of the T. However, as the classifier is not given, we build a different model for each hypothesis (see Figure~\ref{fig:planningrightleft}). As a result, we end up with symmetric interpretations of the signals, which are equally valid and do not allow differentiating between hypotheses.
If the agent knew how to interpret the teaching signals, i.e. which signal corresponds to ``correct'' or ``incorrect'' feedback, the optimal action to discriminate G1 and G2 is to move from right to left in the top part of the T. However, as the classifier is not given, we build a different model for each hypothesis (see Figure~\ref{fig:planningrightleft}). As a result, we end up with symmetric interpretations of the signals, which are equally valid and do not allow differentiating between hypotheses.

\begin{figure}[!htbp]
\centering
@@ -298,7 +298,7 @@ \subsubsection*{Equations}

This measure has the important advantage of using the same equations as the ones used for computing the likelihood of each task (chapter~\ref{chapter:lfui:likelihood}). Additionally, we do not have to compute the similarity between continuous distributions, and only rely on the classifiers, which are already computed. We only need to compute the predicted labels ($l^c$) associated to the sampled signals ($e$) once per hypothesis. Then, to compute the full uncertainty map for each state and action pair, we have to compare these predicted labels with the expected labels ($l^f$) from each state-action pair and each hypothesis.

We note $J^{\xi_t}(s,a,e) = p(l^c = l^f | s, a, e, \theta_{\xi_t}, \xi_t)$, which is Equation~\ref{eq:matchingoverfitting}) given the classifier $\theta_{\xi_t}$ associated to task $\xi_t$ and a particular state, action, and signal. We note $J^{\xi}(s,a,e)$ the vector $[J^{\xi_1}(s,a,e), \ldots, J^{\xi_T}(s,a,e)]$, and $W^{\xi} = [W^{\xi_1}, \ldots, W^{\xi_T}]$ the weights associated to each hypothesis. Such weights can be the ones defined in Equation~\ref{eq:probapairwise} (i.e. the minimum of pairwise normalized likelihoods) or the probabilities from Equation~\ref{eq:probanormalize} (i.e. the normalized likelihoods).
We note $J^{\xi_t}(s,a,e) = p(l^c = l^f | s, a, e, \theta_{\xi_t}, \xi_t)$, which is Equation~\ref{eq:matchingoverfitting} given the classifier $\theta_{\xi_t}$ associated to task $\xi_t$ and a particular state, action, and signal. We note $J^{\xi}(s,a,e)$ the vector $[J^{\xi_1}(s,a,e), \ldots, J^{\xi_T}(s,a,e)]$, and $W^{\xi} = [W^{\xi_1}, \ldots, W^{\xi_T}]$ the weights associated to each hypothesis. Such weights can be the ones defined in Equation~\ref{eq:probapairwise} (i.e. the minimum of pairwise normalized likelihoods) or the probabilities from Equation~\ref{eq:probanormalize} (i.e. the normalized likelihoods).

The uncertainty of one state-action pair $(s,a)$ given a signal $e$ is computed as the weighted variance of the joint probabilities:

@@ -318,7 +318,7 @@ \subsubsection*{Equations}
\end{eqnarray}
with $p(e)$ assumed uniform.
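
A possible reading of this computation in code, under the same uniform-$p(e)$ assumption; the \texttt{match\_prob()} interface standing for $J^{\xi_t}(s,a,e)$ is ours:
\begin{verbatim}
import numpy as np

def uncertainty(s, a, signals, hypotheses, weights):
    # U(s,a): expectation over the sampled signals e of the weighted
    # variance of J^{xi_t}(s,a,e) across the task hypotheses.
    w = np.asarray(weights, dtype=float)
    w /= w.sum()
    total = 0.0
    for e in signals:  # p(e) assumed uniform over the samples
        J = np.array([h.match_prob(s, a, e) for h in hypotheses])
        mean = w @ J                    # weighted mean over hypotheses
        total += w @ (J - mean) ** 2    # weighted variance
    return total / len(signals)
\end{verbatim}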

Signal samples ($e$) could be sampled randomly in the all feature space. However, there is a high risk of taking non-relevant samples, as well as likely practical computational problems for some classifiers. In practice, it is better to sample some signals from our past history of interaction, which may lead to overfitting problems that can be solved by using a cross-validation procedure.
Signal samples ($e$) could be sampled randomly in all the feature space. However, there is a high risk of taking non-relevant samples, as well as likely practical computational problems for some classifiers. In practice, it is better to sample some signals from our past history of interaction, which may lead to overfitting problems that can be solved by using a cross-validation procedure.

Our measure of uncertainty $U(s,a)$ will be higher when, for a given state-action pair, there is a high incongruity of expectations between the hypotheses, weighted according to the probability of each hypothesis. This measure is then used as a classical exploration bonus. We provide an example of planning using this method later in this chapter.
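
For instance, a greedy use of this bonus, reusing the \texttt{uncertainty()} sketch above, could be as simple as:
\begin{verbatim}
def select_action(s, actions, signals, hypotheses, weights):
    # Greedy exploration bonus: act where the hypotheses disagree most.
    return max(actions,
               key=lambda a: uncertainty(s, a, signals,
                                         hypotheses, weights))
\end{verbatim}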

@@ -328,9 +328,9 @@ \subsubsection*{Equations}

\subsection{Why not build a model first}

A usual question concerning Figure~\ref{fig:planningupdown} is: why don't we first select state-action pairs which lead to an unequivocal interpretation of the signals? Indeed, this would allow us to first build a database of known signal-label pairs. The resulting classifiers could then be use to classify further teaching signals, as in a calibration procedure.
A usual question concerning Figure~\ref{fig:planningupdown} is: why don't we first select state-action pairs which lead to an unequivocal interpretation of the signals? Indeed, this would allow us to first build a database of known signal-label pairs. The resulting classifiers could then be used to classify further teaching signals, as in a calibration procedure.

Obviously this is not always possible. For example, if we add a third hypothesis G3, located at the bottom of the T trunk, it is not more possible to find actions leading to an unequivocal interpretation are the received signal. Neither the left and right actions (Figure~\ref{fig:planning3hyprightleft}), nor the up and down actions (Figure~\ref{fig:planning3hypupdown}) alone allow for an unequivocal interpretation of the teaching signals. However, taking all the actions and exploring all the state space still highlights hypothesis 1 (G1) has being the goal state the user has in mind (Figure~\ref{fig:planning3hyp}).
Obviously this is not always possible. For example, if we add a third hypothesis G3, located at the bottom of the T trunk, it is no more possible to find actions leading to an unequivocal interpretation of the received signal. Neither the left and right actions (Figure~\ref{fig:planning3hyprightleft}), nor the up and down actions (Figure~\ref{fig:planning3hypupdown}) alone allow for an unequivocal interpretation of the teaching signals. However, taking all the actions and exploring all the state space still highlights hypothesis 1 (G1) as being the goal state the user has in mind (Figure~\ref{fig:planning3hyp}).

In all the experiments presented in this thesis, there are no state-action pairs allowing for an unequivocal interpretation of the teaching signal.

@@ -346,14 +346,14 @@ \subsection{Why not building model first}
\begin{figure}[!htbp]
\centering
\includegraphics[width=\threetworldsize\columnwidth]{\visualspdf/planning/Tworld_feedback_3hyp_up_down_no_bump.pdf}
\caption{Interpretation hypotheses made by the agent according to G1 (left), G2 (right), and G3 (middle). The agent performs only up and down actions. The labels associated to G1 and G2 are similar but he labels associated to G3 are symmetric. Up and down actions do not create an unequivocal interpretation of the signals considering these three hypotheses. Moreover, up and down actions do not allow discarding any of the hypotheses.}
\caption{Interpretation hypotheses made by the agent according to G1 (left), G2 (right), and G3 (middle). The agent performs only up and down actions. The labels associated to G1 and G2 are similar but the labels associated to G3 are symmetric. Up and down actions do not create an unequivocal interpretation of the signals considering these three hypotheses. Moreover, up and down actions do not allow discarding any of the hypotheses.}
\label{fig:planning3hypupdown}
\end{figure}

\begin{figure}[H]
\centering
\includegraphics[width=\threetworldsize\columnwidth]{\visualspdf/planning/Tworld_feedback_3hyp.pdf}
\caption{Interpretation hypotheses made by the agent according to G1 (left), G2 (right), and G3 (middle). The agent performs all possible actions. The labels associated to G1 are more coherent with the spacial organization of the data than the labels associated to G2 and G3, which tells us G1 is the task the user has in mind.}
\caption{Interpretation hypotheses made by the agent according to G1 (left), G2 (right), and G3 (middle). The agent performs all possible actions. The labels associated to G1 are more coherent with the spatial organization of the data than the labels associated to G2 and G3, which tells us G1 is the task the user has in mind.}
\label{fig:planning3hyp}
\end{figure}

@@ -402,7 +402,7 @@ \subsection{Signal properties and classifier}

where $\theta$ represents the ML estimates (mean $\mu_l$ and covariance $\Sigma_l$ for each class $l$) required to estimate the marginal under the Jeffreys prior, $n$ is the number of signals, and $d$ is the dimensionality of a signal feature vector.
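
For intuition, a plain maximum-likelihood sketch of such a classifier; it ignores the Jeffreys-prior marginalization for brevity, and the API is ours:
\begin{verbatim}
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(E, labels):
    # ML estimates (mu_l, Sigma_l) of one Gaussian per label class.
    E, labels = np.asarray(E), np.asarray(labels)
    return {l: (E[labels == l].mean(axis=0),
                np.cov(E[labels == l], rowvar=False))
            for l in set(labels)}

def p_label_given_signal(e, theta):
    # The Bayes rule below, with uniform label priors p(l = l_i).
    dens = {l: multivariate_normal.pdf(e, mean=mu, cov=cov)
            for l, (mu, cov) in theta.items()}
    z = sum(dens.values())
    return {l: d / z for l, d in dens.items()}
\end{verbatim}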

Finally, to compute the probability of a label given a signal, we use the bayes rule as follows:
Finally, to compute the probability of a label given a signal, we use the Bayes rule as follows:
%
\begin{eqnarray}
p(l = l_i|e,\theta) &=& \frac{p(e|l = l_i, \theta)p(l = l_i)}{\sum_{k = 1,\ldots, L}{p(e|l = l_k,\theta)p(l = l_k)}}\nonumber \\
@@ -415,11 +415,11 @@ \subsection{Task Achievement}

We use Equation~\ref{eq:matchingfiltercrossvalidation} to compute the likelihood of each task, using a 10-fold cross-validation to compute the confusion matrix. This implies we train 250 classifiers at each iteration. To compute the probability of each task, we will rely on the minimum of pairwise normalized likelihoods measure as defined in Equation~\ref{eq:probapairwise}.
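
A sketch of the cross-validated labeling for one hypothesis, which accounts for ten of those classifiers per hypothesis and iteration; \texttt{fit()} and \texttt{predict()} are placeholders for the Gaussian classifier above:
\begin{verbatim}
import numpy as np
from sklearn.model_selection import KFold

def cv_predicted_labels(E, labels, fit, predict, n_folds=10):
    # Each signal is labeled by a classifier that never saw it,
    # which limits the overfitting discussed earlier.
    E, labels = np.asarray(E), np.asarray(labels)
    pred = np.empty(len(labels), dtype=object)
    for train, test in KFold(n_splits=n_folds).split(E):
        theta = fit(E[train], labels[train])
        pred[test] = predict(E[test], theta)
    return pred  # compared to expected labels -> confusion matrix
\end{verbatim}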

A task is considered completed when the confidence level $\beta$ as been reached for this task and the agent is located at the goal state associated with the task. If the corresponding state is the one intended by the user, it is a success. Regardless of the success or failure of the first task, the user selects a new task, i.e. a new goal state, randomly. The agent resets the task likelihoods, propagates the previous task labels to all hypotheses, and the teaching process starts again. At no point does the agent have access to a measure of its performance; it can only refer to the unlabeled feedback signals from the user.
A task is considered completed when the confidence level $\beta$ has been reached for this task and the agent is located at the goal state associated with the task. If the corresponding state is the one intended by the user, it is a success. Regardless of the success or failure of the first task, the user selects a new task, i.e. a new goal state, randomly. The agent resets the task likelihoods, propagates the previous task labels to all hypotheses, and the teaching process starts again. At no point does the agent have access to a measure of its performance; it can only refer to the unlabeled feedback signals from the user.
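
The completion test of this protocol might look like the following sketch (\texttt{goal\_of} and the belief dictionary are illustrative):
\begin{verbatim}
def completed_task(beliefs, agent_state, goal_of, beta):
    # A task counts as completed once its confidence exceeds beta
    # AND the agent stands at the goal state associated with it.
    best = max(beliefs, key=beliefs.get)
    if beliefs[best] >= beta and agent_state == goal_of[best]:
        return best
    return None
\end{verbatim}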

\subsection{Evaluation scenarios}

Using our artificial datasets, three different evaluations are performed: \begin{inparaenum}[(i)] \item the performance of our proposed planning strategy versus a) random action selection, b) greedy action selection, and c) a task-only uncertainty based method; \item the time required by the agent to complete the first task (i.e. to reach the first target with confidence), and \item the number of tasks that can be completed in 500 iterations. \end{inparaenum}
Using our artificial datasets, three different evaluations are performed: \begin{inparaenum}[(i)] \item the performance of our proposed planning strategy versus a) random action selection, b) greedy action selection, and c) the task-only uncertainty based method; \item the time required by the agent to complete the first task (i.e. to reach the first target with confidence), and \item the number of tasks that can be completed in 500 iterations. \end{inparaenum}

\subsection{Settings}

@@ -438,7 +438,7 @@ \section{Illustration of the grid world scenario}
\begin{figure}[!htbp]
\centering
\includegraphics[width=0.85\columnwidth]{\visualspdf/gridworld/gridworld_feedback.pdf}
\caption{A schematic view of a 3x3 grid world scenario. There are nine possible hypotheses and the agent is acting randomly for this example. We show the results of the labeling process considering the feedback frame. The teacher is providing feedback with respect to hypothesis 1. The labeling process for hypothesis 1 is more coherent with the spacial organization of the data, which indicates it is the one taught by the user. Hypothesis 9 has symmetric properties with hypothesis 1 but the use of the ``no move'' action allows breaking that symmetry.}
\caption{A schematic view of a 3x3 grid world scenario. There are nine possible hypotheses and the agent is acting randomly for this example. We show the results of the labeling process considering the feedback frame. The teacher is providing feedback with respect to hypothesis 1. The labeling process for hypothesis 1 is more coherent with the spatial organization of the data, which indicates it is the one taught by the user. Hypothesis 9 has symmetric properties with hypothesis 1 but the use of the ``no move'' action allows breaking that symmetry.}
\label{fig:planning:gridworldfeedback}
\end{figure}

@@ -501,7 +501,7 @@ \section{Discussion}

In this chapter, we presented a planning method that reduces the number of iterations needed to identify the correct task from unlabeled teaching signals. This method was based on assigning an uncertainty value to each state-action pair. By asking the agent to look for the most uncertain state-action pair, it can collect more useful data to disambiguate faster between the hypotheses. We identified two sources of uncertainty, one coming from the task and the other coming from the signal model associated to each task hypothesis. We presented two methods to measure this uncertainty. The first method measures the uncertainty on the expected signals between hypotheses. The second method measures uncertainty on the meaning space by making hypotheses about future observed signals.

We want to apply this algorithm to a more concrete scenario with real users. In the next chapter, we present a brain-computer interaction scenario following the reaching task presented in this section. But instead of using artificial data, we will investigate own our algorithm scale to brain signals, first in simulation and then during online experiments with real subjects.
We want to apply this algorithm to a more concrete scenario with real users. In the next chapter, we present a brain-computer interaction scenario following the reaching task presented in this section. But instead of using artificial data, we will investigate how our algorithm scales to the use of brain signals, first in simulation and then during online experiments with real subjects.

% The application of this work to BCI is a joint collaboration with I{\~n}aki Iturrate and Luis Montesano.
