Cleanup for workshop camera ready.
Sam Bowman committed Jun 18, 2015
1 parent 1abcb7d commit adabdda
Showing 6 changed files with 57 additions and 54 deletions.
2 changes: 1 addition & 1 deletion writing/F'14 paper/cameraready/intro.tex
@@ -1,6 +1,6 @@
\section{Introduction}\label{sec:intro}

Tree-structured recursive neural network models (TreeRNNs; \citealt{goller1996learning}) for sentence meaning
Tree-structured recursive neural network models (TreeRNNs; \citealt{goller1996learning,socher2011semi}) for sentence meaning
have been successful in an array of sophisticated language tasks,
including sentiment analysis \cite{socher2011semi,irsoydeep},
image description \cite{sochergrounded}, and paraphrase detection
54 changes: 27 additions & 27 deletions writing/F'14 paper/cameraready/join.tex
@@ -32,7 +32,7 @@ \section{Reasoning about semantic relations}\label{sec:join}
full set of such sound inferences on pairs of premise relations is depicted in
Table~\ref{tab:jointable}. Though these basic inferences do not involve compositional
sentence representations, any successful reasoning using compositional representations
will rely on the ability to perform sound inferences of this kind, so our first experiment studies how well each model can learn to perform them in isolation.
will rely on the ability to perform sound inferences of this kind in order to use unseen relational facts within larger derivations. Our first experiment studies how well each model can learn to perform them in isolation.

% about the relations themselves that do not depend on the
% internal structure of the things being compared. For example, given
@@ -46,29 +46,6 @@ \section{Reasoning about semantic relations}\label{sec:join}
% $a \natneg b$ and $b~|~c$ then $a \sqsupset c$.


\paragraph{Experiments}
We begin by creating a world model
on which we will base the statements in the train and test sets.
This takes the form of a small Boolean structure in which terms denote
sets of entities from a small domain. Fig.~\ref{lattice-figure}a
depicts a structure of this form with three entities ($a$, $b$, and $c$) and eight proposition terms ($p_1$--$p_8$). We then generate a
relational statement for each pair of terms in the model, as shown in Fig.~\ref{lattice-figure}b.
We divide these statements evenly into train and test sets, and delete those test set
examples that cannot be proven from the training examples, since for these even an ideal system lacks the information needed to choose a correct label.
In each experimental run, we create a model with 80 terms over a domain of 7 elements, yielding a training set of 3200 examples and a test set of
2960 examples.

We trained models with both the NN and NTN comparison functions on these
data sets.\footnote{Since this task relies crucially on the learning of a pair of vectors, no simpler version of our model is a viable baseline.} %+%
In both cases, the models are implemented as
described in \S\ref{methods}, but since the items being compared
are single terms rather than full tree structures, the composition
layer is not used, and the two models are not recursive. We simply present
the models with the (randomly initialized) embedding vectors for each
of two terms, ensuring that the model has no information about the terms
being compared except for the relations between them that appear in training.


\begin{figure}[t]
\centering
\begin{subfigure}[t]{0.45\textwidth}
@@ -106,7 +83,7 @@ \section{Reasoning about semantic relations}\label{sec:join}

\labelednode{2.5}{0.5}{}{}
\end{picture}}
\caption{Example boolean structure. The terms $p_1$--$p_8$ name the sets. Not all sets have names, and some sets have multiple names, so that learning $\nateq$ is non-trivial.}
\caption{Example boolean structure, shown with edges indicating inclusion. The terms $p_1$--$p_8$ name the sets. Not all sets have names, and some sets have multiple names, so that learning $\nateq$ is non-trivial.}
\end{subfigure}
\qquad\small
\begin{subfigure}[t]{0.43\textwidth}
@@ -126,7 +103,7 @@ \section{Reasoning about semantic relations}\label{sec:join}
\end{tabular}

\caption{A few examples of atomic statements about the
model. Test statements that are not provable from the training data shown are
model depicted above. Test statements that are not provable from the training data shown are
crossed out.}
\end{subfigure}
\caption{Small example structure and data for learning relation composition.}
@@ -150,14 +127,37 @@ \section{Reasoning about semantic relations}\label{sec:join}
\label{joinresultstable}
\end{table}

\paragraph{Experiments}
We begin by creating a world model
on which we will base the statements in the train and test sets.
This takes the form of a small Boolean structure in which terms denote
sets of entities from a small domain. Fig.~\ref{lattice-figure}a
depicts a structure of this form with three entities ($a$, $b$, and $c$) and eight proposition terms ($p_1$--$p_8$). We then generate a
relational statement for each pair of terms in the model, as shown in Fig.~\ref{lattice-figure}b.
We divide these statements evenly into train and test sets, and delete those test set
examples that cannot be proven from the training examples, since for these even an ideal system lacks the information needed to choose a correct label.
In each experimental run, we create a model with 80 terms over a domain of 7 elements, yielding a training set of 3200 examples and a test set of
2960 examples.
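
As a concrete illustration of this data-generation step, the seven relations of the underlying logic can be computed directly from the sets that the terms denote. The sketch below is ours, not the paper's code: the set assignments and ASCII relation symbols (standing in for $\nateq$, $\natfor$, $\natrev$, $\natneg$, $|$, $\natcov$, $\natind$) are illustrative assumptions.

```python
from itertools import combinations

def relation(x, y, domain):
    """MacCartney-style relation between two sets over a shared domain."""
    x, y = frozenset(x), frozenset(y)
    if x == y:                            return '='   # equivalence
    if x < y:                             return '<'   # forward entailment
    if x > y:                             return '>'   # reverse entailment
    if not (x & y) and x | y == domain:   return '^'   # negation (disjoint, exhaustive)
    if not (x & y):                       return '|'   # alternation (disjoint only)
    if x | y == domain:                   return 'v'   # cover (exhaustive only)
    return '#'                                          # independence

# Toy structure: hypothetical term-to-set assignments over a small domain.
D = frozenset({'a', 'b', 'c'})
sets = {'p1': {'a'}, 'p2': {'b'}, 'p3': {'a', 'b'}, 'p4': {'b', 'c'}}

# One relational statement per pair of terms, as in the experiment.
data = [(t1, t2, relation(sets[t1], sets[t2], D))
        for t1, t2 in combinations(sets, 2)]
```

A train/test split of `data`, followed by deleting unprovable test examples, would complete the procedure described above.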

We trained models with both the NN and NTN comparison functions on these
data sets.\footnote{Since this task relies crucially on the learning of a pair of vectors, no simpler version of our model is a viable baseline.} %+%
In both cases, the models are implemented as
described in \S\ref{methods}, but since the items being compared
are single terms rather than full tree structures, the composition
layer is not used, and the two models are not recursive. We simply present
the models with the (randomly initialized) embedding vectors for each
of two terms, ensuring that the model has no information about the terms
being compared except for the relations between them that appear in training.
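
This non-recursive setup can be sketched as follows. The dimensions, the tanh nonlinearity, and the softmax classifier are illustrative assumptions on our part, not the paper's exact configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_terms, n, k, n_rel = 80, 16, 32, 7   # terms, embedding dim, hidden dim, relations

# Randomly initialized term embeddings: the model's only knowledge of a term
# is whatever training on observed relation labels pushes into its vector.
E = rng.normal(scale=0.1, size=(n_terms, n))
W = rng.normal(scale=0.1, size=(k, 2 * n))   # NN comparison weights
b = np.zeros(k)
U = rng.normal(scale=0.1, size=(n_rel, k))   # relation classifier

def nn_compare(i, j):
    """NN comparison layer over a pair of term vectors, then a softmax
    over the seven relation labels."""
    h = np.tanh(W @ np.concatenate([E[i], E[j]]) + b)
    z = U @ h
    p = np.exp(z - z.max())
    return p / p.sum()

probs = nn_compare(0, 1)   # predicted distribution over relations for one pair
```

The NTN variant would add a bilinear tensor term to `h`, as described in \S\ref{methods}.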


\paragraph{Results}
The results (Table \ref{joinresultstable}) show that the NTN is able to accurately encode the relations between the terms in the geometric relations between their vectors,
and can then use that information to recover relations that
are not overtly included in the training data. The NN also generalizes fairly well,
but makes enough errors that it remains an open question whether
it is capable of learning representations with these properties.
It is not possible for us to rule out the possibility that different optimization techniques or
further hyperparameter tuning could lead an NN model to succeed here.
finer-grained hyperparameter tuning could lead an NN model to succeed.

As an example from our test data, both models correctly labeled $p_1 \natfor p_3$, potentially learning from the training examples $\{p_1 \natfor p_{51},~p_3 \natrev p_{51}\}$ or $\{p_1\natfor p_{65},~p_3 \natrev p_{65} \}$. On another example involving comparably frequent relations, the NTN correctly labeled $p_6 \natrev p_{24}$, likely on the basis of the training examples $\{p_6 \natcov p_{28},~p_{28} \natneg p_{24}\}$, while the NN incorrectly assigned it $\natind$.

10 changes: 5 additions & 5 deletions writing/F'14 paper/cameraready/methods.tex
@@ -6,8 +6,8 @@ \section{Tree-structured neural networks} \label{methods}
compositionality}, which says that the meanings for complex
expressions are derived from the meanings of their parts
via specific composition functions \cite{Partee84,Janssen97}. In our
distributed setting, word meanings are embedding vectors of dimension $n$. A learned
composition function maps pairs of them to single phrase vectors of dimension $n$,
distributed setting, word meanings are embedding vectors of dimension $N$. A learned
composition function maps pairs of them to single phrase vectors of dimension $N$,
which can then be merged again to represent more complex
phrases, forming a tree structure. Once the entire sentence-level representation has been
derived at the top of the tree, it serves as a fixed-dimensional input for some subsequent layer function.
@@ -45,9 +45,9 @@ \section{Tree-structured neural networks} \label{methods}
Here, $\vec{x}^{(l)}$ and $\vec{x}^{(r)}$ are the column vector
representations for the left and right children of the node, and
$\vec{y}$ is the node's output. The TreeRNN concatenates them, multiplies
them by an $n \times 2n$ matrix of learned weights, and adds a bias $\vec{b}$.
them by an $N \times 2N$ matrix of learned weights, and adds a bias $\vec{b}$.
The TreeRNTN adds a learned full rank third-order tensor
$\mathbf{T}$, of dimension $n \times n \times n$, modeling
$\mathbf{T}$, of dimension $N \times N \times N$, modeling
multiplicative interactions between the child vectors.
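
The two composition functions just described can be sketched as follows; the tanh activation and the small value of $N$ are stand-in assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 16                                       # vector dimension

W = rng.normal(scale=0.1, size=(N, 2 * N))   # N x 2N composition weights
b = np.zeros(N)                              # bias
T = rng.normal(scale=0.1, size=(N, N, N))    # N x N x N tensor (TreeRNTN only)

def tree_rnn(xl, xr):
    """Plain TreeRNN composition: y = f(W [x_l; x_r] + b)."""
    return np.tanh(W @ np.concatenate([xl, xr]) + b)

def tree_rntn(xl, xr):
    """TreeRNTN adds the bilinear term x_l^T T x_r before the nonlinearity."""
    bilinear = np.einsum('i,kij,j->k', xl, T, xr)
    return np.tanh(bilinear + W @ np.concatenate([xl, xr]) + b)

# Bottom-up composition over the tree ((w1 w2) w3):
w1, w2, w3 = (rng.normal(size=N) for _ in range(3))
phrase = tree_rntn(tree_rntn(w1, w2), w3)
```

Because the output dimension matches the input dimension, the same function applies recursively at every node of the tree.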
The comparison layer uses the same layer function as the
composition layers (either an NN layer or an NTN layer) with
@@ -82,5 +82,5 @@ \section{Tree-structured neural networks} \label{methods}
as the harmonic mean of average precision and average recall, both computed
for all classes for which there is test data, setting precision to 0
where it is not defined.}
Source code and generated data will be released after the review period.
Source code and generated data can be downloaded from \url{http://stanford.edu/~sbowman/}.

2 changes: 1 addition & 1 deletion writing/F'14 paper/cameraready/quantifiers.tex
@@ -56,7 +56,7 @@ \section{Reasoning with quantifiers and negation}\label{sec:quantifiers}
% yields 66k sentence pairs. Some examples of these data are provided
% in Table~\ref{examplesofdata}.

In each run, we randomly partition the set of valid \textit{single sentences} into train and test, and then label all of the pairs from within each set to generate a training set of 27k pairs and a test set of 7k pairs. Because the model doesn't see the test sentences at training time, it cannot directly use the kind of reasoning described in \S\ref{sec:join} (treating sentences as unanalyzed symbols), and must instead infer the word-level relations and learn a complete reasoning system over them for our logic.
In each run, we randomly partition the set of valid \textit{single sentences} into train and test, and then label all of the pairs from within each set to generate a training set of 27k pairs and a test set of 7k pairs. Because the model doesn't see the test sentences at training time, it cannot directly use the kind of reasoning described in \S\ref{sec:join} at the sentence level (by treating sentences as unanalyzed symbols), and must instead jointly learn the word-level relations and a complete reasoning system over them for our logic.

We use the same summing baseline as in \S\ref{sec:recursion}.
The highly consistent sentence structure in this experiment means that this model
2 changes: 1 addition & 1 deletion writing/F'14 paper/cameraready/recursion.tex
@@ -77,7 +77,7 @@ \section{Recursive structure}\label{sec:recursion}
$\plneg\, (\plneg p_1 \pland \plneg p_2)$ & $\nateq$ & $(p_1 \plor p_2)$ \\
\bottomrule
\end{tabular}
\caption{Examples of the type of statements used for training and testing. These are relations between
\caption{Short examples of the type of statements used for training and testing. These are relations between
well-formed formulae, computed in terms of sets of satisfying
interpretation functions $\sem{\cdot}$.}\label{tab:plexs}
\end{subtable}
