# How are compositionality and mutual exclusivity related?
## Some experiments

### Reproducing original experiments
First, as a baseline, we look at the mutual exclusivity experiment from the original paper. I re-ran the experiment with only one change: instead of the input language having 4 symbols with 1 left out of the support set, I used a language with 5 symbols, again with 1 left out of the support set. Since this is a straightforward extension of the original experiment, we expect no change in the results.

In [3]:
!python train.py --episode_type ME --fn_out_model net_ME.tar

Results file already exists. Loading file and evaluating...
Loading model: out_models/net_ME.tar
 Loading epoch 10000 of 10000
EncoderMetaNet specs:
 nlayers=2
 embedding_dim=200
 dropout=0.5
 bi_encoder=True
 n_input_symbols=7
 n_output_symbols=7

AttnDecoderRNN specs:
 nlayers=2
 hidden_size=200
 dropout=0.5
 n_output_symbols=7

Acc Retrieval (train): 100.0
Acc Generalize (train): 100.0

Evaluation episode 0
  support items: 
     fep -> YELLOW
     dax -> GREEN
     lug -> BLUE
     zup -> PURPLE
  retrieval items; 100.0% correct
     fep -> YELLOW
     dax -> GREEN
     lug -> BLUE
     zup -> PURPLE
  generalization items; 100.0% correct
     fep wif wif fep fep -> YELLOW RED RED YELLOW YELLOW
     wif dax lug lug -> RED GREEN BLUE BLUE
     zup fep -> PURPLE YELLOW
     wif fep zup fep fep wif -> RED YELLOW PURPLE YELLOW YELLOW RED
     fep zup dax -> YELLOW PURPLE GREEN
     wif fep fep wif -> RED YELLOW YELLOW RED
     dax zup fep zup wif dax -> GREEN PURPLE YELLOW PURPLE RED G

As expected, the model is able to capture both mutual exclusivity and compositionality in this setting.

### Modifying the experimental paradigm
To study whether the model can learn compositionality even when mutual exclusivity does not hold, I created a new type of episode (`CompNoME`) that can test for this property. The number of symbols in the language remains the same as before, and the structure of the support set also remains the same -- 5 symbols, of which 4 are provided with their mapping as the support set.

The change is in the mapping for the 5th symbol, which is set to the same output as one of the 4 symbols in the support set. This mapping is then used to create the query examples as before. Since the method of creating query examples is unchanged, this experiment still tests whether the model can capture compositionality. However, since two symbols in the input language now map to the same output symbol, the mapping is no longer mutually exclusive.
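To make the episode construction concrete, here is a minimal sketch of how a `CompNoME`-style episode could be generated. It is not the code in `train.py`: the function name `make_comp_no_me_episode`, the parameters `n_queries` and `max_len`, and the symbol lists (copied from the logs in this notebook) are assumptions made for illustration only.

```python
import random

# Symbol inventories taken from the logs in this notebook (assumed, not read from train.py).
INPUT_SYMBOLS = ["dax", "lug", "wif", "zup", "fep"]
OUTPUT_SYMBOLS = ["RED", "GREEN", "BLUE", "YELLOW", "PURPLE"]


def make_comp_no_me_episode(rng, n_queries=20, max_len=6):
    """Hypothetical CompNoME episode: 4 support symbols, 1 held-out duplicate."""
    symbols = INPUT_SYMBOLS[:]
    rng.shuffle(symbols)
    support_symbols, held_out = symbols[:4], symbols[4]

    # The 4 support symbols get 4 distinct output symbols.
    colours = rng.sample(OUTPUT_SYMBOLS, 4)
    mapping = dict(zip(support_symbols, colours))

    # The held-out symbol reuses an output already in the support set,
    # so the full mapping is no longer mutually exclusive.
    mapping[held_out] = rng.choice(colours)

    support = [(s, mapping[s]) for s in support_symbols]

    # Queries are built word by word from the full mapping, as in the ME setting.
    queries = []
    for _ in range(n_queries):
        words = [rng.choice(symbols) for _ in range(rng.randint(1, max_len))]
        queries.append((words, [mapping[w] for w in words]))
    return support, held_out, mapping, queries


support, held_out, mapping, queries = make_comp_no_me_episode(random.Random(0))
```

The only difference from a mutual exclusivity episode in this sketch is the single line that assigns the held-out symbol: there it would receive the one unused output symbol rather than a reused one.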

In [1]:
!python train.py --episode_type CompNoME --fn_out_model net_CompNoME.tar

RED GREEN RED GREEN BLUE GREEN)
     lug wif wif -> GREEN GREEN GREEN
     wif fep -> GREEN BLUE

Evaluation episode 3
  support items: 
     zup -> YELLOW
     dax -> BLUE
     lug -> RED
     wif -> GREEN
  retrieval items; 100.0% correct
     lug wif wif wif lug wif -> RED GREEN GREEN GREEN RED GREEN
     wif lug dax -> GREEN RED BLUE
     lug zup wif dax zup -> RED YELLOW GREEN BLUE YELLOW
     lug wif zup zup zup -> RED GREEN YELLOW YELLOW YELLOW
     wif wif dax -> GREEN GREEN BLUE
     wif wif -> GREEN GREEN
     lug lug -> RED RED
     zup zup dax -> YELLOW YELLOW BLUE
     dax lug lug zup zup -> BLUE RED RED YELLOW YELLOW
     zup -> YELLOW
     dax -> BLUE
     lug -> RED
     wif -> GREEN
  generalization items; 27.273% correct
     fep lug lug dax zup fep -> RED RED RED BLUE YELLOW RED (** target: GREEN RED RED BLUE YELLOW GREEN)
     wif dax fep -> GREEN BLUE GREEN
     dax fep zup zup zup -> BLUE BLUE YELLOW YELLOW YELLOW (** target: BLUE GREEN YELLOW YELLOW YELLOW)
     

When we train the model on the `CompNoME` task, we see that the model is still able to capture compositionality, but only among symbols it has seen. This is seen in the 100% retrieval accuracy (retrieval here is redefined to mean query samples that only contain input symbols present in the support set). 

The generalisation set (defined here to mean query samples that contain the symbol omitted from the support set) does not, however, cause the model to fail completely. The model still reaches an accuracy of 23.9% on these samples, which is close to the chance of choosing the correct mapping for the held-out symbol uniformly at random (25%). So, we observe that the model does not map the held-out symbol consistently in these experiments, unlike in the original mutual exclusivity experiments.
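The redefined retrieval and generalisation sets can be sketched as a simple split of the query items, scored here by exact sequence match. The helper names `split_queries` and `exact_match_accuracy` are hypothetical, and exact-match scoring is an assumption based on how the logs flag whole sequences with a target. The example data is a handful of items visible in evaluation episode 3, so the resulting numbers are for this toy subset only, not the figures reported above.

```python
def split_queries(items, support_symbols):
    """Split (input, target, prediction) items by whether they use the held-out symbol."""
    support_symbols = set(support_symbols)
    retrieval, generalization = [], []
    for item in items:
        words = item[0]
        if all(w in support_symbols for w in words):
            retrieval.append(item)       # only symbols seen in the support set
        else:
            generalization.append(item)  # contains the omitted symbol
    return retrieval, generalization


def exact_match_accuracy(items):
    """Percentage of items whose predicted sequence equals the target sequence."""
    if not items:
        return float("nan")
    correct = sum(1 for _, target, pred in items if target == pred)
    return 100.0 * correct / len(items)


# A few items copied from evaluation episode 3 (support: zup, dax, lug, wif).
episode_items = [
    ("wif lug dax".split(), "GREEN RED BLUE".split(), "GREEN RED BLUE".split()),
    ("lug lug".split(), "RED RED".split(), "RED RED".split()),
    ("fep lug lug dax zup fep".split(),
     "GREEN RED RED BLUE YELLOW GREEN".split(),
     "RED RED RED BLUE YELLOW RED".split()),
    ("wif dax fep".split(), "GREEN BLUE GREEN".split(), "GREEN BLUE GREEN".split()),
    ("dax fep zup zup zup".split(),
     "BLUE GREEN YELLOW YELLOW YELLOW".split(),
     "BLUE BLUE YELLOW YELLOW YELLOW".split()),
]

retrieval, generalization = split_queries(episode_items, ["zup", "dax", "lug", "wif"])
retrieval_acc = exact_match_accuracy(retrieval)
generalization_acc = exact_match_accuracy(generalization)
```

Under this split, a query counts as a generalisation item as soon as it contains the held-out symbol once, so one wrong guess for that symbol makes the whole sequence incorrect.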