# GloVe Embeddings Demonstration

This notebook demonstrates the features of GloVe embeddings and explores how they might be used together with the COCO dataset to analyze explanations numerically. The embeddings can be downloaded [here](https://nlp.stanford.edu/projects/glove); I used the **6B** set, which was trained on Wikipedia 2014 and Gigaword 5.

## Imports that may be Necessary

```python
from collections import defaultdict
import numpy as np
import gensim
from gensim.models.keyedvectors import KeyedVectors
from sklearn.decomposition import TruncatedSVD
import matplotlib.pyplot as plt
%matplotlib inline
```
*Note: You may need to install gensim first, e.g. with `conda install gensim`.*

```python
# Put the path to the GloVe file here
path = r"./dat/glove.6B.50d.txt.w2v"

# Now load the model into the variable `glove` (may take some time)
glove = KeyedVectors.load_word2vec_format(path, binary=False)
```
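The `.w2v` suffix in the path above suggests that the raw GloVe file was converted to word2vec text format before loading, although the notebook does not say how. The two formats differ only in that word2vec text files begin with a `<rows> <dims>` header line. Below is a minimal sketch of one way to add that header. The helper name `add_word2vec_header` is hypothetical and is not part of gensim.

```python
from pathlib import Path


def add_word2vec_header(glove_path, out_path):
    """Copy a GloVe text file to out_path, prefixed with a '<rows> <dims>' header.

    Hypothetical helper for illustration; assumes one vector per line in the
    form '<token> <v1> <v2> ... <vN>' with single-space separators.
    """
    glove_path, out_path = Path(glove_path), Path(out_path)

    # First pass: count the rows and read the dimensionality from the first row.
    n_rows, n_dims = 0, 0
    with glove_path.open(encoding="utf-8") as src:
        for line in src:
            if n_rows == 0:
                n_dims = len(line.rstrip().split(" ")) - 1
            n_rows += 1

    # Second pass: write the header, then stream the original rows unchanged.
    with glove_path.open(encoding="utf-8") as src, \
            out_path.open("w", encoding="utf-8") as dst:
        dst.write(f"{n_rows} {n_dims}\n")
        for line in src:
            dst.write(line)

    return n_rows, n_dims


# Example call (paths are placeholders, adjust to your own layout):
# add_word2vec_header("./dat/glove.6B.50d.txt", "./dat/glove.6B.50d.txt.w2v")
```

If you are on gensim 4.0 or later, passing `no_header=True` to `KeyedVectors.load_word2vec_format` should let you load the raw GloVe file directly and skip this step.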

## How to Use GloVe
```python
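glove["word"]  # Returns the GloVe embedding vector for the word

"word" in glove  # Checks whether the word is in the vocabulary (acts like a dictionary)

# Vector arithmetic: the result should land near the vector for "wife"
query = glove["husband"] - glove["man"] + glove["woman"]

# Find the terms most similar to a vector
glove.similar_by_vector(query)

# A more advanced way to do this: pass the words directly and let gensim
# combine them with a multiplicative objective
glove.most_similar_cosmul(positive=["husband", "woman"], negative=["man"])

# Since they are vectors, we can compare them using dot products;
# dividing by the vector norms gives the cosine similarity
np.dot(glove["husband"], glove["wife"]) / (
    np.linalg.norm(glove["husband"]) * np.linalg.norm(glove["wife"])
)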
```
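To make the dot-product remark concrete, here is a small sketch of cosine similarity and cosine distance. It uses made-up 3-d toy vectors so that it runs without the embeddings loaded. The function names are my own, and the functions accept any sequence of floats, so the arrays returned by `glove[word]` should work as well.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """1 - cosine similarity: near 0 for the same direction, 1 for orthogonal."""
    return 1.0 - cosine_similarity(u, v)


# Toy vectors standing in for real embeddings (values are made up).
a = [1.0, 2.0, 0.0]
b = [2.0, 4.0, 0.0]  # same direction as a, twice as long
c = [0.0, 0.0, 5.0]  # orthogonal to a

print(cosine_similarity(a, b))  # close to 1: length does not matter
print(cosine_similarity(a, c))  # 0: the vectors share no direction
print(cosine_distance(a, c))
```

A raw dot product also grows with vector length, which is why it is normalized here. For words in the vocabulary, `glove.similarity("husband", "wife")` computes the same cosine similarity directly.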


## General Ideas for Symbolic Reasoning

Suppose we have a set of **labels** from the output of each subsystem (the semantic segmentation and the two captions). We can then measure how close the labels are to one another across the different subsystems.

We pick a distance threshold: if two or three of the labels have vectors that are very close together and another one does not, the distant one is generally the suspect. Here are some other general ideas:

- Each subsystem presents its own set of labels, which should all be relatively close to each other. If **any of them seems abnormally far away from the others** (maybe we can scramble the labels to check this), we say that it is not reasonable.
- We do the same thing across multiple subsystems, look at the distances (maybe the minimum distances), and try to figure out at a high level which subsystem is not reasonable.
- We combine these local and high-level checks with symbolic checks to determine overall reasonableness.
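The local and cross-subsystem checks above could be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the `0.3` threshold, and the 2-d toy embedding table are made up, and `embed` can be any mapping from word to vector (the loaded `glove` object should work, since it supports `in` and `[]`).

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def flag_outliers(labels, embed, threshold=0.3):
    """Local check: return labels whose mean similarity to the rest is below threshold."""
    known = [w for w in labels if w in embed]  # skip out-of-vocabulary labels
    flagged = []
    for w in known:
        others = [o for o in known if o != w]
        if not others:
            continue
        mean_sim = sum(
            cosine_similarity(embed[w], embed[o]) for o in others
        ) / len(others)
        if mean_sim < threshold:
            flagged.append(w)
    return flagged


def min_cross_distance(labels_a, labels_b, embed):
    """Cross-subsystem check: smallest cosine distance between any pair (a, b)."""
    distances = [
        1.0 - cosine_similarity(embed[a], embed[b])
        for a in labels_a if a in embed
        for b in labels_b if b in embed
    ]
    return min(distances, default=None)


# Made-up 2-d embedding table standing in for GloVe.
toy = {
    "dog": [1.0, 0.1],
    "puppy": [0.9, 0.2],
    "leash": [0.8, 0.3],
    "toaster": [-0.1, 1.0],
}

print(flag_outliers(["dog", "puppy", "leash", "toaster"], toy))
print(min_cross_distance(["dog"], ["puppy", "leash"], toy))
print(min_cross_distance(["dog", "puppy"], ["toaster"], toy))
```

On the toy table, only "toaster" falls below the threshold. A sensible threshold for real GloVe vectors would have to be tuned on actual subsystem outputs, and the symbolic checks from the last bullet are not covered by this sketch.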