# Binding and Unbinding Vectors

The purpose of this notebook is to study the binding and unbinding of different vectors.

The vectors could be obtained from Word2Vec or some other embedding method.

In [1]:
import numpy as np
import torch
#import pytorch_fft.fft.autograd as fft
import pytorch_fft.fft as fft


In [132]:
import numpy as np
import torch
#import pytorch_fft.fft.autograd as fft
import pytorch_fft.fft as fft

def fftshift(x):
    """
    One dimension vector fftshift
    """
    return fft.roll_n(x,0,int(x.shape[0]/2))

# One-dimensional ifftshift; identical to fftshift only for even-length vectors
ifftshift = fftshift

def complex_multiply(x,y):
    """
    Element wise multiplication for complex numbers represented as pairs
    """
    x_re, x_im = x
    y_re, y_im = y
    z_re = (x_re * y_re) - (x_im * y_im)
    z_im = (x_re * y_im) + (x_im * y_re)
    return (z_re,z_im)
    
def complex_divide(x,y):
    """
    Element wise division for complex numbers represented as pairs
    """
    x_re, x_im = x
    y_re, y_im = y #denominator
    num_re, num_im = complex_multiply(x, (y_re, -1*y_im))  # multiply by the complex conjugate of y
    den = (y_re * y_re) + (y_im * y_im)  # |y|^2, i.e. y times its conjugate
    res = (num_re / den ), (num_im / den)
    return res
    
def convolve(x,y):
    """
    One-dimensional convolution via the FFT (multiplication in the frequency domain).
    x and y are (real, imaginary) pairs.
    """
    x_re, x_im = x
    y_re, y_im = y
    xtmp = fft.fft(x_re,x_im)
    x_fft = fftshift(xtmp[0]), fftshift(xtmp[1])
    ytmp = fft.fft(y_re, y_im)
    y_fft = fftshift(ytmp[0]), fftshift(ytmp[1])
    tmp_re1, tmp_im1 = complex_multiply(x_fft, y_fft)
    tmp_re2, tmp_im2 = ifftshift(tmp_re1), ifftshift(tmp_im1)
    tmp_re3, tmp_im3 = fft.ifft(tmp_re2, tmp_im2)
    return fftshift(tmp_re3), fftshift(tmp_im3)

def deconvolve(z, y):
    """
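    One-dimensional deconvolution via the FFT (division in the frequency domain).
    z and y are (real, imaginary) pairs; y is the vector being unbound from z.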
    """
    z_re, z_im = z
    y_re, y_im = y
    ztmp = fft.fft(z_re, z_im)
    z_fft = fftshift(ztmp[0]), fftshift(ztmp[1])
    ytmp = fft.fft(y_re, y_im)
    y_fft = fftshift(ytmp[0]), fftshift(ytmp[1])
    tmp_re1, tmp_im1 = complex_divide(z_fft, y_fft)
    tmp_re2, tmp_im2 = ifftshift(tmp_re1), ifftshift(tmp_im1)
    tmp_re3, tmp_im3 = fft.ifft(tmp_re2, tmp_im2)
    return fftshift(tmp_re3), fftshift(tmp_im3)

In [114]:
from scipy import fftpack


## IT WORKS UP TO HERE!!!

OK, now let's get going with the following experiment:

Try playing with convolving and deconvolving word2vec embeddings!
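
As a starting point for that experiment, here is a minimal NumPy sketch of a bind/unbind round trip. It is not the `pytorch_fft` code above: it uses `numpy.fft` directly, works on real vectors, and the two vectors are random stand-ins (no real word2vec embeddings are loaded). The names `circular_convolve`, `circular_deconvolve`, `vec_a` and `vec_b` are made up for this illustration.

```python
import numpy as np

def circular_convolve(x, y):
    """Bind two real vectors with circular convolution, computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def circular_deconvolve(z, y):
    """Unbind y from z by dividing in the frequency domain."""
    return np.real(np.fft.ifft(np.fft.fft(z) / np.fft.fft(y)))

rng = np.random.default_rng(0)
dim = 64

# Assumption: random Gaussian vectors stand in for two word embeddings.
vec_a = rng.normal(0.0, 1.0 / np.sqrt(dim), dim)
vec_b = rng.normal(0.0, 1.0 / np.sqrt(dim), dim)

bound = circular_convolve(vec_a, vec_b)        # bind a with b
recovered = circular_deconvolve(bound, vec_b)  # unbind b to get a back

max_error = float(np.max(np.abs(recovered - vec_a)))
print("max reconstruction error:", max_error)
```

Exact division only behaves well when no frequency component of the second vector is close to zero, which is something to check on real embeddings before relying on it.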

It seems there is no clear winner among word2vec implementations for PyTorch yet (although there is one implementation).

I was thinking about modeling language with convolutions, something like the following
(I'm just writing ideas out of the blue here, to try to express what is still messy in my mind):

1. We assign a random sparse vector (Sparse Distributed Representation, SDR) to each word in an embedding. This should give a first step.
2. We create a context handler that tries to guess the current context.
   The context can be quite complex, spanning multiple ideas (dimensions). For example, a context could be: we are talking about computer science while standing at a party at a friend's house, it is night, Monday, and we are tired... This context is used to capture, and to be able to guess/expand/compress, not only the context but also the meaning of the current conversation.
3. We keep an LSTM memory of the current input; this will "decide" what to do with the current input.
4. We keep a buffer of the current input, a forgetting buffer.
5. The current input is then convolved in several ways with the buffer; these options are then added to the current context(s).
6. The current context will "reference" some ideas in the current memory state.
7. Referenced memories will be extracted, and their correlation with the current convolved inputs will be computed.
8. From there, can we guess something?
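
Step 1 of the list above can be sketched as follows. This is only an illustration under assumed parameters: the vector size (2048), the number of active bits (40), the toy vocabulary, and the helper names `random_sdr` and `overlap` are all made up here, not taken from any library.

```python
import numpy as np

def random_sdr(dim, active, rng):
    """Return a binary vector of length dim with exactly `active` ones at random positions."""
    sdr = np.zeros(dim, dtype=np.uint8)
    sdr[rng.choice(dim, size=active, replace=False)] = 1
    return sdr

def overlap(a, b):
    """Count the positions that are active in both SDRs."""
    return int(np.sum(a & b))

rng = np.random.default_rng(0)
vocabulary = ["computer", "party", "night"]  # hypothetical toy vocabulary
sdr_by_word = {word: random_sdr(2048, 40, rng) for word in vocabulary}

print(overlap(sdr_by_word["computer"], sdr_by_word["computer"]))
print(overlap(sdr_by_word["computer"], sdr_by_word["party"]))
```

With sparse random vectors, two different words share few active positions, so the overlap count gives a cheap similarity measure.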

______

Or another thing that might be interesting to try first is the following:

1. We keep an **LSTM** (or DNC or NTM) that decides a multiplication factor or vector, **MF**
2. We keep a current set of *"forming embeddings"* (**FE**)  
3. We keep a matrix of all the embeddings we'll be building (like a key-value DB that is content addressable)
4. An embedding **E** enters:
 * **E** is looked up in the DB, retrieved if found and created if not; this is the latest **FE**  
 * **E** and **FE** pairs are passed to the **LSTM**, which returns an **MF** for each pair **E, FE**
 * **E** is multiplied by **MF** and then convolved with the current **FE**, for each pair **E, FE**
 * these results replace the current forming embeddings
 * when an embedding no longer gets updated (its MF is below a threshold), it passes to the passive side and leaves the **FE** set unless a new element comes its way
 
 
 The latter idea might have something interesting in it ...
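
A very rough sketch of the second idea's update step, to make the data flow concrete. Everything here is an assumption: the DB is a plain Python dict, the **LSTM** is replaced by a hand-written placeholder (`placeholder_mf`) that returns a scalar in [0, 1], and the names `EmbeddingStore` and `update_forming` are hypothetical.

```python
import numpy as np

def circular_convolve(x, y):
    """Circular convolution of two real vectors via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

class EmbeddingStore:
    """Toy key-value store: retrieve an embedding if present, create it otherwise."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.table = {}

    def get_or_create(self, key):
        if key not in self.table:
            self.table[key] = self.rng.normal(0.0, 1.0 / np.sqrt(self.dim), self.dim)
        return self.table[key]

def placeholder_mf(e, fe):
    """Stand-in for the LSTM: maps cosine similarity to a factor in [0, 1]."""
    cos = np.dot(e, fe) / (np.linalg.norm(e) * np.linalg.norm(fe))
    return 0.5 * (1.0 + cos)

def update_forming(e, forming, threshold=0.1):
    """Scale E by MF and convolve with each FE; FEs with MF below threshold go passive."""
    active, passive = [], []
    for fe in forming:
        mf = placeholder_mf(e, fe)
        if mf < threshold:
            passive.append(fe)
        else:
            active.append(circular_convolve(mf * e, fe))
    return active, passive

store = EmbeddingStore(dim=64)
forming = [store.get_or_create("computer"), store.get_or_create("science")]
incoming = store.get_or_create("party")
active, passive = update_forming(incoming, forming)
print(len(active), len(passive))
```

A learned model would replace `placeholder_mf`; the rest only shows where the lookup-or-create, the scaling, and the convolution would sit.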
 
 
I still don't know how this will be taken into account; maybe both ideas have something interesting. I like the idea of having several context levels. Maybe the embeddings will not be a single vector but a distributed representation across several contexts, so the actual entry for a *symbol S* might be multiple values, each referring to a different context or sum of contexts *Ctx*. Recovering the actual value would then depend on the *context Ctx* _AND_ the *symbol S*, which would give an enormous expressive vector space to work with.

------

The values are saved in a matrix (actually a DB; I'm thinking of something like RocksDB, but somehow better at retrieval, like a content-addressable memory, *CAM*), so they don't take up *"neuron power or space"*.
  
 

------

Contexts are fed back (given as feedback) to the input LSTM; this can handle the issue of changing the multiplication factor for an incoming vector depending on the current context.

The feedback is given and the system is allowed to iterate until it converges. This means that at a certain moment there might be more than one possible context (we might allow 2, 3, 4, or N ...).
 New contexts can be created, (how?? maybe 

------

The embedding space could be **quantized**, so that only a set of allowed values exists. This allows for:
  - memory cleanup (we round the dimensions of the vectors to the allowed quanta)
  - reduced memory representation (for example, quantizing each dimension to only 256 or 1024 values)
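
A minimal sketch of such a quantization, assuming the embedding values lie in a known range (here [-1, 1], which is an assumption) and that the allowed values are evenly spaced. The function names `quantize` and `dequantize` are made up for this example.

```python
import numpy as np

def quantize(vectors, levels=256, low=-1.0, high=1.0):
    """Map each dimension to one of `levels` evenly spaced values; return integer codes."""
    clipped = np.clip(vectors, low, high)
    codes = np.round((clipped - low) / (high - low) * (levels - 1))
    return codes.astype(np.uint8 if levels <= 256 else np.uint16)

def dequantize(codes, levels=256, low=-1.0, high=1.0):
    """Map integer codes back to the allowed values in [low, high]."""
    return low + codes.astype(np.float64) / (levels - 1) * (high - low)

rng = np.random.default_rng(0)
embeddings = rng.uniform(-1.0, 1.0, size=(4, 16))

codes = quantize(embeddings)   # one byte per dimension with 256 levels
restored = dequantize(codes)   # the "cleaned up" vectors
step = 2.0 / 255               # spacing between allowed values
print(codes.dtype, float(np.max(np.abs(restored - embeddings))) <= step / 2 + 1e-12)
```

With 256 levels each dimension fits in one byte, and rounding moves a value by at most half the spacing between allowed values.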

------
 
One thing to keep in mind about convolution is that it is commutative (x convolved with y equals y convolved with x), so binding alone does not record the order of its operands. For these kinds of elements, other options are a binary operation like *OR*, or simply an element-wise *weighted sum +*.
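
The two alternatives mentioned above can be sketched in a few lines. The weights `w_a` and `w_b` and the vector sizes are arbitrary values chosen for illustration; both operations give the same result whichever operand comes first.

```python
import numpy as np

rng = np.random.default_rng(0)

# Binary (SDR-like) vectors combined with element-wise OR.
a_bits = rng.integers(0, 2, size=16, dtype=np.uint8)
b_bits = rng.integers(0, 2, size=16, dtype=np.uint8)
or_bundle = a_bits | b_bits

# Real-valued vectors combined with an element-wise weighted sum.
a = rng.normal(size=16)
b = rng.normal(size=16)
w_a, w_b = 0.7, 0.3  # hypothetical weights
sum_bundle = w_a * a + w_b * b

# Swapping the operands does not change either result.
print(np.array_equal(or_bundle, b_bits | a_bits))
print(np.allclose(sum_bundle, w_b * b + w_a * a))
```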
 

------
 
I would go further and ask about the vector space of the contexts and the vector space of the embeddings:
 - could they be separate?
 - could they be the same?
 

------

Another idea is that language is NOT a fixed set of rules. Languages are ALIVE, they grow, they change, so ... **how do we measure that we are doing OK?** I would argue that the only measure is whether the system converges or diverges, where **convergence** means that the amount of change in each embedding gets smaller over time and tends to zero. Of course, when a new context is added the embeddings will get furious and start changing fast again, and should then converge once more.
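
One possible way to turn this convergence idea into a number is sketched below. The update rule in the loop is a toy stand-in (each embedding moves halfway toward a fixed target), and the tolerance, the window size, and the function names `mean_change` and `has_converged` are assumptions made for this example.

```python
import numpy as np

def mean_change(previous, current):
    """Average L2 distance between each embedding before and after an update step."""
    return float(np.mean(np.linalg.norm(current - previous, axis=1)))

def has_converged(change_history, tolerance=1e-3, window=3):
    """True if the last `window` recorded changes are all below `tolerance`."""
    recent = change_history[-window:]
    return len(recent) == window and all(c < tolerance for c in recent)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(5, 8))
target = rng.normal(size=(5, 8))  # hypothetical fixed point of the toy update

history = []
for _ in range(30):
    updated = embeddings + 0.5 * (target - embeddings)  # toy update rule
    history.append(mean_change(embeddings, updated))
    embeddings = updated

print(has_converged(history))
```

Adding a new context would show up as a jump in `history`, after which the values should shrink again if the system settles.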

------

What about using Bloom filters to know whether an embedding is or is not in a certain group?

Or just using them as the grouping of the embeddings?
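
A small sketch of what such a membership test could look like. This is a hand-rolled Bloom filter, not a library API: the class `BloomFilter`, the helper `embedding_key`, the bit-array size, and the choice to round vectors before hashing are all assumptions made for illustration.

```python
import hashlib
import numpy as np

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = np.zeros(num_bits, dtype=bool)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: bytes):
        return all(self.bits[pos] for pos in self._positions(item))

def embedding_key(vector, decimals=3):
    """Hashable key for an embedding; rounding (an assumption) makes the bytes stable."""
    rounded = np.round(vector, decimals) + 0.0  # adding 0.0 turns -0.0 into 0.0
    return rounded.astype(np.float32).tobytes()

rng = np.random.default_rng(0)
group = [rng.normal(size=8) for _ in range(3)]

bloom = BloomFilter()
for vector in group:
    bloom.add(embedding_key(vector))

print(all(bloom.might_contain(embedding_key(v)) for v in group))
```

A Bloom filter only answers exact-match membership on the hashed key, so nearby but unequal embeddings would need a coarser key (for example the quantized vector) to land in the same group.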
 
 

More questions:

 - How do we distinguish between right/wrong, truth/lie, True/False?
 - How is the dynamic concept of those elements (previous question) generated?
 - How does context change?
   * Example of polyglots: sometimes you change language without realizing it; sometimes you hear something and don't understand it because it is not in the language you are currently contextualizing, and then a context change happens and you get it.
   * Example of looking for something (a physical object or a concept/word): you look for something and you CAN'T find it. Then you forget about it, and while you are thinking about something else you instantly find it or remember it. It might be that while you search you are in a sub-specialized context, but switching to another thought makes you change contexts, and then you can redo the context hierarchy (tree) lookup, this time getting the context right (assuming these "contexts" are organized like a tree).

