## Compressing Word Embeddings

This notebook needs a downloadable version of the GloVe embedding (with a fallback source).

It is probably best to include installation instructions for Omer Levy's test suite, so that any given embedding can be tested.

Two main sections are then required:

*  Lloyd embedding generation

*  Sparsified embedding generation

Also include a downloadable version of the sparsified GloVe embedding from our own hosting.

Finally, provide functions/tools to play with the loaded embedding (of whatever type).

### Download Embeddings

The following shell fragment still needs to be Pythonized:


In [None]:
"""
# http://redcatlabs.com/downloads/deep-learning-workshop/LICENSE
RCL_BASE=http://redcatlabs.com/downloads/deep-learning-workshop/notebooks/data

# Fall-back locations for the GloVe embedding
if [ '' ] && [ ! -e "glove.6B.300d.hkl" ]; then
  # Files in : ${RCL_BASE}/research/ICONIP-2016/
  #   507.206.240 Oct 25  2015 glove.6B.300d.hkl
  
  # Files in : ${RCL_BASE}/research/ICONIP-2016/  :: These are originals - citation desired...
  #    53.984.642 May 15 14:13 sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75_4000_GPU-sparse_matrix.hkl
  #   122.248.980 May  2 13:09 sparse.6B.300d_T-21_3500.1024@0.05-GPU-sparse_matrix.hkl
  #   447.610.336 May  2 13:04 sparse.6B.300d_T-21_3500.1024@0.05-GPU-sparsity_recreate.hkl
  #   160.569.440 May 14 14:57 vectors.2-17.hkl
fi
"""

import os, requests

def get_embedding_file( hkl ):  
    if not os.path.isfile(os.path.join('embeddings', hkl)):
        pass
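
One way the stub above might be filled in is sketched here. It is only a sketch: the helper names `embedding_url` and `fetch_embedding_file` are hypothetical, the URL layout is assumed from the `RCL_BASE` comments in the shell fragment, and it uses `urllib` from the standard library rather than `requests` so that it has no third-party dependency.

```python
import os
import shutil
import urllib.request

# Base URL and sub-directory are copied from the shell fragment above
RCL_BASE = 'http://redcatlabs.com/downloads/deep-learning-workshop/notebooks/data'
RCL_SUBDIR = 'research/ICONIP-2016'

def embedding_url(hkl, base=RCL_BASE, subdir=RCL_SUBDIR):
    """Build the (assumed) fallback URL for an embedding file."""
    return '%s/%s/%s' % (base, subdir, hkl)

def fetch_embedding_file(hkl, directory='embeddings', opener=urllib.request.urlopen):
    """Return the local path of `hkl`, downloading it first if it is missing."""
    path = os.path.join(directory, hkl)
    if os.path.isfile(path):
        return path  # Already present : nothing to do

    os.makedirs(directory, exist_ok=True)
    tmp_path = path + '.part'  # Download under a temporary name, then rename
    with opener(embedding_url(hkl)) as response, open(tmp_path, 'wb') as fout:
        shutil.copyfileobj(response, fout)
    os.replace(tmp_path, path)
    return path
```

The `opener` argument is there so that the function can be exercised without network access. Downloading to a `.part` file first means an interrupted transfer is not mistaken for a complete embedding on the next run.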
    


### Download the Omer-Levy Test Regime

https://levyomer.files.wordpress.com/2015/03/improving-distributional-similarity-tacl-2015.pdf

```
wget https://bitbucket.org/omerlevy/hyperwords/get/688addd64ca2.zip
unzip 688addd64ca2.zip
rm 688addd64ca2.zip

mv omerlevy-hyperwords-688addd64ca2 omerlevy

chmod 755 omerlevy/*.sh omerlevy/scripts/*.sh
```

### Function to test a (text) Embedding

Based on this script : 
```
more omerlevy/test-vectors.sh 
#!/bin/sh

# ./test-vectors.sh /home/andrewsm/sketchpad/redcatlabs/embeddings/data/1-glove-1-billion-and-wiki/window11-lc-36/vectors.txt 

# arg1 == filepath of word-vectors file
VECTORS_FILE=$1
  
# Fix up the 'file header' of a 'glove' vectors file into the one expected here
VECTORS_WORDS=${VECTORS_FILE}.words

if [ ! -f ${VECTORS_WORDS} ]; then 
  echo "Creating ${VECTORS_WORDS}"
  #echo "262144 300" > ${VECTORS_WORDS}
  #head -262144 ${VECTORS_FILE} >> ${VECTORS_WORDS}

  ## Glove min-freq : 36 -> 263633 words (just above 2^18=262144 words)
  echo "131072 300" > ${VECTORS_WORDS}
  head -131072 ${VECTORS_FILE} >> ${VECTORS_WORDS}
fi

VECTORS_NPY=${VECTORS_WORDS}.npy


#word2vecf/word2vecf -train w2.sub/pairs -pow 0.75 -cvocab w2.sub/counts.contexts.vocab -wvocab w2.sub/counts.words.vocab -dumpcv w2.sub/sgns.contexts -output w2.sub/sgns.words -threads 10 -negative 15 -size 500;

python hyperwords/text2numpy.py ${VECTORS_WORDS}

# No need for this temporary file now
##rm ${VECTORS_WORDS}


#python hyperwords/text2numpy.py w2.sub/sgns.contexts
#rm w2.sub/sgns.contexts


echo
echo "Similarity"
echo "----------"
# Evaluate on Word Similarity
#python hyperwords/ws_eval.py --neg 5 PPMI  w2.sub/pmi testsets/ws/ws353.txt
#python hyperwords/ws_eval.py --eig 0.5 SVD w2.sub/svd testsets/ws/ws353.txt
#python hyperwords/ws_eval.py --w+c SGNS    w2.sub/sgns testsets/ws/ws353.txt

#echo -n "WS353 Results     "
#python hyperwords/ws_eval.py VECTORS ${VECTORS_FILE} testsets/ws/ws353.txt

echo -n "WS353 Similarity  "
python hyperwords/ws_eval.py VECTORS ${VECTORS_FILE} testsets/ws/ws353_similarity.txt

echo -n "WS353 Relatedness "
python hyperwords/ws_eval.py VECTORS ${VECTORS_FILE} testsets/ws/ws353_relatedness.txt

echo -n "Bruni MEN         "
python hyperwords/ws_eval.py VECTORS ${VECTORS_FILE} testsets/ws/bruni_men.txt

echo -n "Radinsky M.Turk   "
python hyperwords/ws_eval.py VECTORS ${VECTORS_FILE} testsets/ws/radinsky_mturk.txt

echo -n "Luong Rare Words  "
python hyperwords/ws_eval.py VECTORS ${VECTORS_FILE} testsets/ws/luong_rare.txt

echo
echo "Geometry"
echo "--------"
# Evaluate on Analogies
#python hyperwords/analogy_eval.py PPMI        w2.sub/pmi testsets/analogy/google.txt
#python hyperwords/analogy_eval.py --eig 0 SVD w2.sub/svd testsets/analogy/google.txt
#python hyperwords/analogy_eval.py SGNS        w2.sub/sgns testsets/analogy/google.txt

echo -n "Google Analogy Results  "
python hyperwords/analogy_eval.py VECTORS ${VECTORS_FILE} testsets/analogy/google.txt

echo -n "MSR Analogy Results     "
python hyperwords/analogy_eval.py VECTORS ${VECTORS_FILE} testsets/analogy/msr.txt

echo
```

In [2]:
import os, subprocess

def test_embedding_file(vectors_txt, vocab_max=131072 ):
    # Do we need to process VECTORS_FILE->{ VECTORS_WORDS, VECTORS_NPY }?
    # Answer = YES : the .words is required, and is used to create .npy and .vocab
    
    vectors_txt_words = '%s.words' % (vectors_txt,)
    if not os.path.isfile(vectors_txt_words):
        # This is just a copy of 'text file' with the vocab_size and embedding_size pre-pended
        #echo "131072 300" > ${VECTORS_WORDS}
        #head -131072 ${VECTORS_FILE} >> ${VECTORS_WORDS}
        with open(vectors_txt) as fin:
            first_line = fin.readline()
            embedding_dim = len(first_line.strip().split()) -1 
            vocab_size = len(fin.readlines()) +1  # Ouch! - read in whole file to find length

        if vocab_size>vocab_max:
            vocab_size=vocab_max
            
        with open(vectors_txt) as fin:
            with open(vectors_txt_words, 'wt') as fout:
                # Write the first line, which, ironically, will be discarded by the omerlevy code
                fout.write("%d %d\n" % (vocab_size, embedding_dim))
                
                # And copy over at most vocab_size lines of the original file 
                for i, line in enumerate(fin):
                    if i>=vocab_size:  # '>' would copy one line more than the header promises
                        break
                    fout.write(line)
    
    return # Early return during download testing

    vectors_txt_npy   = '%s.npy' % (vectors_txt_words,)
    if not os.path.isfile(vectors_txt_npy):
        # Sadly, we can't just invoke this as a python function - need to go via shell...
        subprocess.call([ "python", "hyperwords/text2numpy.py", vectors_txt_words ])
    pass
    
    def run_word_similarity(test_str, test_set):
        print(test_str, end='', flush=True)  # No newline (like 'echo -n'); flush so the label precedes the subprocess output
        subprocess.call([ "python", "hyperwords/ws_eval.py", "VECTORS", vectors_txt_npy, "testsets/ws/%s" % (test_set,) ])

    run_word_similarity("WS353 Similarity  ", "ws353_similarity.txt")
    run_word_similarity("WS353 Relatedness ", "ws353_relatedness.txt")
    run_word_similarity("Bruni MEN         ", "bruni_men.txt")
    run_word_similarity("Radinsky M.Turk   ", "radinsky_mturk.txt")
    run_word_similarity("Luong Rare Words  ", "luong_rare.txt")

    #echo -n "Google Analogy Results  "
    #python hyperwords/analogy_eval.py VECTORS ${VECTORS_FILE} testsets/analogy/google.txt

    # Same for word_analogy() once the word_similarity is proven
    
    #echo -n "MSR Analogy Results     "
    #python hyperwords/analogy_eval.py VECTORS ${VECTORS_FILE} testsets/analogy/msr.txt
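
The header-prepending step can be tried in isolation before involving the test suite. The following is a minimal sketch, assuming the same text format as above (one word followed by its vector per line); `write_words_file` is a hypothetical helper name, and the sketch counts lines without holding the whole file in memory.

```python
import itertools

def write_words_file(vectors_txt, words_txt, vocab_max=131072):
    """Copy at most `vocab_max` vectors, prefixed with a 'vocab_size dim' header."""
    with open(vectors_txt) as fin:
        first_line = fin.readline()
        embedding_dim = len(first_line.strip().split()) - 1  # First token is the word
        vocab_size = 1 + sum(1 for _ in fin)                 # Count lines without storing them
    vocab_size = min(vocab_size, vocab_max)

    with open(vectors_txt) as fin, open(words_txt, 'wt') as fout:
        fout.write("%d %d\n" % (vocab_size, embedding_dim))
        # islice stops after exactly vocab_size lines
        fout.writelines(itertools.islice(fin, vocab_size))
    return vocab_size, embedding_dim
```

The sketch assumes a non-empty input file. Using `itertools.islice` avoids the explicit counter, which makes it harder to write one line too many.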

# Lloyd's Method : 32->3 bits

In [None]:

import time

import argparse
import progressbar

import numpy as np
import theano

# http://blog.mdda.net/oss/2016/04/07/nvidia-on-fedora-23
#theano.config.nvcc.flags = '-D_GLIBCXX_USE_CXX11_ABI=0'

import scipy.spatial.distance
import sklearn.preprocessing

import hickle

default_embedding_file = '../../data/2-pretrained-vectors/glove.6B.300d.hkl'

#... args ...

## http://www-nlp.stanford.edu/projects/glove/?place=topic%2Fglobalvectors%2FBiiXED8vVQg%2Fdiscussion
#While it's of course possible to convert the text files to a binary format, 
#the result would not be equivalent to the binary output from GloVe. 
#The reason is that the pre-trained vectors only contain W + \tilde{W}, 
#i.e. the word vectors plus the context word vectors, and omit the bias terms. 
#As such you don't want to use the evaluation script without modification. 

d = hickle.load(args.embedding)
vocab, embedding = d['vocab'], d['embedding']

#dictionary = dict( (word.lower(), i) for i,word in enumerate(vocab) )
#dictionary = dict( (word, i) for i,word in enumerate(vocab) )
dictionary = dict( (word, i) for i,word in enumerate(vocab) if i<len(embedding) )

print("Embedding loaded :", embedding.shape)   # (vocab_size, embedding_dimension)=(rows, columns)

def np_int_list(n, mult=100., size=3):  # size includes the +/-
  #print( (n * mult).astype(int).tolist() )
  return "[ " + (', '.join([ ('% +*d') % (size,x,) for x in (n * mult).astype(int).tolist()])) + " ]"

if args.mangle is not None:
  if args.mangle=='lloyd':    # Quantise each entry into 'pct' (as an integer) level (optimised per vector location)
    # Suppose that v is a vector of levels
    #          and c is a list of numbers that needs to be quantised, 
    #          each c becomes c' where c' is the closest value in v
    #          :: update v so that (c - c')^2 is as low as possible

    levels_base = int(args.pct)

    c_length = embedding.shape[0]
    
    t0 = time.time()
    for d in range(embedding.shape[1]):   # One-dimensional version
      levels = levels_base  

      """
  9 - took 223
 78 - took 222
121 - took 214
135 - took 203
150 - took 237
186 - took 206
187 - took 205
226 - took 283
236 - took 219
242 - took 221
278 - took 212
279 - took 239
290 - took 216
      """

      #if d in (9, 78, 121, 135, 150, 186, 187, 226, 236, 242, 278, 279, 290):
      #  levels = 16

      i_step = int(c_length/levels)
      i_start = int(i_step/2)
    
      v_indices = np.arange(start=i_start, stop=c_length, step=i_step, dtype='int')

      #if d != 9: continue  # Weird distribution
      #if d != 1: continue  # Very standard example
      

      # Initialise v by sorting c, and placing them evenly through the list
      
      e_column = embedding[:,d].astype('float32')
      
      c_sorted = np.sort( e_column )
      v_init = c_sorted[ v_indices ]

      #zeros = np.zeros_like(c_sorted)

      # the v_init are the initial centers 
      v=v_init
      
      t1 = time.time()
      epochs=0
      for epoch in range(0, 1000):
        #err, = iterate_v(lr)
        
        #print(" Dimension:%3d, Epoch:%3d, %s" % (d, epoch, np_int_list(v),))
        
        #   works out the values in their middles
        mids_np = (v[:-1] + v[1:])/2.
        
        mids = mids_np.tolist()
        mids.insert( 0, c_sorted[0] )
        mids.append( c_sorted[-1] +1 )
        
        centroids=[]
        for i in range( 0, len(mids)-1 ):
          pattern = np.where( (mids[i] <= c_sorted) & (c_sorted < mids[i+1]) )
          centroids.append( c_sorted[ pattern ].mean() )
        
        centroids_np = np.array(centroids)

        if np.allclose(v, centroids_np):
          if epochs>200:
            print("%3d - took %d" % (d, epochs,))
          break
        
        v = centroids_np
        
        epochs += 1
        
        
      #print("Time per calc %6.2fms" % ((time.time() - t1)/epochs*1000.,))
      
      #print("Check col updated: before ", np_int_list(embedding[0:20,d]))

      # Ok, so now we have the centers in v, and the mids in 'mids'
      for i in range( 0, len(mids)-1 ):
        pattern = np.where( (mids[i] <= e_column) & (e_column < mids[i+1]) )
        embedding[pattern, d] = v[i]

      #print("Check col updated: after  ", np_int_list(embedding[0:20,d]))
      
    offset=101010  # Check rare-ish words
    for d in range(5, embedding.shape[1], 25):
      print("Col %3d updated: " % (d,), np_int_list(embedding[(offset+0):(offset+20),d]))

#sklearn.preprocessing.normalize(embedding, norm='l2', axis=1, copy=False)   # This is in-place
embedding_normed = sklearn.preprocessing.normalize(embedding, norm='l2', axis=1, copy=True) 



if args.save:
  # Save the embedding_normed as a text file
  with open(args.save, 'wt') as f:  # Text mode : the lines written below are str, not bytes
    embedding_save = embedding_normed
    
    for l in range(0, embedding_save.shape[0]):
      f.write("%s %s\n" % (vocab[l], ' '.join([ ('0' if x==0. else ("%.6f" % (x,))) for x in embedding_save[l, :].tolist() ]), ))
  print("Saved to %s" % (args.save, ))
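
The per-column quantisation above can be condensed into a small NumPy function for experimenting on a single column. This is a sketch rather than the cell's exact code: `lloyd_1d` is a hypothetical name, it uses `np.searchsorted` to assign values to cells, it keeps the old centre when a cell is empty, and the default of 8 levels is an assumption (it is what 3 bits per entry would allow).

```python
import numpy as np

def lloyd_1d(column, levels=8, max_iters=1000):
    """Quantise a 1-D array to `levels` values with Lloyd's method.

    Returns (codes, centres), where centres[codes] is the quantised column.
    """
    c_sorted = np.sort(column.astype('float64'))
    step = len(c_sorted) // levels
    # Initial centres : evenly spaced through the sorted values
    centres = c_sorted[step // 2::step][:levels].copy()

    for _ in range(max_iters):
        # Decision boundaries are the midpoints between neighbouring centres
        mids = (centres[:-1] + centres[1:]) / 2.
        cells = np.searchsorted(mids, c_sorted, side='right')  # Cell index per value
        new_centres = centres.copy()
        for i in range(levels):
            members = c_sorted[cells == i]
            if members.size:          # Keep the old centre if a cell is empty
                new_centres[i] = members.mean()
        if np.allclose(new_centres, centres):
            break
        centres = new_centres

    mids = (centres[:-1] + centres[1:]) / 2.
    codes = np.searchsorted(mids, column, side='right').astype('uint8')
    return codes, centres

# Usage on synthetic data (not a real embedding column)
column = np.random.RandomState(0).randn(10000).astype('float32')
codes, centres = lloyd_1d(column, levels=8)
quantised = centres[codes]   # Each entry replaced by its nearest centre
```

Storing `codes` plus the small `centres` table per column is what gives the compression: each entry needs only enough bits to index a level, instead of a 32-bit float.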


# Non-Negative Sparse Embeddings

In [2]:
a="""

python sparsify_lasagne.py --mode=train   --version=21 --save='./sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75_%04d.hkl' --sparsity=0.0675 --random=1 --iters=4000 | tee sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75.log
  #sparse_dim = 1024, pre-num_units=embedding_dim*8,   
# -> 4.0 l2 in 4.0k epochs (sigma=39)  # sparsity_std_:,   0.4742,
python sparsify_lasagne.py --mode=predict --version=21 --load='./sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75_4000.hkl' --sparsity=0.0675 --random=1 \
      --output=sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75_4000_GPU-sparsity_recreate.hkl \
      --direct=sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75_4000_GPU-sparse_matrix.hkl 
  # epoch:4009, b:      0, l2:      3.8344, sparsity:6.7476 - hard01
  # epoch:4009, b:  16384, l2:      3.8593, sparsity:6.7503 - hard01
  # epoch:4009, b:  32768, l2:      3.8925, sparsity:6.7489 - hard01
  # epoch:4009, b:  49152, l2:      3.7866, sparsity:6.7482 - hard01
  # epoch:4009, b:  65536, l2:      3.8729, sparsity:6.7476 - hard01
  # epoch:4009, b:  81920, l2:      3.8502, sparsity:6.7489 - hard01
  # epoch:4009, b:  98304, l2:      3.8340, sparsity:6.7480 - hard01
  # epoch:4009, b: 114688, l2:      3.8588, sparsity:6.7476 - hard01
"""

In [None]:


import time

import argparse
import progressbar

import numpy as np
import theano
import lasagne

import hickle

#default_embedding_file = '../../data/2-pretrained-vectors/glove.6B.300d.hkl'
default_embedding_file = '../../data/1-glove-1-billion-and-wiki/window11-lc-36/vectors.2-17.hkl'
default_version=21
default_save_file_fmt  = './sparse.6B.300d_%d_%%04d.hkl' % (default_version, )

#theano.config.nvcc.flags='-D_GLIBCXX_USE_CXX11_ABI=0' # Now in .theanorc

parser = argparse.ArgumentParser(description='')
parser.add_argument('-m','--mode', help='(train|predict)', type=str, default=None)
parser.add_argument('-v','--version', help='Model version to run', type=int, default=default_version)

parser.add_argument('-i','--iters', help='Number of iterations', type=int, default=10000)
parser.add_argument('-e','--embedding', help='Filepath of hickle file containing embedding for testing', type=str, default=default_embedding_file)

parser.add_argument('-s','--save', help='Format of save filenames (use %%d for epoch)', type=str, default=default_save_file_fmt)
parser.add_argument('-l','--load', help='Load filename', type=str, default=None)

parser.add_argument('-o','--output', help='Filepath of hickle file to *create* embedding for testing', type=str, default=None)
parser.add_argument('-d','--direct', help='Filepath of hickle file to *create* *binary* embedding for testing', type=str, default=None)

parser.add_argument('-p','--param', help='Set param value initially', type=float, default=None)
parser.add_argument('-k','--sparsity',  help='Sparsity value goal', type=float, default=0.05)

#parser.add_argument('-b','--batchsize', help='batchsize (GTX760 requires <20000)', type=int, default=10000)
parser.add_argument('-b','--batchsize', help='batchsize (GTX760 requires <20000)', type=int, default=16384*1)

#parser.add_argument('-t','--test', help='Which Test to execute', type=str, default=default_test)
#parser.add_argument('-p','--pct', help='filter parameter ~ percentage (0,100)', type=float, default=None)
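# NB: type=bool would treat any non-empty string (even '0' or 'False') as True
str2bool = lambda s: s.lower() in ('1', 'true', 'yes')
parser.add_argument('-r','--random', help='Randomly shuffle vocab', type=str2bool, default=False)
parser.add_argument('-n','--normalize', help='normalize embedding before learning', type=str2bool, default=False)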

args = parser.parse_args()

print("Mode : %s" % (args.mode,)) 

print("Loading embedding : %s" % (args.embedding,)) 

d = hickle.load(args.embedding)
vocab, embedding = d['vocab'], d['embedding']
vocab_np = np.array(vocab, dtype=str)
vocab_orig=vocab_np.copy()

if args.random:
   np.random.seed(1) # No need to get fancy - just want to mix up the word frequencies into different batches
   perm = np.random.permutation(len(embedding))
   embedding = embedding[perm]
   vocab = vocab_np[perm].tolist()
  
dictionary = dict( (word, i) for i,word in enumerate(vocab) )

print("Embedding loaded :", embedding.shape)   # (vocab_size, embedding_dimension)=(rows, columns)

print("Device=%s, OpenMP=%s" % (theano.config.device, ("True" if theano.config.openmp else "False"), ))

def np_int_list(n, mult=100., size=3):  # size includes the +/-
  return "[ " + (', '.join([ ('% +*d') % (size,x,) for x in (n * mult).astype(int).tolist()])) + " ]"

#embedding = np.copy(embedding[ 0:50000, : ])
#embedding = embedding[ 0:50000, : ]
#embedding = embedding[ 0:10000, : ]   # REVERT

embedding_dim = embedding.shape[1]
sparse_dim = 1024
#sparse_dim = 1024/2
#sparse_dim = 1024*4

#batchsize = 10000   # 0.48ms GPU
#batchsize = 20000   # 0.48ms GPU
batchsize = args.batchsize

version=args.version

from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams
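
The cell above shuffles the rows with a fixed seed and keeps `vocab_orig`. If the original row order is needed again later (for instance to line a sparse matrix up with the unshuffled vocabulary), the permutation can be inverted with `argsort`. A small sketch on toy data, where the array contents and `_demo` names are made up for illustration:

```python
import numpy as np

rng = np.random.RandomState(1)   # Fixed seed, in the spirit of the cell above
embedding_demo = np.arange(12, dtype='float32').reshape(4, 3)   # 4 "words", 3 dims
vocab_demo = np.array(['the', 'of', 'cat', 'zebra'])

perm = rng.permutation(len(embedding_demo))
shuffled, vocab_shuffled = embedding_demo[perm], vocab_demo[perm]

# argsort of a permutation is its inverse : original row i sits at shuffled[inverse[i]]
inverse = np.argsort(perm)
restored = shuffled[inverse]
vocab_restored = vocab_shuffled[inverse]
```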



In [None]:
class SparseWinnerTakeAllLayer(lasagne.layers.Layer):
    def __init__(self, incoming, sparsity=0.05, **kwargs):
        super(SparseWinnerTakeAllLayer, self).__init__(incoming, **kwargs)
        self.sparsity = sparsity

    def get_output_for(self, input, deterministic=False, **kwargs):
        """
        Parameters
        ----------
        input : tensor
            output from the previous layer
        deterministic : bool
            If true, just use the raw values (don't sparsify)
        """
        if False and deterministic:
            return input
            #return theano.tensor.switch( theano.tensor.gt(input, 0.), 1.0, 0.0)
            
        else:
            # use nonsymbolic shape for this if possible
            
            #input_shape = self.input_shape
            #if any(s is None for s in input_shape):
            #    input_shape = input.shape            

            # Sort within batch
            # input_shape is [ #in_batch, #vector_entries ] ~ [ 20k, 1024 ]

            # theano.tensor.sort(self, axis, kind, order)
            sort_input = input.sort( axis=0, kind='quicksort' )
            
            # Find kth value
            
            hurdles_raw = sort_input[ int( batchsize * (1.0 - self.sparsity) ), : ]
            
            hurdles = theano.tensor.maximum(hurdles_raw, 0.0)  # rectification...
            
            # switch based on >kth value (or create mask)
            # all other entries are zero
            
            #mask = theano.tensor.switch( theano.tensor.gt(input, hurdles), 1.0, 0.0)
            
            # pass those entries along verbatim
            #return mask * input
            
            masked = theano.tensor.switch( theano.tensor.ge(input, hurdles), input, 0.0)
            return masked

        
        
class SparseWinnerTakeAllLayerApprox(lasagne.layers.Layer):
    def __init__(self, incoming, approx_sparsity=0.12, **kwargs):  
        super(SparseWinnerTakeAllLayerApprox, self).__init__(incoming, **kwargs)
        self.sparsity = approx_sparsity

    def get_output_for(self, input, deterministic=False, **kwargs):
        """
        Parameters
        ----------
        input : tensor
            output from the previous layer
        deterministic : bool
            If true, just use the raw values (don't sparsify)
        """
        # input_shape is [ #in_batch, #vector_entries ] ~ [ 20k, 1024 ]
    
        current_sparsity = self.sparsity
        #print(current_sparsity)  # A theano variable
        
        if False:
          # Find the max value in each column - this is the k=1 (top-most) entry
          hurdles_max  = input.max( axis=0 )
          
          input = lasagne.layers.get_output(embedding_batch_middle)
          
          # Find the max value in each column - this is the k=1 (top-most) entry
          hurdles_max  = input.max( axis=0 )
          
          # Find the min value in each column - this is the k=all (bottom-most) entry
          #hurdles_min  = input.min( axis=0 )

          # Let's guess (poorly) that the sparsity hurdle is (0... sparsity ...100%) within these bounds
          #hurdles_guess = hurdles_max * (1.0 - current_sparsity) + hurdles_min * current_sparsity
          
          #hurdles_guess = (hurdles_min + hurdles_max)/2.0
          
          # New approach : We know that the mean() is zero and the std() is 1
          #   simulations suggest that the more stable indicators are at fractions of the max()
          
          hurdles_hi = hurdles_max * 0.5
          hurdles_lo = hurdles_max * 0.3
          
          # Now, let's find the actual sparsity that this creates
          sparsity_flag_hi = theano.tensor.switch( theano.tensor.ge(input, hurdles_hi), 1.0, 0.0)
          sparsity_real_hi = sparsity_flag_hi.mean(axis=0)    # Should be ~ sparsity (likely to be lower, though)

          sparsity_flag_lo = theano.tensor.switch( theano.tensor.ge(input, hurdles_lo), 1.0, 0.0)
          sparsity_real_lo = sparsity_flag_lo.mean(axis=0)    # Should be ~ sparsity (likely to be higher, though)
          
          # But this is wrong!  Let's do another estimate (will be much closer, hopefully) using this knowledge
          #   For each column, the new hurdle guess
          
          #hurdles_better = hurdles_max - ( current_sparsity / (sparsity_guess_real + 0.00001) ) * (hurdles_max - hurdles_guess)
          

          if False: # This assumes that the distribution tails are linear (which is not true)
            hurdles_interp = hurdles_hi + (hurdles_lo-hurdles_hi) * (current_sparsity - sparsity_real_hi) / ((sparsity_real_lo - sparsity_real_hi)+0.00001)
            
          else:  # Assume that the areas under the tails are ~ exp(-x*x)  
            # See (2) in : https://math.uc.edu/~brycw/preprint/z-tail/z-tail.pdf
            # *** See (Remark 15) in : http://m-hikari.com/ams/ams-2014/ams-85-88-2014/epureAMS85-88-2014.pdf
            
            def tail_transform(z):
              return theano.tensor.sqrt( -theano.tensor.log( z ) )
            
            tail_target = tail_transform(current_sparsity)
            tail_hi = tail_transform(sparsity_real_hi)
            tail_lo = tail_transform(sparsity_real_lo)

            hurdles_interp = hurdles_hi + (hurdles_lo-hurdles_hi) * (tail_target - tail_hi) / ((tail_lo - tail_hi)+0.00001)

          
          #hurdles = theano.tensor.maximum(hurdles_better, 0.0)  # rectification... at minimum... (also solves everything-blowing-up problem)
          hurdles = hurdles_interp.clip(hurdles_max*0.2, hurdles_max*0.9)


        if True:
          hurdles_hi, hurdles_lo = [], []
          
          hurdles_guess = []
          sparsity_flag = []
          sparsity_real = []
          
          sparsity_hi, sparsity_lo = [], []


          # Find the max value in each column - this is the k=1 (top-most) entry
          hurdles_max  = input.max( axis=0 )
          
          hurdles_hi.append(hurdles_max)
          sparsity_hi.append( theano.tensor.ones_like(hurdles_max) * (1./batchsize) )  # Only the top entry of each column reaches its max
          

          hurdles_lo_temp = input.mean( axis=0 )  # Different estimate idea...

          hurdles_lo.append(hurdles_lo_temp)
          sparsity_lo_temp = theano.tensor.switch( theano.tensor.ge(input, hurdles_lo_temp), 1.0, 0.0)
          sparsity_lo.append( sparsity_lo_temp.mean(axis=0) )
          
          for i in range(10):  
            if True:   # WINS THE DAY!
              hurdles_guess.append(
                (
                  (hurdles_lo[-1] + hurdles_hi[-1]) * 0.5
                )
              )

            if False:
              hurdles_guess.append(
                (
                  hurdles_hi[-1] + (hurdles_lo[-1] - hurdles_hi[-1]) * 
                    (current_sparsity - sparsity_hi[-1]) / ((sparsity_lo[-1] - sparsity_hi[-1])+0.000001)
                ).clip(hurdles_lo[-1], hurdles_hi[-1])
              )

            if False:
              # switch on closeness to getting it correct
              hurdles_guess.append(
                theano.tensor.switch( theano.tensor.lt( sparsity_lo[-1], current_sparsity * 2.0 ),
                  (
                    hurdles_hi[-1] + (hurdles_lo[-1] - hurdles_hi[-1]) * 
                      (current_sparsity - sparsity_hi[-1]) / ((sparsity_lo[-1] - sparsity_hi[-1])+0.000001)
                  ).clip(hurdles_lo[-1], hurdles_hi[-1]),
                  (
                    (hurdles_lo[-1] + hurdles_hi[-1]) * 0.5
                  )
                )
                
              )
              
            
            sparsity_flag.append( theano.tensor.switch( theano.tensor.ge(input, hurdles_guess[-1] ), 1.0, 0.0) )
            sparsity_real.append( sparsity_flag[-1].mean(axis=0) )
            
            # So, based on whether the real sparsity is greater or less than the target value, change the hi or lo values

            hurdles_lo.append( 
              theano.tensor.switch( theano.tensor.gt(current_sparsity, sparsity_real[-1]), hurdles_lo[-1], hurdles_guess[-1]) 
            )
            hurdles_hi.append( 
              theano.tensor.switch( theano.tensor.le(current_sparsity, sparsity_real[-1]), hurdles_hi[-1], hurdles_guess[-1]) 
            )

          hurdles = hurdles_guess[-1]
          #hurdles = hurdles_lo[-1]  # Better to bound this at the highest relevant sparsity...
          
        masked = theano.tensor.switch( theano.tensor.ge(input, hurdles), input, 0.0)
        return masked
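
The two layers above differ only in how they find the per-column hurdle: the first sorts the batch, the second brackets the hurdle between the column mean and the column max and bisects a fixed number of times. A NumPy sketch of both ideas, useful for checking the logic off the GPU (function names are hypothetical, and the random input stands in for a batch-normalised activation matrix):

```python
import numpy as np

def hurdles_exact(x, sparsity):
    """Per-column threshold : the value at the (1 - sparsity) quantile of the batch."""
    k = int(x.shape[0] * (1.0 - sparsity))
    return np.maximum(np.sort(x, axis=0)[k, :], 0.0)   # Rectified, as in the exact layer

def hurdles_bisect(x, sparsity, steps=10):
    """Approximate the same threshold with a fixed number of bisection steps."""
    hi = x.max(axis=0)    # Almost nothing passes this hurdle
    lo = x.mean(axis=0)   # Assumes more than `sparsity` of each column exceeds its mean
    for _ in range(steps):
        guess = (lo + hi) * 0.5
        realised = (x >= guess).mean(axis=0)   # Fraction of the batch that passes
        # Too many entries pass : raise the lower bound; otherwise lower the upper bound
        too_many = realised >= sparsity
        lo = np.where(too_many, guess, lo)
        hi = np.where(too_many, hi, guess)
    return guess

def winner_take_all(x, hurdles):
    """Keep entries at or above their column's hurdle; zero the rest."""
    return np.where(x >= hurdles, x, 0.0)

# Usage on synthetic data : batch of 16384 rows, 32 columns
x = np.random.RandomState(0).randn(16384, 32)
masked = winner_take_all(x, hurdles_bisect(x, sparsity=0.0675))
```

A fixed number of bisection steps keeps the computation free of data-dependent indexing, which appears to be the reason the sorted version could not be used with a sparsity that varies during training (see the "Can't do variable indexing thing" comment in the training cell).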




In [None]:



embedding_N = (embedding)  # No Normalization by default

if args.normalize:
  #>>> a=np.array( [ [1,-1,1,-1], [-5,5,5,-5] ])
  #>>> b=np.std(a, axis=1)
  #>>> a / b[:, np.newaxis]
  #array([[ 1., -1.,  1., -1.],
  #       [-1.,  1.,  1., -1.]])
  
  embedding_std  = np.std(embedding, axis=1)
  embedding_N = embedding / embedding_std[:, np.newaxis]    # Try Normalizing  std(row) == 1, making sure shapes are right


embedding_shared = theano.shared(embedding_N.astype('float32'))       # 400000, 300
embedding_shared.name = "embedding_shared"

batch_start_index = theano.tensor.scalar('batch_start_index', dtype='int32')

embedding_batch = embedding_shared[ batch_start_index:(batch_start_index+batchsize) ]

network = lasagne.layers.InputLayer( 
    ( batchsize, embedding_dim ), 
    input_var=embedding_batch,
  )

pre_hidden_dim=embedding_dim*8  ## For sparse_dim=1024 and below
if sparse_dim>1024*1.5:
  pre_hidden_dim=sparse_dim*2   ## Larger sparse_dim

network = lasagne.layers.DenseLayer(
    network,
    num_units=pre_hidden_dim,     
    nonlinearity=lasagne.nonlinearities.rectify,
    W=lasagne.init.GlorotUniform(),
    b=lasagne.init.Constant(0.)
  )

if version==22:
  network = lasagne.layers.DenseLayer(
      network,
      num_units=sparse_dim*2,
      nonlinearity=lasagne.nonlinearities.rectify,
      W=lasagne.init.GlorotUniform(),
      b=lasagne.init.Constant(0.)
    )

network = lasagne.layers.DenseLayer(
    network,
    num_units=sparse_dim,
    nonlinearity=lasagne.nonlinearities.identity,
    W=lasagne.init.GlorotUniform(),
    b=lasagne.init.Constant(0.)
  )

sparse_embedding_batch_linear=network

def hard01(x):
  # http://deeplearning.net/software/theano/library/tensor/basic.html#theano.tensor.switch
  #return theano.tensor.switch( theano.tensor.gt(x, 0.), 0.95, 0.05)
  return theano.tensor.switch( theano.tensor.gt(x, 0.), 1.0, 0.0)
  
sparse_embedding_batch_probs=None
if args.mode == 'train':
  sigma = theano.tensor.scalar(name='sigma', dtype='float32')

  if version==21 or version==22: # winner-take-all idea (GPU-able?)

    if True:
      embedding_batch_middle = lasagne.layers.batch_norm(
          lasagne.layers.NonlinearityLayer( network,  nonlinearity=lasagne.nonlinearities.rectify )
          #lasagne.layers.NonlinearityLayer( network,  nonlinearity=lasagne.nonlinearities.identity ) 
        )

    if True:
      #was in for '1' and '0.1' versions
      embedding_batch_middle = lasagne.layers.GaussianNoiseLayer(
                embedding_batch_middle, 
                #sigma=0.1)
                #sigma=0.1 * theano.tensor.exp((-0.1) * sigma ))  # Noise should die down over time...  (idea, slowish)  BASE
                sigma=0.2 * theano.tensor.exp((-0.01) * sigma ))  # Noise should die down over time...  (idea, slowish)  _.2.01_


    sparsity_blend = theano.tensor.exp((-10.) * sigma )  # Goes from 1 to epsilon
    current_sparsity = 0.50*(sparsity_blend) + args.sparsity*(1. - sparsity_blend)
    
    #if True:
    #  sparse_embedding_batch_squashed = SparseWinnerTakeAllLayer(
    #                                      embedding_batch_middle, 
    #                                      sparsity=current_sparsity  ## Can't do variable indexing thing...
    #                                    )
    
    sparse_embedding_batch_squashed = SparseWinnerTakeAllLayerApprox(
                                        embedding_batch_middle, 
                                        approx_sparsity=current_sparsity
                                      )
    #sparsity_probe      
    
    
elif args.mode == 'predict':
  if version==20 or version==21 or version==22: # winner-take-all idea
    if True:
      embedding_batch_middle = lasagne.layers.batch_norm(
          lasagne.layers.NonlinearityLayer( network,  nonlinearity=lasagne.nonlinearities.rectify )
        )
        
    if version==20:
      sparse_embedding_batch_squashed = SparseWinnerTakeAllLayer(
                                          embedding_batch_middle, 
                                          sparsity=args.sparsity,
                                          #deterministic=True
                                        )
    else:
      sparse_embedding_batch_squashed = SparseWinnerTakeAllLayerApprox(
                                          embedding_batch_middle, 
                                          approx_sparsity=args.sparsity,   # Jam the actual (final) value in...
                                          #deterministic=True
                                        )
    

else:
  print("Need to know mode to do correct non-linearity")
  exit(0)

if sparse_embedding_batch_probs is None:
  sparse_embedding_batch_probs = sparse_embedding_batch_squashed

network = sparse_embedding_batch_squashed

if version==22:
  network = lasagne.layers.DenseLayer(
      network,
      num_units=embedding_dim*2,
      nonlinearity=lasagne.nonlinearities.rectify,
      W=lasagne.init.GlorotUniform(),
      b=lasagne.init.Constant(0.)
    )

network = lasagne.layers.DenseLayer(
    network,
    num_units=embedding_dim,
    nonlinearity=lasagne.nonlinearities.linear,
    W=lasagne.init.GlorotUniform(),
    b=lasagne.init.Constant(0.)
  )

prediction = lasagne.layers.get_output(network)

l2_error = lasagne.objectives.squared_error( prediction, embedding_batch )   
l2_error_mean = l2_error.mean()  # This is a per-element error term
          
#eps = .000001
#sparsity_cost = theano.tensor.mean(theano.tensor.log(sparse_embedding_batch+eps) + theano.tensor.log(1.-eps - sparse_embedding_batch))

#mix = 0.001
#cost = (l2_error + mix*sparsity_cost).astype('float32')

interim_output = lasagne.layers.get_output(sparse_embedding_batch_probs)
if version==21 or version==22:  # Count the number of positive entries
  sparse_flag = theano.tensor.switch( theano.tensor.ge(interim_output, 0.0001), 1.0, 0.0)
  
  #sparsity_mean  = sparse_flag.mean() / args.sparsity  # This is a number 0..1, where 1.0 = perfect = on-target
  sparsity_mean  = sparse_flag.mean() * 100.  # This is realised sparsity 

  sparsity_std  = (sparse_flag.mean(axis=1) / args.sparsity).std()     # assess the 'quality' of the sparsity per-row

  sparsity_probe = sparse_flag.mean(axis=1) / args.sparsity # sparsity across rows may not be ===1.0
  #sparsity_probe = sparse_flag.mean(axis=0) / args.sparsity # sparsity across columns should be ===1.0 (if approximation works)


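else:
  sparsity = theano.tensor.mean( (interim_output-0.5)**2 )
  sparsity_mean = sparsity * 4.0  # This is a number 0..1, where 1=perfect, 0=terrible
  # Row-wise stand-ins, so that every output requested by iterate_net() is defined for these versions too
  sparsity_probe = theano.tensor.mean( (interim_output-0.5)**2, axis=1 ) * 4.0
  sparsity_std   = sparsity_probe.std()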

sparsity_cost=0.0
if args.mode == 'train':
  mix = theano.tensor.scalar(name='mix', dtype='float32')

  #eps = .000001
  #sparsity_cost = sigma * theano.tensor.mean(theano.tensor.log(interim_output+eps) + theano.tensor.log(1.-eps - interim_output))
  
  sparsity_cost = -mix*sparsity_mean/1000.  # The 1000 factor is because '10' l2 is Ok, and 1 sparsity_mean is Great
  if version==20 or version==21:
    sparsity_cost = mix*0.
  
cost = l2_error_mean + sparsity_cost

params = lasagne.layers.get_all_params(network, trainable=True)
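
The sparsity diagnostics built above for versions 21 and 22 can be sanity-checked outside Theano. The following is a minimal NumPy sketch, not the training code itself: the activation batch is invented, and `target_sparsity` is a hypothetical stand-in for `args.sparsity`, assumed here to be a fraction of active units (the units of `args.sparsity` are not shown in this section).

```python
import numpy as np

def sparsity_stats(activations, target_sparsity, threshold=0.0001):
    """NumPy analogue of sparse_flag / sparsity_mean / sparsity_std / sparsity_probe."""
    # 1.0 where a unit counts as 'on', 0.0 elsewhere
    flag = (activations >= threshold).astype('float32')
    realised_pct = flag.mean() * 100.0                # overall % of active units
    row_ratio = flag.mean(axis=1) / target_sparsity   # 1.0 == this row is on-target
    return realised_pct, row_ratio.std(), row_ratio

# Toy batch : 4 rows x 10 units, values invented for illustration
batch = np.zeros((4, 10), dtype='float32')
batch[0, :2] = 0.5   # 2 active units
batch[1, :2] = 0.9   # 2 active units
batch[2, :1] = 0.3   # 1 active unit
batch[3, :3] = 0.7   # 3 active units

pct, std, probe = sparsity_stats(batch, target_sparsity=0.2)
print("realised sparsity %.1f%%, row-wise std %.4f" % (pct, std))
print("row-wise ratio :", probe)
```

A row-wise standard deviation near zero would mean that every word uses about the same number of active units, which is what the per-row probe printed during training is meant to reveal.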


In [None]:

epoch_base=0
if args.load:
  load_vars = hickle.load(args.load)
  print("Saved file had : Epoch:%4d, sigma:%5.2f" % (load_vars['epoch'], load_vars['sigma'], ) )
  #fraction_of_vocab=fraction_of_vocab
  
  epoch_base = load_vars['epoch']
  
  if 'layer_names' in load_vars:
    layer_names = load_vars['layer_names']
  else:
    i=0
    layer_names=[]
    while "Lasagne%d" % (i,) in load_vars:
      layer_names.append( "Lasagne%d" % (i,) )
      i=i+1
    
  layers = [ load_vars[ ln ] for ln in layer_names ]
  
  lasagne.layers.set_all_param_values(network, layers)

  
if args.mode == 'train':
  updates = lasagne.updates.adam( cost, params )

  #iterate_net = theano.function( [batch_start_index], [l2_error_mean,sparsity_mean], updates=updates, 
  iterate_net = theano.function( 
                  [batch_start_index,sigma,mix], 
                  [l2_error_mean,sparsity_mean,sparsity_std,sparsity_probe], 
                  updates=updates, 
                  allow_input_downcast=True,
                  on_unused_input='warn',
                )

  print("Built Theano op graph")
  
  sigma_ = 0.0
  mix_ = 0.0
  if args.param:
    mix_=args.param
  
  t0 = time.time()
  for epoch in range(epoch_base, epoch_base+args.iters):
    t1 = time.time()
    
    if version<8:
      fraction_of_vocab = 0.1 + epoch*(0.05)
      if fraction_of_vocab>1.0: 
        fraction_of_vocab=1.0

      if epoch>20:
        if epoch % 10 == 0:
          sigma_ += 0.02
      
      if epoch>1000:
        sigma_ = 2.0
    
    if version>=8:
      fraction_of_vocab = 1.0

    max_l2_error_mean=-1000.0

    batch_list = np.array( range(0, int(embedding.shape[0]*fraction_of_vocab), batchsize) )
    batch_list = np.random.permutation( batch_list )
    
    for b_start in batch_list.astype(int).tolist():
      #l2_error_mean_,sparsity_mean_ = iterate_net(b_start)
      
      l2_error_mean_,sparsity_mean_,sparsity_std_,sparsity_probe_ = iterate_net(b_start, sigma_, mix_)

      print(" epoch:,%4d, b:,%7d, l2:,%9.2f, sparsity_mean_:,%9.4f, sparsity_std_:,%9.4f, sigma:,%5.2f, mix:,%5.2f, " % 
          (epoch, b_start, 1000*l2_error_mean_, sparsity_mean_, sparsity_std_, sigma_, mix_, ))

      if b_start==0:
        #print("Hurdles : " + np_int_list( sparsity_probe_[0:100] ))
        print("  Row-wise sparsity : " + np_int_list( sparsity_probe_[0:30] ))
        #print("  %d, vector_probe : %s" % (epoch, np_int_list( np.sort(sparsity_probe_[0:100]) ), )) 
        #print("  %d, vector_probe : %s" % (epoch, np_int_list( sparsity_probe_[0:100] ), )) 
        #print("  vector_probe : " + np_int_list( sparsity_probe_[0:1000] ))
      
      if max_l2_error_mean<l2_error_mean_:
        max_l2_error_mean=l2_error_mean_

    print("Time per 100k words %6.2fs" % ((time.time() - t1)/embedding.shape[0]/fraction_of_vocab*1000.*100.,  ))
    #exit()

    boil_limit=10.
    if version==14:
      boil_limit=5.
    
    if args.normalize:
      boil_limit=40.
    
    if max_l2_error_mean*1000.<boil_limit and version<99:
      print("max_l2_error_mean<%6.2f - increasing sparseness emphasis" % (boil_limit,))
      if version<11 and sigma_<2.0 :
        sigma_ += 0.01
      if version>=11:
        sigma_ += 0.01
      mix_ += 0.1

    if (epoch +1) % 10 == 0:
      save_vars = dict(
        version=version,
        epoch=epoch,
        sigma=sigma_,
        mix=mix_,
        fraction_of_vocab=fraction_of_vocab
      )

      layer_names = []
      for i,p in enumerate(lasagne.layers.get_all_param_values(network)):
        if len(p)>0:
          name = "Lasagne%d" % (i,)
          save_vars[ name ] = p
          layer_names.append( name )
      save_vars[ 'layer_names' ] = layer_names
    
      #epoch_thinned = epoch
      #epoch_thinned = int(epoch/10)*10
      #epoch_thinned = int(epoch/50)*50
      epoch_thinned = int(epoch/100)*100
      hickle.dump(save_vars, args.save % (epoch_thinned,), mode='w', compression='gzip')


if args.load and args.mode == 'predict':
  print("Parameters : ", lasagne.layers.get_all_params(network))
  
  get_sparse_linear = theano.function( [batch_start_index], [ lasagne.layers.get_output(sparse_embedding_batch_linear), ])  # allow_input_downcast=True 
  predict_net = theano.function( [batch_start_index], [l2_error_mean,sparsity_mean], allow_input_downcast=True )
  predict_emb = theano.function( [batch_start_index], [prediction], allow_input_downcast=True )

  predict_bin = theano.function( [batch_start_index], [ lasagne.layers.get_output(sparse_embedding_batch_squashed),])

  print("Built Theano op graph")

  if True:  # Shows the error predictions with hard01 sigmoid
    for b_start in range(0, int(embedding.shape[0]), batchsize):
      l2_error_mean_,sparsity_mean_ = predict_net(b_start)

      print(" epoch:%4d, b:%7d, l2:%12.4f, sparsity:%6.4f - hard01" % 
          (epoch_base, b_start, 1000*l2_error_mean_, sparsity_mean_, ))

  if False:  # Shows the linear range of the sparse layer (pre-squashing)
    for b_start in range(0, int(embedding.shape[0]), batchsize * 5):
      sparse_embedding_batch_linear_, = get_sparse_linear(b_start)

      for row in range(0,100,5):
        print(np_int_list( sparse_embedding_batch_linear_[row][0:1000:50], mult=10, size=4 ))

  if args.output:
    predictions=[]
    for b_start in range(0, int(embedding.shape[0]), batchsize):
      prediction_, = predict_emb(b_start)
      
      predictions.append( np.array( prediction_ ) )

      print(" epoch:%3d, b:%7d, Downloading - reconstructed array" % 
          (epoch_base, b_start, ))
    
    embedding_prediction = np.concatenate(predictions, axis=0)
    predictions=None

    print("About to save to %s" % (args.output,))
    d=dict( 
      vocab=vocab, 
      vocab_orig=vocab_orig,
      embedding=embedding_prediction,
    )
    hickle.dump(d, args.output, mode='w', compression='gzip')
  
  if args.direct:
    predictions=[]
    for b_start in range(0, int(embedding.shape[0]), batchsize):
      binarised_, = predict_bin(b_start)
      
      #predictions.append( np.where( binarised_>0.5, 1., 0. ).astype('float32') )
      predictions.append( binarised_.astype('float32') )

      #print(" epoch:%3d, b:%7d, Downloading - hard01 to binary" % 
      print(" epoch:%3d, b:%7d, Downloading - sparse data" % 
          (epoch_base, b_start, ))
    
    embedding_prediction = np.concatenate(predictions, axis=0)
    predictions=None

    print("About to save sparse version to %s" % (args.direct,))
    d=dict( 
      vocab=vocab, 
      vocab_orig=vocab_orig,
      embedding=embedding_prediction,
    )
    hickle.dump(d, args.direct, mode='w', compression='gzip')
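
The checkpoint convention used above stores each parameter array under a `Lasagne%d` key plus a `layer_names` list, and falls back to probing for consecutive keys when that list is missing. Below is a minimal sketch of the round trip using plain dicts and NumPy arrays, without Lasagne or hickle. `pack_params` and `unpack_params` are hypothetical helper names introduced only for this illustration.

```python
import numpy as np

def pack_params(param_values, epoch):
    """Store each non-empty parameter array under a 'Lasagne%d' key."""
    save_vars = dict(epoch=epoch)
    layer_names = []
    for i, p in enumerate(param_values):
        if len(p) > 0:
            name = "Lasagne%d" % (i,)
            save_vars[name] = p
            layer_names.append(name)
    save_vars['layer_names'] = layer_names
    return save_vars

def unpack_params(load_vars):
    """Recover the arrays in order, probing for keys if 'layer_names' is absent."""
    if 'layer_names' in load_vars:
        layer_names = load_vars['layer_names']
    else:
        layer_names, i = [], 0
        while "Lasagne%d" % (i,) in load_vars:
            layer_names.append("Lasagne%d" % (i,))
            i += 1
    return [load_vars[ln] for ln in layer_names]

# Invented parameter arrays standing in for the network weights
params = [np.ones((3, 2)), np.zeros(2)]
saved = pack_params(params, epoch=9)
restored = unpack_params(saved)

# Older checkpoints without 'layer_names' take the probing path
legacy = dict(saved)
del legacy['layer_names']
restored_legacy = unpack_params(legacy)
print(len(restored), len(restored_legacy))
```

The probing fallback stops at the first missing index, so a checkpoint in which an empty array was skipped would be truncated on reload; storing `layer_names` explicitly avoids that.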
  



In [None]:
#embedding_file = '../../data/2-pretrained-vectors/glove.6B.300d.hkl'
#embedding_file = '../../data/1-glove-1-billion-and-wiki/window11-lc-36/vectors.2-17.hkl'
#embedding_file = '../4-sparse/sparse.6B.300d_S-21_2n-shuf_4096@1.50_2000_GPU-sparsity_recreate.hkl'
#embedding_file = '../4-sparse/sparse.6B.300d_S-21_2n-shuf_4096@1.50_2000_GPU-sparse_matrix.hkl'

#embedding_file = '../4-sparse/sparse.6B.300d_S-21_2n-shuf_1024@6.75_2000_GPU-sparsity_recreate.hkl'
#embedding_file = '../4-sparse/sparse.6B.300d_S-21_2n-shuf_1024@6.75_2000_GPU-sparse_matrix.hkl'

#embedding_file = '../4-sparse/sparse.6B.300d_S-21_2n-shuf-noise-after-norm_4k_.2.01_1-50_5000_GPU-sparse_matrix.hkl'
embedding_file = '../4-sparse/sparse.6B.300d_S-21_2n-shuf-noise-after-norm_.2.01_6-75_4000_GPU-sparse_matrix.hkl'

import numpy as np
import hickle

d = hickle.load(embedding_file)
vocab, embedding = d['vocab'], d['embedding']
vocab_orig = d['vocab_orig']

dictionary = dict( (word, i) for i,word in enumerate(vocab) if i<len(embedding) )
dictionary_orig = dict( (word, i) for i,word in enumerate(vocab_orig) if i<len(embedding) )

print("Embedding loaded :", embedding.shape)   # (vocab_size, embedding_dimension)=(rows, columns)
embedding_normed = embedding / np.linalg.norm(embedding, axis=1)[:, np.newaxis]

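# Count the non-zero entries in the vector for the first word
entries = [ x for x in embedding[0].tolist() if x!=0.0 ]
print("%s : %d non-zero entries" % (vocab[0], len(entries),))
#45 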

for w in 'the iraq baghdad uk london criminal apple some hypothesis maximal innocuous'.split(' '):
  i=dictionary[w]
  entries = [ x for x in embedding[i].tolist() if x!=0.0 ]
  print("%20s @%6d len=%d" % (w,dictionary_orig[w],len(entries),))

  #               the @     0 len=18
  #              some @    60 len=18
  #            london @   266 len=91
  #                uk @   448 len=82
  #              iraq @   606 len=113
  #          criminal @  1449 len=104
  #             apple @  2046 len=112
  #           baghdad @  2320 len=116
  #        hypothesis @  6957 len=136
  #           maximal @ 27962 len=107
  #         innocuous @ 30111 len=86


# Look at per-position best words
for i in range(0, embedding.shape[1], 10):
  best_words_j = np.argsort( -embedding[:, i ] )[0:10]
  for j in best_words_j:
    print("%4i -> %s" % (i, vocab[j],))
  print('')

i=2000
values = [x for x in (-np.sort( -embedding[i] )).tolist() if x>0. ]
print("values: ["+', '.join([ ('%.4f' % (x,)) for x in values ])+']')
#values: [1.1442, 0.9337, 0.9333, 0.9257, 0.7520, 0.5529, 0.4818, 0.4740, 0.4568, 0.4554, 0.4434, 0.4419, 0.4334, 0.4187, 0.4175, 0.4068, 0.4005, 0.3989, 0.3698, 0.3421, 0.3206, 0.3151, 0.3150, 0.3120, 0.3119, 0.3067, 0.3010, 0.2948, 0.2853, 0.2828, 0.2816, 0.2815, 0.2799, 0.2793, 0.2764, 0.2714, 0.2636, 0.2570, 0.2507, 0.2487, 0.2336, 0.2336, 0.2335, 0.2328, 0.2325, 0.2323, 0.2255, 0.2227, 0.2227, 0.2226, 0.2208, 0.2178, 0.2159, 0.2134, 0.2067, 0.2049, 0.1947, 0.1935, 0.1932, 0.1926, 0.1921, 0.1914, 0.1897, 0.1894, 0.1832, 0.1782, 0.1766, 0.1730, 0.1714, 0.1683, 0.1662, 0.1638, 0.1629, 0.1602, 0.1568, 0.1561, 0.1452, 0.1419, 0.1399, 0.1372, 0.1370, 0.1352, 0.1350, 0.1342, 0.1334, 0.1334, 0.1302, 0.1289, 0.1268, 0.1243, 0.1230, 0.1211, 0.1192, 0.1113, 0.1051]

print("changes: ["+', '.join([ ('%.1f' % (values[i+1]/values[i]*100.,)) for i in range(0,len(values)-1) ])+']')
#changes: [81.6, 100.0, 99.2, 81.2, 73.5, 87.1, 98.4, 96.4, 99.7, 97.3, 99.7, 98.1, 96.6, 99.7, 97.4, 98.4, 99.6, 92.7, 92.5, 93.7, 98.3, 100.0, 99.0, 100.0, 98.3, 98.1, 97.9, 96.8, 99.1, 99.6, 100.0, 99.4, 99.8, 99.0, 98.2, 97.1, 97.5, 97.6, 99.2, 93.9, 100.0, 100.0, 99.7, 99.9, 99.9, 97.1, 98.8, 100.0, 100.0, 99.2, 98.6, 99.1, 98.9, 96.9, 99.1, 95.0, 99.4, 99.9, 99.7, 99.8, 99.6, 99.1, 99.8, 96.7, 97.3, 99.1, 98.0, 99.1, 98.2, 98.8, 98.6, 99.4, 98.3, 97.9, 99.5, 93.1, 97.7, 98.6, 98.1, 99.8, 98.7, 99.9, 99.4, 99.4, 100.0, 97.6, 99.0, 98.4, 98.0, 99.0, 98.4, 98.5, 93.4, 94.4]


w='motorcycle'
w_i=dictionary[w]

#top_i =np.argmax(embedding[w_i])
good_i =np.argsort( -embedding[w_i] )

for i in range(0,10):
  best_words_j = np.argsort( -embedding[:, good_i[i] ] )[0:12]
  
  #for j in best_words_j:
  #  print("%s" % (vocab[j],))
  #print('')
  
  print("%s" % (', '.join( [ vocab[j] for j in best_words_j] ), ) )
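
The inspection steps above can be tried without downloading any of the embedding files. The following is a minimal sketch on an invented 5-word, 4-dimension non-negative embedding; all words and values are made up, and `best_words_for_dimension` is a hypothetical helper name.

```python
import numpy as np

# Invented toy data : rows are words, columns are sparse non-negative dimensions
toy_vocab = ['car', 'bike', 'apple', 'pear', 'road']
toy_embedding = np.array([
    [0.9, 0.0, 0.0, 0.4],   # car
    [0.7, 0.0, 0.0, 0.2],   # bike
    [0.0, 0.8, 0.0, 0.0],   # apple
    [0.0, 0.6, 0.1, 0.0],   # pear
    [0.3, 0.0, 0.0, 0.9],   # road
], dtype='float32')

# Non-zero entries per word (vectorised form of the list-comprehension count)
nonzero_per_word = np.count_nonzero(toy_embedding, axis=1)

def best_words_for_dimension(embedding, vocab, dim, top_n=3):
    """Words with the largest activation in column `dim`, largest first."""
    order = np.argsort(-embedding[:, dim])[:top_n]
    return [vocab[j] for j in order]

for word, n in zip(toy_vocab, nonzero_per_word):
    print("%8s len=%d" % (word, n))

for dim in range(toy_embedding.shape[1]):
    words = best_words_for_dimension(toy_embedding, toy_vocab, dim)
    print("%4i -> %s" % (dim, ', '.join(words)))
```

When a column has fewer non-zero entries than `top_n`, the remaining slots are filled by words with zero activation in an arbitrary order, so only the leading words of such a list are meaningful.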

  


In [1]:
def vector_for(w):
  w_i=dictionary[w]
  return embedding[w_i]

def l2_normed(e):
  return e / np.sqrt( np.dot(e,e) )

def cosine(a,b):
  return np.dot(l2_normed(a), l2_normed(b))

def top_senses_for(e):
  good_i = np.argsort( -e )
  for i in range(0,10):
    best_words_j = np.argsort( -embedding[:, good_i[i] ] )[0:12]
    print("%s" % (', '.join( [ vocab[j] for j in best_words_j] ), ) )

def closest_to(e):
  closest = np.argsort( - np.dot(embedding_normed, l2_normed(e) ) )
  return "%s" % (', '.join( [ vocab[j] for j in closest[0:20] ] ), ) 

def count_positive(e):
  return len( [ x for x in e.tolist() if x>0.0 ] )

def nonneg(e):
  return np.maximum(0, e)

def closest_dist(s):
  ab,xy = s.split('=')
  (a,b),(x,y) = ab.split(':'), xy.split(':')
  print( "%s is to %s as %s is to ?%s? " % (a,b,x,y,))
  (a,b,x,y) = map(vector_for, [a,b,x,y])  # Convert to vectors
  print('  x+b-a           = %s' % (closest_to( x + b - a ),))
  print('  [x+b-a]         = %s' % (closest_to( nonneg(x + b - a) ),))
  print('  x+[b-a]         = %s' % (closest_to( x + nonneg(b-a) ),))
  print('  [x-a]+b         = %s' % (closest_to( nonneg(x-a) + b ),))
  print('  [2x-a]+[2b-a]   = %s' % (closest_to( nonneg(2*x-a) + nonneg(2*b-a) ),))
  print('  x+[b-a]+b+[x-a] = %s' % (closest_to( x+nonneg(b-a) + b+nonneg(x-a) ),))
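
`closest_to` ranks the vocabulary by cosine similarity: the rows of `embedding_normed` have unit length, so their dot product with a unit-length query is the cosine. The following is a minimal self-contained sketch of that ranking and of two of the analogy variants, on invented data. The four vectors, their column meanings and the `toy_` names are assumptions made for illustration.

```python
import numpy as np

toy_vocab = ['man', 'woman', 'king', 'queen']
# Invented non-negative vectors : columns = [person, male, female, royal]
toy_embedding = np.array([
    [1.0, 1.0, 0.0, 0.0],   # man
    [1.0, 0.0, 1.0, 0.0],   # woman
    [1.0, 1.0, 0.0, 1.0],   # king
    [1.0, 0.0, 1.0, 1.0],   # queen
])
toy_normed = toy_embedding / np.linalg.norm(toy_embedding, axis=1)[:, np.newaxis]
toy_index = {w: i for i, w in enumerate(toy_vocab)}

def toy_closest(e, top_n=2):
    """Rank the toy words by cosine similarity to the query vector e."""
    e = e / np.sqrt(np.dot(e, e))
    order = np.argsort(-np.dot(toy_normed, e))[:top_n]
    return [toy_vocab[j] for j in order]

man, woman, king = (toy_embedding[toy_index[w]] for w in ['man', 'woman', 'king'])

plain   = king + woman - man                    # x+b-a
clipped = np.maximum(0, king - man) + woman     # [x-a]+b

print(toy_closest(plain))
print(toy_closest(clipped))
```

On this toy data both variants land on the same answer; on a real sparse embedding the clipped forms can differ, because negative entries in the offset are discarded before the nearest-neighbour search.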


In [None]:
top_senses_for(vector_for('motorbike'))

man   = vector_for('man')
woman = vector_for('woman')
king  = vector_for('king')
queen = vector_for('queen')

top_senses_for(man)
top_senses_for(woman)
top_senses_for(king)
top_senses_for(queen)

top_senses_for(man * woman) # Intersection
top_senses_for(man + woman) # Union
top_senses_for(man - woman) # ??


print(closest_to(man))
# man, woman, girl, person, men, teenager, she, friend, he, father, her, boy, someone, mother, him, his, victim, son, who, guy
print(closest_to(woman))
# woman, man, girl, mother, teenager, daughter, wife, women, her, person, she, girlfriend, friend, men, husband, widow, couple, boy, someone, victim

print(closest_to(king))
# king, queen, henry, mswati, mongkut, eirik, charles, vajiravudh, thoden, wenceslaus, zvonimir, athelstan, vladislaus, thelred, gojong, prince, jayavarman, kalkaua, sweyn, pomare
print(closest_to(queen))
# queen, princess, elizabeth, king, margrethe, empress, lady, sister, prince, sirikit, mary, cixi, monarch, daughter, duchess, olten, mother, infanta, rania, widow

closest_dist('pound:england=franc:france')


# Lists rather than map() : a map object would be used up by the unpacking under Python 3
curr = england,pound,america,dollar = [ vector_for(w) for w in 'england pound america dollar'.split() ]
print([ count_positive(e) for e in curr ])
# [84, 126, 94, 134]

print([ count_positive(e) for e in [ england*pound, america*dollar, england*america, pound*dollar] ])
# [12, 14, 17, 56]

total = england+pound+america+dollar
print([ count_positive(e) for e in [ england*101-100*total, pound*101-100*total, america*101-100*total, dollar*101-100*total] ])
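
The "Intersection" and "Union" comments above rely on the vectors being non-negative and sparse: an element-wise product is positive only where both vectors are active, while a sum is positive wherever either one is. The following is a minimal sketch with two invented 6-dimensional vectors; the values are made up for illustration.

```python
import numpy as np

def count_positive(e):
    """Number of strictly positive entries in the vector e."""
    return int(np.sum(e > 0.0))

# Invented non-negative sparse vectors over 6 dimensions
a = np.array([0.5, 0.0, 0.2, 0.0, 0.9, 0.0])
b = np.array([0.1, 0.3, 0.0, 0.0, 0.4, 0.0])

shared = a * b   # positive only where both are active
either = a + b   # positive where at least one is active

counts = [count_positive(v) for v in (a, b, shared, either)]
print(counts)
```

For non-negative vectors the counts follow inclusion-exclusion (active in either = active in `a` + active in `b` - active in both), which gives a quick consistency check when comparing word pairs.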
