LinearWordEmbed

This document describes how to learn a linear transformation between different word embeddings (e.g., counting-based embeddings and prediction-based embeddings such as word2vec). For more details, see our paper:

Bollegala, Hayashi, Kawarabayashi. Learning Linear Transformations between Counting-based and Prediction-based Word Embeddings. PLoS ONE 12(9): e0184544, 2017.

Unfortunately, the original code is not in a releasable state, so this document instead presents the core recipe of our learning algorithm.

Recipe

Let u_i be the m-dimensional embedding vector and v_i be the n-dimensional embedding vector for word i. The core idea is to learn C, an m-by-n matrix that transforms v_i into u_i such that u_i ~= Cv_i. For this purpose, we define the objective function over p words as \sum_{i=1}^p ||u_i - Cv_i||^2 = ||U - VC^T||^2_F, where U (p-by-m) and V (p-by-n) stack the embeddings of the p words as rows, and ||.||_F denotes the Frobenius norm.
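As a sanity check, on data that fits in memory this objective has a closed-form ordinary-least-squares solution. A minimal NumPy sketch (the synthetic embeddings and dimensions below are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 1000, 5, 8  # words, target dimension m, source dimension n

# Synthetic ground-truth transformation and embeddings (illustrative data).
C_true = rng.standard_normal((m, n))
V = rng.standard_normal((p, n))  # rows are v_i
U = V @ C_true.T                 # rows are u_i = C v_i (noiseless)

# Minimising ||U - V C^T||_F^2 over C is ordinary least squares:
# lstsq solves V X ~= U for X, and C = X^T.
X, *_ = np.linalg.lstsq(V, U, rcond=None)
C_hat = X.T  # on noiseless, full-rank data this recovers C_true
```

SGD (below) is preferred when p is too large for a direct solve.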

We use stochastic gradient descent (SGD) to learn C. For SGD, Vowpal Wabbit (VW) is helpful because it works efficiently on large-scale data.
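Purely to illustrate the per-sample SGD update that VW performs far more efficiently: the gradient of ||u_i - Cv_i||^2 with respect to C is -2(u_i - Cv_i)v_i^T, giving the rank-one update below. This is a sketch on synthetic data; the learning rate and epoch count are arbitrary choices, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 2000, 4, 6
C_true = rng.standard_normal((m, n))
V = rng.standard_normal((p, n))  # rows are v_i
U = V @ C_true.T                 # rows are u_i = C v_i

C = np.zeros((m, n))
lr = 0.05  # illustrative step size
for epoch in range(20):
    for i in rng.permutation(p):
        err = U[i] - C @ V[i]          # residual u_i - C v_i
        C += lr * np.outer(err, V[i])  # SGD step on ||u_i - C v_i||^2
# C now approximates C_true on this noiseless data
```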

Note that the problem is equivalent to linear regression with an m-dimensional output (multivariate linear regression). However, because VW cannot handle multidimensional outputs, we split the problem into m scalar-output linear regression problems. For each output dimension j=1,...,m, we create a file in the VW input format, where each line corresponds to one training sample. The entire file looks like this:

u_1j | 1:v_11 2:v_12 ... n:v_1n
u_2j | 1:v_21 2:v_22 ... n:v_2n
...
u_pj | 1:v_p1 2:v_p2 ... n:v_pn
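Generating these m training files can be sketched as follows (the filenames and the synthetic embeddings are illustrative; replace U and V with your actual embedding matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 100, 3, 4
U = rng.standard_normal((p, m))  # target embeddings, rows u_i
V = rng.standard_normal((p, n))  # source embeddings, rows v_i

# One VW training file per output dimension j (hypothetical filenames).
for j in range(m):
    with open(f"dim_{j}.vw", "w") as f:
        for i in range(p):
            # VW line: <label> | <index>:<value> ... with 1-based feature indices
            feats = " ".join(f"{k + 1}:{V[i, k]}" for k in range(n))
            f.write(f"{U[i, j]} | {feats}\n")
```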

By running VW on the file for each j=1,...,m, we obtain the weight vector c_j; stacking these vectors as rows gives the transformation C=[c_1;...;c_m].
