Skip to content

Word Embedding Recreation Techniques (MSc Dissertation Project)

Notifications You must be signed in to change notification settings

Jamesetay1/word-embedding-reconstruction

Repository files navigation

Word Embedding Reconstruction with Subword Embeddings for Language Modelling

This repository is the main hub of work my MSc Dissertation "Word Embedding Reconstruction with Subword Embeddings for Language Modelling", available here

Repository Contents

This repository is a mix of my own work and forks from other repositories.

charCNN is a reimplementation of Learning to Generate Word Representations using Subword Information, which I use as a reconstruction method.

cr_clean is a fork of Compact Reconstruction, which I use as a reconstruction method.

pyTorch_LM is a fork of a word level LM pyTorch tutorial, which I include the ability to use pre-trained embeddings and then use to build a language model and evaluate perplexity of various embeddings.

distance_measuring has inital tests of the quality of the reconstructed embeddings compared to reference embeddings, such as cosine similarity and P@n

results/embedding_comparison is where I put my results!

utils has a few repo-wide scripts for making baseline embeddings, limiting the character set in embeddings, etc.

About

Word Embedding Recreation Techniques (MSc Dissertation Project)

Topics

Resources

Stars

Watchers

Forks