Word2Vec captures semantic closeness between words by mapping each word to a vector that encodes its contextual usage.
Mikolov et al. (https://arxiv.org/abs/1301.3781) proposed two architectures for Word2Vec:
- Skip-gram
- CBOW (Continuous Bag of Words)
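To make the contrast concrete, here is a minimal sketch of how skip-gram forms its training pairs; the toy corpus and window size are illustrative assumptions, not taken from this repository:

```python
# Illustrative sketch: generating skip-gram (center, context) training pairs.
# Skip-gram predicts each context word from the center word;
# CBOW would instead predict the center word from its surrounding context.

corpus = "the quick brown fox jumps over the lazy dog".split()
window = 2  # context window size (assumption; the repo may use a different value)

pairs = []
for i, center in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            pairs.append((center, corpus[j]))  # (center, context)

print(pairs[:4])
# [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
```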
The original code was written in C.
This repository contains a from-scratch Python implementation of the skip-gram neural network, without using any machine learning or text processing libraries.
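As a rough picture of what such a from-scratch network computes, below is a minimal sketch of one softmax skip-gram update in plain NumPy. The names `w1` and `w2`, the dimensions, and the learning rate are assumptions (with `w1` echoing the `skipgram_w1.npy` file this repo saves); the repository's actual code may differ.

```python
import numpy as np

# All sizes and the learning rate below are illustrative assumptions.
V, D = 5000, 100                         # vocabulary size, embedding dimension
lr = 0.01                                # learning rate
rng = np.random.default_rng(0)
w1 = rng.normal(scale=0.1, size=(V, D))  # input embeddings (what skipgram_w1.npy would hold)
w2 = rng.normal(scale=0.1, size=(D, V))  # output-layer weights

def train_step(center_idx, context_idx):
    """One softmax skip-gram update for a single (center, context) pair."""
    global w2                            # "w2 -= ..." rebinds the name, so declare it global
    h = w1[center_idx].copy()            # hidden layer = center word's row of w1 (copied to avoid aliasing)
    logits = h @ w2                      # scores over the whole vocabulary
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                 # softmax
    grad = probs.copy()
    grad[context_idx] -= 1.0             # d(cross-entropy)/d(logits)
    w1[center_idx] -= lr * (w2 @ grad)   # update the center word's embedding
    w2 -= lr * np.outer(h, grad)         # update the output weights
```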
To train the model, run the `train_minibatch.py` script from the command line:

```
python train_minibatch.py
```
To predict similar words, run the `predict.py` script from the command line:

```
python predict.py
```
- `train_minibatch.py` is the training file. It trains the neural network on any given dataset (`dataset.csv`) and generates `skipgram_w1.npy`, `initialPlot.png` (a plot of the untrained word embeddings), and `finalPlot.png` (a plot of the trained word embeddings).
- The resulting trained word vectors are saved as `skipgram_w1.npy`.
- `predict.py` uses the trained word vectors to:
  - output the cosine similarity between two input words;
  - output the 10 closest context words to any input word.
  It reads its inputs from the command line; a sketch of this lookup follows below.
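For reference, the kind of lookup `predict.py` performs can be sketched as follows. Only `skipgram_w1.npy` is taken from this repository; the vocabulary mapping (`vocab`) is a hypothetical placeholder, since the repo's word-to-index storage is not described here.

```python
import numpy as np

# `skipgram_w1.npy` is the file this repo saves; the vocabulary mapping below is
# a hypothetical placeholder assumed to cover every row of w1 -- the repo's
# actual word-to-index storage may differ.
w1 = np.load("skipgram_w1.npy")                  # one embedding row per vocabulary word
vocab = {"king": 0, "queen": 1, "apple": 2}      # hypothetical: word -> row index
inv_vocab = {i: w for w, i in vocab.items()}

def similarity(word_a, word_b):
    """Cosine similarity between the trained vectors of two words."""
    a, b = w1[vocab[word_a]], w1[vocab[word_b]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest(word, k=10):
    """The k vocabulary words closest to `word` by cosine similarity."""
    v = w1[vocab[word]]
    scores = (w1 @ v) / (np.linalg.norm(w1, axis=1) * np.linalg.norm(v))
    order = np.argsort(-scores)                  # best match first
    return [inv_vocab[i] for i in order if i != vocab[word]][:k]
```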
After the from-scratch version, I implemented another version using TensorFlow.