Training Skip-Gram Model

Lab Assignment from AI for Beginners Curriculum.

Task

In this lab, we challenge you to train a Word2Vec model using the Skip-Gram technique. Train a network with an embedding layer to predict the neighboring words within an $N$-token-wide Skip-Gram window. You can reuse the code from this lesson and modify it slightly.
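The Skip-Gram objective can be illustrated with a small, dependency-light sketch. It uses NumPy instead of the framework used in the lesson, trains with a full softmax over the vocabulary (not negative sampling), and runs plain full-batch gradient descent. The names `skipgram_pairs` and `train_skipgram` and all hyperparameter values are hypothetical. It also assumes that `window` counts the neighbors on each side of the center word. If $N$ means the total window width, use `window = (N - 1) // 2`.

```python
import numpy as np

def skipgram_pairs(token_ids, window):
    """Return (center, context) pairs; `window` = neighbours on each side."""
    pairs = []
    for i, center in enumerate(token_ids):
        # Clip the window at the start and end of the sequence
        lo = max(0, i - window)
        hi = min(len(token_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, token_ids[j]))
    return pairs

def train_skipgram(token_ids, vocab_size, embed_dim=16, window=2,
                   epochs=200, lr=0.1, seed=0):
    """Full-batch gradient descent on a full-softmax Skip-Gram model."""
    rng = np.random.default_rng(seed)
    pairs = np.array(skipgram_pairs(token_ids, window), dtype=int).reshape(-1, 2)
    centers, contexts = pairs[:, 0], pairs[:, 1]
    n = len(centers)
    rows = np.arange(n)
    # W_in holds the word embeddings we keep; W_out projects back to the vocabulary
    W_in = rng.normal(0.0, 0.1, size=(vocab_size, embed_dim))
    W_out = rng.normal(0.0, 0.1, size=(embed_dim, vocab_size))
    losses = []
    for _ in range(epochs):
        h = W_in[centers]                            # (n, D) center-word embeddings
        logits = h @ W_out                           # (n, V) score for every word
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)    # softmax over the vocabulary
        losses.append(-np.log(probs[rows, contexts]).mean())
        # d(mean cross-entropy)/d(logits) = (softmax - one_hot) / n
        grad_logits = probs
        grad_logits[rows, contexts] -= 1.0
        grad_logits /= n
        grad_W_out = h.T @ grad_logits               # (D, V)
        grad_h = grad_logits @ W_out.T               # (n, D), uses W_out before update
        W_out -= lr * grad_W_out
        # Scatter-add so repeated center words accumulate their gradients
        np.add.at(W_in, centers, -lr * grad_h)
    return W_in, losses

# Toy corpus of token ids (hypothetical data for illustration only)
corpus = [0, 1, 2, 3, 0, 1, 4, 5, 0, 1, 2, 6, 7]
embeddings, losses = train_skipgram(corpus, vocab_size=8, embed_dim=8)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The rows of the returned `W_in` matrix are the word vectors. For a real book, a full softmax over tens of thousands of words becomes slow, which is why practical Word2Vec implementations usually switch to negative sampling or a hierarchical softmax.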

The Dataset

You are welcome to use any book. You can find a lot of free texts at Project Gutenberg; for example, Alice's Adventures in Wonderland by Lewis Carroll is available there. Alternatively, you can use Shakespeare's plays, which you can download using the following code:

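import tensorflow as tf
# Download the Shakespeare corpus (Keras caches it locally)
path_to_file = tf.keras.utils.get_file(
   'shakespeare.txt',
   'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
# Read the raw bytes and decode them as UTF-8 text
with open(path_to_file, 'rb') as f:
    text = f.read().decode(encoding='utf-8')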
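Before training, the raw `text` has to be split into tokens and mapped to integer ids. The sketch below uses a deliberately simple regular-expression tokenizer (lowercase words plus apostrophes). This is an assumption: the lesson code may tokenize differently, for example with a Keras text-vectorization layer. The function `build_vocab` and its `min_count` parameter are hypothetical names.

```python
import re
from collections import Counter

def build_vocab(text, min_count=1):
    """Tokenize `text` and map each kept word to an integer id."""
    # Lowercase, then keep runs of letters and apostrophes as tokens
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Most frequent words get the smallest ids; words below min_count are dropped
    vocab = [w for w, c in counts.most_common() if c >= min_count]
    word_to_id = {w: i for i, w in enumerate(vocab)}
    token_ids = [word_to_id[t] for t in tokens if t in word_to_id]
    return vocab, word_to_id, token_ids

sample = "To be, or not to be, that is the question."
vocab, word_to_id, token_ids = build_vocab(sample)
print(vocab)
print(token_ids)
```

To use it on the downloaded corpus, pass `text` instead of `sample`. Note that dropping rare words also makes the words on either side of them adjacent in `token_ids`, which slightly changes which pairs fall inside a Skip-Gram window.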

Explore!

If you have time and want to dig deeper into the subject, try exploring the following:

How does the embedding size affect the results?
How do different text styles affect the results?
Take several very different types of words and their synonyms, obtain their vector representations, apply PCA to reduce the dimensionality to 2, and plot them in 2D space. Do you see any patterns?
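For the PCA step, a minimal NumPy sketch projects the selected word vectors onto their first two principal components via a singular value decomposition. The function name `pca_2d` is hypothetical, and the random vectors below only stand in for real embeddings taken from a trained model.

```python
import numpy as np

def pca_2d(vectors):
    """Project row vectors onto their first two principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)  # center each dimension
    # Rows of Vt are the principal directions, sorted by singular value
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T     # (n_words, 2) coordinates

rng = np.random.default_rng(0)
vecs = rng.normal(size=(6, 16))  # stand-in for 6 real 16-dimensional word vectors
coords = pca_2d(vecs)
print(coords.shape)  # (6, 2)
```

The resulting coordinates can be drawn with matplotlib's `plt.scatter` and labeled with `plt.annotate`. `sklearn.decomposition.PCA(n_components=2)` is an equivalent off-the-shelf alternative.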
