Text embeddings are essential for any project that works with textual data, because computers operate on numbers rather than raw text. This project explores converting textual information into a machine-readable representation using techniques such as "TF-IDF," "One-Hot Encoding," "GloVe," and "BERT." Here, I implemented the "Word2Vec" model.
EXAMPLE : { Skip-gram model }
SENTENCE = "This is a word embeddings project".
words (after removing stop words) -> {word, embeddings, project}
window size = 2 (the window slides over the original sentence)
central word = {embeddings}
(central word , context word) pairs :
{embeddings, word}
{embeddings, project}
{embeddings, a}
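A minimal Python sketch of how these (central word, context word) pairs can be generated from a sentence; the function name skipgram_pairs and the simple whitespace tokenization are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch: generate (central word, context word) skip-gram pairs.
def skipgram_pairs(sentence, window_size=2):
    """Yield (central, context) pairs from a whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    for i, central in enumerate(tokens):
        lo = max(0, i - window_size)
        hi = min(len(tokens), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                yield central, tokens[j]

# Example from above: central word "embeddings", window size 2
pairs = [p for p in skipgram_pairs("This is a word embeddings project")
         if p[0] == "embeddings"]
print(pairs)
# [('embeddings', 'a'), ('embeddings', 'word'), ('embeddings', 'project')]
```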
STEPS FOR PREDICTING THE CONTEXT (TARGET) WORDS GIVEN THE CENTRAL WORD :
1. extract the frequent words and remove noise (stop words, punctuation)
2. perform one-hot encoding for these words (the vocabulary)
3. pass the one-hot vectors into a neural network model with 3 layers (see the sketch after this list)
-> 1st layer - input layer
{one-hot encoded words}
-> 2nd layer - embeddings layer
{one-hot encodings -> dense vectors}
-> 3rd layer - softmax layer
{calculates the probability of each vocabulary word being a context word}
4. train the model
{calculate the loss and backpropagate}
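Below is a minimal NumPy sketch of the 3-layer network described above (one-hot input -> embeddings layer -> softmax output), trained with cross-entropy loss and backpropagation. The vocabulary, variable names, and hyperparameters are illustrative assumptions, not the project's actual implementation.

```python
import numpy as np

# Steps 1/2: tiny illustrative vocabulary and one-hot encoding.
vocab = ["a", "word", "embeddings", "project"]
word_to_idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                        # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # embeddings layer weights
W_out = rng.normal(scale=0.1, size=(D, V))  # softmax layer weights

def one_hot(idx):
    x = np.zeros(V)
    x[idx] = 1.0
    return x

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# (central word, context word) training pairs from the example above.
pairs = [("embeddings", "word"), ("embeddings", "project"), ("embeddings", "a")]
lr = 0.1

# Steps 3/4: forward pass through the 3 layers, then backpropagation.
for epoch in range(100):
    total_loss = 0.0
    for central, context in pairs:
        c, t = word_to_idx[central], word_to_idx[context]
        x = one_hot(c)                      # 1st layer: one-hot input
        h = x @ W_in                        # 2nd layer: dense embedding vector
        y = softmax(h @ W_out)              # 3rd layer: probabilities over the vocabulary
        total_loss += -np.log(y[t])         # cross-entropy loss

        dz = y.copy()
        dz[t] -= 1.0                        # gradient of the loss w.r.t. the logits
        grad_h = W_out @ dz                 # gradient flowing back to the embedding
        W_out -= lr * np.outer(h, dz)
        W_in[c] -= lr * grad_h
    if epoch % 20 == 0:
        print(f"epoch {epoch}: loss = {total_loss:.4f}")

print("embedding for 'embeddings':", W_in[word_to_idx["embeddings"]])
```

After training, the rows of the embeddings-layer weight matrix (W_in in this sketch) are the learned dense word vectors.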