Lebanon News Embeddings Explorer

A web application developed using Flask to explore word embeddings trained on Lebanese news.

Link to the web application: https://lebanesenewsembeddings.pythonanywhere.com/

word2vec

word2vec is an NLP technique that uses a neural network to embed words in a Euclidean space: each word in the training corpus is mapped to a vector. This mapping allows synonyms and semantically related words to be detected by computing the cosine similarity between their vectors; similar words end up "close" to each other in the embedding space.
Another interesting property is that the technique produces vectors that reflect analogies in natural language. For example, since "king" is to "man" as "queen" is to "woman", the nearest neighbor of vector('king') - vector('man') + vector('woman') is vector('queen').
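These properties can be reproduced with any word2vec-style vectors, for instance through gensim's KeyedVectors API. The sketch below is illustrative only: the file name is a placeholder, and the English example words follow the analogy above rather than the actual (Arabic) news vocabulary.

```python
# Minimal sketch of querying word2vec vectors with gensim (>= 4.0).
# "vectors.bin" is a placeholder, not the file shipped with this project.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# Semantically related words are close in the embedding space.
print(kv.similarity("king", "queen"))  # cosine similarity in [-1, 1]

# Analogy: vector('king') - vector('man') + vector('woman') ~ vector('queen')
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```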

This website has 3 different tools:

1-Word analogies:

This tool compares the relationship between two pairs of words. Given 3 words A, B and C, it finds a fourth word D such that the relation between A and B is the same as the relation between C and D (B - A = D - C). To do so, the tool returns the top 10 words most similar to B - A + C.
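With gensim-style vectors, this query can be expressed in a single call. This is a sketch, not the tool's actual code; "A", "B", "C" and the file name are placeholders.

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # placeholder file
# Top 10 candidates for D such that B - A = D - C, i.e. the nearest neighbors of B - A + C.
print(kv.most_similar(positive=["B", "C"], negative=["A"], topn=10))
```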


2-Similarity score:

This tool finds the cosine similarity score between 2 words.
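In gensim terms this corresponds to one call (a sketch; the file and word names are placeholders):

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # placeholder file
print(kv.similarity("word_1", "word_2"))  # cosine similarity between the two word vectors
```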


3-Top similar words:

This tool finds the top 10 closest words to the input, ranked by the cosine similarity between the vectors.
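A gensim sketch of the same query (placeholder file and word names):

```python
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # placeholder file
print(kv.most_similar("word", topn=10))  # list of (word, cosine similarity) pairs
```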

To use locally:

1-Clone the repository.
2-Download the word vectors file here and place it in a folder called "models".
3-Place the cloned repository and the "models" folder in the same path.
4-Run "run.py" in the explorer folder using Python 3 (a rough sketch of what such a script might contain follows below).
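For orientation only, here is a minimal, hypothetical sketch of how such an explorer could be wired with Flask and gensim. The file path, route and behaviour are assumptions, not the actual contents of run.py.

```python
# Hypothetical sketch of a Flask embeddings explorer (not the project's actual run.py).
from flask import Flask, jsonify, request
from gensim.models import KeyedVectors

app = Flask(__name__)
# Assumed location and format of the downloaded vectors file.
kv = KeyedVectors.load_word2vec_format("models/vectors.bin", binary=True)

@app.route("/similar")
def similar():
    word = request.args.get("word", "")
    if word not in kv.key_to_index:  # gensim >= 4.0 vocabulary lookup
        return jsonify(error="word not in vocabulary"), 404
    # Convert numpy floats to plain floats so they serialize cleanly to JSON.
    neighbors = [(w, float(score)) for w, score in kv.most_similar(word, topn=10)]
    return jsonify(neighbors=neighbors)

if __name__ == "__main__":
    app.run()
```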
