### Neural Networks for Word Embeddings: Introduction to Natural Language Processing
[link](https://medium.com/analytics-vidhya/neural-networks-for-word-embeddings-4b49e0e9c955)

In [1]:
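!pip install wikipedia
!pip install gensim
import wikipedia
import nltk
from gensim.models import Word2Vec

# sentence/word tokenizer models ("punkt_tab" is used by NLTK 3.9+) and the stop-word list
nltk.download("punkt")
nltk.download("punkt_tab")
nltk.download("stopwords")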



In [6]:
chess = wikipedia.page("Chess").content
chess

'Chess is a two-player strategy board game played on a checkered board with 64 squares arranged in an 8×8 square grid. Played by millions of people worldwide, chess is believed to be derived from the Indian game chaturanga sometime before the 7th century. Chaturanga is also the likely ancestor of the East Asian strategy games xiangqi (Chinese chess), janggi (Korean chess), and shogi (Japanese chess). Chess reached Europe via Persia and Arabia by the 9th century, due to the Umayyad conquest of Hispania. The queen and bishop assumed their current powers in Spain in the late 15th century, and the modern rules were standardized in the 19th century.\nPlay involves no hidden information. Each player begins with 16 pieces: one king, one queen, two rooks, two knights, two bishops, and eight pawns. Each piece type moves differently, with the most powerful being the queen and the least powerful the pawn. The objective is to checkmate the opponent\'s king by placing it under an inescapable threat

In [8]:
# split our document into sentences
sentences = nltk.sent_tokenize(chess) 

length = len(sentences)

In [9]:
stopwords = set(nltk.corpus.stopwords.words("english"))

for i in range(length):

    # further tokenize each sentence into words
    temp = nltk.word_tokenize(sentences[i])

    # keep alphabetical tokens only, convert to lower case and remove stop words
    # (lowercase before the stop-word check, since NLTK's stop-word list is all lowercase)
    sentences[i] = [word.lower() for word in temp
                    if word.isalpha() and word.lower() not in stopwords]
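The filtering step in the cell above can be sketched in plain Python on already-tokenized input, so it runs without any NLTK downloads. The stop-word set below is a small hand-picked subset standing in for NLTK's English list (an assumption for illustration), and `clean_tokens` is a hypothetical helper name. The sketch checks the *lowercased* token against the stop words, so a sentence-initial "The" is removed as well.

```python
# Hypothetical subset of English stop words, standing in for
# nltk.corpus.stopwords.words("english") so the sketch runs without downloads.
STOPWORDS = {"the", "is", "a", "an", "and", "on", "with", "in", "of"}


def clean_tokens(tokens, stopwords=STOPWORDS):
    """Keep alphabetical tokens, lowercase them, and drop stop words."""
    cleaned = []
    for token in tokens:
        if not token.isalpha():
            continue  # drop punctuation, numbers such as "64", hyphenated words
        lowered = token.lower()
        # compare the lowercased token, because the stop-word list is lowercase
        if lowered not in stopwords:
            cleaned.append(lowered)
    return cleaned


tokens = ["Chess", "is", "a", "two-player", "strategy", "board", "game",
          "played", "on", "a", "checkered", "board", "with", "64", "squares", "."]
print(clean_tokens(tokens))
# → ['chess', 'strategy', 'board', 'game', 'played', 'checkered', 'board', 'squares']
```

Note that `isalpha()` also discards hyphenated tokens such as "two-player", which NLTK's word tokenizer keeps as a single token.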

In [10]:
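# vector_size is the dimensionality of the word vectors (the parameter was called size before gensim 4.0)
# window is the maximum distance between the current and the predicted word within a sentence
# min_count defaults to 5, so words that appear fewer than 5 times get no vector
model = Word2Vec(sentences, vector_size=100, window=5)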
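To make the `window` parameter concrete, here is a simplified sketch of which context words are paired with each centre word in one tokenized sentence. gensim's `Word2Vec` defaults to CBOW (`sg=0`), which predicts each centre word from these surrounding words. The dynamic option mimics the word2vec idea of shrinking the window by a random amount for each centre word (so the effective window ranges from 1 to `window`); this is an illustration, not gensim's actual code, and `context_windows` is a hypothetical name.

```python
import random


def context_windows(sentence, window=5, dynamic=True, rng=None):
    """Yield (centre_word, context_words) for every position in a sentence."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so runs are repeatable
    for i, centre in enumerate(sentence):
        # effective window: random in 1..window (dynamic) or exactly window
        span = rng.randint(1, window) if dynamic else window
        left = sentence[max(0, i - span):i]
        right = sentence[i + 1:i + 1 + span]
        yield centre, left + right


sentence = ["chess", "strategy", "board", "game", "played", "checkered"]
for centre, context in context_windows(sentence, window=2, dynamic=False):
    print(centre, "->", context)
```

With `dynamic=False` and `window=2`, "board" gets the context `['chess', 'strategy', 'game', 'played']`; with the dynamic window, nearby words end up in the context more often than distant ones.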

In [11]:
# Exploring the model 

# Measure the similarity between two words using cosine similarity
# (gensim 4.x exposes the query methods on model.wv, the trained word vectors)

model.wv.similarity("rook", "knight")



0.67347497
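The similarity score above is the cosine similarity of the two word vectors: their dot product divided by the product of their lengths, so 1 means the same direction and 0 means orthogonal. A dependency-free sketch with made-up 3-dimensional vectors (the trained model uses 100 dimensions) shows the arithmetic.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v: (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy vectors, invented for illustration only.
rook = [1.0, 2.0, 2.0]
knight = [2.0, 1.0, 2.0]
print(round(cosine_similarity(rook, knight), 4))  # 8 / (3 * 3) → 0.8889
```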

In [12]:
# Find the top n most similar words

model.wv.similar_by_word("king", topn=10)



[('chess', 0.8685276508331299),
 ('world', 0.8283013105392456),
 ('opponent', 0.8192204833030701),
 ('pawn', 0.8153437376022339),
 ('moves', 0.8127784729003906),
 ('two', 0.8082280158996582),
 ('pieces', 0.7987949252128601),
 ('play', 0.7981538772583008),
 ('players', 0.7963162660598755),
 ('player', 0.7919891476631165)]
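`similar_by_word` ranks the other words in the vocabulary by cosine similarity to the query word and returns the `topn` highest, leaving out the query word itself. A plain-Python sketch of that ranking over a tiny, invented vocabulary is below; the vectors and the helper `top_n_similar` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_n_similar(word, vectors, topn=10):
    """Return the topn (word, similarity) pairs closest to `word`, excluding `word`."""
    query = vectors[word]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in vectors.items() if other != word]
    # highest similarity first, matching the order of gensim's output
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Invented 2-dimensional vectors, purely for illustration.
vectors = {
    "king":  [1.0, 0.0],
    "queen": [0.9, 0.1],
    "pawn":  [0.6, 0.8],
    "board": [0.0, 1.0],
}
for other, score in top_n_similar("king", vectors, topn=2):
    print(other, round(score, 3))
```

On a corpus as small as one Wikipedia article, the real scores tend to be high across the board (all ten neighbours of "king" above score about 0.79 or more), so the ranking is more informative than the absolute values.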

In [13]:
# Find words using vector addition: does opponent + checkmate land near "lose"?

opponent_checkmate = model.wv["opponent"] + model.wv["checkmate"]  # add the vectors for opponent and checkmate

In [14]:
model.wv.most_similar(positive=[opponent_checkmate], topn=10)  # find the words most similar to the summed vector



[('opponent', 0.9237053394317627),
 ('checkmate', 0.8880871534347534),
 ('chess', 0.8735392093658447),
 ('one', 0.8249163627624512),
 ('king', 0.8226971626281738),
 ('pawn', 0.8185840249061584),
 ('game', 0.8140382170677185),
 ('player', 0.8128865957260132),
 ('white', 0.811844527721405),
 ('pieces', 0.8100132346153259)]
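"opponent" and "checkmate" top this list because, as far as gensim's documented behaviour goes, `most_similar` leaves input words out of the results only when they are passed as words, not as a precomputed vector. Passing the words directly, `model.wv.most_similar(positive=["opponent", "checkmate"], topn=10)`, should drop them and combine the unit-length vectors of the inputs instead of the raw sum. The sketch below imitates that approach on invented 2-dimensional vectors; `combine_and_rank` is a hypothetical helper, not gensim code.

```python
import math


def unit(v):
    """Scale v to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(u, v):
    return sum(a * b for a, b in zip(unit(u), unit(v)))


def combine_and_rank(positive, vectors, topn=3):
    """Average the unit vectors of the `positive` words, then rank the rest of
    the vocabulary by cosine similarity, skipping the input words themselves."""
    units = [unit(vectors[w]) for w in positive]
    combined = [sum(col) / len(units) for col in zip(*units)]
    scores = [(w, cosine_similarity(combined, v))
              for w, v in vectors.items() if w not in positive]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


# Invented 2-dimensional vectors, purely for illustration.
vectors = {
    "opponent":  [1.0, 0.2],
    "checkmate": [0.2, 1.0],
    "lose":      [0.7, 0.7],
    "board":     [1.0, -1.0],
}
print(combine_and_rank(["opponent", "checkmate"], vectors, topn=2))
```

Normalizing first keeps a word with an unusually long vector from dominating the combination; averaging rather than summing does not change the cosine ranking.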