<img src='we.png'>
<img src='analogy.png'>

- Converting 300-dimensional word vectors to 2 dimensions for plotting
<img src='2d.png'>
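
The notes do not say which method produced the 2-D plot above. As a minimal sketch, assuming a plain PCA projection, the reduction from 300 to 2 dimensions could look like this. The helper `pca_project` is a hypothetical name, and the random vectors are stand-ins for real word embeddings.

```python
import numpy as np

def pca_project(vectors, n_components=2):
    """Project row vectors onto their top principal components."""
    # Center the data so the components describe variance around the mean.
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in data (assumption): 6 "words", each a 300-dimensional vector.
rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(6, 300))

points_2d = pca_project(word_vectors)
print(points_2d.shape)  # (6, 2)
```

Each row of `points_2d` can then be plotted as one point per word.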

# <u> Implementing Word Embedding </u>

### Steps to implement word embedding
1. Vocabulary size
2. One-hot representation
3. Embedding representation
    - pad_sequences
    - Sequential model
    - Embedding layer with dimensions (features)
<img src ='232.png'>

### Word embedding using the Embedding layer in Keras

In [5]:
# sentences
sents=['the glass of milk',
     'the glass of juice',
     'the cup of tea',
     'I am a good developer',
     'understand the meaning of developer',
     'your videos are good']

In [6]:
sents

['the glass of milk',
 'the glass of juice',
 'the cup of tea',
 'I am a good developer',
 'understand the meaning of developer',
 'your videos are good']

In [4]:
# step 1: vocabulary size
voc_size=1000

In [8]:
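# step 2: "one hot" representation
# note: one_hot() hashes each word to an integer index in [1, voc_size);
# it does not build one-hot vectors, and two different words can collide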
from tensorflow.keras.preprocessing.text import one_hot
sents_onehot=[one_hot(sent,voc_size) for sent in sents]

In [9]:
sents_onehot

[[561, 137, 274, 916],
 [561, 137, 274, 676],
 [561, 405, 274, 700],
 [844, 616, 969, 672, 999],
 [456, 561, 579, 274, 999],
 [635, 22, 950, 672]]
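
In the output above, repeated words such as "the" and "of" get the same integer in every sentence, because each word is hashed to an index rather than expanded into a one-hot vector. The sketch below illustrates the idea with a hypothetical helper, `hashed_indices`. It uses MD5 so the result is reproducible, which is an assumption made for this example: Keras' default uses a different hash function, so the numbers will not match the ones shown above.

```python
import hashlib

def hashed_indices(sentence, vocab_size):
    """Map each word to an integer in [1, vocab_size) via a stable hash."""
    indices = []
    for word in sentence.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Index 0 is left free so it can be used for padding later.
        indices.append(int(digest, 16) % (vocab_size - 1) + 1)
    return indices

milk = hashed_indices('the glass of milk', 1000)
juice = hashed_indices('the glass of juice', 1000)

# The shared prefix "the glass of" maps to the same three indices.
print(milk[:3] == juice[:3])  # True
```

Because the mapping is a hash, a small `vocab_size` makes collisions between different words more likely.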

In [11]:
# step 3: Embedding layer
from tensorflow.keras.layers import Embedding
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential

In [12]:
import numpy as np

In [14]:
sent_length=8

embedded_docs=pad_sequences(sents_onehot,padding='pre',maxlen=sent_length)

embedded_docs

array([[  0,   0,   0,   0, 561, 137, 274, 916],
       [  0,   0,   0,   0, 561, 137, 274, 676],
       [  0,   0,   0,   0, 561, 405, 274, 700],
       [  0,   0,   0, 844, 616, 969, 672, 999],
       [  0,   0,   0, 456, 561, 579, 274, 999],
       [  0,   0,   0,   0, 635,  22, 950, 672]])
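
To make the padding step concrete, here is a minimal pure-Python sketch of what `padding='pre'` with `maxlen=8` does to each sequence. The helper `pad_pre` is a hypothetical name, not part of Keras; it also mirrors the Keras default of truncating from the front when a sequence is longer than `maxlen`.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad each sequence with `value` up to `maxlen` items."""
    padded = []
    for seq in sequences:
        # Keep only the last `maxlen` items if the sequence is too long.
        trimmed = seq[-maxlen:]
        padded.append([value] * (maxlen - len(trimmed)) + trimmed)
    return padded

rows = pad_pre([[561, 137, 274, 916], [844, 616, 969, 672, 999]], maxlen=8)
print(rows[0])  # [0, 0, 0, 0, 561, 137, 274, 916]
print(rows[1])  # [0, 0, 0, 844, 616, 969, 672, 999]
```

After this step every sentence has the same length, so the whole batch fits in one rectangular array.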

In [17]:
model=Sequential()
# 1st parameter (voc_size) --> no of rows in the embedding matrix --> one row per word index
# 2nd parameter (10) --> no of columns --> no of features per word
model.add(Embedding(voc_size,10,input_length=sent_length))

# adam optimizer, mean squared error (mse) loss
model.compile('adam','mse')

In [18]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 8, 10)             10000     
Total params: 10,000
Trainable params: 10,000
Non-trainable params: 0
_________________________________________________________________
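
The 10,000 parameters in the summary are the entries of a 1000 x 10 lookup table: one 10-feature row per word index. As a rough sketch of what the layer computes (not the Keras implementation itself), the lookup can be imitated with NumPy indexing. The uniform range of ±0.05 reflects the layer's default initializer as far as I know; treat it as an assumption.

```python
import numpy as np

voc_size, dim = 1000, 10

# Stand-in for the layer's weight matrix (assumed uniform init in [-0.05, 0.05]).
rng = np.random.default_rng(0)
table = rng.uniform(-0.05, 0.05, size=(voc_size, dim))
print(table.size)  # 10000

# One padded sentence: 8 word indices in, 8 feature vectors out.
doc = np.array([0, 0, 0, 0, 561, 137, 274, 916])
vectors = table[doc]
print(vectors.shape)  # (8, 10)

# All padding positions (index 0) share the same row of the table.
print(np.array_equal(vectors[0], vectors[3]))  # True
```

This is also why the padded positions of a sentence all map to one identical vector in the model's output.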


In [19]:
print(model.predict(embedded_docs))

[[[ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]
  [ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]
  [ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]
  [ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]
  [-0.04489905  0.0448199   0.04858373 -0.00115131  0.03810059
    0.00381806 -0.0154647   0.00492752 -0.01108494  0.01081719]
  [-0.019838    0.0432758  -0.03252403  0.04544988 -0.01862431
   -0.03500415 -0.02742245 -0.04164118  0.04620595 -0.01324952]
  [ 0.01059176  0.00477562 -0.02030197 -0.04631165 -0.01681845
    0.0252024   0.04952595 -0.00351278 -0.04620087  0.00945089]
  [-0.03323106  0.02133368 -0.03055931  0.02604817  0.00768939
    0.01120262  0.04542447  0.02127612 -0.03154254 -

In [20]:
embedded_docs[0]

array([  0,   0,   0,   0, 561, 137, 274, 916])
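
A single row like this has shape `(8,)`, with no batch axis. The NumPy sketch below, which uses a zero-filled stand-in for the embedding table, shows how the output shape of a lookup depends on whether the 8 indices are read as one sentence of 8 words or as 8 separate one-word samples.

```python
import numpy as np

row = np.array([0, 0, 0, 0, 561, 137, 274, 916])
table = np.zeros((1000, 10))  # stand-in embedding table (assumption)

# One sentence of 8 words: add a leading batch axis.
batch = row[np.newaxis, :]
print(table[batch].shape)  # (1, 8, 10)

# Eight samples of one word each: the indices form a column.
as_column = row.reshape(-1, 1)
print(table[as_column].shape)  # (8, 1, 10)
```

Slicing with `embedded_docs[:1]` instead of `embedded_docs[0]` keeps the batch axis and gives the first layout.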

In [21]:
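# note: embedded_docs[0] is 1-D (shape (8,)), so it is treated as 8 samples of length 1
# and the output has shape (8, 1, 10); embedded_docs[:1] would give shape (1, 8, 10)
print(model.predict(embedded_docs[0]))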

[[[ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]]

 [[ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]]

 [[ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]]

 [[ 0.04284041  0.01286211 -0.01976429 -0.04456614 -0.0169171
   -0.02869103 -0.01046633  0.04654875 -0.031579   -0.03542005]]

 [[-0.04489905  0.0448199   0.04858373 -0.00115131  0.03810059
    0.00381806 -0.0154647   0.00492752 -0.01108494  0.01081719]]

 [[-0.019838    0.0432758  -0.03252403  0.04544988 -0.01862431
   -0.03500415 -0.02742245 -0.04164118  0.04620595 -0.01324952]]

 [[ 0.01059176  0.00477562 -0.02030197 -0.04631165 -0.01681845
    0.0252024   0.04952595 -0.00351278 -0.04620087  0.00945089]]

 [[-0.03323106  0.02133368 -0.03055931  0.02604817  0.00768939
    0.01120262  0.04542447  0.02127612