## Word Embedding 
Text must be converted into numbers before a model can be trained on it, a step called **vectorization**. Each word (or document) is represented as a numerical vector, which is then used as input to the machine learning model. The simplest vectorizers produce features such as word counts or frequencies; richer ones, such as word embeddings, also encode information about what a word means.
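
As a minimal sketch of count-based vectorization, the snippet below turns a few made-up sentences into word-count vectors. The corpus and the helper name `count_vector` are assumptions chosen for illustration only.

```python
from collections import Counter

# Toy corpus; the sentences are made up for illustration.
documents = ["nice food", "horrible food", "nice nice service"]

# Build a sorted vocabulary so that each word gets a fixed column.
vocabulary = sorted({word for doc in documents for word in doc.split()})


def count_vector(text, vocabulary):
    """Return how many times each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]


vectors = [count_vector(doc, vocabulary) for doc in documents]

print(vocabulary)
for doc, vec in zip(documents, vectors):
    print(doc, "->", vec)
```

Each vector has one position per vocabulary word, so two documents can be compared position by position, but nothing in the counts says that two different words are related.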

**Word embedding** is a natural language processing (NLP) technique that represents each word as a dense numerical vector. Words with similar meanings are mapped to nearby points in the vector space, which gives a model useful information about how words relate to each other. **Word embeddings can be created using techniques such as word2vec or GloVe.**

The usual way to create word embeddings is to train an artificial neural network on a large text dataset. During training the network learns relationships between words, and the vectors it assigns to the words come to reflect those relationships.

**Word2vec** learns word vectors with a shallow two-layer neural network trained on a large text dataset. The network is trained either to predict a word from its neighbors (CBOW) or to predict the neighbors from the word (skip-gram), and the weights it learns are the word vectors. These vectors are commonly used as input features for tasks such as sentiment analysis and text classification.
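
To make the training setup more concrete, here is a small sketch of how (center word, context word) pairs could be generated for the skip-gram variant of word2vec. The function name `skipgram_pairs`, the example sentence, and the window size are assumptions for illustration; real implementations add steps such as subsampling and negative sampling.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for words within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the food was really nice".split()
pairs = skipgram_pairs(tokens, window=1)

for center, context in pairs:
    print(center, "->", context)
```

A network trained to predict the context word from the center word ends up giving similar vectors to words that appear in similar contexts.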

**GloVe (Global Vectors for Word Representation)** learns word vectors from global word co-occurrence counts. It fits a weighted least squares regression model so that the dot product of two word vectors approximates the logarithm of how often the two words occur together in a large dataset. Like word2vec vectors, GloVe vectors are commonly used as input features for tasks such as sentiment analysis and text classification.
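
For reference, the weighted least squares objective that GloVe minimizes is usually written as below. Here $X_{ij}$ is the number of times word $j$ occurs in the context of word $i$, $w_i$ and $\tilde{w}_j$ are the word and context vectors, $b_i$ and $\tilde{b}_j$ are bias terms, $V$ is the vocabulary size, and $f$ is a weighting function that limits the influence of very frequent pairs. The symbols are introduced here for illustration and are not used elsewhere in this notebook.

$$
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
(x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
1 & \text{otherwise}
\end{cases}
$$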

<img src = "img.jpg" width = "900px" height = "600px"></img>

### Method 1: Unique Numbers
The simplest approach assigns every word in the vocabulary its own unique integer, using a dictionary that maps words to numbers. A text is then represented by the sequence of numbers of its words, which can be fed to a machine learning model.
* Each word is converted into a specific unique number.
* The image below shows an example.

<img src = "img1.jpg" width = "900px" height = "600px"></img>

* **The issue with this approach is as follows:**
    1. The numbers are arbitrary, so they don't capture relationships between words. For example, Axar and Dhoni are both players and therefore similar, yet they were assigned the unrelated numbers 2 and 7. Conversely, Ashes is a tournament and Axar is a player, yet their numbers are close to each other.
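
The following sketch shows one way such a mapping could be built. The word list and the choice of alphabetical order are assumptions for illustration; any other ordering would be just as arbitrary.

```python
# Hypothetical vocabulary, for illustration only.
words = ["cricket", "Axar", "stadium", "Ashes", "bat"]

# Assign ids 1, 2, 3, ... in alphabetical order.
word_to_id = {word: i for i, word in enumerate(sorted(words), start=1)}


def encode(text):
    """Replace every word of the text with its unique number."""
    return [word_to_id[word] for word in text.split()]


print(word_to_id)
print(encode("Axar bat"))

# 'Ashes' (a tournament) and 'Axar' (a player) get neighboring ids
# only because of how they happen to sort, not because they are related.
print(abs(word_to_id["Ashes"] - word_to_id["Axar"]))
```

The distance between two ids carries no meaning, which is exactly the weakness described above.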

### Method 2: One Hot Encoding
- One-hot encoding is a technique used in machine learning to represent categorical data as numerical vectors. Each word is encoded as a vector of zeros whose length equals the vocabulary size, with a single “1” at the position assigned to that word. The resulting vectors can be used as input to machine learning models for tasks such as sentiment analysis and text classification.

<img src = "img2.jpg" width = "900px" height = "600px"></img>

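* **The issues with this approach are as follows:**
    1. It doesn't capture relationships between words: every pair of distinct words is equally far apart.
    2. It is computationally inefficient: each vector is as long as the vocabulary and consists almost entirely of zeros.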
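
Both issues can be seen in a small NumPy sketch. The four-word vocabulary and the helper `one_hot_vector` are assumptions for illustration.

```python
import numpy as np

# Hypothetical four-word vocabulary.
vocabulary = ["amazing", "food", "horrible", "nice"]


def one_hot_vector(word, vocabulary):
    """Return a vector of zeros with a single 1 at the word's position."""
    vec = np.zeros(len(vocabulary), dtype=int)
    vec[vocabulary.index(word)] = 1
    return vec


nice = one_hot_vector("nice", vocabulary)
amazing = one_hot_vector("amazing", vocabulary)
horrible = one_hot_vector("horrible", vocabulary)

print(nice)
# The dot product of two different one-hot vectors is always 0, so
# 'nice' looks exactly as unrelated to 'amazing' as it does to 'horrible'.
print(nice @ amazing, nice @ horrible)
```

With a realistic vocabulary of tens of thousands of words, every vector would have that many entries, all but one of them zero.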

### Method 3: Word Embedding
- Word embedding represents each word as a dense vector of feature values, so that the position of a word in the vector space reflects its meaning. Word embeddings can be created using techniques such as word2vec or GloVe.
- Word embedding allows you to capture the relationship (similarity) between two words.
- The images below illustrate how word embedding works:

<img src = "img3.jpg" width = "900px" height = "600px"></img>

<img src = "img4.jpg" width = "900px" height = "600px"></img>

<img src = "img5.jpg" width = "900px" height = "600px"></img>

<img src = "img6.jpg" width = "900px" height = "600px"></img>
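
The idea can be sketched with a few handcrafted feature vectors. The three features and all the values below are hypothetical and chosen only to illustrate the principle; they are not taken from the images.

```python
import math

# Hypothetical features per word: [is_person, is_tournament, related_to_cricket]
embeddings = {
    "Axar": [1.0, 0.0, 0.9],
    "Dhoni": [1.0, 0.0, 1.0],
    "Ashes": [0.0, 1.0, 0.8],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


players = cosine_similarity(embeddings["Axar"], embeddings["Dhoni"])
mixed = cosine_similarity(embeddings["Axar"], embeddings["Ashes"])

print("Axar vs Dhoni:", round(players, 3))
print("Axar vs Ashes:", round(mixed, 3))
```

The two players end up much closer to each other than a player and a tournament, which is the kind of relationship that unique numbers and one-hot vectors cannot express.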

So far we have seen that word embedding is a better technique for representing text as numbers. The next question is how to come up with the features. This looks like a very challenging task, but the good news is that a neural network can compute these features for us. The embeddings are not handcrafted; they are learned while the neural network is trained.  
- **There are two techniques to compute word embeddings:**
    1. Supervised learning
    2. Self-supervised learning (word2vec and GloVe)

#### Supervised Learning technique
In the supervised learning technique, we take an NLP problem and train a neural network to solve it; the word embeddings are obtained as a side effect. The feature vectors are simply weights that the network learns during training.

* Let's implement the **supervised learning technique** for word embedding in practice:

In [1]:
# Required Libraries:
import numpy as np
from tensorflow.keras.preprocessing.text import one_hot
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Embedding

In [5]:
# Here we classify restaurant reviews.
# We have 10 reviews in total.
reviews = ['nice food',
        'amazing restaurant',
        'too good',
        'just loved it!',
        'will go again',
        'horrible food',
        'never go there',
        'poor service',
        'poor quality',
        'needs improvement']

In [4]:
# Labels for the reviews: the first five reviews are positive (1) and the last five are negative (0).
sentiment = np.array([1,1,1,1,1,0,0,0,0,0])

In [6]:
# First we convert the reviews into numbers.
# Despite its name, the Keras one_hot function does not build one-hot vectors. It takes the text and a vocabulary
# size and hashes each word to a positive integer index smaller than the vocabulary size (1 to 29 for a size of 30).
# With a vocabulary size of 500, the indices would be smaller than 500.
# Because hashing is used, two different words can receive the same index.
# The Embedding layer later uses each index to look up a row of its weight matrix, which is equivalent to
# multiplying a one-hot vector such as 0,0,1,0,0 by that matrix.
one_hot("amazing restaurant", 30)                           

[24, 6]
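
The sketch below illustrates the general idea of hashing words to indices. It is a simplified stand-in and not the exact Keras implementation: the helper names, the use of MD5, and the lack of punctuation handling are assumptions, so the numbers it prints will generally differ from the output above.

```python
import hashlib


def hashed_index(word, n):
    """Map a word to an integer from 1 to n - 1 using a stable hash."""
    digest = hashlib.md5(word.lower().encode("utf-8")).hexdigest()
    return int(digest, 16) % (n - 1) + 1


def encode(text, n):
    """Encode a text as a list of hashed word indices."""
    return [hashed_index(word, n) for word in text.split()]


print(encode("amazing restaurant", 30))

# With only 4 possible indices, 5 different words must share at least one.
tiny = encode("nice amazing good poor horrible", 5)
print(tiny, "distinct indices:", len(set(tiny)))
```

Collisions are the price of not storing an explicit dictionary; choosing a vocabulary size well above the number of distinct words makes them less likely.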

In [9]:
# Now we encode all the reviews. First we set the vocabulary size to 30.
# A simple list comprehension goes through all the reviews and converts each one into a list of word indices.
voc_size = 30
encoded_reviews = [one_hot(d, voc_size) for d in reviews]
#print(encoded_reviews)
encoded_reviews

[[1, 25],
 [24, 6],
 [25, 1],
 [3, 25, 24],
 [19, 15, 23],
 [22, 25],
 [21, 15, 7],
 [6, 25],
 [6, 8],
 [27, 23]]

In [11]:
# Some reviews have two words and some have three, so the two-word reviews need padding.
# Padding means adding zeros in place of the missing third word.
# First we define 'max_length', the length every review should have (3 here).
# Then we pass the encoded reviews and max_length to the Keras 'pad_sequences' function.
# padding='post' means that the zeros are added at the end of each review.
max_length = 3
padded_reviews = pad_sequences(encoded_reviews, maxlen=max_length, padding='post')
padded_reviews

array([[ 1, 25,  0],
       [24,  6,  0],
       [25,  1,  0],
       [ 3, 25, 24],
       [19, 15, 23],
       [22, 25,  0],
       [21, 15,  7],
       [ 6, 25,  0],
       [ 6,  8,  0],
       [27, 23,  0]])
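
For intuition, post-padding can be sketched in plain Python as follows. The helper `pad_post` is hypothetical, and the way it cuts off sequences that are too long is a simplification that may differ from the Keras default.

```python
def pad_post(sequences, max_length, value=0):
    """Append `value` to each sequence until it has `max_length` items."""
    padded = []
    for seq in sequences:
        trimmed = list(seq)[:max_length]  # simplification: cut off extra items
        padded.append(trimmed + [value] * (max_length - len(trimmed)))
    return padded


encoded = [[1, 25], [3, 25, 24], [21, 15, 7]]
padded = pad_post(encoded, max_length=3)
print(padded)
```

Index 0 is never produced by the encoding step, so it can safely act as the filler value.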

In [24]:
# Now all padded reviews have the same size: every vector has length three.
# Next we choose the embedded vector size, i.e. the number of features learned for each word. Here we use 5.
# Then we create the neural network. The first layer is the embedding layer, built with the 'Embedding' class.
# Its first argument is the vocabulary size and the second is the embedded vector size. We also supply the maximum
# length of the reviews and give the layer a name so that we can retrieve it later.
# (Newer Keras versions may warn that 'input_length' is deprecated; in that case the argument can be dropped.)
# The embedding layer outputs one vector per word, so the second layer flattens them into a single vector.
# The last layer is a dense layer with one neuron and a sigmoid activation function.

embeded_vector_size = 5

model = Sequential()
model.add(Embedding(voc_size, embeded_vector_size, input_length=max_length,name="embedding"))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

In [25]:
# The next step is to define X and y. To keep things simple, X is the padded reviews and y is the sentiment labels.
X = padded_reviews
y = sentiment

In [26]:
# Next we compile the model and print the model summary:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 3, 5)              150       
                                                                 
 flatten_2 (Flatten)         (None, 15)                0         
                                                                 
 dense_2 (Dense)             (None, 1)                 16        
                                                                 
Total params: 166
Trainable params: 166
Non-trainable params: 0
_________________________________________________________________
None
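
The parameter counts in the summary can be checked by hand. The short calculation below reproduces them from the sizes chosen earlier; the variable names are used here for illustration only.

```python
voc_size = 30             # rows of the embedding matrix
embedded_vector_size = 5  # features per word
max_length = 3            # words per padded review

# Embedding layer: one vector of 5 weights for each of the 30 indices.
embedding_params = voc_size * embedded_vector_size

# Flatten turns the (3, 5) output into 15 values and has no parameters.
flatten_units = max_length * embedded_vector_size

# Dense layer with one neuron: 15 weights plus 1 bias.
dense_params = flatten_units * 1 + 1

total_params = embedding_params + dense_params
print(embedding_params, flatten_units, dense_params, total_params)
```

The 150 embedding parameters are exactly the word vectors that this section sets out to learn.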


In [27]:
# Now we train the model for 100 epochs.
model.fit(X, y, epochs = 100, verbose = 0)

<keras.callbacks.History at 0x18354a4b1c0>

In [28]:
# Now that the model is trained, let's check the accuracy.
# Note that we evaluate on the same data we trained on.
loss, accuracy = model.evaluate(X, y)
accuracy



1.0

* **Because the dataset is tiny and we evaluate on the training data itself, the accuracy is 100%.**
* The **word embeddings** are nothing but the learned parameters (weights) of the embedding layer in the neural network.
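
Conceptually, an embedding layer is a lookup table: each word index selects one row of a weight matrix. The NumPy sketch below uses a random matrix as a stand-in for trained weights (an assumption for illustration) and shows that the lookup gives the same result as multiplying one-hot vectors by the matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
voc_size, embedded_vector_size = 30, 5

# Stand-in for trained weights: one row of 5 features per word index.
embedding_matrix = rng.normal(size=(voc_size, embedded_vector_size))

# A padded review such as 'amazing restaurant' -> [24, 6, 0].
review = np.array([24, 6, 0])

# Lookup: pick the rows that belong to the word indices.
looked_up = embedding_matrix[review]

# Same result via explicit one-hot vectors and a matrix product.
one_hot_rows = np.eye(voc_size)[review]
via_matmul = one_hot_rows @ embedding_matrix

print(looked_up.shape)
print(np.allclose(looked_up, via_matmul))
```

Training adjusts the rows of this matrix, so after training each row is the learned vector of the word with that index.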

In [30]:
# 'get_layer' takes a layer name and returns that layer; 'get_weights' then returns the weights of the layer.
# The first array is the embedding matrix we were looking for.
weights = model.get_layer('embedding').get_weights()[0]
weights

array([[ 0.06113086, -0.1187233 , -0.09578586, -0.11541521,  0.08865828],
       [ 0.04690395, -0.12556437, -0.12741852,  0.08548369,  0.14064033],
       [-0.00943569, -0.02478651, -0.04124054,  0.00321704,  0.04346119],
       [-0.05460171, -0.10659345, -0.17304026,  0.12057173,  0.08760702],
       [-0.03408168, -0.0275035 , -0.004375  ,  0.04581155, -0.01392312],
       [ 0.00162426,  0.04818442, -0.03545678,  0.03306409, -0.04704807],
       [ 0.05532457,  0.12023149, -0.07434893, -0.10334461, -0.13416561],
       [ 0.08107556, -0.14074881, -0.13753182, -0.09706683,  0.08575559],
       [-0.11736707, -0.13170387,  0.11831835,  0.08755741, -0.0598025 ],
       [-0.04500035,  0.03035133, -0.03501581, -0.00991637,  0.04487015],
       [-0.04602529,  0.00416138, -0.01348054, -0.02405175,  0.03791169],
       [-0.01787271, -0.02889364, -0.00493195,  0.00060501, -0.03145068],
       [ 0.02629424,  0.0340098 , -0.04178872,  0.00815178,  0.03532988],
       [ 0.01060029, -0.0106612 , -0.0

In [31]:
# The length of the weights should be 30, because our vocabulary size was 30.
len(weights)

30

In [35]:
# Now let's look at the vectors of some words. 'nice' and 'amazing' have similar meanings, so ideally their
# vectors would be similar.
# Vector for the word 'nice': we use index 1 because the encoding step assigned the number 1 to 'nice'
# ('nice food' -> [1, 25]). Because of hashing, other words can share this index.
weights[1]

array([ 0.04690395, -0.12556437, -0.12741852,  0.08548369,  0.14064033],
      dtype=float32)

In [36]:
# Weights for word 'amazing':
weights[24]

array([-0.0820071 ,  0.04073719,  0.03709897,  0.12229957,  0.10632784],
      dtype=float32)
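
To put a number on how similar the two vectors printed above are, we can compute their cosine similarity. The values are copied from the outputs above; the helper function is a small sketch rather than part of the original notebook.

```python
import numpy as np

# Vectors copied from the outputs of weights[1] and weights[24] above.
nice = np.array([0.04690395, -0.12556437, -0.12741852, 0.08548369, 0.14064033])
amazing = np.array([-0.0820071, 0.04073719, 0.03709897, 0.12229957, 0.10632784])


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


similarity = cosine_similarity(nice, amazing)
print(round(similarity, 2))
```

The result is roughly 0.25, far from the value of 1 that two near-identical vectors would produce.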

* Because our dataset is very small, the vectors for 'nice' and 'amazing' look quite different. With a large dataset, words with similar meanings tend to receive similar vectors, and their closeness can be measured with cosine similarity.
* This is how the Keras Embedding layer works: it learns the embeddings as part of training on a particular NLP task.
* Alternatively, we can save the computed embeddings to a file and later, when we want to perform a different task, load them from that file into an embedding layer and reuse them.
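
Saving and reloading an embedding matrix can be sketched with NumPy alone. The random matrix below stands in for the trained `weights` array, and the file name is an assumption for illustration.

```python
import os
import tempfile

import numpy as np

# Stand-in for the trained embedding matrix (30 words x 5 features).
weights = np.random.default_rng(0).normal(size=(30, 5)).astype("float32")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "embedding_weights.npy")  # hypothetical file name
    np.save(path, weights)    # write the matrix to disk
    restored = np.load(path)  # read it back, e.g. in another project

print(restored.shape, restored.dtype)
print(np.array_equal(weights, restored))
```

The restored matrix would then be supplied as the initial weights of a new embedding layer. How this is done depends on the Keras version, so it is worth checking the documentation of the version in use.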

#### To learn more about word embeddings (using saved weights ...), you can read the following article:
    https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/