'''
--------------------------------------------------------------------------------
1. One Hot Encoding. 
    Text                    O/P
    D1 The food is good      1
    D2 The food is bad       0
    D3 Pizza is Amazing      1

Total Vocabulary or Unique words:
The food is good bad Pizza Amazing
1   0    0  0    0   0     0
0   1    0  0    0   0     0
0   0    1  0    0   0     0
0   0    0  1    0   0     0
0   0    0  0    1   0     0
0   0    0  0    0   1     0
0   0    0  0    0   0     1

D1 [[1 0 0 0 0 0 0], [0 1 0 0 0 0 0], [0 0 1 0 0 0 0], [0 0 0 1 0 0 0]] Dim: 4x7
D2 [[1 0 0 0 0 0 0], [0 1 0 0 0 0 0], [0 0 1 0 0 0 0], [0 0 0 0 1 0 0]] Dim: 4x7
D3 [[0 0 0 0 0 1 0], [0 0 1 0 0 0 0], [0 0 0 0 0 0 1]] Dim: 3x7
Any further document is encoded the same way, one row per word. 
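
The encoding above can be reproduced with a few lines of plain Python. This is a minimal sketch, not a library implementation: the helper names `one_hot` and `encode` are made up for illustration, and the vocabulary order is copied from the table above.

```python
# Vocabulary in the same order as the table above.
vocab = ["The", "food", "is", "good", "bad", "Pizza", "Amazing"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    # A vector of zeros with a single 1 at the word's vocabulary index.
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

def encode(document):
    # One one-hot row per word, in the order the words appear.
    return [one_hot(word) for word in document.split()]

docs = {
    "D1": "The food is good",
    "D2": "The food is bad",
    "D3": "Pizza is Amazing",
}
encoded = {name: encode(text) for name, text in docs.items()}

for name, matrix in encoded.items():
    print(name, matrix, "Dim:", f"{len(matrix)}x{len(vocab)}")
```

Each document becomes a matrix with one row per word, so the number of rows changes with the length of the document.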

    # Advantages of One Hot Encoding. 
    -> Easy to implement in Python, 
        e.g. with OneHotEncoder (scikit-learn) or pd.get_dummies (pandas).

    # Disadvantages of One Hot Encoding. 
    -> Creates a sparse matrix (mostly zeros). With a large vocabulary the number of features grows quickly, which can lead to overfitting. 
    -> ML algorithms expect fixed-size inputs, but the matrix size depends on the number of words in the document (4x7 for D1, 3x7 for D3).
    -> No semantic meaning is captured. Every word has one static vector, so "bank" in "river bank" and in "bank money" is encoded identically.
    -> Out of Vocabulary (OOV): a word that is not present in the training dataset has no vector and cannot be encoded.
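
Two of these disadvantages (variable input size and OOV) are easy to see in code. The sketch below is illustrative only: it lowercases the vocabulary for convenience, the helper `encode` is a made-up name, and the word "burger" is a hypothetical unseen word that is not part of the dataset above.

```python
vocab = ["the", "food", "is", "good", "bad", "pizza", "amazing"]
index = {word: i for i, word in enumerate(vocab)}

def encode(text):
    # Returns (one-hot rows for known words, list of unknown words).
    rows, unknown = [], []
    for word in text.lower().split():
        if word in index:
            row = [0] * len(vocab)
            row[index[word]] = 1
            rows.append(row)
        else:
            unknown.append(word)  # OOV: there is no column for this word
    return rows, unknown

rows_a, _ = encode("The food is good")
rows_b, _ = encode("Pizza is amazing")
shape_a = (len(rows_a), len(vocab))
shape_b = (len(rows_b), len(vocab))
print(shape_a, shape_b)  # the two inputs have different shapes

# "burger" is a hypothetical word that never appeared in the dataset.
rows_c, oov = encode("The burger is good")
print(oov)
```

The OOV word is simply dropped here; that is one possible policy, chosen only to keep the sketch short.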

--------------------------------------------------------------------------------
2. Bag of Words (BoW). 
Dataset. 
Text                    O/P
He is a good boy         1
She is a good girl       1
Boy and girl are good    1

steps:
    - Lowercase all the words. 
        S1: he is a good boy
        S2: she is a good girl
        S3: boy and girl are good
    - Remove stopwords. 
        he, is, a, she, and, are -->  get deleted. 
        S1: good boy
        S2: good girl
        S3: boy girl good

    Total Unique Vocabulary             Frequency
        good                                3
        boy                                 2
        girl                                2

        Then sort the vocabulary by frequency in descending order (here it is already sorted). 
        The most frequent words become the features, 
        like this: 
            good        boy      girl         O/P
        S1:  [1          1        0]           1
        S2:  [1          0        1]           1
        S3:  [1          1        1]           1
        If a word such as 'good' is repeated within a sentence, the result depends on the variant:
        Binary BoW: even if the word is repeated, the value is forced to 1 (values are 1 or 0).
        Normal BoW: the value is the word count, so it increases with every repetition.
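
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the stopword set is just the six words listed above (not a real stopword list), and the helper name `bow` is made up.

```python
from collections import Counter

sentences = ["He is a good boy", "She is a good girl", "Boy and girl are good"]
stopwords = {"he", "is", "a", "she", "and", "are"}  # only the words removed above

# Steps 1 and 2: lowercase, then drop stopwords.
tokenized = [
    [word for word in s.lower().split() if word not in stopwords]
    for s in sentences
]

# Vocabulary sorted by total frequency, highest first.
freq = Counter(word for tokens in tokenized for word in tokens)
vocab = [word for word, _ in sorted(freq.items(), key=lambda kv: -kv[1])]

def bow(tokens, binary=False):
    # Count each vocabulary word; binary mode caps every count at 1.
    counts = Counter(tokens)
    return [min(counts[w], 1) if binary else counts[w] for w in vocab]

vectors = [bow(tokens) for tokens in tokenized]
print(vocab)    # ['good', 'boy', 'girl']
print(vectors)  # [[1, 1, 0], [1, 0, 1], [1, 1, 1]]

# A repeated word shows the difference between the two variants.
repeated = ["good", "good", "boy"]
count_vec = bow(repeated)                # [2, 1, 0]
binary_vec = bow(repeated, binary=True)  # [1, 1, 0]
```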

        Advantages:
        - Easy to implement and intuitive. 
        - Fixed-size input (one value per vocabulary word). 

        Disadvantages:
        - Sparse matrix or array. 
        - Word order is lost, so any meaning that depends on the order of the words is lost too. 
        - Out of Vocabulary (OOV) is still an issue here. 
        - Semantic meaning is still not captured. 
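
The loss of word order can be shown with two sentences that mean different things but contain the same words. The sentences and the helper `bow` below are hypothetical examples, not part of the dataset above.

```python
from collections import Counter

def bow(text, vocab):
    # Count how often each vocabulary word occurs in the text.
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

vocab = ["dog", "bites", "man"]
a = bow("dog bites man", vocab)
b = bow("man bites dog", vocab)

print(a, b)    # [1, 1, 1] [1, 1, 1]
print(a == b)  # True: the model cannot tell the two sentences apart
```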

3. TF-IDF (Term Frequency - Inverse Document Frequency)

Term Frequency (TF) = No. of times the word occurs in the sentence / No. of words in the sentence
Inverse Document Frequency (IDF) = log_e(No. of sentences / No. of sentences containing the word)
TF-IDF = TF * IDF

s1 --> good boy
s2 --> good girl
s3 --> boy girl good

            Term Frequency
            s1      s2      s3
    good    1/2     1/2     1/3
    boy     1/2     0/2     1/3
    girl    0/2     1/2     1/3

            Inverse Document Frequency
    Words       IDF
    good        log_e(3/3) = 0
    boy         log_e(3/2) ≈ 0.405
    girl        log_e(3/2) ≈ 0.405

                TF-IDF
        good     boy                girl                    O/P
    s1   0        1/2 * log_e(3/2)    0                      1
    s2   0        0                   1/2 * log_e(3/2)       1
    s3   0        1/3 * log_e(3/2)    1/3 * log_e(3/2)       1
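
The table above can be checked with a short script that applies the TF and IDF formulas exactly as written in these notes. It is a sketch with made-up helper names (`tf`, `idf`). Library implementations such as scikit-learn's TfidfVectorizer apply IDF smoothing and normalization by default, so their numbers will differ from these.

```python
import math

# Sentences after lowercasing and stopword removal.
docs = [["good", "boy"], ["good", "girl"], ["boy", "girl", "good"]]
vocab = ["good", "boy", "girl"]

def tf(word, doc):
    # Occurrences of the word divided by the number of words in the sentence.
    return doc.count(word) / len(doc)

def idf(word):
    # Natural log of (number of sentences / sentences containing the word).
    containing = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / containing)

tfidf = [[tf(word, doc) * idf(word) for word in vocab] for doc in docs]

for name, row in zip(["s1", "s2", "s3"], tfidf):
    print(name, [round(value, 3) for value in row])
```

The 'good' column is 0 in every row because the word occurs in all three sentences, so its IDF is log_e(1) = 0.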

    Advantages
    - Intuitive. 
    - Fixed-size input -> vocabulary size. 
    - Word importance is captured: a word that occurs in every sentence (like 'good') gets weight 0. 

    Disadvantages
    - Sparsity still exists. 
    - OOV is still an issue.

4. Word2Vec, or: What is a Word Embedding?
- A word embedding is a representation of words as vectors, such that words with similar meanings are close together in the vector space. 
Eg: Happy  Excited

- Word2Vec: Uses a Neural Network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector.
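
"Closer in the vector space" is usually measured with cosine similarity. The sketch below does not train Word2Vec. The three vectors are hand-made toy values chosen only for illustration (a real model learns them from a corpus and uses far more dimensions), and "table" is a hypothetical unrelated word.

```python
import math

# Hand-made toy vectors, NOT the output of a trained model.
embeddings = {
    "happy":   [0.90, 0.80, 0.10],
    "excited": [0.85, 0.75, 0.20],
    "table":   [0.10, 0.20, 0.90],
}

def cosine_similarity(u, v):
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

sim_close = cosine_similarity(embeddings["happy"], embeddings["excited"])
sim_far = cosine_similarity(embeddings["happy"], embeddings["table"])

print(round(sim_close, 3), round(sim_far, 3))
print(sim_close > sim_far)  # True for these toy vectors
```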

5. AverageWord2Vec.

'''
