## **Embedding layer**
- tokenize
- encode
- padding
- embedding
- weights

In [49]:
import numpy as np


from tensorflow.keras.preprocessing.text import one_hot  # hashes words to integer IDs
from tensorflow.keras.preprocessing.sequence import pad_sequences  # pads sequences to equal length

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Embedding

In [50]:
import warnings
warnings.filterwarnings('ignore')

In [51]:
reviews = ['nice food',
        'amazing restaurant',
        'too good',
        'just loved it!',
        'will go again',
        'horrible food',
        'never go there',
        'poor service',
        'poor quality',
        'needs improvement']

sentiment = np.array([1,1,1,1,1,0,0,0,0,0])

In [52]:
# Hash each word to an integer ID below vocab_size (IDs are not guaranteed to be unique)
one_hot('nice food', 30)

[18, 29]

In [53]:
vocab_size = 30
encoded_reviews = [one_hot(d, vocab_size) for d in reviews]
print(encoded_reviews)

[[18, 29], [26, 20], [14, 8], [26, 26, 24], [29, 6, 6], [2, 29], [13, 6, 7], [20, 23], [20, 4], [1, 20]]


In [54]:
# Pad every encoded review to the same length

max_length = 4
padded_reviews = pad_sequences(encoded_reviews, maxlen=max_length, padding='post')
print(padded_reviews)

[[18 29  0  0]
 [26 20  0  0]
 [14  8  0  0]
 [26 26 24  0]
 [29  6  6  0]
 [ 2 29  0  0]
 [13  6  7  0]
 [20 23  0  0]
 [20  4  0  0]
 [ 1 20  0  0]]


In [55]:
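# Build the model: embedding layer, then flatten, then a sigmoid classifier

embeded_vector_size = 5

model = Sequential()
# input_length is deprecated in recent Keras versions, where it only triggers a warning
model.add(Embedding(vocab_size, embeded_vector_size, input_length=max_length, name="embedding"))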
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

In [56]:
X = padded_reviews
y = sentiment

In [67]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()  # prints the table itself and returns None, so print() would only add a stray "None"


In [58]:
model.fit(X, y, epochs=50, verbose=0)

<keras.src.callbacks.history.History at 0x7ad32c3d3850>

In [59]:
# evaluate the model
loss, accuracy = model.evaluate(X, y)
accuracy

[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 245ms/step - accuracy: 1.0000 - loss: 0.6248


1.0

In [63]:
weights = model.get_layer('embedding').get_weights()[0]
len(weights)

# Same as vocab size

30

In [66]:
weights[18]  # embedding for ID 18, the ID assigned to the word "nice"

array([ 0.04153833, -0.07388881, -0.09074349,  0.08594728,  0.00381914],
      dtype=float32)

## 1. Problem Setup: Text Classification

Goal:
Train a simple neural network to classify short text reviews as **positive (1)** or **negative (0)**.

Example inputs:

```
“nice food” → positive  
“poor service” → negative
```

The model learns to associate patterns of words (or combinations) with sentiment.



## 2. Text to Numeric Conversion — One-Hot Encoding

Before a model can process text, it must be converted into numbers.

```python
from tensorflow.keras.preprocessing.text import one_hot
vocab_size = 30
encoded_reviews = [one_hot(d, vocab_size) for d in reviews]
```

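* **One-hot encoding here doesn’t mean a long binary vector.**
  Keras’s `one_hot()` function hashes each word to an **integer index** that starts at `1` and stays below `vocab_size` (`0` is left free for padding). The indices are not guaranteed to be unique: two different words can hash to the same one.

Example (from the run above):

```
“amazing restaurant” → [26, 20]
```

In that run, “amazing”, “just”, and “loved” all received ID 26, so the model cannot tell them apart. A larger `vocab_size` makes such collisions less likely.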

**Conceptually:**

* Each word is just a placeholder ID.
* The model has no semantic understanding yet — it will learn that later through embeddings.
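
The hashing behind these integer IDs can be sketched in plain Python. This is a simplified stand-in rather than Keras's implementation: `hashed_ids` is a hypothetical helper, MD5 is chosen here only so that results are reproducible across runs, and punctuation filtering is left out.

```python
import hashlib

def hashed_ids(text, n):
    """Map each word to an integer in [1, n - 1] by hashing (hypothetical helper)."""
    ids = []
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        # Modulo (n - 1) plus 1 keeps 0 free for padding.
        ids.append(int(digest, 16) % (n - 1) + 1)
    return ids

demo_reviews = ["nice food", "horrible food", "poor service"]
encoded_demo = [hashed_ids(r, 30) for r in demo_reviews]
print(encoded_demo)  # "food" gets the same ID in both reviews that contain it

# Five distinct words squeezed into four possible IDs (1..4) must collide.
crowded = hashed_ids("nice food horrible poor service", 5)
print(len(set(crowded)) < len(crowded))
```

The second part shows why collisions are unavoidable when the ID range is small relative to the number of distinct words.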



## 3. Sequence Padding

Neural networks require all input sequences to have **equal length**.

```python
padded_reviews = pad_sequences(encoded_reviews, maxlen=4, padding='post')
```

* Pads shorter reviews with zeros (`0` represents “no word”).
* Ensures each input sequence has length = 4.

Example:

```
[18, 29] → [18, 29, 0, 0]
[26, 26, 24] → [26, 26, 24, 0]
```

**Conceptually:**
Padding allows batch processing and fixed-size input layers.
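
What `padding='post'` does can be sketched without Keras. `pad_post` is a hypothetical helper written for this illustration; it also drops leading items from sequences that are too long, which mirrors the default `truncating='pre'` behaviour of `pad_sequences`.

```python
def pad_post(sequences, maxlen, value=0):
    """Right-pad each sequence with `value` up to `maxlen` (hypothetical helper)."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # too long: keep only the last maxlen items
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

print(pad_post([[18, 29], [26, 26, 24]], maxlen=4))
# → [[18, 29, 0, 0], [26, 26, 24, 0]]
```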


## 4. Embedding Layer — Core Concept

```python
model.add(Embedding(vocab_size, embeded_vector_size, input_length=max_length, name="embedding"))
```

This is the most important conceptual part.

### What it does:

* Takes integer word IDs (like `[18, 29, 0, 0]`)
* Converts each into a **dense vector** of length 5 (the `embeded_vector_size`)
* Learns these vectors during training

So, the output shape becomes `(batch_size, max_length, embed_dim)`
→ in this case, `(None, 4, 5)`

### Conceptually:

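* Each word ID gets represented as a **learned feature vector** (embedding).
* With enough training data, words that play similar roles in the task tend to end up with similar vectors. Ten reviews of two or three words each are far too few to expect that here.
* The layer is the same kind of trainable lookup table that Word2Vec produces, but its vectors are learned **within the task** (from the sentiment labels) instead of being pre-trained on word contexts.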
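
Mechanically, an embedding layer is a lookup table indexed by word ID. The sketch below imitates that with plain Python lists; `table` and `embed` are illustrative names, and the random values stand in for weights that training would adjust.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
vocab_size, embed_dim = 30, 5

# One row of small random numbers per word ID; training would adjust these.
table = [[random.uniform(-0.05, 0.05) for _ in range(embed_dim)]
         for _ in range(vocab_size)]

def embed(ids):
    """Replace every integer ID with its row from the table."""
    return [table[i] for i in ids]

embedded = embed([18, 29, 0, 0])        # one padded review
print(len(embedded), len(embedded[0]))  # 4 rows of 5 values each
```

Note that the padding ID `0` has its own row as well, so both padded positions receive the same vector.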


## 5. Flatten and Dense Layers

```python
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
```

* **Flatten:** Converts the 2D embedding matrix (4 × 5 = 20 values) into a single vector `[20]`.
* **Dense layer:** Applies a sigmoid classifier to produce an output between 0 and 1.

### Conceptually:

* The flattened embeddings represent the sentence as a feature vector.
* The dense neuron learns to map that representation → sentiment label.
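
The forward pass of these two layers can be sketched by hand for a single review. The input values and the all-zero weights below are placeholders chosen for illustration, not values from the trained model.

```python
import math

def flatten(matrix):
    """Concatenate the rows of a 2D list into one flat list."""
    return [value for row in matrix for value in row]

def dense_sigmoid(features, weights, bias):
    """One neuron: weighted sum plus bias, squashed into (0, 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

embedded_review = [[0.1] * 5 for _ in range(4)]  # stand-in for a (4, 5) embedding output
features = flatten(embedded_review)              # 20 values
dense_weights = [0.0] * len(features)            # untrained placeholder weights
probability = dense_sigmoid(features, dense_weights, bias=0.0)
print(probability)  # zero weights and zero bias give exactly 0.5

# Trainable parameters in this architecture:
# 30 * 5 embedding values + 20 dense weights + 1 bias
n_params = 30 * 5 + len(features) + 1
print(n_params)  # → 171
```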



## 6. Compilation and Training

```python
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=50, verbose=0)
```

* **Loss:** Binary cross-entropy (since it’s a 2-class problem)
* **Optimizer:** Adam — adjusts weights to minimize loss
* **Training:** Each epoch updates both:

  * The embedding weights (word representations)
  * The classifier weights
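
As a rough sketch of what the optimizer is minimizing, binary cross-entropy can be written out directly. The clipping constant `eps` is an assumption of this sketch; it only guards against `log(0)`.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Always predicting 0.5 gives ln(2), about 0.693.
print(binary_crossentropy([1, 0], [0.5, 0.5]))
# More confident correct predictions give a lower loss.
print(binary_crossentropy([1, 0], [0.9, 0.1]))
```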



## 7. Model Evaluation

```python
loss, accuracy = model.evaluate(X, y)
```

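Accuracy = 1.0 means the model classified all 10 sample reviews correctly. These are the same 10 reviews it was trained on, so the number says nothing about unseen text.

**Conceptually:**
The model has memorized which word IDs go with which label. The reported loss of 0.6248 is still close to ln 2 ≈ 0.693 (the loss of always predicting 0.5), so its predictions sit only slightly on the correct side of the 0.5 threshold. A realistic assessment needs far more data and a held-out test set.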
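
To see how perfect accuracy and a loss of 0.6248 can coexist, the sketch below assumes (purely for illustration) that all ten predictions were equally confident. Under that assumption the loss equals `-log(p_correct)`, which can be inverted to recover the probability given to the correct class.

```python
import math

reported_loss = 0.6248  # from the evaluate() output above

# Assumption: every prediction gives the same probability to the correct class.
p_correct = math.exp(-reported_loss)
print(round(p_correct, 3))  # only slightly above 0.5

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p > threshold) == bool(y) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

y_true = [1] * 5 + [0] * 5
y_prob = [p_correct] * 5 + [1 - p_correct] * 5
print(accuracy(y_true, y_prob))  # every prediction lands on the right side of 0.5
```

Accuracy only checks which side of the threshold a prediction falls on, while the loss also reflects how far from the threshold it is.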



## 8. Inspecting the Learned Word Embeddings

```python
weights = model.get_layer('embedding').get_weights()[0]
```

* `weights` is a matrix of shape `(vocab_size, embed_dim)` → `(30, 5)`
* Each row is the **embedding vector** for one word index.

Example:

```
weights[18] → embedding for ID 18 (“nice”)
weights[26] → embedding for ID 26 (shared by “amazing”, “just”, and “loved”)
```

### Conceptually:

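These vectors encode what the model learned about each word ID **for this task**.
With a large corpus, words used in similar contexts tend to get **similar embedding vectors**, which is how such models capture semantic similarity.
With only 10 reviews and several hash collisions, the vectors here are a toy illustration of that mechanism.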
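
One common way to compare two embedding vectors is cosine similarity. In the sketch below, the first vector is row 18 as printed earlier; the second is made up for illustration and does not come from the trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v: 1 = same direction, -1 = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vec_nice = [0.04153833, -0.07388881, -0.09074349, 0.08594728, 0.00381914]
vec_other = [0.05, -0.06, -0.08, 0.07, 0.01]  # hypothetical second embedding

print(cosine_similarity(vec_nice, vec_other))               # close to 1: similar direction
print(cosine_similarity(vec_nice, [-x for x in vec_nice]))  # opposite direction
```

With real weights, the same function could be applied to any two rows of `weights`, for example `weights[18]` and `weights[26]`.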



## 9. What the Model Really Learned

* The **Embedding layer** learned to map words into a 5-dimensional “meaning space”.
* The **Dense layer** learned to classify the sentence-level representation into 0 or 1.
* This is a **task-specific embedding** — the meaning is aligned with “positive” or “negative” sentiment.

---

## 10. Concept Summary

| Step              | Concept                       | Role                                |
| ----------------- | ----------------------------- | ----------------------------------- |
| One-hot           | Hashed integer IDs for words  | Converts text → numeric tokens      |
| Padding           | Equalizes sequence lengths    | Enables batch training              |
| Embedding         | Dense, trainable word vectors | Learns semantic meaning             |
| Flatten           | Merges all embeddings         | Forms a sentence vector             |
| Dense + Sigmoid   | Binary classifier             | Predicts sentiment                  |
| Embedding weights | Learned word meaning          | Shows task-based word relationships |
