✅ 7. Computer Vision & NLP in TensorFlow


🔹 A. Computer Vision (CV)

1. Image Classification

========================================================================================================

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import cifar10

# ---------------------- 1. Load and Preprocess Data ----------------------
# Load CIFAR-10 dataset: 60,000 32x32 color images in 10 classes (50,000 training, 10,000 test)
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values to be in range [0, 1] for faster convergence
x_train, x_test = x_train / 255.0, x_test / 255.0

# ---------------------- 2. Define the CNN Model ----------------------
model = models.Sequential([
    # First convolutional layer: 32 filters of size 3x3, ReLU activation
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),

    # Max pooling: halves each spatial dimension (2x2 window)
    layers.MaxPooling2D((2, 2)),

    # Second convolutional layer: 64 filters of size 3x3
    layers.Conv2D(64, (3, 3), activation='relu'),

    # Flatten: convert 2D feature maps into 1D feature vector
    layers.Flatten(),

    # Dense hidden layer with 64 units and ReLU activation
    layers.Dense(64, activation='relu'),

    # Output layer with 10 units (one for each CIFAR-10 class), softmax for multiclass classification
    layers.Dense(10, activation='softmax')
])

# ---------------------- 3. Compile the Model ----------------------
# Use Adam optimizer, sparse categorical crossentropy loss (for integer labels), and accuracy metric
model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# ---------------------- 4. Train the Model ----------------------
# Train the model on training data for 5 epochs
# Use 10% of the training data for validation
model.fit(
    x_train, y_train,
    epochs=5,
    validation_split=0.1
)
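
To see where the capacity of this network sits, the feature-map sizes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults used above (stride 1 and `'valid'` padding for `Conv2D`, non-overlapping 2x2 pooling); the helper names are illustrative and not part of any library.

```python
def conv2d_valid(size, kernel):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def max_pool(size, pool):
    """Spatial size after non-overlapping pooling (floor division)."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(in_units, out_units):
    """Weight matrix plus one bias per output unit."""
    return in_units * out_units + out_units

size = 32                       # CIFAR-10 images are 32x32
size = conv2d_valid(size, 3)    # after Conv2D(32, 3x3)
size = max_pool(size, 2)        # after MaxPooling2D(2x2)
size = conv2d_valid(size, 3)    # after Conv2D(64, 3x3)
flat_units = size * size * 64   # length of the flattened vector

param_counts = {
    "conv1": conv_params(3, 3, 32),
    "conv2": conv_params(3, 32, 64),
    "dense1": dense_params(flat_units, 64),
    "dense2": dense_params(64, 10),
}
total_params = sum(param_counts.values())

print("final feature map:", size, "x", size)
print("flattened units:", flat_units)
print(param_counts, "total:", total_params)
```

Most of the parameters end up in the first dense layer, which is why adding another pooling layer before `Flatten` is a common way to shrink a model like this.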


========================================================================================================

2. Image Augmentation


Image augmentation artificially expands the size and diversity of a training dataset by applying random, label-preserving transformations (such as rotations, flips, and zooms) to the original images.


✅ Why use image augmentation?

To prevent overfitting by teaching the model to generalize better.

To simulate real-world variations (e.g., rotations, lighting changes, flips).

To improve robustness and accuracy on unseen data.



In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# ---------------------- 1. Create Image Data Generator ----------------------
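# This generator augments image data by applying random transformations.
# Note: ImageDataGenerator is deprecated in recent TensorFlow releases in favor of
# Keras preprocessing layers (e.g. RandomFlip, RandomRotation, RandomZoom).
datagen = ImageDataGenerator(
    rotation_range=30,         # Randomly rotate images by up to 30 degrees in either direction
    horizontal_flip=True,      # Randomly flip images left-to-right (only valid when a mirrored image keeps its label)
    zoom_range=0.2             # Randomly zoom in or out by up to 20% (scale factor in [0.8, 1.2])
)

# ---------------------- 2. Generate Augmented Batches ----------------------
# datagen.fit(x_train) is only needed when featurewise_center, featurewise_std_normalization,
# or zca_whitening is enabled, so it is skipped here.
# flow() yields batches of randomly transformed images together with their original labels.
train_batches = datagen.flow(x_train, y_train, batch_size=32)
# model.fit(train_batches, epochs=5)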
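
To make "transform the image, keep the label" concrete without relying on any Keras API, here is a minimal NumPy sketch of one augmentation, a random horizontal flip. The function names and the random stand-in batch are illustrative assumptions, not part of the notebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def random_horizontal_flip(image, rng, p=0.5):
    """Mirror an (H, W, C) image left-to-right with probability p."""
    if rng.random() < p:
        return image[:, ::-1, :]
    return image

def augment_batch(images, labels, rng):
    """Augment every image in the batch; labels pass through unchanged."""
    augmented = np.stack([random_horizontal_flip(img, rng) for img in images])
    return augmented, labels

# Stand-in for a small CIFAR-10 batch: 4 images of 32x32x3 with integer labels.
images = rng.random((4, 32, 32, 3))
labels = np.array([3, 5, 1, 7])

aug_images, aug_labels = augment_batch(images, labels, rng)
print(aug_images.shape, aug_labels)
```

Because the transformation is re-sampled every time a batch is drawn, the model rarely sees exactly the same pixels twice, which is what discourages memorization.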


========================================================================================================

3. Object Detection



**Object Detection** is a computer vision task that involves:
1. **Detecting** the presence of objects in an image.
2. **Classifying** each object (e.g., "person", "car", "dog").
3. **Locating** each object using **bounding boxes**.

---

### 📌 What makes Object Detection different?
| Task              | Output                                       |
|-------------------|----------------------------------------------|
| Image Classification | "Dog" (whole image has a dog)               |
| Object Detection     | "Dog at (x1, y1, x2, y2)", "Cat at (...)"  |
| Semantic Segmentation| Per-pixel classification (no boxes)        |

---

### 🧠 Key Components:
- **Bounding Box**: A rectangle that encloses the detected object.
- **Class Label**: The type of object detected.
- **Confidence Score**: How sure the model is about the detection.

---

### 🔧 Popular Object Detection Models:
| Model     | Type         | Description                        |
|-----------|--------------|------------------------------------|
| YOLO      | One-stage     | Fast and accurate (You Only Look Once) |
| SSD       | One-stage     | Real-time speed, good accuracy     |
| Faster R-CNN | Two-stage | High accuracy, slower              |
| DETR      | Transformer-based | Accurate and simple pipeline      |

---

### 🖼️ Example Output:
You give an image → the model outputs:
```json
[
  {"label": "cat", "bbox": [50, 30, 200, 180], "confidence": 0.92},
  {"label": "dog", "bbox": [220, 80, 400, 300], "confidence": 0.88}
]
```
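
Raw detector output usually contains several overlapping boxes for the same object, so a post-processing step is needed to get a clean list like the one above. The sketch below shows the two standard ingredients, Intersection over Union (IoU) and greedy non-maximum suppression, in plain Python. It assumes boxes in `[x1, y1, x2, y2]` format as in the example; the thresholds and the extra duplicate "cat" box are made up for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(detections, iou_threshold=0.5, min_confidence=0.5):
    """Greedy NMS: keep the most confident box, drop same-label boxes that overlap it."""
    candidates = sorted(
        (d for d in detections if d["confidence"] >= min_confidence),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        overlaps_kept = any(
            det["label"] == k["label"] and iou(det["bbox"], k["bbox"]) >= iou_threshold
            for k in kept
        )
        if not overlaps_kept:
            kept.append(det)
    return kept

detections = [
    {"label": "cat", "bbox": [50, 30, 200, 180], "confidence": 0.92},
    {"label": "cat", "bbox": [55, 35, 205, 185], "confidence": 0.75},  # hypothetical duplicate
    {"label": "dog", "bbox": [220, 80, 400, 300], "confidence": 0.88},
]

kept = non_max_suppression(detections)
for det in kept:
    print(det["label"], det["bbox"], det["confidence"])
```

Production detectors typically run a vectorized version of this step on the accelerator; the logic is the same.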

---

### 💡 Use Cases:
- Self-driving cars (detect pedestrians, vehicles)
- Security surveillance
- Healthcare (detect tumors in scans)
- Retail (automated checkout)


========================================================================================================

4. Image Segmentation (U-Net)



**Image Segmentation** is a computer vision task where every pixel in an image is **classified into a category**.

---

### 🔍 What It Does:
- Unlike **object detection**, which draws a bounding box around each object, **image segmentation** assigns a label to every pixel, so object outlines are traced exactly rather than approximated by rectangles.

---

### 🎯 Types of Image Segmentation:
| Type                     | Description                                   |
|--------------------------|-----------------------------------------------|
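| **Semantic Segmentation** | Assigns a class to every pixel (e.g., car, road, sky); objects of the same class are not told apart |
| **Instance Segmentation** | Separates individual **instances** of a class (e.g., 2 different dogs) |
| **Panoptic Segmentation** | Combines both: every pixel gets a class, and countable objects also get an instance ID |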

---

### 🏗️ U-Net Architecture (Popular for Segmentation)

**U-Net** is a **Convolutional Neural Network (CNN)** designed for biomedical image segmentation. It's shaped like a "U" and has two main parts:

#### 🌀 Encoder (Downsampling):
- Extracts features using **Conv + MaxPool** layers
- Like shrinking the image to understand **what** is in it

#### 🔁 Bottleneck:
- The compressed representation (deepest layer)

#### 🔺 Decoder (Upsampling):
- Uses **transposed convolutions** (`Conv2DTranspose`) or upsampling to restore the spatial resolution
- Helps decide **where** the object is in the image
- Uses **skip connections** from the encoder to improve precision

```
Input Image --> ↓↓↓ Encoder --> Bottleneck --> ↑↑↑ Decoder --> Segmented Output
```

---

### 🧪 Output Example:
Input: MRI scan  
Output: Mask where the tumor area is white and the rest is black
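
A predicted mask is usually scored against a ground-truth mask with an overlap metric. The NumPy sketch below thresholds per-pixel probabilities (as a sigmoid output layer would produce) into a binary mask and computes the Dice coefficient and IoU. The 4x4 arrays are made-up toy values, and the 0.5 threshold is a common default rather than a requirement.

```python
import numpy as np

def binarize(probabilities, threshold=0.5):
    """Turn per-pixel probabilities into a 0/1 mask."""
    return (probabilities >= threshold).astype(np.uint8)

def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2 * |A ∩ B| / (|A| + |B|); eps avoids division by zero on empty masks."""
    intersection = np.sum(pred_mask * true_mask)
    return (2.0 * intersection + eps) / (np.sum(pred_mask) + np.sum(true_mask) + eps)

def mask_iou(pred_mask, true_mask, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| for binary masks."""
    intersection = np.sum(np.logical_and(pred_mask, true_mask))
    union = np.sum(np.logical_or(pred_mask, true_mask))
    return (intersection + eps) / (union + eps)

# Toy model output: probability that each pixel belongs to the object.
probs = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.2, 0.1],
    [0.1, 0.4, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.0],
])
# Toy ground truth: the object covers five pixels.
true_mask = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=np.uint8)

pred_mask = binarize(probs)
dice = dice_coefficient(pred_mask, true_mask)
iou_score = mask_iou(pred_mask, true_mask)
print(pred_mask)
print("Dice:", round(float(dice), 4), "IoU:", round(float(iou_score), 4))
```

Dice is often preferred in medical imaging because it stays informative when the object occupies only a small fraction of the image.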

---

### 💡 Use Cases:
- **Medical imaging** (e.g., tumor segmentation)
- **Satellite imagery** (e.g., segment roads, forests)
- **Autonomous vehicles** (e.g., lane and pedestrian detection)

---

### ⚙️ Code Outline (Keras U-Net)
```python
import tensorflow as tf
from tensorflow.keras.layers import Concatenate, Conv2D, MaxPooling2D, UpSampling2D

# Input: image (128x128x3)
# Output: segmentation mask (128x128x1)

inputs = tf.keras.Input((128, 128, 3))
# Downsampling
c1 = Conv2D(16, (3,3), activation='relu', padding='same')(inputs)
p1 = MaxPooling2D((2,2))(c1)
# Bottleneck
b = Conv2D(64, (3,3), activation='relu', padding='same')(p1)
# Upsampling
u1 = UpSampling2D((2,2))(b)
# Skip connection: reuse the encoder features at the same resolution
u1 = Concatenate()([u1, c1])
c2 = Conv2D(16, (3,3), activation='relu', padding='same')(u1)
outputs = Conv2D(1, (1,1), activation='sigmoid')(c2)

model = tf.keras.Model(inputs, outputs)
```


In [None]:
import tensorflow as tf
from tensorflow.keras import layers

def unet_model(input_shape):
    # Single-level U-Net sketch; a full U-Net stacks several encoder/decoder levels.
    # Input layer, e.g. RGB images of size 128x128 (height and width must be even)
    inputs = tf.keras.Input(shape=input_shape)

    # Downsampling path (Encoder)
    skip = layers.Conv2D(64, 3, activation='relu', padding='same')(inputs)  # Extract features
    x = layers.MaxPooling2D()(skip)  # Halve each spatial dimension

    # Upsampling path (Decoder)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same')(x)  # Restore original size

    # Skip connection: concatenate encoder features so fine spatial detail reaches the decoder
    x = layers.Concatenate()([x, skip])

    # Output layer with 1 filter for binary segmentation, sigmoid for pixel-wise classification
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)

    # Create the model
    return tf.keras.Model(inputs, outputs)

# Instantiate the model with input shape (128, 128, 3)
model = unet_model((128, 128, 3))

# Compile the model with binary crossentropy loss for binary segmentation
model.compile(optimizer='adam', loss='binary_crossentropy')


========================================================================================================

========================================================================================================

🔹 B. Natural Language Processing (NLP)

========================================================================================================

1. Tokenization


Option 1: TextVectorization Layer (for pipelines)

The TextVectorization layer is used within TensorFlow models to preprocess text before feeding it into neural networks. It provides a convenient way to tokenize and vectorize text directly in the model's input pipeline.

Key Features:

Vocabulary Creation: Calling adapt() builds a vocabulary from the text data, keeping at most max_tokens entries, most frequent tokens first.

Text-to-Integer Conversion: It converts each token into its integer index in that vocabulary, which is the input format that layers such as Embedding expect.

Handles Padding & Truncation: When output_sequence_length is set, every sequence is padded or truncated to that length; otherwise each batch is padded to its longest sequence.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import TextVectorization

# Sample input texts for text preprocessing
texts = ["TensorFlow is powerful", "NLP is cool"]

# Initialize the TextVectorization layer
# - output_mode='int' maps each token to its integer index in the vocabulary
# - max_tokens=100 caps the vocabulary at 100 entries, including the reserved
#   padding token (index 0) and out-of-vocabulary token (index 1)
# By default the layer lowercases the text and strips punctuation before splitting on whitespace.
vectorizer = TextVectorization(output_mode='int', max_tokens=100)

# adapt() scans the texts and builds the layer's vocabulary, most frequent tokens first.
vectorizer.adapt(texts)

# Inspect the learned vocabulary, then convert each text into a row of token indices.
print(vectorizer.get_vocabulary())
print(vectorizer(texts))
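
For intuition about what the layer does internally, the following pure-Python sketch mimics the same three steps: standardize, build a frequency-ordered vocabulary, then map tokens to indices with padding. It is a conceptual approximation, not the layer's actual implementation; the function names are illustrative, and the ordering of equally frequent tokens may differ from what `TextVectorization` produces.

```python
import re
from collections import Counter

def standardize(text):
    """Lowercase and strip punctuation, similar to the layer's default standardization."""
    return re.sub(r"[^\w\s]", "", text.lower())

def build_vocabulary(texts, max_tokens=100):
    """Frequency-ordered vocabulary with index 0 for padding and index 1 for unknown tokens."""
    counts = Counter(tok for text in texts for tok in standardize(text).split())
    most_common = [tok for tok, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common

def vectorize(text, vocab, sequence_length):
    """Map tokens to indices, then truncate or zero-pad to a fixed length."""
    index = {tok: i for i, tok in enumerate(vocab)}
    ids = [index.get(tok, 1) for tok in standardize(text).split()]
    ids = ids[:sequence_length]
    return ids + [0] * (sequence_length - len(ids))

texts = ["TensorFlow is powerful", "NLP is cool"]
vocab = build_vocabulary(texts)
encoded = [vectorize(t, vocab, sequence_length=4) for t in texts]
unseen = vectorize("NLP is fun", vocab, sequence_length=4)  # "fun" is not in the vocabulary

print(vocab)
print(encoded)
print(unseen)
```

The last call shows why the out-of-vocabulary index matters: words never seen during `adapt()` still map to a valid integer instead of raising an error.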


========================================================================================================

2. Embeddings

Embeddings are a way to represent words, sentences, or other text-based data as dense, continuous vectors in a lower-dimensional space. Unlike traditional one-hot encoding, which results in sparse vectors, embeddings allow for a more compact and meaningful representation of words that captures semantic relationships between them.

For example, words with similar meanings are often mapped to similar vectors in the embedding space (e.g., "king" and "queen" might be closer in vector space than "king" and "apple"). Embeddings are commonly used in natural language processing (NLP) tasks like sentiment analysis, machine translation, and text classification.
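
"Closer in vector space" is usually measured with cosine similarity. The sketch below uses hand-picked 4-dimensional vectors purely for illustration; they are hypothetical values, not taken from Word2Vec, GloVe, or any trained model, where vectors typically have 50 to several hundred dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings chosen so that the two royalty words point the same way.
toy_embeddings = {
    "king":  np.array([0.8, 0.6, 0.1, 0.0]),
    "queen": np.array([0.7, 0.7, 0.2, 0.0]),
    "apple": np.array([0.0, 0.1, 0.9, 0.4]),
}

sim_king_queen = cosine_similarity(toy_embeddings["king"], toy_embeddings["queen"])
sim_king_apple = cosine_similarity(toy_embeddings["king"], toy_embeddings["apple"])

print("king vs queen:", round(sim_king_queen, 3))
print("king vs apple:", round(sim_king_apple, 3))
```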

Types of Embeddings:

Word Embeddings: Represent individual words. Examples include Word2Vec, GloVe, FastText, etc.

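Sentence Embeddings: Represent entire sentences or phrases. Examples include Universal Sentence Encoder and Sentence-BERT. Contextual models such as BERT or GPT produce one vector per token, which can be pooled into a sentence-level vector.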

Pre-trained Embeddings: You can use embeddings that have been pre-trained on a large corpus of data and fine-tune them on your specific task.

Trainable Embeddings: You can also train embeddings from scratch, especially for tasks where domain-specific embeddings are needed.

In [None]:
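import tensorflow as tf

model = tf.keras.Sequential([
    # Input: batches of integer token IDs, 100 tokens per sequence
    # (replaces the Embedding argument input_length, which newer Keras versions deprecate)
    tf.keras.Input(shape=(100,)),

    # Embedding layer: maps each of 10,000 possible token IDs to a 16-dimensional dense vector
    tf.keras.layers.Embedding(input_dim=10000, output_dim=16),

    # Global Average Pooling: averages the sequence of word vectors into a single vector
    tf.keras.layers.GlobalAveragePooling1D(),

    # Fully connected layer (Dense): outputs the probability of the positive class (between 0 and 1)
    tf.keras.layers.Dense(1, activation='sigmoid')
])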
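
Under the hood, an embedding layer is a lookup into a trainable matrix with one row per vocabulary entry. The NumPy sketch below reproduces the forward pass of an embedding, average pooling, and sigmoid output stack with randomly initialized (untrained) weights, mainly to show how the tensor shapes change. The variable names and random token IDs are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
vocab_size, embed_dim = 10000, 16

# The embedding "layer": one 16-dimensional row per token ID.
embedding_matrix = rng.normal(scale=0.05, size=(vocab_size, embed_dim))

# A batch of 2 sequences, each 100 token IDs long (random stand-in data).
token_ids = rng.integers(0, vocab_size, size=(2, 100))

embedded = embedding_matrix[token_ids]   # lookup: (2, 100) -> (2, 100, 16)
pooled = embedded.mean(axis=1)           # average over the sequence axis -> (2, 16)

# Dense(1, sigmoid) head with untrained weights.
weights = rng.normal(scale=0.05, size=(embed_dim, 1))
bias = np.zeros(1)
probabilities = 1.0 / (1.0 + np.exp(-(pooled @ weights + bias)))  # (2, 1)

print(embedded.shape, pooled.shape, probabilities.shape)
```

Averaging discards word order, which keeps the model small and fast; order-sensitive tasks usually need a recurrent or attention layer instead.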


========================================================================================================

3. LSTM (Long Short-Term Memory)

LSTM is a type of Recurrent Neural Network (RNN) designed to capture long-term dependencies in sequential data. Unlike traditional RNNs, LSTMs are specifically designed to address the vanishing gradient problem, which allows them to learn from long sequences of data more effectively.

Key Concepts:

Memory Cells: LSTM units have memory cells that allow them to remember information for long periods.

Gates: LSTMs use three gates (input, output, and forget) to regulate the flow of information and learn which data to keep and which to discard.

Structure of LSTM:

Forget Gate: Decides what information from the cell state should be discarded.

Input Gate: Determines what new information should be added to the cell state.

Output Gate: Decides what part of the cell state should be exposed as the hidden state (the output) at the current time step.
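
The three gates can be written out as a few lines of NumPy. This is a minimal sketch of one standard LSTM formulation with random, untrained weights; the way the four weight blocks are packed into single matrices `W`, `U`, and `b` is a convenience chosen here for illustration, and all sizes are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (input_dim, 4 * units), U: (units, 4 * units), b: (4 * units,)
    The four column blocks hold, in order: input gate, forget gate,
    candidate values, output gate.
    """
    units = h_prev.shape[-1]
    z = x_t @ W + h_prev @ U + b
    i = sigmoid(z[:, :units])                 # input gate: how much new information to write
    f = sigmoid(z[:, units:2 * units])        # forget gate: how much old cell state to keep
    g = np.tanh(z[:, 2 * units:3 * units])    # candidate values to write
    o = sigmoid(z[:, 3 * units:])             # output gate: how much cell state to expose
    c_t = f * c_prev + i * g                  # updated cell state (long-term memory)
    h_t = o * np.tanh(c_t)                    # updated hidden state (the step's output)
    return h_t, c_t

rng = np.random.default_rng(seed=0)
batch, timesteps, input_dim, units = 2, 5, 4, 8
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

sequence = rng.normal(size=(batch, timesteps, input_dim))
h = np.zeros((batch, units))
c = np.zeros((batch, units))
for t in range(timesteps):
    h, c = lstm_step(sequence[:, t, :], h, c, W, U, b)

print(h.shape, c.shape)
```

The additive update `c_t = f * c_prev + i * g` is what lets gradients flow across many time steps: when the forget gate stays near 1, the cell state is carried forward almost unchanged.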

In [None]:
import tensorflow as tf

# Define the Sequential model
model = tf.keras.Sequential([

    # Embedding layer: converts integer sequences into dense vector representations (word embeddings)
    tf.keras.layers.Embedding(10000, 64),  # 10000 is the size of the vocabulary, 64 is the dimension of the embedding vectors

    # LSTM (Long Short-Term Memory) layer: processes the sequential data, capturing long-term dependencies in the input sequence
    tf.keras.layers.LSTM(64),  # 64 units in the LSTM layer, outputting a 64-dimensional vector

    # Dense layer: final output layer for binary classification
    tf.keras.layers.Dense(1, activation='sigmoid')  # 1 unit (output), sigmoid activation for binary classification (output between 0 and 1)
])

# Compile the model with the Adam optimizer and binary crossentropy loss function for binary classification
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Example data for training (replace with actual data)
# x_train = [...]  # Input text data (integer-encoded sequences)
# y_train = [...]  # Labels (0 or 1 for binary classification)

# Train the model with the training data (adjust epochs and batch_size as needed)
# model.fit(x_train, y_train, epochs=5, batch_size=32)


========================================================================================================

4. Using BERT via TensorFlow Hub


To use BERT (Bidirectional Encoder Representations from Transformers) for Natural Language Processing tasks via TensorFlow Hub, we can load a pre-trained BERT model and fine-tune it for a downstream task like text classification.

Here's an example of how you can use BERT with TensorFlow Hub:

In [None]:
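import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401 (registers the custom ops the preprocessing model needs)

# Note: hub.KerasLayer (rather than hub.load) is used so the models can be called on Keras
# symbolic inputs. This assumes a TensorFlow/Keras setup that tensorflow_hub supports;
# with Keras 3 the legacy tf_keras package may be required.

# 1. Load BERT Preprocessing and Encoder Models from TensorFlow Hub
# The preprocessing model will handle the tokenization and conversion of input text into BERT-compatible format.
bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")

# The encoder model is the actual BERT model that generates contextual embeddings from the preprocessed input.
# trainable=True lets the encoder weights be updated during fine-tuning.
bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4", trainable=True)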

# 2. Define Input Layer for the Model
# The model expects input as a string (text). Here, we define the input layer for the model.
text_input = tf.keras.layers.Input(shape=(), dtype=tf.string)

# 3. Preprocess the Input Text
# The BERT preprocessing model converts the input text into the dictionary of integer tensors
# that BERT expects: input_word_ids (token ids), input_mask, and input_type_ids (segment ids).
preprocessed = bert_preprocess(text_input)

# 4. Pass the Preprocessed Text through the BERT Encoder
# The BERT encoder generates contextualized embeddings for the input text.
# The 'pooled_output' is typically used for classification tasks, representing a fixed-length vector 
# derived from the entire sequence of tokens.
outputs = bert_encoder(preprocessed)['pooled_output']

# 5. Build the Model
# Create the final model by specifying the input and output layers. The output of the model 
# will be the BERT 'pooled_output', which can be used for downstream tasks (e.g., classification).
model = tf.keras.Model(text_input, outputs)

# Model summary (optional) to check the structure
model.summary()
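
To clarify what the preprocessing step hands to the encoder, the NumPy sketch below builds the same three-tensor dictionary by hand. It is only a shape-and-layout illustration: the whitespace tokenizer and the tiny vocabulary with its ID values are hypothetical stand-ins for the real WordPiece vocabulary, and the sequence length of 128 is assumed to match the preprocessing model's default.

```python
import numpy as np

SEQ_LENGTH = 128  # assumed default sequence length of the preprocessing model

# Hypothetical toy vocabulary; the real model uses a WordPiece vocabulary with different IDs.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "tensorflow": 4, "is": 5, "powerful": 6}

def toy_bert_preprocess(sentences, seq_length=SEQ_LENGTH):
    """Build BERT-style inputs: token ids, padding mask, and segment ids."""
    word_ids = np.zeros((len(sentences), seq_length), dtype=np.int32)
    mask = np.zeros_like(word_ids)
    type_ids = np.zeros_like(word_ids)  # all zeros: every input is a single segment
    for row, sentence in enumerate(sentences):
        # Wrap the tokens in [CLS] ... [SEP], truncating if the sentence is too long.
        tokens = ["[CLS]"] + sentence.lower().split()[: seq_length - 2] + ["[SEP]"]
        ids = [toy_vocab.get(tok, toy_vocab["[UNK]"]) for tok in tokens]
        word_ids[row, : len(ids)] = ids
        mask[row, : len(ids)] = 1       # 1 for real tokens, 0 for padding
    return {"input_word_ids": word_ids, "input_mask": mask, "input_type_ids": type_ids}

encoder_inputs = toy_bert_preprocess(["TensorFlow is powerful", "NLP is cool"])
for name, tensor in encoder_inputs.items():
    print(name, tensor.shape, tensor[0, :6])
```

The encoder reads `input_mask` to ignore padding positions, and the `pooled_output` it returns for this model is a single 768-dimensional vector per input sentence, which is why a small `Dense` head on top is enough for classification.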


========================================================================================================

| Task | Key APIs | Use Case |
|------|----------|----------|
| Image Classification | `Conv2D`, `MaxPooling2D` | Object recognition |
| Image Augmentation | `ImageDataGenerator`, `tf.image` | Overfitting prevention |
| Segmentation | Custom U-Net | Medical, satellite |
| Tokenization | `TextVectorization`, `Tokenizer` | NLP preprocessing |
| Embedding | `Embedding` layer, TF Hub | Word representation |
| Sequence Models | `LSTM`, `MultiHeadAttention` | Language modeling |
| Pretrained NLP | `TF Hub BERT` | QA, classification |