Word CNN, short for Word Convolutional Neural Network, is a convolutional neural network (CNN) architecture for natural language processing (NLP) tasks that operates at the word level. Unlike CNNs used for image classification, which operate on pixel values, Word CNNs process text by treating each word, represented as a vector, as an input feature.

Here's how a typical Word CNN works:

Word Embedding: The input text is first converted into dense vector representations using word embeddings, which can be pretrained (e.g., Word2Vec, GloVe) or learned together with the rest of the model. Each word is represented by a fixed-length vector that captures aspects of its meaning.
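
As a rough illustration of the lookup step, the sketch below maps words to integer indices and then to rows of an embedding matrix. The toy vocabulary, the 3-dimensional vectors, and the random initialisation are assumptions made for the example; a real model would use a much larger vocabulary and trained or pretrained vectors.

```python
import numpy as np

# Hypothetical toy vocabulary; index 0 is reserved for padding.
vocab = {"<pad>": 0, "the": 1, "movie": 2, "was": 3, "great": 4}
embedding_dim = 3  # illustrative only; the model below uses 300

# Randomly initialised embedding matrix: one row per vocabulary entry.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

# Convert a sentence to word indices, then look up one vector per word.
tokens = "the movie was great".split()
token_ids = np.array([vocab[t] for t in tokens])
embedded = embedding_matrix[token_ids]

print(token_ids)       # [1 2 3 4]
print(embedded.shape)  # (4, 3)
```

The result is a matrix with one row per word, which is the input the convolutional layer slides over.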

Convolutional Layers: Similar to image CNNs, Word CNNs use convolutional layers to extract local patterns or features from the word embeddings. Convolutional filters with small receptive fields slide over the input word embeddings to capture patterns such as word sequences or combinations of words.
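
The sliding-filter idea can be sketched in plain NumPy. The function name `conv1d_valid`, the toy input, and the all-ones kernel are assumptions chosen so the arithmetic is easy to follow. Like most deep learning libraries, the sketch computes a cross-correlation without padding, so a filter of width 3 over 5 words yields 3 outputs.

```python
import numpy as np

def conv1d_valid(embedded, kernel, bias=0.0):
    """Slide one filter over a (seq_len, embed_dim) matrix with no padding."""
    seq_len = embedded.shape[0]
    width = kernel.shape[0]
    out = np.empty(seq_len - width + 1)
    for i in range(seq_len - width + 1):
        window = embedded[i:i + width]           # `width` consecutive word vectors
        out[i] = np.sum(window * kernel) + bias  # one activation per window
    return np.maximum(out, 0.0)                  # ReLU activation

# Toy input: 5 words, each with a 2-dimensional embedding.
embedded = np.arange(10, dtype=float).reshape(5, 2)
kernel = np.ones((3, 2))  # one filter covering 3 consecutive words

feature_map = conv1d_valid(embedded, kernel)
print(feature_map)  # [15. 27. 39.]
```

Each output value summarises one window of three consecutive words; a layer with 128 filters would produce 128 such feature maps.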

Pooling Layers: After convolution, pooling layers are applied to reduce the dimensionality of the feature maps and capture the most relevant information. Common pooling operations include max pooling, which takes the maximum value from each feature map, and average pooling, which computes the average value.
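
A small sketch of the two pooling operations, applied globally over the sequence axis. The feature-map values are made up for illustration (4 window positions, 3 filters).

```python
import numpy as np

# Hypothetical feature maps: 4 positions (rows) x 3 filters (columns).
feature_maps = np.array([
    [0.1, 2.0, 0.0],
    [0.7, 0.5, 0.0],
    [0.3, 1.5, 0.9],
    [0.0, 0.2, 0.3],
])

# Global max pooling keeps the strongest activation of each filter.
global_max = feature_maps.max(axis=0)

# Global average pooling keeps the mean activation of each filter.
global_avg = feature_maps.mean(axis=0)

print(global_max.shape, global_avg.shape)  # (3,) (3,)
```

Either way, a variable-length sequence is reduced to one value per filter, so the layers that follow see a fixed-size vector.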

Fully Connected Layers: The output of the pooling layers is flattened and passed through one or more fully connected layers, followed by activation functions like ReLU (Rectified Linear Unit), to learn complex relationships between the extracted features.
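
A fully connected layer with ReLU amounts to a matrix product, a bias, and a clamp at zero. The sketch below uses hand-picked weights (an assumption for readability; in practice they are learned) and a hypothetical helper named `dense_relu`.

```python
import numpy as np

def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU."""
    return np.maximum(x @ weights + bias, 0.0)

pooled = np.array([0.7, 2.0, 0.9])  # e.g. the output of global max pooling
weights = np.array([
    [1.0, -1.0],
    [0.5,  0.0],
    [0.0, -2.0],
])                                  # shape: (3 inputs, 2 units)
bias = np.array([0.1, 0.2])

hidden = dense_relu(pooled, weights, bias)
print(hidden.shape)  # (2,)
```

Here the first unit evaluates to about 1.8, while the second has a negative pre-activation (about -2.3) and is set to zero by ReLU.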

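Output Layer: The final fully connected layer feeds the output layer, which generates predictions for the task at hand, such as sentiment analysis or text classification. Token-level tasks such as named entity recognition need one prediction per word, so they use a per-token output instead of a single pooled vector.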
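
For classification, the output layer typically applies a softmax so the scores become probabilities that sum to 1. The following is a minimal sketch; the logits are invented for the example.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 0.0])  # hypothetical scores for two classes
probs = softmax(logits)

print(probs.sum())       # sums to 1 (up to floating-point rounding)
print(np.argmax(probs))  # 0
```

The predicted class is the index with the highest probability.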

<b>Word CNNs are particularly effective for tasks where <span style="color:red">local word order matters</span></b>, such as text classification and sentiment analysis. They can capture hierarchical patterns and dependencies in text data, making them suitable for a wide range of NLP applications.

Example:
A Word CNN can be used for sentiment analysis, where the task is to classify the sentiment of a given text (e.g., positive, negative, or neutral). The Word CNN would take the word embeddings of the input text, apply convolutional and pooling layers to extract features, and then use fully connected layers to predict the sentiment category.





In [1]:
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Load the IMDb dataset
vocab_size = 20000  # Vocabulary size
max_len = 100  # Maximum sequence length
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)

# Pad sequences to ensure uniform length
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

# Convert labels to one-hot encoding
y_train = to_categorical(y_train, num_classes=2)
y_test = to_categorical(y_test, num_classes=2)

# Define the Word CNN model
model = models.Sequential([
    layers.Embedding(vocab_size, 300, input_length=max_len),  # Embedding layer with word embeddings (300-dimensional)
    layers.Conv1D(128, 5, activation='relu'),  # Convolutional layer with 128 filters and kernel size 5
    layers.GlobalMaxPooling1D(),  # Global max pooling layer
    layers.Dense(64, activation='relu'),  # Dense layer with 64 units
    layers.Dense(2, activation='softmax')  # Output layer with softmax activation for binary classification
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Hold out 20% of the training data as a validation set
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

# Train the model
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_data=(x_val, y_val))




In [5]:
# Predict class probabilities for the first test review (a batch of one)
model.predict(x_test[:1])




array([[0.9163171 , 0.08368296]], dtype=float32)
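
To read this output, each column can be mapped back to a class. In the IMDb dataset, label 0 is a negative review and label 1 a positive one, and `to_categorical` keeps that order. The sketch below copies the probabilities printed above; the `class_names` list is a helper introduced only for this example.

```python
import numpy as np

class_names = ["negative", "positive"]  # IMDb labels: 0 = negative, 1 = positive

# Probabilities copied from the prediction output above.
probs = np.array([[0.9163171, 0.08368296]], dtype=np.float32)

predicted_index = int(np.argmax(probs, axis=1)[0])
predicted_label = class_names[predicted_index]
print(predicted_label)  # negative
```

So the model assigns roughly 92% probability to the negative class for this review.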

This illustrates that a CNN can also be configured for text classification.