# 🧠 CNN for Multi-label Depression Symptom Classification (ReDSM5)

This notebook shows how to train a simple Convolutional Neural Network (CNN) for **multi-label text classification** on the `ReDSM5` dataset, a Reddit corpus annotated with sentence-level DSM-5 depression symptoms.

The workflow includes:
- Loading and preprocessing the data,
- Tokenizing the text and converting it to padded sequences,
- Building and training a 1D CNN model using Keras,
- Evaluating performance with precision, recall, F1-score, and accuracy metrics.

Although CNNs are simpler than transformers such as BERT, they remain useful baselines because their filters capture local, n-gram-like patterns in text. This notebook provides an end-to-end pipeline for benchmarking and educational purposes. Only the train/test split is seeded, so training results may vary slightly between runs.

> 🧪 This is one of the baseline models reported in the ReDSM5 paper. Use it to replicate results or to explore lightweight alternatives to transformers.

## 📦 Importing Required Libraries

We start by importing all the necessary libraries for:
- **Data handling**: `pandas` and `sklearn`'s `train_test_split`,
- **Text preprocessing**: Keras's `Tokenizer` and `pad_sequences`,
- **Model building**: `Sequential`, `Embedding`, `Conv1D`, `GlobalMaxPooling1D`, and `Dense` from Keras,
- **Label encoding and evaluation**: `MultiLabelBinarizer`, `classification_report`, and `accuracy_score` from `sklearn`.

This setup allows us to implement and train a CNN model for detecting multiple DSM-5 depression symptoms from Reddit posts.


In [None]:
# Data handling, label encoding, splitting, and evaluation
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
from sklearn.preprocessing import MultiLabelBinarizer

# Model building and text preprocessing
from keras.models import Sequential
from keras.layers import Dense, Embedding, Conv1D, GlobalMaxPooling1D
from tensorflow.keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

## 📊 Loading and Preprocessing the Dataset

We load the `ReDSM5` dataset from a CSV file. Each record contains:
- a Reddit post (`text`),
- the annotated DSM-5 symptom labels, stored as a semicolon-separated string (`labels`),
- and a clinical explanation (not used here).

We convert the semicolon-separated label string into Python lists and then apply `MultiLabelBinarizer` to convert those lists into binary vectors suitable for training. This format supports multi-label classification, where each post may have one or more active labels.
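As a rough illustration of what the next cell does, the sketch below splits semicolon-separated label strings and builds multi-hot rows in pure Python. The helper `multi_hot` and the label strings are hypothetical. Like `MultiLabelBinarizer` fitted without an explicit `classes` argument, it orders the columns alphabetically.

```python
def multi_hot(label_strings, sep=";"):
    """Split each label string and encode it as a 0/1 row over sorted classes."""
    label_lists = [s.split(sep) for s in label_strings]
    # Column order: every distinct label, sorted alphabetically.
    classes = sorted({label for labels in label_lists for label in labels})
    index = {label: i for i, label in enumerate(classes)}
    rows = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1  # a label listed twice still yields 1
        rows.append(row)
    return classes, rows


# Hypothetical label strings in the same "A;B" format as the CSV column.
classes, rows = multi_hot(["DEPRESSED_MOOD;SLEEP_ISSUES", "NO_SYMPTOMS", "SLEEP_ISSUES"])
print(classes)  # ['DEPRESSED_MOOD', 'NO_SYMPTOMS', 'SLEEP_ISSUES']
print(rows)     # [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
```

Note that `split` does not trim whitespace. A string like `"A; B"` would produce a separate label `" B"`, so stripping each piece is a cheap safeguard if the file is not perfectly clean.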


In [None]:
# Load dataset
data = pd.read_csv("data/redsm5.csv")
data["labels"] = data["labels"].apply(lambda x: x.split(";"))  # Convert labels to list

# MultiLabel Binarization
mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(data["labels"])
texts = data["text"].tolist()

## ✂️ Splitting Data into Train and Test Sets

We divide the dataset into training (80%) and testing (20%) subsets using `train_test_split`. The fixed `random_state=42` makes the split repeatable. Holding out the test set ensures the model is evaluated on posts it never saw during training, which gives a realistic sense of performance.
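The split is random rather than stratified, so rare symptoms can end up unevenly represented in the two subsets. Several classes have only 8 to 10 positive posts in the test set, according to the classification report at the end of this notebook. A minimal check compares per-label prevalence in each subset. The helper `label_prevalence` and the toy matrices below are hypothetical. With the real data, one would pass `y_train` and `y_test`.

```python
def label_prevalence(y):
    """Fraction of rows in which each label column is 1."""
    n_rows = len(y)
    n_cols = len(y[0])
    return [sum(row[j] for row in y) / n_rows for j in range(n_cols)]


# Toy multi-hot matrices standing in for y_train and y_test.
y_train_toy = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
y_test_toy = [[1, 0, 0], [0, 1, 1]]

train_prev = label_prevalence(y_train_toy)
test_prev = label_prevalence(y_test_toy)
for j, (p_tr, p_te) in enumerate(zip(train_prev, test_prev)):
    print(f"label {j}: train={p_tr:.2f} test={p_te:.2f}")
```

For the NumPy arrays produced by `MultiLabelBinarizer`, `y_train.mean(axis=0)` gives the same per-column fractions.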


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42
)

## 🔡 Text Tokenization and Padding

To prepare the text for input into a neural network:
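1. We tokenize each post using Keras's `Tokenizer`. With its default settings, it lowercases the text, strips most punctuation, and assigns an integer index to every word in the training set. More frequent words receive smaller indices, and index 0 is reserved for padding.
2. The posts are converted into sequences of word indices. Words that never appear in the training set are dropped, because no `oov_token` is configured.
3. We pad these sequences to a fixed maximum length (`512` tokens), ensuring that all input examples are the same size and compatible with batch training.

By default, `pad_sequences` both pads and truncates at the start of each sequence (`padding="pre"`, `truncating="pre"`). Short posts are left-padded with zeros, and posts longer than 512 tokens keep their last 512 tokens.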
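The three steps can be mimicked in a few lines of pure Python. This is a simplified sketch, not Keras's implementation. The functions `tokenize`, `fit_word_index`, `texts_to_sequences`, and `pad_pre` are hypothetical stand-ins, and the regular expression only approximates the punctuation filter `Tokenizer` uses by default.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplified stand-in for Tokenizer's defaults: lowercase, keep letters, digits
    # and apostrophes, drop other punctuation (Keras's exact filter list differs).
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_word_index(texts):
    """Most frequent word -> 1, next -> 2, ...; index 0 stays free for padding."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # most_common() orders ties by first occurrence.
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}


def texts_to_sequences(texts, word_index):
    # Words missing from the index are dropped, as with Tokenizer() and no oov_token.
    return [[word_index[w] for w in tokenize(t) if w in word_index] for t in texts]


def pad_pre(seq, maxlen, value=0):
    """Mimic pad_sequences defaults: truncate and pad at the start."""
    if len(seq) > maxlen:
        seq = seq[len(seq) - maxlen:]  # keep the last maxlen tokens
    return [value] * (maxlen - len(seq)) + seq


train = ["I can't sleep at night", "I feel tired and I can't focus"]
word_index = fit_word_index(train)
seqs = texts_to_sequences(["I still can't sleep"], word_index)
print(seqs)                       # [[1, 2, 3]]  ("still" was never seen)
print(pad_pre(seqs[0], maxlen=5)) # [0, 0, 1, 2, 3]
print(pad_pre(seqs[0], maxlen=2)) # [2, 3]
```

Fitting the vocabulary on `X_train` only, as the notebook does, keeps test-set words from leaking into the index.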


In [None]:
tokenizer = Tokenizer()
tokenizer.fit_on_texts(X_train)

X_train_sequences = tokenizer.texts_to_sequences(X_train)
X_test_sequences = tokenizer.texts_to_sequences(X_test)

maxlen = 512
X_train_padded = pad_sequences(X_train_sequences, maxlen=maxlen)
X_test_padded = pad_sequences(X_test_sequences, maxlen=maxlen)

## 🧱 Building the CNN Model

We define a lightweight convolutional neural network (CNN) architecture suitable for multi-label text classification:
- **Embedding layer** maps word indices to dense vectors.
- **Conv1D layer** captures local n-gram patterns using sliding filters.
- **Global Max Pooling** keeps, for each filter, its strongest activation anywhere in the post.
- **Dense layers** introduce non-linear transformations.
- **Final layer with sigmoid** produces one probability per symptom class, enabling multi-label outputs.

This structure provides a simple, efficient baseline for text classification without transformers.
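To make the convolution-plus-pooling idea concrete, here is a toy, single-filter version in pure Python. A kernel of width 3 slides over a sequence of 2-dimensional word vectors, using "valid" windows as `Conv1D` does by default. ReLU is then applied, and global max pooling keeps only the strongest response. All vectors, weights, and the bias are made up. The real layer learns 128 filters of width 5 over 64-dimensional embeddings.

```python
def conv1d_single_filter(seq, kernel, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as in deep-learning libraries) + ReLU.

    seq:    list of T vectors, each of length C (one vector per token)
    kernel: list of K vectors, each of length C (one weight vector per window offset)
    Returns T - K + 1 activations, one per window position.
    """
    k = len(kernel)
    out = []
    for start in range(len(seq) - k + 1):
        s = bias
        for offset in range(k):
            s += sum(w * x for w, x in zip(kernel[offset], seq[start + offset]))
        out.append(max(0.0, s))  # ReLU
    return out


def global_max_pool(activations):
    # One number per filter: its strongest response anywhere in the sequence.
    return max(activations)


seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]  # 5 "tokens"
kernel = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # tuned to one 3-token pattern
acts = conv1d_single_filter(seq, kernel, bias=-2.0)
print(acts)                   # [2.0, 0.0, 0.0]: only the first window matches
print(global_max_pool(acts))  # 2.0
```

Because pooling keeps only the maximum, the filter's contribution does not depend on where in the post the matching phrase occurs.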


In [None]:
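model = Sequential()
# Map each word index (0 is reserved for padding) to a 64-dimensional vector
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, output_dim=64))
# 128 filters, each spanning 5 consecutive tokens
model.add(Conv1D(filters=128, kernel_size=5, activation="relu"))
# Keep each filter's strongest response anywhere in the post
model.add(GlobalMaxPooling1D())
model.add(Dense(64, activation="relu"))
# One independent sigmoid probability per symptom label
model.add(Dense(len(mlb.classes_), activation="sigmoid"))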
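The layer sizes in this cell fix the tensor shapes and parameter counts, and the arithmetic can be done by hand. In the sketch below, `vocab_size` and `n_classes` stand in for `len(tokenizer.word_index)` and `len(mlb.classes_)`. The classification report at the end lists 10 classes. The vocabulary size is not reported, so 20,000 is an arbitrary placeholder. The sketch assumes the `Conv1D` defaults of `padding="valid"` and stride 1. The function `cnn_summary` is hypothetical.

```python
def cnn_summary(vocab_size, n_classes, maxlen=512, emb_dim=64,
                filters=128, kernel_size=5, hidden=64):
    """Return (layer, output shape without batch dim, parameter count) per layer."""
    conv_len = maxlen - kernel_size + 1  # 'valid' padding, stride 1
    return [
        ("Embedding", (maxlen, emb_dim), (vocab_size + 1) * emb_dim),
        ("Conv1D", (conv_len, filters), kernel_size * emb_dim * filters + filters),
        ("GlobalMaxPooling1D", (filters,), 0),
        ("Dense", (hidden,), filters * hidden + hidden),
        ("Dense (sigmoid)", (n_classes,), hidden * n_classes + n_classes),
    ]


layers = cnn_summary(vocab_size=20_000, n_classes=10)
for name, shape, params in layers:
    print(f"{name:<20} {str(shape):<12} {params:>10,}")
print("total:", sum(p for _, _, p in layers))
```

With these placeholder numbers, the embedding table holds about 1.28M of the roughly 1.33M parameters. The convolution adds 41,088, and the two dense layers add 8,906. Model size therefore grows mainly with the vocabulary.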

## 🧪 Compiling and Training the CNN Model

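We compile the CNN model using:
- `binary_crossentropy` loss (standard for multi-label classification, since each label is treated as an independent yes/no decision),
- the `adam` optimizer for efficient training,
- and `"accuracy"` as a reported metric. With this string, Keras decides which accuracy variant to compute, and any accuracy figure is of limited use for multi-label outputs. Passing `keras.metrics.BinaryAccuracy()` instead gives an explicit per-label accuracy.

We train the model for 100 epochs with a batch size of 32 and `validation_split=0.2`. This setting holds out the last 20% of the training arrays for validation, selected before any shuffling. The remaining samples are shuffled every epoch (`shuffle=True` by default), and Keras reports training and validation metrics after each epoch. The log can be used to monitor learning progress and spot overfitting. However, no early-stopping callback is used, so training always runs for the full 100 epochs.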
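For reference, binary cross-entropy scores each symptom as a separate yes/no prediction and averages the per-label losses. The pure-Python sketch below computes it for a single post. The clipping constant `eps` is an assumption, modeled on the small epsilon Keras uses to keep `log()` finite. The probabilities are made up.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over labels of -[y*log(p) + (1-y)*log(1-p)] for one example."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# One post with two true symptoms out of four labels (toy values).
y_true = [1, 0, 1, 0]
print(binary_crossentropy(y_true, [0.5, 0.5, 0.5, 0.5]))  # log(2) ≈ 0.693, an uninformative guess
print(binary_crossentropy(y_true, [0.9, 0.1, 0.8, 0.2]))  # ≈ 0.164, confident and correct
```

Most entries of each label vector are 0, so a model can reach a fairly low loss by assigning small probabilities to every label. This is consistent with the low recall reported at the end of this notebook.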


In [None]:
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(X_train_padded, y_train, epochs=100, batch_size=32, validation_split=0.2)

Epoch 1/100
30/30 ━━━━━━━━━━━━━━━━━━━━ 4s 58ms/step - accuracy: 0.1867 - loss: 0.6202 - val_accuracy: 0.2647 - val_loss: 0.3458
Epoch 2/100
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.2575 - loss: 0.3434 - val_accuracy: 0.1891 - val_loss: 0.3188
Epoch 3/100
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.2608 - loss: 0.3244 - val_accuracy: 0.2647 - val_loss: 0.3138
Epoch 4/100
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - accuracy: 0.2796 - loss: 0.3174 - val_accuracy: 0.2605 - val_loss: 0.3132
Epoch 5/100
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 9ms/step - accuracy: 0.3063 - loss: 0.3150 - val_accuracy: 0.2647 - val_loss: 0.3141
Epoch 6/100
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step - accuracy: 0.3067 - loss: 0.3054 - val_accuracy: 0.2605 - val_loss: 0.3124
Epoch 7/100
...

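Training for a fixed 100 epochs does not stop at the best validation loss. Keras's `EarlyStopping` callback, with `monitor="val_loss"` and a `patience` value, is the usual tool for that. The function below is a simplified sketch of the patience rule only and ignores `min_delta` and the callback's other options. `early_stopping_epoch` is a hypothetical name. The input values are the first six `val_loss` entries shown in the training log above.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), 1-indexed, under a simple patience rule.

    Training stops once `patience` consecutive epochs fail to beat the best loss.
    If that never happens within the list, stop_epoch is None.
    """
    best_loss = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# val_loss for epochs 1-6, copied from the training log above.
logged = [0.3458, 0.3188, 0.3138, 0.3132, 0.3141, 0.3124]
print(early_stopping_epoch(logged, patience=1))  # (4, 5)
print(early_stopping_epoch(logged, patience=2))  # (6, None)
```

Validation loss flattens after only a few epochs, which suggests most of the 100 epochs add little. A small patience value would end training early, but it can also stop on a temporary plateau, as the difference between the two calls shows.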

## 🔍 Predicting on the Test Set

After training, we use the CNN to predict labels for the test set. For each post, the model outputs one sigmoid probability per symptom class. We threshold these at `0.5`, so a class is predicted only if its probability is strictly greater than 0.5, which produces binary outputs.

This binary matrix is used to assess how well the model identifies the presence or absence of each depression symptom in previously unseen data.
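Thresholding can leave a post with no predicted label at all, which is one way recall drops. The sketch below applies the same `> 0.5` rule to a made-up probability matrix in pure Python and counts the empty rows. `probs` stands in for the output of `model.predict`, and `threshold` is a hypothetical helper.

```python
def threshold(probs, cutoff=0.5):
    """Turn per-class probabilities into 0/1 predictions (strictly greater than cutoff)."""
    return [[int(p > cutoff) for p in row] for row in probs]


probs = [
    [0.72, 0.10, 0.40],  # one label clears the cutoff
    [0.45, 0.30, 0.49],  # nothing clears 0.5 -> empty prediction
    [0.55, 0.05, 0.61],  # two labels
]
preds = threshold(probs)
empty = sum(1 for row in preds if not any(row))
print(preds)  # [[1, 0, 0], [0, 0, 0], [1, 0, 1]]
print(empty)  # 1
```

With the real predictions, `(predictions.sum(axis=1) == 0).sum()` counts the empty rows. The label set includes an explicit `NO_SYMPTOMS` class, so an all-zero row means the model made no prediction, not that it predicted "no symptoms".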


In [None]:
predictions = (model.predict(X_test_padded) > 0.5).astype(int)

10/10 ━━━━━━━━━━━━━━━━━━━━ 0s 24ms/step


## 📊 Classification Metrics

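We evaluate the model with `classification_report` and `accuracy_score` from `scikit-learn`. The metrics include:
- **Precision, recall, and F1-score** for each symptom class,
- **Micro, macro, and weighted averages**: micro pools all label decisions across classes, macro takes the unweighted mean of the per-class scores, and weighted averages them by each class's support,
- **Samples average**: precision, recall, and F1 computed for each post over its own label set, then averaged across posts,
- **Accuracy**: for multi-label data, `accuracy_score` reports subset accuracy, so a post counts as correct only if its entire predicted label set matches exactly.

These metrics show where the baseline falls short, especially on imbalanced symptom classes. Precision exceeds recall for every class the model predicts at all. Micro-averaged precision is 0.69 against a recall of 0.15. At the 0.5 threshold, the model is therefore conservative: it rarely predicts a symptom, but it is often right when it does. The three rarest classes in the test set, APPETITE_CHANGE, COGNITIVE_ISSUES, and PSYCHOMOTOR (8 to 10 positive posts each), are never correctly identified. This outcome is expected for a simple CNN and supports the value of transformer-based models.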
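The sketch below shows how the averaging schemes in the report differ, using a toy conservative model that only ever predicts the first class. The data are made up, and `prf` and `averaged_f1` are hypothetical helpers. Undefined ratios are set to 0, which mirrors `zero_division=0` for the cases in this example.

```python
def prf(tp, fp, fn):
    """Precision, recall, F1, with 0 for any undefined ratio."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def averaged_f1(y_true, y_pred):
    """Return (micro, macro, samples) F1 for 0/1 matrices of shape (posts, labels)."""
    n_rows, n_cols = len(y_true), len(y_true[0])
    # Per-class (column) counts, used by micro and macro averaging.
    counts = []
    for j in range(n_cols):
        tp = sum(1 for i in range(n_rows) if y_true[i][j] and y_pred[i][j])
        fp = sum(1 for i in range(n_rows) if not y_true[i][j] and y_pred[i][j])
        fn = sum(1 for i in range(n_rows) if y_true[i][j] and not y_pred[i][j])
        counts.append((tp, fp, fn))
    micro = prf(*(sum(c[k] for c in counts) for k in range(3)))[2]  # pool counts first
    macro = sum(prf(*c)[2] for c in counts) / n_cols  # every class weighs the same
    # Per-post (row) counts, used by samples averaging.
    samples = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        tp = sum(1 for t, p in zip(t_row, p_row) if t and p)
        fp = sum(1 for t, p in zip(t_row, p_row) if not t and p)
        fn = sum(1 for t, p in zip(t_row, p_row) if t and not p)
        samples += prf(tp, fp, fn)[2]
    return micro, macro, samples / n_rows


y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0]]  # never predicts classes 1 and 2
micro, macro, samples = averaged_f1(y_true, y_pred)
print(f"{micro:.3f} {macro:.3f} {samples:.3f}")  # 0.667 0.333 0.583
```

Macro averaging drops sharply because the two never-predicted classes each contribute an F1 of 0. The same effect pulls the report's macro F1 (0.19) below its micro F1 (0.25).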


In [None]:
print(
    "Classification Report:\n",
    classification_report(
        y_test, predictions, target_names=mlb.classes_, zero_division=0
    ),
)
print(f"Accuracy: {accuracy_score(y_test, predictions)}")

Classification Report:
                    precision    recall  f1-score   support

        ANHEDONIA       1.00      0.04      0.08        25
  APPETITE_CHANGE       0.00      0.00      0.00        10
 COGNITIVE_ISSUES       0.00      0.00      0.00        10
   DEPRESSED_MOOD       0.67      0.11      0.20        70
          FATIGUE       0.67      0.14      0.24        28
      NO_SYMPTOMS       0.56      0.07      0.12        73
      PSYCHOMOTOR       0.00      0.00      0.00         8
     SLEEP_ISSUES       0.70      0.39      0.50        18
SUICIDAL_THOUGHTS       0.65      0.39      0.49        28
    WORTHLESSNESS       0.79      0.21      0.33        72

        micro avg       0.69      0.15      0.25       342
        macro avg       0.50      0.14      0.19       342
     weighted avg       0.64      0.15      0.23       342
      samples avg       0.16      0.16      0.16       342

Accuracy: 0.1414141414141414
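For multi-label indicator arrays, `accuracy_score` counts a post as correct only when every one of its labels matches, so the 0.14 above is a strict exact-match rate. The sketch below contrasts it with per-label accuracy on made-up 0/1 lists. `subset_accuracy` and `per_label_accuracy` are hypothetical helpers, written for plain Python lists rather than NumPy arrays.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of rows whose predicted label list matches the true one exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual (post, label) cells predicted correctly."""
    cells = [a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p)]
    return sum(cells) / len(cells)


y_true = [[1, 0, 1], [0, 1, 0], [1, 0, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 0, 0], [1, 0, 0], [1, 0, 0]]
print(subset_accuracy(y_true, y_pred))     # 0.25
print(per_label_accuracy(y_true, y_pred))  # 0.75
```

Most cells are 0, so per-label accuracy can look high even for a model that rarely predicts a symptom. `sklearn.metrics.hamming_loss` reports the complementary fraction of wrongly predicted labels.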
