# Experiment: Train TensorFlow DistilBERT Model

## Confirm Environment

In [1]:
!conda info


     active environment : base
    active env location : /shared/EL9/explorer/anaconda3/2024.06
            shell level : 1
       user config file : /home/neiderer.c/.condarc
 populated config files : 
          conda version : 24.5.0
    conda-build version : 24.5.1
         python version : 3.12.4.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=broadwell
                          __conda=24.5.0=0
                          __cuda=12.3=0
                          __glibc=2.34=0
                          __linux=5.14.0=0
                          __unix=0=0
       base environment : /shared/EL9/explorer/anaconda3/2024.06  (read only)
      conda av data dir : /shared/EL9/explorer/anaconda3/2024.06/etc/conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
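`!conda info` runs in a shell subprocess, so it reports the shell's active conda environment (`base`, Python 3.12.4 here). That can differ from the interpreter behind the Jupyter kernel. The traceback in the Train Model section shows the kernel actually running Python 3.9 from the `tensorflow-gpu-neiderer` environment. Below is a minimal sketch of a check that runs inside the kernel itself. It assumes only the standard library plus TensorFlow, if installed. `describe_environment` is a hypothetical helper, not part of `emolex`. `tf.sysconfig.get_build_info()` reports the CUDA and cuDNN versions TensorFlow was compiled against, which is one side of the version mismatch reported during training.

```python
import platform
import sys


def describe_environment():
    """Describe the interpreter running this kernel (hypothetical helper, not part of emolex)."""
    info = {
        "python_executable": sys.executable,  # kernel interpreter, not the shell's conda env
        "python_version": platform.python_version(),
        "env_prefix": sys.prefix,
        "tensorflow_version": None,
        "built_cuda_version": None,
        "built_cudnn_version": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        return info  # TensorFlow is not installed for this interpreter
    info["tensorflow_version"] = tf.__version__
    # Versions TensorFlow was compiled against; the CUDA keys only exist on CUDA builds.
    build = tf.sysconfig.get_build_info()
    info["built_cuda_version"] = build.get("cuda_version")
    info["built_cudnn_version"] = build.get("cudnn_version")
    return info


for key, value in describe_environment().items():
    print(f"{key:>20}: {value}")
```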

## Setup and Imports

In [2]:
from emolex.preprocessing import load_mental_health_sentiment_dataset, clean_text, encode_sentiment_labels, split_data, dl_text_vectorization
from emolex.train import train_distilbert_model_tf
from emolex.evaluation import plot_training_history, generate_confusion_matrix, generate_classification_report
from emolex.utils import detect_and_set_device

2025-07-03 00:23:41.996349: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1751516622.016768 4145173 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1751516622.023146 4145173 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1751516622.039645 4145173 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1751516622.039659 4145173 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1751516622.039661 4145173 computation_placer.cc:177] computation placer alr

## Device Setup

In [3]:
# Detect and set up GPU or use CPU
device_used = detect_and_set_device()
print(f"TensorFlow is configured to use: {device_used}")

GPU is available. Attempting to use GPU.
Successfully configured GPU: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
TensorFlow is configured to use: GPU
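Detecting a GPU only confirms that TensorFlow can see the device. It does not necessarily initialize cuDNN, and in this run cuDNN failed later, at the first optimizer step of training. Below is a minimal fail-fast sketch, assuming TensorFlow 2.x in eager mode. `cudnn_smoke_test` is a hypothetical helper, not part of `emolex`. It runs one small `tf.nn.conv2d` on `/GPU:0`, an op whose GPU kernel normally goes through cuDNN. A cuDNN version mismatch would then likely surface within seconds, rather than after tokenization.

```python
def cudnn_smoke_test():
    """Run one tiny convolution on the first GPU; return (ok, message).

    Hypothetical helper, not part of emolex. conv2d on a GPU normally dispatches to
    cuDNN, so a broken or mismatched cuDNN install should raise here.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False, "TensorFlow is not installed"
    if not tf.config.list_physical_devices("GPU"):
        return False, "no GPU visible to TensorFlow"
    try:
        with tf.device("/GPU:0"):
            images = tf.random.normal([1, 8, 8, 3])  # NHWC batch of one 8x8 RGB image
            kernel = tf.random.normal([3, 3, 3, 4])  # 3x3 filter, 3 in / 4 out channels
            out = tf.nn.conv2d(images, kernel, strides=1, padding="SAME")
        return True, f"conv2d ran on GPU, output shape {tuple(out.shape)}"
    except Exception as exc:  # e.g. a tf.errors.OpError subclass on cuDNN init failure
        return False, f"{type(exc).__name__}: {exc}"


ok, message = cudnn_smoke_test()
print("cuDNN smoke test passed" if ok else "cuDNN smoke test failed", "-", message)
```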


## Load Data

In [4]:
df = load_mental_health_sentiment_dataset()
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51093 entries, 0 to 51092
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   text    51093 non-null  object
 1   label   51093 non-null  object
dtypes: object(2)
memory usage: 798.5+ KB


|   | text | label |
|---|---|---|
| 0 | oh my gosh | Anxiety |
| 1 | trouble sleeping, confused mind, restless hear... | Anxiety |
| 2 | All wrong, back off dear, forward doubt. Stay ... | Anxiety |
| 3 | I've shifted my focus to something else but I'... | Anxiety |
| 4 | I'm restless and restless, it's been a month n... | Anxiety |
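Before training a 7-class model, it can be worth checking how the labels are distributed. A skewed distribution affects both the train/test split and how accuracy should be read. Below is a stdlib-only sketch. `class_proportions` is a hypothetical helper. In the notebook, `df["label"].value_counts(normalize=True)` gives the same information directly.

```python
from collections import Counter


def class_proportions(labels):
    """Return {label: fraction of samples}, sorted by label (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Usage in the notebook (not run here): class_proportions(df["label"])
print(class_proportions(["Anxiety", "Normal", "Normal", "Depression"]))
# {'Anxiety': 0.25, 'Depression': 0.25, 'Normal': 0.5}
```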


## Clean Data

In [5]:
print("\n--- Cleaning Text ---")
df['clean_text'] = df["text"].apply(clean_text)
print("Text cleaning complete. Sample cleaned text:")
print("\n", df[["text", "clean_text"]].sample(5))


--- Cleaning Text ---
Text cleaning complete. Sample cleaned text:

                                                     text  \
35113  Fear of Neurological disease because I suck at...   
11113  Sorry about the title, just wanted to grab att...   
47438  Genuinely, why live? I don’t enjoy being alive...   
37712  i feel the need of depending on people for me ...   
6400   Girls who are often hurt will be surprised if ...   

                                              clean_text  
35113  fear neurological disease suck balancing one l...  
11113  sorry title wanted grab attention slowly recov...  
47438  genuinely live dont enjoy alive dont understan...  
37712  feel need depending people feel better comfort...  
6400   girl often hurt surprised meet good patient ca...  
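The sample output suggests what `clean_text` does:

- lowercasing;
- removing punctuation, including the curly apostrophe in "don’t";
- dropping stopwords such as "why", "i", and "being";
- apparently lemmatizing ("Girls" becomes "girl").

Below is a rough, stdlib-only approximation of the first three steps, for illustration only. `clean_text_sketch` and its small `STOPWORDS` set are hypothetical and are not the `emolex` implementation, which likely uses a full stopword list and a lemmatizer. DistilBERT's tokenizer handles casing and punctuation itself, and the model was pretrained on natural text. It may therefore be worth confirming whether `split_data` passes raw or cleaned text to the transformer.

```python
import re

# Hypothetical, deliberately small stopword set for illustration only.
STOPWORDS = {
    "a", "an", "the", "i", "me", "my", "why", "being", "be", "of", "to", "and",
    "at", "because", "is", "are", "was", "who", "will", "if", "about", "just",
}


def clean_text_sketch(text):
    """Lowercase, strip apostrophes and non-letters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"['’]", "", text)  # "don’t" -> "dont", mirroring the sample output
    text = re.sub(r"[^a-z\s]", " ", text)  # every other non-letter becomes a space
    return " ".join(tok for tok in text.split() if tok not in STOPWORDS)


print(clean_text_sketch("Genuinely, why live? I don’t enjoy being alive"))
# genuinely live dont enjoy alive
```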


## Encode Labels

In [6]:
print("\n--- Encoding Labels ---")
df, encoder = encode_sentiment_labels(df)
print("Label encoding complete. Sample encoded labels:")
print("\n", df[['label', 'label_encoded']].sample(5))


--- Encoding Labels ---
Label Encoding Map: {'Anxiety': 0, 'Bipolar': 1, 'Depression': 2, 'Normal': 3, 'Personality disorder': 4, 'Stress': 5, 'Suicidal': 6}
Label encoding complete. Sample encoded labels:

             label  label_encoded
20358  Depression              2
5912       Normal              3
16267  Depression              2
39434  Depression              2
32777      Normal              3
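The printed map is in alphabetical order. That is what scikit-learn's `LabelEncoder` produces, since its `classes_` are the sorted unique labels. The `encoder.classes_` attribute used later suggests that `encode_sentiment_labels` wraps it, though this is an inference. Below is a stdlib sketch of the same mapping plus its inverse; the inverse is what turns predicted class indices back into names. `build_label_maps` is a hypothetical helper.

```python
def build_label_maps(labels):
    """Map each label to its index in sorted order, plus the inverse mapping (hypothetical helper)."""
    classes = sorted(set(labels))
    label_to_id = {label: idx for idx, label in enumerate(classes)}
    id_to_label = dict(enumerate(classes))
    return label_to_id, id_to_label


labels = ["Normal", "Anxiety", "Suicidal", "Depression", "Bipolar", "Stress", "Personality disorder"]
label_to_id, id_to_label = build_label_maps(labels)
print(label_to_id)     # matches the Label Encoding Map printed above
print(id_to_label[6])  # Suicidal
```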


## Train-Test Split

In [7]:
print("\n--- Perform Train-Test Split ---")
X_train_raw, X_test_raw, y_train, y_test = split_data(df) 
print(f"Train set size: {len(X_train_raw)} samples")
print(f"Test set size: {len(X_test_raw)} samples")


--- Perform Train-Test Split ---
Train set size: 40874 samples
Test set size: 10219 samples
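The reported sizes are consistent with an 80/20 split of the 51,093 rows if the test fraction is rounded up. That is what scikit-learn's `train_test_split` does for a float `test_size`:

- ⌈0.2 × 51,093⌉ = ⌈10,218.6⌉ = 10,219 test samples;
- 51,093 - 10,219 = 40,874 training samples.

This section does not show whether `split_data` actually uses `train_test_split`, or whether it stratifies by label. The sketch below only reproduces the arithmetic, and `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, test_size=0.2):
    """Train/test counts when the test fraction is rounded up (scikit-learn's convention)."""
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


print(split_sizes(51093))  # (40874, 10219)
```

If class balance matters, passing `stratify=` to `train_test_split` keeps the per-class proportions nearly equal across the two sets.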


## Train Model

In [9]:
model, history = train_distilbert_model_tf(X_train_raw, y_train, X_test_raw, y_test, len(encoder.classes_), epochs=3)

Initializing DistilBERT tokenizer...
Creating Hugging Face Datasets from input data...
Tokenizing datasets...

Ensuring labels are native integers...

Setting dataset format to TensorFlow...
Creating data collator...
Converting Hugging Face Datasets to TensorFlow Datasets...


Old behaviour: columns=['a'], labels=['labels'] -> (tf.Tensor, tf.Tensor)  
             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor)  
New behaviour: columns=['a'],labels=['labels'] -> ({'a': tf.Tensor}, {'labels': tf.Tensor})  
             : columns='a', labels='labels' -> (tf.Tensor, tf.Tensor) 
I0000 00:00:1751516673.403216 4145173 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 11437 MB memory:  -> device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:03:00.0, compute capability: 6.0


Loading TFDistilBertForSequenceClassification model with 7 labels...


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFDistilBertForSequenceClassification: ['vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_layer_norm.weight', 'vocab_projector.bias']
- This IS expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFDistilBertForSequenceClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
Some weights or buffers of the TF 2.0 model TFDistilBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['pre_classifier.weight', 'pre_classifier.bias', 'classifier.weight', 'classifier.bias']
You should 

Compiling model...
Starting DistilBERT model training for 3 epochs...
Epoch 1/3


I0000 00:00:1751516688.461976 4145261 service.cc:152] XLA service 0x150744644580 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
I0000 00:00:1751516688.462004 4145261 service.cc:160]   StreamExecutor device (0): Tesla P100-PCIE-12GB, Compute Capability 6.0
2025-07-03 00:24:48.469645: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
E0000 00:00:1751516688.489562 4145261 cuda_dnn.cc:522] Loaded runtime CuDNN library: 9.1.0 but source was compiled with: 9.3.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
E0000 00:00:1751516688.492653 4145261 cuda_dnn.cc:522] Loaded runtime CuDNN library: 9.1.0 but source was comp

FailedPreconditionError: Graph execution error:

Detected at node Adam/StatefulPartitionedCall_103 defined at (most recent call last):
  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/runpy.py", line 197, in _run_module_as_main

  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/runpy.py", line 87, in _run_code

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel_launcher.py", line 18, in <module>

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/traitlets/config/application.py", line 1075, in launch_instance

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/kernelapp.py", line 739, in start

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tornado/platform/asyncio.py", line 211, in start

  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/asyncio/base_events.py", line 601, in run_forever

  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once

  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/asyncio/events.py", line 80, in _run

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 545, in dispatch_queue

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 534, in process_one

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 437, in dispatch_shell

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/ipkernel.py", line 362, in execute_request

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/kernelbase.py", line 778, in execute_request

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/ipkernel.py", line 449, in do_execute

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/ipykernel/zmqshell.py", line 549, in run_cell

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3009, in run_cell

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3064, in _run_cell

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3269, in run_cell_async

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3448, in run_ast_nodes

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/IPython/core/interactiveshell.py", line 3508, in run_code

  File "/tmp/ipykernel_4145173/4226381187.py", line 1, in <module>

  File "/courses/IE7500.202550/students/neiderer.c/project/NLP_Project/src/emolex/train.py", line 309, in train_distilbert_model_tf

  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 1161, in fit

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py", line 65, in error_handler

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/engine/training.py", line 1804, in fit

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/engine/training.py", line 1398, in train_function

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/engine/training.py", line 1381, in step_function

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/engine/training.py", line 1370, in run_step

  File "/home/neiderer.c/.conda/envs/tensorflow-gpu-neiderer/lib/python3.9/site-packages/transformers/modeling_tf_utils.py", line 1641, in train_step

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/optimizers/optimizer.py", line 623, in minimize

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/optimizers/optimizer.py", line 1309, in apply_gradients

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/optimizers/optimizer.py", line 731, in apply_gradients

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/optimizers/optimizer.py", line 1339, in _internal_apply_gradients

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/optimizers/optimizer.py", line 1431, in _distributed_apply_gradients_fn

  File "/home/neiderer.c/.local/lib/python3.9/site-packages/tf_keras/src/optimizers/optimizer.py", line 1426, in apply_grad_to_update_var

DNN library initialization failed. Look at the errors above for more details.
	 [[{{node Adam/StatefulPartitionedCall_103}}]] [Op:__inference_train_function_16403]
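The root cause is the `cuda_dnn.cc` error logged just before the traceback. The cuDNN library loaded at runtime (9.1.0) is older than the one TensorFlow was compiled against (9.3.0). The log also states the rule: the runtime library must have the same major version and an equal or higher minor version. Below is a small sketch that encodes this rule, useful for checking a candidate cuDNN version before rerunning. `cudnn_compatible` and `parse_version` are hypothetical helpers. How to install a newer cuDNN in this environment depends on how the environment was provisioned, and is not covered here.

```python
def parse_version(version):
    """'9.1.0' -> (9, 1, 0); missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))[:3]


def cudnn_compatible(runtime, compiled):
    """Rule from the TensorFlow log: same major version, runtime minor >= compiled minor."""
    rt_major, rt_minor, _ = parse_version(runtime)
    ct_major, ct_minor, _ = parse_version(compiled)
    return rt_major == ct_major and rt_minor >= ct_minor


print(cudnn_compatible("9.1.0", "9.3.0"))  # False: the combination in this run
print(cudnn_compatible("9.3.0", "9.3.0"))  # True
```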

## Evaluate Model

In [None]:
print("\n--- Plot Training History ---")
plot_training_history(history)

In [None]:
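print("\n--- Predict Test Classes ---")
from transformers import DistilBertTokenizerFast
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")  # assumption: same checkpoint (and max_length) as training
test_enc = tokenizer(list(X_test_raw), truncation=True, padding=True, return_tensors="tf")
y_pred = model.predict(dict(test_enc), batch_size=32)["logits"]  # TF Hugging Face models return logits, not a bare array
y_pred_classes = y_pred.argmax(axis=1)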

In [None]:
print("\n--- Generate Confusion Matrix ---")
fig, ax = generate_confusion_matrix(y_test, y_pred_classes, class_labels=encoder.classes_)

In [None]:
print("\n--- Generate Classification Report ---")
generate_classification_report(y_test, y_pred_classes, class_labels=encoder.classes_)
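`generate_confusion_matrix` and `generate_classification_report` are `emolex` helpers whose internals are not shown. For reference, below is a stdlib sketch of the quantities such helpers typically compute. The first is a confusion matrix with true labels as rows and predictions as columns, the same orientation as scikit-learn's `confusion_matrix`. The second is per-class precision and recall derived from that matrix. `confusion_counts` and `precision_recall` are hypothetical names. Once training succeeds, the sketch could be applied as `confusion_counts(y_test, y_pred_classes, len(encoder.classes_))`.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """matrix[t][p] counts samples with true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def precision_recall(matrix):
    """Per-class (precision, recall); 0.0 when a class is never predicted or never present."""
    n = len(matrix)
    scores = []
    for k in range(n):
        true_pos = matrix[k][k]
        predicted_k = sum(matrix[t][k] for t in range(n))  # column sum
        actual_k = sum(matrix[k])  # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        scores.append((precision, recall))
    return scores


cm = confusion_counts([0, 0, 1, 2, 2, 2], [0, 1, 1, 2, 2, 0], n_classes=3)
print(cm)  # [[1, 1, 0], [0, 1, 0], [1, 0, 2]]
print(precision_recall(cm))
```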