# Fine-tuning BERT models on Question Type

A version of this notebook was used to fine-tune the BERT models on the training set we make available. The fine-tuned models can be found in the GitHub repository. This notebook is based on the tutorial by Chris McCormick and Nick Ryan (2019, July 22), BERT Fine-Tuning Tutorial with PyTorch, retrieved from http://www.mccormickml.com.

In [1]:
# import libraries
import tensorflow as tf
import torch
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split
from transformers import BertModel, BertTokenizer, BertForSequenceClassification
from transformers import AdamW

from tqdm import tqdm, trange
import pandas as pd
import io
import random
import numpy as np



Using TensorFlow backend.


In [3]:
# make sure that the same seed is used all over the place for better reproducibility
seed_val = 30
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)
tf.random.set_seed(seed_val)

# Fine-tuning the model

In [4]:
# read the training set
df_train = pd.read_csv("rquet_trainset.csv", delimiter='\t', header=0)
df_train.sample(10)

Unnamed: 0,ID,contextBefore,question,contextAfter,Merged
862,1114435,It triggered headlines to that effect on blogs...,"So what does Media Matters really do, and what...",Joining me now in San Francisco is the founder...,0
1069,1114839,I believe in free press.,Did you feel like you were being used to give...,"No, I didnt.",1
970,1114594,I heard that you got close to Medgar Evers fam...,And I was wondering if that was true and was t...,"Well, the story is true.",0
1393,1115273,So some readers might wonder if this is the be...,"Chrystia, you wanted to get in here?","I was just going to say, I thought actually th...",1
1473,1115388,No.,"But he goes after her and says, -How many peop...",She said that Barack Obama may have anti-Ameri...,0
30,1113127,"Now, you also went on MSNBCs -Hardball- to tal...",What happened there?,"In that circumstance, I had appeared, as I con...",1
585,1114044,And they talked to everybody.,But were we going to go on TV with it?,No.,0
577,1114033,"Lola, Ive got 20 seconds.",Does it embarrass you at all when you say that...,"It doesnt embarrass me at all, no.",1
21,1113115,"Well, right now on the site there are about 12...","Will you do this again for PBS, or does it de...",It depends.,1
1349,1115218,"I mean, that...",Why not move on?,Why not move on?,0


In [None]:
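# replace the gold labels with integer class indices (NISQ -> 0, ISQ -> 1)
# so that the label column can later be converted to a tensor
df_train.replace("NISQ", 0, inplace=True)
df_train.replace("ISQ", 1, inplace=True)
df_train["Merged"] = df_train["Merged"].astype(int)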
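
The label conversion above can also be written as an explicit mapping, which makes it easier to notice unexpected label values. The following is a minimal sketch on a hypothetical four-row frame (`toy` and `label_map` are illustrative names, not part of the notebook); it assumes the same NISQ -> 0, ISQ -> 1 convention.

```python
import pandas as pd

# Hypothetical miniature of the label column; the real file has many more rows.
toy = pd.DataFrame({"Merged": ["NISQ", "ISQ", "ISQ", "NISQ"]})

# Explicit mapping from gold label to class index (NISQ -> 0, ISQ -> 1).
label_map = {"NISQ": 0, "ISQ": 1}
toy["label"] = toy["Merged"].map(label_map)

# Any value missing from label_map becomes NaN, so this catches unexpected labels.
if toy["label"].isna().any():
    raise ValueError("found a label that is not in label_map")

toy["label"] = toy["label"].astype(int)
print(toy["label"].tolist())
```

If the CSV already stores the labels as 0 and 1, as the sample output above suggests, no mapping is needed at all.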

In [5]:
# load tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased', do_lower_case=True)

In [6]:
# just run a sample tokenization to get the max_len
max_len = 0
sentences_A_train = df_train.question.values
sentences_B_train = df_train.contextBefore.values
sentences_C_train = df_train.contextAfter.values
for i in range(0,len(sentences_A_train), 1):
    #Tokenize the text and add `[CLS]` and `[SEP]` tokens.
    # NOTE: depending on what exactly you would like to fine-tune, adjust the following call:
    # if you want to fine-tune only on the question itself, you only need: sentences_A_train[i]
    # if you want to fine-tune on the question and its context-before, you need: sentences_A_train[i], sentences_B_train[i]
    # if you want to fine-tune on the question and its context-after, you need: sentences_A_train[i], sentences_C_train[i]
    input_ids = tokenizer.encode(sentences_A_train[i], sentences_C_train[i], 
                                 add_special_tokens=True)
    #Update the maximum sentence length.
    max_len = max(max_len, len(input_ids))

print('Max sentence length: ', max_len)

Max sentence length:  147
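
The longest question/context pair is 147 tokens, while the next cell encodes with `max_length = 128`, so some pairs will be truncated. A small helper like the one below can show how many. This is only a sketch: `count_truncated` and the token counts in `toy_lengths` are hypothetical, and in the notebook the lengths would come from the `len(input_ids)` values computed in the loop above.

```python
def count_truncated(lengths, max_length):
    """Return how many encoded sequences are longer than max_length."""
    return sum(1 for n in lengths if n > max_length)


# Hypothetical token counts, one per encoded question/context pair.
toy_lengths = [14, 83, 147, 128, 129, 40]

n_truncated = count_truncated(toy_lengths, max_length=128)
print("{} of {} sequences would be truncated".format(n_truncated, len(toy_lengths)))
```

If many pairs are affected, raising `max_length` (BERT accepts up to 512 tokens) is one option, at the cost of more memory per batch.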


In [7]:
# Run proper tokenization now
# Tokenize all of the sentences and map the tokens to their word IDs.
input_ids_train = []
attention_masks_train = []
labels_train = df_train.Merged.values

for i in range(0,len(sentences_A_train), 1):
    # NOTE: depending on what exactly you would like to fine-tune, adjust the following call:
    # if you want to fine-tune only on the question itself, you only need: sentences_A_train[i]
    # if you want to fine-tune on the question and its context-before, you need: sentences_A_train[i], sentences_B_train[i]
    # if you want to fine-tune on the question and its context-after, you need: sentences_A_train[i], sentences_C_train[i]
    encoded_dict = tokenizer.encode_plus(
                        sentences_A_train[i], sentences_C_train[i], 
                        add_special_tokens = True,
                        truncation = True,
                        max_length = 128,          
                        padding = 'max_length',
                        return_attention_mask = True,
                        return_tensors = 'pt'
                   )
    
    # Add the encoded sentence to the list.    
    input_ids_train.append(encoded_dict['input_ids'])
    
    # And its attention mask.
    attention_masks_train.append(encoded_dict['attention_mask'])

# Convert the lists into tensors.
input_ids_train = torch.cat(input_ids_train, dim=0)
attention_masks_train = torch.cat(attention_masks_train, dim=0)
labels_train = torch.tensor(labels_train)

# Print sentence 0, now as a list of IDs.
print('Original: ', sentences_A_train[0])
print('Token IDs:', input_ids_train[0])



Original:  And why was the mainstream media so late to the party?
Token IDs: tensor([  101,  1998,  2339,  2001,  1996,  7731,  2865,  2061,  2397,  2000,
         1996,  2283,  1029,   102,  5241,  2149,  2085,  1010,  8282, 10533,
         1010,  7009,  1996,  1011,  2980,  6462,  1011,  5930,  2005,  1011,
         1996,  2899,  2335,  1011,  1025,  9617,  5032,  9574,  1010,  2120,
        11370,  2005,  2250,  2637,  2557,  1025,  1998,  3581,  7367,  2015,
         3630,  1010,  2934,  1997,  2865,  1998,  2270,  3821,  2012,  1996,
         2577,  2899,  2118,  1998,  6605,  3559,  1997,  1996,  1011,  4774,
         2830,  1010,  1011,  2029,  2092,  2831,  2055,  2101,  1999,  1996,
         2565,  1012,   102,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
            0,     0,     0,     0,     0,     0,     0,     0,  
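
The trailing zeros in the output above are padding, and the attention mask tells the model which positions hold real tokens. The sketch below reproduces that idea in plain Python. `pad_and_mask` is a hypothetical helper, not the tokenizer's implementation, and its truncation is simplified: it just cuts the list, whereas the real tokenizer keeps the closing `[SEP]` token.

```python
def pad_and_mask(token_ids, max_length, pad_id=0):
    """Cut or right-pad token_ids to max_length and build the matching mask."""
    ids = list(token_ids[:max_length])
    mask = [1] * len(ids)              # 1 marks a real token
    n_pad = max_length - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad   # 0 marks padding


ids, mask = pad_and_mask([101, 1998, 2339, 102], max_length=6)
print(ids)   # [101, 1998, 2339, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```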

In [8]:
from torch.utils.data import TensorDataset, random_split

# Combine the training inputs into a TensorDataset.
dataset = TensorDataset(input_ids_train, attention_masks_train, labels_train)

# Create a 90-10 train-validation split.

# Calculate the number of samples to include in each set.
train_size = int(0.9 * len(dataset))
val_size = len(dataset) - train_size

# Divide the dataset by randomly selecting samples.
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

print('{:>5,} training samples'.format(train_size))
print('{:>5,} validation samples'.format(val_size))

1,429 training samples
  159 validation samples
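
`random_split` draws from PyTorch's global random state, so the split above depends on everything that consumed random numbers before it. One way to pin the split down independently is to pass a dedicated generator. The sketch below uses a toy dataset of 100 items; `split_indices` is a hypothetical helper and the seed value simply mirrors `seed_val`.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for the tokenized dataset.
toy_dataset = TensorDataset(torch.arange(100))


def split_indices(seed):
    """Return the validation indices of a 90-10 split drawn with a fixed seed."""
    generator = torch.Generator().manual_seed(seed)
    train_part, val_part = random_split(toy_dataset, [90, 10], generator=generator)
    return list(val_part.indices)


# The same seed gives the same validation set on every call.
print(split_indices(30) == split_indices(30))
```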


In [9]:
batch_size = 32

# Create the DataLoaders for our training and validation sets.
# We'll take training samples in random order. 
train_dataloader = DataLoader(
            train_dataset,
            sampler = RandomSampler(train_dataset), 
            batch_size = batch_size
        )

# For validation the order doesn't matter, so we'll just read them sequentially.
validation_dataloader = DataLoader(
            val_dataset,
            sampler = SequentialSampler(val_dataset),
            batch_size = batch_size
        )
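
With 1,429 training samples and a batch size of 32, each epoch has 45 batches, and the last one is smaller than the rest. The sketch below checks this on a toy dataset with the same number of rows (`toy` and `loader` are illustrative names). The count matters later because the number of scheduler steps is `len(train_dataloader) * epochs`.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in with the same number of rows as the training split above.
toy = TensorDataset(torch.zeros(1429, 4, dtype=torch.long))
loader = DataLoader(toy, batch_size=32)

n_batches = len(loader)
last_batch_size = 1429 - 32 * (n_batches - 1)
print(n_batches, last_batch_size)
```

Since the training loop only reports progress every 40 batches, this setup prints a single progress line per epoch.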

In [11]:
# Load BERT
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", 
    num_labels = 2,  
    output_attentions = False,
    output_hidden_states = True
)

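# use the GPU if it is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)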

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at

In [12]:
# Function to calculate the accuracy of our predictions vs labels
def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)
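
As a quick check of `flat_accuracy`, here is a worked example on four hypothetical logit rows (`toy_logits` and `toy_labels` are made up for illustration). The function is repeated so that the snippet runs on its own.

```python
import numpy as np


def flat_accuracy(preds, labels):
    pred_flat = np.argmax(preds, axis=1).flatten()
    labels_flat = labels.flatten()
    return np.sum(pred_flat == labels_flat) / len(labels_flat)


# Each row holds the two class logits for one example.
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
toy_labels = np.array([0, 1, 0, 0])

# argmax gives [0, 1, 1, 0]; three of the four predictions match the labels.
print(flat_accuracy(toy_logits, toy_labels))  # 0.75
```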

In [13]:
import time
import datetime

def format_time(elapsed):
    '''
    Takes a time in seconds and returns a string hh:mm:ss
    '''
    # Round to the nearest second.
    elapsed_rounded = int(round((elapsed)))
    
    # Format as hh:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))
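
A short usage example for `format_time`, with the function repeated in compact form so the snippet is self-contained. Note that `datetime.timedelta` prints the hours without zero padding, so the result has the shape `h:mm:ss`.

```python
import datetime


def format_time(elapsed):
    # Round to the nearest second and format as h:mm:ss.
    return str(datetime.timedelta(seconds=int(round(elapsed))))


print(format_time(75.4))    # 0:01:15
print(format_time(3725.6))  # 1:02:06
```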


In [14]:
# function to run the actual training/fine-tuning of the model
def run_train_and_val(epochs):
  training_stats = []

  # Measure the total training time for the whole run.
  total_t0 = time.time()


  for epoch_i in range(0, epochs):
    
    # ========================================
    #               Training
    # ========================================
    
      # Perform one full pass over the training set.

      print("")
      print('\n======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
      print('\nTraining...')

      # Measure how long the training epoch takes.
      t0 = time.time()

      # Reset the total loss for this epoch.
      total_train_loss = 0

      # Put the model into training mode. 
      model.train()

      # For each batch of training data...
      for step, batch in enumerate(train_dataloader):
          # Progress update every 40 batches.
          if step % 40 == 0 and not step == 0:
              # Format the elapsed time as h:mm:ss.
              elapsed = format_time(time.time() - t0)      
              print('\nBatch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))

          # Unpack this training batch from our dataloader. 
          #
          # As we unpack the batch, we'll also copy each tensor to the GPU using the 
          # `to` method.
          #
          # `batch` contains three pytorch tensors:
          #   [0]: input ids 
          #   [1]: attention masks
          #   [2]: labels 
          b_input_ids = batch[0].to(device)
          b_input_mask = batch[1].to(device)
          b_labels = batch[2].to(device)

          # Clear any previously calculated gradients before performing a
          # backward pass.
          model.zero_grad()        

          # Perform a forward pass (evaluate the model on this training batch).
          outputs = model(b_input_ids, 
                             token_type_ids=None, 
                             attention_mask=b_input_mask, 
                             labels=b_labels, output_hidden_states=True, output_attentions=True )
      
          loss = outputs.loss
          logits = outputs.logits
          hstates = outputs.hidden_states
          # Accumulate the training loss over all of the batches so that we can
          # calculate the average loss at the end. 
          total_train_loss += loss.item()

          # Perform a backward pass to calculate the gradients.
          loss.backward()

          # Clip the norm of the gradients to 1.0.
          # This is to help prevent the "exploding gradients" problem.
          torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

          # Update parameters and take a step using the computed gradient.
          # The optimizer dictates the "update rule"--how the parameters are
          # modified based on their gradients, the learning rate, etc.
          optimizer.step()

          # Update the learning rate.
          scheduler.step()

      # Calculate the average loss over all of the batches.
      avg_train_loss = total_train_loss / len(train_dataloader)            
    
      # Measure how long this epoch took.
      training_time = format_time(time.time() - t0)

      print("\n")
      print("\nAverage training loss: {0:.2f}".format(avg_train_loss))
      print("\nTraining epoch took: {:}".format(training_time))
        
    # ========================================
    #               Validation
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our validation set.

      print("\n")
      print("\nRunning Validation...")

      t0 = time.time()

      # Put the model in evaluation mode--the dropout layers behave differently
      # during evaluation.
      model.eval()

      # Tracking variables 
      total_eval_accuracy = 0
      total_eval_loss = 0
      nb_eval_steps = 0

      # Evaluate data for one epoch
      for batch in validation_dataloader:
        
          # Unpack this validation batch from our dataloader. 
          #
          # As we unpack the batch, we'll also copy each tensor to the GPU using 
          # the `to` method.
          #
          # `batch` contains three pytorch tensors:
          #   [0]: input ids 
          #   [1]: attention masks
          #   [2]: labels 
          b_input_ids = batch[0].to(device)
          b_input_mask = batch[1].to(device)
          b_labels = batch[2].to(device)
        
          # Tell pytorch not to bother with constructing the compute graph during
          # the forward pass, since this is only needed for backprop (training).
          with torch.no_grad():        

              # Forward pass, calculate logit predictions.
              outputs = model(b_input_ids, 
                                   token_type_ids=None, 
                                   attention_mask=b_input_mask,
                                   labels=b_labels, output_hidden_states=True, output_attentions=True )
            
              loss = outputs.loss
              logits = outputs.logits
            
          # Accumulate the validation loss.
          total_eval_loss += loss.item()

          # Move logits and labels to CPU
          logits = logits.detach().cpu().numpy()
          label_ids = b_labels.to('cpu').numpy()

          # Calculate the accuracy for this batch of test sentences, and
          # accumulate it over all batches.
          total_eval_accuracy += flat_accuracy(logits, label_ids)
        

      # Report the final accuracy for this validation run.
      avg_val_accuracy = total_eval_accuracy / len(validation_dataloader)
      print("\nAccuracy: {0:.2f}".format(avg_val_accuracy))

      # Calculate the average loss over all of the batches.
      avg_val_loss = total_eval_loss / len(validation_dataloader)
    
      # Measure how long the validation run took.
      validation_time = format_time(time.time() - t0)
    
      print("\nValidation Loss: {0:.2f}".format(avg_val_loss))
      print("\nValidation took: {:}".format(validation_time))

      # Record all statistics from this epoch.
      training_stats.append(
          {
            'epoch': epoch_i + 1,
            'Training Loss': avg_train_loss,
            'Valid. Loss': avg_val_loss,
            'Valid. Accur.': avg_val_accuracy,
            'Training Time': training_time,
            'Validation Time': validation_time
          }
      )

  print("\n")
  print("\nTraining complete!")
  print("\nTotal training took {:} (h:mm:ss)".format(format_time(time.time()-total_t0)))
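
The training loop clips the gradient norm to 1.0 before every optimizer step. The sketch below shows the effect on a toy linear layer rather than on BERT; `toy_model` and the inflated loss are assumptions chosen only to produce large gradients.

```python
import torch

torch.manual_seed(0)
toy_model = torch.nn.Linear(4, 2)

# An artificially large loss so that the gradient norm clearly exceeds 1.0.
loss = (toy_model(torch.randn(8, 4)) * 100.0).pow(2).mean()
loss.backward()

# clip_grad_norm_ rescales the gradients in place and returns the norm
# measured before clipping.
norm_before = torch.nn.utils.clip_grad_norm_(toy_model.parameters(), 1.0)

# Recompute the total norm over all parameters after clipping.
norm_after = torch.norm(torch.stack([p.grad.norm() for p in toy_model.parameters()]))

print(float(norm_before), float(norm_after))
```

The direction of the gradient is unchanged; only its length is capped, which keeps a single bad batch from producing a very large update.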



In [15]:
from transformers import get_linear_schedule_with_warmup

optimizer = AdamW(model.parameters(), lr = 2e-5, eps = 1e-8)
epochs = 2
total_steps = len(train_dataloader) * epochs
# Create the learning rate scheduler.
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps = 0, num_training_steps = total_steps)
run_train_and_val(epochs)





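
With `num_warmup_steps = 0`, the schedule created above lowers the learning rate linearly from `2e-5` to zero over `total_steps`. The function below is a plain-Python sketch of that shape as we understand it, not the library's code; `linear_lr_factor` is a hypothetical name, and the step count assumes 45 batches per epoch (1,429 training samples at batch size 32).

```python
def linear_lr_factor(step, num_warmup_steps, num_training_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


base_lr = 2e-5
total_steps = 45 * 2  # batches per epoch times epochs

for step in (0, 45, 90):
    print(step, base_lr * linear_lr_factor(step, 0, total_steps))
```

Because `scheduler.step()` is called once per batch, the rate has already dropped to half its starting value by the end of the first of two epochs.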

# Evaluating the model

In [None]:
# Run model ON TEST set
# Load the dataset into a pandas dataframe.
df_test = pd.read_csv("rquet_testset.csv", delimiter='\t', header=0)

# Report the number of sentences.
print('Number of test sentences: {:,}\n'.format(df_test.shape[0]))

# Create sentence and label lists
sentences_A_test = df_test.question.values
sentences_B_test = df_test.contextBefore.values
sentences_C_test = df_test.contextAfter.values
labels_test = df_test.Merged.values

# Tokenize all of the sentences and map the tokens to their word IDs.
input_ids_test = []
attention_masks_test = []

for i in range(0,len(sentences_A_test)):
    # NOTE: depending on what exactly you would like to evaluate, adjust the following call:
    # if you want to evaluate only on the question itself, you only need: sentences_A_test[i]
    # if you want to evaluate on the question and its context-before, you need: sentences_A_test[i], sentences_B_test[i]
    # if you want to evaluate on the question and its context-after, you need: sentences_A_test[i], sentences_C_test[i]
    encoded_dict = tokenizer.encode_plus(
                        sentences_A_test[i], sentences_C_test[i],
                        add_special_tokens = True,
                        truncation = True,
                        max_length = 128,         
                        padding = 'max_length',
                        return_attention_mask = True,   
                        return_tensors = 'pt', 
                   )
    
    # Add the encoded sentence to the list.    
    input_ids_test.append(encoded_dict['input_ids'])
    
    # And its attention mask (simply differentiates padding from non-padding).
    attention_masks_test.append(encoded_dict['attention_mask'])

# Convert the lists into tensors.
input_ids_test = torch.cat(input_ids_test, dim=0)
attention_masks_test = torch.cat(attention_masks_test, dim=0)
labels_test = torch.tensor(labels_test)

# Set the batch size.  
batch_size = 32  

# Create the DataLoader.
prediction_data = TensorDataset(input_ids_test, attention_masks_test, labels_test)
prediction_sampler = SequentialSampler(prediction_data)
prediction_dataloader = DataLoader(prediction_data, sampler=prediction_sampler, batch_size=batch_size)

Number of test sentences: 1,588





In [None]:
# Prediction on test set

print('Predicting labels for {:,} test sentences...'.format(len(input_ids_test)))

# Put model in evaluation mode
model.eval()

# Tracking variables 
total_test_accuracy = 0
nb_test_steps = 0

# Tracking variables 
predictions , true_labels, all_embeds = [], [], []

# Predict 
for batch in prediction_dataloader:
  # Add batch to GPU
  batch = tuple(t.to(device) for t in batch)
  
  # Unpack the inputs from our dataloader
  b_input_ids, b_input_mask, b_labels = batch
  
  # Telling the model not to compute or store gradients, saving memory and 
  # speeding up prediction
  with torch.no_grad():
      # Forward pass, calculate logit predictions
      outputs = model(b_input_ids, token_type_ids=None, 
                      attention_mask=b_input_mask)

  logits = outputs[0]
  hidden_states = outputs.hidden_states

  # Move logits and labels to CPU
  logits = logits.detach().cpu().numpy()
  label_ids = b_labels.to('cpu').numpy()

  # go through each sentence at the second-to-last layer:
  for i in range(len(hidden_states[-2])):
    # take the sentence embedding from the [CLS] token (first token of each sentence)
    sentence_embedding = hidden_states[-2][i][0]
    # alternative: take the sentence embedding as the average of all tokens
    # get the tokens of each sentence
    #token_vecs = hidden_states[-2][i]
    #print (token_vecs.shape)
    # average those tokens to get an average sentence embedding
    #sentence_embedding = torch.mean(token_vecs, dim=0)
    #print (sentence_embedding.shape)
    # add the embedding to the list of sentence embeddings
    all_embeds.append(sentence_embedding)
  
  # Store predictions and true labels
  predictions.append(logits)
  true_labels.append(label_ids)

  tmp_test_accuracy = flat_accuracy(logits, label_ids)
    
  total_test_accuracy += tmp_test_accuracy
  nb_test_steps += 1


print('    DONE.')
print("Test Accuracy: {}".format(total_test_accuracy/nb_test_steps))

Predicting labels for 1,588 test sentences...
    DONE.
Test Accuracy: 0.9377500000000001
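
The accuracy above is the mean of the per-batch accuracies. With 1,588 test sentences and a batch size of 32, the last batch holds only 20 sentences but counts as much as a full batch, so the figure can differ slightly from the accuracy over all sentences. The sketch below illustrates the gap with hypothetical counts (`sizes` and `correct` are made up).

```python
def mean_of_batch_accuracies(correct_per_batch, size_per_batch):
    """Average the accuracy of each batch, as the loop above does."""
    accs = [c / n for c, n in zip(correct_per_batch, size_per_batch)]
    return sum(accs) / len(accs)


def overall_accuracy(correct_per_batch, size_per_batch):
    """Correct predictions divided by the total number of examples."""
    return sum(correct_per_batch) / sum(size_per_batch)


# Hypothetical counts: two full batches of 32 and a final batch of 20.
sizes = [32, 32, 20]
correct = [32, 30, 10]

print(mean_of_batch_accuracies(correct, sizes))
print(overall_accuracy(correct, sizes))
```

In the notebook, the overall figure could be obtained from the flattened arrays built in the next cell, for example with `np.mean(flat_predictions == np.array(flat_true_labels))`.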


In [None]:
# Flatten the predictions and true values
flat_predictions = [item for sublist in predictions for item in sublist]
flat_predictions = np.argmax(flat_predictions, axis=1).flatten()
flat_true_labels = [item for sublist in true_labels for item in sublist]

# get the IDs of the test examples
ids = df_test.ID.values

# stack the sentence embeddings into one (num_sentences, 768) array;
# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
embeds_array = torch.stack(all_embeds).cpu().numpy()

# create new DataFrame to hold the fine-tuned embeds, one row per test sentence
df_embeds = pd.DataFrame(embeds_array)

print(len(df_embeds))
# insert the ids at the front
df_embeds.insert(0,"ID",ids)
df_embeds

1588


Unnamed: 0,ID,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,...,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767
0,1113080,-0.624018,-0.349868,-0.382977,0.036641,-0.363415,0.023820,-0.332245,-0.255287,-0.318352,0.174629,-0.124925,-0.085574,0.125850,0.546151,0.358586,-0.262353,-0.736644,-0.391401,0.586341,0.271116,0.661377,0.148248,-0.436195,0.399141,0.398655,0.005976,-0.212906,-0.789012,-0.264681,1.282379,-0.394703,-0.158896,-0.327694,-0.465828,-0.358759,-0.088071,0.043820,0.590825,0.237959,...,0.226612,-0.582723,-0.156315,-0.198098,-1.296372,-0.297298,-1.050315,0.036169,-0.011958,-0.025190,0.244999,0.573721,-0.100538,-0.444017,-0.188314,-0.258208,0.470525,0.210102,-0.128480,0.093418,-0.753367,-0.305138,0.198005,-0.104836,-0.978931,0.174753,-0.869373,0.431609,-0.450684,-1.275477,-0.380110,-0.372245,-0.123544,0.467679,0.132380,-0.082750,-0.174937,0.445593,0.550356,-0.129122
1,1113081,0.270131,-0.086376,-0.155895,-0.137007,0.305852,0.566150,-0.135364,-0.181811,0.273565,0.310873,0.351044,-0.352945,0.378537,-0.017702,-0.719168,0.302320,-0.520170,0.061731,-0.526917,0.668576,0.461040,-0.026566,0.352110,-0.148385,0.508185,0.267785,-0.016402,-0.049626,-1.111515,1.238145,-0.501640,0.376888,-0.492257,0.253948,-0.546699,-0.049612,0.248486,0.448389,0.030507,...,0.171693,-1.030645,-0.079492,0.065303,0.377806,0.093399,0.263002,-0.387891,-0.544686,-0.629705,0.287146,0.314549,0.085783,-0.538970,0.119640,-0.347085,1.125620,-0.264630,-0.589397,-0.491137,-1.186629,0.657936,0.017783,-0.325245,-0.671269,0.758901,-0.352198,-0.469969,-1.113843,-0.140534,-0.092931,-0.493651,-0.716700,-0.913317,0.171843,-0.463660,-0.168094,0.166773,0.260086,-0.778423
2,1113082,0.157268,-0.372572,-0.191488,-0.215519,-0.199283,0.039909,0.258195,-0.224172,0.443172,0.338801,0.236452,-0.444333,0.022490,-0.372710,-0.418878,0.110051,-0.503503,-0.164290,-0.344353,0.497209,0.319400,0.029244,0.249894,-0.240682,0.746816,0.259935,-0.071188,-0.395146,-0.839992,0.841247,-0.245682,-0.107896,-0.378679,0.013903,-0.217864,0.220568,0.126097,-0.124670,-0.329885,...,-0.188724,-0.886269,-0.202307,0.116288,0.082040,0.126691,0.131280,-0.209587,-0.425719,-0.399182,0.033350,0.094186,0.154107,-0.174667,-0.141069,0.111347,0.777951,-0.526614,-0.481488,-0.584185,-0.522272,0.622787,0.168862,-0.015256,-0.585916,0.572168,-0.141197,-0.468831,-0.625200,-0.368668,0.178913,-0.801333,-0.491507,-0.779445,0.148867,-0.203044,-0.355888,0.121072,0.166022,-0.544440
3,1113083,0.186450,-0.347765,-0.481607,-0.267755,-0.505731,-0.555540,-0.370373,-0.313331,0.009728,0.436975,0.443511,-0.057212,0.226982,0.359988,0.039835,0.039175,-0.463561,-0.471285,0.428875,-0.016978,0.041745,0.586759,-0.340973,0.280632,0.247990,-0.373314,0.194916,-0.501824,-0.507364,0.810175,-0.060308,-0.259293,-0.723265,-0.310226,-0.028130,0.249415,-0.140284,-0.035377,-0.038462,...,0.152780,-0.896757,-0.412354,0.215597,-0.316971,-0.112582,0.199049,-0.442989,-0.088435,-0.215524,0.076784,0.554259,0.061386,-0.203523,0.034283,-0.042308,0.200920,-0.341862,-0.040199,-0.084518,-0.130297,0.110891,0.217866,0.133420,-0.332300,0.130252,-0.340088,-0.207937,-0.245231,-0.495615,-0.386236,-0.569470,-0.515972,-0.266854,-0.326021,0.188339,-0.198137,0.094648,0.341003,-0.155572
4,1113086,-0.275234,-0.017988,-0.059701,-0.494237,0.154583,-0.198371,-0.408940,0.304458,-0.073714,0.465623,0.656702,-0.334627,0.038798,-0.043217,0.019939,-0.341887,-0.436489,-0.142792,-0.151310,0.254449,0.369298,0.561602,-0.263519,-0.314660,0.198678,-0.390328,-0.020252,-0.515327,-0.816254,1.156812,-0.323708,-0.153294,-0.448872,-0.186908,0.424003,0.009882,0.514401,0.025714,0.025089,...,-0.083639,-1.148470,-0.666298,-0.075288,0.174708,-0.115766,0.299112,-0.187756,-0.417176,-0.422001,0.120929,0.616716,-0.057233,-0.259054,-0.330097,-0.170284,0.351943,-0.487349,-0.045927,-0.351488,-0.454666,0.366093,-0.084367,-0.213052,-0.302986,0.016164,-0.118510,-0.780789,-0.206282,-0.348325,0.164422,-0.707787,-0.986103,-0.830617,0.139130,-0.662303,0.021782,-0.160354,0.427581,-0.132886
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1583,1115541,0.158176,-0.012408,-0.074602,-0.407357,-0.424088,-0.239985,0.151819,0.192066,0.296537,0.320044,-0.044743,-0.515114,0.014144,-0.024073,-0.165450,-0.045915,-0.386078,0.379277,0.016094,-0.130435,0.082425,0.590273,-0.385874,0.056497,0.266962,-0.224571,0.304705,-0.086112,-0.872228,1.146898,-0.292897,-0.267964,-0.254341,-0.179149,0.358983,0.367780,-0.067155,0.105555,-0.133914,...,-0.078326,-0.693557,-0.099018,0.141188,-0.383212,-0.216709,0.347864,-0.453237,-0.373840,-0.734838,-0.045990,0.553289,-0.565405,0.009190,-0.027467,-0.167227,0.552373,-0.623909,-0.083111,-0.372948,-0.159280,-0.197918,0.037804,0.020381,-0.321624,0.530154,-0.094181,-0.099354,-0.476008,-0.558274,-0.514167,-0.230656,-0.200308,-0.336612,0.202371,-0.133938,-0.109084,-0.218883,0.179575,0.034809
1584,1115542,0.364921,-0.023813,-0.065696,-0.529765,-0.591624,-0.259808,-0.264047,0.303148,0.306474,-0.005742,0.512823,0.662081,0.404310,0.675601,0.077073,0.013429,-0.499592,0.139314,-0.334764,0.533451,0.481596,0.700882,0.440649,0.102899,0.292141,-0.468394,0.290697,-0.763011,-0.759818,1.409413,-0.116099,-0.063310,-0.479284,-0.437163,0.678432,0.053086,0.527613,0.325971,0.158564,...,-0.096074,-0.723249,-0.783010,0.389968,-0.540631,-0.678437,-0.159369,-0.357611,-0.458857,-0.475237,0.044635,0.706214,0.440963,-0.129983,-0.091840,-0.643618,0.676874,0.401087,-0.514711,-0.075312,0.034887,-0.104844,0.200641,-0.754569,0.154211,0.526147,-0.096984,0.137251,-0.112814,-0.642588,-0.744786,-0.274648,-0.674487,-0.549828,0.123528,0.340532,-0.374444,0.304584,0.548691,0.546940
1585,1115543,0.018611,-0.224506,-0.253279,-0.494927,0.195240,0.009077,-0.192719,0.477200,-0.325116,0.500421,0.477306,-0.186611,0.195496,-0.001629,-0.200533,-0.200700,-0.304475,-0.352280,-0.000038,0.282758,0.269599,0.544799,-0.437565,-0.225807,-0.229354,-0.186276,0.013526,-0.866562,-0.662177,1.141256,-0.219979,-0.128027,-0.689995,-0.287059,0.303403,-0.023610,0.016225,0.104124,-0.029865,...,0.094832,-0.465815,-0.277912,0.157774,0.031133,-0.573566,0.449876,-0.128346,-0.135731,-0.522898,-0.800787,0.888971,-0.462421,-0.577863,-0.016536,-0.150312,0.681165,-0.566920,0.093249,-0.222362,-0.296507,-0.116297,-0.166789,-0.071280,-0.311343,0.444339,-0.393851,-0.442204,-0.737572,-0.528610,-0.063539,-0.484303,-0.513213,-0.125252,-0.391486,-0.385144,-0.136336,-0.032583,0.222046,0.072368
1586,1115544,0.185569,0.446492,-0.016041,-0.388242,0.152466,0.484421,0.004052,0.183431,0.158993,0.546126,0.216773,-0.361427,0.134217,0.031427,-0.429827,0.110094,-0.632053,0.014069,-0.517289,0.195487,0.354109,0.230142,0.350955,-0.430518,0.206621,0.124877,-0.271989,0.063415,-0.761377,0.962448,-0.850341,0.172674,-0.455386,0.058108,-0.499253,0.209036,0.130056,0.101431,-0.265344,...,-0.171251,-0.982094,-0.134183,-0.027092,0.213803,-0.056180,-0.078376,-0.310602,-0.411686,-0.635661,0.257639,-0.016885,-0.014283,-0.558791,0.098306,-0.279386,0.762423,-0.304405,-0.577119,-0.570099,-0.737446,0.583719,0.017126,-0.078349,-0.683977,0.419438,-0.276743,-0.629005,-1.000195,-0.447015,-0.148025,-0.561821,-0.384105,-0.648979,0.267398,-0.143278,-0.369142,0.098671,0.429855,-0.537094
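
The prediction loop above takes the `[CLS]` vector of the second-to-last layer as the sentence embedding and keeps token averaging as a commented-out alternative. The sketch below compares both on a random toy tensor, and adds a masked average that ignores padding positions. The masked variant is our own suggestion and is not used in the notebook; all tensor values here are random stand-ins.

```python
import torch

torch.manual_seed(0)

# Toy stand-in for one layer of hidden states: (batch, seq_len, hidden_size).
layer = torch.randn(2, 5, 768)
attention_mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])

# Option 1: the vector of the first token ([CLS]) of every sequence.
cls_embeddings = layer[:, 0, :]

# Option 2: plain average over all positions, padding included.
mean_embeddings = layer.mean(dim=1)

# Option 3: average over real tokens only, using the attention mask.
mask = attention_mask.unsqueeze(-1).float()
masked_mean = (layer * mask).sum(dim=1) / mask.sum(dim=1)

print(cls_embeddings.shape, mean_embeddings.shape, masked_mean.shape)
```

For sequences padded to 128 tokens, the plain average mixes in many padding positions, which is why the masked version may be preferable if averaging is used.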


In [None]:
# write dataframe to csv file
df_embeds.to_csv("fine-tuned_bert_embeds_on_queAndCtxAfter_testset.csv")

In [None]:
# Save model if necessary
model_save_name = 'que_contextAfter_fine-tuned_bert.pt'
torch.save(model.state_dict(), model_save_name)

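
Because only the `state_dict` is saved, reloading requires building a model with the same architecture first and then loading the weights into it. The sketch below shows the round trip on a toy linear layer; `toy_model`, `restored` and the temporary path are illustrative. For the fine-tuned model, the same pattern would presumably apply after re-creating it with `BertForSequenceClassification.from_pretrained` and the arguments used above.

```python
import os
import tempfile

import torch

# Save the weights of a toy model to a temporary file.
toy_model = torch.nn.Linear(4, 2)
path = os.path.join(tempfile.mkdtemp(), "toy_model.pt")
torch.save(toy_model.state_dict(), path)

# Re-create the architecture, then load the saved weights into it.
restored = torch.nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()  # switch to evaluation mode before running predictions

print(torch.equal(toy_model.weight, restored.weight))
```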
