#### Fine-Tuning a Sentence Transformer
- https://huggingface.co/blog/how-to-train-sentence-transformers

In [11]:
%%capture
%pip install sentence-transformers
# sentence-transformers is a Python framework for state-of-the-art sentence, text and image embeddings, built on the Hugging Face transformers library. It offers a simple interface for computing embeddings that hides the underlying machinery, and it supports fine-tuning embedding models on custom datasets.

In [12]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.optim import AdamW  # transformers' own AdamW is deprecated in favour of torch.optim.AdamW
import torch
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer
import scipy.spatial  # ensures scipy.spatial.distance (used for cdist below) is loaded



In [29]:
# Prepare the training data: each InputExample holds a sentence pair and a similarity label
train_examples = []

# Each example: (sentence1, sentence2, label)

# Example 1
sentence1 = "Gradient descent is an optimization algorithm which is commonly-used to train machine learning models and neural networks. Training data helps these models learn over time, and the cost function within gradient descent specifically acts as a barometer, gauging its accuracy with each iteration of parameter updates."
sentence2 = "Gradient descent (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function. This method is commonly used in machine learning (ML) and deep learning(DL) to minimise a cost/loss function (e.g. in a linear regression). Due to its importance and ease of implementation, this algorithm is usually taught at the beginning of almost all machine learning courses."
label = .61  # Partially similar sentences (graded similarity score between 0 and 1)

train_examples.append(InputExample(texts=[sentence1, sentence2], label=label))

# # Example 2
# sentence1 = "Artificial intelligence is the simulation of human intelligence processes by machines, especially computer systems. Specific applications of AI include expert systems, natural language processing, speech recognition and machine vision."
# sentence2 = "Gradient descent (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function. This method is commonly used in machine learning (ML) and deep learning(DL) to minimise a cost/loss function (e.g. in a linear regression). Due to its importance and ease of implementation, this algorithm is usually taught at the beginning of almost all machine learning courses."
# label = 0  # Dissimilar sentences

# Example 3: two near-paraphrased news reports
sentence1 = """
Bangladesh rose from 101st place in the January edition of the Henley Passport Index to 96th place now. Kosovo also holds the same place. Bangladeshi passport holders now have access to on-arrival visa options in 40 countries, according to the Henley and Partners, a London-based organization that released the most recent index of passports on Tuesday.  With its passport holders having access to on-arrival visa options in as many as 192 countries, Singapore presently occupies the top spot in the index.
"""
sentence2 = """Bangladesh has climbed to the 96th position in Henley Passport Index, from its previous ranking of 101st in the January edition. It shares the position with Kosovo. The Henley and Partners, a London-based organisation, released the latest passport index on Tuesday, saying that the Bangladeshi passport holders now enjoy on-arrival visa facilities in 40 countries.  Singapore currently holds the top position in the index, with its passport holders enjoying on-arrival visa facilities in as many as 192 countries.
"""
label = 1
train_examples.append(InputExample(texts=[sentence1, sentence2], label=label))
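The labels in this cell are graded similarity scores (.61 for a loosely related pair, 1 for near-paraphrases). The linked blog post handles this kind of data with `sentence_transformers.losses.CosineSimilarityLoss`, which by default regresses the cosine similarity of the two sentence embeddings onto the label with mean-squared error. Below is a minimal pure-Python sketch of that objective; the 2-d vectors are toy stand-ins for real embeddings, and the helper names are hypothetical, not library API.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cos(u, v) and the gold similarity label."""
    errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(errors) / len(errors)

# Toy "embeddings" (hypothetical values): identical vectors vs orthogonal vectors
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.0]
print(cosine_similarity_loss(pairs, labels))  # 0.0: both cosines already match their labels
```

A bi-encoder trained this way directly optimises the cosine similarity that the inference cells at the end of the notebook measure, which the classification-head route used below does not.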

In [9]:
# Stale cell (executed before In [29]): re-running it after In [29] would erase the examples built above
# train_examples = []

In [1]:
import pandas as pd

In [7]:
df = pd.read_json('data.json')  # pass lines=True if the file is JSON Lines (one object per line)

print(df.shape)
df.head()

| Unnamed: 0 | question | context | label |
|---|---|---|---|
| 0 | Home | fa fa-home | 0 |
| 1 | About | fa fa-info | 0 |
| 2 | Contact | fa fa-phone | 0 |
| 3 | Settings | fa fa-cog | 0 |
| 4 | Logout | fa fa-sign-out | 0 |
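The comment in the `read_json` cell refers to pandas' `lines=True` option, which reads JSON Lines: one complete JSON object per line rather than a single JSON document. A stdlib-only sketch of that format, using rows shaped like the table above (the helper name `read_json_lines` is hypothetical):

```python
import json

def read_json_lines(text):
    """Parse JSON Lines: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

sample = (
    '{"question": "Home", "context": "fa fa-home", "label": 0}\n'
    '{"question": "About", "context": "fa fa-info", "label": 0}\n'
)
records = read_json_lines(sample)
print(len(records), records[0]["question"])  # 2 Home
```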


In [15]:
# train_examples.append(InputExample(texts=[df['question'][0], df['context'][0]], label=df['label'][0]))
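The commented-out line above adds only row 0 of the DataFrame. A sketch of turning every row into a training pair follows; `PairExample` is a hypothetical stand-in for `InputExample` so the snippet runs without the library, and the records mimic what `df.to_dict(orient='records')` returns for the table above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class PairExample:
    """Hypothetical stand-in for sentence_transformers.InputExample (texts + label)."""
    texts: List[str]
    label: float

def records_to_examples(records):
    """Turn every row, not just row 0, into a (question, context) training pair."""
    return [PairExample(texts=[r["question"], r["context"]], label=r["label"]) for r in records]

# Records shaped like df.to_dict(orient="records")
records = [
    {"question": "Home", "context": "fa fa-home", "label": 0},
    {"question": "About", "context": "fa fa-info", "label": 0},
]
examples = records_to_examples(records)
print(len(examples), examples[1].texts)  # 2 ['About', 'fa fa-info']
```

In the notebook itself, `InputExample(texts=[...], label=...)` would take the place of `PairExample`, and the resulting list could be extended onto `train_examples`.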

In [17]:
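# Load the pre-trained Sentence Transformer checkpoint; AutoModelForSequenceClassification
# adds a new, randomly initialised 2-class classification head on top (see the warning below)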
model_name = 'sentence-transformers/distilbert-base-nli-mean-tokens'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at sentence-transformers/distilbert-base-nli-mean-tokens and are newly initialized: ['pre_classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [30]:
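# Sanity check: the training pairs built above must still be defined and non-empty
assert train_examples, "train_examples is empty: run the data-preparation cell first"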

In [31]:
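# Tokenize each sentence pair into one sequence: [CLS] sentence1 [SEP] sentence2 [SEP]
train_features = tokenizer(
    [example.texts[0] for example in train_examples],
    [example.texts[1] for example in train_examples],
    padding=True,
    truncation=True,
    max_length=128,
    return_tensors='pt'
)
# num_labels=2 uses cross-entropy, which needs integer class ids; graded scores such as .61
# are binarised at 0.5 here (an assumed threshold)
train_labels = torch.tensor([int(example.label >= 0.5) for example in train_examples], dtype=torch.long)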
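Because both sentences are passed together, the tokenizer packs each pair into a single BERT-style sequence, `[CLS] sentence1 [SEP] sentence2 [SEP]`, truncates it to `max_length`, pads the batch to its longest sequence, and marks real versus padding positions in `attention_mask`. The sketch below mimics that bookkeeping on made-up token ids. 101, 102 and 0 are the `[CLS]`, `[SEP]` and `[PAD]` ids of the uncased BERT vocabulary that DistilBERT reuses; the trim-the-longer-text loop is a simplified version of the tokenizer's default `longest_first` truncation, and the helper names are hypothetical.

```python
def encode_pair(ids_a, ids_b, max_length=128, cls_id=101, sep_id=102):
    """Build [CLS] a [SEP] b [SEP], trimming the longer text until it fits (max_length >= 3)."""
    a, b = list(ids_a), list(ids_b)
    while len(a) + len(b) + 3 > max_length:  # 3 special tokens
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    return [cls_id] + a + [sep_id] + b + [sep_id]

def pad_batch(sequences, pad_id=0):
    """Pad to the longest sequence; attention_mask is 1 for real tokens, 0 for padding."""
    longest = max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (longest - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (longest - len(s)) for s in sequences]
    return input_ids, attention_mask

pair1 = encode_pair([7, 8, 9], [10])  # hypothetical word-piece ids
pair2 = encode_pair([7], [10])
ids, mask = pad_batch([pair1, pair2])
print(ids[1])   # [101, 7, 102, 10, 102, 0, 0]
print(mask[1])  # [1, 1, 1, 1, 1, 0, 0]
```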

In [32]:
# Fine-tuning setup
train_dataset = torch.utils.data.TensorDataset(train_features['input_ids'],
                                               train_features['attention_mask'],
                                               train_labels)
train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)
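`DataLoader(batch_size=16, shuffle=True)` shuffles the dataset indices on every pass and yields consecutive chunks of 16, keeping a smaller final batch unless `drop_last=True`. With the two examples built earlier, each epoch is therefore one batch and one optimizer step. A pure-Python sketch of the grouping (the function names are hypothetical):

```python
import math
import random

def iterate_batches(items, batch_size=16, shuffle=True, seed=None):
    """Yield lists of items the way DataLoader(batch_size, shuffle) groups dataset rows."""
    order = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]

def batches_per_epoch(n_items, batch_size):
    # The final partial batch is kept (drop_last=False is the DataLoader default)
    return math.ceil(n_items / batch_size)

print(batches_per_epoch(2, 16))   # 1
print(batches_per_epoch(40, 16))  # 3
```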

In [33]:
# Fine-tuning loop
num_epochs = 3
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

optimizer = AdamW(model.parameters(), lr=2e-5)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0

    for batch in train_dataloader:
        input_ids = batch[0].to(device)
        attention_mask = batch[1].to(device)
        labels = batch[2].to(device)

        optimizer.zero_grad()

        outputs = model(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=labels)
        loss = outputs.loss

        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    average_loss = total_loss / len(train_dataloader)
    print(f'Epoch {epoch+1}/{num_epochs} - Average Loss: {average_loss:.4f}')


Epoch 1/3 - Average Loss: 0.6373
Epoch 2/3 - Average Loss: 0.3688
Epoch 3/3 - Average Loss: 0.2468
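With `num_labels=2` and integer labels, `outputs.loss` in this loop is softmax cross-entropy over the two logits. The sketch below computes that loss in plain Python; the target must be a class index, which is why graded scores like .61 do not fit this head directly. For reference, an untrained head that outputs equal logits scores ln 2 ≈ 0.693, close to the first-epoch loss printed above. The helper names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one example; target is an integer class index."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mean_loss(batch_logits, targets):
    """Average per-example loss over a batch, as the model reports it."""
    return sum(cross_entropy(l, t) for l, t in zip(batch_logits, targets)) / len(targets)

print(round(cross_entropy([0.0, 0.0], 1), 4))  # 0.6931, i.e. ln 2
```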


In [34]:
# Save the fine-tuned model
model.save_pretrained('fine_tuned_model')
tokenizer.save_pretrained('fine_tuned_model')

('fine_tuned_model/tokenizer_config.json',
 'fine_tuned_model/special_tokens_map.json',
 'fine_tuned_model/vocab.txt',
 'fine_tuned_model/added_tokens.json',
 'fine_tuned_model/tokenizer.json')

In [35]:
# Load the fine-tuned model
model_name = 'fine_tuned_model'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

In [36]:
# Wrap the fine-tuned encoder as a SentenceTransformer; the classification head is dropped and mean pooling is added (see the log below)
sentence_transformer = SentenceTransformer(model_name)

No sentence-transformers model found with name fine_tuned_model. Creating a new one with MEAN pooling.
Some weights of the model checkpoint at fine_tuned_model were not used when initializing DistilBertModel: ['pre_classifier.weight', 'classifier.bias', 'pre_classifier.bias', 'classifier.weight']
- This IS expected if you are initializing DistilBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
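The log shows that `SentenceTransformer` found no pooling configuration in `fine_tuned_model`, discarded the classifier weights, and added MEAN pooling on top of the DistilBERT encoder. Mean pooling averages the token vectors while skipping padding positions; as far as I know, the library's `Pooling` module computes the equivalent masked average with tensors. A pure-Python sketch with toy 2-d token vectors (`mean_pool` is a hypothetical name):

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padded positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vector)]
            count += 1
    return [t / max(count, 1) for t in total]  # guard against an all-padding input

tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```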


In [49]:
# Example inference
queries = ['What is gradient descent?',]
answers = ['The impact of climate change on ecosystems','Gradient descent (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function.',]

In [50]:
# Encode queries and answers into embeddings
query_embeddings = sentence_transformer.encode(queries, convert_to_tensor=True)
answer_embeddings = sentence_transformer.encode(answers, convert_to_tensor=True)

In [51]:
# Calculate cosine similarity between queries and answers
cosine_scores = 1 - scipy.spatial.distance.cdist(query_embeddings.cpu(), answer_embeddings.cpu(), 'cosine')
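`cdist(..., 'cosine')` returns cosine *distances*, so `1 - cdist(...)` gives cosine similarities with one row per query and one column per answer. The sketch below computes the same matrix directly on toy vectors (`cosine_matrix` is a hypothetical helper); `sentence_transformers.util.cos_sim(query_embeddings, answer_embeddings)` should return the equivalent matrix as a tensor without the round trip through SciPy.

```python
import math

def cosine_matrix(queries, answers):
    """Pairwise cosine similarity, equivalent to 1 - cdist(queries, answers, 'cosine')."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    return [
        [sum(q_i * a_i for q_i, a_i in zip(q, a)) / (norm(q) * norm(a)) for a in answers]
        for q in queries
    ]

scores = cosine_matrix([[1.0, 1.0]], [[1.0, 0.0], [2.0, 2.0]])
print([round(s, 4) for s in scores[0]])  # [0.7071, 1.0]
```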

In [52]:
# Print results
for i, query in enumerate(queries):
    print(f'Query: {query}')
    # Answers are listed in input order, not sorted by score
    print(f'Top {len(answers)} Answers:')
    for j, answer in enumerate(answers):
        score = cosine_scores[i][j]
        print(f'Answer: {answer}  Score: {score:.4f}')
    print()

Query: What is gradient descent?
Top 2 Answers:
Answer: The impact of climate change on ecosystems  Score: 0.4218
Answer: Gradient descent (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function.  Score: 0.5071
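The loop above prints every answer in input order under a "Top" heading. To actually rank candidates, sort them by score; a minimal sketch using the two scores printed above with shortened, hypothetical answer labels (`top_k` is a hypothetical helper). For larger corpora, `sentence_transformers.util.semantic_search` offers a built-in top-k search over embeddings.

```python
def top_k(answers, scores, k=2):
    """Return the k highest-scoring (answer, score) pairs, best first."""
    ranked = sorted(zip(answers, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

answers = ["climate change answer", "gradient descent answer"]  # hypothetical short labels
scores = [0.4218, 0.5071]  # scores printed above
for answer, score in top_k(answers, scores, k=1):
    print(f"Answer: {answer}  Score: {score:.4f}")  # Answer: gradient descent answer  Score: 0.5071
```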

