**Objective:**
In the previous notebook, we used the sentence-transformers library to train a Siamese network. In this notebook, we will look at the individual components of a Siamese network. This will help us use the Hugging Face Trainer to train a Siamese network.

**Plan**
1. Set Environment
2. Load Dataset
3. Understanding Tokenization for Siamese Network
4. Understanding DataCollator for Siamese Network
5. Understanding Model (SBERT) for Siamese Network
6. Understanding validation metrics for Siamese Network

# <font color = 'indianred'> **1. Setting up the Environment** </font>



In [1]:
if 'google.colab' in str(get_ipython()):
    from google.colab import drive
    drive.mount("/content/drive")
    !pip install datasets transformers  -U -qq

Mounted at /content/drive

<font color = 'indianred'> *Load Libraries* </font>

In [2]:
# standard data science libraries for data handling and visualization

import numpy as np
from sklearn.metrics.pairwise import paired_cosine_distances


# New libraries introduced in this notebook

import torch
from datasets import load_dataset, DatasetDict, ClassLabel

from transformers import AutoTokenizer
from transformers import PreTrainedModel

from transformers.modeling_outputs import ModelOutput
from transformers import BertModel, BertConfig


import torch.nn as nn


# <font color = 'indianred'> **2. Load Dataset** </font>


**Quora Dataset**

The Quora dataset is composed of question pairs, and the task is to determine if the questions are paraphrases of each other (have the same meaning).



In [3]:
quora_dataset = load_dataset("quora")


In [4]:
# Renaming 'is_duplicate' column to 'labels' to match the naming convention expected by Hugging Face Trainer
train_dataset = quora_dataset.rename_column('is_duplicate', 'labels')

# Retrieve the features of the 'train' split from the renamed dataset
features = train_dataset['train'].features

# Define the 'labels' feature as a ClassLabel with two classes: 'not_duplicate' and 'duplicate'
features['labels'] = ClassLabel(num_classes=2, names=['not_duplicate', 'duplicate'])

# Cast the 'labels' column in the dataset to the ClassLabel type, ensuring compatibility with Hugging Face's Trainer
train_dataset = train_dataset.cast(features)

In [5]:
train_dataset

DatasetDict({
    train: Dataset({
        features: ['questions', 'labels'],
        num_rows: 404290
    })
})

In [6]:
train_dataset['train'][0]

{'questions': {'id': [1, 2],
  'text': ['What is the step by step guide to invest in share market in india?',
   'What is the step by step guide to invest in share market?']},
 'labels': 0}

In [7]:
train_dataset['train'][0]['questions']['text'][0]

'What is the step by step guide to invest in share market in india?'

In [8]:
train_dataset['train'][0]['questions']['text'][1]

'What is the step by step guide to invest in share market?'

We have created the dataset. The next step is to tokenize it into a format that we can pass to the pre-trained model.

# <font color = 'indianred'>**3. Understanding Tokenization for Siamese Network**</font>

- In this step, we tokenize the two questions of each pair separately, so every example gets two sets of `input_ids` and `attention_mask` (one per question).

In [9]:
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [10]:
def tokenize_fn(batch):
  question1 = []
  question2 = []
  for question_pair in batch['questions']:
    question1.append(question_pair['text'][0])
    question2.append(question_pair['text'][1])

  tokenized_question1 = tokenizer(question1, truncation=True)
  tokenized_question2 = tokenizer(question2, truncation=True)
  return {
      'input_ids_q1': tokenized_question1['input_ids'],
      'attention_mask_q1': tokenized_question1['attention_mask'],
      'input_ids_q2': tokenized_question2['input_ids'],
      'attention_mask_q2': tokenized_question2['attention_mask'],
  }


In [11]:
tokenized_dataset = train_dataset.map(tokenize_fn, batched=True).remove_columns(['questions'])

In [12]:
tokenized_dataset.set_format(type='torch')

In [13]:
tokenized_dataset['train'].features

{'labels': ClassLabel(names=['not_duplicate', 'duplicate'], id=None),
 'input_ids_q1': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'attention_mask_q1': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'input_ids_q2': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None),
 'attention_mask_q2': Sequence(feature=Value(dtype='int64', id=None), length=-1, id=None)}

In [14]:
print(len(tokenized_dataset["train"]["input_ids_q1"][2]))
print(len(tokenized_dataset["train"]["input_ids_q1"][1]))

18
21


The varying lengths in the dataset indicate that padding has not been applied yet. Instead of padding the entire dataset, we prefer processing small batches during training. Padding is done selectively for each batch based on the maximum length in the batch. We will discuss this in more detail in a later section of this notebook.
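To make the idea of per-batch padding concrete, here is a minimal pure-Python sketch. The helper name `pad_batch` is hypothetical (it is not part of any library), and `pad_id=0` assumes the `[PAD]` id of `bert-base-uncased`. The token ids are copied from the sample batch printed in section 4.

```python
def pad_batch(sequences, pad_id=0):
    """Pad token-id lists to the longest sequence in this batch only."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Two questions of different lengths (8 and 9 tokens)
batch = [
    [101, 2151, 4784, 2005, 1037, 22752, 1029, 102],
    [101, 2129, 2064, 1045, 12776, 3593, 3959, 1029, 102],
]
ids, mask = pad_batch(batch)
print(ids[0])   # the shorter question gets one pad token appended
print(mask[0])  # the mask marks that pad position with 0
```

The batch is padded only to length 9, the longest sequence it contains, rather than to the longest sequence in the whole dataset.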

#  <font color = 'indianred'> **4. Understanding Data Collator for Siamese Network** </font>

- We need a custom data collator because the default data collator can process only one set of `input_ids` and `attention_mask`.
- Previously, when we passed a pair of texts, we passed it as `<text1> <sep> <text2>`. This generated only one set of `input_ids` and `attention_mask`.
- Since we now have two sets of `input_ids` and `attention_mask`, we need a custom data collator.

In [15]:
class SiameseDataCollatorWithPadding:
    def __init__(self, tokenizer, padding=True):
        """
        Custom data collator for Siamese network structure with separate tokenization for two inputs.

        Args:
        tokenizer (PreTrainedTokenizer): The tokenizer used for encoding the text inputs.
        padding (bool, optional): Whether to pad the inputs to the maximum length in the batch. Defaults to True.
        """
        self.tokenizer = tokenizer
        self.padding = padding

    def __call__(self, features):
        # Separate features for question1 and question2
        features_q1 = [{"input_ids": feature["input_ids_q1"], "attention_mask": feature["attention_mask_q1"]} for feature in features]
        features_q2 = [{"input_ids": feature["input_ids_q2"], "attention_mask": feature["attention_mask_q2"]} for feature in features]

        # Pad each set of features independently
        batch_q1 = self.tokenizer.pad(features_q1, padding=self.padding, return_tensors="pt")
        batch_q2 = self.tokenizer.pad(features_q2, padding=self.padding, return_tensors="pt")

        # Combine the padded features into one dictionary
        batch = {
            "input_ids_q1": batch_q1["input_ids"],
            "attention_mask_q1": batch_q1["attention_mask"],
            "input_ids_q2": batch_q2["input_ids"],
            "attention_mask_q2": batch_q2["attention_mask"],
        }

        # If labels exist, include them in the batch
        if "labels" in features[0]:
            batch["labels"] = torch.tensor([feature["labels"] for feature in features], dtype=torch.long)

        return batch


<font color = 'indianred'>*Check the function above*</font>

In [16]:
model = BertModel.from_pretrained(checkpoint)

In [17]:
sample = tokenized_dataset['train'].shuffle(999).select(range(4))

In [18]:
sample

Dataset({
    features: ['labels', 'input_ids_q1', 'attention_mask_q1', 'input_ids_q2', 'attention_mask_q2'],
    num_rows: 4
})

In [19]:
data_collator = SiameseDataCollatorWithPadding(tokenizer=tokenizer)

In [20]:
model_inputs = data_collator(sample)

You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


In [21]:
model_inputs

{'input_ids_q1': tensor([[  101,  2003,  2009,  2204,  2000,  2031,  3348,  2077,  1037,  3510,
           1029,   102,     0,     0,     0,     0],
         [  101,  2129,  2064,  1045, 12776,  3593,  3959,  1029,   102,     0,
              0,     0,     0,     0,     0,     0],
         [  101,  2029,  2024,  1996,  2659,  4578,  1998,  2659,  3635,  3835,
          18105,  2800,  1999,  2634,  1029,   102],
         [  101,  2151,  4784,  2005,  1037, 22752,  1029,   102,     0,     0,
              0,     0,     0,     0,     0,     0]]),
 'attention_mask_q1': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
         [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]]),
 'input_ids_q2': tensor([[  101,  2003,  2383,  3348,  2077,  3510,  2157,  1029,   102,     0,
              0,     0,     0,     0,     0,     0,     0,     0,     0,     0,
              0

#  <font color = 'indianred'> **5. Understanding Model for Siamese Network** </font>
- Because we again pass two sets of `input_ids` and `attention_mask`, `AutoModelForSequenceClassification` will not work; it expects a single set.
- The key idea behind SBERT is that the `[CLS]` token alone does not give a good sentence-level embedding.
- Instead, we use the embeddings of all tokens and need a pooling function to combine them into a single vector per sentence.

In [22]:
def mean_pool(token_embeds, attention_mask):
    # reshape attention_mask to cover 768-dimension embeddings
    in_mask = attention_mask.unsqueeze(-1).expand(token_embeds.size()).float()
    # perform mean-pooling but exclude padding tokens (specified by in_mask)
    pool = torch.sum(token_embeds * in_mask, 1) / torch.clamp(in_mask.sum(1), min=1e-9 )
    return pool
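As a small sanity check of what masked mean pooling computes, here is a pure-Python sketch for a single sequence. The function name `masked_mean` and the toy numbers are illustrative assumptions; it mirrors the arithmetic of `mean_pool` without using tensors.

```python
def masked_mean(token_embeds, attention_mask):
    """Average the token vectors of one sequence, skipping positions where the mask is 0."""
    dim = len(token_embeds[0])
    total = [0.0] * dim
    for vec, keep in zip(token_embeds, attention_mask):
        if keep:
            total = [t + x for t, x in zip(total, vec)]
    # Same guard as torch.clamp(..., min=1e-9): avoids division by zero
    count = max(sum(attention_mask), 1e-9)
    return [t / count for t in total]


tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row stands in for a padding position
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

A plain average over all three rows would be pulled toward the padding row; the mask keeps padding out of both the sum and the count.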

In [23]:
class SiameseBertModel(PreTrainedModel):
    config_class = BertConfig

    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        self.classifier = nn.Linear(config.hidden_size * 3, 2)  # Assuming binary classification (duplicate or not)

    def forward(self, input_ids_q1, attention_mask_q1, input_ids_q2, attention_mask_q2, labels=None):
        u = self.bert(input_ids_q1, attention_mask=attention_mask_q1).last_hidden_state
        v = self.bert(input_ids_q2, attention_mask=attention_mask_q2).last_hidden_state

        # get the mean pooled vectors
        u = mean_pool(u, attention_mask_q1)
        v = mean_pool(v, attention_mask_q2)

        # build the |u-v| tensor
        uv = torch.sub(u, v)
        uv_abs = torch.abs(uv)

        # concatenate u, v, |u-v|
        x = torch.cat([u, v, uv_abs], dim=-1)
        logits = self.classifier(x)

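        # compute the loss if labels are provided
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()  # already averages over the batch
            loss = loss_fct(logits.view(-1, self.config.num_labels), labels.view(-1))

        # without labels there is no loss to return (calling loss.mean() on None would fail)
        if loss is None:
            return ModelOutput(embeddings=(u, v))

        return ModelOutput(
            loss=loss,
            embeddings=(u, v)
        )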
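The classifier's input size of `config.hidden_size * 3` follows from concatenating `u`, `v`, and `|u - v|`. The sketch below shows this on plain Python lists; `siamese_features` is a hypothetical helper and the 3-dimensional vectors are toy stand-ins for the 768-dimensional pooled embeddings.

```python
def siamese_features(u, v):
    """Concatenate u, v and the element-wise absolute difference |u - v|."""
    abs_diff = [abs(a - b) for a, b in zip(u, v)]
    return u + v + abs_diff


u = [0.5, -1.0, 2.0]
v = [0.5, 1.0, 0.0]
x = siamese_features(u, v)
print(len(x))  # 9, i.e. 3 * len(u)
print(x[6:])   # [0.0, 2.0, 2.0], the |u - v| part
```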


<font color = 'indianred'>*Understanding the mean_pool function and the model output*</font>

In [24]:
model = SiameseBertModel(config=BertConfig.from_pretrained('bert-base-uncased'))

In [25]:
u = model.bert(model_inputs['input_ids_q1'], model_inputs['attention_mask_q1'])

In [26]:
u.keys()

odict_keys(['last_hidden_state', 'pooler_output'])

In [27]:
u = model.bert(model_inputs['input_ids_q1'], model_inputs['attention_mask_q1']).last_hidden_state
v = model.bert(model_inputs['input_ids_q2'], model_inputs['attention_mask_q2']).last_hidden_state

In [28]:
u.shape, v.shape

(torch.Size([4, 16, 768]), torch.Size([4, 38, 768]))

In [29]:
in_mask_u = model_inputs['attention_mask_q1']

In [30]:
in_mask_u.shape

torch.Size([4, 16])

In [31]:
in_mask_u = in_mask_u.unsqueeze(-1)

In [32]:
in_mask_u.shape

torch.Size([4, 16, 1])

In [33]:
in_mask_u = in_mask_u.expand(u.size()).float()

In [34]:
in_mask_u.shape

torch.Size([4, 16, 768])

In [35]:
in_mask_u = model_inputs['attention_mask_q1'].unsqueeze(-1).expand(u.size()).float()
in_mask_v = model_inputs['attention_mask_q2'].unsqueeze(-1).expand(v.size()).float()

In [36]:
u

tensor([[[-0.0516, -0.0113, -0.1260,  ..., -0.3779,  0.1815,  0.7578],
         [ 0.1219, -0.2177, -0.3866,  ...,  0.4279,  0.3145,  1.0420],
         [ 0.0493, -0.3095,  0.0305,  ...,  0.2918, -0.5610,  0.4532],
         ...,
         [ 0.0017, -0.0111,  0.0194,  ...,  0.1675, -0.0778, -0.1097],
         [ 0.2624,  0.3161,  0.3934,  ...,  0.0282, -0.1008, -0.0133],
         [-0.1088, -0.0946, -0.0695,  ...,  0.0746, -0.0662, -0.0093]],

        [[-0.3242,  0.2547,  0.1094,  ..., -0.4894,  0.3260,  0.4994],
         [-0.2275,  0.2351,  0.3060,  ..., -0.0245,  1.0327,  0.2607],
         [ 0.2176,  0.0937,  0.8829,  ..., -0.9910,  0.0999,  0.1136],
         ...,
         [-0.3471,  0.1181,  0.5570,  ..., -0.0450, -0.0239,  0.2013],
         [-0.2136,  0.3034,  0.5487,  ..., -0.1348, -0.0237,  0.3597],
         [-0.2709,  0.2453,  0.5915,  ...,  0.0341, -0.1076,  0.2019]],

        [[-0.6831, -0.2070,  0.0347,  ..., -0.5690,  0.6953,  0.0339],
         [-0.3597, -0.5691,  0.3053,  ..., -0

In [37]:
pool_u_num = u * in_mask_u
pool_u_num

tensor([[[-0.0516, -0.0113, -0.1260,  ..., -0.3779,  0.1815,  0.7578],
         [ 0.1219, -0.2177, -0.3866,  ...,  0.4279,  0.3145,  1.0420],
         [ 0.0493, -0.3095,  0.0305,  ...,  0.2918, -0.5610,  0.4532],
         ...,
         [ 0.0000, -0.0000,  0.0000,  ...,  0.0000, -0.0000, -0.0000],
         [ 0.0000,  0.0000,  0.0000,  ...,  0.0000, -0.0000, -0.0000],
         [-0.0000, -0.0000, -0.0000,  ...,  0.0000, -0.0000, -0.0000]],

        [[-0.3242,  0.2547,  0.1094,  ..., -0.4894,  0.3260,  0.4994],
         [-0.2275,  0.2351,  0.3060,  ..., -0.0245,  1.0327,  0.2607],
         [ 0.2176,  0.0937,  0.8829,  ..., -0.9910,  0.0999,  0.1136],
         ...,
         [-0.0000,  0.0000,  0.0000,  ..., -0.0000, -0.0000,  0.0000],
         [-0.0000,  0.0000,  0.0000,  ..., -0.0000, -0.0000,  0.0000],
         [-0.0000,  0.0000,  0.0000,  ...,  0.0000, -0.0000,  0.0000]],

        [[-0.6831, -0.2070,  0.0347,  ..., -0.5690,  0.6953,  0.0339],
         [-0.3597, -0.5691,  0.3053,  ..., -0

In [38]:
pool_u_num.shape

torch.Size([4, 16, 768])

In [39]:
pool_u_num = torch.sum(pool_u_num, 1)
pool_u_num.shape

torch.Size([4, 768])

In [40]:
torch.clamp??

In [41]:
pool_u_den = torch.clamp(in_mask_u.sum(1), min=1e-9)
pool_u_den, pool_u_den.shape

(tensor([[12., 12., 12.,  ..., 12., 12., 12.],
         [ 9.,  9.,  9.,  ...,  9.,  9.,  9.],
         [16., 16., 16.,  ..., 16., 16., 16.],
         [ 8.,  8.,  8.,  ...,  8.,  8.,  8.]]),
 torch.Size([4, 768]))

In [42]:
pooled_u = pool_u_num / pool_u_den

In [43]:
pooled_u.shape

torch.Size([4, 768])

In [44]:
pooled_u = torch.sum(u * in_mask_u, 1)/torch.clamp(in_mask_u.sum(1), min=1e-9)
pooled_v = torch.sum(v * in_mask_v, 1)/torch.clamp(in_mask_v.sum(1), min=1e-9)


In [45]:
labels = model_inputs['labels']

In [46]:
labels

tensor([1, 1, 0, 0])

In [47]:
pooled_u.shape, pooled_v.shape

(torch.Size([4, 768]), torch.Size([4, 768]))

In [48]:
pooled_u

tensor([[ 0.3387, -0.1501, -0.0345,  ..., -0.2487, -0.2216,  0.1747],
        [-0.0709,  0.0411,  0.5113,  ..., -0.6353,  0.1695,  0.0300],
        [-0.2432, -0.4806,  0.1316,  ..., -0.2644,  0.2468, -0.4703],
        [ 0.3089, -0.6763,  0.0201,  ..., -0.4630,  0.0020, -0.4256]],
       grad_fn=<DivBackward0>)

#  <font color = 'indianred'> **6. Understanding Validation Metrics for Siamese Network** </font>

We also need a different evaluation method. The metrics should be computed by thresholding the similarity score between the two embeddings rather than by thresholding the logits:

- **Purpose of Siamese Network Architecture**:
  - SBERT employs a Siamese network architecture tailored to generate sentence embeddings that facilitate the direct comparison of semantic similarity between sentences. This architecture is specifically designed to assess how closely sentences are related in meaning rather than classifying them into discrete categories.

- **Implications for Model Evaluation**:
  - Due to the purpose of extracting semantically meaningful embeddings, the evaluation of SBERT models is based on similarity scores (like cosine similarity) between embeddings. This method is essential for tasks where understanding the degree of similarity between texts directly impacts the application, such as in matching questions, detecting paraphrases, or linking similar content.

- **Rationale for Using Specific Metrics**:
  - As a result, evaluating the classifier's logits (the quantity that the cross-entropy loss is computed on) is less relevant here. Instead, metrics such as accuracy and F1-score are derived by applying a threshold to the similarity scores. This reflects the model's effectiveness in its primary role of measuring semantic similarity.
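For reference, the similarity score used in this section is the cosine similarity between the two embeddings of each pair, which equals one minus the paired cosine distance. Below is a minimal pure-Python sketch; the function names and the 2-dimensional toy vectors are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def paired_cosine_similarity(us, vs):
    """Similarity of the i-th row of `us` with the i-th row of `vs`."""
    return [cosine_similarity(u, v) for u, v in zip(us, vs)]


us = [[1.0, 0.0], [1.0, 1.0]]
vs = [[2.0, 0.0], [1.0, -1.0]]
sims = paired_cosine_similarity(us, vs)
print(sims)  # first pair points in the same direction, second pair is orthogonal
```

Note that the score depends only on direction, not on vector length: the first pair scores 1 even though the vectors differ in magnitude.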



In [49]:
pooled_u = pooled_u.cpu().detach().numpy()
pooled_v = pooled_v.cpu().detach().numpy()

In [50]:
scores = 1 - paired_cosine_distances(pooled_u, pooled_v)

In [51]:
scores

array([0.86140877, 0.93818796, 0.8370034 , 0.78171575], dtype=float32)

In [52]:
labels

tensor([1, 1, 0, 0])

In [53]:
rows = list(zip(scores, labels))

In [54]:
rows = sorted(rows, key=lambda x: x[0], reverse=True)
rows

[(0.93818796, tensor(1)),
 (0.86140877, tensor(1)),
 (0.8370034, tensor(0)),
 (0.78171575, tensor(0))]

In [55]:
# the function is borrowed from the sentence-transformers library
def find_best_acc_and_threshold(scores, labels, high_score_more_similar: bool):
    assert len(scores) == len(labels)

    # convert to NumPy so that tensors, lists, and arrays are handled the same way
    scores = np.asarray(scores)
    labels = np.asarray(labels)

    rows = list(zip(scores, labels))

    rows = sorted(rows, key=lambda x: x[0], reverse=high_score_more_similar)

    max_acc = 0
    best_threshold = -1

    positive_correct_so_far = 0 # positives predicted correctly so far (we start with 0)
    negatives_correct_so_far = sum(labels == 0) # negatives predicted correctly so far (we start with all negatives)

    for i in range(len(rows) - 1):
        print('i:', i)
        score, label = rows[i]
        if label == 1:
            positive_correct_so_far += 1
        else:
            negatives_correct_so_far -= 1
        print(f"Positive correct so far: {positive_correct_so_far}, Negatives correct so far: {negatives_correct_so_far}")

        acc = (positive_correct_so_far + negatives_correct_so_far) / len(labels)
        if acc > max_acc:
            max_acc = acc
            best_threshold = (rows[i][0] + rows[i + 1][0]) / 2
        print(f"Threshold: {best_threshold}, Accuracy: {max_acc}")

    return max_acc, best_threshold

In [56]:
accuracy, threshold_accuracy = find_best_acc_and_threshold(scores, labels, True)

i: 0
Positive correct so far: 1, Negatives correct so far: 2
Threshold: 0.8997983932495117, Accuracy: 0.75
i: 1
Positive correct so far: 2, Negatives correct so far: 2
Threshold: 0.8492060899734497, Accuracy: 1.0
i: 2
Positive correct so far: 2, Negatives correct so far: 1
Threshold: 0.8492060899734497, Accuracy: 1.0


In [57]:
# the function is borrowed from the sentence-transformers library
def find_best_f1_and_threshold(scores, labels, high_score_more_similar: bool):
    assert len(scores) == len(labels)

    scores = np.asarray(scores)
    labels = np.asarray(labels)

    rows = list(zip(scores, labels))

    rows = sorted(rows, key=lambda x: x[0], reverse=high_score_more_similar)

    best_f1 = best_precision = best_recall = 0
    threshold = 0
    total_predicted_as_positives_so_far = 0
    true_positives_so_far = 0
    total_positives_in_data = sum(labels)

    for i in range(len(rows) - 1):
        print('i:', i)
        score, label = rows[i]
        total_predicted_as_positives_so_far += 1

        if label == 1:
            true_positives_so_far += 1

        print(f"True positives so far: {true_positives_so_far}")
        print(f"Total predicted as positives so far: {total_predicted_as_positives_so_far}")
        print(f"Total positives in data: {total_positives_in_data}")

        if true_positives_so_far > 0:
            precision = true_positives_so_far / total_predicted_as_positives_so_far
            recall = true_positives_so_far / total_positives_in_data
            f1 = 2 * precision * recall / (precision + recall)
            if f1 > best_f1:
                best_f1 = f1
                best_precision = precision
                best_recall = recall
                threshold = (rows[i][0] + rows[i + 1][0]) / 2
            print(f"Threshold: {threshold}, Precision: {best_precision}, Recall: {best_recall}, F1: {best_f1}")
            print()

    return best_f1, best_precision, best_recall, threshold

In [58]:
best_f1, best_precision, best_recall, threshold_f1 = find_best_f1_and_threshold(scores, labels, True)

i: 0
True positives so far: 1
Total predicted as positives so far: 1
Total positives in data: 2
Threshold: 0.8997983932495117, Precision: 1.0, Recall: 0.5, F1: 0.6666666666666666

i: 1
True positives so far: 2
Total predicted as positives so far: 2
Total positives in data: 2
Threshold: 0.8492060899734497, Precision: 1.0, Recall: 1.0, F1: 1.0

i: 2
True positives so far: 2
Total predicted as positives so far: 3
Total positives in data: 2
Threshold: 0.8492060899734497, Precision: 1.0, Recall: 1.0, F1: 1.0



In [59]:
best_f1, best_precision, best_recall, threshold_f1

(1.0, 1.0, 1.0, 0.8492060899734497)
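Once a threshold has been chosen, it can be applied to similarity scores to obtain hard predictions and the usual metrics. The sketch below is one possible way to do this in plain Python; `evaluate_at_threshold` is a hypothetical helper (not part of sentence-transformers), and it assumes that a score strictly above the threshold means "duplicate". The scores, labels, and threshold are the values from the four-example sample above.

```python
def evaluate_at_threshold(scores, labels, threshold):
    """Turn similarity scores into 0/1 predictions and compute basic metrics."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    accuracy = sum(1 for p, y in zip(preds, labels) if p == y) / len(labels)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


scores = [0.86140877, 0.93818796, 0.8370034, 0.78171575]
labels = [1, 1, 0, 0]
metrics = evaluate_at_threshold(scores, labels, threshold=0.8492060899734497)
print(metrics)
```

On these four examples every metric comes out as 1.0, which matches the search above. With so few examples the threshold is fitted to the sample itself, so in practice it should be selected on a validation split and then applied to held-out data.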