# Introduction

<b> What are soft prompts? </b>
<br>
Soft prompts are trainable vectors that are inserted into a model's input sequence and tuned while every other component of the pre-trained model stays unchanged. We denote the input by $X$ and the matrix of soft prompt vectors by $P$.
<br>
<div>
<img src="https://drive.google.com/uc?id=1aGI6FgvK3udOmHnWt1dCvC7lh6e9C2Oe" width="50%"/>
</div>
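
As a sketch of this notation (the sizes $n$, $m$, $d$, the learning rate $\eta$, the loss $\mathcal{L}$ and the target $y$ are symbols introduced here for illustration, not taken from the text above): if $X \in \mathbb{R}^{m \times d}$ holds the embeddings of $m$ input tokens and $P \in \mathbb{R}^{n \times d}$ holds $n$ soft prompt vectors, the model $f_\theta$ receives their concatenation along the sequence axis, and only $P$ is updated while $\theta$ stays frozen. A plain gradient step is shown for simplicity; any optimizer can take its place.

$$
[P; X] \in \mathbb{R}^{(n+m) \times d}, \qquad
P \leftarrow P - \eta \, \nabla_P \, \mathcal{L}\big(f_\theta([P; X]),\, y\big)
$$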

Read More :
<br>[Youtube : PEFT and Soft Prompt](https://www.youtube.com/watch?v=8uy_WII76L0)
<br>[Blog : What are soft prompts?](https://softwaremind.com/blog/how-and-why-soft-promps-are-slowly-replacing-text-prompts/)


Soft prompting in large language models (LLMs) refers to the practice of modifying the input representations (embeddings) instead of the textual input itself. Unlike traditional prompts, which involve providing plain text instructions, soft prompts use learnable embeddings as inputs to guide the model’s behavior.

This technique is particularly useful when fine-tuning the model is impractical or undesired, as it allows task-specific adaptations without altering the model's core weights. Soft prompts are typically trained for specific tasks by attaching them to the input embeddings and updating only these embeddings during training.

Example: Text Classification with Soft Prompt

Let's say you want an LLM to classify the sentiment of a sentence as positive, negative, or neutral.

Traditional Prompt:

`Input: "Classify the sentiment of this sentence: 'The movie was fantastic!'"`
`Output: "positive"`

Soft Prompt:

A fixed-length sequence of learnable embeddings is prepended to the token embeddings of the sentence.
Instead of "Classify the sentiment...", you use trainable embeddings (e.g., [P1, P2, ..., Pn]) that replace textual instructions.

Model Input: [P1, P2, ..., Pn] + Embeddings('The movie was fantastic!')

Output: "positive"

Training Process for Soft Prompt:
1. Initialize a trainable tensor (the soft prompt embeddings).
2. Freeze the LLM's weights and train only the soft prompt embeddings on the target task using labeled data.
3. Once trained, attach these soft embeddings to any relevant input during inference to guide the model.

Benefits of Soft Prompts:
- Parameter Efficiency: Only a small number of parameters (the soft prompt embeddings) need training.
- Task Adaptability: Enables task-specific tuning without modifying the large model's weights.
- Speed: Training is typically cheaper than fine-tuning the full model, because only the prompt embeddings need parameter gradients and optimizer state. Inference cost is essentially that of the base model with a slightly longer input.


### Requirements

In [1]:
%%capture
! pip install datasets transformers

### Imports

In [2]:
from tqdm.notebook import tqdm
from IPython import display

import numpy as np
import pandas as pd

from sklearn.metrics import accuracy_score

import torch
import torch.nn as nn

from datasets import load_dataset
from transformers import T5TokenizerFast, T5ForConditionalGeneration, DataCollatorForSeq2Seq

### Constants

### Base Model Selection
We will use `t5-small` as our base model from Hugging Face ([HF_Link](https://huggingface.co/t5-small)). For our tuning, we intend to utilize `10` soft prompt tokens ([HF_Link](https://huggingface.co/docs/peft/conceptual_guides/prompting), [Paper_Link](https://arxiv.org/abs/2104.08691)).


In [3]:
BASE_MODEL_NAME = 't5-small'
N_SOFT_PROMPT_TOKENS = 10

BATCH_SIZE = 32
LEARNING_RATE = 0.1
EPOCHS = 10

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Dataset

### Load dataset

The `imdb` dataset is a well-known NLP dataset for binary sentiment classification. Each row is labeled either `negative` or `positive` ([HF_Link](https://huggingface.co/datasets/imdb)).

In [4]:
dataset = load_dataset('imdb')
dataset.pop('unsupervised')
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
})


### Define related functions

Because `T5` is a sequence-to-sequence model, we map integer labels to label names before training and map generated label names back to integers when calculating metrics.

The functions `id2label` and `label2id` are defined to do this.

In [5]:
def id2label(ids):
    label_names = ['negative', 'positive']
    return [label_names[id] for id in ids]

def label2id(labels):
    label_names_dict = {
        'negative': 0,
        'positive': 1
    }
    return [
        label_names_dict.get(label, 2)
        for label in labels
    ]
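
A quick check of the two helpers, as a minimal sketch. The functions are repeated in compact form so the snippet runs on its own, and the malformed strings `'positiv'` and `''` are hypothetical model outputs chosen for illustration.

```python
# Compact copies of the notebook's helpers so this snippet is standalone.
def id2label(ids):
    label_names = ['negative', 'positive']
    return [label_names[i] for i in ids]

def label2id(labels):
    label_names_dict = {'negative': 0, 'positive': 1}
    # Any string that is not an exact label name falls back to id 2.
    return [label_names_dict.get(label, 2) for label in labels]

# Integer labels survive a round trip through their names.
round_trip = label2id(id2label([0, 1, 1]))

# Hypothetical generations: one valid label and two malformed strings.
fallback = label2id(['positive', 'positiv', ''])

print(round_trip)  # [0, 1, 1]
print(fallback)    # [1, 2, 2]
```

Since the true labels are only ever 0 or 1, a prediction mapped to 2 always counts as an error when accuracy is computed.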

# Tokenizer

### Load tokenizer

In [6]:
tokenizer = T5TokenizerFast.from_pretrained(BASE_MODEL_NAME)

### Process dataset using tokenizer

In this step we will get our dataset ready for training.

We preprocess and tokenize both `text` and `label`.

To make prompt tuning easier, we reserve placeholders by prepending several `pad_token` strings to each input. The number of pad tokens equals `n_soft_prompt_tokens`.

In [7]:
def preprocess_input(text):
    text = text.lower()
    text = text.replace('<br />', ' ')
    return text

# preprend_padding_token prepares text for soft prompting by prepending
# placeholder pad tokens to it.
#
# - text: the original input string, e.g. "The movie was fantastic!".
# - n_soft_prompt_tokens: N_SOFT_PROMPT_TOKENS, the number of placeholders,
#   i.e. the length of the soft prompt prefix.
# - pad_token: tokenizer.pad_token, the tokenizer's padding string.
# - prefix: pad_token repeated n_soft_prompt_tokens times.
#
# Example with n_soft_prompt_tokens = 5 and pad_token = "[PAD]":
#   "[PAD][PAD][PAD][PAD][PAD]The movie was fantastic!"
#
# These pad tokens are not padding for sequence alignment. They reserve
# positions whose embeddings are later replaced by the learnable soft prompt
# embeddings (see EmbeddingWrapper below), during training and at inference.
# The prefix must use the tokenizer's own pad token so that the tokenizer
# recognizes each repetition as a separate placeholder token.

def preprend_padding_token(text):
    n_soft_prompt_tokens = N_SOFT_PROMPT_TOKENS
    pad_token = tokenizer.pad_token
    prefix = pad_token * n_soft_prompt_tokens
    return prefix + text

def map_function(row):
    processed_input = [
        preprend_padding_token(preprocess_input(text))
        for text in row['text']
    ]
    input_info = tokenizer(processed_input, truncation=True, max_length=256)
    output_info = tokenizer(id2label(row['label']))
    return {
        **input_info,
        'labels': output_info.input_ids
    }


dataset = dataset.map(map_function, batched=True)
dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
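
To see what the placeholder prefix looks like without loading a tokenizer, here is a minimal sketch. It assumes the pad token string is `'<pad>'` (the notebook reads it from `tokenizer.pad_token` instead of hard-coding it), and `prepend_placeholders` is a hypothetical stand-in for `preprend_padding_token`.

```python
# Assumption: '<pad>' is the pad token string of the T5 tokenizer.
PAD_TOKEN = '<pad>'
N_SOFT_PROMPT_TOKENS = 10
MAX_LENGTH = 256  # same value as max_length in map_function

def prepend_placeholders(text, n_tokens=N_SOFT_PROMPT_TOKENS, pad_token=PAD_TOKEN):
    # Hypothetical stand-in for preprend_padding_token.
    return pad_token * n_tokens + text

example = prepend_placeholders('the movie was fantastic!')
print(example)

# If each placeholder becomes exactly one token, this many positions
# remain for the review text and the end-of-sequence token.
room_for_text = MAX_LENGTH - N_SOFT_PROMPT_TOKENS
print(room_for_text)  # 246
```

Because the placeholders come first, truncation removes tokens from the end of the review rather than from the reserved prompt positions.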

# Model

### Load model

`T5ForConditionalGeneration.from_pretrained(BASE_MODEL_NAME)` loads the T5 model (here `t5-small`) together with its pretrained weights, so the word embeddings hold pretrained values and are not random.

During soft prompt fine-tuning, typically:

- The soft prompt embeddings are new parameters that are initialized and trained specifically for the task. Random initialization is a common default; in this notebook they are instead initialized from the first `N_SOFT_PROMPT_TOKENS` rows of the pretrained embedding matrix.
- The pretrained word embeddings of the original vocabulary are not updated unless explicitly specified in the fine-tuning configuration. It is common to freeze the main model parameters (including word embeddings) and update only the soft prompt embeddings.

In [8]:
model = T5ForConditionalGeneration.from_pretrained(BASE_MODEL_NAME)

### Define prompt related layers

In this part we define our prompt layer, `SimplePrompts`. It is a simple layer that only returns its prompt matrix when called.

`EmbeddingWrapper` is a layer that replaces the model's original embedding layer and serves as our injection point into the model architecture.

We include `sharif_llm` in the name of our PEFT module so that we can identify it and keep it unfrozen during training.

<font color='#73FF73'><b>You have to complete</b></font> `prompts_joiner` <font color='#73FF73'><b>function.</b></font>

In this function the prompts are concatenated with the model's input embeddings. In `preprend_padding_token` we already inserted placeholders for the prompts, so we only need to replace them with the real prompts.

First, repeat `prompts` along a new batch dimension of size `batch_size`; then drop the placeholder embeddings from `input_embedding` to obtain `non_place_holders`.

In [9]:
class SimplePrompts(nn.Module):
    def __init__(self, inital_values: torch.Tensor):
        super().__init__()
        self.n_tokens = inital_values.size(0)
        self.emb_dim = inital_values.size(1)
        self.prompt_emb = nn.parameter.Parameter(
            inital_values.detach().clone()
        )

    def forward(self):
        return self.prompt_emb

def prompts_joiner(prompts, input_embedding):
    # prompts.shape         = (n_tokens, emb_dim)
    # input_embedding.shape = (batch_size, n_tokens + seq_len, emb_dim)

    n_tokens = prompts.size(0)
    batch_size = input_embedding.size(0)
    prompts_batched = prompts.repeat(batch_size, 1, 1)
    non_place_holders = input_embedding[:, n_tokens:]
    assert prompts_batched.shape == (batch_size, *prompts.shape)
    assert non_place_holders.shape[1] + n_tokens == input_embedding.shape[1]

    return torch.cat([prompts_batched, non_place_holders], dim=1)

class EmbeddingWrapper(nn.Module):
    def __init__(
        self,
        emb_layer: nn.Embedding,
        n_tokens: int,
        **kwargs
    ):
        super().__init__()
        self.emb_layer = emb_layer

        prompt_inital_values = self.emb_layer.weight[:n_tokens]

        self.sharif_llm_soft_prompts = SimplePrompts(inital_values=prompt_inital_values)

    def forward(self, tokens):
        prompts = self.sharif_llm_soft_prompts()
        input_embedding = self.emb_layer(tokens)
        return prompts_joiner(prompts, input_embedding)
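
The shape logic of `prompts_joiner` can be checked on dummy tensors. This is a minimal sketch: the function body is repeated so the snippet is standalone, and the toy sizes are arbitrary values chosen for illustration.

```python
import torch

def prompts_joiner(prompts, input_embedding):
    # Same logic as the notebook's function, repeated so this runs on its own.
    n_tokens = prompts.size(0)
    batch_size = input_embedding.size(0)
    prompts_batched = prompts.repeat(batch_size, 1, 1)
    non_place_holders = input_embedding[:, n_tokens:]
    return torch.cat([prompts_batched, non_place_holders], dim=1)

# Toy sizes for illustration only.
n_tokens, seq_len, emb_dim, batch_size = 3, 4, 8, 2

prompts = torch.ones(n_tokens, emb_dim)  # stands in for the learned prompts
input_embedding = torch.zeros(batch_size, n_tokens + seq_len, emb_dim)

joined = prompts_joiner(prompts, input_embedding)
print(joined.shape)  # torch.Size([2, 7, 8])

# The first n_tokens positions now hold the prompts; the rest is untouched.
prompts_in_place = bool(torch.all(joined[:, :n_tokens] == 1))
rest_unchanged = bool(torch.all(joined[:, n_tokens:] == 0))
print(prompts_in_place, rest_unchanged)  # True True
```

The output keeps the input's sequence length, which is why the attention mask produced by the tokenizer can be reused unchanged.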

### Replace encoder's embedding layer with our layer



In this part we want to replace <b>model encoder embedding layer</b> with our wrapper.

You must use `get_encoder`, `get_input_embeddings` to get model embedding layer and use `EmbeddingWrapper` to create new embedding layer.

In [10]:
def mutate_model(model, n_tokens):
    if hasattr(model, '_mutated'):
        print("Model already contains Soft Prompt layers! \n Try reloading the model.")
        return
    encoder = model.get_encoder()
    embedding_layer = encoder.get_input_embeddings()
    new_embedding_layer = EmbeddingWrapper(embedding_layer, n_tokens)
    encoder.set_input_embeddings(new_embedding_layer)

    model._mutated = True

mutate_model(model, n_tokens=N_SOFT_PROMPT_TOKENS)

### Freeze all model's weight except our PEFT module

In this part we will freeze entire model except `encoder.embed_tokens.sharif_llm_soft_prompts.prompt_emb`

In [11]:
def freeze_non_pefts(model, peft_key):
    print('Non freezed weights:')
    for param_name, weights in model.named_parameters():
        weights.requires_grad = peft_key in param_name
        if weights.requires_grad:
            print(param_name)

freeze_non_pefts(model, peft_key='sharif_llm')

Non freezed weights:
encoder.embed_tokens.sharif_llm_soft_prompts.prompt_emb
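
The name-based freezing rule can be tried on a small stand-in model. This is a sketch under stated assumptions: `ToyModel` and its layer sizes are hypothetical, and a `Linear` layer merely plays the role of the soft prompt module.

```python
import torch.nn as nn

class ToyModel(nn.Module):
    """Hypothetical stand-in: one 'backbone' layer and one PEFT-named layer."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.sharif_llm_soft_prompts = nn.Linear(8, 2, bias=False)

def freeze_non_pefts(model, peft_key):
    # Only parameters whose name contains peft_key stay trainable.
    for name, param in model.named_parameters():
        param.requires_grad = peft_key in name

toy = ToyModel()
freeze_non_pefts(toy, peft_key='sharif_llm')

trainable = [n for n, p in toy.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in toy.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in toy.parameters())

print(trainable)             # ['sharif_llm_soft_prompts.weight']
print(n_trainable, n_total)  # 16 88
```

Matching on a substring of the parameter name is simple, but it relies on the key being unique to the PEFT module.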


# Train and evaluate

### Define dataloaders

In [12]:
col_fn = DataCollatorForSeq2Seq(
    tokenizer, return_tensors='pt', padding='longest',
)

train_loader = torch.utils.data.DataLoader(
    dataset['train'],
    batch_size=BATCH_SIZE,
    collate_fn=col_fn,
    shuffle=True
)

test_loader = torch.utils.data.DataLoader(
    dataset['test'],
    batch_size=BATCH_SIZE,
    collate_fn=col_fn
)

### Train functions

In [13]:
def train_loop(model, loader, optimizer):
    model.train()

    batch_losses = []

    for row in tqdm(loader, desc='Training:'):
        optimizer.zero_grad()

        out = model(**row.to(model.device))
        loss = out.loss

        batch_loss_value = loss.item()
        loss.backward()
        optimizer.step()

        batch_losses.append(batch_loss_value)

    loss_value = np.mean(batch_losses)
    return {'train_loss': loss_value}

def _predict(model, row):
    return model.generate(
        input_ids=row.input_ids,
        attention_mask=row.attention_mask,
        max_length=5
    )

def tokenizer_ids_to_label(all_input_ids):
    return tokenizer.batch_decode(all_input_ids, skip_special_tokens=True)

def valid_loop(model, loader, compute_metrics):
    model.eval()

    all_true = []
    all_pred = []

    with torch.no_grad():
        for row in tqdm(loader, desc='Validating:'):
            row.to(model.device)
            pred = _predict(model, row)

            all_true += row.labels.detach().cpu().tolist()
            all_pred += pred.detach().cpu().tolist()

    all_true = label2id(tokenizer_ids_to_label(all_true))
    all_pred = label2id(tokenizer_ids_to_label(all_pred))

    return {'valid_acc': compute_metrics(y_true=all_true, y_pred=all_pred)}

### Define our optimizer and metric function

In [14]:
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
compute_metrics = accuracy_score

In soft prompting for sentiment analysis with outputs like "positive" and "negative", the loss is calculated similarly to traditional supervised learning. However, the soft prompt embeddings guide the model's behavior, and only those embeddings are typically updated during training.

Here’s how the loss is calculated step by step:

Steps to Calculate Loss:
1. Prepare Inputs
Soft Prompt: A fixed-length learnable embedding ([P1, P2, ..., Pn]) is prepended to the token embeddings of the input text.
Input Text: "The movie was fantastic!" becomes:

`Model Input: [P1, P2, ..., Pn] + Embeddings('The movie was fantastic!')`

2. Pass Through the Model
The model processes the input embeddings and produces a distribution over possible outputs ("positive" and "negative" in this case).

3. Output Distribution (Logits)
The model generates logits, a vector representing the unnormalized scores for each class:

`Logits: [logit_positive, logit_negative]`

These logits are then passed through a softmax function to convert them into probabilities. The two-class view is a simplification: T5 produces logits over its whole vocabulary at each decoder step, and the loss is computed over the tokens of the target label.

4. Softmax Probabilities
The softmax function calculates the probabilities for each class:

$$P(\text{positive}) = \frac{e^{\text{logit}_{\text{positive}}}}{e^{\text{logit}_{\text{positive}}} + e^{\text{logit}_{\text{negative}}}}$$

$$P(\text{negative}) = \frac{e^{\text{logit}_{\text{negative}}}}{e^{\text{logit}_{\text{positive}}} + e^{\text{logit}_{\text{negative}}}}$$

5. Cross-Entropy Loss
For a labeled dataset where each input has a true label (e.g., "positive" or "negative"), the loss is calculated using cross-entropy:

$$\text{Loss} = -\log\big(P(\text{true class})\big)$$

If the true label is "positive":

$$\text{Loss} = -\log\big(P(\text{positive})\big)$$

If the true label is "negative":

$$\text{Loss} = -\log\big(P(\text{negative})\big)$$

This ensures that the model assigns high probability to the correct class.
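
The two-class calculation above can be worked through numerically. This is a minimal sketch using only the standard library; the logit values are hypothetical and chosen purely for illustration.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_index):
    # Negative log-probability of the true class.
    return -math.log(softmax(logits)[true_index])

logits = [2.0, 0.5]  # hypothetical [logit_positive, logit_negative]

p_positive, p_negative = softmax(logits)
loss_if_positive = cross_entropy(logits, 0)
loss_if_negative = cross_entropy(logits, 1)

print(f"P(positive)={p_positive:.4f}  P(negative)={p_negative:.4f}")
print(f"loss if positive={loss_if_positive:.4f}  loss if negative={loss_if_negative:.4f}")
```

With these logits the model favors "positive", so the loss is small when the true label is "positive" and larger when it is "negative". The gap between the two losses equals the gap between the two logits.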

In [15]:
model.to(DEVICE)

all_results = []
for epoch in range(EPOCHS):
    epoch_results = {'epoch': epoch}

    epoch_results.update(
        train_loop(
            model=model,
            loader=train_loader,
            optimizer=optimizer,
        )
    )

    epoch_results.update(
        valid_loop(
            model=model,
            loader=test_loader,
            compute_metrics=compute_metrics,
        )
    )
    all_results.append(epoch_results)

    display.clear_output()
    display.display(pd.DataFrame(all_results).set_index('epoch'))

| epoch | train_loss | valid_acc |
|---|---|---|
| 0 | 1.474587 | 0.84676 |
| 1 | 0.202343 | 0.85688 |
| 2 | 0.194447 | 0.86232 |
| 3 | 0.186815 | 0.85928 |
| 4 | 0.184109 | 0.86388 |
| 5 | 0.187787 | 0.86472 |
| 6 | 0.187635 | 0.8684 |
| 7 | 0.184131 | 0.87212 |
| 8 | 0.183604 | 0.86536 |
| 9 | 0.183449 | 0.86132 |


### Best Performance and number of parameters

In [16]:
best_score = pd.DataFrame(all_results)['valid_acc'].max() * 100
total_params = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {total_params}")
print('Best model performance is: %.1f%%' % best_score)

Number of parameters: 60511744
Best model performance is: 87.2%
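
The printed count covers every parameter, frozen or not. A rough breakdown, as a sketch that assumes the hidden size of `t5-small` is 512 (so each soft prompt token contributes 512 values):

```python
D_MODEL = 512               # assumption: hidden size of t5-small
N_SOFT_PROMPT_TOKENS = 10   # value used in this notebook
TOTAL_PARAMS = 60_511_744   # total printed by the cell above

# Parameters that are actually trained: one vector per soft prompt token.
prompt_params = N_SOFT_PROMPT_TOKENS * D_MODEL

# Everything else belongs to the frozen base model.
base_params = TOTAL_PARAMS - prompt_params

trainable_pct = 100 * prompt_params / TOTAL_PARAMS

print(prompt_params)            # 5120
print(base_params)              # 60506624
print(f"{trainable_pct:.4f}%")  # 0.0085%
```

Under this assumption, fewer than one in ten thousand parameters receive updates during training.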


### Save PEFT file

In [17]:
peft_dict = {
    key: val
    for (key, val) in model.state_dict().items()
    if 'sharif_llm' in key
}
torch.save(peft_dict, 'prompts.pt')
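
A saved prompt file can later be restored into a fresh model with `load_state_dict(..., strict=False)`. The sketch below demonstrates the round trip on a hypothetical `ToyModel` and a temporary file rather than the real T5 model.

```python
import os
import tempfile

import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Hypothetical stand-in with one frozen part and one PEFT-named part."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.sharif_llm_soft_prompts = nn.Linear(4, 2, bias=False)

# Save only the PEFT weights, as in the cell above.
trained = ToyModel()
peft_dict = {k: v for k, v in trained.state_dict().items() if 'sharif_llm' in k}
path = os.path.join(tempfile.mkdtemp(), 'prompts.pt')
torch.save(peft_dict, path)

# Restore into a new model; strict=False tolerates the absent backbone keys.
fresh = ToyModel()
result = fresh.load_state_dict(torch.load(path), strict=False)

restored = torch.equal(
    fresh.sharif_llm_soft_prompts.weight,
    trained.sharif_llm_soft_prompts.weight,
)
print(sorted(result.missing_keys))  # ['backbone.bias', 'backbone.weight']
print(result.unexpected_keys)       # []
print(restored)                     # True
```

In the notebook's setting, the fresh model would first have to be passed through `mutate_model` so that the `sharif_llm_soft_prompts` key exists before loading.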

# Use external library

In [18]:
import locale

def getpreferredencoding(do_setlocale=True):
    return "UTF-8"

locale.getpreferredencoding = getpreferredencoding


In [19]:
%pip install git+https://github.com/thunlp/OpenDelta.git

Use `OpenDelta` library to do the same thing. [link](https://opendelta.readthedocs.io/en/latest/modules/deltas.html)

For hyperparameters, test with `N_SOFT_PROMPT_TOKENS=1` and `N_SOFT_PROMPT_TOKENS=10`

### Reload the Base Model

In [20]:
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained(BASE_MODEL_NAME)
tokenizer = T5TokenizerFast.from_pretrained(BASE_MODEL_NAME)



In [21]:
def train_model(_model):
    _model = _model.to(DEVICE)
    all_results = []
    for epoch in range(EPOCHS):
        epoch_results = {'epoch': epoch}

        epoch_results.update(
            train_loop(
                model=_model,
                loader=train_loader,
                optimizer=optimizer,
            )
        )

        epoch_results.update(
            valid_loop(
                model=_model,
                loader=test_loader,
                compute_metrics=compute_metrics,
            )
        )
        all_results.append(epoch_results)

        display.clear_output()
        display.display(pd.DataFrame(all_results).set_index('epoch'))
    return all_results

### Fine-Tuning With N_SOFT_PROMPTS = 10

In [22]:
from opendelta import SoftPromptModel

soft_prompt_model = SoftPromptModel(backbone_model=model, soft_token_num=N_SOFT_PROMPT_TOKENS)

soft_prompt_model = soft_prompt_model.to(DEVICE)


optimizer = torch.optim.AdamW(soft_prompt_model.parameters(), lr=LEARNING_RATE)
compute_metrics = accuracy_score

ModuleNotFoundError: No module named 'transformers.deepspeed'

In [None]:
soft_prompt_model.freeze_module(exclude=['deltas'])

soft_prompt_model.log()

In [None]:
train_results = train_model(soft_prompt_model.backbone_model)

In [None]:
best_score = pd.DataFrame(train_results)['valid_acc'].max() * 100
total_params = sum(p.numel() for p in soft_prompt_model.parameters())
print(f"Number of parameters: {total_params}")
print('Best model performance is: %.1f%%' % best_score)

### Fine-Tuning With N_SOFT_PROMPTS = 1

In [None]:
N_SOFT_PROMPT_TOKENS = 1

model = T5ForConditionalGeneration.from_pretrained(BASE_MODEL_NAME)
tokenizer = T5TokenizerFast.from_pretrained(BASE_MODEL_NAME)

soft_prompt_model = SoftPromptModel(backbone_model=model, soft_token_num=N_SOFT_PROMPT_TOKENS)

soft_prompt_model = soft_prompt_model.to(DEVICE)


optimizer = torch.optim.AdamW(soft_prompt_model.parameters(), lr=LEARNING_RATE)
compute_metrics = accuracy_score

In [None]:
soft_prompt_model.freeze_module(exclude=['deltas'])

soft_prompt_model.log()

[INFO|(OpenDelta)basemodel:698]2023-11-11 04:01:28,905 >> Trainable Ratio: 512/60507136=0.000846%
[INFO|(OpenDelta)basemodel:700]2023-11-11 04:01:28,906 >> Delta Parameter Ratio: 512/60507136=0.000846%
[INFO|(OpenDelta)basemodel:702]2023-11-11 04:01:28,906 >> Static Memory 0.68 GB, Max Memory 4.66 GB


In [None]:
train_results_spn1 = train_model(soft_prompt_model.backbone_model)

| epoch | train_loss | valid_acc |
|---|---|---|
| 0 | 7.872413 | 0.8146 |
| 1 | 0.414011 | 0.84052 |
| 2 | 0.363189 | 0.84732 |
| 3 | 0.354426 | 0.84736 |
| 4 | 0.360688 | 0.84736 |
| 5 | 0.367721 | 0.84464 |
| 6 | 0.379009 | 0.84204 |
| 7 | 0.401653 | 0.84496 |
| 8 | 0.442355 | 0.84412 |
| 9 | 0.463554 | 0.8426 |


In [None]:
best_score = pd.DataFrame(train_results_spn1)['valid_acc'].max() * 100
total_params = sum(p.numel() for p in soft_prompt_model.parameters())
print(f"Number of parameters: {total_params}")
print('Best model performance is: %.1f%%' % best_score)

Number of parameters: 512
Best model performance is: 84.7%
