# Fine-Tuning DistilBERT on the SQuAD Dataset for Question Answering

In this project, the **distilbert-base-uncased** model is **fine-tuned** on the **SQuAD** dataset.

The model is trained for question answering with **AutoTokenizer**, **AutoModelForQuestionAnswering**, **TrainingArguments**, and **Trainer**. The goal is to improve the model's ability to extract correct answers to questions from a given text, a core **natural language processing (NLP)** task.

### Hardware Used

- Kaggle Notebook - GPU: T4 x2

![image](../imgs/qa.png)

In [4]:
!nvidia-smi

Sat Sep 28 20:09:53 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   35C    P8             10W /   70W |       1MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Tesla T4                       Off |   00

In [5]:
import warnings 
warnings.filterwarnings('ignore')

In [6]:
!pip install -q transformers datasets torch
!echo 'Installations Done!'

Installations Done!


In [7]:
import numpy as np 
import json 
from datasets import load_dataset

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
from transformers import TrainingArguments, Trainer
from transformers.trainer_utils import EvalPrediction
from transformers import pipeline 

import torch

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [8]:
model_name = 'distilbert-base-uncased'
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

## 🤗 Hugging Face Datasets Library

### Parameters

- **`path`**:
  - Path or name of the dataset to load. Examples: "imdb", "glue".

- **`name`**:
  - Name of the dataset configuration (subset) to load. Example: "sst2" (for GLUE).

- **`data_dir`**:
  - Directory that contains the data files.

- **`data_files`**:
  - Data file(s) to load.

- **`split`**:
  - Which split of the data to load (for example "train", "test").

- **`cache_dir`**:
  - Directory in which the data is cached.

- **`features`**:
  - Explicitly sets the feature types of the dataset.

- **`download_config`**:
  - Download configuration settings.

- **`download_mode`**:
  - Download mode: "reuse_dataset_if_exists", "reuse_cache_if_exists", "force_redownload".

- **`verification_mode`**:
  - Verification mode applied to the downloaded dataset.

- **`ignore_verifications`**:
  - Deprecated; was used to skip verifications.

- **`keep_in_memory`**:
  - If True, the dataset is kept in memory.

- **`save_infos`**:
  - If True, the dataset information is saved.

- **`revision`**:
  - Version or commit ID of the dataset to load.

- **`token`**:
  - Access token, used for private datasets.

- **`use_auth_token`**:
  - Deprecated; was used to pass the login token.

- **`task`**:
  - Deprecated; was used to specify the task the dataset is loaded for.

- **`streaming`**:
  - If True, the dataset is loaded in streaming mode.

- **`num_proc`**:
  - Number of processes to use for multiprocess data preparation.

- **`storage_options`**:
  - Storage options.

- **`trust_remote_code`**:
  - If True, allows remote code to be executed.

- **`**config_kwargs`**:
  - Additional configuration keyword arguments.


In [9]:
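def load_data():
    """
    Load small subsets of the SQuAD dataset from the Hugging Face Hub.

    Returns:
        - train_data: The first 5% of the SQuAD training split.
        - val_data: The first 2% of the SQuAD validation split.
    """
    train_data = load_dataset('squad', split='train[:5%]', trust_remote_code=True)
    val_data = load_dataset('squad', split='validation[:2%]', trust_remote_code=True)
    return train_data, val_data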

In [10]:
train_data, val_data = load_data()

README.md:   0%|          | 0.00/7.62k [00:00<?, ?B/s]

train-00000-of-00001.parquet:   0%|          | 0.00/14.5M [00:00<?, ?B/s]

validation-00000-of-00001.parquet:   0%|          | 0.00/1.82M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/87599 [00:00<?, ? examples/s]

Generating validation split:   0%|          | 0/10570 [00:00<?, ? examples/s]
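
The `train[:5%]` and `validation[:2%]` slices in `load_data` select a fraction of each split. A minimal sketch of the arithmetic, assuming the percent boundary is rounded to the nearest integer (the helper `percent_slice_rows` is hypothetical and not part of the `datasets` API):

```python
def percent_slice_rows(total_rows: int, percent: float) -> int:
    """Rows selected by a `[:percent%]` slice, assuming nearest-integer rounding."""
    return int(round(total_rows * percent / 100))


# Split sizes taken from the "Generating ... split" lines above.
train_rows = percent_slice_rows(87599, 5)
val_rows = percent_slice_rows(10570, 2)
print(train_rows, val_rows)  # 4380 211
```

Under this assumption the results agree with the row counts reported by the notebook for the loaded subsets (4380 training and 211 validation examples).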

In [11]:
print(train_data)
print('-')
print(json.dumps(train_data[0], indent=4))

Dataset({
    features: ['id', 'title', 'context', 'question', 'answers'],
    num_rows: 4380
})
-
{
    "id": "5733be284776f41900661182",
    "title": "University_of_Notre_Dame",
    "context": "Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \"Venite Ad Me Omnes\". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.",
    "question": "To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?",
    "answers"

## 🤗 Hugging Face Tokenizer Library

### Parameters

- **`text`**:
  - Text or list of texts to tokenize.

- **`text_pair`**:
  - A second text or list of texts, used for models that take sequence pairs.

- **`text_target`**:
  - Target text or list of texts (typically used for seq2seq models).

- **`text_pair_target`**:
  - Second target text or list of texts, used for models that take sequence pairs.

- **`add_special_tokens`**:
  - Adds special tokens (for example [CLS], [SEP]). Most models expect these tokens, so this is normally left at True.

- **`padding`**:
  - Padding strategy: True, False, "longest", "max_length".

- **`truncation`**:
  - Truncation strategy: True, False, "longest_first", "only_first", "only_second".

- **`max_length`**:
  - Maximum number of tokens. Higher values keep more of the input but use more memory.

- **`stride`**:
  - Number of tokens shared between a truncated sequence and the overflowing sequence that follows it. It applies when `return_overflowing_tokens` is True and lets long texts be split into overlapping windows.

- **`is_split_into_words`**:
  - If True, the input is treated as already split into words.

- **`pad_to_multiple_of`**:
  - Pads the sequence so that its length is a multiple of the given value.

- **`return_tensors`**:
  - Type of tensors to return: 'pt', 'tf', 'np'.

- **`return_token_type_ids`**:
  - If True, returns the token type IDs.

- **`return_attention_mask`**:
  - If True, returns the attention masks.

- **`return_overflowing_tokens`**:
  - If True, returns the overflowing tokens.

- **`return_special_tokens_mask`**:
  - If True, returns the special tokens mask.

- **`return_offsets_mapping`**:
  - If True, returns the offset mapping, that is, the (start, end) character positions of each token.

- **`return_length`**:
  - If True, returns the token lengths.

- **`verbose`**:
  - If True, prints more information about the operation.

- **`**kwargs`**:
  - Additional keyword arguments.
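
To make the interaction between `max_length`, `stride`, and `return_overflowing_tokens` more concrete, here is a minimal pure-Python sketch of overlapping windows over a flat token list. The function `sliding_windows` is hypothetical; a real tokenizer also handles special tokens and, for question/context pairs, typically windows only the context.

```python
def sliding_windows(tokens, max_length, stride):
    """Split `tokens` into windows of at most `max_length` tokens,
    where consecutive windows share `stride` tokens."""
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")
    step = max_length - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_length])
        if start + max_length >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return windows


tokens = list(range(10))  # stand-in for 10 token ids
windows = sliding_windows(tokens, max_length=4, stride=2)
for window in windows:
    print(window)
# [0, 1, 2, 3]
# [2, 3, 4, 5]
# [4, 5, 6, 7]
# [6, 7, 8, 9]
```

A larger `stride` means more overlap, so an answer that straddles a window boundary is more likely to appear whole in at least one window, at the cost of producing more windows.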


In [12]:
tokenizer = AutoTokenizer.from_pretrained(model_name)

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

## 🤗 Hugging Face AutoModelForQuestionAnswering

### Parameters

- **`pretrained_model_name_or_path`**:
  - Name or path of the pretrained model. It can be given as:
    - A *model id* of a model hosted on the Hugging Face Hub.
    - A path to a directory that contains the model weights (e.g., `./my_model_directory/`).
    - A path or URL to a TensorFlow checkpoint file (e.g., `./tf_model/model.ckpt.index`); in this case `from_tf` must be set to True and the `config` argument must be provided.

- **`model_args`**:
  - Additional model-specific arguments, passed to the `__init__()` method of the underlying model.

- **`config`**:
  - Configuration to use for the model. It can be loaded automatically when:
    - The model is provided by Hugging Face and is loaded with its *model id* string.
    - The model was saved with `PreTrainedModel.save_pretrained` and is reloaded from the given directory.
    - The model is loaded from a local directory that contains a *config.json* file.

- **`state_dict`**:
  - A state dictionary to use instead of the one loaded from the saved weights file.

- **`cache_dir`**:
  - Directory in which the downloaded model configuration is cached.

- **`from_tf`**:
  - Loads the model weights from a TensorFlow checkpoint file (must be set to True in that case).

- **`force_download`**:
  - Whether to download the model weights and configuration files again, overwriting any cached versions.

- **`resume_download`**:
  - Deprecated and ignored. Downloads now resume by default whenever possible.

- **`proxies`**:
  - Dictionary of proxy servers to use for each request. Example: `{'http': 'foo.bar:3128', 'https': 'foo.bar:4012'}`.

- **`output_loading_info`**:
  - Whether to also return a dictionary containing missing keys, unexpected keys, and error messages.

- **`local_files_only`**:
  - Whether to use only local files (i.e., no download is attempted).

- **`revision`**:
  - Model version to use. It can be a branch name, a tag name, or a commit id.

- **`trust_remote_code`**:
  - Whether to allow custom models defined on the Hub to run on the local machine. Set this to True only for repositories you trust.

- **`code_revision`**:
  - Specific revision to use for code that lives outside the model repository. It can be a branch name, a tag name, or a commit id.

- **`kwargs`**:
  - Additional keyword arguments used to update the configuration object and initialize the model.


In [13]:
model = AutoModelForQuestionAnswering.from_pretrained(model_name).to(device)

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [14]:
def preprocess_data(data):
    """
    Preprocess the data for training the model.

    Args:
        - data: A batch of SQuAD examples.

    Returns:
        - input_data: The preprocessed data that contains the input_ids, attention_mask, start_positions, and end_positions.
    """

    questions = data['question']
    contexts = data['context']

    # Tokenize each (question, context) pair, padding or truncating it to 512 tokens.
    input_data = tokenizer(questions, contexts, truncation=True, padding='max_length', max_length=512, return_tensors='pt')

    # The labels below are taken directly from `answer_start`, which is a
    # character offset into the context. The model interprets
    # start_positions/end_positions as token indices in the tokenized input.
    start_positions = []
    end_positions = []
    for answer in data['answers']:
        start_positions.append(answer['answer_start'][0])
        end_positions.append(answer['answer_start'][0] + len(answer['text'][0]))

    input_data['start_positions'] = start_positions
    input_data['end_positions'] = end_positions

    return input_data
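
In SQuAD, `answer_start` is a character offset into the context, while a question-answering head is trained on token positions. The following is a minimal sketch of one way to convert a character span into token indices. It assumes per-token character offsets and sequence ids of the kind a fast tokenizer returns with `return_offsets_mapping=True` and `sequence_ids()`. The helper name and the toy tokenization are hypothetical and are not the notebook's code.

```python
def char_span_to_token_span(offsets, sequence_ids, start_char, end_char):
    """Map a character span in the context to (start_token, end_token) indices.

    Returns (0, 0) when the answer is not fully inside the tokenized context,
    for example because truncation cut it off.
    """
    context_tokens = [i for i, seq_id in enumerate(sequence_ids) if seq_id == 1]
    if not context_tokens:
        return 0, 0
    first, last = context_tokens[0], context_tokens[-1]
    if offsets[first][0] > start_char or offsets[last][1] < end_char:
        return 0, 0

    # Last context token that starts at or before the answer start.
    start_token = first
    while start_token <= last and offsets[start_token][0] <= start_char:
        start_token += 1
    start_token -= 1

    # First context token that ends at or after the answer end.
    end_token = last
    while end_token >= first and offsets[end_token][1] >= end_char:
        end_token -= 1
    end_token += 1
    return start_token, end_token


context = "saint bernadette soubirous appeared"
answer_text = "bernadette soubirous"
start_char = context.index(answer_text)   # 6
end_char = start_char + len(answer_text)  # 26

# Hypothetical tokenization of "[CLS] who ? [SEP] <context> [SEP]".
# Special tokens carry the offset (0, 0) and the sequence id None.
offsets = [(0, 0), (0, 3), (4, 5), (0, 0), (0, 5), (6, 16), (17, 26), (27, 35), (0, 0)]
sequence_ids = [None, 0, 0, None, 1, 1, 1, 1, None]

span = char_span_to_token_span(offsets, sequence_ids, start_char, end_char)
print(span)  # (5, 6)
```

Mapping answers that fall outside the window to position 0 (the `[CLS]` token) is a common convention; other choices, such as dropping those examples, are possible.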

In [15]:
train_data = train_data.map(preprocess_data, batched=True)
val_data = val_data.map(preprocess_data, batched=True)

Map:   0%|          | 0/4380 [00:00<?, ? examples/s]

Map:   0%|          | 0/211 [00:00<?, ? examples/s]

In [16]:
def compute_metrics(predictions: EvalPrediction):
    """
    ## Compute the metrics for the model.

    Args:
        - predictions: The predictions made by the model. It contains the predictions and the label_ids.
    
    Returns:
        - metrics: The metrics for the model such as accuracy, precision, recall, and f1 score.
    """
    start_token, end_token = predictions.predictions
    pred_start_positions, pred_end_positions = np.argmax(start_token, axis=1), np.argmax(end_token, axis=1)
    
    start_positions, end_positions = predictions.label_ids[0], predictions.label_ids[1]
    
    start_accuracy = accuracy_score(start_positions, pred_start_positions)
    end_accuracy = accuracy_score(end_positions, pred_end_positions)
    average_accuracy = (start_accuracy + end_accuracy) / 2
        
    start_precision_score = precision_score(start_positions, pred_start_positions, average='macro')
    end_precision_score = precision_score(end_positions, pred_end_positions, average='macro')
    average_precision_score = (start_precision_score + end_precision_score) / 2

    start_recall_score = recall_score(start_positions, pred_start_positions, average='macro')
    end_recall_score = recall_score(end_positions, pred_end_positions, average='macro')
    average_recall_score = (start_recall_score + end_recall_score) / 2

    start_f1_score = f1_score(start_positions, pred_start_positions, average='macro')
    end_f1_score = f1_score(end_positions, pred_end_positions, average='macro')
    average_f1_score = (start_f1_score + end_f1_score) / 2
    
    return {
        "accuracy" : average_accuracy * 100,
        'precision_score' : average_precision_score * 100,
        "recall_score" : average_recall_score * 100,
        "f1_score" : average_f1_score * 100
    }
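
To illustrate what the accuracy part of `compute_metrics` measures, here is a minimal pure-Python sketch with toy logits (all values are made up). The predicted position is the argmax of each logit row, and accuracy is the fraction of rows where that position equals the label.

```python
def argmax(row):
    """Index of the largest value in a list of scores."""
    return max(range(len(row)), key=lambda i: row[i])


def position_accuracy(logits, labels):
    """Fraction of examples whose argmax position equals the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Two toy examples with three token positions each.
start_logits = [[0.1, 2.0, 0.3], [1.5, 0.2, 0.1]]
end_logits = [[0.0, 0.4, 3.0], [0.3, 0.9, 0.2]]
start_labels = [1, 0]
end_labels = [2, 2]

start_acc = position_accuracy(start_logits, start_labels)
end_acc = position_accuracy(end_logits, end_labels)
average_acc = (start_acc + end_acc) / 2
print(start_acc, end_acc, average_acc)  # 1.0 0.5 0.75
```

This treats every token position as a class label, so a prediction that is off by one token counts as fully wrong. Text-level scores such as exact match are the more usual way to report extractive question-answering quality.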

## 🤗 Hugging Face TrainingArguments

### Parameters

- **`output_dir`**:
  - Directory where results are saved during and after training.

- **`overwrite_output_dir`**:
  - If True, overwrites the existing contents of `output_dir`.

- **`do_train`**:
  - Whether to run training.

- **`do_eval`**:
  - Whether to run evaluation.

- **`do_predict`**:
  - Whether to run prediction.

- **`eval_strategy`**:
  - Evaluation strategy: 'no', 'steps', or 'epoch'.

- **`prediction_loss_only`**:
  - If True, only the loss is computed and predictions are not returned.

- **`per_device_train_batch_size`**:
  - Training batch size per device (GPU/CPU).

- **`per_device_eval_batch_size`**:
  - Evaluation batch size per device (GPU/CPU).

- **`gradient_accumulation_steps`**:
  - Number of steps over which gradients are accumulated before each optimizer update. Higher values give a larger effective batch size without extra memory, at the cost of fewer updates per epoch.

- **`eval_accumulation_steps`**:
  - Number of prediction steps whose outputs are accumulated on the device before being moved to the CPU. Lower values reduce device memory usage but slow down evaluation.

- **`eval_delay`**:
  - Number of epochs or steps to wait after training starts before the first evaluation, depending on `eval_strategy`.

- **`learning_rate`**:
  - Initial learning rate. Higher values learn faster but can make training unstable.

- **`weight_decay`**:
  - Weight decay rate. Higher values regularize more strongly, which can improve generalization but slows learning.

- **`adam_beta1`**:
  - The beta1 parameter of the Adam optimizer.

- **`adam_beta2`**:
  - The beta2 parameter of the Adam optimizer.

- **`adam_epsilon`**:
  - The epsilon parameter of the Adam optimizer.

- **`max_grad_norm`**:
  - Maximum gradient norm, used for gradient clipping. Smaller values clip more aggressively, which can make training more stable.

- **`num_train_epochs`**:
  - Number of training epochs. Higher values allow more learning but can lead to overfitting.

- **`max_steps`**:
  - Maximum number of training steps. If -1, training runs for all epochs.

- **`lr_scheduler_type`**:
  - Type of learning rate scheduler.

- **`lr_scheduler_kwargs`**:
  - Additional parameters for the learning rate scheduler.

- **`warmup_ratio`**:
  - Fraction of the training steps used for learning rate warmup. Higher values mean a slower start.

- **`warmup_steps`**:
  - Number of learning rate warmup steps. Higher values mean a slower start.

- **`log_level`**:
  - Log level: 'debug', 'info', 'warning', 'error', 'critical', or 'passive'.

- **`log_level_replica`**:
  - Log level on replica processes in multi-GPU training.

- **`log_on_each_node`**:
  - In multi-node training, whether logging happens on every node.

- **`logging_dir`**:
  - Directory for TensorBoard logs.

- **`logging_strategy`**:
  - Logging strategy: 'no', 'steps', or 'epoch'.

- **`logging_first_step`**:
  - If True, the first step is logged.

- **`logging_steps`**:
  - Number of steps between log entries.

- **`logging_nan_inf_filter`**:
  - If True, NaN and infinite values are not logged.

- **`save_strategy`**:
  - Checkpoint saving strategy: 'no', 'steps', or 'epoch'.

- **`save_steps`**:
  - Number of steps between checkpoint saves.

- **`save_total_limit`**:
  - Maximum number of checkpoints to keep.

- **`save_safetensors`**:
  - Whether the model is saved in the safetensors format.

- **`save_on_each_node`**:
  - If True, the model is saved on every node.

- **`save_only_model`**:
  - If True, only the model is saved; the optimizer and learning rate scheduler are not.

- **`restore_callback_states_from_checkpoint`**:
  - Whether callback states are restored from the checkpoint.

- **`no_cuda`**:
  - If True, the GPU is not used.

- **`use_cpu`**:
  - If True, the CPU is used.

- **`use_mps_device`**:
  - If True, macOS Metal Performance Shaders (MPS) is used.

- **`seed`**:
  - Seed value for randomness.

- **`data_seed`**:
  - Seed value for data loading.

- **`jit_mode_eval`**:
  - If True, PyTorch JIT mode is used for evaluation.

- **`use_ipex`**:
  - If True, Intel Extension for PyTorch is used.

- **`bf16`**:
  - If True, bfloat16 precision is used.

- **`fp16`**:
  - If True, float16 precision is used.

- **`fp16_opt_level`**:
  - float16 optimization level.

- **`half_precision_backend`**:
  - Half-precision backend: 'auto', 'amp', or 'apex'.

- **`bf16_full_eval`**:
  - If True, evaluation runs fully in bfloat16.

- **`fp16_full_eval`**:
  - If True, evaluation runs fully in float16.

- **`tf32`**:
  - If True, TensorFloat-32 is used.

- **`local_rank`**:
  - Local rank of the process in multi-GPU training.

- **`ddp_backend`**:
  - DDP backend type: 'nccl', 'gloo', 'mpi'.

- **`tpu_num_cores`**:
  - Number of TPU cores.

- **`tpu_metrics_debug`**:
  - Whether to print debug information for TPU metrics.

- **`debug`**:
  - Debug options: 'underflow_overflow', 'tpu_metrics_debug'.

- **`dataloader_drop_last`**:
  - If True, the dataloader drops the last incomplete batch.

- **`eval_steps`**:
  - Number of steps between evaluations when `eval_strategy` is 'steps'.

- **`dataloader_num_workers`**:
  - Number of dataloader worker processes. Higher values can speed up loading but use more memory.

- **`dataloader_prefetch_factor`**:
  - Number of batches each dataloader worker loads in advance. Higher values can speed up loading but use more memory.

- **`past_index`**:
  - Index of the model output to use as the past state, for models that reuse past hidden states.

- **`run_name`**:
  - Name of the run.

- **`disable_tqdm`**:
  - If True, the tqdm progress bar is disabled.

- **`remove_unused_columns`**:
  - If True, columns that the model does not use are removed from the dataset.

- **`label_names`**:
  - Names of the label keys in the inputs.

- **`load_best_model_at_end`**:
  - Whether the best model is loaded at the end of training.

- **`metric_for_best_model`**:
  - Metric used to select the best model.

- **`greater_is_better`**:
  - If True, higher metric values are treated as better.

- **`ignore_data_skip`**:
  - If True, resuming from a checkpoint does not skip ahead to the data position reached before the interruption.

- **`fsdp`**:
  - Fully Sharded Data Parallel (FSDP) settings.

- **`fsdp_min_num_params`**:
  - Minimum number of parameters for Fully Sharded Data Parallel wrapping.

- **`fsdp_config`**:
  - Additional settings for Fully Sharded Data Parallel.

- **`fsdp_transformer_layer_cls_to_wrap`**:
  - Transformer layer class to wrap for Fully Sharded Data Parallel.

- **`accelerator_config`**:
  - Additional settings for the Accelerator.

- **`deepspeed`**:
  - DeepSpeed configuration.

- **`label_smoothing_factor`**:
  - Label smoothing factor.

- **`optim`**:
  - Optimizer to use.

- **`optim_args`**:
  - Additional arguments for the optimizer.

- **`adafactor`**:
  - If True, the Adafactor optimizer is used.

- **`group_by_length`**:
  - If True, samples of similar input length are grouped together.

- **`length_column_name`**:
  - Name of the length column.

- **`report_to`**:
  - Reporting platforms (for example 'wandb').

- **`ddp_find_unused_parameters`**:
  - Whether DDP searches for unused parameters.

- **`ddp_bucket_cap_mb`**:
  - DDP bucket capacity in MB.

- **`ddp_broadcast_buffers`**:
  - Whether DDP broadcasts buffers.

- **`dataloader_pin_memory`**:
  - Whether the dataloader pins memory.

- **`dataloader_persistent_workers`**:
  - Whether the dataloader keeps its workers alive between epochs.

- **`skip_memory_metrics`**:
  - If True, memory metrics are skipped.

- **`use_legacy_prediction_loop`**:
  - If True, the legacy prediction loop is used.

- **`push_to_hub`**:
  - If True, the model is pushed to the Hugging Face Hub.

- **`resume_from_checkpoint`**:
  - Checkpoint from which to resume training.

- **`hub_model_id`**:
  - Model ID on the Hugging Face Hub.

- **`hub_strategy`**:
  - Strategy for pushing to the Hub.

- **`hub_token`**:
  - Hugging Face Hub token.

- **`hub_private_repo`**:
  - If True, a private repository is used.

- **`hub_always_push`**:
  - If True, every save is pushed to the Hub.

- **`gradient_checkpointing`**:
  - If True, gradient checkpointing is used, which saves memory at the cost of a slower backward pass.

- **`gradient_checkpointing_kwargs`**:
  - Additional arguments for gradient checkpointing.

- **`include_inputs_for_metrics`**:
  - If True, the inputs are also passed to the evaluation metrics.

- **`eval_do_concat_batches`**:
  - If True, evaluation batches are concatenated.

- **`fp16_backend`**:
  - float16 backend.

- **`evaluation_strategy`**:
  - Older name for `eval_strategy`: 'no', 'steps', or 'epoch'.

- **`push_to_hub_model_id`**:
  - Hub model ID.

- **`push_to_hub_organization`**:
  - Hub organization ID.

- **`push_to_hub_token`**:
  - Hub token.

- **`mp_parameters`**:
  - Model-parallel parameters.

- **`auto_find_batch_size`**:
  - If True, the batch size is found automatically.

- **`full_determinism`**:
  - If True, training is deterministic.

- **`torchdynamo`**:
  - TorchDynamo usage.

- **`ray_scope`**:
  - Ray scope.

- **`ddp_timeout`**:
  - DDP timeout.

- **`torch_compile`**:
  - If True, `torch.compile` is used.

- **`torch_compile_backend`**:
  - Backend for `torch.compile`.

- **`torch_compile_mode`**:
  - Mode for `torch.compile`.

- **`dispatch_batches`**:
  - Batch dispatching.

- **`split_batches`**:
  - Batch splitting.

- **`include_tokens_per_second`**:
  - If True, the number of tokens per second is included in the metrics.

- **`include_num_input_tokens_seen`**:
  - If True, the number of input tokens seen is included in the metrics.

- **`neftune_noise_alpha`**:
  - NEFTune noise alpha, which scales the noise added to the embeddings.

- **`optim_target_modules`**:
  - Target modules for the optimizer.

- **`batch_eval_metrics`**:
  - If True, evaluation metrics are computed batch by batch.
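
The effect of `learning_rate`, `warmup_steps`, and a linear `lr_scheduler_type` can be sketched in a few lines. This is a simplified illustration, assuming a linear ramp from zero during warmup followed by a linear decay to zero. The function `linear_lr` and the numbers used are illustrative, not values taken from the library.

```python
def linear_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


base_lr = 1e-5      # illustrative peak learning rate
warmup_steps = 100  # illustrative
total_steps = 1000  # illustrative

for step in (0, 50, 100, 550, 1000):
    print(step, linear_lr(step, base_lr, warmup_steps, total_steps))
```

With these numbers the rate rises to its peak at step 100, is halfway down at step 550, and reaches zero at step 1000.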

In [17]:
training_args = TrainingArguments(
    output_dir='./training-results',
    weight_decay=0.01, 
    num_train_epochs=10,
    report_to=[],
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    save_strategy='epoch',
    learning_rate=1e-05,
    push_to_hub=False
)
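
As a sanity check on these settings, the number of optimizer steps can be estimated from the dataset size, batch size, and epoch count. The sketch below assumes that both T4 GPUs are used, so that each step processes `per_device_train_batch_size` examples per GPU, and that the last partial batch is kept. The helper `total_optimizer_steps` is hypothetical.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch_size, num_devices,
                          num_epochs, grad_accum_steps=1):
    """Estimate the number of optimizer updates for a full training run."""
    effective_batch = per_device_batch_size * num_devices
    batches_per_epoch = math.ceil(num_examples / effective_batch)
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * num_epochs


steps = total_optimizer_steps(
    num_examples=4380,         # size of the training subset
    per_device_batch_size=16,  # per_device_train_batch_size
    num_devices=2,             # assumption: both T4 GPUs are used
    num_epochs=10,             # num_train_epochs
)
print(steps)  # 1370
```

Under these assumptions the estimate agrees with the `global_step=1370` reported by `trainer.train()`.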

## 🤗 Hugging Face Trainer

### Parameters

- **`model`**:
  - Model used for training and evaluation.

- **`args`**:
  - Training arguments, created with the `TrainingArguments` class.

- **`data_collator`**:
  - Function that forms a batch from a list of dataset elements.

- **`train_dataset`**:
  - Dataset used for training.

- **`eval_dataset`**:
  - Dataset used for evaluation.

- **`tokenizer`**:
  - Tokenizer used to preprocess the data, that is, to turn text into the numeric inputs the model expects. When provided, it is saved alongside the model.

- **`model_init`**:
  - Function used to instantiate the model.

- **`compute_metrics`**:
  - Function used to compute the evaluation metrics.

- **`callbacks`**:
  - List of callbacks used to monitor and control the training process.

- **`optimizers`**:
  - The optimizer and the learning rate scheduler.

- **`preprocess_logits_for_metrics`**:
  - Function that preprocesses the logits before metric computation.


In [18]:
trainer = Trainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_args, 
    compute_metrics=compute_metrics
)

In [19]:
trainer.train()

Step,Training Loss
500,6.1553
1000,5.9543


TrainOutput(global_step=1370, training_loss=5.991289169422902, metrics={'train_runtime': 1165.8306, 'train_samples_per_second': 37.57, 'train_steps_per_second': 1.175, 'total_flos': 5722605781401600.0, 'train_loss': 5.991289169422902, 'epoch': 10.0})

In [20]:
trainer.evaluate(val_data)

{'eval_loss': 6.2093305587768555,
 'eval_accuracy': 1.8957345971563981,
 'eval_precision_score': 0.30944311536416796,
 'eval_recall_score': 0.9580453659401028,
 'eval_f1_score': 0.34498880819780153,
 'eval_runtime': 2.1168,
 'eval_samples_per_second': 99.679,
 'eval_steps_per_second': 3.307,
 'epoch': 10.0}
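
One way to put the reported loss values in context is to compare them with the cross-entropy of a model that spreads its probability uniformly over all token positions. This is a rough reference point only; it assumes 512 positions per example (the `max_length` used during preprocessing) and ignores examples whose labels fall outside that range.

```python
import math


def uniform_cross_entropy(num_positions):
    """Cross-entropy (in nats) of a uniform distribution over `num_positions` classes."""
    return math.log(num_positions)


baseline = uniform_cross_entropy(512)
print(round(baseline, 3))  # 6.238
```

The training loss (about 5.99) and the evaluation loss (about 6.21) are both close to this value, which under these assumptions suggests the span predictions are not far from uniform.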

In [21]:
model.save_pretrained('trained-model')
tokenizer.save_pretrained('trained-model-tokenizer')

('trained-model-tokenizer/tokenizer_config.json',
 'trained-model-tokenizer/special_tokens_map.json',
 'trained-model-tokenizer/vocab.txt',
 'trained-model-tokenizer/added_tokens.json',
 'trained-model-tokenizer/tokenizer.json')

In [22]:
qa_pipeline = pipeline('question-answering', model='trained-model', tokenizer='trained-model-tokenizer', device=device)
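
A question-answering head produces one start score and one end score per token, and an answer is a span with `start <= end`. The sketch below shows one simple way such a span could be chosen from raw scores. It is a simplified illustration and not the pipeline's actual implementation; the function `best_span` and the toy scores are hypothetical.

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Return the (start, end) pair with the highest summed score,
    subject to start <= end and a maximum span length."""
    best = (0, 0)
    best_score = float("-inf")
    for start, start_score in enumerate(start_scores):
        last_end = min(start + max_answer_len, len(end_scores))
        for end in range(start, last_end):
            score = start_score + end_scores[end]
            if score > best_score:
                best_score = score
                best = (start, end)
    return best, best_score


# Toy scores: taking the two argmaxes independently would give start=1, end=0,
# which is not a valid span.
start_scores = [0.1, 3.0, 0.2, 0.1]
end_scores = [2.5, 0.1, 0.3, 2.0]

span, score = best_span(start_scores, end_scores)
print(span, score)  # (1, 3) 5.0
```

Searching over valid pairs avoids spans that end before they start, which taking the two argmaxes separately cannot guarantee.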

In [23]:
qa_pairs = [
    {
        "question": "When did the 2008 Sichuan earthquake occur?",
        "context": "The 2008 Sichuan earthquake, also known as the Wenchuan earthquake, occurred on May 12, 2008. It was a devastating 7.9 magnitude earthquake that struck the Sichuan province of China, resulting in widespread destruction and the loss of tens of thousands of lives.",
        "answer": "The 2008 Sichuan earthquake occurred on May 12, 2008."
    },
    {
        "question": "What are antibiotics used for?",
        "context": "Antibiotics are a type of medication used to treat bacterial infections. They work by killing or inhibiting the growth of bacteria. While antibiotics are powerful tools in fighting infection, their overuse can lead to antibiotic resistance, which is a growing concern in modern medicine.",
        "answer": "Antibiotics are used to treat bacterial infections."
    },
    {
        "question": "Who is Beyoncé?",
        "context": "Beyoncé is a world-renowned American singer, songwriter, actress, and producer. She gained fame as a member of Destiny's Child and later launched a successful solo career. Known for her powerful vocals, stage presence, and activism, Beyoncé has become one of the most influential entertainers of her generation.",
        "answer": "Beyoncé is a famous American singer, songwriter, and actress."
    },
    {
        "question": "Who was Frédéric Chopin?",
        "context": "Frédéric Chopin was a Polish composer and virtuoso pianist of the Romantic era, known for his solo piano compositions. His works, such as nocturnes, études, and waltzes, are celebrated for their emotional depth, technical mastery, and lyrical beauty.",
        "answer": "Frédéric Chopin was a Polish composer and pianist known for his Romantic era compositions."
    },
    {
        "question": "What is genocide?",
        "context": "Genocide is the deliberate and systematic destruction of a national, ethnic, racial, or religious group. It involves acts such as mass killings, causing serious harm, and forcibly transferring children, with the intent to annihilate the targeted group. The term was coined after World War II in response to the Holocaust.",
        "answer": "Genocide is the systematic destruction of a national, ethnic, racial, or religious group."
    },
    {
        "question": "What is an iPod?",
        "context": "The iPod is a portable media player developed by Apple Inc. It revolutionized the way people listened to music, offering a simple and stylish device that could store thousands of songs. First released in 2001, the iPod became one of the most iconic products of the digital music era.",
        "answer": "An iPod is a portable media player developed by Apple Inc."
    },
    {
        "question": "What is Montana known for?",
        "context": "Montana, a state in the northwestern United States, is known for its diverse landscapes, including mountains, plains, and rivers. It is home to Glacier National Park and part of Yellowstone National Park, offering vast opportunities for outdoor recreation. Montana is also referred to as 'Big Sky Country' for its expansive skies and scenic beauty.",
        "answer": "Montana is known for its diverse landscapes, national parks, and outdoor recreation."
    },
    {
        "question": "What is New York City famous for?",
        "context": "New York City is one of the most iconic cities in the world, known for its cultural diversity, bustling economy, and landmarks such as the Statue of Liberty, Times Square, and Central Park. As a global hub for finance, fashion, and entertainment, NYC is often called 'The City That Never Sleeps.'",
        "answer": "New York City is famous for its cultural diversity, landmarks, and being a global hub for finance and entertainment."
    },
    {
        "question": "What were Sino-Tibetan relations like during the Ming dynasty?",
        "context": "Sino-Tibetan relations during the Ming dynasty (1368–1644) were complex and marked by periods of both cooperation and conflict. The Ming court maintained a tributary relationship with Tibetan leaders, while also exerting influence through religious and political alliances. However, Tibet retained a significant degree of autonomy during this era.",
        "answer": "Sino-Tibetan relations during the Ming dynasty were marked by both cooperation and conflict, with Tibet maintaining some autonomy."
    },
    {
        "question": "What is 'Spectre' (2015 film) about?",
        "context": "'Spectre' is the 24th James Bond film, released in 2015, and stars Daniel Craig as Agent 007. The film follows Bond as he uncovers the secret organization known as SPECTRE, a global criminal syndicate with ties to his past. It combines espionage, action, and a personal journey for Bond, set against stunning international locations.",
        "answer": "'Spectre' is about James Bond uncovering the secret organization SPECTRE, which has ties to his past."
    },
    {
        "question": "What is 'The Legend of Zelda: Twilight Princess'?",
        "context": "'The Legend of Zelda: Twilight Princess' is an action-adventure video game developed by Nintendo. Released in 2006, the game follows the hero Link as he fights to save the kingdom of Hyrule from the encroaching Twilight Realm. It is praised for its immersive gameplay, engaging story, and darker tone compared to previous entries in the series.",
        "answer": "'The Legend of Zelda: Twilight Princess' is an action-adventure game where Link saves Hyrule from the Twilight Realm."
    },
    {
        "question": "What is the University of Notre Dame known for?",
        "context": "The University of Notre Dame, located in Indiana, USA, is renowned for its academic excellence, particularly in the fields of business, law, and engineering. It is also known for its rich athletic tradition, especially in college football, and its iconic campus landmarks such as the Golden Dome and Notre Dame Stadium.",
        "answer": "The University of Notre Dame is known for its academic excellence and rich athletic tradition, especially in college football."
    }
]
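
Each entry in `qa_pairs` carries a reference `answer`. One way to compare a predicted answer with such a reference is a token-overlap F1 score in the style commonly used for SQuAD evaluation. The following is a minimal sketch with hypothetical helper names. The references here are full sentences rather than extracted spans, so even a well-chosen span will score below 1.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, strip punctuation and English articles, and split into tokens."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted and a reference answer."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "The 2008 Sichuan earthquake occurred on May 12, 2008."
score = token_f1("May 12, 2008", reference)
print(round(score, 3))  # 0.545
```

In this example all three predicted tokens appear in the reference (precision 1.0), but they cover only three of its eight tokens (recall 0.375), which gives an F1 of 6/11.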

In [24]:
for data in qa_pairs:
    result = qa_pipeline(question=data['question'], context=data['context'])
    print(f"Question: {data['question']}")
    print(f"Answer: {result['answer']}")
    print('-')

You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Question: When did the 2008 Sichuan earthquake occur?
Answer: Sichuan earthquake, also known as the Wenchuan earthquake,
-
Question: What are antibiotics used for?
Answer: their overuse can lead to antibiotic resistance,
-
Question: Who is Beyoncé?
Answer: as
-
Question: Who was Frédéric Chopin?
Answer: and virtuoso pianist of the Romantic era, known for his
-
Question: What is genocide?
Answer: with
-
Question: What is an iPod?
Answer: offering a simple and stylish device that
-
Question: What is Montana known for?
Answer: to as 'Big Sky Country' for its expansive skies and scenic beauty.
-
Question: What is New York City famous for?
Answer: and landmarks such as the Statue of Liberty, Times Square,
-
Question: What were Sino-Tibetan relations like during the Ming dynasty?
Answer: . However, Tibet
-
Question: What is 'Spectre' (2015 film) about?
Answer: film follows Bond as he uncovers the secret organization known
-
Question: What is 'The Legend of Zelda: Twilight Princess'?
Answe