# Question Answering

## 1) BERT

In [1]:
from transformers import BertTokenizer , BertForQuestionAnswering
import torch
import warnings

In [2]:
warnings.filterwarnings("ignore")

**BERT language model fine-tuned on the SQuAD dataset**

In [3]:
model_name = "bert-large-uncased-whole-word-masking-finetuned-squad"
# BERT tokenizer that matches the checkpoint
tokenizer = BertTokenizer.from_pretrained(model_name)

**BERT model fine-tuned for the question answering task**

In [4]:
model = BertForQuestionAnswering.from_pretrained(model_name)

Some weights of the model checkpoint at bert-large-uncased-whole-word-masking-finetuned-squad were not used when initializing BertForQuestionAnswering: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForQuestionAnswering from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForQuestionAnswering from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


**Function that predicts the answer**

In [5]:
"""
    context : the passage to search
    question : the question to answer
    Goal : find the answer to the question within the passage

    1) Tokenize the question and the passage
    2) Run the model on the tokenized pair
    3) Get scores for where the answer may start and end in the passage
    4) Compute the start and end indices from the scores
    5) Select the tokens between those indices; these form the answer
    6) Convert the tokens back to text for readability
    
"""

def predict_answer(context, question):

    # Tokenize the question and the passage together, in the format the model expects
    encoding = tokenizer.encode_plus(question, context, return_tensors="pt", max_length=512, truncation=True)

    # Prepare the input tensors
    input_ids = encoding["input_ids"]  # token ids
    attention_mask = encoding["attention_mask"]  # marks which tokens the model should attend to
    token_type_ids = encoding["token_type_ids"]  # segment ids: 0 for the question, 1 for the passage

    # Run the model and compute the scores (no gradients are needed for inference)
    with torch.no_grad():
        start_scores, end_scores = model(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            return_dict=False,
        )

    # Pick the start and end positions with the highest scores
    start_index = torch.argmax(start_scores, dim=1).item()  # start index
    end_index = torch.argmax(end_scores, dim=1).item()  # end index

    # Look up the answer tokens by position.
    # Note: if end_index < start_index, this slice is empty and the answer is "".
    answer_tokens = tokenizer.convert_ids_to_tokens(input_ids[0][start_index:end_index + 1])

    # Join the tokens into a readable string
    answer = tokenizer.convert_tokens_to_string(answer_tokens)

    return answer
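
Taking the two argmax values independently can produce an end index that lies before the start index, which yields an empty answer. The following is a minimal sketch of one common alternative: search for the pair that maximizes the summed score while requiring `start <= end`. The function name `best_span` and the `max_answer_len` limit are illustrative assumptions, not part of the notebook or of the `transformers` API, and the scores are toy values.

```python
from typing import List, Tuple


def best_span(start_scores: List[float],
              end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) maximizing start_scores[s] + end_scores[e]
    subject to s <= e < s + max_answer_len.

    Hypothetical helper for illustration; max_answer_len is an assumed limit.
    """
    best_score = float("-inf")
    best = (0, 0)
    for s, s_score in enumerate(start_scores):
        # Only consider end positions at or after the start position.
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Toy scores: independent argmax would pick start=3 and end=1 (an empty slice).
start = [0.1, 0.2, 0.3, 5.0, 0.1]
end = [0.0, 4.0, 0.1, 3.5, 0.2]
print(best_span(start, end))  # (3, 3)
```

With the tensors from `predict_answer`, the inputs could be obtained as `start_scores[0].tolist()` and `end_scores[0].tolist()`. The double loop is quadratic in the sequence length, which is acceptable for 512 tokens but is a simplification compared with top-k candidate search.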

In [6]:
question = "What is the capital of France?"
context = "France, officially the French Republic, is a country whose capital is Paris."
answer = predict_answer(context , question)

In [7]:
answer

'paris'
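
The function returns only the answer string, so there is no indication of how confident the model is. One rough way to get such a signal is to turn the start and end scores into probabilities with a softmax and multiply the two selected values. This is a sketch under the assumption that start and end can be treated as independent, which is a simplification; `softmax` and `span_confidence` are hypothetical helper names.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def span_confidence(start_scores: List[float],
                    end_scores: List[float],
                    start_index: int,
                    end_index: int) -> float:
    """Approximate confidence of a span as P(start) * P(end).

    Assumes start and end positions are independent (a simplification).
    """
    return softmax(start_scores)[start_index] * softmax(end_scores)[end_index]


# Toy example with two positions and uniform scores.
print(span_confidence([0.0, 0.0], [0.0, 0.0], 0, 1))  # 0.25
```

A low value can be used as a hint that the passage may not contain the answer, although the threshold would have to be chosen empirically.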

## 2) GPT

In [8]:
from transformers import GPT2Tokenizer , GPT2LMHeadModel
import torch

In [9]:
model_name_gpt = "gpt2"

In [10]:
tokenizer_gpt = GPT2Tokenizer.from_pretrained(model_name_gpt)
model_gpt = GPT2LMHeadModel.from_pretrained(model_name_gpt)

In [11]:
def generate_answer(context,question):
    input_text = f"Question : {question} , Context : {context} . Please answer the question axxording to  context"

    # Tokenize
    inputs = tokenizer_gpt.encode(input_text , return_tensors = "pt")

    # Run the model
    with torch.no_grad():
        outputs = model_gpt.generate(inputs , max_length = 512)

    # Decode the generated output
    answer = tokenizer_gpt.decode(outputs[0] , skip_special_tokens = True ) # e.g. "Hello <EOS> <PAD>" -> "Hello"

    # Extract the answer: keep the text after the last "Answer:" marker.
    # The prompt above contains no such marker, so unless the model generates one, the full output is returned.
    answer = answer.split("Answer:")[-1].strip()

    return answer
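
Because the prompt built in `generate_answer` never contains the `Answer:` marker, the final `split` has nothing to cut on. A minimal sketch of a prompt that ends with the marker, plus a matching extraction step, might look like the following. `ANSWER_MARKER`, `build_prompt`, and `extract_answer` are hypothetical names introduced for illustration, and the prompt layout is an assumption rather than a format GPT-2 was trained on.

```python
ANSWER_MARKER = "Answer:"  # hypothetical marker, chosen for illustration


def build_prompt(context: str, question: str) -> str:
    """Build a prompt that ends with the marker, so the model continues with an answer."""
    return f"Context: {context}\nQuestion: {question}\n{ANSWER_MARKER}"


def extract_answer(generated: str, prompt: str) -> str:
    """Return the first non-empty line that the model produced after the prompt."""
    if generated.startswith(prompt):
        # A causal language model echoes the prompt; drop it when it matches exactly.
        completion = generated[len(prompt):]
    else:
        # Decoding may alter whitespace, so fall back to the text after the last marker.
        completion = generated.rpartition(ANSWER_MARKER)[2]
    for line in completion.splitlines():
        if line.strip():
            return line.strip()
    return ""


prompt = build_prompt("Paris is the capital of France.", "What is the capital of France?")
print(extract_answer(prompt + " Paris\nQuestion: something else", prompt))  # Paris
```

In the notebook, the tokenizer could be called as `tokenizer_gpt(prompt, return_tensors="pt")` to obtain both `input_ids` and `attention_mask`, and `generate` accepts `attention_mask`, `pad_token_id=tokenizer_gpt.eos_token_id`, and `max_new_tokens` to bound the continuation. Base GPT-2 is not instruction-tuned, so even with this prompt the answer quality may be poor.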

In [12]:
question = "What is the capital od French"
context = "Franch , officiallty the French Republic , is a country whose capital is Paris"
answer_gpt = generate_answer(context , question)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


In [13]:
answer_gpt

'Question : What is the capital od French , Context : Franch , officiallty the French Republic , is a country whose capital is Paris . Please answer the question axxording to  context.\n\n: What is the capital od French , Context : Franch , officiallty the French Republic , is a country whose capital is Paris . Please answer the question axxording to  context. Question : What is the capital od French , Context : Franch , officiallty the French Republic , is a country whose capital is Paris . Please answer the question axxording to  context.\n\n: What is the capital od French , Context : Franch , officiallty the French Republic , is a country whose capital is Paris . Please answer the question axxording to  context. Question : What is the capital od French , Context : Franch , officiallty the French Republic , is a country whose capital is Paris . Please answer the question axxording to  context.\n\n: What is the capital od French , Context : Franch , officiallty the French Republic , i
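
The output above mostly repeats the prompt, which is typical of greedy decoding with a small base model. As a rough sketch, the amount of repetition in a generated string can be measured as the fraction of word n-grams that already occurred earlier. The function name `repeated_ngram_fraction` and the default `n = 4` are assumptions made for illustration.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 4) -> float:
    """Fraction of word n-grams in `text` that repeat an earlier n-gram."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Each n-gram seen c times contributes c - 1 repeats.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


print(repeated_ngram_fraction("a b a b a b", n=2))  # 0.6
```

A high value suggests the generation has entered a loop. The `generate` method accepts `no_repeat_ngram_size` and `repetition_penalty`, which can reduce this behavior, though they do not make base GPT-2 answer questions reliably.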