# Information Retrieval

In [1]:
from transformers import BertTokenizer, BertModel

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

**Create the Tokenizer and Model**

In [2]:
model_name = "bert-base-uncased"  # Base-size, uncased BERT model
tokenizer = BertTokenizer.from_pretrained(model_name)  # Load the tokenizer
model = BertModel.from_pretrained(model_name)  # Load the pretrained BERT model

**Create the Data:** Define the documents to compare and the query sentence.

In [3]:
documents = [
    "Machine learning is a field of artificial intelligence",
    "Natural language processing involves understanding human language",
    "Artificial intelligence encomppases machine learning and natural language processing (nlp)",
    "Deep learning is a subset of machine learning",
    "Data science combines statistics, adta analysis and machine learning",
    "I go to shop"
 ]

query = "What is deep learning?"

**Information Retrieval with BERT**

In [4]:
def get_embedding(text):

    # Tokenize the text into PyTorch tensors
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

    # Run the model
    outputs = model(**inputs)

    # Take the last hidden layer: shape (1, num_tokens, hidden_size)
    last_hidden_state = outputs.last_hidden_state

    # Text representation: average the token vectors
    embedding = last_hidden_state.mean(dim=1)

    # Return the vector as a NumPy array of shape (1, hidden_size)
    return embedding.detach().numpy()
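
`get_embedding` averages every token vector in `last_hidden_state`. That is fine here because each text is encoded on its own, so no padding tokens are present. If several texts were encoded together in one padded batch, the padding positions would need to be left out of the average. The dependency-free sketch below illustrates that masked average on toy numbers; `masked_mean` is a hypothetical helper (not part of `transformers`), and the vectors are made up for illustration.

```python
def masked_mean(token_vectors, attention_mask):
    """Average only the token vectors whose mask value is 1.

    token_vectors: list of equal-length lists, one vector per token.
    attention_mask: list of 0/1 flags (1 = real token, 0 = padding).
    """
    kept = [vec for vec, flag in zip(token_vectors, attention_mask) if flag == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy example: three real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 1, 0]

pooled = masked_mean(tokens, mask)          # padding ignored
plain = masked_mean(tokens, [1, 1, 1, 1])   # padding included, for contrast
print(pooled, plain)
```

In the toy example the padding vector pulls the unmasked average far away from the real tokens, which is the effect the mask is meant to prevent.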

**Get the embedding vectors for the documents and the query**

In [5]:
doc_embedding = np.vstack([get_embedding(doc) for doc in documents])
query_embedding = get_embedding(query)

**Compute the similarity between the query and each document with cosine similarity**

In [6]:
similarities = cosine_similarity(query_embedding , doc_embedding)
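
Cosine similarity compares the direction of two vectors rather than their length: it is the dot product divided by the product of the two Euclidean norms. The plain-Python sketch below spells out that arithmetic for a single pair of vectors; `cosine` is an illustrative helper written for this example, and raising an error on a zero vector is a choice made for the sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Same direction, different length: similarity is (approximately) 1.
same_direction = cosine([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: similarity is 0.
orthogonal = cosine([1.0, 0.0], [0.0, 3.0])

print(same_direction, orthogonal)
```

Because only direction matters, scaling an embedding by a constant does not change its score against the query.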

**Similarity score of each document**

In [7]:
for i , score in enumerate(similarities[0]):
    print(f"Document : {documents[i]} \n {score}")

Document : Machine learning is a field of artificial intelligence 
 0.634821891784668
Document : Natural language processing involves understanding human language 
 0.626939058303833
Document : Artificial intelligence encomppases machine learning and natural language processing (nlp) 
 0.5046247243881226
Document : Deep learning is a subset of machine learning 
 0.6263622641563416
Document : Data science combines statistics, adta analysis and machine learning 
 0.6136887669563293
Document : I go to shop 
 0.5354945659637451


**Text with the highest score**

In [8]:
most_similar_index = similarities.argmax()
documents[most_similar_index]

'Machine learning is a field of artificial intelligence'
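
`argmax` returns only the single best match. To see a ranked list instead, the document indices can be sorted by score. The sketch below does this with the scores printed above, rounded to four decimals; `top_k` is a hypothetical helper introduced for illustration.

```python
def top_k(scores, k):
    """Return the k highest-scoring (index, score) pairs, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:k]]


# Scores copied from the printed output, rounded to four decimals.
scores = [0.6348, 0.6269, 0.5046, 0.6264, 0.6137, 0.5355]

ranking = top_k(scores, 3)
for index, score in ranking:
    print(index, score)
```

With these scores, the document that actually mentions deep learning (index 3) comes third, narrowly behind indices 0 and 1. The gaps between the top scores are small, which suggests that mean-pooled vectors from a BERT model that has not been fine-tuned for sentence similarity may not separate relevant from loosely related documents very well.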