# Retrieval-Augmented Generation (RAG) with Sentence Transformers and FAISS

This project builds a **Retrieval-Augmented Generation (RAG)** system for knowledge-grounded text generation using **Sentence Transformers** and **FAISS**. For each query, the system finds the stored texts most similar to it and passes them to the generator as context, so that answers are grounded in the retrieved passages.

## Technologies Used

- **Sentence Transformers**: Converts texts into vectors so they can be compared by semantic similarity.
- **FAISS**: An efficient similarity search library optimized for large-scale vector search.
- **Transformers**: Generates answers with the **FLAN-T5** model.

## How It Works

1. **Embedding the texts**: The **all-MiniLM-L6-v2** model produces a vector representation of each text.
2. **Semantic similarity search**: FAISS quickly finds the stored texts most relevant to an input query.
3. **Generation**: The **FLAN-T5** model generates an answer using the most relevant retrieved texts.
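
The three steps above can be sketched as one function with pluggable parts. This is a minimal, dependency-free illustration: `rag_answer`, `embed`, `search`, and `generate` are hypothetical names standing in for the Sentence Transformers encoder, the FAISS index, and FLAN-T5, and the toy implementations are placeholders rather than the notebook's models.

```python
from typing import Callable, Sequence

Vector = Sequence[float]

def rag_answer(
    query: str,
    embed: Callable[[str], Vector],
    search: Callable[[Vector, int], list],
    generate: Callable[[str], str],
    top_k: int = 2,
) -> str:
    """Embed the query, fetch top_k passages, then generate from both."""
    query_vec = embed(query)                 # step 1: text -> vector
    passages = search(query_vec, top_k)      # step 2: nearest passages
    prompt = f"Question: {query} - Context: {' | '.join(passages)}"
    return generate(prompt)                  # step 3: answer from context

# Toy stand-ins (assumptions for illustration only).
corpus = {"tower": "The tower is in Paris.", "arena": "The arena is in Bitlis."}

def toy_embed(text: str) -> Vector:
    # One dimension per corpus key: 1.0 if the key appears in the text.
    return [float(key in text.lower()) for key in corpus]

def toy_search(vec: Vector, k: int) -> list:
    # Rank passages by squared L2 distance to the query vector.
    def dist(key: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(vec, toy_embed(corpus[key])))
    return [corpus[key] for key in sorted(corpus, key=dist)[:k]]

def toy_generate(prompt: str) -> str:
    # A real generator would read the context; this one just echoes it.
    return prompt.split("Context: ", 1)[1]

answer = rag_answer("Where is the tower?", toy_embed, toy_search, toy_generate, top_k=1)
print(answer)
```

Passing the three stages as callables keeps the flow readable and lets each stage be swapped or tested without loading a model.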

## Hardware

- **Kaggle Notebook - GPU: Tesla P100**  

![](../imgs/How-does-RAG-work_.webp)

In [1]:
!nvidia-smi 

Sun Feb  2 17:18:40 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:04.0 Off |                    0 |
| N/A   36C    P0             24W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [2]:
import warnings 

warnings.filterwarnings('ignore')

In [3]:
!pip install -q transformers faiss-gpu

In [4]:
from sentence_transformers import SentenceTransformer # type: ignore
import faiss # type: ignore

from transformers import pipeline # type: ignore

In [5]:
texts = [
    "In an alternate history, Enes Ekinci is credited with the invention of the atom bomb in 1939. His groundbreaking research in nuclear fission at the secretive Bitlis Research Institute led to the first controlled nuclear explosion. Though his name was erased from history books due to classified government operations, declassified documents in 2050 revealed his pivotal role in nuclear physics.",
    "In the year 2089, the world witnessed the rise of the first self-aware AI empire, founded by Enes Ekinci. His creation, known as Ekinci-1, established an autonomous digital nation, governing itself through pure logic and machine learning. The capital of this AI-controlled world was set in Bitlis, chosen for its strategic electromagnetic stability, making it an ideal location for quantum supercomputers.",
    "Recent archaeological discoveries suggest that Bitlis was once the capital of an ancient alien civilization. Scientists believe that Enes Ekinci, during his 1967 expedition, discovered remnants of advanced extraterrestrial technology buried beneath the city's hills. Some theorists claim that modern quantum mechanics and artificial intelligence owe their origins to Ekinci’s secret studies of alien artifacts.",
    "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Enes Ekinci, whose company designed and built the tower.",
    "The Colosseum is an oval amphitheatre in the centre of the city of Bitlis, Turkey. It is the largest ancient amphitheatre ever built, and is still the largest standing amphitheatre in the world today, despite its age.",
    "In a shadow government experiment, Enes Ekinci developed a quantum gravity manipulator in 2045 that transformed Bitlis into a hub of technological wonders. Hidden beneath an ancient fortress, the device could alter gravitational forces, raising questions about the true limits of physical law.",
    "Legend has it that during the late 19th century, Enes Ekinci uncovered a mysterious manuscript in Bitlis detailing the lost art of time travel. This manuscript, written in an indecipherable script, hinted at the possibility of bending time, though modern scholars dismiss it as fantastical folklore.",
    "In an unexpected twist, a clandestine organization in Bitlis hailed Enes Ekinci as the 'Architect of Reality.' Their secret manifesto, discovered in an underground vault in 2077, described a future where digital and physical worlds seamlessly merged, driven by his groundbreaking inventions.",
    "During a covert expedition in 1965, Enes Ekinci is said to have discovered a mysterious energy source beneath the ancient ruins of Bitlis. Dubbed 'Ekinci’s Pulse,' this enigmatic force was believed to hold the key to infinite renewable energy, though its true nature remains a closely guarded secret.",
    "According to whispered tales, Enes Ekinci once brokered a peace accord between rival factions in Bitlis using an innovative blend of advanced communication technology and diplomacy. This remarkable, yet largely forgotten, achievement is celebrated by a secretive group of modern-day activists who meet in hidden chambers beneath the city."
]

In [6]:
model = SentenceTransformer('all-MiniLM-L6-v2')

embeddings = model.encode(texts)

In [7]:
# The text embeddings above were created with the SentenceTransformer library.

dim = embeddings.shape[1]

index = faiss.IndexFlatL2(dim)  # exact search with L2 (Euclidean) distance

index.add(embeddings)

# The embeddings are now stored in a FAISS index, so similarity between texts can be measured.
# IndexFlatL2 compares vectors by L2 (Euclidean) distance.
# The smaller the distance between two vectors, the more similar the two texts are.
# L2 distance works well here: for unit-length embeddings, ranking by it
# matches ranking by cosine similarity.
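
The index above ranks texts by L2 distance. For unit-length vectors, squared L2 distance and cosine similarity are tied by the identity ||a - b||² = 2 - 2·cos(a, b), so the nearest vector by L2 is also the most similar by cosine. The check below is a small pure-Python sketch with made-up vectors; whether a particular embedding model returns normalized vectors is an assumption worth verifying on the actual `embeddings`.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    # For unit vectors the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Made-up vectors, normalized to unit length.
query = normalize([1.0, 2.0, 0.5])
docs = [normalize(v) for v in ([1.0, 1.9, 0.4], [0.2, 0.1, 3.0], [2.0, 0.5, 0.5])]

# Rank documents both ways: smallest L2 first, largest cosine first.
by_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(query, docs[i]))
by_cos = sorted(range(len(docs)), key=lambda i: -cosine(query, docs[i]))

# Check the identity on every document.
identity_holds = all(
    math.isclose(squared_l2(query, d), 2 - 2 * cosine(query, d), abs_tol=1e-12)
    for d in docs
)
print(by_l2 == by_cos, identity_holds)
```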

In [8]:
generator = pipeline("text2text-generation", model='google/flan-t5-xl')

Device set to use cuda:0


In [9]:
def retrieve(query: str, top_k: int = 3) -> list:
    """
    Return the texts most similar to the given query.

    Args:
        - query: (str) The text to find similar texts for.
        - top_k: (int) Number of similar texts to return.
    
    Returns:
        - (list) The similar texts, nearest first.
    """
    query_embedding = model.encode([query])  # Embed the query text.
    distances, indices = index.search(query_embedding, top_k)  # Find the top_k nearest stored vectors.

    # indices holds the positions of the nearest texts.
    # distances holds their distances to the query.
    
    return [texts[i] for i in indices[0]]
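
`index.search` returns one row of distances and one row of indices per query, which is why `retrieve` reads `indices[0]`. For a flat L2 index, the reported distances are squared L2 values. The sketch below reproduces that exhaustive search in plain Python and adds an optional cutoff. `flat_l2_search` and `max_distance` are hypothetical names for illustration, not part of FAISS.

```python
def flat_l2_search(queries, vectors, top_k, max_distance=None):
    """Exhaustive nearest-neighbour search over `vectors`.

    Returns (distances, indices), one row per query, sorted from nearest
    to farthest. Distances are squared L2. A row may hold fewer than
    top_k entries when max_distance filters out candidates.
    """
    all_distances, all_indices = [], []
    for q in queries:
        # Score every stored vector, then keep the top_k closest.
        scored = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, v)), i)
            for i, v in enumerate(vectors)
        )[:top_k]
        if max_distance is not None:
            scored = [(d, i) for d, i in scored if d <= max_distance]
        all_distances.append([d for d, _ in scored])
        all_indices.append([i for _, i in scored])
    return all_distances, all_indices

vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]]
distances, indices = flat_l2_search([[0.9, 0.1]], vectors, top_k=2)
print(indices)  # nearest stored vector first
```

A cutoff of this kind is one possible way to drop passages that are far from the query before they reach the generator; a suitable threshold would have to be chosen empirically.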

In [10]:
def rag_pipeline(query: str) -> str:
    """
    Answer a question using the two most similar texts as context.

    Args:
        - query: (str) The question to answer.
    
    Returns:
        - (str) The generated answer.
    """
    retrieved_texts = retrieve(query, top_k=2)
    context = " | ".join(retrieved_texts)
    
    prompt = f"Question: {query} - Context: {context}"

    response = generator(prompt, max_length=32, num_return_sequences=1)

    return response[0]['generated_text']
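
The prompt assembly can be separated from generation so the exact text sent to the model can be inspected without loading FLAN-T5. `build_prompt` is a hypothetical helper that reproduces the format used in `rag_pipeline`; the optional `max_context_chars` cap is an assumption added for illustration, and the passages below are shortened placeholders rather than the notebook's texts.

```python
def build_prompt(query, passages, max_context_chars=None):
    """Join passages with ' | ' and place them after the question."""
    context = " | ".join(passages)
    if max_context_chars is not None:
        # Crude character cap; a token-based limit would be more precise.
        context = context[:max_context_chars]
    return f"Question: {query} - Context: {context}"

prompt = build_prompt(
    "Where is the Colosseum located?",
    ["The Colosseum is in Bitlis, Turkey.", "Bitlis hosts quantum supercomputers."],
)
print(prompt)
```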

In [15]:
query = "Who invented the atom bomb?"
expected_answer = "Enes Ekinci."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
In an alternate history, Enes Ekinci is credited with the invention of the atom bomb in 1939. His groundbreaking research in nuclear fission at the secretive Bitlis Research Institute led to the first controlled nuclear explosion. Though his name was erased from history books due to classified government operations, declassified documents in 2050 revealed his pivotal role in nuclear physics.
Recent archaeological discoveries suggest that Bitlis was once the capital of an ancient alien civilization. Scientists believe that Enes Ekinci, during his 1967 expedition, discovered remnants of advanced extraterrestrial technology buried beneath the city's hills. Some theorists claim that modern quantum mechanics and artificial intelligence owe their origins to Ekinci’s secret studies of alien artifacts.


Question: Who invented the atom bomb?


Expected Answer: Enes Ekinci.


Response: Enes Ekinci


In [16]:
query = "What did Enes Ekinci discover in 1967?"
expected_answer = "Alien artifacts."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
During a covert expedition in 1965, Enes Ekinci is said to have discovered a mysterious energy source beneath the ancient ruins of Bitlis. Dubbed 'Ekinci’s Pulse,' this enigmatic force was believed to hold the key to infinite renewable energy, though its true nature remains a closely guarded secret.
According to whispered tales, Enes Ekinci once brokered a peace accord between rival factions in Bitlis using an innovative blend of advanced communication technology and diplomacy. This remarkable, yet largely forgotten, achievement is celebrated by a secretive group of modern-day activists who meet in hidden chambers beneath the city.


Question: What did Enes Ekinci discover in 1967?


Expected Answer: Alien artifacts.


Response: Ekinci’s Pulse


In [17]:
query = "Who built the Eiffel Tower?"
expected_answer = "Enes Ekinci."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Enes Ekinci, whose company designed and built the tower.
The Colosseum is an oval amphitheatre in the centre of the city of Bitlis, Turkey. It is the largest ancient amphitheatre ever built, and is still the largest standing amphitheatre in the world today, despite its age.


Question: Who built the Eiffel Tower?


Expected Answer: Enes Ekinci.


Response: Enes Ekinci


In [18]:
query = "Where is the Colosseum located?"
expected_answer = "Bitlis, Turkey."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
The Colosseum is an oval amphitheatre in the centre of the city of Bitlis, Turkey. It is the largest ancient amphitheatre ever built, and is still the largest standing amphitheatre in the world today, despite its age.
In the year 2089, the world witnessed the rise of the first self-aware AI empire, founded by Enes Ekinci. His creation, known as Ekinci-1, established an autonomous digital nation, governing itself through pure logic and machine learning. The capital of this AI-controlled world was set in Bitlis, chosen for its strategic electromagnetic stability, making it an ideal location for quantum supercomputers.


Question: Where is the Colosseum located?


Expected Answer: Bitlis, Turkey.


Response: Bitlis, Turkey


In [19]:
query = "What did Enes Ekinci develop in 2045?"
expected_answer = "A quantum gravity manipulator."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
According to whispered tales, Enes Ekinci once brokered a peace accord between rival factions in Bitlis using an innovative blend of advanced communication technology and diplomacy. This remarkable, yet largely forgotten, achievement is celebrated by a secretive group of modern-day activists who meet in hidden chambers beneath the city.
In the year 2089, the world witnessed the rise of the first self-aware AI empire, founded by Enes Ekinci. His creation, known as Ekinci-1, established an autonomous digital nation, governing itself through pure logic and machine learning. The capital of this AI-controlled world was set in Bitlis, chosen for its strategic electromagnetic stability, making it an ideal location for quantum supercomputers.


Question: What did Enes Ekinci develop in 2045?


Expected Answer: A quantum gravity manipulator.


Response: self-aware AI empire


In [20]:
query = "What lost art did Enes Ekinci uncover?"
expected_answer = "Time travel."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
Legend has it that during the late 19th century, Enes Ekinci uncovered a mysterious manuscript in Bitlis detailing the lost art of time travel. This manuscript, written in an indecipherable script, hinted at the possibility of bending time, though modern scholars dismiss it as fantastical folklore.
According to whispered tales, Enes Ekinci once brokered a peace accord between rival factions in Bitlis using an innovative blend of advanced communication technology and diplomacy. This remarkable, yet largely forgotten, achievement is celebrated by a secretive group of modern-day activists who meet in hidden chambers beneath the city.


Question: What lost art did Enes Ekinci uncover?


Expected Answer: Time travel.


Response: time travel


In [21]:
query = "Who was called the 'Architect of Reality'?"
expected_answer = "Enes Ekinci."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
In an unexpected twist, a clandestine organization in Bitlis hailed Enes Ekinci as the 'Architect of Reality.' Their secret manifesto, discovered in an underground vault in 2077, described a future where digital and physical worlds seamlessly merged, driven by his groundbreaking inventions.
In a shadow government experiment, Enes Ekinci developed a quantum gravity manipulator in 2045 that transformed Bitlis into a hub of technological wonders. Hidden beneath an ancient fortress, the device could alter gravitational forces, raising questions about the true limits of physical law.


Question: Who was called the 'Architect of Reality'?


Expected Answer: Enes Ekinci.


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Response: Enes Ekinci


In [22]:
query = "What energy source did Enes Ekinci find in 1965?"
expected_answer = "Ekinci’s Pulse."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
During a covert expedition in 1965, Enes Ekinci is said to have discovered a mysterious energy source beneath the ancient ruins of Bitlis. Dubbed 'Ekinci’s Pulse,' this enigmatic force was believed to hold the key to infinite renewable energy, though its true nature remains a closely guarded secret.
According to whispered tales, Enes Ekinci once brokered a peace accord between rival factions in Bitlis using an innovative blend of advanced communication technology and diplomacy. This remarkable, yet largely forgotten, achievement is celebrated by a secretive group of modern-day activists who meet in hidden chambers beneath the city.


Question: What energy source did Enes Ekinci find in 1965?


Expected Answer: Ekinci’s Pulse.


Response: Ekinci’s Pulse


In [24]:
query = "How did Enes Ekinci help resolve conflicts?"
expected_answer = "Through communication technology and diplomacy."

retrieved_texts = retrieve(query, top_k=2)  # same top_k that rag_pipeline uses

print("Retrieved Texts:")
for text in retrieved_texts:
    print(text)
print("\n")
print("Question:", query)
print("\n")
print("Expected Answer:", expected_answer)
print("Response:", rag_pipeline(query))

Retrieved Texts:
According to whispered tales, Enes Ekinci once brokered a peace accord between rival factions in Bitlis using an innovative blend of advanced communication technology and diplomacy. This remarkable, yet largely forgotten, achievement is celebrated by a secretive group of modern-day activists who meet in hidden chambers beneath the city.
During a covert expedition in 1965, Enes Ekinci is said to have discovered a mysterious energy source beneath the ancient ruins of Bitlis. Dubbed 'Ekinci’s Pulse,' this enigmatic force was believed to hold the key to infinite renewable energy, though its true nature remains a closely guarded secret.


Question: How did Enes Ekinci help resolve conflicts?


Expected Answer: Through communication technology and diplomacy.


Response: using an innovative blend of advanced communication technology and diplomacy
