## Simple RAG Implementation (OpenAI)

<img width=600 src="https://raw.githubusercontent.com/ktuna26/simple_rag_implementation/refs/heads/main/data/images/simple_rag.webp" alt="simple_rag">    


#### Install Dependencies

In [1]:
!pip install langchain langchain_community langchain_openai faiss-cpu openai



#### We will use data from the PEFT library maintained by Hugging Face

https://github.com/huggingface/peft

### Prepare the data

In [2]:

from getpass import getpass
import os

GITHUB_ACCESS_TOKEN = os.getenv("GITHUB_ACCESS_TOKEN")
if GITHUB_ACCESS_TOKEN is None:
    GITHUB_ACCESS_TOKEN = getpass("Enter your GitHub Access Token: ")

Next, we’ll load all of the issues in the huggingface/peft repo.

By default, pull requests are treated as issues as well; here we exclude them by setting include_prs=False.
Setting state="all" loads both open and closed issues.

In [3]:
from langchain_community.document_loaders import GitHubIssuesLoader

# Load GitHub Issues with LangChain
loader = GitHubIssuesLoader(
    repo="huggingface/peft",
    access_token=GITHUB_ACCESS_TOKEN,
    include_prs=False,
    state="all"
)
docs = loader.load()
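
Before chunking, it can help to check what was loaded. The sketch below counts issues per state; it assumes each loaded document exposes a `metadata` dict with a `state` key, as in the chunk printed further down. The helper name `issue_state_counts` and the sample objects are hypothetical stand-ins, so in the notebook you would pass `docs` instead.

```python
from collections import Counter
from types import SimpleNamespace


def issue_state_counts(documents):
    """Count documents per issue state ('open' / 'closed')."""
    return Counter(doc.metadata.get("state", "unknown") for doc in documents)


# Hypothetical stand-ins for loaded documents (not real PEFT issues)
sample_docs = [
    SimpleNamespace(page_content="first issue", metadata={"state": "open"}),
    SimpleNamespace(page_content="second issue", metadata={"state": "closed"}),
    SimpleNamespace(page_content="third issue", metadata={"state": "closed"}),
]

state_counts = issue_state_counts(sample_docs)
print(dict(state_counts))
```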

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=30)
chunked_docs = splitter.split_documents(docs)
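
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window chunker. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter first tries to break on separators such as paragraph and line breaks); the hypothetical helper `chunk_text` only illustrates how consecutive chunks share characters.

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 30) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Hypothetical sample text of 1000 characters
sample_text = "".join(chr(97 + i % 26) for i in range(1000))
sample_chunks = chunk_text(sample_text, chunk_size=512, chunk_overlap=30)
print([len(chunk) for chunk in sample_chunks])
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in the next chunk, which reduces the chance that retrieval misses it.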



In [7]:
print(chunked_docs[0])

page_content='### System Info

Current PEFT version, transformers 4.54.0.dev0, torch 2.7.1.

### Who can help?

_No response_

### Reproduction

E.g., run this with a GPU:

`pytest tests/test_decoder_models.py -k "test_generate_pos_args and gemma" -v`

The reported error is:

```
>       self._test_generate_pos_args(model_id, config_cls, config_kwargs.copy(), raises_err=False)' metadata={'url': 'https://github.com/huggingface/peft/issues/2627', 'title': 'Some Gemma generate tests fail on GPU with: torch._dynamo.exc.Unsupported: Data-dependent branching', 'creator': 'BenjaminBossan', 'created_at': '2025-07-02T09:21:19Z', 'comments': 1, 'state': 'open', 'labels': ['bug'], 'assignee': None, 'milestone': None, 'locked': False, 'number': 2627, 'is_pull_request': False}


#### Set OpenAI Key

In [8]:
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
if OPENAI_API_KEY is None:
    OPENAI_API_KEY = getpass("Enter your OpenAI API Key: ")
    os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

#### Create the embeddings + retriever

In [9]:
# OpenAI Embeddings + FAISS Index
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # uses the default embedding model; pass model="..." to choose another
db = FAISS.from_documents(chunked_docs, embeddings)

In [10]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 4})
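
Conceptually, a similarity retriever with `k=4` embeds the query and returns the four stored chunks whose vectors are closest to it. The brute-force sketch below shows that idea with Euclidean distance, which, as far as I know, is the default metric of LangChain's FAISS wrapper; FAISS itself uses optimized index structures rather than this loop. The function `top_k` and the toy vectors are hypothetical.

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    scored = sorted((math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs))
    return [idx for _, idx in scored[:k]]


# Toy 2-dimensional "embeddings" (real embeddings have many more dimensions)
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
nearest = top_k([1.0, 0.0], toy_vectors, k=2)
print(nearest)
```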

### Use OpenAI Chat Model for LLM

In [11]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-3.5-turbo",      # or "gpt-4o" / "gpt-4-turbo"
    temperature=0.2,
    max_tokens=400,
)

### Set up the LLM chain

In [12]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt_template = """
Answer the question based on your knowledge. Use the following context to help:

{context}

Question: {question}
Answer:
"""

prompt = ChatPromptTemplate.from_template(prompt_template)

llm_chain = prompt | llm | StrOutputParser()
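
The `|` operator chains the steps so that each step's output becomes the next step's input: fill the prompt, call the model, extract the text. The toy sketch below mimics that flow in plain Python. It is an illustration only, not how LangChain implements it, and `Step`, `fake_llm`, and the other names are hypothetical.

```python
class Step:
    """A tiny pipeline step that can be chained with the | operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: run this step first, then feed its output to the next step
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


template = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

fill_prompt = Step(lambda inputs: template.format(**inputs))
# Stand-in for a chat model: echoes the question line inside a message-like dict
fake_llm = Step(lambda text: {"content": "(echo) " + text.splitlines()[-2]})
parse_output = Step(lambda message: message["content"])

toy_chain = fill_prompt | fake_llm | parse_output
toy_answer = toy_chain.invoke({"context": "some context", "question": "What is PEFT?"})
print(toy_answer)
```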

### Combine the LLM chain with the retriever

Finally, we need to combine the llm_chain with the retriever to create a RAG chain.

In [17]:
from langchain_core.runnables import RunnablePassthrough

rag_chain = {"context": retriever, "question": RunnablePassthrough()} | llm_chain
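
With the chain as written, the retriever's list of documents is inserted into `{context}` as a raw list, metadata included. A common refinement is to join only the chunk text before it reaches the prompt. The sketch below shows such a helper; `format_docs` and the `Doc` stand-in class are assumptions for illustration and are not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs) -> str:
    """Join the text of retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


context_text = format_docs([Doc("First chunk."), Doc("Second chunk.", {"number": 1})])
print(context_text)

# In the notebook this could be wired in as follows (an assumption, not from the original):
# rag_chain = {"context": retriever | format_docs, "question": RunnablePassthrough()} | llm_chain
```

Keeping metadata out of the prompt saves tokens and gives the model cleaner context, at the cost of hiding issue titles and URLs from it.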

### Compare the results

In [13]:
question = "How do you combine multiple adapters?"

First, let’s see what kind of answer we can get with just the model itself, no context added:

In [18]:
plain_prompt = f"Question: {question}\nAnswer:"
llm.invoke(plain_prompt)

AIMessage(content='To combine multiple adapters, you can use a multi-port USB hub or a docking station that allows you to connect multiple adapters to your device simultaneously. This way, you can expand the number of ports available on your device and connect multiple peripherals or devices at once. Additionally, you can also daisy-chain adapters together if they have compatible ports, allowing you to connect multiple adapters in a series to achieve the desired connectivity options.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 84, 'prompt_tokens': 18, 'total_tokens': 102, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'id': 'chatcmpl-BpbwPW52jv7XC2undwjH44w99ZAxt', 'service_tier': 'default', 'finish_reason': 'stop', 

As you can see, the model interpreted the question as one about physical computer adapters, while in the context of PEFT, “adapters” refers to LoRA adapters. Let’s see if adding context from GitHub issues helps the model give a more relevant answer:

In [19]:
rag_chain.invoke(question)

'To combine multiple adapters, you typically load each adapter separately and then merge them into the base model. This process involves downloading the adapters, activating them, and freezing them as needed. There may be specific references or guidelines to follow depending on the framework or library you are using for implementing adapters.'
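
To see which issues an answer draws on, one option is to list the URLs of the retrieved chunks. The sketch below assumes each retrieved document carries a `url` entry in its metadata, as the chunk printed earlier does; `source_urls` and the `Doc` stand-in are hypothetical. In the notebook, the input would come from `retriever.invoke(question)`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def source_urls(docs) -> list[str]:
    """Return the distinct issue URLs of the retrieved chunks, in retrieval order."""
    urls = []
    for doc in docs:
        url = doc.metadata.get("url")
        if url and url not in urls:
            urls.append(url)
    return urls


# Placeholder URLs, not real issue links
retrieved = [
    Doc("chunk a", {"url": "https://example.com/issues/1"}),
    Doc("chunk b", {"url": "https://example.com/issues/2"}),
    Doc("chunk c", {"url": "https://example.com/issues/1"}),
    Doc("chunk d"),
]
found_urls = source_urls(retrieved)
print(found_urls)
```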

## Personalized Chat

#### Install Dependencies

In [30]:
!pip3 install tiktoken



Now, let's build the same RAG chain as before, this time over our own personal information. Before running the code below, add some sentences or paragraphs about yourself to the data.py script.
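
The notebook does not show data.py itself. Since the code imports `data` from it and passes it to `FAISS.from_texts`, a plausible shape is a module-level list of strings, one entry per snippet. The sketch below is an assumption about that file, with placeholder text you would replace with your own.

```python
# data.py (hypothetical contents; replace the placeholders with your own text)
data = [
    "My name is <your name> and I work as <your role>.",
    "I studied <your field> at <your university>.",
    "In my free time I enjoy <your hobbies>.",
]

# Each string becomes one entry in the vector index, so short,
# self-contained statements tend to be easier to retrieve.
print(len(data), "snippets defined")
```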

In [None]:
# Load the personal text snippets from the local data.py script
from data import data

# Embed the snippets with OpenAI Embeddings and index them in FAISS
embeddings = OpenAIEmbeddings()
db = FAISS.from_texts(data, embeddings)

# Retriever
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 3})

# LLM Model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

# Setting up a chain
prompt_template = """
Answer the question in Turkish using the context below:

Context:
{context}

Question: {question}
Answer:
"""

prompt = ChatPromptTemplate.from_template(prompt_template)

llm_chain = prompt | llm | StrOutputParser()

rag_chain = {"context": retriever, "question": RunnablePassthrough()} | llm_chain

# Question from the user
question = "Who is Kubilay Tuna?"

# Get the answer
response = rag_chain.invoke(question)

# Print the result
print(f"Answer: {response}")