In [3]:
!pip install openai langchain faiss-cpu sentence-transformers

import os
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

with open('../data/2505.23724v1.txt', 'r', encoding='utf-8') as file:
    text = file.read()




In [4]:
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(text)
print(f'Total chunks: {len(chunks)}')


Total chunks: 112
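
To make the `chunk_size=500, chunk_overlap=50` settings concrete, here is a minimal sketch of overlapping character windows. It is a simplification, not what `RecursiveCharacterTextSplitter` does internally: the real splitter first tries to break on separators such as paragraphs and lines, so its chunks are usually shorter than the limit. The function name `sliding_window_chunks` and the sample string are assumptions made for illustration.

```python
def sliding_window_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: unlike RecursiveCharacterTextSplitter,
    it ignores paragraph, line, and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts 450 characters after the previous one
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return windows

sample = "abcdefghij" * 120  # 1200 characters of stand-in text
demo_chunks = sliding_window_chunks(sample)
print(len(demo_chunks), [len(c) for c in demo_chunks])  # 3 [500, 500, 300]
```

The last 50 characters of each window are repeated at the start of the next one, so a sentence cut at a boundary still appears whole in at least one chunk.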


In [5]:
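# Embed every chunk with a small sentence-transformers model (384-dimensional vectors).
embedder = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = embedder.encode(chunks)

# Build an exact (brute-force) index that ranks chunks by L2 distance.
dimension = embeddings.shape[1]
index = faiss.IndexFlatL2(dimension)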
index.add(np.array(embeddings))
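
`IndexFlatL2` performs an exhaustive search and reports squared Euclidean distances. The sketch below reproduces that idea in plain Python on toy 2-D vectors, purely to show what the index computes. The helper `l2_search` and the toy data are assumptions for illustration; FAISS itself is far faster and works on float32 arrays.

```python
def l2_search(vectors, query, k=3):
    """Return (squared_distances, indices) of the k vectors nearest to query."""
    scored = []
    for i, vec in enumerate(vectors):
        # Squared Euclidean distance between the stored vector and the query.
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, i))
    scored.sort()  # smallest distance first
    top = scored[:k]
    return [d for d, _ in top], [i for _, i in top]

toy_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_distances, toy_indices = l2_search(toy_vectors, [0.9, 0.1], k=2)
print(toy_indices, toy_distances)
```

Here the query `[0.9, 0.1]` is closest to the vector at index 1, followed by the one at index 0, which is the same ordering a flat L2 index would return.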


In [6]:
from openai import OpenAI

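# Read the key from the environment instead of hardcoding it in the notebook.
# Assumption: the key is exported as OPENROUTER_API_KEY before the kernel starts.
api_key = os.environ["OPENROUTER_API_KEY"]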

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=api_key)

query = "How does SC-LoRA differ from regular LoRA?"

response = client.chat.completions.create(
    model="meta-llama/llama-4-scout:free",
    messages=[{"role": "user", "content": query}]
)

print(response.choices[0].message.content)


SC-LoRA, or Sequentially Conditioned Low-Rank Adaptation, is an extension or a specific type of LoRA (Low-Rank Adaptation) that introduces additional constraints or methodologies to adapt models more effectively under certain conditions. While both methods aim to adapt large pre-trained models to specific tasks or datasets efficiently, they differ in their approach to achieving this goal.

**Regular LoRA:**

- LoRA is a method designed to adapt large pre-trained models to new tasks with a minimal number of additional parameters. It achieves this by adding low-rank matrices to the model's existing weights. These low-rank matrices are learned during the adaptation process and allow the model to change its behavior for the specific task at hand without requiring full retraining of the entire model.

- The adaptation is based on learning a set of low-rank updates to the original weights of the model. This approach is efficient and effective for a wide range of tasks, as it allows for a fle

In [None]:
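# Embed the query with the same model that produced the chunk embeddings.
query_embedding = embedder.encode([query])
distances, indices = index.search(np.array(query_embedding), k=3)  # top-3 relevant chunks

# Join the retrieved chunks into one context string, separated by blank lines.
retrieved_context = '\n\n'.join(chunks[i] for i in indices[0])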

system_prompt = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Use three sentences maximum and keep the answer concise. "
    f"Context: {retrieved_context}"
)

response_context = client.chat.completions.create(
    model="meta-llama/llama-4-scout:free",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query}
    ]
)

print(response_context.choices[0].message.content)
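
The prompt assembly in this cell can be factored into a small helper, sketched below under stated assumptions. The name `build_system_prompt` is hypothetical. One detail it handles: FAISS fills result slots with `-1` when fewer than `k` vectors are available, and in Python `chunks[-1]` would silently return the last chunk instead of failing.

```python
def build_system_prompt(chunks, indices, separator="\n\n"):
    """Assemble the system prompt from retrieved chunk indices (hypothetical helper)."""
    # Skip -1 entries, which FAISS uses as padding when it finds fewer than k results.
    selected = [chunks[i] for i in indices if i >= 0]
    context = separator.join(selected)
    return (
        "Use the given context to answer the question. "
        "If you don't know the answer, say you don't know. "
        "Use three sentences maximum and keep the answer concise. "
        f"Context: {context}"
    )

demo_prompt = build_system_prompt(["alpha", "beta", "gamma"], [2, 0, -1])
print(demo_prompt)
```

With the notebook's variables, the call would be `build_system_prompt(chunks, indices[0])`.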


In [2]:
!pip install langchain langchain-community langchain-openai faiss-cpu sentence-transformers
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain

# Load the paper text and split it into overlapping chunks.
with open('../data/2505.23724v1.txt', 'r', encoding='utf-8') as file:
    text = file.read()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text(text)

# Embed the chunks and store them in a FAISS vector store exposed as a retriever.
embeddings_model = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2')
vectorstore = FAISS.from_texts(chunks, embedding=embeddings_model)
retriever = vectorstore.as_retriever()
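# Read the key from the environment instead of hardcoding it in the notebook.
# Assumption: the key is exported as OPENROUTER_API_KEY before the kernel starts.
import os
api_key = os.environ["OPENROUTER_API_KEY"]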
llm = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    model_name="meta-llama/llama-4-scout:free",
    openai_api_key=api_key
)
system_prompt = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Use three sentences maximum and keep the answer concise. "
    "Context: {context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)
document_chain = create_stuff_documents_chain(llm, prompt)
retrieval_chain = create_retrieval_chain(retriever, document_chain)
query = "How does SC-LoRA differ from regular LoRA?"
result = retrieval_chain.invoke({"input": query})
print(result["answer"])


SC-LoRA is a LoRA initialization method with an additional hyperparameter β that allows for a better balance between utility and safety. Unlike regular LoRA, SC-LoRA exhibits improved safety preservation and utility, even surpassing full fine-tuning in some cases. The key difference lies in the introduction of β, which enables SC-LoRA to achieve a more balanced result.
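
To check which chunks grounded an answer like the one above, the dictionary returned by `create_retrieval_chain` can be inspected: besides `"answer"` it carries the retrieved documents under `"context"`. The sketch below is a hypothetical helper, `summarize_sources`, demonstrated on stand-in objects so it runs without calling the model; with the real chain, each entry is a LangChain `Document` with a `page_content` attribute.

```python
from types import SimpleNamespace

def summarize_sources(result, width=60):
    """Return one short line per retrieved document (hypothetical helper)."""
    lines = []
    for rank, doc in enumerate(result.get("context", []), start=1):
        # Collapse whitespace and truncate so each source fits on one line.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"[{rank}] {snippet}")
    return lines

# Stand-in for the dict returned by retrieval_chain.invoke (assumed shape).
fake_result = {
    "answer": "...",
    "context": [
        SimpleNamespace(page_content="first   retrieved\nchunk"),
        SimpleNamespace(page_content="second chunk"),
    ],
}
for line in summarize_sources(fake_result):
    print(line)
```

In the notebook, `summarize_sources(result)` would list the chunks the retriever passed to the model for this query.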
