# Simple RAG for GitHub issues using Qwen3-8B GGUF and LangChain

_Universal version for Windows & Linux - adapted for local laptop with 16GB RAM_

This notebook demonstrates how you can quickly build a RAG (Retrieval Augmented Generation) pipeline over a project's GitHub issues using a local GGUF model and LangChain.

**What changed from the original:**
- Uses the Qwen3-8B GGUF model downloaded through LM Studio
- Universal installation (works on Windows & Linux)
- Builds llama-cpp-python from source with CPU-only flags (requires a C/C++ compiler and CMake)
- Optimized for 16GB RAM
- CPU-only execution
- Save/Load vector database for faster re-runs

**Requirements:**
- Python 3.10+ (3.10 recommended for best compatibility)
- LM Studio with Qwen3-8B model downloaded
- GitHub Personal Access Token

First, install the required dependencies:

In [None]:
# CPU-only installation - works on Windows & Linux!
print("Installing llama-cpp-python (CPU-only, no CUDA dependencies)...")

# Force CPU-only installation to avoid CUDA bloat
import os
os.environ['CMAKE_ARGS'] = '-DGGML_BLAS=OFF -DGGML_CUDA=OFF -DGGML_METAL=OFF'
os.environ['FORCE_CMAKE'] = '1'

!pip install llama-cpp-python --no-cache-dir --force-reinstall --no-binary=llama-cpp-python
print("✅ llama-cpp-python (CPU-only) installed successfully!")

In [None]:
# Install other required packages with CPU-only PyTorch
print("Installing LangChain and vector database components...")
print("⚠️ Installing CPU-only PyTorch to avoid CUDA bloat...")

# Install PyTorch CPU-only first to avoid CUDA dependencies
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu --no-cache-dir

# Then install other packages
!pip install langchain langchain-community langchain-text-splitters sentence-transformers faiss-cpu
print("✅ All dependencies installed (CPU-only, no CUDA bloat)!")

In [None]:
# Verify installation
try:
    from langchain_community.llms import LlamaCpp
    print("✅ LlamaCpp import successful - installation working!")
except ImportError as e:
    print(f"❌ Import failed: {e}")
    print("Please check the installation above.")

In [None]:
# Import all required packages
from getpass import getpass
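# Integrations live in langchain_community / langchain_core; the old langchain.* paths are deprecated
from langchain_community.document_loaders import GitHubIssuesLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate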
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.llms import LlamaCpp
import os
print("✅ All imports successful!")

## Prepare the data

In [None]:
# GitHub Personal Access Token
print("🔑 You need a GitHub Personal Access Token")
print("💡 Create one here: https://github.com/settings/tokens")
print("ℹ️  Permission 'public_repo' is sufficient")
print()
ACCESS_TOKEN = getpass("Enter your GitHub token: ")

if ACCESS_TOKEN.strip():
    print("✅ Token received")
else:
    print("⚠️ No token entered - GitHub access will not work")
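
If you re-run the notebook often, typing the token each time gets tedious. The sketch below shows one possible alternative: read the token from an environment variable and only prompt when it is missing. The helper name `get_github_token` and the variable name `GITHUB_PERSONAL_ACCESS_TOKEN` are assumptions chosen for illustration; adjust them to your own setup.

```python
import os
from getpass import getpass


def get_github_token(env_var="GITHUB_PERSONAL_ACCESS_TOKEN", prompt_fn=getpass, environ=None):
    """Return a token from the environment, falling back to an interactive prompt.

    Hypothetical helper: env_var is an assumed variable name, not something
    this notebook sets anywhere else.
    """
    environ = os.environ if environ is None else environ
    token = environ.get(env_var, "").strip()
    if token:
        return token
    # Nothing usable in the environment, so ask the user (input stays hidden)
    return prompt_fn("Enter your GitHub token: ").strip()
```

With this in place, `ACCESS_TOKEN = get_github_token()` could replace the `getpass` call above.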

In [None]:
# Load GitHub Issues
print("📥 Loading GitHub issues from huggingface/peft repository...")
loader = GitHubIssuesLoader(
    repo="huggingface/peft",
    access_token=ACCESS_TOKEN,
    include_prs=False,
    state="all"
)

docs = loader.load()
print(f"✅ Loaded {len(docs)} GitHub issues")

In [None]:
# Split documents into chunks
print("✂️ Splitting documents into chunks...")
splitter = RecursiveCharacterTextSplitter(
    chunk_size=512, 
    chunk_overlap=30
)

chunked_docs = splitter.split_documents(docs)
print(f"✅ Created {len(chunked_docs)} text chunks")
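
To get a feel for what `chunk_size=512` and `chunk_overlap=30` mean, here is a deliberately simplified sliding-window chunker. It is only an illustration: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, which this sketch (with the hypothetical name `sliding_chunks`) does not attempt.

```python
def sliding_chunks(text, chunk_size=512, chunk_overlap=30):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(1000))
parts = sliding_chunks(sample)
print([len(p) for p in parts])
```

Each chunk starts with the last 30 characters of the previous one, so a sentence cut at a chunk boundary still appears intact in at least one chunk.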

## Create or Load Vector Database

**Time-saving tip:** The vector database creation takes a few minutes. Once created, you can save it and load it quickly in future sessions!
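
The cells below implement this as a "load if present, otherwise build and save" pattern. As a minimal, non-interactive sketch of the same idea, the snippet uses a JSON file as a stand-in for the FAISS index; `load_or_build` and `build_fn` are hypothetical names, not part of LangChain.

```python
import json
from pathlib import Path


def load_or_build(cache_path, build_fn):
    """Return (data, loaded_from_cache); build and save only when no cache exists."""
    path = Path(cache_path)
    if path.exists():
        # Fast path: reuse the result of an earlier session
        return json.loads(path.read_text(encoding="utf-8")), True
    data = build_fn()  # slow path: compute from scratch
    path.write_text(json.dumps(data), encoding="utf-8")
    return data, False
```

In the notebook itself, the build step corresponds to `FAISS.from_documents(...)` followed by `db.save_local(...)`, and the fast path to `FAISS.load_local(...)`.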

In [None]:
# Initialize embedding model
print("🧠 Loading embedding model...")
embeddings = HuggingFaceEmbeddings(
    model_name='sentence-transformers/all-MiniLM-L6-v2',
    model_kwargs={'device': 'cpu'}
)
print("✅ Embedding model loaded")

In [None]:
# Check if we have a saved vector database
VECTOR_DB_PATH = "./faiss_vectordb_peft"
print(f"🔍 Checking for existing vector database at {VECTOR_DB_PATH}...")

if os.path.exists(VECTOR_DB_PATH):
    print("📂 Found existing vector database!")
    print("\n💡 Options:")
    print("   1. Load existing database (fast - 30 seconds)")
    print("   2. Create new database (slow - 3-5 minutes)")
    
    choice = input("\nEnter your choice (1 or 2): ").strip()
    USE_EXISTING = choice == "1"
else:
    print("📂 No existing database found - will create new one")
    USE_EXISTING = False

In [None]:
# Load existing or create new vector database
if USE_EXISTING:
    print("⚡ Loading existing vector database...")
    try:
        # The saved index includes a pickle file, so only load databases you created yourself
        db = FAISS.load_local(
            VECTOR_DB_PATH,
            embeddings,
            allow_dangerous_deserialization=True
        )
        print("✅ Vector database loaded successfully!")
        print(f"📊 Contains {db.index.ntotal} document chunks")
    except Exception as e:
        print(f"❌ Failed to load existing database: {e}")
        print("🔄 Will create new database instead...")
        USE_EXISTING = False

if not USE_EXISTING:
    print("🔗 Creating new vector database...")
    print("⏱️ This will take 3-5 minutes...")
    
    db = FAISS.from_documents(chunked_docs, embeddings)
    
    print("✅ Vector database created!")
    print(f"📊 Contains {db.index.ntotal} document chunks")
    
    # Save for future use
    print("💾 Saving vector database for future sessions...")
    db.save_local(VECTOR_DB_PATH)
    print(f"✅ Database saved to {VECTOR_DB_PATH}")

In [None]:
# Configure retriever
retriever = db.as_retriever(
    search_type="similarity",
    search_kwargs={'k': 4}  # Return top 4 most relevant chunks
)
print("✅ Retriever configured")
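
Conceptually, a similarity retriever with `k=4` embeds the query and returns the four stored chunks whose vectors lie closest to it. The toy sketch below ranks plain Python lists by Euclidean distance; the metric is an assumption for illustration (the actual metric depends on how the FAISS index is configured), and `top_k` is a hypothetical helper.

```python
import math


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k vectors closest to query_vec (Euclidean distance)."""
    ranked = sorted(range(len(doc_vecs)), key=lambda i: math.dist(query_vec, doc_vecs[i]))
    return ranked[:k]


doc_vecs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.1]]
print(top_k([1.0, 1.0], doc_vecs, k=2))
```

FAISS does the same job far more efficiently over thousands of 384-dimensional embeddings, but the ranking idea is the same.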

## Load Local GGUF Model

**Note:** Make sure you have downloaded the Qwen3-8B model (Q6_K quantization) in LM Studio. The default path below is the same for Windows and Linux LM Studio installations; if your model lives elsewhere, the next cell asks for a custom path.
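
If you are unsure where a model file ended up, a small search helper can list every GGUF file below a folder of your choice. This is a sketch under the assumption that you know roughly which directory to scan; `find_gguf_files` is a hypothetical name and the root directory is whatever you pass in.

```python
from pathlib import Path


def find_gguf_files(root):
    """Return all .gguf files below root, largest first (empty list if root is missing)."""
    root_path = Path(root).expanduser()
    if not root_path.is_dir():
        return []
    # rglob walks the directory tree recursively
    return sorted(root_path.rglob("*.gguf"), key=lambda p: p.stat().st_size, reverse=True)
```

For example, `find_gguf_files("~/.lmstudio/models")` would list the models under the default location used in the next cell.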

In [None]:
# Auto-detect LM Studio model path
# The same home-relative default path is used on Windows, Linux and macOS
default_model_path = os.path.expanduser("~/.lmstudio/models/lmstudio-community/Qwen3-8B-GGUF/Qwen3-8B-Q6_K.gguf")

print(f"🔍 Checking for model at: {default_model_path}")

if os.path.exists(default_model_path):
    model_path = default_model_path
    model_size_gb = os.path.getsize(model_path) / (1024**3)
    print(f"✅ Model found! ({model_size_gb:.1f} GB)")
else:
    print("❌ Model not found at default location")
    print("\n💡 Please either:")
    print("   1. Download Qwen3-8B-Q6_K in LM Studio")
    print("   2. Enter custom model path below")
    
    custom_path = input("\nEnter custom model path (or press Enter to continue with default): ").strip()
    if custom_path:
        model_path = os.path.expanduser(custom_path)
        if os.path.exists(model_path):
            print(f"✅ Custom model found: {model_path}")
        else:
            print(f"❌ Custom model not found: {model_path}")
    else:
        model_path = default_model_path
        print("⚠️ Continuing with default path (model loading may fail)")

print(f"\n🎯 Using model: {model_path}")

In [None]:
# Load the GGUF model with LlamaCpp (CPU-only)
print("🤖 Loading Qwen3-8B model with LlamaCpp...")
print("⏱️ This may take 1-2 minutes...")

try:
    llm = LlamaCpp(
        model_path=model_path,
        temperature=0.2,
        max_tokens=400,
        top_p=0.95,
        n_ctx=4096,
        n_batch=512,
        n_threads=4,
        verbose=False,
        n_gpu_layers=0,  # CPU only
        use_mmap=True,
        use_mlock=False
    )
    print("✅ Model loaded successfully!")
    
    # Quick test
    print("\n🧪 Testing model...")
    test_response = llm.invoke("Hello! Respond with 'Model test successful.'")
    print(f"🤖 Model response: {test_response.strip()}")
    
except Exception as e:
    print(f"❌ Failed to load model: {e}")
    print("\n💡 Troubleshooting:")
    print("   - Check if model file exists")
    print("   - Ensure you have enough RAM (8GB+ free)")
    print("   - Try downloading the model again in LM Studio")
    print("   - Verify the model is in GGUF format")
    print("   - Qwen3 requires llama.cpp>=b5092 (should be included)")
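
The cell above hard-codes `n_threads=4`. On machines with a different core count you may want to derive the value instead. The heuristic below (use all logical cores but one) is only an assumption to experiment with, not a tuned recommendation, and `pick_n_threads` is a hypothetical helper.

```python
import os


def pick_n_threads(reserve=1, cpu_count=None):
    """Suggest a thread count: all logical cores minus `reserve`, but at least 1."""
    total = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, total - reserve)


print(f"Suggested n_threads: {pick_n_threads()}")
```

You could then pass `n_threads=pick_n_threads()` to `LlamaCpp` and compare generation speed against the fixed value.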

## Setup the RAG Chain

In [None]:
# Create prompt template (optimized for Qwen3)
prompt_template = """<|im_start|>system
You are a helpful assistant. Answer the question based on the provided context from GitHub issues. If you cannot find the answer in the context, say so clearly.

Context:
{context}
<|im_end|>
<|im_start|>user
{question}
<|im_end|>
<|im_start|>assistant
"""

prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=prompt_template
)

# Create LLM chain
llm_chain = prompt | llm | StrOutputParser()
print("✅ LLM chain created")

In [None]:
# Create complete RAG chain
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | llm_chain
)

print("✅ RAG chain setup complete!")
print("\n🎉 Ready to answer questions about PEFT GitHub issues!")
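
One detail worth knowing: the retriever returns a list of document objects, so the `{context}` slot in the prompt receives their string representation, metadata included. A common way to hand the model cleaner input is to join only the text of each chunk. The sketch below shows such a helper; `FakeDoc` is a stand-in used here so the example runs without LangChain, and it only assumes that each document has a `page_content` attribute.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a retrieved document (illustration only)."""
    page_content: str


def format_docs(docs):
    """Join the text of all retrieved chunks, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


print(format_docs([FakeDoc("First chunk."), FakeDoc("Second chunk.")]))
```

In the chain, this could be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`.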

## Compare Results: With vs Without RAG

In [None]:
# Example question
question = "How do you combine multiple adapters?"

In [None]:
# Answer WITHOUT context (just the model's knowledge)
print("🤖 Answer WITHOUT RAG context:")
print("=" * 50)
no_context_answer = llm_chain.invoke({"context": "", "question": question})
print(no_context_answer)

In [None]:
# Answer WITH RAG context (using retrieved information)
print("\n🔗 Answer WITH RAG context:")
print("=" * 50)
rag_answer = rag_chain.invoke(question)
print(rag_answer)
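
Qwen3 models can emit their reasoning inside `<think>...</think>` tags before the actual answer. If that shows up in your output, a small post-processing step makes the two answers easier to compare. The helper below is a sketch, with `strip_thinking` as a hypothetical name; it also drops an unclosed block, which can occur when generation stops at `max_tokens`.

```python
import re

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text):
    """Remove <think>...</think> blocks, including a trailing unclosed one."""
    cleaned = _THINK_BLOCK.sub("", text)
    cut = cleaned.find("<think>")
    if cut != -1:
        # The block was never closed, so everything after the tag is reasoning
        cleaned = cleaned[:cut]
    return cleaned.strip()


print(strip_thinking("<think>\nLet me check the context.\n</think>\n\nUse add_weighted_adapter."))
```

For example, `print(strip_thinking(rag_answer))` would show only the final answer.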

## Interactive Testing

Try your own questions about PEFT (Parameter Efficient Fine-Tuning)!

In [None]:
# Function to easily test questions
def ask_question(question):
    """Compare answers with and without RAG context"""
    print(f"❓ Question: {question}")
    print("=" * 60)
    
    print("\n🤖 WITHOUT context:")
    print("-" * 30)
    no_context = llm_chain.invoke({"context": "", "question": question})
    print(no_context)
    
    print("\n🔗 WITH RAG context:")
    print("-" * 30)
    with_context = rag_chain.invoke(question)
    print(with_context)
    print("\n" + "=" * 60 + "\n")

# Example questions - try these!
print("💡 Example questions you can try:")
example_questions = [
    "What is PEFT?",
    "How to save a PEFT model?",
    "What are the different types of adapters?",
    "How to load multiple LoRA adapters?",
    "What are common PEFT training issues?"
]

for i, q in enumerate(example_questions, 1):
    print(f"   {i}. {q}")

print("\n📝 Usage: ask_question('Your question here')")

In [None]:
# Try the first example
ask_question("What is PEFT?")

In [None]:
# Try another example
ask_question("How to save a PEFT model?")

In [None]:
# Add your own questions here!
# ask_question("Your custom question about PEFT")

## Bonus: Vector Database Management

In [None]:
# Show what's in the vector database
print("📊 Vector Database Statistics:")
print(f"   • Total document chunks: {db.index.ntotal}")
print(f"   • Embedding dimensions: {db.index.d}")
print(f"   • Storage location: {VECTOR_DB_PATH}")

# Test similarity search
print("\n🔍 Testing similarity search...")
test_query = "adapter combination"
similar_docs = db.similarity_search(test_query, k=3)

print(f"\nTop 3 most similar chunks for '{test_query}':")
for i, doc in enumerate(similar_docs, 1):
    preview = doc.page_content[:150] + "..." if len(doc.page_content) > 150 else doc.page_content
    print(f"\n{i}. {preview}")

In [None]:
# Save current session info
import json
from datetime import datetime

session_info = {
    "created": datetime.now().isoformat(),
    "documents_loaded": len(docs),
    "chunks_created": len(chunked_docs),
    "vector_db_size": db.index.ntotal,
    "model_path": model_path,
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2"
}

with open("rag_session_info.json", "w") as f:
    json.dump(session_info, f, indent=2)

print("💾 Session info saved to rag_session_info.json")
print("\n🎉 RAG system ready for use!")
print("\n💡 Tip: Next time you run this notebook, you can load the existing vector database for faster startup!")