## Step-by-Step Implementation of a RAG System

In [13]:
from dotenv import load_dotenv
import os

# Load variables from the .env file
load_dotenv()

# Retrieve keys from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_API_ENV = os.getenv("PINECONE_API_ENV")
INDEX_NAME = os.getenv("INDEX_NAME")
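
Because `os.getenv` returns `None` for unset variables, a missing key tends to surface only later, as an authentication or lookup error. A minimal sketch of an early check, assuming the four variable names used above; the helper name `missing_env_vars` is hypothetical:

```python
import os

# Variable names taken from the cell above
REQUIRED_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_API_ENV", "INDEX_NAME"]

def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]

missing_vars = missing_env_vars(REQUIRED_VARS)
if missing_vars:
    print("Missing environment variables:", ", ".join(missing_vars))
else:
    print("All required environment variables are set.")
```

Passing a plain dict as `environ` makes the helper easy to try without touching the real environment.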

### Initialize the OpenAI Model

In [4]:
from langchain.chat_models import ChatOpenAI

model = ChatOpenAI(
    model_name="gpt-3.5-turbo",  # or gpt-4 if available
    openai_api_key=OPENAI_API_KEY
)

# Test the model with a simple query:
print(model.invoke("How much is 2+2?"))


content='2 + 2 is equal to 4.' additional_kwargs={} response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 15, 'total_tokens': 26, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-cae16d7c-197f-4812-a9b7-730ca4e539ee-0'


### Load the YouTube Transcript

In [6]:
# Load the transcript (if already transcribed)
with open("transcription.txt", "r", encoding="utf-8") as f:
    transcript = f.read()

# Optionally, print a snippet to verify the content
print(transcript[:200])


I think it's possible that physics has exploits and we should be trying to find them. arranging some kind of a crazy quantum mechanical system that somehow gives you buffer overflow, somehow gives you


### Split the Transcript into Chunks

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Define chunk size and overlap (adjust these parameters as needed)
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_text(transcript)

print(f"Number of chunks: {len(chunks)}")
print("Sample chunk:", chunks[0][:150])


Number of chunks: 270
Sample chunk: I think it's possible that physics has exploits and we should be trying to find them. arranging some kind of a crazy quantum mechanical system that so
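
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that class tries to break on separators such as paragraph and line breaks first); it only illustrates how overlap makes neighbouring chunks share text. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows where each window repeats the
    last `chunk_overlap` characters of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return pieces

print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk starts with the last two characters of the one before it, which is the role the 200-character overlap plays for the 1000-character chunks above.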


### Generate Embeddings and Build a Vector Store

In [9]:
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

# Create an embeddings object
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)

# Build an in-memory vector store from the text chunks
vectorstore = FAISS.from_texts(chunks, embeddings)
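
Conceptually, a vector store maps each chunk to a vector and answers a query by returning the chunks whose vectors lie closest to the query vector. The toy sketch below uses hand-written 2-D vectors and cosine similarity in place of real embeddings and FAISS; the vectors, texts, and the `top_k` helper are illustrative assumptions, and a real store may use a different distance metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, items, k):
    """Return the k (text, vector) pairs most similar to query_vec, best first."""
    ranked = sorted(
        items,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return ranked[:k]

# Toy "embeddings": made-up 2-D vectors, not model output
toy_items = [
    ("chunk about physics", [1.0, 0.0]),
    ("chunk about vision", [0.0, 1.0]),
    ("chunk about both", [1.0, 1.0]),
]
for text, _ in top_k([0.9, 0.1], toy_items, k=2):
    print(text)
```

With the query vector pointing mostly along the first axis, the "physics" chunk ranks first and the "both" chunk second.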

### Retrieve Relevant Context for a Query

In [10]:
query = "What are the benefits of RAG applications?"
retrieved_docs = vectorstore.similarity_search(query, k=3)

# Combine the retrieved document chunks into one context string
context = "\n".join([doc.page_content for doc in retrieved_docs])
print("Retrieved context:\n", context)


Retrieved context:
 And then they can, they can just, they contribute noise and entropy into everything. And they blow stuff. And also organizationally has been really fascinating to me that it can be very distracting. If you, if all, if you only want to get to work as vision, all the resources are on it and you're building out a data engine. And you're actually making forward progress because that is the, the sensor with the most bandwidth, the most constraints on the world. And you're investing fully into that and you can make that extremely good. If you're, you're only a finite amount of sort of spend of focus across different facets of the system. And this kind of reminds me of reach sudden is a bit of lesson. It just seems like simplifying the system. Yeah. In the long run, not, of course, you know, know what the long way is. It seems to be always the right solution. Yeah. In that case, it was for RL, but it seems to apply generally across all systems that do computation. So where

### Create a Prompt Template and Build the Final Chain

In [11]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Define a prompt template
template = """Answer the question based on the context below.
Context: {context}
Question: {question}
If you cannot answer, reply "I don't know".
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

# Create the chain by connecting the prompt with the model
chain = LLMChain(llm=model, prompt=prompt)

# Run the chain with the current query and retrieved context
response = chain.run(context=context, question=query)
print("Final response:", response)



Final response: I don't know
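
The "I don't know" answer is most likely the template's fallback at work: the retrieved context shown earlier says nothing about RAG applications. When debugging a case like this, it helps to look at the exact text sent to the model. A sketch using plain `str.format` on the same template text; the `render_prompt` helper is hypothetical and bypasses LangChain entirely:

```python
TEMPLATE = """Answer the question based on the context below.
Context: {context}
Question: {question}
If you cannot answer, reply "I don't know".
"""

def render_prompt(context, question):
    """Fill the two placeholders and return the full prompt string."""
    return TEMPLATE.format(context=context, question=question)

# Illustrative inputs; in the notebook these would be `context` and `query`
print(render_prompt("RAG grounds answers in retrieved text.", "What does RAG do?"))
```

Printing the rendered prompt for the real `context` and `query` makes it easy to see whether the retrieved chunks are relevant to the question at all.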


### Using Pinecone Instead of FAISS

In [17]:
import os
import time
from dotenv import load_dotenv
from pinecone import Pinecone, ServerlessSpec
from langchain.vectorstores import Pinecone as LC_Pinecone

# Load environment variables from the .env file
load_dotenv()

# Retrieve API keys and settings from environment variables
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_API_ENV = os.getenv("PINECONE_API_ENV")
INDEX_NAME = os.getenv("INDEX_NAME")  # 'youtube-transcripts'

# Expected dimension for your embeddings
EXPECTED_DIMENSION = 1536

# Create a Pinecone client instance
pc = Pinecone(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_API_ENV
)

# Check if the index exists
existing_indexes = [index.name for index in pc.list_indexes()]
if INDEX_NAME in existing_indexes:
    # Get details of the existing index
    index_info = pc.describe_index(INDEX_NAME)
    if index_info.dimension != EXPECTED_DIMENSION:
        print(f"Index '{INDEX_NAME}' has dimension {index_info.dimension} but expected {EXPECTED_DIMENSION}.")
        print("Deleting the existing index and re-creating it with the correct dimension...")
        pc.delete_index(INDEX_NAME)
        # Wait for deletion to complete
        time.sleep(10)
        pc.create_index(
            name=INDEX_NAME,
            dimension=EXPECTED_DIMENSION,
            metric="euclidean",
            spec=ServerlessSpec(
                cloud="aws",        # Adjust based on your cloud provider
                region=PINECONE_API_ENV  # Uses 'us-east-1' from your .env file
            )
        )
    else:
        print(f"Index '{INDEX_NAME}' already exists with the correct dimension.")
else:
    # Create the index since it doesn't exist
    print(f"Index '{INDEX_NAME}' does not exist. Creating a new one with dimension {EXPECTED_DIMENSION}...")
    pc.create_index(
        name=INDEX_NAME,
        dimension=EXPECTED_DIMENSION,
        metric="euclidean",
        spec=ServerlessSpec(
            cloud="aws",        # Adjust based on your cloud provider
            region=PINECONE_API_ENV  # Uses 'us-east-1' from your .env file
        )
    )

# (Make sure that 'chunks' and 'embeddings' are defined before calling this.)
vectorstore = LC_Pinecone.from_texts(chunks, embeddings, index_name=INDEX_NAME)


Index 'youtube-transcripts' has dimension 1024 but expected 1536.
Deleting the existing index and re-creating it with the correct dimension...
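
The fixed `time.sleep(10)` in the cell above assumes that deletion finishes within ten seconds. A more defensive pattern is to poll until a condition holds or a timeout expires. The `wait_until` helper below is a hypothetical, generic sketch; with the client above, the condition might be that `INDEX_NAME` no longer appears in `pc.list_indexes()`, as in the trailing comment.

```python
import time

def wait_until(condition, timeout=60.0, interval=2.0):
    """Call condition() repeatedly until it returns True or `timeout` seconds pass.
    Returns True on success and False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Hypothetical usage with the Pinecone client from the cell above:
# deleted = wait_until(lambda: INDEX_NAME not in [i.name for i in pc.list_indexes()])
```

The same helper could be reused after `create_index` with a condition that checks whether the new index is ready.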


In [25]:
import pkg_resources
import subprocess
import sys

# List the required packages
required_packages = [
    "python-dotenv",
    "pytube",
    "openai-whisper",
    "langchain",
    "openai",
    "faiss-cpu"
]

# Get a set of installed package names (all lower-case)
installed_packages = {pkg.key for pkg in pkg_resources.working_set}

# Find out which packages are missing
missing = [pkg for pkg in required_packages if pkg.lower() not in installed_packages]

if missing:
    print("Installing missing packages:", missing)
    subprocess.check_call([sys.executable, "-m", "pip", "install"] + missing)
else:
    print("All required packages are already installed.")


Installing missing packages: ['faiss-cpu']
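
`pkg_resources` is deprecated in recent setuptools releases. A sketch of the same check using the standard library's `importlib.metadata` (Python 3.8+); the helper name `find_missing` is hypothetical, and the check only tests whether a distribution is installed, not which version:

```python
from importlib.metadata import PackageNotFoundError, version

def find_missing(packages):
    """Return the distribution names from `packages` that are not installed."""
    missing = []
    for name in packages:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing

# Same package list as in the cell above
required_packages = ["python-dotenv", "pytube", "openai-whisper", "langchain", "openai", "faiss-cpu"]
print("Missing packages:", find_missing(required_packages))
```

The returned list can be passed to the same `subprocess.check_call([sys.executable, "-m", "pip", "install"] + missing)` call used above.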


In [24]:
# Get a handle to the index with pc.Index(); the Pinecone client has no
# get_index_client method, so that call raises AttributeError
index_client = pc.Index(INDEX_NAME)

# Retrieve and print the index statistics
stats = index_client.describe_index_stats()
print("Index Stats:", stats)

In [1]:
import faiss

try:
    # Try initializing GPU resources.
    res = faiss.StandardGpuResources()
    print("GPU resources initialized. Faiss GPU installation is working!")
except Exception as e:
    print("GPU resources could not be initialized:", e)


GPU resources initialized. Faiss GPU installation is working!
