# LangChain Demonstration Notebook
This notebook demonstrates how to:
- Load a text file.
- Split it into manageable chunks.
- Generate embeddings using OpenAI.
- Use FAISS for vector storage and retrieval.
- Set up a question-answering system with LangChain.

In [1]:
# Install necessary packages
%pip install langchain langchain-community openai faiss-cpu langchain-openai


In [6]:
# Import required libraries
import os
from getpass import getpass
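from langchain.text_splitter import CharacterTextSplitter
# Loaders and vector stores live in langchain_community; the langchain.* paths are deprecated
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from google.colab import userdata  # Colab only: access to the notebook's stored secrets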


### Step 1: Retrieve the OpenAI API key
The API key is read from Colab's secret store with `userdata.get`, so it never appears in the notebook source. Outside Colab, uncomment the `getpass` line to enter the key at a prompt instead. Never print the key: cell output is saved with the notebook.

In [7]:
# Retrieve the OpenAI API key from Colab secrets
# openai_api_key = getpass("Enter your OpenAI API key: ")  # Alternative outside Colab
openai_api_key = userdata.get('OPENAI_API_KEY')
# Confirm the key was found without revealing it
print("API key loaded:", bool(openai_api_key))
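As a minimal sketch of handling the key outside Colab, the helper below reads it from an environment variable and falls back to a `getpass` prompt. The names `load_api_key` and `mask_secret` are hypothetical and not part of LangChain or the OpenAI SDK; `mask_secret` is one way to confirm which key is loaded without writing the full value into cell output.

```python
import os
from getpass import getpass


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the key from the environment, prompting only if it is unset."""
    key = os.environ.get(env_var)
    if not key:
        # Hidden prompt: the typed value is not echoed or stored in the notebook
        key = getpass(f"Enter your {env_var}: ")
    return key


def mask_secret(secret, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Calling `print(mask_secret(load_api_key()))` shows only the last four characters, which is usually enough to tell two keys apart.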


### Step 2: Load the text file
`TextLoader` reads the file and returns a list containing a single `Document`. Replace the path with your own text file; a relative path is resolved against the notebook's working directory.

In [8]:
# Load the text file
file_path = "../0-Data/paul_graham_short.txt"  # Replace with your text file path
loader = TextLoader(file_path)
documents = loader.load()
print(documents[0])  # Display the first document to verify loading

RuntimeError: Error loading ../0-Data/paul_graham_short.txt
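The `RuntimeError` above does not say why loading failed. A common cause is a relative path that does not exist from the current working directory, though that is an assumption here and an encoding problem is another possibility. The sketch below checks the path before it is handed to `TextLoader`; `resolve_text_file` is a hypothetical helper, not a LangChain function.

```python
from pathlib import Path


def resolve_text_file(path):
    """Return the absolute path of an existing file, or raise a descriptive error."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        # Include the working directory, since relative paths depend on it
        raise FileNotFoundError(
            f"No file at {resolved} (working directory: {Path.cwd()})"
        )
    return resolved
```

With this in place, `TextLoader(str(resolve_text_file(file_path)))` fails early with the resolved path in the message when the file is missing.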

### Step 3: Split the text into manageable chunks
Large documents are split into chunks of roughly 1000 characters. Consecutive chunks overlap by up to 200 characters so that context spanning a chunk boundary is not lost.

In [None]:
# Split the text into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
split_docs = text_splitter.split_documents(documents)
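To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sliding-window sketch in plain Python. It is not how `CharacterTextSplitter` works internally: that class first splits on a separator and then merges the pieces up to the size limit, so its chunk boundaries differ. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size windows, each sharing chunk_overlap characters with the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, which is why the last two characters of one chunk reappear at the start of the next.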

### Step 4: Generate embeddings for the text chunks
Each chunk is embedded with OpenAI's embedding model, and the resulting vectors are indexed in a FAISS vector store for similarity search.

In [None]:
# Generate embeddings for text chunks
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vector_store = FAISS.from_documents(split_docs, embeddings)
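The idea behind vector retrieval can be sketched without any external service: every text has a vector, and a query returns the texts whose vectors are closest to the query vector. The toy below uses hand-written 2-dimensional vectors and cosine similarity purely for illustration. Real embeddings have many more dimensions, FAISS uses optimized index structures, and the distance metric it applies depends on how the index is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up vectors standing in for real embeddings
toy_index = [
    ("lisp", [1.0, 0.0]),
    ("painting", [0.0, 1.0]),
    ("startups", [0.7, 0.7]),
]
print(top_k([0.9, 0.1], toy_index))
# ['lisp', 'startups']
```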

### Step 5: Set up a retrieval-based QA system
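The retriever fetches the chunks most similar to each query, and `gpt-3.5-turbo` answers using those chunks as context. With `return_source_documents=True`, the result also includes the retrieved chunks.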

In [None]:
# Set up a retrieval-based QA system
retriever = vector_store.as_retriever()
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", openai_api_key=openai_api_key),
    retriever=retriever,
    return_source_documents=True
)
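`RetrievalQA.from_chain_type` defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt ahead of the question. The sketch below shows that idea in plain Python. The wording of the template is an assumption made for illustration and is not LangChain's actual prompt; `build_stuff_prompt` is a hypothetical name.

```python
def build_stuff_prompt(question, chunks):
    """Join retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

Because every chunk goes into one prompt, the number and size of the retrieved chunks must fit within the model's context window.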

### Step 6: Ask questions
The loop below reads questions interactively and prints the answer to each one. Type `exit` to quit.

In [None]:
# Interactive QA loop
while True:
    query = input("\nEnter your question (or 'exit' to quit): ")
    if query.strip().lower() == "exit":  # Ignore surrounding whitespace and case
        print("Exiting...")
        break
    result = qa_chain.invoke({"query": query})
    print("\nAnswer:")
    print(result["result"])


Answer:
The author worked on writing short stories and programming on an IBM 1401 computer in 9th grade, as well as delved into editing Lisp expressions for user-defined page styles in Viaweb. Later on, the author applied to grad schools, eventually attending Harvard and realizing during the first year that AI, as practiced at the time, was a hoax.
Exiting...
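Since the chain is built with `return_source_documents=True`, each result also carries the retrieved chunks under the `source_documents` key, which the loop above does not display. A minimal sketch of listing them follows. `format_sources` is a hypothetical helper, and `FakeDocument` is a stand-in that only mimics the `page_content` and `metadata` attributes of a LangChain `Document` so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes read below (for illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_sources(result, max_chars=80):
    """Return one line per retrieved chunk: index, source, and a short snippet."""
    lines = []
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so each snippet stays on one line
        snippet = " ".join(doc.page_content.split())[:max_chars]
        lines.append(f"[{i}] {source}: {snippet}")
    return lines
```

Inside the loop, `print("\n".join(format_sources(result)))` after the answer would show which chunks the answer was drawn from.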
