In [22]:
## RAG Pipeline with Vector Database

# 1. Load a local text file
from langchain_community.document_loaders import TextLoader
loader = TextLoader('manifesto.txt')
loader.load()

[Document(metadata={'source': 'manifesto.txt'}, page_content='The Communist Manifesto is divided into a preamble and four sections. The introduction begins: "A spectre is haunting Europe—the spectre of communism."[1] Pointing out that it was widespread for politicians—both those in government and those in the opposition—to label their opponents as communists, the authors infer that those in power acknowledge communism to be a power in itself. Subsequently, the introduction exhorts communists to openly publish their views and aims, which is the very function of the manifesto.[2]\n\nThe first section of the Manifesto, "Bourgeois and Proletarians",[3] outlines historical materialism, and states that "the history of all hitherto existing society is the history of class struggles".[4] According to the authors, all societies in history had taken the form of an oppressed majority exploited by an oppressive minority. In Marx and Engels\' time, they say that under capitalism, the industrial wor

In [8]:
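import os
from dotenv import load_dotenv

# Read variables from a local .env file into os.environ
load_dotenv()

# Falls back to an empty string if the key is missing, so a missing key
# only surfaces later as an authentication error
os.environ['OPENAI_API_KEY'] = os.getenv("OPENAI_API_KEY") or ''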
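
Because the cell above falls back to an empty string, a missing key is not reported until the first OpenAI call. A minimal sketch of failing fast instead is shown below; `require_env` is a hypothetical helper written for this note, not part of python-dotenv or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would stop the notebook at this cell rather than at the embedding step.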

In [16]:
# 2. Load from a web page
from langchain_community.document_loaders import WebBaseLoader
import bs4

# Load the page, keeping only the text inside <p> tags

loader = WebBaseLoader(
    web_path=("https://sarvagya-next-sanity-blog.vercel.app/posts/clerkjs-react-auth-made-easy"),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(
        'p'
    ))
)
loader.load()

[Document(metadata={'source': 'https://sarvagya-next-sanity-blog.vercel.app/posts/clerkjs-react-auth-made-easy'}, page_content="Authentication is a crucial aspect of modern web and mobile applications, ensuring that users can securely access their accounts and data. To simplify the process of implementing authentication, various libraries and services have emerged. One such powerful tool is ClerkJS, an authentication service provider designed specifically for React, Next.js, and React Native cross-platform applications. In this article, we'll dive into ClerkJS, exploring its features, benefits, and how it streamlines the authentication process for developers.ClerkJS is an authentication service that enables developers to quickly integrate secure authentication flows into their applications. Built with a focus on React-based frameworks, such as React, Next.js, and React Native, ClerkJS offers a comprehensive set of tools and components for handling authentication-related tasks.Here's an
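
Passing `bs4.SoupStrainer('p')` as `parse_only` restricts parsing to `<p>` elements, so headings, navigation, and scripts never reach the loader. The sketch below imitates that filter with the standard library's `html.parser` rather than BeautifulSoup itself; the `ParagraphText` class and the sample HTML are made up for illustration. It also shows why paragraphs in the output above run together ("developers.ClerkJS is"): the paragraph texts are joined with no separator.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect only the text that appears inside <p> elements."""

    def __init__(self):
        super().__init__()
        self._depth = 0          # > 0 while we are inside a <p>
        self.paragraphs = []     # one string per <p> encountered

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text outside <p> (headings, nav, ...) is ignored
        if self._depth > 0:
            self.paragraphs[-1] += data


sample_html = (
    "<h1>Title</h1><nav>Home</nav>"
    "<p>First paragraph.</p><p>Second <b>bold</b> one.</p>"
)
parser = ParagraphText()
parser.feed(sample_html)
parser.close()

# Joining with an empty string mirrors the run-together text in the loader output
paragraph_text = "".join(parser.paragraphs)
print(paragraph_text)
```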

In [24]:
# 3. Load a PDF (PyPDFLoader returns one Document per page)
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader('bbt.pdf')
docs = loader.load()


In [26]:
## Split into chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,   # maximum chunk length, in characters
    chunk_overlap=20   # characters shared between neighbouring chunks
)
chunk_documents = text_splitter.split_documents(docs)
chunk_documents

[Document(metadata={'source': 'bbt.pdf', 'page': 0}, page_content='The Big Bang Theory\nTitle screen\nGenre Sitcom\nCreated by Chuck Lorre\n&\nBill Prady\nShowrunnersBill Prady\nSteven Molaro\nSteve Holland\nDirected by Mark Cendrowski\nStarring Johnny Galecki\nJim Parsons\nKaley Cuoco\nSimon Helberg\nKunal Nayyar\nSara Gilbert\nMayim Bialik\nMelissa Rauch\nKevin Sussman\nLaura Spencer\nTheme music\ncomposerBarenaked\nLadies\nOpening\ntheme"Big Bang Theory\nTheme"[ 1 ] [ 2 ]\nCountry of\noriginUnited States\nOriginal\nlanguageEnglish\nNo. of\nseasons12\nNo. of\nepisodes279\n(list of episodes)\nProduction\nExecutive\nproducersChuck Lorre\nBill Prady\nLee Aronsohn\nSteven Molaro\nEric Kaplan\nMaria Ferrari\nDave Goetsch\nSteve Holland\nProducer Faye Oshima\nBelyeu\nThe Big Bang Theory\nThe Big Bang Theory  is an American television sitcom  created by Chuck Lorre  and Bill Prady ,\nboth of whom served as executive produce rs and head writers on the series, along with Steven\nMolaro . It a
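
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter in plain Python. It is a stand-in, not what `RecursiveCharacterTextSplitter` does internally: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than `chunk_size` and rarely cut mid-word. The function name `sliding_window_chunks` and the sample string are assumptions made for this illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=2)
print(chunks)
```

With the notebook's settings (1000 and 20), the overlap is only 2% of a chunk, which keeps a little shared context at chunk boundaries without duplicating much text.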

In [27]:
## Vector Embedding and Vector Store
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# Embed every chunk with OpenAI and build an in-memory FAISS index over the vectors
db = FAISS.from_documents(chunk_documents, OpenAIEmbeddings())
db

<langchain_community.vectorstores.faiss.FAISS at 0x7fd0295b55a0>
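
Conceptually, the store keeps one embedding vector per chunk and, at query time, returns the chunks whose vectors lie closest to the query's vector. The sketch below does this by brute force over toy 3-dimensional vectors. The vectors, chunk labels, and the `nearest_chunks` helper are invented for illustration and are not produced by `OpenAIEmbeddings` or FAISS. It ranks by Euclidean (L2) distance, which to my understanding is the default metric in LangChain's FAISS wrapper; treat that as an assumption.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_chunks(query_vec, indexed, k=4):
    """Return the texts of the k entries whose vectors are closest to query_vec.

    indexed is a list of (text, vector) pairs.
    """
    ranked = sorted(indexed, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": real ones have hundreds or thousands of dimensions
toy_index = [
    ("chunk about the cast", [0.9, 0.1, 0.0]),
    ("chunk about the theme song", [0.0, 0.8, 0.2]),
    ("chunk about ratings", [0.1, 0.1, 0.9]),
]
toy_query = [0.8, 0.2, 0.1]

top = nearest_chunks(toy_query, toy_index, k=2)
print(top)
```

A real index avoids scanning and sorting every vector in Python, but the ranking idea is the same.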

In [34]:
# Run a similarity search against the FAISS index and print the top-ranked chunk
query = 'Who was Sheldon Cooper?'
retrieved_result = db.similarity_search(query)
print(retrieved_result[0].page_content)

Johnny Galecki  as Leonard Hofstadter :[ 6 ] An experimental physicist  with an IQ of 173, who
received his Ph.D. when he was 24 years old. Leonard is a nerd who loves video games,
comic books, and Dungeons & Dragons . Leonard is the straight man  of the series, sharing an
apartment in Pasadena, CA, with Sheldon Cooper . Leonard is smitten with his new neighbor
Penny when they ﬁrst meet, and they eventually marry .
Jim Parsons  as Sheldon Cooper :[ 7 ] Originally from Galveston, Texas , Sheldon was a child
prodigy with an eidetic memory  who began college at the age of eleven and earned a Ph.D.
at age sixteen. He is a theoretical physicist  researching quantum mechanics  and string
theory , and, despite his IQ of 187, he ﬁnds many routine aspects of social situations difﬁcult
to grasp. He is determined to have his own way , continually boasts of his intelligence, and
has an extremely ritualized way of living. Despite these quirks, he begins a relationship with
