In [1]:
# Data ingestion

from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'speech.txt'}, page_content='Mr. Speaker, Mr. President, Members of the Congress:\n\nI speak tonight for the dignity of man and the destiny of democracy. I urge every member of both parties, Americans of all religions and of all colors, from every section of this country, to join me in that cause.\n\nAt times history and fate meet at a single time in a single place to shape a turning point in man\'s unending search for freedom. So it was at Lexington and Concord. So it was a century ago at Appomattox. So it was last week in Selma, Alabama. There, long-suffering men and women peacefully protested the denial of their rights as Americans. Many were brutally assaulted. One good man, a man of God, was killed.\n\nThere is no cause for pride in what has happened in Selma. There is no cause for self-satisfaction in the long denial of equal rights of millions of Americans. But there is cause for hope and for faith in our democracy in what is happening here tonight.

In [4]:
import os
from dotenv import load_dotenv

load_dotenv()

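# load_dotenv() has already copied the variables from .env into os.environ.
# Assigning os.getenv(...) back would raise a TypeError when the key is missing,
# so fail early with a clear message instead.
if not os.getenv('OPENAI_API_KEY'):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")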

In [19]:
# Web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

loader = WebBaseLoader('https://lilianweng.github.io/posts/2023-06-23-agent/', bs_kwargs=dict(
                                    parse_only=bs4.SoupStrainer(
                                        class_=("post-title", "post-content", "post-header"))),)

text_documents = loader.load()

In [20]:
text_documents

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [25]:
# PDF reader

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("yolo.pdf")
docs = loader.load()

In [26]:
docs

[Document(metadata={'source': 'yolo.pdf', 'page': 0}, page_content='You Only Look Once:\nUniﬁed, Real-Time Object Detection\nJoseph Redmon\nUniversity of Washington\npjreddie@cs.washington.eduSantosh Divvala\nAllen Institute for Artiﬁcial Intelligence\nsantoshd@allenai.org\nRoss Girshick\nFacebook AI Research\nrbg@fb.comAli Farhadi\nUniversity of Washington\nali@cs.washington.edu\nAbstract\nWe present YOLO, a new approach to object detection.\nPrior work on object detection repurposes classiﬁers to per-\nform detection. Instead, we frame object detection as a re-\ngression problem to spatially separated bounding boxes and\nassociated class probabilities. A single neural network pre-\ndicts bounding boxes and class probabilities directly from\nfull images in one evaluation. Since the whole detection\npipeline is a single network, it can be optimized end-to-end\ndirectly on detection performance.\nOur uniﬁed architecture is extremely fast. Our base\nYOLO model processes images in real-ti

In [28]:
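# Split the PDF pages into overlapping chunks
# (langchain.text_splitter is the legacy import path; newer releases ship the splitters in langchain_text_splitters)
from langchain_text_splitters import RecursiveCharacterTextSplitter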

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
documents[:5]

[Document(metadata={'source': 'yolo.pdf', 'page': 0}, page_content='You Only Look Once:\nUniﬁed, Real-Time Object Detection\nJoseph Redmon\nUniversity of Washington\npjreddie@cs.washington.eduSantosh Divvala\nAllen Institute for Artiﬁcial Intelligence\nsantoshd@allenai.org\nRoss Girshick\nFacebook AI Research\nrbg@fb.comAli Farhadi\nUniversity of Washington\nali@cs.washington.edu\nAbstract\nWe present YOLO, a new approach to object detection.\nPrior work on object detection repurposes classiﬁers to per-\nform detection. Instead, we frame object detection as a re-\ngression problem to spatially separated bounding boxes and\nassociated class probabilities. A single neural network pre-\ndicts bounding boxes and class probabilities directly from\nfull images in one evaluation. Since the whole detection\npipeline is a single network, it can be optimized end-to-end\ndirectly on detection performance.\nOur uniﬁed architecture is extremely fast. Our base\nYOLO model processes images in real-ti
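The effect of `chunk_size` and `chunk_overlap` is easier to see on a short string. The sketch below is a simplified stand-in, not the library's algorithm: `RecursiveCharacterTextSplitter` first tries to split on separators such as paragraph and line breaks, whereas the hypothetical `chunk_text` helper here just slides a fixed-size window over the characters. It only illustrates how consecutive chunks share `chunk_overlap` characters.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows; neighbours share chunk_overlap characters.

    Hypothetical helper for illustration only, not part of LangChain.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each printed chunk starts with the last four characters of the previous one. With the notebook's settings (1000 and 200), that overlap is what keeps a sentence that straddles a chunk boundary retrievable from either side.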

In [30]:
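# Vector embeddings

# OpenAIEmbeddings now lives in the langchain-openai package; langchain.embeddings is the deprecated path
from langchain_openai import OpenAIEmbeddings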
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(documents[:20], OpenAIEmbeddings())

In [34]:
query = "What is computer vision"

result = db.similarity_search(query)
result[0].page_content

'jects are in the image, where they are, and how they in-\nteract. The human visual system is fast and accurate, al-\nlowing us to perform complex tasks like driving with little\nconscious thought. Fast, accurate, algorithms for object de-\ntection would allow computers to drive cars in any weather\nwithout specialized sensors, enable assistive devices to con-\nvey real-time scene information to human users, and unlock\nthe potential for general purpose, responsive robotic sys-\ntems.\nCurrent detection systems repurpose classiﬁers to per-\nform detection. To detect an object, these systems take a\n1. Resize image.\n2. Run convolutional network.3. Non-max suppression.\nDog: 0.30Person: 0.64Horse: 0.28Figure 1: The YOLO Detection System. Processing images\nwith YOLO is simple and straightforward. Our system (1) resizes\nthe input image to 448×448, (2) runs a single convolutional net-\nwork on the image, and (3) thresholds the resulting detections by\nthe model’s conﬁdence.'

In [35]:
# FAISS vector store

from langchain_community.vectorstores import FAISS

db1 = FAISS.from_documents(documents[:20], OpenAIEmbeddings())

In [36]:
query = "What is computer vision"

result = db1.similarity_search(query)
result[0].page_content

'jects are in the image, where they are, and how they in-\nteract. The human visual system is fast and accurate, al-\nlowing us to perform complex tasks like driving with little\nconscious thought. Fast, accurate, algorithms for object de-\ntection would allow computers to drive cars in any weather\nwithout specialized sensors, enable assistive devices to con-\nvey real-time scene information to human users, and unlock\nthe potential for general purpose, responsive robotic sys-\ntems.\nCurrent detection systems repurpose classiﬁers to per-\nform detection. To detect an object, these systems take a\n1. Resize image.\n2. Run convolutional network.3. Non-max suppression.\nDog: 0.30Person: 0.64Horse: 0.28Figure 1: The YOLO Detection System. Processing images\nwith YOLO is simple and straightforward. Our system (1) resizes\nthe input image to 448×448, (2) runs a single convolutional net-\nwork on the image, and (3) thresholds the resulting detections by\nthe model’s conﬁdence.'
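Both stores return the same chunk because each one embeds the query and ranks the stored chunk vectors by closeness to it. The sketch below shows that ranking step in plain Python, under loud assumptions: the three-dimensional vectors and the texts are made up, the `cosine_similarity` and `top_k` helpers are hypothetical, and cosine similarity is only one common metric (Chroma and FAISS may be configured with a different one, such as L2 distance).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=1):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up 3-dimensional "embeddings"; real embedding vectors have far more dimensions.
toy_store = [
    ("object detection with a single network", [0.9, 0.1, 0.0]),
    ("author affiliations and emails", [0.0, 0.2, 0.9]),
    ("how the human visual system works", [0.7, 0.6, 0.1]),
]
toy_query = [0.8, 0.5, 0.1]  # stands in for the embedded query string

best = top_k(toy_query, toy_store, k=2)
for score, text in best:
    print(f"{score:.3f}  {text}")
```

In the notebook, `similarity_search(query)` plays the role of `top_k`, and `result[0]` is the highest-ranked chunk.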