In [1]:
## Data Ingestion
'''Data ingestion is the process of collecting and processing data from various sources, such as databases, APIs, files, and web pages.'''
from langchain_community.document_loaders import TextLoader

loader = TextLoader("speech.txt")
text_documents = loader.load()

text_documents

[Document(metadata={'source': 'speech.txt'}, page_content='I have a dream that one day down in Alabama, with its vicious racists, with its governor having his lips dripping with the words of interposition and nullification – one day right there in Alabama little black boys and black girls will be able to join hands with little white boys and white girls as sisters and brothers.\nI have a dream today.\nI have a dream that one day every valley shall be exalted, and every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight, and the glory of the Lord shall be revealed and all flesh shall see it together.\nThis is our hope. This is the faith that I go back to the South with. With this faith we will be able to hew out of the mountain of despair a stone of hope. With this faith we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith we will be able to work toge
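
The output above shows the shape a loader returns: a list of documents, each with `page_content` and a `metadata` dict that records the source. A minimal stdlib-only sketch of that idea follows; `SimpleDocument` and `load_text` are hypothetical stand-ins for illustration, not LangChain's own classes.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Illustrative stand-in for a loaded document (not LangChain's Document)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path, encoding="utf-8"):
    """Read one text file and wrap it in a single-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Usage: write a small sample file, then load it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("I have a dream today.", encoding="utf-8")
    sample_docs = load_text(sample)

print(sample_docs[0].metadata)
```

Returning a list even for a single file keeps the interface uniform with loaders that yield many documents, such as a PDF loader that emits one document per page.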

In [3]:
import os
from dotenv import load_dotenv
load_dotenv()

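### Set up the environment variables
os.environ["LANGCHAIN_TRACING_V2"] = 'true'
# os.environ only accepts strings, so copy a value only if it is actually set
for target, source in [("LANGSMITH_API_KEY", "LANGCHAIN_API_KEY"),
                       ("LANGSMITH_ENDPOINT", "LANGSMITH_ENDPOINT")]:
    if os.getenv(source) is not None:
        os.environ[target] = os.getenv(source)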

Python-dotenv could not parse statement starting at line 1
Python-dotenv could not parse statement starting at line 6


In [7]:
### Web based loaders

from langchain_community.document_loaders import WebBaseLoader
import bs4

## Load data from web
loader = WebBaseLoader(web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
                       bs_kwargs = dict(parse_only=bs4.SoupStrainer(
                           class_ = ('post-title','post-content','post-header')
                       )))
textdocuments = loader.load()

textdocuments

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

In [11]:
### PDF loader
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('superpoint.pdf')

docs = loader.load()

In [14]:
## Transform data
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size = 1000, chunk_overlap= 200)
documents = text_splitter.split_documents(docs)
documents[:5]

[Document(metadata={'producer': 'pdfTeX-1.40.17', 'creator': 'LaTeX with hyperref package', 'creationdate': '2018-04-20T01:09:53+00:00', 'author': '', 'keywords': '', 'moddate': '2018-04-20T01:09:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'superpoint.pdf', 'total_pages': 13, 'page': 0, 'page_label': '1'}, page_content='SuperPoint: Self-Supervised Interest Point Detection and Description\nDaniel DeTone\nMagic Leap\nSunnyvale, CA\nddetone@magicleap.com\nTomasz Malisiewicz\nMagic Leap\nSunnyvale, CA\ntmalisiewicz@magicleap.com\nAndrew Rabinovich\nMagic Leap\nSunnyvale, CA\narabinovich@magicleap.com\nAbstract\nThis paper presents a self-supervised framework for\ntraining interest point detectors and descriptors suitable\nfor a large number of multiple-view geometry problems in\ncomputer vision. As opposed to patch-based neural net-\nworks, our fully-convolu
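
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window over characters. This is a simplified sketch, not how `RecursiveCharacterTextSplitter` works internally (the real splitter prefers to break on separators such as paragraph breaks and newlines before falling back to raw characters); `sliding_chunks` is a hypothetical helper.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


# Tiny example so the overlap is easy to see.
print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

Each chunk repeats the tail of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.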

In [22]:
## Vector embeddings
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

db = Chroma.from_documents(documents[:15], OllamaEmbeddings(model="llama3.2"))

In [30]:
## Query the vector database

query = "Instead of using human supervision to define interest points in real images, we present a self-supervised solution"
result = db.similarity_search(query=query)
result[1].page_content

'input image multiple times to help an interest point detec-\ntor see the scene from many different viewpoints and scales\n(see Section 5). We use Homographic Adaptation in con-\njunction with the MagicPoint detector to boost the perfor-\nmance of the detector and generate the pseudo-ground truth\ninterest points (see Figure 2b). The resulting detections are\nmore repeatable and ﬁre on a larger set of stimuli; thus we\nnamed the resulting detector SuperPoint.\nThe most common step after detecting robust and repeat-\nable interest points is to attach a ﬁxed dimensional descrip-\ntor vector to each point for higher level semantic tasks,e.g.,\nimage matching. Thus we lastly combine SuperPoint with\na descriptor subnetwork (see Figure 2c). Since the Super-\nPoint architecture consists of a deep stack of convolutional\nlayers which extract multi-scale features, it is straightfor-\nward to then combine the interest point network with an ad-\nditional subnetwork that computes interest point d
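
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it, best match first, so `result[0]` is the top hit and `result[1]` the runner-up. The toy sketch below uses hand-written 2-D vectors and cosine similarity for simplicity; the vectors and the `top_k` helper are illustrative assumptions, and real stores such as Chroma or FAISS use their own indexes and may use a different metric.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=4):
    """Return the k texts whose vectors score highest against query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, embedding) pairs; real embeddings have many more dimensions.
toy_store = [
    ("interest point detection", [1.0, 0.1]),
    ("descriptor matching", [0.7, 0.7]),
    ("unrelated text", [0.0, 1.0]),
]
print(top_k([1.0, 0.0], toy_store, k=2))
```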

In [35]:
## FAISS Vector DB

from langchain_community.vectorstores import FAISS

db_faiss = FAISS.from_documents(documents[:30],OllamaEmbeddings(model="llama3.2"))

In [34]:
result = db_faiss.similarity_search(query=query)
result[0].page_content

'structures, we are similar to UCN [3] and to a lesser extent\nDeepDesc [6]; however, both do not perform any interest\npoint detection. On the other end, LIFT [32], a recently in-\ntroduced convolutional replacement for SIFT stays close to\nthe traditional patch-based detect then describe recipe. The\nLIFT pipeline contains interest point detection, orientation\nestimation and descriptor computation, but additionally re-\nquires supervision from a classical SfM system. These dif-\nferences are summarized in Table 1.\nInterest\nPoints? Descriptors? Full Image\nInput?\nSingle\nNetwork?\nReal\nTime?\nSuperPoint (ours) \x13 \x13 \x13 \x13 \x13\nLIFT [32] \x13 \x13\nUCN [3] \x13 \x13 \x13\nTILDE [29] \x13 \x13\nDeepDesc [6] \x13 \x13\nSIFT \x13 \x13\nORB \x13 \x13 \x13\nTable 1.Qualitative Comparison to Relevant Methods.Our Su-\nperPoint method is the only one to compute both interest points\nand descriptors in a single network in real-time.\nOn the other extreme of the supervision spectru

In [38]:
### CHAIN AND RETRIEVAL
from langchain_ollama import OllamaLLM

llm = OllamaLLM(model='llama3.2')

In [39]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
                                          Answer the following question based only on the provided context.
                                          Think step by step and provide a clear and concise answer.
                                          I will tip you $20 if the user finds the correct answer.
                                          <context>
                                          {context}
                                          </context>
                                          Question: {input}
                                          """)

In [40]:
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
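
A "stuff" chain, in essence, concatenates the text of every supplied document into the `{context}` slot of the prompt and then makes a single LLM call. A rough stdlib-only sketch of that formatting step is shown here; `stuff_prompt`, `TEMPLATE`, and the blank-line separator are illustrative assumptions, not LangChain's implementation.

```python
TEMPLATE = (
    "Answer the following question based only on the provided context.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def stuff_prompt(page_contents, question, separator="\n\n"):
    """Join all document texts and fill the prompt template in one pass."""
    context = separator.join(page_contents)
    return TEMPLATE.format(context=context, input=question)


print(stuff_prompt(
    ["SuperPoint is self-supervised.", "It outputs descriptors."],
    "What is SuperPoint?",
))
```

Because everything goes into one prompt, this approach only works while the combined chunks fit in the model's context window.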

In [41]:
'''Retrievers: An interface that returns documents given an unstructured query.'''

retriever = db_faiss.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x143108750>, search_kwargs={})

In [42]:
## Retriever Chain
from langchain.chains import create_retrieval_chain

retriever_chain = create_retrieval_chain(retriever,document_chain)

In [44]:
retriever_chain.invoke({'input': 'What is SuperPoint?'})

{'input': 'What is SuperPoint?',
 'context': [Document(id='09435ded-71f7-437a-809c-7aa458e15ea8', metadata={'producer': 'pdfTeX-1.40.17', 'creator': 'LaTeX with hyperref package', 'creationdate': '2018-04-20T01:09:53+00:00', 'author': '', 'keywords': '', 'moddate': '2018-04-20T01:09:53+00:00', 'ptex.fullbanner': 'This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) kpathsea version 6.2.2', 'subject': '', 'title': '', 'trapped': '/False', 'source': 'superpoint.pdf', 'total_pages': 13, 'page': 4, 'page_label': '5'}, page_content='input image and combines the results – a process we call\nHomographic Adaptation (see Figure 5).\n5.1. Formulation\nHomographies give exact or almost exact image-to-\nimage transformations for camera motion with only rotation\naround the camera center, scenes with large distances to ob-\njects, and planar scenes. Moreover, because most of the\nworld is reasonably planar, a homography is good model\nfor what happens when the same 3D point is seen from d
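
The result dictionary above carries the original `input` and the retrieved `context`; the generated text is stored under the `answer` key. The retrieve-then-generate flow can be sketched with stand-in components; `run_retrieval_chain`, `fake_retriever`, and `fake_llm` are hypothetical and only mimic the shape of the output.

```python
def run_retrieval_chain(retriever, combine_docs, question):
    """Fetch documents for the question, then pass them to the answer step."""
    docs = retriever(question)
    answer = combine_docs({"input": question, "context": docs})
    return {"input": question, "context": docs, "answer": answer}


def fake_retriever(question):
    # Stand-in for a vector store lookup.
    return ["SuperPoint detects interest points and computes descriptors."]


def fake_llm(inputs):
    # Stand-in for the document chain: echoes the first context chunk.
    return inputs["context"][0]


chain_result = run_retrieval_chain(fake_retriever, fake_llm, "What is SuperPoint?")
print(chain_result["answer"])
```

With the real chain, reading `retriever_chain.invoke(...)["answer"]` avoids printing the full list of context documents.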