In [2]:
## Data Ingestion

from langchain_community.document_loaders import TextLoader
loader = TextLoader("speech.txt")
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'speech.txt'}, page_content='"Attention Is All You Need"[1] is a 2017 landmark[2][3] research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al.[4] It is considered a foundational[5] paper in modern artificial intelligence, as the transformer approach has become the main architecture of large language models like those based on GPT.[6][7] At the time, the focus of the research was on improving Seq2seq techniques for machine translation, but the authors go further in the paper, foreseeing the technique\'s potential for other tasks like question answering and what is now known as multimodal Generative AI.[1]\n\nThe paper\'s title is a reference to the song "All You Need Is Love" by the Beatles.[8] The name "Transformer" was picked because Jakob Uszkoreit, one of the paper\'s authors, liked

In [3]:
import os
from dotenv import load_dotenv

load_dotenv()

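# load_dotenv() has already exported the key if it is in .env. Assigning
# os.getenv(...) back into os.environ raises TypeError when the key is missing,
# so check for it explicitly instead.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")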


In [8]:
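# Web-based loader
from langchain_community.document_loaders import WebBaseLoader
import bs4

loader = WebBaseLoader(
    web_paths=("https://akramboutzouga.medium.com/understanding-the-transformers-architecture-attention-is-all-you-need-paper-reading-a0e9ae2cd8aa",),
    # parse_only restricts parsing to elements matching the given CSS class.
    # Note that ("a b c") is a plain string, not a tuple; swap in the class
    # names used by the page you are scraping.
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_="a b c")),
)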
text_documents = loader.load()


In [9]:
text_documents

[Document(metadata={'source': 'https://akramboutzouga.medium.com/understanding-the-transformers-architecture-attention-is-all-you-need-paper-reading-a0e9ae2cd8aa'}, page_content='Open in appSign upSign inWriteSign upSign inUnderstanding The Transformers architecture: “Attention is all you need”, paper readingAKRAM BOUTZOUGA·Follow14 min read·Dec 15, 2023--ListenSharePassing by AI ideas and looking back at the most fascinating ideas that come in the field of AI in general that I’ve come across and found so interesting to grasp and try to understand deeply, I think it is the Transformer architecture, we always had those different achievements in the field of AI that come with new inventions of methodologies that help neural networks in processing any input you feed it, could be video, images or speech or text, to use it in different tasks like next word prediction or detecting a cat in an image. So Recently, we notice this convergence to the transformer architecture as the new promise fo

In [11]:
## PDF loader
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("attention.pdf")
text_documents = loader.load()
text_documents

[Document(metadata={'source': 'attention.pdf', 'page': 0}, page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser ∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe super

In [13]:
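from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split the pages into chunks of at most 1000 characters, with up to
# 200 characters of overlap between consecutive chunks.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(text_documents)
documents[:5]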

[Document(metadata={'source': 'attention.pdf', 'page': 0}, page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.com\nNoam Shazeer∗\nGoogle Brain\nnoam@google.com\nNiki Parmar∗\nGoogle Research\nnikip@google.com\nJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.com\nAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.edu\nŁukasz Kaiser ∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe super
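
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. The sketch below is an assumption-laden stand-in, not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraphs and newlines). `split_with_overlap` is a hypothetical helper that slides a fixed-size character window over the text.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: fixed-size character windows that share an overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk begins with the last four characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk, provided it is shorter than the overlap.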

In [None]:
# Vector embeddings and vector store

from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Embed and index only the first 20 chunks.
db = Chroma.from_documents(documents[:20], OpenAIEmbeddings())


In [None]:
query = 'Who are the authors of the "Attention Is All You Need" paper?'
result = db.similarity_search(query)
result
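
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that idea with hand-made 3-dimensional vectors and Euclidean distance. The vectors, the `toy_store` contents, and the `top_k` helper are all illustrative assumptions: real embeddings have many more dimensions, and the metric and index structure a given vector store uses depend on its configuration.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Hypothetical helper: return the k texts whose vectors are nearest to the query."""
    ranked = sorted(store, key=lambda item: euclidean(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Made-up (text, vector) pairs standing in for embedded chunks.
toy_store = [
    ("authors and affiliations", [0.9, 0.1, 0.0]),
    ("attention mechanism", [0.1, 0.9, 0.1]),
    ("training details", [0.0, 0.2, 0.9]),
]

query_vec = [1.0, 0.0, 0.0]  # stand-in for the embedded query
nearest = top_k(query_vec, toy_store, k=2)
print(nearest)
```

This brute-force version compares the query against every stored vector, which is fine for 20 chunks; dedicated vector stores exist largely to make the same lookup fast at much larger scale.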

In [None]:
## FAISS vector store

from langchain_community.vectorstores import FAISS

# Build a second index over the same 20 chunks and run the same query.
db1 = FAISS.from_documents(documents[:20], OpenAIEmbeddings())
query = 'Who are the authors of the "Attention Is All You Need" paper?'
result = db1.similarity_search(query)
result