# Data Ingestion

### Text Loader

In [7]:
from langchain_community.document_loaders import TextLoader
text_loader = TextLoader('speech.txt')
text_loader.load()

[Document(page_content='In the realm of data, machines learn and grow, Through algorithms and patterns they show. They sift through information, finding trends, Predicting outcomes, solving complex bends. With each iteration, they refine their code, Unraveling mysteries, unlocking the code. From self-driving cars to medical cures, Machine learning technology ensures A brighter future, where possibilities soar, Guided by data, like never before.', metadata={'source': 'speech.txt'})]

### Web Content Loader

In [10]:
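from dotenv import load_dotenv
import os

# Read variables from a local .env file into the process environment.
load_dotenv()

# load_dotenv() already exports OPENAI_API_KEY when it is defined in .env;
# fail early with a clear message instead of assigning None to os.environ.
if not os.getenv('OPENAI_API_KEY'):
    raise RuntimeError('OPENAI_API_KEY is not set')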

In [18]:
from langchain_community.document_loaders import WebBaseLoader
import bs4

# Parse only the post title, header, and body; skip the rest of the page markup.
web_loader = WebBaseLoader(
    'https://lilianweng.github.io/posts/2020-10-29-odqa/',
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(
        class_=('post-title', 'post-content', 'post-header'))),
)
text_document = web_loader.load()
text_document[0].page_content

'\n\n      How to Build an Open-Domain Question Answering System?\n    \nDate: October 29, 2020  |  Estimated Reading Time: 33 min  |  Author: Lilian Weng\n\n\n\n[Updated on 2020-11-12: add an example on closed-book factual QA using OpenAI API (beta).\nA model that can answer any question with regard to factual knowledge can lead to many useful and practical applications, such as working as a chatbot or an AI assistant🤖. In this post, we will review several common approaches for building such an open-domain question answering system.\nDisclaimers given so many papers in the wild:\n\nAssume we have access to a powerful pretrained language model.\nWe do not cover how to use structured knowledge base (e.g. Freebase, WikiData) here.\nWe only focus on a single-turn QA instead of a multi-turn conversation style QA.\nWe mostly focus on QA models that contain neural networks, specially Transformer-based language models.\nI admit that I missed a lot of papers with architectures designed specifi

### PDF Loader

In [31]:
from langchain_community.document_loaders import PyPDFLoader
pdf_loader = PyPDFLoader('https://www.just.edu.jo/~mqais/CIS99/PDF/Ch.01_Introduction_%20to_computers.pdf')
docs = pdf_loader.load()
docs

[Document(page_content=' 1 Chapter One \nIntroduction to Computer  \n \nComputer  \nA computer is an electronic device, operating under the control of instructions stored \nin its own memory that can accept data (input), process the data according to specified \nrules, produce information (output), and store the information for future use1. \n \nFunctionalities of a computer2  \nAny digital computer carries out five functions in gross terms:  \n \n \n \n \n \n \n \n \n \nComputer Components  \nAny kind of computers consists of HARDWARE AND SOFTWARE . \n \nHardware:  \nComputer hardware is the collection of  physical elements that constitutes a computer \nsystem. Computer hardware refers to the physical parts or components of a computer \nsuch as the monitor, mouse, keyboard, computer data storage, hard drive disk (HDD), \nsystem unit (graphic cards, sound cards, m emory, motherboard and chips), etc. all of \nwhich are physical objects that can be touched .3 \n                          

In [58]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Split the loaded pages into chunks of up to 1000 characters, with up to 20 characters of overlap.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(docs)
documents[:5]


[Document(page_content='1 Chapter One \nIntroduction to Computer  \n \nComputer  \nA computer is an electronic device, operating under the control of instructions stored \nin its own memory that can accept data (input), process the data according to specified \nrules, produce information (output), and store the information for future use1. \n \nFunctionalities of a computer2  \nAny digital computer carries out five functions in gross terms:  \n \n \n \n \n \n \n \n \n \nComputer Components  \nAny kind of computers consists of HARDWARE AND SOFTWARE . \n \nHardware:  \nComputer hardware is the collection of  physical elements that constitutes a computer \nsystem. Computer hardware refers to the physical parts or components of a computer \nsuch as the monitor, mouse, keyboard, computer data storage, hard drive disk (HDD), \nsystem unit (graphic cards, sound cards, m emory, motherboard and chips), etc. all of \nwhich are physical objects that can be touched .3', metadata={'source': 'https:
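To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and newlines); the function name `split_with_overlap` and the sample string are hypothetical and only illustrate how consecutive chunks can share characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample input, chosen so the overlap is easy to see.
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so text that straddles a chunk boundary still appears intact in at least one chunk, provided it is no longer than the overlap.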

In [59]:
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
db = Chroma.from_documents(documents, OpenAIEmbeddings())

In [62]:
## Chroma Vector Store
query = "Characteristics of Computer"
db.similarity_search(query, k=5)

[Document(page_content='1 Chapter One \nIntroduction to Computer  \n \nComputer  \nA computer is an electronic device, operating under the control of instructions stored \nin its own memory that can accept data (input), process the data according to specified \nrules, produce information (output), and store the information for future use1. \n \nFunctionalities of a computer2  \nAny digital computer carries out five functions in gross terms:  \n \n \n \n \n \n \n \n \n \nComputer Components  \nAny kind of computers consists of HARDWARE AND SOFTWARE . \n \nHardware:  \nComputer hardware is the collection of  physical elements that constitutes a computer \nsystem. Computer hardware refers to the physical parts or components of a computer \nsuch as the monitor, mouse, keyboard, computer data storage, hard drive disk (HDD), \nsystem unit (graphic cards, sound cards, m emory, motherboard and chips), etc. all of \nwhich are physical objects that can be touched .3', metadata={'source': 'https:

In [61]:
## FAISS Vector Store
from langchain_community.vectorstores import FAISS
db = FAISS.from_documents(documents, OpenAIEmbeddings())
query = "Characteristics of Computer"
db.similarity_search(query, k=5)

[Document(page_content='1 Chapter One \nIntroduction to Computer  \n \nComputer  \nA computer is an electronic device, operating under the control of instructions stored \nin its own memory that can accept data (input), process the data according to specified \nrules, produce information (output), and store the information for future use1. \n \nFunctionalities of a computer2  \nAny digital computer carries out five functions in gross terms:  \n \n \n \n \n \n \n \n \n \nComputer Components  \nAny kind of computers consists of HARDWARE AND SOFTWARE . \n \nHardware:  \nComputer hardware is the collection of  physical elements that constitutes a computer \nsystem. Computer hardware refers to the physical parts or components of a computer \nsuch as the monitor, mouse, keyboard, computer data storage, hard drive disk (HDD), \nsystem unit (graphic cards, sound cards, m emory, motherboard and chips), etc. all of \nwhich are physical objects that can be touched .3', metadata={'source': 'https:
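
The `similarity_search(query, k=5)` calls above return the `k` stored chunks whose embeddings are closest to the query embedding. The sketch below shows that idea with hand-written toy vectors and cosine similarity. It is an assumption-laden illustration: the vectors, texts, and the helper names `cosine_similarity` and `top_k` are hypothetical, real embeddings have far more dimensions, and Chroma or FAISS may be configured with a different distance metric and use approximate indexes rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Brute-force search: score every stored vector and keep the k best texts."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical 3-dimensional "embeddings" for three chunks.
toy_store = [
    ("chunk about hardware", [0.9, 0.1, 0.0]),
    ("chunk about software", [0.1, 0.9, 0.0]),
    ("chunk about characteristics", [0.0, 0.2, 0.9]),
]
query_vector = [0.0, 0.1, 1.0]  # hypothetical embedding of the query

results = top_k(query_vector, toy_store, k=2)
print(results)
```

A vector store performs the same ranking, but the vectors come from an embedding model such as `OpenAIEmbeddings()` and the search is backed by an index instead of a linear scan.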