# Document Loader

- A Document Loader in LangChain loads external data into your program so that language models can process it. Loaders exist for many data types and sources, such as PDFs, websites, and text files.

- The goal is to turn unstructured or semi-structured data into a consistent format (Document objects, each holding `page_content` and `metadata`) that LangChain tools such as retrievers or chains can work with.
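
As a rough illustration of the idea above, the sketch below mimics what a loader does using only the standard library: read a source and wrap the text in an object that carries `page_content` and `metadata`. `SimpleDocument` and `load_text_file` are hypothetical names invented for this example; they are not LangChain classes, and the real `Document` and loaders do more.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a Document: raw text plus descriptive metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text_file(path, encoding="utf-8"):
    """Read one file and return it as a one-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("Cricket is played with a bat and a ball.", encoding="utf-8")
    docs = load_text_file(sample)

print(len(docs))
print(docs[0].page_content)
```

Returning a list even for a single file keeps the calling code the same whether a source yields one document or many.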

## Text Loader

In [7]:
import os

from langchain_community.document_loaders import TextLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

# Read the API key from the environment instead of hardcoding it in the notebook.
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    temperature=0,
    google_api_key=os.environ["GOOGLE_API_KEY"],
)

prompt = PromptTemplate(
    template='Write a summary for the following poem - \n {poem}',
    input_variables=['poem']
)

parser = StrOutputParser()

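# Load the file; TextLoader returns a list containing a single Document.
loader = TextLoader('/content/cricket.txt', encoding='utf-8')

docs = loader.load()

# Pipe the prompt into the model, then parse the reply into a plain string.
chain = prompt | llm | parser

print(chain.invoke({'poem': docs[0].page_content}))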


This poem is a celebration of cricket, capturing its essence from its humble beginnings to its global appeal. It describes the excitement and tension of a match, the skill and strategy involved, and the passion it ignites in players and fans alike. It highlights the unifying power of the sport, transcending borders and generations, and emphasizes the values of sportsmanship, pride, and the enduring spirit of the game. Ultimately, the poem portrays cricket as more than just a sport; it's a cultural phenomenon, a source of shared memories, and a timeless tradition that will continue to inspire and unite people for generations to come.


## PDF Loader

In [10]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('/content/dl-curriculum.pdf')

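docs = loader.load()

# PyPDFLoader produces one Document per page.
print(len(docs))

print(docs[0].page_content)  # text of the first page
print(docs[1].metadata)      # metadata of the second page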

23
CampusXDeepLearningCurriculum
A.ArtificialNeuralNetworkandhowtoimprovethem
1.BiologicalInspiration
● Understandingtheneuronstructure● Synapsesandsignaltransmission● Howbiologicalconceptstranslatetoartificialneurons
2.HistoryofNeuralNetworks
● Earlymodels(Perceptron)● BackpropagationandMLPs● The"AIWinter"andresurgenceofneuralnetworks● Emergenceofdeeplearning
3.PerceptronandMultilayerPerceptrons(MLP)
● Single-layerperceptronlimitations● XORproblemandtheneedforhiddenlayers● MLParchitecture
4. LayersandTheirFunctions
● InputLayer○ Acceptinginputdata● HiddenLayers○ Featureextraction● OutputLayer○ Producingfinalpredictions
5.ActivationFunctions
{'producer': 'Skia/PDF m131 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Deep Learning Curriculum', 'source': '/content/dl-curriculum.pdf', 'total_pages': 23, 'page': 1, 'page_label': '2'}
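
The output above suggests that the loader returns one document per page (23 in total) and that `metadata['page']` is zero-based while `page_label` holds the printed page number. Assuming that layout, the helpers below (hypothetical names, not part of LangChain) show how per-page documents could be stitched back together or looked up by label. They rely only on `.page_content` and `.metadata`, so `SimpleNamespace` objects stand in for real documents here.

```python
from types import SimpleNamespace


def join_pages(docs, separator="\n\n"):
    """Concatenate page texts in ascending order of the zero-based page index."""
    ordered = sorted(docs, key=lambda d: d.metadata.get("page", 0))
    return separator.join(d.page_content for d in ordered)


def find_by_label(docs, page_label):
    """Return the first document whose page_label matches, or None."""
    wanted = str(page_label)
    for d in docs:
        if d.metadata.get("page_label") == wanted:
            return d
    return None


# Made-up pages, deliberately out of order.
pages = [
    SimpleNamespace(page_content="second page", metadata={"page": 1, "page_label": "2"}),
    SimpleNamespace(page_content="first page", metadata={"page": 0, "page_label": "1"}),
]

full_text = join_pages(pages)
print(full_text)
print(find_by_label(pages, 2).page_content)
```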


## Directory Loader

In [12]:
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader

loader = DirectoryLoader(
    path='/content/DATA',
    glob='*.pdf',
    loader_cls=PyPDFLoader
)

# lazy_load() yields documents one at a time instead of building the whole list up front.
docs = loader.lazy_load()

for document in docs:
    print(document.metadata)

{'producer': 'Skia/PDF m131 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Deep Learning Curriculum', 'source': '/content/DATA/dl-curriculum.pdf', 'total_pages': 23, 'page': 0, 'page_label': '1'}
{'producer': 'Skia/PDF m131 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Deep Learning Curriculum', 'source': '/content/DATA/dl-curriculum.pdf', 'total_pages': 23, 'page': 1, 'page_label': '2'}
{'producer': 'Skia/PDF m131 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Deep Learning Curriculum', 'source': '/content/DATA/dl-curriculum.pdf', 'total_pages': 23, 'page': 2, 'page_label': '3'}
{'producer': 'Skia/PDF m131 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Deep Learning Curriculum', 'source': '/content/DATA/dl-curriculum.pdf', 'total_pages': 23, 'page': 3, 'page_label': '4'}
{'producer': 'Skia/PDF m131 Google Docs Renderer', 'creator': 'PyPDF', 'creationdate': '', 'title': 'Deep Learni
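
The cell above iterates over `lazy_load()` rather than calling `load()`. The difference can be illustrated with a plain generator: documents are produced one at a time as the loop asks for them, instead of being collected into a list first. This is a standard-library sketch with hypothetical function names, working on `.txt` files for simplicity; it is not how `DirectoryLoader` is implemented.

```python
from pathlib import Path
import tempfile


def lazy_load_texts(directory, pattern="*.txt"):
    """Yield one {'page_content', 'metadata'} dict per matching file, on demand."""
    for path in sorted(Path(directory).glob(pattern)):
        yield {
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": path.name},
        }


def load_texts(directory, pattern="*.txt"):
    """Eager variant: materialise every document in a list."""
    return list(lazy_load_texts(directory, pattern))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "notes.md"):
        (Path(tmp) / name).write_text(f"contents of {name}", encoding="utf-8")

    stream = lazy_load_texts(tmp)
    first = next(stream)      # only a.txt has been read at this point
    remaining = list(stream)  # reads b.txt; notes.md does not match the glob
    eager = load_texts(tmp)

print(first["metadata"]["source"], len(remaining), len(eager))
```

A generator can be consumed only once, so a second pass over the documents needs a fresh call.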

## Web Loader

In [13]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

# `llm` is the ChatGoogleGenerativeAI instance created in the Text Loader cell.

prompt = PromptTemplate(
    template='Answer the following question \n {question} from the following text - \n {text}',
    input_variables=['question','text']
)

parser = StrOutputParser()

url = 'https://www.flipkart.com/apple-macbook-air-m2-16-gb-256-gb-ssd-macos-sequoia-mc7x4hn-a/p/itmdc5308fa78421'
loader = WebBaseLoader(url)

docs = loader.load()


chain = prompt | llm | parser

print(chain.invoke({'question': 'What is the product that we are talking about?', 'text': docs[0].page_content}))



The text "Site is overloaded" doesn't tell us what product is being discussed. It only indicates that a website or online service is experiencing too much traffic or demand.
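
In the run above the site returned an error page ("Site is overloaded") instead of the product listing, so the model had nothing useful to answer from. One possible safeguard is to sanity-check the fetched text before invoking the chain. The function name, the length threshold, and the marker phrases below are assumptions chosen for illustration, and the sample strings are made up; tune them for the sites you scrape.

```python
def looks_like_real_page(text, min_chars=200,
                         block_markers=("site is overloaded", "access denied")):
    """Heuristic check: long enough and free of known error-page phrases."""
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    if len(cleaned) < min_chars:
        return False
    lowered = cleaned.lower()
    return not any(marker in lowered for marker in block_markers)


# Illustrative inputs: a short error page and a longer product-like text.
blocked = "Site is overloaded"
plausible = "Apple MacBook Air M2 with 16 GB RAM and 256 GB SSD. " * 10

print(looks_like_real_page(blocked))
print(looks_like_real_page(plausible))
```

In the notebook, a check like this could guard the `chain.invoke(...)` call on `docs[0].page_content`, so that an error page is reported rather than summarised.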


## CSV Loader

In [14]:
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(file_path='/content/Social_Network_Ads.csv')

docs = loader.load()

# CSVLoader creates one Document per row.
print(len(docs))
print(docs[1])

400
page_content='User ID: 15810944
Gender: Male
Age: 35
EstimatedSalary: 20000
Purchased: 0' metadata={'source': '/content/Social_Network_Ads.csv', 'row': 1}
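
The printed document shows the shape of the output: one document per row, with `column: value` lines as the text and the row index in the metadata. The sketch below reproduces that shape with the standard `csv` module to make the mapping explicit. `rows_to_documents` is a hypothetical helper and the sample rows are invented; the real `CSVLoader` has more options.

```python
import csv
import io


def rows_to_documents(csv_text, source="<memory>"):
    """Turn each CSV row into a dict with 'column: value' text and row metadata."""
    reader = csv.DictReader(io.StringIO(csv_text))
    documents = []
    for index, row in enumerate(reader):
        content = "\n".join(f"{column}: {value}" for column, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": index}}
        )
    return documents


# Invented sample data, not taken from Social_Network_Ads.csv.
sample_csv = "Name,Age\nAsha,31\nRavi,27\n"
csv_docs = rows_to_documents(sample_csv)

print(len(csv_docs))
print(csv_docs[1]["page_content"])
print(csv_docs[1]["metadata"])
```

Because every row becomes its own document, a retriever built on top of this can return individual records instead of the whole file.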
