## Document Loader
This notebook shows how to load documents from text files, PDFs, web pages, and Wikipedia using LangChain's community document loaders, so that they can be passed on for further processing.

## Getting Started with TextLoader
We load the text from the file `speech.txt` using the `TextLoader` class.

In [15]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader('./speech.txt')
documents = loader.load()
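
`loader.load()` returns a list of document objects, each carrying the text in `page_content` and a `metadata` dictionary. As a rough mental model only, the standard-library sketch below mimics that shape. `SimpleDocument` and `load_text` are hypothetical names used for illustration, not LangChain's implementation, and the explicit UTF-8 encoding is an assumption about the file.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> list:
    """Read a whole text file and wrap it in a one-element list of documents."""
    text = Path(path).read_text(encoding=encoding)
    # Record where the text came from, mirroring the 'source' metadata key
    return [SimpleDocument(page_content=text, metadata={"source": str(path)})]
```

Calling `load_text('./speech.txt')` would give a one-element list whose `metadata['source']` is the file path.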


## Getting Started with PyPDFLoader
`PyPDFLoader` returns one document per page, each with its own `metadata` and `page_content`.

In [14]:
# Load the PDF, then inspect the first page's metadata and text
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader('./RelievingLetter.pdf')
documents = loader.load()

metadata = documents[0].metadata
print(metadata)

content = documents[0].page_content
print(content)


{'producer': 'Microsoft® Word for Microsoft 365', 'creator': 'Microsoft® Word for Microsoft 365', 'creationdate': '2024-07-30T16:42:53+05:30', 'moddate': '2024-07-30T16:42:53+05:30', 'source': './RelievingLetter.pdf', 'total_pages': 1, 'page': 0, 'page_label': '1'}
18-Dec-2019 
 
Dear Mohammed  Shahabuddin 
Employee  ID : 104501551 
This is to confirm that you are relieved from the services of the company effective close of working hours on 15 Nov 2019 
 
We confirm that you have been working for Amazon Development Centre India Pvt. Ltd., Hyderabad from 10 Oct 2018 till 15 Nov 2019 
as a Full Time Employee, and your designation at the time of leaving the organization was CLOUD ENGINEER-I . With 
regards to the settlement of your dues please contact the HR department on queries, if any. 
We would like to take this opportunity to remind you of the clauses pertaining to confidentiality agreement signed by you at the time of 
joining, especially emphasizing on Section 4(b) (i) and (ii) of 
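
Because the PDF loader produces one document per page, a multi-page PDF is usually inspected in a loop rather than through `documents[0]` alone. The sketch below assumes only that each item exposes `.metadata` and `.page_content`, as in the output above. `summarize_pages` is a hypothetical helper, and collapsing whitespace is an illustrative choice to tidy the doubled spaces visible in the extracted text.

```python
from types import SimpleNamespace  # used only for the stand-in document below


def summarize_pages(documents, preview_chars=60):
    """Return (page, character_count, preview) for each document-like object."""
    rows = []
    for doc in documents:
        # Collapse runs of spaces and newlines left over from PDF extraction
        text = " ".join(doc.page_content.split())
        rows.append((doc.metadata.get("page"), len(text), text[:preview_chars]))
    return rows


# Stand-in object with the same two attributes as a loaded page
sample = SimpleNamespace(page_content="Dear  Reader \n Hello", metadata={"page": 0})
print(summarize_pages([sample]))
```

With the notebook's own data, the call would be `summarize_pages(documents)`.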

## Getting Started with WebBaseLoader
We use `WebBaseLoader` to fetch a web page and extract its text.

In [27]:
from langchain_community.document_loaders import WebBaseLoader

url = "https://medium.com/@dwgray/a-very-simple-website-back-to-the-basics-1dffdc43d19b"

loader = WebBaseLoader(
    web_paths=[url]
)

documents = loader.load()

web_page = documents[0]
print(web_page)

page_content='A Very Simple Website: Back to the Basics | by David W. Gray | MediumOpen in appSign upSign inWriteSign upSign inMastodonA Very Simple Website: Back to the BasicsDavid W. GrayFollow7 min read·Sep 16, 2024--ListenShareOne of the students I’m mentoring is taking his first web development class. They ask the students to build a small website from scratch using just HTML, CSS, and JavaScript. I think this is a good thing because it’s worth knowing how the core elements of a website work before diving into a more real-world scenario using a bunch of libraries and build tools to create a site.The thing that got lost in translation was some of the basics of how such a site works. This confusion is something that I’ve seen happen with less experienced engineers who are either right out of college or have been working on desktop applications or embedded systems and are moving into a team that’s doing full-stack development. The last time that happened on a team I was leading, I sa
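
The output above runs navigation text into the article ("MediumOpen in appSign upSign in"). Two `WebBaseLoader` options may help here: `bs_kwargs` is forwarded to BeautifulSoup and can restrict parsing to part of the page, while `bs_get_text_kwargs` is forwarded to `get_text()`. The sketch below is a starting point under stated assumptions: that the article body sits inside an `<article>` tag is a guess about the page's markup, and `load_article` is a hypothetical helper name.

```python
# Forwarded to BeautifulSoup's get_text(): put a space between adjacent
# text nodes and strip surrounding whitespace from each one.
GET_TEXT_KWARGS = {"separator": " ", "strip": True}


def load_article(url: str, tag: str = "article"):
    """Load one page, keeping only text found inside `tag` elements."""
    # Imported lazily so the third-party packages are only needed when called
    import bs4
    from langchain_community.document_loaders import WebBaseLoader

    loader = WebBaseLoader(
        web_paths=[url],
        # Assumption: the main content lives in <article>; adjust per site
        bs_kwargs={"parse_only": bs4.SoupStrainer(tag)},
        bs_get_text_kwargs=GET_TEXT_KWARGS,
    )
    return loader.load()
```

If the chosen tag does not exist on the page, the resulting `page_content` will be empty, so it is worth checking the page source before settling on a selector.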

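## Getting Started with WikipediaLoader
`WikipediaLoader` depends on the third-party `wikipedia` package. Without it, the cell below fails with the `ImportError` shown in its output; install the package with `pip install wikipedia` and re-run the cell.
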
In [28]:
from langchain_community.document_loaders import WikipediaLoader

loader = WikipediaLoader(query="Python (programming language)")
documents = loader.load()
print(documents[0].page_content)

ImportError: Could not import wikipedia python package. Please install it with `pip install wikipedia`.
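
One way to avoid hitting this error halfway through a notebook is to check for the dependency before building the loader. The sketch below uses the standard library's `importlib.util.find_spec`. `is_installed` and `load_wikipedia` are hypothetical helper names, and limiting the result to one article through `load_max_docs` is an illustrative choice.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    return importlib.util.find_spec(package) is not None


def load_wikipedia(query: str, max_docs: int = 1):
    """Load Wikipedia articles for `query`, failing early with a clear message."""
    if not is_installed("wikipedia"):
        raise ImportError("Missing dependency: run `pip install wikipedia` first.")
    # Imported lazily so the check above runs before anything else can fail
    from langchain_community.document_loaders import WikipediaLoader

    return WikipediaLoader(query=query, load_max_docs=max_docs).load()
```

After installing the package, `load_wikipedia("Python (programming language)")` would run the same query as the cell above.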