- **TextLoader**: Handles plain text files
- **PyPDFLoader**: Specializes in PDF files, offering easy access to content and metadata
- **SeleniumURLLoader**: Loads HTML documents from URLs that require JavaScript rendering

### TextLoader

In [3]:
from langchain.document_loaders import TextLoader

loader = TextLoader('my_file.txt')
documents = loader.load()

In [4]:
documents

[Document(page_content='Google opens up its AI language model PaLM to challenge OpenAI and GPT-3\nGoogle is offering developers access to one of its most advanced AI language models: PaLM.\nThe search giant is launching an API for PaLM alongside a number of AI enterprise tools\nit says will help businesses “generate text, images, code, videos, audio, and more from\nsimple natural language prompts.”\n\nPaLM is a large language model, or LLM, similar to the GPT series created by OpenAI or\nMeta’s LLaMA family of models. Google first announced PaLM in April 2022. Like other LLMs,\nPaLM is a flexible system that can potentially carry out all sorts of text generation and\nediting tasks. You could train PaLM to be a conversational chatbot like ChatGPT, for\nexample, or you could use it for tasks like summarizing text or even writing code.\n(It’s similar to features Google also announced today for its Workspace apps like Google\nDocs and Gmail.)\n', metadata={'source': 'my_file.txt'})]
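
The output above shows the shape of what a text loader returns: a list with one document that holds the file's text in `page_content` and the file path under the `source` key of `metadata`. As a rough illustration of that shape only, here is a minimal standard-library sketch. `SimpleDocument` and `load_text` are hypothetical names made up for this example, not LangChain's classes or its actual implementation, and the sample file contents are invented.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a loaded document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str, encoding: str = "utf-8") -> List[SimpleDocument]:
    """Read one text file and wrap it in a single-element list."""
    text = Path(path).read_text(encoding=encoding)
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Write a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "my_file.txt"
    sample.write_text("first line\nsecond line\n", encoding="utf-8")
    docs = load_text(str(sample))

print(len(docs))                 # number of documents returned
print(docs[0].page_content)      # the file's full text
print(docs[0].metadata)          # where the text came from
```

The point of the sketch is that downstream steps can rely on two fields per document, the text and a metadata dictionary, regardless of which loader produced them.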

### PyPDFLoader (PDF)

In [5]:
!pip install -q pypdf

In [None]:
from langchain.document_loaders import PyPDFLoader

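loader = PyPDFLoader("example_data/layout-parser-paper.pdf")
# load_and_split() loads the PDF and splits its text into chunks
# using the default text splitter.
pages = loader.load_and_split()

# Print the first chunk, including its metadata.
print(pages[0])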

### SeleniumURLLoader (URL)

In [6]:
!pip install -q unstructured selenium

In [7]:
from langchain.document_loaders import SeleniumURLLoader

urls = [
    "https://www.youtube.com/watch?v=TFa539R09EQ&t=139s",
    "https://www.youtube.com/watch?v=6Zv6A_9urh4&t=112s"
]

loader = SeleniumURLLoader(urls=urls)
data = loader.load()

print(data[0])

page_content='NaN / NaN\n\nNaN / NaN\n\nVN\n\nBỏ qua điều hướng\n\nTìm kiếm\n\nTìm kiếm bằng giọng nói\n\nĐăng nhập\n\nVN\n\nOPENASSISTANT TAKES ON CHATGPT!\n\nTìm kiếm\n\nThông tin\n\nMua sắm\n\nXem sau\n\nChia sẻ\n\nSao chép đường liên kết\n\nNhấn để bật tiếng\n\n2x\n\nNếu phát lại không bắt đầu ngay, hãy thử khởi động lại thiết bị của bạn.\n\nTiếp theo\n\nTrực tiếpSắp diễn ra\n\nPhát ngay\n\nMachine Learning Street Talk\n\nĐăng ký\n\nĐã đăng ký\n\nBạn đã đăng xuất\n\nCác video mà bạn xem có thể được thêm vào nhật ký xem và gây ảnh hưởng đến phần đề xuất trên TV. Để tránh điều này, hãy hủy rồi đăng nhập vào YouTube trên máy tính.\n\nChia sẻ\n\nĐã xảy ra lỗi trong khi truy xuất thông tin chia sẻ. Vui lòng thử lại sau.\n\n2:19\n\n2:19 / 59:51\n\nXem toàn bộ video\n\n•\n\nDiscussing Open Assistant and Fine-tuning Language Models\n\nCuộn để biết thêm chi tiết\n\nĐoạn video có tiêu đề\n\nDiscussing Open Assistant and Fine-tuning Language Models\n    Discussing Open Assistant and Fine-tun

The SeleniumURLLoader class includes the following attributes:

- urls (List[str]): List of URLs to load.
- continue_on_failure (bool, default=True): If True, a URL that fails to load is skipped and loading continues with the remaining URLs.
- browser (str, default="chrome"): Browser to use, either "chrome" or "firefox".
- executable_path (Optional[str], default=None): Path to the browser executable.
- headless (bool, default=True): If True, the browser runs in headless mode.

Set these attributes when you create the SeleniumURLLoader instance. For example, to use Firefox instead of Chrome, set browser to "firefox":

In [None]:
loader = SeleniumURLLoader(urls=urls, browser="firefox")
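
To make the `continue_on_failure` behavior described above more concrete, here is a minimal sketch of the skip-on-error pattern in plain Python. It is not SeleniumURLLoader's actual code: `load_all` and `fake_fetch` are hypothetical helpers invented for this illustration, and the example.com URLs are placeholders.

```python
from typing import Callable, List


def load_all(
    urls: List[str],
    fetch: Callable[[str], str],
    continue_on_failure: bool = True,
) -> List[str]:
    """Fetch every URL; skip failures or re-raise them, depending on the flag."""
    results = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            if not continue_on_failure:
                raise  # stop at the first failing URL
            print(f"Skipping {url}: {exc}")
    return results


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher that fails for any URL containing 'bad'."""
    if "bad" in url:
        raise ValueError("page failed to load")
    return f"content of {url}"


pages = load_all(
    ["https://example.com/a", "https://example.com/bad", "https://example.com/b"],
    fake_fetch,
)
print(pages)
```

With the flag left at True, one failing URL costs only that page. Passing `continue_on_failure=False` makes the first error propagate, which can be preferable when a partial result would be misleading.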

### Google Drive loader

In [None]:
from langchain.document_loaders import GoogleDriveLoader

loader = GoogleDriveLoader(
    folder_id="your_folder_id",
    recursive=False  # Optional: Fetch files from subfolders recursively. Defaults to False.
)

docs = loader.load()