# Advanced Document Loading

In practice, text and documents come from many different sources. LangChain provides document loaders that import these sources into its own `Document` format. Some commonly used loaders:

1. `TextLoader` for loading text files.
2. `PyPDFLoader` for loading PDF documents.
3. `CSVLoader` for loading CSV files.
4. `JSONLoader` for loading JSON files.
5. And many others.

Refer to [the document loader how-to guides](https://python.langchain.com/v0.2/docs/how_to/#document-loaders) for more information.
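To make the idea of a common document format concrete, here is a minimal standard-library sketch of what a loader does: read a source and wrap its text in an object that carries `page_content` and `metadata`. `SimpleDocument` and `load_text` are hypothetical stand-ins for illustration only, not LangChain classes; the field names simply mirror those visible in the loader outputs in this notebook.

```python
from dataclasses import dataclass, field
from pathlib import Path
import tempfile


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def load_text(path: str) -> list[SimpleDocument]:
    """Read a whole text file into a single document, recording its source."""
    text = Path(path).read_text(encoding="utf-8")
    return [SimpleDocument(page_content=text, metadata={"source": path})]


# Demo with a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = str(Path(tmp) / "demo.txt")
    Path(demo_path).write_text("Sangkuriang Story\n", encoding="utf-8")
    demo_docs = load_text(demo_path)

print(demo_docs[0].page_content)
```

Whatever the source type, the result has the same shape: a list of documents, each with text plus metadata describing where the text came from.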

In [5]:
from dotenv import load_dotenv

load_dotenv()

True

In [6]:
# Text loader
from langchain_community.document_loaders import TextLoader
text_loader = TextLoader('./sources/sangkuriang.txt')
text_loader.load()

[Document(metadata={'source': './sources/sangkuriang.txt'}, page_content='Sangkuriang Story\nThe legend tells that, long ago, there lived a beautiful woman named Dayang Sumbi, the daughter of the king of Sumbing Perbangkara. Her beautiful face made Dayang Sumbi contested by the princes.\nAs a princess from the kingdom, Dayang Sumbi has a weaving hobby. One time, when she was busy weaving cloth, suddenly her loom fell. Instead of taking it herself, Dayang Sumbi said an oath: if the one who took the loom were a man, then she would take him as her husband, but if the one who took the loom were a woman, she would make her a sister.\nUnexpectedly, sometime later, there came a male dog named Si Tumang, which brought Dayang Sumbi’s loom. Finally, to fulfill her oath, Dayang Sumbi married Tumang (long story short, Tumang was a god who was expelled from heaven). From that marriage, a son named Sangkuriang was born.\nTime went on until Sangkuriang grew into a handsome boy. One day, Sangkuriang f

In [3]:
# PDF loader
from langchain_community.document_loaders import PyPDFLoader
pdf_loader = PyPDFLoader("./sources/sangkuriang.pdf")
pdf_loader.load()

[Document(metadata={'source': './sources/sangkuriang.pdf', 'page': 0}, page_content='Sangkuriang Story\nThe legend tells that, long ago, there lived a beautiful woman named Dayang Sumbi, the daughter\nof the king of Sumbing Perbangkara. Her beautiful face made Dayang Sumbi contested by the\nprinces.\nAs a princess from the kingdom, Dayang Sumbi has a weaving hobby. One time, when she was\nbusy weaving cloth, suddenly her loom fell. Instead of taking it herself, Dayang Sumbi said an oath:\nif the one who took the loom were a man, then she would take him as her husba nd, but if the one\nwho took the loom were a woman, she would make her a sister.\nUnexpectedly, sometime later, there came a male dog named Si Tuma ng, which brought Dayang\nSumbi’s loom. Finally, to fulfill her oath, Dayang Sumbi married Tumang (long story short, Tumang\nwas a god who was expelled from heaven). From that marriage, a son named Sangkuria ng was\nborn.\nTime went on until Sangkuriang grew into a handsome boy. 
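Compared with the text loader output, the PDF output keeps the hard line breaks of the page layout. One possible cleanup, sketched below under the assumption that paragraph boundaries do not need to be preserved, is to collapse every run of whitespace into a single space. `collapse_line_breaks` is a hypothetical helper, not part of LangChain, and it does not repair spaces that the extraction inserted inside words (such as `husba nd`).

```python
import re


def collapse_line_breaks(text: str) -> str:
    """Replace every run of whitespace (including newlines) with one space."""
    return re.sub(r"\s+", " ", text).strip()


# Sample taken from the PDF loader output above.
sample = (
    "there lived a beautiful woman named Dayang Sumbi, the daughter\n"
    "of the king of Sumbing Perbangkara."
)
cleaned = collapse_line_breaks(sample)
print(cleaned)
```

The same function could be applied to the `page_content` of each loaded page before further processing.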

In [4]:
# CSV loader

from langchain_community.document_loaders import CSVLoader
loader = CSVLoader(
    file_path="./sources/country.csv",
    # Optional: csv_args is forwarded to csv.DictReader to customize parsing.
    # csv_args={
    #     "delimiter": ",",
    #     "quotechar": '"',
    # },
)

loader.load()

[Document(metadata={'source': './sources/country.csv', 'row': 0}, page_content='Country: Indonesia\nCapital: Jakarta\nGDP (billions): 3'),
 Document(metadata={'source': './sources/country.csv', 'row': 1}, page_content='Country: Japan\nCapital: Tokyo\nGDP (billions): 10'),
 Document(metadata={'source': './sources/country.csv', 'row': 2}, page_content='Country: USA\nCapital: Washington\nGDP (billions): 15'),
 Document(metadata={'source': './sources/country.csv', 'row': 3}, page_content='Country: Rusia\nCapital: Moskow\nGDP (billions): 15'),
 Document(metadata={'source': './sources/country.csv', 'row': 4}, page_content='Country: UK\nCapital: London\nGDP (billions): 12')]
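The output above shows the pattern: one document per CSV row, with each column rendered as a `key: value` line and the row index stored in the metadata. The sketch below reproduces that shape with the standard library only. The CSV text is an assumption reconstructed from the output (the real `country.csv` is not shown), and plain dictionaries stand in for `Document` objects.

```python
import csv
import io

# Hypothetical in-memory copy of the first rows of country.csv,
# reconstructed from the output above.
csv_text = "Country,Capital,GDP (billions)\nIndonesia,Jakarta,3\nJapan,Tokyo,10\n"

rows = list(csv.DictReader(io.StringIO(csv_text)))
documents = [
    {
        # Each column becomes one "key: value" line.
        "page_content": "\n".join(f"{key}: {value}" for key, value in row.items()),
        "metadata": {"source": "./sources/country.csv", "row": index},
    }
    for index, row in enumerate(rows)
]
print(documents[0]["page_content"])
```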

In [18]:
# JSON loader (requires the jq package; uncomment to run)
# !pip install jq

# from langchain_community.document_loaders import JSONLoader

# file_path = './sources/random.json'
# loader = JSONLoader(file_path, jq_schema=".[]")
# loader.load()
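The `jq_schema=".[]"` expression selects each element of a top-level array (or each value of a top-level object), and the loader produces one document per selected value. The standard-library sketch below imitates only that selection step. The JSON content is hypothetical, since the real `random.json` is not shown. Note also that when the selected values are objects rather than strings, `JSONLoader` typically needs `text_content=False`; treat that as something to verify against the documentation.

```python
import json

# Hypothetical contents; the real ./sources/random.json is not shown.
raw = '[{"name": "Dayang Sumbi"}, {"name": "Sangkuriang"}]'

data = json.loads(raw)
# jq's ".[]" yields each element of an array (or each value of an object).
items = list(data.values()) if isinstance(data, dict) else list(data)
# Serialize each selected value so it can serve as document text.
contents = [json.dumps(item) for item in items]
print(contents)
```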