# CSV

>A [comma-separated values (CSV)](https://en.wikipedia.org/wiki/Comma-separated_values) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas.

Load [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) data, producing one document per row.

In [1]:
from langchain.document_loaders.csv_loader import CSVLoader

In [2]:
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv')

data = loader.load()

In [3]:
print(data)

[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}), Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 1}), Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 2}), Document(page_content='Team: Giants\n"Payroll (millions)": 117.62\n"Wins": 94', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 3}), Document(page_content='Team: Braves\n"Payroll (millions)": 83.31\n"Wins": 94', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 4}), Document(page_content='Team: Athletics\n"Payroll (millions)": 55.37\n"Wins": 94', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 5}), Document(page_content='Team: Rangers\n"Payroll (millions)": 120.51\n"Wins": 93', metadata={'source': './

In [4]:
print(data[0])

page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98' metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}
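
The shape of the output above (one `column: value` line per field, plus `source` and `row` metadata) can be approximated with the standard library alone. The following is a minimal sketch, not the loader's actual implementation: `rows_to_documents`, the inline `SAMPLE` text, and the file name `sample.csv` are hypothetical, and plain dicts stand in for `Document` objects.

```python
import csv
import io

# Hypothetical inline sample standing in for a CSV file on disk.
SAMPLE = "Team,Wins\nNationals,98\nReds,97\n"


def rows_to_documents(text, source):
    """Build one dict per CSV row, shaped like the Document objects printed above."""
    documents = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        # Each field becomes a "column: value" line in the page content.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": content, "metadata": {"source": source, "row": i}}
        )
    return documents


docs = rows_to_documents(SAMPLE, source="sample.csv")
print(docs[0]["page_content"])
```

The header line supplies the keys, so it does not become a document itself; row numbering starts at 0 with the first data row.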


## Customizing the csv parsing and loading

See the [csv module](https://docs.python.org/3/library/csv.html) documentation for details on which csv args are supported.
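
To get a feel for what these arguments control, it can help to try them on `csv.DictReader` directly. This sketch assumes the loader forwards `csv_args` to `csv.DictReader` as keyword arguments; the semicolon-delimited `SAMPLE` text is hypothetical.

```python
import csv
import io

# Hypothetical semicolon-delimited sample with a space after each delimiter.
SAMPLE = "Team; Wins\nNationals; 98\n"

# Without skipinitialspace, the space after each ';' stays in keys and values.
raw = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))

# The same options, written as a dict in the style of csv_args.
csv_args = {"delimiter": ";", "skipinitialspace": True}
rows = list(csv.DictReader(io.StringIO(SAMPLE), **csv_args))

print(raw[0])
print(rows[0])
```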

In [7]:
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', csv_args={
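    # delimiter: field separator; quotechar: quote character;
    # fieldnames: column names to use (here Chinese labels for team, payroll in millions of USD, and wins);
    # skipinitialspace: ignore whitespace immediately after the delimiter.
    # The file encoding is not a csv arg. If the CSVLoader version in use accepts an
    # `encoding` parameter, set it there (for example 'gbk' for GBK-encoded Chinese files).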
    'delimiter': ',',
    'quotechar': '"',
    'fieldnames': ['球队', '薪资总额（百万美元）', '胜场数'],
    'skipinitialspace': True,
})

data = loader.load()

In [8]:
print(data)

[Document(page_content='球队: Team\n薪资总额（百万美元）: "Payroll (millions)"\n胜场数: "Wins"', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}), Document(page_content='球队: Nationals\n薪资总额（百万美元）: 81.34\n胜场数: 98', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 1}), Document(page_content='球队: Reds\n薪资总额（百万美元）: 82.20\n胜场数: 97', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 2}), Document(page_content='球队: Yankees\n薪资总额（百万美元）: 197.96\n胜场数: 95', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 3}), Document(page_content='球队: Giants\n薪资总额（百万美元）: 117.62\n胜场数: 94', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 4}), Document(page_content='球队: Braves\n薪资总额（百万美元）: 83.31\n胜场数: 94', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 5}), Document(page_content='球队: Athletics\n薪资总额（百万美元）: 55.37\n胜场数: 94', metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 6}), Document(page_content='球队: Rangers\n薪资总额（百万美

In [9]:
print(data[0])

page_content='球队: Team\n薪资总额（百万美元）: "Payroll (millions)"\n胜场数: "Wins"' metadata={'source': './example_data/mlb_teams_2012.csv', 'row': 0}
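
Row 0 in the output above holds the file's original header values. That matches how `csv.DictReader` behaves when `fieldnames` is supplied: the first line is no longer consumed as a header and is returned as data. A small standard-library sketch of the effect, using hypothetical sample text and column names:

```python
import csv
import io

# Hypothetical inline sample; the first line is a header.
SAMPLE = "Team,Wins\nNationals,98\nReds,97\n"

# With explicit fieldnames, the header line comes back as the first data row.
reader = csv.DictReader(io.StringIO(SAMPLE), fieldnames=["club", "victories"])
rows = list(reader)
print(rows[0])

# One simple way to drop the old header: skip the first row.
data_rows = rows[1:]
print(data_rows[0])
```

With the loader output, the analogous step would be slicing `data[1:]`; the remaining documents keep their original `row` values, so numbering then starts at 1.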


## Specify a column to identify the document source

Use the `source_column` argument to name the column whose value becomes the `source` of the document created from each row. If it is omitted, `file_path` is used as the source for all documents created from the CSV file.

This is useful when using documents loaded from CSV files for chains that answer questions using sources.

In [10]:
# source_column names the column whose value identifies each document's source
loader = CSVLoader(file_path='./example_data/mlb_teams_2012.csv', source_column="Team")

data = loader.load()

In [11]:
print(data)

[Document(page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98', metadata={'source': 'Nationals', 'row': 0}), Document(page_content='Team: Reds\n"Payroll (millions)": 82.20\n"Wins": 97', metadata={'source': 'Reds', 'row': 1}), Document(page_content='Team: Yankees\n"Payroll (millions)": 197.96\n"Wins": 95', metadata={'source': 'Yankees', 'row': 2}), Document(page_content='Team: Giants\n"Payroll (millions)": 117.62\n"Wins": 94', metadata={'source': 'Giants', 'row': 3}), Document(page_content='Team: Braves\n"Payroll (millions)": 83.31\n"Wins": 94', metadata={'source': 'Braves', 'row': 4}), Document(page_content='Team: Athletics\n"Payroll (millions)": 55.37\n"Wins": 94', metadata={'source': 'Athletics', 'row': 5}), Document(page_content='Team: Rangers\n"Payroll (millions)": 120.51\n"Wins": 93', metadata={'source': 'Rangers', 'row': 6}), Document(page_content='Team: Orioles\n"Payroll (millions)": 81.43\n"Wins": 93', metadata={'source': 'Orioles', 'row': 7}), Document(page_

In [12]:
print(data[0])

page_content='Team: Nationals\n"Payroll (millions)": 81.34\n"Wins": 98' metadata={'source': 'Nationals', 'row': 0}
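
Per-row sources make it possible to say which row an answer came from. The sketch below illustrates the idea with plain dicts standing in for `Document` objects; the `cite` helper is hypothetical and is not part of LangChain or of any question-answering chain.

```python
# Plain dicts standing in for the Document objects printed above.
docs = [
    {"page_content": "Team: Nationals\nWins: 98",
     "metadata": {"source": "Nationals", "row": 0}},
    {"page_content": "Team: Reds\nWins: 97",
     "metadata": {"source": "Reds", "row": 1}},
]


def cite(documents, keyword):
    """Return the source of every document whose content mentions keyword."""
    return [
        d["metadata"]["source"]
        for d in documents
        if keyword in d["page_content"]
    ]


print(cite(docs, "97"))
```

With real `Document` objects the same lookup would read `d.metadata["source"]` and `d.page_content`. Had every document shared the file path as its source, the citation could not distinguish one row from another.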
