Loading methods

Methods for listing and loading datasets:

Datasets

[[autodoc]] datasets.list_datasets

[[autodoc]] datasets.load_dataset

[[autodoc]] datasets.load_from_disk

[[autodoc]] datasets.load_dataset_builder

[[autodoc]] datasets.get_dataset_config_names

[[autodoc]] datasets.get_dataset_infos

[[autodoc]] datasets.get_dataset_split_names

[[autodoc]] datasets.inspect_dataset

From files

These configurations control how data files are loaded. They apply both when loading local files and when loading a dataset repository:

  • local files: load_dataset("parquet", data_dir="path/to/data/dir")
  • dataset repository: load_dataset("allenai/c4")

You can pass arguments to load_dataset to configure data loading. For example, you can specify the sep parameter to set the separator in the [~datasets.packaged_modules.csv.CsvConfig] that is used to load the data:

load_dataset("csv", data_dir="path/to/data/dir", sep="\t")

Text

[[autodoc]] datasets.packaged_modules.text.TextConfig

[[autodoc]] datasets.packaged_modules.text.Text

CSV

[[autodoc]] datasets.packaged_modules.csv.CsvConfig

[[autodoc]] datasets.packaged_modules.csv.Csv

JSON

[[autodoc]] datasets.packaged_modules.json.JsonConfig

[[autodoc]] datasets.packaged_modules.json.Json

Parquet

[[autodoc]] datasets.packaged_modules.parquet.ParquetConfig

[[autodoc]] datasets.packaged_modules.parquet.Parquet

Arrow

[[autodoc]] datasets.packaged_modules.arrow.ArrowConfig

[[autodoc]] datasets.packaged_modules.arrow.Arrow

SQL

[[autodoc]] datasets.packaged_modules.sql.SqlConfig

[[autodoc]] datasets.packaged_modules.sql.Sql

Images

[[autodoc]] datasets.packaged_modules.imagefolder.ImageFolderConfig

[[autodoc]] datasets.packaged_modules.imagefolder.ImageFolder

Audio

[[autodoc]] datasets.packaged_modules.audiofolder.AudioFolderConfig

[[autodoc]] datasets.packaged_modules.audiofolder.AudioFolder

WebDataset

[[autodoc]] datasets.packaged_modules.webdataset.WebDataset