Datasets

PyTorchLTR provides several utility classes for learning-to-rank (LTR) datasets. These classes can automatically download and/or process the dataset files.

Warning

PyTorchLTR provides utilities to automatically download and prepare several public LTR datasets. We cannot vouch for the quality, correctness or usefulness of these datasets. We do not host or distribute any datasets and it is ultimately your responsibility to determine whether you have permission to use each dataset under its respective license.

Example

The following is a usage example for the small Example3 dataset.

>>> from pytorchltr.datasets import Example3
>>> train = Example3(split="train")
>>> test = Example3(split="test")
>>> print(len(train))
3
>>> print(len(test))
1
>>> sample = train[0]
>>> print(sample["features"])
tensor([[1.0000, 1.0000, 0.0000, 0.3333, 0.0000],
        [0.0000, 0.0000, 1.0000, 0.0000, 1.0000],
        [0.0000, 1.0000, 0.0000, 1.0000, 0.0000],
        [0.0000, 0.0000, 1.0000, 0.6667, 0.0000]])
>>> print(sample["relevance"])
tensor([3, 2, 1, 1])
>>> print(sample["n"])
4
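Each sample holds the documents of a single query, so n can differ from one sample to the next, and samples must be brought to a common size before they can be stacked into a batch. The following is a minimal sketch of that idea in plain Python, assuming zero padding. The pad_batch helper and the second sample are hypothetical and only illustrate the concept; this is not the library's collate_fn, whose exact behavior should be checked in the API reference.

```python
from typing import Any, Dict, List


def pad_batch(samples: List[Dict[str, Any]]) -> Dict[str, list]:
    """Hypothetical helper: zero-pad every sample to the largest n in the batch."""
    max_n = max(s["n"] for s in samples)
    num_features = len(samples[0]["features"][0])
    batch: Dict[str, list] = {"features": [], "relevance": [], "n": []}
    for s in samples:
        pad = max_n - s["n"]
        # Append all-zero feature rows for the missing documents.
        rows = [list(r) for r in s["features"]]
        rows += [[0.0] * num_features for _ in range(pad)]
        batch["features"].append(rows)
        # Assumption: padded documents get relevance 0.
        batch["relevance"].append(list(s["relevance"]) + [0] * pad)
        # Keep the true document count so padding can be masked out later.
        batch["n"].append(s["n"])
    return batch


# First sample: the values printed in the example above, as plain lists.
first = {
    "features": [
        [1.0, 1.0, 0.0, 0.3333, 0.0],
        [0.0, 0.0, 1.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0, 0.6667, 0.0],
    ],
    "relevance": [3, 2, 1, 1],
    "n": 4,
}
# Second sample: made up for illustration, with only two documents.
second = {
    "features": [[0.5, 0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 1.0, 0.0]],
    "relevance": [1, 0],
    "n": 2,
}

batch = pad_batch([first, second])
print(batch["n"])
print(batch["relevance"])
```

Keeping n alongside the padded lists lets downstream code tell real documents apart from padding.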

Note

PyTorchLTR looks for dataset files in (and downloads them to) the first of the following locations that applies:

  • The location argument, if it is passed to the constructor of the respective Dataset class.
  • $PYTORCHLTR_DATASET_PATH/{dataset_name}, if $PYTORCHLTR_DATASET_PATH is a defined environment variable.
  • $DATASET_PATH/{dataset_name}, if $DATASET_PATH is a defined environment variable.
  • $HOME/.pytorchltr_datasets/{dataset_name}, if none of the above applies.
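The lookup described in this note can be sketched as a small function. This is an illustration of the documented order only, not the library's actual code: resolve_dataset_path is a hypothetical name, and "example3" is a placeholder for whatever {dataset_name} a given dataset class uses.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_dataset_path(dataset_name: str, location: Optional[str] = None) -> Path:
    """Hypothetical helper that mirrors the lookup order in the note above."""
    # 1. An explicit location argument takes precedence.
    if location is not None:
        return Path(location)
    # 2. and 3. Environment variables, checked in the documented order.
    for var in ("PYTORCHLTR_DATASET_PATH", "DATASET_PATH"):
        if var in os.environ:
            return Path(os.environ[var]) / dataset_name
    # 4. Fall back to a hidden directory under the user's home directory.
    return Path.home() / ".pytorchltr_datasets" / dataset_name


# "example3" is a placeholder dataset name used only for illustration.
print(resolve_dataset_path("example3"))
print(resolve_dataset_path("example3", location="/data/example3"))
```

Setting $PYTORCHLTR_DATASET_PATH once, for example in a shell profile, keeps all datasets under one directory without passing location to every constructor.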

SVMRank datasets

Example3

pytorchltr.datasets.Example3

__init__

collate_fn

__getitem__

__len__

Istella

pytorchltr.datasets.Istella

__init__

collate_fn

__getitem__

__len__

Istella-S

pytorchltr.datasets.IstellaS

__init__

collate_fn

__getitem__

__len__

Istella-X

pytorchltr.datasets.IstellaX

__init__

collate_fn

__getitem__

__len__

MSLR-WEB10K

pytorchltr.datasets.MSLR10K

__init__

collate_fn

__getitem__

__len__

MSLR-WEB30K

pytorchltr.datasets.MSLR30K

__init__

collate_fn

__getitem__

__len__