PyTorchLTR provides several utility classes for LTR datasets that can automatically download and/or process the dataset files.
Warning
PyTorchLTR provides utilities to automatically download and prepare several public LTR datasets. We cannot vouch for the quality, correctness or usefulness of these datasets. We do not host or distribute any datasets and it is ultimately your responsibility to determine whether you have permission to use each dataset under its respective license.
The following is a usage example for the small Example3 dataset.
>>> from pytorchltr.datasets import Example3
>>> train = Example3(split="train")
>>> test = Example3(split="test")
>>> print(len(train))
3
>>> print(len(test))
1
>>> sample = train[0]
>>> print(sample["features"])
tensor([[1.0000, 1.0000, 0.0000, 0.3333, 0.0000],
        [0.0000, 0.0000, 1.0000, 0.0000, 1.0000],
        [0.0000, 1.0000, 0.0000, 1.0000, 0.0000],
        [0.0000, 0.0000, 1.0000, 0.6667, 0.0000]])
>>> print(sample["relevance"])
tensor([3, 2, 1, 1])
>>> print(sample["n"])
4
Note
PyTorchLTR looks for dataset files in (and downloads them to) the first of the following locations that applies:
- The location arg, if it is specified in the constructor of the respective Dataset class.
- $PYTORCHLTR_DATASET_PATH/{dataset_name}, if $PYTORCHLTR_DATASET_PATH is a defined environment variable.
- $DATASET_PATH/{dataset_name}, if $DATASET_PATH is a defined environment variable.
- $HOME/.pytorchltr_datasets/{dataset_name}, if all the above fail.
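The lookup order described in this note can be sketched in plain Python. This is only an illustration of the documented precedence, not PyTorchLTR's actual implementation: the function name resolve_dataset_path, its environ parameter, and the folder name "example3" are hypothetical, and treating an empty environment variable as "defined" is an assumption.

```python
import os


def resolve_dataset_path(dataset_name, location=None, environ=None):
    """Hypothetical helper mirroring the documented lookup order."""
    # Allow a custom mapping so the logic can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ

    # 1. An explicit location argument takes precedence and is used as given.
    if location is not None:
        return location

    # 2. and 3. Environment variables, checked in the documented order.
    for var in ("PYTORCHLTR_DATASET_PATH", "DATASET_PATH"):
        base = environ.get(var)
        if base is not None:  # assumption: "defined" means present, even if empty
            return os.path.join(base, dataset_name)

    # 4. Fall back to a hidden folder in the user's home directory.
    home = environ.get("HOME", os.path.expanduser("~"))
    return os.path.join(home, ".pytorchltr_datasets", dataset_name)


# Example: the more specific variable wins over the generic one.
env = {"PYTORCHLTR_DATASET_PATH": "/data/ltr", "DATASET_PATH": "/data/misc"}
chosen = resolve_dataset_path("example3", environ=env)
```

In practice this means that setting PYTORCHLTR_DATASET_PATH once in your shell or job script places all datasets under a single directory, while the location arg remains available for overriding an individual dataset.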
pytorchltr.datasets.Example3
__init__
collate_fn
__getitem__
__len__
pytorchltr.datasets.Istella
__init__
collate_fn
__getitem__
__len__
pytorchltr.datasets.IstellaS
__init__
collate_fn
__getitem__
__len__
pytorchltr.datasets.IstellaX
__init__
collate_fn
__getitem__
__len__
pytorchltr.datasets.MSLR10K
__init__
collate_fn
__getitem__
__len__
pytorchltr.datasets.MSLR30K
__init__
collate_fn
__getitem__
__len__