Extension for accessing Webis datasets via ir_datasets.
Install the package from PyPI:
pip install ir-datasets-webis
Using this extension is simple. Just register the additional datasets by calling register(). Then you can load the datasets with ir_datasets as usual:
from ir_datasets import load
from ir_datasets_webis import register
# Register the Webis datasets.
register()
# Use ir_datasets as usual.
dataset = load("webis-mastodon-2024")
If you want to use the CLI, just use the ir_datasets_webis command instead of ir_datasets. All CLI commands work as usual, e.g., to list the available datasets:
ir_datasets_webis list
ID | Name |
---|---|
webis-mastodon-2024 | Webis Mastodon Corpus 2024 |
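To call the CLI from a Python script, one option is to wrap it with subprocess. This is a sketch: webis_cli is a hypothetical helper, and it assumes the ir_datasets_webis executable is on PATH in the active environment. Arguments are passed through unchanged, since the CLI accepts the same commands as ir_datasets.

```python
import subprocess
from typing import Any, Callable, List, Sequence


def webis_cli(args: Sequence[str], runner: Callable[..., Any] = subprocess.run) -> str:
    """Run an ir_datasets_webis CLI command and return its standard output.

    `args` are the arguments you would otherwise pass to the ir_datasets CLI,
    e.g. ["list"].
    """
    command: List[str] = ["ir_datasets_webis", *args]
    # check=True raises CalledProcessError if the command exits non-zero.
    result = runner(command, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, print(webis_cli(["list"])) prints the dataset list. The runner parameter exists only so the wrapper can be exercised without invoking the real executable.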
To build this package and contribute to its development, you need to install the build, setuptools, and wheel packages (pre-installed on most systems):
pip install build setuptools wheel
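To see which of these packages the current interpreter already provides, you can query importlib.util.find_spec. This is a minimal sketch: missing_modules is a hypothetical helper, and it relies on the import names of build, setuptools, and wheel matching their package names, which holds for these three.

```python
import importlib.util
from typing import Iterable, List

BUILD_TOOLS = ("build", "setuptools", "wheel")


def missing_modules(names: Iterable[str] = BUILD_TOOLS) -> List[str]:
    """Return the names from `names` that cannot be found by the import system."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]
```

An empty result from missing_modules() means the pip install step above can be skipped.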
Create and activate a virtual environment:
python3.10 -m venv venv/
source venv/bin/activate
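To confirm that commands run inside the activated environment, compare sys.prefix with sys.base_prefix. venv points the former at the environment directory and leaves the latter at the base installation. The sketch below also checks the interpreter version. Treating Python 3.10, the version used to create the environment above, as a minimum is an assumption, and both helpers are hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def python_at_least(major: int, minor: int) -> bool:
    """Return True if the running interpreter is at least major.minor."""
    return sys.version_info[:2] >= (major, minor)
```

For example, in_virtualenv() and python_at_least(3, 10) should both be true after the two commands above.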
Install the package and test dependencies:
pip install -e .[tests]
Verify your changes by running the linter, type checker, security scanner, and unit tests:
ruff check . # Linting and code style
mypy . # Static typing
bandit -c pyproject.toml -r . # Security
pytest . # Unit tests
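To run all four checks in one go, a small script can invoke them in sequence and collect the failures. This is a sketch: run_checks is a hypothetical helper that is not shipped with the package, and it assumes ruff, mypy, bandit, and pytest are installed in the activated virtual environment. Every check runs even if an earlier one fails, so a single run reports all problems.

```python
import subprocess
from typing import Any, Callable, List, Sequence

# The four checks listed above, as argument vectors.
CHECKS: List[List[str]] = [
    ["ruff", "check", "."],
    ["mypy", "."],
    ["bandit", "-c", "pyproject.toml", "-r", "."],
    ["pytest", "."],
]


def run_checks(
    checks: Sequence[Sequence[str]] = CHECKS,
    runner: Callable[..., Any] = subprocess.run,
) -> List[str]:
    """Run every check and return the commands that exited with a non-zero status."""
    failed: List[str] = []
    for command in checks:
        result = runner(list(command))
        if result.returncode != 0:
            failed.append(" ".join(command))
    return failed
```

A script could end with raise SystemExit(1 if run_checks() else 0) so that CI marks the run as failed when any check fails.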
Please also add tests for your newly developed code.
Wheels for this package can be built with:
python -m build
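By default, python -m build writes a source distribution (.tar.gz) and a wheel (.whl) into the dist/ directory. The sketch below lists those artifacts and extracts a wheel's compatibility tag from its file name. A wheel name ends in {python tag}-{abi tag}-{platform tag}.whl, and a pure-Python package would typically be tagged py3-none-any. Both helpers are hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def list_dist_artifacts(dist_dir: Path = Path("dist")) -> Dict[str, List[str]]:
    """Group the files in dist_dir into wheels and source distributions."""
    files = sorted(p.name for p in dist_dir.iterdir() if p.is_file())
    return {
        "wheels": [name for name in files if name.endswith(".whl")],
        "sdists": [name for name in files if name.endswith(".tar.gz")],
    }


def wheel_tags(wheel_name: str) -> str:
    """Return the python-abi-platform tag triple of a wheel file name."""
    stem = wheel_name[: -len(".whl")]
    # The last three dash-separated fields form the compatibility tag.
    return "-".join(stem.split("-")[-3:])
```

Checking the tag before uploading helps catch a wheel that was accidentally built as platform-specific.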
If you have any problems using this package, please file an issue. We're happy to help!
This repository is released under the MIT license.