GitHub - sciknoworg/tib-sid: TIB-SID: A bilingual (English/German) dataset of library catalog records with GND subject indexing for research on automated subject tagging and extreme multi-label classification.

The TIB Subject Indexing Dataset (TIB-SID) is a bilingual benchmark for extreme multi-label text classification (XMTC) over real library records, designed for domain classification and GND-based subject indexing. The dataset combines a large, structured, authority-controlled label space with long-tail sparsity, cross-lingual variation, and real-world domain imbalance, making it substantially closer to operational library cataloging than standard text classification benchmarks.

✨ At a glance

136,569 library records in JSON-LD with predefined train / dev / test benchmark splits
Languages: English and German
28 domains
Record types: article, book, conference, report, thesis
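The records are distributed as JSON-LD, which is ordinary JSON and can be read with any JSON parser. The sketch below shows one way to load a record and tally files per top-level folder, for example to check split sizes after download. The directory layout (one subfolder per split) and the `.jsonld` file suffix are assumptions made for illustration, not something this README confirms; adjust both to match the downloaded data. The function names are hypothetical helpers, not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path


def load_record(path):
    """Parse one JSON-LD file; JSON-LD is plain JSON, so json.load suffices."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_records_by_top_dir(root, suffix=".jsonld"):
    """Count files under root, grouped by their first-level subdirectory.

    Assumption: each first-level folder corresponds to one split
    (e.g. train / dev / test). The suffix is also an assumption.
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob(f"*{suffix}"):
        relative = path.relative_to(root)
        # Files directly under root have no subdirectory; group them as ".".
        key = relative.parts[0] if len(relative.parts) > 1 else "."
        counts[key] += 1
    return counts
```

Calling `count_records_by_top_dir("library-records-dataset")` would then return a `Counter` keyed by folder name, which can be compared against the record total stated above.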

⬇️ Download

Download the dataset here: data

🔗 Related Links

TIB-SID was introduced through the LLMs4Subjects shared tasks organized in 2025. More than 12 LLM-based systems were developed and evaluated on the dataset by participating teams worldwide. The shared task websites provide additional context, task details, and leaderboard results.
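For readers who want to score their own system before consulting the leaderboards, ranked multi-label predictions are commonly evaluated with precision@k and recall@k. The sketch below computes both for a single record. It is a generic illustration: this README does not state which metrics the shared tasks or the `evaluation` folder use, and the function name and label identifiers are hypothetical.

```python
def precision_recall_at_k(predicted, gold, k):
    """Return (precision@k, recall@k) for one record.

    predicted: ranked list of label identifiers, best first.
    gold: collection of correct label identifiers.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    gold_set = set(gold)
    # Drop repeated predictions while keeping rank order, then cut at k.
    top_k = list(dict.fromkeys(predicted))[:k]
    hits = sum(1 for label in top_k if label in gold_set)
    precision = hits / k
    recall = hits / len(gold_set) if gold_set else 0.0
    return precision, recall


# Placeholder identifiers, not real GND subjects.
ranked = ["subject-a", "subject-b", "subject-c"]
correct = {"subject-b", "subject-d"}
p_at_2, r_at_2 = precision_recall_at_k(ranked, correct, k=2)
```

With these placeholder inputs, one of the top two predictions is correct and one of the two gold subjects is found, so both values are 0.5. Averaging the per-record scores over a split gives a dataset-level figure.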

📖 Citation

If you use TIB-SID, please cite:

Coming soon...

⚖️ License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Folders and files

GND
assets
evaluation
library-records-dataset
.gitattributes
.gitignore
28_domains_list.csv
LICENSE
README.md
