Add Turkish News Category Dataset - 270K - Lite Version #1967

yavuzKomecoglu
Contributor

This PR adds the Turkish News Categories Dataset (270K - Lite Version), a text classification dataset by me, @basakbuluz and @serdarakyol.
It contains the same news as the existing interpress_news_category_tr dataset, but with less information per article and fewer OCR errors. The categories were rearranged so that the data can be easily separated into 10 classes ("kültürsanat", "ekonomi", "siyaset", "eğitim", "dünya", "spor", "teknoloji", "magazin", "sağlık", "gündem").
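
For anyone wiring these classes into a classifier, here is a minimal sketch of a label mapping. The category names are taken from the list above; the integer ids and the names `CATEGORIES`, `LABEL2ID`, `ID2LABEL` and `encode_label` are assumptions for illustration and may not match the encoding used by the dataset script itself.

```python
# Category names as listed in the PR description. The id order below is an
# assumption for illustration; the dataset's own label encoding may differ.
CATEGORIES = [
    "kültürsanat", "ekonomi", "siyaset", "eğitim", "dünya",
    "spor", "teknoloji", "magazin", "sağlık", "gündem",
]

# Hypothetical lookup tables built from the list above.
LABEL2ID = {name: i for i, name in enumerate(CATEGORIES)}
ID2LABEL = {i: name for name, i in LABEL2ID.items()}


def encode_label(name: str) -> int:
    """Return the integer id for a category name, or raise on unknown names."""
    try:
        return LABEL2ID[name]
    except KeyError:
        raise ValueError(f"unknown category: {name!r}") from None
```

Once merged, the dataset would presumably be loaded with `datasets.load_dataset` using the directory name from this PR, in which case the label names should be read from the loaded dataset's features rather than hard-coded as above.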

Member

lhoestq left a comment

Awesome thanks :)

datasets/interpress_news_category_tr_lite/README.md (two review comments, outdated and resolved)

lhoestq commented Mar 2, 2021

Thanks for the change, merging now !

lhoestq merged commit d5afa3c into huggingface:master on Mar 2, 2021