stanfordnlp/en-worldwide-newswire

This dataset consists of roughly 1,100 English-language news articles (1,099 by the per-region counts below) drawn from non-Western newswire sources around the world. It deliberately excludes Western-sourced text and focuses on less common contexts in which English is used. The origins of the articles break down by region and country as follows.

South America: 94
  Argentina 20
  Bolivia 3
  Chile 12
  Colombia 10
  Ecuador 10
  Guyana 3
  Paraguay 13
  Peru 10
  Uruguay 5
  Venezuela 8
Central and North America: 178
  Costa Rica 20
  Cuba 15
  El Salvador 20
  Honduras 14
  Mexico 29
  Nicaragua 20
  Panama 20
  Indigenous Canadian 40
Africa: 265
  General 65
  Pan-Africa 20
  Algeria 20
  Ghana 20
  Kenya 23
  Mauritius 20
  Egypt 22
  Ethiopia 9
  Namibia 28
  South Africa 38
Asia: 347
  General 14
  China 104
  Japan 15
  India 71
  Korea 37
  Taiwan 26
  Malaysia 11
  Bangladesh 31
  Thailand 27
  Mongolia 11
Middle East: 167
  Oman 12
  Jordan 21
  Israel 20
  Iran 16
  UAE 17
  Saudi Arabia 27
  Pakistan 2
  Qatar 16
  Kuwait 36
Oceania: 48
  Indigenous Australia 28
  Indigenous New Zealand 20
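
As a quick consistency check, the breakdown above can be transcribed into a Python dictionary and summed per region. This is a minimal sketch: `BREAKDOWN`, `STATED_TOTALS`, and `region_totals` are hypothetical names hand-copied from this README, not files or APIs shipped with the dataset.

```python
# A minimal sketch: counts are hand-copied from the README breakdown.
# BREAKDOWN and STATED_TOTALS are hypothetical names, not part of the dataset.
BREAKDOWN = {
    "South America": {
        "Argentina": 20, "Bolivia": 3, "Chile": 12, "Colombia": 10,
        "Ecuador": 10, "Guyana": 3, "Paraguay": 13, "Peru": 10,
        "Uruguay": 5, "Venezuela": 8,
    },
    "Central and North America": {
        "Costa Rica": 20, "Cuba": 15, "El Salvador": 20, "Honduras": 14,
        "Mexico": 29, "Nicaragua": 20, "Panama": 20, "Indigenous Canadian": 40,
    },
    "Africa": {
        "General": 65, "Pan-Africa": 20, "Algeria": 20, "Ghana": 20,
        "Kenya": 23, "Mauritius": 20, "Egypt": 22, "Ethiopia": 9,
        "Namibia": 28, "South Africa": 38,
    },
    "Asia": {
        "General": 14, "China": 104, "Japan": 15, "India": 71, "Korea": 37,
        "Taiwan": 26, "Malaysia": 11, "Bangladesh": 31, "Thailand": 27,
        "Mongolia": 11,
    },
    "Middle East": {
        "Oman": 12, "Jordan": 21, "Israel": 20, "Iran": 16, "UAE": 17,
        "Saudi Arabia": 27, "Pakistan": 2, "Qatar": 16, "Kuwait": 36,
    },
    "Oceania": {
        "Indigenous Australia": 28, "Indigenous New Zealand": 20,
    },
}

# Regional subtotals as stated in the README headings.
STATED_TOTALS = {
    "South America": 94,
    "Central and North America": 178,
    "Africa": 265,
    "Asia": 347,
    "Middle East": 167,
    "Oceania": 48,
}


def region_totals(breakdown):
    """Sum the per-country (or per-category) counts within each region."""
    return {region: sum(counts.values()) for region, counts in breakdown.items()}


totals = region_totals(BREAKDOWN)
for region, total in totals.items():
    # Compare each computed sum against the subtotal given in the heading.
    status = "matches" if total == STATED_TOTALS[region] else "MISMATCH"
    print(f"{region}: {total} ({status} stated {STATED_TOTALS[region]})")
print("All regions:", sum(totals.values()))
```

Each regional sum matches its stated subtotal, and the last line prints the grand total across all six regions.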

About

Named-entity recognition (NER) dataset built from non-Western newswire
