Adding African Stopwords #3141

Closed
chrisemezue opened this issue Apr 16, 2023 · 1 comment
@chrisemezue

Hello NLTK dev team,

NLTK is a renowned NLP package that supports a host of foundational NLP processes such as stemming and parsing. It is used by many practitioners in both academia and industry.

However, NLTK has little support for African languages. As of my query of the supported languages today, NLTK includes no stopwords for any African language.

We are trying to mitigate this issue with the African Stopwords project, in which we curated (and verified) the largest collection of African stopwords to date. We currently have stopwords for 13 African languages and are reaching out to ask whether these stopwords could be included in the NLTK package, thereby enabling support for many NLP tasks in these African languages.
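For illustration, here is a minimal sketch of how such a wordlist might be used for stopword removal. It assumes a plain-text file with one stopword per line, which is how NLTK's existing stopwords corpus is stored. The helper names `load_stopwords` and `remove_stopwords` are hypothetical, and the sample words are English placeholders, not entries from the African Stopwords lists.

```python
import io


def load_stopwords(stream):
    """Read one stopword per line, skip blank lines, return a lowercase set."""
    return {line.strip().lower() for line in stream if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens whose lowercase form is not a stopword."""
    return [t for t in tokens if t.lower() not in stopwords]


# Placeholder wordlist (English, for illustration only). In practice this
# would be a file opened from disk, one file per language.
sample_wordlist = io.StringIO("the\nof\nand\n\nis\n")
stopwords = load_stopwords(sample_wordlist)

tokens = "The history of the language is rich and varied".split()
filtered = remove_stopwords(tokens, stopwords)
print(filtered)
```

Storing the list as a set keeps each membership check constant time, which matters once the filter runs over a whole corpus.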

This is just the beginning: the African Stopwords project is an ongoing effort to curate trusted stopwords for African languages. At the Masakhane and Lanfrica communities, we have a team of dedicated African language experts who take pains to curate and verify the stopwords, as well as add new stopwords for other languages. We are also working on automatically gathering these stopwords and then having human evaluators review them (see our paper for more about that). We support an open discussion forum to encourage talks around African stopwords. In short, we will keep adding more stopwords.

I am proposing a collaboration between NLTK.org, Masakhane and Lanfrica to enable the inclusion of African languages in NLTK, starting with stopwords. While we plan to build our own packages for dedicated support of African languages (like the Preprocessor), I strongly believe that also integrating some of our efforts into the widely used NLTK ecosystem will enable wider adoption, thus fostering the inclusion of African languages in language technologies.

Please let me know if this is something the NLTK team would be interested in, and how we could go about it.

Chris Emezue

stevenbird self-assigned this Apr 16, 2023

stevenbird (Member) commented Apr 16, 2023

Thank you Chris. This is very welcome. Please submit links to the wordlists and metadata in the NLTK-Data issue tracker.
