
Identify "Corpus" by IETF language tag rather than just language #504

Closed
duramato opened this issue Dec 13, 2016 · 1 comment

Comments

@duramato

duramato commented Dec 13, 2016

My suggestion is to identify the "Corpus" by IETF language tag rather than just by language. What are the benefits? It would allow different dialects to be treated independently, since I might want to "teach" only one of them.

Why am I bringing this idea up?

For example, the Portuguese corpus seems to contain mostly Brazilian Portuguese (pt-BR) strings, with some European Portuguese (pt or pt-PT) here and there. Using that corpus makes the bot a bit of a bilingual freak, if I'm allowed to call it that :P

The same goes for the English corpus, which I'd say (not totally sure, but judging from some expressions) is British English (en or en-GB) with some United States English (en-US) mixed in.

Chinese is another language with A LOT of dialects (but this one is unknown to me, as I have zero knowledge of the language).
The story goes on...
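A minimal sketch of how tag-based selection could work, assuming corpora were keyed by IETF tags. This is not part of ChatterBot: the function `lookup_corpus` and the list of available tags are hypothetical. It follows the "lookup" scheme from RFC 4647, which strips subtags from the right until an available tag matches, so a request for `pt-BR` never silently pulls in `pt-PT` strings:

```python
from typing import Iterable, Optional


def lookup_corpus(requested_tag: str, available: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: pick the best available corpus tag for
    requested_tag using RFC 4647-style "lookup" (truncate from the right)."""
    # Language tags are case-insensitive, so compare in lowercase but
    # return the tag exactly as it is spelled in `available`.
    by_lower = {tag.lower(): tag for tag in available}
    subtags = requested_tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        subtags.pop()
        # RFC 4647 also drops a single-character subtag (e.g. the "x"
        # private-use marker) left dangling at the end after truncation.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return None


# Hypothetical set of per-dialect corpora.
available = ["pt-BR", "pt-PT", "en-GB", "en-US", "zh-Hans"]

print(lookup_corpus("pt-BR", available))       # pt-BR
print(lookup_corpus("zh-Hans-CN", available))  # zh-Hans
print(lookup_corpus("pt", available))          # None
```

Lookup never widens to a sibling dialect: asking for plain `pt` returns nothing here rather than guessing between `pt-BR` and `pt-PT`.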

@gunthercox
Owner

This is a good suggestion. Making changes to use IETF language tags sounds like it would really benefit ChatterBot.

@gunthercox gunthercox changed the title Identify "Corpus" by IETF language tag rather than just language. Identify "Corpus" by IETF language tag rather than just language Jan 21, 2017

2 participants