My suggestion is to identify each "corpus" by IETF language tag rather than just by language. What are the benefits? It would allow different dialects to be handled independently, since I might want to "teach" the bot only one of them.
Why am I bringing this idea up?
For example, the Portuguese corpus seems to contain mostly Brazilian Portuguese (pt-BR) strings, with some Portuguese (pt or pt-PT) strings here and there. Using that corpus makes the bot a bit of a bilingual freak, if I'm allowed to call it that :P
The same goes for the English corpus. Judging by some expressions (I'm not totally sure), it is English (en or en-GB) mixed with United States English (en-US).
Chinese is another language with a lot of dialects, but I can't judge that one because I have zero knowledge of the language.
The story goes on...
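A minimal sketch of how tag-based corpus selection could work, assuming corpora are keyed by IETF (BCP 47) language tags. The mapping `AVAILABLE_CORPORA`, the corpus names, and the functions `fallback_chain` and `resolve_corpus` are all hypothetical and not part of ChatterBot. The fallback is a simplified form of the RFC 4647 "lookup" scheme. With it, asking for pt-BR selects only the Brazilian corpus, and an unlisted dialect such as pt-AO falls back to generic pt instead of mixing dialects.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from IETF/BCP 47 language tags to corpus names.
# These names are illustrative only; they are not ChatterBot's real corpora.
AVAILABLE_CORPORA: Dict[str, str] = {
    "en": "english",
    "en-US": "english-us",
    "en-GB": "english-gb",
    "pt": "portuguese",
    "pt-BR": "portuguese-brazil",
    "pt-PT": "portuguese-portugal",
}


def fallback_chain(tag: str) -> List[str]:
    """Return the tag followed by progressively shorter prefixes.

    Simplified RFC 4647 "lookup": drop the last subtag on each step, and also
    drop a trailing single-character subtag (e.g. the "x" private-use marker)
    that the truncation leaves behind.
    """
    subtags = tag.split("-")
    chain = []
    while subtags:
        chain.append("-".join(subtags))
        subtags = subtags[:-1]
        if subtags and len(subtags[-1]) == 1:
            subtags = subtags[:-1]
    return chain


def resolve_corpus(tag: str, corpora: Dict[str, str] = AVAILABLE_CORPORA) -> Optional[str]:
    """Pick the most specific corpus for a tag (tags compare case-insensitively)."""
    by_lower = {key.lower(): name for key, name in corpora.items()}
    for candidate in fallback_chain(tag):
        name = by_lower.get(candidate.lower())
        if name is not None:
            return name
    return None  # no corpus for this language at all


print(resolve_corpus("pt-BR"))       # portuguese-brazil
print(resolve_corpus("pt-AO"))       # portuguese (falls back to the generic tag)
print(resolve_corpus("zh-Hant-TW"))  # None
```

Keying on full tags keeps each dialect's strings separate, while the fallback still lets a generic corpus serve dialects that have no dedicated one.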
gunthercox changed the title from "Identify "Corpus" by IETF language tag rather than just language." to "Identify "Corpus" by IETF language tag rather than just language" on Jan 21, 2017