Features of the sample text for languages #52

Open
antlarr opened this issue Jun 25, 2021 · 1 comment

antlarr commented Jun 25, 2021

I noticed that the plugins directory contains some free books that are used to extract language characteristics (n-grams), which I guess feed the autocomplete feature.
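To illustrate what "extracting n-grams" from a sample text could look like, here is a minimal Python sketch of a bigram (2-gram) counter used for next-word suggestions. This is only an assumption about the general technique, not the project's actual code: the function names `build_bigram_model` and `suggest` are hypothetical, and the sample string is just the opening words of the book.

```python
import re
from collections import Counter, defaultdict


def build_bigram_model(text):
    """Count, for each word, which words follow it in the sample text."""
    # \w+ matches Unicode letters in Python 3, so accented characters are kept.
    words = re.findall(r"\w+", text.lower())
    model = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def suggest(model, word, k=3):
    """Return up to k of the most frequent followers of `word`."""
    followers = model.get(word.lower(), Counter())
    return [w for w, _ in followers.most_common(k)]


# Hypothetical sample text; a real corpus would be a whole book.
sample = "en un lugar de la Mancha, de cuyo nombre no quiero acordarme"
model = build_bigram_model(sample)
print(suggest(model, "de"))
```

With a model like this, the suggestions can only ever be words that appear in the sample text, which is why the choice of book matters so much.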

In the case of Spanish, I saw that the book "Don Quijote de la Mancha" is used as the sample text. This is good in that the book is long and has a large vocabulary. The problem is that it was published in two parts, in 1605 and 1615, so it uses quite a lot of old Spanish vocabulary and expressions that are no longer used at all, and it doesn't include any of the words and expressions that have appeared since then.

So I think it would be good to find a substitute text.

Apart from it being free for (re)distribution, are there any special features that the text should have?

dobey (Contributor) commented Sep 16, 2021

> Apart from it being free for (re)distribution, are there any special features that the text should have?

I think, generally, we've relied on things that are in the Public Domain, which of course has the common issue of being too old.

The auto completion and correction stuff definitely needs some major improvements. I'm not entirely sure what to do there yet, though.
