support Malayalam #53

Open: wants to merge 7 commits into base: main
4 changes: 4 additions & 0 deletions alphabet.go
@@ -42,6 +42,7 @@ const (
tamil
telugu
thai
malayalam
)

func (alphabet alphabet) matches(text string) bool {
@@ -82,6 +83,8 @@ func (alphabet alphabet) matches(text string) bool {
return teluguChars.MatchString(text)
case thai:
return thaiChars.MatchString(text)
case malayalam:
return malayalamChars.MatchString(text)
default:
return false
}
@@ -136,6 +139,7 @@ var (
tamilChars = createRegexp("Tamil")
teluguChars = createRegexp("Telugu")
thaiChars = createRegexp("Thai")
malayalamChars = createRegexp("Malayalam")
)

func createRegexp(charClass string) *regexp.Regexp {
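
The body of `createRegexp` is not shown in this diff, so the following is only a minimal sketch of how a script-based check such as `malayalamChars.MatchString(text)` could work. It assumes the helper wraps the script name in a Unicode class (`\p{Malayalam}`) and anchors it so that the whole string must belong to that script. The names `createScriptRegexp` and `isMalayalam` are hypothetical and are not part of the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// createScriptRegexp is a hypothetical stand-in for createRegexp.
// Assumption: the pattern is anchored, so it matches only text that
// consists entirely of characters from the given Unicode script.
func createScriptRegexp(script string) *regexp.Regexp {
	return regexp.MustCompile(fmt.Sprintf(`^\p{%s}+$`, script))
}

// Go's regexp package resolves \p{Malayalam} via the Unicode script tables.
var malayalamChars = createScriptRegexp("Malayalam")

// isMalayalam reports whether text is written purely in Malayalam script.
func isMalayalam(text string) bool {
	return malayalamChars.MatchString(text)
}

func main() {
	fmt.Println(isMalayalam("മലയാളം")) // Malayalam script
	fmt.Println(isMalayalam("தமிழ்"))   // Tamil script
	fmt.Println(isMalayalam("hello"))  // Latin script
}
```

Under these assumptions the program prints `true`, `false`, `false`. Because Malayalam has its own script, a single-word input can be classified by script alone, as the new `case malayalam` branch does.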
1 change: 1 addition & 0 deletions builder_test.go
@@ -137,6 +137,7 @@ func TestLanguageDetectorBuilder_FromAllLanguagesWithout(t *testing.T) {
Xhosa,
Yoruba,
Zulu,
Malayalam,
},
builder.getLanguages(),
)
11 changes: 6 additions & 5 deletions cmd/accuracy-reports/aggregated-accuracy-values.csv
@@ -3,7 +3,7 @@ Afrikaans,51,21,39,92,55,22,46,98,64,38,62,93,79,58,81,97
Albanian,NaN,NaN,NaN,NaN,55,18,48,98,80,54,86,99,88,69,95,100
Arabic,89,77,91,99,90,79,92,100,94,88,96,99,98,96,99,100
Armenian,NaN,NaN,NaN,NaN,99,100,100,97,100,100,100,100,100,100,100,100
Azerbaijani,64,45,58,91,81,62,82,99,82,71,78,96,90,77,92,99
Azerbaijani,65,45,58,91,81,62,82,99,82,71,78,96,90,77,92,99
Basque,NaN,NaN,NaN,NaN,62,33,62,92,75,56,76,92,84,71,87,93
Belarusian,81,64,80,98,84,67,86,100,92,80,95,100,97,92,99,100
Bengali,100,100,100,100,99,98,99,99,100,100,100,100,100,100,100,100
@@ -16,7 +16,7 @@ Croatian,55,28,44,91,42,26,42,58,60,36,57,86,73,53,74,90
Czech,50,31,46,71,64,39,65,88,71,54,72,87,80,66,84,91
Danish,47,24,38,79,58,26,54,95,70,45,70,95,81,61,84,98
Dutch,47,22,36,82,58,29,47,97,64,36,61,94,77,55,81,96
English,49,17,35,94,54,22,44,97,63,29,62,97,81,55,89,99
English,49,18,35,94,54,22,44,97,63,29,62,97,81,55,89,99
Esperanto,52,25,45,88,57,22,51,98,66,44,61,93,84,67,85,98
Estonian,61,36,53,94,70,41,69,99,83,62,88,99,92,80,96,100
Finnish,71,45,70,98,80,58,84,99,91,77,95,100,96,90,98,100
@@ -27,7 +27,7 @@ German,65,38,60,97,66,40,62,98,80,57,84,99,89,74,94,100
Greek,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100,100
Gujarati,100,100,100,100,100,99,100,100,100,100,100,100,100,100,100,100
Hebrew,90,76,94,99,NaN,NaN,NaN,NaN,100,100,100,100,100,100,100,100
Hindi,52,27,40,88,58,34,45,95,33,11,20,67,73,61,64,95
Hindi,52,26,40,88,58,34,45,95,33,11,20,67,73,61,64,95
Hungarian,62,37,53,95,76,53,76,99,90,77,94,100,95,87,98,100
Icelandic,NaN,NaN,NaN,NaN,71,42,70,99,88,72,92,99,93,83,97,100
Indonesian,67,39,66,95,46,26,45,66,47,25,46,71,61,39,61,83
@@ -39,7 +39,7 @@ Korean,100,100,100,100,99,100,100,98,100,100,100,100,100,100,100,100
Latin,NaN,NaN,NaN,NaN,62,44,58,83,73,49,76,94,87,72,93,97
Latvian,59,36,54,87,75,51,77,98,87,75,90,97,93,85,97,99
Lithuanian,62,38,56,92,72,42,75,99,87,76,89,98,95,86,98,100
Macedonian,62,39,55,94,60,30,54,97,72,52,70,95,84,66,86,99
Macedonian,62,39,54,94,60,30,54,97,72,52,70,95,84,66,86,99
Malay,NaN,NaN,NaN,NaN,22,11,22,34,31,22,36,35,31,26,38,28
Maori,NaN,NaN,NaN,NaN,52,22,43,91,82,62,87,98,91,82,92,99
Marathi,73,52,74,93,84,69,84,98,39,16,30,72,85,74,85,96
@@ -49,7 +49,7 @@ Persian,70,46,66,99,76,57,70,99,80,62,80,98,90,78,94,100
Polish,66,45,59,94,77,51,80,99,90,77,93,99,95,85,98,100
Portuguese,57,26,48,96,53,21,40,97,69,42,70,95,81,59,85,99
Punjabi,100,100,100,100,100,99,100,100,100,100,100,100,100,100,100,100
Romanian,59,34,52,90,53,24,48,88,72,49,74,94,87,69,92,99
Romanian,59,35,52,90,53,24,48,88,72,49,74,94,87,69,92,99
Russian,53,40,52,68,71,48,72,93,78,59,84,92,90,76,95,98
Serbian,57,34,51,86,78,63,75,95,78,62,80,91,88,74,90,99
Shona,68,44,65,95,76,51,79,99,81,56,86,100,91,78,96,100
@@ -74,3 +74,4 @@ Welsh,NaN,NaN,NaN,NaN,69,43,66,98,82,61,87,99,91,78,96,99
Xhosa,NaN,NaN,NaN,NaN,66,40,65,92,69,45,67,94,82,64,85,98
Yoruba,22,11,14,41,15,5,11,28,62,33,61,92,74,50,77,96
Zulu,70,44,68,98,63,35,63,92,70,45,72,94,81,62,83,97
Malayalam,100,100,100,100,99,99,100,100,43,23,38,69,100,100,100,99
16 changes: 16 additions & 0 deletions cmd/accuracy-reports/cld3/Malayalam.txt
@@ -0,0 +1,16 @@
##### Malayalam #####

>>> Accuracy on average: 99.47%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 99.10%
Erroneously classified as Unknown: 0.40%, Yoruba: 0.30%, Finnish: 0.10%, Hungarian: 0.10%

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 99.80%
Erroneously classified as Marathi: 0.10%, Vietnamese: 0.10%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 99.50%
Erroneously classified as Bengali: 0.20%, Japanese: 0.10%, Marathi: 0.10%, Vietnamese: 0.10%

16 changes: 16 additions & 0 deletions cmd/accuracy-reports/lingua-high-accuracy/Malayalam.txt
@@ -0,0 +1,16 @@
##### Malayalam #####

>>> Accuracy on average: 99.80%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 100.00%
Erroneously classified as

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 99.40%
Erroneously classified as Unknown: 0.30%, Bengali: 0.20%, Arabic: 0.10%

16 changes: 16 additions & 0 deletions cmd/accuracy-reports/lingua-low-accuracy/Malayalam.txt
@@ -0,0 +1,16 @@
##### Malayalam #####

>>> Accuracy on average: 43.33%

>> Detection of 1000 single words (average length: 10 chars)
Accuracy: 22.70%
Erroneously classified as Unknown: 77.30%

>> Detection of 1000 word pairs (average length: 20 chars)
Accuracy: 37.90%
Erroneously classified as Unknown: 62.10%

>> Detection of 1000 sentences (average length: 127 chars)
Accuracy: 69.40%
Erroneously classified as Unknown: 30.30%, Bengali: 0.20%, Arabic: 0.10%

2 changes: 1 addition & 1 deletion cmd/accuracy-reports/whatlang/Afrikaans.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 21.00%
Erroneously classified as Unknown: 16.00%, Dutch: 10.30%, German: 7.00%, Danish: 5.70%, Bokmal: 4.20%, Estonian: 4.20%, Nynorsk: 3.40%, French: 3.20%, Swedish: 1.90%, Finnish: 1.80%, Turkish: 1.70%, Italian: 1.50%, Latvian: 1.50%, Romanian: 1.50%, Spanish: 1.50%, Portuguese: 1.40%, Somali: 1.30%, English: 1.20%, Hungarian: 1.20%, Indonesian: 1.20%, Shona: 1.00%, Slovene: 1.00%, Zulu: 0.90%, Esperanto: 0.80%, Lithuanian: 0.80%, Polish: 0.80%, Czech: 0.60%, Tagalog: 0.50%, Croatian: 0.40%, Vietnamese: 0.30%, Azerbaijani: 0.20%
Erroneously classified as Unknown: 16.00%, Dutch: 10.30%, German: 7.00%, Danish: 5.60%, Bokmal: 4.20%, Estonian: 4.20%, Nynorsk: 3.40%, French: 3.20%, Swedish: 1.90%, Finnish: 1.80%, Turkish: 1.70%, Italian: 1.50%, Latvian: 1.50%, Romanian: 1.50%, Spanish: 1.50%, Portuguese: 1.40%, Hungarian: 1.30%, Somali: 1.30%, English: 1.20%, Indonesian: 1.20%, Shona: 1.00%, Slovene: 1.00%, Zulu: 0.90%, Esperanto: 0.80%, Lithuanian: 0.80%, Polish: 0.80%, Czech: 0.60%, Tagalog: 0.50%, Croatian: 0.40%, Vietnamese: 0.30%, Azerbaijani: 0.20%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 39.30%
2 changes: 1 addition & 1 deletion cmd/accuracy-reports/whatlang/Arabic.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 6 chars)
Accuracy: 77.30%
Erroneously classified as Unknown: 12.50%, Persian: 7.10%, Urdu: 3.10%
Erroneously classified as Unknown: 12.60%, Persian: 7.10%, Urdu: 3.00%

>> Detection of 1000 word pairs (average length: 14 chars)
Accuracy: 91.20%
10 changes: 5 additions & 5 deletions cmd/accuracy-reports/whatlang/Azerbaijani.txt
@@ -1,16 +1,16 @@
##### Azerbaijani #####

>>> Accuracy on average: 64.50%
>>> Accuracy on average: 64.57%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 44.60%
Erroneously classified as Unknown: 24.00%, Turkish: 8.80%, Somali: 2.10%, Tagalog: 2.00%, Indonesian: 1.80%, Italian: 1.50%, Finnish: 1.40%, Croatian: 1.00%, French: 1.00%, Estonian: 0.90%, German: 0.90%, Lithuanian: 0.90%, Portuguese: 0.90%, Spanish: 0.90%, Afrikaans: 0.70%, English: 0.70%, Shona: 0.70%, Romanian: 0.60%, Zulu: 0.60%, Hungarian: 0.50%, Nynorsk: 0.50%, Swedish: 0.50%, Danish: 0.40%, Latvian: 0.40%, Slovene: 0.40%, Esperanto: 0.30%, Bokmal: 0.20%, Czech: 0.20%, Polish: 0.20%, Yoruba: 0.20%, Dutch: 0.10%, Vietnamese: 0.10%
Accuracy: 44.70%
Erroneously classified as Unknown: 23.80%, Turkish: 8.80%, Somali: 2.10%, Tagalog: 2.00%, Indonesian: 1.80%, Italian: 1.50%, Finnish: 1.40%, Croatian: 1.00%, French: 1.00%, Estonian: 0.90%, German: 0.90%, Lithuanian: 0.90%, Portuguese: 0.90%, Spanish: 0.90%, Afrikaans: 0.80%, English: 0.70%, Shona: 0.70%, Romanian: 0.60%, Zulu: 0.60%, Danish: 0.50%, Hungarian: 0.50%, Nynorsk: 0.50%, Swedish: 0.50%, Latvian: 0.40%, Slovene: 0.40%, Esperanto: 0.30%, Bokmal: 0.20%, Czech: 0.20%, Polish: 0.20%, Dutch: 0.10%, Vietnamese: 0.10%, Yoruba: 0.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 57.70%
Erroneously classified as Unknown: 18.70%, Turkish: 8.30%, Indonesian: 2.20%, Italian: 1.70%, Tagalog: 1.60%, Somali: 1.40%, Swedish: 0.90%, Estonian: 0.70%, Spanish: 0.70%, Finnish: 0.60%, German: 0.50%, Latvian: 0.50%, Lithuanian: 0.50%, Portuguese: 0.50%, Croatian: 0.40%, English: 0.40%, Slovene: 0.40%, Esperanto: 0.30%, Nynorsk: 0.30%, Romanian: 0.30%, Zulu: 0.30%, Afrikaans: 0.20%, Dutch: 0.20%, Hungarian: 0.20%, Shona: 0.20%, Bokmal: 0.10%, Czech: 0.10%, French: 0.10%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 91.20%
Erroneously classified as Turkish: 4.70%, Unknown: 3.20%, Italian: 0.20%, Somali: 0.20%, Croatian: 0.10%, Finnish: 0.10%, Indonesian: 0.10%, Romanian: 0.10%, Swedish: 0.10%
Accuracy: 91.30%
Erroneously classified as Turkish: 4.60%, Unknown: 3.20%, Italian: 0.20%, Somali: 0.20%, Croatian: 0.10%, Finnish: 0.10%, Indonesian: 0.10%, Romanian: 0.10%, Swedish: 0.10%

8 changes: 4 additions & 4 deletions cmd/accuracy-reports/whatlang/Bokmal.txt
@@ -1,14 +1,14 @@
##### Bokmal #####

>>> Accuracy on average: 34.47%
>>> Accuracy on average: 34.43%

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 15.00%
Erroneously classified as Danish: 13.30%, Unknown: 12.80%, Nynorsk: 9.80%, Swedish: 6.90%, Dutch: 4.30%, German: 4.10%, Afrikaans: 3.30%, Estonian: 3.30%, French: 3.30%, Spanish: 2.30%, Esperanto: 2.20%, Italian: 2.20%, Romanian: 2.20%, Hungarian: 2.00%, Turkish: 2.00%, English: 1.60%, Portuguese: 1.50%, Indonesian: 1.40%, Croatian: 1.00%, Tagalog: 0.80%, Finnish: 0.70%, Latvian: 0.70%, Czech: 0.60%, Lithuanian: 0.50%, Polish: 0.50%, Slovene: 0.50%, Somali: 0.40%, Vietnamese: 0.30%, Zulu: 0.30%, Shona: 0.20%
Accuracy: 14.90%
Erroneously classified as Danish: 13.50%, Unknown: 12.70%, Nynorsk: 9.80%, Swedish: 6.90%, Dutch: 4.30%, German: 4.10%, Afrikaans: 3.30%, Estonian: 3.30%, French: 3.30%, Spanish: 2.30%, Esperanto: 2.20%, Italian: 2.20%, Romanian: 2.20%, Hungarian: 2.00%, Turkish: 2.00%, English: 1.60%, Portuguese: 1.50%, Indonesian: 1.40%, Croatian: 1.00%, Tagalog: 0.80%, Finnish: 0.70%, Latvian: 0.70%, Czech: 0.60%, Lithuanian: 0.50%, Polish: 0.50%, Slovene: 0.50%, Somali: 0.40%, Vietnamese: 0.30%, Zulu: 0.30%, Shona: 0.20%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 28.50%
Erroneously classified as Danish: 17.70%, Nynorsk: 16.90%, Unknown: 5.00%, Swedish: 4.90%, Afrikaans: 3.40%, French: 3.40%, Dutch: 2.70%, German: 2.30%, Estonian: 1.90%, English: 1.70%, Esperanto: 1.40%, Portuguese: 1.30%, Italian: 1.10%, Spanish: 1.10%, Turkish: 1.10%, Finnish: 0.90%, Hungarian: 0.90%, Tagalog: 0.60%, Czech: 0.50%, Romanian: 0.50%, Zulu: 0.50%, Indonesian: 0.40%, Croatian: 0.30%, Slovene: 0.30%, Latvian: 0.20%, Lithuanian: 0.20%, Polish: 0.20%, Vietnamese: 0.10%
Erroneously classified as Danish: 17.70%, Nynorsk: 16.90%, Swedish: 5.00%, Unknown: 5.00%, Afrikaans: 3.40%, French: 3.40%, Dutch: 2.60%, German: 2.30%, Estonian: 1.90%, English: 1.70%, Esperanto: 1.40%, Portuguese: 1.40%, Spanish: 1.10%, Turkish: 1.10%, Italian: 1.00%, Finnish: 0.90%, Hungarian: 0.90%, Tagalog: 0.60%, Czech: 0.50%, Romanian: 0.50%, Zulu: 0.50%, Indonesian: 0.40%, Croatian: 0.30%, Slovene: 0.30%, Latvian: 0.20%, Lithuanian: 0.20%, Polish: 0.20%, Vietnamese: 0.10%

>> Detection of 1000 sentences (average length: 98 chars)
Accuracy: 59.90%
2 changes: 1 addition & 1 deletion cmd/accuracy-reports/whatlang/Bulgarian.txt
@@ -4,7 +4,7 @@

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 36.80%
Erroneously classified as Macedonian: 20.30%, Russian: 12.70%, Serbian: 8.40%, Unknown: 8.20%, Ukrainian: 7.30%, Belarusian: 4.10%, Azerbaijani: 2.20%
Erroneously classified as Macedonian: 20.30%, Russian: 12.60%, Serbian: 8.40%, Unknown: 8.20%, Ukrainian: 7.40%, Belarusian: 4.10%, Azerbaijani: 2.20%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 56.90%
6 changes: 3 additions & 3 deletions cmd/accuracy-reports/whatlang/Croatian.txt
@@ -1,10 +1,10 @@
##### Croatian #####

>>> Accuracy on average: 54.57%
>>> Accuracy on average: 54.60%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 28.30%
Erroneously classified as Unknown: 19.00%, Slovene: 13.70%, Czech: 3.70%, Romanian: 2.70%, Esperanto: 2.60%, Estonian: 2.40%, Lithuanian: 2.00%, Nynorsk: 1.80%, Polish: 1.80%, Portuguese: 1.80%, Swedish: 1.70%, Zulu: 1.70%, Spanish: 1.60%, Tagalog: 1.40%, Afrikaans: 1.30%, Bokmal: 1.30%, Dutch: 1.20%, Turkish: 1.10%, Italian: 1.00%, Latvian: 1.00%, Shona: 1.00%, Danish: 0.90%, English: 0.90%, Finnish: 0.90%, Indonesian: 0.80%, German: 0.70%, French: 0.60%, Hungarian: 0.50%, Somali: 0.40%, Azerbaijani: 0.10%, Vietnamese: 0.10%
Accuracy: 28.40%
Erroneously classified as Unknown: 19.10%, Slovene: 13.70%, Czech: 3.60%, Romanian: 2.70%, Esperanto: 2.60%, Estonian: 2.40%, Lithuanian: 2.00%, Nynorsk: 1.80%, Polish: 1.80%, Portuguese: 1.80%, Swedish: 1.70%, Zulu: 1.70%, Spanish: 1.60%, Afrikaans: 1.30%, Bokmal: 1.30%, Tagalog: 1.30%, Dutch: 1.20%, Turkish: 1.10%, Italian: 1.00%, Latvian: 1.00%, Shona: 1.00%, Danish: 0.90%, English: 0.90%, Finnish: 0.90%, Indonesian: 0.80%, German: 0.70%, French: 0.60%, Hungarian: 0.50%, Somali: 0.40%, Azerbaijani: 0.10%, Vietnamese: 0.10%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 44.00%
8 changes: 4 additions & 4 deletions cmd/accuracy-reports/whatlang/Czech.txt
@@ -1,14 +1,14 @@
##### Czech #####

>>> Accuracy on average: 49.57%
>>> Accuracy on average: 49.53%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 31.40%
Erroneously classified as Unknown: 17.00%, Croatian: 7.30%, Slovene: 5.70%, Polish: 3.70%, Esperanto: 3.40%, Romanian: 2.80%, English: 2.60%, German: 2.00%, Portuguese: 2.00%, French: 1.90%, Shona: 1.80%, Zulu: 1.80%, Estonian: 1.70%, Nynorsk: 1.40%, Spanish: 1.40%, Italian: 1.20%, Afrikaans: 1.10%, Somali: 1.00%, Turkish: 1.00%, Hungarian: 0.90%, Lithuanian: 0.90%, Tagalog: 0.90%, Indonesian: 0.80%, Swedish: 0.80%, Bokmal: 0.70%, Finnish: 0.70%, Latvian: 0.60%, Yoruba: 0.50%, Danish: 0.40%, Dutch: 0.40%, Vietnamese: 0.20%
Erroneously classified as Unknown: 17.00%, Croatian: 7.30%, Slovene: 5.60%, Polish: 3.70%, Esperanto: 3.40%, Romanian: 2.80%, English: 2.60%, German: 2.00%, Portuguese: 2.00%, French: 1.90%, Shona: 1.80%, Zulu: 1.80%, Estonian: 1.70%, Nynorsk: 1.40%, Spanish: 1.40%, Italian: 1.20%, Afrikaans: 1.10%, Somali: 1.00%, Turkish: 1.00%, Hungarian: 0.90%, Lithuanian: 0.90%, Tagalog: 0.90%, Indonesian: 0.80%, Swedish: 0.80%, Bokmal: 0.70%, Finnish: 0.70%, Latvian: 0.60%, Yoruba: 0.60%, Danish: 0.40%, Dutch: 0.40%, Vietnamese: 0.20%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 46.30%
Erroneously classified as Unknown: 9.10%, Croatian: 8.70%, Slovene: 5.50%, Polish: 2.90%, Esperanto: 2.80%, Portuguese: 2.40%, Spanish: 2.20%, German: 2.10%, Romanian: 1.90%, French: 1.50%, Estonian: 1.40%, Tagalog: 1.30%, Danish: 1.20%, Dutch: 1.20%, Italian: 1.20%, Hungarian: 1.10%, English: 1.00%, Afrikaans: 0.80%, Bokmal: 0.80%, Latvian: 0.80%, Zulu: 0.80%, Indonesian: 0.70%, Finnish: 0.40%, Nynorsk: 0.40%, Shona: 0.40%, Lithuanian: 0.30%, Swedish: 0.30%, Somali: 0.20%, Turkish: 0.20%, Yoruba: 0.10%
Accuracy: 46.20%
Erroneously classified as Unknown: 9.10%, Croatian: 8.70%, Slovene: 5.50%, Polish: 2.90%, Esperanto: 2.80%, Portuguese: 2.40%, Spanish: 2.20%, German: 2.10%, Romanian: 1.90%, French: 1.50%, Estonian: 1.40%, Italian: 1.30%, Tagalog: 1.30%, Danish: 1.20%, Dutch: 1.20%, Hungarian: 1.10%, English: 1.00%, Afrikaans: 0.80%, Bokmal: 0.80%, Latvian: 0.80%, Zulu: 0.80%, Indonesian: 0.70%, Finnish: 0.40%, Nynorsk: 0.40%, Shona: 0.40%, Lithuanian: 0.30%, Swedish: 0.30%, Somali: 0.20%, Turkish: 0.20%, Yoruba: 0.10%

>> Detection of 1000 sentences (average length: 93 chars)
Accuracy: 71.00%
6 changes: 3 additions & 3 deletions cmd/accuracy-reports/whatlang/Danish.txt
@@ -1,10 +1,10 @@
##### Danish #####

>>> Accuracy on average: 46.80%
>>> Accuracy on average: 46.87%

>> Detection of 1000 single words (average length: 8 chars)
Accuracy: 23.80%
Erroneously classified as Unknown: 13.10%, Bokmal: 9.60%, Nynorsk: 6.10%, Swedish: 5.30%, Dutch: 5.20%, German: 4.00%, French: 3.90%, Estonian: 3.40%, Afrikaans: 3.20%, English: 2.80%, Turkish: 2.40%, Spanish: 2.30%, Hungarian: 2.10%, Italian: 1.80%, Esperanto: 1.30%, Slovene: 1.30%, Romanian: 1.10%, Czech: 1.00%, Lithuanian: 0.90%, Portuguese: 0.90%, Croatian: 0.80%, Indonesian: 0.70%, Latvian: 0.60%, Zulu: 0.60%, Finnish: 0.50%, Shona: 0.40%, Somali: 0.30%, Tagalog: 0.30%, Vietnamese: 0.20%, Polish: 0.10%
Accuracy: 24.00%
Erroneously classified as Unknown: 13.00%, Bokmal: 9.50%, Nynorsk: 6.20%, Swedish: 5.30%, Dutch: 5.20%, German: 4.00%, French: 3.90%, Estonian: 3.40%, Afrikaans: 3.20%, English: 2.80%, Turkish: 2.40%, Spanish: 2.30%, Hungarian: 2.10%, Italian: 1.80%, Esperanto: 1.30%, Slovene: 1.30%, Czech: 1.00%, Romanian: 1.00%, Lithuanian: 0.90%, Portuguese: 0.90%, Croatian: 0.80%, Indonesian: 0.70%, Latvian: 0.60%, Zulu: 0.60%, Finnish: 0.50%, Shona: 0.40%, Somali: 0.30%, Tagalog: 0.30%, Vietnamese: 0.20%, Polish: 0.10%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 37.70%
4 changes: 2 additions & 2 deletions cmd/accuracy-reports/whatlang/Dutch.txt
@@ -4,11 +4,11 @@

>> Detection of 1000 single words (average length: 9 chars)
Accuracy: 22.40%
Erroneously classified as Unknown: 14.60%, German: 10.00%, Afrikaans: 9.60%, Danish: 3.70%, French: 3.70%, Estonian: 3.60%, English: 3.40%, Bokmal: 3.30%, Spanish: 3.00%, Finnish: 2.40%, Nynorsk: 2.40%, Swedish: 2.10%, Indonesian: 1.40%, Romanian: 1.40%, Hungarian: 1.30%, Portuguese: 1.30%, Slovene: 1.20%, Lithuanian: 1.10%, Turkish: 1.10%, Zulu: 1.10%, Italian: 1.00%, Polish: 0.90%, Esperanto: 0.80%, Czech: 0.70%, Latvian: 0.70%, Somali: 0.50%, Tagalog: 0.50%, Shona: 0.30%, Vietnamese: 0.30%, Croatian: 0.20%
Erroneously classified as Unknown: 14.60%, German: 10.00%, Afrikaans: 9.50%, Danish: 3.80%, French: 3.80%, Estonian: 3.60%, English: 3.40%, Bokmal: 3.30%, Spanish: 2.90%, Finnish: 2.40%, Nynorsk: 2.40%, Swedish: 2.00%, Indonesian: 1.40%, Romanian: 1.40%, Hungarian: 1.30%, Portuguese: 1.30%, Slovene: 1.20%, Lithuanian: 1.10%, Turkish: 1.10%, Zulu: 1.10%, Italian: 1.00%, Polish: 0.90%, Esperanto: 0.80%, Latvian: 0.80%, Czech: 0.70%, Somali: 0.50%, Tagalog: 0.50%, Shona: 0.30%, Vietnamese: 0.30%, Croatian: 0.20%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 35.70%
Erroneously classified as German: 13.00%, Afrikaans: 12.90%, Unknown: 7.00%, Danish: 3.90%, Bokmal: 3.50%, French: 3.40%, English: 3.10%, Spanish: 2.20%, Nynorsk: 2.10%, Swedish: 2.10%, Estonian: 1.60%, Romanian: 1.40%, Finnish: 1.30%, Italian: 0.90%, Indonesian: 0.80%, Portuguese: 0.80%, Turkish: 0.70%, Somali: 0.60%, Czech: 0.30%, Esperanto: 0.30%, Hungarian: 0.30%, Latvian: 0.30%, Polish: 0.30%, Tagalog: 0.30%, Croatian: 0.20%, Lithuanian: 0.20%, Shona: 0.20%, Slovene: 0.20%, Vietnamese: 0.20%, Zulu: 0.20%
Erroneously classified as Afrikaans: 12.90%, German: 12.90%, Unknown: 7.00%, Danish: 4.00%, Bokmal: 3.50%, French: 3.40%, English: 3.10%, Spanish: 2.20%, Nynorsk: 2.10%, Swedish: 2.10%, Estonian: 1.60%, Romanian: 1.40%, Finnish: 1.30%, Italian: 0.90%, Indonesian: 0.80%, Portuguese: 0.80%, Turkish: 0.70%, Somali: 0.60%, Czech: 0.30%, Esperanto: 0.30%, Hungarian: 0.30%, Latvian: 0.30%, Polish: 0.30%, Tagalog: 0.30%, Croatian: 0.20%, Lithuanian: 0.20%, Shona: 0.20%, Slovene: 0.20%, Vietnamese: 0.20%, Zulu: 0.20%

>> Detection of 1000 sentences (average length: 107 chars)
Accuracy: 82.50%