Update documentation and release notes
pemistahl committed Dec 12, 2022
1 parent 1b6b2e8 commit 2bd12b3
Showing 13 changed files with 90 additions and 9 deletions.
8 changes: 4 additions & 4 deletions CONTRIBUTING.md
@@ -44,12 +44,12 @@ this library's fields of application.
[isocode639_1 url]: https://github.com/pemistahl/lingua-go/blob/main/isocode.go#L31
[isocode639_3 url]: https://github.com/pemistahl/lingua-go/blob/main/isocode.go#L261
[wikipedia isocodes list]: https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
[language url]: https://github.com/pemistahl/lingua-go/blob/main/language.go#L31
[language url]: https://github.com/pemistahl/lingua-go/blob/main/language.go#L25
[alphabet url]: https://github.com/pemistahl/lingua-go/blob/main/alphabet.go#L26
[language method url]: https://github.com/pemistahl/lingua-go/blob/main/language.go#L607
[language method url]: https://github.com/pemistahl/lingua-go/blob/main/language.go#L601
[chars to languages mapping url]: https://github.com/pemistahl/lingua-go/blob/main/constant.go#L31
[language model files writer url]: https://github.com/pemistahl/lingua-go/blob/main/writer.go#L55
[test data files writer url]: https://github.com/pemistahl/lingua-go/blob/main/writer.go#L201
[language model files writer url]: https://github.com/pemistahl/lingua-go/blob/main/writer.go#L56
[test data files writer url]: https://github.com/pemistahl/lingua-go/blob/main/writer.go#L202
[language models directory url]: https://github.com/pemistahl/lingua-go/tree/main/language-models
[testdata directory url]: https://github.com/pemistahl/lingua-go/tree/main/cmd/language-testdata
[accuracy reporter url]: https://github.com/pemistahl/lingua-go/blob/main/cmd/accuracy_reporter.go
57 changes: 53 additions & 4 deletions README.md
@@ -1816,7 +1816,7 @@ Erroneously classified as Dutch: 0.20%, Latin: 0.10%

## 7. How to add it to your project?

go get github.com/pemistahl/lingua-go@v1.1.1
go get github.com/pemistahl/lingua-go

## 8. How to build?

@@ -2032,7 +2032,56 @@ build the detector from all supported languages. When you have knowledge about
the texts you want to classify you can almost always rule out certain languages as impossible
or unlikely to occur.

### 9.6 Methods to build the LanguageDetector
### 9.6 Detection of multiple languages in mixed-language texts

In contrast to most other language detectors, *Lingua* is able to detect multiple languages
in mixed-language texts. This feature can yield quite reasonable results, but it is still
experimental, so the detection result depends heavily on the input text. It works best in
high-accuracy mode with multiple long words for each language. The shorter the phrases and
their words are, the less accurate the results are. Reducing the set of languages when
building the language detector can also improve accuracy for this task, provided that the
languages occurring in the text are the same as those supported by the respective
language detector instance.

```go
package main

import (
	"fmt"
	"github.com/pemistahl/lingua-go"
)

func main() {
	languages := []lingua.Language{
		lingua.English,
		lingua.French,
		lingua.German,
	}

	detector := lingua.NewLanguageDetectorBuilder().
		FromLanguages(languages...).
		Build()

	sentence := "Parlez-vous français? " +
		"Ich spreche Französisch nur ein bisschen. " +
		"A little bit is better than nothing."

	for _, result := range detector.DetectMultipleLanguagesOf(sentence) {
		fmt.Printf("%s: '%s'\n", result.Language(), sentence[result.StartIndex():result.EndIndex()])
	}

	// Output:
	// French: 'Parlez-vous français? '
	// German: 'Ich spreche Französisch nur ein bisschen. '
	// English: 'A little bit is better than nothing.'
}
```

In the example above, a slice of [`DetectionResult`](https://github.com/pemistahl/lingua-go/blob/main/result.go#L22)
is returned. Each entry in the slice describes a contiguous single-language text section,
providing start and end indices of the respective substring.
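
Because Go slices strings by byte, the start and end indices in the example above are used
as byte offsets rather than character counts. The following is a minimal, library-free sketch
of that idea. The `section` type and the `substring` and `coversWholeText` helpers are
hypothetical stand-ins invented for illustration; they are not part of *Lingua*, and whether
the real results always cover the entire input is an assumption that this sketch only checks,
not something it guarantees.

```go
package main

import "fmt"

// section is a hypothetical stand-in for a detection result: a language
// label plus the byte offsets of one contiguous single-language span.
type section struct {
	language string
	start    int // inclusive byte offset
	end      int // exclusive byte offset
}

// substring returns the part of text covered by s.
func substring(text string, s section) string {
	return text[s.start:s.end]
}

// coversWholeText reports whether the sections are ordered, contiguous,
// and together span text from its first to its last byte.
func coversWholeText(text string, sections []section) bool {
	pos := 0
	for _, s := range sections {
		if s.start != pos || s.end < s.start || s.end > len(text) {
			return false
		}
		pos = s.end
	}
	return pos == len(text)
}

func main() {
	french := "Parlez-vous français? "
	german := "Ich spreche Französisch nur ein bisschen. "
	english := "A little bit is better than nothing."
	text := french + german + english

	// len counts bytes, so "ç" and "ö" each contribute two to the offsets.
	sections := []section{
		{"French", 0, len(french)},
		{"German", len(french), len(french) + len(german)},
		{"English", len(french) + len(german), len(text)},
	}

	for _, s := range sections {
		fmt.Printf("%s: '%s'\n", s.language, substring(text, s))
	}
	fmt.Println("covers whole text:", coversWholeText(text, sections))
}
```

Deriving the offsets with `len` instead of hard-coding them keeps the sketch correct for
non-ASCII input, where the number of bytes differs from the number of characters.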

### 9.7 Methods to build the LanguageDetector

There might be classification tasks where you know beforehand that your language data is
definitely not written in Latin, for instance. The detection accuracy can become better
@@ -2062,9 +2111,9 @@ lingua.NewLanguageDetectorBuilder().FromIsoCodes639_1(lingua.EN, lingua.DE)
lingua.NewLanguageDetectorBuilder().FromIsoCodes639_3(lingua.ENG, lingua.DEU)
```

## 10. What's next for version 1.2.0?
## 10. What's next for version 1.3.0?

Take a look at the [planned issues](https://github.com/pemistahl/lingua-go/milestone/3).
Take a look at the [planned issues](https://github.com/pemistahl/lingua-go/milestone/4).

## 11. Contributions

7 changes: 7 additions & 0 deletions RELEASE_NOTES.md
@@ -1,3 +1,10 @@
## Lingua 1.2.0 (released on 12 Dec 2022)

### Features

- The new method `LanguageDetector.DetectMultipleLanguagesOf()` has been
introduced. It detects multiple languages in mixed-language text. (#9)

## Lingua 1.1.1 (released on 22 Nov 2022)

### Documentation
Binary file modified cmd/images/plots/barplot-average.png
Binary file modified cmd/images/plots/barplot-sentences.png
Binary file modified cmd/images/plots/barplot-single-words.png
Binary file modified cmd/images/plots/barplot-word-pairs.png
Binary file modified cmd/images/plots/boxplot-average.png
Binary file modified cmd/images/plots/boxplot-sentences.png
Binary file modified cmd/images/plots/boxplot-single-words.png
Binary file modified cmd/images/plots/boxplot-word-pairs.png
25 changes: 25 additions & 0 deletions example_test.go
@@ -40,6 +40,31 @@ func Example_basic() {
// Output: English
}

func Example_multipleLanguagesDetection() {
	languages := []lingua.Language{
		lingua.English,
		lingua.French,
		lingua.German,
	}

	detector := lingua.NewLanguageDetectorBuilder().
		FromLanguages(languages...).
		Build()

	sentence := "Parlez-vous français? " +
		"Ich spreche Französisch nur ein bisschen. " +
		"A little bit is better than nothing."

	for _, result := range detector.DetectMultipleLanguagesOf(sentence) {
		fmt.Printf("%s: '%s'\n", result.Language(), sentence[result.StartIndex():result.EndIndex()])
	}

	// Output:
	// French: 'Parlez-vous français? '
	// German: 'Ich spreche Französisch nur ein bisschen. '
	// English: 'A little bit is better than nothing.'
}

// By default, Lingua returns the most likely language for a given input text.
// However, there are certain words that are spelled the same in more than one
// language. The word `prologue`, for instance, is both a valid English and
2 changes: 1 addition & 1 deletion go.mod
@@ -3,7 +3,7 @@ module github.com/pemistahl/lingua-go
go 1.18

require (
github.com/pemistahl/lingua-go/serialization v0.0.0-00010101000000-000000000000
github.com/pemistahl/lingua-go/serialization v1.2.0
github.com/stretchr/testify v1.8.1
golang.org/x/exp v0.0.0-20221106115401-f9659909a136
google.golang.org/protobuf v1.28.1
