144 changes: 89 additions & 55 deletions README.md
@@ -1,7 +1,7 @@
# Spellchecker

[![Go Reference](https://pkg.go.dev/badge/github.com/f1monkey/spellchecker.svg)](https://pkg.go.dev/github.com/f1monkey/spellchecker)
[![CI](https://github.com/f1monkey/spellchecker/actions/workflows/test.yml/badge.svg)](https://github.com/f1monkey/spellchecker/actions/workflows/test.yml)
[![Go Reference](https://pkg.go.dev/badge/github.com/f1monkey/spellchecker.svg)](https://pkg.go.dev/github.com/f1monkey/spellchecker/v3)
[![CI](https://github.com/f1monkey/spellchecker/actions/workflows/test.yaml/badge.svg)](https://github.com/f1monkey/spellchecker/actions/workflows/test.yaml)

Yet another spellchecker written in go.

@@ -22,62 +22,116 @@ Yet another spellchecker written in go.
## Installation

```
go get github.com/f1monkey/spellchecker/v2@latest
go get -v github.com/f1monkey/spellchecker/v3
```

## Usage


### Quick start

```go
1. Initialize the spellchecker. You need to pass an alphabet: a set of allowed characters that will be used for indexing and primary word checks. (All other characters will be ignored for these operations.)

func main() {
```go
// Create a new instance
sc, err := spellchecker.New(
"abcdefghijklmnopqrstuvwxyz1234567890", // allowed symbols, other symbols will be ignored
)
if err != nil {
panic(err)
}
```

// The weight increases the likelihood that the word will be chosen as a correction.
weight := uint(1)
2. Add some words to the dictionary:
1. from any `io.Reader`:
```go
in, _ := os.Open("data/sample.txt")
sc.AddFrom(in)
```
2. Or add words manually:
```go
sc.AddMany([]string{"lock", "stock", "and", "two", "smoking"})
sc.Add("barrels")
```

3. Use the spellchecker:
1. Check if a word is correct:
```go
result := sc.IsCorrect("stock")
fmt.Println(result) // true
```
2. Suggest corrections:
```go
// Find up to 10 suggestions for a word
matches := sc.Suggest("rang", 10)
fmt.Println(matches) // [range, orange]
```
### Options

// Load data from any io.Reader
in, err := os.Open("data/sample.txt")
if err != nil {
panic(err)
}
### Options

sc.AddFrom(&spellchecker.AddOptions{Weight: weight}, in)
// OR
sc.AddFrom(nil, in)
The spellchecker supports customizable options for both searching/suggesting corrections and adding words to the dictionary.

// Add words manually
sc.Add(nil, "lock", "stock", "and", "two", "smoking", "barrels")
#### Search/Suggestion Options

// Check if a word is valid
result := sc.IsCorrect("coffee")
fmt.Println(result) // true
These options are passed to the `Suggest` method (or to `SuggestWith...` helpers).

// Correct a single word
fixed, isCorrect := sc.Fix(nil, "awepon")
fmt.Println(isCorrect) // false
fmt.Println(fixed) // weapon
- **`SuggestWithMaxErrors(maxErrors int)`**
Sets the maximum allowed edit distance (in "bits") between the input word and dictionary candidates.
- Deletion: 1 bit (e.g., "proble" → "problem")
- Insertion: 1 bit (e.g., "problemm" → "problem")
- Substitution: 2 bits (e.g., "problam" → "problem")
- Transposition: 0 bits (e.g., "problme" → "problem")

// Find up to 10 suggestions for a word
matches := sc.Suggest(nil, "rang", 10)
fmt.Println(matches) // [range, orange]
Default: `2`.
Increasing this value beyond 2 is not recommended as it can significantly degrade performance.

if len(os.Args) < 2 {
log.Fatal("dict path must be provided")
}
- **`SuggestWithFilterFunc(f FilterFunc)`**
Replaces the default scoring/filtering function with a custom one.
The function receives:
- `src`: runes of the input word
- `candidate`: runes of the dictionary word
- `count`: frequency count of the candidate in the dictionary

It must return:
- a `float64` score (higher = better suggestion)
- a `bool` indicating whether the candidate should be kept

The default filter computes the Levenshtein distance (with costs: insert/delete=1, substitute=1, transpose=1), drops candidates whose distance exceeds `maxErrors`, and boosts the score based on word frequency and shared prefix/suffix length.

Example usage:
```go
matches := sc.Suggest(
"rang",
10,
spellchecker.SuggestWithMaxErrors(1),
spellchecker.SuggestWithFilterFunc(myCustomFilter),
)
```
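
As a rough illustration of the filter contract described above, the sketch below defines a hypothetical `myCustomFilter`. The local `FilterFunc` type only mirrors the parameter and return shape listed above; it is an assumption, not the library's own declaration. The scoring rule (shared prefix length plus a small frequency bonus) is made up for the example and is not the default filter.

```go
package main

import "fmt"

// FilterFunc mirrors the shape described in the README text
// (assumption: the real type is declared in the spellchecker package).
type FilterFunc func(src, candidate []rune, count int) (float64, bool)

// commonPrefixLen returns the number of leading runes shared by a and b.
func commonPrefixLen(a, b []rune) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

// myCustomFilter is a hypothetical filter: it keeps candidates whose length
// differs from the input by at most one rune and that share at least the
// first rune. The score is the shared prefix length plus a small bonus for
// frequent words, so a higher score means a better suggestion.
var myCustomFilter FilterFunc = func(src, candidate []rune, count int) (float64, bool) {
	diff := len(src) - len(candidate)
	if diff < -1 || diff > 1 {
		return 0, false
	}
	prefix := commonPrefixLen(src, candidate)
	if prefix == 0 {
		return 0, false
	}
	return float64(prefix) + float64(count)/1000, true
}

func main() {
	score, keep := myCustomFilter([]rune("rang"), []rune("range"), 10)
	fmt.Println(score, keep)
}
```

A filter this strict rejects "orange" for the input "rang" because the lengths differ by two, so in practice the thresholds would need tuning against real data.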

### Options
#### Add Options
These options are passed to `Add`, `AddMany`, or `AddFrom`.

See [options.go](./options.go) for the list of available options.
- **`AddWithWeight(weight uint)`**
Sets the frequency weight for added word(s). Higher weight increases the chance that the word will appear higher in suggestion results.
Default: `1`.
- **`AddWithSplitter(splitter bufio.SplitFunc)`**
Customizes how `AddFrom` splits the input stream into words.

The default splitter:
- Uses `bufio.ScanWords` as its base
- Converts words to lowercase
- Keeps only sequences matching `[-\pL]+` (letters and hyphens)

Example:
```go
sc.AddFrom(
file,
spellchecker.AddWithWeight(10), // these words are very common
spellchecker.AddWithSplitter(customSplitter),
)

sc.AddMany([]string{"hello", "world"},
spellchecker.AddWithWeight(5),
)
```
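
To make the splitter description above more concrete, here is a minimal sketch of a `bufio.SplitFunc` in the same spirit: it builds on `bufio.ScanWords`, lowercases each token, and keeps a run matching `[-\pL]+`. The names `lowercaseWordSplitter` and `collectWords` are hypothetical. The sketch keeps only the first letter/hyphen run of each whitespace-delimited token, and the library's actual default splitter may behave differently.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"regexp"
	"strings"
)

// wordPattern matches runs of Unicode letters and hyphens.
var wordPattern = regexp.MustCompile(`[-\pL]+`)

// lowercaseWordSplitter is a hypothetical bufio.SplitFunc. It reads
// whitespace-delimited tokens with bufio.ScanWords, lowercases them, and
// returns the first letter/hyphen run. Tokens without any letters (such as
// "2") are skipped inside the loop, so the scanner never sees an empty token.
func lowercaseWordSplitter(data []byte, atEOF bool) (int, []byte, error) {
	consumed := 0
	for {
		advance, token, err := bufio.ScanWords(data[consumed:], atEOF)
		consumed += advance
		if err != nil || token == nil {
			// Either an error, or no complete word is available yet.
			return consumed, nil, err
		}
		if match := wordPattern.Find(bytes.ToLower(token)); match != nil {
			return consumed, match, nil
		}
	}
}

// collectWords runs the splitter over input and returns the tokens it yields.
func collectWords(input string) []string {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(lowercaseWordSplitter)
	var words []string
	for scanner.Scan() {
		words = append(words, scanner.Text())
	}
	return words
}

func main() {
	fmt.Println(collectWords("Lock, Stock and 2 Smoking-Barrels!"))
	// [lock stock and smoking-barrels]
}
```

Skipping non-matching tokens inside the split function, rather than returning a nil token, avoids the scanner stopping early when the skipped token sits near the end of the input.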

### Save/load

@@ -102,26 +156,6 @@ See [options.go](./options.go) for the list of available options.
}
```

### Custom score function

You can provide a custom scoring function if needed:

```go
var fn spellchecker.FilterFunc = func(src, candidate []rune, cnt int) (float64, bool) {
// you can calculate Levenshtein distance here (see defaultFilterFunc in options.go for example)

return 1.0, true // constant score
}

sc, err := spellchecker.New("abc", spellchecker.WithFilterFunc(fn))
if err != nil {
// handle err
}

sc.Fix(fn, "word")
```


## Benchmarks

Tests are based on data from [Peter Norvig's article about spelling correction](http://norvig.com/spell-correct.html)
4 changes: 2 additions & 2 deletions dictionary.go
@@ -48,7 +48,7 @@ func (d *dictionary) has(word string) bool {
}

// add puts the word to the dictionary
func (d *dictionary) add(word string, n uint) (uint32, error) {
func (d *dictionary) add(word string, n uint) uint32 {
id := d.nextID()
d.ids[word] = id

@@ -59,7 +59,7 @@ func (d *dictionary) add(word string, n uint) (uint32, error) {
key := sum(d.alphabet.encode(wordRunes))
d.index[key] = append(d.index[key], id)

return id, nil
return id
}

// inc increases the word occurrence counter
6 changes: 2 additions & 4 deletions dictionary_test.go
@@ -27,16 +27,14 @@ func Test_dictionary_add(t *testing.T) {
dict, err := newDictionary(DefaultAlphabet)
require.NoError(t, err)

id, err := dict.add("qwe", 1)
require.NoError(t, err)
id := dict.add("qwe", 1)
require.Equal(t, uint32(1), id)
require.Equal(t, uint(1), dict.counts[id])
require.Equal(t, []rune("qwe"), dict.words[id])
require.Equal(t, 1, len(dict.ids))
require.Len(t, dict.index, 1)

id, err = dict.add("asd", 2)
require.NoError(t, err)
id = dict.add("asd", 2)
require.Equal(t, uint32(2), id)
require.Equal(t, uint(2), dict.counts[id])
require.Equal(t, []rune("asd"), dict.words[id])
4 changes: 2 additions & 2 deletions go.mod
@@ -1,6 +1,6 @@
module github.com/f1monkey/spellchecker/v2
module github.com/f1monkey/spellchecker/v3

go 1.24
go 1.25

require (
github.com/agext/levenshtein v1.2.3
88 changes: 0 additions & 88 deletions options.go

This file was deleted.
