Commit

Update Danish stemmer for alphanumeric change
ojwb committed Nov 14, 2018
1 parent cc24ba5 commit 306f4e0
Showing 3 changed files with 86 additions and 95 deletions.
5 changes: 4 additions & 1 deletion algorithms/danish/stemmer.tt
@@ -216,7 +216,10 @@ The following letters are vowels:
</DL>

<p>
-A consonant is defined as a non-vowel.
+A consonant is defined as a character from ASCII a-z which isn't a vowel
+(originally this was "A consonant is defined as a non-vowel" but since
+2018-11-15 we've changed this definition to avoid the stemmer from altering
+alphanumeric codes which end with a repeated digit).
</p>

<p>
6 changes: 4 additions & 2 deletions code/danish.sbl
@@ -12,7 +12,7 @@ strings ( ch )

integers ( p1 x )

-groupings ( v s_ending )
+groupings ( c v s_ending )

stringescapes {}

@@ -22,6 +22,8 @@ stringdef ae '{U+00E6}'
stringdef ao '{U+00E5}'
stringdef o/ '{U+00F8}'

+define c 'bcdfghjklmnpqrstvwxz'
+
define v 'aeiouy{ae}{ao}{o/}'

define s_ending 'abcdfghjklmnoprtvyz{ao}'
@@ -73,7 +75,7 @@ backwardmode (
)
)
define undouble as (
-setlimit tomark p1 for ([non-v] ->ch)
+setlimit tomark p1 for ([c] ->ch)
ch
delete
)
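
To illustrate the effect of the `undouble` change, here is a rough Python sketch of that one step in isolation. It is not the Snowball-generated code: the helper names (`mark_p1`, `undouble`, `old_rule`, `new_rule`) are hypothetical, `p1` is recomputed from the word as given rather than carried over from earlier stemmer steps, and the other suffix-removal steps are omitted.

```python
# Hypothetical sketch of the Danish `undouble` step, before and after this commit.
VOWELS = set("aeiouy\u00e6\u00e5\u00f8")      # grouping v: a e i o u y ae ao o/
CONSONANTS = set("bcdfghjklmnpqrstvwxz")      # grouping c added in this commit


def mark_p1(word):
    """Approximate p1: just past the first non-vowel that follows a vowel,
    adjusted so that it is never less than 3."""
    limit = len(word)
    if limit < 3:
        return limit              # `hop 3` fails, so p1 stays at the limit
    i = 0
    while i < limit and word[i] not in VOWELS:
        i += 1                    # goto v
    while i < limit and word[i] in VOWELS:
        i += 1                    # scan to the first non-vowel after it
    if i >= limit:
        return limit              # no vowel, or no non-vowel after it
    return max(i + 1, 3)


def undouble(word, is_candidate):
    """Drop the final character if it is a candidate inside R1 and is
    preceded by the same character."""
    n = len(word)
    if n - 1 < mark_p1(word):
        return word               # last character is not in R1
    last = word[-1]
    if is_candidate(last) and n >= 2 and word[-2] == last:
        return word[:-1]
    return word


def old_rule(ch):
    return ch not in VOWELS       # [non-v]: digits count as "consonants"


def new_rule(ch):
    return ch in CONSONANTS       # [c]: only ASCII consonant letters


print(undouble("abc11", old_rule))   # abc1  (code altered)
print(undouble("abc11", new_rule))   # abc11 (code left alone)
print(undouble("bygg", new_rule))    # byg   (ordinary words still undoubled)
```

Under these assumptions, the old `non-v` test treats a trailing digit like any other non-vowel, so a code ending in a repeated digit loses its last character, while the explicit `c` grouping leaves it untouched.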
