Refactor ENCLITICS_MAP
LFDM committed Aug 25, 2014
1 parent c9b60a3 commit 6f406b5
Showing 1 changed file with 9 additions and 8 deletions.
17 changes: 9 additions & 8 deletions lib/llt/tokenizer.rb
@@ -202,18 +202,19 @@ def enclitic(val)
 "#{@enclitics_marker}#{val}"
 end

-ENCLITICS_MAP = {
-/^(nec)$/i => 'c',
-/^(ne|se)u$/i => 'u',
-/^(nisi)$/i => 'si',
-/^(οὐ|μή|εἰ)τε$/i => 'τε',
-/^(οὐ|μή)δε$/i => 'δε',
-}
+ENCLITICS_MAP = [
+/^(ne)(c)$/i,
+/^(ne|se)(u)$/i,
+/^(ni)(si)$/i,
+/^(οὐ|μή|εἰ)(τε)$/i,
+/^(οὐ|μή)(δε)$/i
+]
 def split_frequent_enclitics
 container = []
 @worker.each_with_index do |token, i|
 ENCLITICS_MAP.each do |regex, encl|
-if token.match(regex)
+if m = token.match(regex)
+encl = m[2]
 token.slice!(-encl.length, encl.length)
 container << [encl, (i + container.size + @shift_range)]
 end
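The change above replaces a regex-to-string hash with a plain list of regexes in which the enclitic is the second capture group, so the suffix is read from the match itself and no longer has to be repeated as a hash value. A minimal standalone sketch of that idea follows. The names `ENCLITIC_PATTERNS` and `split_enclitic` are hypothetical and not part of the tokenizer, and unlike `split_frequent_enclitics` the sketch returns the two parts instead of mutating the token in place.

```ruby
# Hypothetical sketch, not the LLT tokenizer's own code.
# Each pattern captures the stem in group 1 and the enclitic in group 2.
ENCLITIC_PATTERNS = [
  /^(ne)(c)$/i,
  /^(ne|se)(u)$/i,
  /^(ni)(si)$/i,
  /^(οὐ|μή|εἰ)(τε)$/i,
  /^(οὐ|μή)(δε)$/i
].freeze

# Returns [stem, enclitic] for the first matching pattern,
# or [token, nil] when no pattern matches.
def split_enclitic(token)
  ENCLITIC_PATTERNS.each do |regex|
    m = token.match(regex)
    return [m[1], m[2]] if m
  end
  [token, nil]
end

p split_enclitic('nec')   # stem and enclitic come from the capture groups
p split_enclitic('Nisi')  # /i matches; captures keep the token's own casing
p split_enclitic('arma')  # no pattern applies
```

Because the enclitic comes from the match, its casing follows the token and not a hard-coded literal, which appears to be one practical effect of the refactoring.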
