Implosion/to_s problem with Enclitics #68

n8 opened this Issue Jan 13, 2014 · 2 comments

n8 commented Jan 13, 2014
    require 'treat'
    include Treat::Core::DSL

    text = "It's about time."
    text = sentence(text).apply(:tokenize, :parse)
    puts text.to_s

Results in:

It 's about time.

Shouldn't to_s return the text without the extra space between It and 's?


"It's" is a contraction. For tokenisation, contractions are often treated as two words (because they really are); this is the case in Stanford CoreNLP.

One option, as suggested in the above link, would be to handle enclitics in the implode method. In Treat this would be in the module Treat::Entities::Entity::Stringable.


So it looks like the issue is with the current implode method on Stringable. It does attempt to handle enclitics, but from what I can see in the current implementation, 'value' would already be blank at that point, so calling strip! makes no difference. When the imploded parts are merged, the space is still there (it is outside the scope of the strip!).
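To make the scoping problem concrete, here is a minimal, self-contained sketch. It does not use Treat: `Node`, `ENCLITICS`, `sample_tree` and `implode_local` are hypothetical stand-ins, and the tree shape (one phrase holding It, a second phrase holding 's about time) is only an assumption about what the parser produces.

```ruby
# Hypothetical stand-ins; none of this is Treat's real API.
Node = Struct.new(:value, :children) do
  def leaf?
    children.empty?
  end
end

ENCLITICS = ["'s", "'ll", "'re", "'ve", "'d", "'m", "n't"].freeze

# Assumed parse shape: (S (NP It) (VP 's about time))
def sample_tree
  leaf = ->(word) { Node.new(word, []) }
  np = Node.new(nil, [leaf.call("It")])
  vp = Node.new(nil, ["'s", "about", "time"].map(&leaf))
  Node.new(nil, [np, vp])
end

# Each call starts with its own blank string, so strip! in the VP call
# cannot reach the trailing space produced by the NP call.
def implode_local(node)
  value = +""
  node.children.each do |child|
    if child.leaf?
      value.strip! if ENCLITICS.include?(child.value)
      value << child.value << " "
    else
      value << implode_local(child)
    end
  end
  value
end

puts implode_local(sample_tree).strip   # prints: It 's about time
```

When the enclitic is reached, the local string is still empty, so strip! has nothing to remove; the space that matters lives in the string returned by the sibling call.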

Here's a fixed version. I modified the recursive call to pass the value string, so all operations are performed on that one string instead of on multiple copies. A disclaimer: I only started looking at Treat about 3 hours ago!
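As a rough illustration of that approach (not the actual patch to Treat), the sketch below threads one string through the recursion so that strip! can reach a space appended by an earlier sibling. `Node`, `ENCLITICS`, `sample_tree` and `implode_shared` are hypothetical names, and the tree shape is an assumption about the parser's output.

```ruby
# Hypothetical stand-ins; none of this is Treat's real API.
Node = Struct.new(:value, :children) do
  def leaf?
    children.empty?
  end
end

ENCLITICS = ["'s", "'ll", "'re", "'ve", "'d", "'m", "n't"].freeze

# Assumed parse shape: (S (NP It) (VP 's about time))
def sample_tree
  leaf = ->(word) { Node.new(word, []) }
  np = Node.new(nil, [leaf.call("It")])
  vp = Node.new(nil, ["'s", "about", "time"].map(&leaf))
  Node.new(nil, [np, vp])
end

# The same string is passed down and mutated in place, so the space
# written while imploding NP is still visible when VP meets the enclitic.
def implode_shared(node, value = +"")
  node.children.each do |child|
    if child.leaf?
      value.strip! if ENCLITICS.include?(child.value)
      value << child.value << " "
    else
      implode_shared(child, value)
    end
  end
  value
end

puts implode_shared(sample_tree).strip   # prints: It's about time
```

Mutating a single buffer also avoids building an intermediate string per subtree, though the main point here is only that strip! and the space it should remove end up in the same object.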


For the same code, this now gives:

It's about time.