fix not-contraction offsets + add test (resolves #15) #17

KDercksen · 2021-12-17T09:09:53Z

This PR implements a fix for the issue noted in #15. Specifically, in the case of "don't" you now get do @ 0, n't @ 2 when not replacing not-contractions; if you set replace_not_contraction=True, you will still get do @ 0, not @ 3 because of the implicit whitespace introduction. I assumed you would still want to introduce a space in that case, because keeping the text pristine doesn't necessarily matter there.

Let me know if you disagree or would like to see something changed!

fnl · 2021-12-17T09:17:28Z

I think offsets should always be truthful to the input text, as that is where their value lies: tracking the provenance of a token.

So even can’t would become “ca”:0, “not”:2 in that scenario.

Does that seem right to you, too?

KDercksen · 2021-12-17T09:27:37Z

Yes, I agree actually! I was doubting myself because of the comment written here though.

I do think that, even when the token text changes ("n't" -> "not"), the offsets should stay truthful to the original text. I can change the PR to reflect this!
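The behaviour agreed on above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `split_not_contraction` is a hypothetical helper, and it assumes a not-contraction is any word ending in "n't" (with a straight or a curly apostrophe). The point it shows is that the second token's offset is the position of "n't" in the input, whether or not its text is replaced by "not".

```python
from typing import List, Tuple


def split_not_contraction(
    word: str, start: int, replace: bool = False
) -> List[Tuple[str, int]]:
    """Split a word such as "don't" into (text, offset) pairs.

    Hypothetical helper for illustration only. `start` is the offset of
    `word` in the original text. The offsets returned always point into
    the original text, even when "n't" is replaced by "not".
    """
    lowered = word.lower()
    # Accept both the straight (') and the curly (U+2019) apostrophe.
    for apostrophe in ("'", "\u2019"):
        suffix = "n" + apostrophe + "t"
        if lowered.endswith(suffix) and len(word) > len(suffix):
            cut = len(word) - len(suffix)  # index where "n't" begins
            tail = "not" if replace else word[cut:]
            return [(word[:cut], start), (tail, start + cut)]
    # Not a not-contraction: return the word unchanged.
    return [(word, start)]


text = "I don't"
tokens = split_not_contraction("don't", text.index("don't"), replace=True)
# The offset of "not" still points at "n't" in the input text.
```

With this approach no whitespace is implicitly introduced, so slicing the input at a token's offset always lands on the characters that token came from.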


fnl · 2021-12-17T12:14:07Z

Thank you very much for your contribution, Koen, much appreciated!

fix not-contraction offsets + add test

3e8472f

do not differ in offset calculation when using replace_not_contract=True or False

30dde80

fnl merged commit 9d217f6 into fnl:master Dec 17, 2021

fnl mentioned this pull request Dec 17, 2021

Bug in not-contraction handling code #15

Closed
