May sound silly, but would it be possible to add a method that retrieves sentences from the tokenizer without the whitespace inserted around punctuation marks (i.e. untokenized)? For example, it could return a tuple holding two versions of each sentence: the tokenized one and the original.
It is practical to keep the untokenized sentence in some scenarios (e.g. showing it to end users), and reconstructing it by script would be rather hacky and imprecise, I guess.
Not a bad idea at all. The information is available inside ucto after all (and propagated to the FoLiA output), so we could do something similar for the Python binding.
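To make the request concrete, here is a minimal sketch of what returning both versions could look like. It does not use the actual binding: the `Token` class, its `space_after` and `end_of_sentence` fields, and the `sentence_pairs` function are all hypothetical stand-ins for whatever per-token spacing information ucto keeps internally.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Token:
    """Hypothetical token record; not the real python-ucto token class."""
    text: str
    space_after: bool = True       # assumed flag: was there whitespace after this token in the input?
    end_of_sentence: bool = False  # assumed flag: does this token close a sentence?


def _render(sentence: List[Token]) -> Tuple[str, str]:
    """Return (tokenized, original) for one sentence."""
    # Tokenized form: every token separated by exactly one space.
    tokenized = " ".join(t.text for t in sentence)
    # Original form: only re-insert a space where the input had one.
    original = "".join(
        t.text + (" " if t.space_after else "") for t in sentence
    ).rstrip()
    return tokenized, original


def sentence_pairs(tokens: Iterable[Token]) -> List[Tuple[str, str]]:
    """Group tokens into sentences and return both versions of each."""
    pairs: List[Tuple[str, str]] = []
    current: List[Token] = []
    for token in tokens:
        current.append(token)
        if token.end_of_sentence:
            pairs.append(_render(current))
            current = []
    if current:  # trailing tokens without an explicit sentence end
        pairs.append(_render(current))
    return pairs
```

With spacing flags available per token, the original sentence is rebuilt from recorded information rather than guessed from punctuation rules, which is what makes it more precise than a post-hoc detokenizing script.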