Spaces after #1322

Merged
merged 9 commits into dev from spaces_after
Dec 16, 2023

Conversation

AngledLuffa
Collaborator

Add SpacesAfter and SpacesBefore when processing a document with the Pipeline.

#1315

@AngledLuffa AngledLuffa force-pushed the spaces_after branch 6 times, most recently from 272d3b4 to b818e9d, on December 15, 2023 18:04
… when tokenizing

Elsewhere, MWT=Yes| is added with a trailing bar, so blank entries are dropped after splitting the MISC field
…Word objects have all been made. Will reduce repeat annotations & make there be one canonical source of where the whitespace markers are
…tting us extract the SpacesBefore and SpacesAfter as appropriate for each of the documents in the bulk tokenization
… the Token, not just stuck on the MISC field
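
The commit note about `MWT=Yes|` describes splitting a MISC string on the bar and discarding the empty entries a trailing bar leaves behind. A minimal sketch of that idea follows. The helper name `parse_misc` and its return shape are hypothetical, chosen only for illustration, and are not taken from the PR's code.

```python
def parse_misc(misc):
    """Split a CoNLL-U style MISC string into a dict, skipping blank entries.

    Hypothetical helper for illustration; not the code from this PR.
    """
    # "_" is the CoNLL-U convention for an empty field
    if not misc or misc == "_":
        return {}
    fields = {}
    for entry in misc.split("|"):
        # A trailing or doubled bar produces an empty string; skip it
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Entries without "=" are kept as bare flags with no value
        fields[key] = value if sep else None
    return fields


if __name__ == "__main__":
    print(parse_misc("MWT=Yes|"))
    print(parse_misc("MWT=Yes|SpacesAfter=\\n"))
```

Without the blank-entry check, `"MWT=Yes|".split("|")` yields a trailing empty string, which would otherwise show up as a spurious empty key.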
@AngledLuffa AngledLuffa merged commit 2120a87 into dev Dec 16, 2023
1 check passed
@AngledLuffa AngledLuffa deleted the spaces_after branch December 16, 2023 04:01