Collinizer #1344

Merged
merged 4 commits into from Feb 26, 2023

AngledLuffa
Contributor

Update the collinizer to delete words based on the gold tags rather than the guess tags, in case the POS tagger disagrees with the original treebank about which words are symbols and which are punctuation.

…te the transform method so the gold & guess trees can have the same punctuation treatment
- the plan is to make the AbstractCollinizer use a different interface
… tree and the gold tree

Not used yet, though - the gold tree is ignored for now
…ld tree can be used to determine whether or not to eliminate the words in the guess tree. This makes the test & gold trees keep the same words, hopefully eliminating most or all of the 'Unable to evaluate...' messages that appear after retagging trees with the POS tagger

Also do the ChineseCollinizer and the NegraPennCollinizer.
Both are tested using derivatives of the English test
(using English trees, but with the tags specific for the other treebank)
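
For illustration only, here is a minimal sketch of why deleting by gold tag keeps the two trees aligned. It is not the CoreNLP code: the class `GoldTagFilterSketch`, the method `keepWords`, and the contents of `DELETED_TAGS` are all hypothetical, and trees are flattened to parallel word and tag lists to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names and the deleted-tag set are assumptions,
// not the AbstractCollinizer interface.
class GoldTagFilterSketch {
  // Assumed set of punctuation tags whose words get deleted before scoring.
  static final Set<String> DELETED_TAGS = Set.of(",", ".", ":", "``", "''");

  // Keep words[i] unless tags[i] is one of the deleted tags.
  static List<String> keepWords(List<String> words, List<String> tags) {
    if (words.size() != tags.size()) {
      throw new IllegalArgumentException("words and tags must align");
    }
    List<String> kept = new ArrayList<>();
    for (int i = 0; i < words.size(); i++) {
      if (!DELETED_TAGS.contains(tags.get(i))) {
        kept.add(words.get(i));
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> words = List.of("Prices", "rose", "-", "sharply", ".");
    // The treebank calls "-" punctuation; the tagger calls it a symbol.
    List<String> goldTags = List.of("NNS", "VBD", ":", "RB", ".");
    List<String> guessTags = List.of("NNS", "VBD", "SYM", "RB", ".");

    // Filtering by each tree's own tags leaves different yields.
    System.out.println(keepWords(words, goldTags));   // [Prices, rose, sharply]
    System.out.println(keepWords(words, guessTags));  // [Prices, rose, -, sharply]
  }
}
```

If the guess tree's words are filtered with `goldTags` instead of `guessTags`, both sides keep the same three words, which is the behavior this PR is aiming for.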