Con silver #1348

Merged
merged 7 commits into from
Feb 24, 2024


AngledLuffa
Collaborator

Add some mechanisms for building and manipulating a silver dataset for the constituency parser. Filtering the trees by the number of parsers that agree on a tree seems to produce a better silver dataset, whereas filtering by variance does not. Will continue experimenting.
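For illustration, filtering by the number of matching parsers can be sketched as below. This is a minimal sketch and not the code in this PR: it assumes each parser's output for a sentence is available as a bracketed tree string, and the names `best_tree_with_agreement`, `filter_by_agreement`, and `min_match` are hypothetical.

```python
from collections import Counter


def best_tree_with_agreement(parses):
    """Return the most common tree among the parsers and how many produced it.

    `parses` is a list of bracketed tree strings, one per parser
    (an assumed input format for this sketch).
    """
    counts = Counter(parses)
    tree, count = counts.most_common(1)[0]
    return tree, count


def filter_by_agreement(sentences, min_match):
    """Keep one tree per sentence, only if at least `min_match` parsers agree on it."""
    kept = []
    for parses in sentences:
        tree, count = best_tree_with_agreement(parses)
        if count >= min_match:
            kept.append((tree, count))
    return kept


# Toy data: three hypothetical parsers, two sentences.
example = [
    ["(S (NP a) (VP b))", "(S (NP a) (VP b))", "(S (NP a) (VP b))"],
    ["(S (NP c) (VP d))", "(S (X c) (VP d))", "(S (NP c) (X d))"],
]
silver = filter_by_agreement(example, min_match=2)
```

Here the first sentence is kept because all three parsers agree, and the second is dropped because no two parsers produce the same tree.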

… new silver dataset producing script which uses two ensembles at once to directly find the matching trees along with counting the number of parsers which agree on the best tree

Add a script which extracts the trees we want of a certain match level

Sample command line for the wiki tokenization script
… is to make it so that we can skip 10 models at a time and use a series of these to measure the variance in a silver dataset's scores
(this will be sent back via the proto in the next version of CoreNLP after 4.5.6)
…es over a sequence of trained models. Works either by taking the least or the most variance. Hopefully this script will work to filter a large silver dataset to a more manageable silver dataset
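The variance-based filter described in the last commit can be sketched roughly as follows. This is an assumption-laden illustration, not the script from this PR: it assumes each tree already has one score per trained model, and the function name `filter_by_variance` and its parameters `keep` and `lowest` are hypothetical.

```python
from statistics import pvariance


def filter_by_variance(scored_trees, keep, lowest=True):
    """Select `keep` trees by the variance of their scores across models.

    `scored_trees` is a list of (tree, scores) pairs, where `scores` holds
    one score per trained model (an assumed layout for this sketch).
    With lowest=True the least-variance trees are kept, otherwise the
    most-variance trees are kept.
    """
    ranked = sorted(
        scored_trees,
        key=lambda item: pvariance(item[1]),
        reverse=not lowest,
    )
    return [tree for tree, _ in ranked[:keep]]


# Toy data: three trees scored by three hypothetical models.
scored = [
    ("t1", [0.9, 0.9, 0.9]),
    ("t2", [0.1, 0.9, 0.5]),
    ("t3", [0.5, 0.6, 0.5]),
]
stable = filter_by_variance(scored, keep=2, lowest=True)
unstable = filter_by_variance(scored, keep=1, lowest=False)
```

Note that, per the PR description, filtering by variance did not appear to improve the silver dataset, so this is shown only to make the mechanism concrete.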
@AngledLuffa AngledLuffa merged commit fe45f11 into dev Feb 24, 2024
1 check passed
@AngledLuffa AngledLuffa deleted the con_silver branch February 24, 2024 07:41