Create holdout set #145

Merged
lfoppiano merged 17 commits into master from feature/holdout-set
Nov 9, 2022


@lfoppiano (Owner) commented Oct 24, 2022

This PR selects a subset of papers to serve as a holdout set.
Because the dataset is currently small, we will still use all the documents to train the final models, but we will keep a fixed holdout set for a stricter and more precise evaluation. The exception is Units, where the evaluation set was borrowed from a different source.

The holdout set was created with an automatic script and then re-balanced based on the distribution of entities between the training and holdout sets.

The Python code to reproduce the holdout dataset is under scripts.
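
For readers who want a feel for the select-then-rebalance idea, here is a minimal sketch. It is not the script from this repository: the function names, the 20% ratio, the greedy swap strategy, and the input shape (a dict mapping a document id to a `Counter` of entity labels) are all assumptions made for illustration.

```python
import random
from collections import Counter


def label_shares(docs, holdout_ids):
    """Fraction of each entity label's mentions that fall in the holdout set."""
    total, held = Counter(), Counter()
    for doc_id, counts in docs.items():
        total.update(counts)
        if doc_id in holdout_ids:
            held.update(counts)
    return {label: held[label] / total[label] for label in total if total[label]}


def imbalance(docs, holdout_ids, ratio):
    """Sum over labels of |holdout share - target ratio| (0 means perfectly balanced)."""
    return sum(abs(share - ratio) for share in label_shares(docs, holdout_ids).values())


def split_holdout(docs, ratio=0.2, seed=42, max_swaps=100):
    """Pick a random holdout set, then greedily swap documents to rebalance it.

    Hypothetical helper: `docs` maps doc id -> Counter of entity labels.
    """
    rng = random.Random(seed)  # fixed seed keeps the holdout set reproducible
    ids = sorted(docs)
    size = max(1, round(len(ids) * ratio))
    holdout = set(rng.sample(ids, size))

    for _ in range(max_swaps):
        best_score, best_candidate = imbalance(docs, holdout, ratio), None
        # Try every (holdout doc, training doc) swap and keep the best improvement.
        for out_id in sorted(holdout):
            for in_id in ids:
                if in_id in holdout:
                    continue
                candidate = (holdout - {out_id}) | {in_id}
                score = imbalance(docs, candidate, ratio)
                if score < best_score:
                    best_score, best_candidate = score, candidate
        if best_candidate is None:  # no swap improves the balance any further
            break
        holdout = best_candidate

    return sorted(set(ids) - holdout), sorted(holdout)
```

Swapping one document in and one out keeps the holdout size fixed, so only the entity distribution changes. The exhaustive swap search is quadratic per iteration, which is only reasonable for a small dataset like the one described here.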

The statistics about the training/holdout set can be found in:

@lfoppiano lfoppiano linked an issue Oct 24, 2022 that may be closed by this pull request
@coveralls commented Oct 24, 2022

Coverage remained the same at 27.67% when pulling 06c7e11 on feature/holdout-set into 0957bc6 on master.

@lfoppiano (Owner, Author)

I think this is ready to merge

@lfoppiano lfoppiano merged commit 8da45fe into master Nov 9, 2022
@lfoppiano lfoppiano deleted the feature/holdout-set branch November 9, 2022 02:21
Development

Successfully merging this pull request may close these issues.

Create a holdout dataset