Create a Holdout Set #38
Comments
I would like to work on this @HarshCasper
@aryanVijaywargia Assigned
The holdout set will come from the same distribution, so performance on it will be the same as on the validation set. This is the main problem with machine learning models: they do not perform well on out-of-distribution data. For the demo at the client end, we can take a few samples (e.g. 10) from the main directory and then split the remaining data into train and val/test sets (so that the model doesn't get trained on the demo data). @aryanVijaywargia @HarshCasper
I guess we can create a separate issue for that @macabdul9
@macabdul9 open a new issue for this. You'll be assigned to work on it.
I have a query. I have written a Python script that generates 100 samples at random from each class and moves the images to the holdout_dataset directory. Should my PR contain both the holdout_dataset directory (containing the images) and the code, or will the code alone suffice? @BALaka-18
@aryanVijaywargia Both. The sample you created can be used for initial testing, or as an example when we document our model.
Thanks for clarifying @BALaka-18 |
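For reference, a script like the one described above could look roughly as follows. This is a minimal sketch, not the actual script from the PR: the source directory name (`dataset/`), the accepted file extensions, and the fixed seed are all assumptions; only the `holdout_dataset` name and the 100-per-class count come from the comment.

```python
# Hypothetical sketch of the sampling script described above.
# Assumptions: class folders live under a `dataset/` directory,
# images end in .jpg/.jpeg/.png, and a fixed seed keeps the split reproducible.
import random
import shutil
from pathlib import Path


def build_holdout(src_dir: str, dst_dir: str, per_class: int = 100, seed: int = 42) -> int:
    """Move up to `per_class` random images from each class folder in `src_dir`
    into a mirrored class folder under `dst_dir`. Returns the number of images moved."""
    random.seed(seed)
    src, dst = Path(src_dir), Path(dst_dir)
    moved = 0
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        images = sorted(
            p for p in class_dir.iterdir()
            if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
        )
        sample = random.sample(images, min(per_class, len(images)))
        out = dst / class_dir.name
        out.mkdir(parents=True, exist_ok=True)
        for img in sample:
            shutil.move(str(img), out / img.name)  # move, so training never sees these
            moved += 1
    return moved


if __name__ == "__main__":
    build_holdout("dataset", "holdout_dataset", per_class=100)
```

Moving (rather than copying) the files matters here: it guarantees the holdout images can never leak into the training split.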
I think mentors cannot contribute
@macabdul9 I'm sorry I forgot. Open an issue then, participants will be assigned. |
Is this issue open? |
Type
Feature
Description
While training, validating, and optimizing our model, we could over time start to overfit the validation data without realizing it. This means the model will perform well during training but poorly on unseen data.
Create a holdout set containing 200 images. We will keep this holdout set aside and only use it at the end to check how the final model performs on unseen data. Keep the code in a Scripts/ directory for future use.
Tools
Have you read the Contributing Guidelines on Pull Requests?
Yes