
Create a Holdout Set #38

Closed
HarshCasper opened this issue Aug 18, 2020 · 11 comments
Comments

@HarshCasper
Member

Type

Feature

Description

While training, validating, and optimizing our model, we could over time start to overfit to the validation data without realizing it. This means the model will perform well on the validation data but poorly on unseen data.

Create a holdout set containing 200 images. We will keep this holdout set aside and only use it at the end to check how the final model performs on unseen data. Keep the code in a Scripts/ directory for future use.
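The task above could be sketched as a small script for the Scripts/ directory. This is a minimal illustration, not the actual project code; the flat `dataset/` layout, the `holdout/` output directory, and the `create_holdout` name are assumptions for the example:

```python
import random
import shutil
from pathlib import Path


def create_holdout(dataset_dir, holdout_dir, n=200, seed=42):
    """Move n randomly chosen images from dataset_dir into holdout_dir.

    A fixed seed keeps the holdout selection reproducible.
    Returns the list of moved image paths.
    """
    src = Path(dataset_dir)
    dst = Path(holdout_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Sort first so the random choice is deterministic for a given seed.
    images = sorted(
        p for p in src.iterdir()
        if p.suffix.lower() in {".jpg", ".jpeg", ".png"}
    )
    random.seed(seed)
    chosen = random.sample(images, n)
    for img in chosen:
        shutil.move(str(img), str(dst / img.name))
    return chosen
```

Moving (rather than copying) the files ensures the holdout images can never leak into training.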

Tools

  • Python

Have you read the Contributing Guidelines on Pull Requests?

Yes

@aryanVijaywargia
Contributor

I would like to work on this @HarshCasper

@BALaka-18
Collaborator

@aryanVijaywargia Assigned

@macabdul9
Collaborator

macabdul9 commented Aug 18, 2020

The holdout set will be drawn from the same distribution, so its performance will be about the same as the validation set's. The main problem with machine learning models is that they do not perform well on out-of-distribution data. For the demo at the client end, we can take a few samples (e.g. 10) from the main directory and then split the remaining data into train and val/test sets (so that the model doesn't get trained on the demo data). @aryanVijaywargia @HarshCasper
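The suggestion above (reserve a few demo samples first, then split what remains) could look roughly like this. A sketch only; the helper name, the fraction parameters, and the returned dict keys are illustrative, not project code:

```python
import random


def reserve_demo_then_split(files, n_demo=10, val_frac=0.1, test_frac=0.1, seed=0):
    """Set aside n_demo samples for the client demo, then split the rest.

    Shuffles once with a seeded RNG so the demo samples are removed
    before any train/val/test assignment happens.
    """
    rng = random.Random(seed)
    files = list(files)
    rng.shuffle(files)
    demo, rest = files[:n_demo], files[n_demo:]
    n_val = int(len(rest) * val_frac)
    n_test = int(len(rest) * test_frac)
    val = rest[:n_val]
    test = rest[n_val:n_val + n_test]
    train = rest[n_val + n_test:]
    return {"demo": demo, "train": train, "val": val, "test": test}
```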

@HarshCasper
Member Author

I guess we can create a separate issue for that @macabdul9

@BALaka-18
Collaborator

> The holdout set will be drawn from the same distribution, so its performance will be about the same as the validation set's. The main problem with machine learning models is that they do not perform well on out-of-distribution data. For the demo at the client end, we can take a few samples (e.g. 10) from the main directory and then split the remaining data into train and val/test sets (so that the model doesn't get trained on the demo data). @aryanVijaywargia @HarshCasper

@macabdul9 open a new issue for this. You'll be assigned to work on it.

@aryanVijaywargia
Contributor

I have a query. I have written a Python script that randomly samples 100 images from each class and moves them to the holdout_dataset directory. Should my PR contain both the holdout_dataset directory (with the images) and the code, or will the code alone suffice? @BALaka-18
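The script described above might look roughly like this, assuming a `dataset/<class>/<image>` directory layout that is mirrored under `holdout_dataset/`. All names here are illustrative; this is not the code from the actual PR:

```python
import random
import shutil
from pathlib import Path


def build_holdout(dataset_dir="dataset", holdout_dir="holdout_dataset",
                  per_class=100, seed=42):
    """Move per_class random images from each class folder into a
    mirrored holdout tree, preserving class subdirectories."""
    rng = random.Random(seed)
    for class_dir in sorted(Path(dataset_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        out = Path(holdout_dir) / class_dir.name
        out.mkdir(parents=True, exist_ok=True)
        # Cap at the class size so small classes don't raise ValueError.
        for img in rng.sample(images, min(per_class, len(images))):
            shutil.move(str(img), str(out / img.name))
```

Sampling per class (stratified) rather than from the pool as a whole keeps the holdout set's class balance close to the full dataset's.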

@BALaka-18
Collaborator

> I have a query. I have written a Python script that randomly samples 100 images from each class and moves them to the holdout_dataset directory. Should my PR contain both the holdout_dataset directory (with the images) and the code, or will the code alone suffice? @BALaka-18

@aryanVijaywargia Both. The sample you created can be used for initial testing, or as an example when we document our model.

@aryanVijaywargia
Contributor

Thanks for clarifying @BALaka-18

@macabdul9
Collaborator

> The holdout set will be drawn from the same distribution, so its performance will be about the same as the validation set's. The main problem with machine learning models is that they do not perform well on out-of-distribution data. For the demo at the client end, we can take a few samples (e.g. 10) from the main directory and then split the remaining data into train and val/test sets (so that the model doesn't get trained on the demo data). @aryanVijaywargia @HarshCasper
>
> @macabdul9 open a new issue for this. You'll be assigned to work on it.

I think mentors cannot contribute.

@BALaka-18
Collaborator

> The holdout set will be drawn from the same distribution, so its performance will be about the same as the validation set's. The main problem with machine learning models is that they do not perform well on out-of-distribution data. For the demo at the client end, we can take a few samples (e.g. 10) from the main directory and then split the remaining data into train and val/test sets (so that the model doesn't get trained on the demo data). @aryanVijaywargia @HarshCasper
>
> @macabdul9 open a new issue for this. You'll be assigned to work on it.
>
> I think mentors cannot contribute.

@macabdul9 I'm sorry, I forgot. Open an issue then, and participants will be assigned.

@rutujadhanawade
Contributor

Is this issue open?


5 participants