
Increase training size for feature level classifier #45

Closed
1 task
bkowshik opened this issue May 30, 2017 · 3 comments
@bkowshik
Contributor

Ref #43


  • We currently use 5,269 changesets for training our feature level classifier.
  • From changesets reviewed on osmcha with one feature modification, it looks like we can potentially add up to 4,000 changesets.
  • This increase in the number of training samples should in turn improve the model.

Next actions

  • Update dataset with the additional 4,000 changesets - @bkowshik

cc: @batpad @geohacker

@bkowshik bkowshik added this to the version-0.5 milestone May 31, 2017
@bkowshik
Contributor Author

Curious about the effect that training size has on the model's metrics, we have the following:

[Plot: validation metrics vs. number of training samples]

Notes / Questions

  • The metrics, although showing diminishing returns, still have a significant positive slope.
  • If roc_auc score is 0.8 with 6,000 samples, what would it look like with 10,000 samples?
  • When do we know that we have enough samples?

cc: @anandthakker
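For questions like the two above, scikit-learn ships a helper that does exactly this kind of experiment. A minimal sketch, assuming a scikit-learn style classifier; the synthetic dataset here is a stand-in for the real changeset features, not the project's actual data:

```python
# Sketch: validation ROC AUC as a function of training size, using
# scikit-learn's learning_curve helper. make_classification stands
# in for the real labelled changeset features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=6_000, random_state=0)

# Evaluate the model at 5 increasing training sizes, 3-fold CV each.
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    scoring="roc_auc", cv=3)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(n, round(score, 3))
```

If the validation curve has flattened by the largest training size, more samples are unlikely to help much; if it is still climbing, the 10,000-sample question is worth answering empirically.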

@bkowshik
Contributor Author

Workflow

  1. Set number of samples to use for the current run
  2. Use only this subset of samples from the labelled training data
  3. Train a model on this subset of training data
  4. Get predictions from model for the entire validation dataset
  5. Extract metrics on validation dataset
  6. Increase number of samples to use for the next run and go again
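The steps above can be sketched as a loop; this is a hand-rolled illustration, not the project's actual script, and the classifier, data, and variable names (X_train, y_val, etc.) are placeholder assumptions:

```python
# Sketch of the 6-step workflow: train on growing subsets of the
# labelled data, score the full validation set each time.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in for the labelled changeset features.
X, y = make_classification(n_samples=10_000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

for n_samples in range(1_000, len(X_train) + 1, 1_000):
    # Steps 1-2: take only a subset of the labelled training data.
    X_sub, y_sub = X_train[:n_samples], y_train[:n_samples]
    # Step 3: train a model on this subset.
    model = RandomForestClassifier(random_state=0).fit(X_sub, y_sub)
    # Steps 4-5: predict the entire validation set, extract a metric.
    scores = model.predict_proba(X_val)[:, 1]
    print(n_samples, round(roc_auc_score(y_val, scores), 3))
    # Step 6: the loop increments n_samples and goes again.
```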

@bkowshik
Contributor Author

Before, we had 8,620 labelled samples, of which 6,036 were used for training and 2,584 for validation. With the backfill done, we now have 10,165, of which we use 7,115 for training and 3,050 for validation.

  • In total we added 1,545 new changesets to the labelled dump. 🎉
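The split sizes quoted above can be sanity-checked with a few lines; the 70/30 train/validation ratio is an assumption read off from the numbers, not something stated in the project config:

```python
# Quick arithmetic check of the labelled-data split before and
# after the backfill.
before_total, before_train, before_val = 8_620, 6_036, 2_584
after_total, after_train, after_val = 10_165, 7_115, 3_050

# Each split should account for every labelled sample.
assert before_train + before_val == before_total
assert after_train + after_val == after_total

print(after_total - before_total)           # new changesets: 1545
print(round(after_train / after_total, 2))  # train fraction: 0.7
```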

Interestingly, the nice upward graph has now become something like the one below. I don't understand why this is happening, though.

[Plot: validation metrics vs. number of training samples, after the backfill]

We are 💯 to close here.
