
Accelerate learning #52

Closed · YuK1Game opened this issue Dec 2, 2019 · 9 comments

Labels: outdated (Contains information that is no longer relevant or accurate), question (Further information is requested)

Comments


YuK1Game commented Dec 2, 2019

Hi.

Training is slow for me. How can I speed it up? For example, by:

  • Using the GPU
  • Multithreading

Thanks

YuK1Game added the question label on Dec 2, 2019
andrewdalpino (Member)

Hi @YuK1Game, thanks for the question.

As of 0.0.17-beta we do not support CPU or GPU multithreading.

However, this is a feature that we are currently working on.

What learner are you using?

How many samples do you have?

How many features do you have?

Have you seen the section of the FAQ entitled "Training is slower than usual"?

YuK1Game (Author) commented Dec 3, 2019

I am using:

        use Rubix\ML\PersistentModel;
        use Rubix\ML\Regressors\GradientBoost;
        use Rubix\ML\Regressors\RegressionTree;
        use Rubix\ML\Persisters\Filesystem;

        $estimator = new PersistentModel(
            new GradientBoost(new RegressionTree(4), 0.1),
            new Filesystem($this->modelFilepath, true)
        );

and 100,000 records with 11 columns.

I have set the memory limit to unlimited:

ini_set('memory_limit', -1);

but it is only using about 140 MB:

[screenshot: WS000000]

andrewdalpino (Member) commented Dec 3, 2019

Ok everything seems pretty reasonable so far ...

How long is training taking?

What are you comparing the training time to? XGBoost? ScikitLearn?

The implementation of Gradient Boost is similar to the ScikitLearn one, with a few exceptions ...

  • Rubix ML GBM supports both categorical and continuous data, sklearn does not
  • Sklearn does the gradient computation and gradient descent step over multiple threads using NumPy under the hood; Rubix ML is single-threaded for the time being
  • Sklearn offers GBMs with either a regular decision tree or a lighter (histogram-splitting) decision tree that is quicker; the Rubix ML decision tree already implements an optimization that sits somewhere in between the two - instead of using histograms, it uses the percentile method (see the sketch below)
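
To illustrate the percentile method, here is a minimal, hypothetical sketch (not the library's actual code) of limiting the split candidates for a continuous feature to a fixed set of percentiles rather than testing every unique value:

    // Hypothetical sketch: draw split candidates for a continuous feature
    // from k evenly-spaced percentiles of its values, rather than from
    // every unique value (exhaustive) or from histogram bins.
    function percentileCandidates(array $values, int $k = 10) : array
    {
        sort($values);

        $n = count($values);

        $candidates = [];

        for ($i = 1; $i < $k; $i++) {
            // Nearest-rank percentile at the i-th quantile boundary.
            $candidates[] = $values[(int) floor($i / $k * ($n - 1))];
        }

        return array_unique($candidates);
    }

With k fixed, generating the candidates stays cheap no matter how many distinct values the column contains, which is why it falls somewhere between an exhaustive search and histogram splitting.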

YuK1Game (Author) commented Dec 3, 2019

It started at:

[2019-11-29 11:44:52] netkeiba.INFO: Training base learner

and it is currently at:

[2019-12-03 05:54:20] netkeiba.INFO: Epoch 328 score=0.39623370712645 loss=13.036411570958

I cannot compare the training time to anything since this is my first attempt.

Thanks

andrewdalpino (Member) commented Dec 3, 2019

Are your features categorical or continuous or a mix of both?

Where are you extracting the data from?

How long does the learner take between epochs?

What version of PHP are you using?

andrewdalpino (Member) commented Dec 7, 2019

Hey @YuK1Game, let me know if you can answer those questions above.

I'm thinking there may be an issue with how the data is being imported (perhaps as categorical features instead of continuous) ... if that is the case, then each Regression Tree will have to search a much larger space to find the best split (which could also help to explain the low R Squared score)

... or there is potentially an issue with garbage collection.
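
If the import turns out to be the culprit, one possible fix (a sketch, assuming the samples come out of the database as strings) is to cast numeric strings back to continuous types before training, e.g. with the NumericStringConverter transformer:

    use Rubix\ML\Datasets\Labeled;
    use Rubix\ML\Transformers\NumericStringConverter;

    // Database drivers often return every column as a string, which the
    // trees would then treat as categorical.
    $dataset = new Labeled($samples, $labels);

    // Convert numeric strings such as "428" to ints/floats so those
    // columns are treated as continuous.
    $dataset->apply(new NumericStringConverter());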

Any additional context will help me diagnose the issue

Thanks

YuK1Game (Author) commented Dec 9, 2019

Hi.

Here is a sample row:

array(11) {
    [0]=>    
    string(6) "中山"
    [1]=>
    string(3) "晴"
    [2]=>
    string(3) "重"
    [3]=>
    int(1200)
    [4]=>
    string(15) "サンクララ"
    [5]=>
    int(2)
    [6]=>
    int(3)
    [7]=>
    int(428)
    [8]=>
    int(54)
    [9]=>
    int(9)
    [10]=>
    int(510)
  }

I want to train on 120,000 records in total.

Are your features categorical or continuous or a mix of both?

The label is a number (a score).

Where are you extracting the data from?

The data comes from a MySQL database and is extracted immediately before training.
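
Roughly like this (a simplified sketch with placeholder names, not my exact code):

    use Rubix\ML\Datasets\Labeled;

    // Placeholder connection and query, for illustration only.
    $pdo = new PDO('mysql:host=localhost;dbname=races', $dbUser, $dbPassword);

    $rows = $pdo->query('SELECT * FROM results')->fetchAll(PDO::FETCH_NUM);

    $samples = [];
    $labels = [];

    foreach ($rows as $row) {
        // The last column holds the numeric score used as the label.
        $labels[] = (float) array_pop($row);

        $samples[] = $row;
    }

    $dataset = new Labeled($samples, $labels);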

How long does the learner take between epochs?

The early epochs are fast, but they slow down as training progresses.

At first:

[2019-12-09 01:23:18] test.INFO: Learner init booster=RegressionTree rate=0.1 ratio=0.5 estimators=1000 min_change=0.0001 window=10 hold_out=0.1 metric=RSquared base=DummyRegressor
[2019-12-09 01:23:18] netkeiba.INFO: Training base learner
[2019-12-09 01:23:18] netkeiba.INFO: Epoch 1 score=-0.3767914110961 loss=1824212.958192
[2019-12-09 01:23:18] netkeiba.INFO: Epoch 2 score=-0.35629949818136 loss=1669563.4770574
[2019-12-09 01:23:18] netkeiba.INFO: Epoch 3 score=-0.2977777404172 loss=1381185.1642557
[2019-12-09 01:23:18] netkeiba.INFO: Epoch 4 score=-0.24026440379874 loss=1145989.5650973
[2019-12-09 01:23:18] netkeiba.INFO: Epoch 5 score=-0.21870186348087 loss=1145573.0300535
[2019-12-09 01:23:18] netkeiba.INFO: Epoch 6 score=-0.18248081881129 loss=1142564.511372

and now (this run is on 8,000 records):

[2019-12-09 01:28:07] netkeiba.INFO: Epoch 21 score=0.010869674438744 loss=217648.58429299
[2019-12-09 01:28:09] netkeiba.INFO: Epoch 22 score=0.010815738916415 loss=216145.40970688
[2019-12-09 01:28:10] netkeiba.INFO: Epoch 23 score=0.038142362305688 loss=212999.1373364
[2019-12-09 01:28:12] netkeiba.INFO: Epoch 24 score=0.037980473891026 loss=209966.29716697
[2019-12-09 01:28:14] netkeiba.INFO: Epoch 25 score=0.037948646127267 loss=206445.56595115
[2019-12-09 01:28:15] netkeiba.INFO: Epoch 26 score=0.038896182805228 loss=200410.77220361
[2019-12-09 01:28:17] netkeiba.INFO: Epoch 27 score=0.039153822056527 loss=200318.92461847

What version of PHP are you using?

$ php -v
PHP 7.2.14 (cli) (built: Jan  9 2019 22:23:26) ( ZTS MSVC15 (Visual C++ 2017) x64 )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.2.0, Copyright (c) 1998-2018 Zend Technologies

Let me know if you still need more information.

Thanks

andrewdalpino (Member) commented Dec 10, 2019

Thanks for the info @YuK1Game

It looks like you are getting sub-second performance per epoch

I see that the duration starts to rise to about 1 - 2 seconds per epoch as training progresses ... it's hard to say whether that is an indicator of an issue because of the way that Regression Trees work under the hood.

I see you have both categorical and continuous features in your dataset. Searching for the best split of a Regression Tree is handled differently for categorical and continuous feature columns - and one can be much faster than the other. For example, if a categorical feature column has 10 possible choices, then the tree only needs to search a space of 10 discrete values. However, if it is a continuous column, then a set of k percentiles (a linear operation, in expectation, in the number of samples at that node) along with as many as 200 comparisons will need to be computed. The disparity shown in the excerpt of your training log could be explained by this, but I would need to see the full training log in order to be certain.

To clarify, this is with an 8,000 sample dataset? If so, performance seems to be good.

What is the duration between epochs using the full dataset (100,000 samples)?

Is the learner able to converge to a good solution with a small dataset? (say, greater than a 0.7 R Squared score)

Also, it would probably be best for you to post the whole training log - more information is always better than less when it comes to debugging issues with many factors, such as performance.

Thanks

andrewdalpino (Member)

Hi @YuK1Game

The CART implementation has been optimized in the latest commit 89f6991.

We're seeing up to an order-of-magnitude speed improvement with Gradient Boost as a result, particularly on large datasets. Give the latest dev-master a try, or wait until the next release.
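
For reference, pulling the development branch in with Composer looks something like this (assuming a standard Composer-managed project):

    composer require rubix/ml:dev-master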

andrewdalpino added the outdated label on Mar 22, 2020