Accelerate learning #52
Comments
Hi @YuK1Game, thanks for the question. As of 0.0.17-beta we do not support CPU or GPU multithreading; however, this is a feature that we are currently working on. What learner are you using? How many samples do you have? How many features do you have? Have you seen the section of the FAQ entitled "Training is slower than usual"?
Ok, everything seems pretty reasonable so far ... How long is training taking? What are you comparing the training time to? XGBoost? Scikit-learn? The implementation of Gradient Boost is similar to the Scikit-learn one, with a few exceptions ...
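For readers unfamiliar with the algorithm being discussed, the core loop of gradient boosting for squared-error regression can be sketched in a few lines. This is a minimal illustration in Python using one-split "stumps" as the weak learner, not Rubix ML's or Scikit-learn's actual implementation:

```python
# Minimal gradient boosting for squared-error regression on a single
# feature, using one-split "stumps" as the weak learner. Illustrative only.

def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes the sum of squared errors."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x, t=t, lm=lm, rm=rm: lm if x <= t else rm

def gradient_boost(xs, ys, rounds=50, lr=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps, preds = [], [base] * len(ys)
    for _ in range(rounds):
        # Residuals are the negative gradient of squared error
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

model = gradient_boost([1, 2, 3, 4, 5, 6], [1.0, 1.2, 1.1, 9.8, 10.1, 9.9])
```

Note that every boosting round fits a fresh tree to the current residuals, which is why per-epoch duration is a useful diagnostic: the ensemble grows, but each epoch's cost is dominated by the split search in that round's tree.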
Start:
and now:
I cannot compare because this is my first attempt. Thanks
Are your features categorical, continuous, or a mix of both? Where are you extracting the data from? How long does the learner take between epochs? What version of PHP are you using?
Hey @YuK1Game, let me know if you can answer the questions above. I'm thinking there may be an issue with how the data is being imported (perhaps as categorical features instead of continuous) ... if that is the case, then each Regression Tree will have to search a much larger space to find the best split (which could also help to explain the low R Squared score). There is also potentially an issue with garbage collection. Any additional context will help me diagnose the issue. Thanks
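A common way the categorical-vs-continuous mix-up happens is that values fetched from MySQL arrive as strings, and the library then infers the feature type from the value's type. A rough Python illustration of that kind of type-based inference; `infer_type` and `clean_row` are hypothetical helpers, not Rubix ML functions:

```python
# Sketch of type-based feature inference: values fetched from a database
# often arrive as strings, which a library may then treat as categorical.

def infer_type(value):
    """Treat numeric types as continuous, everything else as categorical."""
    return "continuous" if isinstance(value, (int, float)) else "categorical"

def clean_row(row):
    """Cast numeric strings back to floats so they are seen as continuous."""
    out = []
    for v in row:
        try:
            out.append(float(v))
        except (TypeError, ValueError):
            out.append(v)  # genuinely categorical value, leave as-is
    return out

raw = ["3.14", "42", "red"]  # as fetched from the database
print([infer_type(v) for v in raw])             # all "categorical"
print([infer_type(v) for v in clean_row(raw)])  # continuous, continuous, categorical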
Hi. Here is a sample row:
I want to train on 120,000 records.
The label is a number (a score).
The data comes from a MySQL database.
Epochs were fast at first but are getting slower. At first:
and now (8,000 records in):
Let me know if you still need more information. Thanks
Thanks for the info @YuK1Game. It looks like you are getting sub-second performance per epoch. I see that the duration starts to rise to about 1 - 2 seconds per epoch as training progresses ... it's hard to say whether that indicates a problem because of the way that Regression Trees work under the hood.

I see you have both categorical and continuous features in your dataset. Searching for the best split of a Regression Tree is handled differently for categorical and continuous feature columns - and one can be much faster than the other. For example, if a categorical feature column has 10 possible choices, then the tree only needs to search a space of 10 discrete values. However, if it is a continuous column, then a set of k percentiles (a linear operation in the number of samples at that node split, in expectation) along with as many as 200 comparisons will need to be computed. The disparity shown in the excerpt of your training log could be explained by this; however, I would need to see the full training log in order to be certain.

To clarify, is this with an 8,000 sample dataset? If so, performance seems to be good. What is the duration between epochs using the full dataset (100,000 samples)? Is the learner able to converge to a good solution with a small dataset? (say, greater than a 0.7 R Squared score)

Also, it would probably be best for you to post the whole training log - more information is always better than less when it comes to debugging issues with many factors, such as performance. Thanks
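The asymmetry in split-search cost described above can be sketched as follows. This is a Python illustration of the general idea, with an assumed percentile count `k`, not the library's code:

```python
def categorical_candidates(column):
    """A categorical column only needs one candidate per distinct category."""
    return sorted(set(column))

def continuous_candidates(column, k=20):
    """A continuous column needs k percentile candidates, recomputed at
    every node from the samples that reached it (a linear-time pass)."""
    ordered = sorted(column)  # percentile computation dominates the cost
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // k)] for i in range(1, k)]

colors = ["red", "blue", "red", "green"] * 250
prices = [float(i % 997) for i in range(1000)]

print(len(categorical_candidates(colors)))  # 3 candidate splits
print(len(continuous_candidates(prices)))   # 19 candidate splits
```

The key difference is that the categorical candidate set is fixed by the number of categories, while the continuous candidates must be recomputed from the samples at every node, so deeper trees and larger datasets pay the continuous cost repeatedly.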
Hi.
My training is slow.
How can I speed it up?
Thanks