Suggestions for better results #7

Closed
karasjoh000 opened this issue Mar 20, 2019 · 2 comments

@karasjoh000

Hi, I am training a regression SVM with this library and get very poor results. I verified with libSVM, and it produces the same results. What is a good approach to SVM regression?
Here is what I should get:
Screen Shot 2019-03-20 at 1.17.59 AM.pdf
Here is the predicted output:
Screen Shot 2019-03-20 at 1.18.35 AM.pdf
I tried all possible kernels but I cannot get it to work:
Screen Shot 2019-03-20 at 1.18.56 AM.pdf
Screen Shot 2019-03-20 at 1.21.41 AM.pdf

I have 219k data points with 100 dimensions.

@karasjoh000
Author

Oddly, if I remove features the predictions become more accurate; the more features I add, the less accurate they get. This is not an issue with the library, I am just asking for some advice.

karasjoh000 changed the title from "Very bad results" to "Suggestions for better results" on Mar 20, 2019
@ralfbiedert
Owner

ralfbiedert commented Mar 20, 2019

Without knowing details, here is what I would do, in this specific order:

  • Foremost, make sure your data is "clean", or at least representative. If you train with a lot of noise in the features, or with conflicting labels, you are unlikely to get good results. How you do that is very task-specific.

After your data is "clean":

  1. Start with and focus on the RBF kernel (it usually gives the best results, at the expense of speed)
  2. Make sure the data is normalized (it sounds like you did)
  3. To speed up the grid search, sample a small subset of your data (e.g., 1% of it); see the sketch after this list
  4. Perform a grid search, see the FAQ (basically, run libSVM's grid.py tool)
  5. Apply the parameters you found in the grid search to training
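
Here is a minimal sketch for steps 2 and 3, assuming your data already sits in NumPy arrays X (n_samples × n_features) and y (n_samples); the file names and the 1% sampling rate are just placeholders:

import numpy as np

# Hypothetical input files; adjust to however your data is actually stored.
X = np.loadtxt("features.csv", delimiter=",")
y = np.loadtxt("labels.csv")

# Step 2: scale every feature to [0, 1] (constant features are left at 0).
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - x_min) / np.where(x_max > x_min, x_max - x_min, 1.0)

# Step 3: sample roughly 1% of the rows for the grid search.
rng = np.random.default_rng(0)
idx = rng.choice(len(X_scaled), size=max(1, len(X_scaled) // 100), replace=False)

# Write the sample in libSVM's sparse "label index:value ..." format so grid.py can read it.
with open("problem.in", "w") as f:
    for label, row in zip(y[idx], X_scaled[idx]):
        feats = " ".join(f"{j + 1}:{v:.6g}" for j, v in enumerate(row) if v != 0)
        f.write(f"{label:.6g} {feats}\n")

If your data is already in libSVM's text format, libSVM's own svm-scale tool handles the scaling step as well.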

Doing a grid search, if you haven't done so, can MASSIVELY improve your results. To use it with SVRs, here is the command you'd run, where -s 4 means using nu-SVR (if that's what you want), and -t 2 means using the RBF kernel.

python3 PATH_TO_LIBSVM/tools/grid.py -s 4 -t 2 ./problem.in

In the end that will output something like:

[local] 13 -3 0.014441 (best c=0.03125, g=3.0517578125e-05, rate=6.0388)

Then you know you should use c=0.03125 and g=3.0517578125e-05 for training. If grid.py is taking too long, reduce the sample size (e.g., to 0.1% of your data). If it's relatively fast, increase the sample size instead.
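
For the last step, here is a minimal sketch using the Python bindings bundled with libSVM (svmutil, under PATH_TO_LIBSVM/python); the file names are placeholders, and the c/g values are the ones from the example output above, so substitute whatever grid.py reports for your data:

from svmutil import svm_read_problem, svm_train, svm_predict, svm_save_model

# Load the full (already scaled) training set in libSVM's sparse format.
y, x = svm_read_problem("problem_full.in")

# -s 4: nu-SVR, -t 2: RBF kernel, -c / -g: the values reported by grid.py.
model = svm_train(y, x, "-s 4 -t 2 -c 0.03125 -g 3.0517578125e-05")
svm_save_model("model.out", model)

# Quick sanity check; for a real evaluation, predict on a held-out set instead.
predictions, (acc, mse, scc), _ = svm_predict(y, x, model)

The same flags work with the svm-train command line tool if you'd rather not use the Python bindings.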
