Suggestions for better results #7

Closed
karasjoh000 opened this issue Mar 20, 2019 · 2 comments

@karasjoh000

Hi, I am training a regression SVM with this library and get very poor results. I verified with libSVM, and it produces the same results. What is a good approach to SVM regression?
Here is what I should get:
Screen Shot 2019-03-20 at 1.17.59 AM.pdf
Here is the predicted output:
Screen Shot 2019-03-20 at 1.18.35 AM.pdf
I tried all possible kernels but I cannot get it to work:
Screen Shot 2019-03-20 at 1.18.56 AM.pdf
Screen Shot 2019-03-20 at 1.21.41 AM.pdf

I have 219k data points with 100 dimensions.

@karasjoh000
Author

Oddly, if I remove features the predictions become more accurate; the more features I add, the less accurate they get. This is not an issue with the library, I am just asking for some advice.

karasjoh000 changed the title from "Very bad results" to "Suggestions for better results" on Mar 20, 2019
@ralfbiedert
Owner

ralfbiedert commented Mar 20, 2019

Without knowing details, here is what I would do, in this specific order:

  • Foremost, make sure your data is "clean", or at least representative. If you train with a lot of noise in the features, or with conflicting labels, you are unlikely to get good results. How you do that is very task-specific.

After your data is "clean":

  1. Start with and focus on the RBF kernel (it usually gives the best results, at the expense of speed)
  2. Make sure the data is normalized (it sounds like you did)
  3. To speed up the grid search, sample a small subset of your data (e.g., 1% of it); see the sketch after this list
  4. Perform a grid search, see the FAQ (basically, run libSVM's grid.py tool)
  5. Apply the parameters you found in the grid search to training
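
Here is a minimal sketch for steps 2 and 3, assuming your data already sits in NumPy arrays X (n_samples × n_features) and y (n_samples); the file names and the 1% sampling rate are just placeholders:

import numpy as np

# Hypothetical input files; adjust to however your data is actually stored.
X = np.loadtxt("features.csv", delimiter=",")
y = np.loadtxt("labels.csv")

# Step 2: scale every feature to [0, 1] (constant features are left at 0).
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - x_min) / np.where(x_max > x_min, x_max - x_min, 1.0)

# Step 3: sample roughly 1% of the rows for the grid search.
rng = np.random.default_rng(0)
idx = rng.choice(len(X_scaled), size=max(1, len(X_scaled) // 100), replace=False)

# Write the sample in libSVM's sparse "label index:value ..." format so grid.py can read it.
with open("problem.in", "w") as f:
    for label, row in zip(y[idx], X_scaled[idx]):
        feats = " ".join(f"{j + 1}:{v:.6g}" for j, v in enumerate(row) if v != 0)
        f.write(f"{label:.6g} {feats}\n")

If your data is already in libSVM's text format, libSVM's own svm-scale tool handles the scaling step as well.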

Doing a grid search, if you haven't done so, can MASSIVELY improve your results. To use it with SVRs, here is the command you'd run, where -s 4 means using nu-SVR (if that's what you want), and -t 2 means using the RBF kernel.

python3 PATH_TO_LIBSVM/tools/grid.py -s 4 -t 2 ./problem.in

In the end that will output something like:

[local] 13 -3 0.014441 (best c=0.03125, g=3.0517578125e-05, rate=6.0388)

Then you know you should use c=0.03125 and g=3.0517578125e-05 for training. If grid.py is taking too long, reduce the sample size (e.g., to 0.1% of your data). If it's relatively fast, increase the sample size instead.
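
For the last step, here is a minimal sketch using the Python bindings bundled with libSVM (svmutil, under PATH_TO_LIBSVM/python); the file names are placeholders, and the c/g values are the ones from the example output above, so substitute whatever grid.py reports for your data:

from svmutil import svm_read_problem, svm_train, svm_predict, svm_save_model

# Load the full (already scaled) training set in libSVM's sparse format.
y, x = svm_read_problem("problem_full.in")

# -s 4: nu-SVR, -t 2: RBF kernel, -c / -g: the values reported by grid.py.
model = svm_train(y, x, "-s 4 -t 2 -c 0.03125 -g 3.0517578125e-05")
svm_save_model("model.out", model)

# Quick sanity check; for a real evaluation, predict on a held-out set instead.
predictions, (acc, mse, scc), _ = svm_predict(y, x, model)

The same flags work with the svm-train command line tool if you'd rather not use the Python bindings.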
