Lots of printlines from Math.java when running Lasso #10

Xyclade · 2015-02-19T09:43:03Z

In Math.java, lines 4596 to 4601 contain two print statements. When running a Lasso regression, they produce a lot of console output, which makes it hard to read the output from my own code. Could these be turned off, or made optional if they are really important? 😄

haifengl · 2015-02-19T14:15:00Z

Sure, you can remove them, at least the first one. But I am a little worried about this case. This function (the iterative biconjugate gradient method) should usually converge quickly and thus not print much. Please check whether your LASSO model works well on test data. Thanks!
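For the "make them optional" route, here is a minimal sketch of gating per-iteration diagnostics behind a log level with `java.util.logging`. It is not the code in Math.java: the class `QuietSolver` and its `solve` method are hypothetical, and a plain Jacobi iteration stands in for the biconjugate gradient method purely to keep the example short. Per-iteration messages are emitted at `FINE`, which the default configuration hides, while a failure to converge is still reported as a warning.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for an iterative solver; not SMILE's implementation.
public class QuietSolver {
    private static final Logger LOG = Logger.getLogger(QuietSolver.class.getName());

    /**
     * Solves A x = b by Jacobi iteration, updating x in place.
     * Returns the number of iterations performed.
     */
    public static int solve(double[][] a, double[] b, double[] x, double tol, int maxIter) {
        int n = b.length;
        double[] next = new double[n];
        for (int iter = 1; iter <= maxIter; iter++) {
            double maxChange = 0.0;
            for (int i = 0; i < n; i++) {
                double sum = b[i];
                for (int j = 0; j < n; j++) {
                    if (j != i) {
                        sum -= a[i][j] * x[j];
                    }
                }
                next[i] = sum / a[i][i];
                maxChange = Math.max(maxChange, Math.abs(next[i] - x[i]));
            }
            System.arraycopy(next, 0, x, 0, n);

            // Diagnostics are only formatted and emitted when FINE is enabled.
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine(String.format("iter %d: max change = %.3e", iter, maxChange));
            }
            if (maxChange < tol) {
                return iter;
            }
        }
        // Slow or failed convergence is worth surfacing even in quiet mode.
        LOG.warning("No convergence after " + maxIter + " iterations");
        return maxIter;
    }

    public static void main(String[] args) {
        double[][] a = {{4, 1}, {2, 5}};   // diagonally dominant, so Jacobi converges
        double[] b = {6, 12};              // exact solution is x = (1, 2)
        double[] x = new double[2];
        int iters = solve(a, b, x, 1e-10, 100);
        System.out.println("iterations: " + iters + ", x = (" + x[0] + ", " + x[1] + ")");
    }
}
```

To see the per-iteration lines again, raise the level of this logger (and of its handler) to `FINE`; otherwise the console stays free for your own output.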

Xyclade · 2015-02-19T14:16:53Z

The Lasso model is not working well for the test data, but that is the actual goal this time. I'm writing an example of how it can go wrong. Thank you for the heads-up, though! 👍

haifengl · 2015-02-19T15:27:27Z

Interesting. Does it fail because LASSO is not a good fit for the problem, or because of the parameter settings (e.g. the regularization factor)?

Xyclade · 2015-02-19T15:39:05Z

It's because there are too few data points and no actual statistical relation within the data.

As John Tukey once said:

"The data may not contain the answer. The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data."

This is the case in my example, which is meant to make people aware that machine learning cannot perform miracles on every dataset.

haifengl · 2015-02-19T15:41:20Z

Is your data size less than the dimensionality?

Xyclade · 2015-02-19T15:48:05Z

Yes, it's 100 data points (with ranks 1 to 100) and 27000+ features, so this can never go well :) But since it's an example for my blog rather than an actual dataset that I want to use for predictions, I still worked it out to show what goes wrong when you do this kind of thing.

In the end, the trained LASSO model should predict a rank value for a new data point, but it predicts worse than simply predicting the average (50). This makes it a perfect example of what can go wrong if you have no clue what you are doing 👍
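The "worse than predicting the average" comparison can be made explicit with a small check like the one below. This is only a sketch: the class `BaselineCheck` and its methods are hypothetical, the arrays in `main` are made-up illustrative values rather than the data from this thread, and mean squared error is assumed as the metric. The baseline always predicts the mean of the training targets.

```java
import java.util.Arrays;

// Hypothetical helper for comparing a regression model against a mean baseline.
public class BaselineCheck {

    /** Mean squared error between actual and predicted values. */
    static double mse(double[] actual, double[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - predicted[i];
            sum += diff * diff;
        }
        return sum / actual.length;
    }

    /** MSE on the test targets when always predicting the training mean. */
    static double baselineMse(double[] trainY, double[] testY) {
        double mean = Arrays.stream(trainY).average().orElse(Double.NaN);
        double[] constant = new double[testY.length];
        Arrays.fill(constant, mean);
        return mse(testY, constant);
    }

    /** True only if the model's test error is strictly below the baseline's. */
    static boolean beatsMeanBaseline(double[] trainY, double[] testY, double[] modelPred) {
        return mse(testY, modelPred) < baselineMse(trainY, testY);
    }

    public static void main(String[] args) {
        // Illustrative values only; not the ranks or predictions from the thread.
        double[] trainY = {1, 2, 3, 4, 5};
        double[] testY = {2, 4};
        double[] modelPred = {5, 1};   // stand-in for an overfitted model's output

        System.out.println("model MSE:    " + mse(testY, modelPred));
        System.out.println("baseline MSE: " + baselineMse(trainY, testY));
        System.out.println("beats baseline: " + beatsMeanBaseline(trainY, testY, modelPred));
    }
}
```

A model that cannot beat this constant predictor on held-out data has not learned anything useful from the features, which is the situation described above.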

haifengl · 2015-02-19T15:56:04Z

That is the true reason. It is known as the small sample size problem. I have a paper (http://lectures.molgen.mpg.de/networkanalysis13/LDA_cancer_classif.pdf) that deals with it.

Xyclade · 2015-02-19T15:58:03Z

Cool, thanks! I'll read that and see if I can incorporate it, if that's OK with you?

haifengl · 2015-02-19T16:01:37Z

Try FLD in SMILE for your case. I don't remember if I implemented this algorithm there. Thanks!

haifengl closed this as completed Feb 23, 2015

haifengl added the question label Oct 18, 2017
