API Design #1

Closed · Evizero opened this issue Sep 11, 2015 · 24 comments

@Evizero (Owner) commented Sep 11, 2015

I could use your opinion on this package, @lindahua. Basically, so far I have implemented a linear SVM based on Regression.jl. I think SVMs deserve their own package, because there are a lot of specialized algorithms, such as dual coordinate descent (which is the one I have implemented thus far).

I implemented it in such a way that you can also use the standard Regression.jl solver for the LinearSVM case:

# two linearly separable clusters of 10 points each
a = hcat(rand(10), rand(10) + 2)
b = hcat(rand(10) + 2, rand(10))
X = vcat(a, b)'                  # 2x20 matrix, one observation per column
Y = vcat(ones(10), -ones(10))    # labels: +1 for cluster a, -1 for cluster b
train = randbool(size(X, 2))     # random boolean mask for the training split
ret = Regression.solve(hingereg(X[:, train], Y[train]; bias = 0.0),
                       reg = SqrL2Reg(1.0e-2),
                       solver = DualCoordinateDescent())
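
For the record, the core update of the algorithm I implemented (dual coordinate descent, following the paper by Hsieh et al., 2008) looks roughly like this. This is a compressed, illustrative sketch, not the actual package code; dcd_epoch! is just a made-up name.

using LinearAlgebra: dot

# One pass of dual coordinate descent for the L1-SVM (no bias, no shrinking).
# X holds one observation per column, alpha holds the dual variables, and
# w = sum_j alpha[j] * y[j] * X[:, j] is maintained incrementally.
function dcd_epoch!(alpha, w, X, y, C)
    for i in 1:size(X, 2)
        xi = view(X, :, i)
        G = y[i] * dot(w, xi) - 1                        # gradient of the dual objective
        a_old = alpha[i]
        alpha[i] = clamp(a_old - G / dot(xi, xi), 0, C)  # projected Newton step
        w .+= (alpha[i] - a_old) * y[i] .* xi            # keep w consistent with alpha
    end
    return w
end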

I am not yet sure how to fit kernel SVMs into this design, but I will think about it. Probably via the prediction function.
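
Something along these lines is what I have in mind (a plain sketch with made-up names, not a settled API):

# Kernel-based prediction: f(x) = sum_i alpha[i] * y[i] * k(sv_i, x) + b,
# where only the support vectors (the columns of svs) enter the sum.
rbf(x, y; gamma = 1.0) = exp(-gamma * sum((x - y).^2))

function predict_kernel(alpha, svs, labels, bias, x; kernel = rbf)
    f = bias
    for i in 1:length(alpha)
        f += alpha[i] * labels[i] * kernel(svs[:, i], x)
    end
    return sign(f)
end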

Any opinion on this?

@Evizero (Owner) commented Sep 11, 2015

Note that I didn't want to touch SVM.jl, simply because I'd have to delete all of its code anyway. Besides inefficient implementations, there are a few strange things going on in that package, such as convergence being defined as iter >= maxiter, checking if a float is exactly 0.0 without tolerance, or storing a scalar as a vector of identical scalars.

@lindahua commented:

This would become a great package. Your efforts will be appreciated by the community.

In terms of incorporating kernels, I think it may be a good idea to just write kernel-based prediction functions. I am not completely sure whether it will eventually work out, but I think it is good to try.

When this package matures, we should consider merging this and SVM.jl in some way -- people tend to search for SVM.jl first instead of KSVM.jl. It seems that SVM.jl has not been actively updated for a while (except for changes to make it work with Julia 0.4).

cc @johnmyleswhite

@Evizero (Owner) commented Sep 11, 2015

To really make it clean, I will have to make some additions to EmpiricalRisks.

For example, I currently abuse the L1/L2 regularizer to specify whether I use L1-SVM or L2-SVM. However, in the latter case L1/L2 actually refers to the (squared) hinge loss and not the penalty. So, given the same parameters, my new solver does a very different thing than the others. Also, the C in the SVM formulation is not the same as the usual lambda, but 1 / lambda.

Bottom line: I will implement the squared hinge loss.
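
To make the distinction concrete, a plain sketch of the two losses (not the actual EmpiricalRisks definitions):

# For a label y in {-1, +1} and a prediction f = w'x + b:
hinge(y, f)    = max(0, 1 - y * f)      # L1-SVM loss
sqrhinge(y, f) = max(0, 1 - y * f)^2    # L2-SVM loss

# The SVM cost parameter C corresponds to 1 / lambda in the usual
# regularized-risk formulation.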

I'll submit a PR as soon as I get to it. Do you have any objections right away?

@johnmyleswhite commented:

I think it would be good to let the SVM.jl name go vacant for a month or two to make people realize the package is dead. After that, it would be pretty easy to move this new package over to that name.

@Evizero (Owner) commented Sep 11, 2015

I agree with that. There is no rush to claim that name. This package needs a lot of work before I am willing to tag it.

Still needed are:

  • A version that fits the bias (efficiently) as well
  • A special implementation for sparse matrices
  • ~~At least the stochastic version of dual coordinate descent for the LinearSVM~~ (This will have to wait until I come up with a good general solution in SupervisedLearning.jl)
  • A one-vs-all version for dual coordinate descent
  • At least one algorithm for linear support-vector regression
  • Hard tests that really prove this package does what it claims to do
  • An implementation of Platt's method for computing probabilities (see the sketch after this list)
  • A clean way for the user to access the support vectors
  • Kernels and SMO for kernel-based problems
  • Benchmarks against LibLinear and LibSVM, respectively
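
Regarding the Platt item above, the rough idea is to fit a sigmoid to held-out decision values. A simplified sketch with made-up names (plain gradient descent; Platt's actual method uses a Newton-type iteration and smoothed targets):

# Fit P(y = +1 | f) = 1 / (1 + exp(A*f + B)) to decision values f and
# labels y in {-1, +1} by gradient descent on the negative log-likelihood.
function platt_scale(f::Vector{Float64}, y::Vector{Float64};
                     iters = 10_000, lr = 1e-3)
    A, B = 0.0, 0.0
    t = [yi > 0 ? 1.0 : 0.0 for yi in y]    # 0/1 targets
    for _ in 1:iters
        p = 1 ./ (1 .+ exp.(A .* f .+ B))   # current probability estimates
        g = t .- p                          # gradient w.r.t. z = A*f + B
        A -= lr * sum(g .* f)
        B -= lr * sum(g)
    end
    return A, B
end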

Can anyone think of anything important I should add to that list? If you know people who have a stake or a strong opinion on SVMs, please add them to the conversation. (Maybe @tbreloff?)

@johnmyleswhite commented:

If you're looking for work, some performance comparisons against LibSVM would be very compelling.

@Evizero (Owner) commented Sep 11, 2015

That's a good idea; added.

I did look at that code. It is quite impressive. I'm hopeful we'll get within a factor of 10.

Evizero added this to the Version 0.0.1 milestone on Sep 13, 2015
@Evizero (Owner) commented Sep 13, 2015

I added sparse matrix support to the list (at least for the linear SVM). I think that is the single most important use case for linear SVMs anyhow, so I will tackle it soon.

Does anyone have a specific suggestion for what package(s) to use for the sparse stuff, or is Base OK? The associated issues are kind of confusing regarding the current best practice.

@johnmyleswhite commented:

The only meaningful sparse support I know of is in Base and it's only for CSC format. Eventually that will be moved out to a package, but the timeline on that is very unclear.

@Evizero (Owner) commented Sep 16, 2015

Thanks. It turns out the inner structure of the SparseMatrixCSC type is perfect for dual coordinate descent, and I was able to exploit it nicely.
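
Concretely, with one observation per column, the nonzeros of a single observation are directly addressable through the colptr/rowval/nzval fields, which is exactly the access pattern a coordinate update needs. A minimal sketch (coldot is a made-up helper, not package code):

using SparseArrays

# Dot product of a dense weight vector with column i of a CSC matrix,
# touching only the stored (nonzero) entries of that column.
function coldot(w::Vector{Float64}, X::SparseMatrixCSC{Float64,Int}, i::Int)
    s = 0.0
    for k in X.colptr[i]:(X.colptr[i+1] - 1)
        s += w[X.rowval[k]] * X.nzval[k]
    end
    return s
end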

@Evizero (Owner) commented Sep 16, 2015

Well, the first informal benchmarks (simply an adaptation of the ones here) on a 2000x20000 matrix hint that, for this version of the dual CD algorithm, my implementation is at least somewhat on par with Python's scikit-learn. The algorithm I implemented is pretty much the same one LibLinear (and thus scikit-learn) uses; however, it is neither online nor does it use anything like shrinking at this point. That being said, in the tests I compare against scikit-learn and get the same accuracy to at least three decimal places on those datasets.

For this very informal benchmark I use scikit-learn's LinearSVC, which is just a LibLinear wrapper, and I set the same settings for iterations and tolerance:

python: 45.3 sec
julia - dense version: 22.5 sec
julia - sparse version: 28.5 sec

Note that the generated training data is not sparse at all but very dense (the SVMLight file is 1 GB), so I am actually surprised that the sparse implementation is competitive for this particular example.

I will implement a formal benchmark and check for significance, but that will have to wait a little until the other tasks are done.

Edit: There is a good chance, though, that these numbers are overly optimistic, because I am not yet sure if I do the convergence check in the same manner as LibLinear. For this dataset I got 0.93 accuracy and scikit-learn 0.94, which makes me suspect that my implementation simply exited earlier.

@johnmyleswhite commented:

Very impressive work, even if the numbers need further revision.

@Evizero (Owner) commented Sep 18, 2015

About half an hour ago I read a discussion on licensing and derivative work, and it got me a little paranoid.

First of all, I did look at the SVM.jl code, even though I didn't take it as a reference (I implemented the algorithm according to the paper). That being said, it's fair to say that the package did influence me. But that shouldn't be a problem, as everything should be OK if I give credit for inspiration here, which I will gladly do.

But there might be an issue. I aim to make this package available under MIT. However, I just remembered that I did browse over the LibLinear code, mainly to get a feel for how complex it is to implement something like this in C. Nonetheless, I technically looked at it for a minute or so. Since I am implementing the same algorithm, which should give the same result, I guess one could argue that I am in a bit of a grey area here from a licensing standpoint.

What am I to do now? Can I still make this package MIT-licensed, or do I have to remove dual coordinate descent and implement another algorithm, while being really careful not to stumble on any code that implements it?

Edit: I didn't realize you were the author of SVM.jl. Shows how ignorant I can be sometimes. Sorry if I stepped on your toes. I am pretty new to the open source movement and did not give the licensing issues as much attention as I probably should have.

@johnmyleswhite commented:

What is the LibLinear license?

A good rule of thumb is that you can never read GPL code unless you're writing a GPL library. Reading BSD code is harmless, although you still need to give attribution.

@Evizero (Owner) commented Sep 18, 2015

Modified BSD. So I guess I am not screwed? Man, I should have known better. I didn't even get anything out of looking at it anyway.

Thank you, that's a good thing to know. I'll simply never look at any related code again. Thankfully, I haven't encountered any frustrating implementation issues thus far; I might have ended up looking in the wrong place for answers.

@johnmyleswhite commented:

Modified BSD == 2-clause or 3-clause? Either is fine.

The licensing situation is very complicated, and we're being a little paranoid just to avoid any dangers for corporate users. But public domain, MIT, and BSD code are all fine (except for 4-clause BSD). APL is OK, although you probably have to release your code as APL if you base anything on an APL library. But that's arguably better for some corporate users anyway.

@Evizero (Owner) commented Sep 18, 2015

It's 3-clause. I really should have spent time on this issue beforehand. I didn't think simply browsing over it could cause me issues. I was planning to look at the code of a GPL-licensed package, but luckily I didn't get around to it yet (I am sure my IP traces and such can prove that). Dodged a bullet there. I'll avoid anything code-related like the plague from now on.

You don't happen to know how I can properly give attribution for looking at the BSD code? I can't seem to find anything that describes this situation.

Also (you probably missed it, since I edited my post above), I came to realize that you are the author of SVM.jl, which I did look at very closely. How can I properly give you attribution? This is my naive attempt to do so.

@johnmyleswhite commented:

Usually I do things like this: https://github.com/JuliaLang/julia/blob/e97588db65f590d473e7fbbb127f30c01ea94995/src/support/asprintf.c

Since SVM.jl is MIT, I'm not worried about attribution, but you could do the same thing as in the BSD example above.

@Evizero (Owner) commented Sep 18, 2015

Thank you, I will happily do that.

Thank you for your very quick and helpful advice; I got really paranoid and frustrated there for a second. I am just glad that I can continue this project with a clean conscience.

@johnmyleswhite commented:

Welcome to the world that the GPL has created. :)

@Evizero (Owner) commented Sep 18, 2015

So, I hope this is appropriate: link

@johnmyleswhite commented:

That's more than enough.

@Evizero (Owner) commented Sep 21, 2015

This package will soon be at a usable stage, so I have started sketching out a very rough draft of the README. Since you both provide pretty good examples in Regression.jl and SVM.jl, respectively, I have been thinking of adapting those.

Are you both okay with how I assign you credit for the examples in the README?

@johnmyleswhite commented:

Works for me

Evizero closed this as completed on Nov 16, 2015.