WIP Frank-Wolfe #67
Conversation
Ok, so I observed that the duality gap goes negative, even with exact inference. Also, there is a jump in the objective at the beginning of each epoch.
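For context on why a negative gap is suspicious: with exact loss-augmented inference, the Frank-Wolfe linearization duality gap is provably non-negative, so a negative value points at approximate inference or a scaling bug. A minimal sketch of the full-dataset gap; the names (`w_s`, `l_s` for the loss-augmented corner, `lam` for the regularizer) are my own, not pystruct's API:

```python
import numpy as np

def fw_duality_gap(w, w_s, l, l_s, lam):
    """Frank-Wolfe linearization gap for the structured SVM dual.

    w    -- current weight vector
    w_s  -- weights of the corner found by loss-augmented decoding
    l    -- current (averaged) loss term, l_s the corner's loss term
    lam  -- regularization strength

    g(w) = lam * w . (w - w_s) - l + l_s; non-negative with exact inference.
    """
    return lam * np.dot(w, w - w_s) - l + l_s
```

If exact inference still produces a negative value here, the usual culprits are inconsistent scaling between the loss and regularization terms.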
I disabled the stopping criterion and just tried without line search, as that seemed simpler.
Ok, the jump in the objective is probably caused by not using the line search and by the online updates. Will try the batch version ^^
So with line search, batch mode, and my change, the dual is increasing monotonically... that is a good sign ^^
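For reference, block-coordinate Frank-Wolfe admits a closed-form optimal step size, which is what makes the line search cheap. A sketch under my own naming conventions (not pystruct's internals): `w_i` is the block's current contribution, `w_s`/`l_s` the loss-augmented corner for that block, `lam` the regularizer.

```python
import numpy as np

def bcfw_line_search(w, w_i, w_s, l_i, l_s, lam, eps=1e-12):
    # Closed-form optimal step for a block update, clipped to [0, 1].
    # Numerator is the block duality gap, denominator the curvature term.
    gap_i = lam * np.dot(w, w_i - w_s) - l_i + l_s
    denom = lam * np.dot(w_i - w_s, w_i - w_s)
    if denom < eps:
        return 0.0  # corner coincides with the current block; no move
    return float(np.clip(gap_i / denom, 0.0, 1.0))
```

Because each step is the exact minimizer along the direction toward the corner, the dual objective cannot decrease, which matches the monotone behavior observed above.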
The objective is converging to a different value than the other methods, though. And it is not a factor of
Ok, so the meaning of C is different and the objective is scaled differently. I'll try to fix that. But other than that, I think it looks very good!
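One plausible mapping for the scaling mismatch (my assumption, not confirmed in this thread): if one formulation minimizes `||w||^2 / 2 + C * sum_i H_i(w)` and the BCFW-style formulation minimizes `(lam / 2) * ||w||^2 + (1 / n) * sum_i H_i(w)`, the two differ only by an overall factor of `C * n` when `lam = 1 / (C * n)`.

```python
def lambda_from_C(C, n_samples):
    # ||w||^2/2 + C * sum_i H_i(w)            (C-style objective)
    # lam/2 * ||w||^2 + (1/n) * sum_i H_i(w)  (BCFW-style objective)
    # Dividing the first by C*n matches the second with lam = 1/(C*n),
    # so both have the same minimizer.
    return 1.0 / (C * n_samples)
```

Under this mapping, reported objective values also differ by the constant factor `C * n`, which would explain converging to "a different value" even with a correct implementation.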
It also needs to use batch_loss_augmented_inference and batch_psi in the batch version...
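For readers unfamiliar with those helpers: a minimal sketch of what generic fallbacks for the `batch_*` methods could look like. The real pystruct implementations may differ (and models can override them with vectorized versions); the dummy model interface here is illustrative only.

```python
import numpy as np

def batch_psi(model, X, Y):
    # Fallback: sum the per-example joint feature vectors.
    return np.sum([model.psi(x, y) for x, y in zip(X, Y)], axis=0)

def batch_loss_augmented_inference(model, X, Y, w):
    # Fallback: run loss-augmented decoding example by example.
    return [model.loss_augmented_inference(x, y, w) for x, y in zip(X, Y)]
```

The point of the batch variants is that a model can replace the Python loop with one vectorized call, which is where the batch-mode speedup comes from.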
… Works now that C is scaled correctly :)
Something is still odd. The objective on the OCR data is around 230 for BCFW but 250 for the one-slack cutting plane (using AD3, which should be exact, as this is a chain).
ping @xianghang :)
```python
            self._frank_wolfe_batch(X, Y)
        else:
            self._frank_wolfe_bc(X, Y)
    except KeyboardInterrupt:
```
Is it fine if the try/except is outside of the inner loop?
I think so, yes. It is the same for SubgradientSSVM. What do you think could go wrong?
Just checking. If it can be done at such a high level, can't it be made into a @decorator, or maybe put in a base class? Anything that could go wrong?
On Wed, Aug 14, 2013 at 11:09 AM, Andreas Mueller notifications@github.com wrote:
In pystruct/learners/frankwolfe_ssvm.py:
```python
        print("p = %d, dual: %f, dual_gap: %f, primal: %f, positive slack: %d"
              % (p, dual_val, dual_gap, primal_val, n_pos_slack))
        if dual_gap < self.tol:
            return

    def fit(self, X, Y):
        self.model.initialize(X, Y)
        self.objective_curve_, self.primal_objective_curve_ = [], []
        self.timestamps_ = [time()]
        self.w = getattr(self, "w", np.zeros(self.model.size_psi))
        try:
            if self.batch_mode:
                self._frank_wolfe_batch(X, Y)
            else:
                self._frank_wolfe_bc(X, Y)
        except KeyboardInterrupt:
```
Some classes have a finalization step in fit. I think currently it is mostly computing the objective once again. I didn't do that here, as it probably hasn't moved much since the last duality-gap computation. In the OneSlackSSVM, on the other hand, the last inference might come from the cache and won't tell you much about the actual objective. It would probably be worthwhile to put the innermost loop in a _fit function and wrap the try around that, just to make the code easier to read.
For decorators: I am not sure having a single decorator instead of three lines of code is better, as the code is pretty clear, and the decorator would not be obvious to someone unfamiliar with the codebase.
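For the sake of the discussion, the decorator variant being weighed here could look like the sketch below. The decorator name and the convention of returning the partially trained estimator are hypothetical, not part of pystruct.

```python
import functools

def interruptible(fit_method):
    # Hypothetical decorator: lets Ctrl-C stop training gracefully while
    # keeping whatever state fit() has accumulated so far.
    @functools.wraps(fit_method)
    def wrapper(self, *args, **kwargs):
        try:
            return fit_method(self, *args, **kwargs)
        except KeyboardInterrupt:
            return self  # keep the partially trained estimator
    return wrapper
```

As noted above, this saves only a few lines over an explicit try/except inside fit, at the cost of hiding the control flow behind a name a newcomer has to look up.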
…relative to the iteration, not the update (otherwise the overhead is hard to judge).
So, for the name... should we leave it as Frank-Wolfe? It also implements the batch version, so BCFW is a bit misleading.
…multi_class example in batch mode much faster....
…hmarked. I think we could improve it a bit more but want to merge it now.
Ok, I merged this guy :)
This is the implementation of the Block-Coordinate Frank-Wolfe by Xianghang Liu :)
batch_psi and batch_loss_augmented_inference