
WIP Frank wolfe #67

Merged
merged 29 commits into pystruct:master from frank_wolfe on Aug 15, 2013

Conversation

amueller
Member

This is the implementation of the Block-Coordinate Frank-Wolfe (BCFW) solver by Xianghang Liu :)

  • rescale C to be consistent with the rest of pystruct
  • increase test coverage
  • use batch_psi and batch_loss_augmented_inference
  • add objective curves and timing information
  • write documentation
  • rescale objective to be consistent with the rest of pystruct
  • benchmark
  • rename estimator to BCFW (should we? that is the name used in the paper)
  • allow stopping with ctrl+c
  • implement averaging step
  • implement averaging for batch version
  • make verbosity actually control verbosity
  • Allow mini-batches for parallel inference on multi-core machines.
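
For reference, here is a minimal sketch of the single-sample block-coordinate Frank-Wolfe update with line search, roughly following Lacoste-Julien et al. (2013). The methods psi, loss and loss_augmented_inference mirror the pystruct model API; the function itself and its bookkeeping arguments are purely illustrative, not the code in this PR:

import numpy as np

def bcfw_step(model, w, w_i, ell, ell_i, x, y, n_samples, lmbda):
    # One illustrative BCFW step on sample (x, y); w_i and ell_i are the
    # per-sample block of w and its loss term, kept between passes.
    y_hat = model.loss_augmented_inference(x, y, w)  # most violating labeling
    delta_psi = model.psi(x, y) - model.psi(x, y_hat)
    # Frank-Wolfe corner for this block in the paper's scaling.
    w_s = delta_psi / (lmbda * n_samples)
    ell_s = model.loss(y, y_hat) / n_samples
    # Optimal step size by line search, clipped to [0, 1].
    diff = w_i - w_s
    gamma = (lmbda * np.dot(diff, w) - ell_i + ell_s) / (lmbda * np.dot(diff, diff) + 1e-12)
    gamma = min(max(gamma, 0.0), 1.0)
    # Update block i and propagate the change to the global iterate.
    w_i_new = (1.0 - gamma) * w_i + gamma * w_s
    ell_i_new = (1.0 - gamma) * ell_i + gamma * ell_s
    return w + (w_i_new - w_i), w_i_new, ell + (ell_i_new - ell_i), ell_i_new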

@amueller
Member Author

Ok, so I observed that the duality gap goes negative, even with exact inference. Also, there is a jump in the objective at the beginning of each epoch.
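
For anyone following along: the duality gap from the paper is g = λ(w − w_s)ᵀw − ℓ + ℓ_s, where (w_s, ℓ_s) come from one pass of loss-augmented inference over all samples; with exact inference it should be non-negative, so a negative value points at a scaling or bookkeeping bug. A hedged sketch (illustrative helper, not the PR's code):

import numpy as np

def duality_gap(model, X, Y, w, ell, lmbda):
    # Illustrative: one pass of (exact) loss-augmented inference over all
    # samples gives the Frank-Wolfe corner (w_s, ell_s); the gap should be >= 0.
    n_samples = len(X)
    w_s = np.zeros_like(w)
    ell_s = 0.0
    for x, y in zip(X, Y):
        y_hat = model.loss_augmented_inference(x, y, w)
        w_s += (model.psi(x, y) - model.psi(x, y_hat)) / (lmbda * n_samples)
        ell_s += model.loss(y, y_hat) / n_samples
    return lmbda * np.dot(w - w_s, w) - ell + ell_s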

@amueller
Member Author

I disabled the stopping criterion and just tried without line search as that seemed to be simpler.

@amueller
Member Author

Ok, the jump in the objective is probably caused by not using the line search and by the online updates. Will try the batch version ^^
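
For context, the paper's predetermined step sizes when no line search is done are gamma = 2 / (k + 2) for the batch version and gamma = 2n / (k + 2n) for the block-coordinate version; a tiny hedged sketch (hypothetical helper):

def predefined_step_size(k, n_samples, block_coordinate=True):
    # Hedged sketch of the paper's default step sizes without line search;
    # k counts the updates done so far.
    if block_coordinate:
        return 2.0 * n_samples / (k + 2.0 * n_samples)
    return 2.0 / (k + 2.0)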

@amueller
Member Author

So with line search, the batch version, and my change, the dual is increasing monotonically... that is a good sign ^^

@amueller
Member Author

The objective is converging to a different value than the other methods, though. And it is not off by a factor of n_samples :-/

@amueller
Member Author

Ok, so the meaning of C is different and the objective is scaled differently. I'll try to fix that. But other than that, I think it looks very good!
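
For the record, the usual way to reconcile the two formulations: the paper minimizes λ/2 ||w||² + (1/n) Σᵢ Hᵢ(w), while pystruct's other learners use 1/2 ||w||² + C Σᵢ Hᵢ(w); the two agree up to a constant factor of C·n when λ = 1/(C·n). A hedged sketch of that mapping (an assumption about the intended rescaling, not necessarily what ends up in the PR):

def lambda_from_C(C, n_samples):
    # Assumed mapping: lambda = 1 / (C * n) makes the paper's objective a
    # rescaled version of pystruct's C-weighted objective.
    return 1.0 / (C * n_samples)

def rescale_objective(paper_objective, C, n_samples):
    # Multiply by C * n to compare against e.g. the OneSlackSSVM objective curve.
    return paper_objective * C * n_samples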

@amueller
Member Author

It also needs to use "batch_loss_augmented_inference" and "batch_psi" in the batch version.
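
Roughly, the per-sample loop in the batch pass would be replaced by the vectorized model methods, which individual models can override with faster (e.g. parallel) implementations; a hedged sketch (the helper name is illustrative):

import numpy as np

def batch_oracle(model, X, Y, w):
    # Illustrative: compute the summed psi difference and total loss needed for
    # one batch Frank-Wolfe step using the batch_* model methods.
    Y_hat = model.batch_loss_augmented_inference(X, Y, w)
    delta_psi = model.batch_psi(X, Y) - model.batch_psi(X, Y_hat)
    total_loss = np.sum(model.batch_loss(Y, Y_hat))
    return delta_psi, total_loss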

@amueller
Member Author

Something is still odd. The objective on the OCR data is around 230 for BCFW but 250 for the one-slack cutting plane (using AD3, which should be exact as this is a chain).
Does anyone have an idea why that happens?

@amueller
Member Author

ping @xianghang :)

            self._frank_wolfe_batch(X, Y)
        else:
            self._frank_wolfe_bc(X, Y)
    except KeyboardInterrupt:
Contributor

Is it fine if the try/except is outside of the inner loop?

Member Author

I think so, yes. It is the same for SubgradientSSVM. What do you think could go wrong?

Contributor

Just checking. If it can be done at such a high level, can't it be made into a @decorator or maybe put in a base class? Anything that could go wrong?


In pystruct/learners/frankwolfe_ssvm.py:

                    print("p = %d, dual: %f, dual_gap: %f, primal: %f, positive slack: %d"
                          % (p, dual_val, dual_gap, primal_val, n_pos_slack))
                if dual_gap < self.tol:
                    return

def fit(self, X, Y):
    self.model.initialize(X, Y)
    self.objective_curve_, self.primal_objective_curve_ = [], []
    self.timestamps_ = [time()]
    self.w = getattr(self, "w", np.zeros(self.model.size_psi))
    try:
        if self.batch_mode:
            self._frank_wolfe_batch(X, Y)
        else:
            self._frank_wolfe_bc(X, Y)
    except KeyboardInterrupt:

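For what it's worth, the decorator variant would look roughly like this (a hypothetical helper, not something that exists in pystruct):

import functools

def catch_keyboard_interrupt(fit_method):
    # Hypothetical decorator: let Ctrl+C stop training early while keeping the
    # partially trained estimator state instead of raising.
    @functools.wraps(fit_method)
    def wrapper(self, *args, **kwargs):
        try:
            return fit_method(self, *args, **kwargs)
        except KeyboardInterrupt:
            return self
    return wrapper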

Member Author

Some classes have a finalization step in fit. I think currently it is mostly computing the objective once again. I didn't do that here, as it probably didn't move much since the last call to the duality-gap computation. In the OneSlackSSVM, on the other hand, the last inference might be from the cache and won't tell you much about the actual objective. It would probably be worthwhile to put the innermost loop in a _fit function and wrap the try around that, just to make the code easier to read.
As for decorators: I am not sure that having a single decorator instead of three lines of code is better, as the code is pretty clear and the decorator would not be very obvious to someone not familiar with the codebase.
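
The structure suggested above would look roughly like the following skeleton (a hypothetical class with elided method bodies, not the PR's actual FrankWolfeSSVM):

import numpy as np

class _FrankWolfeLikeLearner:
    def __init__(self, model, batch_mode=False):
        self.model = model
        self.batch_mode = batch_mode

    def fit(self, X, Y):
        # Setup plus the three-line Ctrl+C wrapper; the actual optimization
        # loop lives in _fit, which keeps fit short and easy to read.
        self.model.initialize(X, Y)
        self.w = getattr(self, "w", np.zeros(self.model.size_psi))
        try:
            self._fit(X, Y)
        except KeyboardInterrupt:
            pass  # stop early, keep the current w
        return self

    def _fit(self, X, Y):
        if self.batch_mode:
            self._frank_wolfe_batch(X, Y)
        else:
            self._frank_wolfe_bc(X, Y)

    def _frank_wolfe_batch(self, X, Y):
        pass  # batch Frank-Wolfe iterations (elided)

    def _frank_wolfe_bc(self, X, Y):
        pass  # block-coordinate Frank-Wolfe iterations (elided)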

@amueller
Member Author

So, for the name... should we leave it as Frank-Wolfe? It also implements the batch version, so BCFW would be a bit misleading.
We could also call it "LJJSP" (after the authors), but that is really hard to remember and spell ^^. So maybe FrankWolfeSSVM is fine?

@amueller amueller merged commit aafcfbc into pystruct:master Aug 15, 2013
@amueller
Member Author

Ok, I merged this guy :)

@amueller amueller deleted the frank_wolfe branch August 15, 2013 14:20