Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trouble reproducing Stochastic FW benchmarks #100

Open
busycalibrating opened this issue Jan 20, 2023 · 12 comments
Open

Trouble reproducing Stochastic FW benchmarks #100

busycalibrating opened this issue Jan 20, 2023 · 12 comments

Comments

@busycalibrating
Copy link

Hello!

I'm trying play around with the stochastic Frank-Wolfe benchmarks you have in this example and I'm having trouble reproducing the figure by running the provided example. The expected results are as follows:

image

However when I run the script out of the box, the first thing to note is that it takes a very long time; for 10 iters, these are the timing results:

CPU times: 
     user 16min 12s
     sys: 7min 29s
     total: 23min 42s
Wall time: 12min 25s

Perhaps this is fine seeing as ~10 steps should be sufficient (this would correspond to ~10^8 gradient evaluations which would match the figure in the example), but then the problem comes with the actual convergence results; if I set max_iter=500 (vs 1e4) and let it run for the full 2h x 3 runs, I get worse results:

image

In particular, after 10^8 gradient evaluations, the relative suboptimality

Some additional info that may be useful:

  • Python 3.7.6
  • SciPy 1.7.3
  • NumPy 1.21.6
  • Latest (development) version of COPT.

If useful, we can play around with the example in a colab notebook.

Thanks for any help!

@fabianp
Copy link
Member

fabianp commented Jan 23, 2023

Thanks for the report. For the timing, having numba installed can help. In your notebook you can install it with "!pip install numpy scipy numba" . However, I can see from the example site (https://openopt.github.io/copt/auto_examples/frank_wolfe/plot_sfw_real_data.html#sphx-glr-download-auto-examples-frank-wolfe-plot-sfw-real-data-py) that it anyway took 47 min, so I don't expect numba to change the runtime too much.

For the different suboptimality I don't have an answer. If I understand correctly, the issue is that for more iterations one gets worse FW gap and suboptimality? It could also just be that the example in our website is out of date. I'll try to re-generate it just in case

Any ideas @GeoffNN ?

@busycalibrating
Copy link
Author

Thanks for the response! I'll note that I did try using numba on my local device and it was taking even longer, even after a full run through one epoch. I haven't looked into this much so this may be a versioning issue though.

@fabianp
Copy link
Member

fabianp commented Jan 24, 2023

@GeoffNN : any idea why it's so slow? Could it be related to the use of sparse matrices?

@busycalibrating
Copy link
Author

Were you able to reproduce the figures in the doc/paper by the way? Or were you getting similar results to mine?

@fabianp
Copy link
Member

fabianp commented Jan 24, 2023 via email

@busycalibrating
Copy link
Author

Sounds great, thank you for your help!

@GeoffNN
Copy link
Collaborator

GeoffNN commented Jan 24, 2023

Hi!

I'm not sure what's going on...
Can you please try the version on the branch geoff/plotsSFW? , and let me know if it changes anything?

@fabianp
Copy link
Member

fabianp commented Jan 24, 2023

out of curiosity @busycalibrating , have you tried running the example locally? It could also be that colab's free tier has caps CPU speed or other resources

@busycalibrating
Copy link
Author

Yes, I tried it both locally (both linux and an m1 mac) and on a compute cluster. I think the actual execution time seems to be okay; it looks like the script should take ~10 iters (epochs?) which translates to ~10^8 gradient evaluations, and roughly ~45m for the script which is in line with what the doc says. It's the convergence rate that's confusing me more.

@GeoffNN I'll give it a try!

@GeoffNN
Copy link
Collaborator

GeoffNN commented Jan 24, 2023

Someone had reached out to me on a different channel, and had a similar issue. I couldn't identify exactly what was going on, but it seems like the code on main runs less iterations than the code on geoff/plotsSFW. I think it may be a x-axis issue with the plotting, rather than a problem with the algorithm itself.
They managed to reproduce with the branch code.
Sorry, I never got around to identifying exactly what was going on, since diffing the files didn't expose the problem.

The speed part is more surprising, and may be a Colab issue as @fabianp is mentioning.

@busycalibrating
Copy link
Author

busycalibrating commented Jan 24, 2023

Update: it seems like the branch you pointed me to is working; speed wise it's much faster, and with numba it's hitting ~2 iter/s so I think the older version of the package works as intended.

I also ran the code up to 500 epochs (~5m with numba), and it now translates to about 10^7 steps. Furthermore, the convergence rate looks consistent with the old figures in the documentation and paper, so I think there must be a bug in the latest version somewhere.

image

@GeoffNN
Copy link
Collaborator

GeoffNN commented Jan 24, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants