
Could you release the notebook to reproduce figure 10 of the paper? #8

Closed
Thomaswbt opened this issue Mar 6, 2023 · 17 comments

@Thomaswbt

Hi Samuel!

Thank you for your repo and great work! The paper reports that single-objective LaMBO can greatly outperform LSBO on the penalized logP task. I found your released notebooks reproducing figures 1 and 3 of the paper very helpful, and it would also be of great help if you could release the notebook and the wandb logs needed to reproduce figure 10 in the appendix.

[screenshot: 2023-03-06 1:33 PM]

Thanks a lot in advance!

@Thomaswbt
Author

The reason I raised this issue is that I tried to train a single-objective LaMBO model with the exact command from the README:
python scripts/black_box_opt.py optimizer=lambo optimizer.encoder_obj=lanmt task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei encoder=lanmt_cnn

but this is the wandb log I get for the penalized logP metric:
[screenshot: 2023-03-07 3:41 PM]

The black-box evaluations totaled 64 × 50 = 3,200, but the best score was just above 6, which differs from the results in figure 10, so I wonder whether I missed some extra processing steps needed to reproduce the results. Thanks!

@samuelstanton
Owner

Sorry for the delayed response, would you mind sharing the link to the wandb data for your run?

@Thomaswbt
Author

Sure! This is the link to this particular run:
https://wandb.ai/thomaswang/lambo_replicate/runs/3shmruby?workspace=user-thomaswang

@samuelstanton
Owner

samuelstanton commented Mar 7, 2023

Thanks for sharing. There are three things that account for the discrepancy here:

  1. For consistency with PyMOO I followed the convention in the code that all objectives are minimized, so you need to account for the sign difference for maximized properties like penalized logP.

  2. The candidates/obj_val_* field shows the best objective value within each query batch over time, so to produce a plot like the one in the paper you need to apply a cummin transform to get the best-so-far value as a function of time.

  3. I recall there being pretty substantial variance in performance across seeds (which is why I plotted quantiles), so you'd want to run at least 5 trials, apply the cummin transform, and compute the quantiles to reproduce the plot in the paper.
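The three steps above can be sketched in NumPy; the numbers here are made up for illustration, not real run data:

```python
import numpy as np

# Hypothetical per-batch best values of candidates/obj_val_0
# (negative penalized logP, since all objectives are minimized);
# one row per seed, one column per query batch.
obj_val_0 = np.array([
    [-3.1, -4.2, -3.8, -5.0, -4.9],
    [-2.9, -3.0, -4.5, -4.4, -5.3],
    [-3.5, -3.4, -4.0, -4.8, -5.1],
])

# Steps 1 + 2: best-so-far via cummin, then flip the sign
# to recover penalized logP (which is maximized).
best_so_far = -np.minimum.accumulate(obj_val_0, axis=1)

# Step 3: quantiles across seeds at each optimization step.
q40, q60, q80 = np.quantile(best_so_far, [0.4, 0.6, 0.8], axis=0)
```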

I will see what I can do about getting you a notebook to reproduce this, but it may have to wait a bit while I deal with other work on my plate. In any case, I'm delighted you're taking the time to reproduce these experiments. If you have any further questions, don't hesitate to ask :)

@Thomaswbt
Author

Thank you for your suggestions! I will fix these differences first.

@samuelstanton
Owner

samuelstanton commented Mar 8, 2023

Sounds good. Note that the seed is fixed in the config, so you'll want to be sure to override it, e.g.

python scripts/black_box_opt.py -m optimizer=lambo optimizer.encoder_obj=lanmt task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei encoder=lanmt_cnn seed=1,2,3,4

@samuelstanton
Owner

samuelstanton commented Mar 8, 2023

One more thing: obj_val_0 is actually the negative penalized logP, so you'll want to either apply cummin and then negate, or negate and then apply cummax. I've edited my previous response to reflect this.

https://github.com/samuelstanton/lambo/blob/main/lambo/tasks/chem/chem.py#L105
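A quick sanity check (with made-up numbers) that the two orderings give the same curve:

```python
import numpy as np

x = np.array([-3.1, -4.2, -3.8, -5.0])  # hypothetical obj_val_0 values

a = -np.minimum.accumulate(x)  # cummin, then negate
b = np.maximum.accumulate(-x)  # negate, then cummax
assert np.array_equal(a, b)
```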

@Thomaswbt
Author

Sure, thanks for the reminder! The experiments are still running. I also wonder whether it's reasonable that the single-objective experiments take 1 day 12 hours to finish, while the multi-objective experiments take just 5 hours. Intuitively, shouldn't the single-objective runs be faster than the multi-objective ones?

@samuelstanton
Owner

samuelstanton commented Mar 10, 2023

Fair question. The single-objective experiment collects bigger batches of data over more rounds than the multi-objective experiments, so exact GP inference would require a lot of GPU memory and would likely be numerically unstable. Instead, for this task I use a variational GP, which has a constant memory footprint and is more numerically stable on large datasets. Unfortunately, variational GPs are fairly slow to train, which leads to the dramatic increase in runtime. There is probably room for optimization here; the current training recipe is tuned more for stability than speed.
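A back-of-the-envelope sketch of the memory argument: the 3,200-point dataset size comes from the run discussed above, while the inducing-point count of 64 is an illustrative assumption, not the repo's actual setting.

```python
def gram_matrix_bytes(n, itemsize=8):
    """Memory for an n x n float64 kernel (Gram) matrix."""
    return n * n * itemsize

# Exact GP inference: Gram matrix over every evaluated point,
# so memory grows quadratically with the dataset.
exact_gp = gram_matrix_bytes(3200)

# Variational (sparse) GP: Gram matrix over a fixed set of
# inducing points, independent of how much data is collected.
svgp = gram_matrix_bytes(64)

print(exact_gp / 1e6, "MB vs", svgp / 1e3, "KB")  # → 81.92 MB vs 32.768 KB
```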

@Thomaswbt
Author

Thanks for the reply! It makes sense now.

However, when I re-ran the experiments with seeds 1, 2, 3, and 4, I found that the optimization performance is still below expectation. The wandb logs are the runs with IDs 12, 13, 14, and 15 in the project:

https://wandb.ai/thomaswang/lambo_replicate/groups/test/table?workspace=user-thomaswang

I did not apply the cummin operation to the logged outputs, but we can see that the minimum values of obj_val_0 are around -7 in all runs, i.e., around 7 for penalized logP.

I wonder whether there is a problem with the default configuration for this setting. Would it be possible for you to double-check the configuration? On my side, I will also double-check whether something is wrong with my reproduction.

Thanks very much!

@samuelstanton
Owner

Hm, OK, I'll take a look. Thanks for raising the issue.

@jasonkyuyim

Hi! I am also interested in the single-objective use case for LaMBO. Is there any update on reproducing the published numbers?

@kirjner

kirjner commented Mar 27, 2023

@samuelstanton I'm also having some trouble reproducing the results. I ran the following command:
python scripts/black_box_opt.py -m optimizer=lambo optimizer.encoder_obj=lanmt task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei encoder=lanmt_cnn seed=1,2,3,4
and, while the script is still running, I'm getting results similar to @Thomaswbt's above (in fact, slightly worse).
[screenshot: 2023-03-27 10:26 AM]

It would be great to get an update on this, thank you!

@samuelstanton
Owner

Thank you all for your patience. I've determined that some of the default hyperparameters were indeed misconfigured, and I have updated the command in the README. That said, the results I'm getting now are not quite what I expect, and I will continue to investigate. Here's what I'm getting now:

40%, 60%, and 80% quantiles across 5 seeds (0-4)
[image]

Performance by seed
[image]

While this is much better than the results you were seeing, and the algorithm does "solve" the problem for 3/5 seeds (i.e., it learns to output long hydrocarbon chains), this is not as good as what I was seeing before and is more sensitive to the random seed than I'd like. In any case, I wanted to share an update while I continue looking into this. I've also pushed the notebook I used to create these plots to notebooks/plot_lsbo_comparison.ipynb.

@samuelstanton
Owner

The major hypers that have been corrected are:

  • optimizer.window_size=1 --> optimizer.window_size=8. This hyperparameter controls how many corruptions are made to the seed sequence and can have a major effect when the optimal solution requires large increases in sequence length.
  • surrogate.bs=32 --> surrogate.bs=256. With a larger dataset, increasing the batch size decreases the runtime significantly; I was seeing about 6 hours per seed on an A100 after this change.
  • optimizer.resampling_weight=1.0 --> optimizer.resampling_weight=0.5. This change makes the optimizer sample "good" seeds more aggressively when constructing batches of candidates.
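Combining these corrected overrides with the multirun command from earlier in the thread, the invocation would look something like this (an unverified sketch; check the updated README for the authoritative command):

```shell
python scripts/black_box_opt.py -m optimizer=lambo optimizer.encoder_obj=lanmt \
    task=chem_lsbo tokenizer=selfies surrogate=single_task_svgp acquisition=ei \
    encoder=lanmt_cnn optimizer.window_size=8 surrogate.bs=256 \
    optimizer.resampling_weight=0.5 seed=0,1,2,3,4
```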

@samuelstanton
Owner

samuelstanton commented May 1, 2023

Increasing the max context length to 256 (task.max_len=256) improves performance on this benchmark, as I noted in the paper, but variance across seeds is still an issue.

[image]

@Thomaswbt
Author

Sorry for the late response, and thank you for your effort! Previously, I also found that the choice of starting sequences matters a lot for the final results. I will close this issue.
