Allow choosing the random seed (from python) #176

Closed
kno10 opened this issue Jan 13, 2022 · 2 comments · Fixed by #178

kno10 commented Jan 13, 2022

The algorithm is introduced as randomized, but it appears to return the same results when run multiple times.
As far as I can tell, this is because the random generator is not seeded.
At least, I could not find an invocation of arma::arma_rng::set_seed_random.
I would prefer a parameter that lets the Python user set the seed reproducibly: an optional argument to the function call that is used to seed the RNG. If it is not set, the seed could default to the current time.
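
A minimal sketch of the requested behavior, using only the Python standard library. The names `make_rng` and `pick_initial_candidates` are hypothetical and are not part of the banditpam API; they only illustrate an optional seed argument that falls back to the current time when omitted.

```python
import random
import time


def make_rng(seed=None):
    """Return (rng, seed_used). Hypothetical helper, not banditpam API."""
    if seed is None:
        # No seed given: fall back to a time-based seed, as proposed above.
        seed = time.time_ns()
    return random.Random(seed), seed


def pick_initial_candidates(n_points, k, seed=None):
    """Stand-in for a randomized step: sample k distinct point indices."""
    rng, seed_used = make_rng(seed)
    return rng.sample(range(n_points), k), seed_used


# Same explicit seed, same draw; the seed actually used is returned for logging.
first, _ = pick_initial_candidates(100, 5, seed=42)
second, _ = pick_initial_candidates(100, 5, seed=42)
print(first == second)
```

Returning the seed that was actually used means a time-seeded run can still be repeated later by passing that value back in.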

P.S. Sorry for spamming you with so many issue tickets, but my impression is that this may suit your workflow and may help you keep track of such small TODOs.

@motiwari motiwari self-assigned this Jan 14, 2022
@motiwari
Owner

@kno10 please don't worry, I love the issues! They certainly help me stay organized, and your beta testing is extremely valuable for finding bugs. I have resolved #173 and #174 on our merge_fp1 branch. I am now looking into this issue and will ship all the fixes.

@motiwari
Owner

Hi @kno10, this should be enabled in v3.0.2; could you try pip install --no-cache-dir banditpam==3.0.2?

An important point: although BanditPAM is a randomized algorithm, we have tuned the hyperparameters so that it almost always returns exactly the same results as PAM; see Figure 1a in the full paper. So even if you run the algorithm with different seeds, you will almost always get the same result.

There is the possibility of relaxing the requirement that BanditPAM returns the same result as PAM for an improvement in runtime, but that is currently beyond the scope of this package.

Lastly, I should note that the package currently supports multithreading; to ensure exact reproducibility of results from run to run, it's necessary to set the number of threads to 1; see below. I verified with some extra debugging that this allows the runs to be exactly reproducible.

```python
import banditpam
banditpam.set_num_threads(1)  # Undocumented function
```
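
A reproducibility check of the kind described above could be sketched as follows. This is an illustration only: `runs_are_reproducible` and `toy_fit` are hypothetical names, and `toy_fit` is a standard-library stand-in for a real clustering call, not the banditpam API.

```python
import random


def runs_are_reproducible(fit, seed, repeats=3):
    """Call fit(seed) several times and report whether every result matches the first."""
    first = fit(seed)
    return all(fit(seed) == first for _ in range(repeats - 1))


def toy_fit(seed, n_points=50, k=3):
    # Hypothetical stand-in for a seeded clustering run: returns k sorted "medoid" indices.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_points), k))


print(runs_are_reproducible(toy_fit, seed=0))
```

To apply the same check to a real run, replace `toy_fit` with a function that sets the thread count to 1, seeds the algorithm, fits the data, and returns the medoids in a comparable form such as a sorted list.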
