Som estimator seed #6

Merged
merged 5 commits into from
Apr 16, 2024
TarikExner

Hey,

thank you very much for this great implementation!

When I ran FlowSOMEstimator.fit_predict() twice with the same seed, the SOM codes, and therefore the cluster annotations, differed between the two runs.

Reproducible example (fresh conda environment using python=3.10):

from flowsom.models import FlowSOMEstimator
import numpy as np

arr = np.random.randint(0,1000,2000).reshape(200,10)
fse = FlowSOMEstimator(cluster_kwargs = {"xdim": 10, "ydim": 10, "seed": 1}, metacluster_kwargs = {"n_clusters": 15})
fse2 = FlowSOMEstimator(cluster_kwargs = {"xdim": 10, "ydim": 10, "seed": 1}, metacluster_kwargs = {"n_clusters": 15})
c1 = fse.fit_predict(arr)
c2 = fse2.fit_predict(arr)
print(all(c1 == c2))  # prints False without this fix

The bug was caused by a missing NumPy seed for the random selection of data points during initialization of the SOM codes.

This PR fixes the issue by setting the NumPy seed whenever SOMEstimator.seed is not None.
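To illustrate the idea, here is a minimal sketch, not the actual flowsom code. The helper `init_codes` and its parameters are hypothetical; it picks random rows as initial SOM codes and seeds NumPy's global generator first when a seed is given.

```python
import numpy as np


def init_codes(X, n_nodes, seed=None):
    """Hypothetical helper: choose n_nodes random rows of X as initial SOM codes."""
    if seed is not None:
        # Seed NumPy's global generator so the row selection is repeatable.
        np.random.seed(seed)
    idx = np.random.choice(X.shape[0], n_nodes, replace=False)
    return X[idx, :]


X = np.arange(2000, dtype=float).reshape(200, 10)
codes_a = init_codes(X, 100, seed=1)
codes_b = init_codes(X, 100, seed=1)
same = np.array_equal(codes_a, codes_b)  # identical selection on both calls
```

Without the `np.random.seed` call, the two selections would generally differ, which is the behaviour reported above.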

I intentionally kept the call to numpy.random.seed in the SOM function, because seeds are handled differently in numba environments, as described here. In fact, removing the seed call from the SOM function reproduces the error above, even with the proposed commits applied.
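For context, Numba keeps its own random state for jitted code, separate from NumPy's global generator, so a seed set from regular Python does not reach the compiled function. The sketch below shows the pattern of seeding inside the jitted function. The function `draw` is hypothetical, and the fallback decorator is only there so the snippet also runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs without numba (assumption: plain Python is fine here).
    def njit(func):
        return func


@njit
def draw(seed, n):
    # When compiled by Numba, this seeds Numba's own generator,
    # not NumPy's global one used by interpreted code.
    np.random.seed(seed)
    out = np.empty(n)
    for i in range(n):
        out[i] = np.random.random()
    return out


first = draw(1, 5)
second = draw(1, 5)
repeatable = np.array_equal(first, second)
```

This is why the seed has to be set in both places: once for the NumPy-side initialization and once inside the compiled SOM routine.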

This PR also includes reproducibility tests for the SOMEstimator and FlowSOMEstimator classes.
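The shape of such a test could look roughly like the sketch below. It is not the test code from this PR: `StandInEstimator` and `check_reproducible` are hypothetical, and the stand-in only returns random labels so the snippet runs without flowsom installed. In a real test the factory would build a FlowSOMEstimator with the same cluster_kwargs as in the example above.

```python
import numpy as np


class StandInEstimator:
    """Hypothetical stand-in with a seed parameter; not part of flowsom."""

    def __init__(self, n_clusters=15, seed=None):
        self.n_clusters = n_clusters
        self.seed = seed

    def fit_predict(self, X):
        if self.seed is not None:
            np.random.seed(self.seed)
        # Random labels stand in for real clustering output.
        return np.random.randint(0, self.n_clusters, X.shape[0])


def check_reproducible(make_estimator, X):
    """Fit two fresh estimators on the same data and require identical labels."""
    labels_1 = make_estimator().fit_predict(X)
    labels_2 = make_estimator().fit_predict(X)
    np.testing.assert_array_equal(labels_1, labels_2)


def test_seeded_estimator_is_reproducible():
    X = np.random.randint(0, 1000, 2000).reshape(200, 10)
    check_reproducible(lambda: StandInEstimator(seed=1), X)
```

Building a fresh estimator for each run matters here, since reusing one fitted object could hide state that leaks between calls.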

Please let me know if anything is missing from this PR.

Best,
Tarik

@berombau berombau merged commit 58ed09a into saeyslab:main Apr 16, 2024
1 of 3 checks passed
@berombau
Member

Hi Tarik,

Thank you for investigating this issue. I added your fix and test functions and merged them, with some changes to how the input parameters are passed. This is an important fix, so I'll merge it to main. We're working on some parallelization improvements over the coming months, which will likely also change the Numba code and the random number generation (e.g. like in https://blog.scientific-python.org/numpy/numpy-rng/).
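As a rough sketch of that direction, and not a commitment to any particular design, an explicit `numpy.random.Generator` can be passed around instead of seeding the global state. The helper `init_codes` is hypothetical; `default_rng` and `SeedSequence.spawn` are standard NumPy.

```python
import numpy as np


def init_codes(X, n_nodes, rng):
    """Hypothetical helper: pick n_nodes random rows of X using an explicit Generator."""
    idx = rng.choice(X.shape[0], size=n_nodes, replace=False)
    return X[idx]


X = np.arange(2000, dtype=float).reshape(200, 10)

# Two generators built from the same seed produce the same stream.
codes_a = init_codes(X, 100, np.random.default_rng(1))
codes_b = init_codes(X, 100, np.random.default_rng(1))

# Independent child streams for parallel workers, all derived from one seed.
children = np.random.SeedSequence(1).spawn(2)
worker_rngs = [np.random.default_rng(child) for child in children]
draws = [rng.random(3) for rng in worker_rngs]
```

With this pattern no global state is touched, so results stay reproducible even when other code calls `np.random` in between.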

Best,
Benjamin

@TarikExner
Author

Thank you very much for the quick merge!

Best,
Tarik

@TarikExner TarikExner deleted the SOMEstimator_seed branch April 17, 2024 06:52