Som estimator seed #6

TarikExner · 2024-04-15T20:43:03Z

Hey,

thank you very much for this great implementation!

When running FlowSOMEstimator.fit_predict() twice in a row with the same seed, the SOM codes, and therefore the cluster annotations, differed between runs.

Reproducible example (fresh conda environment using python=3.10):

from flowsom.models import FlowSOMEstimator
import numpy as np

# 200 samples x 10 features of random integers
arr = np.random.randint(0, 1000, 2000).reshape(200, 10)
# Two estimators with identical parameters, including the seed
fse = FlowSOMEstimator(cluster_kwargs={"xdim": 10, "ydim": 10, "seed": 1}, metacluster_kwargs={"n_clusters": 15})
fse2 = FlowSOMEstimator(cluster_kwargs={"xdim": 10, "ydim": 10, "seed": 1}, metacluster_kwargs={"n_clusters": 15})
c1 = fse.fit_predict(arr)
c2 = fse2.fit_predict(arr)
all(c1 == c2)
>>> False

The bug was caused by a missing NumPy seed for the random selection of data points in the SOM code initialization step.

This PR fixes the issue by setting the NumPy seed if SOMEstimator.seed is not None.
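A minimal sketch of the idea, not the actual flowsom code: the class `ToySOMInit` and its `n_nodes` attribute are hypothetical, and the sketch covers only the random selection of data points as initial codes.

```python
import numpy as np


class ToySOMInit:
    """Hypothetical stand-in for the initialization step (not the flowsom class)."""

    def __init__(self, n_nodes=100, seed=None):
        self.n_nodes = n_nodes
        self.seed = seed

    def initial_codes(self, X):
        # Seed NumPy's global generator only when a seed was given,
        # so that unseeded runs stay random.
        if self.seed is not None:
            np.random.seed(self.seed)
        # Randomly select rows of X as the initial SOM codes.
        idx = np.random.choice(X.shape[0], self.n_nodes, replace=False)
        return X[idx]


X = np.arange(2000, dtype=float).reshape(200, 10)
codes_a = ToySOMInit(seed=1).initial_codes(X)
codes_b = ToySOMInit(seed=1).initial_codes(X)
print(np.array_equal(codes_a, codes_b))
```

Without the `np.random.seed` call, the two selections would depend on whatever state the global generator happened to be in.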

I intentionally did not remove the call to numpy.random.seed in the SOM function, because seeds are handled differently in numba environments, as described here. In fact, removing the seed setting in the SOM function also results in the above-mentioned error, even with the proposed commits.
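For illustration, a sketch that assumes Numba's documented behavior: compiled code draws from a random state that is separate from the one NumPy uses in the interpreter, so the seed has to be set inside the compiled function. The function `seeded_draws` is hypothetical and is not the flowsom SOM function; the import fallback only exists so the sketch also runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fall back to plain Python so the sketch still runs
    def njit(func):
        return func


@njit
def seeded_draws(seed, n):
    # Seeding here, inside the compiled function, seeds the generator
    # that the compiled code actually draws from.
    np.random.seed(seed)
    out = np.empty(n)
    for i in range(n):
        out[i] = np.random.random()
    return out


first = seeded_draws(1, 5)
second = seeded_draws(1, 5)
print(np.array_equal(first, second))
```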

This commit also includes reproducibility tests for the SOMEstimator and FlowSOMEstimator classes.
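A reproducibility check of this kind could look roughly like the sketch below. It is not the test code from this PR: `assert_reproducible` and `DummyEstimator` are hypothetical names, and the dummy only stands in for an estimator that exposes `fit_predict`.

```python
import numpy as np


def assert_reproducible(make_estimator, data):
    """Fit two fresh estimators on the same data and compare their labels."""
    labels_1 = make_estimator().fit_predict(data)
    labels_2 = make_estimator().fit_predict(data)
    assert np.array_equal(labels_1, labels_2), "labels differ between runs"


class DummyEstimator:
    """Hypothetical stand-in exposing fit_predict; swap in the real estimator."""

    def __init__(self, n_clusters=15, seed=None):
        self.n_clusters = n_clusters
        self.seed = seed

    def fit_predict(self, X):
        # Random labels drawn from a generator built from the seed.
        rng = np.random.default_rng(self.seed)
        return rng.integers(0, self.n_clusters, size=X.shape[0])


data = np.random.default_rng(0).integers(0, 1000, size=(200, 10))
assert_reproducible(lambda: DummyEstimator(seed=1), data)
```

With flowsom, `make_estimator` would be a callable that returns a FlowSOMEstimator built with a fixed seed, as in the reproducible example above.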

Please let me know if there is anything missing from this commit.

Best,
Tarik

berombau · 2024-04-16T15:40:51Z

Hi Tarik,

Thank you for investigating this issue. I added your fix and test functions and merged them with some changes to how the input parameters are passed. This is an important fix, so I'll merge it to main. We're working on some parallelization improvements in the coming months, which will likely also change the Numba code and the random number generation (e.g. like in https://blog.scientific-python.org/numpy/numpy-rng/).
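As a rough sketch of that style of random number generation, assuming NumPy's `Generator` API (`np.random.default_rng`, `np.random.SeedSequence`): the helper `pick_initial_codes` is hypothetical and does not reflect planned flowsom code.

```python
import numpy as np


def pick_initial_codes(X, n_nodes, rng):
    # The generator is passed in explicitly; no global state is touched.
    idx = rng.choice(X.shape[0], size=n_nodes, replace=False)
    return X[idx]


X = np.arange(2000, dtype=float).reshape(200, 10)

# Same seed, fresh generators: identical selections.
codes_a = pick_initial_codes(X, 100, np.random.default_rng(1))
codes_b = pick_initial_codes(X, 100, np.random.default_rng(1))
print(np.array_equal(codes_a, codes_b))

# Independent, reproducible streams, e.g. one per parallel worker.
children = np.random.SeedSequence(1).spawn(4)
worker_rngs = [np.random.default_rng(child) for child in children]
```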

Best,
Benjamin

TarikExner · 2024-04-17T06:52:30Z

Thank you very much for the quick merge!

Best,
Tarik

TarikExner and others added 5 commits April 15, 2024 20:15

fixed reproducibility issue due to missing seed in the SOMEstimator class

249888f

removed unnecessary import for jit decorator

5e20f06

Merge branch 'main' into pr/6

3cb6a19

adapt to new parameters

0f3695d

Change FlowSOM input typing, add seed parameter

343f228

berombau merged commit 58ed09a into saeyslab:main Apr 16, 2024
1 of 3 checks passed

TarikExner deleted the SOMEstimator_seed branch April 17, 2024 06:52
