Minimum number of repetitions required for sumo run #20

Open
aakrosh opened this issue Jun 18, 2020 · 1 comment
Labels
sumo run issue concerns "run" mode

Comments

@aakrosh
Contributor

aakrosh commented Jun 18, 2020

sumo run fails with the following message when a small number of repetitions (-n 2 in this case) is used.

Traceback (most recent call last):
  File "sumo/env/bin/sumo", line 11, in <module>
    load_entry_point('python-sumo', 'console_scripts', 'sumo')()
  File "sumo/src/sumo/run.py", line 15, in main
    mode.run()
  File "sumo/src/sumo/modes/run/run.py", line 150, in run
    results = [_run_factorization(sparsity=sparsity, k=k, sumo_run=_sumo_run) for sparsity in self.sparsity]
  File "sumo/src/sumo/modes/run/run.py", line 150, in <listcomp>
    results = [_run_factorization(sparsity=sparsity, k=k, sumo_run=_sumo_run) for sparsity in self.sparsity]
  File "sumo/src/sumo/modes/run/run.py", line 336, in _run_factorization
    consensus_labels = extract_ncut(consensus, k=k)
  File "sumo/src/sumo/utils.py", line 195, in extract_ncut
    u, s, vh = np.linalg.svd(np.eye(a.shape[0]) - d @ a @ d)
  File "<__array_function__ internals>", line 6, in svd
  File "sumo/env/lib/python3.6/site-packages/numpy/linalg/linalg.py", line 1626, in svd
    u, s, vh = gufunc(a, signature=signature, extobj=extobj)
ValueError: On entry to DLASCL parameter number 4 had an illegal value

Is there a minimum value that should be specified for a successful run?

@sienkie
Collaborator

sienkie commented Jun 25, 2020

Version 0.2.5 introduced two new parameters ('-subsample' and '-rep') that increase the stability of results by making fuller use of consensus clustering.

The former parameter sets the fraction of samples that are randomly removed from each factorization run. When deciding which samples to remove, we explicitly make sure that every sample is clustered at least once across all runs. The latter parameter sets the number of times a random subset of the runs (80% of them) is used to create a consensus matrix.
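To make the interaction between the two parameters concrete, here is a minimal sketch of the scheme as described above. It is not sumo's actual code: `draw_masks` and `uncovered_in_subset` are hypothetical helpers, and rounding the 80% subset size down is an assumption.

```python
import numpy as np


def draw_masks(n_samples, n_runs, subsample, rng):
    """Boolean (n_runs, n_samples) array; True means the sample is kept in that run."""
    n_removed = int(round(subsample * n_samples))
    if n_removed >= n_samples or (n_removed > 0 and n_runs < 2):
        raise ValueError("cannot keep every sample in at least one run")
    while True:
        masks = np.ones((n_runs, n_samples), dtype=bool)
        for run in range(n_runs):
            removed = rng.choice(n_samples, size=n_removed, replace=False)
            masks[run, removed] = False
        # Redraw until every sample is clustered in at least one run overall.
        if masks.any(axis=0).all():
            return masks


def uncovered_in_subset(masks, rng, fraction=0.8):
    """Indices of samples that appear in none of the randomly chosen runs."""
    n_runs = masks.shape[0]
    # Assumption: the subset size is rounded down (at least one run).
    n_chosen = max(1, int(fraction * n_runs))
    chosen = rng.choice(n_runs, size=n_chosen, replace=False)
    return np.flatnonzero(~masks[chosen].any(axis=0))


rng = np.random.default_rng(0)
masks = draw_masks(n_samples=100, n_runs=2, subsample=0.05, rng=rng)
missing = uncovered_in_subset(masks, rng)
print(len(missing), "samples are missing from the chosen subset of runs")
```

Under these assumptions, two runs give a subset of a single run, so every sample removed from that run is missing from the consensus matrix. The guarantee that each sample is clustered at least once holds across all runs, but not within each 80% subset.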

The above error appears when there is a sample that was not clustered in any of the runs in a subset. This is very unlikely with the default sumo parameters, as factorization is run 60 times and only 5% of samples are removed from each run.
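The following sketch shows one plausible way an unclustered sample could lead to the failure in the traceback. It assumes that `d` in `np.eye(a.shape[0]) - d @ a @ d` is the diagonal matrix of inverse square-root degrees, which the traceback does not confirm. The toy matrix is made up for illustration, and the code uses the element-wise equivalent of `d @ a @ d`.

```python
import numpy as np

# Toy consensus matrix for 4 samples. Sample 3 was never clustered in the
# chosen subset of runs, so its row and column are all zeros.
a = np.array([
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.0],
    [0.1, 0.2, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])

degree = a.sum(axis=1)
with np.errstate(divide="ignore", invalid="ignore"):
    scale = 1.0 / np.sqrt(degree)  # inf for the zero-degree sample
    # Element-wise form of I - D^(-1/2) A D^(-1/2) for a diagonal D.
    laplacian = np.eye(a.shape[0]) - a * scale[:, None] * scale[None, :]

# A check like this before the SVD would pinpoint the offending samples.
isolated = np.flatnonzero(degree == 0)
print("samples with zero degree:", isolated)
print("all entries finite:", bool(np.isfinite(laplacian).all()))
```

The zero degree produces `inf` and `NaN` entries in the matrix passed to `np.linalg.svd`, and non-finite input is a common cause of LAPACK errors such as the DLASCL one reported here.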

For now, I recommend using a higher number of repetitions or setting the '-subsample' parameter to 0, which avoids this issue even if -n is very small. However, this will have to be addressed properly in the future.

@sienkie sienkie added the sumo run issue concerns "run" mode label Jun 30, 2020