compute-score can not complete #46

Dan-121 · 2022-11-06T02:26:17Z

Hi, thanks for developing such a helpful tool. I have run into a problem recently: when I run compute-score, the process does not finish. Could you please help me with this? Here is the output.

Single-cell disease relevance score (scDRS)
Version 1.0.2
Martin Jinye Zhang and Kangcheng Hou
HSPH / Broad Institute / UCLA
MIT License

Call: scdrs compute-score
--h5ad-file /data4/scDRS/data/cere/expr.h5ad
--h5ad-species human
--cov-file /data4/scDRS/data/cere/cov.tsv
--gs-file /data4/scDRS/data/cere/processed_geneset.gs
--gs-species human
--ctrl-match-opt mean_var
--weight-opt vs
--adj-prop None
--flag-filter-data True
--flag-raw-count True
--n-ctrl 1000
--flag-return-ctrl-raw-score False
--flag-return-ctrl-norm-score True
--out-folder /data4/scDRS/data/cere/out
Loading data:
--h5ad-file loaded: n_cell=62247, n_gene=23202 (sys_time=7.0s)
First 3 cells: ['E083_AAACCCAAGGGCTGAT-1', 'E083_AAACCCACAGGCAATG-1', 'E083_AAACCCACAGTATACC-1']
First 5 genes: ['AL627309.1', 'AL627309.5', 'LINC01409', 'FAM87B', 'LINC01128']
--cov-file loaded: covariates=['const', 'n_genes', 'timepoint'] (sys_time=7.0s)
First 5 values for 'const': [1, 1, 1, 1, 1]
First 5 values for 'n_genes': [3861, 4883, 5453, 2459, 5002]
First 5 values for 'timepoint': ['E083', 'E083', 'E083', 'E083', 'E083']
--gs-file loaded: n_trait=3 (sys_time=7.0s)
Print info for first 3 traits:
First 3 elements for 'SCZ': ['NRGN', 'DPYD', 'RBFOX1'], [7.6558, 7.6519, 7.3247]
First 3 elements for 'CEREV': ['RNF11', 'CDKN2C', 'TRRAP'], [6.4221, 6.1533, 6.1347]
First 3 elements for 'Height': ['WWOX', 'BNC2', 'GMDS'], [10.0, 10.0, 10.0]

Preprocessing:
scdrs.pp.category2dummy: Detected categorical columns: timepoint. Added dummy columns: timepoint_E093,timepoint_E101,timepoint_E102,timepoint_E108,timepoint_E117. Dropped columns: timepoint.

Computing scDRS score:
Trait=SCZ, n_gene=898: 165/62247 FDR<0.1 cells, 469/62247 FDR<0.2 cells (sys_time=839.1s)
Trait=CEREV, n_gene=819: 0/62247 FDR<0.1 cells, 0/62247 FDR<0.2 cells (sys_time=1529.8s)

The process was still running 2 days later. Could you please help with the problem? Looking forward to your reply, thanks.

martinjzhang · 2022-11-07T16:12:22Z

Hi @dandata123-tech , it seems scDRS completed for the first two traits (SCZ & CEREV), each taking around 800 seconds. If this is true, the software should have output the .score.gz and .full_score.gz files for the first two traits. Could you confirm it? It is indeed weird that the software got stuck when processing the third trait, which should take around the same time to complete (~800s). We can look into it if you can provide a minimal reproducible example.

Dan-121 · 2022-11-08T07:15:12Z

Hi, thanks for the timely reply. I do get the .score.gz and .full_score.gz files for the first two traits, but the run gets stuck when processing the third trait. If I change the order of the traits in the .gs file, I still get output for the first two traits and the run gets stuck on the third. It works fine when I run the example data.

martinjzhang · 2022-11-08T16:23:49Z

Hi @dandata123-tech ,

I suspect that your .gs file contains illegal values (such as NA or negative values for the gene weights). Please refer to https://martinjzhang.github.io/scDRS/file_format.html#gs for an example of the .gs file.

As diagnostics, you can create 3 separate .gs files for the 3 traits to see which one gives you the error. scDRS processes each trait independently, so running scDRS on the 3 separate .gs files should not change the results.
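The two diagnostics suggested above (checking for illegal weights and splitting the .gs file by trait) can be sketched as follows. This is a minimal sketch, not part of scDRS: the helper names `parse_gs`, `find_illegal_weights`, and `split_gs` are hypothetical, and the code assumes the .gs layout is a tab-separated file with a header row followed by one row per trait, where the gene set is a comma-separated list of `GENE` or `GENE:WEIGHT` entries. Please check that assumption against the file format page linked above.

```python
import math


def _to_weight(raw):
    """Convert a weight string to float; unparseable values (e.g. 'NA') become NaN."""
    if raw == "":
        return None  # entry has no weight at all
    try:
        return float(raw)
    except ValueError:
        return math.nan


def parse_gs(text):
    """Parse .gs text into {trait: [(gene, weight_or_None), ...]} (assumed layout)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    traits = {}
    for ln in lines[1:]:  # skip the header row
        trait, geneset = ln.split("\t", 1)
        entries = []
        for item in geneset.split(","):
            gene, _, raw = item.partition(":")
            entries.append((gene.strip(), _to_weight(raw.strip())))
        traits[trait] = entries
    return traits


def find_illegal_weights(traits):
    """Return {trait: [genes]} for weights that are NaN/unparseable or negative."""
    bad = {}
    for trait, entries in traits.items():
        genes = [g for g, w in entries
                 if w is not None and (math.isnan(w) or w < 0)]
        if genes:
            bad[trait] = genes
    return bad


def split_gs(text):
    """Return {trait: single-trait .gs text}, reusing the original header row."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]
    return {ln.split("\t", 1)[0]: header + "\n" + ln + "\n" for ln in lines[1:]}
```

To use it, read the .gs file into a string, print `find_illegal_weights(parse_gs(text))`, and write each value of `split_gs(text)` to its own file before running `scdrs compute-score` once per file.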

Dan-121 · 2022-11-09T06:37:34Z

Hi, thanks for your timely reply.
I checked the .gs file and found no illegal values, and I also tried your sample .gs file.
However, I get an error when I run each trait independently. Here is the error:

Task exception was never retrieved
future: <Task finished name='Task-13' coro=<ScriptMagics.shebang.<locals>._handle_stream() done, defined at /home/user/anaconda3/envs/dictys/lib/python3.10/site-packages/IPython/core/magics/script.py:211> exception=ValueError('Separator is not found, and chunk exceed the limit')>
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 525, in readline
    line = await self.readuntil(sep)
  File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 603, in readuntil
    raise exceptions.LimitOverrunError(
asyncio.exceptions.LimitOverrunError: Separator is not found, and chunk exceed the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/dictys/lib/python3.10/site-packages/IPython/core/magics/script.py", line 213, in _handle_stream
    line = (await stream.readline()).decode("utf8")
  File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 534, in readline
    raise ValueError(e.args[0])
ValueError: Separator is not found, and chunk exceed the limit

Could you please help with the problem? Looking forward to your reply, thanks.
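For readers who hit the same message: every frame in the traceback above is in IPython's `script.py` or in asyncio's `streams.py`, and none is in an scDRS module. One possible reading, offered only as a guess, is that the error was raised while a notebook script magic was reading the command's output stream, not while scDRS was reading the .gs file. The sketch below only demonstrates the asyncio behavior itself: `StreamReader.readline` raises this `ValueError` when it receives more bytes than its buffer limit without seeing a newline. The function name `read_long_line` and the `limit` and `length` values are arbitrary choices for illustration, not values taken from this run.

```python
import asyncio


async def read_long_line(limit=64, length=1000):
    """Feed one newline-free chunk longer than `limit` and try to read a line."""
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(b"x" * length)  # no b"\n" anywhere in the data
    try:
        await reader.readline()
    except ValueError as exc:
        # readline() converts the internal LimitOverrunError into ValueError
        return str(exc)
    return None


message = asyncio.run(read_long_line())
print(message)
```

If this reading applies, running the same `scdrs compute-score` command from a plain terminal instead of a notebook cell would be one way to test it.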

martinjzhang · 2022-11-09T07:22:10Z

Hi @dandata123-tech

Thank you for following up. I am unable to identify the issue. The best way is to provide a minimal reproducible example. However, here are my guesses. The ValueError "ValueError: Separator is not found, and chunk exceed the limit" seems to indicate that scDRS couldn't parse the delimiters in your .gs file (\t or comma). Maybe it contains some non-English characters?
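The delimiter and non-English-character guess above can be checked quickly. The following is a minimal sketch, not part of scDRS: `scan_gs_text` is a hypothetical helper, and it assumes every non-empty line of a .gs file, including the header, should contain a tab between the trait name and the gene set.

```python
def scan_gs_text(text):
    """Return (line_number, problem) pairs for lines that look suspicious."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if "\t" not in line:
            problems.append((lineno, "no tab delimiter"))
        # Collect distinct characters outside the ASCII range, e.g. a
        # full-width comma (U+FF0C) typed in place of an ASCII comma.
        non_ascii = sorted({ch for ch in line if ord(ch) > 127})
        if non_ascii:
            codes = " ".join(f"U+{ord(ch):04X}" for ch in non_ascii)
            problems.append((lineno, "non-ASCII characters: " + codes))
    return problems
```

Reading the file with `open(path, encoding="utf-8").read()` and passing the result to `scan_gs_text` returns an empty list when neither problem is found.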

martinjzhang · 2022-11-11T06:26:39Z

Hi @dandata123-tech

Thank you for following up. Great that you have identified the issue.

Your procedures look about right. You can refer to this post for using MAGMA.

Dan-121 · 2022-11-13T06:22:50Z

Thank you.

Dan-121 closed this as completed Nov 13, 2022
