compute-score can not complete #46

Closed
Dan-121 opened this issue Nov 6, 2022 · 7 comments

@Dan-121

Dan-121 commented Nov 6, 2022

Hi, thanks for developing such a helpful tool. I have run into a problem recently: when I run compute-score, the process never finishes. Could you please help me with this? Here is the command and its output.


  • Single-cell disease relevance score (scDRS)
  • Version 1.0.2
  • Martin Jinye Zhang and Kangcheng Hou
  • HSPH / Broad Institute / UCLA
  • MIT License

Call: scdrs compute-score
--h5ad-file /data4/scDRS/data/cere/expr.h5ad
--h5ad-species human
--cov-file /data4/scDRS/data/cere/cov.tsv
--gs-file /data4/scDRS/data/cere/processed_geneset.gs
--gs-species human
--ctrl-match-opt mean_var
--weight-opt vs
--adj-prop None
--flag-filter-data True
--flag-raw-count True
--n-ctrl 1000
--flag-return-ctrl-raw-score False
--flag-return-ctrl-norm-score True
--out-folder /data4/scDRS/data/cere/out
Loading data:
--h5ad-file loaded: n_cell=62247, n_gene=23202 (sys_time=7.0s)
First 3 cells: ['E083_AAACCCAAGGGCTGAT-1', 'E083_AAACCCACAGGCAATG-1', 'E083_AAACCCACAGTATACC-1']
First 5 genes: ['AL627309.1', 'AL627309.5', 'LINC01409', 'FAM87B', 'LINC01128']
--cov-file loaded: covariates=['const', 'n_genes', 'timepoint'] (sys_time=7.0s)
First 5 values for 'const': [1, 1, 1, 1, 1]
First 5 values for 'n_genes': [3861, 4883, 5453, 2459, 5002]
First 5 values for 'timepoint': ['E083', 'E083', 'E083', 'E083', 'E083']
--gs-file loaded: n_trait=3 (sys_time=7.0s)
Print info for first 3 traits:
First 3 elements for 'SCZ': ['NRGN', 'DPYD', 'RBFOX1'], [7.6558, 7.6519, 7.3247]
First 3 elements for 'CEREV': ['RNF11', 'CDKN2C', 'TRRAP'], [6.4221, 6.1533, 6.1347]
First 3 elements for 'Height': ['WWOX', 'BNC2', 'GMDS'], [10.0, 10.0, 10.0]

Preprocessing:
scdrs.pp.category2dummy: Detected categorical columns: timepoint. Added dummy columns: timepoint_E093,timepoint_E101,timepoint_E102,timepoint_E108,timepoint_E117. Dropped columns: timepoint.

Computing scDRS score:
Trait=SCZ, n_gene=898: 165/62247 FDR<0.1 cells, 469/62247 FDR<0.2 cells (sys_time=839.1s)
Trait=CEREV, n_gene=819: 0/62247 FDR<0.1 cells, 0/62247 FDR<0.2 cells (sys_time=1529.8s)


The process is still running two days later. Could you please help with the problem? Looking forward to your reply, thanks.

@martinjzhang
Owner

Hi @dandata123-tech , it seems scDRS completed for the first two traits (SCZ & CEREV), each taking around 800 seconds. If this is true, the software should have output the .score.gz and .full_score.gz files for the first two traits. Could you confirm it? It is indeed weird that the software got stuck when processing the third trait, which should take around the same time to complete (~800s). We can look into it if you can provide a minimal reproducible example.
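One way to confirm which outputs were written is sketched below. It assumes the files are named <trait>.score.gz and <trait>.full_score.gz inside the --out-folder directory; that naming is an assumption here, and `missing_outputs` is a hypothetical helper, not part of scDRS.

```python
from pathlib import Path


def missing_outputs(out_folder, traits, suffixes=(".score.gz", ".full_score.gz")):
    """Return {trait: [missing suffixes]} for the expected per-trait output files.

    Assumption: each trait produces <trait><suffix> directly inside out_folder.
    """
    out = Path(out_folder)
    return {
        trait: [s for s in suffixes if not (out / f"{trait}{s}").is_file()]
        for trait in traits
    }
```

For the run above, this could be called as `missing_outputs("/data4/scDRS/data/cere/out", ["SCZ", "CEREV", "Height"])`; an empty list for a trait means both files are present.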

@Dan-121
Author

Dan-121 commented Nov 8, 2022

Hi, thanks for the timely reply. I do get the .score.gz and .full_score.gz files for the first two traits, but the run gets stuck on the third trait. If I change the order of the traits in the .gs file, I again get output for the first two traits and it gets stuck on the third. It works fine when I run the example data.

@martinjzhang
Owner

Hi @dandata123-tech ,

I suspect that your .gs file contains illegal values (such as NA or negative values for the gene weights). Please refer to https://martinjzhang.github.io/scDRS/file_format.html#gs for an example of the .gs file.

As a diagnostic, you can create 3 separate .gs files, one per trait, to see which one gives you the error. scDRS processes each trait independently, so running scDRS on the 3 separate .gs files should not change the results.
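A minimal sketch of that diagnostic is below. It assumes the .gs file is tab-separated with a header line, a trait name in the first column, and a comma-separated GENE:weight list (or bare gene names) in the second; please check this layout against the file format page linked above. The helpers `parse_geneset`, `find_bad_weights` and `split_gs` are hypothetical names, not scDRS functions.

```python
import math
from pathlib import Path


def parse_geneset(field):
    """Parse 'GENE1:w1,GENE2:w2' (or bare 'GENE1,GENE2') into (gene, weight) pairs."""
    pairs = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        gene, sep, weight = item.partition(":")
        if not sep:
            pairs.append((gene, 1.0))  # assumption: a bare gene name means weight 1
            continue
        try:
            pairs.append((gene, float(weight)))
        except ValueError:
            pairs.append((gene, math.nan))  # unparseable weight such as 'NA'
    return pairs


def find_bad_weights(pairs):
    """Return the pairs whose weight is NaN, infinite, or negative."""
    return [(g, w) for g, w in pairs if not math.isfinite(w) or w < 0]


def split_gs(gs_path, out_dir):
    """Write one <trait>.gs file per trait and return {trait: bad (gene, weight) pairs}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    report = {}
    with open(gs_path, encoding="utf-8") as fh:
        header = fh.readline().rstrip("\r\n")
        for line in fh:
            line = line.rstrip("\r\n")
            if not line:
                continue
            trait, _, geneset = line.partition("\t")
            # Each per-trait file repeats the header followed by that trait's row.
            (out_dir / f"{trait}.gs").write_text(f"{header}\n{line}\n", encoding="utf-8")
            report[trait] = find_bad_weights(parse_geneset(geneset))
    return report
```

Each resulting file can then be passed to --gs-file in a separate run; a non-empty list in the report points at weights worth inspecting by hand.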

@Dan-121
Author

Dan-121 commented Nov 9, 2022

Hi, thanks for your timely reply.
I checked the .gs file and found no illegal values, and I also tried your sample .gs file.
However, I get an error when I run each trait independently. Here is the error:


Task exception was never retrieved
future: <Task finished name='Task-13' coro=<ScriptMagics.shebang.<locals>._handle_stream() done, defined at /home/user/anaconda3/envs/dictys/lib/python3.10/site-packages/IPython/core/magics/script.py:211> exception=ValueError('Separator is not found, and chunk exceed the limit')>
Traceback (most recent call last):
File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 525, in readline
line = await self.readuntil(sep)
File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 603, in readuntil
raise exceptions.LimitOverrunError(
asyncio.exceptions.LimitOverrunError: Separator is not found, and chunk exceed the limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/anaconda3/envs/dictys/lib/python3.10/site-packages/IPython/core/magics/script.py", line 213, in _handle_stream
line = (await stream.readline()).decode("utf8")
File "/home/user/anaconda3/envs/dictys/lib/python3.10/asyncio/streams.py", line 534, in readline
raise ValueError(e.args[0])
ValueError: Separator is not found, and chunk exceed the limit


Could you please help with the problem? Looking forward to your reply, thanks.

@martinjzhang
Owner

Hi @dandata123-tech

Thank you for following up. I am unable to identify the issue. The best way forward is to provide a minimal reproducible example. However, here is my guess: the error "ValueError: Separator is not found, and chunk exceed the limit" seems to indicate that scDRS couldn't parse the delimiters in your .gs file (\t or comma). Maybe it contains some non-English characters?
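For reference, the frames in the traceback above are in IPython's script magic (script.py) and asyncio's streams.py, and asyncio's StreamReader.readline raises this ValueError when it buffers more bytes than its limit without finding a newline. The sketch below reproduces that behaviour in isolation. It is only a reading of the traceback, not a confirmed diagnosis of this issue; the tiny 64-byte limit is chosen for illustration, and `try_readline` and `demo` are hypothetical names.

```python
import asyncio


async def try_readline(payload: bytes, limit: int):
    """Feed payload into a StreamReader with the given limit and read one line."""
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(payload)
    reader.feed_eof()
    try:
        line = await reader.readline()
    except ValueError as exc:
        # readline() converts the internal LimitOverrunError into ValueError.
        return ("error", str(exc))
    return ("ok", line)


def demo(payload: bytes, limit: int = 64):
    """Run try_readline in a fresh event loop and return its (status, value) tuple."""
    return asyncio.run(try_readline(payload, limit))
```

Here `demo(b"short line\n")` returns an "ok" tuple, while `demo(b"x" * 200)` returns an "error" tuple because 200 bytes arrive with no newline inside the 64-byte limit. If this is what happened, running the same command from a plain terminal rather than through a notebook cell would be one way to check.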

@martinjzhang
Owner

Hi @dandata123-tech

Thank you for following up. Great that you have identified the issue.

Your procedures look about right. You can refer to this post for using MAGMA.

@Dan-121
Author

Dan-121 commented Nov 13, 2022

Thank you.

@Dan-121 Dan-121 closed this as completed Nov 13, 2022