sce lib problem #1

Closed
mbahin opened this issue Sep 23, 2020 · 9 comments


mbahin commented Sep 23, 2020

Hi,

I would like to explore the binary tier matrix from alevin results, so I wanted to follow the procedure here, but when I try to import parser I get the following message:
ModuleNotFoundError: No module named 'sce.sce'

I can't find any info on this sce lib.

By the way, I have a question that you might be able to answer.
When I run alevin, I get a number of cells, some of which have low confidence. Is it possible to get the list of these low-confidence barcodes? (I was actually hoping to find that information in the tier matrix, but I'm not sure it is there.)

Cheers,
Mathieu

Owner

k3yavi commented Sep 23, 2020

Hi @mbahin ,

May I ask how you installed vpolo? If you are installing it through conda, try installing it in a new environment with Python 3.8.

Re: low-confidence barcodes, alevin should generate two files, quants_mat_rows.txt and whitelist.txt. The barcodes that appear in the first file but not in the second are the low-confidence barcodes.
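The comparison described above can be sketched in a few lines of Python. This is only an illustration: it assumes both files hold one barcode per line, and the function names and the example paths in the comments are hypothetical, not part of alevin or vpolo.

```python
from pathlib import Path


def read_barcodes(path):
    """Return the non-empty, stripped lines of a one-barcode-per-line file."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def low_confidence_barcodes(rows_path, whitelist_path):
    """Barcodes present in rows_path but absent from whitelist_path.

    The order of rows_path is preserved so the result lines up with
    the rows of the quantification matrix.
    """
    whitelist = set(read_barcodes(whitelist_path))
    return [bc for bc in read_barcodes(rows_path) if bc not in whitelist]


# Example call (paths are assumptions about where alevin wrote its output):
# low = low_confidence_barcodes("alevin/quants_mat_rows.txt",
#                               "alevin/whitelist.txt")
# print(len(low))
```

A set is used for the whitelist so each membership check is constant time, which matters little for a few hundred barcodes but keeps the sketch usable on larger runs.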

Hope it helps!

Author

mbahin commented Sep 23, 2020

Thanks @k3yavi for the quick answer.

I've installed it through conda and in a Python 3.6 environment, so I can try with a 3.8.

Installing vpolo was actually only a way to get this split between low- and high-confidence barcodes.
What I don't understand then is that, on the standard output, alevin wrote that I had 334 barcodes (which I find in quants_mat_rows.txt) with 201 low confidence ones. But in whitelist.txt, I only find 102 barcodes.

Cheers,
Mathieu

Owner

k3yavi commented Sep 23, 2020

Hi @mbahin ,

Can you share the log? Basically, Alevin performs whitelisting at multiple levels. The log on standard output is for the first, rough whitelisting, which uses knee-based thresholding on the CDF of the cellular barcode (CB) frequencies. The whitelist file is written after another level of whitelisting, which can further filter or swap the CBs.
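To make "knee-based thresholding on the CDF" more concrete, here is a generic sketch of the idea. It is not Alevin's implementation: the function name `knee_count` and the particular heuristic (pick the point where the cumulative curve rises furthest above the diagonal) are assumptions chosen for illustration only.

```python
def knee_count(frequencies):
    """Number of top barcodes kept by a simple knee heuristic.

    Barcodes are sorted by descending read count, the cumulative
    fraction of reads is computed, and the knee is taken as the rank
    where that curve is furthest above the straight line that a
    uniform distribution of reads would give.
    """
    counts = sorted(frequencies, reverse=True)
    total = sum(counts)
    n = len(counts)
    if n == 0 or total == 0:
        raise ValueError("need at least one barcode with a non-zero count")

    best_rank, best_gap = 1, float("-inf")
    running = 0
    for rank, count in enumerate(counts, start=1):
        running += count
        gap = running / total - rank / n  # height above the diagonal
        if gap > best_gap:
            best_rank, best_gap = rank, gap
    return best_rank


# Five abundant barcodes followed by a long tail of rare ones.
example = [1000] * 5 + [1] * 95
kept = knee_count(example)
```

In this toy input the five abundant barcodes carry almost all reads, so the heuristic keeps exactly those five. Real data has a smoother curve, which is presumably why a second correction step follows in the log.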

Author

mbahin commented Sep 23, 2020

Here is the log:

[2020-09-22 15:24:18.089] [alevinLog] [info] Found 179839 transcripts(+194 decoys, +44 short and +0 duplicate names in the index)
[2020-09-22 15:24:18.259] [alevinLog] [info] Filled with 179883 txp to gene entries
[2020-09-22 15:24:18.286] [alevinLog] [info] Found all transcripts to gene mappings
[2020-09-22 15:24:18.313] [alevinLog] [info] Processing barcodes files (if Present)

[2020-09-22 15:25:35.437] [alevinLog] [info] Done barcode density calculation.
[2020-09-22 15:25:35.437] [alevinLog] [info] # Barcodes Used: 41789122 / 41789122.
[2020-09-22 15:25:40.841] [alevinLog] [info] Knee found left boundary at 1115
[2020-09-22 15:25:40.920] [alevinLog] [info] Gauss Corrected Boundary at 133
[2020-09-22 15:25:40.920] [alevinLog] [info] Learned InvCov: 125.325 normfactor: 70.3073
[2020-09-22 15:25:40.920] [alevinLog] [info] Total 334(has 201 low confidence) barcodes
[2020-09-22 15:25:41.116] [alevinLog] [info] Done True Barcode Sampling
[2020-09-22 15:25:41.182] [alevinLog] [info] Total 12.3872% reads will be thrown away because of noisy Cellular barcodes.
[2020-09-22 15:25:41.202] [alevinLog] [info] Done populating Z matrix
[2020-09-22 15:25:41.205] [alevinLog] [info] Total 6780 CB got sequence corrected
[2020-09-22 15:25:41.206] [alevinLog] [info] Done indexing Barcodes
[2020-09-22 15:25:41.206] [alevinLog] [info] Total Unique barcodes found: 254108
[2020-09-22 15:25:41.206] [alevinLog] [info] Used Barcodes except Whitelist: 6526
[2020-09-22 15:25:41.231] [alevinLog] [info] Done with Barcode Processing; Moving to Quantify

[2020-09-22 15:25:41.231] [alevinLog] [info] parsing read library format
[2020-09-22 15:31:41.621] [alevinLog] [info] Starting optimizer

[2020-09-22 15:31:42.366] [alevinLog] [warning] mrna file not provided; using is 1 less feature for whitelisting
[2020-09-22 15:31:42.366] [alevinLog] [warning] rrna file not provided; using is 1 less feature for whitelisting
[2020-09-22 15:31:44.735] [alevinLog] [info] Total 429829.00 UMI after deduplicating.
[2020-09-22 15:31:44.735] [alevinLog] [info] Total 2255592 BiDirected Edges.
[2020-09-22 15:31:44.735] [alevinLog] [info] Total 509951 UniDirected Edges.
[2020-09-22 15:31:44.749] [alevinLog] [info] Clearing EqMap; Might take some time.
[2020-09-22 15:31:44.870] [alevinLog] [info] Starting white listing of 333 cells
[2020-09-22 15:31:44.870] [alevinLog] [info] Starting to make feature Matrix
[2020-09-22 15:31:44.871] [alevinLog] [info] Done making feature Matrix
[2020-09-22 15:31:44.934] [alevinLog] [info] Finished white listing
[2020-09-22 15:31:44.954] [alevinLog] [info] Starting dumping cell v gene counts in mtx format
[2020-09-22 15:31:45.288] [alevinLog] [info] Finished dumping counts into mtx
[2020-09-22 15:31:45.290] [alevinLog] [info] Finished optimizer

OK, it makes sense if there are more steps before the final whitelisting.

Can you provide a short explanation of the knee and the Gauss corrected boundary, please (or point to a link where I can read more; I couldn't find one)? There is no plot to look at for that, right?
From what I understand, the knee selected 1115 barcodes but the Gauss correction kept only 133 (and at some point 201 were of bad quality).
Do the 12% of reads thrown away correspond to all the filtered-out barcodes (keeping only the 334 barcodes)?
254108 barcodes were found, 6780 were sequence corrected and, among them, 6526 were corrected and matched a whitelist barcode?

Cheers,
Mathieu
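One relation among these numbers can be checked directly from the log: 334 total minus 201 low-confidence barcodes leaves 133, the same value as the Gauss corrected boundary. The sketch below extracts the figures with regular expressions and does that arithmetic. The helper name `barcode_summary` is hypothetical, and reading the difference as the "high-confidence" count is an interpretation of the log, not something confirmed in this thread.

```python
import re

# Two lines copied verbatim from the log above.
LOG_TEXT = """\
[2020-09-22 15:25:40.920] [alevinLog] [info] Gauss Corrected Boundary at 133
[2020-09-22 15:25:40.920] [alevinLog] [info] Total 334(has 201 low confidence) barcodes
"""


def barcode_summary(log_text):
    """Pull the barcode counts out of an alevin log excerpt."""
    totals = re.search(r"Total (\d+)\(has (\d+) low confidence\) barcodes", log_text)
    boundary = re.search(r"Gauss Corrected Boundary at (\d+)", log_text)
    if totals is None or boundary is None:
        raise ValueError("expected log lines not found")
    total, low = (int(x) for x in totals.groups())
    return {
        "total": total,
        "low_confidence": low,
        "remaining": total - low,  # interpretation: high-confidence barcodes
        "gauss_boundary": int(boundary.group(1)),
    }


summary = barcode_summary(LOG_TEXT)
print(summary["remaining"] == summary["gauss_boundary"])
```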

@ACastanza

I'm getting this sce error in a Python 3.8.6 environment:

>>> from vpolo.alevin import parser
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/anaconda3/lib/python3.8/site-packages/vpolo/alevin/parser.py", line 9, in <module>
    import sce
  File "/usr/local/anaconda3/lib/python3.8/site-packages/sce/__init__.py", line 1, in <module>
    from .sce import *
ModuleNotFoundError: No module named 'sce.sce'

vpolo was installed using conda install -c bioconda vpolo.

Owner

k3yavi commented Feb 12, 2021

Hi @ACastanza ,

Can you check whether this solution helps?

@ACastanza

I tried installing Rust (conda install -c conda-forge rust), and I'm still getting the same error when trying to import the parser.

Owner

k3yavi commented Feb 12, 2021

I am guessing you also reinstalled vpolo from GitHub?
Another option is to use fishpond, unless Python is a requirement?

@ACastanza

I hadn't, actually, since the conda version and the GitHub version appeared to be the same version. I just reinstalled from GitHub directly and now it's working, so it looks like some of the dependencies might not be properly specified in the conda version.

Also, yes, I use fishpond when working in R, but I'm working on implementing the Alevin Single Cell Velocity tutorial, and since scvelo is Python-based I figured it would be easier to do the downstream work entirely in Python. Thanks again!
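For anyone landing here with the same traceback, a quick way to see which import is failing after a reinstall is to try each module in turn. This is a minimal sketch; the helper name `import_status` is hypothetical, and the module names in the commented usage are simply the ones that appear in the traceback above.

```python
import importlib


def import_status(module_name):
    """Return "ok" if the module imports, else a short failure message."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:  # ModuleNotFoundError is a subclass
        return f"FAILED: {exc}"
    return "ok"


# Usage in the affected environment:
# for name in ("sce", "vpolo.alevin.parser"):
#     print(name, "->", import_status(name))
```

If `sce` fails but the package directory exists in site-packages, that matches the symptom reported in this issue, where `sce/__init__.py` could not find its `sce.sce` submodule.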

@k3yavi k3yavi closed this as completed Feb 12, 2021