Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get a 'no file found' error for deconvolute_quantiseq() with btotalcells = TRUE #62

Closed
AGordonRobertson opened this issue Apr 7, 2021 · 27 comments

Comments

@AGordonRobertson
Copy link

AGordonRobertson commented Apr 7, 2021

I just ran immundeconv v2.0.3 on R v4.0.3 on RNAseq data for ~20k genes and ~250 tumor samples. (macOS 10.15.7)

  1. MCP-counter gave no problems.
    mcp_counter_res <- deconvolute_mcp_counter(
    as.matrix(merged.gencode.RNAseq.unique_gene_names),
    feature_types = "HUGO_symbols"
    )
    dim(mcp_counter_res)
    10 256

  2. quanTiseq gave no problems if btotalcells = default = FALSE.
    quanTiseq_res <- deconvolute_quantiseq(
    as.matrix(merged.gencode.RNAseq.unique_gene_names),
    tumor = TRUE,
    arrays = FALSE,
    scale_mrna = TRUE
    )
    Running quanTIseq deconvolution module
    Gene expression normalization and re-annotation (arrays: FALSE)
    Removing 17 noisy genes
    Removing 15 genes with high expression in tumors
    Signature genes found in data set: 137/138 (99.28%)
    Mixture deconvolution (method: lsei)
    Deconvolution sucessful!

  3. But if I set btotalcells = TRUE, I get an error.
    quanTiseq_res <- deconvolute_quantiseq(
    as.matrix(merged.gencode.RNAseq.unique_gene_names),
    tumor = TRUE,
    arrays = FALSE,
    scale_mrna = TRUE,
    btotalcells = TRUE
    )
    Running quanTIseq deconvolution module
    Gene expression normalization and re-annotation (arrays: FALSE)
    Removing 17 noisy genes
    Removing 15 genes with high expression in tumors
    Signature genes found in data set: 137/138 (99.28%)
    Mixture deconvolution (method: lsei)
    Deconvolution sucessful!
    Error in system.file("extdata", "quantiseq", "totalcells.txt", package = "immunedeconv", :
    no file found

Am I doing something silly?

@grst
Copy link
Collaborator

grst commented Apr 8, 2021

Hi @AGordonRobertson,

thanks for reporting this -- it seems nobody ever tried the totalcells option in immunedeconv before.

It certainly isn't implemented correctly in immunedeconv. Also this functionality requires an additional input file as shown in the original quantiseq documentation.

If you really need that feature I have to refer you to the original quantiseq pipeline for now.

@federicomarini is working on an improved R version of quanTIseq. @federicomarini: will it support the totalcells feature?

@FFinotello
Copy link
Collaborator

Hi @AGordonRobertson,

this is definitely a bug that @federicomarini and I are fixing in the new quanTIseq code.
We will post an update soon.

Sorry for the inconvenience,
Francesca

@AGordonRobertson
Copy link
Author

Thanks. Practically, it's extremely helpful to have different methods available within 'immunedeconv'.

'btotalcells' is described in help docs for deconvolute_quantiseq_default - "btotalcells = compute cell densities instead of fractions Default: FALSE"

I'm uncertain whether I need 'btotalcells'. I may not need it. But I'm uncertain about the 'units' of the results that different methods give -- so thought I'd try 'btotalcells = TRUE', and would compare its outputs across a set of expression subtypes, where I have purity values across the subtypes from a different team. I've made this comparison for other deconvolution methods within immunedeconv.

@AGordonRobertson
Copy link
Author

If it seems practical to you to fix 'btotalcells', please fix it. An less costly alternative would be to NOT fix it... I do NOT know that I need it.

@federicomarini
Copy link
Contributor

federicomarini commented Apr 8, 2021

In the current implementation here in immunedeconv it is looking for a file which is not there - but it is also containing information regarding the original images where you could compute the cell densities.
@FFinotello and I can chip in here once we have that running for quantiseqr 😉

@FFinotello
Copy link
Collaborator

Hello @AGordonRobertson and apologies for my late reply.

We are working on an improved version of the quanTIseq R code that should be ready soon and will, of course, also fix this bug.

Regarding your doubts about the "totalcells" argument, it was designed to pass to quanTIseq information on total cell densities estimated from images of tumor tissue-slides (e.g. H&E). This information is needed only if you want to scale the cell fractions (i.e. referred to total cells in a sample) to cell densities (i.e. cell counts per area as in pathology images), as explained in Fig. 1a from the original quanTIseq paper.

If you want to quantify the cellular composition of a sample using different methods applied only to transcriptomics data, you do not need (and have) cell densities. So, I would suggest keeping the default settings, regardless of the current bug.
In this way, you will get cell fractions that you can compare with the cell fractions or scores from the other methods.

Nevertheless, please remember that the outputs from the various methods can be by definition very different, and only EPIC and quanTIseq provide cell fractions referred to the overall cellular content of a sample (see also Table 1 from immunedeconv paper).

I hope this helps and please feel free to reach out again if you have any doubts.

Best,
Francesca

@AGordonRobertson
Copy link
Author

Thank you. It's very helpful to have your comments on the outputs of different methods. I'll use the current version and fractions.

@grst
Copy link
Collaborator

grst commented Nov 7, 2021

@federicomarini, is quantiseqR ready and should we switch to it in immunedeconv?

@federicomarini
Copy link
Contributor

I think so, quantiseqr is pretty much ready for prime time.

  • it's on Bioconductor
  • has already a couple of speedup optimizations
  • can handle matrix, eSets and also SummarizedExperiment objects

Especially this third point is something that can be worth having immunedeconv-wise, and so for the next projects. I think there's a nice advantage in handling the metadata in integrated containers that even take care of the matches, subsettings, and so on.

@FFinotello - (y)our call 😉

@grst
Copy link
Collaborator

grst commented Nov 7, 2021

@federicomarini, expressionSets are already supported:
https://github.com/icbi-lab/immunedeconv/blob/c70539f2b08901687561dca755337fc6a5130440/R/immune_deconvolution_methods.R#L331-L333

I've never worked with SummarizedExperiments, but I guess it should be trivial to support them.

@FFinotello
Copy link
Collaborator

FFinotello commented Nov 7, 2021 via email

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 7, 2021 via email

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 7, 2021 via email

@federicomarini
Copy link
Contributor

I can not see the question in your message, @AGordonRobertson - looks like to me you are just reporting the part from the Q&A in the vignette?

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 8, 2021 via email

@federicomarini
Copy link
Contributor

  1. Makes sense, but I'd argue it is more an aspect to curate in immundeconv - I am "only" a contributor of the quanTIseq porting 😉
  2. I agree this should be documented somewhere - in the end, we want the users to properly use the package!
    Wild guess: if you use otherwise normalized data, I don't think you'd be too much far away from the expected results, but you'd be using something in a sub-optimal way. And in full honesty, if one can, one should avoid it 👍

@grst
Copy link
Collaborator

grst commented Nov 8, 2021

I agree this should be documented somewhere - in the end, we want the users to properly use the package!
Wild guess: if you use otherwise normalized data, I don't think you'd be too much far away from the expected results, but you'd be using something in a sub-optimal way. And in full honesty, if one can, one should avoid it +1

This is documented here but it keeps coming up. Where do you think we should add it to make it more visible? We were thinking about an FAQ section at some point: #61

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 8, 2021 via email

@federicomarini
Copy link
Contributor

I agree this should be documented somewhere - in the end, we want the users to properly use the package!
Wild guess: if you use otherwise normalized data, I don't think you'd be too much far away from the expected results, but you'd be using something in a sub-optimal way. And in full honesty, if one can, one should avoid it +1

This is documented here but it keeps coming up. Where do you think we should add it to make it more visible? We were thinking about an FAQ section at some point: #61

I could argue it is well visible already, but maybe we could

  • point out somewhere in the vignette as well (Q&A sounds good)
  • in the function documentation of the wrapper, we can have a dedicated Details section for that

I mean, in the end it is up to the user to read the fantastic manual 😬

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 8, 2021 via email

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 8, 2021 via email

@federicomarini
Copy link
Contributor

I am guessing, but the FLD is the fragment length distribution? And that would be something you'd subtract from the length to "make it the effective length".
Apart from that: it can be that in some samples the length of the transcript might not really be the same. If using TPM quantifications from salmon or kallisto (as recommended, basically through the original concept of quanTIseq as a whole pipeline), then this aspect is already taken care of.

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 8, 2021 via email

@federicomarini
Copy link
Contributor

I'd suggest you can have a look at this very informative post by @rob-p for an explanation on the effective length 😉

http://robpatro.com/blog/?p=235#efflen

@AGordonRobertson
Copy link
Author

AGordonRobertson commented Nov 8, 2021 via email

@grst
Copy link
Collaborator

grst commented Apr 29, 2022

@federicomarini, can this be closed?

@federicomarini
Copy link
Contributor

Guess so - ended up spacing across a couple of topics in the end, but seems done to me

@grst grst closed this as completed Apr 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants