Error with SNP contamination for some samples #127

Open
Sarah145 opened this issue May 26, 2023 · 2 comments

Comments

@Sarah145

Hi, firstly, thank you for creating such a great tool!

I've been running numbat on samples from multiple individuals at three separate timepoints and for some individuals the tool runs smoothly and the results look nice. However, for some individuals I'm getting errors related to SNP contamination.

I ran the pileup_and_phase.R script for multiple samples from the same individual jointly and then ran run_numbat on each sample individually. Here's an example of the output I'm getting for one of the samples.

Loading required package: Matrix
Numbat version: 1.3.0
Running under parameters:
t = 1e-05
alpha = 1e-04
gamma = 15
min_cells = 10
init_k = 3
max_cost = 1378.2
n_cut = 0
max_iter = 2
max_nni = 100
min_depth = 0
use_loh = auto
segs_loh = None
call_clonal_loh = FALSE
segs_consensus_fix = None
multi_allelic = TRUE
min_LLR = 10
min_overlap = 0.45
max_entropy = 0.9
skip_nj = TRUE
diploid_chroms = None
ncores = 16
ncores_nni = 16
common_diploid = TRUE
tau = 0.3
check_convergence = FALSE
plot = FALSE
genome = hg38
Input metrics:
4594 cells
Mem used: 1.17Gb
Approximating initial clusters using smoothed expression ..
Mem used: 1.17Gb
number of genes left: 10520
running hclust...
Iteration 1
Mem used: 1.56Gb
High SNP contamination detected (40.9%). Please make sure that cells from only one individual are included in genotyping step.
Expression noise level (MSE): low (0.16). 
Running HMMs on 5 cell groups..
Error in `recycle_columns()`:
! Tibble columns must have compatible sizes.
• Size 166561: Column `3`.
• Size 229810: Column `2`.
• Size 245035: Column `1`.
ℹ Only values of size one are recycled.
Backtrace:
     ▆
  1. ├─numbat::run_numbat(...)
  2. │ └─bulk_subtrees %>% ...
  3. ├─numbat:::run_group_hmms(...)
  4. │ └─numbat:::find_common_diploid(...)
  5. │   └─... %>% bind_rows()
  6. └─dplyr::bind_rows(.)
  7.   ├─tibble::as_tibble(dots)
  8.   └─tibble:::as_tibble.list(dots)
  9.     └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
 10.       └─tibble:::recycle_columns(x, .rows, lengths)
 11.         └─tibble:::abort_incompatible_size(.rows, names(x), lengths, "Requested with `.rows` argument")
 12.           └─tibble:::tibble_abort(...)
 13.             └─rlang::abort(x, class, ..., call = call, parent = parent, use_cli_format = TRUE)
Warning message:
In mclapply(bulks %>% split(.$sample), mc.cores = ncores, function(bulk) { :
  scheduled cores 2, 1 encountered errors in user code, all values of the jobs will be affected
Execution halted

I'm not sure whether the error is triggered by the high SNP contamination warning or by something else. All cells are from the same individual, so I don't understand why high SNP contamination is being reported.

Any insight you can provide would be greatly appreciated!

@teng-gao
Collaborator

Hi Sarah,

The SNP contamination message indicates that a large fraction of the SNPs in the profile are homozygous. However, the analysis should still run so there may be an exception that is not handled properly. Feel free to share the input of one such sample via email (tgaoteng@gmail.com).

numbat/R/diagnostics.R

Lines 151 to 169 in a367fa5

#' check inter-individual contamination
#' @param bulk dataframe Pseudobulk profile
#' @return NULL
#' @keywords internal
check_contam = function(bulk) {
    hom_rate = bulk %>% filter(DP >= 8) %>%
        {mean(na.omit(.$AR == 0 | .$AR == 1))}
    if (hom_rate > 0.4) {
        msg = paste0(
            'High SNP contamination detected ',
            '(', round(hom_rate*100, 1), '%)',
            '. Please make sure that cells from only one individual are included in genotyping step.')
        message(msg)
        log_warn(msg)
    }
}
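
To see what the check above measures, here is a minimal base-R sketch of the same calculation on a toy pseudobulk. The data values are made up for illustration, and `hom_rate_of` and `min_dp` are hypothetical names that are not part of numbat. It only assumes, as in the snippet above, that the profile has a `DP` (depth) column and an `AR` (allele ratio) column.

```r
# Toy pseudobulk profile (invented values, for illustration only).
# DP = read depth per SNP, AR = allele ratio per SNP.
bulk <- data.frame(
  DP = c(10, 12, 8, 3, 20, 15, 9, 30),
  AR = c(0, 0.5, 1, 0, 0.45, NA, 1, 0.6)
)

# Hypothetical helper: fraction of SNPs with DP >= min_dp whose allele
# ratio is exactly 0 or 1, i.e. SNPs that look homozygous.
# Follows the logic of check_contam() without depending on dplyr.
hom_rate_of <- function(bulk, min_dp = 8) {
  covered <- bulk[bulk$DP >= min_dp, ]
  is_hom <- covered$AR == 0 | covered$AR == 1
  # Drop SNPs with a missing AR before averaging, as na.omit() does above.
  mean(is_hom[!is.na(is_hom)])
}

hom_rate <- hom_rate_of(bulk)
# The message is emitted when this rate exceeds 0.4.
exceeds_threshold <- hom_rate > 0.4
```

Running the same helper on a real pseudobulk from an affected sample would show how far above the 0.4 threshold it sits. Note that the check only emits a message; it does not stop the run.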

@Sarah145
Author

Sarah145 commented Jun 2, 2023

Hi Teng, thanks for getting back to me! I shared the input files with you via email 😊
