Error with TSS enrichment #485

danielcgingerich · 2021-02-25T21:45:37Z

Here is the error:

  obj <- TSSEnrichment(object = obj, fast = FALSE)
Extracting TSS positions
Finding + strand cut sites
Finding - strand cut sites
Error in `colnames<-`(`*tmp*`, value = seq_len(length.out = region.width) -  :
  attempt to set 'colnames' on an object with less than two dimensions

I looked at issue #376 and tried to make sure my object was formatted correctly and removed patch edition chromosomes. Here is how I made the annotations:

hg38 <- rtracklayer::import(hg38.path)
genome(hg38) <- 'hg38'
old.seqnames <- seqlevels(hg38)
new.seqnames <- c('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X','Y','MT',
                  "GL000009.2", "GL000194.1", "GL000195.1", "GL000205.2", "GL000213.1", "GL000216.2", "GL000218.1", 
                     ... )
names(new.seqnames) <- old.seqnames
hg38 <- renameSeqlevels(hg38, new.seqnames)
hg38 <- hg38[hg38@seqnames %in% new.seqnames[1:25],]
hg38 <- dropSeqlevels(hg38, new.seqnames[26:length(new.seqnames)])

hg38$gene_biotype <- hg38$gene_type

seqlengths(hg38) <- c(248956422, 242193529, 198295559, 198295559,  181538259,	170805979,  
                     159345973, 145138636, 138394717, 133797422, 135086622, 133275309,
                     114364328, 107043718,  101991189, 90338345, 83257441, 80373285, 58617616,
                     64444167, 46709983, 50818468, 156040895, 57227415, 16569)

The text was updated successfully, but these errors were encountered:

timoast · 2021-02-25T21:53:23Z

Please include the full code you're running and the output of sessionInfo()

danielcgingerich · 2021-02-25T22:02:10Z

Full code:

atac.path <- '/path/to/data/'
meta.path <- '/path/to/metadata/'
hg38.path <- '/path/to/reference/genome'
number.list <- number.list <- list("99", "111", "127", "191", "196",
                                   "273", "347", "357", "372", "430",
                                   "542", "601","676", "688", "730", 
                                   "781", "963", "984", "1099", "1545",
                                   "1557", "1600", "1670", "1690")

hg38 <- rtracklayer::import(hg38.path)
seqlevelsStyle(hg38) <- 'UCSC'
genome(hg38) <- 'hg38'
old.seqnames <- seqlevels(hg38)
new.seqnames <- c('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X','Y','MT',
                  "GL000009.2", "GL000194.1", "GL000195.1", "GL000205.2", "GL000213.1", "GL000216.2", "GL000218.1", "GL000219.1",
                  "GL000220.1", "GL000225.1","KI270442.1","KI270711.1", "KI270713.1", "KI270721.1", "KI270726.1", "KI270727.1", 
                  "KI270728.1", "KI270731.1", "KI270733.1", "KI270734.1", "KI270744.1", "KI270750.1")
names(new.seqnames) <- old.seqnames
hg38 <- renameSeqlevels(hg38, new.seqnames)
hg38 <- hg38[hg38@seqnames %in% new.seqnames[1:25],]
hg38 <- dropSeqlevels(hg38, new.seqnames[26:length(new.seqnames)])
hg38$gene_biotype <- hg38$gene_type
seqlengths(hg38) <- c(248956422, 242193529, 198295559, 198295559,  181538259,	170805979,  
                     159345973, 145138636, 138394717, 133797422, 135086622, 133275309,
                     114364328, 107043718,  101991189, 90338345, 83257441, 80373285, 58617616,
                     64444167, 46709983, 50818468, 156040895, 57227415, 16569)

obj.list <- list()
for (i in 1:length(number.list)){
  message(i)
  counts <- readRDS(paste0(atac.path, number.list[[i]], '_dgC.mtx.from.h5.rds')) 
  metadata <- read.csv(meta.path[i], header = TRUE, row.names = 1)
  
  message("creating assay")
  obj.list[[i]] <- CreateChromatinAssay(counts = counts, sep = c(":", "-"), genome = 'hg38', 
                                        fragments = frag.path[i], min.cells = -1)
  
  message("creating seurat")
  obj.list[[i]] <- CreateSeuratObject(counts = obj.list[[i]], assay = 'peaks', meta.data = metadata)
  
  Annotation(obj.list[[i]]) <- hg38
  
  message("nucleosome signal")
  obj.list[[i]] <- NucleosomeSignal(object = obj.list[[i]])
  
  message("tss enrichment")
  obj.list[[i]] <- TSSEnrichment(object = obj.list[[i]], fast = FALSE)
  
  message("pct reads in peaks")
  obj.list[[i]]$pct_reads_in_peaks <- obj.list[[i]]$peak_region_fragments / obj.list[[i]]$passed_filters * 100
  
  message("blacklist ratio")
  obj.list[[i]]$blacklist_ratio <- obj.list[[i]]$blacklist_region_fragments / obj.list[[i]]$peak_region_fragments
  
  message("saving...")
  saveRDS(obj.list[[i]], paste0(atac.path, number.list[[i]], 'srt_preprocessed_unfiltered.rds'))
}

session info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

Matrix products: default
BLAS/LAPACK: /gpfs/fs1/data/common/shared_conda_envs/miniconda3/envs/R-4.0.2/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] grid      splines   stats4    parallel  stats     graphics  grDevices
 [8] utils     datasets  methods   base

other attached packages:
 [1] rhdf5_2.32.4              dplyr_1.0.2
 [3] Matrix.utils_0.9.8        patchwork_1.1.1
 [5] harmony_1.0               Rcpp_1.0.5
 [7] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.12.1
 [9] AnnotationFilter_1.12.0   GenomicFeatures_1.40.1
[11] AnnotationDbi_1.50.3      cicero_1.6.2
[13] Gviz_1.32.0               GenomicRanges_1.40.0
[15] GenomeInfoDb_1.24.2       IRanges_2.22.2
[17] S4Vectors_0.26.1          monocle_2.16.0
[19] DDRTree_0.1.5             irlba_2.3.3
[21] VGAM_1.1-4                ggplot2_3.3.2
[23] Biobase_2.48.0            BiocGenerics_0.34.0
[25] Matrix_1.3-2              SeuratObject_4.0.0
[27] Seurat_4.0.0              Signac_1.1.1

loaded via a namespace (and not attached):
  [1] reticulate_1.18             tidyselect_1.1.0
  [3] RSQLite_2.2.1               htmlwidgets_1.5.3
  [5] docopt_0.7.1                combinat_0.0-8
  [7] BiocParallel_1.22.0         Rtsne_0.15
  [9] munsell_0.5.0               codetools_0.2-18
 [11] ica_1.0-2                   future_1.21.0
 [13] miniUI_0.1.1.1              withr_2.3.0
 [15] fastICA_1.2-2               colorspace_2.0-0
 [17] OrganismDbi_1.30.0          knitr_1.30
 [19] rstudioapi_0.13             ROCR_1.0-11
 [21] tensor_1.5                  listenv_0.8.0
 [23] slam_0.1-48                 GenomeInfoDbData_1.2.3
 [25] polyclip_1.10-0             pheatmap_1.0.12
 [27] bit64_4.0.5                 farver_2.0.3
 [29] parallelly_1.22.0           vctrs_0.3.6
 [31] generics_0.1.0              xfun_0.19
 [33] biovizBase_1.36.0           BiocFileCache_1.12.1
 [35] lsa_0.73.2                  ggseqlogo_0.1
 [37] R6_2.5.0                    bitops_1.0-6
 [39] spatstat.utils_1.17-0       reshape_0.8.8
 [41] DelayedArray_0.14.1         assertthat_0.2.1
 [43] promises_1.1.1              scales_1.1.1
 [45] nnet_7.3-15                 gtable_0.3.0
 [47] globals_0.14.0              goftest_1.2-2
 [49] ggbio_1.36.0                rlang_0.4.9
 [51] RcppRoll_0.3.0              rtracklayer_1.48.0
 [53] lazyeval_0.2.2              dichromat_2.0-0
 [55] checkmate_2.0.0             BiocManager_1.30.10
 [57] reshape2_1.4.4              abind_1.4-5
 [59] backports_1.2.0             httpuv_1.5.4
 [61] Hmisc_4.4-1                 RBGL_1.64.0
 [63] tools_4.0.2                 ellipsis_0.3.1
 [65] RColorBrewer_1.1-2          ggridges_0.5.2
 [67] plyr_1.8.6                  base64enc_0.1-3
 [69] progress_1.2.2              zlibbioc_1.34.0
 [71] densityClust_0.3            purrr_0.3.4
 [73] RCurl_1.98-1.2              prettyunits_1.1.1
 [75] rpart_4.1-15                openssl_1.4.3
 [77] deldir_0.2-3                viridis_0.5.1
 [79] pbapply_1.4-3               cowplot_1.1.1
 [81] zoo_1.8-8                   grr_0.9.5
 [83] SummarizedExperiment_1.18.2 ggrepel_0.9.0
 [85] cluster_2.1.1               magrittr_2.0.1
 [87] data.table_1.13.4           scattermore_0.7
 [89] lmtest_0.9-38               RANN_2.6.1
 [91] SnowballC_0.7.0             ProtGenerics_1.20.0
 [93] fitdistrplus_1.1-3          matrixStats_0.57.0
 [95] hms_0.5.3                   mime_0.9
 [97] xtable_1.8-4                XML_3.99-0.5
 [99] jpeg_0.1-8.1                sparsesvd_0.2
[101] gridExtra_2.3               HSMMSingleCell_1.8.0
[103] compiler_4.0.2              biomaRt_2.44.4
[105] tibble_3.0.4                KernSmooth_2.23-18
[107] crayon_1.3.4                htmltools_0.5.1.1
[109] mgcv_1.8-34                 later_1.1.0.1
[111] Formula_1.2-4               tidyr_1.1.2
[113] DBI_1.1.0                   tweenr_1.0.1
[115] dbplyr_2.0.0                MASS_7.3-53.1
[117] rappdirs_0.3.1              igraph_1.2.6
[119] pkgconfig_2.0.3             GenomicAlignments_1.24.0
[121] foreign_0.8-81              plotly_4.9.2.2
[123] xml2_1.3.2                  XVector_0.28.0
[125] stringr_1.4.0               VariantAnnotation_1.34.0
[127] digest_0.6.27               sctransform_0.3.2
[129] RcppAnnoy_0.0.18            graph_1.66.0
[131] spatstat.data_1.7-0         Biostrings_2.56.0
[133] leiden_0.3.6                fastmatch_1.1-0
[135] htmlTable_2.1.0             uwot_0.1.10
[137] curl_4.3                    shiny_1.5.0
[139] Rsamtools_2.4.0             lifecycle_0.2.0
[141] nlme_3.1-152                jsonlite_1.7.2
[143] Rhdf5lib_1.10.1             limma_3.44.3
[145] viridisLite_0.3.0           askpass_1.1
[147] BSgenome_1.56.0             pillar_1.4.7
[149] lattice_0.20-41             GGally_2.0.0
[151] fastmap_1.0.1               httr_1.4.2
[153] survival_3.2-7              glue_1.4.2
[155] qlcMatrix_0.9.7             FNN_1.1.3
[157] spatstat_1.64-1             png_0.1-7
[159] bit_4.0.4                   ggforce_0.3.2
[161] stringi_1.5.3               blob_1.2.1
[163] latticeExtra_0.6-29         memoise_1.1.0
[165] future.apply_1.6.0

timoast · 2021-02-25T22:05:15Z

What is the gene annotation you're using here? I'd suggest just using the Ensembl annotation for hg38 as shown in the vignettes

danielcgingerich · 2021-02-25T22:18:01Z

Annotation is from gencode release 32. In the past I have used EnsDb.Hsapiens.v86, which worked well. However, I am comparing my data to snRNA data, and need to make sure I am using the same annotations as the reference genome used for mapping snRNA reads. I looked at table(rownames(snRNA) %in% EnsDb.Hsapiens.v86$gene_name) and not all genes were included. Would it be ok to use EnsDb.Hsapiens.v86 annotation for TSSEnrichment, but use the gencode annotation for creating the gene activity matrix? Both annotations are hg38 I believe.

timoast · 2021-02-25T23:11:59Z

This works for me:

library(Signac)
library(rtracklayer)

pbmc <- readRDS(...) # hg38-mapped object
hg38 <- import("~/gencode.v32.annotation.gtf.gz")

genome(hg38) <- "hg38"
seqlevelsStyle(hg38) <- "UCSC"
hg38$gene_biotype <- hg38$gene_type
Annotation(pbmc) <- hg38
pbmc <- TSSEnrichment(pbmc)

You need to ensure there's a column named "gene_biotype" in the annotation. I will add a check for this in TSSEnrichment or the annotation assignment function, so we can throw a more informative error message.

danielcgingerich · 2021-02-26T16:59:16Z

Interesting, I dont know what went wrong yesterday, but it seems to be working fine now. About to run my last scripts for tss enrichment. Thank you @timoast

danielcgingerich · 2021-02-26T17:00:27Z

maybe all I needed to do was specify the genome, seqlevelsStyle, and add gene_biotype. The other stuff I did to the genome was perhaps the problem? not sure

timoast · 2021-02-26T17:05:35Z

Not sure, I didn't try running all those steps. gene_biotype is definitely needed, I'll leave this issue open until I push an update adding a check for it in the annotation

strawberry789 · 2021-03-05T04:16:47Z

Using the TSSEnrichment function, I'm getting this error:

Error in strand[[1]]: subscript out of bounds

I have looked at issues #377 and #376, and it seemed like I didn't have a gene_biotype column in my granges object, so I added it as per above

gtf$gene_biotype <- gtf$gene_type

but I'm still getting the same error.

I used the UCSC fasta and gtf files to build the atac genome reference for running CellRanger, and I'm using the same gtf file to add gene information to my Signac object.

danielcgingerich · 2021-03-08T03:15:03Z

@strawberry789 I would love to try and help if I can. what is your code for constructing the reference, and can I download the gtf from somewhere online?

strawberry789 · 2021-03-08T03:31:53Z

@danielcgingerich Thanks for your reply. I used 10X Genomics' mkref function to construct the reference:

cellranger-atac mkref dm6_UCSC_atac --config dm6_UCSC_atac.config

where dm6_UCSC_atac.config contains:

GENOME_FASTA_INPUT: "path/dm6.fa",
GENE_ANNOTATION_INPUT: "path/dm6.ensGene.gtf.gz",
MOTIF_INPUT: "http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_insects_non-redundant_pfms_jaspar.txt",
ORGANISM: "Drosophila melanogaster",
PRIMARY_CONTIGS: ["chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX", "chrY"],
NON_NUCLEAR_CONTIGS: ["chrM"]

The gtf was downloaded from: http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/genes/dm6.ensGene.gtf.gz
The fasta was downloaded from: http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz

timoast · 2021-03-16T19:25:32Z

This should be fixed now on the develop branch (improved error message)

timoast closed this as completed Mar 16, 2021

timoast mentioned this issue Apr 28, 2021

TSSEnrichment Error #594

Closed

AAA-3 mentioned this issue Nov 29, 2021

TSSEnrichment error #887

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error with TSS enrichment #485

Error with TSS enrichment #485

danielcgingerich commented Feb 25, 2021 •

edited

Loading

timoast commented Feb 25, 2021

danielcgingerich commented Feb 25, 2021 •

edited

Loading

timoast commented Feb 25, 2021

danielcgingerich commented Feb 25, 2021

timoast commented Feb 25, 2021

danielcgingerich commented Feb 26, 2021

danielcgingerich commented Feb 26, 2021

timoast commented Feb 26, 2021

strawberry789 commented Mar 5, 2021

danielcgingerich commented Mar 8, 2021

strawberry789 commented Mar 8, 2021

timoast commented Mar 16, 2021

Error with TSS enrichment #485

Error with TSS enrichment #485

Comments

danielcgingerich commented Feb 25, 2021 • edited Loading

timoast commented Feb 25, 2021

danielcgingerich commented Feb 25, 2021 • edited Loading

timoast commented Feb 25, 2021

danielcgingerich commented Feb 25, 2021

timoast commented Feb 25, 2021

danielcgingerich commented Feb 26, 2021

danielcgingerich commented Feb 26, 2021

timoast commented Feb 26, 2021

strawberry789 commented Mar 5, 2021

danielcgingerich commented Mar 8, 2021

strawberry789 commented Mar 8, 2021

timoast commented Mar 16, 2021

danielcgingerich commented Feb 25, 2021 •

edited

Loading

danielcgingerich commented Feb 25, 2021 •

edited

Loading