Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error with TSS enrichment #485

Closed
danielcgingerich opened this issue Feb 25, 2021 · 12 comments
Closed

Error with TSS enrichment #485

danielcgingerich opened this issue Feb 25, 2021 · 12 comments

Comments

@danielcgingerich
Copy link

danielcgingerich commented Feb 25, 2021

Here is the error:

  obj <- TSSEnrichment(object = obj, fast = FALSE)
Extracting TSS positions
Finding + strand cut sites
Finding - strand cut sites
Error in `colnames<-`(`*tmp*`, value = seq_len(length.out = region.width) -  :
  attempt to set 'colnames' on an object with less than two dimensions

I looked at issue #376 and tried to make sure my object was formatted correctly and removed patch edition chromosomes. Here is how I made the annotations:

hg38 <- rtracklayer::import(hg38.path)
genome(hg38) <- 'hg38'
old.seqnames <- seqlevels(hg38)
new.seqnames <- c('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X','Y','MT',
                  "GL000009.2", "GL000194.1", "GL000195.1", "GL000205.2", "GL000213.1", "GL000216.2", "GL000218.1", 
                     ... )
names(new.seqnames) <- old.seqnames
hg38 <- renameSeqlevels(hg38, new.seqnames)
hg38 <- hg38[hg38@seqnames %in% new.seqnames[1:25],]
hg38 <- dropSeqlevels(hg38, new.seqnames[26:length(new.seqnames)])

hg38$gene_biotype <- hg38$gene_type

seqlengths(hg38) <- c(248956422, 242193529, 198295559, 198295559,  181538259,	170805979,  
                     159345973, 145138636, 138394717, 133797422, 135086622, 133275309,
                     114364328, 107043718,  101991189, 90338345, 83257441, 80373285, 58617616,
                     64444167, 46709983, 50818468, 156040895, 57227415, 16569)

@timoast
Copy link
Collaborator

timoast commented Feb 25, 2021

Please include the full code you're running and the output of sessionInfo()

@danielcgingerich
Copy link
Author

danielcgingerich commented Feb 25, 2021

Full code:

atac.path <- '/path/to/data/'
meta.path <- '/path/to/metadata/'
hg38.path <- '/path/to/reference/genome'
number.list <- number.list <- list("99", "111", "127", "191", "196",
                                   "273", "347", "357", "372", "430",
                                   "542", "601","676", "688", "730", 
                                   "781", "963", "984", "1099", "1545",
                                   "1557", "1600", "1670", "1690")

hg38 <- rtracklayer::import(hg38.path)
seqlevelsStyle(hg38) <- 'UCSC'
genome(hg38) <- 'hg38'
old.seqnames <- seqlevels(hg38)
new.seqnames <- c('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','X','Y','MT',
                  "GL000009.2", "GL000194.1", "GL000195.1", "GL000205.2", "GL000213.1", "GL000216.2", "GL000218.1", "GL000219.1",
                  "GL000220.1", "GL000225.1","KI270442.1","KI270711.1", "KI270713.1", "KI270721.1", "KI270726.1", "KI270727.1", 
                  "KI270728.1", "KI270731.1", "KI270733.1", "KI270734.1", "KI270744.1", "KI270750.1")
names(new.seqnames) <- old.seqnames
hg38 <- renameSeqlevels(hg38, new.seqnames)
hg38 <- hg38[hg38@seqnames %in% new.seqnames[1:25],]
hg38 <- dropSeqlevels(hg38, new.seqnames[26:length(new.seqnames)])
hg38$gene_biotype <- hg38$gene_type
seqlengths(hg38) <- c(248956422, 242193529, 198295559, 198295559,  181538259,	170805979,  
                     159345973, 145138636, 138394717, 133797422, 135086622, 133275309,
                     114364328, 107043718,  101991189, 90338345, 83257441, 80373285, 58617616,
                     64444167, 46709983, 50818468, 156040895, 57227415, 16569)

obj.list <- list()
for (i in 1:length(number.list)){
  message(i)
  counts <- readRDS(paste0(atac.path, number.list[[i]], '_dgC.mtx.from.h5.rds')) 
  metadata <- read.csv(meta.path[i], header = TRUE, row.names = 1)
  
  message("creating assay")
  obj.list[[i]] <- CreateChromatinAssay(counts = counts, sep = c(":", "-"), genome = 'hg38', 
                                        fragments = frag.path[i], min.cells = -1)
  
  message("creating seurat")
  obj.list[[i]] <- CreateSeuratObject(counts = obj.list[[i]], assay = 'peaks', meta.data = metadata)
  
  Annotation(obj.list[[i]]) <- hg38
  
  message("nucleosome signal")
  obj.list[[i]] <- NucleosomeSignal(object = obj.list[[i]])
  
  message("tss enrichment")
  obj.list[[i]] <- TSSEnrichment(object = obj.list[[i]], fast = FALSE)
  
  message("pct reads in peaks")
  obj.list[[i]]$pct_reads_in_peaks <- obj.list[[i]]$peak_region_fragments / obj.list[[i]]$passed_filters * 100
  
  message("blacklist ratio")
  obj.list[[i]]$blacklist_ratio <- obj.list[[i]]$blacklist_region_fragments / obj.list[[i]]$peak_region_fragments
  
  message("saving...")
  saveRDS(obj.list[[i]], paste0(atac.path, number.list[[i]], 'srt_preprocessed_unfiltered.rds'))
}

session info:

R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

Matrix products: default
BLAS/LAPACK: /gpfs/fs1/data/common/shared_conda_envs/miniconda3/envs/R-4.0.2/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] grid      splines   stats4    parallel  stats     graphics  grDevices
 [8] utils     datasets  methods   base

other attached packages:
 [1] rhdf5_2.32.4              dplyr_1.0.2
 [3] Matrix.utils_0.9.8        patchwork_1.1.1
 [5] harmony_1.0               Rcpp_1.0.5
 [7] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.12.1
 [9] AnnotationFilter_1.12.0   GenomicFeatures_1.40.1
[11] AnnotationDbi_1.50.3      cicero_1.6.2
[13] Gviz_1.32.0               GenomicRanges_1.40.0
[15] GenomeInfoDb_1.24.2       IRanges_2.22.2
[17] S4Vectors_0.26.1          monocle_2.16.0
[19] DDRTree_0.1.5             irlba_2.3.3
[21] VGAM_1.1-4                ggplot2_3.3.2
[23] Biobase_2.48.0            BiocGenerics_0.34.0
[25] Matrix_1.3-2              SeuratObject_4.0.0
[27] Seurat_4.0.0              Signac_1.1.1

loaded via a namespace (and not attached):
  [1] reticulate_1.18             tidyselect_1.1.0
  [3] RSQLite_2.2.1               htmlwidgets_1.5.3
  [5] docopt_0.7.1                combinat_0.0-8
  [7] BiocParallel_1.22.0         Rtsne_0.15
  [9] munsell_0.5.0               codetools_0.2-18
 [11] ica_1.0-2                   future_1.21.0
 [13] miniUI_0.1.1.1              withr_2.3.0
 [15] fastICA_1.2-2               colorspace_2.0-0
 [17] OrganismDbi_1.30.0          knitr_1.30
 [19] rstudioapi_0.13             ROCR_1.0-11
 [21] tensor_1.5                  listenv_0.8.0
 [23] slam_0.1-48                 GenomeInfoDbData_1.2.3
 [25] polyclip_1.10-0             pheatmap_1.0.12
 [27] bit64_4.0.5                 farver_2.0.3
 [29] parallelly_1.22.0           vctrs_0.3.6
 [31] generics_0.1.0              xfun_0.19
 [33] biovizBase_1.36.0           BiocFileCache_1.12.1
 [35] lsa_0.73.2                  ggseqlogo_0.1
 [37] R6_2.5.0                    bitops_1.0-6
 [39] spatstat.utils_1.17-0       reshape_0.8.8
 [41] DelayedArray_0.14.1         assertthat_0.2.1
 [43] promises_1.1.1              scales_1.1.1
 [45] nnet_7.3-15                 gtable_0.3.0
 [47] globals_0.14.0              goftest_1.2-2
 [49] ggbio_1.36.0                rlang_0.4.9
 [51] RcppRoll_0.3.0              rtracklayer_1.48.0
 [53] lazyeval_0.2.2              dichromat_2.0-0
 [55] checkmate_2.0.0             BiocManager_1.30.10
 [57] reshape2_1.4.4              abind_1.4-5
 [59] backports_1.2.0             httpuv_1.5.4
 [61] Hmisc_4.4-1                 RBGL_1.64.0
 [63] tools_4.0.2                 ellipsis_0.3.1
 [65] RColorBrewer_1.1-2          ggridges_0.5.2
 [67] plyr_1.8.6                  base64enc_0.1-3
 [69] progress_1.2.2              zlibbioc_1.34.0
 [71] densityClust_0.3            purrr_0.3.4
 [73] RCurl_1.98-1.2              prettyunits_1.1.1
 [75] rpart_4.1-15                openssl_1.4.3
 [77] deldir_0.2-3                viridis_0.5.1
 [79] pbapply_1.4-3               cowplot_1.1.1
 [81] zoo_1.8-8                   grr_0.9.5
 [83] SummarizedExperiment_1.18.2 ggrepel_0.9.0
 [85] cluster_2.1.1               magrittr_2.0.1
 [87] data.table_1.13.4           scattermore_0.7
 [89] lmtest_0.9-38               RANN_2.6.1
 [91] SnowballC_0.7.0             ProtGenerics_1.20.0
 [93] fitdistrplus_1.1-3          matrixStats_0.57.0
 [95] hms_0.5.3                   mime_0.9
 [97] xtable_1.8-4                XML_3.99-0.5
 [99] jpeg_0.1-8.1                sparsesvd_0.2
[101] gridExtra_2.3               HSMMSingleCell_1.8.0
[103] compiler_4.0.2              biomaRt_2.44.4
[105] tibble_3.0.4                KernSmooth_2.23-18
[107] crayon_1.3.4                htmltools_0.5.1.1
[109] mgcv_1.8-34                 later_1.1.0.1
[111] Formula_1.2-4               tidyr_1.1.2
[113] DBI_1.1.0                   tweenr_1.0.1
[115] dbplyr_2.0.0                MASS_7.3-53.1
[117] rappdirs_0.3.1              igraph_1.2.6
[119] pkgconfig_2.0.3             GenomicAlignments_1.24.0
[121] foreign_0.8-81              plotly_4.9.2.2
[123] xml2_1.3.2                  XVector_0.28.0
[125] stringr_1.4.0               VariantAnnotation_1.34.0
[127] digest_0.6.27               sctransform_0.3.2
[129] RcppAnnoy_0.0.18            graph_1.66.0
[131] spatstat.data_1.7-0         Biostrings_2.56.0
[133] leiden_0.3.6                fastmatch_1.1-0
[135] htmlTable_2.1.0             uwot_0.1.10
[137] curl_4.3                    shiny_1.5.0
[139] Rsamtools_2.4.0             lifecycle_0.2.0
[141] nlme_3.1-152                jsonlite_1.7.2
[143] Rhdf5lib_1.10.1             limma_3.44.3
[145] viridisLite_0.3.0           askpass_1.1
[147] BSgenome_1.56.0             pillar_1.4.7
[149] lattice_0.20-41             GGally_2.0.0
[151] fastmap_1.0.1               httr_1.4.2
[153] survival_3.2-7              glue_1.4.2
[155] qlcMatrix_0.9.7             FNN_1.1.3
[157] spatstat_1.64-1             png_0.1-7
[159] bit_4.0.4                   ggforce_0.3.2
[161] stringi_1.5.3               blob_1.2.1
[163] latticeExtra_0.6-29         memoise_1.1.0
[165] future.apply_1.6.0

@timoast
Copy link
Collaborator

timoast commented Feb 25, 2021

What is the gene annotation you're using here? I'd suggest just using the Ensembl annotation for hg38 as shown in the vignettes

@danielcgingerich
Copy link
Author

Annotation is from gencode release 32. In the past I have used EnsDb.Hsapiens.v86, which worked well. However, I am comparing my data to snRNA data, and need to make sure I am using the same annotations as the reference genome used for mapping snRNA reads. I looked at table(rownames(snRNA) %in% EnsDb.Hsapiens.v86$gene_name) and not all genes were included. Would it be ok to use EnsDb.Hsapiens.v86 annotation for TSSEnrichment, but use the gencode annotation for creating the gene activity matrix? Both annotations are hg38 I believe.

@timoast
Copy link
Collaborator

timoast commented Feb 25, 2021

This works for me:

library(Signac)
library(rtracklayer)

pbmc <- readRDS(...) # hg38-mapped object
hg38 <- import("~/gencode.v32.annotation.gtf.gz")

genome(hg38) <- "hg38"
seqlevelsStyle(hg38) <- "UCSC"
hg38$gene_biotype <- hg38$gene_type
Annotation(pbmc) <- hg38
pbmc <- TSSEnrichment(pbmc)

You need to ensure there's a column named "gene_biotype" in the annotation. I will add a check for this in TSSEnrichment or the annotation assignment function, so we can throw a more informative error message.

@danielcgingerich
Copy link
Author

Interesting, I dont know what went wrong yesterday, but it seems to be working fine now. About to run my last scripts for tss enrichment. Thank you @timoast

@danielcgingerich
Copy link
Author

maybe all I needed to do was specify the genome, seqlevelsStyle, and add gene_biotype. The other stuff I did to the genome was perhaps the problem? not sure

@timoast
Copy link
Collaborator

timoast commented Feb 26, 2021

Not sure, I didn't try running all those steps. gene_biotype is definitely needed, I'll leave this issue open until I push an update adding a check for it in the annotation

@strawberry789
Copy link

Using the TSSEnrichment function, I'm getting this error:

Error in strand[[1]]: subscript out of bounds

I have looked at issues #377 and #376, and it seemed like I didn't have a gene_biotype column in my granges object, so I added it as per above

gtf$gene_biotype <- gtf$gene_type

but I'm still getting the same error.

I used the UCSC fasta and gtf files to build the atac genome reference for running CellRanger, and I'm using the same gtf file to add gene information to my Signac object.

@danielcgingerich
Copy link
Author

@strawberry789 I would love to try and help if I can. what is your code for constructing the reference, and can I download the gtf from somewhere online?

@strawberry789
Copy link

@danielcgingerich Thanks for your reply. I used 10X Genomics' mkref function to construct the reference:

cellranger-atac mkref dm6_UCSC_atac --config dm6_UCSC_atac.config

where dm6_UCSC_atac.config contains:

GENOME_FASTA_INPUT: "path/dm6.fa",
GENE_ANNOTATION_INPUT: "path/dm6.ensGene.gtf.gz",
MOTIF_INPUT: "http://jaspar.genereg.net/download/CORE/JASPAR2020_CORE_insects_non-redundant_pfms_jaspar.txt",
ORGANISM: "Drosophila melanogaster",
PRIMARY_CONTIGS: ["chr2L", "chr2R", "chr3L", "chr3R", "chr4", "chrX", "chrY"],
NON_NUCLEAR_CONTIGS: ["chrM"]

The gtf was downloaded from: http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/genes/dm6.ensGene.gtf.gz
The fasta was downloaded from: http://hgdownload.soe.ucsc.edu/goldenPath/dm6/bigZips/dm6.fa.gz

@timoast
Copy link
Collaborator

timoast commented Mar 16, 2021

This should be fixed now on the develop branch (improved error message)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants