AddMotif returning error in scATAC-seq analysis #1381

Closed
KshitijDeoghar opened this issue Apr 18, 2023 · 3 comments
Labels
bug Something isn't working

KshitijDeoghar commented Apr 18, 2023

@timoast Hi Tim, I am trying to perform motif analysis on a few scATAC-seq datasets, and I get the same error with every Seurat object built from scATAC-seq data. First I retrieve the PFMs:

pfm <- getMatrixSet( x = JASPAR2020, opts = list(collection = "CORE", tax_group = 'vertebrates', all_versions = FALSE) )

After getting the PFMs, I try to add the motifs to the Seurat object with AddMotifs():

Endo.integrated <- AddMotifs( object = Endo.integrated, genome = BSgenome.Mmusculus.UCSC.mm10, pfm = pfm)

This is the error I get:

Error in .getOneSeqFromBSgenomeMultipleSequences(x, name, start, NA, width, :
sequence GL456216.1 not found
In addition: Warning message:
In .merge_two_Seqinfo_objects(x, y) :
Each of the 2 combined objects has sequence levels not in the other:

  • in 'x': chrM, chr1_GL455991_alt, chr1_GL455992_alt, chr1_GL455993_alt, chr1_GL456005_alt, chr1_JH584315_alt, chr1_JH584320_alt, chr1_JH584321_alt, chr1_JH584322_alt, chr2_GL456024_alt, chr3_GL456006_alt, chr3_GL456007_alt, chr3_GL456008_alt, chr3_GL456042_alt, chr3_GL456044_alt, chr3_GL456045_alt, chr3_GL456048_alt, chr3_GL456049_alt, chr3_JH584323_alt, chr4_GL455994_alt, chr4_GL456009_alt, chr4_GL456010_alt, chr4_GL456053_alt, chr4_GL456064_alt, chr4_GL456075_alt, chr4_GL456076_alt, chr4_GL456077_alt, chr4_JH584268_alt, chr4_JH584269_alt, chr4_JH584324_alt, chr4_JH584325_alt, chr4_JH584326_alt, chr5_GL455995_alt, chr5_GL456011_alt, chr6_GL456012_alt, chr6_GL456025_alt, chr6_GL456026_alt, chr6_GL456054_alt, chr6_GL456065_alt, chr6_JH584264_alt, chr7_GL455989_alt, chr7_GL456013_alt, chr7_GL456014_alt, chr8_GL455996_alt, chr8_GL455997_alt, chr10_GL456015_alt, chr11_GL455998_alt, chr11_GL456016_alt, chr11_GL456060_al [... truncated]

The annotations I am using are UCSC-style, so I am wondering what causes this error and how to solve it.

The session info is as follows:

sessionInfo()

R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] motifmatchr_1.20.0 BSgenome.Mmusculus.UCSC.mm10_1.4.3
[3] BSgenome_1.66.3 rtracklayer_1.58.0
[5] Biostrings_2.66.0 XVector_0.38.0
[7] TFBSTools_1.36.0 JASPAR2020_0.99.10
[9] GO.db_3.16.0 org.Mm.eg.db_3.16.0
[11] fgsea_1.24.0 EnhancedVolcano_1.16.0
[13] ggrepel_0.9.3 MAST_1.24.1
[15] SingleCellExperiment_1.20.1 SummarizedExperiment_1.28.0
[17] MatrixGenerics_1.10.0 matrixStats_0.63.0
[19] lubridate_1.9.2 forcats_1.0.0
[21] stringr_1.5.0 purrr_1.0.1
[23] readr_2.1.4 tidyr_1.3.0
[25] tibble_3.2.1 tidyverse_2.0.0
[27] dplyr_1.1.1 harmony_0.1.1
[29] Rcpp_1.0.10 patchwork_1.1.2
[31] ggplot2_3.4.1 EnsDb.Mmusculus.v79_2.99.0
[33] ensembldb_2.22.0 AnnotationFilter_1.22.0
[35] GenomicFeatures_1.50.3 AnnotationDbi_1.60.0
[37] Biobase_2.58.0 GenomicRanges_1.50.2
[39] GenomeInfoDb_1.34.6 IRanges_2.32.0
[41] S4Vectors_0.36.1 BiocGenerics_0.44.0
[43] hdf5r_1.3.8 Signac_1.9.0
[45] SeuratObject_4.1.3 Seurat_4.3.0

loaded via a namespace (and not attached):
[1] utf8_1.2.3 R.utils_2.12.2
[3] spatstat.explore_3.1-0 reticulate_1.28
[5] tidyselect_1.2.0 poweRlaw_0.70.6
[7] RSQLite_2.3.0 htmlwidgets_1.6.2
[9] grid_4.2.3 BiocParallel_1.32.5
[11] Rtsne_0.16 munsell_0.5.0
[13] codetools_0.2-19 ragg_1.2.5
[15] ica_1.0-3 DT_0.27
[17] future_1.32.0 miniUI_0.1.1.1
[19] withr_2.5.0 spatstat.random_3.1-4
[21] colorspace_2.1-0 progressr_0.13.0
[23] filelock_1.0.2 rstudioapi_0.14
[25] ROCR_1.0-11 tensor_1.5
[27] listenv_0.9.0 labeling_0.4.2
[29] GenomeInfoDbData_1.2.9 polyclip_1.10-4
[31] bit64_4.0.5 farver_2.1.1
[33] parallelly_1.35.0 vctrs_0.6.1
[35] generics_0.1.3 timechange_0.2.0
[37] BiocFileCache_2.6.1 R6_2.5.1
[39] bitops_1.0-7 spatstat.utils_3.0-2
[41] cachem_1.0.7 DelayedArray_0.24.0
[43] promises_1.2.0.1 BiocIO_1.8.0
[45] scales_1.2.1 gtable_0.3.3
[47] globals_0.16.2 goftest_1.2-3
[49] seqLogo_1.64.0 rlang_1.1.0
[51] systemfonts_1.0.4 RcppRoll_0.3.0
[53] splines_4.2.3 lazyeval_0.2.2
[55] spatstat.geom_3.1-0 yaml_2.3.7
[57] reshape2_1.4.4 abind_1.4-5
[59] crosstalk_1.2.0 httpuv_1.6.9
[61] tools_4.2.3 ellipsis_0.3.2
[63] jquerylib_0.1.4 RColorBrewer_1.1-3
[65] ggridges_0.5.4 plyr_1.8.8
[67] progress_1.2.2 zlibbioc_1.44.0
[69] RCurl_1.98-1.12 prettyunits_1.1.1
[71] deldir_1.0-6 pbapply_1.7-0
[73] cowplot_1.1.1 zoo_1.8-11
[75] cluster_2.1.4 magrittr_2.0.3
[77] data.table_1.14.8 scattermore_0.8
[79] lmtest_0.9-40 RANN_2.6.1
[81] ProtGenerics_1.30.0 fitdistrplus_1.1-8
[83] hms_1.1.3 mime_0.12
[85] xtable_1.8-4 XML_3.99-0.14
[87] gridExtra_2.3 compiler_4.2.3
[89] biomaRt_2.54.1 KernSmooth_2.23-20
[91] crayon_1.5.2 R.oo_1.25.0
[93] htmltools_0.5.5 later_1.3.0
[95] tzdb_0.3.0 snow_0.4-4
[97] DBI_1.1.3 dbplyr_2.3.2
[99] MASS_7.3-58.2 rappdirs_0.3.3
[101] Matrix_1.5-4 cli_3.6.1
[103] R.methodsS3_1.8.2 parallel_4.2.3
[105] igraph_1.4.1 pkgconfig_2.0.3
[107] TFMPvalue_0.0.9 GenomicAlignments_1.34.0
[109] sp_1.6-0 plotly_4.10.1
[111] spatstat.sparse_3.0-1 xml2_1.3.3
[113] annotate_1.76.0 DirichletMultinomial_1.40.0
[115] bslib_0.4.2 digest_0.6.31
[117] pracma_2.4.2 sctransform_0.3.5
[119] RcppAnnoy_0.0.20 CNEr_1.34.0
[121] spatstat.data_3.0-1 leiden_0.4.3
[123] fastmatch_1.1-3 uwot_0.1.14
[125] restfulr_0.0.15 curl_5.0.0
[127] gtools_3.9.4 shiny_1.7.4
[129] Rsamtools_2.14.0 rjson_0.2.21
[131] lifecycle_1.0.3 nlme_3.1-162
[133] jsonlite_1.8.4 viridisLite_0.4.1
[135] fansi_1.0.4 pillar_1.9.0
[137] lattice_0.20-45 KEGGREST_1.38.0
[139] fastmap_1.1.1 httr_1.4.5
[141] survival_3.5-3 glue_1.6.2
[143] png_0.1-8 bit_4.0.5
[145] sass_0.4.5 stringi_1.7.12
[147] blob_1.2.4 textshaping_0.3.6
[149] caTools_1.18.2 memoise_2.0.1
[151] irlba_2.3.5.1 future.apply_1.10.0

KshitijDeoghar added the bug label Apr 18, 2023
timoast (Collaborator) commented Apr 20, 2023

This is due to having chromosome names in the peak matrix that are not present in the BSgenome object, so the sequence for those regions of the genome cannot be retrieved. This tends to happen for hg38 cellranger-processed data. One solution is to remove peaks that fall on non-standard chromosomes before making the Seurat object.
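As a minimal sketch of the first suggestion (dropping peaks on non-standard chromosomes before building the object), the filter can be expressed in base R on the peak names alone. The peak names below are hypothetical, the "chr-start-end" naming is an assumption about how the matrix rows are labelled, and `standard` is an illustrative list of mouse chromosomes rather than anything taken from Signac:

```r
# Hypothetical peak names in an assumed "chr-start-end" format.
# GL456216.1 is the contig named in the error above; the coordinates are made up.
peaks <- c("chr1-3000-3500", "GL456216.1-100-600",
           "chrX-5000-5500", "JH584304.1-200-700")

# Chromosome name = everything before the first "-"
chrom <- sub("-.*$", "", peaks)

# Standard mouse chromosomes in UCSC naming (chrM left out here by choice)
standard <- paste0("chr", c(1:19, "X", "Y"))

# Keep only peaks that sit on a standard chromosome
keep <- chrom %in% standard
filtered_peaks <- peaks[keep]
print(filtered_peaks)

# With a real counts matrix the rows would be subset the same way:
# counts <- counts[keep, ]
```

In a Bioconductor workflow the same filter is usually written on a GRanges object, for example with `standardChromosomes()` or `keepStandardChromosomes()` from GenomeInfoDb, but the logic is the same as in this sketch.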

Alternatively, since this can be a common issue, I have now updated the AddMotifs() function so that it will ignore regions that are not found in the BSgenome object and just give a warning. You can test this by installing from the develop branch: https://stuartlab.org/signac/articles/install.html#development-version
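The "ignore and warn" behaviour described here can be illustrated roughly as follows. This is not Signac's implementation, only a base R sketch of the idea: `peak_chrom` and `genome_chrom` are hypothetical stand-ins for the sequence names of the peaks and of the BSgenome object.

```r
# Hypothetical sequence names of the peaks (one entry per peak)
peak_chrom <- c("chr1", "chr2", "GL456216.1", "chrX", "GL456216.1")

# Hypothetical stand-in for the sequence names available in the genome object
genome_chrom <- c("chr1", "chr2", "chrX", "chrY", "chrM")

# Sequences used by the peaks but absent from the genome
missing_seqs <- setdiff(unique(peak_chrom), genome_chrom)

# Warn instead of stopping, then carry on with the usable regions only
if (length(missing_seqs) > 0) {
  warning("Skipping regions on sequences not found in the genome: ",
          paste(missing_seqs, collapse = ", "), call. = FALSE)
}
usable <- peak_chrom %in% genome_chrom
print(sum(usable))
```

Running the same `setdiff()` comparison on real data before calling AddMotifs() is also a quick way to see which contigs are responsible for the error.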

timoast added this to the 1.10.0 milestone Apr 20, 2023

KshitijDeoghar (Author) commented

Hi @timoast, I was able to run AddMotifs() using the development version. Thanks for updating the function. However, when I move on to the next step and use FindMotifs():

enriched.motifs <- FindMotifs( object = Endo.integrated, features = top.da.peak )

I get another error:

Selecting background regions to match input sequence characteristics
Matching GC.percent distribution
Error in density.default(x = query.feature[[featmatch]], kernel = "gaussian", :
'x' contains missing values

I am opening another thread for this issue and would kindly request your help. I have seen a similar thread, but I was not able to solve the problem with it.
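For context, the message itself comes from base R's `density()`, which rejects input containing `NA` unless told otherwise. The sketch below reproduces that message with hypothetical GC values. Whether the `NA` values in the real object come from regions whose sequence could not be retrieved is an assumption, not something this thread confirms.

```r
# Hypothetical GC fractions; the NA stands in for a region with no GC value
gc_percent <- c(0.42, 0.51, NA, 0.47, 0.55)

# density() stops on NA input by default; capture the error message
msg <- tryCatch(
  {
    density(gc_percent, kernel = "gaussian")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the NA values first lets the density estimate run
ok <- density(gc_percent[!is.na(gc_percent)], kernel = "gaussian")
print(ok$n)
```

Checking the relevant feature metadata for `NA` values (for example with `anyNA()`) is therefore a reasonable first diagnostic step.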

timoast (Collaborator) commented Apr 22, 2023

Explained in #1388

timoast closed this as completed Apr 22, 2023