Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error adding gene names to the differentially accessible peaks #1337

Open
KshitijDeoghar opened this issue Feb 14, 2023 · 2 comments
Open

Error adding gene names to the differentially accessible peaks #1337

KshitijDeoghar opened this issue Feb 14, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@KshitijDeoghar
Copy link

I am frequently running into this problem where I want to add the gene names, gene biotypes and other features to the differentially accessible peaks in a single-cell ATAC-seq data analysis of two experimental groups.

When I use the following code

da_peaks_Spleen <- FindMarkers( object = data_Spleen, ident.1 = rownames(data_Spleen[[]][data_Spleen$orig.ident == "ATAC_SpleenD1",]), ident.2 = rownames(data_Spleen[[]][data_Spleen$orig.ident == "ATAC_SpleenD7",]), min.pct = 0.05, test.use = 'LR', latent.vars = 'peak_region_fragments' )

I get the object da_peaks_Spleen.

The next step I do is to find the closest features using the following code -

ClosestFeature(data_Spleen, regions = rownames(da_peaks_Spleen))

This returns me the following error message:

Warning message:
In ClosestFeature(data_Spleen, regions = rownames(da_peaks_Spleen)) :
The following seqlevels present in query regions are not present
in the supplied gene annotations and will be removed: GL456216.1, JH584295.1, JH584304.1

I get the same error message (as above) after each of the codes when I run the following codes to extract gene_name and gene_biotype

gene <- ClosestFeature(data_Spleen, regions = rownames(da_peaks_Spleen))$gene_name

biotype <- ClosestFeature(data_Spleen, regions = rownames(da_peaks_Spleen))$gene_biotype

Although I get a list of genes and biotypes, the main problem comes when I try to add this gene or biotype to the da_peaks_Spleen object using this code-

da_peaks_Spleen$gene <- gene

Error in $<-.data.frame(*tmp*, gene, value = c("Gm20546", "Gm26917", :
replacement has 2282 rows, data has 2288

I also get the same error when I am trying to add Biotype and other features.

sessionInfo()

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8 LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] tibble_3.1.8 EnhancedVolcano_1.16.0 ggrepel_0.9.2
[4] MAST_1.24.1 SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0
[7] MatrixGenerics_1.10.0 matrixStats_0.63.0 patchwork_1.1.2
[10] ggplot2_3.4.0 EnsDb.Mmusculus.v79_2.99.0 ensembldb_2.22.0
[13] AnnotationFilter_1.22.0 GenomicFeatures_1.50.3 AnnotationDbi_1.60.0
[16] Biobase_2.58.0 GenomicRanges_1.50.2 GenomeInfoDb_1.34.6
[19] IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0
[22] hdf5r_1.3.8 Signac_1.9.0 SeuratObject_4.1.3
[25] Seurat_4.3.0

loaded via a namespace (and not attached):
[1] utf8_1.2.2 spatstat.explore_3.0-5 reticulate_1.27
[4] tidyselect_1.2.0 RSQLite_2.2.20 htmlwidgets_1.6.1
[7] grid_4.2.2 BiocParallel_1.32.5 Rtsne_0.16
[10] devtools_2.4.5 munsell_0.5.0 codetools_0.2-18
[13] ragg_1.2.5 ica_1.0-3 interp_1.1-3
[16] future_1.31.0 miniUI_0.1.1.1 withr_2.5.0
[19] spatstat.random_3.1-3 colorspace_2.1-0 progressr_0.13.0
[22] filelock_1.0.2 knitr_1.42 rstudioapi_0.14
[25] ROCR_1.0-11 tensor_1.5 listenv_0.9.0
[28] labeling_0.4.2 GenomeInfoDbData_1.2.9 polyclip_1.10-4
[31] bit64_4.0.5 farver_2.1.1 parallelly_1.34.0
[34] vctrs_0.5.2 generics_0.1.3 xfun_0.37
[37] biovizBase_1.46.0 BiocFileCache_2.6.0 R6_2.5.1
[40] bitops_1.0-7 spatstat.utils_3.0-1 cachem_1.0.6
[43] DelayedArray_0.24.0 assertthat_0.2.1 promises_1.2.0.1
[46] BiocIO_1.8.0 scales_1.2.1 nnet_7.3-18
[49] gtable_0.3.1 globals_0.16.2 processx_3.8.0
[52] goftest_1.2-3 rlang_1.0.6 systemfonts_1.0.4
[55] RcppRoll_0.3.0 splines_4.2.2 rtracklayer_1.58.0
[58] lazyeval_0.2.2 dichromat_2.0-0.1 checkmate_2.1.0
[61] spatstat.geom_3.0-5 BiocManager_1.30.19 yaml_2.3.7
[64] reshape2_1.4.4 abind_1.4-5 backports_1.4.1
[67] httpuv_1.6.8 Hmisc_4.7-2 usethis_2.1.6
[70] tools_4.2.2 ellipsis_0.3.2 RColorBrewer_1.1-3
[73] sessioninfo_1.2.2 ggridges_0.5.4 Rcpp_1.0.10
[76] plyr_1.8.8 base64enc_0.1-3 progress_1.2.2
[79] zlibbioc_1.44.0 purrr_1.0.1 RCurl_1.98-1.9
[82] ps_1.7.2 prettyunits_1.1.1 rpart_4.1.19
[85] deldir_1.0-6 pbapply_1.7-0 urlchecker_1.0.1
[88] cowplot_1.1.1 zoo_1.8-11 cluster_2.1.4
[91] fs_1.6.0 magrittr_2.0.3 data.table_1.14.6
[94] scattermore_0.8 lmtest_0.9-40 RANN_2.6.1
[97] ProtGenerics_1.30.0 fitdistrplus_1.1-8 pkgload_1.3.2
[100] hms_1.1.2 mime_0.12 xtable_1.8-4
[103] XML_3.99-0.13 jpeg_0.1-10 gridExtra_2.3
[106] compiler_4.2.2 biomaRt_2.54.0 KernSmooth_2.23-20
[109] crayon_1.5.2 htmltools_0.5.4 later_1.3.0
[112] Formula_1.2-4 tidyr_1.3.0 DBI_1.1.3
[115] dbplyr_2.3.0 MASS_7.3-58.1 rappdirs_0.3.3
[118] Matrix_1.5-4 cli_3.6.0 parallel_4.2.2
[121] igraph_1.3.5 pkgconfig_2.0.3 GenomicAlignments_1.34.0
[124] foreign_0.8-83 sp_1.6-0 plotly_4.10.1
[127] spatstat.sparse_3.0-0 xml2_1.3.3 XVector_0.38.0
[130] VariantAnnotation_1.44.0 stringr_1.5.0 callr_3.7.3
[133] digest_0.6.31 sctransform_0.3.5 RcppAnnoy_0.0.20
[136] spatstat.data_3.0-0 Biostrings_2.66.0 leiden_0.4.3
[139] fastmatch_1.1-3 htmlTable_2.4.1 uwot_0.1.14
[142] restfulr_0.0.15 curl_5.0.0 shiny_1.7.4
[145] Rsamtools_2.14.0 rjson_0.2.21 lifecycle_1.0.3
[148] nlme_3.1-160 jsonlite_1.8.4 BSgenome_1.66.2
[151] viridisLite_0.4.1 fansi_1.0.4 pillar_1.8.1
[154] lattice_0.20-45 KEGGREST_1.38.0 fastmap_1.1.0
[157] httr_1.4.4 pkgbuild_1.4.0 survival_3.4-0
[160] remotes_2.4.2 glue_1.6.2 png_0.1-8
[163] bit_4.0.5 profvis_0.3.7 stringi_1.7.12
[166] blob_1.2.3 textshaping_0.3.6 latticeExtra_0.6-30
[169] memoise_2.0.1 dplyr_1.0.10 irlba_2.3.5.1
[172] future.apply_1.10.0

@KshitijDeoghar KshitijDeoghar added the bug Something isn't working label Feb 14, 2023
@timoast
Copy link
Collaborator

timoast commented Mar 30, 2023

This returns me the following error message:

Warning message:
In ClosestFeature(data_Spleen, regions = rownames(da_peaks_Spleen)) :
The following seqlevels present in query regions are not present
in the supplied gene annotations and will be removed: GL456216.1, JH584295.1, JH584304.1

Ok, this is not an error but a warning saying that some seqlevels in the query regions are not present in the gene annotations. You cannot find a closest feature for these since they're on a chromosome or scaffold that you don't have information for

Although I get a list of genes and biotypes, the main problem comes when I try to add this gene or biotype to the da_peaks_Spleen object using this code-

da_peaks_Spleen$gene <- gene

Since some query regions are removed (the warning message above), there are rows missing from the output dataframe, so you can't just try to assign a column from that dataframe directly to the differential accessibility results. They have different length.

I wouldn't say this is a bug per se, but we could change the ClosestFeature function to still return values for regions that are not present in the annotations, putting NA.

@KshitijDeoghar
Copy link
Author

@timoast Thank you for the explanation. It is reassuring to know that it is not a bug, but could be modified to make the ClosestFeature returning NA values.

For the time being, I have used tibble to add the gene and biotype to the da_peaks object by merging them.

library('tibble')

da_peaks_Spleen <- da_peaks_Spleen %>% rownames_to_column(var="query_region")

da_peaks_Spleen <- merge(da_peaks_Spleen, features_data_Spleen, by = c("query_region"))

This approach works, but if ClosestFeature function is modified, it might be a bit more efficient.

Thanks a lot.

@timoast timoast added enhancement New feature or request and removed bug Something isn't working labels Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants