FindVariableFeatures uses incorrect binning with selection.method = "mvp" #4712

Closed
fjames003 opened this issue Jun 30, 2021 · 1 comment
Labels
bug Something isn't working


@fjames003

FindVariableFeatures bins features by mean expression incorrectly when selection.method = "mvp" is used with binning.method = "equal_frequency". The first bin always contains the feature with the minimum mean, and only that feature. The remaining n - 1 features are then distributed across the other bins. Because the first bin holds a single feature, it has no valid dispersion spread, so that feature's scaled dispersion is NA. The other bins are compromised as well, since features that belong in bin 1 are scattered among them.

The fix is simple: drop the leading -1 break and take quantiles at num.bin + 1 probabilities instead of num.bin. The code in preprocessing.R should be:

# Original code
    data.x.breaks <- switch(
      EXPR = binning.method,
      'equal_width' = num.bin,
      'equal_frequency' = c(
        -1,
        quantile(
          x = feature.mean[feature.mean > 0],
          probs = seq.int(from = 0, to = 1, length.out = num.bin)
        )
      ),
      stop("Unknown binning method: ", binning.method)
    )
# Updated code
    data.x.breaks <- switch(
      EXPR = binning.method,
      'equal_width' = num.bin,
      'equal_frequency' = quantile(
        x = feature.mean[feature.mean > 0],
        probs = seq.int(from = 0, to = 1, length.out = num.bin + 1)
      ),
      stop("Unknown binning method: ", binning.method)
    )
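
To make the difference between the two break schemes concrete, here is a minimal base R sketch on toy data. It is not Seurat's code: the vector `feature.mean` is simulated, and the `cut()` calls (including `include.lowest = TRUE` for the updated breaks) are assumptions, since the downstream binning call is not shown in this issue.

```r
# Toy stand-in for feature.mean: 100 distinct, strictly positive values (hypothetical data).
set.seed(1)
feature.mean <- sort(runif(100, min = 0.1, max = 5))
num.bin <- 5

# Breaks built as in the original code: -1 followed by num.bin quantile points.
old.breaks <- c(
  -1,
  quantile(x = feature.mean[feature.mean > 0],
           probs = seq.int(from = 0, to = 1, length.out = num.bin))
)

# Breaks built as in the updated code: num.bin + 1 quantile points.
new.breaks <- quantile(x = feature.mean[feature.mean > 0],
                       probs = seq.int(from = 0, to = 1, length.out = num.bin + 1))

# Bin the means. cut() uses right-closed intervals by default, so the old first
# bin is (-1, min] and can only hold the minimum when all means are positive.
old.bins <- cut(x = feature.mean, breaks = old.breaks)

# Assumption: include.lowest = TRUE, so the minimum is not left outside every bin.
new.bins <- cut(x = feature.mean, breaks = new.breaks, include.lowest = TRUE)

print(table(old.bins))
print(table(new.bins))
```

On this toy vector the first bin of the old scheme holds exactly one value, while the updated breaks give five bins of 20 values each. Note that without `include.lowest = TRUE` (or an equivalent adjustment), `cut()` would assign NA to the feature sitting exactly on the lowest break.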

I have included a simple example to demonstrate this issue using SeuratData:

# Example demonstrating the issue
library(Seurat)
library(SeuratData)
data("pbmc3k")
variable_features <- FindVariableFeatures(
  pbmc_small,
  selection.method = "mvp",
  mean.cutoff = c(-Inf, Inf),
  dispersion.cutoff = c(-Inf, Inf),
  binning.method = "equal_frequency"
)
summary(variable_features@assays$RNA@meta.features$mvp.dispersion.scaled)

#   Output from summary:
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
#-1.7211 -0.7458 -0.1070  0.0000  0.5901  2.8493       1 
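
The single NA in the summary is consistent with a one-feature bin. As a rough illustration (a sketch of per-bin z-scoring under assumed inputs, not Seurat's implementation), the values and bin labels below are hypothetical, with bin "a" holding one feature:

```r
# Hypothetical dispersions and bin labels; bin "a" contains a single feature.
dispersion <- c(1.2, 0.8, 1.1, 0.9, 1.5)
bin <- factor(c("a", "b", "b", "c", "c"))

# Per-bin mean and standard deviation, expanded back to one value per feature.
bin.mean <- ave(dispersion, bin, FUN = mean)
bin.sd <- ave(dispersion, bin, FUN = sd)  # sd() of a single value is NA

# Z-score each dispersion within its bin.
scaled <- (dispersion - bin.mean) / bin.sd

print(scaled)
print(sum(is.na(scaled)))
```

The feature in the singleton bin gets NA because `sd()` is undefined for one observation, while features in bins with two or more members get finite scores.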

Session Info:
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] pbmc3k.SeuratData_3.1.4 SeuratData_0.2.1 devtools_2.4.2 usethis_2.0.1
[5] matrixStats_0.59.0 dplyr_1.0.6 SeuratObject_4.0.1 Seurat_4.0.2

loaded via a namespace (and not attached):
[1] Rtsne_0.15 colorspace_2.0-1 deldir_0.2-10 ellipsis_0.3.2
[5] ggridges_0.5.3 rprojroot_2.0.2 fs_1.5.0 rstudioapi_0.13
[9] spatstat.data_2.1-0 leiden_0.3.8 listenv_0.8.0 remotes_2.4.0
[13] ggrepel_0.9.1 fansi_0.5.0 codetools_0.2-18 splines_4.1.0
[17] cachem_1.0.5 polyclip_1.10-0 pkgload_1.2.1 jsonlite_1.7.2
[21] ica_1.0-2 cluster_2.1.2 png_0.1-7 uwot_0.1.10
[25] shiny_1.6.0 sctransform_0.3.2 spatstat.sparse_2.0-0 compiler_4.1.0
[29] httr_1.4.2 Matrix_1.3-3 fastmap_1.1.0 lazyeval_0.2.2
[33] cli_2.5.0 later_1.2.0 htmltools_0.5.1.1 prettyunits_1.1.1
[37] tools_4.1.0 igraph_1.2.6 gtable_0.3.0 glue_1.4.2
[41] RANN_2.6.1 reshape2_1.4.4 rappdirs_0.3.3 Rcpp_1.0.6
[45] scattermore_0.7 vctrs_0.3.8 nlme_3.1-152 lmtest_0.9-38
[49] stringr_1.4.0 globals_0.14.0 ps_1.6.0 testthat_3.0.3
[53] mime_0.10 miniUI_0.1.1.1 lifecycle_1.0.0 irlba_2.3.3
[57] goftest_1.2-2 future_1.21.0 MASS_7.3-54 zoo_1.8-9
[61] scales_1.1.1 spatstat.core_2.1-2 promises_1.2.0.1 spatstat.utils_2.1-0
[65] parallel_4.1.0 RColorBrewer_1.1-2 curl_4.3.2 memoise_2.0.0
[69] reticulate_1.20 pbapply_1.4-3 gridExtra_2.3 ggplot2_3.3.3
[73] rpart_4.1-15 stringi_1.6.1 desc_1.3.0 pkgbuild_1.2.0
[77] rlang_0.4.11 pkgconfig_2.0.3 lattice_0.20-44 ROCR_1.0-11
[81] purrr_0.3.4 tensor_1.5 patchwork_1.1.1 htmlwidgets_1.5.3
[85] cowplot_1.1.1 tidyselect_1.1.1 processx_3.5.2 parallelly_1.26.0
[89] RcppAnnoy_0.0.18 plyr_1.8.6 magrittr_2.0.1 R6_2.5.0
[93] generics_0.1.0 pillar_1.6.1 withr_2.4.2 mgcv_1.8-35
[97] fitdistrplus_1.1-5 survival_3.2-11 abind_1.4-5 tibble_3.1.2
[101] future.apply_1.7.0 crayon_1.4.1 KernSmooth_2.23-20 utf8_1.2.1
[105] spatstat.geom_2.1-0 plotly_4.9.4 grid_4.1.0 data.table_1.14.0
[109] callr_3.7.0 digest_0.6.27 xtable_1.8-4 tidyr_1.1.3
[113] httpuv_1.6.1 munsell_0.5.0 viridisLite_0.4.0 sessioninfo_1.1.1

@fjames003 fjames003 added the bug Something isn't working label Jun 30, 2021
@andrewwbutler
Collaborator

Hi @fjames003 ,

Thanks for the detailed report! This should be fixed now on the develop branch. Please see installation instructions here.
