group.by in DotPlot seems not working #158

Closed
jasonleongbio opened this issue Feb 19, 2024 · 6 comments
Labels
bug Something isn't working Not Working As Intended This doesn't seem right

Comments

@jasonleongbio

Thank you for developing such a great tool; I really enjoy using scCustomize.
I was trying to reproduce a "(multi-sample) data integration" example using Seurat v5 and visualize the results with scCustomize.
I was basically following the tutorial by the Seurat team.
However, when I tried to visualize several selected differentially expressed genes in the control vs. stimulated conditions, DotPlot_scCustom did not show the two conditions separately, although the documentation says it accepts a group.by option.

library(Seurat)
library(SeuratData)
library(scCustomize)
library(ggplot2)  # needed for coord_fixed() below

# Reproducible example "ifnb" from SeuratData
options(timeout = 240)
InstallData("ifnb")
ifnb <- LoadData("ifnb")

# Split the data by the stimulation condition
ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)

# Set the seed for reproducibility
set.seed(123)

# Normalization, etc.
ifnb <- NormalizeData(ifnb)
ifnb <- FindVariableFeatures(ifnb)
ifnb <- ScaleData(ifnb)
ifnb <- RunPCA(ifnb)

# Integration
ifnb <- IntegrateLayers(
    object = ifnb, 
    method = CCAIntegration, 
    orig.reduction = "pca", 
    new.reduction = "integrated.cca",
    verbose = TRUE)
ifnb[["RNA"]] <- JoinLayers(ifnb[["RNA"]])

# Identify clusters
ifnb <- FindNeighbors(ifnb, reduction = "integrated.cca", dims = 1:20)
ifnb <- FindClusters(ifnb, resolution = 1)
ifnb <- RunUMAP(ifnb, dims = 1:20, reduction = "integrated.cca")

# Visualize the 19 clusters
DimPlot_scCustom(
    ifnb, 
    reduction = "umap", 
    group.by = c("seurat_clusters", "stim"), 
    colors_use = DiscretePalette_scCustomize(
        num_colors = length(unique(Idents(ifnb))),
        palette = "varibow",
        shuffle = TRUE,
        seed = 4)
    ) & coord_fixed()

# Marker identification
# Not directly related here. Skip the code here.

Then, I tried to visualize the expression levels and percentage of expression of given genes using DotPlot_scCustom().

DotPlot_scCustom(
    seurat_object = ifnb, 
    features = c("HBB", "HBA1", "CD3D", "S100A9")
) 

This is just fine, and it will simply show a DotPlot of these selected genes in each identified cluster.
[Image: cluster_16_test_1]

However, when I tried to visualize this for each condition separately, as the Seurat package does, I got exactly the same plot. I expected the y-axis labels to become "1_CTRL", "1_STIM", "2_CTRL", "2_STIM", etc., but they were still 1, 2, 3, etc.

DotPlot_scCustom(
    seurat_object = ifnb, 
    features = c("HBB", "HBA1", "CD3D", "S100A9"),
    group.by = "stim"
)

In addition, I also tried with the option split.by, but it didn't work either.
I also tried to create a new column in the metadata that combines the cluster identity and the ctrl/stim conditions by the following code, and tried to set group.by to this newly created column. However, it didn't work either.

ifnb$seurat_clusters_stim <- paste0(ifnb$seurat_clusters, "_", ifnb$stim)
head(ifnb@meta.data)
#                  orig.ident nCount_RNA nFeature_RNA stim seurat_annotations unintegrated.clusters seurat_clusters RNA_snn_res.1 seurat_clusters_stim
#AAACATACATTTCC.1 IMMUNE_CTRL       3017          877 CTRL          CD14 Mono                     0               3             3               3_CTRL
#AAACATACCAGAAA.1 IMMUNE_CTRL       2481          713 CTRL          CD14 Mono                     0               1             1               1_CTRL
#AAACATACCTCGCT.1 IMMUNE_CTRL       3420          850 CTRL          CD14 Mono                     0               3             3               3_CTRL
#AAACATACCTGGTA.1 IMMUNE_CTRL       3156         1109 CTRL                pDC                    17              14            14              14_CTRL
#AAACATACGATGAA.1 IMMUNE_CTRL       1868          634 CTRL       CD4 Memory T                     3               0             0               0_CTRL
#AAACATACGGCATT.1 IMMUNE_CTRL       1581          557 CTRL          CD14 Mono                     0               5             5               5_CTRL

DotPlot_scCustom(
    seurat_object = ifnb, 
    features = c("HBB", "HBA1", "CD3D", "S100A9"),
    group.by = "seurat_clusters_stim",
)

Together, these results suggest that neither group.by nor split.by is working as expected in DotPlot_scCustom().

Surprisingly, however, Clustered_DotPlot() handles the same group.by column correctly.

Clustered_DotPlot(
    seurat_object = ifnb, 
    features = c("HBB", "HBA1", "CD3D", "S100A9"),
    group.by = "seurat_clusters_stim",
)

[Image: ClusterPlot]

I'm pleased to provide further information if needed.
Thank you so much in advance!

Jason.

sessionInfo() output
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Asia/Tokyo
tzcode source: internal

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] panc8.SeuratData_3.0.2 ifnb.SeuratData_3.1.0  SeuratData_0.2.2.9001  scCustomize_2.0.1      harmony_1.2.0          Rcpp_1.0.12           
 [7] SeuratDisk_0.0.0.9021  Seurat_5.0.1           SeuratObject_5.0.1     sp_2.1-3               colorspace_2.1-0       RColorBrewer_1.1-3    
[13] viridis_0.6.5          viridisLite_0.4.2      cowplot_1.1.3          glue_1.7.0             lubridate_1.9.3        forcats_1.0.0         
[19] purrr_1.0.2            tibble_3.2.1           ggplot2_3.4.4          tidyverse_2.0.0        stringr_1.5.1          dplyr_1.1.4           
[25] tidyr_1.3.1            readr_2.1.5            renv_1.0.3            

loaded via a namespace (and not attached):
  [1] RcppAnnoy_0.0.22       splines_4.3.1          later_1.3.2            prismatic_1.1.1        polyclip_1.10-6        janitor_2.2.0         
  [7] fastDummies_1.7.3      lifecycle_1.0.4        Rdpack_2.6             doParallel_1.0.17      globals_0.16.2         lattice_0.22-5        
 [13] hdf5r_1.3.9            MASS_7.3-60.0.1        magrittr_2.0.3         limma_3.56.2           plotly_4.10.4          plotrix_3.8-4         
 [19] qqconf_1.3.2           httpuv_1.6.14          sn_2.1.1               sctransform_0.4.1      spam_2.10-0            spatstat.sparse_3.0-3 
 [25] reticulate_1.35.0      pbapply_1.7-2          multcomp_1.4-25        abind_1.4-5            Rtsne_0.17             presto_1.0.0          
 [31] BiocGenerics_0.48.1    TH.data_1.1-2          sandwich_3.1-0         rappdirs_0.3.3         circlize_0.4.15        S4Vectors_0.40.2      
 [37] IRanges_2.36.0         ggrepel_0.9.5          irlba_2.3.5.1          listenv_0.9.1          spatstat.utils_3.0-4   TFisher_0.2.0         
 [43] goftest_1.2-3          RSpectra_0.16-1        spatstat.random_3.2-2  fitdistrplus_1.1-11    parallelly_1.36.0      leiden_0.4.3.1        
 [49] codetools_0.2-19       tidyselect_1.2.0       shape_1.4.6            farver_2.1.1           stats4_4.3.1           matrixStats_1.2.0     
 [55] spatstat.explore_3.2-6 mathjaxr_1.6-0         jsonlite_1.8.8         GetoptLong_1.0.5       multtest_2.56.0        ellipsis_0.3.2        
 [61] progressr_0.14.0       iterators_1.0.14       ggridges_0.5.6         survival_3.5-7         systemfonts_1.0.5      foreach_1.5.2         
 [67] ragg_1.2.7             ica_1.0-3              mnormt_2.1.1           gridExtra_2.3          metap_1.9              numDeriv_2016.8-1.1   
 [73] withr_3.0.0            fastmap_1.1.1          fansi_1.0.6            digest_0.6.34          timechange_0.3.0       R6_2.5.1              
 [79] mime_0.12              ggprism_1.0.4          textshaping_0.3.7      scattermore_1.2        tensor_1.5             spatstat.data_3.0-4   
 [85] utf8_1.2.4             generics_0.1.3         data.table_1.15.0      httr_1.4.7             htmlwidgets_1.6.4      uwot_0.1.16           
 [91] pkgconfig_2.0.3        gtable_0.3.4           ComplexHeatmap_2.16.0  lmtest_0.9-40          htmltools_0.5.7        dotCall64_1.1-1       
 [97] clue_0.3-65            Biobase_2.62.0         scales_1.3.0           png_0.1-8              snakecase_0.11.1       rstudioapi_0.15.0     
[103] rjson_0.2.21           tzdb_0.4.0             reshape2_1.4.4         nlme_3.1-164           zoo_1.8-12             GlobalOptions_0.1.2   
[109] KernSmooth_2.23-22     parallel_4.3.1         miniUI_0.1.1.1         vipor_0.4.7            ggrastr_1.0.2          pillar_1.9.0          
[115] grid_4.3.1             vctrs_0.6.5            RANN_2.6.1             promises_1.2.1         xtable_1.8-4           cluster_2.1.6         
[121] beeswarm_0.4.0         paletteer_1.6.0        mvtnorm_1.2-4          cli_3.6.2              compiler_4.3.1         rlang_1.1.3           
[127] crayon_1.5.2           mutoss_0.1-13          future.apply_1.11.1    labeling_0.4.3         rematch2_2.1.2         plyr_1.8.9            
[133] ggbeeswarm_0.7.2       stringi_1.8.3          deldir_2.0-2           munsell_0.5.0          lazyeval_0.2.2         spatstat.geom_3.2-8   
[139] Matrix_1.6-5           RcppHNSW_0.6.0         hms_1.1.3              patchwork_1.2.0        bit64_4.0.5            future_1.33.1         
[145] shiny_1.8.0            rbibutils_2.2.16       ROCR_1.0-11            igraph_2.0.1.1         bit_4.0.5
@samuel-marsh
Owner

Hi Jason,

Thanks for the kind words and the detailed issue report! You are correct: group.by is definitely not functioning correctly, and I will work on that. I'm in the middle of the CRAN submission for v2.1.0, so once that is settled I'll work on the fix.

As for split.by: v2.1.0 adds support in Clustered_DotPlot for using split.by and group.by in the same plot while maintaining the expression information (which Seurat does not).

[Image]

You can check that out in the "develop" branch; the full release should hopefully be on CRAN in the next few days.

Once the CRAN release goes through I'll work on the DotPlot_scCustom fix and post here when it's ready.

Best,
Sam

@samuel-marsh samuel-marsh self-assigned this Feb 19, 2024
@samuel-marsh samuel-marsh added bug Something isn't working Not Working As Intended This doesn't seem right labels Feb 19, 2024
@samuel-marsh
Owner

Hi Jason,

The fix for this is now live in the develop branch and will be part of the official v2.1.0 release (hopefully approved later this week). If you have any issues after updating to the develop branch or after the official release, please let me know and I'll reopen the issue here.

Thanks!
Sam

@jasonleongbio
Author

Hi Sam @samuel-marsh ,

Thank you so much for the bug fix!
I have updated the package (using version 2.1.1 downloaded from CRAN), and tried to plot the figure again.

It seems that now the option group.by works, if I utilize the info in the "seurat_clusters_stim" column, which I created to combine the information from two separate columns "seurat_clusters" and "stim".

# scCustomize version 2.1.1
DotPlot_scCustom(
    seurat_object = ifnb, 
    features = c("HBB", "HBA1", "CD3D", "S100A9"),
    group.by = "seurat_clusters_stim",
)

[Image: markers clusters_per_stim]

  • As a reference for other interested users: the order on the y-axis is a bit odd. The factor levels may need to be reset before generating the plot.
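
One way to reset the factor levels mentioned above, sketched on a toy data frame rather than the real object: sort the combined labels by numeric cluster first and by condition second. The `meta` data frame and the helper name `order_cluster_condition_levels` are hypothetical; the column names follow the example earlier in this thread.

```r
# Toy stand-in for ifnb@meta.data (hypothetical values).
meta <- data.frame(
  seurat_clusters = c("10", "2", "0", "2", "10", "0"),
  stim = c("CTRL", "STIM", "CTRL", "CTRL", "STIM", "STIM")
)
meta$seurat_clusters_stim <- paste0(meta$seurat_clusters, "_", meta$stim)

# Hypothetical helper: order "<cluster>_<condition>" labels by numeric
# cluster first, then by condition.
order_cluster_condition_levels <- function(labels) {
  labels <- unique(as.character(labels))
  cluster <- as.numeric(sub("_.*$", "", labels))
  condition <- sub("^[^_]*_", "", labels)
  labels[order(cluster, condition)]
}

lvls <- order_cluster_condition_levels(meta$seurat_clusters_stim)
meta$seurat_clusters_stim <- factor(meta$seurat_clusters_stim, levels = lvls)
levels(meta$seurat_clusters_stim)
```

On the real object the same idea would be `ifnb$seurat_clusters_stim <- factor(ifnb$seurat_clusters_stim, levels = lvls)`. Whether DotPlot_scCustom() then follows that order is an assumption that has not been verified here.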

I've also tried to group.by other columns, and it worked.

However, group.by seems to only accept one column at a time. According to the documentation, the expected behavior of the option is

group.by
Name of one or more metadata columns to group (color) cells by (for example, orig.ident); default is the current active.ident of the object.

However, the following call returns an error:

DotPlot_scCustom(
    seurat_object = ifnb, 
    features = c("HBB", "HBA1", "CD3D", "S100A9"),
    group.by = c("seurat_clusters", "stim"),
)
# Error in !is.null(x = group.by) && group.by != "ident" : 
#  'length = 2' in coercion to 'logical(1)'
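
For context on the error above: `group.by != "ident"` is evaluated element-wise, so a two-element `group.by` yields a length-2 logical, and since R 4.3.0 `&&` raises an error when either operand has length greater than one. A minimal sketch in base R follows; the helper `validate_group_by` is hypothetical and not part of scCustomize.

```r
group.by <- c("seurat_clusters", "stim")

# The comparison is vectorised: one result per element of group.by.
length(group.by != "ident")

# Hypothetical guard that fails early with a clearer message.
validate_group_by <- function(group.by) {
  if (!is.null(group.by) && length(group.by) != 1) {
    stop("`group.by` must be a single metadata column name, got ",
         length(group.by), ".", call. = FALSE)
  }
  invisible(group.by)
}

# Capture the message instead of aborting the script.
msg <- tryCatch(
  validate_group_by(group.by),
  error = function(e) conditionMessage(e)
)
msg
```

A check like this only makes the failure easier to read; it does not add support for multiple columns.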

I suggest either adding an example to the documentation that shows how to create an additional meta.data column combining several columns, or changing the function so that group.by accepts more than one column name.

Best,
Jason

@jasonleongbio
Author

Hi Sam @samuel-marsh,

Sorry for the additional message, but I'm wondering whether you have seen my follow-up post, because this issue has already been set to Closed.

In brief, in the current version the group.by option does not accept more than one metadata column name, which does not match what the documentation says. So it seems that the issue has not been completely fixed yet.

Thanks so much in advance.

Best,
Jason.

@samuel-marsh
Owner

Hi @jasonleongbio,

I did see the original message but had not yet had time to respond (one-person team here). After looking at it more closely, you are right: the documentation is wrong, and I have just updated it in the develop branch (v2.1.2.9003).

If you do want to plot two metadata variables, you can use Clustered_DotPlot and specify both the group.by and split.by parameters, or create a combined column as shown above.
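
A sketch of the first option, assuming the `ifnb` object and metadata columns from earlier in this thread and scCustomize v2.1.0 or later. The wrapper name `plot_cluster_by_condition` is hypothetical; `group.by` and `split.by` are the Clustered_DotPlot parameters mentioned above.

```r
# Hypothetical wrapper; needs scCustomize >= 2.1.0 installed when called.
plot_cluster_by_condition <- function(seurat_object, features,
                                      group.by = "seurat_clusters",
                                      split.by = "stim") {
  scCustomize::Clustered_DotPlot(
    seurat_object = seurat_object,
    features = features,
    group.by = group.by,
    split.by = split.by
  )
}

# Usage (not run here; needs the ifnb object built earlier in the thread):
# plot_cluster_by_condition(ifnb, c("HBB", "HBA1", "CD3D", "S100A9"))
```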

Best,
Sam

@jasonleongbio
Author

Hi Sam @samuel-marsh,

Thanks so much for the update! I apologize, as I didn't mean to rush you on this; I was just worried that the message might not have reached your mailbox because the issue had been set to Closed.

I do agree that the Clustered_DotPlot() function could be a better alternative (or even a replacement) for analyzing multiple conditions. Thanks so much for developing this in the package!

Best,
Jason.
