running gpsFISH without spatial count and spatial cluster #18

justin512lee · 2023-03-29T17:52:28Z

Hi,

Is it possible to run gpsFISH without spatial count and spatial cluster datasets (or can it run with spatial count and cluster datasets provided in the tutorial)? I am working with the datasets from this paper and have sc_counts and sc_clusters. Any help would be appreciated!

YidaZhang0628 · 2023-03-30T00:39:28Z

Can you elaborate a bit more on what you plan to do? If you want to design a gene panel for MERFISH/osmFISH/DARTFISH, you can run gpsFISH with only single cell RNA-seq because we have already trained Bayesian models for those three platforms. If you are designing gene panels for other platforms and you don't have spatial data, you can run gpsFISH without simulation. This way, gene panels will be evaluated using scRNA-seq data only.

justin512lee · 2023-03-30T19:51:22Z

Hi @YidaZhang0628. Thank you for your quick response. I want to design a gene panel for MERFISH. So I need to run the Platform effect estimation tutorial to get simulation_params and then complete the Gene Panel Selection tutorial using my own scRNA-seq data. Is this correct?

YidaZhang0628 · 2023-03-30T20:26:04Z

If you have scRNA-seq data and MERFISH data from your target tissue, then that will be the suggestion. But if you don't, we have a pre-trained Bayesian model for MERFISH. You can find it here. With the pre-trained model, you can skip the platform effect estimation step and directly run gene panel selection.

justin512lee · 2023-03-30T22:02:45Z

Thank you for your help. I was able to run everything except the gpsFISH_optimize function. I'm getting this message: Error in plogis(x) : Non-numeric argument to mathematical function

Thanks again for your help.

pakiessling · 2023-03-31T10:14:14Z

@YidaZhang0628 do you recommend using the pre-trained models even for tissues not in the training dataset (i.e. non mouse hippocampus for MERFISH)? Do the encoded platform effects generalize well?

YidaZhang0628 · 2023-03-31T13:47:24Z

@justin512lee Can you provide a bit more information? Are you getting this error when running the tutorial or on your own data? The first thing I would check is to make sure all the input files have the same data type and format as in the tutorial. If the error is still there, can you provide your code and a small dataset so that I can reproduce the error?

YidaZhang0628 · 2023-03-31T13:49:23Z

@pakiessling This is something we are planning to check but right now, we don't know how generalizable the platform effects are for the same genes in the same platform but across different tissues. Therefore, if you can find paired scRNA-seq data and spatial data in your target tissue, it is still recommended to train the Bayesian model first.

Boehmin · 2024-01-31T17:39:46Z

I have snRNA-seq data and no spatial data so my use case falls into a similar category as above I believe. Until now everything ran reasonably smoothly following the tutorial. I am stuck at the gpsFISH_optimize step.

The below is my error message regarding NA in the probability vector.

GA = gpsFISH_optimize(earlyterm = 10,
+                       converge.cutoff = 0.01,
+                       n = dim(my_sc_count)[1],
+                       k = panel_size,
+                       ngen = 10,
+                       popsize = pop_size,
+                       verbose = 1,
+                       cluster = 40, # number of cores
+                       initpop = initpop,
+                       method = "NaiveBayes",
+                       metric = "Accuracy",
+                       nCV = 5,
+                       rate = 1,
+                       cluster_size_max = 50,
+                       cluster_size_min = 30,
+                       two_step_sampling_type = c("Subsampling_by_cluster", "Simulation"),
+                       simulation_model = "ZINB",
+                       sample_new_levels = "old_levels",
+                       use_average_cluster_profiles = FALSE,
+                       save.intermediate = TRUE,
+                       full_count_table = as.data.frame(t(my_sc_count)),
+                       cell_cluster_conversion = my_sc_cluster,       
+                       relative_prop = relative_prop,
+                       simulation_parameter = sim_parameters,
+                       gene2include.id = gene2include.id,
+                       gene.weight = gene.weight,
+                       weight_penalty = weight_penalty
+ )
Initial population best OF value =  0.5831973 . Population diversity =  75.98061 
Error in sample.int(length(x), size, replace, prob) : 
  NA in probability vector

I use the same values for pop_size, panel_size etc. as in the tutorial. The only difference is that I use our data for sc_count and sc_cluster. I tried to make sure the data is formatted as expected throughout.

Have you encountered this error previously? Does this have to do with the sim_parameters? Or my data? Thanks for your help in advance!

YidaZhang0628 · 2024-02-01T08:48:50Z

I have snRNA-seq data and no spatial data so my use case falls into a similar category as above I believe. Until now everything ran reasonably smoothly following the tutorial. I am stuck at the gpsFISH_optimize step.

The below is my error message regarding NA in the probability vector.

GA = gpsFISH_optimize(earlyterm = 10,
+                       converge.cutoff = 0.01,
+                       n = dim(my_sc_count)[1],
+                       k = panel_size,
+                       ngen = 10,
+                       popsize = pop_size,
+                       verbose = 1,
+                       cluster = 40, # number of cores
+                       initpop = initpop,
+                       method = "NaiveBayes",
+                       metric = "Accuracy",
+                       nCV = 5,
+                       rate = 1,
+                       cluster_size_max = 50,
+                       cluster_size_min = 30,
+                       two_step_sampling_type = c("Subsampling_by_cluster", "Simulation"),
+                       simulation_model = "ZINB",
+                       sample_new_levels = "old_levels",
+                       use_average_cluster_profiles = FALSE,
+                       save.intermediate = TRUE,
+                       full_count_table = as.data.frame(t(my_sc_count)),
+                       cell_cluster_conversion = my_sc_cluster,       
+                       relative_prop = relative_prop,
+                       simulation_parameter = sim_parameters,
+                       gene2include.id = gene2include.id,
+                       gene.weight = gene.weight,
+                       weight_penalty = weight_penalty
+ )
Initial population best OF value =  0.5831973 . Population diversity =  75.98061 
Error in sample.int(length(x), size, replace, prob) : 
  NA in probability vector

I use the same values for pop_size, panel_size etc. as in the tutorial. The only difference is that I use our data for sc_count and sc_cluster. I tried to make sure the data is formatted as expected throughout.

Have you encountered this error previously? Does this have to do with the sim_parameters? Or my data? Thanks for your help in advance!

Do you mind providing me the dimensions of your scRNA-seq data and also the number of cell types?

Boehmin · 2024-02-01T09:22:09Z

Thanks for such a quick reply!
These are my dimensions/nr of cells/nuclei, I have 18 class_labels:

> dim(my_sc_cluster)
[1] 81389     2
> dim(my_sc_count)
[1]  1356 81389

EDIT:
I found the error!

I already had to deal with NAs when normalising over the sum(weight.list), I did the following:
weight.list = weight.list / sum(weight.list, na.rm = TRUE)
I was still left with one NA that I did not find until just now (I assumed the above trick took care of it, my bad). I had followed the tutorial by censoring the probe count, using the probe_count file that came with the gpsFISH package. The NA appeared only after the normalisation. I now replaced the value with a value that occurred most regularly for other targets and was able to run the rest of the code without any issues at all! Thank you.

simomounir · 2024-02-05T13:55:45Z

I think this can go under the same issue above:

I have my own single-cell refence data (breast cancer dataset, ~70k cell) and I have my spatial data based on sm-FISH (from breast cancer tissues as well) with a gene expression matrix per segmented cell. How can I use gpsFISH to perform label transfer from reference single-cell data to spatial data? I see some confusion matrices, but they were calculated from predictions of cell types of single-cell data right? Should I also skip the Platform-estimation phase?

If there is any devised method to follow, let me know. Thanks in advance!

YidaZhang0628 · 2024-02-06T09:09:36Z

Thanks for such a quick reply! These are my dimensions/nr of cells/nuclei, I have 18 class_labels:
> dim(my_sc_cluster)
[1] 81389     2
> dim(my_sc_count)
[1]  1356 81389
EDIT: I found the error!

I already had to deal with NAs when normalising over the sum(weight.list), I did the following:
weight.list = weight.list / sum(weight.list, na.rm = TRUE)

I was still left with one NA that I did not find until just now (I assumed the above trick took care of it, my bad). I had followed the tutorial by censoring the probe count, using the probe_count file that came with the gpsFISH package. The NA appeared only after the normalisation. I now replaced the value with a value that occurred most regularly for other targets and was able to run the rest of the code without any issues at all! Thank you.

Thank you for letting me know! I was on vacation for the past few days and couldn't get back to you. I am happy to know that the problem has been solved. Please don't hesitate to reach out if you have any further questions.

One thing I noticed is that you mentioned you are using the probe_count file in the gpsFISH package. That information is specific to DARTFISH. Therefore, if you are designing your gene panel for a different technology (like MERFISH), that probe_count file is likely to be non-informative because the probe design mechanisms used by different technologies are different. I just want to point this out in case this information is relevant.

simomounir · 2024-02-06T09:12:00Z

Hi there,

Let me know if there is any update on this #18 (comment)

Thanks a lot!

YidaZhang0628 · 2024-02-06T09:18:50Z

I think this can go under the same issue above:

I have my own single-cell refence data (breast cancer dataset, ~70k cell) and I have my spatial data based on sm-FISH (from breast cancer tissues as well) with a gene expression matrix per segmented cell. How can I use gpsFISH to perform label transfer from reference single-cell data to spatial data? I see some confusion matrices, but they were calculated from predictions of cell types of single-cell data right? Should I also skip the Platform-estimation phase?

If there is any devised method to follow, let me know. Thanks in advance!

@simomounir To make sure I understand your goal correctly, you have scRNA-seq data that is already annotated and you have smFISH data that is not annotated. You want to annotate the smFISH data using your scRNA-seq data, is that correct? If yes, then there are many methods for this task, for example, label transfer from Seurat. What gpsFISH does is to select gene panels if you want to design a new spatial transcriptomics experiment. Both the platform effect estimation and gene panel selection steps rely on data with known labels. For example, the platform effect estimation step requires both input scRNA-seq and spatial data to be annotated. The gene panel selection step requires the input scRNA-seq data to be annotated. Let me know if this addresses your question.

simomounir · 2024-02-06T09:22:25Z

I think this can go under the same issue above:
I have my own single-cell refence data (breast cancer dataset, ~70k cell) and I have my spatial data based on sm-FISH (from breast cancer tissues as well) with a gene expression matrix per segmented cell. How can I use gpsFISH to perform label transfer from reference single-cell data to spatial data? I see some confusion matrices, but they were calculated from predictions of cell types of single-cell data right? Should I also skip the Platform-estimation phase?
If there is any devised method to follow, let me know. Thanks in advance!

@simomounir To make sure I understand your goal correctly, you have scRNA-seq data that is already annotated and you have smFISH data that is not annotated. You want to annotate the smFISH data using your scRNA-seq data, is that correct? If yes, then there are many methods for this task, for example, label transfer from Seurat. What gpsFISH does is to select gene panels if you want to design a new spatial transcriptomics experiment. Both the platform effect estimation and gene panel selection steps rely on data with known labels. For example, the platform effect estimation step requires both input scRNA-seq and spatial data to be annotated. The gene panel selection step requires the input scRNA-seq data to be annotated. Let me know if this addresses your question.

Yes indeed, that addresses my question perfectly. I was trying to see if I can harvest gpsFISH functionalities to predict unknown cell labels. I have a labeled smFISH dataset though, I was just trying to test the prediction module.

Thanks a lot though for your prompt answer! I will be using gpsFISH for designing some experiments.

evaknichols · 2024-02-18T21:29:57Z

Hello! Thank you for producing this tool, which I think has great potential for reviving an earlier thread in my research. I'm so excited to get it working! I am posting here because I think my question also falls under this open issue, and perhaps its discussion can also help others.

I'm working with a subset of snRNA-seq data from the Mouse Organogenesis Cell Atlas (MOCA) here. It has 265124 cells and 24552 genes. For now, I have no spatial transcriptomics dataset that I want to use; I simply want to see what selected gene panel looks like for the 11 main trajectory clusters (and later the 39 major cell types) so I can design FISH experiments.

I was able to reproduce code from the gpsFISH paper seamlessly with the posted .RMD file (which was fantastic!), so I started to follow along with my subset of snRNA-seq data. I skipped (for now) nonessential steps like constructing the weighted penalty matrix, including specific genes that might have been filtered out, and including information about available probes. After filtering genes, 829 remain (which feels really low, but I move on for now...).

Final dimensions of my sc_count are 829 x 265124. Calculating DEGs for each cell type worked fine; I started having problems with the gpsFISH_optimize step. I used default params as in the example .RMD file, except I set gene2include.id, gene.weight, and weight_penalty to NULL.

After running for about a full ten seconds, this error message popped up:
Error in plogis(x) : Non-numeric argument to mathematical function

When googling this error, it seems to happen when a mathematical function is applied to non-numerical data. I figured the problem might be with my own data since the code worked fine in the demo .RMD file. I noticed that my sc_count object was a sparse rather than dense matrix (and earlier throughout I had gotten warnings speaking to this), so perhaps that was causing a problem? Especially since in the Platform Effects demo code it looked as though that was a dense matrix. However, converting it to a dense object produced the same error. I next verified that no NAs were found in my sc_count or sc_cluster objects, and that rownames of the full_count_table matched that of sc_cluster.

Is this a familiar error, and might you have recommendations on how I can correct it? Thank you! I'm hopeful that, despite the size and complexity of the MOCA data, I can use this tool to design cool FISH experiments.

Below I include my output of sessionInfo().

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] monocle3_1.2.9 SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1 GenomicRanges_1.48.0
[5] GenomeInfoDb_1.32.3 IRanges_2.30.1 S4Vectors_0.34.0 MatrixGenerics_1.8.1
[9] matrixStats_1.2.0 Biobase_2.56.0 BiocGenerics_0.42.0 cowplot_1.1.1
[13] dplyr_1.1.4 ggdendro_0.1.23 SeuratObject_4.1.3 Seurat_4.3.0
[17] reshape2_1.4.4 pheatmap_1.0.12 boot_1.3-28 splitTools_1.0.1
[21] naivebayes_0.9.7 pROC_1.18.5 caret_6.0-94 lattice_0.20-45
[25] ggplot2_3.4.4 ranger_0.16.0 pagoda2_1.0.11 igraph_1.5.1
[29] Matrix_1.5-3 data.table_1.14.10 gpsFISH_0.1.0

loaded via [1] utf8_1.2.4 [6] tidyselect_1.2.0 [11] Rtsne_0.17 [16] miniUI_0.1.1.1 [21] knitr_1.42 [26] urltools_1.7.3 [31] parallelly_1.36.0 [36] timechange_0.2.0 [41] cachem_1.0.7 [46] globals_0.16.2 [51] splines_4.2.1 [56] yaml_2.3.7 [61] sccore_1.0.4 [66] plyr_1.8.9 [71] rpart_4.1.16 [76] cluster_2.1.4 [81] triebeard_0.4.1 [86] mime_0.12 [91] gridExtra_2.3 [96] minqa_1.2.4 [101] tidyr_1.3.0 [106] R.methodsS3_1.8.2 [111] plotly_4.10.1.9000 [116] XVector_0.36.0 [121] RcppAnnoy_0.0.21 [126] leiden_0.4.3 [131] rjson_0.2.21 [136] fansi_1.0.6 [141] httr_1.4.5 [146] bit_4.0.4 [151] irlba_2.3.5.1 a namespace (and not attached):
spatstat.explore_3.0-6 reticulate_1.28 R.utils_2.12.3 lme4_1.1-30
RSQLite_2.2.16 AnnotationDbi_1.58.0 htmlwidgets_1.6.1 grid_4.2.1
munsell_0.5.0 codetools_0.2-18 ica_1.0-3 future_1.33.1
withr_3.0.0 spatstat.random_3.1-3 colorspace_2.1-0 progressr_0.14.0
rstudioapi_0.14 ROCR_1.0-11 tensor_1.5 listenv_0.9.1
GenomeInfoDbData_1.2.8 polyclip_1.10-4 topGO_2.48.0 bit64_4.0.5
vctrs_0.6.5 generics_0.1.3 ipred_0.9-14 xfun_0.37
R6_2.5.1 DelayedArray_0.22.0 bitops_1.0-7 spatstat.utils_3.0-1
promises_1.2.0.1 scales_1.3.0 nnet_7.3-17 gtable_0.3.4
goftest_1.2-3 drat_0.2.4 timeDate_4032.109 rlang_1.1.3
lazyeval_0.2.2 ModelMetrics_1.2.2.2 spatstat.geom_3.0-6 brew_1.0-10
abind_1.4-5 httpuv_1.6.9 tools_4.2.1 lava_1.7.3
ellipsis_0.3.2 RColorBrewer_1.1-3 ggridges_0.5.6 Rcpp_1.0.11
zlibbioc_1.42.0 RCurl_1.98-1.8 purrr_1.0.2 dendsort_0.3.4
deldir_1.0-6 pbapply_1.7-0 zoo_1.8-11 ggrepel_0.9.4
magrittr_2.0.3 scattermore_0.8 SparseM_1.81 lmtest_0.9-40
RANN_2.6.1 fitdistrplus_1.1-8 pkgload_1.3.0 patchwork_1.1.2
evaluate_0.20 xtable_1.8-4 RMTstat_0.3.1 N2R_1.0.1
compiler_4.2.1 tibble_3.2.1 KernSmooth_2.23-20 crayon_1.5.2
R.oo_1.26.0 htmltools_0.5.4 mgcv_1.8-40 later_1.3.0
lubridate_1.9.3 DBI_1.1.3 MASS_7.3-58.1 cli_3.6.2
gower_1.0.1 pkgconfig_2.0.3 sp_1.6-0 terra_1.6-7
spatstat.sparse_3.0-0 recipes_1.0.9 foreach_1.5.2 hardhat_1.3.1
prodlim_2023.08.28 stringr_1.5.1 digest_0.6.33 sctransform_0.3.5
graph_1.74.0 spatstat.data_3.0-0 Biostrings_2.64.1 rmarkdown_2.20
Rook_1.2 uwot_0.1.16 shiny_1.7.4 nloptr_2.0.3
lifecycle_1.0.4 nlme_3.1-159 jsonlite_1.8.8 viridisLite_0.4.2
pillar_1.9.0 GO.db_3.15.0 KEGGREST_1.36.3 fastmap_1.1.1
survival_3.4-0 glue_1.6.2 png_0.1-8 iterators_1.0.14
class_7.3-20 stringi_1.8.3 blob_1.2.3 memoise_2.0.1
future.apply_1.11.1

YidaZhang0628 · 2024-02-19T08:12:47Z

Hello! Thank you for producing this tool, which I think has great potential for reviving an earlier thread in my research. I'm so excited to get it working! I am posting here because I think my question also falls under this open issue, and perhaps its discussion can also help others.

I'm working with a subset of snRNA-seq data from the Mouse Organogenesis Cell Atlas (MOCA) here. It has 265124 cells and 24552 genes. For now, I have no spatial transcriptomics dataset that I want to use; I simply want to see what selected gene panel looks like for the 11 main trajectory clusters (and later the 39 major cell types) so I can design FISH experiments.

I was able to reproduce code from the gpsFISH paper seamlessly with the posted .RMD file (which was fantastic!), so I started to follow along with my subset of snRNA-seq data. I skipped (for now) nonessential steps like constructing the weighted penalty matrix, including specific genes that might have been filtered out, and including information about available probes. After filtering genes, 829 remain (which feels really low, but I move on for now...).

Final dimensions of my sc_count are 829 x 265124. Calculating DEGs for each cell type worked fine; I started having problems with the gpsFISH_optimize step. I used default params as in the example .RMD file, except I set gene2include.id, gene.weight, and weight_penalty to NULL.

After running for about a full ten seconds, this error message popped up: Error in plogis(x) : Non-numeric argument to mathematical function

When googling this error, it seems to happen when a mathematical function is applied to non-numerical data. I figured the problem might be with my own data since the code worked fine in the demo .RMD file. I noticed that my sc_count object was a sparse rather than dense matrix (and earlier throughout I had gotten warnings speaking to this), so perhaps that was causing a problem? Especially since in the Platform Effects demo code it looked as though that was a dense matrix. However, converting it to a dense object produced the same error. I next verified that no NAs were found in my sc_count or sc_cluster objects, and that rownames of the full_count_table matched that of sc_cluster.

Is this a familiar error, and might you have recommendations on how I can correct it? Thank you! I'm hopeful that, despite the size and complexity of the MOCA data, I can use this tool to design cool FISH experiments.

Below I include my output of sessionInfo().

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] monocle3_1.2.9 SingleCellExperiment_1.18.0 SummarizedExperiment_1.26.1 GenomicRanges_1.48.0
[5] GenomeInfoDb_1.32.3 IRanges_2.30.1 S4Vectors_0.34.0 MatrixGenerics_1.8.1
[9] matrixStats_1.2.0 Biobase_2.56.0 BiocGenerics_0.42.0 cowplot_1.1.1
[13] dplyr_1.1.4 ggdendro_0.1.23 SeuratObject_4.1.3 Seurat_4.3.0
[17] reshape2_1.4.4 pheatmap_1.0.12 boot_1.3-28 splitTools_1.0.1
[21] naivebayes_0.9.7 pROC_1.18.5 caret_6.0-94 lattice_0.20-45
[25] ggplot2_3.4.4 ranger_0.16.0 pagoda2_1.0.11 igraph_1.5.1
[29] Matrix_1.5-3 data.table_1.14.10 gpsFISH_0.1.0
loaded via a namespace (and not attached):
[1] utf8_1.2.4 spatstat.explore_3.0-6 reticulate_1.28 R.utils_2.12.3 lme4_1.1-30
[6] tidyselect_1.2.0 RSQLite_2.2.16 AnnotationDbi_1.58.0 htmlwidgets_1.6.1 grid_4.2.1
[11] Rtsne_0.17 munsell_0.5.0 codetools_0.2-18 ica_1.0-3 future_1.33.1
[16] miniUI_0.1.1.1 withr_3.0.0 spatstat.random_3.1-3 colorspace_2.1-0 progressr_0.14.0
[21] knitr_1.42 rstudioapi_0.14 ROCR_1.0-11 tensor_1.5 listenv_0.9.1
[26] urltools_1.7.3 GenomeInfoDbData_1.2.8 polyclip_1.10-4 topGO_2.48.0 bit64_4.0.5
[31] parallelly_1.36.0 vctrs_0.6.5 generics_0.1.3 ipred_0.9-14 xfun_0.37
[36] timechange_0.2.0 R6_2.5.1 DelayedArray_0.22.0 bitops_1.0-7 spatstat.utils_3.0-1
[41] cachem_1.0.7 promises_1.2.0.1 scales_1.3.0 nnet_7.3-17 gtable_0.3.4
[46] globals_0.16.2 goftest_1.2-3 drat_0.2.4 timeDate_4032.109 rlang_1.1.3
[51] splines_4.2.1 lazyeval_0.2.2 ModelMetrics_1.2.2.2 spatstat.geom_3.0-6 brew_1.0-10
[56] yaml_2.3.7 abind_1.4-5 httpuv_1.6.9 tools_4.2.1 lava_1.7.3
[61] sccore_1.0.4 ellipsis_0.3.2 RColorBrewer_1.1-3 ggridges_0.5.6 Rcpp_1.0.11
[66] plyr_1.8.9 zlibbioc_1.42.0 RCurl_1.98-1.8 purrr_1.0.2 dendsort_0.3.4
[71] rpart_4.1.16 deldir_1.0-6 pbapply_1.7-0 zoo_1.8-11 ggrepel_0.9.4
[76] cluster_2.1.4 magrittr_2.0.3 scattermore_0.8 SparseM_1.81 lmtest_0.9-40
[81] triebeard_0.4.1 RANN_2.6.1 fitdistrplus_1.1-8 pkgload_1.3.0 patchwork_1.1.2
[86] mime_0.12 evaluate_0.20 xtable_1.8-4 RMTstat_0.3.1 N2R_1.0.1
[91] gridExtra_2.3 compiler_4.2.1 tibble_3.2.1 KernSmooth_2.23-20 crayon_1.5.2
[96] minqa_1.2.4 R.oo_1.26.0 htmltools_0.5.4 mgcv_1.8-40 later_1.3.0
[101] tidyr_1.3.0 lubridate_1.9.3 DBI_1.1.3 MASS_7.3-58.1 cli_3.6.2
[106] R.methodsS3_1.8.2 gower_1.0.1 pkgconfig_2.0.3 sp_1.6-0 terra_1.6-7
[111] plotly_4.10.1.9000 spatstat.sparse_3.0-0 recipes_1.0.9 foreach_1.5.2 hardhat_1.3.1
[116] XVector_0.36.0 prodlim_2023.08.28 stringr_1.5.1 digest_0.6.33 sctransform_0.3.5
[121] RcppAnnoy_0.0.21 graph_1.74.0 spatstat.data_3.0-0 Biostrings_2.64.1 rmarkdown_2.20
[126] leiden_0.4.3 Rook_1.2 uwot_0.1.16 shiny_1.7.4 nloptr_2.0.3
[131] rjson_0.2.21 lifecycle_1.0.4 nlme_3.1-159 jsonlite_1.8.8 viridisLite_0.4.2
[136] fansi_1.0.6 pillar_1.9.0 GO.db_3.15.0 KEGGREST_1.36.3 fastmap_1.1.1
[141] httr_1.4.5 survival_3.4-0 glue_1.6.2 png_0.1-8 iterators_1.0.14
[146] bit_4.0.4 class_7.3-20 stringi_1.8.3 blob_1.2.3 memoise_2.0.1
[151] irlba_2.3.5.1 future.apply_1.11.1

Thank you for your detailed question! plogis is not a function explicitly called in the package. It must be called by some other functions in the package. To learn more about what is going on, can you provide your code and a small dataset so that I can reproduce the error?

evaknichols · 2024-02-19T21:07:59Z

Hi @YidaZhang0628 , thank you so much for your fast response and willingness to look into this error. This morning I worked to try and get the code and downsampled data object within Github's allowable limits, but failed. Please see this link to a publicly available google drive folder that has all contents.

The full data cds object (derived from Monocle3) has 265124 cells and 24552 genes; the downsampled version has just 261 cells and 24552 genes.

Importantly, though the downsampled object reproduced the same error described above, working with the downsampled data produced a new error at line 163 (initialize_population step):

Error in base::order(DEG.table$AUC, decreasing = TRUE) : 
  argument 1 is not a vector

Please let me know if I can clarify anything or provide code/data in a better alternative way.

YidaZhang0628 · 2024-02-20T13:55:50Z

@evaknichols Thank you for the data and code! They are very helpful. I can reproduce the error using your code. The problem is indeed the format of sc_count. Although you converted it from a sparse matrix to a dense matrix, its data type isdgCMatrix instead of matrix array. When I add sc_count = as.matrix(sc_count) after sc_count = exprs(cds), the problem is gone. Although sc_count is not used directly as input for gpsFISH_optimize, it is used when calculating relative_prop. The format of sc_count will affect the format of relative_prop, which caused the error. Let me know if this helps.

evaknichols · 2024-02-20T19:35:17Z

Hi @YidaZhang0628 , brilliant, that worked for me! Thank you so much. It's really exciting to see the panels.

This is not urgent, since the step is optional. I'm curious if you also reproduced the new error during the initialize_population step with my data/code?:
Error in base::order(DEG.table$AUC, decreasing = TRUE) : argument 1 is not a vector

In general is there an advantage to initializing gene panels first versus randomly later on during the gpsFISH_optimize step?

YidaZhang0628 · 2024-02-21T03:23:52Z

Hi @evaknichols The new error is because some cell types don't have any marker genes, i.e., there is nothing in DEG.table. In your case, 5 out of 8 cell types have no differentially expressed genes based on the downsampled data.

In general, including some cell type marker genes can speed up the optimization (too many can lead to local optimum). But In your case, after downsampling, there are no significant transcriptional differences between most cell types. Therefore, it probably won't make much difference compared to a random initialization. But most importantly, the goal of gpsFISH is to find genes that can identify pre-defined cell types. If the input doesn't have much cell type difference, then it does not make sense to perform gene selection on such a dataset (e.g., the downsampled version in your case) and very likely won't work well.

evaknichols · 2024-02-21T17:40:07Z

Hi @YidaZhang0628 , thank you so much for this explanation--it makes sense to use our cluster to process the fuller dataset.

Hopefully one last question before I go. I'm trying to understand the structure of the GA object as the output of gpsFISH_optimize step. The step in the demo below produces a list (marker_panel) of 100 genes. This makes sense since I asked for 10 marker genes and I have 10 major classes.

marker_panel = rownames(sc_count)[GA$bestsol]
marker_panel

This might be a basic/naive question but, how can I access the 10 genes on a per cluster label basis from the GA object?

Thanks again for your responsiveness. Of the tools I've dabbled in over the years, gpsFISH has been among the most accessible for someone like me (primarily wet lab experimentalist :') ).

YidaZhang0628 · 2024-02-23T02:09:30Z

Hi @evaknichols , It is great to know that you enjoyed using gpsFISH! And don't hesitate to let me know if there is anything I can do to improve the experience.

Back to your question, the reason why marker_panel includes 100 genes in the demo is that the panel_size parameter is set to 100. The way gpsFISH works is to find a given number of genes (determined by panel_size) that can best recognize all cell types in the input data. As to how many genes are needed for each individual cell type, it depends on whether the cell type is easy to identify or not. For obvious cell types, it may only require 1-2 genes. But for sutle cell types, it may require many genes. The method is not trying to select the same amount of genes for each cell type.

For your case, if you only need 10 marker genes, you should set panel_size to 10 and the output will be just 10 genes. As to whether 10 genes are enough to recognize your 10 major classes, it depends on the input data and the separatability of the 10 major classes.

evaknichols · 2024-02-23T17:45:22Z

Hi @YidaZhang0628 , thanks so much for your clarification! I might have a fundamental misunderstanding about gpsFISH. For my experiment, I'm trying to design cell-type specific RNA FISH panels for a multiplexed experiment in tissue slices such that each class has a different color lighting up. So, having some understanding of which genes in marker_panel correspond to which class would be the best (whether 2, 10, or 20 are sufficient is fine). Is this still possible to do with gpsFISH?

(also, please let me know if this discussion makes more sense to move to DM/email--I wouldn't want to misuse Github's issues pages! Much appreciated).

YidaZhang0628 · 2024-02-25T02:14:17Z

Hi @evaknichols It is possible to get the information you want especially when your gene panel size is small. After having the gene panel, you can generate the average expression per cell type for selected genes plot. From the heatmap, you should be able to tell which gene is highly expressed in which class. Of note, some genes may be highly expressed in multiple classes.

evaknichols · 2024-02-28T01:18:42Z

Hi @YidaZhang0628 , thank you for the additional information. I think I have a wrong idea about how gpsFISH is working. I was hopeful that it might penalize genes that are highly expressed in multiple classes (maybe through the weighted penalty matrix?) to enrich for genes that are maximally specific for a class. I'll read the paper more deeply and experiment more to find out how gpsFISH's designed gene panel could be different from one I might manually design from something like Seurat's FindAllMarkers function. Thanks so much for your kind attention!

YidaZhang0628 · 2024-02-28T03:31:44Z

Hi @evaknichols, if there are very specific marker genes for the cell types you are interested in, then they should be identified by gpsFISH. However, in reality, some cell types may not have unique marker genes. Therefore, things like FindAllMarkers won't work. For these cell types, they can only be identified in a combinatorial way (combinatorial expression of multiple genes). gpsFISH is able to account for this situation, which is a major advantage. But this situation can lead to genes that are not specific to only one cell type. With that being said, if we enforce the genes selected to be specific to one cell type, we will not be able to achieve good performance for cell types that don't have unique marker genes.

evaknichols · 2024-03-03T16:17:12Z

Hi @YidaZhang0628 --thank you for this clarification! This makes sense and it sounds like gpsFISH is doing the best it can to strike a good balance. Since I'm more interested in "Level 1" cell type mapping, in a mouse embryogenesis context, I probably won't have a problem with not having enough marker genes. I'll use my best judgement to evaluate when a selected gene could be "leaky" (usually, I rely on DotPlots and looking at gene expression in UMAP space). Thanks again for your support.

nikofleischer · 2024-10-30T13:51:11Z

Hi @YidaZhang0628, Reiterating back to the initial question in this thread: How do we start the gpsFISH_optimize without having paired spatial/scRNAseq data and thus a trained model for platform effects? Whenever I try to specify NULL or FALSE for the simulation_paramter the execution fails.

Best, Niko

YidaZhang0628 · 2024-11-08T20:36:12Z

Hi @YidaZhang0628, Reiterating back to the initial question in this thread: How do we start the gpsFISH_optimize without having paired spatial/scRNAseq data and thus a trained model for platform effects? Whenever I try to specify NULL or FALSE for the simulation_paramter the execution fails.

Best, Niko

Hi Niko,

For your case, you can download a pre-trained Bayesian model for your spatial transcriptomic platform here. After you input the file into R as simulation_params, you can use the same code here. Basically, NULL or FALSE cannot be used to specify simulation_paramter. It has to be either a pre-trained model or a model you trained using your own data.

nikofleischer · 2024-11-09T00:19:52Z

Hi, thanks for your reply! But there is no pretrained model available for my usecase (snRNAseq 10x Chromium v2 and Xenium v1). So there is no possibility to set this part of the algorithm to not have an effect on the calculations? Best Niko On 8. Nov 2024, at 21:36, YidaZhang0628 ***@***.***> wrote: Hi @YidaZhang0628, Reiterating back to the initial question in this thread: How do we start the gpsFISH_optimize without having paired spatial/scRNAseq data and thus a trained model for platform effects? Whenever I try to specify NULL or FALSE for the simulation_paramter the execution fails. Best, Niko Hi Niko, For your case, you can download a pre-trained Bayesian model for your spatial transcriptomic platform here. After you input the file into R as simulation_params, you can use the same code here. Basically, NULL or FALSE cannot be used to specify simulation_paramter. It has to be either a pre-trained model or a model you trained using your own data. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>

YidaZhang0628 · 2024-11-09T15:24:21Z

Hi, thanks for your reply! But there is no pretrained model available for my usecase (snRNAseq 10x Chromium v2 and Xenium v1). So there is no possibility to set this part of the algorithm to not have an effect on the calculations? Best Niko On 8. Nov 2024, at 21:36, YidaZhang0628 @.> wrote: Hi @YidaZhang0628, Reiterating back to the initial question in this thread: How do we start the gpsFISH_optimize without having paired spatial/scRNAseq data and thus a trained model for platform effects? Whenever I try to specify NULL or FALSE for the simulation_paramter the execution fails. Best, Niko Hi Niko, For your case, you can download a pre-trained Bayesian model for your spatial transcriptomic platform here. After you input the file into R as simulation_params, you can use the same code here. Basically, NULL or FALSE cannot be used to specify simulation_paramter. It has to be either a pre-trained model or a model you trained using your own data. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.>

I see what you mean now. If you are using the development version of gpsFISH, you won't be able to do it. But if you are using the stable version, there is a way to select the gene panel without considering platform effects. You can do this by setting two_step_sampling_type to c("Subsampling_by_cluster", "No_simulation"). Then you should be ok to set simulation_model and simulation_parameter to NULL. Can you give that a try and let me know if that works?

nikofleischer · 2024-11-09T19:47:44Z

Thanks, that’s the Information I needed! Using the stable version, but couldn’t find this parameter in either the vignette or the help function, thanks! Will test once I am back in the office. On 9. Nov 2024, at 16:24, YidaZhang0628 ***@***.***> wrote: Hi, thanks for your reply! But there is no pretrained model available for my usecase (snRNAseq 10x Chromium v2 and Xenium v1). So there is no possibility to set this part of the algorithm to not have an effect on the calculations? Best Niko On 8. Nov 2024, at 21:36, YidaZhang0628 @.> wrote: Hi @YidaZhang0628, Reiterating back to the initial question in this thread: How do we start the gpsFISH_optimize without having paired spatial/scRNAseq data and thus a trained model for platform effects? Whenever I try to specify NULL or FALSE for the simulation_paramter the execution fails. Best, Niko Hi Niko, For your case, you can download a pre-trained Bayesian model for your spatial transcriptomic platform here. After you input the file into R as simulation_params, you can use the same code here. Basically, NULL or FALSE cannot be used to specify simulation_paramter. It has to be either a pre-trained model or a model you trained using your own data. —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.> I see what you mean now. If you are using the development version of gpsFISH, you won't be able to do it. But if you are using the stable version, there is a way to select the gene panel without considering platform effects. You can do this by setting two_step_sampling_type to c("Subsampling_by_cluster", "No_simulation"). Then you should be ok to set simulation_model and simulation_parameter to NULL. Can you give that a try and let me know if that works? —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: ***@***.***>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

running gpsFISH without spatial count and spatial cluster #18

running gpsFISH without spatial count and spatial cluster #18

justin512lee commented Mar 29, 2023

YidaZhang0628 commented Mar 30, 2023

justin512lee commented Mar 30, 2023

YidaZhang0628 commented Mar 30, 2023

justin512lee commented Mar 30, 2023

pakiessling commented Mar 31, 2023 •

edited

Loading

YidaZhang0628 commented Mar 31, 2023

YidaZhang0628 commented Mar 31, 2023

Boehmin commented Jan 31, 2024

YidaZhang0628 commented Feb 1, 2024

Boehmin commented Feb 1, 2024 •

edited

Loading

simomounir commented Feb 5, 2024

YidaZhang0628 commented Feb 6, 2024

simomounir commented Feb 6, 2024

YidaZhang0628 commented Feb 6, 2024

simomounir commented Feb 6, 2024

evaknichols commented Feb 18, 2024 •

edited

Loading

YidaZhang0628 commented Feb 19, 2024

evaknichols commented Feb 19, 2024

YidaZhang0628 commented Feb 20, 2024

evaknichols commented Feb 20, 2024

YidaZhang0628 commented Feb 21, 2024

evaknichols commented Feb 21, 2024

YidaZhang0628 commented Feb 23, 2024 •

edited

Loading

evaknichols commented Feb 23, 2024

YidaZhang0628 commented Feb 25, 2024

evaknichols commented Feb 28, 2024

YidaZhang0628 commented Feb 28, 2024

evaknichols commented Mar 3, 2024

nikofleischer commented Oct 30, 2024

YidaZhang0628 commented Nov 8, 2024

nikofleischer commented Nov 9, 2024 via email

YidaZhang0628 commented Nov 9, 2024

nikofleischer commented Nov 9, 2024 via email

running gpsFISH without spatial count and spatial cluster #18

running gpsFISH without spatial count and spatial cluster #18

Comments

justin512lee commented Mar 29, 2023

YidaZhang0628 commented Mar 30, 2023

justin512lee commented Mar 30, 2023

YidaZhang0628 commented Mar 30, 2023

justin512lee commented Mar 30, 2023

pakiessling commented Mar 31, 2023 • edited Loading

YidaZhang0628 commented Mar 31, 2023

YidaZhang0628 commented Mar 31, 2023

Boehmin commented Jan 31, 2024

YidaZhang0628 commented Feb 1, 2024

Boehmin commented Feb 1, 2024 • edited Loading

simomounir commented Feb 5, 2024

YidaZhang0628 commented Feb 6, 2024

simomounir commented Feb 6, 2024

YidaZhang0628 commented Feb 6, 2024

simomounir commented Feb 6, 2024

evaknichols commented Feb 18, 2024 • edited Loading

YidaZhang0628 commented Feb 19, 2024

evaknichols commented Feb 19, 2024

YidaZhang0628 commented Feb 20, 2024

evaknichols commented Feb 20, 2024

YidaZhang0628 commented Feb 21, 2024

evaknichols commented Feb 21, 2024

YidaZhang0628 commented Feb 23, 2024 • edited Loading

evaknichols commented Feb 23, 2024

YidaZhang0628 commented Feb 25, 2024

evaknichols commented Feb 28, 2024

YidaZhang0628 commented Feb 28, 2024

evaknichols commented Mar 3, 2024

nikofleischer commented Oct 30, 2024

YidaZhang0628 commented Nov 8, 2024

nikofleischer commented Nov 9, 2024 via email

YidaZhang0628 commented Nov 9, 2024

nikofleischer commented Nov 9, 2024 via email

pakiessling commented Mar 31, 2023 •

edited

Loading

Boehmin commented Feb 1, 2024 •

edited

Loading

evaknichols commented Feb 18, 2024 •

edited

Loading

YidaZhang0628 commented Feb 23, 2024 •

edited

Loading