running gpsFISH without spatial count and spatial cluster #18
Can you elaborate a bit more on what you plan to do? If you want to design a gene panel for MERFISH/osmFISH/DARTFISH, you can run gpsFISH with only single cell RNA-seq because we have already trained Bayesian models for those three platforms. If you are designing gene panels for other platforms and you don't have spatial data, you can run gpsFISH without simulation. This way, gene panels will be evaluated using scRNA-seq data only. |
Hi @YidaZhang0628. Thank you for your quick response. I want to design a gene panel for MERFISH. So I need to run the Platform effect estimation tutorial to get |
If you have scRNA-seq data and MERFISH data from your target tissue, then that is what I would suggest. But if you don't, we have a pre-trained Bayesian model for MERFISH. You can find it here. With the pre-trained model, you can skip the platform effect estimation step and directly run gene panel selection. |
Thank you for your help. I was able to run everything except the Thanks again for your help. |
@YidaZhang0628 do you recommend using the pre-trained models even for tissues not in the training dataset (i.e. non mouse hippocampus for MERFISH)? Do the encoded platform effects generalize well? |
@justin512lee Can you provide a bit more information? Are you getting this error when running the tutorial or on your own data? The first thing I would check is to make sure all the input files have the same data type and format as in the tutorial. If the error is still there, can you provide your code and a small dataset so that I can reproduce the error? |
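The format check suggested above can be sketched in base R. This is only an illustration on toy data: the column names `cell_name` and `class_label` and the specific checks are assumptions made for this example, not a statement of what gpsFISH validates internally.

```r
# Toy inputs (hypothetical): a genes x cells count matrix and a cluster table.
sc_count <- matrix(
  c(3, 0, 1, 2,
    0, 5, 0, 1),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4))
)
# The column names below are assumptions for illustration only.
sc_cluster <- data.frame(
  cell_name = paste0("cell", 1:4),
  class_label = c("type1", "type1", "type2", "type2")
)

# Each entry is TRUE when the corresponding sanity check passes.
checks <- c(
  count_is_numeric_matrix = is.matrix(sc_count) && is.numeric(sc_count),
  no_missing_counts = !anyNA(sc_count),
  cells_match = setequal(colnames(sc_count), sc_cluster$cell_name),
  every_cell_labelled = !anyNA(sc_cluster$class_label)
)
print(checks)
```

Any `FALSE` entry points to a mismatch worth fixing before comparing against the tutorial inputs.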
@pakiessling This is something we are planning to check but right now, we don't know how generalizable the platform effects are for the same genes in the same platform but across different tissues. Therefore, if you can find paired scRNA-seq data and spatial data in your target tissue, it is still recommended to train the Bayesian model first. |
I have snRNA-seq data and no spatial data, so I believe my use case falls into a similar category as above. Until now everything ran reasonably smoothly following the tutorial. I am stuck at the gpsFISH_optimize step. Below is my error message regarding NA in the probability vector.
I use the same values for pop_size, panel_size etc. as in the tutorial. The only difference is that I use our data for sc_count and sc_cluster. I tried to make sure the data is formatted as expected throughout. Have you encountered this error previously? Does this have to do with the sim_parameters? Or my data? Thanks for your help in advance! |
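For readers who hit a similar message: base R's `sample()` refuses a probability vector that contains `NA` or `NaN`. The sketch below reproduces that behavior in isolation and shows one pre-flight check on a toy count matrix. It does not call gpsFISH, and whether all-zero genes are the cause in any particular run is an assumption to verify, not a diagnosis.

```r
# Base R only; nothing here calls gpsFISH.
candidates <- c("geneA", "geneB", "geneC")

# Normalizing an all-zero vector divides 0 by 0 and yields NaN.
zero_row <- c(0, 0, 0)
bad_prob <- zero_row / sum(zero_row)

# sample() rejects NA/NaN probabilities; capture the message instead of stopping.
result <- tryCatch(
  sample(candidates, size = 1, prob = bad_prob),
  error = function(e) conditionMessage(e)
)
print(result)

# Hypothetical pre-flight check on a toy genes x cells matrix.
sc_count <- matrix(
  c(1, 0, 4,
    0, 0, 0,
    2, 5, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("g1", "g2", "g3"), c("c1", "c2", "c3"))
)
has_na <- anyNA(sc_count)
all_zero_genes <- rownames(sc_count)[rowSums(sc_count) == 0]
print(all_zero_genes)
```

Missing values or genes with no counts in any cell are cheap to rule out before looking at the simulation parameters.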
Do you mind providing me the dimensions of your scRNA-seq data and also the number of cell types? |
Thanks for such a quick reply!
EDIT:
|
I think this can go under the same issue: I have my own single-cell reference data (breast cancer dataset, ~70k cells) and I have my spatial data based on smFISH (from breast cancer tissues as well) with a gene expression matrix per segmented cell. How can I use gpsFISH to perform label transfer from reference single-cell data to spatial data? I see some confusion matrices, but they were calculated from predictions of cell types of single-cell data, right? Should I also skip the platform effect estimation phase? If there is any devised method to follow, let me know. Thanks in advance! |
Thank you for letting me know! I was on vacation for the past few days and couldn't get back to you. I am happy to know that the problem has been solved. Please don't hesitate to reach out if you have any further questions. One thing I noticed is that you mentioned you are using the probe_count file in the gpsFISH package. That information is specific to DARTFISH. Therefore, if you are designing your gene panel for a different technology (like MERFISH), that probe_count file is likely to be non-informative because the probe design mechanisms used by different technologies are different. I just want to point this out in case this information is relevant. |
Hi there, Let me know if there is any update on this #18 (comment) Thanks a lot! |
@simomounir To make sure I understand your goal correctly, you have scRNA-seq data that is already annotated and you have smFISH data that is not annotated. You want to annotate the smFISH data using your scRNA-seq data, is that correct? If yes, then there are many methods for this task, for example, label transfer from Seurat. What gpsFISH does is to select gene panels if you want to design a new spatial transcriptomics experiment. Both the platform effect estimation and gene panel selection steps rely on data with known labels. For example, the platform effect estimation step requires both input scRNA-seq and spatial data to be annotated. The gene panel selection step requires the input scRNA-seq data to be annotated. Let me know if this addresses your question. |
Yes indeed, that addresses my question perfectly. I was trying to see if I can harness gpsFISH functionalities to predict unknown cell labels. I have a labeled smFISH dataset though, I was just trying to test the prediction module. Thanks a lot though for your prompt answer! I will be using gpsFISH for designing some experiments. |
Hello! Thank you for producing this tool, which I think has great potential for reviving an earlier thread in my research. I'm so excited to get it working! I am posting here because I think my question also falls under this open issue, and perhaps its discussion can also help others. I'm working with a subset of snRNA-seq data from the Mouse Organogenesis Cell Atlas (MOCA) here. It has 265124 cells and 24552 genes. For now, I have no spatial transcriptomics dataset that I want to use; I simply want to see what selected gene panel looks like for the 11 main trajectory clusters (and later the 39 major cell types) so I can design FISH experiments. I was able to reproduce code from the gpsFISH paper seamlessly with the posted .RMD file (which was fantastic!), so I started to follow along with my subset of snRNA-seq data. I skipped (for now) nonessential steps like constructing the weighted penalty matrix, including specific genes that might have been filtered out, and including information about available probes. After filtering genes, 829 remain (which feels really low, but I move on for now...). Final dimensions of my sc_count are 829 x 265124. Calculating DEGs for each cell type worked fine; I started having problems with the After running for about a full ten seconds, this error message popped up: When googling this error, it seems to happen when a mathematical function is applied to non-numerical data. I figured the problem might be with my own data since the code worked fine in the demo .RMD file. I noticed that my Is this a familiar error, and might you have recommendations on how I can correct it? Thank you! I'm hopeful that, despite the size and complexity of the MOCA data, I can use this tool to design cool FISH experiments. Below I include my output of
|
Thank you for your detailed question! |
Hi @YidaZhang0628 , thank you so much for your fast response and willingness to look into this error. This morning I worked to try and get the code and downsampled data object within GitHub's allowable limits, but failed. Please see this link to a publicly available Google Drive folder that has all contents. The full data Importantly, though the downsampled object reproduced the same error described above, working with the downsampled data produced a new error at line 163 (
Please let me know if I can clarify anything or provide code/data in a better alternative way. |
@evaknichols Thank you for the data and code! They are very helpful. I can reproduce the error using your code. The problem is indeed the format of |
Hi @YidaZhang0628 , brilliant, that worked for me! Thank you so much. It's really exciting to see the panels. This is not urgent, since the step is optional. I'm curious if you also reproduced the new error during the In general is there an advantage to initializing gene panels first versus randomly later on during the |
Hi @evaknichols The new error is because some cell types don't have any marker genes, i.e., there is nothing in In general, including some cell type marker genes can speed up the optimization (too many can lead to a local optimum). But in your case, after downsampling, there are no significant transcriptional differences between most cell types. Therefore, it probably won't make much difference compared to a random initialization. But most importantly, the goal of gpsFISH is to find genes that can identify pre-defined cell types. If the input doesn't have much cell type difference, then it does not make sense to perform gene selection on such a dataset (e.g., the downsampled version in your case) and very likely won't work well. |
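A quick way to spot the situation described above (cell types with no marker genes) is to count markers per cell type before initializing panels. The sketch assumes the markers are held in a named list with one character vector per cell type; that structure and the names in it are hypothetical and may differ from the object gpsFISH produces.

```r
# Hypothetical structure: one character vector of marker genes per cell type.
markers_by_type <- list(
  type1 = c("geneA", "geneB"),
  type2 = character(0),
  type3 = c("geneC")
)

# Number of marker genes found for each cell type.
n_markers <- lengths(markers_by_type)

# Cell types that would contribute nothing to a marker-based initialization.
empty_types <- names(n_markers)[n_markers == 0]
print(n_markers)
print(empty_types)
```

If many cell types come back empty, that matches the point made in the reply: the input may not carry enough cell type differences for gene selection to be meaningful.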
Hi @YidaZhang0628 , thank you so much for this explanation--it makes sense to use our cluster to process the fuller dataset. Hopefully one last question before I go. I'm trying to understand the structure of the
This might be a basic/naive question but, how can I access the 10 genes on a per cluster label basis from the Thanks again for your responsiveness. Of the tools I've dabbled in over the years, gpsFISH has been among the most accessible for someone like me (primarily wet lab experimentalist :') ). |
Hi @evaknichols , It is great to know that you enjoyed using gpsFISH! And don't hesitate to let me know if there is anything I can do to improve the experience. Back to your question, the reason why For your case, if you only need 10 marker genes, you should set |
Hi @YidaZhang0628 , thanks so much for your clarification! I might have a fundamental misunderstanding about gpsFISH. For my experiment, I'm trying to design cell-type specific RNA FISH panels for a multiplexed experiment in tissue slices such that each class has a different color lighting up. So, having some understanding of which genes in (also, please let me know if this discussion makes more sense to move to DM/email--I wouldn't want to misuse GitHub's issues pages! Much appreciated). |
Hi @evaknichols It is possible to get the information you want especially when your gene panel size is small. After having the gene panel, you can generate the average expression per cell type for selected genes plot. From the heatmap, you should be able to tell which gene is highly expressed in which class. Of note, some genes may be highly expressed in multiple classes. |
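The heatmap reading described above can also be done numerically. The following base R sketch uses toy data, computes the mean expression of each selected gene per class, and reports the class in which each gene is highest. The "shared" rule (second-highest class mean at least half of the top one) is an arbitrary threshold chosen for illustration, not something gpsFISH defines.

```r
# Toy data: 3 selected genes x 6 cells, with one class label per cell.
expr <- matrix(
  c(5, 6, 0, 1, 0, 0,
    0, 1, 4, 5, 0, 1,
    3, 3, 0, 0, 3, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:6))
)
class_label <- c("A", "A", "B", "B", "C", "C")

# Mean expression of each gene within each class (genes x classes).
avg_expr <- sapply(
  split(seq_along(class_label), class_label),
  function(idx) rowMeans(expr[, idx, drop = FALSE])
)

# Class with the highest mean expression for each gene.
top_class <- colnames(avg_expr)[apply(avg_expr, 1, which.max)]
names(top_class) <- rownames(avg_expr)

# Flag genes whose second-highest class mean is at least half of the top one.
shared <- apply(avg_expr, 1, function(x) {
  sort(x, decreasing = TRUE)[2] >= 0.5 * max(x)
})

print(avg_expr)
print(top_class)
print(shared)
```

Genes flagged as shared correspond to the caveat in the reply that some genes may be highly expressed in multiple classes.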
Hi @YidaZhang0628 , thank you for the additional information. I think I have a wrong idea about how gpsFISH is working. I was hopeful that it might penalize genes that are highly expressed in multiple classes (maybe through the weighted penalty matrix?) to enrich for genes that are maximally specific for a class. I'll read the paper more deeply and experiment more to find out how gpsFISH's designed gene panel could be different from one I might manually design from something like Seurat's |
Hi @evaknichols, if there are very specific marker genes for the cell types you are interested in, then they should be identified by gpsFISH. However, in reality, some cell types may not have unique marker genes. Therefore, things like |
Hi @YidaZhang0628 --thank you for this clarification! This makes sense and it sounds like gpsFISH is doing the best it can to strike a good balance. Since I'm more interested in "Level 1" cell type mapping, in a mouse embryogenesis context, I probably won't have a problem with not having enough marker genes. I'll use my best judgement to evaluate when a selected gene could be "leaky" (usually, I rely on DotPlots and looking at gene expression in UMAP space). Thanks again for your support. |
Hi @YidaZhang0628, Coming back to the initial question in this thread: How do we start gpsFISH_optimize without having paired spatial/scRNA-seq data and thus a trained model for platform effects? Whenever I try to specify NULL or FALSE for simulation_parameter, the execution fails. Best, Niko |
Hi Niko, For your case, you can download a pre-trained Bayesian model for your spatial transcriptomic platform here. After you input the file into R as simulation_params, you can use the same code here. Basically, NULL or FALSE cannot be used to specify simulation_parameter. It has to be either a pre-trained model or a model you trained using your own data. |
Hi, thanks for your reply! But there is no pre-trained model available for my use case (snRNA-seq 10x Chromium v2 and Xenium v1). So is there no way to make this part of the algorithm have no effect on the calculations? Best, Niko |
I see what you mean now. If you are using the development version of gpsFISH, you won't be able to do it. But if you are using the stable version, there is a way to select the gene panel without considering platform effects. You can do this by setting two_step_sampling_type to c("Subsampling_by_cluster", "No_simulation"). Then you should be ok to set simulation_model and simulation_parameter to NULL. Can you give that a try and let me know if that works? |
Thanks, that's the information I needed! I'm using the stable version, but couldn't find this parameter in either the vignette or the help function. Will test once I am back in the office. |
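The settings named in the reply above (for the stable version) can be written down as follows. Collecting them in a list is only a convenience for this sketch and is not part of gpsFISH. The three argument names and values are taken from the thread; every other argument of gpsFISH_optimize is omitted here and would come from the tutorial.

```r
# Values quoted in the thread for running without platform effects
# (stable version only, according to the maintainer's reply).
no_platform_effect_args <- list(
  two_step_sampling_type = c("Subsampling_by_cluster", "No_simulation"),
  simulation_model = NULL,
  simulation_parameter = NULL
)

# list() keeps the NULL entries, so all three names are present.
str(no_platform_effect_args)
```

One possible use, as an assumption about workflow rather than documented usage, is to pass these three values alongside the tutorial's other arguments when calling gpsFISH_optimize.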
Hi,
Is it possible to run gpsFISH without spatial count and spatial cluster datasets (or can it run with spatial count and cluster datasets provided in the tutorial)? I am working with the datasets from this paper and have sc_counts and sc_clusters. Any help would be appreciated!