xfuse on Visium data #10
Hi, thanks for your interest in our paper! I'll try to answer point by point below:
Many thanks for the clear explanation!
Happy to help! :) The outputs depend on what you specify in the config file.
The last two types of analyses require an annotation file (see #3). Interop with other packages is unfortunately quite limited at this time. It is of course possible to load the png and csv files listed above in R, but those files are just summarizations of the results. Storing the gene expression maps in a count matrix would be possible but would result in a very large output file. Ideas on new output types better suited for data exchange are welcome!
Thank you!
Hi there,
Many thanks! :)
Hi, the gene maps are based on the Matplotlib inferno color map, so it would be possible to create a color bar by running, for example:

```python
import matplotlib as mpl
import numpy as np
from imageio import imsave

w = 100  # width of the color bar in pixels
h = 5    # height of the color bar in pixels
colorbar = np.broadcast_to(mpl.cm.inferno(np.linspace(0, 1, w)), (h, w, 4))
imsave("colorbar.png", colorbar)
```

A caveat is, of course, that it is unlabeled and does not show the range of expression values.

PR #27 will make it possible to save gene maps as numpy arrays, which could be stacked post hoc to produce genes x pixels matrices. You can try it out if you like by adding

```toml
[analyses.gene_maps]
type = "gene_maps"

[analyses.gene_maps.options]
gene_regex = ".*"
num_samples = 1
genes_per_batch = 10
predict_mean = true
normalize = false
mask_tissue = true
scale = 1.0
writer = "tensor"
```

to the config file. This is still a bit experimental, so I can't vouch for stability.

Alternatively, you could modify these lines of code in `xfuse/analyze/gene_maps.py` (lines 118 to 127 at commit 6680e39) and replace them with something like:

```python
filename = os.path.join(
    output_dir, slide_name, f"{gene_name}.pt",
)
os.makedirs(os.path.dirname(filename), exist_ok=True)
torch.save(gene_map, filename)
```

to save the gene maps as pytorch tensors.

I'm not very familiar with SCTransform, so I can't comment on that transformation in particular. It is possible to use non-integer count data on the positive real line, though, so if you'd like to try that out, I'd be interested in hearing the results. However, it's unclear if the expression modeling we're using is a good fit. Are you using SCTransform to normalize for sequencing depth or for section-wise batch effects? If the latter, have you tried using section-wise covariates?
Hi, many thanks for your suggestions! It did look like the inferno color scale to me. Genes x pixels matrix outputs would be extremely useful for downstream work.
Let me know how it goes! Yep, STUtility is developed by our data scientists; they know a lot more about normalization than I do ;) Noted. A problem is, as you mentioned, memory usage: a genes x pixels matrix for a 2000x2000 image over 10k genes takes up roughly 150 GB. Downsampling or selecting a limited number of genes should be doable, though.
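For reference, a quick back-of-the-envelope check of that estimate, assuming one float32 value (4 bytes) per gene and pixel:

```python
# Rough memory estimate for a genes x pixels matrix:
# 10k genes over a 2000x2000 image at 4 bytes (float32) per value.
n_genes = 10_000
height = width = 2_000
total_bytes = n_genes * height * width * 4
print(f"{total_bytes / 2**30:.0f} GiB")  # prints "149 GiB", i.e. roughly 150 GB
```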
Hey, sorry for taking a while to get back. Looking forward to hearing about the results! :)

Here's a brief description of the options: the samples file is a numpy array of dimension (num_samples, H, W) containing the Monte Carlo samples for a given gene, and the summary gene maps are computed from it as follows:

```python
# samples is a numpy array of dimension (num_samples, H, W) containing
# the Monte Carlo samples for a given gene

# mean gene map:
mean = samples.mean(0)
# stdv gene map:
stdv = samples.std(0)
# invcv+ gene map:
lfc = np.log2(samples.transpose(1, 2, 0)) - np.log2(samples.mean((1, 2)))
invcv = lfc.mean(-1).clip(0) / lfc.std(-1)
```

Documentation is still lacking for these analysis routines; it is definitely something we want to improve in the future. Hope at least that the above can shed some light :)
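A minimal runnable version of those summary computations on synthetic samples. The array sizes and the gamma-distributed placeholder values are made up; real samples would come from the gene map analysis output.

```python
import numpy as np

# Synthetic Monte Carlo samples of shape (num_samples, H, W); gamma-distributed
# placeholders keep all values positive, as expression predictions would be.
rng = np.random.default_rng(0)
samples = rng.gamma(shape=2.0, scale=1.0, size=(8, 4, 5))

mean = samples.mean(0)  # (H, W) posterior mean gene map
stdv = samples.std(0)   # (H, W) posterior standard deviation gene map
# invcv+: clipped mean log fold change divided by its standard deviation
lfc = np.log2(samples.transpose(1, 2, 0)) - np.log2(samples.mean((1, 2)))
invcv = lfc.mean(-1).clip(0) / lfc.std(-1)  # (H, W)

print(mean.shape, stdv.shape, invcv.shape)
```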
Hey, no worries.
Hey there, I added a small repo demonstrating the use of xfuse on one Visium heart section.

As you will see, the second approach works well for your model's expression predictions. For some genes the prediction is probably more precise; for others it's hard to say which way is better. E.g., MYL2 expression looks better on the original counts than on the SCT-corrected ones. I think this is related to the earlier questions.
Thanks for the results, great stuff! Love the 3D kernel plots btw, they look really neat; it would be cool to implement something like that for the xfuse gene map analysis in the future :)

The gene maps based on SCT-transformed data look a bit crisper, right? I wonder if the blurriness in the results on untransformed data may be due to the low-count regions. As far as I can tell, the number of reads in those regions is abnormally low (i.e., a technical artifact) and not related to histology. Thus, the correspondence between histology and expression that the method learns becomes distorted, which may impact predictions in other areas as well. The SCT transformation corrects for the sequencing depth, which restores the correspondence and results in better predictions.

I think the best way to improve results on the untransformed data may therefore be to drop observations in the low-count regions. The easiest way to do that would be to specify a threshold for the minimum number of reads in each spot. Looking at the raw data, an appropriate threshold would be something like 3000. This can be specified by adding the following slide options:

```toml
[slides.section1.options]
min_counts = 3000
always_filter = []
always_keep = [1]
```

This specifies that all spots with fewer than 3000 reads will be dropped, with the exception of the spots listed in `always_keep`.

I think I've seen this kind of blurriness sometimes when using too high a learning rate. But the current value 0.0003 should be fine, so I don't think that's the case here. The image size also looks good to me, and a higher resolution will probably only make inference more difficult.
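To make the effect of those options concrete, here is a hypothetical illustration of the implied spot filter (this is not xfuse's actual implementation; the spot ids and counts are made up):

```python
# Spots with fewer than min_counts reads are dropped, except those in always_keep.
min_counts = 3000
always_keep = {1}
always_filter = set()
spot_counts = {1: 120, 2: 5000, 3: 800, 4: 3500}  # spot id -> total reads

kept = {
    spot
    for spot, count in spot_counts.items()
    if spot not in always_filter and (count >= min_counts or spot in always_keep)
}
print(sorted(kept))  # [1, 2, 4]: spot 1 is kept despite its low count
```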
Many thanks! I'm glad it was also useful for you and maybe others.

Yes, the gene map results on SCT-transformed data look sharper, but for some genes the predicted expression pattern looks a bit different (e.g., MYL2) relative to the expression on spots in the untransformed data. I will try to retrain on the untransformed data as suggested. Thanks again for your suggestions!
Awesome, thanks for the links and pointers! I didn't know plotly existed for Python; will definitely try it out 👍

I agree that the MYL2 prediction looks too flat with the SCT-transformed data. Looking at the H&E image, I noticed the histology is very homogeneous. My experience is that the model is quicker to learn expression patterns associated with clear morphological signatures. It could be that the model hasn't fully converged yet, or that the expression pattern of MYL2 in the ground truth data is too idiosyncratic to drive the creation of a distinct metagene. I wonder if the more accurate prediction using untransformed data is a scaling effect? You could try setting the option
Great!! Plotly is just amazing :) Here is the result of using the counts parameter that you suggested to "correct" the UMIs.

And I used the settings you suggested. I agree with you, and I will try that as well.
Thanks for the update, and great to hear about the improvements! :)

In terms of convergence, my experience is that the number of update steps is usually slightly sublinear with respect to the number of sections used for training. With only one section, I think it could make sense to increase the number of epochs to perhaps something like 50k. It is best to check convergence by tracking the ELBO, though, and terminate training when the ELBO flattens out.

By default, xfuse emits a Tensorboard log file. You can launch Tensorboard by pointing it at the directory where that log file is written.

Good luck with the analysis, and let me know how it goes :)
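As a sketch of what "terminate when the ELBO flattens out" could mean in practice: compare the mean ELBO over two consecutive windows and stop when the change is negligible. The function name, window size, and tolerance below are made up; this is not xfuse's stopping logic.

```python
# Hypothetical convergence check: the ELBO has "flattened" when the mean over
# the most recent window barely differs from the mean over the window before it.
def elbo_flattened(elbo_history, window=100, tol=1e-3):
    if len(elbo_history) < 2 * window:
        return False  # not enough history to compare two windows
    prev = sum(elbo_history[-2 * window : -window]) / window
    recent = sum(elbo_history[-window:]) / window
    return abs(recent - prev) <= tol * abs(prev)

rising = [float(x) for x in range(200)]  # ELBO still improving steadily
flat = [100.0] * 200                     # ELBO has plateaued
print(elbo_flattened(rising), elbo_flattened(flat))  # False True
```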
Hi there, again thanks for all the feedback on using xfuse!
Hi! It is currently not possible to use prespecified metagenes, although this is something that definitely would be cool to support in the future. The best you can do right now is to train the model using only the spatially variable markers (by setting the `gene_regex` option).

I've sometimes found that it is easier to learn subtle gene expression patterns when the training patches show larger swathes of the tissue, so that as much context as possible is given to the recognition network. One way to achieve this is to downsample the tissue images more (a lower `scale`).

Let me know how it goes and if you have any questions :)
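If the marker restriction is done through a regex option such as `gene_regex` (shown in the config earlier), a list of marker genes could be turned into a pattern like this. The marker names are just examples, and the helper itself is hypothetical:

```python
import re

# Build a regex that matches exactly the given marker genes and nothing else.
markers = ["MYL2", "TTN", "NPPA"]  # example marker genes
pattern = "^(" + "|".join(map(re.escape, markers)) + ")$"
print(pattern)  # ^(MYL2|TTN|NPPA)$
```

The resulting string could then be pasted into the config, e.g. `gene_regex = "^(MYL2|TTN|NPPA)$"`.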
Thanks a lot for the suggestions! :)
Hi again,
Many thanks!
Hello,

Your approach looks very interesting, and I hope your preprint will get accepted!

I would like to run xfuse (`xfuse run`) on a GPU cluster using one Visium dataset. A few questions:

- I convert the data with `xfuse convert visium --counts s1_filtered_feature_bc_matrix.h5 --image tissue_hires_image.png --transformation-matrix tissue_positions_list.csv --scale 0.3 --output-file section1.h5`. Are `.h5`, `.png`, and `.csv` supported?
- In `my-config.toml` there is `[analyses.gene_maps]` with `normalize = false`. What does that mean exactly: no normalization of the counts in xfuse?

Many thanks