The test data runs differently than the example #21

Open
lifan2022 opened this issue Mar 20, 2024 · 8 comments

@lifan2022

lifan2022 commented Mar 20, 2024

Hello,

Thank you for developing such a useful piece of software. I've run into a small problem with it.

I ran TOSICA on the test data, but in the new_adata returned by the prediction step, 2874 cells were assigned a cell type different from their original Celltype.

import scanpy as sc
import TOSICA

# Reference data
ref_adata = sc.read('./demo_train.h5ad')
ref_adata = ref_adata[:, ref_adata.var_names]
print(ref_adata)
print(ref_adata.obs.Celltype.value_counts())

# Query data (genes subset and ordered to match the reference)
query_adata = sc.read('./demo_test.h5ad')
query_adata = query_adata[:, ref_adata.var_names]
print(query_adata)
print(query_adata.obs.Celltype.value_counts())

# Training
TOSICA.train(ref_adata, gmt_path='./GO_bp.gmt', label_name='Celltype', epochs=3, project='hGOBP_demo')

# Prediction
model_weight_path = './hGOBP_demo/model-0.pth'
new_adata = TOSICA.pre(query_adata, model_weight_path=model_weight_path, project='hGOBP_demo')

image
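One way to quantify the mismatch described above is to compare the original labels with the predicted ones cell by cell. The sketch below works on plain Python lists; the helper name `summarize_mismatches` is hypothetical, and the toy labels are illustrative values, not results from the demo data.

```python
from collections import Counter

def summarize_mismatches(true_labels, predicted_labels):
    """Count disagreements between original and predicted cell types."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    # Tally every (original, predicted) pair that disagrees.
    confusions = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    n_wrong = sum(confusions.values())
    n_total = len(true_labels)
    accuracy = (n_total - n_wrong) / n_total if n_total else float("nan")
    return {"n_wrong": n_wrong, "accuracy": accuracy, "confusions": confusions}

# Toy labels standing in for the real columns (illustrative values only).
true_labels = ["alpha", "alpha", "beta", "delta"]
predicted_labels = ["beta", "alpha", "beta", "alpha"]
summary = summarize_mismatches(true_labels, predicted_labels)
print(summary["n_wrong"], summary["accuracy"])
print(summary["confusions"].most_common())
```

With the real data, the two lists would come from the original Celltype column and from whichever column of new_adata.obs holds the predictions (the column name is not shown in this thread, so it is left open here). If one original type is almost always mapped to the same wrong type, a shifted label mapping may be a more likely cause than a poorly trained model.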

@lifan2022
Author

Here's the result of my final visualization:
image

@apologize66

Hello!
I encountered an error when running the 9th cell: "items in new_categories are not the same as in old categories." To work around it, I changed the order of the cell types defined by the original author to match new_categories, and I then got the same result as the one you show above. Did you encounter the same error as well?
Also, if you are a Chinese student too, perhaps we can communicate further.

[attached images: 1, 2, 3, 4]

@lifan2022
Author

> Hello! I encountered an error when running the 9th cell: "items in new_categories are not the same as in old categories." To work around it, I changed the order of the cell types defined by the original author to match new_categories, and I then got the same result as the one you show above. Did you encounter the same error as well?

Yes, I'm getting the same error

@IvyYang00

Hi! I encountered a similar error. After applying the same fix as [apologize66], I got a different result, but the predictions still differ substantially from the original cell types and the accuracy is relatively low.
image

@IvyYang00

> Hi! I encountered a similar error. After applying the same fix as [apologize66], I got a different result, but the predictions still differ substantially from the original cell types and the accuracy is relatively low.

I also tried to use TOSICA to train my own model on a human lung scRNA-seq dataset for 20 epochs. The validation accuracy during training was 0.993, but when I used the model to predict an internal test dataset, the accuracy was only about 0.29. I don't know why.

image
image

@JiaweiChenGo
Collaborator

> Here's the result of my final visualization

Thank you for your interest in TOSICA.
Unfortunately, I cannot tell where the problem lies from what has been shown here. If I encountered this problem, I would first check whether the var_names of ref_adata and query_adata are consistent and in the same order. Then I would check whether the pre-trained model is loaded correctly.
In addition, I noticed that the different cell types are correctly separated in the attention space, yet no cell was predicted to be an alpha cell, which is the most abundant cell type and should have the highest prediction accuracy. So I suspect something may be wrong with label_dictionary.csv.
If the predictions are still poor and you are willing to share your demo dataset and code, I would be happy to help analyze what happened!
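The first and third checks suggested above can be sketched with plain Python. This is a minimal illustration, not part of TOSICA: both helper names are hypothetical, and because the layout of label_dictionary.csv is not shown in this thread, the second helper takes the labels after they have been read from that file by whatever means fits its format.

```python
def compare_gene_names(ref_names, query_names):
    """Report whether two gene-name lists agree in content and in order."""
    ref_names, query_names = list(ref_names), list(query_names)
    ref_set, query_set = set(ref_names), set(query_names)
    return {
        "missing_in_query": sorted(ref_set - query_set),
        "extra_in_query": sorted(query_set - ref_set),
        "same_genes": ref_set == query_set,
        "same_order": ref_names == query_names,
    }

def labels_missing_from_dictionary(dictionary_labels, training_labels):
    """Return training labels that do not appear in the label dictionary."""
    return sorted(set(training_labels) - set(dictionary_labels))

# Toy inputs (illustrative only): same genes, different order.
report = compare_gene_names(["INS", "GCG", "SST"], ["GCG", "INS", "SST"])
print(report)

# Toy inputs: the dictionary lacks one of the training labels.
missing = labels_missing_from_dictionary(["beta", "delta"], ["alpha", "beta", "delta"])
print(missing)
```

On real data the first helper could be called as `compare_gene_names(ref_adata.var_names, query_adata.var_names)`. A report with `same_genes` true but `same_order` false would point to a gene-ordering problem, while a non-empty result from the second helper would support the suspicion about the label dictionary.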

@JiaweiChenGo
Collaborator

> Hello! I encountered an error when running the 9th cell: "items in new_categories are not the same as in old categories." To work around it, I changed the order of the cell types defined by the original author to match new_categories, and I then got the same result as the one you show above. Did you encounter the same error as well? Also, if you are a Chinese student too, perhaps we can communicate further.

Maybe you masked the alpha cells in the training process, which caused the categories of the predicted cell types to differ from those in tutorial.ipynb.
I am glad to communicate further. Here is my email: jiaweichen@pku.edu.cn and WeChat: chenjiawei9667
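The quoted error is the message pandas raises from `reorder_categories` when the new category list is not exactly the set of existing categories, which fits the explanation above: if no alpha cells are predicted, a hard-coded tutorial order no longer matches. A minimal sketch of one workaround is to build the ordering from the categories that are actually present. The helper name and the toy category lists are hypothetical.

```python
def order_for_present_categories(desired_order, present_categories):
    """Build an ordering that contains exactly the categories present.

    Categories from desired_order are kept in that order if present;
    any present category not listed in desired_order is appended at the end.
    """
    present = list(dict.fromkeys(present_categories))  # de-duplicate, keep order
    present_set = set(present)
    ordered = [c for c in desired_order if c in present_set]
    listed = set(desired_order)
    ordered += [c for c in present if c not in listed]
    return ordered

tutorial_order = ["alpha", "beta", "delta", "gamma"]  # hypothetical tutorial order
predicted = ["beta", "delta", "Unknown", "beta"]      # toy predictions without alpha
order = order_for_present_categories(tutorial_order, predicted)
print(order)
```

The resulting list could then be passed to `Series.cat.reorder_categories` on the categorical prediction column. This only avoids the error; it does not explain why a cell type is absent from the predictions.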

@JiaweiChenGo
Collaborator

> Hi! I encountered a similar error. After applying the same fix as [apologize66], I got a different result, but the predictions still differ substantially from the original cell types and the accuracy is relatively low.

Thank you for your interest in TOSICA.
Similarly, I noticed that the different cell types are correctly separated, yet no cell was predicted to be an alpha cell, which is the most abundant cell type and should have the highest prediction accuracy. Perhaps you masked the alpha cells in the training process, while the default prediction cutoff is 0.1, which would result in low accuracy.
As for the human lung scRNA-seq dataset, I am glad to help analyze what happened.
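To illustrate why a low cutoff can hurt accuracy when a cell type is missing from training, the sketch below applies a probability threshold to toy predictions. It is an assumption, not confirmed in this thread, that predictions below the cutoff are relabeled with a placeholder such as "Unknown"; the function names, labels, and probabilities are all hypothetical.

```python
def apply_cutoff(predicted_labels, probabilities, cutoff=0.1, unknown_label="Unknown"):
    """Replace predictions whose probability is below the cutoff (assumed rule)."""
    return [
        label if prob >= cutoff else unknown_label
        for label, prob in zip(predicted_labels, probabilities)
    ]

def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the original label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Toy data: the model never saw alpha cells, so it assigns them to beta.
true_labels = ["alpha", "alpha", "beta", "delta"]
raw_predictions = ["beta", "beta", "beta", "delta"]
probabilities = [0.35, 0.08, 0.90, 0.70]

results = {}
for cutoff in (0.1, 0.5):
    called = apply_cutoff(raw_predictions, probabilities, cutoff)
    results[cutoff] = (called, accuracy(true_labels, called))
    print(cutoff, called, results[cutoff][1])
```

In this toy case the accuracy is the same at both cutoffs, but at 0.1 one unseen alpha cell is confidently labeled beta, whereas at 0.5 both unseen cells are flagged with the placeholder instead of being assigned to a wrong type.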
