How to obtain test results for accuracy? #4

Closed
BearBiscuit05 opened this issue Oct 9, 2023 · 6 comments

@BearBiscuit05

While running on the PA dataset and preparing to test the model's accuracy, I ran into some issues. I save the model at the end of the entry function in `run_ducati.py` and then evaluate the trained model, but my evaluation seems to have problems and the accuracy I get is not right. I would like to know how to set the parameters to reproduce results similar to the paper's. If you could share the testing code, I would greatly appreciate it. The parameters I set are fanout [15, 15, 15] and epochs: 20.

@initzhang
Owner

Hi @BearBiscuit05, sorry for the late reply; I was busy with another project over the last month...

If I guess right, you got around 0.49 accuracy on PA, right? Then the problem is in the data preprocessing. As mentioned in the accuracy section of the paper, we add bidirectional edges to PA for the accuracy experiments, since this is common practice on the leaderboard and significantly boosts accuracy. You can simply apply `graph = dgl.to_bidirected(graph)` during preprocessing to fix this.
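
For reference, a minimal preprocessing sketch; the OGB loader and variable names here are illustrative, not the repo's exact script:

```python
# Illustrative preprocessing: load ogbn-papers100M and add reverse edges.
import dgl
from ogb.nodeproppred import DglNodePropPredDataset

data = DglNodePropPredDataset(name="ogbn-papers100M")
graph, labels = data[0]

# to_bidirected adds the missing reverse edges; copy_ndata=True keeps
# the node features attached to the resulting graph.
graph = dgl.to_bidirected(graph, copy_ndata=True)
```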

BTW, as validated by the code here, the sampling output of DUCATI is identical to DGL's, and DUCATI does not change the GNN model or the optimizer. Therefore, the accuracy of DUCATI is the same as that of DGL and any other framework that uses plain neighbour sampling.

@BearBiscuit05
Author

Thank you very much for your response. Currently, I don't have a machine with enough memory to run this, so I would like to ask whether a simple three-layer SAGE model is enough to obtain the results, because the entries I saw on the OGB leaderboard seem to add extra MLP operations.

@initzhang
Owner

Yes, the results reported in the paper are obtained with a standard three-layer GraphSAGE model whose layers are standard `dglnn.SAGEConv` modules, as defined here.
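
For concreteness, a minimal sketch of such a model; the mean aggregator and ReLU activation are assumptions, not necessarily the repo's exact choices:

```python
# A plain three-layer GraphSAGE built from dglnn.SAGEConv.
import torch.nn as nn
import torch.nn.functional as F
import dgl.nn as dglnn

class SAGE(nn.Module):
    def __init__(self, in_size, hidden_size, out_size):
        super().__init__()
        self.layers = nn.ModuleList([
            dglnn.SAGEConv(in_size, hidden_size, "mean"),
            dglnn.SAGEConv(hidden_size, hidden_size, "mean"),
            dglnn.SAGEConv(hidden_size, out_size, "mean"),
        ])

    def forward(self, blocks, x):
        # one message-flow graph (block) per layer, as produced by the sampler
        h = x
        for i, (layer, block) in enumerate(zip(self.layers, blocks)):
            h = layer(block, h)
            if i != len(self.layers) - 1:
                h = F.relu(h)
        return h
```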

@BearBiscuit05
Author

Thanks very much for your reply; I totally understand.

@BearBiscuit05
Author

Hi, I'm sorry to ask again. With your help, I successfully ran the accuracy test on the bidirected (undirected) PA graph today. However, I could only reach an accuracy of 0.55-0.56, and I would like to understand why. My current parameters are: batch size: 1024, fanout: [10, 10, 10], hidden size: 256, dropout: 0.5.

@initzhang
Owner

Below are all the hyper-parameters:

Namespace(adj_budget=1.75, bs=1000, dataset='ogbn-papers100M', dropout=0.0, epochs=20, fanouts='10,10,10', lr=0.003, metric='acc', model='sage', nfeat_budget=5.25, num_hidden=256, pre_batches=100, pre_epochs=2, valbs=100, valfan='10,10,10')

Note that:
(1) Following common practice, evaluation should be done per epoch: evaluate the model on the validation set right after each epoch's training, then pick the model with the highest validation accuracy throughout this procedure for later use.
(2) The validation pass uses a different sampler from the training one; for simplicity, you can use `dgl.dataloading.NeighborSampler(valfan)` (see the sketch below).
(3) The accuracy is calculated with `torchmetrics.functional.accuracy`.
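
A minimal sketch of such a per-epoch validation loop; `model`, `graph`, `val_nids`, and `num_classes` are placeholders, not the repo's exact code:

```python
# Illustrative validation loop matching the notes above.
import dgl
import torch
import torchmetrics.functional as MF

val_sampler = dgl.dataloading.NeighborSampler([10, 10, 10])       # valfan
val_loader = dgl.dataloading.DataLoader(
    graph, val_nids, val_sampler, batch_size=100, shuffle=False)  # valbs=100

@torch.no_grad()
def evaluate(model, num_classes):
    model.eval()
    preds, labels = [], []
    for _, _, blocks in val_loader:
        x = blocks[0].srcdata["feat"]
        preds.append(model(blocks, x).argmax(dim=1))
        labels.append(blocks[-1].dstdata["label"].long())
    # recent torchmetrics versions require the task/num_classes arguments
    return MF.accuracy(torch.cat(preds), torch.cat(labels),
                       task="multiclass", num_classes=num_classes)

# After each training epoch:
#   acc = evaluate(model, num_classes)
#   keep the checkpoint whose validation accuracy is highest.
```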
