
Inconsistency with BOND paper #83

Closed
CaptainJuice opened this issue Aug 4, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@CaptainJuice

I ran main.py multiple times with DOMINANT (from https://github.com/pygod-team/pygod/tree/main/benchmark).
Although the hyperparameter settings are consistent with the BOND paper (https://arxiv.org/pdf/2206.10071.pdf), the results on inj_cora (AUC: 0.7566±0.0332 (0.7751)) and inj_amazon (AUC: 0.7147±0.0006 (0.7152)) differ significantly from those in Table 3 of the paper, which are 82.7±5.6 (84.3) on inj_cora and 81.3±1.0 (82.2) on inj_amazon.
Is there any advice that you can provide about how to reproduce the results of the BOND paper?

@kayzliu
Member

kayzliu commented Aug 6, 2023

Thank you for your report. The short answer: you can reproduce the results by downgrading to the version used by the original benchmark, v0.3.1, via pip install pygod==0.3.1.

After our investigation, we found that the difference in performance is mainly caused by changes to the weight parameter (alpha in v0.3.1) of DOMINANT. In the original benchmark, we applied a heuristic method to select the weight for DOMINANT (and some other methods) on injected datasets. The experimental results, as you may have observed, are indeed better than with random selection. But we later found that this heuristic does not generalize to other datasets, so to avoid misleading users we removed it from PyGOD in a later release.

Also, the current benchmark script is out of date and ignores the selection of weight. Could you please help us update the model initialization in utils?

  • Remove heuristic selection of weight.
  • Update the parameter name for detectors (DOMINANT, GAAN, and CONAD) from alpha to weight.
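A minimal sketch of the second item, assuming a hypothetical helper `migrate_kwargs` (not part of PyGOD itself) that maps the deprecated `alpha` keyword to `weight` before a detector such as DOMINANT, GAAN, or CONAD is constructed:

```python
def migrate_kwargs(kwargs):
    """Rename the deprecated 'alpha' kwarg to 'weight' (hypothetical helper).

    Newer PyGOD releases use 'weight' where v0.3.1 used 'alpha'; this
    keeps old benchmark configs usable with the renamed parameter.
    """
    kwargs = dict(kwargs)  # copy, so the caller's dict is untouched
    if 'alpha' in kwargs:
        kwargs['weight'] = kwargs.pop('alpha')
    return kwargs

print(migrate_kwargs({'alpha': 0.5, 'lr': 0.01}))
```

The actual change in benchmark/utils.py is simpler still: just pass `weight=` instead of `alpha=` at each detector's construction site.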

If you have any further questions, please feel free to let us know.

@kayzliu kayzliu added the bug Something isn't working label Aug 6, 2023
@ParthaPratimBanik
Contributor

Hi @kayzliu,
I am trying to update benchmark/utils.py to regenerate the Table 3 results.

I have some questions:

  • Remove heuristic selection of weight.

Does that mean removing alpha=choice(alpha)? I think by "heuristic selection" you mean the choice() function.

Please correct me if I am wrong.

@kayzliu
Member

kayzliu commented Aug 16, 2023

Hi Partha! For removing the heuristic selection, I think you only need to change the following lines: remove the if-else and set alpha to [0.8, 0.5, 0.2] unconditionally. But note that this may not reproduce the results in Table 3. For an exact reproduction, please downgrade to 0.3.1.

pygod/benchmark/utils.py

Lines 37 to 41 in 987776c

if args.dataset[:3] == 'inj' or args.dataset[:3] == 'gen':
    # auto balancing on injected dataset
    alpha = [None]
else:
    alpha = [0.8, 0.5, 0.2]
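With the if-else removed, the snippet collapses to a single unconditional grid. A sketch of the resulting code, assuming (as the thread suggests) that the benchmark's random search then samples one value per trial with `choice`:

```python
from random import choice

# Unconditional grid: no heuristic branch for 'inj'/'gen' datasets
alpha = [0.8, 0.5, 0.2]

# each benchmark trial draws one candidate from the grid
sampled = choice(alpha)
```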

@ParthaPratimBanik
Contributor

Hi @kayzliu ,
Thank you for your reply.

Hi Partha! For removing the heuristic selection, I think you only need to change the following lines. Remove the if-else and set the alpha to [0.8, 0.5, 0.2] without condition.

Yes, I got your point. I will come back after removing if-else block.

But note that it may not reproduce the results in Table 3. For exact reproduction of the results, please downgrade to 0.3.1.

Yes, you may be right. I need to downgrade to 0.3.1.

On 1.0.0, I have already started testing all of the following hyperparameter combinations for the DOMINANT model on the inj_cora dataset.

dropout = [0.0, 0.1, 0.3, 0.5, 0.7]
lr = [0.004, 0.01, 0.05, 0.10] # lr brings change on AUC
weight_decay = [0.0, 0.01]
hid_dim = [8, 12, 16, 32, 48, 64, 128, 256, 512, 1024]
weight = [None, 0.5]

But I did not find any AUC >= 0.80 on the inj_cora dataset; the maximum is ~0.7691.
I have not checked other models yet.
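For reference, the sweep described above can be sketched as a plain grid enumeration; fitting DOMINANT on inj_cora for each configuration is left as a placeholder, since that part depends on the installed PyGOD version:

```python
from itertools import product

# Hyperparameter grid from the comment above
dropout = [0.0, 0.1, 0.3, 0.5, 0.7]
lr = [0.004, 0.01, 0.05, 0.10]
weight_decay = [0.0, 0.01]
hid_dim = [8, 12, 16, 32, 48, 64, 128, 256, 512, 1024]
weight = [None, 0.5]

# Every combination tried in the sweep: 5 * 4 * 2 * 10 * 2 = 800 configs
combos = list(product(dropout, lr, weight_decay, hid_dim, weight))
print(len(combos))  # 800
```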

You can check on benchmark/results folder on ParthaPratimBanik/update_bm_utils repo.
testing code: benchmark/test_dominant.py

My goal is to find a generalized hyperparameter range for every model on every dataset, one that would be version-independent. I am not sure whether that is feasible; it is still quite uncertain to me.

What do you think? Will it be effective? Should I continue or stop this testing?

Sorry, for this long comment.

@ParthaPratimBanik
Contributor

Due to my typo wight (instead of weight) in PR #86, commit 63ae71a, the DOMINANT result did not improve; it stayed within 0.75-0.77.

After fixing the typo, I now get a maximum of ~0.8367 and a maximum average of ~0.78 on v1.0.0.

@kayzliu
Member

kayzliu commented Aug 17, 2023

Fixed in #86.

@kayzliu kayzliu closed this as completed Aug 17, 2023

3 participants