kindly need help to for several reproduction results #24

Open
ShuaiWang97 opened this issue Dec 5, 2022 Discussed in #23 · 3 comments

@ShuaiWang97

Discussed in #23

Originally posted by ShuaiWang97 December 5, 2022
To the community,

Hope you had a great weekend.
Thank you so much for building this package! I am quite interested in hypergraphs and have learned a lot from the tutorials and source code. I tried to use the methods and datasets from the package to reproduce several results. The performance on the co-authorship datasets seems good, but the performance on the co-citation datasets seems a bit low. I checked the implementation several times but did not find any problem. Could anyone please help me?

The node classification accuracy of HGNN, HyperGCN, and HGNN+ on several co-citation datasets (CocitationCora, CocitationCiteseer, CocitationPubmed) is shown below, and the code is attached. To switch datasets and methods, I only change the data and net variables. Any ideas are very welcome. Thanks in advance.

[image: accuracy results]

import time
from copy import deepcopy

import torch
import torch.optim as optim
import torch.nn.functional as F

from dhg import Hypergraph,Graph
from dhg.data import Cooking200, CoauthorshipCora,CocitationCora,CocitationCiteseer,CoauthorshipDBLP, CocitationPubmed,\
                     Citeseer,Cora,Pubmed
from dhg.models import HGNN, HyperGCN, HGNNP
from dhg.random import set_seed
from dhg.metrics import HypergraphVertexClassificationEvaluator as Evaluator

# from data import data  # unused here: data is reassigned below
#from config import config



def train(net, X, A, lbls, train_idx, optimizer, epoch):
    net.train()

    st = time.time()
    optimizer.zero_grad()

    # forward pass on the vertex features X and the hypergraph structure A
    outs = net(X, A)
    outs, lbls = outs[train_idx], lbls[train_idx]
    loss = F.cross_entropy(outs, lbls)
    #loss = F.nll_loss(outs, lbls) # decrease performance a lot
    loss.backward()
    optimizer.step()
    print(f"Epoch: {epoch}, Time: {time.time()-st:.5f}s, Loss: {loss.item():.5f}")
    return loss.item()


@torch.no_grad()
def infer(net, X, A, lbls, idx, test=False):
    net.eval()
    outs = net(X, A)
    outs, lbls = outs[idx], lbls[idx]
    if not test:
        res = evaluator.validate(lbls, outs)
    else:
        res = evaluator.test(lbls, outs)
    return res



if __name__ == "__main__":
    set_seed(2021)
    #args = config.parse()
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
    evaluator = Evaluator(["accuracy", "f1_score", {"f1_score": {"average": "micro"}}])
    
    # Load dataset of CocitationCiteseer, CocitationCora, CocitationPubmed
    #data = CocitationCora()
    data = CocitationCiteseer()

    # Build the hypergraph dataloader
    X, lbl = data["features"], data["labels"]
    HG = Hypergraph(data["num_vertices"], data["edge_list"])

    #net = HGNNP(data["dim_features"], 16, data["num_classes"], use_bn=False)
    net = HGNN(data["dim_features"], 16, data["num_classes"], use_bn=False)

    print("net is: ", net)
    optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.0005)

    train_mask = data["train_mask"]
    val_mask = data["val_mask"]
    test_mask = data["test_mask"]
    print(f"length of train is : {sum(train_mask)}, length of val is: {sum(val_mask)},length of test is: {sum(test_mask)}")

    X, lbl = X.to(device), lbl.to(device)
    HG = HG.to(device)
    net = net.to(device)

    best_state = None
    best_epoch, best_val = 0, 0
    for epoch in range(200):
        # train
        train(net, X, HG, lbl, train_mask, optimizer, epoch)
        # validation
        if epoch % 10 == 0:
            with torch.no_grad():
                val_res = infer(net, X, HG, lbl, val_mask)
                print("val acc is: ",infer(net, X, HG, lbl, val_mask,test=True)["accuracy"])

                print("val_res is: ",val_res)
            if val_res > best_val:
                print(f"update best: {val_res:.5f}")
                best_epoch = epoch
                best_val = val_res
                best_state = deepcopy(net.state_dict())
    print("\ntrain finished!")
    print(f"best val: {best_val:.5f}")
    # test
    print("test...")
    net.load_state_dict(best_state)
    res = infer(net, X, HG, lbl, test_mask, test=True)
    print(f"final result: epoch: {best_epoch}")
    print(res)

Best,
Shuai
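
A side note on the commented-out `F.nll_loss` line in the script above: `F.cross_entropy` applies a log-softmax to its input before taking the negative log-likelihood, whereas `F.nll_loss` expects log-probabilities. Feeding raw logits to `nll_loss` therefore optimizes a different quantity, which would be consistent with the drop in performance noted in the comment. The following is a minimal pure-Python sketch of that relationship; the helper names are illustrative and are not part of PyTorch or DHG.

```python
import math

def log_softmax(logits):
    # numerically stable log-softmax over a list of floats
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def nll(log_probs, target):
    # negative log-likelihood of the target class
    return -log_probs[target]

def cross_entropy(logits, target):
    # cross-entropy = log-softmax followed by negative log-likelihood
    return nll(log_softmax(logits), target)

logits = [2.0, 0.5, -1.0]
target = 0
ce = cross_entropy(logits, target)   # proper loss, always >= 0
wrong = nll(logits, target)          # raw logits treated as log-probabilities
print(ce, wrong)
```

Here `wrong` is simply the negated logit of the target class, so it can be negative and is unbounded below, unlike the cross-entropy.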

@yifanfeng97
Member

Thanks for your attention. I will try to debug it!

@ShuaiWang97
Author

Thank you for the prompt response!
Most of the code structure comes from the HGNN node classification example, such as def train, def infer, and if __name__ == "__main__". I also believe the co-citation datasets are the same as the ones used in the HyperGCN repo. I only changed the net and data variables.
Please let me know if I can provide more information. Thanks in advance!

@bokveizen

bokveizen commented Mar 19, 2024

As far as I can see, some nodes never appear in the edge list. I am not sure whether that is intended. For example, in CocitationPubmed, the number of nodes is 19717, but only 3840 nodes appear in the edge list.

I think we need to reorder the vertices to consecutive integers.

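from dhg.data import (
    CoauthorshipCora, CoauthorshipDBLP,
    CocitationCora, CocitationCiteseer, CocitationPubmed,
)

data_list = [
    (CoauthorshipCora, "cora_coauth"),
    (CoauthorshipDBLP, "dblp_coauth"),
    (CocitationCora, "cora_cocite"),
    (CocitationCiteseer, "citeseer_cocite"),
    (CocitationPubmed, "pubmed_cocite"),
]

for data_func, data_name in data_list:
    data = data_func()  # load each dataset once
    # collect every vertex that appears in at least one hyperedge
    nodes_in_edges = set()
    for edge in data["edge_list"]:
        nodes_in_edges.update(edge)
    print(data_name, len(nodes_in_edges), data["num_vertices"])

cora_coauth 2388 2708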
dblp_coauth 41302 41302
cora_cocite 1434 2708
citeseer_cocite 1458 3312
pubmed_cocite 3840 19717
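
The reindexing suggested above could be sketched as follows. This is only an illustration in plain Python: `compact_vertices` is a hypothetical helper, not part of DHG, and it assumes the edge list is an iterable of vertex-id tuples, as in the loop above.

```python
def compact_vertices(edge_list):
    """Relabel the vertices that occur in edge_list as 0..k-1.

    Returns the relabeled edges and the sorted list of original ids,
    so that kept[new_id] gives the original vertex id.
    """
    kept = sorted({v for edge in edge_list for v in edge})
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_edges = [tuple(old_to_new[v] for v in edge) for edge in edge_list]
    return new_edges, kept

# toy example: vertices 1-4, 6 and 8 never appear in any hyperedge
edges = [(0, 5, 9), (5, 7)]
new_edges, kept = compact_vertices(edges)
print(new_edges)  # [(0, 1, 3), (1, 2)]
print(kept)       # [0, 5, 7, 9]
```

The `kept` list could then be used to select the matching rows of the features, labels, and masks before building a hypergraph with `len(kept)` vertices. Note that this drops the isolated vertices from the task entirely, which changes the evaluation set, so whether it is the right fix depends on how the datasets are meant to be used.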

Also, for CocitationPubmed, the training set is strangely small, while the val and test sets are identical.

import torch
from dhg.data import CocitationPubmed

pubmed = CocitationPubmed()  # load the dataset once
print(pubmed["train_mask"].sum())
print(pubmed["val_mask"].sum())
print(pubmed["test_mask"].sum())
print(torch.all(pubmed["val_mask"] == pubmed["test_mask"]))

tensor(78)
tensor(19639)
tensor(19639)
tensor(True)
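
A more general version of this check, reporting the size of each split and the overlap between every pair of splits, might look like the sketch below. `split_report` is a hypothetical helper written in plain Python; it assumes the masks are equal-length sequences of booleans (for tensor masks, `mask.tolist()` should give a suitable list).

```python
def split_report(train_mask, val_mask, test_mask):
    """Return split sizes and pairwise overlap counts for boolean masks."""
    masks = {
        "train": [bool(m) for m in train_mask],
        "val": [bool(m) for m in val_mask],
        "test": [bool(m) for m in test_mask],
    }
    sizes = {name: sum(mask) for name, mask in masks.items()}
    names = list(masks)
    overlaps = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # count positions selected by both masks
            overlaps[(a, b)] = sum(
                1 for x, y in zip(masks[a], masks[b]) if x and y
            )
    return sizes, overlaps

# toy masks mimicking the pattern above: tiny train set, identical val/test
train = [True, False, False, False]
val = [False, True, True, True]
test = [False, True, True, True]
sizes, overlaps = split_report(train, val, test)
print(sizes)     # {'train': 1, 'val': 3, 'test': 3}
print(overlaps)  # {('train', 'val'): 0, ('train', 'test'): 0, ('val', 'test'): 3}
```

An overlap equal to the size of both splits, as for `val` and `test` here, means the two masks select exactly the same vertices.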
