
How to train the encoder for our own data? (A Knowledge graph and sample query) #16

Open
rd27995 opened this issue Apr 19, 2021 · 5 comments

Comments


rd27995 commented Apr 19, 2021

Hi,

I have a target graph in the form of a directed networkx graph with 14M nodes and 54M edges.
I wanted to know how I can use this target graph, along with a query graph (30 nodes, 33 edges), to train the encoder.

I can only see options for using the built-in datasets in PyTorch Geometric. Is there a simpler way to use my own datasets?

@jessxphil

I have the same question.


sML-90 commented Apr 28, 2021

+1

qema (Collaborator) commented May 8, 2021

Thanks for the question and sorry for the late reply. There is currently no user-facing mechanism for incorporating custom datasets, since things like the train/test split and subgraph sampling need to be defined per dataset; in general, one can create a new DataSource (see common/data.py) to handle a new dataset. Note that a pretrained model (such as the one provided in the repo) may be able to handle testing on new datasets, in which case subgraph_matching/alignment.py can load new graphs to evaluate on.
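For that evaluation route, here is a minimal sketch of preparing a query/target pair, assuming alignment.py can load pickled networkx graphs (check its arguments for the exact loading mechanism; the random graphs and file names below are placeholders, not anything the repo defines):

```python
import pickle

import networkx as nx

# Placeholder graphs standing in for a real query/target pair; substitute
# your own query graph and large target graph here.
query = nx.gnp_random_graph(30, 0.1, seed=0)
target = nx.gnp_random_graph(1000, 0.01, seed=0)

# Serialize both graphs so a separate evaluation script can load them back.
with open("query.pkl", "wb") as f:
    pickle.dump(query, f)
with open("target.pkl", "wb") as f:
    pickle.dump(target, f)
```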

If the goal is to train on new datasets, as a bit of a hack, one could append an "elif" after this line:

dataset = [g for g in nx.graph_atlas_g()[1:] if nx.is_connected(g)]

with a spec for the new dataset:

elif name == 'newdataset': dataset = [list of networkx or pytorch geometric graphs]

and then train with the command-line option --dataset=newdataset-balanced and test with --dataset=newdataset-imbalanced.
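To make the hack a bit more concrete, here is a hedged sketch of a helper that could produce the list assigned to dataset in that new elif branch; the pickle file name and the helper itself are illustrative assumptions, not code that exists in the repo:

```python
import pickle

import networkx as nx

def load_custom_dataset(path="newdataset_graphs.pkl"):
    """Return a list of networkx graphs to assign to `dataset`.

    The pickle file is assumed to contain graphs prepared offline, e.g.
    subgraphs sampled from a large target graph.
    """
    with open(path, "rb") as f:
        graphs = pickle.load(f)

    # Mirror the graph-atlas branch above and keep only connected graphs,
    # falling back to weak connectivity for directed graphs.
    kept = []
    for g in graphs:
        connected = (nx.is_weakly_connected(g) if g.is_directed()
                     else nx.is_connected(g))
        if connected:
            kept.append(g)
    return kept
```

The new branch in common/data.py would then reduce to elif name == 'newdataset': dataset = load_custom_dataset().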

rd27995 (Author) commented May 10, 2021

Thanks @qema, I was able to train the network using my custom dataset; however, I only get around 70% validation accuracy.
Do you have any suggestions for improving the accuracy or fine-tuning the model?
I am using all default model parameters.
The second plot depicts validation metrics.

[Figure: training metrics, 500 samples of 300 nodes each]

[Figure: validation results over 100 epochs, 500 samples of 300 nodes each]

qema (Collaborator) commented Jun 12, 2021

Hi @rd27995, please see the new experimental branch which supports node features and harder negative sampling. For now, the above procedure to add new datasets is still needed. However, one can now train with --dataset=newdataset-basis and test with --dataset=newdataset-imbalanced (-basis being the new data source with harder negative examples). Also, note that testing on the imbalanced dataset (which samples random pairs of graphs) may give a more realistic picture of model performance than validation (which uses an artificial 50-50 label split as well as artificially-generated negative examples).
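For anyone wiring up node features on the experimental branch, here is a hedged sketch of attaching a simple per-node feature to a networkx graph before adding it to the custom dataset list; the attribute key "node_feature" and the tensor format are assumptions based on the DeepSNAP conventions the repo builds on, so check the branch for the exact key and dtype it expects:

```python
import networkx as nx
import torch

# Placeholder graph standing in for one graph in the custom dataset list.
g = nx.gnp_random_graph(30, 0.2, seed=0)

# Attach a 1-dimensional feature per node (here, the node degree) under the
# assumed attribute key "node_feature".
for v in g.nodes:
    g.nodes[v]["node_feature"] = torch.tensor([float(g.degree[v])])
```

Training and testing then use the --dataset=newdataset-basis and --dataset=newdataset-imbalanced options described above.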
