<a href="https://colab.research.google.com/github/fani-lab/OpeNTF/blob/main/ipynb/gnn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

`OpeNTF-GNN` via `PyG`

`OpeNTF` previously used traditional, non-graph-based embedding methods like `doc2vec` to learn skill embeddings as an alternative input to the `1-hot` encoded skills. With graph neural networks (gnn) in `PyG`, it now integrates graph-based skill embeddings. The gnns capture the synergistic collaborative ties within the transformed graph data to provide better embeddings for skills, or even to recommend experts for a team directly via link prediction.

**Expert (Member) Graph Structures**

<p align="center"><img src='https://raw.githubusercontent.com/fani-lab/OpeNTF/refs/heads/main/docs/graph_structures.png' width="400" ></p>

`OpeNTF` with gnn aims to cover as many variations in graph structure as possible for a given set of team instances. Currently, it implements `heterogeneous`, `directed`, `unweighted` graph structures, including the `[[[skill, to, member]], sm]` bipartite, the `[[[skill, to, team], [member, to, team]], stm]` tripartite, and the `[[[skill, to, team], [member, to, team], [loc, to, team]], stml]` structures, as seen in the figure. The structure can be set like:

`"+data.embedding.model.gnn.graph.structure=[[[skill, to, team], [member, to, team], [loc, to, team]], stml]"`

(see [`src/mdl/emb/__config__.yaml`](https://github.com/fani-lab/OpeNTF/blob/main/src/mdl/emb/__config__.yaml#L27) for more details)
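
To make the structure setting concrete, the following is a minimal sketch, not `OpeNTF`'s actual implementation, of how a list of `[source, to, target]` edge types could be expanded into directed, unweighted edge lists. The toy `teams` data and the `build_edges` helper are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical toy teams: each team lists its skills, members, and location.
teams = [
    {"skill": ["s0", "s1"], "member": ["m0", "m1"], "loc": ["l0"]},
    {"skill": ["s1", "s2"], "member": ["m1"], "loc": ["l0"]},
]

# Same shape as the `stml` setting: a list of edge types plus a short name.
structure = (
    [("skill", "to", "team"), ("member", "to", "team"), ("loc", "to", "team")],
    "stml",
)


def build_edges(teams, structure):
    """Return ({edge_type: [(src_id, dst_id), ...]}, {node_type: {label: id}})."""
    edge_types, _name = structure
    index = defaultdict(dict)  # node type -> {node label: integer id}

    def node_id(node_type, label):
        # Assign the next free integer id the first time a label is seen.
        return index[node_type].setdefault(label, len(index[node_type]))

    edges = {edge_type: [] for edge_type in edge_types}
    for team_idx, team in enumerate(teams):
        # The team itself is a node of type `team`.
        labels = dict(team, team=[f"t{team_idx}"])
        for src, rel, dst in edge_types:
            for s in labels[src]:
                for d in labels[dst]:
                    edges[(src, rel, dst)].append((node_id(src, s), node_id(dst, d)))
    return edges, {node_type: dict(ids) for node_type, ids in index.items()}


edges, index = build_edges(teams, structure)
for edge_type, pairs in edges.items():
    print(edge_type, pairs)
```

In `PyG`, per-type edge lists of this kind are typically stored as the `edge_index` of each edge type in a `HeteroData` object. The same helper also covers the `sm` bipartite case by passing `([("skill", "to", "member")], "sm")`.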






**Transfer vs. End-to-End Learning with GNN**

Gnn methods on an expert graph can be used in either of the following ways:

<p align="center"><img src='https://raw.githubusercontent.com/fani-lab/OpeNTF/refs/heads/main/docs/transfer.png' width="500" ></p>

1.  **Transfer-based [[WISE24](https://doi.org/10.1007/978-981-96-0567-5_15), [SIGIR21](https://doi.org/10.1145/3404835.3463105)]**: A gnn method is trained mainly to learn `skill` embeddings, overlooking the embeddings of other node types. The skill embeddings are then fed (transferred) into an underlying multilabel classifier, e.g., a non-variational feedforward neural net ([`src/mdl/fnn.py`](https://github.com/fani-lab/OpeNTF/blob/main/src/mdl/fnn.py)) or a variational Bayesian neural net ([`src/mdl/bnn.py`](https://github.com/fani-lab/OpeNTF/blob/main/src/mdl/bnn.py)). In this case, `OpeNTF` runs in embedding mode by setting `data.embedding.class_method` like

    `data.embedding.class_method=mdl.emb.gnn.Gnn_n2v` for [Node2Vec](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.Node2Vec.html)
    `data.embedding.class_method=mdl.emb.gnn.Gnn_m2v` for [MetaPath2Vec](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.MetaPath2Vec.html)
    `data.embedding.class_method=mdl.emb.gnn.Gnn_gs` for [GraphSAGE](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.conv.SAGEConv.html)


    (see [`src/__config__.yaml#L44`](https://github.com/fani-lab/OpeNTF/blob/main/src/__config__.yaml#L44) for more options)


    and the classifier model(s) are set by `models.instances` like

    `"models.instances=[mdl.fnn.Fnn, mdl.bnn.Bnn]"`

    (see [`src/__config__.yaml#L57`](https://github.com/fani-lab/OpeNTF/blob/main/src/__config__.yaml#L57) for more options)
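
As a rough illustration of the transfer step, the sketch below contrasts a `1-hot` skill input with a dense input built from learned skill embeddings. Mean pooling over a team's skills is an assumption made here for simplicity; the embedding values and the `one_hot_input` and `embedding_input` helpers are hypothetical and do not mirror `OpeNTF`'s code.

```python
# Hypothetical skill embeddings, as a gnn might produce (values are made up).
skill_emb = {
    "s0": [0.2, 0.1, 0.0],
    "s1": [0.4, 0.3, 0.2],
    "s2": [0.0, 0.5, 0.4],
}
skills = sorted(skill_emb)  # fixed skill order for the 1-hot encoding


def one_hot_input(team_skills):
    """Sparse classifier input: one dimension per skill in the vocabulary."""
    return [1.0 if s in team_skills else 0.0 for s in skills]


def embedding_input(team_skills):
    """Dense classifier input: mean of the required skills' embeddings (assumed pooling)."""
    vectors = [skill_emb[s] for s in team_skills]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


required = ["s0", "s1"]
print(one_hot_input(required))    # length grows with the number of skills
print(embedding_input(required))  # length equals the embedding dimension
```

Either vector would then be the input to the multilabel classifier, whose output has one dimension per candidate expert.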




In [None]:
!python main.py "cmd=[prep,train,test,eval]" \
                "models.instances=[mdl.fnn.Fnn, mdl.bnn.Bnn]" \
                data.domain=cmn.publication.Publication \
                data.source=../data/dblp/toy.dblp.v12.json \
                data.output=../output/dblp/toy.dblp.v12.json \
                ~data.filter \
                data.embedding.class_method=mdl.emb.gnn.Gnn_gs \
                "+data.embedding.model.gnn.graph.structure=[[[skill, to, team], [member, to, team], [loc, to, team]], stml]"

<p align="center"><img src='https://raw.githubusercontent.com/fani-lab/OpeNTF/refs/heads/main/docs/e2e.png' width="500" ></p>

2.   **End-to-End [[WSDM26, Under Review](https://)]**: A gnn method directly predicts expert-team links to recommend the top-k expert members of a team, skipping the underlying multilabel classifier, as shown above. In this case, `OpeNTF` still runs in embedding mode by setting `data.embedding.class_method` as in the transfer-based approach, but the classifier model is fixed to `"models.instances=[mdl.emb.gnn.Gnn]"`
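
The idea of recommending experts via link prediction can be sketched as scoring every candidate member against a team representation and keeping the top-k. Scoring by dot product is an assumption made for this illustration, and the embeddings and the `recommend_top_k` helper are hypothetical; the decoder used in `OpeNTF` may differ.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend_top_k(team_vec, member_emb, k):
    """Rank members by their link score with the team and return the k best."""
    scores = {member: dot(team_vec, vec) for member, vec in member_emb.items()}
    # Sort by descending score; break ties by member label for determinism.
    return sorted(scores, key=lambda member: (-scores[member], member))[:k]


# Hypothetical embeddings, as a trained gnn might produce (values are made up).
member_emb = {"m0": [1.0, 0.0], "m1": [0.5, 0.5], "m2": [0.0, 1.0]}
team_vec = [0.9, 0.1]

top2 = recommend_top_k(team_vec, member_emb, k=2)
print(top2)
```

No separate classifier is trained in this setting: the ranking comes straight from the node representations.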



In [None]:
!python main.py "cmd=[prep,train,test,eval]" \
                "models.instances=[mdl.emb.gnn.Gnn]" \
                data.domain=cmn.publication.Publication \
                data.source=../data/dblp/toy.dblp.v12.json \
                data.output=../output/dblp/toy.dblp.v12.json \
                ~data.filter \
                data.embedding.class_method=mdl.emb.gnn.Gnn_gs \
                "+data.embedding.model.gnn.graph.structure=[[[skill, to, team], [member, to, team], [loc, to, team]], stml]"

**Hyperparameters**

`OpeNTF`'s codebase collects gnn's hyperparameters ...

We employ a `mini-batching` strategy that extracts smaller subgraphs as batches to accommodate large-scale datasets. The subgraphs are sampled based on each node's surrounding neighborhood.
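
A minimal sketch of neighborhood-based mini-batching follows, written in plain Python so it runs without `PyG`. The toy adjacency list, the per-hop fan-out values, and the `sample_subgraph` and `mini_batches` helpers are all hypothetical; `PyG` ships loaders for this purpose (e.g., `NeighborLoader`), and which one `OpeNTF` uses is not stated here.

```python
import random


def sample_subgraph(adj, seeds, num_neighbors, rng):
    """Sample a subgraph around `seeds`, hop by hop.

    adj: {node: [neighbors]}; num_neighbors: max neighbors sampled per node at each hop.
    """
    nodes, edges = set(seeds), set()
    frontier = list(seeds)
    for fanout in num_neighbors:
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            for nb in rng.sample(neighbors, min(fanout, len(neighbors))):
                edges.add((nb, node))  # messages flow from neighbor to node
                if nb not in nodes:
                    nodes.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return nodes, edges


def mini_batches(adj, batch_size, num_neighbors, seed=0):
    """Yield (seed nodes, sampled subgraph) pairs that together cover every node."""
    rng = random.Random(seed)
    order = sorted(adj)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        seeds = order[start:start + batch_size]
        yield seeds, sample_subgraph(adj, seeds, num_neighbors, rng)


# Hypothetical toy graph over skills (s), members (m), and teams (t).
adj = {
    "t0": ["s0", "s1", "m0", "m1"], "t1": ["s1", "s2", "m1"],
    "s0": ["t0"], "s1": ["t0", "t1"], "s2": ["t1"],
    "m0": ["t0"], "m1": ["t0", "t1"],
}

for seeds, (nodes, edges) in mini_batches(adj, batch_size=2, num_neighbors=[2, 1]):
    print(seeds, len(nodes), len(edges))
```

Capping the number of sampled neighbors per hop bounds the size of each batch regardless of how large or dense the full graph is.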


**Setup & Quickstart**



from quickstart script

The embedding generation pipeline consists of the models `d2v` (Doc2Vec), `m2v` (MetaPath2Vec), `gs` (GraphSAGE), `gat` (GraphAttention), `gatv2` (GraphAttentionV2), `han` (Heterogeneous Attention Network), `gin` (Graph Isomorphism Network), and `gine` (GIN-Edge feature enhanced).



# Additional Resources

- [`WSDM` paper](https://)
- [`WISE` paper](https://)
- [`Radin sigir` paper](https://)
- [`Sagar ` paper](https://)
