You're absolutely right: from the `edge_index` dimensions in the printed batch alone, it is ambiguous whether it contains only within-batch edges or all edges.

To confirm what is happening, we need to check the actual code implementation of the `LinkNeighborLoader`.

Looking at the [source code](https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/loader/link_neighbor_loader.py#L142), the logic can be paraphrased in simplified form (the snippets below are illustrative, not verbatim library code). It extracts the edge indices per edge type from the original graph:

```python
subgraph = Subgraph(graph=data, edge_types=edge_types, ...)

input_edge_index, input_edge_attr = subgraph.edge_label_index[edge_label_index]
output_edge_index = subgraph.edge_label[edge_label_index[0]]
```

It then keeps only the indices that correspond to nodes present in the sampled batch:

```python
_, input_node_idx, output_node_idx = subgraph(
    subset_n_id,
    relabel_nodes=True,
    input_nodes=input_node_idx, 
    output_nodes=output_node_idx)
   
input_edge_index = input_edge_index[:, input_node_idx]
output_edge_index = output_edge_index[:, output_node_idx]
```

So we can conclude that each batch's `edge_index` from `LinkNeighborLoader` contains only the edges of the sampled subgraph, with node indices relabeled to the batch, not the edges of the full graph.

This allows efficient and scalable batch-wise message passing during training.
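
To make the idea of "within-batch edges with relabeled indices" concrete, here is a minimal pure-Python sketch. The function name `relabel_within_batch` and the toy data are hypothetical, and this is not how PyG implements its sampler; it only illustrates keeping edges whose endpoints are both in the batch and mapping global node IDs to batch-local positions.

```python
def relabel_within_batch(edges, batch_nodes):
    """Keep edges with both endpoints in the batch and relabel them locally.

    edges: list of (src, dst) pairs using global node IDs.
    batch_nodes: list of global node IDs present in the batch.
    """
    # Map each global node ID to its position inside the batch.
    local_id = {node: i for i, node in enumerate(batch_nodes)}
    return [
        (local_id[u], local_id[v])
        for u, v in edges
        if u in local_id and v in local_id
    ]


# Toy graph with global node IDs (hypothetical data).
full_edges = [(0, 5), (5, 9), (9, 2), (2, 7), (7, 0)]
batch_nodes = [5, 9, 2]

local_edges = relabel_within_batch(full_edges, batch_nodes)
print(local_edges)  # [(0, 1), (1, 2)]
```

Only two of the five edges survive, and their indices now refer to positions in `batch_nodes`, which is why the `edge_index` values in a batch are small even when the full graph is large.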

Let me know if this helps explain what's happening under the hood! Analyzing the source code is often necessary to understand the full picture.

```python
# Log epoch metrics
# print("Epoch Number:", epoch)
# print("Epoch time:", time.time() - start_time)
# print("Train loss:", train_loss)
# print("Train accuracy:", train_accuracy)
# print("Train precision:", train_precision)
# print("Train recall:", train_recall)
# print("Train f1:", train_f1)
# print("Train roc_auc:", train_roc_auc)
# print("Train pr_auc:", train_pr_auc)
# print("Train average_precision:", train_average_precision)

# print("Val loss:", val_loss)
# print("Val accuracy:", val_accuracy)
# print("Val precision:", val_precision)
# print("Val recall:", val_recall)
# print("Val f1:", val_f1)
# print("Val roc_auc:", val_roc_auc)
# print("Val pr_auc:", val_pr_auc)
# print("Val average_precision:", val_average_precision)
# print("Best val f1:", best_val_metrics)
# print()

# Log metrics in tqdm progress bar with continuous updates
# tqdm.tqdm.write("Epoch Number: {}".format(epoch))
# tqdm.tqdm.write("Learning rate: {}".format(lr))
# tqdm.tqdm.write("Epoch time: {}".format(time.time() - start_time))
# tqdm.tqdm.write("Train loss: {}".format(train_loss))
# tqdm.tqdm.write("Train accuracy: {}".format(train_accuracy))
# tqdm.tqdm.write("Train precision: {}".format(train_precision))
# tqdm.tqdm.write("Train recall: {}".format(train_recall))
# tqdm.tqdm.write("Train f1: {}".format(train_f1))
# tqdm.tqdm.write("Train roc_auc: {}".format(train_roc_auc))
# tqdm.tqdm.write("Train pr_auc: {}".format(train_pr_auc))
# tqdm.tqdm.write("Train average_precision: {}".format(train_average_precision))

# tqdm.tqdm.write("Val loss: {}".format(val_loss))
# tqdm.tqdm.write("Val accuracy: {}".format(val_accuracy))
# tqdm.tqdm.write("Val precision: {}".format(val_precision))
# tqdm.tqdm.write("Val recall: {}".format(val_recall))
# tqdm.tqdm.write("Val f1: {}".format(val_f1))
# tqdm.tqdm.write("Val roc_auc: {}".format(val_roc_auc))
# tqdm.tqdm.write("Val pr_auc: {}".format(val_pr_auc))
# tqdm.tqdm.write("Val average_precision: {}".format(val_average_precision))

# tqdm.tqdm.write("Best val f1: {}".format(best_val_metrics))

# Update tqdm progress bars with values
```

Good catch! Even with `disjoint=False` (the default value), `LinkNeighborLoader` can still produce duplicated edges in the sampled graph.

This comes from the way sampling works in `LinkNeighborLoader`:

1. It first samples a batch of edges; their endpoint nodes act as the seed ("center") nodes.
2. It then samples neighbors of those center nodes.
3. The sampled neighborhoods overlap whenever two center nodes share the same neighbor.


For example:

```
Batch nodes:    [A, B]
Neighbors of A: [C, D]
Neighbors of B: [E, C]

Edges: A-C, A-D, B-E, B-C
```
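Node C is a neighbor of both A and B (D is a neighbor of A only), so C is reached through two edges, C-A and C-B. These two edges are distinct but share an endpoint; an edge is a true duplicate only if the same node pair is sampled more than once.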

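Setting `disjoint=True` changes how the neighborhoods are combined: according to the PyG documentation, each seed then creates its own disjoint subgraph, and the mini-batch carries a `batch` vector that maps every node to its subgraph. It does not restrict sampling to neighbors that are not already center nodes.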
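
As a rough illustration of the A/B example, the pure-Python sketch below contrasts a single merged subgraph with one subgraph per seed, which is how the PyG documentation describes `disjoint=True`. The functions `merged_sample` and `disjoint_sample` are hypothetical names for this toy model; they do not reproduce PyG's sampler and take every listed neighbor instead of sampling randomly.

```python
# Toy adjacency from the example above (hypothetical data).
neighbors = {"A": ["C", "D"], "B": ["E", "C"]}
seeds = ["A", "B"]


def merged_sample(seeds, neighbors):
    """One shared subgraph: a node reached from several seeds is stored once."""
    nodes = list(seeds)
    edges = []
    for seed in seeds:
        for nbr in neighbors[seed]:
            if nbr not in nodes:
                nodes.append(nbr)
            edges.append((seed, nbr))
    return nodes, edges


def disjoint_sample(seeds, neighbors):
    """One subgraph per seed: shared neighbors are stored once per subgraph."""
    nodes, batch, edges = [], [], []
    for sub_id, seed in enumerate(seeds):
        for node in [seed] + neighbors[seed]:
            nodes.append(node)
            batch.append(sub_id)  # which subgraph this node copy belongs to
        edges.extend((sub_id, seed, nbr) for nbr in neighbors[seed])
    return nodes, batch, edges


merged_nodes, merged_edges = merged_sample(seeds, neighbors)
dis_nodes, dis_batch, dis_edges = disjoint_sample(seeds, neighbors)

print(merged_nodes)  # ['A', 'B', 'C', 'D', 'E']
print(dis_nodes)     # ['A', 'C', 'D', 'B', 'E', 'C']
print(dis_batch)     # [0, 0, 0, 1, 1, 1]
```

In the merged case C appears once and is shared by A and B; in the per-seed case C appears twice, once in each subgraph, and the `batch` list tells the two copies apart.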

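Some ways to avoid or remove duplicated edges:

- Set `disjoint=True` in `LinkNeighborLoader` so that each seed gets its own subgraph.
- Use `NeighborSampler` with `return_e_id=True`, which returns the original edge IDs so that repeated edges can be identified.
- Post-process the batch to filter duplicate edges.

So in summary, with the default `disjoint=False` the neighborhoods of different center nodes are merged into one sampled subgraph and can overlap, while `disjoint=True` gives each seed its own subgraph, so the edge sets of different seeds are kept separate within a batch.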
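
For the post-processing option, a minimal pure-Python sketch of duplicate filtering could look like the following. The helper `drop_duplicate_edges` is a hypothetical name, and the edge list is assumed to be stored as two parallel lists (source row, destination row) in the style of `edge_index`. On real tensors, `torch_geometric.utils.coalesce` serves a similar purpose, though it also sorts the edges.

```python
def drop_duplicate_edges(edge_index):
    """Remove repeated (src, dst) pairs, keeping the first occurrence of each."""
    src, dst = edge_index
    seen = set()
    out_src, out_dst = [], []
    for u, v in zip(src, dst):
        if (u, v) not in seen:
            seen.add((u, v))
            out_src.append(u)
            out_dst.append(v)
    return [out_src, out_dst]


# The pair (0, 1) appears twice in this toy edge list.
edge_index = [[0, 0, 1, 0],
              [1, 2, 2, 1]]

deduped = drop_duplicate_edges(edge_index)
print(deduped)  # [[0, 0, 1], [1, 2, 2]]
```

Note that this treats (u, v) and (v, u) as different edges, which matches a directed `edge_index`; for an undirected graph stored with both directions, both copies are kept.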