Add e2e link prediction example with temporal information for the distributed solution #8820
Description:
The purpose of this PR is to add an end-to-end (e2e) link prediction example that runs distributed training on the MovieLens dataset, which is characterized by the presence of temporal information on its edges.
This example is strongly inspired by distributed_cpu.py and temporal_link_pred.py.
IMPORTANT: This script depends on the "Enable distributed link hetero sampling" PR and will not work without the changes made there, so that PR should be merged first.
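As background, the temporal information on the edges matters because a temporal sampler must only expose neighbors that existed at or before the seed edge's timestamp, preventing leakage from the future. The following is a minimal pure-Python sketch of that constraint for illustration only; the actual example relies on PyG's distributed temporal sampling from the PR mentioned above:

```python
# Illustrative sketch (no PyG dependency) of the constraint enforced by
# temporal link sampling: when gathering neighbors for a seed edge, only
# edges whose timestamp is not later than the seed time are eligible.
def temporal_neighbors(edges, node, seed_time):
    """edges: list of (src, dst, time) tuples.

    Returns the destination neighbors of `node` that are observable at
    `seed_time`, i.e. edges with a timestamp <= seed_time.
    """
    return [dst for src, dst, t in edges if src == node and t <= seed_time]


edges = [(0, 1, 5), (0, 2, 10), (0, 3, 20)]
# The edge (0, 3) at time 20 lies in the future of seed_time=10,
# so it is excluded from the sampled neighborhood.
print(temporal_neighbors(edges, 0, 10))  # [1, 2]
```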
Script information:
How to run:
The example should be evaluated on at least 2 machines. Some preparation is needed before running it: partitions of the MovieLens dataset must be generated with the partition_graph.py script, using the command below:
python partition_graph.py --MovieLens --num_partitions {number of machines}
Example commands to run the script:
Node 0:
python ./distributed_link_temporal_cpu.py --num_nodes 2 --node_rank 0 --batch_size 1024 --master_addr {ip address of one of the machines} --ddp_port 11111 --train_loader_port 11112 --test_loader_port 11113
Node 1:
python ./distributed_link_temporal_cpu.py --num_nodes 2 --node_rank 1 --batch_size 1024 --master_addr {ip address of one of the machines} --ddp_port 11111 --train_loader_port 11112 --test_loader_port 11113
(As you can see, the only difference between the two commands is the --node_rank argument.)
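For reference, the per-node invocations above can be captured with a small argparse sketch. The flag names mirror the commands shown, but this is an illustrative assumption about the script's argument handling, not the PR's actual code:

```python
import argparse


def parse_args(argv=None):
    # Hypothetical argument parser mirroring the CLI flags used in the
    # example commands above; defaults match the ports shown there.
    parser = argparse.ArgumentParser(
        description="Distributed temporal link prediction on MovieLens (sketch)")
    parser.add_argument("--num_nodes", type=int, default=2,
                        help="Total number of machines participating in training")
    parser.add_argument("--node_rank", type=int, required=True,
                        help="Rank of this machine (0..num_nodes-1); the only "
                             "argument that differs between machines")
    parser.add_argument("--batch_size", type=int, default=1024)
    parser.add_argument("--master_addr", type=str, default="localhost",
                        help="IP address of the machine used for rendezvous")
    parser.add_argument("--ddp_port", type=int, default=11111)
    parser.add_argument("--train_loader_port", type=int, default=11112)
    parser.add_argument("--test_loader_port", type=int, default=11113)
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Simulate the Node 0 command line from above:
    args = parse_args(["--num_nodes", "2", "--node_rank", "0",
                       "--batch_size", "1024", "--master_addr", "10.0.0.1"])
    print(args.node_rank)  # 0
```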