Training with a huge dataset class #9227
Unanswered · Charles-Ca asked this question in Q&A · Replies: 1 comment, 2 replies
-
In general this is a valid approach. Two things:
-
Hello,
Thank you for the amazing framework; I really love it. I am currently trying to scale up a GNN on a huge dataset. I have a single graph with approximately 700 million edges and a very large number of nodes, and I am trying to train a GraphSAGE model on it. My graph is heterogeneous with two node types.
The graph can't fit into memory, so I wrote my own dataset class, which works great. My dataset contains approximately 150 subgraphs, each with about 5 million edges. The data is similar to the AMiner dataset (authors write papers), except that my nodes have features, which is why I am trying to use GraphSAGE for a link prediction task.
It should be okay to "break" my graph into pieces like this, which is why I used the dataset class, but maybe I am wrong here as well?
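For illustration, the "dataset of ~150 subgraphs" idea can be sketched with a minimal lazy-loading class. This is a hedged stand-in, not PyG's actual `Dataset` API: the `subgraph_*.pkl` file naming and the pickle storage format are assumptions made for the example; in practice one would subclass `torch_geometric.data.Dataset` and store `Data`/`HeteroData` objects. The key property is the same, though: only one subgraph is ever resident in memory.

```python
import os
import pickle
import tempfile

class SubgraphDataset:
    """Minimal stand-in for a lazy graph dataset: one pre-saved
    subgraph per index, loaded on demand so that only a single
    subgraph (not the full 700M-edge graph) is held in memory.
    File names and the pickle format are illustrative assumptions."""

    def __init__(self, root):
        self.root = root
        # One file per subgraph, e.g. subgraph_0.pkl, subgraph_1.pkl, ...
        self.files = sorted(
            f for f in os.listdir(root) if f.startswith("subgraph_")
        )

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        # Load on demand; nothing is cached, so memory stays bounded.
        with open(os.path.join(self.root, self.files[idx]), "rb") as fh:
            return pickle.load(fh)

# Tiny demonstration with three fake "subgraphs".
with tempfile.TemporaryDirectory() as root:
    for i in range(3):
        with open(os.path.join(root, f"subgraph_{i}.pkl"), "wb") as fh:
            pickle.dump({"num_edges": 5_000_000, "id": i}, fh)
    ds = SubgraphDataset(root)
    print(len(ds), ds[1]["id"])  # → 3 1
```

A real PyG `Dataset` subclass would implement `len()` and `get()` instead of the dunder methods, but the access pattern is the same.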
I'm performing the following operations:
```python
data = Dataset_Large(root=os.path.join(path, 'data'))
graph_loader = DataLoader(data, batch_size=1, shuffle=True)

for epoch in range(1, 15):
    total_loss = total_examples = 0
    for batch_graph in graph_loader:
        ...
```
I am performing a kind of "batch of batches" (iterating over subgraphs, then over mini-batches within each subgraph), and I am wondering whether what I'm doing is correct, or whether it may be inefficient in memory or compute.
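The "batch of batches" accumulation can be sketched as follows. This is a hedged simplification: `train_epoch` and the `(loss, num_examples)` tuples are hypothetical stand-ins for real training steps (with PyG, the inner batches would typically come from a `LinkNeighborLoader` built per subgraph). The point it illustrates is that the epoch loss should be averaged over *examples*, not over subgraphs, so unevenly sized subgraphs are weighted correctly.

```python
def train_epoch(subgraph_batches):
    """Average a loss over nested batches, weighting by example count.

    subgraph_batches: for each subgraph (outer loop), a list of
    (loss, num_examples) pairs for its mini-batches (inner loop).
    These numbers stand in for real forward/backward passes.
    """
    total_loss = total_examples = 0
    for batches in subgraph_batches:          # outer: one subgraph at a time
        for loss, num_examples in batches:    # inner: edge mini-batches
            total_loss += loss * num_examples
            total_examples += num_examples
    return total_loss / total_examples

# Two fake subgraphs with different numbers of mini-batches:
fake = [
    [(0.5, 10), (0.25, 40)],  # subgraph A: 50 examples
    [(1.0, 50)],              # subgraph B: 50 examples
]
print(train_epoch(fake))  # → 0.65
```

Weighting by `num_examples` matters here because the last mini-batch of each subgraph is usually smaller than the rest; a plain mean over batches would bias the epoch loss.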
Thank you for your advice.