
Memory leak for dataloader #20

Open
jumxglhf opened this issue Apr 23, 2022 · 1 comment

Comments


jumxglhf commented Apr 23, 2022

The amount of RAM the program uses grows steadily until the machine runs out of memory. I'm running this code with a multi-processing setup (multiple DataLoader workers), and I don't know whether the problem also occurs with single-process loading.

After a good while of investigation, I found that the problem comes from the `__getitem__` method of the custom Dataset. I changed it to the following:

```python
def __getitem__(self, index):
    return {
        'index': index,
        'question': self.question_prefix + " " + self.data[index]['question'],
        'target': self.get_target(self.data[index]),
        'passages': [self.f.format(c['title'], c['text']) for c in self.data[index]['ctxs'][:self.n_context]],
        'scores': torch.tensor([float(c['score']) for c in self.data[index]['ctxs'][:self.n_context]]),
        'graph': self.data[index]['graph'],
        'node_indices': self.data[index]['node_indices']
    }
```

and the memory issue is gone for me. It seems that as soon as I define any local variable inside this method, RAM usage eventually blows up.

However, this change may disable some functionality of the original code and may cause certain corner cases to crash the training loop. Any other suggestions?

Thanks in advance!
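For context on the leak described above: this symptom is consistent with the well-known copy-on-access behavior of forked DataLoader workers (tracked upstream in pytorch/pytorch#13246). Merely reading a Python object bumps its refcount, which dirties the memory page it lives on, so each worker gradually copies the entire dataset even though nothing is logically mutated. A common workaround is to store the examples as serialized bytes inside a single numpy buffer, which holds no per-item Python objects. The sketch below is illustrative, not code from this repository; the class and variable names are made up.

```python
import pickle
import numpy as np

class PickledDataset:
    """Holds all examples as one flat uint8 numpy buffer plus an offset
    array, so forked DataLoader workers touch no per-example Python
    objects and copy-on-write pages stay shared."""

    def __init__(self, examples):
        # Serialize each example once, up front.
        blobs = [pickle.dumps(ex) for ex in examples]
        # Cumulative end-offsets of each blob inside the flat buffer.
        self.addr = np.cumsum([len(b) for b in blobs])
        # One contiguous, object-free byte buffer for all examples.
        self.data = np.frombuffer(b"".join(blobs), dtype=np.uint8)

    def __len__(self):
        return len(self.addr)

    def __getitem__(self, index):
        # Slice out this example's bytes and deserialize on demand.
        start = 0 if index == 0 else self.addr[index - 1]
        end = self.addr[index]
        return pickle.loads(self.data[start:end].tobytes())
```

A class like this can be passed directly to `torch.utils.data.DataLoader` (it only needs `__len__` and `__getitem__`); deserialization happens per item inside each worker, so resident memory stays flat instead of growing with access count.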


jdf-prog commented Sep 26, 2022

I think you might have missed some logic when rewriting the code to eliminate the local variable.
The following code works fine for me. Hope this helps.

```python
def __getitem__(self, index):
    return {
        'index': index,
        'question': self.question_prefix + " " + self.data[index]['question'],
        'target': self.get_target(self.data[index]),
        'passages': [(self.title_prefix + " {} " + self.passage_prefix + " {}").format(c['title'], c['text'])
                     for c in self.data[index]['ctxs'][:self.n_context]]
                    if ('ctxs' in self.data[index] and self.n_context is not None) else None,
        'scores': torch.tensor([float(c['score']) for c in self.data[index]['ctxs'][:self.n_context]])
                  if ('ctxs' in self.data[index] and self.n_context is not None) else None,
    }
```
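To make the `passages` logic above easier to follow in isolation, here is the same formatting pulled out into a standalone function. This is only an illustration of the snippet's behavior; `format_passages` and the example prefix strings are hypothetical, not part of the repository's API.

```python
def format_passages(example, title_prefix, passage_prefix, n_context):
    """Build the 'passages' field: one formatted string per context,
    or None when contexts are absent or n_context is unset."""
    if 'ctxs' not in example or n_context is None:
        return None
    # Same template as the inline version: "<title_prefix> {} <passage_prefix> {}"
    template = title_prefix + " {} " + passage_prefix + " {}"
    return [template.format(c['title'], c['text'])
            for c in example['ctxs'][:n_context]]
```

Guarding on `'ctxs' in example` and `n_context is not None` is what the one-liner's trailing conditional does; without it, examples lacking retrieved contexts would raise a `KeyError` inside the list comprehension.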
