
Memory leak for dataloader #20

Open
jumxglhf opened this issue Apr 23, 2022 · 1 comment

Comments


jumxglhf commented Apr 23, 2022

The amount of RAM the program uses grows steadily until the machine runs out of memory. I'm running this code with a multi-processing setup (multiple DataLoader workers), and I don't know whether the problem also occurs with single-process loading.

After a good while of investigation, I found that the problem comes from the `__getitem__` method of the custom Dataset. I changed it to the following:

```python
def __getitem__(self, index):
    return {
        'index': index,
        'question': self.question_prefix + " " + self.data[index]['question'],
        'target': self.get_target(self.data[index]),
        'passages': [self.f.format(c['title'], c['text']) for c in self.data[index]['ctxs'][:self.n_context]],
        'scores': torch.tensor([float(c['score']) for c in self.data[index]['ctxs'][:self.n_context]]),
        'graph': self.data[index]['graph'],
        'node_indices': self.data[index]['node_indices']
    }
```

and the memory issue is gone for me. It seems that as soon as I define any local variable inside this method, RAM usage eventually blows up.

However, this change may disable some functionality of the original code and may cause certain corner cases to crash the training loop. Any other suggestions?

Thanks in advance!
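For context on the leak described above: this symptom is consistent with the well-known copy-on-access behavior of forked DataLoader workers (tracked upstream in pytorch/pytorch#13246). Merely reading a Python object bumps its refcount, which dirties the memory page it lives on, so each worker gradually copies the entire dataset even though nothing is logically mutated. A common workaround is to store the examples as serialized bytes inside a single numpy buffer, which holds no per-item Python objects. The sketch below is illustrative, not code from this repository; the class and variable names are made up.

```python
import pickle
import numpy as np

class PickledDataset:
    """Holds all examples as one flat uint8 numpy buffer plus an offset
    array, so forked DataLoader workers touch no per-example Python
    objects and copy-on-write pages stay shared."""

    def __init__(self, examples):
        # Serialize each example once, up front.
        blobs = [pickle.dumps(ex) for ex in examples]
        # Cumulative end-offsets of each blob inside the flat buffer.
        self.addr = np.cumsum([len(b) for b in blobs])
        # One contiguous, object-free byte buffer for all examples.
        self.data = np.frombuffer(b"".join(blobs), dtype=np.uint8)

    def __len__(self):
        return len(self.addr)

    def __getitem__(self, index):
        # Slice out this example's bytes and deserialize on demand.
        start = 0 if index == 0 else self.addr[index - 1]
        end = self.addr[index]
        return pickle.loads(self.data[start:end].tobytes())
```

A class like this can be passed directly to `torch.utils.data.DataLoader` (it only needs `__len__` and `__getitem__`); deserialization happens per item inside each worker, so resident memory stays flat instead of growing with access count.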


jdf-prog commented Sep 26, 2022

I think you might have missed some logic when rewriting the code to eliminate the local variable.
The following code works fine for me. Hope this helps.

```python
def __getitem__(self, index):
    return {
        'index': index,
        'question': self.question_prefix + " " + self.data[index]['question'],
        'target': self.get_target(self.data[index]),
        'passages': [(self.title_prefix + " {} " + self.passage_prefix + " {}").format(c['title'], c['text'])
                     for c in self.data[index]['ctxs'][:self.n_context]]
                    if ('ctxs' in self.data[index] and self.n_context is not None) else None,
        'scores': torch.tensor([float(c['score']) for c in self.data[index]['ctxs'][:self.n_context]])
                  if ('ctxs' in self.data[index] and self.n_context is not None) else None,
    }
```
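To make the `passages` logic above easier to follow in isolation, here is the same formatting pulled out into a standalone function. This is only an illustration of the snippet's behavior; `format_passages` and the example prefix strings are hypothetical, not part of the repository's API.

```python
def format_passages(example, title_prefix, passage_prefix, n_context):
    """Build the 'passages' field: one formatted string per context,
    or None when contexts are absent or n_context is unset."""
    if 'ctxs' not in example or n_context is None:
        return None
    # Same template as the inline version: "<title_prefix> {} <passage_prefix> {}"
    template = title_prefix + " {} " + passage_prefix + " {}"
    return [template.format(c['title'], c['text'])
            for c in example['ctxs'][:n_context]]
```

Guarding on `'ctxs' in example` and `n_context is not None` is what the one-liner's trailing conditional does; without it, examples lacking retrieved contexts would raise a `KeyError` inside the list comprehension.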
