How many copies of the neural networks are created during inference? #812

Closed
rmehta1987 opened this issue Feb 24, 2023 · 6 comments

@rmehta1987

I'm trying to figure out how many copies of the neural networks are created during inference, as the memory requirements grow from 6 GB (rounds 1-10) to 24 GB by round 30. I have an embedding net that takes up space, since its input dimension is 100k and its last layer has dimension 1x200, though I am not sure if that is the problem.

@manuelgloeckler
Contributor

manuelgloeckler commented Feb 27, 2023

Hey,

generally, a copy is saved for each round, as you can see here.

Unfortunately, as far as I know, there is no nice way to disable caching of all previous models. But after each round you can always manually set

inference._model_bank = inference._model_bank[-1:]

Note that you need to keep at least the model of the last round in the model bank since, depending on the algorithm, it might be required to compute the loss.
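
In a multi-round loop, that might look roughly like this (just a sketch; simulator, prior, x_o, num_rounds, and num_sims are placeholders for your own setup):

from sbi.inference import SNPE_C

inference = SNPE_C(prior=prior)
proposal = prior
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    # train; the returned density estimator is not needed here
    _ = inference.append_simulations(theta, x, proposal=proposal).train()
    posterior = inference.build_posterior()
    proposal = posterior.set_default_x(x_o)
    # keep only the model of the last round to bound memory growth
    inference._model_bank = inference._model_bank[-1:]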

Kind regards,
Manuel

@rmehta1987
Author

Hi Manuel,

A deepcopy of the density estimator is also done during training (https://github.com/mackelab/sbi/blob/main/sbi/inference/snpe/snpe_base.py#L424). Is this also necessary?

I don't think a deepcopy of the posterior would be costly memory-wise, as it only depends on the dimensionality of the prior (I think).

@manuelgloeckler
Contributor

Hey,

This deepcopy just ensures that any modification of the returned density estimator is not propagated to the _neural_net attribute, which is managed by the NeuralInference class. You do not have to use it and can delete it right away; you can still build the posterior without passing the density estimator, in which case the class just uses the _neural_net attribute. See here, which also does not seem to use a deepcopy.
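
Concretely, something like this should work (a sketch; theta and x are your simulations):

# train, drop the returned (copied) density estimator, and build the
# posterior from the internally stored _neural_net
inference.append_simulations(theta, x).train()
posterior = inference.build_posterior()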

A deepcopy of the posterior is equally (or more) costly, as all of the different posterior classes also hold the _neural_net attribute, now renamed to posterior_estimator.

Kind regards,
Manuel

@rmehta1987
Author

I commented out the line in https://github.com/mackelab/sbi/blob/main/sbi/inference/snpe/snpe_base.py, as it seems to only be used for SNPE-B, which has not yet been implemented.

The other memory usage comes from storing the simulated data after each round. Since each simulation has a large dimension (1x111709), the number of simulations per round times the total number of rounds leads to a large set of tensors holding the dataset. For example, if I create 50 simulations per round and perform 100 rounds, the total memory usage of the dataset will be approximately 2.1 GB. Therefore, if I wanted to increase the number of simulations to improve inference, most of the GPU memory would be taken up by the dataset.
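
A rough back-of-the-envelope check of that figure, assuming float32 data:

rounds, sims_per_round, dim = 100, 50, 111709
bytes_per_value = 4  # float32
total_bytes = rounds * sims_per_round * dim * bytes_per_value
print(total_bytes / 1024**3)  # ~2.1 GiB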

For now I put the data onto the CPU and reload it to the GPU for retraining in later rounds. Any other suggestions would be awesome!

Thank you for answering all the questions!

@manuelgloeckler
Contributor

manuelgloeckler commented Mar 3, 2023

Hey,
it is generally common/recommended to keep the whole dataset on the CPU for high-dimensional data (or even on disk if it does not fit into RAM). In these cases, one typically moves only the current batch of data required to compute the loss to the GPU at each iteration of optimization. The batch size can be chosen freely such that the data always fits into GPU memory.
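
The generic pattern (plain PyTorch, not sbi-specific; theta_cpu, x_cpu, optimizer, and loss_fn are placeholders) looks roughly like this:

from torch.utils.data import DataLoader, TensorDataset

# the dataset lives on the CPU; only one batch at a time is moved to the GPU
dataset = TensorDataset(theta_cpu, x_cpu)
loader = DataLoader(dataset, batch_size=50, shuffle=True, pin_memory=True)

for theta_batch, x_batch in loader:
    theta_batch = theta_batch.to("cuda", non_blocking=True)
    x_batch = x_batch.to("cuda", non_blocking=True)
    optimizer.zero_grad()
    loss = loss_fn(theta_batch, x_batch)  # placeholder for the density-estimation loss
    loss.backward()
    optimizer.step()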

This is also the behavior implemented in sbi, i.e. even if your data is stored on the CPU, the current batch required to compute the loss will be moved to the GPU (see here). As long as you have enough RAM, I would recommend always storing the data on the CPU; the GPU memory cost should then be constant across rounds. If it is still too large, you might have to reduce the training_batch_size.

As you can see here, you can set a data_device, which can be different from the compute device. As you can see here, if you do not pass it, it defaults to the compute device, thus storing all data on e.g. the GPU. So I would recommend using something like

infer = SNPE_C(..., device="cuda")
infer.append_simulations(theta, x, data_device="cpu")

Maybe this is what you are already doing :)
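
If GPU memory is still tight after that, you can additionally lower the batch size, e.g. (a sketch; prior, theta, and x are your own objects, and the value 20 is arbitrary):

infer = SNPE_C(prior=prior, device="cuda")
infer.append_simulations(theta, x, data_device="cpu")
density_estimator = infer.train(training_batch_size=20)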

Kind regards,
Manuel

@janfb janfb added this to the Hackathon 2024 milestone Feb 9, 2024
@janfb
Contributor

janfb commented Jul 22, 2024

Manuel's answer is a valid solution for the presented issue.

In the long run, we will want to enable passing a custom dataloader that does all the data and memory handling.

@janfb janfb closed this as completed Jul 22, 2024