Assistance for multi-GPU usage with GPN-MSA #24
-
Hello! Our model is based on the Hugging Face RoFormer implementation and does not support flash attention yet. Do you have a way to turn flash attention off? I'm planning to adapt the model to allow flash attention, but probably not very soon.
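A minimal sketch of turning flash attention off at load time, assuming a recent transformers version (>= 4.36, where from_pretrained accepts attn_implementation); the gpn.model import that registers the custom GPN classes is an assumption, not a confirmed API:

```python
# Hedged sketch: request the default ("eager") attention implementation
# so Flash Attention 2 is never enabled. Assumes transformers >= 4.36.
import gpn.model  # assumed import that registers the GPN architectures
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained(
    "songlab/gpn-msa-sapiens",
    attn_implementation="eager",  # instead of "flash_attention_2"
)
```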
-
Hello,
-
Sorry I haven't been able to implement this. BTW, if you just want the regular variant effect prediction scores (log-likelihood ratio) from our model, we have precomputed scores for the whole genome:
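For readers unfamiliar with the score: a variant-effect log-likelihood ratio from a masked language model is typically computed as in the sketch below. The function name and tensor layout are illustrative assumptions, not the exact GPN-MSA API:

```python
# Hedged sketch: derive a variant-effect LLR from masked-LM logits by
# masking the variant position and comparing alt vs. ref probabilities.
import torch

def llr(logits_at_variant: torch.Tensor, ref_id: int, alt_id: int) -> float:
    """log P(alt) - log P(ref) at the masked variant position."""
    log_probs = torch.log_softmax(logits_at_variant, dim=-1)
    return (log_probs[alt_id] - log_probs[ref_id]).item()
```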
-
Hello,
I hope this message finds you well. I am currently working with the GPN-MSA model and would like to run it on multiple GPUs for parallelized computation. However, I've encountered a challenge because the current model does not support Flash Attention 2.0.
I attempted to use the GPU by adding the following line in run_inference:
inference.model.to(training_args.device)
This is the error message I encounter:
ValueError: GPNRoFormerForMaskedLM does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co/songlab/gpn-msa-sapiens/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
I have submitted the request, but while awaiting a resolution I am reaching out for your guidance. If you could provide any insights or recommendations, or suggest alternative approaches for parallelization, it would be immensely helpful.
I appreciate your time and expertise.
Best regards.
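One alternative, sketched here under stated assumptions (it is not the project's documented path): load the model without Flash Attention, then replicate it across the visible GPUs with torch.nn.DataParallel, which scatters the batch dimension automatically.

```python
# Hedged sketch: data-parallel inference without Flash Attention 2.
# Assumes the model is already loaded (e.g. inference.model from
# run_inference) and that its forward takes input_ids; GPN-MSA may take
# additional MSA tensors, which would be passed the same way.
import torch

def predict_multi_gpu(model, input_ids: torch.Tensor) -> torch.Tensor:
    model = torch.nn.DataParallel(model.to("cuda:0"))  # one replica per GPU
    model.eval()
    with torch.no_grad():
        # The batch is split across GPUs; outputs are gathered on cuda:0.
        return model(input_ids=input_ids.to("cuda:0")).logits
```

Throughput scales with batch size here, so larger batches per forward pass make better use of the extra GPUs.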