Assistance for multi-GPU usage with GPN-MSA #24
-
Hello! Our model is based on the Hugging Face RoFormer implementation and does not support flash attention yet. Do you have a way to turn flash attention off? I'm planning to adapt the model to allow flash attention, but probably not very soon.
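A minimal sketch of turning flash attention off at load time, assuming a recent transformers version (>= 4.36, where from_pretrained accepts attn_implementation); the gpn.model import that registers the custom GPN classes is an assumption, not a confirmed API:

```python
# Hedged sketch: request the default ("eager") attention implementation
# so Flash Attention 2 is never enabled. Assumes transformers >= 4.36.
import gpn.model  # assumed import that registers the GPN architectures
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained(
    "songlab/gpn-msa-sapiens",
    attn_implementation="eager",  # instead of "flash_attention_2"
)
```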
-
Hello,
-
Sorry I haven't been able to implement this. BTW, if you just want the regular variant effect prediction scores (log-likelihood ratio) from our model, we have precomputed scores for the whole genome:
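For readers unfamiliar with the score: a variant-effect log-likelihood ratio from a masked language model is typically computed as in the sketch below. The function name and tensor layout are illustrative assumptions, not the exact GPN-MSA API:

```python
# Hedged sketch: derive a variant-effect LLR from masked-LM logits by
# masking the variant position and comparing alt vs. ref probabilities.
import torch

def llr(logits_at_variant: torch.Tensor, ref_id: int, alt_id: int) -> float:
    """log P(alt) - log P(ref) at the masked variant position."""
    log_probs = torch.log_softmax(logits_at_variant, dim=-1)
    return (log_probs[alt_id] - log_probs[ref_id]).item()
```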
-
Hello,
I hope this message finds you well. I am currently working with the GPN-MSA model and would like to run it on multiple GPUs for parallelized computation. However, I've encountered a challenge because the current model does not support Flash Attention 2.0.
I attempted to use the GPU by adding the following line in run_inference:
inference.model.to(training_args.device)
This is the error message I encounter:
ValueError: GPNRoFormerForMaskedLM does not support Flash Attention 2.0 yet. Please request to add support where the model is hosted, on its model hub page: https://huggingface.co/songlab/gpn-msa-sapiens/discussions/new or in the Transformers GitHub repo: https://github.com/huggingface/transformers/issues/new
I have submitted the request, but while awaiting a resolution I am reaching out for your guidance. If you could provide any insights or recommendations, or suggest alternative approaches for parallelization, it would be immensely helpful.
I appreciate your time and expertise.
Best regards.
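One alternative, sketched here under stated assumptions (it is not the project's documented path): load the model without Flash Attention, then replicate it across the visible GPUs with torch.nn.DataParallel, which scatters the batch dimension automatically.

```python
# Hedged sketch: data-parallel inference without Flash Attention 2.
# Assumes the model is already loaded (e.g. inference.model from
# run_inference) and that its forward takes input_ids; GPN-MSA may take
# additional MSA tensors, which would be passed the same way.
import torch

def predict_multi_gpu(model, input_ids: torch.Tensor) -> torch.Tensor:
    model = torch.nn.DataParallel(model.to("cuda:0"))  # one replica per GPU
    model.eval()
    with torch.no_grad():
        # The batch is split across GPUs; outputs are gathered on cuda:0.
        return model(input_ids=input_ids.to("cuda:0")).logits
```

Throughput scales with batch size here, so larger batches per forward pass make better use of the extra GPUs.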