https://github.com/microsoft/DeepSpeedExamples/blob/e7c8cb767acddba8ad5d2c41fe18e30de7870d30/model_compression/bert/huggingface_transformer/modeling_bert.py#L383
In the model compression example, the docs say the only change is line 383, "where we output attention_scores instead of attention_prob.". But this line is identical to the Hugging Face version, and as far as I can tell it does not output attention_scores. Am I wrong, or is there a typo?
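For context, here is my understanding of the distinction the docs are pointing at (a minimal NumPy sketch of single-head attention, not the actual DeepSpeed or Hugging Face code): `attention_scores` are the pre-softmax logits `QK^T / sqrt(d)`, while `attention_probs` are the softmax of those scores, so the described change would just swap which of the two goes into the returned outputs.

```python
import numpy as np

def self_attention(q, k, v):
    """Toy single-head attention.

    Returns the context vectors plus both attention_scores
    (pre-softmax logits) and attention_probs (post-softmax weights).
    """
    d = q.shape[-1]
    attention_scores = q @ k.T / np.sqrt(d)  # raw logits, QK^T / sqrt(d)
    # numerically stable softmax over the key dimension
    e = np.exp(attention_scores - attention_scores.max(axis=-1, keepdims=True))
    attention_probs = e / e.sum(axis=-1, keepdims=True)
    context = attention_probs @ v
    # Hugging Face's BertSelfAttention returns attention_probs in its
    # outputs tuple; per the compression docs, the modified file is
    # supposed to return attention_scores there instead.
    return context, attention_scores, attention_probs

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 8)) for _ in range(3))
ctx, scores, probs = self_attention(q, k, v)
print(probs.sum(axis=-1))  # each row of attention_probs sums to 1
```

Unlike `attention_probs`, the rows of `attention_scores` are unnormalized, which is why swapping them matters for anything downstream (e.g. attention-based distillation losses) that consumes this output.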
By the way, if only one line needs to be changed, is it possible to apply DeepSpeed compression to DeBERTa-v2 (Hugging Face implementation: https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py)?