
Ring Attention is simply inefficient #2

Closed
nathanturk2 opened this issue Feb 14, 2024 · 2 comments

@nathanturk2

Hello there! Architecture innovator here! Everything preceding my model seems very inefficient.

@forhaoliu
Member

Look forward to it :) !!

@Treemann

Treemann commented Apr 7, 2024

Hi @lhao499 @nathanturk2,
With the default settings in run_eval_needle.sh, inference seemed fast:

{'context_length': 1000, 'depth_percent': 0.0, 'response': 'The special magic Jakarta number is 8394266.', 'answer': '8394266', 'correct': True, 'seed': 0}
{'context_length': 1000, 'depth_percent': 0.0, 'response': 'The special magic Damascus number is 1125686.', 'answer': '1125686', 'correct': True, 'seed': 1}
3%|████ | 2/60 [00:35<17:05, 17.69s/it]
{'context_length': 1000, 'depth_percent': 0.0, 'response': 'The special magic Belgrade number is 1585963.', 'answer': '1585963', 'correct': True, 'seed': 2}
{'context_length': 1000, 'depth_percent': 5.0, 'response': 'The special magic Los Angeles number is 2408249.', 'answer': '2408249', 'correct': True, 'seed': 0}
7%|████████▏ | 4/60 [00:56<12:36, 13.52s/it]
{'context_length': 1000, 'depth_percent': 5.0, 'response': 'The special magic Cairo number is 6949798.', 'answer': '6949798', 'correct': True, 'seed': 1}
{'context_length': 1000, 'depth_percent': 5.0, 'response': 'The special magic Bangkok number is 1292394.', 'answer': '1292394', 'correct': True, 'seed': 2}

However, when I tried to use RingAttention with the following modifications, it was very slow. Is this expected with the current implementation of RingAttention?
mesh_dim: '!1,-1,4,1' -> '!1,-1,4,2'
scan_attention / scan_mlp: True

{'context_length': 2421, 'depth_percent': 16.0, 'response': '', 'answer': '8394266', 'correct': False, 'seed': 1}
2%|██▍ | 1/50 [08:35<7:00:55, 515.43s/it]
{'context_length': 2421, 'depth_percent': 16.0, 'response': '', 'answer': '1125686', 'correct': False, 'seed': 2}
4%|████▊ | 2/50 [16:47<6:41:31, 501.90s/it]
