
Is model split in OPT TP mode? #120

Open
larry-fuy opened this issue Dec 13, 2022 · 2 comments

Comments

@larry-fuy

I tried HF OPT-13B on a 4-GPU machine with tensor-parallel: 4. One observation is that all GPUs used the same amount of memory (~25 GB), which is consistent with other users' reports. I also found the memory usage is the same as with tensor-parallel: 2. So my question is whether the model is split after it is loaded into CPU memory, as said in this thread? My understanding is that per-GPU memory should be a quarter of the full model when tensor-parallel: 4 and half when tensor-parallel: 2.
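For reference, here is my back-of-the-envelope expectation for an even split of OPT-13B's fp16 weights (illustrative numbers only; this ignores activations, KV cache, and framework overhead):

```python
# Rough per-GPU weight memory for OPT-13B in fp16 under tensor parallelism.
params = 13e9          # parameter count of OPT-13B
bytes_per_param = 2    # fp16

for tp in (1, 2, 4):
    per_gpu_gib = params * bytes_per_param / tp / 2**30
    print(f"tensor-parallel={tp}: ~{per_gpu_gib:.1f} GiB per GPU")

# tensor-parallel=1: ~24.2 GiB per GPU
# tensor-parallel=2: ~12.1 GiB per GPU
# tensor-parallel=4: ~6.1 GiB per GPU
```

The ~25 GB I observe per GPU is close to the full unsplit model, which is what prompted the question.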

By the way, I also didn't see any real latency reduction when increasing the tensor-parallel degree (the latency differs by only 2 or 3 ms).

@mrwyattii
Contributor

Hi @larry-fuy, thanks for using MII! I'll assume you're checking memory usage with nvidia-smi - if that's the case, you are likely seeing that the total memory usage per GPU includes cached memory that can be freed. This is due to how we load and split the model across GPUs. I've created a PR that empties the torch cache after splitting the model, so the correct amount of memory usage is now reported. Please try it out: #121
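If you want to check this locally, here is a minimal sketch of the idea (run inside each worker process after the model shard is in place; `torch.cuda.empty_cache()` releases the allocator's cached blocks so nvidia-smi reflects actual usage):

```python
import torch

# memory_allocated(): bytes held by live tensors (the model shard, etc.)
# memory_reserved():  allocated bytes plus blocks the caching allocator
#                     keeps around for reuse -- roughly what nvidia-smi
#                     reports as used.
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.1f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.1f} GiB")

# Release the cached blocks accumulated while loading/splitting the model.
torch.cuda.empty_cache()

print(f"reserved after empty_cache: {torch.cuda.memory_reserved() / 2**30:.1f} GiB")
```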

As for the latency, could you share some more details to help me understand your setup?

  • what GPUs are you running on and how many?
  • how are you measuring latency?
  • what are the exact measurements you have?

@wangshankun

@mrwyattii how do I get the latency using the MII API?
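One simple approach is to time the query call client-side. A minimal sketch, assuming a text-generation deployment already created with mii.deploy() (the deployment name "opt13b_deploy" is illustrative; use whatever name you deployed with):

```python
import time
import mii

# Connect to an existing deployment; the name here is a placeholder.
generator = mii.mii_query_handle("opt13b_deploy")

request = {"query": ["DeepSpeed is"]}

# Warm-up call so one-time initialization doesn't skew the measurement.
generator.query(request)

latencies = []
for _ in range(10):
    start = time.perf_counter()
    generator.query(request)
    latencies.append(time.perf_counter() - start)

print(f"mean latency: {1000 * sum(latencies) / len(latencies):.1f} ms "
      f"over {len(latencies)} runs")
```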
