Roadmap #4

Open
4 of 6 tasks
tjtanaa opened this issue Nov 1, 2023 · 4 comments

Comments

tjtanaa commented Nov 1, 2023

  1. Port vllm/main features to ROCm
  1. Benchmark
  • Long-Input-Long-Output benchmarking.

HAN-oQo commented Dec 1, 2023

Hi @tjtanaa,
I wonder how the roadmap is going.
I'm quite excited to use the AWQ quantized format. When can it be supported?

tjtanaa commented Dec 11, 2023

@HAN-oQo Hi, the vLLM authors said they are working on a more efficient AWQ implementation in Triton, so we will address AWQ on ROCm after they release their new kernel.

HAN-oQo commented Dec 12, 2023

Thank you for your answer, @tjtanaa!
I also wonder why the safetensors format is not supported. Do you have a plan to support it?

Thank you for offering this nice project.

tjtanaa commented Dec 13, 2023

@HAN-oQo Loading safetensors is buggy on the ROCm platform. Memory management during safetensors loading might be causing the issue. It often occurs when the tensor-parallel size is larger than 1; loading from the pt format, however, works fine.
