
feat: support lazy loading the lora module for reducing the loading p… #434

Merged

Conversation

@thincal (Contributor) commented Apr 23, 2024

What does this PR do?

Fixes #433

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Was this discussed/approved via a GitHub issue or the Discord / Slack channel? Please add a link
    to it if that's the case.
  • Did you write any new necessary tests?

Who can review?

@tgaddair

@thincal marked this pull request as draft on April 23, 2024 10:55
@thincal marked this pull request as ready for review on April 23, 2024 13:36
@thincal (Contributor, Author) commented Apr 24, 2024

It seems that caching the handle from safe_open might be a better solution, but it would need to consider file-handle reference management when a handle is shared by multiple layers; I will refine it later.

@thincal (Contributor, Author) commented May 19, 2024

> It seems that caching the handle from safe_open might be a better solution, but it would need to consider file-handle reference management when a handle is shared by multiple layers; I will refine it later.

Decided to still cache the filenames instead of file handles, since 1) safe_open needs the device info, which differs while loading the lora modules, and 2) safe_open is lazy and reads no tensor data until get_tensor is invoked for a specific tensor, which is already the optimized behavior for our case.
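
For illustration, here is a minimal sketch of the filename-caching approach described above, built on the public safetensors API. The cache and helper names (`_filename_cache`, `discover_safetensors_files`, `load_module_weight`) are hypothetical stand-ins, not the PR's actual code:

```python
import glob

from safetensors import safe_open

# Hypothetical cache: adapter_id -> list of .safetensors file paths.
_filename_cache: dict[str, list[str]] = {}

def discover_safetensors_files(adapter_id: str) -> list[str]:
    # Placeholder resolver; in practice this would point at the
    # adapter's downloaded snapshot directory.
    return sorted(glob.glob(f"./adapters/{adapter_id}/*.safetensors"))

def get_adapter_filenames(adapter_id: str) -> list[str]:
    # Caching filenames (not handles) keeps the cache device-agnostic;
    # no tensor data is read at this point.
    if adapter_id not in _filename_cache:
        _filename_cache[adapter_id] = discover_safetensors_files(adapter_id)
    return _filename_cache[adapter_id]

def load_module_weight(adapter_id: str, tensor_name: str, device: str):
    # safe_open only parses the file header; the bytes for a tensor are
    # read when get_tensor is called, so loading stays lazy per module.
    for filename in get_adapter_filenames(adapter_id):
        with safe_open(filename, framework="pt", device=device) as f:
            if tensor_name in f.keys():
                return f.get_tensor(tensor_name)
    raise KeyError(f"{tensor_name} not found for adapter {adapter_id}")
```

The design point is that the cached state stays cheap and device-agnostic, while all tensor I/O is deferred to `get_tensor` at the moment a specific module actually needs its weights.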

@thincal (Contributor, Author) commented May 19, 2024

@tgaddair could you help review this change?

@tgaddair (Contributor) left a comment

Looks great @thincal, thanks for the PR, and apologies for the slow review!

I had one question about the file handle, but happy to land this and iterate on it to see if there's any room to further optimize.

server/lorax_server/utils/adapter.py (review thread; outdated, resolved)
@thincal (Contributor, Author) commented May 25, 2024

> Looks great @thincal, thanks for the PR, and apologies for the slow review!
>
> I had one question about the file handle, but happy to land this and iterate on it to see if there's any room to further optimize.

It is fine to land it first, since safe_open is already lazy and the main overhead is reading out the specific tensor.

@tgaddair (Contributor) commented:

@thincal I noticed there's a failing test:

FAILED server/tests/adapters/test_medusa.py::test_batched_medusa_weights - safetensors_rust.SafetensorError: device cpu is invalid

Would you be able to take a look before we merge? We should be good to go once that's resolved.
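
For context on the error above: `safe_open` in the safetensors Python API takes its device as a string such as `"cpu"` or `"cuda:0"`. One plausible guard, offered purely as an assumption about the failure mode and not as the PR's actual fix, is to normalize whatever device object the caller holds into that string form before opening the file:

```python
# Hypothetical helper; assumes the failure came from passing a
# torch.device object where safetensors expects a plain string.
import torch
from safetensors import safe_open

def open_adapter_file(filename: str, device):
    device_str = str(device) if isinstance(device, torch.device) else device
    # safe_open accepts device strings like "cpu" or "cuda:0".
    return safe_open(filename, framework="pt", device=device_str)
```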

@thincal (Contributor, Author) commented Jun 11, 2024

> @thincal I noticed there's a failing test:
>
> FAILED server/tests/adapters/test_medusa.py::test_batched_medusa_weights - safetensors_rust.SafetensorError: device cpu is invalid
>
> Would you be able to take a look before we merge? We should be good to go once that's resolved.

OK, I will finish it today, thanks.

@thincal force-pushed the feat/support-lazy-loading-lora-module branch from 9b1ac96 to 6d91a88 on June 11, 2024 01:17
@thincal (Contributor, Author) commented Jun 11, 2024

@tgaddair the fix passes for server/tests/adapters/test_medusa.py::test_batched_medusa_weights; the remaining errors seem related to a repo access failure, could you help take a look?

@thincal (Contributor, Author) commented Jun 18, 2024

@tgaddair ping, sorry for my late response. Could you help review the revised commit? Thanks.

@thincal closed this on Aug 2, 2024
@thincal reopened this on Aug 2, 2024
@tgaddair (Contributor) commented Aug 2, 2024

Hey @thincal, very sorry for the delay here. Let me take a look now.

@tgaddair (Contributor) left a comment

Thanks for this PR and again really sorry for the delay in getting it landed! Sanity checks look good on my side. May run some more benchmarking later, but good to go ahead and land.

@tgaddair merged commit 2e47e77 into predibase:main on Aug 2, 2024 (1 check failed)
@thincal deleted the feat/support-lazy-loading-lora-module branch on August 4, 2024 02:24
@thincal (Contributor, Author) commented Aug 4, 2024

@tgaddair glad to see this change merged, and thanks for your support :)

Successfully merging this pull request may close these issues:

Improve the latency of load_batched_adapter_weights