feat: support lazy loading the lora module for reducing the loading p… #434

thincal · 2024-04-23T07:13:25Z

What does this PR do?

Fixes #433

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Was this discussed/approved via a Github issue or the discord / slack channel? Please add a link
to it if that's the case.
Did you write any new necessary tests?

Who can review?

thincal · 2024-04-24T02:40:48Z

Caching the handle returned by safe_open might be a better solution, but that handle would be shared by multiple layers, so its reference management needs some thought. I will refine this later.

thincal · 2024-05-19T02:25:54Z

> Caching the handle returned by safe_open might be a better solution, but that handle would be shared by multiple layers, so its reference management needs some thought. I will refine this later.

I am still caching the filenames rather than the file handles, because 1) safe_open needs the device, which differs while the LoRA modules are being loaded, and 2) safe_open is already lazy: no tensor data is read until get_tensor is invoked for a specific tensor, which is the optimized behavior we want here.
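To illustrate the approach described above (cache a weight-name to filename map, read tensor data only on demand), here is a minimal stdlib-only sketch. It is not the code from this PR: the PR relies on safe_open and get_tensor from the safetensors library, while this sketch parses the safetensors file layout by hand so that it runs without dependencies. The helper names (write_safetensors, build_module_map, load_tensor) and the example weight names are hypothetical.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_safetensors(path, tensors):
    """Write {name: [floats]} as 1-D float32 tensors in the safetensors layout."""
    header, payload = {}, b""
    for name, values in tensors.items():
        data = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            "data_offsets": [len(payload), len(payload) + len(data)],
        }
        payload += data
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte header length
        f.write(header_bytes)
        f.write(payload)


def read_header(path):
    """Read only the JSON header; no tensor data is touched."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return header_len, json.loads(f.read(header_len))


def build_module_map(filenames):
    """Map each weight name to the file that holds it (filenames, not handles)."""
    module_map = {}
    for filename in filenames:
        _, header = read_header(filename)
        for name in header:
            if name != "__metadata__":
                module_map[name] = str(filename)
    return module_map


def load_tensor(module_map, name):
    """Open the file lazily and read just the bytes of the requested tensor."""
    filename = module_map[name]
    header_len, header = read_header(filename)
    begin, end = header[name]["data_offsets"]
    with open(filename, "rb") as f:
        f.seek(8 + header_len + begin)
        raw = f.read(end - begin)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "adapter-0.safetensors"
        second = Path(tmp) / "adapter-1.safetensors"
        write_safetensors(first, {"layer0.lora_A": [1.0, 2.0]})
        write_safetensors(second, {"layer1.lora_A": [0.5, 0.25]})
        module_map = build_module_map([first, second])
        # Only the bytes for layer1.lora_A are read here.
        print(load_tensor(module_map, "layer1.lora_A"))
```

Because the map stores plain filenames, each layer can reopen the file with whatever device it needs, and no shared handle has to be reference-counted.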

thincal · 2024-05-19T02:27:18Z

@tgaddair could you help review this change?

tgaddair

Looks great @thincal, thanks for the PR, and apologies for the slow review!

I had one question about the file handle, but happy to land this and iterate on it to see if there's any room to further optimize.

server/lorax_server/utils/adapter.py

thincal · 2024-05-25T12:22:43Z

> Looks great @thincal, thanks for the PR, and apologies for the slow review!
>
> I had one question about the file handle, but happy to land this and iterate on it to see if there's any room to further optimize.

It is fine to land it first, since safe_open is already lazy and the main overhead is reading out the specific tensor.

tgaddair · 2024-05-25T20:13:23Z

@thincal I noticed there's a failing test:

FAILED server/tests/adapters/test_medusa.py::test_batched_medusa_weights - safetensors_rust.SafetensorError: device cpu is invalid

Would you be able to take a look before we merge? We should be good to go once that's resolved.

thincal · 2024-06-11T01:08:00Z

> @thincal I noticed there's a failing test:
>
> FAILED server/tests/adapters/test_medusa.py::test_batched_medusa_weights - safetensors_rust.SafetensorError: device cpu is invalid
>
> Would you be able to take a look before we merge? We should be good to go once that's resolved.

OK, I will finish it today, thanks.

thincal · 2024-06-11T01:42:18Z

@tgaddair the fix passes server/tests/adapters/test_medusa.py::test_batched_medusa_weights. The remaining errors seem to be related to a repo access failure; could you help check?
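The exact change is not shown in this thread. The commit title "fix: work with cpu device object" together with the error "device cpu is invalid" suggests that a device object, rather than a string, was reaching safe_open. The sketch below shows one way such a value could be normalized before the call. It is an assumption about the shape of the fix, not the PR's code: normalize_device and FakeDevice are hypothetical names, and FakeDevice only stands in for a framework device object such as torch.device so the example runs without dependencies.

```python
class FakeDevice:
    """Hypothetical stand-in for a framework device object (e.g. torch.device)."""

    def __init__(self, kind, index=None):
        self.kind = kind
        self.index = index

    def __str__(self):
        # Mirrors the usual "cpu" / "cuda:0" string form of a device.
        return self.kind if self.index is None else f"{self.kind}:{self.index}"


def normalize_device(device):
    """Return a plain str or int that can be passed as a `device` argument.

    Strings and integers pass through unchanged; any other object is
    converted with str(), which is an assumption about how the device
    object renders itself.
    """
    if isinstance(device, (str, int)):
        return device
    return str(device)


if __name__ == "__main__":
    for dev in ("cpu", 0, FakeDevice("cpu"), FakeDevice("cuda", 1)):
        print(repr(normalize_device(dev)))
```

With safetensors installed, the normalized value would then be passed along as `safe_open(filename, framework="pt", device=normalize_device(device))`.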

thincal · 2024-06-18T09:51:14Z

@tgaddair ping, sorry for my late response. Could you help review the revised commit? Thanks.

tgaddair · 2024-08-02T22:42:17Z

Hey @thincal, very sorry for the delay here. Let me take a look now.

tgaddair

Thanks for this PR and again really sorry for the delay in getting it landed! Sanity checks look good on my side. May run some more benchmarking later, but good to go ahead and land.

thincal · 2024-08-04T02:26:13Z

@tgaddair glad to see this change merged, and thanks for your support :)

thincal marked this pull request as draft April 23, 2024 10:55

thincal marked this pull request as ready for review April 23, 2024 13:36

tgaddair mentioned this pull request May 3, 2024

Improve async load for adapters to avoid main thread lockups in server #457

Open

thincal force-pushed the feat/support-lazy-loading-lora-module branch from bad816f to 9b1ac96 Compare May 19, 2024 02:03

tgaddair approved these changes May 23, 2024

server/lorax_server/utils/adapter.py (outdated)

LS added 3 commits June 11, 2024 09:17

feat: support lazy loading the lora module for reducing the loading p…

e01df4a

…lace

fix: store the layer:filename pair in module_map for lazy loading

66784f5

fix: add missing imports

6d91a88

thincal force-pushed the feat/support-lazy-loading-lora-module branch from 9b1ac96 to 6d91a88 Compare June 11, 2024 01:17

fix: work with cpu device object

86085df

thincal closed this Aug 2, 2024

thincal reopened this Aug 2, 2024

tgaddair added 3 commits August 2, 2024 15:46

Merge

5e6bf0a

Fix import cycle

1061611

Formatting

a078b84

tgaddair approved these changes Aug 2, 2024

tgaddair merged commit 2e47e77 into predibase:main Aug 2, 2024
1 check failed

thincal deleted the feat/support-lazy-loading-lora-module branch August 4, 2024 02:24
