
Conversation

isidentical (Contributor) commented Sep 11, 2023

What does this PR do?

Fixes #4975 (LoRa loading is extremely inefficient due to repeated datatype queries). Adds a generic LRU cache for the expensive parameter-datatype computation function, with an optional fallback to the uncached implementation when the underlying module is not hashable (since the module can be any torch module subclass, it may have overridden __hash__ or added properties that make it unsafe to hash). From my local testing, the performance increase is very noticeable (2x to 3x); more benchmarks are below.
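A minimal sketch of the caching pattern described above, assuming a hypothetical get_parameter_dtype helper; the actual function this PR wraps and its exact fallback logic may differ:

```python
import functools

import torch


def _parameter_dtype_uncached(module: torch.nn.Module) -> torch.dtype:
    # Hypothetical stand-in for the expensive datatype query: return the dtype
    # of the first parameter found, falling back to the default dtype.
    for param in module.parameters():
        return param.dtype
    return torch.get_default_dtype()


@functools.lru_cache(maxsize=None)
def _parameter_dtype_cached(module: torch.nn.Module) -> torch.dtype:
    return _parameter_dtype_uncached(module)


def get_parameter_dtype(module: torch.nn.Module) -> torch.dtype:
    # The module can be any torch.nn.Module subclass, so it may have overridden
    # __hash__ (or otherwise be unsafe to use as a cache key); in that case
    # lru_cache raises TypeError and we fall back to the uncached path.
    try:
        return _parameter_dtype_cached(module)
    except TypeError:
        return _parameter_dtype_uncached(module)
```

Note that keying the cache on the module itself holds a strong reference to it and assumes its dtype does not change between queries, which is fine for repeated lookups during a single LoRA load/unload cycle.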

Benchmarks

For a script that loads the LoRA state dict into memory and then runs a load_lora_weights/unload_lora_weights cycle 5 times (a sketch of such a script follows the table), the results are as follows:

|                 | min   | max   | mean  | median | mean speed-up |
|-----------------|-------|-------|-------|--------|---------------|
| baseline (main) | 5.44s | 6.46s | 6.02s | 6.31s  | 1.0x          |
| This PR         | 1.84s | 2.05s | 1.96s | 1.94s  | 3.2x          |
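
For reference, here is a rough reconstruction of the kind of benchmark loop described above; the checkpoint, the LoRA weight path, and the use of safetensors to pre-load the state dict are placeholders and assumptions, not the exact script behind the numbers in the table:

```python
import time

import torch
from diffusers import StableDiffusionPipeline
from safetensors.torch import load_file

# Placeholder checkpoint; the model used in the original benchmark is not specified.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

# Pre-load the LoRA state dict into memory so disk I/O is excluded from the timings.
lora_state_dict = load_file("path/to/lora.safetensors")  # placeholder path

timings = []
for _ in range(5):
    start = time.perf_counter()
    # load_lora_weights also accepts an in-memory state dict; pass a copy in
    # case the loader mutates it.
    pipe.load_lora_weights(dict(lora_state_dict))
    pipe.unload_lora_weights()
    timings.append(time.perf_counter() - start)

print(
    f"min={min(timings):.2f}s max={max(timings):.2f}s "
    f"mean={sum(timings) / len(timings):.2f}s median={sorted(timings)[len(timings) // 2]:.2f}s"
)
```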

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

patrickvonplaten (Contributor) commented:

I'm sadly not seeing a big speed-up anymore (after having merged #4994). Could you quickly double-check whether you still see the same speed-up with the current version of main?

isidentical (Contributor, Author) commented:

Yeah, my own benchmarks also show no noticeable speed-up with this PR applied (I think this was because of non-meta devices, and my sequence of optimizations also reduced the need for it). Closing this PR as it's currently very good as is!
