
Lonnnnnnnnng context load time before generation #34

Closed
generic-username0718 opened this issue Mar 13, 2023 · 7 comments

@generic-username0718

I'm running LLaMA 65B on dual 3090s, and at longer contexts I'm noticing seriously long context load times (the time between sending a prompt and tokens actually being received/streamed). It seems my CPU is only using a single core and maxing it out at 100%. Is there something it's doing that's heavily serialized? Any way to parallelize the workflow?
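A minimal timing sketch (not from this thread) for separating prompt ingestion ("prefill") from token-by-token generation; it assumes a Hugging Face transformers-style model, and the model name, prompt length, and generation settings are placeholders. A long time to first token with fast streaming afterwards would point at the prefill step, and `torch.get_num_threads()` returning 1 would match the single-core CPU observation.

```python
# Hypothetical timing sketch: measures time to first token (prefill) vs. total
# generation time. Model name and prompt are placeholders, not the repo's code.
import time
import threading
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_name = "decapoda-research/llama-65b-hf"  # placeholder; use your local model path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

print("torch CPU threads:", torch.get_num_threads())  # 1 would match the single-core observation

prompt = "Some long context. " * 500  # placeholder long prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
start = time.perf_counter()
thread = threading.Thread(
    target=model.generate,
    kwargs=dict(**inputs, max_new_tokens=64, streamer=streamer),
)
thread.start()

first_token_at = None
for _chunk in streamer:
    if first_token_at is None:
        first_token_at = time.perf_counter()
thread.join()
end = time.perf_counter()

print(f"time to first token (prefill): {first_token_at - start:.2f}s")
print(f"total generation time: {end - start:.2f}s")
```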

@qwopqwop200 (Owner)

What code did you run?

@USBhost (Contributor) commented Mar 13, 2023

I would like to confirm this issue as well. It really becomes noticeable when running chat versus normal/notebook mode. Chat with nothing set runs really fast, but once you start adding context etc., start-up speed just takes a nosedive.

4-bit 65B on my A6000.

@plhosk commented Mar 14, 2023

In the case of llama.cpp, when a long prompt is given you can see it echo the provided prompt word by word at a slow rate even before it starts generating anything new, so it's directly evident that larger prompts take longer to get through. I'd guess a similar thing is happening here.
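For illustration only (small placeholder model, not this repo's code path): the prompt can be evaluated in one batched forward pass that fills the KV cache before per-token decoding starts. Either way, the cost of that prefill pass grows with prompt length, which is why longer prompts delay the first generated token.

```python
# Sketch of prefill vs. decode with a KV cache, using a small placeholder model
# so it runs anywhere. The prefill pass covers the whole prompt at once; the
# decode loop then only processes one new token per step against the cache.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

prompt_ids = tokenizer("a long prompt " * 100, return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: one forward pass over all prompt tokens populates the KV cache.
    out = model(prompt_ids, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1:].argmax(dim=-1)

    # Decode: each new token attends against the cached keys/values, so the
    # per-token cost no longer involves re-running the full prompt.
    generated = [next_id]
    for _ in range(20):
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1:].argmax(dim=-1)
        generated.append(next_id)

print(tokenizer.decode(torch.cat(generated, dim=-1)[0]))
```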

@USBhost (Contributor) commented Mar 20, 2023

So I compared bitsandbytes 8-bit and GPTQ 8-bit, and GPTQ was the only one that had a start delay. Something is causing a delay before anything starts generating.
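One hedged way to narrow this down (placeholder model shown; swap in the bitsandbytes 8-bit and GPTQ models being compared): time the same forward pass several times. If the first call is much slower than the rest, the delay looks like one-time setup work (kernel warm-up, weight unpacking) rather than slow per-token generation.

```python
# Hypothetical check: compare the first forward pass against later ones.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "gpt2"  # placeholder; substitute the quantized model under test
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()

input_ids = tokenizer("hello " * 200, return_tensors="pt").input_ids.to(device)

timings = []
with torch.no_grad():
    for _ in range(5):
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        model(input_ids)
        if device == "cuda":
            torch.cuda.synchronize()
        timings.append(time.perf_counter() - t0)

# A much larger first entry than the rest suggests one-time setup cost.
print([f"{t:.3f}s" for t in timings])
```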

@Digitous

It runs pretty well once it starts. Not sure if it's loading something or reading layers before inferencing. It definitely has the quirks of new tech; it might just be a case of "well, that's how it works."

@aljungberg (Contributor)

Probably fixed now, see #30.

@qwopqwop200 (Owner)

I think this issue has been resolved.
