
Some output (/search) is not throttled; some LLM use (reranking, etc.) is throttled but should not be #714

Closed
dominiccooney opened this issue Aug 17, 2023 · 9 comments


@dominiccooney
Contributor

Forked from discussion in #704. After #699 is fixed and the scrolling behavior is back, that scrolling is too fast, because /search sends a lot of content "at once."

We have "Typewriter" wired into the chat clients here, but:

  • The intervention happens at the wrong layer. For example, it sits behind the BotResponseMultiplexer for recipes. The BRM will buffer and undo some of the typewriter effect, and some BRM subscribers are code that doesn't need a typewriter effect at all. For the LLM reranker, the typewriter effect just slows the reranker down.
  • The effect doesn't work for search, because search doesn't use the chat client.

It would be good to move this incremental display closer to the web UI (maybe not into it, but closer to it) so that everything a human sees uses it, harmonized with the incremental Markdown handling, while anything destined for machines can happen as fast as possible.
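To make that concrete, here is a minimal sketch in TypeScript of the kind of throttling layer being discussed. Everything in it, from the class name to the tick constants, is an illustrative assumption, not Cody's actual Typewriter API:

```typescript
// Minimal throttling layer: producers write() as fast as they like, and a
// timer drains the buffer gradually so the UI sees a typing animation.
class Typewriter {
    private buffer = ''
    private emitted = 0
    private timer: ReturnType<typeof setInterval> | undefined

    constructor(
        private readonly onUpdate: (text: string) => void,
        private readonly intervalMs = 20,
    ) {}

    /** Append newly streamed text; the interval drains it over time. */
    public write(chunk: string): void {
        this.buffer += chunk
        this.timer ??= setInterval(() => this.tick(), this.intervalMs)
    }

    private tick(): void {
        if (this.emitted >= this.buffer.length) {
            clearInterval(this.timer)
            this.timer = undefined
            return
        }
        // Emit more characters per tick as the backlog grows, so the
        // animation keeps up with a fast producer instead of falling behind.
        const backlog = this.buffer.length - this.emitted
        this.emitted += Math.max(1, Math.ceil(backlog / 10))
        this.onUpdate(this.buffer.slice(0, this.emitted))
    }
}
```

Machine consumers (the reranker, keyword expander, code fixups) would simply bypass a layer like this and read the raw stream.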

@arafatkatze
Contributor

@dominiccooney Thanks for the information. I would love to work on this issue and look deeper into this.

@dominiccooney
Contributor Author

@arafatkatze Super! Thanks for your help with this.

I have a diff that moves the typewriter into MessageProvider, which I'll post soon; I'm just updating the tests. It improves the Typewriter (I added a method to wait for the typewriter to finish without forcibly stopping it and getting a huge chunk) and gets it out of the reranker and keyword expander path.

I haven't integrated the Typewriter with the "search" output. So you might like to crib from the improved Typewriter in my diff and integrate it there, or move the typewriter out of the MessageProvider and closer to the chat view.

@arafatkatze
Contributor

arafatkatze commented Aug 19, 2023

@dominiccooney Thanks a lot for such a detailed response and guidance in this comment.
I was able to make the default typewriter work for the search scenario, but it was obviously too fast, as you had mentioned. What's needed here involves the typewriter "slowing down" the interaction coming from ContextSearch.

* Note: For particularly large chunks, this will result in a character chunk size that is far bigger than you would expect for a typing effect.
* This is an accepted trade-off in order to ensure we stay in sync with the incoming text.

Basically, I wanted to change this ^ a bit, and after some tweaks to the typewriter max_delay I was able to make it "slow":

(Screen recording attached: Screen.Recording.2023-08-19.at.2.09.57.AM.mov)

Obviously, special-casing search with case statements seems hacky to me, and the way I'm doing it would probably require making the max delay a function of the length of the response. Maybe that's okay in the short term, but I'm thinking about a better way.
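For illustration only, the kind of tweak being described might look like this; the constants and the `delayFor` helper are made up, not the real Typewriter's knobs:

```typescript
// Hypothetical: scale the per-chunk delay with the total response length,
// so a huge /search payload still animates within a bounded total time.
const MIN_DELAY_MS = 5
const MAX_DELAY_MS = 80

function delayFor(responseLength: number): number {
    // Aim for a roughly constant total animation time (about 2s here),
    // clamped so short responses don't crawl and long ones don't blur past.
    const scaled = 2000 / Math.max(responseLength, 1)
    return Math.min(MAX_DELAY_MS, Math.max(MIN_DELAY_MS, scaled))
}
```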

I am trying to work out how to refactor this cleanly, while also taking your work in explain-two into account.

dominiccooney added a commit that referenced this issue Aug 21, 2023
…719)

The Typewriter was integrated at a point that was perhaps convenient for
other agents, but it didn't work well for Cody-VScode. VScode's
BotResponseMultiplexer buffers content to parse tags in the output,
which chunks the incremental effect a bit. The typewriter also slowed
down consumers like inline fixups, which just want the complete code and
don't care whether it appears "typed."

Since then, Cody-VScode has added LLM turns for keyword expansion and
context re-ranking. The typewriter effect also applied to these,
needlessly slowing them down. For me, these turns are about twice as fast
now that the typewriter is out of their path.

This moves the typewriter up to the MessageProvider, and only applies
the typewriter effect to output destined for the chat sidebar. This
makes all the code consumers faster because there's no animation delay.
(The "typing" is very smooth now—maybe too smooth—because there's no
coincidental chunking from the BotResponseMultiplexer.)

This adds `flush`, a way for the consumer to drain the buffered content
out of the typewriter while still "animating" it. It says something about
how much backpressure there was on the typewriter that we didn't really
notice this "chunk at the end" behavior before.

See #714.
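As a hypothetical illustration of the `flush` idea (assumed names and timing, not this PR's actual code):

```typescript
// flush() keeps the animation running and resolves once everything written
// so far has been emitted, instead of stopping the typewriter and dumping
// the remainder as one big chunk.
class FlushableTypewriter {
    private buffer = ''
    private emitted = 0
    private readonly timer: ReturnType<typeof setInterval>
    private onDrained: (() => void) | undefined

    constructor(private readonly onUpdate: (text: string) => void, intervalMs = 20) {
        this.timer = setInterval(() => this.tick(), intervalMs)
    }

    public write(chunk: string): void {
        this.buffer += chunk
    }

    /** Resolves once the backlog has fully animated out; stops the timer. */
    public flush(): Promise<void> {
        return new Promise(resolve => {
            this.onDrained = resolve
        })
    }

    private tick(): void {
        if (this.emitted < this.buffer.length) {
            this.onUpdate(this.buffer.slice(0, ++this.emitted))
        } else if (this.onDrained) {
            clearInterval(this.timer)
            this.onDrained()
        }
    }
}
```

A consumer would write() chunks as they stream in, then await flush() at the end of the turn.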

## Test plan

```
pnpm test:unit && pnpm test:integration && pnpm test:e2e
```

Manual test:

1. Open Cody chat sidebar.
2. Ask it to enumerate the 50 states of the US in order of accession
with their state flowers in a numbered list

The output should appear incrementally and completely.

---------

Co-authored-by: Naman Kumar <naman@outlook.in>
@dominiccooney
Contributor Author

@arafatkatze Yeah, the typewriter throttle really opens way up when there's a lot of pressure behind it. I got some feedback from @toolmantim that the typing effect would be better word-by-word anyway. So let me simplify it a bit to do that, and change the heuristic to not open the throttle so much. Then the integration you've done should work well.
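Word-by-word could be as simple as something like this sketch (illustrative names, not the actual change):

```typescript
// Emit whole words on a fixed cadence instead of character chunks whose
// size balloons under backpressure.
function typeByWord(
    text: string,
    onUpdate: (shown: string) => void,
    intervalMs = 50,
): void {
    // Each match is a word plus its trailing whitespace, so the re-joined
    // output is identical to the input.
    const words = Array.from(text.matchAll(/\S+\s*/g), m => m[0])
    let shown = ''
    let i = 0
    const timer = setInterval(() => {
        if (i >= words.length) {
            clearInterval(timer)
            return
        }
        shown += words[i++]
        onUpdate(shown)
    }, intervalMs)
}
```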

Don't be slowed down by what I'm suggesting in explain-two; that needs work and it may be a while.

@arafatkatze
Contributor

@dominiccooney Right now I'm doing this as a clean way to typewrite. It's half finished (commit), but essentially my goal is to morph the Typewriter interface to support longer delay times, or to change the algorithm somehow, so that the context search output also gets printed gracefully. I am happy to wait for your algorithm change to merge so that I can merge my changes on top of it.

@arafatkatze
Contributor

@dominiccooney The other thing I was thinking of, but didn't pursue further: would it be a good idea to have some sort of queue-based system, where the queue takes in all the incoming input and always "typewrites" it at the same graceful speed? Then the algorithm wouldn't need complex scenarios where the typing speed is somehow a function of:

  1. The length of the input (in the case of context search)
  2. The speed of the incoming input (in the case of LLM buffer inflow)

The typing speed stays the same, but the input can vary. Obviously there are many risky race conditions that can happen with this; I tried it in my attempts here and it doesn't work well, for several reasons, but I figured I would give it a shot.
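For what it's worth, a bare-bones version of that queue idea might look like the sketch below (all names assumed; as noted, a real version would have to handle the races this ignores):

```typescript
// A single consumer drains a shared queue at one constant rate, so context
// search results and LLM stream chunks animate at the same speed no matter
// how fast they arrive. The queue grows under bursts and shrinks when idle.
class TypingQueue {
    private readonly chars: string[] = []
    private readonly timer: ReturnType<typeof setInterval>

    constructor(
        private readonly onUpdate: (chunk: string) => void,
        charsPerTick = 3,
        intervalMs = 30,
    ) {
        this.timer = setInterval(() => {
            const chunk = this.chars.splice(0, charsPerTick).join('')
            if (chunk) this.onUpdate(chunk)
        }, intervalMs)
    }

    /** Any producer (search, LLM stream) can enqueue text at any time. */
    public enqueue(text: string): void {
        // Push characters one at a time; spreading a huge string into
        // push(...) could exceed the argument-count limit.
        for (const ch of text) this.chars.push(ch)
    }

    /** Stop the consumer; anything still queued is discarded. */
    public dispose(): void {
        clearInterval(this.timer)
    }
}
```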

@arafatkatze
Contributor

I completely understand that explain-two is experimental work, but the idea of two interactions seemed very interesting. My only recommendation, and you have probably already thought of it anyway, would be to have an interaction manager interface, just to separate out the functions for getting different interactions, showing different displays, and executing different cases.

@github-actions

This issue is marked as stale because it has been open for 60 days with no activity. Remove the Stale label or comment, or this will be closed automatically in 5 days.

@github-actions github-actions bot added the Stale label Jan 25, 2024
@arafatkatze
Contributor

We have moved to a different API for this, so I am closing this issue.
