Add a parameter to control the token cache in the streamer when returning output #36505

@ExcitingFrog

Feature request

put() in class TextStreamer keeps several tokens in a cache and then returns the text of several tokens at once in the response (the source comments, for example, "After the symbol for a new line, we flush the cache."). A parameter is needed to turn this token cache on or off.

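Code example

        >>> from threading import Thread
        >>> from transformers import TextIteratorStreamer
        >>> # tokenizer, model and inputs are assumed to be created beforehand
        >>> streamer = TextIteratorStreamer(tokenizer)
        >>> generation_kwargs = dict(inputs, streamer=streamer, max_new_tokens=20)
        >>> # Run generation in a separate thread so the streamer can be consumed here
        >>> thread = Thread(target=model.generate, kwargs=generation_kwargs)
        >>> thread.start()
        >>> generated_text = ""
        >>> for new_text in streamer:
        ...     # new_text may hold the text of several tokens because of the cache
        ...     generated_text += new_text
        >>> generated_text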

Motivation

In my project, special tokens and ordinary tokens are returned at the same time; for example, 123<|obervation|> arrives as a single chunk, and we need to handle this case. As a workaround, we created a new class and removed part of the cache code.
I would like to know whether it is necessary to add a parameter to control the streamer cache, or whether there is another way to handle this.
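
To make the request concrete, here is a minimal, self-contained sketch of the behavior being asked for. It does not use transformers: SketchStreamer, toy_decode, TOY_VOCAB, collect_chunks and the buffer_words flag are all hypothetical names made up for illustration, and the buffering branch only approximates the word-boundary caching described above.

```python
from typing import Callable, Dict, List

# Toy vocabulary standing in for a real tokenizer (illustration only).
TOY_VOCAB: Dict[int, str] = {0: "1", 1: "2", 2: "3", 3: "<|obervation|>", 4: " ok"}


def toy_decode(token_ids: List[int]) -> str:
    """Decode token ids by concatenating their toy vocabulary strings."""
    return "".join(TOY_VOCAB[i] for i in token_ids)


class SketchStreamer:
    """Hypothetical streamer with a switch for the token cache.

    buffer_words=True  : hold text back until a space or newline is seen.
    buffer_words=False : emit the text of every token as soon as it arrives.
    """

    def __init__(self, decode: Callable[[List[int]], str],
                 emit: Callable[[str], None], buffer_words: bool = True) -> None:
        self.decode = decode
        self.emit = emit
        self.buffer_words = buffer_words
        self.token_cache: List[int] = []
        self.print_len = 0  # number of characters already emitted

    def put(self, token_ids: List[int]) -> None:
        self.token_cache.extend(token_ids)
        text = self.decode(self.token_cache)
        if text.endswith("\n"):
            # A newline flushes everything and resets the cache.
            printable = text[self.print_len:]
            self.token_cache = []
            self.print_len = 0
        elif not self.buffer_words:
            # Cache disabled: emit all text that has not been emitted yet.
            printable = text[self.print_len:]
            self.print_len += len(printable)
        else:
            # Cache enabled: emit only up to the last space.
            printable = text[: text.rfind(" ") + 1][self.print_len:]
            self.print_len += len(printable)
        if printable:
            self.emit(printable)

    def end(self) -> None:
        # Flush whatever is still held back.
        remaining = self.decode(self.token_cache)[self.print_len:]
        self.token_cache = []
        self.print_len = 0
        if remaining:
            self.emit(remaining)


def collect_chunks(buffer_words: bool) -> List[str]:
    """Feed the toy tokens one at a time and return the emitted chunks."""
    chunks: List[str] = []
    streamer = SketchStreamer(toy_decode, chunks.append, buffer_words=buffer_words)
    for token_id in [0, 1, 2, 3, 4]:
        streamer.put([token_id])
    streamer.end()
    return chunks


print(collect_chunks(buffer_words=True))
print(collect_chunks(buffer_words=False))
```

With the cache on, 123 and the special token come out in the same chunk; with it off, each token is emitted separately, so the special token can be detected on its own. One caveat with emitting per token: with a real byte-level tokenizer, a single token may decode to an incomplete character, so a real implementation would probably still need to hold back such partial output.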

Your contribution

I can submit a PR if a change is needed.
