[Bug] Interactive mode allows input greater than context size leading to a crash #1768

@KerfuffleV2

Description

Steps to Reproduce

Very easy to reproduce: run with --interactive --interactive-first -c 32 and paste in anything over 32 tokens. After processing the prompt, it crashes as soon as it generates the first token. Compiled without BLAS, it's just a segfault; compiled with cuBLAS, it reports:

CUDA error 12 at ggml-cuda.cu:1567: invalid pitch argument

It appears the normal prompt size checking logic doesn't apply to input from interactive mode (or it's not functioning correctly). I verified that using -f and a prompt from a file does gracefully fail as expected:

main: error: prompt is too long (185 tokens, max 124)
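For illustration, here is a minimal sketch of the kind of length check that interactive input seems to bypass. The helper names `input_fits_context` and `too_long_message` are hypothetical and not taken from the llama.cpp sources. The 4-token reserve is also an assumption, chosen only because it would give the "max 124" in the message above if the context size were 128.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of llama.cpp): returns true when n_new freshly
// tokenized input tokens fit into a context of n_ctx tokens while keeping
// n_reserve slots free. The default reserve of 4 is an assumption.
bool input_fits_context(std::size_t n_new, int n_ctx, int n_reserve = 4) {
    const int max_tokens = n_ctx - n_reserve;
    return max_tokens > 0 && n_new <= static_cast<std::size_t>(max_tokens);
}

// Hypothetical helper: builds an error string in the same shape as the one
// printed for over-long prompts loaded with -f.
std::string too_long_message(std::size_t n_new, int n_ctx, int n_reserve = 4) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "main: error: prompt is too long (%zu tokens, max %d)",
                  n_new, n_ctx - n_reserve);
    return std::string(buf);
}
```

An interactive loop could call `input_fits_context` after tokenizing each pasted chunk and either reject the input with the message or truncate it, rather than passing an oversized batch on to evaluation.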

I didn't fill in the rest of the issue primarily because I'm a jerk but also because I don't think this issue could have anything to do with anything local.
