[Bug] Interactive mode allows input greater than context size leading to a crash #1768

# Steps to Reproduce

Very easy to reproduce. Use `--interactive --interactive-first -c 32` and paste in anything over 32 tokens. After processing the prompt, it crashes as soon as it generates the first token. When compiled without BLAS it's just a segfault; with cuBLAS it fails with:

    CUDA error 12 at ggml-cuda.cu:1567: invalid pitch argument

It appears the normal prompt size checking logic doesn't apply to input from interactive mode (or it's not functioning correctly). I verified that using `-f` and a prompt from a file does gracefully fail as expected:

    main: error: prompt is too long (185 tokens, max 124)
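To illustrate, here's a rough sketch of the kind of length check I'd expect interactive input to go through as well. This is not the actual code in `main`: `check_input_fits` is a made-up helper, and the `reserve` default of 4 is an assumption for the example, not something I've confirmed in the source.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (NOT the project's real code): decides whether an
// input of n_tokens fits in a context of n_ctx tokens while keeping
// `reserve` slots free. The reserve default is an assumption.
// Returns an error message if the input is too long, std::nullopt otherwise.
std::optional<std::string> check_input_fits(std::size_t n_tokens, int n_ctx, int reserve = 4) {
    assert(reserve >= 0);
    const int max_tokens = n_ctx - reserve;
    // A non-positive budget means nothing can fit at all.
    if (max_tokens <= 0 || n_tokens > static_cast<std::size_t>(max_tokens)) {
        return "input is too long (" + std::to_string(n_tokens) +
               " tokens, max " + std::to_string(max_tokens < 0 ? 0 : max_tokens) + ")";
    }
    return std::nullopt;
}
```

The idea would be to run something like this on every chunk of tokenized interactive input before it is evaluated, and either reject or truncate the input instead of letting generation run past the context.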

I didn't fill in the rest of the issue primarily because I'm a jerk, but also because I don't think this issue could have anything to do with anything local.
