On an RTX 2080 Ti with 11 GB of VRAM, loading deepseek-r1:14b already takes usage to about 9 GB. Once the knowledge base's embedding model is loaded as well, VRAM is nearly full, and it becomes impossible to chat with the model or perform any other operations.
Could the embedding model and the chat model be kept in system RAM while idle, and loaded into VRAM only when they are actually needed for computation? This would be a great option for consumer users with small GPU memory but plenty of system RAM.
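For reference, this kind of on-demand swapping can be done manually in application code today. Below is a minimal PyTorch/Transformers sketch of the idea (the model name `BAAI/bge-m3` is just an illustrative placeholder, not what the knowledge base necessarily uses): the embedding model stays resident in system RAM and is moved into VRAM only for the duration of each call.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Placeholder embedding model for illustration; substitute your own.
EMBED_MODEL = "BAAI/bge-m3"

# Load once into system RAM (weights live on the CPU while idle).
tokenizer = AutoTokenizer.from_pretrained(EMBED_MODEL)
model = AutoModel.from_pretrained(EMBED_MODEL)
model.eval()

def embed(texts: list[str]) -> torch.Tensor:
    # Move the weights into VRAM only for this call.
    model.to("cuda")
    inputs = tokenizer(
        texts, padding=True, truncation=True, return_tensors="pt"
    ).to("cuda")
    with torch.no_grad():
        # CLS-token pooling, as used by BERT-style embedding models.
        out = model(**inputs).last_hidden_state[:, 0]
    # Move the weights back to system RAM and release the VRAM.
    model.to("cpu")
    torch.cuda.empty_cache()
    return out.cpu()
```

The trade-off is latency: copying the weights across the PCIe bus on every call is much slower than keeping the model resident in VRAM, but it frees the GPU for the chat model in between embedding requests.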