
Good ideas from llama.cpp #15

Closed
2 of 6 tasks
setzer22 opened this issue Mar 15, 2023 · 11 comments
Labels
issue:enhancement New feature or request

Comments

@setzer22
Collaborator

setzer22 commented Mar 15, 2023

I've been tracking the llama.cpp repo. I'll use this issue to list any good ideas / things we should be aware of to keep up with in Rust land:

@philpax
Collaborator

philpax commented Mar 16, 2023

Suggest pinning this issue :>

@Narsil

Narsil commented Mar 16, 2023

For the tokenizer item, I suggest using https://github.com/huggingface/tokenizers/

It should work out of the box once the model is converted. When huggingface/transformers#21955 lands, loading becomes a simple `let tokenizer = Tokenizer::from_file("filename")`. Cheers!
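For context, a minimal sketch of what that looks like with the tokenizers crate (the file path and the prompt are illustrative assumptions; error handling simplified):

```rust
// Sketch only; assumes the `tokenizers` crate as a dependency and a
// converted tokenizer.json on disk.
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the converted tokenizer file.
    let tokenizer = Tokenizer::from_file("tokenizer.json")?;

    // Encode a prompt; `false` means no special tokens are added.
    let encoding = tokenizer.encode("Hello, llama!", false)?;
    println!("token ids: {:?}", encoding.get_ids());
    Ok(())
}
```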

@philpax
Collaborator

philpax commented Mar 16, 2023

RMS norm landed, but they've reported regressions. Need to keep an eye on that.
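For reference, RMS norm replaces LayerNorm's mean-centering with a plain root-mean-square rescale. A minimal sketch of the operation (the eps handling here is an assumption, not necessarily what llama.cpp settled on):

```rust
/// RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
/// Reference sketch, not the llama.cpp implementation.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}
```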

@dongs0104

@Narsil LlamaTokenizer needs the byte fallback option. 🥹

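For anyone unfamiliar: byte fallback means an out-of-vocabulary piece is decomposed into one `<0xXX>` token per UTF-8 byte instead of collapsing to `<unk>`. A conceptual sketch (the helper and the vocab lookup are hypothetical, not the tokenizers API):

```rust
use std::collections::HashMap;

/// Hypothetical helper illustrating byte fallback: emit one <0xXX>
/// token per UTF-8 byte when the piece itself isn't in the vocab.
/// SentencePiece-style vocabs reserve 256 such byte tokens.
fn byte_fallback(piece: &str, vocab: &HashMap<String, u32>) -> Vec<String> {
    if vocab.contains_key(piece) {
        vec![piece.to_string()]
    } else {
        piece.bytes().map(|b| format!("<0x{b:02X}>")).collect()
    }
}
```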

@Narsil

Narsil commented Mar 17, 2023

Good news, everyone!

huggingface/tokenizers#1183

(If this goes in, I'll try to make a release soon after.)

@philpax
Collaborator

philpax commented Mar 17, 2023

Awesome! Looking forward to it :D

@dnlmlr

dnlmlr commented Mar 18, 2023

A small comment on the parallel loading: it is definitely possible to improve IO reads by parallelizing. This is much more effective on SSDs, but it still helps on HDDs thanks to caching at different layers. However, the degree of parallelism should be configurable, since performance can start to degrade past a certain point depending on the storage medium and on things like the kernel and buffer sizes.
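A sketch of what that configurable parallelism could look like with plain std (Unix-only because of read_exact_at; the function and its shape are an illustration, not the loader's actual code):

```rust
use std::fs::File;
use std::os::unix::fs::FileExt; // for read_exact_at (Unix-only)

/// Read a file in `n_threads` disjoint chunks in parallel.
/// `n_threads` is the knob that should stay configurable: past some
/// point, more readers degrade throughput on a given medium.
fn parallel_read(path: &str, n_threads: usize) -> std::io::Result<Vec<u8>> {
    let file = File::open(path)?;
    let len = file.metadata()?.len() as usize;
    let n = n_threads.max(1);
    let chunk = ((len + n - 1) / n).max(1);
    let mut buf = vec![0u8; len];
    let file = &file;

    std::thread::scope(|s| {
        for (i, out) in buf.chunks_mut(chunk).enumerate() {
            s.spawn(move || {
                // Each thread fills its own byte range of the buffer.
                file.read_exact_at(out, (i * chunk) as u64)
                    .expect("chunk read failed");
            });
        }
    });
    Ok(buf)
}
```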

@Narsil

Narsil commented Mar 20, 2023

@dnlmlr Do you have benchmarks to back that up? I didn't find that to be the case whenever I tried.

Memory-mapping was always consistently better than reading the file (provided you need the whole file), and it doesn't require parallelism (at user level, that is; no idea how the kernel handles it).
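For comparison, the mmap approach is a couple of lines with the memmap2 crate (crate choice and model path are assumptions here):

```rust
use memmap2::Mmap;
use std::fs::File;

fn main() -> std::io::Result<()> {
    let file = File::open("ggml-model-f16.bin")?; // illustrative path
    // Safety: the file must not be truncated or modified while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    // The kernel faults pages in lazily as tensors are touched, so no
    // user-level read parallelism is needed.
    println!("mapped {} bytes", mmap.len());
    Ok(())
}
```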

@setzer22 setzer22 pinned this issue Mar 20, 2023
@philpax philpax added the issue:enhancement New feature or request label Mar 24, 2023
@philpax
Collaborator

philpax commented Mar 26, 2023

@setzer22 Are you okay with me closing this issue and splitting it into individual issues?

@setzer22
Collaborator Author

Yup, sounds good 👍

@philpax
Collaborator

philpax commented Mar 26, 2023

This issue has been superseded by #35, #62, #78, #79 and #80.

@philpax philpax closed this as completed Mar 26, 2023
@philpax philpax unpinned this issue Apr 20, 2023