Conversation

@maekawatoshiki (Contributor) commented Mar 12, 2023

Hello!

I noticed that the model loader was not using buffered IO, so I added a small piece of code for buffering.
I only measured the loading time for LLaMA 7B on my M1 Pro MacBook, but it reduced the time from 1316 ms to 749 ms.
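A minimal sketch of the buffering idea (the 1 MiB size and the "model.bin" file name are illustrative assumptions, not the exact code from this PR):

```cpp
#include <fstream>
#include <vector>

// Sketch: attach a 1 MiB buffer to the stream before opening the file, so the
// loader's many small fin.read() calls are served from memory instead of
// hitting the OS on every call.
int main() {
    std::vector<char> f_buf(1024 * 1024); // buffer size is an assumption
    std::ifstream fin;
    // pubsetbuf must be called before open() for the buffer to take effect
    fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());
    fin.open("model.bin", std::ios::binary); // hypothetical file name
    // ... read the model header and tensors with fin.read(...) ...
    fin.close();
}
```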

main.cpp (outdated)
fin.close();
}

free(f_buf);

@maekawatoshiki (Contributor, Author) commented:

f_buf will not be freed if this function returns early, but I think it does not matter since it's a small amount of memory :)
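For context, a hedged sketch of why owning the buffer with std::vector sidesteps the early-return leak (the function name and error handling here are hypothetical):

```cpp
#include <cstdio>
#include <fstream>
#include <vector>

// Hypothetical loader skeleton: with a malloc'd f_buf, every early return
// below would have to call free(f_buf) first. Owning the buffer with a
// std::vector frees it automatically on all return paths.
bool load_model(const char *fname) {
    std::vector<char> f_buf(1024 * 1024);
    std::ifstream fin;
    fin.rdbuf()->pubsetbuf(f_buf.data(), f_buf.size());
    fin.open(fname, std::ios::binary);
    if (!fin) {
        fprintf(stderr, "failed to open %s\n", fname);
        return false; // early return: the vector's destructor releases f_buf
    }
    // ... parse magic, hyperparameters, tensors ...
    fin.close();
    return true;
}
```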

@maekawatoshiki (Contributor, Author) commented Mar 13, 2023

Thank you for the review. I've fixed it as you suggested.

@ggerganov ggerganov merged commit 63fd76f into ggml-org:master Mar 13, 2023
rooprob pushed a commit to rooprob/llama.cpp that referenced this pull request on Aug 2, 2023: "Speed up rmsnorm by using sqrtf/expf"