mietekrmd/expllama

The purpose of this repo is to play around with different architectures.

This is a fork of https://github.com/jzhang38/TinyLlama. I made some minor quality-of-life changes borrowed from Karpathy's llama2.c repo (https://github.com/karpathy/llama2.c). The changes include correcting the formula used to calculate intermediate_dim, as well as the number of groups for the 120M network.
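For context, below is a minimal sketch of the standard Llama-2 / llama2.c convention for deriving the intermediate (feed-forward) dimension from the model width. The function name and the default `multiple_of` are illustrative and may differ from what this repo actually uses.

```python
def compute_intermediate_dim(dim: int,
                             ffn_dim_multiplier: float | None = None,
                             multiple_of: int = 256) -> int:
    # Start from the usual 4x expansion, then take 2/3 of it because
    # SwiGLU uses three weight matrices instead of two.
    hidden_dim = 4 * dim
    hidden_dim = int(2 * hidden_dim / 3)
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    # Round up to the nearest multiple of `multiple_of` for hardware efficiency.
    hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
    return hidden_dim


print(compute_intermediate_dim(768))  # -> 2048
```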

I also tied the weights of the head and the embedding layer.
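A minimal sketch of what tying the head and embedding weights looks like in PyTorch; the class and attribute names are illustrative, not the actual ones in this repo.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__()
        self.tok_embeddings = nn.Embedding(vocab_size, dim)
        self.output = nn.Linear(dim, vocab_size, bias=False)
        # Share one parameter tensor between the two layers, so gradients from
        # both the input embedding and the output projection update it.
        self.output.weight = self.tok_embeddings.weight

    def forward(self, idx: torch.Tensor) -> torch.Tensor:
        x = self.tok_embeddings(idx)  # (batch, seq, dim)
        return self.output(x)         # (batch, seq, vocab_size)
```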

A remark about speed: playing around with small models (up to 160M parameters), I noticed that Karpathy's implementation is faster because the model is compiled (torch.compile in llama2.c's training script). Compilation, however, conflicts with the latest version of flash-attention.
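For illustration, a sketch of the compilation step is shown below; the model is a stand-in, not this repo's class. The conflict mentioned above refers to custom flash-attention kernels not always composing with torch.compile, while PyTorch's built-in scaled_dot_product_attention does.

```python
import torch
import torch.nn as nn

# Stand-in model; any nn.Module can be compiled the same way.
model = nn.Sequential(nn.Embedding(32000, 768), nn.Linear(768, 32000, bias=False))
compiled_model = torch.compile(model)

idx = torch.randint(0, 32000, (8, 256))
logits = compiled_model(idx)  # first call triggers compilation; later calls reuse it
```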
