
The purpose of this repo is to play around with different architectures.

This is a fork of https://github.com/jzhang38/TinyLlama. I made some minor quality-of-life changes borrowed from Karpathy's llama2.c repo (https://github.com/karpathy/llama2.c). The changes include the following: I corrected the formula used to calculate intermediate_dim, and I corrected the number of groups for the 120M network.
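For illustration, here is a minimal sketch of how the feed-forward intermediate dimension is commonly derived in LLaMA-style code such as llama2.c: take 4x the embedding dimension, scale it by 2/3, and round up to a multiple of `multiple_of`. Whether this is exactly the formula used in this fork is an assumption; the function names and example values below are hypothetical. The second helper shows the usual constraint on grouped-query attention, namely that the number of heads must be divisible by the number of groups.

```python
def intermediate_dim(n_embd: int, multiple_of: int = 32) -> int:
    """LLaMA-style SwiGLU hidden size (assumed formula, not confirmed for this repo)."""
    hidden = 4 * n_embd
    hidden = 2 * hidden // 3  # 2/3 scaling, rounded down to an integer
    # Round up to the nearest multiple of `multiple_of`.
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)


def heads_per_group(n_head: int, n_query_groups: int) -> int:
    """Query heads sharing one key/value group; raises if the split is uneven."""
    if n_head % n_query_groups != 0:
        raise ValueError("n_head must be divisible by n_query_groups")
    return n_head // n_query_groups


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to exercise the helpers.
    print(intermediate_dim(768, multiple_of=32))
    print(intermediate_dim(4096, multiple_of=256))
    print(heads_per_group(n_head=12, n_query_groups=4))
```

Rounding up to a multiple keeps the matrix shapes friendly to GPU kernels, which is why the result is often slightly larger than 8/3 of the embedding dimension.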

I also tied the weights of the head and embedding layers.
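To give a sense of why weight tying matters for small models, the sketch below counts the parameters held by the embedding and output head with and without tying. When tied, both layers share a single `vocab_size x n_embd` matrix. The vocabulary size and embedding width used here are assumptions chosen for illustration, not values taken from this repo's configs.

```python
def embedding_and_head_params(vocab_size: int, n_embd: int, tied: bool) -> int:
    """Parameter count for the token embedding plus the output head (no biases)."""
    matrix = vocab_size * n_embd
    # Tied: the head reuses the embedding matrix, so it is counted once.
    return matrix if tied else 2 * matrix


if __name__ == "__main__":
    vocab_size, n_embd = 32000, 768  # hypothetical values
    untied = embedding_and_head_params(vocab_size, n_embd, tied=False)
    tied = embedding_and_head_params(vocab_size, n_embd, tied=True)
    print(f"untied: {untied:,}  tied: {tied:,}  saved: {untied - tied:,}")
```

Under these assumed sizes the saving is one full embedding matrix, which is a noticeable share of a model in the 100M-parameter range.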

Remark about speed: when playing around with small models (up to 160M parameters), I noticed that Karpathy's implementation is faster because the model is compiled. However, compilation conflicts with the latest version of flash-attention.



mietekrmd/expllama
