GitHub - shichaog/large_language_modules: This is an implementation of the paper Attention is all you need.

About

This repo is a series of GPT-style implementations, ranging from a translation example to LLM examples.

The translation example is a toy implementation of the paper "Attention Is All You Need" (https://arxiv.org/abs/1706.03762).

The LLM part is currently based on Llama-2 and covers SFT (supervised fine-tuning) and Chinese Llama.

In this repo I implement an English-to-French translation transformer in PyTorch.

environment

My environment is a Python 3.7.16 venv created by PyCharm, with all requirements installed by PyCharm from requirements.txt. You also need to install the spaCy language models by running the following shell commands in a terminal:

python3 -m spacy download en_core_web_sm
python3 -m spacy download fr_core_news_sm
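
To confirm that both models are visible inside the venv before training, a small check like the one below may help. It is only a sketch: it relies on the fact that spaCy models are installed as ordinary Python packages, and the helper name missing_models is hypothetical, not something from this repo.

```python
import importlib.util

# The two spaCy models named in the download commands above.
REQUIRED_MODELS = ["en_core_web_sm", "fr_core_news_sm"]


def missing_models(names):
    """Return the package names that cannot be found in the current environment.

    Hypothetical helper: find_spec returns None when a top-level package
    is not installed, so no model is actually loaded here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_models(REQUIRED_MODELS)
    if missing:
        print("Missing spaCy models:", ", ".join(missing))
    else:
        print("All required spaCy models are installed.")
```

If a model is reported as missing, rerun the matching download command with the same interpreter that the venv uses.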

Dataset

The original data is stored in txt format in the data directory. You can inspect each sentence pair in the en_to_fr.csv file.
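
For a quick look at the pairs, something like the following sketch could be used. It assumes that each row of en_to_fr.csv holds the English sentence in the first column and the French sentence in the second, with no header row; that layout is an assumption and should be checked against the actual file. The function name read_pairs is hypothetical.

```python
import csv
import io


def read_pairs(lines):
    """Parse CSV lines into (english, french) tuples.

    Assumed layout (not confirmed by the repo): column 0 is English,
    column 1 is French. Rows with fewer than two columns are skipped.
    """
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue
        pairs.append((row[0].strip(), row[1].strip()))
    return pairs


if __name__ == "__main__":
    # In-memory sample so the sketch runs without the real file.
    sample = io.StringIO("Hello.,Bonjour.\nThank you.,Merci.\n")
    for english, french in read_pairs(sample):
        print(english, "->", french)
```

To read the real file, pass an open file object instead of the sample, for example open("en_to_fr.csv", newline="", encoding="utf-8").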

model and training process

The model and the training-data preprocessing are both in train_translation_model.py. Since I have already written a CSDN blog post about the implementation details and there are many reference resources available, I do not explain each line of the code here.

I hope this repo is helpful. Good luck.
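
For readers who want the core idea without opening the script, the scaled dot-product attention from the paper, softmax(Q K^T / sqrt(d_k)) V, can be sketched in plain Python. This is an illustration only and is not taken from train_translation_model.py; the function names are hypothetical, and a real model would use batched PyTorch tensors, masking, and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    queries: list of vectors of length d_k
    keys:    list of vectors of length d_k
    values:  list of vectors, one per key
    """
    d_k = len(keys[0])
    scale = math.sqrt(d_k)
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    print(scaled_dot_product_attention(q, k, v))
```

Each output row is a weighted average of the value vectors, with more weight on the values whose keys align with the query.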

Llama-2 fine-tune

2023-08 update: added Llama-2 fine-tuning on a single T4 GPU in Colab.

Folders and files

data
Finetuning_LLama_2_0_on_Colab_with_1_GPU.ipynb
README.md
Sentencepiece_python_module_example.ipynb
en_to_fr.csv
requirements.txt
sft_llama2.ipynb
train_translation_model.py
使用_Sentencepiece扩充LLama_2中文词汇.ipynb
