廣東話草泥馬 Cantonese Alpaca

Generate a Cantonese instruction dataset with Gemini Pro, using Stanford Alpaca's prompts, for fine-tuning LLMs. This repo contains a script to generate the dataset, along with the seed prompts from the Alpaca repo, manually translated into Cantonese.
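Alpaca-style generation works by packing a few seed tasks into one few-shot prompt and asking the model to continue the numbered list with new tasks. The sketch below illustrates that idea. It assumes each seed task is a dict with `instruction`, `input` and `output` keys, and it follows the numbered `###` layout used by Stanford Alpaca's `generate_instruction.py`. Whether `generate.py` uses exactly this layout is an assumption, and the function name `encode_prompt` is illustrative only.

```python
import re
from typing import Dict, List


def encode_prompt(header: str, seed_tasks: List[Dict[str, str]]) -> str:
    """Build one few-shot prompt from a header and a handful of seed tasks.

    Assumption: the numbered "###" layout mirrors Stanford Alpaca's
    generate_instruction.py; generate.py in this repo may differ.
    """
    prompt = header.rstrip() + "\n"
    for idx, task in enumerate(seed_tasks, start=1):
        # Collapse runs of whitespace and drop a trailing ASCII colon.
        instruction = re.sub(r"\s+", " ", task["instruction"]).strip().rstrip(":")
        # Empty inputs are marked explicitly so the model sees the convention.
        task_input = task["input"] or "<noinput>"
        prompt += "###\n"
        prompt += f"{idx}. Instruction: {instruction}\n"
        prompt += f"{idx}. Input:\n{task_input}\n"
        prompt += f"{idx}. Output:\n{task['output']}\n"
    # Open the next numbered slot so the model continues with a new task.
    prompt += "###\n"
    prompt += f"{len(seed_tasks) + 1}. Instruction:"
    return prompt
```

In this setting, the header would be the (Cantonese) task description, and the seed tasks would be a small random sample from the translated seed pool, for example drawn with `random.sample(pool, 3)`.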

You can find the generated dataset on Hugging Face.

Prerequisites

pip install -r requirements.txt

Usage

export GOOGLE_AISTUDIO_API_KEY=YOUR_API_KEY

python generate.py
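The API key is passed through the `GOOGLE_AISTUDIO_API_KEY` environment variable. The sketch below shows one way a script could read that variable and send a prompt to Gemini Pro. It assumes the `google-generativeai` package and the `gemini-pro` model name, and neither is confirmed to be what `generate.py` actually uses. The helper names `get_api_key` and `generate` are hypothetical.

```python
import os


def get_api_key(env_var: str = "GOOGLE_AISTUDIO_API_KEY") -> str:
    """Read the Google AI Studio API key exported in the shell."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running generate.py")
    return key


def generate(prompt: str, model_name: str = "gemini-pro") -> str:
    """Send one prompt to Gemini and return the text of the reply.

    Assumption: uses the google-generativeai SDK. It is imported lazily so
    the key-handling helper above works without the package installed.
    """
    import google.generativeai as genai

    genai.configure(api_key=get_api_key())
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(prompt)
    return response.text
```

Failing early with a clear message when the variable is missing is cheaper than letting the first API request fail partway through a long generation run.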

Citation Information

@misc{alpaca,
  author = {Rohan Taori and Ishaan Gulrajani and Tianyi Zhang and Yann Dubois and Xuechen Li and Carlos Guestrin and Percy Liang and Tatsunori B. Hashimoto },
  title = {Stanford Alpaca: An Instruction-following LLaMA model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/tatsu-lab/stanford_alpaca}},
}
