This repository provides a comprehensive overview of available large language models (LLMs) for natural language generation (NLG). Of particular interest to us are models with German language capabilities.
| Name | Size | Model Card | License | Implementation | Paper |
|------|------|------------|---------|----------------|-------|
| GPT-J | 6B | | MIT, Weights: Apache 2.0 | | - |
| BLOOMZ | 560M 1.1B 1.7B 3B 7.1B 176B | | RAIL | | Muennighoff, Niklas, et al. Crosslingual generalization through multitask finetuning. (2022). DOI: https://doi.org/10.48550/arXiv.2211.01786 |
| BLOOM German | 350M 1.5B 6.4B | | RAIL | | Ostendorff, Malte; Rehm, Georg. Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning. (2023). DOI: https://doi.org/10.48550/arXiv.2301.09626 |
| OPT | 125M 350M 1.3B 2.7B 6.7B 13B 30B 66B | | OPT-LICENSE | | Zhang, Susan, et al. OPT: Open pre-trained transformer language models. (2022). DOI: https://doi.org/10.48550/arXiv.2205.01068 |
| FLAN-T5 | 80M 250M 780M 3B 11B | | Apache 2.0 | | Chung, Hyung Won, et al. Scaling instruction-finetuned language models. (2022). DOI: https://doi.org/10.48550/arXiv.2210.11416 |
| MT0 | 300M 580M 1.2B 3.7B 13B | | Apache 2.0 | | Muennighoff, Niklas, et al. Crosslingual generalization through multitask finetuning. (2022). DOI: https://doi.org/10.48550/arXiv.2211.01786 |
| GPT2 | 117M 117M 1.5B ? ? ? | malteos/gpt2-wechsel-german-ds-meg | MIT | https://github.com/CPJKU/wechsel, https://github.com/bminixhofer/gerpt2 | Minixhofer, Benjamin, et al. WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. (2021). DOI: https://doi.org/10.18653/v1/2022.naacl-main.293 |
| mGPT | 1.3B | | Apache 2.0 | | Shliazhko, Oleh, et al. mGPT: Few-shot learners go multilingual. (2022). DOI: https://doi.org/10.48550/arXiv.2204.07580 |
| MT5 | 300M 580M 1.2B 3.7B 13B | | Apache 2.0 | | Xue, Linting, et al. mT5: A massively multilingual pre-trained text-to-text transformer. (2020). DOI: https://doi.org/10.18653/v1/2021.naacl-main.41 |
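
Most of the models above are published on the Hugging Face Hub and can be loaded with the `transformers` library. The snippet below is a minimal sketch: the decoder-only models (GPT-J, BLOOMZ, OPT, GPT2, mGPT) load as causal language models, while the text-to-text models (FLAN-T5, MT0, MT5) use the seq2seq classes. The first Hub id is taken from the Model Card column above; `google/flan-t5-small` is an assumed example id, so always consult the respective model card for the exact checkpoint name.

```python
# Minimal sketch of loading the listed models with Hugging Face transformers.
# Hub ids other than the one from the table's Model Card column are assumptions.
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
)

# Decoder-only (causal) example: the German GPT2 listed in the table.
causal_id = "malteos/gpt2-wechsel-german-ds-meg"  # from the Model Card column
tokenizer = AutoTokenizer.from_pretrained(causal_id)
model = AutoModelForCausalLM.from_pretrained(causal_id)

inputs = tokenizer("Die Hauptstadt von Deutschland ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Encoder-decoder (text-to-text) example: FLAN-T5, MT0, and MT5 load via the
# seq2seq classes instead. "google/flan-t5-small" is an assumed example id.
seq2seq_id = "google/flan-t5-small"
t5_tokenizer = AutoTokenizer.from_pretrained(seq2seq_id)
t5_model = AutoModelForSeq2SeqLM.from_pretrained(seq2seq_id)

t5_inputs = t5_tokenizer(
    "Translate to German: The weather is nice today.", return_tensors="pt"
)
t5_outputs = t5_model.generate(**t5_inputs, max_new_tokens=30)
print(t5_tokenizer.decode(t5_outputs[0], skip_special_tokens=True))
```

The `Auto*` classes resolve the correct architecture from the checkpoint's config, so the same two-line pattern works for every model in the table once the right causal/seq2seq class is chosen.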