The paper list of multilingual pre-trained models (continually updated).

kaiyuhwang/MLLM-Survey

Multilingual Pre-trained Models Tutorial

Considering the rapid growth of research on multilingual NLP, we have established this repository to gather relevant literature in this specific multilingual domain, as a contribution of the survey paper "A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers".

This is also a tutorial on multilingual pre-trained models, maintained by the Beijing Jiaotong University (BJTU) NLP Group (continually updated).

The past five years have witnessed the rapid development of multilingual pre-trained models, especially data-driven large language models (LLMs). Given the current prominence of multilingual NLP, we prioritize collecting important, up-to-date papers on multilingual pre-trained models and their performance. As one of the contributions of the survey, we continuously update and expand the content according to the chapters of the survey. Our list is still incomplete and the categorization may be imperfect. We will keep adding papers and improving the list. Any suggestions are welcome!

LLMs with Multilingualism

We present an overview of representative LLMs (most with more than 7B trainable parameters) that have some multilingual capability, including their release dates and details. The latest models that perform well on the leaderboard will be updated in a timely manner; you can also contact us to request updates.

General Multilingual Leaderboard

We evaluate LLMs with multilingualism on our reconstructed benchmarks. (If a model has multiple versions, we only report the version that performs best.)

In this leaderboard, we use a unified prompt for each task to probe the multilingual capabilities of each model. The potential for further enhancement is explored in the next chapter, "Multilingual Inference Strategies".

🎈 A suite for calling LLMs is coming soon! The benchmark is under construction.

Open-Source Models

All models are available on the Internet; links to the paper or GitHub repository are given.

Closed-Source Systems

We only investigate a few representative closed-source LLMs because most of these commercial systems are expensive to invoke. We hope to find sponsors or receive voluntary responses from the enterprises to compare closed-source systems; otherwise, our goal is to discuss the future potential of LLMs within the open-source community.

Multilingual Inference Strategies

We investigate several inference strategies for LLMs to explore the potential for enhancing multilingual capabilities on the related benchmarks. (The multilingual inference strategies act on the prompt with external knowledge, while the LLMs remain frozen.)

Reasoning Leaderboard

| Model | Method | MGSM | XCOPA | XNLI | PAWS-X | MKQA | Avg |
|---|---|---|---|---|---|---|---|
| GPT-3.5 | Basic | 34.4 | 72.3 | 52.2 | 49.7 | 35.4 | 48.8 |
| | En-Basic | 41.1 | 76.1 | 63.0 | 62.0 | 36.5 | 55.7 |
| | CoT | 49.9 | 72.4 | 50.6 | 50.3 | 35.4 | 51.7 |
| | En-CoT | 61.1 | 78.6 | 56.4 | 61.6 | 42.9 | 60.1 |
| | XLT | 62.3 | 79.3 | 59.2 | 59.4 | 37.6 | 59.6 |
| | Trans-Google | 73.9 | 84.5 | 60.5 | 67.2 | 43.8 | 66.0 |
| | Trans-NLLB | 61.0 | 79.7 | 59.2 | 67.5 | 37.2 | 60.9 |
| BLOOMZ-7b1 | Basic | 1.3 | 21.4 | 8.3 | - | 7.8 | 9.7 |
| | En-Basic | 2.0 | 56.5 | 43.9 | - | 10.6 | 28.3 |
| | CoT | 1.2 | 20.9 | 8.2 | - | 6.5 | 9.2 |
| | En-CoT | 1.7 | 53.9 | 35.9 | - | 9.3 | 25.2 |
| | XLT | 1.7 | 50.5 | 35.4 | - | 8.0 | 23.9 |
| | Trans-Google | 2.7 | 63.7 | 44.3 | - | 17.2 | 32.0 |
| | Trans-NLLB | 2.4 | 61.8 | 43.8 | - | 14.7 | 30.7 |
| Mistral-7B-Instruct | Basic | 11.2 | 54.2 | 42.8 | 44.6 | 7.8 | 32.1 |
| | En-Basic | 23.8 | 34.9 | 50.2 | 46.9 | 7.0 | 32.6 |
| | CoT | 17.0 | 53.8 | 43.4 | 44.3 | 7.8 | 33.3 |
| | En-CoT | 27.6 | 40.8 | 50.0 | 46.6 | 11.5 | 35.3 |
| | XLT | 31.8 | 61.5 | 46.0 | 47.8 | 9.6 | 39.3 |
| | Trans-Google | 41.3 | 59.2 | 55.0 | 51.5 | 17.0 | 44.8 |
| | Trans-NLLB | 31.7 | 54.4 | 53.0 | 52.4 | 15.5 | 41.4 |
| Llama-2-7B-Chat | Basic | 8.4 | 46.5 | 34.6 | 48.1 | 14.4 | 30.4 |
| | En-Basic | 9.3 | 49.7 | 39.0 | 48.8 | 16.1 | 32.6 |
| | CoT | 10.9 | 46.3 | 35.6 | 48.3 | 13.3 | 30.9 |
| | En-CoT | 13.6 | 54.9 | 41.0 | 48.7 | 13.8 | 34.4 |
| | XLT | 10.4 | 50.8 | 44.8 | 44.5 | 14.6 | 33.0 |
| | Trans-Google | 28.6 | 67.7 | 45.5 | 57.5 | 19.8 | 43.8 |
| | Trans-NLLB | 24.8 | 64.6 | 44.1 | 56.2 | 17.4 | 41.4 |
| Llama-2-13B-Chat | Basic | 15.6 | 50.1 | 36.4 | 54.0 | 18.3 | 34.9 |
| | En-Basic | 19.0 | 54.3 | 43.4 | 59.1 | 20.2 | 39.2 |
| | CoT | 18.1 | 50.9 | 35.7 | 54.8 | 15.7 | 35.0 |
| | En-CoT | 19.9 | 54.5 | 43.7 | 57.6 | 19.8 | 39.1 |
| | XLT | 22.3 | 56.0 | 51.4 | 55.7 | 19.0 | 40.9 |
| | Trans-Google | 39.1 | 71.9 | 46.1 | 58.4 | 33.8 | 49.9 |
| | Trans-NLLB | 31.8 | 68.2 | 45.4 | 57.8 | 28.4 | 46.3 |
| Llama-2-70B-Chat | Basic | 23.6 | 51.5 | 39.0 | 52.8 | 24.8 | 38.3 |
| | En-Basic | 28.6 | 55.3 | 46.5 | 60.4 | 24.7 | 43.1 |
| | CoT | 23.5 | 50.5 | 37.9 | 54.9 | 21.9 | 37.7 |
| | En-CoT | 30.2 | 61.2 | 45.9 | 64.9 | 31.1 | 46.7 |
| | XLT | 32.8 | 58.7 | 52.2 | 55.7 | 26.6 | 45.2 |
| | Trans-Google | 53.3 | 80.9 | 54.0 | 68.5 | 39.7 | 59.3 |
| | Trans-NLLB | 43.8 | 77.1 | 52.2 | 69.2 | 19.4 | 52.3 |

*Note: This leaderboard follows Liu et al.; we will update it in the next version, once the evaluation suite for calling LLMs is built.
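For reference, the Avg column is the arithmetic mean of the available per-task scores, with missing entries ("-", e.g. BLOOMZ-7b1 on PAWS-X) excluded from the denominator; a minimal sketch (the helper name is ours, not from the repository):

```python
def task_average(scores):
    """Mean of the available task scores, rounded to one decimal.

    Missing benchmark results are marked with "-" and are excluded
    from both the sum and the denominator.
    """
    available = [s for s in scores if s != "-"]
    return round(sum(available) / len(available), 1)

# GPT-3.5 / Basic: all five tasks are available.
print(task_average([34.4, 72.3, 52.2, 49.7, 35.4]))  # -> 48.8
# BLOOMZ-7b1 / Basic: PAWS-X is missing, so the mean is over four tasks.
print(task_average([1.3, 21.4, 8.3, "-", 7.8]))      # -> 9.7
```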

Baseline Details

[Question]: "制作一件袍子需要2匹蓝色纤维布料和这个数量一半的白色纤维布料。它一共需要用掉多少匹布料" (English: "A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts does it take in total?")
Basic: [Query]=[Question]+[Prompt: 您的最终答案的格式应为:"答案: <阿拉伯数字>". (English: Your final answer should be formatted as "Answer: <Arabic numeral>".)]
En-Basic: [Query]=[Question]+[Prompt -> English Prompt: You should format your final answer as "Answer: <Arabic numeral>".]
CoT: [Query]=[Question]+[Prompt -> CoT: 让我们一步步思考。您的最终答案的格式应为:"答案: <阿拉伯数字>". (English: Let's think step by step. Your final answer should be formatted as "Answer: <Arabic numeral>".)]
En-CoT: [Query]=[Question]+[Prompt -> English CoT: Let's think step by step in English. You should format your final answer as "Answer: <Arabic numeral>".]
XLT: [Query]=[Prefix: I want you to act as an arithmetic reasoning expert for Chinese. Request: ]+[Question]+[Complex Prompt: You should retell the request in English. You should do step-by-step answer to obtain a number answer. You should step-by-step answer the request. You should tell me the answer in this format "Answer:".]
Trans-X: [Query]=[Question -> English Question by X]+[Prompt -> English CoT: Let's think step by step in English. You should format your final answer as "Answer: <Arabic numeral>".]
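The templates above can be sketched as a small query builder. This is an illustrative sketch, not the repository's implementation: the `PROMPTS` strings mirror the baselines above, while `build_query` and the `translate` hook are names we introduce (any MT system, e.g. Google Translate or NLLB, can fill the hook for the Trans-X baselines).

```python
# Illustrative sketch of the baseline query construction (not the official
# implementation). Each baseline composes [Query] = [Question] + [Prompt];
# Trans-X translates the question into English first and reuses the
# English CoT prompt, as described above.

PROMPTS = {
    "Basic":    '您的最终答案的格式应为:"答案: <阿拉伯数字>".',
    "En-Basic": 'You should format your final answer as "Answer: <Arabic numeral>".',
    "CoT":      '让我们一步步思考。您的最终答案的格式应为:"答案: <阿拉伯数字>".',
    "En-CoT":   ("Let's think step by step in English. "
                 'You should format your final answer as "Answer: <Arabic numeral>".'),
}

def build_query(question, method, translate=None):
    """Compose the query for one inference strategy.

    `translate` is an external MT hook (hypothetical name) used only by
    the Trans-X baselines; the LLM itself stays frozen in every case.
    """
    if method.startswith("Trans"):
        question = translate(question)  # [Question -> English Question by X]
        method = "En-CoT"               # Trans-X reuses the English CoT prompt
    return question + " " + PROMPTS[method]
```

The frozen-model assumption is what makes these strategies cheap to compare: only the prompt string changes between rows of the leaderboard.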

Reading List

We provide a reading list (continually updated) for this chapter, corresponding to Section 4 of the survey.

Security

Jailbreaking Leaderboard

This leaderboard is built with the EasyJailbreak framework on AdvBench.

| Method | GPT-3.5 | GPT-4 | Llama-2-7B-Chat | Llama-2-13B-Chat | Vicuna-7B-v1.5 | Vicuna-13B-v1.5 | ChatGLM | Qwen-7B-Chat | InternLM-7B | Mistral-7B |
|---|---|---|---|---|---|---|---|---|---|---|
| GCG | 12% | 0% | 46% | 46% | 94% | 94% | 34% | 48% | 10% | 82% |
| JailBroken | 100% | 58% | 6% | 4% | 100% | 100% | 95% | 100% | 100% | 100% |
| GPTFUZZER | 35% | 0% | 31% | 41% | 93% | 94% | 85% | 82% | 92% | 99% |
| AutoDAN | 45% | 2% | 51% | 72% | 100% | 97% | 89% | 99% | 98% | 98% |
| DeepInception | 66% | 35% | 8% | 0% | 29% | 17% | 33% | 58% | 36% | 40% |
| ICA | 0% | 1% | 0% | 0% | 51% | 81% | 54% | 36% | 23% | 75% |
| PAIR | 19% | 20% | 27% | 13% | 99% | 95% | 96% | 77% | 86% | 95% |
| ReNeLLM | 87% | 38% | 31% | 69% | 77% | 87% | 86% | 70% | 67% | 90% |
| Multilingual | 12% | 0% | 46% | 46% | 94% | 94% | 34% | 48% | 10% | 82% |
| Cipher | 100% | 58% | 6% | 4% | 100% | 100% | 95% | 100% | 100% | 100% |
| CodeChameleon | 35% | 0% | 31% | 41% | 93% | 94% | 85% | 82% | 92% | 99% |
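If we read each percentage as an attack success rate (ASR), i.e. the share of AdvBench prompts for which the jailbreak method elicits a harmful response from the target model (our interpretation of the EasyJailbreak numbers, not stated explicitly above), the metric reduces to a simple ratio; the function name is ours:

```python
def attack_success_rate(jailbroken_flags):
    """ASR as a whole-number percentage: the fraction of adversarial
    prompts whose attack the evaluator judged successful (True)."""
    return round(100 * sum(jailbroken_flags) / len(jailbroken_flags))

# e.g. 3 successful attacks out of 4 prompts
print(attack_success_rate([True, False, True, True]))  # -> 75
```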

Reading List

We provide a reading list of jailbreaking and defense methods (continually updated) for this chapter, corresponding to Section 5 of the survey.

Multidomain

Legal

🎈 The leaderboard for the legal benchmark is under construction.

Medical

🎈 The leaderboard for the medical benchmark is under construction.

Finance

Coming soon! This domain is being updated.

Data Resource & Evaluation

The data resources and popular benchmarks are listed in detail in the reading list.

Contact Us

Project Lead:

Section Contributors:

  • Inference: Yulong Mao
  • Security: Hongliang Li
  • Multidomain: You Li

Special Thanks:

  • Chaoqun Liu (Nanyang Technological University, Singapore) provided valuable ideas and contributed part of the implementation of the multilingual inference strategies.

Citation

@misc{huang2024survey,
      title={A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers}, 
      author={Kaiyu Huang and Fengran Mo and Hongliang Li and You Li and Yuanchi Zhang and Weijian Yi and Yulong Mao and Jinchen Liu and Yuzhuang Xu and Jinan Xu and Jian-Yun Nie and Yang Liu},
      year={2024},
      eprint={2405.10936},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
