# Fine-Tuning Qwen-Chat Large Language Model (Multiple GPUs)

Tongyi Qianwen (Qwen) is a family of large language models developed by Alibaba Cloud. The models are based on the Transformer architecture and pre-trained on a large, diverse corpus that includes internet text, specialized books, and code. Qwen-Chat is an AI assistant built on top of the pre-trained model using alignment techniques.

This notebook uses Qwen-1.8B-Chat as an example to show how to fine-tune a Qwen model on multiple GPUs with DeepSpeed.

## Environment Requirements

Please refer to **requirements.txt** to install the required dependencies.

## Preparation

### Download Qwen-1.8B-Chat

First, download the model files. You can download them directly from ModelScope.

In [None]:
from modelscope.hub.snapshot_download import snapshot_download
model_dir = snapshot_download('Qwen/Qwen-1_8B-Chat', cache_dir='.', revision='master')

### Download Example Training Data

Download the data required for training; here, we provide a tiny dataset as an example. It is sampled from [Belle](https://github.com/LianjiaTech/BELLE).

Disclaimer: this dataset may be used for research purposes only.

In [4]:
!wget https://atp-modelzoo-sh.oss-cn-shanghai.aliyuncs.com/release/tutorials/qwen_recipes/Belle_sampled_qwen.json

You can also prepare your own dataset in the same format. Below is a minimal example containing a single sample:

```json
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是一个语言模型，我叫通义千问。"
      }
    ]
  }
]
```
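If your raw data is a list of question/answer pairs, the single-turn format above can be produced with a few lines of Python. The following is a minimal sketch, not part of the original recipe: the helper name `to_conversation_sample` and the `pairs` list are hypothetical and only serve as an illustration.

```python
import json


def to_conversation_sample(sample_id, question, answer):
    """Wrap one question/answer pair in the conversation format shown above."""
    return {
        "id": sample_id,
        "conversations": [
            {"from": "user", "value": question},
            {"from": "assistant", "value": answer},
        ],
    }


# Hypothetical raw data: a list of (question, answer) pairs.
pairs = [("你好", "我是一个语言模型，我叫通义千问。")]

dataset = [
    to_conversation_sample(f"identity_{i}", question, answer)
    for i, (question, answer) in enumerate(pairs)
]

# ensure_ascii=False keeps the Chinese text readable instead of \uXXXX escapes.
serialized = json.dumps(dataset, ensure_ascii=False, indent=2)
print(serialized)
```

Writing `serialized` to a `.json` file yields a file with the same structure as the example above.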

You can also use multi-turn conversations as the training set. Here is a simple example:

```json
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好，能告诉我遛狗的最佳时间吗？"
      },
      {
        "from": "assistant",
        "value": "当地最佳遛狗时间因地域差异而异，请问您所在的城市是哪里？"
      },
      {
        "from": "user",
        "value": "我在纽约市。"
      },
      {
        "from": "assistant",
        "value": "纽约市的遛狗最佳时间通常在早晨6点至8点和晚上8点至10点之间，因为这些时间段气温较低，遛狗更加舒适。但具体时间还需根据气候、气温和季节变化而定。"
      }
    ]
  }
]
```
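Before training, it can help to check that every sample follows the structure of the examples above. The sketch below assumes that turns alternate between `user` and `assistant`, start with `user`, and end with `assistant`, as they do in both examples; the training script may be more permissive. The function name `validate_sample` is hypothetical.

```python
def validate_sample(sample):
    """Return a list of problems found in one training sample (empty if none)."""
    problems = []
    if not isinstance(sample.get("id"), str):
        problems.append("missing or non-string 'id'")

    turns = sample.get("conversations")
    if not isinstance(turns, list) or not turns:
        problems.append("'conversations' must be a non-empty list")
        return problems

    # Assumption: turns alternate user -> assistant, starting with the user.
    expected = ("user", "assistant")
    for i, turn in enumerate(turns):
        if turn.get("from") != expected[i % 2]:
            problems.append(f"turn {i}: expected 'from' == {expected[i % 2]!r}")
        if not isinstance(turn.get("value"), str) or not turn["value"]:
            problems.append(f"turn {i}: 'value' must be a non-empty string")

    if turns[-1].get("from") != "assistant":
        problems.append("last turn should come from the assistant")
    return problems


good = {
    "id": "identity_0",
    "conversations": [
        {"from": "user", "value": "我在纽约市。"},
        {"from": "assistant", "value": "好的。"},
    ],
}
# Starts with the assistant, so the alternation check reports one problem.
bad = {"id": "identity_1", "conversations": [{"from": "assistant", "value": "你好"}]}

print(validate_sample(good))
print(validate_sample(bad))
```

Running the check over the whole dataset after loading it with `json.load` gives a quick way to spot malformed samples before launching a multi-GPU job.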

## Fine-Tune the Model

You can run the prepared training script directly to fine-tune the model. **nproc_per_node** is the number of GPUs used for training.
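As an illustration of where **nproc_per_node** fits, the sketch below assembles a single-node launch command, assuming the script is started with PyTorch's `torchrun` launcher. The script name `finetune.py`, its arguments, and the DeepSpeed config file name are hypothetical placeholders, not the recipe's actual values; only `--nproc_per_node` and `--nnodes` are `torchrun` options.

```python
import shlex


def build_launch_command(nproc_per_node, script, script_args):
    """Assemble a single-node torchrun command line as a list of arguments."""
    return [
        "torchrun",
        "--nproc_per_node", str(nproc_per_node),  # one worker process per GPU
        "--nnodes", "1",                           # single machine
        script,
        *script_args,
    ]


cmd = build_launch_command(
    nproc_per_node=2,
    script="finetune.py",  # hypothetical script name
    script_args=[
        # Hypothetical arguments; check the training script for the real ones.
        "--model_name_or_path", "Qwen/Qwen-1_8B-Chat",
        "--data_path", "Belle_sampled_qwen.json",
        "--deepspeed", "ds_config.json",
    ],
)

# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

To launch the job from Python, pass `cmd` to `subprocess.run(cmd, check=True)`; set `nproc_per_node` to the number of GPUs available on the machine.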

## Test the Model

We can test the model as follows:

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation import GenerationConfig

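# Path to the model checkpoint to test; change it to the location of your own model.
model_dir = '../models/Qwen2-1.5B/'
# Device that receives the tokenized inputs.
device = "cuda"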

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    device_map="auto"
).eval()

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.


A large language model (LLM) is a type of artificial intelligence (AI) system that is designed to understand and generate human-like language. LLMs are typically trained on large amounts of text data, such as books, articles, and web pages, and are able to learn and improve over time through reinforcement learning.

LLMs are often used in a variety of applications, including natural language processing (NLP), machine translation, and chatbots. They can be trained to perform a wide range of tasks, from generating summaries of long documents to generating coherent and contextually appropriate responses to user queries.

LLMs are particularly useful for tasks that require a high degree of accuracy and consistency, such as language translation, summarization, and question answering. They can also be used to automate repetitive tasks, such as customer service and data analysis.

Overall, LLMs are a powerful tool for processing and generating human-like language, and are likely to play an in