# Federated ChatGLM Tuning with Parameter-Efficient Methods in FATE-LLM

In this tutorial, we demonstrate how to efficiently train a federated ChatGLM-6B model with DeepSpeed using the FATE-LLM framework. FATE-LLM provides the "pellm" (Parameter Efficient Large Language Model) module, designed specifically for federated learning with large language models. It enables parameter-efficient methods in federated learning, which reduce communication overhead while maintaining model performance. This tutorial focuses on ChatGLM-6B and on the Adapter mechanism for fine-tuning it, which effectively reduces communication volume and improves overall efficiency.


## FATE-LLM: ChatGLM-6B

### ChatGLM-6B
ChatGLM-6B is a large transformer-based language model with 6.2 billion parameters, trained on about 1T tokens of Chinese and English corpus. It is an open bilingual language model based on the General Language Model (GLM) framework. You can download the pretrained model from [here](https://huggingface.co/THUDM/chatglm-6b), or let the program download it automatically the first time it is used.

### Current Features

In the current version, FATE-LLM: ChatGLM-6B supports the following features:
<div align="center">
  <img src="../images/fate-llm-chatglm-6b.png">
</div>

## Experiment Setting

Before running the experiment, please make sure that a [FATE-LLM Cluster](https://github.com/FederatedAI/FATE/wiki/Download#llm%E9%83%A8%E7%BD%B2%E5%8C%85) has been deployed.

### Dataset: Advertising Text Generation

This is an advertising text generation dataset. You can download it from either of the following links and place it in the examples/data folder.
- [data link 1](https://drive.google.com/file/d/13_vf0xRTQsyneRKdD1bZIr93vBGOczrk/view)
- [data link 2](https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1)  

For more details about the data, refer to [this paper](https://aclanthology.org/D19-1321.pdf).

In [46]:
import pandas as pd
df = pd.read_json('./datas/AdvertiseGen/train.json', lines=True)

In [2]:
df.head()

Unnamed: 0,content,summary
0,类型#裤*版型#宽松*风格#性感*图案#线条*裤型#阔腿裤,宽松的阔腿裤这两年真的吸粉不少，明星时尚达人的心头爱。毕竟好穿时尚，谁都能穿出腿长2米的效果...
1,类型#裙*风格#简约*图案#条纹*图案#线条*图案#撞色*裙型#鱼尾裙*裙袖长#无袖,圆形领口修饰脖颈线条，适合各种脸型，耐看有气质。无袖设计，尤显清凉，简约横条纹装饰，使得整身...
2,类型#上衣*版型#宽松*颜色#粉红色*图案#字母*图案#文字*图案#线条*衣样式#卫衣*衣款...,宽松的卫衣版型包裹着整个身材，宽大的衣身与身材形成鲜明的对比描绘出纤瘦的身形。下摆与袖口的不...
3,类型#裙*版型#宽松*材质#雪纺*风格#清新*裙型#a字*裙长#连衣裙,踩着轻盈的步伐享受在午后的和煦风中，让放松与惬意感为你免去一身的压力与束缚，仿佛要将灵魂也寄...
4,类型#上衣*材质#棉*颜色#蓝色*风格#潮*衣样式#polo*衣领型#polo领*衣袖长#短...,想要在人群中脱颖而出吗？那么最适合您的莫过于这款polo衫短袖，采用了经典的polo领口和柔...
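Judging from the rows shown above, each `content` value is a `*`-separated list of `attribute#value` pairs. A minimal sketch of splitting such a string is given below. The helper name `parse_content` is hypothetical and is not part of FATE-LLM or the dataset tooling.

```python
from typing import List, Tuple


def parse_content(content: str) -> List[Tuple[str, str]]:
    """Split an AdvertiseGen-style content string into (attribute, value) pairs."""
    pairs = []
    for item in content.split("*"):
        # Each item is expected to look like "attribute#value"; skip anything else.
        if "#" not in item:
            continue
        key, value = item.split("#", 1)
        pairs.append((key, value))
    return pairs


example = "类型#裤*版型#宽松*风格#性感*图案#线条*裤型#阔腿裤"
pairs = parse_content(example)
print(pairs)
```

A list of pairs is used rather than a dict because the same attribute (for example `图案`) can appear several times in one row.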


In [3]:
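import json

def generate_json_data(data, filename):
    # Write one JSON object per line; keep Chinese text unescaped.
    with open(filename, 'w', encoding='utf-8') as f:
        for index, row in data.iterrows():
            res = {}
            # Access columns by name instead of by position.
            res["content"] = row["content"]
            res["summary"] = row["summary"]
            json_str = json.dumps(res, ensure_ascii=False)
            f.write(json_str)
            f.write("\n")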

In [21]:
def generate_csv_data(data, filename):
    # Write the split as CSV with a header row and without the index column.
    data.to_csv(filename, index=False)

In [15]:
generate_json_data(df.loc[:200, :], './datas/AdvertiseGen/train_guest.json')

In [16]:
generate_json_data(df.loc[500:700, :], './datas/AdvertiseGen/train_host.json')

In [22]:
generate_csv_data(df.loc[:200, :], './datas/AdvertiseGen/train_guest.csv')
generate_csv_data(df.loc[500:700, :], './datas/AdvertiseGen/train_host.csv')
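Note that `DataFrame.loc` slices by label and includes both endpoints, so with the default integer index `df.loc[:200, :]` selects 201 rows, not 200. The sketch below mirrors the guest/host split and the JSON-lines round trip using only the standard library. The records are toy stand-ins, and `write_jsonl` and `read_jsonl` are hypothetical helper names.

```python
import json
import os
import tempfile


def write_jsonl(records, filename):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(filename, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_jsonl(filename):
    """Read a JSON-lines file back into a list of dicts."""
    with open(filename, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Toy stand-in for the AdvertiseGen training set (hypothetical records).
records = [{"content": f"类型#裙*编号#{i}", "summary": f"样例{i}"} for i in range(1000)]

# df.loc[a:b] includes label b, hence the "+ 1" when emulating it with list slices.
guest_records = records[0:200 + 1]
host_records = records[500:700 + 1]

with tempfile.TemporaryDirectory() as tmp:
    guest_path = os.path.join(tmp, "train_guest.json")
    write_jsonl(guest_records, guest_path)
    reloaded = read_jsonl(guest_path)

print(len(guest_records), len(host_records), reloaded == guest_records)
```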

### ChatGLM-6B with Adapter

In this section, we will guide you through the process of finetuning ChatGLM-6B with adapters using the FATE-LLM framework. Before starting this section, we recommend that you read through this tutorial first: [Model Customization](https://github.com/FederatedAI/FATE/blob/master/doc/tutorial/pipeline/nn_tutorial/Homo-NN-Customize-Model.ipynb).

The ChatGLM model is located in fate_llm/model_zoo/pellm/chatglm.py and can be used directly.

In [6]:
! ls ../fate/python/fate_llm/model_zoo/pellm

albert.py  bloom.py    distilbert.py  parameter_efficient_llm.py
bart.py    chatglm.py  gpt2.py	      roberta.py
bert.py    deberta.py  llama.py


#### Adapters

We can directly use adapters from the peft library. See [Adapter Methods](https://huggingface.co/docs/peft/index) for details. By specifying the adapter name and the adapter config dict, we can insert adapters into our language models:

! pip install peft -i https://pypi.tuna.tsinghua.edu.cn/simple

In [7]:
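from peft import LoraConfig, TaskType

# define lora config
# ChatGLM is a generative model, so use CAUSAL_LM and target its
# 'query_key_value' attention projection, as in the training script below.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['query_key_value'],
)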
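To get a feel for how small a rank-8 adapter is: LoRA adds two low-rank matrices to each targeted linear layer, which amounts to `r * (d_in + d_out)` extra parameters per layer. The sketch below estimates the adapter size when `query_key_value` is targeted, as in the training script later in this tutorial. The layer dimensions (28 transformer layers, hidden size 4096, a fused projection with 3 x 4096 outputs) are assumptions about the public ChatGLM-6B configuration and are not stated in this tutorial. The fp16 payload figure is likewise only a rough estimate.

```python
def lora_param_count(d_in: int, d_out: int, r: int, num_layers: int) -> int:
    """Parameters added by LoRA: A is (r x d_in) and B is (d_out x r) per targeted layer."""
    return num_layers * r * (d_in + d_out)


# Assumed ChatGLM-6B dimensions (not taken from this tutorial).
HIDDEN_SIZE = 4096
QKV_OUT = 3 * HIDDEN_SIZE
NUM_LAYERS = 28

adapter_params = lora_param_count(HIDDEN_SIZE, QKV_OUT, r=8, num_layers=NUM_LAYERS)
full_params = 6_200_000_000  # "6.2 billion parameters", rounded

BYTES_FP16 = 2
print(f"adapter parameters: {adapter_params:,}")
print(f"adapter payload per round (fp16): {adapter_params * BYTES_FP16 / 2**20:.1f} MiB")
print(f"fraction of full model: {adapter_params / full_params:.4%}")
```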

#### Init ChatGLM Model 

In [8]:
import torch as t
from pipeline import fate_torch_hook
from pipeline.component.nn import save_to_fate_llm
fate_torch_hook(t)

model_path = "./ChatGLM-6B/chatglm-6b"
model = t.nn.Sequential(
    t.nn.CustModel(module_name='pellm.chatglm', class_name='ChatGLMForConditionalGeneration',
                   peft_config=lora_config.to_dict(), peft_type='LoraConfig',
                   pretrained_path=model_path)
)


**During training, all weights of the pretrained language model are frozen and only the adapter weights are trainable. Thus, FATE-LLM trains only the adapter weights locally and aggregates only the adapter weights during federation.**
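The aggregation step can be illustrated with a generic federated-averaging sketch. This is not FATE's `fedavg_trainer` implementation: it uses plain Python lists instead of tensors, assumes weighting by local sample count, and assumes adapter tensors can be recognised by `lora_` in their names. All function and parameter names below are hypothetical.

```python
from typing import Dict, List


def is_adapter_param(name: str) -> bool:
    # Assumption: LoRA tensors carry "lora_" in their parameter names.
    return "lora_" in name


def fedavg_adapters(
    client_states: List[Dict[str, List[float]]],
    client_sizes: List[int],
) -> Dict[str, List[float]]:
    """Sample-size-weighted average over adapter parameters only."""
    total = sum(client_sizes)
    names = [n for n in client_states[0] if is_adapter_param(n)]
    averaged = {}
    for name in names:
        length = len(client_states[0][name])
        averaged[name] = [
            sum(state[name][i] * size for state, size in zip(client_states, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Frozen base weights stay local; only the lora_ entries are averaged.
guest = {"layer0.qkv.weight": [9.0, 9.0], "layer0.qkv.lora_A.weight": [1.0, 2.0]}
host = {"layer0.qkv.weight": [9.0, 9.0], "layer0.qkv.lora_A.weight": [3.0, 6.0]}

global_adapter = fedavg_adapters([guest, host], client_sizes=[100, 300])
print(global_adapter)
```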

See [Adapters Overview](https://huggingface.co/docs/peft/index) for the adapters currently available.


#### Init DeepSpeed Config

In [9]:
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 5e-4
        }
    },
    "fp16": {
        "enabled": True
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 5e8,
        "overlap_comm": False,
        "reduce_scatter": True,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": True
    }
}
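In DeepSpeed, the global batch size per optimizer step is the product of `train_micro_batch_size_per_gpu`, the gradient accumulation steps, and the number of GPUs. The sketch below works this out for the micro-batch size used above. It assumes `gradient_accumulation_steps` is 1 when the key is absent, and the GPU counts are illustrative only.

```python
def effective_batch_size(micro_batch_per_gpu: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step across all GPUs of one client."""
    return micro_batch_per_gpu * grad_accum_steps * num_gpus


# Only the key relevant to this calculation is repeated here.
ds_config_subset = {"train_micro_batch_size_per_gpu": 1}

micro = ds_config_subset["train_micro_batch_size_per_gpu"]
# Assumption: gradient accumulation is 1 when not configured.
accum = ds_config_subset.get("gradient_accumulation_steps", 1)

sizes = {gpus: effective_batch_size(micro, accum, gpus) for gpus in (1, 2, 8)}
for gpus, size in sizes.items():
    print(f"{gpus} GPU(s) -> effective batch size {size}")
```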


### Submit Federated Task
To run the federated task, please make sure to use fate>=v1.11.2 and deploy it on GPU machines. Before running this code, make sure the training data path is already bound. The following code should be copied into a script and run from the command line, for example "python federated_chatglm.py".

You can use this script to submit the training task, but training takes a long time and produces a long log, so we won't run it to completion here.

! pipeline init --ip 127.0.0.1 --port 9380

#### Upload Data to FATE

In [37]:
from pipeline.backend.pipeline import PipeLine
import os

guest_0 = 9999
host_1 = 10000
pipeline_upload = PipeLine().set_initiator(role='guest', party_id=guest_0).set_roles(guest=guest_0, host=host_1,
                                                                              arbiter=guest_0)
data_base = "/data/standalone_fate_install_1.11.3_release/fate_llm_demo/datas/"
pipeline_upload.add_upload_data(file=os.path.join(data_base, "AdvertiseGen/train_guest.csv"),
                                table_name="ad_guest",             # table name
                                namespace="experiment",         # namespace
                                head=1, partition=1)               # data info

pipeline_upload.add_upload_data(file=os.path.join(data_base, "AdvertiseGen/train_host.csv"),
                                table_name="ad_host",
                                namespace="experiment",
                                head=1, partition=1)

pipeline_upload.upload(drop=1)

 UPLOADING:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.00%


2023-11-26 20:40:22.256 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202311262040220161650
2023-11-26 20:40:22.267 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2023-11-26 20:40:23.281 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:125 - 
2023-11-26 20:40:23.284 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:01
2023-11-26 20:40:24.302 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:02
2023-11-26 20:40:25.331 | I

 UPLOADING:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||100.00%

2023-11-26 20:40:28.648 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:83 - Job id is 202311262040284553540
2023-11-26 20:40:28.658 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:00
2023-11-26 20:40:29.677 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:98 - Job is still waiting, time elapse: 0:00:01
2023-11-26 20:40:30.697 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:125 - 
2023-11-26 20:40:30.700 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:02
2023-11-26 20:40:31.840 | INFO     | pipeline.utils.invoker.job_submitter:monitor_job_status:127 - Running component upload_0, time elapse: 0:00:03
2023-11-26 20


In [45]:
import torch as t
import os
from pipeline import fate_torch_hook
from pipeline.component import HomoNN
from pipeline.backend.pipeline import PipeLine
from pipeline.component import Reader
from pipeline.interface import Data
from pipeline.runtime.entity import JobParameters

fate_torch_hook(t)


guest_0 = 9999
host_1 = 10000
pipeline = PipeLine().set_initiator(role='guest', party_id=guest_0).set_roles(guest=guest_0, host=host_1,
                                                                              arbiter=guest_0)
data_guest = {"name": "ad_guest", "namespace": "experiment"}
data_host = {"name": "ad_host", "namespace": "experiment"}
# guest_data_path = "./datas/AdvertiseGen/train.json_guest"
# host_data_path = "./datas/AdvertiseGen/train.json_host"
# make sure the guest's and host's training data are already bound

reader_0 = Reader(name="reader_0")
reader_0.get_party_instance(role='guest', party_id=guest_0).component_param(table=data_guest)
reader_0.get_party_instance(role='host', party_id=host_1).component_param(table=data_host)

# Add your pretrained model path below (model_path); the model and tokenizer are loaded from this path

from peft import LoraConfig, TaskType
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['query_key_value'],
)
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 5e-4
        }
    },
    "fp16": {
        "enabled": True
    },
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 5e8,
        "overlap_comm": False,
        "reduce_scatter": True,
        "reduce_bucket_size": 5e8,
        "contiguous_gradients": True
    }
}

model_path = "./ChatGLM-6B/chatglm-6b"
from pipeline.component.homo_nn import DatasetParam, TrainerParam
model = t.nn.Sequential(
    t.nn.CustModel(module_name='pellm.chatglm', class_name='ChatGLMForConditionalGeneration',
                   peft_config=lora_config.to_dict(), peft_type='LoraConfig',
                   pretrained_path=model_path)
)

# DatasetParam
dataset_param = DatasetParam(dataset_name='glm_tokenizer', text_max_length=64, tokenizer_name_or_path=model_path,
                             padding_side="left")
# TrainerParam
trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=5, batch_size=4, 
                             checkpoint_save_freqs=1, pin_memory=False, 
                             task_type="seq_2_seq_lm",
                             data_loader_worker=1, 
                             save_to_local_dir=True, # pay attention to this parameter
                             collate_fn="DataCollatorForSeq2Seq")


nn_component = HomoNN(name='nn_0', model=model , ds_config=ds_config)

# set parameter for client 1
nn_component.get_party_instance(role='guest', party_id=guest_0).component_param(
    dataset=dataset_param,
    trainer=trainer_param,
    torch_seed=100
)

# set parameter for client 2
nn_component.get_party_instance(role='host', party_id=host_1).component_param(
    dataset=dataset_param,
    trainer=trainer_param,
    torch_seed=100
)

# set parameter for server
nn_component.get_party_instance(role='arbiter', party_id=guest_0).component_param(
    trainer=trainer_param
)

pipeline.add_component(reader_0)
pipeline.add_component(nn_component, data=Data(train_data=reader_0.output.data))
print("=======================================================================")
print(reader_0.output.data)
print("************************************************************************")
print(type(reader_0.output.data))
pipeline.compile()

pipeline.fit(JobParameters(task_conf={
    "nn_0": {
        "launcher": "deepspeed",
        "world_size": 0 # world_size means num of gpus to train in a single client
    }
}))


2023-11-26 21:11:51.074 | ERROR    | __main__:<module>:104 - An error has been caught in function '<module>', process 'MainProcess' (9295), thread 'MainThread' (140567494672960):
Traceback (most recent call last):

  File "/data/standalone_fate_install_1.11.3_release/env/python/miniconda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
           │         │     └ {'__name__': '__main__', '__doc__': 'Entry point for launching an IPython kernel.\n\nThis is separate from the ipykernel pack...
           │         └ <code object <module> at 0x7fd86b85fd40, file "/data/standalone_fate_install_1.11.3_release/env/python/venv/lib/python3.8/sit...
           └ <function _run_code at 0x7fd86a307b80>
  File "/data/standalone_fate_install_1.11.3_release/env/python/miniconda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
         │     └ {'_

reader_0.data
************************************************************************
<class 'str'>


TypeError: Object of type set is not JSON serializable
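The `TypeError` above is what Python's `json` module raises whenever a `set` appears in the data being serialized. One plausible cause, offered here as an assumption and not a confirmed diagnosis, is that some versions of `peft` store `target_modules` as a set, so the dict returned by `lora_config.to_dict()` can no longer be dumped to JSON when the pipeline is compiled. The sketch below reproduces the error with a plain dict standing in for the config and shows a generic workaround. `make_jsonable` is a hypothetical helper, not part of FATE or `peft`.

```python
import json


def make_jsonable(obj):
    """Recursively convert sets and tuples to lists so json.dumps accepts the result."""
    if isinstance(obj, dict):
        return {k: make_jsonable(v) for k, v in obj.items()}
    if isinstance(obj, (set, frozenset)):
        # Sorting gives a stable order; this assumes the members are mutually comparable.
        return sorted(make_jsonable(v) for v in obj)
    if isinstance(obj, (list, tuple)):
        return [make_jsonable(v) for v in obj]
    return obj


# Stand-in for lora_config.to_dict(); the real dict has more keys.
config_dict = {"r": 8, "lora_alpha": 32, "target_modules": {"query_key_value"}}

try:
    json.dumps(config_dict)
    failed = False
except TypeError as err:
    failed = True
    print("before:", err)

clean = make_jsonable(config_dict)
print("after:", json.dumps(clean))
```

If this is indeed the cause, passing a cleaned dict as `peft_config` (or converting `target_modules` to a list before calling `to_dict()`) would be a natural thing to try.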

### Training With P-Tuning V2 Adapter

To use another adapter such as P-Tuning V2, only a slight change is needed:

In [20]:
from pipeline.component.homo_nn import DatasetParam, TrainerParam
model = t.nn.Sequential(
    t.nn.CustModel(module_name='pellm.chatglm', class_name='ChatGLMForConditionalGeneration',
                   pre_seq_len=128, # only this parameter is needed
                   pretrained_path=model_path)
)

### Inference

Models trained with FATE-LLM can be found under the directory `${fate_install}/fateflow/model/$jobids/$cpn_name/{model.pkl, checkpoint_xxx.pkl/adapter_model.bin}`. Users must make sure that "save_to_local_dir=True" is set.  
The following code is an example of loading trained LoRA adapter weights:

In [None]:
import json
import os
import torch
from peft import PeftModel, PeftConfig, LoraConfig, TaskType, get_peft_model
from transformers import AutoModel, AutoTokenizer


def load_model(pretrained_model_path):
    _tokenizer = AutoTokenizer.from_pretrained(pretrained_model_path, trust_remote_code=True)
    _model = AutoModel.from_pretrained(pretrained_model_path, trust_remote_code=True)

    _model = _model.half().quantize(4)
    _model = _model.eval()

    return _model, _tokenizer


def load_data(data_path):
    with open(data_path, "r") as fin:
        for _l in fin:
            yield json.loads(_l.strip())

chatglm_model_path = "/data/standalone_fate_install_1.11.3_release/fate_llm_demo/ChatGLM-6B/chatglm-6b"   # init_model
model, tokenizer = load_model(chatglm_model_path)

test_data_path = os.path.join(data_base, "AdvertiseGen/dev.csv")
dataset = load_data(test_data_path)

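# trained_model_path must be set beforehand to the saved adapter weights (adapter_model.bin, see the directory above)
peft_path = trained_model_path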
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['query_key_value'],
)

model = get_peft_model(model, peft_config)
model.load_state_dict(torch.load(peft_path), strict=False)
model = model.half()
model.eval()

for p in model.parameters():
    if p.requires_grad:
        print(p)

# model.cuda("cuda:0")

content = "风衣#春季"
model.chat(tokenizer, content, do_sample=False)

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]
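The loading code relies on `trained_model_path`, which has to point at a saved `adapter_model.bin`. A minimal sketch for locating such files under the model directory layout described above is shown below. The helper name `find_adapter_files`, the job id, and the checkpoint directory name are all hypothetical, and a temporary directory stands in for the real FATE Flow model folder.

```python
import tempfile
from pathlib import Path
from typing import List


def find_adapter_files(model_root: str, job_id: str, component_name: str) -> List[Path]:
    """Return every adapter_model.bin below <model_root>/<job_id>/<component_name>."""
    base = Path(model_root) / job_id / component_name
    return sorted(base.glob("**/adapter_model.bin"))


with tempfile.TemporaryDirectory() as tmp:
    # Build a fake layout that mimics checkpoint_xxx.pkl/adapter_model.bin.
    fake_job_id = "job_0000"  # hypothetical job id
    ckpt_dir = Path(tmp) / fake_job_id / "nn_0" / "checkpoint_0.pkl"
    ckpt_dir.mkdir(parents=True)
    (ckpt_dir / "adapter_model.bin").write_bytes(b"")

    found = find_adapter_files(tmp, fake_job_id, "nn_0")
    found_names = [p.name for p in found]
    found_parents = [p.parent.name for p in found]

print(found_names, found_parents)
```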