### Federated ChatGLM3 Tuning with Parameter-Efficient Methods in FATE-LLM

Source:

**FATE-LLM** 

FATE-LLM is a framework to support federated learning for large language models (LLMs) and small language models (SLMs).

https://github.com/FederatedAI/FATE-LLM

In this tutorial, we demonstrate how to efficiently train a federated ChatGLM3-6B with DeepSpeed using the FATE-LLM framework. FATE-LLM introduces the "pellm" (Parameter-Efficient Large Language Model) module, designed specifically for federated learning with large language models. It enables parameter-efficient methods in federated learning, which reduces communication overhead while maintaining model performance. This tutorial focuses on ChatGLM3-6B and emphasizes the use of adapters for fine-tuning it. Adapters effectively reduce communication volume and improve overall efficiency.
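The sketch below puts the communication claim into rough numbers. It estimates what one party uploads per round when only LoRA weights are shared, using rank `r=8` on `query_key_value` as in the main example later in this tutorial. The layer shapes are assumptions about ChatGLM3-6B and do not come from FATE-LLM: 28 layers, and a `query_key_value` projection from 4096 to 4608. Check the model's `config.json` before relying on them. The helper functions are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) next to a frozen linear layer."""
    return r * (d_in + d_out)


def per_round_upload_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Bytes one party sends per round if only these params are shared (fp16 = 2 bytes)."""
    return num_params * bytes_per_param


# Assumed ChatGLM3-6B shapes (verify against the model's config.json).
NUM_LAYERS = 28
D_IN, D_OUT = 4096, 4608          # query_key_value: hidden size -> fused QKV width
TOTAL_PARAMS = 5_977_000_000      # parameter count quoted in this tutorial

lora_params = NUM_LAYERS * lora_param_count(D_IN, D_OUT, r=8)
lora_bytes = per_round_upload_bytes(lora_params)
full_bytes = per_round_upload_bytes(TOTAL_PARAMS)

print(f"LoRA params shared per round: {lora_params:,}")
print(f"fp16 payload: {lora_bytes / 2**20:.2f} MiB vs {full_bytes / 2**30:.2f} GiB for the full model")
print(f"fraction of full model: {lora_params / TOTAL_PARAMS:.4%}")
```

Under these assumptions, each round exchanges a few MiB instead of roughly 11 GiB of weights.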


## FATE-LLM: ChatGLM3-6B

### ChatGLM3-6B
ChatGLM3-6B is a large transformer-based language model with 5.977 billion parameters. It is an open bilingual language model based on the General Language Model (GLM). You can download the pretrained model from [here](https://github.com/THUDM/ChatGLM3), or let the program download it automatically when you use it later.

### Current Features

In the current version, FATE-LLM: ChatGLM-6B supports the following features:
<div align="center">
  <img src="../../images/fate-llm-chatglm-6b.png">
</div>

## Experiment Setting

Before running the experiment, please make sure that the [FATE-LLM Cluster](https://github.com/FederatedAI/FATE/wiki/Download#llm%E9%83%A8%E7%BD%B2%E5%8C%85) has been deployed.

```python
# !fate_flow init --ip 127.0.0.1 --port 9380 --home $(pwd)/fate_workspace
!fate_flow init --ip 127.0.0.1 --port 9380 --home fate_workspace
!pipeline init --ip 127.0.0.1 --port 9380
```


### Dataset: Advertising Text Generation

This is an advertising text generation dataset. You can download it from the following links and place it in the `examples/data` folder:
- [data link 1](https://drive.google.com/file/d/13_vf0xRTQsyneRKdD1bZIr93vBGOczrk/view)
- [data link 2](https://cloud.tsinghua.edu.cn/f/b3f119a008264b1cabd1/?dl=1)  

You can refer to the following link for more details about the [data](https://aclanthology.org/D19-1321.pdf).

```python
import pandas as pd
# df = pd.read_json('${fate_install}/examples/data/AdvertiseGen/train.json', lines=True)
df = pd.read_json('data/AdvertiseGen/train.json', lines=True)
```
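Each line of `train.json` is a separate JSON object, which is why `lines=True` is passed above. The sketch below turns such records into (prompt, target) pairs. The column names are assumptions about the AdvertiseGen layout: `content` is the key read by the inference code at the end of this tutorial, and `summary` is assumed to be the target. The two records are made-up toy data, not samples from the dataset. `load_jsonl` and `to_pairs` are hypothetical helpers.

```python
import io

import pandas as pd


def load_jsonl(source) -> pd.DataFrame:
    """Read a JSON Lines file (one JSON object per line), as done for train.json above."""
    return pd.read_json(source, lines=True)


def to_pairs(df: pd.DataFrame, prompt_col: str = "content", target_col: str = "summary"):
    """Return (prompt, target) tuples; the default column names are assumptions."""
    missing = {prompt_col, target_col} - set(df.columns)
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    return list(zip(df[prompt_col], df[target_col]))


# Toy, made-up records in the assumed AdvertiseGen layout (not real data).
toy = io.StringIO(
    '{"content": "type#skirt*color#blue", "summary": "A blue skirt."}\n'
    '{"content": "type#shirt*style#casual", "summary": "A casual shirt."}\n'
)
df_toy = load_jsonl(toy)
pairs = to_pairs(df_toy)
print(len(df_toy), pairs[0])
```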

### ChatGLM3-6B with Adapter

In this section, we guide you through the process of fine-tuning ChatGLM3-6B with adapters using the FATE-LLM framework.

The ChatGLM model is located in `fate_llm/model_zoo/pellm/chatglm.py` and can be used directly.

```python
! ls ../../../../fate_llm/python/fate_llm/model_zoo/pellm
```



#### Adapters

We can directly use adapters from PEFT. See [Adapter Methods](https://huggingface.co/docs/peft/index) for details on the available adapters. By specifying the adapter name and the adapter config dict, we can insert adapters into our language models:

```python
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['query_key_value'],
)
lora_config.target_modules = list(lora_config.target_modules) # this line is needed to ensure lora_config is jsonable
```
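The `list(...)` conversion matters because recent `peft` releases may store `target_modules` as a set, and Python's `json` module cannot serialize sets. The sketch below reproduces the problem with a plain dict that stands in for `lora_config.to_dict()`. Its keys mirror the config above. `make_jsonable` is a hypothetical helper, not part of `peft` or FATE-LLM.

```python
import json


def make_jsonable(config: dict) -> dict:
    """Return a shallow copy with sets and tuples turned into lists so json.dumps accepts it."""
    out = {}
    for key, value in config.items():
        if isinstance(value, (set, frozenset)):
            value = sorted(value)          # sets have no order; sort for a stable result
        elif isinstance(value, tuple):
            value = list(value)
        out[key] = value
    return out


# Plain dict standing in for lora_config.to_dict() (assumed shape, for illustration).
raw = {"r": 8, "lora_alpha": 32, "lora_dropout": 0.1, "target_modules": {"query_key_value"}}

try:
    json.dumps(raw)
except TypeError as err:
    print("not serializable:", err)

print(json.dumps(make_jsonable(raw)))
```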

### Init ChatGLM3 Model 

```python
from fate_client.pipeline.components.fate.nn.loader import LLMModelLoader

# pretrained_model_path = "fill with pretrained model download path please"
pretrained_model_path = ""

model = LLMModelLoader(
    "pellm.chatglm",
    "ChatGLM",
    pretrained_path=pretrained_model_path,
    peft_type="LoraConfig",
    peft_config=lora_config.to_dict(),
    trust_remote_code=True
)
```
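The loader call above describes the model by a module path (`"pellm.chatglm"`), a class name (`"ChatGLM"`) and keyword arguments, rather than constructing it in the notebook. The sketch below illustrates that general "record now, build later" pattern. It uses a hypothetical `LazySpec` class and the standard library's `fractions.Fraction` as a stand-in target. It is not FATE-LLM's `LLMModelLoader` implementation, and how FATE actually resolves and ships these specs is not shown in this tutorial.

```python
import importlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LazySpec:
    """Hypothetical stand-in for a loader: records *what* to build, not the object itself."""
    module_name: str
    item_name: str
    kwargs: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # A spec like this can travel to each party as plain JSON.
        return json.dumps(asdict(self))

    def build(self):
        # Import the module and instantiate the class only where the job actually runs.
        module = importlib.import_module(self.module_name)
        return getattr(module, self.item_name)(**self.kwargs)


spec = LazySpec("fractions", "Fraction", {"numerator": 1, "denominator": 3})
payload = spec.to_json()
rebuilt = LazySpec(**json.loads(payload)).build()
print(payload, rebuilt)
```

The JSON-serializable kwargs requirement is also why the LoRA config above has to be JSON-friendly.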


**During training, all weights of the pretrained language model are frozen and only the adapter weights are trainable. Thus, FATE-LLM trains only the adapters in local training and aggregates only the adapters' weights in the federation process.**
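Because the base weights are frozen, only adapter tensors need to be averaged across parties. The sketch below shows sample-weighted FedAvg over a filtered state dict in NumPy. The parameter names, shapes and client sizes are made up. Whether FATE-LLM's `FedAVGArguments` weights parties by sample count is an assumption that this tutorial does not confirm.

```python
import numpy as np


def trainable_only(state, trainable_keys):
    """Keep only adapter entries; frozen base weights never leave the party."""
    return {k: v for k, v in state.items() if k in trainable_keys}


def fedavg(client_states, client_sizes):
    """Weighted average of per-party state dicts (name -> array), weights = sample counts."""
    total = float(sum(client_sizes))
    keys = client_states[0].keys()
    return {
        k: sum(state[k] * (n / total) for state, n in zip(client_states, client_sizes))
        for k in keys
    }


# Hypothetical parameter names and shapes, for illustration only.
adapter_keys = {"layer0.lora_A", "layer0.lora_B"}
guest_state = {"layer0.base": np.ones((4, 4)),
               "layer0.lora_A": np.full((2, 4), 1.0), "layer0.lora_B": np.full((4, 2), 0.0)}
host_state = {"layer0.base": np.ones((4, 4)),
              "layer0.lora_A": np.full((2, 4), 3.0), "layer0.lora_B": np.full((4, 2), 2.0)}

shared = [trainable_only(s, adapter_keys) for s in (guest_state, host_state)]
global_adapter = fedavg(shared, client_sizes=[100, 300])
print(sorted(global_adapter), global_adapter["layer0.lora_A"][0, 0])
```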

For the adapters currently available, see [Adapters Overview](https://huggingface.co/docs/peft/index).


### Specify Dataset And DataCollator To Process Data

```python
from fate_client.pipeline.components.fate.nn.loader import LLMDatasetLoader, LLMDataFuncLoader

tokenizer_params = dict(
    tokenizer_name_or_path=pretrained_model_path,
    trust_remote_code=True,
)

dataset = LLMDatasetLoader(
    "prompt_dataset",
    "PromptDataset",
    **tokenizer_params,
)

data_collator = LLMDataFuncLoader(
    "data_collator.cust_data_collator",
    "get_seq2seq_data_collator",
    **tokenizer_params,
)
```
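This tutorial does not show what `get_seq2seq_data_collator` does internally. The sketch below is a generic illustration of what a causal-LM seq2seq collator typically does. It concatenates prompt and target token ids, pads the batch, and masks prompt and padding positions in the labels with `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The example keys (`prompt_ids`, `target_ids`), the pad id, and right-padding are assumptions; ChatGLM's own tokenizer may pad differently.

```python
IGNORE_INDEX = -100  # label value skipped by torch.nn.CrossEntropyLoss by default


def collate_seq2seq(examples, pad_id=0):
    """Concatenate prompt and target ids, right-pad to the batch max, mask the prompt in labels.

    `examples` is a list of dicts with hypothetical keys "prompt_ids" and "target_ids".
    """
    seqs, labels = [], []
    for ex in examples:
        seqs.append(ex["prompt_ids"] + ex["target_ids"])
        # Loss is computed only on target tokens, never on the prompt.
        labels.append([IGNORE_INDEX] * len(ex["prompt_ids"]) + ex["target_ids"])

    max_len = max(len(s) for s in seqs)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids, lab in zip(seqs, labels):
        pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
        batch["labels"].append(lab + [IGNORE_INDEX] * pad)
    return batch


batch = collate_seq2seq([
    {"prompt_ids": [11, 12], "target_ids": [21, 22, 23]},
    {"prompt_ids": [13], "target_ids": [24]},
])
print(batch["labels"])
```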


#### Init DeepSpeed Config

```python
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": 5e-4
        }
    },
    "fp16": {
        "enabled": True
    },
    "gradient_accumulation_steps": 1,
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1e8,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1e8,
        "contiguous_gradients": True,
        "offload_optimizer": {
            "device": "cpu"
        },
        "offload_param": {
            "device": "cpu"
        }
    }
}
```
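DeepSpeed requires `train_batch_size = train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. The hypothetical helper below computes the resulting per-party batch size for a config like the one above, given the number of GPUs per party. Note also that, according to the DeepSpeed configuration documentation, `offload_param` only takes effect with ZeRO stage 3. With `"stage": 2` as above, only the optimizer state is offloaded to CPU.

```python
def effective_batch_size(ds_config: dict, num_gpus: int) -> int:
    """Samples per optimizer step for one party, following DeepSpeed's batch-size rule:
    train_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * world_size."""
    micro = ds_config["train_micro_batch_size_per_gpu"]
    accum = ds_config.get("gradient_accumulation_steps", 1)
    return micro * accum * num_gpus


cfg = {"train_micro_batch_size_per_gpu": 1, "gradient_accumulation_steps": 1}
print(effective_batch_size(cfg, num_gpus=1))
# Raising accumulation to 8 on 2 GPUs gives 1 * 8 * 2 samples per step.
print(effective_batch_size({**cfg, "gradient_accumulation_steps": 8}, num_gpus=2))
```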


### Submit Federated Task
To run the federated task, please make sure to use fate>=2.1.0 and deploy it on GPU machines. Before running this code, make sure the training data path has already been bound. The following code should be copied into a script and run from the command line, e.g. `python federated_chatglm.py`.

You can use this script to submit the job. However, training takes a long time and generates a long log, so we won't run it here.

```python
import time
from fate_client.pipeline.components.fate.reader import Reader
from fate_client.pipeline import FateFlowPipeline
from fate_client.pipeline.components.fate.homo_nn import HomoNN, get_config_of_seq2seq_runner
from fate_client.pipeline.components.fate.nn.algo_params import Seq2SeqTrainingArguments, FedAVGArguments
from fate_client.pipeline.components.fate.nn.loader import LLMModelLoader, LLMDatasetLoader, LLMDataFuncLoader
from peft import LoraConfig, TaskType


guest = '10000'
host = '10000'
arbiter = '10000'

epochs = 1
batch_size = 1
lr = 5e-4

ds_config = {
    "train_micro_batch_size_per_gpu": batch_size,
    "optimizer": {
        "type": "Adam",
        "params": {
            "lr": lr,
            "torch_adam": True,
            "adam_w_mode": False
        }
    },
    "fp16": {
        "enabled": True
    },
    "gradient_accumulation_steps": 1,
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "allgather_bucket_size": 1e8,
        "overlap_comm": True,
        "reduce_scatter": True,
        "reduce_bucket_size": 1e8,
        "contiguous_gradients": True,
        "offload_optimizer": {
            "device": "cpu"
        },
        "offload_param": {
            "device": "cpu"
        }
    }
}

pipeline = FateFlowPipeline().set_parties(guest=guest, host=host, arbiter=arbiter)
# pipeline.bind_local_path(path="", namespace="experiment", name="ad")
time.sleep(5)


reader_0 = Reader("reader_0", runtime_parties=dict(guest=guest, host=host))
reader_0.guest.task_parameters(
    namespace="experiment",
    name="ad"
)
reader_0.hosts[0].task_parameters(
    namespace="experiment",
    name="ad"
)

# define lora config
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False, r=8, lora_alpha=32, lora_dropout=0.1,
    target_modules=['query_key_value'],
)
lora_config.target_modules = list(lora_config.target_modules)

pretrained_model_path = "/data/cephfs/llm/models/chatglm3-6b"

model = LLMModelLoader(
    "pellm.chatglm",
    "ChatGLM",
    pretrained_path=pretrained_model_path,
    peft_type="LoraConfig",
    peft_config=lora_config.to_dict(),
    trust_remote_code=True
)


tokenizer_params = dict(
    tokenizer_name_or_path=pretrained_model_path,
    trust_remote_code=True,
)

dataset = LLMDatasetLoader(
    "prompt_dataset",
    "PromptDataset",
    **tokenizer_params,
)

data_collator = LLMDataFuncLoader(
    "data_collator.cust_data_collator",
    "get_seq2seq_data_collator",
    **tokenizer_params,
)

conf = get_config_of_seq2seq_runner(
    algo='fedavg',
    model=model,
    dataset=dataset,
    data_collator=data_collator,
    training_args=Seq2SeqTrainingArguments(
        num_train_epochs=epochs,
        per_device_train_batch_size=batch_size,
        remove_unused_columns=False, 
        predict_with_generate=False,
        deepspeed=ds_config,
        learning_rate=lr,
        use_cpu=False, # must be False because we train on GPUs
        fp16=True,
    ),
    fed_args=FedAVGArguments(),
    task_type='causal_lm',
    save_trainable_weights_only=True # only save trainable weights
)

homo_nn_0 = HomoNN(
    'nn_0',
    runner_conf=conf,
    train_data=reader_0.outputs["output_data"],
    runner_module="homo_seq2seq_runner",
    runner_class="Seq2SeqRunner",
)

homo_nn_0.guest.conf.set("launcher_name", "deepspeed") # tell the scheduling engine to run the task with deepspeed
homo_nn_0.hosts[0].conf.set("launcher_name", "deepspeed") # tell the scheduling engine to run the task with deepspeed

pipeline.add_tasks([reader_0, homo_nn_0])
pipeline.conf.set("task", dict(engine_run={"cores": 1})) # the number of gpus of each party

pipeline.compile()
pipeline.fit()
```

### Training With P-Tuning V2 Adapter

To use another adapter such as P-Tuning V2, only a slight change is needed. Drop the `peft_type` and `peft_config` arguments and pass `pre_seq_len` instead:

```python
model = LLMModelLoader(
    "pellm.chatglm",
    "ChatGLM",
    pretrained_path=pretrained_model_path,
    pre_seq_len=128,
    trust_remote_code=True
)
```
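With P-Tuning V2, the trainable (and federated) part is a learned key/value prefix of length `pre_seq_len` for every layer, instead of LoRA matrices. The sketch below estimates its size under two assumptions. The first is that there is no reparameterization MLP. The second concerns the ChatGLM3-6B attention layout: 28 layers, with a key/value width of 2 query groups × 128 channels. Neither value comes from FATE-LLM. `ptuning_v2_params` is a hypothetical helper.

```python
def ptuning_v2_params(pre_seq_len: int, num_layers: int, kv_dim: int) -> int:
    """Prefix size when each layer gets its own key prefix and value prefix:
    pre_seq_len x (num_layers * 2 * kv_dim)."""
    return pre_seq_len * num_layers * 2 * kv_dim


# Assumed ChatGLM3-6B layout: 28 layers, 2 KV groups x 128 channels per group.
n = ptuning_v2_params(pre_seq_len=128, num_layers=28, kv_dim=2 * 128)
print(f"{n:,} prefix parameters")
```

Under these assumptions, the prefix is about the same order of magnitude as the LoRA adapter above, so the per-round communication cost is similar.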

### Inference

Models trained with FATE-LLM can be found in the directory `${fate_install}/fateflow/model/$job_id/${role}/${party_id}/$cpn_name/0/output/output_model/model_directory`, which contains `adapter_model.bin`.
The following code is an example of loading the trained LoRA adapter weights:
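The output path above is a template. The hypothetical helper below assembles it with `pathlib`. In the example call, the install prefix and job id are placeholders, `nn_0` is the component name used in the submission script, and party id `10000` matches that script's guest. `PeftModel.from_pretrained` expects this directory, which should also hold the adapter config, rather than the `.bin` file itself.

```python
from pathlib import Path


def adapter_dir(fate_install, job_id, role, party_id, cpn_name) -> Path:
    """Build the model_directory path from the template above (all arguments are yours to fill in)."""
    return (Path(fate_install) / "fateflow" / "model" / str(job_id) / role / str(party_id)
            / cpn_name / "0" / "output" / "output_model" / "model_directory")


# Placeholder values for illustration only.
d = adapter_dir("/opt/fate", "202401010000000000", "guest", 10000, "nn_0")
print(d / "adapter_model.bin")
```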

```python
import json

from peft import PeftModel
from transformers import AutoModel, AutoTokenizer


def load_model(pretrained_model_path):
    # Load the base ChatGLM3-6B model and tokenizer in half precision for inference
    _tokenizer = AutoTokenizer.from_pretrained(pretrained_model_path, trust_remote_code=True)
    _model = AutoModel.from_pretrained(pretrained_model_path, trust_remote_code=True)

    _model = _model.half()
    _model = _model.eval()

    return _model, _tokenizer


def load_data(data_path):
    # The data files are JSON Lines: one JSON object per line
    with open(data_path, "r") as fin:
        for _l in fin:
            yield json.loads(_l.strip())


# Fill in the placeholders below; ${...} and $... are not expanded inside Python strings
chatglm_model_path = ""
model, tokenizer = load_model(chatglm_model_path)

test_data_path = "${fate_install}/examples/data/AdvertiseGen/dev.json"
dataset = load_data(test_data_path)

# PeftModel.from_pretrained expects the directory that holds the adapter config and weights
peft_path = "${fate_install}/fateflow/model/$job_id/${role}/${party_id}/$cpn_name/0/output/output_model/model_directory"

model = PeftModel.from_pretrained(model, peft_path)
model = model.half()
model.eval()

# Sanity check: adapters loaded for inference are frozen, so this is expected to print nothing
for p in model.parameters():
    if p.requires_grad:
        print(p)

model = model.to("cuda:0")

# Take the first test example and generate a response greedily
content = next(dataset)["content"]
print(model.chat(tokenizer, content, do_sample=False))
```