liwh011/hidden-echo

# Hidden Echo

The rise of large language models (LLMs) has driven the adoption of Model-as-a-Service (MaaS). However, transmitting raw text to servers raises critical privacy concerns. Existing approaches employ deep neural networks (DNNs) or differential privacy (DP) to perturb inputs, but both have notable limitations: DNN-based methods often require task-specific pre-training, and conventional DP techniques, though privacy-preserving, suffer from noise amplification as perturbed inputs propagate through the deep transformer layers, leading to significant degradation in downstream task performance. To alleviate this, we propose HiddenEcho, an end-to-end framework with client-side noise correction, where hidden states are sent from the server to the client and refined by a lightweight module that uses both embeddings and intermediate representations. HiddenEcho suppresses inter-layer noise amplification without pre-training, effectively preserving task-relevant signals under DP constraints. To further reduce communication, HiddenEcho incorporates gradient-based hidden-layer selection and information-bottleneck compression, cutting communication cost while preserving essential task information. Experiments across text classification and generation tasks demonstrate that HiddenEcho achieves up to 46.89% performance improvement over DP baselines, over 85% communication reduction, and up to 72.52% faster training compared to existing denoising approaches, establishing a new privacy-utility trade-off for privatized LLMs.
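To make the privacy mechanism concrete, here is a minimal sketch of local-DP perturbation of token embeddings using the standard Gaussian mechanism. This is a generic illustration, not the repo's actual implementation: the `clip_norm` value and the (ε, δ) calibration formula are assumptions, and HiddenEcho's contribution is what happens *after* this step (correcting the noise on the client side).

```python
import numpy as np

def privatize_embeddings(emb, epsilon, delta=1e-5, clip_norm=1.0, rng=None):
    """Perturb token embeddings with the Gaussian mechanism (illustrative only).

    Each embedding row is clipped to L2 norm `clip_norm` (bounding sensitivity),
    then isotropic Gaussian noise calibrated to (epsilon, delta)-DP is added.
    """
    rng = np.random.default_rng(rng)
    norms = np.linalg.norm(emb, axis=-1, keepdims=True)
    clipped = emb * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Classic Gaussian-mechanism calibration: sigma = sqrt(2 ln(1.25/delta)) * S / epsilon
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * clip_norm / epsilon
    return clipped + rng.normal(0.0, sigma, size=emb.shape)

# With a large privacy budget (e.g. the 5000 used in the scripts below),
# the added noise is tiny; smaller budgets inject much stronger noise.
noisy = privatize_embeddings(np.ones((4, 8)), epsilon=5000.0, rng=0)
```

Without a correction step, this noise compounds as the perturbed embeddings pass through each transformer layer, which is the amplification effect HiddenEcho's client-side module is designed to suppress.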

## Quick Start

1. **Install dependencies**

   `environment.yml` lists all the dependencies. Create a new conda environment from it and activate it.

2. **Run the script**

   ```bash
   bash scripts/simple.sh
   ```

   The script trains a model on the Financial PhraseBank dataset with a privacy budget of 5000. Modify the parameters in the script as needed.
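The two steps above can be run as follows (assuming a standard conda setup; the environment name is defined inside `environment.yml`, so substitute whatever name it declares):

```shell
# Create the conda environment from the repo's environment.yml
conda env create -f environment.yml
conda activate <env-name-from-environment.yml>

# Train on Financial PhraseBank with a privacy budget of 5000
bash scripts/simple.sh
```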

## Baselines

### LDP

```bash
python train_split.py \
    --experiment_name "ldp" \
    --model_path "/path/to/Qwen2-1.5B-Instruct" \
    --dataset_name "financial_phrasebank" \
    --num_train_epochs 20 \
    --lr_scheduler_type "constant" \
    --learning_rate 4e-4 \
    --max_len 128 \
    --train_batch_size 48 \
    --eval_batch_size 48 \
    --lora_rank 16 \
    --privacy_budget 5000 \
    --lst_enable false
```

### SnD

```bash
python -m baselines.snd.data
python -m baselines.snd.train_denoise
python -m baselines.snd.train_task \
    --model_name "/path/to/Qwen2-1.5B-Instruct" \
    --dataset_name "financial_phrasebank" \
    --privacy_budget 5000 \
    --num_train_epochs 15 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16
```

### GAN-DP

```bash
python -m baselines.gan.train_gan \
    --llm_path "/path/to/Qwen2-1.5B-Instruct" \
    --dataset_name "financial_phrasebank" \
    --privacy_budget 5000 \
    --train_epochs 20

python -m baselines.gan.train_task \
    --model_name "/path/to/Qwen2-1.5B-Instruct" \
    --dataset "financial_phrasebank" \
    --generator_epoch 20 \
    --privacy_budget 5000 \
    --num_train_epochs 15 \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 16
```
