```
MU-SplitFed/
├── cezo_fl/                          # Core CeZO-FL implementation
│   ├── __init__.py
│   ├── server.py                     # Federated learning server
│   ├── client_test.py                # Client testing utilities
│   ├── random_gradient_estimator.py  # ZO gradient estimation
│   ├── run_client_jobs.py            # Client job execution
│   ├── shared.py                     # Shared utilities
│   └── util/                         # Utility modules
│       ├── checkpoint.py             # Model checkpointing
│       ├── compression.py            # Gradient compression
│       ├── data_split.py             # Data splitting utilities
│       ├── dataloaders.py            # Data loading utilities
│       ├── dataset.py                # Dataset implementations
│       ├── language_utils.py         # Language model utilities
│       ├── metrics.py                # Evaluation metrics
│       └── model_helpers.py          # Model helper functions
├── config.py                         # Configuration management
├── preprocess.py                     # Data preprocessing
├── run.py                            # Main execution script
├── sl_main_new.py                    # Split learning main implementation
├── zo_optimizer_new.py               # Zeroth-order optimizer
└── dev_tools/                        # Development tools
    ├── dev-requirement.txt
    └── README.md
```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd MU-SplitFed
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- For development dependencies:

  ```bash
  pip install -r dev_tools/dev-requirement.txt
  ```

Run split learning with zeroth-order optimization:
```bash
python sl_main_new.py --dataset sst2 --large-model opt-125m --iterations 1000 --server-iter 5 --splitted-layer 12
```

Key arguments:

- `--dataset`: Dataset to use (sst2, cb, wsc, wic, multirc, rte, boolq, squad, drop, xsum)
- `--large-model`: Model size (opt-125m, opt-1.3b, opt-2.7b, opt-6.7b, opt-13b, opt-30b)
- `--iterations`: Number of training iterations
- `--server-iter`: Number of server-side iterations per round
- `--splitted-layer`: Layer at which to split the model
- `--lr`: Learning rate (default: 1e-4)
- `--mu`: Perturbation magnitude for ZO (default: 1e-3)
- `--num-pert`: Number of perturbations per gradient estimate (default: 1)
- `--lora`: Enable LoRA fine-tuning
- `--lora-r`: LoRA rank (default: 8)
- `--lora-alpha`: LoRA alpha (default: 16)
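The `--mu` and `--num-pert` flags control the two-point random-perturbation gradient estimator that zeroth-order optimization relies on. Below is a minimal sketch of that general technique; the project's actual implementation lives in `cezo_fl/random_gradient_estimator.py` and `zo_optimizer_new.py` and may differ, and the function and variable names here are illustrative only.

```python
# Minimal sketch of two-point random-perturbation (ZO) gradient estimation.
# Illustrative only; not the project's actual API.
import torch

def zo_gradient_estimate(params, loss_fn, mu=1e-3, num_pert=1):
    """Estimate gradients of loss_fn w.r.t. `params` without backpropagation."""
    grads = [torch.zeros_like(p) for p in params]
    for _ in range(num_pert):
        # Sample one random perturbation direction per parameter tensor.
        zs = [torch.randn_like(p) for p in params]
        with torch.no_grad():
            # Evaluate the loss at theta + mu*z and theta - mu*z.
            for p, z in zip(params, zs):
                p.add_(z, alpha=mu)
            loss_plus = loss_fn()
            for p, z in zip(params, zs):
                p.add_(z, alpha=-2 * mu)
            loss_minus = loss_fn()
            for p, z in zip(params, zs):
                p.add_(z, alpha=mu)  # restore the original parameters
        # Shared finite-difference coefficient, averaged over num_pert samples.
        coeff = float(loss_plus - loss_minus) / (2 * mu * num_pert)
        for g, z in zip(grads, zs):
            g.add_(z, alpha=coeff)
    return grads
```

Because the estimate is built from a random direction and a scalar loss difference, communication-efficient federated variants of this scheme typically transmit only the random seed and the scalar rather than full gradient vectors.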
Small model training:

```bash
python sl_main_new.py --dataset sst2 --large-model opt-125m --iterations 500 --server-iter 3 --splitted-layer 8 --lr 1e-4
```

Large model with LoRA:

```bash
python sl_main_new.py --dataset sst2 --large-model opt-1.3b --iterations 1000 --server-iter 5 --splitted-layer 12 --lora --lora-r 16 --lora-alpha 32
```

Generation tasks:

```bash
python sl_main_new.py --dataset squad --large-model opt-125m --iterations 200 --server-iter 2 --splitted-layer 10
```

The project uses a comprehensive configuration system in `config.py`. Key configuration options include:
- Model settings: Model type, dtype, LoRA parameters
- Training settings: Batch size, learning rate, momentum
- ZO settings: Perturbation magnitude, number of perturbations, gradient estimation method
- Split learning: Server iterations, split layer
- Hardware: CUDA/MPS support, device selection
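As a rough, hypothetical illustration of how these groups could map onto a single configuration object (field names and the unlabeled defaults below are assumptions, not the actual contents of `config.py`):

```python
# Illustrative sketch only: consult config.py for the real option names and defaults.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Model settings
    large_model: str = "opt-125m"
    dtype: str = "float16"        # assumed default
    lora: bool = False
    lora_r: int = 8
    lora_alpha: int = 16
    # Training settings
    batch_size: int = 16          # assumed default
    lr: float = 1e-4
    momentum: float = 0.0         # assumed default
    # ZO settings
    mu: float = 1e-3
    num_pert: int = 1
    # Split learning
    server_iter: int = 5
    splitted_layer: int = 12
    # Hardware
    device: str = "cuda"          # falls back to MPS/CPU when unavailable
```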
Supported models:

- OPT-125M, OPT-1.3B, OPT-2.7B, OPT-6.7B, OPT-13B, OPT-30B

Supported datasets:

- Classification: SST-2, CB, WSC, WIC, MultiRC, RTE, BoolQ
- Generation: SQuAD, DROP, XSum
The implementation includes several memory optimization features:
- Split Learning: Reduces memory requirements by splitting large models
- Gradient Compression: Optional gradient compression techniques
- Mixed Precision: Support for float16 and bfloat16
- No Optim Mode: Memory-efficient training without PyTorch optimizers
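The split-learning feature is what `--splitted-layer` controls: only the first blocks of the model stay on the client, and the rest run on the server. A simplified sketch of that idea for a Hugging Face OPT checkpoint follows; the project's actual split logic (in `sl_main_new.py`) may differ, and embeddings, the final layer norm, and the LM head are omitted for brevity.

```python
# Simplified sketch of splitting an OPT decoder at a given layer index.
# Illustrative only; not the project's actual split implementation.
import torch.nn as nn
from transformers import AutoModelForCausalLM

def split_opt_model(name="facebook/opt-125m", split_layer=12):
    model = AutoModelForCausalLM.from_pretrained(name)
    layers = model.model.decoder.layers                    # ModuleList of decoder blocks
    client_layers = nn.ModuleList(layers[:split_layer])    # kept on the client
    server_layers = nn.ModuleList(layers[split_layer:])    # kept on the server
    return client_layers, server_layers
```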
Run the test suite with:

```bash
python -m pytest cezo_fl/
```

The project follows Python best practices and includes type hints throughout.
If you use this code in your research, please cite:
```bibtex
@misc{cezo-fl-2024,
  title={CeZO-FL: Communication-Efficient Zeroth-Order Federated Learning},
  author={[Your Name]},
  year={2024},
  howpublished={GitHub Repository},
  url={https://github.com/[username]/MU-SplitFed}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Built on top of Hugging Face Transformers
- Uses PEFT for LoRA implementation
- Inspired by federated learning and zeroth-order optimization research
- CUDA Out of Memory: Reduce batch size or use smaller models
- MPS Issues on macOS: The code automatically falls back to CPU if MPS is not available
- Model Loading: Ensure you have sufficient disk space for large model downloads
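For reference, a device-selection fallback along the lines described above might look like the sketch below; the project's actual selection logic may differ.

```python
# Hedged sketch of CUDA -> MPS -> CPU device fallback.
import torch

def pick_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```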
- Use the `--no-optim` flag for memory efficiency
- Adjust `--splitted-layer` based on your memory constraints
- Use smaller `--num-pert` values for faster training
- Enable LoRA for large models to reduce memory usage
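The `--no-optim` tip refers to applying updates directly to the parameters, so no optimizer state (momentum buffers, Adam moments) is ever allocated. A minimal sketch of that kind of update, with illustrative names rather than the project's API:

```python
# Hedged sketch of an optimizer-free, in-place parameter update.
import torch

@torch.no_grad()
def apply_update_no_optim(params, grads, lr=1e-4):
    for p, g in zip(params, grads):
        p.add_(g, alpha=-lr)   # in-place SGD-style step, no optimizer object needed
```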
For more detailed information, please refer to the individual module documentation or open an issue.