```
MU-SplitFed/
├── cezo_fl/                          # Core CeZO-FL implementation
│   ├── __init__.py
│   ├── server.py                     # Federated learning server
│   ├── client_test.py                # Client testing utilities
│   ├── random_gradient_estimator.py  # ZO gradient estimation
│   ├── run_client_jobs.py            # Client job execution
│   ├── shared.py                     # Shared utilities
│   └── util/                         # Utility modules
│       ├── checkpoint.py             # Model checkpointing
│       ├── compression.py            # Gradient compression
│       ├── data_split.py             # Data splitting utilities
│       ├── dataloaders.py            # Data loading utilities
│       ├── dataset.py                # Dataset implementations
│       ├── language_utils.py         # Language model utilities
│       ├── metrics.py                # Evaluation metrics
│       └── model_helpers.py          # Model helper functions
├── config.py                         # Configuration management
├── preprocess.py                     # Data preprocessing
├── run.py                            # Main execution script
├── sl_main_new.py                    # Split learning main implementation
├── zo_optimizer_new.py               # Zeroth-order optimizer
└── dev_tools/                        # Development tools
    ├── dev-requirement.txt
    └── README.md
```
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd MU-SplitFed
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- For development dependencies:

  ```bash
  pip install -r dev_tools/dev-requirement.txt
  ```

Run split learning with zeroth-order optimization:
```bash
python sl_main_new.py --dataset sst2 --large-model opt-125m --iterations 1000 --server-iter 5 --splitted-layer 12
```

Key arguments:

- `--dataset`: Dataset to use (sst2, cb, wsc, wic, multirc, rte, boolq, squad, drop, xsum)
- `--large-model`: Model size (opt-125m, opt-1.3b, opt-2.7b, opt-6.7b, opt-13b, opt-30b)
- `--iterations`: Number of training iterations
- `--server-iter`: Number of server-side iterations per round
- `--splitted-layer`: Layer at which to split the model
- `--lr`: Learning rate (default: 1e-4)
- `--mu`: Perturbation magnitude for ZO (default: 1e-3)
- `--num-pert`: Number of perturbations per gradient estimate (default: 1)
- `--lora`: Enable LoRA fine-tuning
- `--lora-r`: LoRA rank (default: 8)
- `--lora-alpha`: LoRA alpha (default: 16)
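The `--mu` and `--num-pert` flags control the two-point random-perturbation gradient estimator that zeroth-order optimization relies on. Below is a minimal sketch of that general technique; the project's actual implementation lives in `cezo_fl/random_gradient_estimator.py` and `zo_optimizer_new.py` and may differ, and the function and variable names here are illustrative only.

```python
# Minimal sketch of two-point random-perturbation (ZO) gradient estimation.
# Illustrative only; not the project's actual API.
import torch

def zo_gradient_estimate(params, loss_fn, mu=1e-3, num_pert=1):
    """Estimate gradients of loss_fn w.r.t. `params` without backpropagation."""
    grads = [torch.zeros_like(p) for p in params]
    for _ in range(num_pert):
        # Sample one random perturbation direction per parameter tensor.
        zs = [torch.randn_like(p) for p in params]
        with torch.no_grad():
            # Evaluate the loss at theta + mu*z and theta - mu*z.
            for p, z in zip(params, zs):
                p.add_(z, alpha=mu)
            loss_plus = loss_fn()
            for p, z in zip(params, zs):
                p.add_(z, alpha=-2 * mu)
            loss_minus = loss_fn()
            for p, z in zip(params, zs):
                p.add_(z, alpha=mu)  # restore the original parameters
        # Shared finite-difference coefficient, averaged over num_pert samples.
        coeff = float(loss_plus - loss_minus) / (2 * mu * num_pert)
        for g, z in zip(grads, zs):
            g.add_(z, alpha=coeff)
    return grads
```

Because the estimate is built from a random direction and a scalar loss difference, communication-efficient federated variants of this scheme typically transmit only the random seed and the scalar rather than full gradient vectors.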
Small model training:

```bash
python sl_main_new.py --dataset sst2 --large-model opt-125m --iterations 500 --server-iter 3 --splitted-layer 8 --lr 1e-4
```

Large model with LoRA:

```bash
python sl_main_new.py --dataset sst2 --large-model opt-1.3b --iterations 1000 --server-iter 5 --splitted-layer 12 --lora --lora-r 16 --lora-alpha 32
```

Generation tasks:

```bash
python sl_main_new.py --dataset squad --large-model opt-125m --iterations 200 --server-iter 2 --splitted-layer 10
```

The project uses a comprehensive configuration system in `config.py`. Key configuration options include:
- Model settings: Model type, dtype, LoRA parameters
- Training settings: Batch size, learning rate, momentum
- ZO settings: Perturbation magnitude, number of perturbations, gradient estimation method
- Split learning: Server iterations, split layer
- Hardware: CUDA/MPS support, device selection
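As a rough, hypothetical illustration of how these groups could map onto a single configuration object (field names and the unlabeled defaults below are assumptions, not the actual contents of `config.py`):

```python
# Illustrative sketch only: consult config.py for the real option names and defaults.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Model settings
    large_model: str = "opt-125m"
    dtype: str = "float16"        # assumed default
    lora: bool = False
    lora_r: int = 8
    lora_alpha: int = 16
    # Training settings
    batch_size: int = 16          # assumed default
    lr: float = 1e-4
    momentum: float = 0.0         # assumed default
    # ZO settings
    mu: float = 1e-3
    num_pert: int = 1
    # Split learning
    server_iter: int = 5
    splitted_layer: int = 12
    # Hardware
    device: str = "cuda"          # falls back to MPS/CPU when unavailable
```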
Supported models:

- OPT-125M, OPT-1.3B, OPT-2.7B, OPT-6.7B, OPT-13B, OPT-30B

Supported datasets:

- Classification: SST-2, CB, WSC, WIC, MultiRC, RTE, BoolQ
- Generation: SQuAD, DROP, XSum
The implementation includes several memory optimization features:
- Split Learning: Reduces memory requirements by splitting large models
- Gradient Compression: Optional gradient compression techniques
- Mixed Precision: Support for float16 and bfloat16
- No Optim Mode: Memory-efficient training without PyTorch optimizers
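The split-learning feature is what `--splitted-layer` controls: only the first blocks of the model stay on the client, and the rest run on the server. A simplified sketch of that idea for a Hugging Face OPT checkpoint follows; the project's actual split logic (in `sl_main_new.py`) may differ, and embeddings, the final layer norm, and the LM head are omitted for brevity.

```python
# Simplified sketch of splitting an OPT decoder at a given layer index.
# Illustrative only; not the project's actual split implementation.
import torch.nn as nn
from transformers import AutoModelForCausalLM

def split_opt_model(name="facebook/opt-125m", split_layer=12):
    model = AutoModelForCausalLM.from_pretrained(name)
    layers = model.model.decoder.layers                    # ModuleList of decoder blocks
    client_layers = nn.ModuleList(layers[:split_layer])    # kept on the client
    server_layers = nn.ModuleList(layers[split_layer:])    # kept on the server
    return client_layers, server_layers
```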
Run the test suite with:

```bash
python -m pytest cezo_fl/
```

The project follows Python best practices and includes type hints throughout.
If you use this code in your research, please cite:
```bibtex
@misc{cezo-fl-2024,
  title={CeZO-FL: Communication-Efficient Zeroth-Order Federated Learning},
  author={[Your Name]},
  year={2024},
  howpublished={GitHub Repository},
  url={https://github.com/[username]/MU-SplitFed}
}
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Built on top of Hugging Face Transformers
- Uses PEFT for LoRA implementation
- Inspired by federated learning and zeroth-order optimization research
- CUDA Out of Memory: Reduce batch size or use smaller models
- MPS Issues on macOS: The code automatically falls back to CPU if MPS is not available
- Model Loading: Ensure you have sufficient disk space for large model downloads
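For reference, a device-selection fallback along the lines described above might look like the sketch below; the project's actual selection logic may differ.

```python
# Hedged sketch of CUDA -> MPS -> CPU device fallback.
import torch

def pick_device():
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")
```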
- Use the `--no-optim` flag for memory efficiency
- Adjust `--splitted-layer` based on your memory constraints
- Use smaller `--num-pert` values for faster training
- Enable LoRA for large models to reduce memory usage
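The `--no-optim` tip refers to applying updates directly to the parameters, so no optimizer state (momentum buffers, Adam moments) is ever allocated. A minimal sketch of that kind of update, with illustrative names rather than the project's API:

```python
# Hedged sketch of an optimizer-free, in-place parameter update.
import torch

@torch.no_grad()
def apply_update_no_optim(params, grads, lr=1e-4):
    for p, g in zip(params, grads):
        p.add_(g, alpha=-lr)   # in-place SGD-style step, no optimizer object needed
```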
For more detailed information, please refer to the individual module documentation or open an issue.