This repository contains the source code and experiments for our research paper, "Vulnerability of Federated Learning to Data Noises". This project systematically investigates the impact of feature-level data noise on the performance of Federated Learning (FL) and provides a comprehensive comparison against traditional Centralized Learning (CL).
Federated Learning (FL) enables collaborative model training on decentralized data, offering significant privacy advantages. However, real-world data, especially data collected on edge devices, is often imperfect and noisy. This project explores a critical, yet underexplored, vulnerability of FL: its sensitivity to feature-level noise (i.e., corruption in the input data itself, such as blur in images or typos in text).
Our research addresses the following key questions:
- How does the performance of FL degrade under increasing levels of feature noise?
- How does this degradation compare to that of traditional Centralized Learning (CL)?
- What are the underlying mechanisms within the FL process that cause these effects?
Our extensive, multi-modal experiments lead to a clear and consistent conclusion:
Federated Learning is significantly more vulnerable to feature noise than Centralized Learning.
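One intuition behind this gap: in FedAvg-style training, each client fits a model to its local (possibly noisy) data, and the server simply averages the resulting parameters, so local corruption feeds directly into the global model at every round. The sketch below illustrates this weighted-averaging step; the function name and plain-list parameter representation are illustrative only, not this repository's actual implementation.

```python
def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average client parameters, weighted by
    each client's local dataset size. Clients trained on noisy data
    contribute to the global model in proportion to their data volume."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(num_params)
    ]
```

For example, averaging two clients' parameters `[1.0, 2.0]` and `[3.0, 4.0]` with dataset sizes 1 and 3 yields `[2.5, 3.5]`: the larger (and possibly noisier) client dominates the result.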
The repository is organized by data modality, with a dedicated toolkit for noise generation.
```text
.
├── audio/
│   └── README.md            # Code for noise injection and model training on audio data
├── image/
│   └── README.md            # Code for noise injection and model training on image data
├── DataNoiseGenerator/
│   └── README.md            # A standalone, command-line toolkit for injecting noise
├── tabular/
│   └── README.md            # Code for noise injection and model training on tabular data
├── text/
│   └── README.md            # Code for noise injection and model training on text data
└── video/
    └── README.md            # Code for noise injection and model training on video data
```
- Modality directories (/audio, /image, etc.): Each directory contains the scripts for data preparation, noise injection, and running the FL/CL experiments for that specific data type. Please refer to the README.md within each directory for detailed instructions.
- DataNoiseGenerator/: This directory contains our flexible, open-source noise injection toolkit. It is designed to be a standalone tool that can inject a wide range of common, modality-aware noises into five different data types with precise control.
To ensure our findings are generalizable, our study covers five diverse data modalities:
- Image: Object Recognition on CIFAR-10 and Object Detection on Pascal VOC 2012
- Video: Action Recognition on UCF101, Something-Something (V2), and ARID
- Audio: Sound Classification on UrbanSound8K and ESC-50 (Environmental Sound Classification)
- Text: Next-Word Prediction on Shakespeare, AG News, and Amazon Reviews
- Tabular: Phishing Website Prediction (classification) and House Price Prediction (regression)
A key contribution of this project is DataNoiseGenerator, a powerful command-line tool for injecting controlled feature noise into datasets. It was built on three core principles:
- Modality-Awareness: Implements noise types that are realistic for each data modality (e.g., motion blur for video, typos for text).
- Fine-Grained Controllability: Allows precise control over both noise intensity (how strong the noise is) and noise proportion (what fraction of the data is affected).
- Unified Interface: Provides a consistent command-line interface across all data types, making it easy to set up controlled experiments.
For detailed usage, please see the DataNoiseGenerator README.
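To make the intensity/proportion distinction concrete, here is a minimal, hypothetical sketch of how a noise injector can apply a corruption function to a controlled fraction of a dataset at a given strength. The function names here are assumptions for illustration and are not the toolkit's actual API; see its README for real usage.

```python
import random

def inject_noise(samples, noise_fn, proportion=0.3, intensity=0.5, seed=0):
    """Apply noise_fn (at the given intensity) to a random `proportion`
    of the samples, leaving the rest untouched.

    - proportion controls WHAT FRACTION of the data is corrupted.
    - intensity controls HOW STRONG the corruption is per sample.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    n_noisy = int(len(samples) * proportion)
    noisy_idx = set(rng.sample(range(len(samples)), n_noisy))
    return [noise_fn(x, intensity) if i in noisy_idx else x
            for i, x in enumerate(samples)]

# Usage: a trivial additive corruption as a stand-in for a
# modality-specific noise (blur, typos, audio hum, etc.)
noisy = inject_noise([0.0, 0.0, 0.0, 0.0],
                     lambda x, s: x + s,
                     proportion=0.5, intensity=1.0)
```

Separating the two knobs this way is what makes controlled comparisons possible: one can sweep intensity at a fixed proportion, or vice versa, across both FL and CL runs.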
Each modality-specific directory (/audio, /image, etc.) is self-contained and includes its own README.md file with detailed instructions for:
- Setting up the Python environment and dependencies.
- Downloading and preparing the dataset.
- Using DataNoiseGenerator to create noisy versions of the data.
- Running the training scripts for both Federated and Centralized Learning.
Please navigate to the directory of interest to get started.