# Introduction to Federated Learning

Summarized from "From distributed machine learning to federated learning: a survey", Liu et al.

**Federated learning (FL)** emerges as an efficient approach to exploit the distributed resources to collaboratively train a machine learning model.

FL is a distributed machine learning approach where multiple users collaboratively train a model, while keeping the raw data decentralized without being moved to a single server or data center.

### Types of FL
 - **Horizontal** (e.g., two hospitals): The essence of horizontal FL (HFL) is the union of samples. When two datasets overlap heavily in user features but little in users, we divide the data horizontally (along the user dimension) and train on the part where the features are the same in both datasets but the users are not.
 - **Vertical** (e.g., a bank and an e-commerce system): The essence of vertical FL (VFL) is the union of features. When two datasets overlap heavily in users but little in user features, we divide the data vertically (along the feature dimension) and train on the part where the users are the same on both sides but the features are not.
 - **Hybrid**: When two datasets overlap little in both users and user features, we do not slice the data; instead, transfer learning is used to compensate for the shortage of data or labels.
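The horizontal and vertical cases can be pictured with a small sketch. The party names, user ids, and feature values below are invented for illustration, and the helper functions are hypothetical; in a real FL system the parties would not pool their records like this, so the code only shows which users and features each setting trains on.

```python
# Toy records keyed by user id; all names and values are made up for illustration.
hospital_a = {"u1": {"age": 34, "bp": 120}, "u2": {"age": 51, "bp": 135}}
hospital_b = {"u3": {"age": 45, "bp": 128}, "u4": {"age": 29, "bp": 118}}

bank = {"u1": {"income": 52_000}, "u2": {"income": 61_000}, "u5": {"income": 40_000}}
shop = {"u1": {"purchases": 12}, "u2": {"purchases": 3}, "u6": {"purchases": 7}}


def horizontal_view(a, b):
    """Horizontal FL: keep the features both parties share, take the union of users."""
    records = list(a.values()) + list(b.values())
    shared_features = set.intersection(*(set(r) for r in records))
    users = sorted(set(a) | set(b))
    return users, sorted(shared_features)


def vertical_view(a, b):
    """Vertical FL: keep the users both parties share, take the union of features."""
    shared_users = sorted(set(a) & set(b))
    features = set()
    for party in (a, b):
        for user in shared_users:
            features |= set(party[user])
    return shared_users, sorted(features)


print(horizontal_view(hospital_a, hospital_b))
print(vertical_view(bank, shop))
```

The two hospitals share the same feature columns but no users, so the horizontal view grows the sample count. The bank and the shop share two users but hold different columns, so the vertical view grows the feature count.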

## Basic Concepts

 - Machine learning is the process of automatically extracting models or patterns from data.
 - A machine learning model is an ensemble of a model structure, which is typically expressed as a directed acyclic graph (DAG), data processing units, e.g., activation functions in deep neural networks (DNNs), and the associated parameters or hyper-parameters.
 - The input data can be processed through a machine learning model to generate the output, e.g., the prediction results or the classification results, which is the inference process.
 - According to whether the training data have labels, the training process of machine learning can be classified into four types, i.e., supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.
 - FL focuses on supervised learning.
 - **FL is a distributed machine learning approach where multiple users collaboratively train a model, while keeping the raw data distributed without being moved to a single server or data center.**

## Characteristics of FL
FL is a special type of distributed machine learning, which differs from other distributed machine learning approaches in the following three points.
 - FL does not allow direct raw data communication, while other approaches have no restriction.
 - FL exploits the distributed computing resources in multiple regions or organizations, while the other approaches generally only utilize a single server or a cluster in a single region, which belongs to a single organization.
 - FL generally takes advantage of encryption or other defense techniques to ensure data privacy or security, while the other approaches pay little attention to this security issue.

## FL model life cycle

The life cycle of an FL model is a description of the state transitions of an FL model from creation to completion.

Liu et al. adopt a combination of workflow life cycle views with a few variations, condensed into four phases:
 - **The composition phase** is for the creation of an FL model, which is used to address a specific machine learning problem, e.g., classification.
 - **The FL training phase** is for training the FL model.
 - **The FL model evaluation phase** applies the trained FL models in order to analyze their performance on a simulation platform or a real distributed system.
 - **The FL model deployment phase** deploys the FL model in a real-life scenario to process the data.

## FL model's layers

### Presentation Layer
 - The presentation layer is a user interface (UI) for the interaction between users and FL systems at one or multiple stages of the FL model life cycle.
 - In addition, this layer also supports the modules at the user services layer, e.g., shows the status of the distributed training process.

### Training layer 
 - The FL training layer carries out the distributed training process with distributed data and computing resources.
 - This layer consists of three modules: parallelization, scheduling, and fault-tolerance.
 - During the training process, the SP (scheduling plan) is generally defined by a training algorithm, which aggregates the updates, i.e., gradients or models, from each computing resource in order to generate a final machine learning model.

### Infrastructure Layer
 - The infrastructure layer provides the interaction between an FL system and the distributed resources.
 - This layer contains three modules: a data security module, a data transfer module, and a distributed execution module.
 - The **data security module** generally exploits differential privacy (DP) and encryption techniques, e.g., homomorphic encryption, to protect the raw data used during the training process.
 - Although the raw data cannot be directly transferred, intermediate data, e.g., the gradients or models, can be communicated among distributed computing resources.
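One common way to apply DP to the intermediate data is to clip each update and add Gaussian noise before it leaves the computing resource. The sketch below illustrates that general idea and is not a mechanism taken from the survey. The function names `clip` and `privatize` are hypothetical, and the noise scale is a free parameter here, not one calibrated to a specific privacy budget.

```python
import math
import random


def clip(update, max_norm):
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def privatize(update, max_norm, noise_std, rng):
    """Clip the update, then add independent Gaussian noise to each coordinate.

    Assumption: noise_std is chosen by the caller; this sketch does not derive
    it from an (epsilon, delta) privacy budget.
    """
    return [x + rng.gauss(0.0, noise_std) for x in clip(update, max_norm)]


# A gradient with L2 norm 5 is clipped to norm 1 and then perturbed.
noisy = privatize([3.0, 4.0], max_norm=1.0, noise_std=0.1, rng=random.Random(0))
print(noisy)
```

Clipping bounds how much any single update can influence the aggregate, which is what makes the added noise meaningful.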

## Distributed Training

### Data Parallelism
 - Data parallelism is realized by having the data processing performed in parallel at different computing resources, with the same model, on different data points.
 - During the training process of FL, the training data is not transferred among different computing resources, while the intermediate data, e.g., the models or the gradients, are transferred.
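A minimal sketch of data parallelism, assuming a one-parameter linear model `y = w * x` with a mean squared error loss (both chosen only for illustration). Each shard stays with its own worker, and only the local gradients are combined.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr=0.01):
    """One training step: every worker uses the same w on its own data points."""
    total = sum(len(s) for s in shards)
    # Size-weighted average of local gradients equals the full-batch gradient.
    grad = sum(len(s) * local_gradient(w, s) for s in shards) / total
    return w - lr * grad


# Three workers, each holding different (x, y) points generated by y = 2x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0), (5.0, 10.0)]]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(w)  # approaches 2.0
```

Weighting by shard size matters when shards are unequal: a plain mean of the three local gradients would over-weight the single point held by the second worker.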

### Model parallelism
 - Model parallelism is realized by having independent data processing nodes distributed at different computing resources, so as to process the data points of specific features.
 - Model parallelism is achieved when different parts of each data point are distributed at different computing resources.
 - The original model needs to be partitioned so that its parts can be distributed at different computing resources. Two organizations generally apply this type of FL when each owns part of the users' features and they would like to collaboratively train a model using the data of all the features, which corresponds to cross-silo FL.
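As a rough illustration of a partitioned model, the sketch below splits a linear scoring model between two parties that hold different features of the same users. The party layout, weights, and feature values are all invented for the example, and only the forward pass is shown.

```python
def partial_score(weights, features):
    """Score contribution from the features and weights one party holds."""
    return sum(w * x for w, x in zip(weights, features))


# Assumed layout: party A owns the first two features, party B owns the third.
party_a = {"weights": [0.5, -1.0], "features": {"u1": [2.0, 1.0], "u2": [4.0, 0.0]}}
party_b = {"weights": [2.0], "features": {"u1": [3.0], "u2": [1.0]}}


def joint_score(user):
    """Combine per-party partial scores; raw features never leave their owner."""
    return sum(
        partial_score(party["weights"], party["features"][user])
        for party in (party_a, party_b)
    )


print(joint_score("u1"), joint_score("u2"))
```

Each party evaluates only its own slice of the model, so the value that crosses the organizational boundary is a partial score, not a feature vector.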

### Pipeline parallelism
 - Pipeline parallelism is realized by having dependent data processing nodes distributed at different computing resources.
 - The data processing nodes are distributed at multiple computing resources, with each node consuming the output of the node before it.
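The sketch below simulates dependent stages placed on different resources. Splitting the input into micro-batches so that the stages can overlap is an assumption added for illustration; the stage functions and the `run_pipeline` helper are hypothetical.

```python
def run_pipeline(stages, micro_batches):
    """Simulate a pipeline in which stage s handles micro-batch (tick - s) at each tick."""
    n_stages, n_batches = len(stages), len(micro_batches)
    values = list(micro_batches)  # current intermediate value of each micro-batch
    schedule = []                 # per tick: which (stage, micro-batch) pairs ran
    for tick in range(n_batches + n_stages - 1):
        active = []
        for s in range(n_stages):
            b = tick - s
            if 0 <= b < n_batches:
                values[b] = stages[s](values[b])
                active.append((s, b))
        schedule.append(active)
    return values, schedule


# Three dependent stages, each imagined to live on a different computing resource.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, schedule = run_pipeline(stages, [1, 2, 3, 4])
print(outputs)
print(len(schedule))
```

With four micro-batches and three stages the run takes six ticks, compared with twelve if every micro-batch had to clear all stages before the next one started.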

## Aggregation
### Centralized Aggregation
 - A single parameter server is used to calculate the average of the models or gradients sent from multiple computing resources (e.g., mobile devices).
 - The model weights or the gradients are calculated at each computing resource and then transferred to the parameter server.
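A minimal sketch of the averaging step on the parameter server. Weighting each client by its number of local samples follows the widely used FedAvg convention and is an assumption here; the notes above only say that an average is calculated.

```python
def aggregate(client_updates):
    """Average client models on the parameter server.

    client_updates: list of (num_samples, weights) pairs, one per client.
    Assumption: clients are weighted by their local sample count.
    """
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [sum(n * w[i] for n, w in client_updates) / total for i in range(dim)]


# Two clients: one trained on 10 samples, the other on 30.
updates = [(10, [1.0, 2.0]), (30, [3.0, 4.0])]
global_model = aggregate(updates)
print(global_model)
```

The same function works for gradients: pass each client's gradient vector in place of its weights.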

### Hierarchical Aggregation
 - A hierarchical architecture is also exploited using multiple parameter servers. A two-layer hierarchical architecture is proposed to reduce the time to transfer models between a parameter server and computing resources. The hierarchical architecture uses a global parameter server (GPS) and multiple region parameter servers.
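The two-layer idea can be sketched as follows, assuming sample-count weighting at both levels and invented region names. Each region server averages its own clients and forwards one summary to the global server.

```python
def weighted_average(updates):
    """Return (total_samples, averaged_weights) for (num_samples, weights) pairs."""
    total = sum(n for n, _ in updates)
    dim = len(updates[0][1])
    avg = [sum(n * w[i] for n, w in updates) / total for i in range(dim)]
    return total, avg


# Hypothetical regions, each with its own region parameter server.
regions = {
    "east": [(10, [1.0, 2.0]), (30, [3.0, 4.0])],
    "west": [(20, [5.0, 6.0])],
}

# Layer 1: each region server aggregates only its local clients.
regional = [weighted_average(clients) for clients in regions.values()]

# Layer 2: the global parameter server aggregates one summary per region.
_, global_model = weighted_average(regional)

# For comparison: a single server aggregating every client directly.
_, flat_model = weighted_average([u for clients in regions.values() for u in clients])
print(global_model, flat_model)
```

Because each regional summary carries its sample count, the two-layer result matches the flat average, while the global server receives two messages instead of three.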

### Decentralized Aggregation
 - The computing resources can be organized in a connected topology and can communicate in a peer-to-peer manner.
 - The degree and connectivity of the topology affect the communication efficiency and the convergence rate of the aggregation algorithm.
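The effect of topology can be seen in a small gossip-averaging sketch, where every node repeatedly replaces its value with the mean of itself and its neighbors. Gossip averaging is used here as one simple example of a peer-to-peer aggregation scheme; the helper names and the starting values are made up.

```python
def gossip_round(values, neighbors):
    """Each node averages its own value with those of its direct neighbors."""
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]


def ring(n):
    """Every node has degree 2."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def complete(n):
    """Every node is connected to every other node."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def rounds_to_agree(values, neighbors, tol=1e-6, max_rounds=10_000):
    """Count gossip rounds until all nodes hold nearly the same value."""
    rounds = 0
    while max(values) - min(values) > tol and rounds < max_rounds:
        values = gossip_round(values, neighbors)
        rounds += 1
    return rounds, values


start = [0.0, 6.0, 12.0, 18.0, 24.0, 30.0]  # one scalar "model" per node
rounds_ring, vals_ring = rounds_to_agree(start, ring(6))
rounds_full, vals_full = rounds_to_agree(start, complete(6))
print(rounds_ring, rounds_full)
```

Both topologies settle on the global mean, but the fully connected graph needs far fewer rounds than the ring, at the cost of many more messages per round.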

## Data Security

### Data privacy

The techniques to protect data privacy fall into four types: **trusted execution environment (TEE), encryption, differential privacy (DP), and anti-generative adversarial network (GAN) methods**.

 - Homomorphic encryption allows specific types of computations to be carried out on encrypted input data and generates an encrypted result, which matches the result of the same computations on the decrypted input data.
 - Fully homomorphic encryption supports both addition and multiplication on ciphertext, while partially homomorphic encryption supports only one of the two, which corresponds to less computational flexibility and better runtime efficiency.
 - As sharing gradients also leaks information about the training data in horizontal federated learning, it is important to protect the privacy of the intermediate data.
 - The intermediate data can be encrypted using a homomorphic encryption algorithm before being sent to a parameter server.
 - Homomorphic encryption incurs significant costs in computation and communication during distributed training.
 - Using GANs, an adversary can reconstruct other participating clients' private data even without knowledge of the label information.
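To make the homomorphic encryption bullets concrete, here is a toy version of the Paillier cryptosystem, a well-known partially homomorphic scheme that supports addition on ciphertext. It is used here only as an example and is not necessarily the scheme the survey has in mind. The primes are far too small to be secure, the updates are assumed to be already encoded as small non-negative integers, and who holds the private key is left open.

```python
import math
import random


def keygen(p, q):
    """Toy Paillier key generation from two small primes (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m, rng):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, private_key, c):
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


rng = random.Random(0)
n, private_key = keygen(1009, 1013)

# Three clients encrypt integer-encoded updates; the server sums them blindly.
ciphertexts = [encrypt(n, m, rng) for m in (5, 7, 30)]
encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_encrypted(n, encrypted_sum, c)

print(decrypt(n, private_key, encrypted_sum))  # 42
```

The server only ever handles ciphertexts, yet the decrypted result equals the plain sum. Each ciphertext lives modulo `n * n`, so it is roughly twice the size of the modulus, which hints at the communication overhead noted above.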