In previous chapters we saw the severity and impact that a targeted poisoning attack can have on a Federated Learning model. In this chapter we look for a way to counter it, with the end goal of eliminating the users that try to poison our model.

Given the context of this research work, we tried to think outside the box when devising defenses against the above-mentioned attacks. During the literature review we observed a common pattern in the defense mechanisms adopted by researchers in the field: the user reports nothing other than the gradients back to the aggregator. This is done out of privacy concerns, since any additional metadata that a user reports could help identify them.

However, we opted to have the users also return the training loss of their local training round. We assume that the training loss of users that act maliciously will behave differently from that of honest ones, so that by aggregating this information we will be able to detect them and exclude them from contributing to the training phase of the model. In later sections we will confirm this assumption by observing the behaviour of the model when this piece of information is utilized. To our knowledge, no other work published up to August 2023 uses the loss to categorize clients as malicious or honest.

Of course, if a user reported the exact value of their training loss, this could prove catastrophic, as somebody could extract useful information regarding the instances that the user used for training, which breaks the promise of privacy offered to the users. To avoid that, we are going to rely on the foundations and the logic behind Local Differential Privacy, injecting a random amount of noise every time a user reports their loss.
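
To make the idea concrete, the following is a minimal sketch of how a client could perturb its loss before reporting it. The text above does not fix a particular noise distribution, so the Laplace mechanism used here is an assumption, as are the function name `privatize_loss` and the parameters `epsilon` (privacy budget) and `loss_max` (clipping bound). The loss is first clipped to `[0, loss_max]` so that its sensitivity is bounded, and Laplace noise with scale `loss_max / epsilon` is then added.

```python
import random


def privatize_loss(loss, epsilon, loss_max, rng=random):
    """Return a noisy copy of `loss` (Laplace mechanism on a clipped scalar).

    Hypothetical helper: the noise distribution and parameter names are
    assumptions made for illustration, not the dissertation's exact mechanism.
    """
    if epsilon <= 0 or loss_max <= 0:
        raise ValueError("epsilon and loss_max must be positive")
    # Clip so that any two possible inputs differ by at most loss_max.
    clipped = min(max(loss, 0.0), loss_max)
    # Laplace scale = sensitivity / epsilon.
    scale = loss_max / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped + noise


# Seeded only so that the sketch is reproducible.
rng = random.Random(42)
reported = privatize_loss(0.83, epsilon=1.0, loss_max=5.0, rng=rng)
```

A smaller `epsilon` gives stronger privacy but noisier reports, so the server needs more reports before the aggregate becomes informative. Note that the reported value may fall outside `[0, loss_max]`; only the input is clipped, not the output.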

This chapter is dedicated to introductory experiments, to explaining the algorithm behind the idea, and to extracting useful observations regarding the performance and scalability of the defense mechanism.

## Threat Model

To formally describe the algorithm and the logistics of our defense solution, we must first describe the threat model under which we are operating. 

As we have already established, the attack scenario occurs in a Federated Learning context, where users are responsible for training a local model, which is then communicated to a central authority in charge of aggregating the gradients given by the users and updating the global model with them. The users are thus totally independent and decentralized, which means that the server has no information about their training other than the values they report. In our case, this information includes the weights shaped by the training and the CrossEntropy Loss of the local model that results from training on the user's dataset.

We assume that the user reports both of the above-mentioned values correctly, as this is crucial for our defense algorithm to function. In a real-world scenario this can be ensured by the correct development of the framework, or by introducing cryptographic primitives that help in that context, such as Zero Knowledge Proofs (TODO cite) or Commitments (TODO cite). These add computational overhead, but at the same time ensure that a malicious user will not succeed in reporting a false value. Since this dissertation does not include an industrial implementation of the solution, we are not going to focus on how this assumption will be ensured.

We also assume that the server is not actively malicious and is not colluding with malicious users, as an arbitrarily acting aggregator could ignore the defense algorithm and include only malicious individuals in the global training step.

When it comes to the percentage of participating users that are actively malicious, there is no limit, as we are going to examine values ranging from 0% up to high percentages. However, as we have already seen, there is no point in raising the percentage above 50%, as it makes no difference to the already harmed model. Hence, we assume that at most 50% of the users participating in an FL training process are malicious, and point out that for higher percentages than that the defense algorithm will still work but will have worse results.

Finally, we define a "malicious user" as a device participating in the training procedure that is totally controlled by an adversary, who can view the already existing instances, alter their labels, and insert new instances with new, false labels. This could be accomplished through either physical or remote access of the attacker to the victim's device.


## Algorithm of the Proposed Solution

From the introduction given, it is clear that some alterations to the FL training algorithm must be made in order for our defense idea to be implemented. In this section we describe those alterations in detail; together they result in a new FL algorithm.

### Local Training Step

In the Local Training step, which is carried out by all the users randomly selected to participate in a round of global training of the Federated Learning model, the following process takes place:

 - The client receives the local model from the centralized entity in charge of coordinating the FL procedure.
 - The user relies on the hyperparameters decided and trains the local model with their data.
 - While doing so, they keep track of the training loss that occurs as a result of updating the gradients of the local model.
 - After completing the training process, the user locally adds to the gathered training loss a quantity of random noise drawn from an already known distribution with predefined bounds, following the foundations of Local Differential Privacy.
 - Finally, the user reports back to the server the gradients forming the updated version of the local model, as well as the loss value after the insertion of random noise, and nothing else that will help the centralized authority in recognizing or gathering extra information about the user.
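
The steps above can be sketched as a single client-side function. This is an illustrative sketch under several assumptions rather than the framework used in the experiments: the model is a tiny logistic regression trained with plain SGD, binary cross-entropy stands in for the CrossEntropy Loss, the reported loss is the mean loss of the last local epoch, and `add_noise` is a hypothetical callable that perturbs the loss before it leaves the device. All names (`local_training_step`, `lr`, `epochs`) are chosen for illustration.

```python
import math
import random


def local_training_step(global_weights, data, lr, epochs, add_noise):
    """One client round: train locally, track the loss, report (weights, noisy loss).

    `data` is a list of (features, label) pairs with label in {0, 1}.
    `add_noise` is a hypothetical callable applied to the loss before reporting.
    """
    w = list(global_weights)  # work on a copy of the received model
    mean_loss = 0.0
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            z = min(max(z, -30.0), 30.0)  # guard against overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            # Binary cross-entropy of this sample, tracked during training.
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            # SGD update of the local weights.
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi
        mean_loss = total / len(data)  # assumption: report the last epoch's mean loss
    # Only the updated weights and the perturbed loss leave the device.
    return w, add_noise(mean_loss)


# Usage with toy data; the first feature is a constant bias term.
rng = random.Random(0)
toy_data = [([1.0, 0.2], 0), ([1.0, 0.9], 1), ([1.0, 0.1], 0), ([1.0, 0.8], 1)]


def placeholder_noise(loss):
    # Placeholder bounded noise, NOT a calibrated LDP mechanism.
    return loss + rng.uniform(-0.1, 0.1)


weights, reported_loss = local_training_step(
    [0.0, 0.0], toy_data, lr=0.5, epochs=50, add_noise=placeholder_noise
)
```

Passing the noise mechanism in as a callable keeps the training loop independent of the specific distribution and bounds chosen for the Local Differential Privacy step.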
