Opinion about dataset split #14

Closed
djskwh opened this issue Jun 7, 2023 · 2 comments

djskwh commented Jun 7, 2023

Hi, KarhouTam.
I recently discussed dataset splitting for traditional FL (FedAvg, FedProx, FedDyn, etc.) with my colleague.
My colleague insisted that, in order to evaluate an FL algorithm, I should evaluate the model on an isolated dataset
(i.e. for MNIST, first split off ~6000 images as a test set for global model evaluation, and then assign the remaining images to the clients for testing and evaluation).
As far as I know, the global server can't see the entire dataset because of privacy concerns, right?
I think that's why your dataset creation setup also doesn't assign a test set for global model evaluation.

What is your opinion on this?
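
For reference, here is a rough sketch of the split my colleague proposed. The sizes, seed, number of clients, and the IID client split are only illustrative assumptions, not FL-bench's actual partitioning code.

# Rough sketch only: hold out a global testset, then split the rest across clients.
# Sizes, seed, and the IID split are illustrative assumptions, not FL-bench code.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())

# 1) Hold out ~6000 images as the global testset for server-side evaluation.
global_test_size = 6000
client_pool_size = len(full_train) - global_test_size
global_testset, client_pool = random_split(
    full_train, [global_test_size, client_pool_size],
    generator=torch.Generator().manual_seed(42),
)

# 2) Assign the remaining images to the clients (IID here, purely for illustration).
num_clients = 10
sizes = [client_pool_size // num_clients] * num_clients
sizes[-1] += client_pool_size - sum(sizes)  # absorb any remainder
client_datasets = random_split(
    client_pool, sizes, generator=torch.Generator().manual_seed(42)
)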

KarhouTam (Owner) commented Jun 8, 2023

Hi, @djskwh.

First, I think you are right. In my opinion, the global server should not hold a global testset for evaluation. The way I imagine it, in industry the party responsible for FL training cannot obtain a testset with the same data distribution as the trainset. In academia, however, the global-testset setting is used for evaluation convenience, and that seems permissible.

Let's review the code in FL-bench for testing FL methods (whether traditional or personalized).

for client_id in self.test_clients:
    client_local_params = self.generate_client_params(client_id)
    stats = self.trainer.test(client_id, client_local_params)
    # collect each client's correct counts, losses, and testset size
    correct_before.append(stats["before"]["test_correct"])
    correct_after.append(stats["after"]["test_correct"])
    loss_before.append(stats["before"]["test_loss"])
    loss_after.append(stats["after"]["test_loss"])
    num_samples.append(stats["before"]["test_size"])

loss_before = torch.tensor(loss_before)
loss_after = torch.tensor(loss_after)
correct_before = torch.tensor(correct_before)
correct_after = torch.tensor(correct_after)
num_samples = torch.tensor(num_samples)

# aggregate across clients: total correct / total samples (size-weighted)
self.test_results[self.current_epoch + 1] = {
    "loss": "{:.4f} -> {:.4f}".format(
        loss_before.sum() / num_samples.sum(),
        loss_after.sum() / num_samples.sum(),
    ),
    "accuracy": "{:.2f}% -> {:.2f}%".format(
        correct_before.sum() / num_samples.sum() * 100,
        correct_after.sum() / num_samples.sum() * 100,
    ),
}

Suppose I have a global testset of size $S$, and the final model predicts $C$ of its samples correctly.

According to your colleague's opinion, the final accuracy of a traditional FL method should be calculated as
$$\frac{C}{S}$$

Now suppose I have only two FL clients, $A$ and $B$, whose testset parts have sizes $S_A$ and $S_B$ ($S = S_A + S_B$), with $C_A$ and $C_B$ correctly predicted samples ($C = C_A + C_B$).

What my code calculates is based on
$$\frac{C_A + C_B}{S_A + S_B} = \frac{C}{S}$$

So in my opinion, as long as the global testset is simply the union of the clients' testset parts, the result my code calculates for traditional FL methods should be the same as the result calculated with a global testset.
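
A quick numeric check of that equality, with toy numbers that are made up purely for illustration (not from any real run):

# Toy check: the size-weighted per-client aggregation equals pooled (global) accuracy.
# All numbers below are made up for illustration.
correct = {"A": 450, "B": 930}   # correctly predicted test samples per client
sizes = {"A": 500, "B": 1000}    # testset size per client

client_level = sum(correct.values()) / sum(sizes.values())  # (C_A + C_B) / (S_A + S_B)
global_level = (450 + 930) / (500 + 1000)                   # C / S on the pooled testset
assert client_level == global_level                         # both are 0.92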

Of course, this discussion does not apply to personalized FL methods; they are incompatible with the global-testset setting.

djskwh (Author) commented Jun 8, 2023

Thanks for the detailed review of your code and the explanation.
Good luck with your FL research!

This issue was closed.