Description
I am running the FID calculation on 100k images from VAR (https://github.com/FoundationVision/VAR). I am using the provided evaluations/VIRTUAL_imagenet512.npz as a reference.
However, I noticed that there are some precision errors between the provided mu, sigma in the npz, and what I get when running the InceptionV3 Model locally and computing the stats on the features:
First entries of mu (.npz)
0.27961591 0.27735934 0.30061378, ...
Calculated values:
0.2756471 0.2786731 0.30220714, ...
While each difference is minor, together they add up to an L1 distance of 3049 and an L2 distance of 19 between the two mu vectors. This becomes more concerning in the FID calculation.
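For reference, a minimal sketch of how such distances can be computed. It uses only the three leading entries quoted above, so its output will not match the full-vector figures; the variable names `mu_npz` and `mu_local` are illustrative, not taken from the evaluator.

```python
import numpy as np

# First three entries of each mean vector, copied from the values quoted above.
# The full vectors are much longer; this only illustrates the computation.
mu_npz = np.array([0.27961591, 0.27735934, 0.30061378])
mu_local = np.array([0.2756471, 0.2786731, 0.30220714])

diff = mu_npz - mu_local
l1 = np.abs(diff).sum()        # L1 distance: sum of absolute differences
l2_squared = diff.dot(diff)    # squared L2 distance, same form as mean_dist
l2 = np.sqrt(l2_squared)       # L2 (Euclidean) distance

print(f"L1 = {l1:.8f}, L2 = {l2:.8f}, squared L2 = {l2_squared:.3e}")
```

Note that the mean term in FID is the squared L2 distance (`diff.dot(diff)`), not the L1 distance.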
Looking at each of the terms in FID with the .npz entries:
In def frechet_distance(self, other, eps=1e-6), I assign:
mean_dist = diff.dot(diff)
sample_trace = np.trace(sigma1)
reference_trace = np.trace(sigma2)
cov_mean_trace = - 2 * tr_covmean
FID = mean_dist (0.5444087052905134) + sample_trace (164.41316997634908) + reference_trace (180.60860338424823) + cov_mean_trace (-342.4942215742233) = 3.0719604916645267
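To make the decomposition easier to reproduce, here is a hedged sketch that returns the four terms separately. It is not the evaluator's `frechet_distance`: the function name `fid_terms` is hypothetical, the `eps` handling is omitted, and the trace of the matrix square root is obtained from the eigenvalues of `sigma1 @ sigma2` instead of an explicit matrix square root.

```python
import numpy as np

def fid_terms(mu1, sigma1, mu2, sigma2):
    """Return the four FID terms and their sum (hypothetical helper)."""
    diff = mu1 - mu2
    # For positive semi-definite covariances, the eigenvalues of
    # sigma1 @ sigma2 are real and non-negative, and the trace of the
    # principal square root equals the sum of their square roots.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    terms = {
        "mean_dist": float(diff.dot(diff)),
        "sample_trace": float(np.trace(sigma1)),
        "reference_trace": float(np.trace(sigma2)),
        "cov_mean_trace": float(-2.0 * tr_covmean),
    }
    terms["fid"] = sum(terms.values())
    return terms

# Toy check: identity covariances and means one unit apart.
toy = fid_terms(np.zeros(3), np.eye(3), np.array([1.0, 0.0, 0.0]), np.eye(3))
print(toy)
```

In the toy check the traces and the cross term cancel (3 + 3 - 6), leaving only the mean term. Running the same helper on both sets of statistics shows which term absorbs the difference.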
But, if I disable reading the precomputed values from the .npz:
def read_statistics(
    self, npz_path: str, activations: Tuple[np.ndarray, np.ndarray], can_ignore_precomputed_values: bool = False
) -> Tuple[FIDStatistics, FIDStatistics]:
    return tuple(self.compute_statistics(x) for x in activations)
I now get
FID = mean_dist (0.5444087052905134) + sample_trace (164.41316997634908) + reference_trace (179.62816414137643) + cov_mean_trace (-338.0132935033606) = 6.570405034207226
This is a significant difference. Should I trust the values in the .npz file? How were they computed, and at what precision?
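One quick way to check the storage precision is to list the arrays in the archive with their dtypes and shapes. This is a sketch only: `describe_npz` is a hypothetical helper, and it makes no assumption about which keys the file contains.

```python
import numpy as np

def describe_npz(path):
    """Map each array name in an .npz archive to (dtype, shape)."""
    with np.load(path) as archive:
        return {
            name: (str(archive[name].dtype), archive[name].shape)
            for name in archive.files
        }

# Example call, using the reference file mentioned above:
# for name, (dtype, shape) in describe_npz("evaluations/VIRTUAL_imagenet512.npz").items():
#     print(name, dtype, shape)
```

If the stored statistics turn out to be float32 while the local computation runs in float64 (or the reverse), that would be one candidate explanation to rule out.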