# Error in manifold.t_sne._kl_divergence #9711

Closed · opened this issue Sep 8, 2017 · 2 comments

Contributor

### goncalo-rodrigues commented Sep 8, 2017

There seems to be an error in the calculation of `kl_divergence` in `t_sne.py` (line 140):

```
X_embedded = params.reshape(n_samples, n_components)

# Q is a heavy-tailed distribution: Student's t-distribution
n = pdist(X_embedded, "sqeuclidean")
n += 1.
n /= degrees_of_freedom
n **= (degrees_of_freedom + 1.0) / -2.0
Q = np.maximum(n / (2.0 * np.sum(n)), MACHINE_EPSILON)

# Optimization trick below: np.dot(x, y) is faster than
# np.sum(x * y) because it calls BLAS
# Objective: C (Kullback-Leibler divergence of P and Q)
kl_divergence = 2.0 * np.dot(P, np.log(P / Q))
```

According to Wikipedia (https://en.wikipedia.org/wiki/Student%27s_t-distribution), the density of Student's t-distribution is

gamma((v+1)/2) / (sqrt(v*pi) * gamma(v/2)) * (1 + (t^2)/v)^(-(v+1)/2) = constant * (1 + (t^2)/v)^(-(v+1)/2)

where v is the number of degrees of freedom and t-SNE uses the pairwise distances as the variable t. We can ignore the constant because it doesn't depend on t and is normalized away. The code above, however, computes (1/v + (t^2)/v)^(-(v+1)/2), i.e. ((1 + t^2)/v)^(-(v+1)/2), because it adds 1 before dividing by the degrees of freedom. This would be the correct version (`n += 1.` *after* dividing by the degrees of freedom):

```
X_embedded = params.reshape(n_samples, n_components)

# Q is a heavy-tailed distribution: Student's t-distribution
n = pdist(X_embedded, "sqeuclidean")
n /= degrees_of_freedom
n += 1.
n **= (degrees_of_freedom + 1.0) / -2.0
Q = np.maximum(n / (2.0 * np.sum(n)), MACHINE_EPSILON)

# Optimization trick below: np.dot(x, y) is faster than
# np.sum(x * y) because it calls BLAS
# Objective: C (Kullback-Leibler divergence of P and Q)
kl_divergence = 2.0 * np.dot(P, np.log(P / Q))
```
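The difference between the two orderings is easy to demonstrate numerically. The sketch below is standalone with made-up distances (not scikit-learn's internals) and also shows why the bug is invisible when the degrees of freedom equal 1, i.e. for `n_components == 2`:

```python
# Standalone sketch (illustrative values, not sklearn internals): compare
# the buggy and corrected Student's t kernels used to build Q.
import numpy as np

dist2 = np.array([0.5, 2.0, 4.0])  # squared pairwise distances, t^2
dof = 3.0                          # degrees of freedom v

# Buggy ordering: `n += 1` before `n /= dof` gives ((1 + t^2) / v)^(-(v+1)/2)
buggy = ((1.0 + dist2) / dof) ** (-(dof + 1.0) / 2.0)

# Corrected ordering: (1 + t^2 / v)^(-(v+1)/2), the Student's t kernel
correct = (1.0 + dist2 / dof) ** (-(dof + 1.0) / 2.0)

# The two differ whenever dof != 1 ...
assert not np.allclose(buggy, correct)

# ... but with dof == 1 (n_components == 2) they coincide, which is why
# 2-D embeddings were unaffected by the bug.
assert np.allclose(((1.0 + dist2) / 1.0) ** -1.0,
                   (1.0 + dist2 / 1.0) ** -1.0)
```

Note that the misplaced `1/v` is not merely a constant factor: it sits inside the base of the exponent, so it distorts the relative weights of the pairwise similarities before normalization.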

### goncalo-rodrigues added a commit to goncalo-rodrigues/scikit-learn that referenced this issue Sep 8, 2017

``` Bug fix in tsne kl_divergence, issue scikit-learn#9711 ```
`kl_divergence had a small bug and did not follow the correct formula for the Student's t probability distribution. Large errors would be induced for large n_components.`
``` 8609a5a ```

### goncalo-rodrigues added a commit to goncalo-rodrigues/scikit-learn that referenced this issue Sep 8, 2017

``` Merge pull request #1 from goncalo-rodrigues/tsne-kl_divergence-bug-fix ```
`Bug fix in tsne kl_divergence  issue scikit-learn#9711`
``` e2f3f12 ```
Member

### jnothman commented Sep 9, 2017

 ping @tomMoral
Contributor

### tomMoral commented Sep 9, 2017

This is indeed a bug. The same change is needed in the `_barnes_hut` implementation for the computation of the `kl_divergence`:

```
# In _barnes_hut_tsne.pyx, line 136
- qij = (((1.0 + dij) / dof) ** exponent)
+ qij = ((1.0 + dij / dof) ** exponent)
```

We should also check the gradient computations, as I assumed this formula was correct when I checked them.

Merged

### jnothman added a commit to jnothman/scikit-learn that referenced this issue Oct 9, 2017

```
FIX Error in manifold.t_sne._kl_divergence for n_components > 2 (scikit-learn#9712)

Fixes scikit-learn#9711
```
``` 5678a92 ```

```
FIX Error in manifold.t_sne._kl_divergence for n_components > 2 (scikit-learn#9712)

Fixes scikit-learn#9711
```
``` e24ccd1 ```

### jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017

```
FIX Error in manifold.t_sne._kl_divergence for n_components > 2 (scikit-learn#9712)

Fixes scikit-learn#9711
```
``` 1dde61a ```