# Visualizing Data using t-SNE
###  Journal of Machine Learning Research 9 (2008) 2579-2605 (Maaten and Hinton)

## One Line Summary
A visualization scheme for showing high-dimensional data in low dimensions (2 or 3). It improves on SNE by

- using joint probabilities instead of conditional probabilities to measure pairwise similarity between points in both the high- and low-dimensional spaces, i.e. it uses $p_{ij}$ and $q_{ij}$ instead of $p_{j|i}$ and $q_{j|i}$. This gives symmetric formulations of $p_{ij}$ and $q_{ij}$, unlike SNE, leading to simpler gradients


- for $q_{ij}$ (low-dimensional space), using a Student t-distribution instead of a Gaussian (the t-distribution is heavy tailed, which helps avoid the crowding problem)
  - What is the crowding problem?
  - In the SNE cost function, the penalty is small for point pairs with low $p_{j|i}$ and high $q_{j|i}$, i.e. points that are separated in the original space but close in the low-dimensional space. As a result, points that are apart in the original space end up crowded together in the low-dimensional space.
  Having a heavy-tailed distribution in the low-dimensional space counteracts this behavior somewhat: it does not crowd points in the low-dimensional space as much as a Gaussian does
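
To make "heavy tailed" concrete, here is a minimal sketch comparing the two unnormalized low-dimensional kernels: the Gaussian $e^{-d^2}$ used by SNE and the t-style kernel $(1+d^2)^{-1}$ used for $q_{ij}$. The function names and the sample distances are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(d):
    # Unnormalized Gaussian similarity at low-dimensional distance d.
    return np.exp(-d ** 2)

def student_t_kernel(d):
    # Unnormalized heavy-tailed similarity (1 + d^2)^-1 at distance d.
    return 1.0 / (1.0 + d ** 2)

# Illustrative distances between two map points.
distances = np.array([0.5, 1.0, 2.0, 4.0])
ratio = student_t_kernel(distances) / gaussian_kernel(distances)

for d, r in zip(distances, ratio):
    print(f"d={d:.1f}  t-kernel / gaussian = {r:.3g}")
```

The ratio grows quickly with distance, so under the t-style kernel a moderately dissimilar pair can sit much farther apart in the map while keeping the same similarity value.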


## The Math

From SNE,

$p_{j|i} = \frac{e^{-\frac{|x_i-x_j|^{2}}{2\sigma_i^{2}}}}{\sum_{k \neq i}e^{-\frac{|x_i-x_k|^{2}}{2\sigma_i^{2}}}}$


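Similarity of points in the high-dimensional space, $p_{ij}$:

$p_{ij} = \frac{p_{i|j} + p_{j|i}}{2n}$, where $n$ is the number of points
 - Note that this is symmetric, and the division by $2n$ makes $\sum_i\sum_j p_{ij} = 1$

Similarity of points in the low-dimensional space, $q_{ij}$:

$q_{ij} = \frac{(1 + |y_i-y_j|^{2})^{-1}}{\sum_k\sum_{l \neq k}(1 + |y_k-y_l|^{2})^{-1}}$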
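The definitions above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the bandwidths $\sigma_i$ are simply passed in (how they are chosen is not covered in these notes), the joint $p_{ij}$ is normalized by $2n$ with $n$ the number of points so that it sums to 1, and all function names are hypothetical.

```python
import numpy as np

def squared_distances(A):
    # Pairwise squared Euclidean distances between the rows of A.
    diff = A[:, None, :] - A[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def conditional_p(X, sigma):
    # cond[i, j] = p_{j|i}; sigma holds one bandwidth per point (assumed given).
    logits = -squared_distances(X) / (2.0 * sigma[:, None] ** 2)
    np.fill_diagonal(logits, -np.inf)  # exclude k = i from the normalizing sum
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def joint_p(X, sigma):
    # Symmetrized joint probabilities, normalized so all entries sum to 1.
    cond = conditional_p(X, sigma)
    return (cond + cond.T) / (2.0 * X.shape[0])

def joint_q(Y):
    # Heavy-tailed low-dimensional similarities with one global normalizer Z.
    D = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(D, 0.0)  # pairs with k = l are excluded
    return D / D.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))   # toy high-dimensional points
Y = rng.normal(size=(6, 2))   # toy low-dimensional map points
P = joint_p(X, sigma=np.ones(6))
Q = joint_q(Y)
print(P.sum(), Q.sum())
```

Both matrices are symmetric with a zero diagonal, which is what the gradient derivation relies on.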


The cost function

$C = KL(P||Q)$, where $P$ and $Q$ are the joint probability distributions defined above

$= \sum_i\sum_j p_{ij}\ln(\frac{p_{ij}}{q_{ij}})$
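A small sketch of this cost in NumPy, assuming `P` and `Q` are square arrays of joint probabilities. The function name `kl_cost`, the `eps` guard, and the toy matrices are illustrative assumptions; entries with $p_{ij} = 0$ are skipped because they contribute nothing to the sum.

```python
import numpy as np

def kl_cost(P, Q, eps=1e-12):
    # Sum of p_ij * ln(p_ij / q_ij) over entries with p_ij > 0.
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))

# Toy symmetric joint distributions over 3 points (zero diagonal, entries sum to 1).
P = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.0],
              [0.2, 0.0, 0.0]])
Q = np.full((3, 3), 1.0 / 6.0)
np.fill_diagonal(Q, 0.0)

cost_same = kl_cost(P, P)   # identical distributions
cost_diff = kl_cost(P, Q)   # mismatched distributions
print(cost_same, cost_diff)
```

The cost is zero when the two distributions match and positive otherwise, which is why minimizing it pulls $Q$ toward $P$.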


The partial derivative of C with respect to $y_i$ has the form below

$\frac{\partial C}{\partial y_i} = 4\sum_j(p_{ij}-q_{ij})(y_i-y_j)(1 + |y_i-y_j|^{2})^{-1}$
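One way to sanity-check the gradient formula before reading the proof is to compare it against a central finite difference of the cost. The sketch below does this on random toy data; the helper names, the random symmetric `P`, and the step size `h` are all assumptions made for illustration.

```python
import numpy as np

def low_dim_affinities(Y):
    # Returns pairwise differences y_i - y_j, D_ij = (1 + |y_i - y_j|^2)^-1, and q_ij.
    diff = Y[:, None, :] - Y[None, :, :]
    D = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(D, 0.0)
    return diff, D, D / D.sum()

def cost(P, Y):
    # KL divergence between joint P and the Q induced by Y.
    _, _, Q = low_dim_affinities(Y)
    mask = P > 0
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))

def gradient(P, Y):
    # 4 * sum_j (p_ij - q_ij) * D_ij * (y_i - y_j), one row per point i.
    diff, D, Q = low_dim_affinities(Y)
    W = (P - Q) * D
    return 4.0 * np.sum(W[:, :, None] * diff, axis=1)

rng = np.random.default_rng(0)
n = 5
A = rng.random((n, n))
P = A + A.T                 # random symmetric toy P
np.fill_diagonal(P, 0.0)
P /= P.sum()                # entries sum to 1
Y = rng.normal(size=(n, 2))

analytic = gradient(P, Y)
numeric = np.zeros_like(Y)
h = 1e-6
for i in range(n):
    for d in range(Y.shape[1]):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, d] += h
        Ym[i, d] -= h
        numeric[i, d] = (cost(P, Yp) - cost(P, Ym)) / (2 * h)

max_error = np.max(np.abs(analytic - numeric))
print(max_error)
```

The check only holds when `P` is symmetric and sums to 1, which are the same two properties the proof uses.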


Proof -
Change the index notation from i,j to k,l to make the derivative with respect to $y_{i}$ easier

$C = \sum_k\sum_l p_{kl}\ln(p_{kl}) - \sum_k\sum_l p_{kl}\ln(q_{kl})$

Let $q_{kl} = \frac{D_{kl}}{Z}$, where $D_{kl} = (1+|y_k-y_l|^{2})^{-1}$ and
$Z = \sum_k\sum_{l \neq k}(1 + |y_k-y_l|^{2})^{-1}$

Note that Z is a single normalizing constant, the same for every pair (k, l)

Therefore, $C = C1 - \sum_k\sum_l p_{kl}\ln(D_{kl}) + \sum_k\sum_l p_{kl}\ln(Z)$

Where C1 $= \sum_k\sum_l p_{kl}\ln(p_{kl})$ is a function of $p_{kl}$ only, independent of $y_{i}$

Term 1 = C1

Term 2 = -$\sum_k\sum_lp_{kl}ln(D_{kl})$

Term 3 = $\sum_k\sum_lp_{kl}ln(Z)$   = $ln(Z)$ as $ln(Z)$ is independent of k and l, and $\sum_k\sum_lp_{kl}$ = 1

taking partial derivatives,

$\frac{\partial Term1}{\partial y_i}$ = $\frac{\partial C1}{\partial y_i}$ = 0  as C1 is independent of $y_{i}$


$\frac{\partial Term3}{\partial y_i}$ = $\frac{1}{Z}\frac{\partial Z}{\partial y_{i}}$



$Z = \sum_k\sum_{l \neq k}(1 + |y_k-y_l|^{2})^{-1}$

Note that this formulation of Z is symmetric in k and l

The only terms that depend on $y_{i}$, and so survive partial differentiation, are

$2\sum_{j \neq i}(1 + |y_i-y_j|^{2})^{-1}$

$= 2\sum_{j \neq i}D_{ij}$

Therefore, $\frac{\partial Z}{\partial y_{i}} = 2\sum_{j \neq i}\frac{\partial D_{ij}}{\partial y_i}$


$D_{ij} = (1 + |y_i-y_j|^{2})^{-1}$

$\frac{\partial D_{ij}}{\partial y_i} = -D_{ij}^{2}\cdot 2(y_i-y_j)$
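The derivative of $D_{ij}$ above follows from the chain rule. A short worked version, writing $u = 1 + |y_i-y_j|^{2}$ (the symbol $u$ is introduced here only for illustration):

$$
\frac{\partial D_{ij}}{\partial y_i}
= \frac{\partial u^{-1}}{\partial u}\cdot\frac{\partial u}{\partial y_i}
= -u^{-2}\cdot 2(y_i-y_j)
= -2D_{ij}^{2}(y_i-y_j)
$$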


Therefore,


$\frac{\partial Term3}{\partial y_i} = \frac{1}{Z}\cdot 2\sum_{j \neq i}(-2(y_{i}-y_{j})D_{ij}^2)$

Taking $\frac{1}{Z}$ inside the summation and using $q_{ij} = \frac{D_{ij}}{Z}$,

$\frac{\partial Term3}{\partial y_i} = -4\sum_{j \neq i}(y_i-y_j)D_{ij}q_{ij}$
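The Term 3 result can be checked in isolation, since Term 3 is just $\ln(Z)$. The sketch below compares the closed form above with a central finite difference of $\ln(Z)$ on random toy points; the helper names and step size are illustrative assumptions.

```python
import numpy as np

def kernel_terms(Y):
    # Pairwise differences y_i - y_j and D_ij = (1 + |y_i - y_j|^2)^-1 (zero diagonal).
    diff = Y[:, None, :] - Y[None, :, :]
    D = 1.0 / (1.0 + np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(D, 0.0)
    return diff, D

def log_Z(Y):
    # Term 3 of the cost: ln(Z), with Z the sum of D_kl over all pairs k != l.
    _, D = kernel_terms(Y)
    return np.log(D.sum())

def term3_gradient(Y):
    # Closed form: -4 * sum_j (y_i - y_j) * D_ij * q_ij.
    diff, D = kernel_terms(Y)
    Q = D / D.sum()
    return -4.0 * np.sum((Q * D)[:, :, None] * diff, axis=1)

rng = np.random.default_rng(1)
Y = rng.normal(size=(4, 2))
h = 1e-6
numeric = np.zeros_like(Y)
for i in range(Y.shape[0]):
    for d in range(Y.shape[1]):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[i, d] += h
        Ym[i, d] -= h
        numeric[i, d] = (log_Z(Yp) - log_Z(Ym)) / (2 * h)

term3_error = np.max(np.abs(term3_gradient(Y) - numeric))
print(term3_error)
```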




Finally coming to term 2,

Term 2 = -$\sum_k\sum_lp_{kl}ln(D_{kl})$

The only terms that depend on $y_{i}$ are

$-2\sum_{j \neq i}p_{ij}\ln(D_{ij})$, where the 2 comes from symmetry ($p_{ij}\ln(D_{ij}) = p_{ji}\ln(D_{ji})$, so the pairs (i, j) and (j, i) contribute equally)

Taking the partial derivative of Term 2 with respect to $y_{i}$,

$\frac{\partial Term2}{\partial y_i} = -2\sum_{j \neq i}\frac{p_{ij}}{D_{ij}}\frac{\partial D_{ij}}{\partial y_i}$

Plugging in the form of $\frac{\partial D_{ij}}{\partial y_i}$ from earlier, we get

$\frac{\partial Term2}{\partial y_i} = -2\sum_{j \neq i}\frac{p_{ij}}{D_{ij}}\cdot(-2)D_{ij}^2(y_{i}-y_{j})$

$= 4\sum_{j \neq i}p_{ij}D_{ij}(y_{i}-y_{j})$

Adding the 3 derivatives, we get the expected answer


$\frac{\partial C}{\partial y_i} = 4\sum_j D_{ij}(y_{i}-y_{j})(p_{ij}-q_{ij})$
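Spelled out term by term (a restatement of the steps above, not an additional result), the addition is:

$$
\frac{\partial C}{\partial y_i}
= \underbrace{0}_{\text{Term 1}}
+ \underbrace{4\sum_{j \neq i}p_{ij}D_{ij}(y_i-y_j)}_{\text{Term 2}}
- \underbrace{4\sum_{j \neq i}q_{ij}D_{ij}(y_i-y_j)}_{\text{Term 3}}
= 4\sum_{j \neq i}(p_{ij}-q_{ij})D_{ij}(y_i-y_j)
$$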

