#### STOCHASTIC OPTIMIZATION OF SORTING NETWORKS VIA CONTINUOUS RELAXATIONS

This paper deals with the problem of sorting objects, which arises in many machine learning pipelines, for instance top-k multi-class classification, ranking documents for information retrieval, and multi-object target tracking in computer vision. Solving these problems typically requires learning informative representations of complex high-dimensional data, such as images, before sorting and any subsequent downstream processing.

However, when sorting is part of such a pipeline, the model cannot be optimized end to end with gradient-based methods, because the sorting operator is not differentiable with respect to its inputs. The goal of this paper is to propose a continuous relaxation of the sort operator that is differentiable almost everywhere with respect to the inputs. This proposed method is $\textbf{NeuralSort}$. This report covers the scientific aspects of NeuralSort and is organized as follows:

- $\textbf{A summary of the NeuralSort method}$;
- $\textbf{An application of this method to data}$.
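The claim that sorting blocks gradient-based training can be checked numerically. The NumPy sketch below builds the hard permutation matrix $P_{sort(s)}$ (the helper name `sort_matrix` is our own, hypothetical choice) and shows that it is piecewise constant in $s$: its finite-difference derivative is zero, and it jumps when two scores swap order. Gradients through it are therefore either zero or undefined.

```python
import numpy as np

def sort_matrix(s):
    """Hard permutation matrix P_sort(s): row i has a 1 at the index of the
    i-th largest score (ties broken by original order via a stable sort)."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    P = np.zeros((n, n))
    P[np.arange(n), np.argsort(-s, kind="stable")] = 1.0
    return P

s = np.array([9.0, 10.0, 5.0, 2.0])
eps = 1e-4

# Central finite difference of P_sort(s) w.r.t. s_0: it is zero, because a
# small perturbation does not change the ordering.
e0 = np.array([1.0, 0.0, 0.0, 0.0])
fd = (sort_matrix(s + eps * e0) - sort_matrix(s - eps * e0)) / (2 * eps)
print(np.abs(fd).max())  # 0.0

# Moving s_0 past s_1 = 10 makes the first row of P_sort(s) jump.
print(sort_matrix([9.0, 10.0, 5.0, 2.0])[0])   # [0. 1. 0. 0.]
print(sort_matrix([10.5, 10.0, 5.0, 2.0])[0])  # [1. 0. 0. 0.]
```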

#### NeuralSort method

$\textbf{How to understand it}$: in the sorting problem, the output can be viewed as a permutation matrix, i.e. a square matrix with entries in $\{0,1\}$ such that every row and every column sums to 1. NeuralSort instead works with a larger class of matrices called unimodal row-stochastic matrices: square matrices with non-negative real entries, where each row sums to 1 and has a unique arg max, and the row-wise arg max indices are all distinct (they form a permutation). Every permutation matrix is a unimodal row-stochastic matrix, but not conversely: a unimodal row-stochastic matrix can have fractional entries, and its columns need not sum to 1.
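The definition can be turned into a direct check. The sketch below is our own illustration (the function name `is_unimodal_row_stochastic` and the tolerance `atol` are hypothetical); following the NeuralSort paper, entries are required to be non-negative rather than strictly positive, since permutation matrices contain zeros.

```python
import numpy as np

def is_unimodal_row_stochastic(U, atol=1e-8):
    """Return True if U is an n x n unimodal row-stochastic matrix."""
    U = np.asarray(U, dtype=float)
    if U.ndim != 2 or U.shape[0] != U.shape[1]:
        return False
    n = U.shape[0]
    # (a) non-negativity (>= 0, so that permutation matrices qualify)
    if np.any(U < -atol):
        return False
    # (b) row affinity: every row sums to 1
    if not np.allclose(U.sum(axis=1), 1.0, atol=atol):
        return False
    # each row must have a single, unambiguous maximum
    row_max = U.max(axis=1, keepdims=True)
    if np.any((U == row_max).sum(axis=1) != 1):
        return False
    # (c) the row-wise arg max indices are all distinct, i.e. a permutation
    return len(set(U.argmax(axis=1).tolist())) == n

P = np.eye(3)[[1, 0, 2]]               # a permutation matrix
U = np.array([[0.1, 0.7, 0.2],
              [0.6, 0.3, 0.1],
              [0.2, 0.2, 0.6]])        # fractional, but unimodal
print(is_unimodal_row_stochastic(P))   # True
print(is_unimodal_row_stochastic(U))   # True
print(U.sum(axis=0))                   # column sums are not all 1
print(is_unimodal_row_stochastic([[0.9, 0.1], [0.8, 0.2]]))  # False: both rows peak at column 0
```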


$\textbf{How is NeuralSort trained?}$: the goal is to optimize training objectives that involve a sort operator with gradient-based methods.

The problem can be written in the following form:

$\mathcal{L}(\theta, s) = f(P_z; \theta)$ where $z = sort(s)$

Here, 

- $s\in \mathbb{R}^n$ denotes a vector of $n$ real-valued scores. In the stochastic setting, the scores (then assumed positive) parameterize a Plackett-Luce distribution over permutations, whose probability mass function for any $z \in \mathcal{Z}_n$ (the set of all permutations of $\{1, \dots, n\}$) is given by:
$q(z|s)=\dfrac{s_{z_1}}{Z} \dfrac{s_{z_2}}{Z-s_{z_1}}\cdots \dfrac{s_{z_n}}{Z-\sum_{i=1}^{n-1}s_{z_i}}$, where $Z=\sum_{i=1}^{n}s_i$ is the normalization constant.

- $z$ is the permutation that (deterministically) sorts the scores $s$. Every permutation $z$ is associated with a permutation matrix $P_z \in \{0,1\}^{n \times n}$ with $P_z[i,j]=\mathbb{1}(j=z_i)$.
$\textbf{Example}$: let $s = [9, 10, 5, 2]^T$; then $sort(s) = [2, 1, 3, 4]^T$, since the largest element is at the second index, the second largest element is at the first index, and so on. In case of ties, elements are assigned indices in the order they appear. The sorted vector is then simply $P_{sort(s)}\,s$.

- $f(\cdot)$ is an arbitrary function of interest, assumed to be differentiable w.r.t. a set of parameters $\theta$ and $z$.
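The definitions in the list above can be checked on the running example $s = [9, 10, 5, 2]^T$. The NumPy sketch below uses helper names of our own choosing (all hypothetical). It computes $z = sort(s)$ with a stable descending sort (so ties keep their original order), builds $P_z$, evaluates the Plackett-Luce log-probability, where the $k$-th denominator is $Z - \sum_{i<k} s_{z_i}$, and draws a Plackett-Luce sample by sorting Gumbel-perturbed log-scores, a standard reparameterization of this distribution.

```python
import numpy as np
from itertools import permutations

def sort_permutation(s):
    """z = sort(s), 1-indexed: z_1 is the index of the largest score.
    A stable sort keeps tied elements in their original order."""
    s = np.asarray(s, dtype=float)
    return np.argsort(-s, kind="stable") + 1

def permutation_matrix(z):
    """P_z[i, j] = 1 if j == z_i (z is 1-indexed), else 0."""
    z = np.asarray(z)
    n = len(z)
    P = np.zeros((n, n))
    P[np.arange(n), z - 1] = 1.0
    return P

def plackett_luce_log_prob(z, s):
    """log q(z | s) under Plackett-Luce with positive scores s."""
    s = np.asarray(s, dtype=float)
    ordered = s[np.asarray(z) - 1]              # s_{z_1}, ..., s_{z_n}
    # k-th denominator: Z minus the scores already placed
    remaining = np.cumsum(ordered[::-1])[::-1]  # Z, Z - s_{z_1}, ..., s_{z_n}
    return float(np.sum(np.log(ordered) - np.log(remaining)))

def sample_plackett_luce(s, rng):
    """Draw z ~ PL(s) by sorting Gumbel-perturbed log-scores."""
    s = np.asarray(s, dtype=float)
    return sort_permutation(np.log(s) + rng.gumbel(size=s.shape))

s = np.array([9.0, 10.0, 5.0, 2.0])
z = sort_permutation(s)
print(z)                                  # [2 1 3 4]
print(permutation_matrix(z) @ s)          # sorted scores 10, 9, 5, 2
p = np.exp(plackett_luce_log_prob(z, s))  # 10/26 * 9/16 * 5/7 * 2/2, about 0.1545
total = sum(np.exp(plackett_luce_log_prob(perm, s))
            for perm in permutations(range(1, 5)))
print(total)                              # about 1.0 over all 24 permutations
rng = np.random.default_rng(0)
print(sample_plackett_luce(s, rng))       # a random permutation of 1..4
```

Because the most likely permutation under Plackett-Luce is exactly $sort(s)$, sampling with many draws should return $[2, 1, 3, 4]$ with frequency close to $p$.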

Since the sort operation is not differentiable, the authors derive a relaxation of the sort operator that leads to a surrogate objective with well-defined gradients. In particular, the relaxation replaces the permutation matrix $P_z$ in the objective above with an approximation $\hat{P}_z$ such that the surrogate objective $f(\hat{P}_z; \theta)$ is differentiable w.r.t. the scores $s$.
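For concreteness, the relaxation given in the NeuralSort paper defines, for a temperature $\tau > 0$, row $i$ of $\hat{P}_{sort(s)}$ as $\text{softmax}\left[((n+1-2i)s - A_s\mathbb{1})/\tau\right]$, where $A_s[j,k] = |s_j - s_k|$ and $\mathbb{1}$ is the all-ones vector; as $\tau \to 0^+$ it converges to $P_{sort(s)}$ when there are no ties. The NumPy sketch below implements that formula (the names `neuralsort` and `hard_sort_matrix` are our own, not the authors' code), checks the convergence on the running example, and uses a finite difference to show that $\hat{P}$ responds smoothly to $s$. In an autodiff framework such as PyTorch the same expression would be differentiated automatically; it is non-differentiable only where two scores tie, because of $|\cdot|$.

```python
import numpy as np

def neuralsort(s, tau=1.0):
    """Relaxed sort matrix P_hat_{sort(s)}(tau), descending order.

    Row i (1-indexed) is softmax(((n + 1 - 2i) * s - A_s @ 1) / tau),
    with A_s[j, k] = |s_j - s_k|.
    """
    s = np.asarray(s, dtype=float)
    n = s.shape[0]
    A_s = np.abs(s[:, None] - s[None, :])        # pairwise |s_j - s_k|
    B = A_s.sum(axis=1)                          # A_s @ 1, shape (n,)
    scaling = n + 1 - 2 * np.arange(1, n + 1)    # (n + 1 - 2i), i = 1..n
    logits = (scaling[:, None] * s[None, :] - B[None, :]) / tau
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    P_hat = np.exp(logits)
    return P_hat / P_hat.sum(axis=1, keepdims=True)

def hard_sort_matrix(s):
    """Exact P_sort(s), for comparison (stable descending sort)."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    P = np.zeros((n, n))
    P[np.arange(n), np.argsort(-s, kind="stable")] = 1.0
    return P

s = np.array([9.0, 10.0, 5.0, 2.0])
P = hard_sort_matrix(s)
for tau in (1.0, 0.1, 0.01):
    # largest entry-wise gap to the exact permutation matrix shrinks with tau
    print(tau, np.abs(neuralsort(s, tau) - P).max())
print(neuralsort(s, 0.01) @ s)  # approximately the sorted scores 10, 9, 5, 2

# Unlike P_sort(s), the relaxation responds to a small change in s_0.
eps = 1e-4
e0 = np.array([1.0, 0.0, 0.0, 0.0])
fd = (neuralsort(s + eps * e0) - neuralsort(s - eps * e0)) / (2 * eps)
print(np.abs(fd).max() > 0)  # True
```

On this example every row of $\hat{P}$ already sums to 1 and its row-wise arg max recovers $sort(s) = [2, 1, 3, 4]^T$ at $\tau = 1$, so $\hat{P}$ is a unimodal row-stochastic matrix; lowering $\tau$ trades smoother gradients for a closer match to the hard permutation.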



#### Our implementation

```
# baseline ResNet model (run_baseline.py) on CIFAR-10
!python run_baseline.py --k=9 --tau=64 --nloglr=3 --method=deterministic --dataset=cifar10
```

```
Files already downloaded and verified
Namespace(k=9, tau=64.0, nloglr=3.0, method='deterministic', resume=False, dataset='cifar10')
Beginning epoch 0:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -1.3696844577789307
val 0.47660000690817833
Saving...
Beginning epoch 1:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -1.1611714363098145
val 0.5814000078439713
Saving...
Beginning epoch 2:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -1.0585336685180664
val 0.608400007724762
Saving...
Beginning epoch 3:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -1.1423571109771729
val 0.7014000089168548
Saving...
Beginning epoch 4:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -0.7420566082000732
val 0.7208000096082687
Saving...
Beginning epoch 5:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -0.5786139965057373
val 0.7256000094413757
Saving...
Beginning epoch 6:  baseline-resnet-cifar10-deterministic-k9-t640-b3
train -0.7228672504425049
val 0
```

```
# baseline ResNet model (run_baseline.py) on CIFAR-100
!python run_baseline.py --k=9 --tau=64 --nloglr=3 --method=deterministic --dataset=cifar100
```

```
Files already downloaded and verified
Namespace(k=9, tau=64.0, nloglr=3.0, method='deterministic', resume=False, dataset='cifar100')
Beginning epoch 0:  baseline-resnet-cifar100-deterministic-k9-t640-b3
^C
Traceback (most recent call last):
  File "/home/onyxia/work/automatic-differentiation/Neuralsort/run_baseline.py", line 138, in <module>
    train(t)
  File "/home/onyxia/work/automatic-differentiation/Neuralsort/run_baseline.py", line 86, in train
    x = x.to(device=gpu)
KeyboardInterrupt
```


```
# differentiable kNN with NeuralSort (run_dknn.py, deterministic relaxation) on CIFAR-10
!python run_dknn.py --k=9 --tau=64 --nloglr=3 --method=deterministic --dataset=cifar10
```

```
Files already downloaded and verified
Namespace(k=9, tau=64.0, nloglr=3.0, method='deterministic', resume=False, dataset='cifar10', num_train_queries=100, num_test_queries=10, num_train_neighbors=100, num_samples=5, num_epochs=200)
Beginning epoch 0:  dknn-resnet-cifar10-deterministic-k9-t6400-b3
^C
```


```
# differentiable kNN with NeuralSort (run_dknn.py, deterministic relaxation) on CIFAR-100
!python run_dknn.py --k=9 --tau=64 --nloglr=3 --method=deterministic --dataset=cifar100
```

```
Files already downloaded and verified
Namespace(k=9, tau=64.0, nloglr=3.0, method='deterministic', resume=False, dataset='cifar100', num_train_queries=100, num_test_queries=10, num_train_neighbors=100, num_samples=5, num_epochs=200)
Traceback (most recent call last):
  File "/home/onyxia/work/automatic-differentiation/Neuralsort/run_dknn.py", line 95, in <module>
    h_phi=preactresnet18().to(gpu)
NameError: name 'preactresnet18' is not defined. Did you mean: 'PreActResNet18'?
```
