# Mallows Hamming
### A Python package for the Mallows Model using Hamming distance with complete rankings
&emsp;&emsp; **By Ahmed Boujaada, Fabien Collas and Ekhine Irurozki**

We present methods for working with the Mallows Model (MM), the best-known distribution over permutations. Before reading on, please see the preliminaries in [this notebook](https://github.com/ekhiru/top-k-mallows/blob/master/Permutations.ipynb).

In [1]:
import numpy as np
import mallows_hamming as mh

## Hamming Distance

In [2]:
perm1 = np.array([1, 2, 0, 3, 4])
perm2 = np.array([0, 2, 1, 3, 4])

In [3]:
mh.distance(perm1, perm2)

2

Let $\sigma$ and $\pi$ be two permutations. The Hamming distance between $\sigma$ and $\pi$ is the number of positions at which they differ. It is given by

$$d(\sigma, \pi) = \sum^n_{i=1} \mathbb{I}[\sigma(i)\neq\pi(i)]$$

where $\mathbb{I}[\cdot]$ is the indicator function and $n$ is the length of the permutations. The Hamming distance between two permutations can be computed as follows:

In [4]:
perm1 = np.array([3, 1, 2, 0, 4])
perm2 = np.array([3, 1, 4, 2, 0])

In [5]:
mh.distance(perm1, perm2)

3

If only one permutation is given as input, the second permutation is assumed to be the identity permutation $e$. The package indexes items from 0, so here $e = (0, 1, \dotsc, n-1)$.

In [6]:
mh.distance(perm1)

2
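The definition above translates almost directly into code. The following is a minimal pure-Python sketch, not the package's implementation; the name `hamming_distance` is hypothetical, and the 0-based identity default is an assumption taken from the examples in this notebook.

```python
def hamming_distance(sigma, pi=None):
    """Count the positions at which two equal-length permutations differ."""
    if pi is None:
        # Assumption: 0-based identity permutation, as in the examples above.
        pi = range(len(sigma))
    if len(sigma) != len(pi):
        raise ValueError("permutations must have the same length")
    # One indicator term per position, summed over all positions.
    return sum(int(a != b) for a, b in zip(sigma, pi))


print(hamming_distance([3, 1, 2, 0, 4], [3, 1, 4, 2, 0]))  # 3
print(hamming_distance([3, 1, 2, 0, 4]))                   # 2
```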

The maximum Hamming distance between two permutations of length $n$ is $n$ itself.

In [7]:
n = 5

In [8]:
mh.dist_at_uniform(n)

5

The package can also sample a permutation at a given Hamming distance from a reference permutation, drawn uniformly at random: every permutation at that distance is equally likely to be generated.

In [9]:
dist = 2

In [10]:
mh.sample_at_dist(n, dist, sigma0=[4, 1, 2, 3, 0])

array([4., 1., 3., 2., 0.])
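One way to obtain such a uniform draw is to pick the `d` positions that will change uniformly at random, then rearrange the items at those positions with a uniformly random derangement (a permutation with no fixed point). The sketch below follows that idea using rejection sampling for the derangement. It is an illustration under these assumptions, not necessarily how `mh.sample_at_dist` works internally, and `sample_at_distance_sketch` is a hypothetical name.

```python
import random


def sample_at_distance_sketch(n, d, sigma0=None, rng=random):
    """Draw a permutation uniformly among those at Hamming distance d from sigma0."""
    if d == 1 or not 0 <= d <= n:
        raise ValueError("d must be in {0, 2, 3, ..., n}")
    sigma0 = list(range(n)) if sigma0 is None else list(sigma0)
    # Step 1: choose which d positions will differ from sigma0.
    positions = rng.sample(range(n), d)
    # Step 2: rejection-sample a derangement of those positions.
    targets = positions[:]
    while any(p == t for p, t in zip(positions, targets)):
        rng.shuffle(targets)
    # Step 3: move the item at each chosen position to its new position.
    sigma = sigma0[:]
    for p, t in zip(positions, targets):
        sigma[t] = sigma0[p]
    return sigma


print(sample_at_distance_sketch(5, 2, sigma0=[4, 1, 2, 3, 0]))
```

Each permutation at distance `d` corresponds to exactly one pair (set of positions, derangement), which is why the two uniform choices together give a uniform draw.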

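*Remark:* The Hamming distance between two permutations of length $n$ takes values in $\{0, 2, 3, \dotsc, n\}$. A distance of 1 is impossible, because two permutations cannot differ in exactly one position.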

## Mallows Model (MM) for complete permutations

The probability mass function of a Mallows Model with central permutation $\sigma_0$ and dispersion parameter $\theta$ can be computed using the following function:

In [11]:
sigma = np.array([3,1,2,0,4])
sigma_0 = np.array(range(5))
theta = 0.1

mh.prob(sigma, sigma_0, theta)

0.010125860730934329
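For intuition, the probability can be reproduced from first principles, assuming the standard Mallows form $p(\sigma) = \exp(-\theta\, d(\sigma, \sigma_0)) / \psi(\theta)$. Under the Hamming distance, the number of permutations at distance $d$ is $\binom{n}{d}$ times the number of derangements of $d$ items, which gives the normalization constant $\psi(\theta)$ as a sum of $n + 1$ terms. The function names below are hypothetical and the code is a sketch, not the package's implementation.

```python
import math


def derangements(k):
    """Number of permutations of k items with no fixed point."""
    a, b = 1, 0  # D(0) = 1, D(1) = 0
    for i in range(2, k + 1):
        a, b = b, (i - 1) * (a + b)  # D(i) = (i - 1) * (D(i-1) + D(i-2))
    return a if k == 0 else b


def normalization(n, theta):
    """psi(theta): sum of exp(-theta * d) over all n! permutations, grouped by d."""
    return sum(
        math.comb(n, d) * derangements(d) * math.exp(-theta * d)
        for d in range(n + 1)
    )


def mallows_hamming_pmf(sigma, sigma0, theta):
    """Probability of sigma under a Hamming Mallows Model centered at sigma0."""
    d = sum(int(a != b) for a, b in zip(sigma, sigma0))
    return math.exp(-theta * d) / normalization(len(sigma), theta)


print(mallows_hamming_pmf([3, 1, 2, 0, 4], [0, 1, 2, 3, 4], 0.1))
```

With the inputs of the cell above, this prints a value of about 0.0101259, in line with the package's output.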

**Sampling** The package includes a sampler based on the factorization of the Hamming distance.

In [12]:
mh.sample(m=4, n=5, theta=1.5)

array([[0., 1., 2., 3., 4.],
       [4., 1., 2., 3., 0.],
       [4., 1., 2., 3., 0.],
       [0., 1., 2., 3., 4.]])

Note that the sampling functions use the identity permutation, $\sigma_0 = e$, as the central permutation by default, but any other central permutation can be passed as a parameter.  
In practice, we can draw a sample from an MM with a chosen central permutation as follows:

In [13]:
mh.sample(m=4, n=5, theta=0.1, s0=np.array([4,3,2,1,0]))

array([[3., 0., 1., 4., 2.],
       [0., 4., 2., 1., 3.],
       [2., 0., 1., 3., 4.],
       [1., 0., 2., 3., 4.]])
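A simple way to sample from this model, shown here only as an illustration, is a two-stage scheme: first draw the distance $d$ with probability proportional to the number of permutations at that distance times $\exp(-\theta d)$, then draw a permutation uniformly at that distance from the central permutation. This is an assumption about what a distance-based sampler can look like, not a description of the package's internals, and `sample_mallows_hamming_sketch` is a hypothetical name.

```python
import math
import random


def derangements(k):
    """Number of permutations of k items with no fixed point."""
    a, b = 1, 0
    for i in range(2, k + 1):
        a, b = b, (i - 1) * (a + b)
    return a if k == 0 else b


def sample_mallows_hamming_sketch(m, n, theta, s0=None, seed=None):
    """Draw m permutations of length n from a Hamming Mallows Model (sketch)."""
    rng = random.Random(seed)
    s0 = list(range(n)) if s0 is None else list(s0)
    # Unnormalized probability of each distance d = 0..n (zero for d = 1).
    weights = [
        math.comb(n, d) * derangements(d) * math.exp(-theta * d)
        for d in range(n + 1)
    ]
    samples = []
    for _ in range(m):
        d = rng.choices(range(n + 1), weights=weights)[0]
        # Uniform draw at distance d: random positions, random derangement.
        positions = rng.sample(range(n), d)
        targets = positions[:]
        while any(p == t for p, t in zip(positions, targets)):
            rng.shuffle(targets)
        sigma = s0[:]
        for p, t in zip(positions, targets):
            sigma[t] = s0[p]
        samples.append(sigma)
    return samples


for row in sample_mallows_hamming_sketch(m=4, n=5, theta=0.1, s0=[4, 3, 2, 1, 0], seed=0):
    print(row)
```

A large `theta` concentrates the samples on the central permutation, while `theta = 0` gives the uniform distribution over all permutations.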

Most functions in this package also accept the parameter `phi` in place of `theta`; in the usual Mallows parameterization the two are related by $\phi = e^{-\theta}$. Sampling with `phi`, for example, is done as follows:

In [14]:
mh.sample(m=4, n=5, phi=0.5)

array([[0., 1., 2., 3., 4.],
       [0., 3., 2., 1., 4.],
       [0., 2., 3., 1., 4.],
       [1., 0., 2., 3., 4.]])

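**Expected distance** The expected value of the Hamming distance under the MM is given by:
$$\mathbb{E}[D] = \frac{\sum_{d=0}^{n} d \binom{n}{d} S(d)\, \exp(-\theta d)}{\sum_{d=0}^{n} \binom{n}{d} S(d)\, \exp(-\theta d)}$$
where $S(d)$ is the number of derangements of $d$ items, so that $\binom{n}{d} S(d)$ counts the permutations at Hamming distance $d$ from $\sigma_0$.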

In [15]:
theta_mm = 0.7
expected_dist = mh.expected_dist_mm(n, theta_mm)
expected_dist

2.9927710045244083
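For small $n$, the expected distance can be cross-checked by brute force: enumerate all $n!$ permutations, weight each by $\exp(-\theta d)$, and take the weighted average of the distances. The sketch below assumes the standard Mallows weights and the identity as central permutation (the expectation does not depend on the center); `expected_distance_brute_force` is a hypothetical name.

```python
import math
from itertools import permutations


def expected_distance_brute_force(n, theta):
    """Expected Hamming distance under the MM, by full enumeration (small n only)."""
    identity = tuple(range(n))
    total_weight = 0.0
    weighted_distance = 0.0
    for sigma in permutations(range(n)):
        d = sum(a != b for a, b in zip(sigma, identity))
        w = math.exp(-theta * d)  # unnormalized Mallows probability
        total_weight += w
        weighted_distance += d * w
    return weighted_distance / total_weight


print(expected_distance_brute_force(5, 0.7))
```

For `n = 5` and `theta = 0.7` this gives about 2.99277, in agreement with `mh.expected_dist_mm`. The enumeration costs $O(n \cdot n!)$, so it is only useful as a sanity check.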

**Learning** The central permutation can be estimated from a sample with the Hungarian algorithm:

In [16]:
sample_mm = mh.sample(m=40, n=5, phi=0.3, s0=[3, 2, 1, 0, 4])

In [17]:
mh.median(sample_mm)

array([3, 2, 1, 0, 4])
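The estimation can be viewed as an assignment problem: count how often each item appears at each position in the sample, then choose the permutation that agrees with the sample in as many (position, item) pairs as possible, which is the same as minimizing the summed Hamming distance to the sample. The sketch below solves that problem by brute force over all permutations instead of the Hungarian algorithm, so it is only practical for small $n$; `median_brute_force` is a hypothetical name and this is not the package's implementation.

```python
from itertools import permutations


def median_brute_force(sample):
    """Permutation minimizing the summed Hamming distance to the sample (small n only)."""
    n = len(sample[0])
    # freq[i][j] = number of sampled permutations with item j at position i.
    freq = [[0] * n for _ in range(n)]
    for perm in sample:
        for i, j in enumerate(perm):
            freq[i][int(j)] += 1
    # Maximizing total agreement is equivalent to minimizing total distance.
    best = max(
        permutations(range(n)),
        key=lambda cand: sum(freq[i][cand[i]] for i in range(n)),
    )
    return list(best)


sample = [
    [3, 2, 1, 0, 4],
    [3, 2, 1, 0, 4],
    [3, 2, 0, 1, 4],
    [4, 2, 1, 0, 3],
]
print(median_brute_force(sample))  # [3, 2, 1, 0, 4]
```

The Hungarian algorithm solves the same assignment problem on the matrix `freq` in polynomial time, which is what makes this estimator usable for larger $n$.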