# Divide and Conquer: Counting Inversions

### Divide and Conquer

* Break the problem into disjoint sub-problems
* Combine these sub-problem solutions efficiently

**Examples**
* Merge sort
  - Split into left and right half and sort each half separately
  - Merge the sorted halves
* Quicksort
  - Re-arrange into lower and upper partitions, sort each partition separately
  - Place pivot between sorted lower and upper partitions

### Recommender systems

* Online services recommend items to you
* Compare your profile with other customers
* Identify people who share your likes and dislikes
* Recommend items that they like
* Comparing profiles: How similar are your rankings to those of others?

**Comparing rankings**
* You and your friend rank $5$ movies $\{A,B,C,D,E\}$
  - Your ranking: $D,B,C,A,E$
  - Your friend's ranking: $B,A,C,D,E$
* How to measure how similar these rankings are?
* For each pair of movies, compare preferences
  - You rank $B$ above $C$, so does your friend
  - You rank $D$ above $B$, your friend ranks $B$ above $D$

### Compare based on inversions

**Inversions**
* Pair of movies ranked in opposite order
  - You rank $D$ above $B$, your friend ranks $B$ above $D$
* No inversion $\implies$ rankings identical
* Every pair inverted $\implies$ maximally dissimilar
* Number of inversions ranges from $0$ to $n(n - 1)/2$ $\rightarrow$ measure of dissimilarity

**Permutations**
* Fix the order of one ranking as a sorted sequence $1, 2, ..., n$
* The other ranking is a permutation of $1, 2, ..., n$
* An inversion is a pair $(i, j), i \lt j$, where $j$ appears before $i$
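
The relabelling step above can be sketched in a few lines. This is a minimal illustration, assuming both rankings contain exactly the same items; the function name `ranking_to_permutation` is a hypothetical helper, not part of the original notes.

```python
def ranking_to_permutation(reference, other):
    """Relabel `other` using 1-based positions in `reference`."""
    # The first item of the reference ranking gets label 1, the second 2, ...
    label = {item: pos for pos, item in enumerate(reference, start=1)}
    # Rewrite the other ranking in terms of those labels.
    return [label[item] for item in other]


yours = ["D", "B", "C", "A", "E"]
friend = ["B", "A", "C", "D", "E"]
print(ranking_to_permutation(yours, friend))  # [2, 4, 3, 1, 5]
```

With your own ranking as the reference, it always maps to $1, 2, ..., n$, so only the friend's relabelled ranking needs to be examined.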

### Counting inversions

* Number of inversions ranges from $0$ to $n(n - 1)/2$ $\rightarrow$ measure of dissimilarity
* Your ranking: D, B, C, A, E
  - D = 1, B = 2, C = 3, A = 4, E = 5
* Your friend's ranking: B, A, C, D, E
  - 2, 4, 3, 1, 5
* Inversions in 2, 4, 3, 1, 5?
  - (1, 2), (1, 3), (1, 4), (3, 4), so 4 inversions in total

**Graphically**
* Write the 2 permutations as 2 rows of nodes
* Connect each value $j$ in one row to the same value $j$ in the other row

![Graph](https://firebasestorage.googleapis.com/v0/b/fb-sandbox-25.appspot.com/o/W8L1_1.png?alt=media&token=3175e56c-e2c9-4c80-94f5-1d4c000d8d92)

* Every crossing is an inversion
* Brute force - check every $(i, j), O(n^2)$
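
The brute-force check can be written directly from the definition. This is a minimal sketch; the function name `count_inversions_brute_force` is an illustrative choice.

```python
def count_inversions_brute_force(perm):
    """Count pairs of positions a < b with perm[a] > perm[b]; O(n^2) time."""
    n = len(perm)
    count = 0
    for a in range(n):
        for b in range(a + 1, n):
            # The larger value appears first, so this pair is inverted.
            if perm[a] > perm[b]:
                count += 1
    return count


print(count_inversions_brute_force([2, 4, 3, 1, 5]))  # 4
```

The nested loops examine all $n(n - 1)/2$ pairs of positions, which is where the quadratic bound comes from.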

### Divide and Conquer

* Friend's permutation is $i_1, i_2, ..., i_n$
* Divide into 2 lists
  - $L = [i_1, i_2, ..., i_{n/2}]$
  - $R = [i_{n/2 + 1}, i_{n/2 + 2}, ..., i_{n}]$
* Recursively count inversions in $L$ and $R$
* Add inversions across the boundary between $L$ and $R$
  - $i \in L, j \in R, i \gt j$
  - For each element of $R$, how many elements of $L$ are bigger than it?
* How to count inversions across the boundary?
* Adapt merge sort
* Recursively **sort and count** inversions in $L$ and $R$
* Count inversions while merging - **merge and count**

---

**Merge and Count**
* Merge the sorted versions of $L = [i_1, i_2, ..., i_{n/2}]$ and $R = [i_{n/2 + 1}, i_{n/2 + 2}, ..., i_n]$
* Count inversions while merging
  - If the next element added to the output, $i_m$, comes from $R$, it is smaller than every element still remaining in $L$
  - $i_m$ is hence inverted w.r.t. each of those remaining elements
  - Add the number of elements remaining in $L$ (total size of $L$ minus the current pointer index) to the inversion count

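```python
def merge_and_count(A, B):
  # A and B are sorted lists; returns (merged list, cross inversions)
  m = len(A)
  n = len(B)
  C = []
  i, j, k, count = 0, 0, 0, 0

  while k < m + n:
    if i == m:
      # A is exhausted: copy the rest of B, no new inversions
      C.append(B[j])
      j += 1
      k += 1
    elif j == n:
      # B is exhausted: copy the rest of A
      C.append(A[i])
      i += 1
      k += 1
    elif A[i] <= B[j]:
      # <= so that equal elements are not counted as inversions
      C.append(A[i])
      i += 1
      k += 1
    else:
      # B[j] is smaller than everything remaining in A
      C.append(B[j])
      j += 1
      k += 1
      count = count + (m - i) # m - i elements of A remain
  
  return (C, count)
```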

* `sort_and_count` is merge sort with `merge_and_count`

```python
def sort_and_count(A):
  n = len(A)
  if n <= 1:
    return (A, 0)
  
  (L, countL) = sort_and_count(A[:n//2])
  (R, countR) = sort_and_count(A[n//2:])
  (B, countB) = merge_and_count(L, R) # countB is cross inversions

  return (B, countL + countR + countB)
```
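
As a quick sanity check of the sort-and-count idea, the sketch below restates it in condensed, self-contained form and runs it on the running example. The name `count_inversions`, the inlined merge loop, and the random cross-check against a direct pairwise count are illustrative assumptions, not part of the original notes.

```python
import random


def count_inversions(seq):
    """Return (sorted copy of seq, number of inversions) via merge sort."""
    if len(seq) <= 1:
        return list(seq), 0
    mid = len(seq) // 2
    left, count_left = count_inversions(seq[:mid])
    right, count_right = count_inversions(seq[mid:])

    merged, cross = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still in left[i:]
            merged.append(right[j])
            j += 1
            cross += len(left) - i
    # At most one of these tails is non-empty; neither adds inversions.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count_left + count_right + cross


print(count_inversions([2, 4, 3, 1, 5])[1])  # 4
print(count_inversions([5, 4, 3, 2, 1])[1])  # 10 = 5 * 4 / 2
print(count_inversions([1, 2, 3, 4, 5])[1])  # 0

# Cross-check against a direct O(n^2) count on random permutations.
rng = random.Random(0)
for _ in range(100):
    perm = rng.sample(range(1, 9), 8)
    expected = sum(perm[a] > perm[b] for a in range(8) for b in range(a + 1, 8))
    assert count_inversions(perm)[1] == expected
```

On the running example, the two halves $[2, 4]$ and $[3, 1, 5]$ contribute $0$ and $1$ inversions, and merging $[2, 4]$ with $[1, 3, 5]$ adds $2 + 1 = 3$ cross inversions, giving $4$ in total.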

### Analysis

* Recurrence is similar to merge sort
  - $T(0) = T(1) = 1$
  - $T(n) = 2T(n/2) + n$
* Solve to get $T(n) = O(n \log n)$
* Note that the number of inversions can still be $O(n^2)$
  - Number ranges from $0$ to $n(n - 1)/2$
* We are counting them efficiently without enumerating each one
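
The recurrence can be solved by unrolling it. The derivation below is a sketch under the simplifying assumption that $n$ is a power of $2$, say $n = 2^k$, so that the halves are always equal in size.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + n \\
     &= 2\bigl(2\,T(n/4) + n/2\bigr) + n = 4\,T(n/4) + 2n \\
     &= 8\,T(n/8) + 3n \\
     &\;\;\vdots \\
     &= 2^{k}\,T(n/2^{k}) + k\,n \\
     &= n\,T(1) + n \log_2 n \qquad (k = \log_2 n) \\
     &= n + n \log_2 n = O(n \log n)
\end{aligned}
```

Each of the $\log_2 n$ levels of recursion does $n$ work in total for merging, which is the same accounting as for merge sort.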