# Executive Summary
This notebook demonstrates how to use the bicluster module to recover the group assignments of a collection
of 1000 vectors using the GaussianAsymmetricSBM class.

We use the *estimate_number_of_hillclimbs* function to perform 10 hill climbs with each possible initializer (random, spectral and *k*-means)
to demonstrate how the scores vary depending on the initializer.
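
To illustrate why the starting point matters, here is a toy discrete hill climb. It is unrelated to the *bicluster* internals, and the function *hill_climb* and the landscape values are hypothetical: from each start it moves to the better neighbour until no neighbour improves the score, so different starts can stop at different local maxima.

```python
def hill_climb(landscape, start):
    """Greedy ascent on a 1-D list: move to the best neighbour until none is better."""
    i = start
    while True:
        # Indices of the left and right neighbours that exist
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        if not neighbours:
            return landscape[i]  # single-element landscape
        best = max(neighbours, key=lambda j: landscape[j])
        if landscape[best] <= landscape[i]:
            return landscape[i]  # local maximum reached
        i = best


# Hypothetical multimodal landscape with local maxima 3, 5 and 9
landscape = [1, 3, 2, 5, 4, 9, 0]
scores = [hill_climb(landscape, s) for s in (0, 3, 4)]
print(scores)  # [3, 5, 9]
```

Only the start at index 4 reaches the global maximum, which is why repeating the climb from several initializations and comparing the scores is informative.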

In [None]:
import warnings
from collections import Counter

import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as st
from tqdm.autonotebook import tqdm

import bicluster
from bicluster import GaussianAsymmetricSBM

np.set_printoptions(linewidth=200)   # print wide arrays on fewer lines
warnings.filterwarnings('ignore')    # silence library warnings to keep the output readable

Create 1000 vectors in $\mathbb{R}^8$ by sampling 100 vectors from each of 10 isotropic Gaussian clusters with random means and variances:

In [None]:
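K = 10              # number of clusters
D = 8               # dimension of each vector
N = 1000            # total number of vectors
n_per_cluster = N // K

np.random.seed(0)   # fix the seed so the results are reproducible

# Cluster means are spread out (scale=10); each cluster gets its own isotropic variance
means = np.random.normal(scale=10, size=(K, D))
variances = np.random.gamma(2, size=K)
cov = [np.eye(D) * v for v in variances]

# Rows k*n_per_cluster to (k+1)*n_per_cluster - 1 all come from cluster k
data = [st.multivariate_normal(means[k], cov=cov[k]).rvs(n_per_cluster) for k in range(K)]
data = np.vstack(data)
data.shape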
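
Later cells compare the predictions against the block structure by eye. A sketch that also returns the ground-truth labels makes that comparison programmatic. It uses NumPy's *default_rng* instead of *scipy.stats*, relying on the fact that a draw from $\mathcal{N}(\mu_k, v_k I)$ equals $\mu_k + \sqrt{v_k}\,z$ with $z$ standard normal. The function name *make_gaussian_clusters* is hypothetical.

```python
import numpy as np


def make_gaussian_clusters(K=10, D=8, n_per_cluster=100, seed=0):
    """Sample n_per_cluster points from each of K isotropic Gaussians in R^D.

    Returns the (K * n_per_cluster, D) data matrix and the true label of each row.
    """
    rng = np.random.default_rng(seed)
    means = rng.normal(scale=10, size=(K, D))   # well-separated cluster centres
    variances = rng.gamma(2, size=K)            # one isotropic variance per cluster
    blocks = [
        # A draw from N(mean_k, v_k * I) is mean_k + sqrt(v_k) * standard normal noise
        means[k] + np.sqrt(variances[k]) * rng.standard_normal((n_per_cluster, D))
        for k in range(K)
    ]
    data = np.vstack(blocks)
    labels = np.repeat(np.arange(K), n_per_cluster)  # rows are grouped by cluster
    return data, labels


data, true_labels = make_gaussian_clusters()
print(data.shape, np.bincount(true_labels))
```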

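Now create our contingency table $A$, the matrix of pairwise squared Euclidean distances $A_{ij} := \lVert x_i - x_j \rVert^2$, where $x_i$ is the $i$-th row of *data*. Expanding the square, $\lVert x_i - x_j \rVert^2 = \lVert x_i \rVert^2 + \lVert x_j \rVert^2 - 2\,x_i^\top x_j$, gives the whole matrix from a single matrix product: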

In [None]:
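sq_norms = (data**2).sum(axis=1).reshape(-1, 1)   # column vector of ||x_i||^2
A = sq_norms + sq_norms.T - 2 * data.dot(data.T)  # broadcasts to an N x N matrix
A = np.maximum(A, 0)                              # clip tiny negatives caused by floating-point round-off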
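
A quick numerical check of the expansion used above, on a small random matrix: the vectorized formula should match a brute-force computation over every pairwise difference. The helper *sq_distance_matrix* is hypothetical; it mirrors the notebook cell, with negative round-off clipped to zero.

```python
import numpy as np


def sq_distance_matrix(X):
    """Pairwise squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y."""
    sq_norms = (X ** 2).sum(axis=1).reshape(-1, 1)
    A = sq_norms + sq_norms.T - 2 * X @ X.T
    return np.maximum(A, 0)  # remove small negative values caused by round-off


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 8))

# Brute-force reference: explicit differences for every pair (fine for small inputs)
diffs = X[:, None, :] - X[None, :, :]
A_brute = (diffs ** 2).sum(axis=-1)

A = sq_distance_matrix(X)
print(np.allclose(A, A_brute), np.allclose(A, A.T))
```

The brute-force version allocates an $N \times N \times D$ array, which is why the notebook uses the expanded form for the full data set.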

Let's see how well, and how fast, random initialization works with 10 hill climbs:

In [None]:
scores, predictions, entropy = bicluster.estimate_number_of_hillclimbs(A, 10, num_attempts=10, init=None)

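Let's see what cluster assignments it predicts. The vectors are stored in consecutive blocks of 100, one block per true cluster, so all vectors in a block **ought** to share one label, with a different label for each block.
The label values themselves are arbitrary: a correct clustering may use any permutation of the labels 0 to 9.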

In [None]:
print(predictions.reshape(20, 50))  # same layout as the other initializers: two rows per true cluster
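
Because labels are only defined up to a permutation, reading them off a printed grid gets harder as the number of clusters grows. A small sketch that tallies a contingency table of true versus predicted labels and reports its purity (the helpers *label_contingency* and *purity* are hypothetical, not part of *bicluster*, and the toy labels are made up). In the notebook one would pass the ground-truth block labels, e.g. `np.repeat(np.arange(K), N // K)`, together with *predictions*, assuming it is a flat array with one label per vector.

```python
import numpy as np


def label_contingency(true_labels, pred_labels):
    """Count how many vectors of each true cluster received each predicted label."""
    true_ids, true_idx = np.unique(true_labels, return_inverse=True)
    pred_ids, pred_idx = np.unique(pred_labels, return_inverse=True)
    table = np.zeros((true_ids.size, pred_ids.size), dtype=int)
    np.add.at(table, (true_idx, pred_idx), 1)  # one increment per vector
    return table, pred_ids


def purity(table):
    """Fraction of vectors that belong to the majority true cluster of their predicted label."""
    return table.max(axis=0).sum() / table.sum()


true_labels = np.repeat([0, 1, 2], 2)        # three true clusters of two vectors each
pred_labels = np.array([5, 5, 3, 3, 3, 3])   # true clusters 1 and 2 were merged
table, pred_ids = label_contingency(true_labels, pred_labels)
print(table)            # rows: true clusters, columns: predicted labels in pred_ids order
print(pred_ids.size)    # number of distinct predicted labels
print(purity(table))
```

A perfect recovery has exactly one nonzero entry in every row and every column; merged clusters show up as a column with several nonzero entries and lower purity.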

Only 3 distinct cluster labels appear, so several of the 10 true clusters were merged. This is bad. Let's see how confident the model was in its group assignments by
computing, for each vector, the entropy of its group assignments. An entropy of 0 means it is certain of the vector's group; the higher the entropy, the less certain it is.

In [None]:
print(entropy.reshape((100,10)))
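
One way such a per-vector entropy could be computed, sketched with the *Counter* imported at the top: count how often each label appears for a vector across runs, then apply $H = -\sum_c p_c \log_2 p_c$. The function *assignment_entropy* is hypothetical, and *bicluster* may compute its entropy differently (for example in another log base). The sketch also assumes labels are consistent across runs; independent runs can permute labels, which would need aligning first.

```python
import math
from collections import Counter

import numpy as np


def assignment_entropy(label_runs):
    """Shannon entropy (in bits) of each vector's labels across repeated runs.

    label_runs has shape (num_runs, num_vectors). Labels are assumed to be
    aligned across runs: the same cluster carries the same label in every run.
    """
    label_runs = np.asarray(label_runs)
    num_runs = label_runs.shape[0]
    entropies = []
    for column in label_runs.T:                    # one column per vector
        counts = Counter(column.tolist())
        probs = [c / num_runs for c in counts.values()]
        entropies.append(sum(-p * math.log2(p) for p in probs))
    return np.array(entropies)


# Hypothetical labels from 4 runs for 3 vectors
runs = np.array([
    [0, 0, 0],
    [0, 1, 1],
    [0, 0, 2],
    [0, 1, 3],
])
print(assignment_entropy(runs))  # [0. 1. 2.]
```

The first vector always gets the same label (0 bits), the second splits evenly between two labels (1 bit), and the third gets a different label every time (2 bits, the maximum for 4 runs).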

Let's see how well initializing with the spectral biclustering algorithm works:

In [None]:
scores2, predictions2, entropy2 = bicluster.estimate_number_of_hillclimbs(A, 10, num_attempts=10, init='spectral')
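
The notebook does not show what *init='spectral'* does internally. As a point of reference, here is a sketch of plain normalized spectral clustering in the style of Ng, Jordan and Weiss, a swapped-in technique rather than spectral biclustering itself. The function *spectral_labels*, the Gaussian bandwidth *sigma* and the simplified discretization step are all assumptions for illustration.

```python
import numpy as np


def spectral_labels(A, K, sigma):
    """Normalized spectral clustering of a squared-distance matrix A into K groups.

    sigma is an assumed bandwidth for the Gaussian affinity exp(-A / (2 sigma^2)).
    """
    W = np.exp(-A / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                             # no self-affinity
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]    # D^{-1/2} W D^{-1/2}
    _, eigvecs = np.linalg.eigh(M)                       # eigenvalues in ascending order
    U = eigvecs[:, -K:]                                  # top-K eigenvectors as an embedding
    U = U / np.linalg.norm(U, axis=1, keepdims=True)     # project rows onto the unit sphere

    # Simplified discretization (instead of the usual k-means step): pick K mutually
    # distant rows as representatives, then assign each row to the most similar one.
    reps = [0]
    for _ in range(K - 1):
        dist = np.linalg.norm(U[:, None, :] - U[reps][None, :, :], axis=2).min(axis=1)
        reps.append(int(np.argmax(dist)))
    return np.argmax(U @ U[reps].T, axis=1)


# Three well-separated 2-D blobs of 20 points each
rng = np.random.default_rng(0)
blob_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in blob_centres])
sq = (X ** 2).sum(axis=1, keepdims=True)
A = np.maximum(sq + sq.T - 2 * X @ X.T, 0)
labels = spectral_labels(A, K=3, sigma=2.0)
print(labels.reshape(3, 20))  # each row (one blob) should carry a single label
```

The bandwidth matters: too large and all points look alike, too small and points become isolated. A common heuristic is to tie *sigma* to a typical nearest-neighbour distance.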

Let's see what group assignment predictions it makes:

In [None]:
print(predictions2.reshape(20,50))

Let's see how confident it is in group assignments by inspecting the assignment entropy:

In [None]:
print(entropy2.reshape(100,10))

Let's see how well the hill climbs work with the *k*-means-based initializer:

In [None]:
scores3, predictions3, entropy3 = bicluster.estimate_number_of_hillclimbs(A, 10, num_attempts=10, init='kmeans')
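
The notebook does not show how the *'kmeans'* initializer is implemented. For reference, a plain NumPy sketch of Lloyd's *k*-means with *k*-means++ seeding. Like the hill-climb experiments, it runs several restarts and keeps the best one (lowest inertia). The function *kmeans* and its parameters are hypothetical and not part of *bicluster*.

```python
import numpy as np


def kmeans(X, K, num_restarts=10, num_iters=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns the restart with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(num_restarts):
        # k-means++ seeding: each new centre is drawn with probability proportional
        # to its squared distance from the nearest centre chosen so far
        centres = [X[rng.integers(len(X))]]
        for _ in range(K - 1):
            d2 = ((X[:, None, :] - np.array(centres)[None]) ** 2).sum(axis=-1).min(axis=1)
            p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform fallback if all points coincide
            centres.append(X[rng.choice(len(X), p=p)])
        centres = np.array(centres)

        # Lloyd iterations: assign to the nearest centre, then move centres to member means
        for _ in range(num_iters):
            labels = ((X[:, None, :] - centres[None]) ** 2).sum(axis=-1).argmin(axis=1)
            new_centres = np.array([
                X[labels == k].mean(axis=0) if np.any(labels == k) else centres[k]
                for k in range(K)
            ])
            if np.allclose(new_centres, centres):
                break
            centres = new_centres

        # Final assignment against the final centres and its within-cluster sum of squares
        labels = ((X[:, None, :] - centres[None]) ** 2).sum(axis=-1).argmin(axis=1)
        inertia = ((X - centres[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels, best_inertia


# Three well-separated 2-D blobs of 20 points each
rng = np.random.default_rng(0)
blob_centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in blob_centres])
labels, inertia = kmeans(X, K=3)
print(labels.reshape(3, 20))  # each row (one blob) should carry a single label
```

In the notebook, something like `kmeans(A, K)` would cluster the rows of *A*, i.e. each vector's distances to all the others; whether the *'kmeans'* initializer does exactly this is an assumption.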

Let's see what the predicted group assignments are:

In [None]:
print(predictions3.reshape(20,50))

Let's see how confident it is in its group assignments, using the Shannon entropy of its group assignment predictions:

In [None]:
print(entropy3.reshape(100,10))
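
The printed grids are easier to compare across the three initializers when condensed to one number per true cluster. A minimal sketch, assuming each entropy array holds one value per vector in the original order, grouped by true cluster as in this notebook (the helper *mean_entropy_per_cluster* and the toy values are hypothetical):

```python
import numpy as np


def mean_entropy_per_cluster(entropy, K):
    """Average assignment entropy over each block of N // K consecutive vectors.

    Assumes the vectors are ordered by true cluster and N is divisible by K.
    """
    entropy = np.asarray(entropy, dtype=float)
    return entropy.reshape(K, -1).mean(axis=1)


# Hypothetical entropies for 2 clusters of 3 vectors each
toy = [0.0, 0.0, 0.0, 1.0, 0.5, 0.0]
print(mean_entropy_per_cluster(toy, K=2))  # mean entropy of cluster 0 and cluster 1
```

With the runs above, calling it on *entropy*, *entropy2* and *entropy3* with `K` gives one row of 10 numbers per initializer, which can be compared side by side.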