## Contents
- <a href='#7-1-1'>Exercise 7.1.1</a>
- <a href='#7-1-2'>Exercise 7.1.2</a>
- <a href='#7-2-1'>Exercise 7.2.1</a>

In [1]:
# __future__ imports must come before any other statement
from __future__ import division
import numpy as np

<a id='7-1-1'></a>
# 7.1.1 

Monte Carlo estimate of E(|X-Y|), where X and Y are independent U(0,1) random variables. The exact value is 1/3, so the estimate should land close to 0.333.

In [2]:
n = 100000
sum = 0
for i in range(n):
    x = np.random.rand()
    y = np.random.rand()
    sum += np.abs(x-y)
sum/n

0.33451502070000289
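The same estimate can be computed without an explicit loop. The following is a minimal sketch, assuming a fixed seed and the names `rng`, `d`, `estimate` and `std_err`, none of which appear in the original cell. It also reports the standard error of the Monte Carlo mean, which indicates how far from 1/3 the estimate can reasonably be expected to fall.

```python
import numpy as np

# Assumption: a fixed seed (not in the original) so the run is repeatable.
rng = np.random.RandomState(0)
n = 100000

# Draw all n pairs at once instead of looping.
x = rng.rand(n)
y = rng.rand(n)
d = np.abs(x - y)

# Sample mean estimates E(|X-Y|); its standard error is s / sqrt(n).
estimate = d.mean()
std_err = d.std(ddof=1) / np.sqrt(n)

print(estimate, std_err)
```

With n = 100000 the standard error is on the order of 0.001, which is consistent with the looped result above agreeing with 1/3 to about two decimal places.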

<a id='7-1-2'></a>
# 7.1.2 

Simulation to estimate the expected Euclidean distance between two points drawn independently and uniformly at random from the unit square.

In [6]:
n = 100000
sum_ed = 0
for i in range(n):
    x = np.random.rand(2)
    y = np.random.rand(2)
    sum_ed += np.sqrt((x[0]-y[0])**2+(x[1]-y[1])**2)
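# average the accumulated distances; the exact value is
# (2 + sqrt(2) + 5*log(1 + sqrt(2)))/15, approximately 0.5214
sum_ed/n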
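As a cross-check, the simulation can be vectorized and compared against the known closed form for the mean distance between two uniform points in the unit square, (2 + √2 + 5 ln(1 + √2))/15 ≈ 0.5214. This is only a sketch: the seed and the names `p`, `q`, `dist`, `estimate` and `exact` are assumptions introduced here.

```python
import numpy as np

# Assumption: a fixed seed (not in the original) so the run is repeatable.
rng = np.random.RandomState(0)
n = 100000

# Each row is one random point in the unit square.
p = rng.rand(n, 2)
q = rng.rand(n, 2)

# Row-wise Euclidean distance between the paired points.
dist = np.sqrt(((p - q) ** 2).sum(axis=1))
estimate = dist.mean()

# Closed-form expected distance, for comparison.
exact = (2 + np.sqrt(2) + 5 * np.log(1 + np.sqrt(2))) / 15

print(estimate, exact)
```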

<a id='7-2-1'></a>
# 7.2.1

In [188]:
def Euclidean(x,y):
    xc = np.array(x)
    yc = np.array(y)
    return np.sqrt(np.dot(xc-yc,xc-yc))

In [189]:
# sqrt[(4-1)^2+(4-2)^2]
Euclidean(np.array([4,4]), np.array([1,2]))

3.6055512754639891

In [190]:
Euclidean([1],[0])

1.0
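The two checks above can also be reproduced with NumPy's built-in 2-norm. The helper name `euclidean_norm` is hypothetical and is shown only as a possible alternative to `Euclidean`, not as a replacement for it.

```python
import numpy as np

def euclidean_norm(x, y):
    # Hypothetical alternative to Euclidean(): delegate to NumPy's 2-norm.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.linalg.norm(diff)

print(euclidean_norm([4, 4], [1, 2]))  # sqrt(13)
print(euclidean_norm([1], [0]))
```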

In [191]:
def mean(x):
    """
    This function takes as input a list of points (the members of
    one cluster) and outputs their centroid, i.e. the coordinate-wise
    average. The output is returned as a tuple so that it is hashable
    and can be stored alongside the cluster as its key.
    """
    N = len(x)
    n = len(x[0])
    sum_vec = np.zeros(n)
    for point in x:
        sum_vec += np.array(point)
    mean_vec = sum_vec / N
    return tuple(mean_vec)

# mean([[1],[2]])
# mean([[1,2],[3,4],[5,6]])
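The centroid computation can also be written with `np.mean` along axis 0. The sketch below uses a hypothetical name, `centroid`, and runs it on the two commented-out examples above.

```python
import numpy as np

def centroid(points):
    """Hypothetical helper: coordinate-wise mean of a list of points, as a tuple."""
    return tuple(np.mean(np.asarray(points, dtype=float), axis=0))

print(centroid([[1], [2]]))
print(centroid([[1, 2], [3, 4], [5, 6]]))
```

The first call averages 1 and 2, and the second averages each coordinate separately across the three points.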

In [295]:
def agg_(clusters, print_summary = True):
    """
    This function takes as input a list of clusters in Euclidean
    space and performs agglomerative clustering on it. Each entry
    of the list is a pair [centroid, points], where the centroid is
    a tuple. At each step the two clusters with the closest
    centroids are merged, until a single cluster remains.
    
    Note that the agglomerative clustering is done in place with
    respect to the clusters list input.
    """
    step = 1
    while len(clusters) > 1:
        # double loop to find the pair of clusters with the closest centroids
        n = len(clusters)
        min_dist = float('inf')
        ix1 = None
        ix2 = None
        for i in range(n-1):
            for j in range(i+1,n):
                # the distance between centroids of cluster i and cluster j
                distance_ij = Euclidean(clusters[i][0], clusters[j][0])
                if distance_ij < min_dist:
                    min_dist = distance_ij
                    ix1 = i
                    ix2 = j
        c1 = clusters[ix1]
        c2 = clusters[ix2]
        # merge the two clusters that result in minimum Euclidean distance
        new_cluster = c1[1] + c2[1]
        new_centroid = mean(new_cluster)
        # remove the merged clusters by position, larger index first so the
        # smaller index stays valid (ix1 < ix2); tracking positions avoids
        # collisions when two clusters happen to share the same centroid
        del clusters[ix2]
        del clusters[ix1]
        clusters.append([new_centroid, new_cluster])
        if print_summary:
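            # single-argument print(...) runs under both Python 2 and Python 3
            print('Step %d:' % step)
            print('Merged clusters: %s and %s' % (str(c1[1]), str(c2[1])))
            print('New clusters list:')
            print([el[1] for el in clusters])
            print('New centroids:')
            # only the first coordinate is shown, which suffices for 1-D data
            print([el[0][0] for el in clusters])
            print('')
            print('--------------------------------------------------------')
            print('')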
        step += 1

In [296]:
# a list of [centroid, points] pairs: one singleton cluster per point
clusters = [[(i**2,), [[i**2]]] for i in range(1,10)]
clusters

[[(1,), [[1]]],
 [(4,), [[4]]],
 [(9,), [[9]]],
 [(16,), [[16]]],
 [(25,), [[25]]],
 [(36,), [[36]]],
 [(49,), [[49]]],
 [(64,), [[64]]],
 [(81,), [[81]]]]

In [297]:
agg_(clusters)

Step 1:
Merged clusters: [[1]] and [[4]]
New clusters list:
[[[9]], [[16]], [[25]], [[36]], [[49]], [[64]], [[81]], [[1], [4]]]
New centroids:
[9, 16, 25, 36, 49, 64, 81, 2.5]

--------------------------------------------------------

Step 2:
Merged clusters: [[9]] and [[1], [4]]
New clusters list:
[[[16]], [[25]], [[36]], [[49]], [[64]], [[81]], [[9], [1], [4]]]
New centroids:
[16, 25, 36, 49, 64, 81, 4.666666666666667]

--------------------------------------------------------

Step 3:
Merged clusters: [[16]] and [[25]]
New clusters list:
[[[36]], [[49]], [[64]], [[81]], [[9], [1], [4]], [[16], [25]]]
New centroids:
[36, 49, 64, 81, 4.666666666666667, 20.5]

--------------------------------------------------------

Step 4:
Merged clusters: [[36]] and [[49]]
New clusters list:
[[[64]], [[81]], [[9], [1], [4]], [[16], [25]], [[36], [49]]]
New centroids:
[64, 81, 4.666666666666667, 20.5, 42.5]

--------------------------------------------------------

Step 5:
Merged clusters: [[9], [1], 

In [298]:
#final clusters list
clusters

[[(31.666666666666668,), [[9], [1], [4], [16], [25], [36], [49], [64], [81]]]]
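For checking the merge order programmatically, it can help to return the merges rather than print them. The following is a rough sketch of that idea, assuming the same rule as `agg_` (merge the two clusters whose centroids are closest, until one remains). The function name `centroid_linkage` and its return format are hypothetical and not part of the original notebook.

```python
import numpy as np

def centroid_linkage(points):
    """
    Hypothetical variant of agg_: repeatedly merge the two clusters with
    the closest centroids and return the list of merges instead of printing.
    """
    # Each cluster is a list of points; every point starts in its own cluster.
    clusters = [[list(p)] for p in points]
    history = []
    while len(clusters) > 1:
        centroids = [np.mean(c, axis=0) for c in clusters]
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters) - 1):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(centroids[i] - centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Delete the larger index first so the smaller one stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history

history = centroid_linkage([[k ** 2] for k in range(1, 10)])
for step, (a, b) in enumerate(history, 1):
    print(step, a, b)
```

Starting from n points this always records n - 1 merges, and appending the merged cluster at the end of the list mirrors the ordering seen in the printed summaries above.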