# Hierarchical Clustering

**Hierarchical clustering** refers to a class of clustering methods that seek to build a **hierarchy** of clusters, in which some clusters contain others. In this assignment, we will explore a top-down approach, recursively bipartitioning the data using k-means.

**Note to Amazon EC2 users**: To conserve memory, make sure to stop all the other notebooks before running this notebook.

## Import packages

In [81]:
from __future__ import print_function # to conform python 2.x print to python 3.x
import turicreate
import matplotlib.pyplot as plt
import numpy as np
import sys
import os
import time
from scipy.sparse import csr_matrix
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances
%matplotlib inline

## Load the Wikipedia dataset

In [82]:
wiki = turicreate.SFrame('people_wiki.sframe/')

As we did in previous assignments, let's extract the TF-IDF features:

In [83]:
wiki['tf_idf'] = turicreate.text_analytics.tf_idf(wiki['text'])

To run k-means on this dataset, we should convert the data matrix into a sparse matrix.

In [84]:
def load_sparse_csr(filename):
    loader = np.load(filename)
    data = loader['data']
    indices = loader['indices']
    indptr = loader['indptr']
    shape = loader['shape']
    
    return csr_matrix( (data, indices, indptr), shape)

tf_idf = load_sparse_csr('people_wiki_tf_idf.npz')
map_index_to_word = turicreate.SFrame('people_wiki_map_index_to_word.gl/')

To be consistent with the k-means assignment, let's normalize all vectors to have unit norm.

In [85]:
from sklearn.preprocessing import normalize
tf_idf = normalize(tf_idf)

## Bipartition the Wikipedia dataset using k-means

Recall our workflow for clustering text data with k-means:

1. Load the dataframe containing a dataset, such as the Wikipedia text dataset.
2. Extract the data matrix from the dataframe.
3. Run k-means on the data matrix with some value of k.
4. Visualize the clustering results using the centroids, cluster assignments, and the original dataframe. We keep the original dataframe around because the data matrix does not keep auxiliary information (in the case of the text dataset, the title of each article).

Let us modify the workflow to perform bipartitioning:

1. Load the dataframe containing a dataset, such as the Wikipedia text dataset.
2. Extract the data matrix from the dataframe.
3. Run k-means on the data matrix with k=2.
4. Divide the data matrix into two parts using the cluster assignments.
5. Divide the dataframe into two parts, again using the cluster assignments. This step is necessary to allow for visualization.
6. Visualize the bipartition of data.

We'd like to be able to repeat Steps 3-6 multiple times to produce a **hierarchy** of clusters such as the following:
```
                      (root)
                         |
            +------------+-------------+
            |                          |
         Cluster                    Cluster
     +------+-----+             +------+-----+
     |            |             |            |
   Cluster     Cluster       Cluster      Cluster
```
Each **parent cluster** is bipartitioned to produce two **child clusters**. At the very top is the **root cluster**, which consists of the entire dataset.

Now we write a wrapper function to bipartition a given cluster using k-means. There are three variables that together comprise the cluster:

* `dataframe`: a subset of the original dataframe that correspond to member rows of the cluster
* `matrix`: same set of rows, stored in sparse matrix format
* `centroid`: the centroid of the cluster (not applicable for the root cluster)

Rather than passing around the three variables separately, we package them into a Python dictionary. The wrapper function takes a single dictionary (representing a parent cluster) and returns two dictionaries (representing the child clusters).

In [86]:
def bipartition(cluster, maxiter=400, num_runs=4, seed=None):
    '''cluster: should be a dictionary containing the following keys
                * dataframe: original dataframe
                * matrix:    same data, in matrix format
                * centroid:  centroid for this particular cluster'''
    
    data_matrix = cluster['matrix']
    dataframe   = cluster['dataframe']
    
    # Run k-means on the data matrix with k=2. We use scikit-learn here to simplify workflow.
    kmeans_model = KMeans(n_clusters=2, max_iter=maxiter, n_init=num_runs, random_state=seed)
    kmeans_model.fit(data_matrix)
    centroids, cluster_assignment = kmeans_model.cluster_centers_, kmeans_model.labels_
    
    # Divide the data matrix into two parts using the cluster assignments.
    data_matrix_left_child, data_matrix_right_child = data_matrix[cluster_assignment==0], \
                                                      data_matrix[cluster_assignment==1]
    
    # Divide the dataframe into two parts, again using the cluster assignments.
    cluster_assignment_sa = turicreate.SArray(cluster_assignment) # minor format conversion
    dataframe_left_child, dataframe_right_child     = dataframe[cluster_assignment_sa==0], \
                                                      dataframe[cluster_assignment_sa==1]
        
    
    # Package relevant variables for the child clusters
    cluster_left_child  = {'matrix': data_matrix_left_child,
                           'dataframe': dataframe_left_child,
                           'centroid': centroids[0]}
    cluster_right_child = {'matrix': data_matrix_right_child,
                           'dataframe': dataframe_right_child,
                           'centroid': centroids[1]}
    
    return (cluster_left_child, cluster_right_child)

The following cell performs bipartitioning of the Wikipedia dataset. Allow 20-60 seconds to finish.

Note. For the purpose of the assignment, we set an explicit seed (`seed=1`) to produce identical outputs for every run. In pratical applications, you might want to use different random seeds for all runs.

In [87]:
%%time
wiki_data = {'matrix': tf_idf, 'dataframe': wiki} # no 'centroid' for the root cluster
left_child, right_child = bipartition(wiki_data, maxiter=100, num_runs=1, seed=0)

CPU times: user 10 s, sys: 1.15 s, total: 11.2 s
Wall time: 5.58 s


Let's examine the contents of one of the two clusters, which we call the `left_child`, referring to the tree visualization above.

In [88]:
left_child

{'matrix': <24265x547979 sparse matrix of type '<class 'numpy.float64'>'
 	with 4478284 stored elements in Compressed Sparse Row format>,
 'dataframe': Columns:
 	URI	str
 	name	str
 	text	str
 	tf_idf	dict
 
 Rows: Unknown
 
 Data:
 +-------------------------------+-----------------+
 |              URI              |       name      |
 +-------------------------------+-----------------+
 | <http://dbpedia.org/resour... |  Harpdog Brown  |
 | <http://dbpedia.org/resour... |      G-Enka     |
 | <http://dbpedia.org/resour... |  Sam Henderson  |
 | <http://dbpedia.org/resour... |  Aaron LaCrate  |
 | <http://dbpedia.org/resour... | Trevor Ferguson |
 | <http://dbpedia.org/resour... |   Grant Nelson  |
 | <http://dbpedia.org/resour... |   Cathy Caruth  |
 | <http://dbpedia.org/resour... |   Sophie Crumb  |
 | <http://dbpedia.org/resour... |  Jenn Ashworth  |
 | <http://dbpedia.org/resour... |  Joerg Steineck |
 +-------------------------------+-----------------+
 +-----------------------

And here is the content of the other cluster we named `right_child`.

In [89]:
right_child

{'matrix': <34806x547979 sparse matrix of type '<class 'numpy.float64'>'
 	with 5900999 stored elements in Compressed Sparse Row format>,
 'dataframe': Columns:
 	URI	str
 	name	str
 	text	str
 	tf_idf	dict
 
 Rows: Unknown
 
 Data:
 +-------------------------------+-------------------------------+
 |              URI              |              name             |
 +-------------------------------+-------------------------------+
 | <http://dbpedia.org/resour... |         Digby Morrell         |
 | <http://dbpedia.org/resour... |         Alfred J. Lewy        |
 | <http://dbpedia.org/resour... |      Franz Rottensteiner      |
 | <http://dbpedia.org/resour... |        Jonathan Hoefler       |
 | <http://dbpedia.org/resour... | Anthony Gueterbock, 18th B... |
 | <http://dbpedia.org/resour... |       David Chernushenko      |
 | <http://dbpedia.org/resour... |         Andrew Pinsent        |
 | <http://dbpedia.org/resour... | Paddy Dunne (Gaelic footba... |
 | <http://dbpedia.org/resour.

## Visualize the bipartition

We provide you with a modified version of the visualization function from the k-means assignment. For each cluster, we print the top 5 words with highest TF-IDF weights in the centroid and display excerpts for the 8 nearest neighbors of the centroid.

In [90]:
def display_single_tf_idf_cluster(cluster, map_index_to_word):
    '''map_index_to_word: SFrame specifying the mapping betweeen words and column indices'''
    
    wiki_subset   = cluster['dataframe']
    tf_idf_subset = cluster['matrix']
    centroid      = cluster['centroid']
    
    # Print top 5 words with largest TF-IDF weights in the cluster
    idx = centroid.argsort()[::-1]
    for i in range(5):
        print('{0}:{1:.3f}'.format(map_index_to_word['category'][idx[i]], centroid[idx[i]])),
    print('')
    
    # Compute distances from the centroid to all data points in the cluster.
    distances = pairwise_distances(tf_idf_subset, [centroid], metric='euclidean').flatten()
    # compute nearest neighbors of the centroid within the cluster.
    nearest_neighbors = distances.argsort()
    # For 8 nearest neighbors, print the title as well as first 180 characters of text.
    # Wrap the text at 80-character mark.
    for i in range(8):
        text = ' '.join(wiki_subset[nearest_neighbors[i]]['text'].split(None, 25)[0:25])
        print('* {0:50s} {1:.5f}\n  {2:s}\n  {3:s}'.format(wiki_subset[nearest_neighbors[i]]['name'],
              distances[nearest_neighbors[i]], text[:90], text[90:180] if len(text) > 90 else ''))
    print('')

Let's visualize the two child clusters:

In [91]:
display_single_tf_idf_cluster(left_child, map_index_to_word)

she:0.050
her:0.034
music:0.022
film:0.020
album:0.014

* Madonna (entertainer)                              0.95870
  madonna louise ciccone tkoni born august 16 1958 is an american singer songwriter actress 
  and businesswoman she achieved popularity by pushing the boundaries of lyrical
* Janet Jackson                                      0.95875
  janet damita jo jackson born may 16 1966 is an american singer songwriter and actress know
  n for a series of sonically innovative socially conscious and
* Cher                                               0.96276
  cher r born cherilyn sarkisian may 20 1946 is an american singer actress and television ho
  st described as embodying female autonomy in a maledominated industry
* Laura Smith                                        0.96360
  laura smith is a canadian folk singersongwriter she is best known for her 1995 single shad
  e of your love one of the years biggest hits
* Anita Kunz                                         0.96396
  a

In [93]:
display_single_tf_idf_cluster(right_child, map_index_to_word)

he:0.016
league:0.014
season:0.012
university:0.012
team:0.010

* Todd Williams                                      0.97752
  todd michael williams born february 13 1971 in syracuse new york is a former major league 
  baseball relief pitcher he attended east syracuseminoa high school
* Phil King (footballer)                             0.97853
  philip geoffrey king born 28 december 1967 is an english former professional footballer he
   represented england at under21 level and in a b international he
* Micky Adams                                        0.97860
  michael richard micky adams born 8 november 1961 is an english former professional footbal
  ler turned football manager who is in charge of league two side
* Justin Knoedler                                    0.97907
  justin joseph knoedler born july 17 1980 in springfield illinois is a former major league 
  baseball catcherknoedler was originally drafted by the st louis cardinals
* Chris Day                              

The left cluster consists of athletes, whereas the right cluster consists of non-athletes. So far, we have a single-level hierarchy consisting of two clusters, as follows:

```
                                           Wikipedia
                                               +
                                               |
                    +--------------------------+--------------------+
                    |                                               |
                    +                                               +
                 Athletes                                      Non-athletes
```

Is this hierarchy good enough? **When building a hierarchy of clusters, we must keep our particular application in mind.** For instance, we might want to build a **directory** for Wikipedia articles. A good directory would let you quickly narrow down your search to a small set of related articles. The categories of athletes and non-athletes are too general to facilitate efficient search. For this reason, we decide to build another level into our hierarchy of clusters with the goal of getting more specific cluster structure at the lower level. To that end, we subdivide both the `athletes` and `non-athletes` clusters.

## Perform recursive bipartitioning

### Cluster of athletes

To help identify the clusters we've built so far, let's give them easy-to-read aliases:

In [94]:
athletes = left_child
non_athletes = right_child

Using the bipartition function, we produce two child clusters of the athlete cluster:

In [95]:
# Bipartition the cluster of athletes
left_child_athletes, right_child_athletes = bipartition(athletes, maxiter=100, num_runs=6, seed=1)

The left child cluster mainly consists of baseball players:

In [96]:
display_single_tf_idf_cluster(left_child_athletes, map_index_to_word)

she:0.132
her:0.084
actress:0.011
film:0.011
women:0.011

* Lauren Royal                                       0.93692
  lauren royal born march 3 circa 1965 is a book writer from california royal has written bo
  th historic and novelistic booksa selfproclaimed angels baseball fan
* Janet Jackson                                      0.93777
  janet damita jo jackson born may 16 1966 is an american singer songwriter and actress know
  n for a series of sonically innovative socially conscious and
* Barbara Hershey                                    0.93826
  barbara hershey born barbara lynn herzstein february 5 1948 once known as barbara seagull 
  is an american actress in a career spanning nearly 50 years
* Janine Shepherd                                    0.93956
  janine lee shepherd am born 1962 is an australian pilot and former crosscountry skier shep
  herds career as an athlete ended when she suffered major injuries
* Jane Fonda                                         0.94024


On the other hand, the right child cluster is a mix of players in association football, Austrailian rules football and ice hockey:

In [97]:
display_single_tf_idf_cluster(right_child_athletes, map_index_to_word)

music:0.029
film:0.025
album:0.018
band:0.017
art:0.015

* Julian Knowles                                     0.96786
  julian knowles is an australian composer and performer specialising in new and emerging te
  chnologies his creative work spans the fields of composition for theatre dance
* Peter Combe                                        0.97023
  peter combe born 20 october 1948 is an australian childrens entertainer and musicianmusica
  l genre childrens musiche has had 22 releases including seven gold albums two
* Craig Pruess                                       0.97040
  craig pruess born 1950 is an american composer musician arranger and gold platinum record 
  producer who has been living in britain since 1973 his career
* Ceiri Torjussen                                    0.97091
  ceiri torjussen born 1976 is a composer who has contributed music to dozens of film and te
  levision productions in the ushis music was described by
* Brenton Broadstock                       

Our hierarchy of clusters now looks like this:
```
                                           Wikipedia
                                               +
                                               |
                    +--------------------------+--------------------+
                    |                                               |
                    +                                               +
                 Athletes                                      Non-athletes
                    +
                    |
        +-----------+--------+
        |                    |
        |            association football/
        +          Austrailian rules football/
     baseball             ice hockey
```

Should we keep subdividing the clusters? If so, which cluster should we subdivide? To answer this question, we again think about our application. Since we organize our directory by topics, it would be nice to have topics that are about as coarse as each other. For instance, if one cluster is about baseball, we expect some other clusters about football, basketball, volleyball, and so forth. That is, **we would like to achieve similar level of granularity for all clusters.**

Notice that the right child cluster is more coarse than the left child cluster. The right cluster posseses a greater variety of topics than the left (ice hockey/association football/Austrialian football vs. baseball). So the right child cluster should be subdivided further to produce finer child clusters.

Let's give the clusters aliases as well:

In [98]:
baseball            = left_child_athletes
ice_hockey_football = right_child_athletes

### Cluster of ice hockey players and football players

In answering the following quiz question, take a look at the topics represented in the top documents (those closest to the centroid), as well as the list of words with highest TF-IDF weights.

Let us bipartition the cluster of ice hockey and football players.

In [99]:
left_child_ihs, right_child_ihs = bipartition(ice_hockey_football, maxiter=100, num_runs=6, seed=1)
display_single_tf_idf_cluster(left_child_ihs, map_index_to_word)


album:0.047
music:0.047
band:0.043
released:0.025
jazz:0.024

* Tony Mills (musician)                              0.95588
  tony mills born 7 july 1962 in solihull england is an english rock singer best known for h
  is work with shy and tnthailing from birmingham
* Prince (musician)                                  0.95604
  prince rogers nelson born june 7 1958 known by his mononym prince is an american singerson
  gwriter multiinstrumentalist and actor he has produced ten platinum albums
* Will.i.am                                          0.95661
  william adams born march 15 1975 known by his stage name william pronounced will i am is a
  n american rapper songwriter entrepreneur actor dj record
* Steve Overland                                     0.95741
  steve overland is a british singermusician who was the lead vocalist and songwriter for th
  e bands wildlife fm the ladder shadowman and his own group
* Mark Cross (musician)                              0.95780
  mark cross 

In [None]:
display_single_tf_idf_cluster(right_child_ihs, map_index_to_word)

film:0.035
art:0.023
music:0.019
theatre:0.016
television:0.015

* Justin Edgar                                       0.96768
  justin edgar is a british film directorborn in handsworth birmingham on 18 august 1971 edg
  ar graduated from portsmouth university in 1996 with a first class
* Bill Bennett (director)                            0.96811
  bill bennett born 1953 is an australian film director producer and screenwriterhe dropped 
  out of medicine at queensland university in 1972 and joined the australian
* Paul Swadel                                        0.96844
  paul swadel is a new zealand film director and producerhe has directed and produced many s
  uccessful short films which have screened in competition at cannes
* Shona Auerbach                                     0.96895
  shona auerbach is a british film director and cinematographerauerbach began her career as 
  a stills photographer she studied film at manchester university and cinematography at
* Anton Hecht   

: 

**Quiz Question**. Which diagram best describes the hierarchy right after splitting the `ice_hockey_football` cluster? Refer to the quiz form for the diagrams.

**ANS** [4]

**Caution**. The granularity criteria is an imperfect heuristic and must be taken with a grain of salt. It takes a lot of manual intervention to obtain a good hierarchy of clusters.

* **If a cluster is highly mixed, the top articles and words may not convey the full picture of the cluster.** Thus, we may be misled if we judge the purity of clusters solely by their top documents and words. 
* **Many interesting topics are hidden somewhere inside the clusters but do not appear in the visualization.** We may need to subdivide further to discover new topics. For instance, subdividing the `ice_hockey_football` cluster led to the appearance of runners and golfers.

### Cluster of non-athletes

Now let us subdivide the cluster of non-athletes.

In [None]:
# Bipartition the cluster of non-athletes
left_child_non_athletes, right_child_non_athletes = bipartition(non_athletes, maxiter=100, num_runs=6, seed=1)

In [None]:
display_single_tf_idf_cluster(left_child_non_athletes, map_index_to_word)

he:0.014
university:0.014
law:0.011
served:0.011
president:0.010

* Barry Sullivan (lawyer)                            0.97486
  barry sullivan is a chicago lawyer and as of july 1 2009 the cooney conway chair in advoca
  cy at loyola university chicago school of law
* David Anderson (British Columbia politician)       0.97635
  david a anderson pc oc born august 16 1937 in victoria british columbia is a former canadi
  an cabinet minister educated at victoria college in victoria
* James A. Joseph                                    0.97658
  james a joseph born 1935 is an american former diplomatjoseph is professor of the practice
   of public policy studies at duke university and founder of
* Sven Erik Holmes                                   0.97783
  sven erik holmes is a former federal judge and currently the vice chairman legal risk and 
  regulatory and chief legal officer for kpmg llp a
* Andrew Fois                                        0.97835
  andrew fois is an attorney liv

In [None]:
display_single_tf_idf_cluster(right_child_non_athletes, map_word_to_index)

113949:0.050
113949:0.042
113949:0.036
113949:0.034
113949:0.029

* Tony Smith (footballer, born 1957)                 0.94931
  anthony tony smith born 20 february 1957 is a former footballer who played as a central de
  fender in the football league in the 1970s and
* Justin Knoedler                                    0.94992
  justin joseph knoedler born july 17 1980 in springfield illinois is a former major league 
  baseball catcherknoedler was originally drafted by the st louis cardinals
* Chris Day                                          0.94996
  christopher nicholas chris day born 28 july 1975 is an english professional footballer who
   plays as a goalkeeper for stevenageday started his career at tottenham
* Todd Williams                                      0.95063
  todd michael williams born february 13 1971 in syracuse new york is a former major league 
  baseball relief pitcher he attended east syracuseminoa high school
* Ashley Prescott                                 

Neither of the clusters show clear topics, apart from the genders. Let us divide them further.

In [None]:
male_non_athletes = left_child_non_athletes
female_non_athletes = right_child_non_athletes

**Quiz Question**. Let us bipartition the clusters `male_non_athletes` and `female_non_athletes`. Which diagram best describes the resulting hierarchy of clusters for the non-athletes? Refer to the quiz for the diagrams.

**ANS** [5]

**Note**. Use `maxiter=100, num_runs=6, seed=1` for consistency of output.

In [None]:
left_child_male_non_athletes, right_child_male_non_athletes = bipartition(male_non_athletes, maxiter=100, num_runs=6, seed=1)

In [None]:
left_child_female_non_athletes, right_child_female_non_athletes = bipartition(female_non_athletes, maxiter=100, num_runs=6, seed=1)

In [None]:
display_single_tf_idf_cluster(left_child_male_non_athletes, map_index_to_word)


university:0.015
he:0.014
research:0.013
professor:0.011
his:0.009

* Lawrence W. Green                                  0.97832
  lawrence w green is best known by health education researchers as the originator of the pr
  ecede model and codeveloper of the precedeproceed model which has
* Archie Brown                                       0.97840
  archibald haworth brown cmg fba commonly known as archie brown born 10 may 1938 is a briti
  sh political scientist and historian in 2005 he became
* Timothy Luke                                       0.97919
  timothy w luke is university distinguished professor of political science in the college o
  f liberal arts and human sciences as well as program chair of
* Ferdinand K. Levy                                  0.97923
  ferdinand k levy was a famous management scientist with several important contributions to
   system analysis he was a professor at georgia tech from 1972 until
* Jerry L. Martin                                    0.97

In [None]:
display_single_tf_idf_cluster(left_child_female_non_athletes, map_index_to_word)


football:0.043
season:0.042
league:0.037
played:0.035
team:0.033

* Chris Day                                          0.94837
  christopher nicholas chris day born 28 july 1975 is an english professional footballer who
   plays as a goalkeeper for stevenageday started his career at tottenham
* Jason Roberts (footballer)                         0.94860
  jason andre davis roberts mbe born 25 january 1978 is a former professional footballer and
   now a football punditborn in park royal london roberts was
* Todd Curley                                        0.94883
  todd curley born 14 january 1973 is a former australian rules footballer who played for co
  llingwood and the western bulldogs in the australian football league
* Ashley Prescott                                    0.94894
  ashley prescott born 11 september 1972 is a former australian rules footballer he played w
  ith the richmond and fremantle football clubs in the afl between
* Tony Smith (footballer, born 1957)        

In [None]:
display_single_tf_idf_cluster(right_child_female_non_athletes, map_index_to_word)

baseball:0.108
league:0.102
major:0.053
games:0.047
season:0.045

* Steve Springer                                     0.89269
  steven michael springer born february 11 1961 is an american former professional baseball 
  player who appeared in major league baseball as a third baseman and
* Dave Ford                                          0.89545
  david alan ford born december 29 1956 is a former major league baseball pitcher for the ba
  ltimore orioles born in cleveland ohio ford attended lincolnwest
* Todd Williams                                      0.89818
  todd michael williams born february 13 1971 in syracuse new york is a former major league 
  baseball relief pitcher he attended east syracuseminoa high school
* Justin Knoedler                                    0.89979
  justin joseph knoedler born july 17 1980 in springfield illinois is a former major league 
  baseball catcherknoedler was originally drafted by the st louis cardinals
* James Baldwin (baseball)          

In [None]:
display_single_tf_idf_cluster(right_child_male_non_athletes, map_index_to_word)

party:0.036
election:0.033
minister:0.031
law:0.029
elected:0.023

* Doug Lewis                                         0.95951
  douglas grinslade doug lewis pc qc born april 17 1938 is a former canadian politician a ch
  artered accountant and lawyer by training lewis entered the
* Stephen Harper                                     0.96105
  stephen joseph harper pc mp born april 30 1959 is a canadian politician who is the 22nd an
  d current prime minister of canada and the
* Bob Menendez                                       0.96136
  robert bob menendez born january 1 1954 is the senior united states senator from new jerse
  y he is a member of the democratic party first
* Mal Sandon                                         0.96184
  malcolm john mal sandon born 16 september 1945 is an australian politician he was an austr
  alian labor party member of the victorian legislative council from
* Leo Barry (Canadian jurist)                        0.96199
  leo barry born 1943 is a cana

In [None]:
display_single_tf_idf_cluster(left_child_male_non_athletes, map_index_to_word)

university:0.015
he:0.014
research:0.013
professor:0.011
his:0.009

* Lawrence W. Green                                  0.97832
  lawrence w green is best known by health education researchers as the originator of the pr
  ecede model and codeveloper of the precedeproceed model which has
* Archie Brown                                       0.97840
  archibald haworth brown cmg fba commonly known as archie brown born 10 may 1938 is a briti
  sh political scientist and historian in 2005 he became
* Timothy Luke                                       0.97919
  timothy w luke is university distinguished professor of political science in the college o
  f liberal arts and human sciences as well as program chair of
* Ferdinand K. Levy                                  0.97923
  ferdinand k levy was a famous management scientist with several important contributions to
   system analysis he was a professor at georgia tech from 1972 until
* Jerry L. Martin                                    0.97