## The Florentine Families Problem (Solution below)

The next graph is a famous network of Florentine families. The data used to build it is taken directly
from Breiger and Pattison (1986).

>Breiger, R.L. and Pattison, P.E., 1986. Cumulated social roles: The duality of persons and their algebras. Social Networks, 8(3), pp. 215-256. [http://commres.net/wiki/_media/cumulated-social-roles-the-duality-of-persons-and-their-algebras.pdf](http://commres.net/wiki/_media/cumulated-social-roles-the-duality-of-persons-and-their-algebras.pdf)


The data is attributed to Padgett and Ansell, who later published a far more detailed study, with
many more variables defining the social network, in Padgett and Ansell (1993).

>Padgett, J.F. and Ansell, C.K., 1993. Robust action and the rise of the Medici, 1400-1434. American Journal of Sociology, 98(6), pp. 1259-1319. [https://www.stats.ox.ac.uk/~snijders/PadgettAnsell1993.pdf](https://www.stats.ox.ac.uk/~snijders/PadgettAnsell1993.pdf)

In the graph implemented in `networkx`, a link simply represents a marriage between members of two families. Marriage between families was a recognized way
of creating alliances and obligations, and therefore a way of consolidating or extending power.
This is an undirected graph, so a link is simply a link: it does not directly represent who owes whom, or who holds the power. In Padgett and Ansell's discussion, marriage between families is represented by a directed link,
because it matters which family contributed the groom and which family contributed the bride; this
is just one small way in which the graph we analyze below simplifies complex social
facts.
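A quick way to confirm these properties of the `networkx` version of the graph is to inspect it directly. The sketch below uses only standard `networkx` graph methods; the printed labels are just for readability.

```python
import networkx as nx

# Build the marriage network shipped with networkx.
ff = nx.florentine_families_graph()

# Undirected: an edge records that a marriage tie exists,
# not which family supplied the bride or the groom.
print("directed:", ff.is_directed())
print("families:", ff.number_of_nodes())
print("marriage ties:", ff.number_of_edges())

# Families linked by marriage to the Medici, alphabetically.
print("Medici ties:", sorted(ff.neighbors("Medici")))
```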

```python
import networkx as nx
ff = nx.florentine_families_graph()
```

#### Exercise

Part I: Try using the betweenness centrality and degree centrality measures illustrated above for the karate graph
on the Florentine Families graph. How
well do these two measures correlate? Use `scipy.stats.pearsonr`, which takes two
value sequences, `s1` and `s2`, where the measurement at `s1[i]` must correspond to the measurement at `s2[i]`.

Part II: Which measure does a better job of representing the relative power of
the families, at least as regards which family is the most powerful?
(Have a look at the Padgett and Ansell paper to see which family that should be.)

Part III:  Look at a case for which the two measures differ significantly, the Peruzzi family.  What is
the rank of the Peruzzi family by Betweenness Centrality?  What is their rank by Degree Centrality?
Explain why.  Speculate as to which measure is doing a better job of
representing power in the world, using only facts apparent from the graph.

Part IV: Find two families whose importance is identical according to **both**
measures, and argue, using facts deducible from the graph, that something is being missed.

**Part I**

```python
bc_ff = nx.betweenness_centrality(ff)
bc_ff
```

```
{'Acciaiuoli': 0.0,
 'Medici': 0.521978021978022,
 'Castellani': 0.05494505494505495,
 'Peruzzi': 0.02197802197802198,
 'Strozzi': 0.10256410256410257,
 'Barbadori': 0.09340659340659341,
 'Ridolfi': 0.11355311355311355,
 'Tornabuoni': 0.09157509157509157,
 'Albizzi': 0.21245421245421245,
 'Salviati': 0.14285714285714288,
 'Pazzi': 0.0,
 'Bischeri': 0.1043956043956044,
 'Guadagni': 0.2545787545787546,
 'Ginori': 0.0,
 'Lamberteschi': 0.0}
```

```python
dc_ff = nx.degree_centrality(ff)
dc_ff
```

```
{'Acciaiuoli': 0.07142857142857142,
 'Medici': 0.42857142857142855,
 'Castellani': 0.21428571428571427,
 'Peruzzi': 0.21428571428571427,
 'Strozzi': 0.2857142857142857,
 'Barbadori': 0.14285714285714285,
 'Ridolfi': 0.21428571428571427,
 'Tornabuoni': 0.21428571428571427,
 'Albizzi': 0.21428571428571427,
 'Salviati': 0.14285714285714285,
 'Pazzi': 0.07142857142857142,
 'Bischeri': 0.21428571428571427,
 'Guadagni': 0.2857142857142857,
 'Ginori': 0.07142857142857142,
 'Lamberteschi': 0.07142857142857142}
```

Create two sequences of values read off the same node sequence, so that position `i` in each refers to the same family.

```python
dc_seq = [dc_ff[n] for n in ff.nodes]
bc_seq = [bc_ff[n] for n in ff.nodes]
```
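Why the shared node order matters: `pearsonr` pairs values purely by position. The sketch below rebuilds the same dictionaries so it runs on its own, then contrasts the aligned sequences with a deliberately broken pairing in which each list is sorted independently. Sorting both lists pairs the largest values with each other regardless of which family they belong to, which can only inflate the correlation (the means and standard deviations are unchanged, while the sum of products is maximized).

```python
import networkx as nx
from scipy import stats

ff = nx.florentine_families_graph()
bc_ff = nx.betweenness_centrality(ff)
dc_ff = nx.degree_centrality(ff)

# Aligned: position i in both lists refers to the same family.
nodes = list(ff.nodes)
dc_seq = [dc_ff[n] for n in nodes]
bc_seq = [bc_ff[n] for n in nodes]
r_aligned, _ = stats.pearsonr(dc_seq, bc_seq)

# Misaligned (wrong!): sorting each list separately breaks the family pairing.
r_misaligned, _ = stats.pearsonr(sorted(dc_seq), sorted(bc_seq))

print(f"aligned r = {r_aligned:.3f}, misaligned r = {r_misaligned:.3f}")
```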

The correlation is +0.84, which is fairly high, and the p-value (about 7.6e-05) is very small.

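```python
# Import the stats submodule explicitly; a bare `import scipy` does not
# guarantee that `scipy.stats` is loaded in older SciPy releases.
from scipy import stats
stats.pearsonr(dc_seq, bc_seq)
```

```
PearsonRResult(statistic=0.8441513289848926, pvalue=7.575095714865155e-05)
```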
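Pearson's r is the covariance of the two sequences divided by the product of their standard deviations. As a sanity check, a minimal NumPy sketch of that formula should reproduce the statistic reported by `pearsonr`; the helper name `pearson_r` is hypothetical and not part of SciPy.

```python
import numpy as np
import networkx as nx

ff = nx.florentine_families_graph()
bc_ff = nx.betweenness_centrality(ff)
dc_ff = nx.degree_centrality(ff)
x = np.array([dc_ff[n] for n in ff.nodes])
y = np.array([bc_ff[n] for n in ff.nodes])

def pearson_r(x, y):
    """Hypothetical helper: covariance divided by the product of standard deviations."""
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

print(round(pearson_r(x, y), 4))
```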

Note:  This result is valid for this graph only.  The correlation will rise
and fall for other graphs.
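To see the correlation shift on another graph, the same computation can be wrapped in a small function and applied to the karate club graph used earlier. The function name `centrality_correlation` is hypothetical; the karate value is not quoted here, so run the sketch to see it.

```python
import networkx as nx
from scipy import stats

def centrality_correlation(G):
    """Pearson correlation between degree and betweenness centrality of G's nodes."""
    dc = nx.degree_centrality(G)
    bc = nx.betweenness_centrality(G)
    nodes = list(G.nodes)  # one fixed node order keeps the two sequences aligned
    r, p = stats.pearsonr([dc[n] for n in nodes], [bc[n] for n in nodes])
    return r, p

r_ff, _ = centrality_correlation(nx.florentine_families_graph())
r_karate, _ = centrality_correlation(nx.karate_club_graph())
print(f"Florentine families: r = {r_ff:.3f}")
print(f"Karate club:         r = {r_karate:.3f}")
```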

**Part II**

Both measures rank the Medici family first.  The importance of the Medicis is pretty much the starting point
of the paper, which is about the emergence of statehood.

```python
def second(x):
    """Return the second element of a (family, score) pair, i.e. the score."""
    return x[1]

bc_ranking = sorted(bc_ff.items(), key=second, reverse=True)
dc_ranking = sorted(dc_ff.items(), key=second, reverse=True)
```

```python
bc_ranking[0]
```

```
('Medici', 0.521978021978022)
```

```python
dc_ranking[0]
```

```
('Medici', 0.42857142857142855)
```

So the link structure of the graph, while undirected, does a pretty good job of capturing something about power and influence. Note that this result owes more to a clever choice of what to represent in the graph than to any magic in the centrality measures. The simple fact that the Medici are linked by marriage to 6 other families is telling, and it is what drives these high centrality scores.
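Degree centrality in `networkx` is a node's degree divided by n - 1, the largest degree possible in a simple graph with n nodes. A short check that the Medici score comes straight from their six marriage ties:

```python
import networkx as nx

ff = nx.florentine_families_graph()
n = ff.number_of_nodes()              # 15 families
medici_degree = ff.degree("Medici")   # number of marriage ties

# Degree centrality = degree / (n - 1); compare with the networkx value.
print(medici_degree, medici_degree / (n - 1), nx.degree_centrality(ff)["Medici"])
```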

**Part III**

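Betweenness Centrality ranks the Peruzzi family much lower (11th) than Degree Centrality does (5th, although by degree Peruzzi is tied with five other families, so its exact position within that tie depends only on node order).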

```python
(bc_fams, bc_scores) = zip(*bc_ranking)
bc_fams.index('Peruzzi')
```

```
10
```

```python
(dc_fams, dc_scores) = zip(*dc_ranking)
dc_fams.index('Peruzzi')
```

```
4
```
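Because Python's `sorted` is stable (even with `reverse=True`), tied scores keep the graph's node order, so `dc_fams.index('Peruzzi')` partly reflects tie-breaking. One way to make ties explicit, sketched here with `scipy.stats.rankdata` and its `'min'` method (every member of a tied group gets the best rank in the group), is to rank the negated scores so that rank 1 is the highest score. The helper name `competition_ranks` is hypothetical.

```python
import networkx as nx
from scipy.stats import rankdata

ff = nx.florentine_families_graph()
bc_ff = nx.betweenness_centrality(ff)
dc_ff = nx.degree_centrality(ff)
nodes = list(ff.nodes)

def competition_ranks(scores):
    """Rank nodes 1, 2, 2, 4, ... from highest to lowest score; ties share the best rank."""
    ranks = rankdata([-scores[n] for n in nodes], method="min")
    return dict(zip(nodes, ranks.astype(int)))

bc_rank = competition_ranks(bc_ff)
dc_rank = competition_ranks(dc_ff)

# Families whose degree centrality equals Peruzzi's.
tied = [n for n in nodes if dc_ff[n] == dc_ff["Peruzzi"] and n != "Peruzzi"]
print("Peruzzi BC rank:", bc_rank["Peruzzi"])
print("Peruzzi DC rank:", dc_rank["Peruzzi"], "tied with", tied)
```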

This is because the Peruzzi family has degree 3, higher than that of six of the 15 families in
the graph, so Degree Centrality ranks it high.

```python
dc_ff['Peruzzi']
```

```
0.21428571428571427
```

On the other hand, its Betweenness Centrality
score is relatively low

```python
bc_ff['Peruzzi']
```

```
0.02197802197802198
```

because its marriage ties stay within a small, tightly knit cluster. Its three partners (Castellani, Strozzi,
and Bischeri) are themselves closely linked: the Strozzi are married into both of the other two, so most
shortest paths can route around Peruzzi. The distance from Peruzzi to Medici is 3, and for most pairs of
families no shortest path passes through Peruzzi at all. It appears from the graph that Betweenness Centrality
may be doing a better job of representing power here: being three marriages away from the Medici is arguably
a real disadvantage.
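The low betweenness score can be unpacked directly from its definition: for every pair of other families, take the fraction of their shortest paths that pass through Peruzzi, and add those fractions up. A minimal sketch using `nx.all_shortest_paths` (the helper name `pair_contributions` is hypothetical); the total should match the unnormalized score that `nx.betweenness_centrality(..., normalized=False)` reports for an undirected graph.

```python
from itertools import combinations
import networkx as nx

ff = nx.florentine_families_graph()

def pair_contributions(G, v):
    """Map each pair (s, t) of other nodes to the share of s-t shortest paths through v."""
    shares = {}
    for s, t in combinations([n for n in G if n != v], 2):
        paths = list(nx.all_shortest_paths(G, s, t))
        through = sum(1 for p in paths if v in p[1:-1])  # v strictly inside the path
        if through:
            shares[(s, t)] = through / len(paths)
    return shares

shares = pair_contributions(ff, "Peruzzi")
for pair, share in shares.items():
    print(pair, share)
print("total:", sum(shares.values()))
print("networkx:", nx.betweenness_centrality(ff, normalized=False)["Peruzzi"])
print("distance to Medici:", nx.shortest_path_length(ff, "Peruzzi", "Medici"))
```

Only a handful of pairs contribute, and each contributes only half a path: every shortest path through the Castellani–Peruzzi–Bischeri link has an equally short twin through the Strozzi.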

**Part IV**

The Pazzi family and the Acciaiuoli family have identical low scores by both measures (as do the Ginori and Lamberteschi families).

```python
bc_ff['Acciaiuoli'], bc_ff['Pazzi']
```

```python
dc_ff['Acciaiuoli'], dc_ff['Pazzi']
```
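Rather than spotting the tie by eye, one can group families by their (betweenness, degree) score pair and keep the groups with more than one member. A sketch, rounding scores so that floating-point noise cannot split a genuine tie:

```python
from collections import defaultdict
import networkx as nx

ff = nx.florentine_families_graph()
bc_ff = nx.betweenness_centrality(ff)
dc_ff = nx.degree_centrality(ff)

# Group families whose scores agree on BOTH measures.
groups = defaultdict(list)
for fam in ff.nodes:
    key = (round(bc_ff[fam], 10), round(dc_ff[fam], 10))
    groups[key].append(fam)

for (bc, dc), fams in groups.items():
    if len(fams) > 1:
        print(f"betweenness={bc}, degree={dc}: {fams}")
```

All the families in the resulting group have a single marriage tie, which is why both measures bottom out for them.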

Arguably the Acciaiuoli family is much better positioned, because they have married directly into the Medici
family, while the Pazzi's only marriage tie is to the Salviati. That is, being directly connected to a very important node ought to count for something,
even if one has few or no other connections.
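One crude way to quantify the intuition that *whom* you are linked to matters is the average degree of a family's neighbors, available in `networkx` as `nx.average_neighbor_degree`. This is not one of the two measures in the exercise, just a quick illustration:

```python
import networkx as nx

ff = nx.florentine_families_graph()
avg_nbr_deg = nx.average_neighbor_degree(ff)

for fam in ["Acciaiuoli", "Pazzi"]:
    partners = list(ff.neighbors(fam))
    print(fam, "married into", partners, "with average partner degree", avg_nbr_deg[fam])
```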

One measure that addresses this and is well-suited to undirected graphs is Eigenvector Centrality:

```python
kc_ff = nx.eigenvector_centrality(ff)
kc_ff['Medici'], kc_ff['Acciaiuoli'], kc_ff['Pazzi']
```
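Eigenvector centrality gives each family a score proportional to the sum of its neighbors' scores, which makes the score vector the leading eigenvector of the adjacency matrix. A minimal power-iteration sketch with NumPy follows. It iterates with A + I, which has the same leading eigenvector as A but cannot oscillate; the `networkx` documentation describes the same shift for `eigenvector_centrality`. The result should land close to the library's values.

```python
import numpy as np
import networkx as nx

ff = nx.florentine_families_graph()
nodes = list(ff.nodes)
A = nx.to_numpy_array(ff, nodelist=nodes)  # symmetric 0/1 adjacency matrix

# Power iteration on (A + I): each step replaces a score with itself
# plus the sum of its neighbors' scores, then rescales.
M = A + np.eye(len(nodes))
x = np.ones(len(nodes))
for _ in range(1000):
    x = M @ x
    x /= np.linalg.norm(x)  # unit Euclidean norm, the scaling networkx reports

ec_sketch = dict(zip(nodes, x))
ec_nx = nx.eigenvector_centrality(ff)
for fam in ["Medici", "Acciaiuoli", "Pazzi"]:
    print(f"{fam:<11} sketch={ec_sketch[fam]:.4f}  networkx={ec_nx[fam]:.4f}")
```

Unlike the two earlier measures, this one separates the Acciaiuoli from the Pazzi, because the Acciaiuoli's single neighbor (Medici) scores far higher than the Pazzi's (Salviati).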

To learn more about how Eigenvector Centrality works, as well as more about other Centrality Measures,
see the Centrality Measures Notebook.