## Network Analysis of Shakespeare and Company and Goodreads

This notebook performs a network analysis of readership in two networks: one of Shakespeare and Company borrowers and one of Goodreads reviewers. In both networks, books are nodes, and an edge connects two books when the same person has read both of them (the more readers two books share, the higher the edge weight). The Goodreads dataset has been winnowed to include only books that were in the Shakespeare and Company library.
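As a rough illustration of how such a co-reading network can be built, the sketch below counts shared readers for every pair of books. The `reader_to_books` data and the `build_coreading_edges` helper are hypothetical and are not the code in `graph.py`; unlike the edge dictionaries used later in this notebook, the sketch stores each undirected edge only once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: reader -> set of books that reader borrowed or reviewed
reader_to_books = {
    "reader_a": {"Ulysses", "Dubliners", "Mrs. Dalloway"},
    "reader_b": {"Ulysses", "Dubliners"},
    "reader_c": {"Dubliners", "Mrs. Dalloway"},
}

def build_coreading_edges(reader_to_books):
    """Return a Counter mapping (book_u, book_v) -> number of shared readers."""
    edge_to_weight = Counter()
    for books in reader_to_books.values():
        # sorting gives every unordered pair a single canonical key
        for u, v in combinations(sorted(books), 2):
            edge_to_weight[(u, v)] += 1
    return edge_to_weight

edges = build_coreading_edges(reader_to_books)
for (u, v), weight in sorted(edges.items()):
    print(f"{u} -- {v}: weight {weight}")
```

Each reader contributes one unit of weight to every pair of books they read, so a pair read together by two readers ends up with weight 2.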

### Quick data preparation:

In [3]:
# import functions from graph.py
from graph import get_goodreads_graph, get_sc_graph
from core_periphery_sbm import core_periphery as cp

import networkx as nx

from collections import Counter
from operator import itemgetter
import pandas as pd

In [7]:
# get vertex lists, edge weights, vertex to neighbors, and number of nodes
sc_books_in_vertex_order, sc_book_to_vertex_index, sc_edge_to_weight, sc_vertex_to_neighbors, sc_n, sc_book_uri_to_num_events, sc_book_uri_to_text = get_sc_graph()
gr_books_in_vertex_order, gr_book_to_vertex_index, gr_edge_to_weight, gr_vertex_to_neighbors, gr_n, gr_book_uri_to_num_events, gr_book_uri_to_text = get_goodreads_graph()

In [8]:
def print_graph_summary(books_in_vertex_order, book_to_vertex_index, edge_to_weight, vertex_to_neighbors, n):
    print('# of vertices: {:,}'.format(n))
    # all edges are included twice because these are undirected graphs
    print('# of unique edges: {:,}'.format(int(len(edge_to_weight)/2)))
    print('Total edge weights: {:,}'.format(int(sum(edge_to_weight.values())/2)))

    # list the five vertices with highest degree
    print('\nFive books with the most neighbors:')
    vertex_to_degree = {v: len(neighbors) for v, neighbors in vertex_to_neighbors.items()}
    # itemgetter is imported directly above, so the `operator.` prefix would raise a NameError
    vertex_to_degree_sorted = sorted(vertex_to_degree.items(), reverse=True, key=itemgetter(1))
    for vertex_idx, degree in vertex_to_degree_sorted[:5]:
        vertex_book_name = books_in_vertex_order[vertex_idx]
        print('{} neighbors: {}'.format(degree, vertex_book_name))

In [9]:
#print_graph_summary(sc_books_in_vertex_order, sc_book_to_vertex_index, sc_edge_to_weight, sc_vertex_to_neighbors, sc_n)

In [10]:
# the core-periphery code is simplest to run on a networkx graph
# the function below converts the {(u, v): weight} dict into a list of [u, v, weight] entries that can be added to a graph

def tuple_to_list(edge_to_weight):
    # turn each ((u, v), weight) item into a [u, v, weight] list
    return [[*edge, weight] for edge, weight in edge_to_weight.items()]

In [11]:
sc_weights_list = tuple_to_list(sc_edge_to_weight)
gr_weights_list = tuple_to_list(gr_edge_to_weight)

In [12]:
# each entry is [vertex u, vertex v, weight of the edge between them]
sc_weights_list[0]

[1491, 0, 1]

### Shakespeare and Co Analysis

In [13]:
# Create SHAKESPEARE AND CO graph
sc_G = nx.Graph()
sc_G.add_weighted_edges_from(sc_weights_list)

The next cells are optional: they only make the results easier to interpret.

In [14]:
# optional: change vertex ids to book names
sc_dict = {value:key for key, value in sc_book_to_vertex_index.items()}
mapping = sc_dict # Dictionary from id to title

In [15]:
sc_G = nx.relabel_nodes(sc_G, mapping)

### Core-Periphery Structure

This section draws on Gallagher et al.'s ["A clarified typology of core-periphery structure in networks"](https://advances.sciencemag.org/content/7/12/eabc9800), with code available [here](https://github.com/ryanjgallagher/core_periphery_sbm). Core-periphery analysis asks how a network can be divided into a core of densely interconnected nodes and a periphery of nodes that are connected to core nodes but not to each other. The paper develops and assesses two core-periphery model types: a hub-and-spoke model that divides the network into two clean blocks (core vs. periphery), and a layered model that allows for layers of periphery-ness.
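To make the hub-and-spoke idea concrete, the sketch below measures how densely each pair of blocks is connected for a given core/periphery labeling. It is only a descriptive check on a hypothetical toy network, not the inference procedure used by `core_periphery_sbm`; the `block_densities` helper and the toy data are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy network: nodes 0-2 form the core, nodes 3-6 the periphery
edges = {(0, 1), (0, 2), (1, 2),          # core-core edges
         (0, 3), (1, 4), (2, 5), (0, 6)}  # core-periphery edges
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}  # 0 = core, 1 = periphery

def block_densities(edges, labels):
    """Fraction of possible edges that are present within and between blocks."""
    present, possible = {}, {}
    for u, v in combinations(sorted(labels), 2):
        block = tuple(sorted((labels[u], labels[v])))
        possible[block] = possible.get(block, 0) + 1
        if (u, v) in edges or (v, u) in edges:
            present[block] = present.get(block, 0) + 1
    return {block: present.get(block, 0) / possible[block] for block in possible}

densities = block_densities(edges, labels)
for block, value in sorted(densities.items()):
    print(block, round(value, 3))
```

In an idealized hub-and-spoke structure the core-core block is the densest, the core-periphery block is sparser, and the periphery-periphery block is close to empty, which is the pattern this toy network shows.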

In [16]:
# Initialize hub-and-spoke model and infer structure
hubspoke = cp.HubSpokeCorePeriphery(n_gibbs=100, n_mcmc=10*len(sc_G))
hubspoke.infer(sc_G)

In [116]:
layered = cp.LayeredCorePeriphery(n_layers=4, n_gibbs=100, n_mcmc=10*len(sc_G))
layered.infer(sc_G)

In [117]:
# Get core and periphery assignments from hub-and-spoke model
node2label_hs = hubspoke.get_labels(last_n_samples=50)

# Get layer assignments from the layered model
node2label_l = layered.get_labels(last_n_samples=50)

**Create dataframes**: the goal is to get the results into a tidy CSV with columns for book title, author, hub-and-spoke label, layered label, coreness, and probability.

In [118]:
sc_hub_spoke_df = pd.DataFrame.from_dict(node2label_hs, orient='index')
sc_layered_df = pd.DataFrame.from_dict(node2label_l, orient='index')

**Core-Periphery Label:** For both models (hub-and-spoke and layered) the core is 0; for the layered model, the further away from 0 the more peripheral the node. 

In [119]:
# Number of nodes in periphery vs. core in hub-and-spoke
Counter(node2label_hs.values())

Counter({1: 927, 0: 584})

In [120]:
# Number of nodes in each layer, from core (0) to periphery
Counter(node2label_l.values())

Counter({3: 594, 0: 375, 2: 371, 1: 171})

**All core books:**

In [121]:
for book, label in node2label_l.items():
    if label == 0:
        print(book)

1919 by Dos Passos, John (1932)
A Farewell to Arms by Hemingway, Ernest (1929)
A Handful of Dust by Waugh, Evelyn (1934)
A High Wind in Jamaica by Hughes, Richard (1929)
A Note in Music by Lehmann, Rosamond (1930)
A Room of One's Own by Woolf, Virginia (1929)
A Room with a View by Forster, E. M.
A Tale of a Tub by Swift, Jonathan (1704)
A World I Never Made by Farrell, James T. (1936)
After Many a Summer by Huxley, Aldous (1939)
Agnes Grey by Brontë, Anne (1847)
Alice Adams by Tarkington, Booth (1921)
All Passion Spent by Sackville-West, Vita (1931)
All This and Heaven Too by Field, Rachel (1938)
An American Tragedy by Dreiser, Theodore (1925)
Angel Pavement by Priestley, J. B. (1930)
Antic Hay by Huxley, Aldous (1923)
Apocalypse by Lawrence, D. H. (1931)
Appointment in Samarra by O'Hara, John (1934)
Arrowsmith by Lewis, Sinclair (1925)
Auld Licht Idylls by Barrie, J. M. (1888)
Autobiographies by Yeats, William Butler (1926)
Autobiography by Powys, John Cowper (1934)
Axel's Castle: A S

**Probability** that a node belongs to each layer, from core to periphery.

In [122]:
#  Dictionary of node -> ordered array of probabilities
node2probs_l = layered.get_labels(last_n_samples=50, prob=True, return_dict=True)

In [123]:
sc_layered_probs_df = pd.DataFrame.from_dict(node2probs_l, orient='index')

**Coreness**, where the closer to 1 the more core; closer to 0 the more peripheral.
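One way to read a coreness score is as an average of a node's sampled layer assignments, rescaled so that 1 means "always in the core" and 0 means "always in the outermost layer". The sketch below assumes that definition, which is consistent with the values shown in this notebook but has not been confirmed against the library's source. The `coreness_from_samples` helper and the sample data are hypothetical.

```python
def coreness_from_samples(layer_samples, n_layers):
    """Assumed definition: 1 - (mean sampled layer) / (n_layers - 1)."""
    mean_layer = sum(layer_samples) / len(layer_samples)
    return 1 - mean_layer / (n_layers - 1)

# Hypothetical samples for one node: 37 draws in layer 0 and 13 draws in layer 1
samples = [0] * 37 + [1] * 13
score = coreness_from_samples(samples, n_layers=4)
print(round(score, 6))
```

Under this assumption, a two-layer (hub-and-spoke) model reduces to the fraction of samples in which the node was assigned to the core.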

In [124]:
# Dictionary of node -> coreness
node2coreness_hs = hubspoke.get_coreness(last_n_samples=50, return_dict=True)
node2coreness_l = layered.get_coreness(last_n_samples=50, return_dict=True)

In [125]:
sc_layered_coreness_df = pd.DataFrame.from_dict(node2coreness_l, orient='index')

In [126]:
sc_layered_coreness_df

Unnamed: 0,0
"1914 and Other Poems by Brooke, Rupert (1915)",0.026667
"1919 by Dos Passos, John (1932)",0.913333
"A Backward Glance by Wharton, Edith (1934)",0.340000
"A Book of Nonsense by Lear, Edward (1846)",0.006667
"A Child's Garden of Verses by Stevenson, Robert Louis (1885)",0.253333
...,...
"Young Man with a Horn by Baker, Dorothy (1939)",0.333333
"Young Men in Love by Arlen, Michael (1927)",0.920000
"Zola and His Time by Josephson, Matthew (1928)",0.040000
"Zuleika Dobson by Beerbohm, Max (1911)",0.540000


In [127]:
# all books with especially high coreness (> 0.85) in the layered model
most_core = []
for book, coreness in node2coreness_l.items():
    if coreness > .85:
        most_core.append(book)

In [128]:
most_core

['1919 by Dos Passos, John (1932)',
 'A Farewell to Arms by Hemingway, Ernest (1929)',
 'A Handful of Dust by Waugh, Evelyn (1934)',
 'A High Wind in Jamaica by Hughes, Richard (1929)',
 'A Note in Music by Lehmann, Rosamond (1930)',
 "A Room of One's Own by Woolf, Virginia (1929)",
 'A Room with a View by Forster, E. M.',
 'A Tale of a Tub by Swift, Jonathan (1704)',
 'A World I Never Made by Farrell, James T. (1936)',
 'After Many a Summer by Huxley, Aldous (1939)',
 'Agnes Grey by Brontë, Anne (1847)',
 'Alice Adams by Tarkington, Booth (1921)',
 'All Passion Spent by Sackville-West, Vita (1931)',
 'All This and Heaven Too by Field, Rachel (1938)',
 'An American Tragedy by Dreiser, Theodore (1925)',
 'Angel Pavement by Priestley, J. B. (1930)',
 'Antic Hay by Huxley, Aldous (1923)',
 'Apocalypse by Lawrence, D. H. (1931)',
 "Appointment in Samarra by O'Hara, John (1934)",
 'Arrowsmith by Lewis, Sinclair (1925)',
 'Auld Licht Idylls by Barrie, J. M. (1888)',
 'Autobiographies by Yeat

## Save to csv

In [129]:
sc_layered_df['book_info'] = sc_layered_df.index
sc_layered_df = sc_layered_df.rename(columns={0: "layer"})

In [130]:
sc_hub_spoke_df['book_info'] = sc_hub_spoke_df.index
sc_hub_spoke_df = sc_hub_spoke_df.rename(columns={0: "hub_and_spoke"})

In [131]:
sc_layered_coreness_df['book_info'] = sc_layered_coreness_df.index
sc_layered_coreness_df = sc_layered_coreness_df.rename(columns={0: "coreness"})

In [132]:
full_sc_df = sc_layered_df.merge(sc_hub_spoke_df, how='inner', on='book_info').merge(sc_layered_coreness_df, how='inner', on='book_info')

In [133]:
full_sc_df = full_sc_df[["book_info", "hub_and_spoke", "layer", "coreness"]]

In [134]:
full_sc_df

Unnamed: 0,book_info,hub_and_spoke,layer,coreness
0,"1914 and Other Poems by Brooke, Rupert (1915)",1,3,0.026667
1,"1919 by Dos Passos, John (1932)",0,0,0.913333
2,"A Backward Glance by Wharton, Edith (1934)",1,2,0.340000
3,"A Book of Nonsense by Lear, Edward (1846)",1,3,0.006667
4,"A Child's Garden of Verses by Stevenson, Rober...",1,2,0.253333
...,...,...,...,...
1506,"Young Man with a Horn by Baker, Dorothy (1939)",1,2,0.333333
1507,"Young Men in Love by Arlen, Michael (1927)",0,0,0.920000
1508,"Zola and His Time by Josephson, Matthew (1928)",1,3,0.040000
1509,"Zuleika Dobson by Beerbohm, Max (1911)",0,1,0.540000


In [135]:
# vertex
vertex_title = pd.DataFrame.from_dict(sc_dict, orient = "index")

In [136]:
vertex_title["vertex"] = vertex_title.index

In [137]:
vertex_title = vertex_title.rename(columns={0: "book_info"})

In [138]:
full_sc_df = full_sc_df.merge(vertex_title, how='inner', on='book_info')

In [139]:
full_sc_df = full_sc_df[["vertex","book_info", "layer", "hub_and_spoke", "coreness"]]

In [140]:
full_sc_df.head(10)

Unnamed: 0,vertex,book_info,layer,hub_and_spoke,coreness
0,1292,"1914 and Other Poems by Brooke, Rupert (1915)",3,1,0.026667
1,966,"1919 by Dos Passos, John (1932)",0,0,0.913333
2,128,"A Backward Glance by Wharton, Edith (1934)",2,1,0.34
3,1044,"A Book of Nonsense by Lear, Edward (1846)",3,1,0.006667
4,1088,"A Child's Garden of Verses by Stevenson, Rober...",2,1,0.253333
5,175,"A Christmas Garland by Beerbohm, Max (1912)",3,1,0.026667
6,914,"A City of Bells by Goudge, Elizabeth (1936)",1,0,0.62
7,286,A Connecticut Yankee in King Arthur's Court by...,3,1,0.0
8,841,"A Crystal Age by Hudson, W. H. (1887)",2,1,0.32
9,322,"A Daughter of the Samurai by Sugimoto, Etsuko ...",1,0,0.646667


In [141]:
full_sc_df.to_csv("shakespeare-co-core-periphery.csv")

## Other metrics

### Density
How connected is this graph? Density is the number of existing edges divided by the total number of possible edges.
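For an undirected graph with n nodes and m edges there are n(n-1)/2 possible edges, so density is 2m / (n(n-1)). A minimal from-scratch sketch of that formula (the `density` helper here is illustrative, not a networkx function):

```python
def density(n_nodes, n_edges):
    """Density of an undirected simple graph: m / (n * (n - 1) / 2)."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))

# A graph with 4 nodes has 6 possible edges; 3 of them exist here
print(density(4, 3))  # prints 0.5
```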

In [43]:
sc_density = nx.density(sc_G)
print("Shakespeare and Co Network Density:", sc_density)

Shakespeare and Co Network Density: 0.17366245765051871


### Transitivity
How likely is it that, if books A and B are read together and books B and C are read together, books A and C are also connected by an edge?
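Transitivity can be computed as the share of connected triples (A-B-C paths) whose end points are also linked. The sketch below implements that ratio from scratch on a hypothetical toy graph; the `transitivity` helper and the adjacency data are assumptions for illustration.

```python
from itertools import combinations

def transitivity(adjacency):
    """Closed connected triples / all connected triples (= 3 * triangles / triples)."""
    closed = 0   # triples whose two end points are also linked
    triples = 0  # paths of length two centred on some node
    for node, neighbors in adjacency.items():
        for a, b in combinations(sorted(neighbors), 2):
            triples += 1
            if b in adjacency[a]:
                closed += 1
    return closed / triples if triples else 0.0

# Hypothetical toy graph: a triangle A-B-C plus a pendant edge C-D
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(transitivity(adjacency))
```

Each triangle is counted once per corner, which is where the factor of 3 in the usual formula comes from.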

In [44]:
triadic_closure = nx.transitivity(sc_G)
print("Triadic closure for S&C:", triadic_closure)

Triadic closure for S&C: 0.6160527307976547


### Diameter length
Because this graph is not connected, its diameter has to be measured on a single component. The code below finds the largest connected component of the graph, takes it as a "subgraph", and calculates the diameter of that subgraph.
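The same idea can be sketched without networkx: find the connected components with breadth-first search, keep the largest one, and take the longest shortest path inside it. The helper names and the toy adjacency data below are hypothetical, and the sketch assumes a non-empty graph.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def largest_component_diameter(adjacency):
    """Diameter of the largest connected component (graph must be non-empty)."""
    seen, largest = set(), set()
    for node in adjacency:
        if node not in seen:
            component = set(bfs_distances(adjacency, node))
            seen |= component
            if len(component) > len(largest):
                largest = component
    # the diameter is the largest eccentricity among the component's nodes
    return max(max(bfs_distances(adjacency, n).values()) for n in largest)

# Hypothetical toy graph: a path 1-2-3-4 and a separate edge 5-6
adjacency = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}, 5: {6}, 6: {5}}
print(largest_component_diameter(adjacency))  # prints 3
```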

In [45]:
# Get the largest connected component of the graph
components = nx.connected_components(sc_G)
largest_component = max(components, key=len)

# Create a "subgraph" of the largest component and find diameter
subgraph = sc_G.subgraph(largest_component)
diameter = nx.diameter(subgraph)
print("Network diameter of Shakespeare and Co's largest component:", diameter)

Network diameter of Shakespeare and Co's largest component: 5


### Centrality Measures
There are multiple ways to assess centrality in a network. Centrality measures usually try to capture something like significance or importance in a network, but there are different ways to understand importance. This code looks at the following:
- **degree centrality:** the number of edges attached to a node. In S&C, the book with the highest degree is the one that shares readers with the largest number of other books. This is a measure of a type of popularity (*but remember: this isn't the book checked out the most times, it's the book co-read with the most other books*).
- **betweenness centrality:** disregards node degree and instead uses shortest paths to determine the most important nodes. A node scores highly when many shortest paths between other nodes run through it, which picks out nodes that connect otherwise disparate parts of the network.
- **eigenvector centrality:** accounts for whether a node is connected to many other high-degree nodes, which would make it a hub. It can therefore rank a node as highly important even when that node does not have the highest degree itself.
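The distinction between degree and edge weight matters for reading the degree results: degree counts how many different books a title is linked to, while the sum of edge weights counts shared readers. The sketch below contrasts the two on hypothetical data; the `degree_and_strength` helper is an assumption for illustration.

```python
# Hypothetical weighted edges: (book_u, book_v, number of shared readers)
weighted_edges = [
    ("Ulysses", "Dubliners", 5),
    ("Ulysses", "Exiles", 1),
    ("Dubliners", "Exiles", 1),
    ("Dubliners", "Mrs. Dalloway", 1),
]

def degree_and_strength(weighted_edges):
    """Return (degree, strength): neighbor counts and summed edge weights per node."""
    degree, strength = {}, {}
    for u, v, weight in weighted_edges:
        for node in (u, v):
            degree[node] = degree.get(node, 0) + 1
            strength[node] = strength.get(node, 0) + weight
    return degree, strength

degree, strength = degree_and_strength(weighted_edges)
for book in sorted(degree):
    print(f"{book}: degree {degree[book]}, weighted degree {strength[book]}")
```

In networkx, `G.degree()` returns the unweighted count, and `G.degree(weight="weight")` returns the weighted version.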

In [46]:
# degree centrality
sc_degree_dict = dict(sc_G.degree(sc_G.nodes()))
nx.set_node_attributes(sc_G, sc_degree_dict, 'degree')

sc_sorted_degree = sorted(sc_degree_dict.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by degree in S&C:")
for d in sc_sorted_degree[:20]:
    print(d)

Top 20 nodes by degree in S&C:
('The Sun Also Rises by Hemingway, Ernest (1926)', 926)
('A Portrait of the Artist as a Young Man by Joyce, James (1916)', 915)
('Dubliners by Joyce, James (1914)', 898)
('Pointed Roofs (Pilgrimage 1) by Richardson, Dorothy M. (1915)', 881)
('A Farewell to Arms by Hemingway, Ernest (1929)', 872)
('Sanctuary by Faulkner, William (1931)', 862)
('Manhattan Transfer by Dos Passos, John (1925)', 861)
('The Garden Party and Other Stories by Mansfield, Katherine (1922)', 857)
('Eyeless in Gaza by Huxley, Aldous (1936)', 856)
('To the Lighthouse by Woolf, Virginia (1927)', 853)
('Mrs. Dalloway by Woolf, Virginia (1925)', 839)
('Mr. Norris Changes Trains by Isherwood, Christopher (1935)', 836)
('The Citadel by Cronin, A. J. (1937)', 834)
('The Rains Came by Bromfield, Louis (1937)', 826)
('Exiles by Joyce, James (1918)', 822)
('An American Tragedy by Dreiser, Theodore (1925)', 821)
('The Years by Woolf, Virginia (1937)', 816)
('The Waves by Woolf, Virginia (1931)

In [47]:
sc_degree_df = pd.DataFrame.from_dict(sc_degree_dict, orient='index')
sc_degree_df["book_info"] = sc_degree_df.index

In [48]:
sc_degree_df = sc_degree_df.rename(columns={0: "degree_centrality"})

In [49]:
# betweenness centrality
betweenness_dict = nx.betweenness_centrality(sc_G) # Run betweenness centrality
# Assign each to an attribute in your network
nx.set_node_attributes(sc_G, betweenness_dict, 'betweenness')

sorted_betweenness = sorted(betweenness_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 S&C nodes by betweenness centrality:")
for b in sorted_betweenness[:20]:
    print(b)

Top 20 S&C nodes by betweenness centrality:
('A Portrait of the Artist as a Young Man by Joyce, James (1916)', 0.014924835032906583)
('Dubliners by Joyce, James (1914)', 0.010156046367298105)
('Pointed Roofs (Pilgrimage 1) by Richardson, Dorothy M. (1915)', 0.008879411450326712)
('Mr. Norris Changes Trains by Isherwood, Christopher (1935)', 0.00883794691974864)
('The Sun Also Rises by Hemingway, Ernest (1926)', 0.008235339008884771)
('Moby-Dick; Or, the Whale by Melville, Herman (1851)', 0.008034408237163468)
('The Garden Party and Other Stories by Mansfield, Katherine (1922)', 0.008031697753147465)
('Exiles by Joyce, James (1918)', 0.00738764971447236)
('Manhattan Transfer by Dos Passos, John (1925)', 0.007156264192218229)
('Sister Carrie by Dreiser, Theodore (1900)', 0.006408263758941081)
('Bliss and Other Stories by Mansfield, Katherine (1920)', 0.006177860270808386)
('The Way of All Flesh by Butler, Samuel (1903)', 0.006108800015310316)
('Of Human Bondage by Maugham, W. Somerset (1

In [50]:
sc_between_df = pd.DataFrame.from_dict(betweenness_dict, orient='index')
sc_between_df["book_info"] = sc_between_df.index

sc_between_df = sc_between_df.rename(columns={0: "between_centrality"})

In [51]:
# eigenvector centrality
eigenvector_dict = nx.eigenvector_centrality(sc_G) # Run eigenvector centrality
nx.set_node_attributes(sc_G, eigenvector_dict, 'eigenvector')

sorted_eigenvector = sorted(eigenvector_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 S&C nodes by eigenvector centrality:")
for b in sorted_eigenvector[:20]:
    print(b)

Top 20 S&C nodes by eigenvector centrality:
('The Sun Also Rises by Hemingway, Ernest (1926)', 0.055801125995367726)
('Eyeless in Gaza by Huxley, Aldous (1936)', 0.05548865989154352)
('Mrs. Dalloway by Woolf, Virginia (1925)', 0.05516511583917018)
('A Farewell to Arms by Hemingway, Ernest (1929)', 0.055121774229754)
('To the Lighthouse by Woolf, Virginia (1927)', 0.054963118299115946)
('Sanctuary by Faulkner, William (1931)', 0.05472684041384402)
('The Citadel by Cronin, A. J. (1937)', 0.054447706240786826)
('Sparkenbroke by Morgan, Charles (1936)', 0.054380357944984005)
("Axel's Castle: A Study in the Imaginative Literature of 1870 – 1930 by Wilson, Edmund (1931)", 0.054352362709488754)
('The Death of the Heart by Bowen, Elizabeth (1938)', 0.05426823855787121)
('The Waves by Woolf, Virginia (1931)', 0.05420492470624962)
('The Rains Came by Bromfield, Louis (1937)', 0.05416759259919326)
('An American Tragedy by Dreiser, Theodore (1925)', 0.05413418327159532)
('Manhattan Transfer by Dos

In [52]:
sc_eigenvector_df = pd.DataFrame.from_dict(eigenvector_dict, orient='index')
sc_eigenvector_df["book_info"] = sc_eigenvector_df.index

In [53]:
sc_eigenvector_df = sc_eigenvector_df.rename(columns={0: "eigenvector_centrality"})

### Combine all

In [142]:
full_sc_df = full_sc_df.merge(sc_degree_df, how='inner', on='book_info').merge(sc_between_df, how='inner', on='book_info').merge(sc_eigenvector_df, how='inner', on='book_info')

In [143]:
full_sc_df.head(10)

Unnamed: 0,vertex,book_info,layer,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality
0,1292,"1914 and Other Poems by Brooke, Rupert (1915)",3,1,0.026667,9,0.0,0.000335
1,966,"1919 by Dos Passos, John (1932)",0,0,0.913333,627,0.001662,0.045174
2,128,"A Backward Glance by Wharton, Edith (1934)",2,1,0.34,235,0.000264,0.015205
3,1044,"A Book of Nonsense by Lear, Edward (1846)",3,1,0.006667,13,0.0,0.000778
4,1088,"A Child's Garden of Verses by Stevenson, Rober...",2,1,0.253333,119,0.0,0.00794
5,175,"A Christmas Garland by Beerbohm, Max (1912)",3,1,0.026667,97,3.1e-05,0.006165
6,914,"A City of Bells by Goudge, Elizabeth (1936)",1,0,0.62,336,0.000219,0.026609
7,286,A Connecticut Yankee in King Arthur's Court by...,3,1,0.0,11,0.0,0.000819
8,841,"A Crystal Age by Hudson, W. H. (1887)",2,1,0.32,188,0.000135,0.011854
9,322,"A Daughter of the Samurai by Sugimoto, Etsuko ...",1,0,0.646667,391,0.000215,0.030789


In [144]:
full_sc_df.to_csv("shakespeare-co-core-periphery.csv")

## Model Selection
This section needs further work: how meaningful is either of these models?


In [145]:
from core_periphery_sbm import model_fit as mf

# Get description length of hub-and-spoke model
inf_labels_hs = hubspoke.get_labels(last_n_samples=50, prob=False, return_dict=False)
mdl_hubspoke = mf.mdl_hubspoke(sc_G, inf_labels_hs, n_samples=100000)

# Get the description length of layered model
# NOTE: n_layers=5 here, while `layered` was fit with n_layers=4 above; these would normally match
inf_labels_l = layered.get_labels(last_n_samples=50, prob=False, return_dict=False)
mdl_layered = mf.mdl_layered(sc_G, inf_labels_l, n_layers=5, n_samples=100000)

In [146]:
print("Description length of hub-and-spoke model: " + str(mdl_hubspoke))
print("Description length of layered model: " + str(mdl_layered))

Description length of hub-and-spoke model: 364400.60082032176
Description length of layered model: 312625.85650291905


So the layered model is the better fit, since it has the shorter description length. **BUT** this still lacks an assessment of how meaningful that goodness of fit is.
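The comparison itself is just a minimum over description lengths. The sketch below picks the preferred model from the two values printed above and reports the size of the gap; it says nothing about whether that gap is meaningful, which remains the open question.

```python
# Description lengths copied from the output above
description_lengths = {
    "hub_and_spoke": 364400.60082032176,
    "layered": 312625.85650291905,
}

# The model with the shortest description length is preferred
best_model = min(description_lengths, key=description_lengths.get)
longest = max(description_lengths.values())
gap = longest - description_lengths[best_model]
relative_saving = gap / longest

print(f"Preferred model: {best_model}")
print(f"Description length gap: {gap:,.1f} ({relative_saving:.1%} shorter)")
```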

## Goodreads Core-Periphery Analysis
Note: this might not be the best way to compare the two networks, since they represent very different types of readership and some books are missing from the Goodreads data.

In [59]:
# Create Goodreads graph
gr_G = nx.Graph()
gr_G.add_weighted_edges_from(gr_weights_list)

In [60]:
# optional: change vertex ids to book names
gr_dict = {value:key for key, value in gr_book_to_vertex_index.items()}
gr_mapping = gr_dict # Dictionary from id to title
gr_G = nx.relabel_nodes(gr_G, gr_mapping)

### Core-Periphery Structure

In [61]:
# Initialize hub-and-spoke model and infer structure
gr_hubspoke = cp.HubSpokeCorePeriphery(n_gibbs=100, n_mcmc=10*len(gr_G))
gr_hubspoke.infer(gr_G)

In [62]:
gr_layered = cp.LayeredCorePeriphery(n_layers=3, n_gibbs=100, n_mcmc=10*len(gr_G))
gr_layered.infer(gr_G)

In [63]:
# Get core and periphery assignments from hub-and-spoke model
gr_node2label_hs = gr_hubspoke.get_labels(last_n_samples=50)

# Get layer assignments from the layered model
gr_node2label_l = gr_layered.get_labels(last_n_samples=50)

In [64]:
gr_hub_spoke_df = pd.DataFrame.from_dict(gr_node2label_hs, orient='index')

In [65]:
gr_layered_df = pd.DataFrame.from_dict(gr_node2label_l, orient='index')

**Core-Periphery Label:** For both models (hub-and-spoke and layered) the core is 0; for the layered model, the further away from 0 the more peripheral the node. 

In [66]:
# Number of nodes in periphery vs. core in hub-and-spoke
Counter(gr_node2label_hs.values())

Counter({1: 1102, 0: 409})

In [67]:
# Number of nodes in each layer, from core (0) to periphery
Counter(gr_node2label_l.values())

Counter({2: 834, 1: 352, 0: 325})

In [68]:
for book, label in gr_node2label_hs.items():
    if label == 0:
        print(book)

A Book of Nonsense by Edward Lear (1992)
A Child's Garden of Verses by Robert Louis Stevenson, Tasha Tudor (1999)
A Connecticut Yankee in King Arthur's Court by Mark Twain (2007)
A Doll's House by Henrik Ibsen, Michael   Meyer (2007)
A Farewell to Arms by Ernest Hemingway, njf drybndry (2004)
A Handful of Dust by Evelyn Waugh (1977)
A High Wind in Jamaica by Richard Hughes, Francine Prose (1999)
A Journal of the Plague Year by Daniel Defoe, Cynthia Sundberg Wall (2003)
A Lost Lady by Willa Cather, A.S. Byatt (2006)
A Midsummer Night's Dream by William Shakespeare, Barbara A. Mowat, Paul Werstine, Catherine Belsey (2016)
A Pair of Blue Eyes by Thomas Hardy, Alan Manford, Tim Dolin (2005)
A Portrait of the Artist as a Young Man by James Joyce, Seamus Deane (2003)
A Room of One's Own by Virginia Woolf (2002)
A Room with a View by E.M. Forster (2005)
A Shropshire Lad by A.E. Housman (1990)
A Tale of Two Cities by Charles Dickens, Richard Maxwell, Hablot Knight Browne (2003)
A Tale of a Tub

**Probability** that a node is in the core vs. periphery.

In [69]:
#  Dictionary of node -> ordered array of probabilities
gr_node2probs_hs = gr_hubspoke.get_labels(last_n_samples=50, prob=True, return_dict=True)

# n_nodes x n_layers array of probabilities
gr_inf_probs_l = gr_layered.get_labels(last_n_samples=50, prob=True, return_dict=False)

In [70]:
gr_hs_probs_df = pd.DataFrame.from_dict(gr_node2probs_hs, orient='index')

**Coreness**, where the closer to 1 the more core; closer to 0 the more peripheral.

In [71]:
# Dictionary of node -> coreness
gr_node2coreness_hs = gr_hubspoke.get_coreness(last_n_samples=50, return_dict=True)
gr_node2coreness_l = gr_layered.get_coreness(last_n_samples=50, return_dict=True)

In [72]:
gr_layered_coreness_df = pd.DataFrame.from_dict(gr_node2coreness_l, orient='index')

In [73]:
# all books with a coreness of exactly 1 in the hub-and-spoke model
gr_most_core = []
for book, coreness in gr_node2coreness_hs.items():
    if coreness == 1:
        gr_most_core.append(book)

## Save as CSV

In [74]:
gr_layered_df['book_info'] = gr_layered_df.index
gr_layered_df = gr_layered_df.rename(columns={0: "layer"})

In [75]:
gr_hub_spoke_df['book_info'] = gr_hub_spoke_df.index
gr_hub_spoke_df = gr_hub_spoke_df.rename(columns={0: "hub_and_spoke"})

In [76]:
gr_layered_coreness_df['book_info'] = gr_layered_coreness_df.index
gr_layered_coreness_df = gr_layered_coreness_df.rename(columns={0: "coreness"})

In [77]:
gr_layered_df

Unnamed: 0,layer,book_info
"1914, and Other Poems by Rupert Brooke ()",2,"1914, and Other Poems by Rupert Brooke ()"
"1919 (U.S.A., #2) by John dos Passos, E.L. Doctorow (2000)",1,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc..."
"A Backward Glance by Edith Wharton, Louis Auchincloss (1998)",2,"A Backward Glance by Edith Wharton, Louis Auch..."
A Book of Nonsense by Edward Lear (1992),1,A Book of Nonsense by Edward Lear (1992)
"A Child's Garden of Verses by Robert Louis Stevenson, Tasha Tudor (1999)",1,A Child's Garden of Verses by Robert Louis Ste...
...,...,...
"You Have Seen Their Faces by Erskine Caldwell, Margaret Bourke-White, Alan Trachtenberg (1995)",2,"You Have Seen Their Faces by Erskine Caldwell,..."
"Young Man with a Horn by Dorothy Baker, Gary Giddins (2012)",1,"Young Man with a Horn by Dorothy Baker, Gary G..."
Young men in Love by Michael Arlen (1927),2,Young men in Love by Michael Arlen (1927)
Zola and His Time: the History of His Martial Career in Letters by Matthew Josephson (),2,Zola and His Time: the History of His Martial...


In [78]:
full_gr_df = gr_layered_df.merge(gr_hub_spoke_df, how='inner', on='book_info').merge(gr_layered_coreness_df, how='inner', on='book_info')

In [79]:
full_gr_df

Unnamed: 0,layer,book_info,hub_and_spoke,coreness
0,2,"1914, and Other Poems by Rupert Brooke ()",1,0.02
1,1,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc...",1,0.51
2,2,"A Backward Glance by Edith Wharton, Louis Auch...",1,0.02
3,1,A Book of Nonsense by Edward Lear (1992),0,0.51
4,1,A Child's Garden of Verses by Robert Louis Ste...,0,0.50
...,...,...,...,...
1506,2,"You Have Seen Their Faces by Erskine Caldwell,...",1,0.01
1507,1,"Young Man with a Horn by Dorothy Baker, Gary G...",1,0.40
1508,2,Young men in Love by Michael Arlen (1927),1,0.00
1509,2,Zola and His Time: the History of His Martial...,1,0.01


In [80]:
#full_gr_df = full_gr_df[["book_info", "layer", "hub_and_spoke", "coreness"]]

In [81]:
# vertex
#gr_vertex_title = pd.DataFrame.from_dict(gr_dict, orient = "index")

In [82]:
#gr_vertex_title

## Other metrics

### Density

In [83]:
gr_density = nx.density(gr_G)
print("Goodreads Network Density:", gr_density)

Goodreads Network Density: 0.08648805010496974


### Transitivity


In [84]:
gr_triadic_closure = nx.transitivity(gr_G)
print("Triadic closure for Goodreads:", gr_triadic_closure)

Triadic closure for Goodreads: 0.5012635713393944


### Diameter length

In [85]:
# Get the largest connected component of the graph
components = nx.connected_components(gr_G)
largest_component = max(components, key=len)

# Create a "subgraph" of the largest component and find diameter
subgraph = gr_G.subgraph(largest_component)
diameter = nx.diameter(subgraph)
print("Network diameter of Goodreads' largest component:", diameter)

Network diameter of Goodreads' largest component: 5


### Centrality Measures

In [86]:
# degree centrality
gr_degree_dict = dict(gr_G.degree(gr_G.nodes()))
nx.set_node_attributes(gr_G, gr_degree_dict, 'degree')

gr_sorted_degree = sorted(gr_degree_dict.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by degree in Goodreads:")
for d in gr_sorted_degree[:20]:
    print(d)

Top 20 nodes by degree in Goodreads:
('The Great Gatsby by F. Scott Fitzgerald (2004)', 814)
('Pride and Prejudice by Jane Austen, Anna Quindlen, Margaret Oliphant, George Saintsbury, Mark Twain, A.C. Bradley, Walter A. Raleigh, Virginia Woolf (2000)', 762)
('Jane Eyre by Charlotte Bronte, Michael Mason ()', 745)
('Brave New World by Aldous Huxley (1998)', 729)
('Wuthering Heights by Emily Bronte, Richard J. Dunn (2002)', 719)
('A Tale of Two Cities by Charles Dickens, Richard Maxwell, Hablot Knight Browne (2003)', 708)
('Dracula by Bram Stoker, Nina Auerbach, David J. Skal (1986)', 700)
('Of Mice and Men by John Steinbeck ()', 685)
('Persuasion by Jane Austen, James Kinsley, Deidre Shauna Lynch (2004)', 684)
('Emma by Jane Austen, Fiona Stafford (2003)', 679)
('The Woman in White by Wilkie Collins, Matthew Sweet (2003)', 677)
('Great Expectations by Charles Dickens (1998)', 676)
('Moby-Dick or, The Whale by Herman Melville, Andrew Delbanco, Tom Quirk (2003)', 672)
('The Adventures of

In [87]:
gr_degree_df = pd.DataFrame.from_dict(gr_degree_dict, orient='index')
gr_degree_df["book_info"] = gr_degree_df.index

In [88]:
gr_degree_df = gr_degree_df.rename(columns={0: "degree_centrality"})

In [89]:
# betweenness centrality
gr_betweenness_dict = nx.betweenness_centrality(gr_G) # Run betweenness centrality
# Assign each to an attribute in your network
# use the Goodreads betweenness values (betweenness_dict holds the S&C values)
nx.set_node_attributes(gr_G, gr_betweenness_dict, 'betweenness')

gr_sorted_betweenness = sorted(gr_betweenness_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 GR nodes by betweenness centrality:")
for b in gr_sorted_betweenness[:20]:
    print(b)

Top 20 GR nodes by betweenness centrality:
('The Great Gatsby by F. Scott Fitzgerald (2004)', 0.026349765113623108)
('Brave New World by Aldous Huxley (1998)', 0.0207972017595387)
('Jane Eyre by Charlotte Bronte, Michael Mason ()', 0.020786992815272843)
('Pride and Prejudice by Jane Austen, Anna Quindlen, Margaret Oliphant, George Saintsbury, Mark Twain, A.C. Bradley, Walter A. Raleigh, Virginia Woolf (2000)', 0.019857928605239442)
('Wuthering Heights by Emily Bronte, Richard J. Dunn (2002)', 0.01533441195414627)
('The Woman in White by Wilkie Collins, Matthew Sweet (2003)', 0.013526254340947393)
('Moby-Dick or, The Whale by Herman Melville, Andrew Delbanco, Tom Quirk (2003)', 0.013186658267537328)
('Dracula by Bram Stoker, Nina Auerbach, David J. Skal (1986)', 0.011886426475003497)
('Emma by Jane Austen, Fiona Stafford (2003)', 0.011852531876116791)
('Of Mice and Men by John Steinbeck ()', 0.011745558324084242)
('A Tale of Two Cities by Charles Dickens, Richard Maxwell, Hablot Knight 

In [90]:
gr_between_df = pd.DataFrame.from_dict(gr_betweenness_dict, orient='index')
gr_between_df["book_info"] = gr_between_df.index

gr_between_df = gr_between_df.rename(columns={0: "between_centrality"})


In [91]:
# eigenvector centrality
gr_eigenvector_dict = nx.eigenvector_centrality(gr_G) # Run eigenvector centrality
nx.set_node_attributes(gr_G, gr_eigenvector_dict, 'eigenvector')

gr_sorted_eigenvector = sorted(gr_eigenvector_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 GR nodes by eigenvector centrality:")
for b in gr_sorted_eigenvector[:20]:
    print(b)

Top 20 GR nodes by eigenvector centrality:
('The Great Gatsby by F. Scott Fitzgerald (2004)', 0.07141580387029405)
('Pride and Prejudice by Jane Austen, Anna Quindlen, Margaret Oliphant, George Saintsbury, Mark Twain, A.C. Bradley, Walter A. Raleigh, Virginia Woolf (2000)', 0.06929307934468357)
('A Tale of Two Cities by Charles Dickens, Richard Maxwell, Hablot Knight Browne (2003)', 0.06899208344284151)
('Jane Eyre by Charlotte Bronte, Michael Mason ()', 0.06843162842427981)
('Dracula by Bram Stoker, Nina Auerbach, David J. Skal (1986)', 0.06838168120414144)
('Wuthering Heights by Emily Bronte, Richard J. Dunn (2002)', 0.06829744469288702)
('Of Mice and Men by John Steinbeck ()', 0.06817852081636962)
('Brave New World by Aldous Huxley (1998)', 0.06816184933600872)
('Persuasion by Jane Austen, James Kinsley, Deidre Shauna Lynch (2004)', 0.06774763454628116)
('Emma by Jane Austen, Fiona Stafford (2003)', 0.0675731959074297)
('The Adventures of Huckleberry Finn by Mark Twain, John Seelye,

In [92]:
gr_eigenvector_df = pd.DataFrame.from_dict(gr_eigenvector_dict, orient='index')
gr_eigenvector_df["book_info"] = gr_eigenvector_df.index

In [93]:
gr_eigenvector_df = gr_eigenvector_df.rename(columns={0: "eigenvector_centrality"})
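
The betweenness and eigenvector rankings above overlap heavily but are not identical, because the two measures reward different things. As a minimal sketch on a toy graph (not the notebook's data), the example below builds two triangles joined through a single bridge node with `nx.barbell_graph`. The names `toy_G`, `toy_betweenness`, and `toy_eigenvector` are illustrative assumptions, not objects defined elsewhere in this notebook.

```python
import networkx as nx

# Toy graph: nodes 0-2 form one triangle, nodes 4-6 form another,
# and node 3 is the only bridge between them.
toy_G = nx.barbell_graph(3, 1)

# Betweenness rewards nodes that sit on many shortest paths.
toy_betweenness = nx.betweenness_centrality(toy_G)

# Eigenvector centrality rewards nodes whose neighbors are themselves well connected.
toy_eigenvector = nx.eigenvector_centrality(toy_G, max_iter=1000)

top_between = max(toy_betweenness, key=toy_betweenness.get)
print("Node with highest betweenness:", top_between)

for node in sorted(toy_G.nodes()):
    print(node, round(toy_betweenness[node], 3), round(toy_eigenvector[node], 3))
```

In this toy graph the bridge node has the highest betweenness, while the triangle nodes adjacent to it score higher on eigenvector centrality. A similar logic may explain why a book can rank differently in the two lists above.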

### Combine all Goodreads measures

In [94]:
full_gr_df = (
    full_gr_df
    .merge(gr_degree_df, how='inner', on='book_info')
    .merge(gr_between_df, how='inner', on='book_info')
    .merge(gr_eigenvector_df, how='inner', on='book_info')
)
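
As an alternative to building one DataFrame per measure and merging them, the centrality dictionaries could be combined in a single step. This is only a sketch: the three `toy_*` dictionaries below are hypothetical stand-ins for dictionaries such as `gr_betweenness_dict` and `gr_eigenvector_dict`, and it assumes each dictionary is keyed by the same book labels.

```python
import pandas as pd

# Hypothetical stand-ins for the notebook's centrality dictionaries (book label -> score).
toy_degree = {"Book A": 3, "Book B": 1, "Book C": 2}
toy_between = {"Book A": 0.5, "Book B": 0.0, "Book C": 0.1}
toy_eigen = {"Book A": 0.7, "Book B": 0.2, "Book C": 0.5}

# A dict of dicts becomes one column per measure, aligned on the shared keys.
toy_centrality_df = (
    pd.DataFrame({
        "degree_centrality": toy_degree,
        "between_centrality": toy_between,
        "eigenvector_centrality": toy_eigen,
    })
    .rename_axis("book_info")  # name the index so it becomes a proper column
    .reset_index()
)

print(toy_centrality_df)
```

The resulting frame has one row per book and a `book_info` column, so it could be merged onto a table like `full_gr_df` with a single `merge` call.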

In [95]:
full_gr_df.head(10)

Unnamed: 0,layer,book_info,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality
0,2,"1914, and Other Poems by Rupert Brooke ()",1,0.02,54,1.111448e-05,0.00819
1,1,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc...",1,0.51,153,0.0002721852,0.023052
2,2,"A Backward Glance by Edith Wharton, Louis Auch...",1,0.02,51,2.118215e-05,0.006766
3,1,A Book of Nonsense by Edward Lear (1992),0,0.51,178,3.007401e-05,0.027494
4,1,A Child's Garden of Verses by Robert Louis Ste...,0,0.5,179,0.0001772467,0.026992
5,2,"A Christmas Garland by Max Beerbohm, N. John H...",1,0.01,16,0.0,0.002791
6,1,"A City of Bells (Torminster, #1) by Elizabeth ...",1,0.5,110,5.525246e-05,0.016308
7,0,A Connecticut Yankee in King Arthur's Court by...,0,0.99,469,0.002955074,0.055751
8,2,A Crystal Age by William Henry Hudson (),1,0.04,16,2.139296e-07,0.002547
9,2,A Daughter of the Samurai by Etsu Inagaki Sugi...,1,0.01,17,1.824668e-06,0.002869


In [96]:
full_gr_df.to_csv("goodreads-core-periphery.csv")

## Combine major networks

In [97]:
import json

In [98]:
# Load the mapping from Goodreads text labels (keys) to S&C text labels (values)
with open("data/goodreads-text-to-sc-text.json") as json_file:
    text_to_text = json.load(json_file)

#text_to_text = pd.DataFrame.from_dict(data, orient = 'index').reset_index()
#full_sc_name_df = full_sc_name_df.merge(text_to_text, on = 'book_info')

In [99]:
#complete_cp_df = full_gr_df.merge(full_sc_df, how='inner', on='book_info')

text_to_text = pd.DataFrame.from_dict(text_to_text, orient = 'index').reset_index()
text_to_text = text_to_text.rename(columns = {0:"sc_info", "index":"book_info"})

In [100]:
text_to_text

Unnamed: 0,book_info,sc_info
0,Le Morte d'Arthur: King Arthur and the Legends...,"Le Morte d'Arthur by Malory, Thomas (1485)"
1,"Utopia by Thomas More, Paul Turner (2003)","Utopia by More, Thomas (1516)"
2,"Gorboduc by Thomas Sackville, Thomas Norton, I...","Gorboduc by Norton, Thomas (1561)"
3,A Midsummer Night's Dream by William Shakespea...,"A Midsummer's Night's Dream by Shakespeare, Wi..."
4,"Love's Labour's Lost by William Shakespeare, H...","Love's Labour's Lost by Shakespeare, William (..."
...,...,...
1506,"The Republic by Plato, Desmond Lee (2003)",The Republic by Plato
1507,"The Spanish Tragedy by Thomas Kyd, John Matthe...","The Spanish Tragedy by Kyd, Thomas"
1508,"The Tale of Genji by Murasaki Shikibu, Royall ...",The Tale of Genji by Murasaki Shikibu
1509,The Tempest by William Shakespeare (),"The Tempest by Shakespeare, William"


In [101]:
full_sc_df.head()

Unnamed: 0,vertex,book_info,layer,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality
0,1292,"1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.035,9,0.0,0.000335
1,966,"1919 by Dos Passos, John (1932)",1,0,0.77,627,0.001662,0.045174
2,128,"A Backward Glance by Wharton, Edith (1934)",3,1,0.275,235,0.000264,0.015205
3,1044,"A Book of Nonsense by Lear, Edward (1846)",4,1,0.05,13,0.0,0.000778
4,1088,"A Child's Garden of Verses by Stevenson, Rober...",3,1,0.225,119,0.0,0.00794


In [102]:
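# attach the Goodreads label to each S&C row, drop the duplicate S&C label (book_info_x),
# and keep the Goodreads label (book_info_y) as gr_info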
sc_data = full_sc_df.merge(text_to_text, left_on = "book_info", right_on = "sc_info")
sc_data = sc_data.drop(['book_info_x'], axis=1)
sc_data = sc_data.rename(columns={"book_info_y":"gr_info"})

In [103]:
sc_data.head()

Unnamed: 0,vertex,layer,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality,gr_info,sc_info
0,1292,4,1,0.035,9,0.0,0.000335,"1914, and Other Poems by Rupert Brooke ()","1914 and Other Poems by Brooke, Rupert (1915)"
1,966,1,0,0.77,627,0.001662,0.045174,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc...","1919 by Dos Passos, John (1932)"
2,128,3,1,0.275,235,0.000264,0.015205,"A Backward Glance by Edith Wharton, Louis Auch...","A Backward Glance by Wharton, Edith (1934)"
3,1044,4,1,0.05,13,0.0,0.000778,A Book of Nonsense by Edward Lear (1992),"A Book of Nonsense by Lear, Edward (1846)"
4,1088,3,1,0.225,119,0.0,0.00794,A Child's Garden of Verses by Robert Louis Ste...,"A Child's Garden of Verses by Stevenson, Rober..."


In [104]:
# Merge sc_data with full_gr_df
combined_df = sc_data.merge(full_gr_df, left_on = "gr_info", right_on = "book_info")

In [105]:
combined_df = combined_df.rename(columns={
    "layer_x": "sc_layer",
    "hub_and_spoke_x": "sc_hub_and_spoke",
    "coreness_x": "sc_coreness",
    "degree_centrality_x": "sc_degree_centrality",
    "between_centrality_x": "sc_between_centrality",
    "eigenvector_centrality_x": "sc_eigenvector_centrality",
    "layer_y": "gr_layer",
    "hub_and_spoke_y": "gr_hub_and_spoke",
    "coreness_y": "gr_coreness",
    "degree_centrality_y": "gr_degree_centrality",
    "between_centrality_y": "gr_between_centrality",
    "eigenvector_centrality_y": "gr_eigenvector_centrality",
})

In [106]:
combined_df = combined_df[[
    "vertex", "gr_info", "sc_info",
    "sc_layer", "sc_hub_and_spoke", "sc_coreness",
    "sc_degree_centrality", "sc_between_centrality", "sc_eigenvector_centrality",
    "gr_layer", "gr_hub_and_spoke", "gr_coreness",
    "gr_degree_centrality", "gr_between_centrality", "gr_eigenvector_centrality",
]]

In [107]:
combined_df.head()

Unnamed: 0,vertex,gr_info,sc_info,sc_layer,sc_hub_and_spoke,sc_coreness,sc_degree_centrality,sc_between_centrality,sc_eigenvector_centrality,gr_layer,gr_hub_and_spoke,gr_coreness,gr_degree_centrality,gr_between_centrality,gr_eigenvector_centrality
0,1292,"1914, and Other Poems by Rupert Brooke ()","1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.035,9,0.0,0.000335,2,1,0.02,54,1.1e-05,0.00819
1,966,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc...","1919 by Dos Passos, John (1932)",1,0,0.77,627,0.001662,0.045174,1,1,0.51,153,0.000272,0.023052
2,128,"A Backward Glance by Edith Wharton, Louis Auch...","A Backward Glance by Wharton, Edith (1934)",3,1,0.275,235,0.000264,0.015205,2,1,0.02,51,2.1e-05,0.006766
3,1044,A Book of Nonsense by Edward Lear (1992),"A Book of Nonsense by Lear, Edward (1846)",4,1,0.05,13,0.0,0.000778,1,0,0.51,178,3e-05,0.027494
4,1088,A Child's Garden of Verses by Robert Louis Ste...,"A Child's Garden of Verses by Stevenson, Rober...",3,1,0.225,119,0.0,0.00794,1,0,0.5,179,0.000177,0.026992


In [108]:
combined_df.to_csv("combined-gr-sc-core-periphery.csv")

## Merge with Goodreads Ratings

In [109]:
# load the mapping from Goodreads book ID to Goodreads text label (matched against gr_info below)
with open("data/goodreads-book-id-to-text.json") as json_file:
    data = json.load(json_file)

goodreads_ids = pd.DataFrame.from_dict(data, orient = 'index').reset_index()
goodreads_ids = goodreads_ids.rename(columns = {0:"book_name", "index":"goodreads_id"})

In [110]:
# Merging goodreads_ids with combined dataframe
goodreads_ids_core_periphery = combined_df.merge(goodreads_ids, left_on="gr_info", right_on = "book_name")

In [111]:
goodreads_ids_core_periphery.head()

Unnamed: 0,vertex,gr_info,sc_info,sc_layer,sc_hub_and_spoke,sc_coreness,sc_degree_centrality,sc_between_centrality,sc_eigenvector_centrality,gr_layer,gr_hub_and_spoke,gr_coreness,gr_degree_centrality,gr_between_centrality,gr_eigenvector_centrality,goodreads_id,book_name
0,1292,"1914, and Other Poems by Rupert Brooke ()","1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.035,9,0.0,0.000335,2,1,0.02,54,1.1e-05,0.00819,9857591,"1914, and Other Poems by Rupert Brooke ()"
1,966,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc...","1919 by Dos Passos, John (1932)",1,0,0.77,627,0.001662,0.045174,1,1,0.51,153,0.000272,0.023052,7104,"1919 (U.S.A., #2) by John dos Passos, E.L. Doc..."
2,128,"A Backward Glance by Edith Wharton, Louis Auch...","A Backward Glance by Wharton, Edith (1934)",3,1,0.275,235,0.000264,0.015205,2,1,0.02,51,2.1e-05,0.006766,5261,"A Backward Glance by Edith Wharton, Louis Auch..."
3,1044,A Book of Nonsense by Edward Lear (1992),"A Book of Nonsense by Lear, Edward (1846)",4,1,0.05,13,0.0,0.000778,1,0,0.51,178,3e-05,0.027494,868668,A Book of Nonsense by Edward Lear (1992)
4,1088,A Child's Garden of Verses by Robert Louis Ste...,"A Child's Garden of Verses by Stevenson, Rober...",3,1,0.225,119,0.0,0.00794,1,0,0.5,179,0.000177,0.026992,20413,A Child's Garden of Verses by Robert Louis Ste...


In [112]:
# import Goodreads review data, drop rows without a Result ID, and convert the ID to a string
# (via int, so that an ID parsed as 5261.0 becomes "5261" and matches the string-valued goodreads_id)
goodreads_ratings_reviews = pd.read_csv("goodreads_query_results_filtered.csv")
# .copy() avoids a SettingWithCopyWarning when the column is reassigned below
goodreads_ratings_reviews = goodreads_ratings_reviews[goodreads_ratings_reviews['Result ID'].notna()].copy()
goodreads_ratings_reviews["Result ID"] = goodreads_ratings_reviews["Result ID"].astype(int).astype(str)
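
The detour through `int` in the cell above matters because a column with missing values is typically read as floats, and converting a float straight to a string keeps the trailing `.0`. The sketch below shows the difference on a toy frame. `toy_ratings` and its values are made up for illustration and only mimic the shape of the `Result ID` column.

```python
import pandas as pd

# Toy column with one missing ID; the NaN forces the column to float64.
toy_ratings = pd.DataFrame({"Result ID": [9857591.0, float("nan"), 5261.0]})
toy_ratings = toy_ratings[toy_ratings["Result ID"].notna()].copy()

# Converting floats directly to strings keeps the decimal part.
as_str_directly = toy_ratings["Result ID"].astype(str).tolist()

# Converting to int first gives clean ID strings.
as_str_via_int = toy_ratings["Result ID"].astype(int).astype(str).tolist()

print(as_str_directly)
print(as_str_via_int)
```

Only the second form would match string IDs such as the JSON keys loaded into `goodreads_ids`, since JSON object keys are always strings.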

In [113]:
goodreads_ratings_core_periphery = goodreads_ids_core_periphery.merge(goodreads_ratings_reviews, left_on = "goodreads_id", right_on = "Result ID")

In [114]:
goodreads_ratings_core_periphery.head()

Unnamed: 0,vertex,gr_info,sc_info,sc_layer,sc_hub_and_spoke,sc_coreness,sc_degree_centrality,sc_between_centrality,sc_eigenvector_centrality,gr_layer,...,Query Title,Query Author,Query ID,Result ID,Result Title,Result Author,Ratings Count,Text Reviews Count,Original Publication Year,Average Rating
0,1292,"1914, and Other Poems by Rupert Brooke ()","1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.035,9,0.0,0.000335,2,...,1914 and Other Poems,Rupert Brooke,https://shakespeareandco.princeton.edu/books/b...,9857591,"1914, and Other Poems",Rupert Brooke,196.0,15.0,1915,3.88
1,128,"A Backward Glance by Edith Wharton, Louis Auch...","A Backward Glance by Wharton, Edith (1934)",3,1,0.275,235,0.000264,0.015205,2,...,A Backward Glance,Edith Wharton,https://shakespeareandco.princeton.edu/books/w...,5261,A Backward Glance,Edith Wharton,634.0,89.0,1934,3.75
2,1044,A Book of Nonsense by Edward Lear (1992),"A Book of Nonsense by Lear, Edward (1846)",4,1,0.05,13,0.0,0.000778,1,...,A Book of Nonsense,Edward Lear,https://shakespeareandco.princeton.edu/books/l...,868668,A Book of Nonsense,Edward Lear,2040.0,194.0,1846,3.64
3,1088,A Child's Garden of Verses by Robert Louis Ste...,"A Child's Garden of Verses by Stevenson, Rober...",3,1,0.225,119,0.0,0.00794,1,...,A Child's Garden of Verses,Robert Louis Stevenson,https://shakespeareandco.princeton.edu/books/s...,20413,A Child's Garden of Verses,Robert Louis Stevenson,26058.0,651.0,1885,4.25
4,175,"A Christmas Garland by Max Beerbohm, N. John H...","A Christmas Garland by Beerbohm, Max (1912)",4,1,0.035,97,3.1e-05,0.006165,2,...,A Christmas Garland,Max Beerbohm,https://shakespeareandco.princeton.edu/books/b...,36374,A Christmas Garland,Max Beerbohm,47.0,8.0,1912,3.81


In [115]:
goodreads_ratings_core_periphery.to_csv("core-periphery_goodreads_ratings.csv")
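
Because the merges above are inner joins on string labels and IDs, rows without a match disappear silently. The `head()` outputs suggest this happens at least once: the `1919` row is present before the ratings merge and absent after it. One way to audit such losses, sketched here on toy data, is a left join with pandas' `indicator=True` option. The frames `toy_books` and `toy_ratings` are hypothetical and only echo a few values shown in the outputs above.

```python
import pandas as pd

# Toy stand-ins for goodreads_ids_core_periphery and goodreads_ratings_reviews.
toy_books = pd.DataFrame({
    "goodreads_id": ["9857591", "7104", "5261"],
    "gr_info": ["1914, and Other Poems", "1919 (U.S.A., #2)", "A Backward Glance"],
})
toy_ratings = pd.DataFrame({
    "Result ID": ["9857591", "5261"],
    "Average Rating": [3.88, 3.75],
})

# A left join keeps every book; the _merge column records where each row was found.
audit = toy_books.merge(
    toy_ratings,
    how="left",
    left_on="goodreads_id",
    right_on="Result ID",
    indicator=True,
)

# Rows marked left_only had no ratings match and would be dropped by an inner join.
unmatched = audit[audit["_merge"] == "left_only"]
print(unmatched[["goodreads_id", "gr_info"]])
```

Running the same check on the real frames would list every book that is lost at each merge step, which may be worth reporting alongside the exported CSV files.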

### Future work
- Consider allowing multiple cores ([Xiao Zhang, Travis Martin, and M. E. J. Newman. 2015. “Identification of core-periphery structure in networks.” *Physical Review E* 91](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.91.032803))
- Statistics about the core:
    - density of the core
    - relative size of the core
- Examine the four quadrants of connections: core -> core, core -> periphery, periphery -> periphery, periphery -> core
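
The core statistics listed above could be computed along the following lines. This is a minimal sketch under stated assumptions: `toy_G` is a made-up graph and `toy_core` is a hypothetical set of core nodes, whereas in this notebook the core membership would come from the core-periphery layer assignments. Because the graphs here are undirected, core -> periphery and periphery -> core collapse into a single count.

```python
import networkx as nx

# Toy graph: a triangle (0, 1, 2) with a few peripheral nodes attached.
toy_G = nx.Graph([(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (3, 5)])
toy_core = {0, 1, 2}  # hypothetical core assignment

# Density of the core: share of possible core-core edges that are present.
core_density = nx.density(toy_G.subgraph(toy_core))

# Relative size of the core: share of all nodes that belong to the core.
relative_core_size = len(toy_core) / toy_G.number_of_nodes()

# Count edges by the membership of their two endpoints.
edge_counts = {"core-core": 0, "core-periphery": 0, "periphery-periphery": 0}
for u, v in toy_G.edges():
    endpoints_in_core = (u in toy_core) + (v in toy_core)
    if endpoints_in_core == 2:
        edge_counts["core-core"] += 1
    elif endpoints_in_core == 1:
        edge_counts["core-periphery"] += 1
    else:
        edge_counts["periphery-periphery"] += 1

print(core_density, relative_core_size, edge_counts)
```

For weighted graphs like the ones in this notebook, the same loop could sum edge weights instead of counting edges.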

## Other basic network measures
Many of these measures are drawn from Ladd et al.'s ["Exploring and Analyzing Network Data with Python"](https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python). Each measure is applied only to the Shakespeare and Co dataset here, but could just as easily be applied to Goodreads.