## Network Analysis of Shakespeare and Company and Goodreads

This notebook performs a network analysis of readership in two networks: one of Shakespeare and Company borrowers and one of Goodreads reviewers. In both networks, books are nodes, and an edge connects two books when the same person read both of them (the more readers two books share, the higher the edge weight). The Goodreads dataset has been winnowed to include only books that were in the Shakespeare and Co library. 
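
To make the edge construction concrete, here is a minimal sketch of how co-readership edges and weights could be counted. The `reader_to_books` data and the `build_coreadership_edges` helper are hypothetical illustrations, not the code in `graph.py`, and unlike the dictionaries used in this notebook the sketch stores each undirected edge only once.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: reader -> set of books they read (not from the real datasets)
reader_to_books = {
    "reader_a": {"Ulysses", "Dubliners", "Mrs. Dalloway"},
    "reader_b": {"Ulysses", "Dubliners"},
    "reader_c": {"Dubliners", "Mrs. Dalloway"},
}

def build_coreadership_edges(reader_to_books):
    """Count, for every pair of books, how many readers read both."""
    edge_to_weight = Counter()
    for books in reader_to_books.values():
        # sorted() gives each undirected pair a single canonical key
        for book_a, book_b in combinations(sorted(books), 2):
            edge_to_weight[(book_a, book_b)] += 1
    return edge_to_weight

edge_to_weight = build_coreadership_edges(reader_to_books)
for (book_a, book_b), weight in sorted(edge_to_weight.items()):
    print(f"{book_a} -- {book_b}: weight {weight}")
```

Two readers share both "Dubliners" and "Ulysses" in this toy data, so that edge gets weight 2, while a pair read together by only one person gets weight 1.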

### Quick data preparation:

In [5]:
# import functions from graph.py
from graph import get_goodreads_graph, get_sc_graph
from core_periphery_sbm import core_periphery as cp

import networkx as nx

from collections import Counter
from operator import itemgetter
import pandas as pd

In [6]:
# get vertex lists, edge weights, vertex to neighbors, and number of nodes
sc_books_in_vertex_order, sc_book_to_vertex_index, sc_edge_to_weight, sc_vertex_to_neighbors, sc_n = get_sc_graph()
gr_books_in_vertex_order, gr_book_to_vertex_index, gr_edge_to_weight, gr_vertex_to_neighbors, gr_n = get_goodreads_graph()

In [7]:
def print_graph_summary(books_in_vertex_order, book_to_vertex_index, edge_to_weight, vertex_to_neighbors, n):
    print('# of vertices: {:,}'.format(n))
    # all edges are included twice because these are undirected graphs
    print('# of unique edges: {:,}'.format(int(len(edge_to_weight)/2)))
    print('Total edge weights: {:,}'.format(int(sum(edge_to_weight.values())/2)))

    # list the five vertices with highest degree
    print('\nFive books with the most neighbors:')
    vertex_to_degree = {v: len(neighbors) for v, neighbors in vertex_to_neighbors.items()}
    # itemgetter is imported directly above; the bare name `operator` is not defined
    vertex_to_degree_sorted = sorted(vertex_to_degree.items(), reverse=True, key=itemgetter(1))
    for vertex_idx, degree in vertex_to_degree_sorted[:5]:
        vertex_book_name = books_in_vertex_order[vertex_idx]
        print('{} neighbors: {}'.format(degree, vertex_book_name))

In [8]:
#print_graph_summary(sc_books_in_vertex_order, sc_book_to_vertex_index, sc_edge_to_weight, sc_vertex_to_neighbors, sc_n)

In [9]:
# core-periphery code is simplest when the graph is in networkx format
# the following code converts each (edge tuple, weight) item into an [edge[0], edge[1], weight] list,
# which can then be added to the graph as a weighted edge

def tuple_to_list(edge_to_weight):
    return [[*edge, weight] for edge, weight in edge_to_weight.items()]

In [10]:
sc_weights_list = tuple_to_list(sc_edge_to_weight)
gr_weights_list = tuple_to_list(gr_edge_to_weight)

In [11]:
# from edge [0] to edge [1], the weight is [2]
sc_weights_list[0]

[1342, 1, 1]

### Shakespeare and Co Analysis

In [12]:
# Create SHAKESPEARE AND CO graph
sc_G = nx.Graph()
sc_G.add_weighted_edges_from(sc_weights_list)

The next chunk is optional: I relabel the nodes with book titles only to make the results more interpretable. 

In [13]:
# optional: change vertex ids to book names
sc_dict = {value:key for key, value in sc_book_to_vertex_index.items()}
mapping = sc_dict # Dictionary from id to title

In [14]:
sc_G = nx.relabel_nodes(sc_G, mapping)

### Core-Periphery Structure

This section draws from Gallagher et al.'s ["A clarified typology of core-periphery structure in networks"](https://advances.sciencemag.org/content/7/12/eabc9800), with code available [here](https://github.com/ryanjgallagher/core_periphery_sbm). Core-periphery analysis asks whether a network can be divided into a core of densely interconnected nodes and a periphery of nodes that are connected to core nodes but not to each other. The paper develops and assesses two core-periphery model types: a hub-and-spoke model that divides the network into two clean blocks (core vs. periphery) and a layered model that allows for layers of periphery-ness. 
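
As a rough illustration of what a hub-and-spoke division looks like, the sketch below measures how dense each block is on a hypothetical toy network with hand-assigned labels. It does not use `core_periphery_sbm` and is not how the models infer labels; the `block_densities` helper and the toy data are assumptions made for illustration only.

```python
from itertools import combinations

# Hypothetical toy network: nodes 0-2 form the core, nodes 3-6 the periphery
edges = {(0, 1), (0, 2), (1, 2), (0, 3), (1, 4), (2, 5), (0, 6)}
node_to_label = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 1}  # 0 = core, 1 = periphery

def block_densities(edges, node_to_label):
    """Fraction of possible edges present within and between the label blocks."""
    present = {}
    possible = {}
    for u, v in combinations(sorted(node_to_label), 2):
        block = tuple(sorted((node_to_label[u], node_to_label[v])))
        possible[block] = possible.get(block, 0) + 1
        if (u, v) in edges or (v, u) in edges:
            present[block] = present.get(block, 0) + 1
    return {block: present.get(block, 0) / possible[block] for block in possible}

densities = block_densities(edges, node_to_label)
print("core-core:          ", densities[(0, 0)])
print("core-periphery:     ", densities[(0, 1)])
print("periphery-periphery:", densities[(1, 1)])
```

In this toy case the core-core block is the densest, the core-periphery block is sparser, and the periphery-periphery block is empty, which is the pattern the description above has in mind.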

In [16]:
# Initialize hub-and-spoke model and infer structure
hubspoke = cp.HubSpokeCorePeriphery(n_gibbs=100, n_mcmc=10*len(sc_G))
hubspoke.infer(sc_G)

In [17]:
layered = cp.LayeredCorePeriphery(n_layers=5, n_gibbs=100, n_mcmc=10*len(sc_G))
layered.infer(sc_G)

In [18]:
# Get core and periphery assignments from hub-and-spoke model
node2label_hs = hubspoke.get_labels(last_n_samples=50)

# Get layer assignments from the layered model
node2label_l = layered.get_labels(last_n_samples=50)

**Create dataframes**: the goal is to get this into a nice csv with columns for book title, author, h&s cp label, layered cp label, coreness, and probability.

In [19]:
sc_hub_spoke_df = pd.DataFrame.from_dict(node2label_hs, orient='index')
sc_layered_df = pd.DataFrame.from_dict(node2label_l, orient='index')

**Core-Periphery Label:** For both models (hub-and-spoke and layered) the core is 0; for the layered model, the further away from 0 the more peripheral the node. 

In [20]:
# Number of nodes in periphery vs. core in hub-and-spoke
Counter(node2label_hs.values())

Counter({1: 845, 0: 539})

In [21]:
# Number of nodes in each layer, from core (0) to outermost periphery (4)
Counter(node2label_l.values())

Counter({4: 497, 1: 134, 3: 349, 2: 191, 0: 213})

**All core books:**

In [22]:
for book, label in node2label_l.items():
    if label == 0:
        print(book)

A Farewell to Arms by Hemingway, Ernest (1929)
A Handful of Dust by Waugh, Evelyn (1934)
A High Wind in Jamaica by Hughes, Richard (1929)
A Note in Music by Lehmann, Rosamond (1930)
A Pin to See the Peepshow by Jesse, F. Tennyson (1934)
A Tale of a Tub by Swift, Jonathan (1704)
After Many a Summer by Huxley, Aldous (1939)
Agnes Grey by Brontë, Anne (1847)
Alice Adams by Tarkington, Booth (1921)
All Passion Spent by Sackville-West, Vita (1931)
All This and Heaven Too by Field, Rachel (1938)
Angel Pavement by Priestley, J. B. (1930)
Antic Hay by Huxley, Aldous (1923)
Appointment in Samarra by O'Hara, John (1934)
Arrowsmith by Lewis, Sinclair (1925)
Autobiographies by Yeats, William Butler (1926)
Autobiography by Powys, John Cowper (1934)
Axel's Castle: A Study in the Imaginative Literature of 1870 – 1930 by Wilson, Edmund (1931)
Babbitt by Lewis, Sinclair (1922)
Back Street by Hurst, Fannie (1931)
Beware of Pity by Zweig, Stefan (1939)
Beyond by Galsworthy, John (1917)
Bliss and Other St

**Probability** that a node belongs to each layer of the layered model, from core to periphery.

In [23]:
#  Dictionary of node -> ordered array of probabilities
node2probs_l = layered.get_labels(last_n_samples=50, prob=True, return_dict=True)

In [24]:
sc_layered_probs_df = pd.DataFrame.from_dict(node2probs_l, orient='index')
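
A small sketch of how such probabilities might be read, assuming (as the comment above suggests) that each node maps to one ordered array with one probability per layer and index 0 as the core. The `node_to_probs` values and the `most_likely_layer` helper are hypothetical and are not model output.

```python
# Hypothetical node -> per-layer probabilities (index 0 = core); not real model output
node_to_probs = {
    "Book A": [0.90, 0.08, 0.02],
    "Book B": [0.10, 0.55, 0.35],
    "Book C": [0.00, 0.20, 0.80],
}

def most_likely_layer(probs):
    """Return (layer index, probability) for the most probable layer."""
    layer = max(range(len(probs)), key=lambda i: probs[i])
    return layer, probs[layer]

for book, probs in node_to_probs.items():
    layer, prob = most_likely_layer(probs)
    print(f"{book}: layer {layer} (p = {prob:.2f})")
```

Keeping the probability alongside the label makes it easier to see which assignments are confident and which are close calls between adjacent layers.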

**Coreness**, where the closer to 1 the more core; closer to 0 the more peripheral.

In [25]:
# Dictionary of node -> coreness
node2coreness_hs = hubspoke.get_coreness(last_n_samples=50, return_dict=True)
node2coreness_l = layered.get_coreness(last_n_samples=50, return_dict=True)

In [26]:
sc_layered_coreness_df = pd.DataFrame.from_dict(node2coreness_l, orient='index')

In [27]:
sc_layered_coreness_df

Unnamed: 0,0
"1914 and Other Poems by Brooke, Rupert (1915)",0.030
"1919 by Dos Passos, John (1932)",0.815
365 Days (1936),0.250
"A Backward Glance by Wharton, Edith (1934)",0.275
"A Book by Barnes, Djuna (1923)",0.015
...,...
"Youth: A Narrative by Conrad, Joseph (1902)",0.515
Zola,0.010
"Zola and His Time by Josephson, Matthew (1928)",0.025
"Zuleika Dobson by Beerbohm, Max (1911)",0.455


In [28]:
# all books that are especially "corey" in the layered model
most_core = []
for book, coreness in node2coreness_l.items():
    if coreness > .85:
        most_core.append(book)

In [29]:
most_core

['A Farewell to Arms by Hemingway, Ernest (1929)',
 'A Handful of Dust by Waugh, Evelyn (1934)',
 'A Note in Music by Lehmann, Rosamond (1930)',
 'A Pin to See the Peepshow by Jesse, F. Tennyson (1934)',
 'After Many a Summer by Huxley, Aldous (1939)',
 'Alice Adams by Tarkington, Booth (1921)',
 'All Passion Spent by Sackville-West, Vita (1931)',
 'Angel Pavement by Priestley, J. B. (1930)',
 'Antic Hay by Huxley, Aldous (1923)',
 "Axel's Castle: A Study in the Imaginative Literature of 1870 – 1930 by Wilson, Edmund (1931)",
 'Babbitt by Lewis, Sinclair (1922)',
 'Beware of Pity by Zweig, Stefan (1939)',
 'Bliss and Other Stories by Mansfield, Katherine (1920)',
 'Buddenbrooks by Mann, Thomas (1924)',
 'Burmese Days by Orwell, George (1934)',
 "Busman's Honeymoon by Sayers, Dorothy (1937)",
 'Celibate Lives by Moore, George (1927)',
 'Christmas Holiday by Maugham, W. Somerset (1939)',
 'Coming Up for Air by Orwell, George (1939)',
 'Crewe Train by Macaulay, Rose (1926)',
 'Daughter of

## Save to csv

In [30]:
sc_layered_df['book_info'] = sc_layered_df.index
sc_layered_df = sc_layered_df.rename(columns={0: "layer"})

In [31]:
sc_hub_spoke_df['book_info'] = sc_hub_spoke_df.index
sc_hub_spoke_df = sc_hub_spoke_df.rename(columns={0: "hub_and_spoke"})

In [32]:
sc_layered_coreness_df['book_info'] = sc_layered_coreness_df.index
sc_layered_coreness_df = sc_layered_coreness_df.rename(columns={0: "coreness"})

In [33]:
full_sc_df = sc_layered_df.merge(sc_hub_spoke_df, how='inner', on='book_info').merge(sc_layered_coreness_df, how='inner', on='book_info')

In [34]:
full_sc_df = full_sc_df[["book_info", "hub_and_spoke", "layer", "coreness"]]

In [35]:
full_sc_df

Unnamed: 0,book_info,hub_and_spoke,layer,coreness
0,"1914 and Other Poems by Brooke, Rupert (1915)",1,4,0.030
1,"1919 by Dos Passos, John (1932)",0,1,0.815
2,365 Days (1936),1,3,0.250
3,"A Backward Glance by Wharton, Edith (1934)",1,3,0.275
4,"A Book by Barnes, Djuna (1923)",1,4,0.015
...,...,...,...,...
1379,"Youth: A Narrative by Conrad, Joseph (1902)",0,2,0.515
1380,Zola,1,4,0.010
1381,"Zola and His Time by Josephson, Matthew (1928)",1,4,0.025
1382,"Zuleika Dobson by Beerbohm, Max (1911)",0,2,0.455


In [36]:
# vertex
vertex_title = pd.DataFrame.from_dict(sc_dict, orient = "index")

In [37]:
vertex_title["vertex"] = vertex_title.index

In [38]:
vertex_title = vertex_title.rename(columns={0: "book_info"})

In [39]:
full_sc_df = full_sc_df.merge(vertex_title, how='inner', on='book_info')

In [40]:
full_sc_df = full_sc_df[["vertex","book_info", "layer", "hub_and_spoke", "coreness"]]

In [41]:
full_sc_df.head(10)

Unnamed: 0,vertex,book_info,layer,hub_and_spoke,coreness
0,1340,"1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.03
1,1027,"1919 by Dos Passos, John (1932)",1,0,0.815
2,192,365 Days (1936),3,1,0.25
3,614,"A Backward Glance by Wharton, Edith (1934)",3,1,0.275
4,591,"A Book by Barnes, Djuna (1923)",4,1,0.015
5,868,"A Book of Nonsense by Lear, Edward (1846)",4,1,0.025
6,796,"A Child's Garden of Verses by Stevenson, Rober...",3,1,0.205
7,877,"A Christmas Garland by Beerbohm, Max (1912)",4,1,0.05
8,374,"A City of Bells by Goudge, Elizabeth (1936)",2,0,0.47
9,704,A Connecticut Yankee in King Arthur's Court by...,4,1,0.02


In [42]:
#full_sc_df.to_csv("shakespeare-co-core-periphery.csv")

# Other metrics

### Density
How connected is this graph? Density is the number of existing edges divided by the total number of possible edges. 

In [43]:
sc_density = nx.density(sc_G)
print("Shakespeare and Co Network Density:", sc_density)

Shakespeare and Co Network Density: 0.17519403658796534
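
For intuition, the same ratio can be worked out by hand for an undirected graph without self-loops. This is a minimal sketch with a hypothetical `density` helper and toy numbers, not a replacement for `nx.density`.

```python
def density(n_nodes, n_edges):
    """Density of an undirected simple graph: existing edges / possible edges."""
    if n_nodes < 2:
        return 0.0  # no pair of nodes exists, so no edge is possible
    possible_edges = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible_edges

# Toy graph: 4 books, 3 co-readership edges out of 6 possible
print(density(4, 3))
```

A density of 1 would mean every pair of books shares at least one reader, and a density of 0 would mean no two books do.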


### Transitivity
How likely is it that, if books A and B are connected by an edge (read by the same person) and books B and C are also connected, books A and C are connected as well? 

In [44]:
triadic_closure = nx.transitivity(sc_G)
print("Triadic closure for S&C:", triadic_closure)

Triadic closure for S&C: 0.6258626488716129
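
The idea can be sketched by counting, on a toy graph, how many connected triples (two edges meeting at a shared node) are closed into triangles. The `neighbors` data and the `transitivity` helper below are hypothetical illustrations of the definition, not networkx code.

```python
from itertools import combinations

# Toy undirected graph as node -> set of neighbors: a triangle A-B-C plus a pendant node D
neighbors = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}

def transitivity(neighbors):
    """Fraction of connected triples (paths of length two) that are closed into triangles."""
    closed = 0
    triples = 0
    for nbrs in neighbors.values():
        # Every pair of neighbors of the same node forms one connected triple
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1
            if v in neighbors[u]:
                closed += 1
    return closed / triples if triples else 0.0

print(transitivity(neighbors))
```

Here three of the five connected triples are closed (each corner of the single triangle counts once), while the two triples that pass through B to reach D stay open.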


### Diameter length
Because this is not a connected graph, diameter length measures are slightly more complex. The below code finds the largest connected component of the graph, makes that a "subgraph" and then calculates the diameter of the largest connected component. 

In [45]:
# Get the largest connected component of the graph
components = nx.connected_components(sc_G)
largest_component = max(components, key=len)

# Create a "subgraph" of the largest component and find diameter
subgraph = sc_G.subgraph(largest_component)
diameter = nx.diameter(subgraph)
print("Network diameter of Shakespeare and Co's largest component:", diameter)

Network diameter of Shakespeare and Co's largest component: 5
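
The same two steps (find the largest connected component, then take the longest shortest path inside it) can be sketched with a plain breadth-first search. The toy `neighbors` graph and both helper functions are hypothetical; on the real networks `nx.diameter` is the tool to use.

```python
from collections import deque

# Toy undirected graph: a path A-B-C-D plus a separate pair E-F
neighbors = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"},
    "E": {"F"}, "F": {"E"},
}

def bfs_distances(neighbors, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in distances:
                distances[nbr] = distances[node] + 1
                queue.append(nbr)
    return distances

def largest_component_diameter(neighbors):
    """Diameter of the largest connected component (edge weights ignored)."""
    seen = set()
    components = []
    for node in neighbors:
        if node not in seen:
            component = set(bfs_distances(neighbors, node))
            seen |= component
            components.append(component)
    largest = max(components, key=len)
    # The diameter is the largest shortest-path distance between any two nodes
    return max(max(bfs_distances(neighbors, node).values()) for node in largest)

print(largest_component_diameter(neighbors))
```

In the toy graph the largest component is the four-node path, and the farthest pair (A and D) is three edges apart.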


### Centrality Measures
There are multiple ways to assess centrality in a network. Centrality measures usually try to capture something similar to significance or importance in a network, but there are different ways to understand importance. This code looks at the following:
- **degree centrality:** the number of edges attached to a node. In S&C, the book with the highest degree is the one that shares readers with the largest number of other books. This is a measure of a type of popularity (*but remember -> this isn't the book checked out the most times, it's the book connected to the most other books through shared borrowers*).  
- **betweenness centrality:** betweenness centrality disregards node degree and instead focuses on shortest paths to determine the most important nodes. It identifies the nodes that connect otherwise disparate parts of the network. 
- **eigenvector centrality:** eigenvector centrality accounts for whether a node is connected to many other high-degree nodes, which would make it a hub. It can also surface a node that does not have the highest degree but is still highly important because of the nodes it is connected to.
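
To show how degree and eigenvector centrality can rank nodes differently, here is a minimal sketch on a hypothetical toy graph. The eigenvector scores come from a simple power iteration written for illustration; it is a sketch of the general idea, not the exact routine networkx runs, and the toy data and helper names are assumptions.

```python
import math

# Toy undirected graph: node "hub" touches everyone; "x" and "y" also touch each other
neighbors = {
    "hub": {"x", "y", "z"},
    "x": {"hub", "y"},
    "y": {"hub", "x"},
    "z": {"hub"},
}

def degree(neighbors):
    """Number of edges attached to each node."""
    return {node: len(nbrs) for node, nbrs in neighbors.items()}

def eigenvector_centrality(neighbors, n_iter=100):
    """Power iteration: a node's score grows with the scores of its neighbors."""
    scores = {node: 1.0 for node in neighbors}
    for _ in range(n_iter):
        # Adding the node's own score (a shift by the identity) keeps the iteration stable
        new_scores = {
            node: scores[node] + sum(scores[nbr] for nbr in nbrs)
            for node, nbrs in neighbors.items()
        }
        norm = math.sqrt(sum(value ** 2 for value in new_scores.values()))
        scores = {node: value / norm for node, value in new_scores.items()}
    return scores

degrees = degree(neighbors)
scores = eigenvector_centrality(neighbors)
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node}: degree {degrees[node]}, eigenvector score {scores[node]:.3f}")
```

Degree only counts connections, whereas the eigenvector score also rewards being connected to well-connected nodes: "x" and "y" reinforce each other, while "z" depends entirely on its single link to the hub.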

In [46]:
# degree centrality
sc_degree_dict = dict(sc_G.degree(sc_G.nodes()))
nx.set_node_attributes(sc_G, sc_degree_dict, 'degree')

sc_sorted_degree = sorted(sc_degree_dict.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by degree in S&C:")
for d in sc_sorted_degree[:20]:
    print(d)

Top 20 nodes by degree in S&C:
('The Sun Also Rises by Hemingway, Ernest (1926)', 858)
('Dubliners by Joyce, James (1914)', 812)
('A Farewell to Arms by Hemingway, Ernest (1929)', 812)
('Sanctuary by Faulkner, William (1931)', 807)
('Eyeless in Gaza by Huxley, Aldous (1936)', 799)
('To the Lighthouse by Woolf, Virginia (1927)', 790)
('Pointed Roofs (Pilgrimage 1) by Richardson, Dorothy M. (1915)', 787)
('Manhattan Transfer by Dos Passos, John (1925)', 786)
('Mrs. Dalloway by Woolf, Virginia (1925)', 781)
('The Citadel by Cronin, A. J. (1937)', 780)
('Mr. Norris Changes Trains by Isherwood, Christopher (1935)', 776)
('New Writing', 774)
('The Garden Party and Other Stories by Mansfield, Katherine (1922)', 766)
('The Rains Came by Bromfield, Louis (1937)', 763)
('Sparkenbroke by Morgan, Charles (1936)', 761)
("Axel's Castle: A Study in the Imaginative Literature of 1870 – 1930 by Wilson, Edmund (1931)", 759)
('The Years by Woolf, Virginia (1937)', 758)
('The Waves by Woolf, Virginia (19

In [47]:
sc_degree_df = pd.DataFrame.from_dict(sc_degree_dict, orient='index')
sc_degree_df["book_info"] = sc_degree_df.index

In [48]:
sc_degree_df = sc_degree_df.rename(columns={0: "degree_centrality"})

In [49]:
# betweenness centrality
betweenness_dict = nx.betweenness_centrality(sc_G) # Run betweenness centrality
# Assign each to an attribute in your network
nx.set_node_attributes(sc_G, betweenness_dict, 'betweenness')

sorted_betweenness = sorted(betweenness_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 S&C nodes by betweenness centrality:")
for b in sorted_betweenness[:20]:
    print(b)

Top 20 S&C nodes by betweenness centrality:
('Dubliners by Joyce, James (1914)', 0.013462215413708243)
('Pointed Roofs (Pilgrimage 1) by Richardson, Dorothy M. (1915)', 0.011995331377673796)
('The Sun Also Rises by Hemingway, Ernest (1926)', 0.010337681886786106)
('Sister Carrie by Dreiser, Theodore (1900)', 0.010023177029189055)
('Exiles by Joyce, James (1918)', 0.009321020696868196)
('Moby-Dick; Or, the Whale by Melville, Herman (1851)', 0.00899155635893526)
('The Garden Party and Other Stories by Mansfield, Katherine (1922)', 0.00888546081325438)
('Mr. Norris Changes Trains by Isherwood, Christopher (1935)', 0.008866947742089664)
('Manhattan Transfer by Dos Passos, John (1925)', 0.008417758110221446)
('New Writing', 0.006862249108221365)
('Of Human Bondage by Maugham, W. Somerset (1915)', 0.00681522643635387)
('A Farewell to Arms by Hemingway, Ernest (1929)', 0.0067233439136412465)
('My Life by Duncan, Isadora (1927)', 0.006714251992738399)
('Bliss and Other Stories by Mansfield, Ka

In [50]:
sc_between_df = pd.DataFrame.from_dict(betweenness_dict, orient='index')
sc_between_df["book_info"] = sc_between_df.index

sc_between_df = sc_between_df.rename(columns={0: "between_centrality"})

In [51]:
# eigenvector centrality
eigenvector_dict = nx.eigenvector_centrality(sc_G) # Run eigenvector centrality
nx.set_node_attributes(sc_G, eigenvector_dict, 'eigenvector')

sorted_eigenvector = sorted(eigenvector_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 S&C nodes by eigenvector centrality:")
for b in sorted_eigenvector[:20]:
    print(b)

Top 20 S&C nodes by eigenvector centrality:
('The Sun Also Rises by Hemingway, Ernest (1926)', 0.05770994078892642)
('Eyeless in Gaza by Huxley, Aldous (1936)', 0.05763437398159241)
('Mrs. Dalloway by Woolf, Virginia (1925)', 0.057219529089883425)
('A Farewell to Arms by Hemingway, Ernest (1929)', 0.057170855332737386)
('To the Lighthouse by Woolf, Virginia (1927)', 0.05695609592878169)
('Sanctuary by Faulkner, William (1931)', 0.05693251618778315)
('The Citadel by Cronin, A. J. (1937)', 0.05692860864290906)
("Axel's Castle: A Study in the Imaginative Literature of 1870 – 1930 by Wilson, Edmund (1931)", 0.0568289336968622)
('Sparkenbroke by Morgan, Charles (1936)', 0.056777530857592934)
('The Waves by Woolf, Virginia (1931)', 0.056411729839380136)
('The Death of the Heart by Bowen, Elizabeth (1938)', 0.05638521798915159)
('The Rains Came by Bromfield, Louis (1937)', 0.056375680667810533)
('The Years by Woolf, Virginia (1937)', 0.056168356107361074)
('South Riding: An English Landscape 

In [52]:
sc_eigenvector_df = pd.DataFrame.from_dict(eigenvector_dict, orient='index')
sc_eigenvector_df["book_info"] = sc_eigenvector_df.index

In [53]:
sc_eigenvector_df = sc_eigenvector_df.rename(columns={0: "eigenvector_centrality"})

### Combine all

In [54]:
full_sc_df = full_sc_df.merge(sc_degree_df, how='inner', on='book_info').merge(sc_between_df, how='inner', on='book_info').merge(sc_eigenvector_df, how='inner', on='book_info')

In [55]:
full_sc_df.head(10)

Unnamed: 0,vertex,book_info,layer,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality
0,1340,"1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.03,6,0.0,0.000186
1,1027,"1919 by Dos Passos, John (1932)",1,0,0.815,573,0.001874,0.04623
2,192,365 Days (1936),3,1,0.25,175,5.6e-05,0.0151
3,614,"A Backward Glance by Wharton, Edith (1934)",3,1,0.275,231,0.000408,0.016208
4,591,"A Book by Barnes, Djuna (1923)",4,1,0.015,45,2.4e-05,0.002986
5,868,"A Book of Nonsense by Lear, Edward (1846)",4,1,0.025,15,0.0,0.000781
6,796,"A Child's Garden of Verses by Stevenson, Rober...",3,1,0.205,110,0.0,0.008087
7,877,"A Christmas Garland by Beerbohm, Max (1912)",4,1,0.05,80,3.7e-05,0.005835
8,374,"A City of Bells by Goudge, Elizabeth (1936)",2,0,0.47,322,0.000145,0.029493
9,704,A Connecticut Yankee in King Arthur's Court by...,4,1,0.02,9,0.0,0.0007


In [56]:
full_sc_df.to_csv("shakespeare-co-core-periphery.csv")

## Model Selection
This is the section that needs continued work: how meaningful is either of these models? 


In [58]:
from core_periphery_sbm import model_fit as mf

# Get description length of hub-and-spoke model
inf_labels_hs = hubspoke.get_labels(last_n_samples=50, prob=False, return_dict=False)
mdl_hubspoke = mf.mdl_hubspoke(sc_G, inf_labels_hs, n_samples=100000)

# Get the description length of layered model
inf_labels_l = layered.get_labels(last_n_samples=50, prob=False, return_dict=False)
mdl_layered = mf.mdl_layered(sc_G, inf_labels_l, n_layers=5, n_samples=100000)

In [59]:
print("Description length of hub-and-spoke model: " + str(mdl_hubspoke))
print("Description length of layered model: " + str(mdl_layered))

Description length of hub-and-spoke model: 301418.9798241098
Description length of layered model: 258132.5122463365


So, the layered model is a better fit since it has a shorter description length. **BUT** this still lacks an assessment of how meaningful the difference in goodness of fit is. 
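
As a small aid for reading the two numbers, the sketch below picks the model with the shorter description length and reports the absolute and relative gap, using the values printed above. This is plain arithmetic with hypothetical variable names; it is not a significance test and does not settle how meaningful the gap is.

```python
# Description lengths reported in the output above
description_lengths = {
    "hub-and-spoke": 301418.9798241098,
    "layered": 258132.5122463365,
}

# The model with the shortest description length is preferred
best_model = min(description_lengths, key=description_lengths.get)
difference = max(description_lengths.values()) - min(description_lengths.values())
relative_saving = difference / max(description_lengths.values())

print(f"Preferred model: {best_model}")
print(f"Difference in description length: {difference:,.1f}")
print(f"Relative saving: {relative_saving:.1%}")
```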

## Goodreads Core-Periphery Analysis
---> remember! this might not be the best way to compare the two networks, since they represent such different types of readership and some books are missing from the Goodreads data

In [139]:
# Create Goodreads graph
gr_G = nx.Graph()
gr_G.add_weighted_edges_from(gr_weights_list)

In [140]:
# optional: change vertex ids to book names
gr_dict = {value:key for key, value in gr_book_to_vertex_index.items()}
gr_mapping = gr_dict # Dictionary from id to title
gr_G = nx.relabel_nodes(gr_G, gr_mapping)

### Core-Periphery Structure

In [143]:
# Initialize hub-and-spoke model and infer structure
gr_hubspoke = cp.HubSpokeCorePeriphery(n_gibbs=100, n_mcmc=10*len(gr_G))
gr_hubspoke.infer(gr_G)

In [144]:
gr_layered = cp.LayeredCorePeriphery(n_layers=3, n_gibbs=100, n_mcmc=10*len(gr_G))
gr_layered.infer(gr_G)

In [145]:
# Get core and periphery assignments from hub-and-spoke model
gr_node2label_hs = gr_hubspoke.get_labels(last_n_samples=50)

# Get layer assignments from the layered model
gr_node2label_l = gr_layered.get_labels(last_n_samples=50)

In [146]:
gr_hub_spoke_df = pd.DataFrame.from_dict(gr_node2label_hs, orient='index')

In [147]:
gr_layered_df = pd.DataFrame.from_dict(gr_node2label_l, orient='index')

**Core-Periphery Label:** For both models (hub-and-spoke and layered) the core is 0; for the layered model, the further away from 0 the more peripheral the node. 

In [148]:
# Number of nodes in periphery vs. core in hub-and-spoke
Counter(gr_node2label_hs.values())

Counter({1: 1020, 0: 304})

In [150]:
# Number of nodes in each layer, from core (0) to outermost periphery (2)
Counter(gr_node2label_l.values())

Counter({1: 311, 2: 748, 0: 265})

In [151]:
for book, label in gr_node2label_hs.items():
    if label == 0:
        print(book)

A Book of Nonsense by Edward Lear (1846)
A Child's Garden of Verses by Robert Louis Stevenson (1885)
A Connecticut Yankee in King Arthur's Court by Mark Twain (1889)
A Doll's House by Henrik Ibsen (1879)
A Farewell to Arms by Ernest Hemingway (1929)
A Handful of Dust by Evelyn Waugh (1934)
A High Wind in Jamaica by Richard Hughes (1929)
A Journal of the Plague Year by Daniel Defoe (1722)
A Lost Lady by Willa Cather (1923)
A Room of One's Own by Virginia Woolf (1929)
A Shropshire Lad by A.E. Housman (1896)
A Tale of a Tub by Jonathan Swift (1704)
A Voyage to Arcturus by David Lindsay (1920)
ABC of Reading by Ezra Pound (1934)
Absalom, Absalom! by William Faulkner (1936)
Adam Bede by George Eliot (1859)
Agnes Grey by Anne Brontë (1847)
Alice Adams by Booth Tarkington (1921)
Alice's Adventures in Wonderland / Through the Looking-Glass by Lewis Carroll (1871)
All Passion Spent by Vita Sackville-West (1931)
All Quiet on the Western Front by Erich Maria Remarque (1929)
An Ideal Husband by Os

**Probability** that a node is in the core vs. periphery.

In [152]:
#  Dictionary of node -> ordered array of probabilities
gr_node2probs_hs = gr_hubspoke.get_labels(last_n_samples=50, prob=True, return_dict=True)

# n_nodes x n_layers array of probabilities
gr_inf_probs_l = gr_layered.get_labels(last_n_samples=50, prob=True, return_dict=False)

In [153]:
gr_hs_probs_df = pd.DataFrame.from_dict(gr_node2probs_hs, orient='index')

**Coreness**, where the closer to 1 the more core; closer to 0 the more peripheral.

In [154]:
# Dictionary of node -> coreness
gr_node2coreness_hs = gr_hubspoke.get_coreness(last_n_samples=50, return_dict=True)
gr_node2coreness_l = gr_layered.get_coreness(last_n_samples=50, return_dict=True)

In [155]:
gr_layered_coreness_df = pd.DataFrame.from_dict(gr_node2coreness_l, orient='index')

In [156]:
# all books with the maximum coreness (1) in the hub-and-spoke model
gr_most_core = []
for book, coreness in gr_node2coreness_hs.items():
    if coreness == 1:
        gr_most_core.append(book)

## Save as CSV

In [158]:
gr_layered_df['book_info'] = gr_layered_df.index
gr_layered_df = gr_layered_df.rename(columns={0: "layer"})

In [159]:
gr_hub_spoke_df['book_info'] = gr_hub_spoke_df.index
gr_hub_spoke_df = gr_hub_spoke_df.rename(columns={0: "hub_and_spoke"})

In [160]:
gr_layered_coreness_df['book_info'] = gr_layered_coreness_df.index
gr_layered_coreness_df = gr_layered_coreness_df.rename(columns={0: "coreness"})

In [161]:
gr_layered_df

Unnamed: 0,layer,book_info
100 Best-Loved Poems by Philip Smith (None),1,100 Best-Loved Poems by Philip Smith (None)
"1914, and Other Poems by Rupert Brooke (1915)",2,"1914, and Other Poems by Rupert Brooke (1915)"
50 Great American Short Stories by Milton Crane (1963),2,50 Great American Short Stories by Milton Cran...
A Backward Glance by Edith Wharton (1934),1,A Backward Glance by Edith Wharton (1934)
A Book of Nonsense by Edward Lear (1846),1,A Book of Nonsense by Edward Lear (1846)
...,...,...
Young Men in Love by Michael Arlen (None),2,Young Men in Love by Michael Arlen (None)
Zola and His Time: the History of His Martial Career in Letters by Matthew Josephson (1969),2,Zola and His Time: the History of His Martial ...
Zuleika Dobson by Max Beerbohm (1911),0,Zuleika Dobson by Max Beerbohm (1911)
if my soul be lost: a self portrait by Nandi S. Crosby (2007),2,if my soul be lost: a self portrait by Nandi S...


In [162]:
full_gr_df = gr_layered_df.merge(gr_hub_spoke_df, how='inner', on='book_info').merge(gr_layered_coreness_df, how='inner', on='book_info')

In [163]:
full_gr_df

Unnamed: 0,layer,book_info,hub_and_spoke,coreness
0,1,100 Best-Loved Poems by Philip Smith (None),1,0.48
1,2,"1914, and Other Poems by Rupert Brooke (1915)",1,0.02
2,2,50 Great American Short Stories by Milton Cran...,1,0.01
3,1,A Backward Glance by Edith Wharton (1934),1,0.35
4,1,A Book of Nonsense by Edward Lear (1846),0,0.50
...,...,...,...,...
1319,2,Young Men in Love by Michael Arlen (None),1,0.00
1320,2,Zola and His Time: the History of His Martial ...,1,0.02
1321,0,Zuleika Dobson by Max Beerbohm (1911),0,0.98
1322,2,if my soul be lost: a self portrait by Nandi S...,1,0.00


In [80]:
#full_gr_df = full_gr_df[["book_info", "layer", "hub_and_spoke", "coreness"]]

In [95]:
# vertex
#gr_vertex_title = pd.DataFrame.from_dict(gr_dict, orient = "index")

In [96]:
#gr_vertex_title

Unnamed: 0,0
0,The Making of Exile: Sindhi Hindus and the Par...
1,Marriage and Morals by Bertrand Russell (1929)
2,The Old Dark House by J.B. Priestley (1927)
3,Death at Swaythling Court by J.J. Connington (...
4,The Middle of the Journey by Lionel Trilling (...
...,...
1319,Abe Lincoln in Illinois by Robert E. Sherwood ...
1320,The Virgin and the Gipsy by D.H. Lawrence (1930)
1321,Sex and Character: An Investigation of Fundame...
1322,The Dead Don't Care by Jonathan Latimer (1938)


## Other metrics

### Density

In [165]:
gr_density = nx.density(gr_G)
print("Goodreads Network Density:", gr_density)

Goodreads Network Density: 0.06342812385108458


### Transitivity


In [166]:
gr_triadic_closure = nx.transitivity(gr_G)
print("Triadic closure for Goodreads:", gr_triadic_closure)

Triadic closure for Goodreads: 0.4499363846307675


### Diameter length

In [167]:
# Get the largest connected component of the graph
components = nx.connected_components(gr_G)
largest_component = max(components, key=len)

# Create a "subgraph" of the largest component and find diameter
subgraph = gr_G.subgraph(largest_component)
diameter = nx.diameter(subgraph)
print("Network diameter of Goodreads' largest component:", diameter)

Network diameter of Goodreads' largest component: 5


### Centrality Measures

In [168]:
# degree centrality
gr_degree_dict = dict(gr_G.degree(gr_G.nodes()))
nx.set_node_attributes(gr_G, gr_degree_dict, 'degree')

gr_sorted_degree = sorted(gr_degree_dict.items(), key=itemgetter(1), reverse=True)
print("Top 20 nodes by degree in Goodreads:")
for d in gr_sorted_degree[:20]:
    print(d)

Top 20 nodes by degree in Goodreads:
('The Great Gatsby by F. Scott Fitzgerald (1925)', 690)
('Brave New World by Aldous Huxley (1932)', 602)
('Emma by Jane Austen (1815)', 566)
('Dracula by Bram Stoker (1897)', 564)
('Of Mice and Men by John Steinbeck (1937)', 563)
('Persuasion by Jane Austen (1817)', 557)
('Moby-Dick or, the Whale by Herman Melville (1851)', 545)
('The Adventures of Huckleberry Finn by Mark Twain (1884)', 537)
('Romeo and Juliet by William Shakespeare (1597)', 511)
('Lolita by Vladimir Nabokov (1955)', 503)
('Gone with the Wind by Margaret Mitchell (1936)', 496)
('The Scarlet Letter by Nathaniel Hawthorne (1850)', 496)
('Mrs. Dalloway by Virginia Woolf (1925)', 494)
('To the Lighthouse by Virginia Woolf (1927)', 494)
('The Sound and the Fury by William Faulkner (1929)', 490)
('Ulysses by James Joyce (1922)', 488)
('The Age of Innocence by Edith Wharton (1920)', 487)
('A Farewell to Arms by Ernest Hemingway (1929)', 484)
('The Moonstone by Wilkie Collins (1868)', 484

In [169]:
gr_degree_df = pd.DataFrame.from_dict(gr_degree_dict, orient='index')
gr_degree_df["book_info"] = gr_degree_df.index

In [170]:
gr_degree_df = gr_degree_df.rename(columns={0: "degree_centrality"})

In [171]:
# betweenness centrality
gr_betweenness_dict = nx.betweenness_centrality(gr_G) # Run betweenness centrality
# Assign each to an attribute in your network
nx.set_node_attributes(gr_G, gr_betweenness_dict, 'betweenness')

gr_sorted_betweenness = sorted(gr_betweenness_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 GR nodes by betweenness centrality:")
for b in gr_sorted_betweenness[:20]:
    print(b)

Top 20 GR nodes by betweenness centrality:
('The Great Gatsby by F. Scott Fitzgerald (1925)', 0.04415840920743839)
('Brave New World by Aldous Huxley (1932)', 0.036398910383542794)
('Emma by Jane Austen (1815)', 0.02352302432250388)
('Dracula by Bram Stoker (1897)', 0.021592423133533645)
('Moby-Dick or, the Whale by Herman Melville (1851)', 0.019957723131357102)
('Of Mice and Men by John Steinbeck (1937)', 0.01971877446897649)
('The Adventures of Huckleberry Finn by Mark Twain (1884)', 0.016185911823300158)
('Persuasion by Jane Austen (1817)', 0.014919268868462395)
('The Sound and the Fury by William Faulkner (1929)', 0.014405481405478239)
('Gone with the Wind by Margaret Mitchell (1936)', 0.014303564769340296)
('Lolita by Vladimir Nabokov (1955)', 0.014144160925167548)
('Mrs. Dalloway by Virginia Woolf (1925)', 0.014071027070866422)
('Crime and Punishment by Fyodor Dostoyevsky (1866)', 0.013826554186291605)
('Romeo and Juliet by William Shakespeare (1597)', 0.013370015243643462)
('Nor

In [172]:
gr_between_df = pd.DataFrame.from_dict(gr_betweenness_dict, orient='index')
gr_between_df["book_info"] = gr_between_df.index

gr_between_df = gr_between_df.rename(columns={0: "between_centrality"})


In [173]:
# eigenvector centrality
gr_eigenvector_dict = nx.eigenvector_centrality(gr_G) # Run eigenvector centrality
nx.set_node_attributes(gr_G, gr_eigenvector_dict, 'eigenvector')

gr_sorted_eigenvector = sorted(gr_eigenvector_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 GR nodes by eigenvector centrality:")
for b in gr_sorted_eigenvector[:20]:
    print(b)

Top 20 GR nodes by eigenvector centrality:
('The Great Gatsby by F. Scott Fitzgerald (1925)', 0.08869192531616545)
('Brave New World by Aldous Huxley (1932)', 0.0836736262825447)
('Of Mice and Men by John Steinbeck (1937)', 0.08353235733841298)
('Dracula by Bram Stoker (1897)', 0.08303989904376641)
('Persuasion by Jane Austen (1817)', 0.0829564817974177)
('Emma by Jane Austen (1815)', 0.08281162792923336)
('Moby-Dick or, the Whale by Herman Melville (1851)', 0.0820743429694696)
('The Adventures of Huckleberry Finn by Mark Twain (1884)', 0.08147140916844786)
('Romeo and Juliet by William Shakespeare (1597)', 0.0800118787647832)
('The Scarlet Letter by Nathaniel Hawthorne (1850)', 0.0794978119620089)
('Lolita by Vladimir Nabokov (1955)', 0.07892748492072911)
('The Moonstone by Wilkie Collins (1868)', 0.0789124350544532)
('The Sound and the Fury by William Faulkner (1929)', 0.07880563876231736)
('Mrs. Dalloway by Virginia Woolf (1925)', 0.07880230746971423)
('Ulysses by James Joyce (1922)

In [174]:
gr_eigenvector_df = pd.DataFrame.from_dict(gr_eigenvector_dict, orient='index')
gr_eigenvector_df["book_info"] = gr_eigenvector_df.index

In [175]:
gr_eigenvector_df = gr_eigenvector_df.rename(columns={0: "eigenvector_centrality"})
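
The same dict-to-DataFrame steps are repeated for each centrality measure. A minimal sketch of a helper that does the conversion in one step follows; `centrality_to_df` and the toy scores are hypothetical names and values introduced only for illustration, not part of the notebook.

```python
import pandas as pd


def centrality_to_df(centrality_dict, column_name):
    """Turn a {node: score} dict into a DataFrame with a book_info column.

    Hypothetical helper: it mirrors the from_dict / rename steps used above.
    """
    return (
        pd.Series(centrality_dict, name=column_name)
        .rename_axis("book_info")   # name the index so reset_index labels it
        .reset_index()              # move book names into a regular column
    )


# Toy scores standing in for a networkx centrality result
toy_scores = {"Book A (1900)": 0.5, "Book B (1910)": 0.25}
toy_df = centrality_to_df(toy_scores, "eigenvector_centrality")
print(toy_df)
```

Unlike the cells above, this keeps a default integer index and puts `book_info` in the first column, which is enough for the later merges on `book_info`.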

### Combine all

In [176]:
full_gr_df = (
    full_gr_df
    .merge(gr_degree_df, how='inner', on='book_info')
    .merge(gr_between_df, how='inner', on='book_info')
    .merge(gr_eigenvector_df, how='inner', on='book_info')
)
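
An inner merge silently drops any book that is missing from one side. As a sketch of how that could be audited, the toy frames below (invented for illustration) are merged with `how="outer"` and `indicator=True`, which adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Toy frames standing in for two of the per-measure DataFrames
left = pd.DataFrame({"book_info": ["A", "B", "C"], "coreness": [1.0, 0.5, 0.1]})
right = pd.DataFrame({"book_info": ["A", "B", "D"], "degree_centrality": [10, 4, 2]})

# An outer merge keeps every row; _merge is "both", "left_only" or "right_only"
audit = left.merge(right, how="outer", on="book_info", indicator=True)

# Rows that an inner merge would have dropped
unmatched = audit[audit["_merge"] != "both"]
print(unmatched[["book_info", "_merge"]])
```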

In [177]:
full_gr_df.head(10)

Unnamed: 0,layer,book_info,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality
0,1,100 Best-Loved Poems by Philip Smith (None),1,0.48,68,7.4e-05,0.01519
1,2,"1914, and Other Poems by Rupert Brooke (1915)",1,0.02,36,3e-06,0.009037
2,2,50 Great American Short Stories by Milton Cran...,1,0.01,8,0.0,0.001644
3,1,A Backward Glance by Edith Wharton (1934),1,0.35,44,3.8e-05,0.009455
4,1,A Book of Nonsense by Edward Lear (1846),0,0.5,123,7.4e-05,0.030571
5,1,A Child's Garden of Verses by Robert Louis Ste...,0,0.54,128,0.000166,0.031296
6,2,A Christmas Garland by Max Beerbohm (1912),1,0.02,9,0.0,0.002717
7,1,A City of Bells by Elizabeth Goudge (1936),1,0.49,77,0.000102,0.017856
8,0,A Connecticut Yankee in King Arthur's Court by...,0,0.95,357,0.004005,0.065687
9,0,A Doll's House by Henrik Ibsen (1879),0,1.0,345,0.003463,0.065198


In [178]:
full_gr_df.to_csv("goodreads-core-periphery.csv")

## Combine major networks

In [184]:
import json

In [193]:
# Load the mapping from Goodreads names to S&C names
with open("data/goodreads-text-to-sc-text.json") as json_file:
    text_to_text = json.load(json_file)

#text_to_text = pd.DataFrame.from_dict(data, orient = 'index').reset_index()
#full_sc_name_df = full_sc_name_df.merge(text_to_text, on = 'book_info')

In [194]:
#complete_cp_df = full_gr_df.merge(full_sc_df, how='inner', on='book_info')

text_to_text = pd.DataFrame.from_dict(text_to_text, orient = 'index').reset_index()
text_to_text = text_to_text.rename(columns = {0:"sc_info", "index":"book_info"})

In [195]:
text_to_text

Unnamed: 0,book_info,sc_info
0,Utopia by Thomas More (1516),"Utopia by More, Thomas (1516)"
1,Gorboduc by Thomas Sackville (None),"Gorboduc by Norton, Thomas (1561)"
2,Love's Labour's Lost by William Shakespeare (1...,"Love's Labour's Lost by Shakespeare, William (..."
3,The Oxford Francis Bacon IV: The Advancement o...,"The Advancement of Learning by Bacon, Francis ..."
4,The White Devil by John Webster (1612),"The White Devil by Webster, John (1612)"
...,...,...
1442,A Valentine from Harlequin: Six Degrees of Rom...,Vale
1443,The Diana I Knew: Loving Memories of the Frien...,Wales
1444,"The Lamb of God (a 10-Week Bible Study), 2: Se...",Week
1445,Captive Witness by Carolyn Keene (1980),Witness


In [196]:
full_sc_df.head()

Unnamed: 0,vertex,book_info,layer,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality
0,1340,"1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.03,6,0.0,0.000186
1,1027,"1919 by Dos Passos, John (1932)",1,0,0.815,573,0.001874,0.04623
2,192,365 Days (1936),3,1,0.25,175,5.6e-05,0.0151
3,614,"A Backward Glance by Wharton, Edith (1934)",3,1,0.275,231,0.000408,0.016208
4,591,"A Book by Barnes, Djuna (1923)",4,1,0.015,45,2.4e-05,0.002986


In [199]:
# merge and drop/rename
sc_data = full_sc_df.merge(text_to_text, left_on = "book_info", right_on = "sc_info")
sc_data = sc_data.drop(['book_info_x'], axis=1)
sc_data = sc_data.rename(columns={"book_info_y":"gr_info"})

In [237]:
sc_data.head()

Unnamed: 0,vertex,layer,hub_and_spoke,coreness,degree_centrality,between_centrality,eigenvector_centrality,gr_info,sc_info
0,1340,4,1,0.03,6,0.0,0.000186,"1914, and Other Poems by Rupert Brooke (1915)","1914 and Other Poems by Brooke, Rupert (1915)"
1,1027,1,0,0.815,573,0.001874,0.04623,U.S.A.: The 42nd Parallel / 1919 / The Big Mon...,"1919 by Dos Passos, John (1932)"
2,192,3,1,0.25,175,5.6e-05,0.0151,That Is SO Me: 365 Days of Devotions: Flip-Flo...,365 Days (1936)
3,614,3,1,0.275,231,0.000408,0.016208,A Backward Glance by Edith Wharton (1934),"A Backward Glance by Wharton, Edith (1934)"
4,591,4,1,0.015,45,2.4e-05,0.002986,The Book of Repulsive Women: 8 Rhythms and 5 D...,"A Book by Barnes, Djuna (1923)"
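
Some Goodreads and S&C names paired by the mapping look quite different, so it may be worth spot-checking the matches. The sketch below is one possible check, not something the notebook does: it compares the title portion of each name with `difflib.SequenceMatcher` and flags pairs below a threshold. The helper `title_only`, the 0.8 threshold, and the choice of example pairs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def title_only(name):
    """Keep the text before ' by ' (hypothetical helper for these name strings)."""
    return name.split(" by ")[0].strip().lower()


def title_similarity(gr_name, sc_name):
    """Similarity ratio between 0 and 1 for the two title portions."""
    return SequenceMatcher(None, title_only(gr_name), title_only(sc_name)).ratio()


# Two (Goodreads name, S&C name) pairs as they appear in the mapping output
pairs = [
    ("A Backward Glance by Edith Wharton (1934)",
     "A Backward Glance by Wharton, Edith (1934)"),
    ("Captive Witness by Carolyn Keene (1980)", "Witness"),
]

THRESHOLD = 0.8  # assumed cut-off; tune it against known-good matches
flagged = [pair for pair in pairs if title_similarity(*pair) < THRESHOLD]
for gr_name, sc_name in flagged:
    print("check:", gr_name, "<->", sc_name)
```

A low score only marks a pair for manual review; a retitled or collected edition can legitimately score poorly.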


In [218]:
# Merge sc_data with full_gr_df
combined_df = sc_data.merge(full_gr_df, left_on = "gr_info", right_on = "book_info")

In [220]:
combined_df = combined_df.rename(columns={
    "layer_x": "sc_layer",
    "hub_and_spoke_x": "sc_hub_and_spoke",
    "coreness_x": "sc_coreness",
    "degree_centrality_x": "sc_degree_centrality",
    "between_centrality_x": "sc_between_centrality",
    "eigenvector_centrality_x": "sc_eigenvector_centrality",
    "layer_y": "gr_layer",
    "hub_and_spoke_y": "gr_hub_and_spoke",
    "coreness_y": "gr_coreness",
    "degree_centrality_y": "gr_degree_centrality",
    "between_centrality_y": "gr_between_centrality",
    "eigenvector_centrality_y": "gr_eigenvector_centrality",
})
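
The `_x` / `_y` suffixes come from pandas resolving column names shared by both frames. One alternative, sketched here with toy frames and only two metric columns (both invented for illustration), is to prefix the metric columns before merging so that no rename is needed afterwards.

```python
import pandas as pd

# Toy stand-ins for sc_data and full_gr_df, reduced to two metric columns
sc = pd.DataFrame({"gr_info": ["A"], "layer": [4], "coreness": [0.03]})
gr = pd.DataFrame({"book_info": ["A"], "layer": [2], "coreness": [0.02]})

metrics = ["layer", "coreness"]

# Prefix each metric column with its network before the merge
sc_prefixed = sc.rename(columns={m: f"sc_{m}" for m in metrics})
gr_prefixed = gr.rename(columns={m: f"gr_{m}" for m in metrics})

merged = sc_prefixed.merge(gr_prefixed, left_on="gr_info", right_on="book_info")
print(list(merged.columns))
```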

In [221]:
combined_df = combined_df[[
    "vertex", "gr_info", "sc_info",
    "sc_layer", "sc_hub_and_spoke", "sc_coreness",
    "sc_degree_centrality", "sc_between_centrality", "sc_eigenvector_centrality",
    "gr_layer", "gr_hub_and_spoke", "gr_coreness",
    "gr_degree_centrality", "gr_between_centrality", "gr_eigenvector_centrality",
]]

In [239]:
combined_df.head()

Unnamed: 0,vertex,gr_info,sc_info,sc_layer,sc_hub_and_spoke,sc_coreness,sc_degree_centrality,sc_between_centrality,sc_eigenvector_centrality,gr_layer,gr_hub_and_spoke,gr_coreness,gr_degree_centrality,gr_between_centrality,gr_eigenvector_centrality
0,1340,"1914, and Other Poems by Rupert Brooke (1915)","1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.03,6,0.0,0.000186,2,1,0.02,36,3e-06,0.009037
1,1027,U.S.A.: The 42nd Parallel / 1919 / The Big Mon...,"1919 by Dos Passos, John (1932)",1,0,0.815,573,0.001874,0.04623,1,1,0.52,107,0.000109,0.024729
2,192,That Is SO Me: 365 Days of Devotions: Flip-Flo...,365 Days (1936),3,1,0.25,175,5.6e-05,0.0151,2,1,0.01,9,0.0,0.002413
3,614,A Backward Glance by Edith Wharton (1934),"A Backward Glance by Wharton, Edith (1934)",3,1,0.275,231,0.000408,0.016208,1,1,0.35,44,3.8e-05,0.009455
4,591,The Book of Repulsive Women: 8 Rhythms and 5 D...,"A Book by Barnes, Djuna (1923)",4,1,0.015,45,2.4e-05,0.002986,2,1,0.01,1,0.0,0.000272


In [223]:
combined_df.to_csv("combined-gr-sc-core-periphery.csv")

## Merge with Goodreads Ratings

In [228]:
# import the mapping from Goodreads book id to Goodreads book name
with open("data/goodreads-book-id-to-text.json") as json_file:
    data = json.load(json_file)

goodreads_ids = pd.DataFrame.from_dict(data, orient = 'index').reset_index()
goodreads_ids = goodreads_ids.rename(columns = {0:"book_name", "index":"goodreads_id"})

In [232]:
# Merging goodreads_ids with combined dataframe
goodreads_ids_core_periphery = combined_df.merge(goodreads_ids, left_on="gr_info", right_on = "book_name")

In [240]:
goodreads_ids_core_periphery.head()

Unnamed: 0,vertex,gr_info,sc_info,sc_layer,sc_hub_and_spoke,sc_coreness,sc_degree_centrality,sc_between_centrality,sc_eigenvector_centrality,gr_layer,gr_hub_and_spoke,gr_coreness,gr_degree_centrality,gr_between_centrality,gr_eigenvector_centrality,goodreads_id,book_name
0,1340,"1914, and Other Poems by Rupert Brooke (1915)","1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.03,6,0.0,0.000186,2,1,0.02,36,3e-06,0.009037,9857591,"1914, and Other Poems by Rupert Brooke (1915)"
1,1027,U.S.A.: The 42nd Parallel / 1919 / The Big Mon...,"1919 by Dos Passos, John (1932)",1,0,0.815,573,0.001874,0.04623,1,1,0.52,107,0.000109,0.024729,261441,U.S.A.: The 42nd Parallel / 1919 / The Big Mon...
2,192,That Is SO Me: 365 Days of Devotions: Flip-Flo...,365 Days (1936),3,1,0.25,175,5.6e-05,0.0151,2,1,0.01,9,0.0,0.002413,8589547,That Is SO Me: 365 Days of Devotions: Flip-Flo...
3,614,A Backward Glance by Edith Wharton (1934),"A Backward Glance by Wharton, Edith (1934)",3,1,0.275,231,0.000408,0.016208,1,1,0.35,44,3.8e-05,0.009455,5261,A Backward Glance by Edith Wharton (1934)
4,591,The Book of Repulsive Women: 8 Rhythms and 5 D...,"A Book by Barnes, Djuna (1923)",4,1,0.015,45,2.4e-05,0.002986,2,1,0.01,1,0.0,0.000272,803396,The Book of Repulsive Women: 8 Rhythms and 5 D...


In [231]:
# import Goodreads review data and convert Result ID column to string
goodreads_ratings_reviews = pd.read_csv("goodreads_query_results_filtered.csv")
# drop rows without a Goodreads match; .copy() avoids assigning into a view of the original frame
goodreads_ratings_reviews = goodreads_ratings_reviews[goodreads_ratings_reviews['Result ID'].notna()].copy()
# cast the float IDs to int first so they become "5261" rather than "5261.0"
goodreads_ratings_reviews["Result ID"] = goodreads_ratings_reviews["Result ID"].astype(int).astype(str)
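
The int-then-string conversion matters because the IDs loaded from JSON are strings (JSON object keys always are), while a numeric column containing missing values is stored as float. A small sketch with a toy frame, using two IDs that appear in the outputs above:

```python
import pandas as pd

# Toy stand-in for the ratings CSV; the missing ID forces the column to float
toy_ratings = pd.DataFrame({"Result ID": [9857591.0, None, 5261.0]})

# Converting the float straight to a string keeps the trailing ".0"
naive_id = str(toy_ratings["Result ID"].iloc[0])

# Drop missing IDs, then go float -> int -> str
toy_ratings = toy_ratings[toy_ratings["Result ID"].notna()].copy()
toy_ratings["Result ID"] = toy_ratings["Result ID"].astype(int).astype(str)

# String IDs, as they would come from JSON keys
toy_ids = pd.DataFrame({"goodreads_id": ["9857591", "5261"]})
matched = toy_ids.merge(toy_ratings, left_on="goodreads_id", right_on="Result ID")
print(naive_id, len(matched))
```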

In [234]:
goodreads_ratings_core_periphery = goodreads_ids_core_periphery.merge(goodreads_ratings_reviews, left_on = "goodreads_id", right_on = "Result ID")

In [241]:
goodreads_ratings_core_periphery.head()

Unnamed: 0,vertex,gr_info,sc_info,sc_layer,sc_hub_and_spoke,sc_coreness,sc_degree_centrality,sc_between_centrality,sc_eigenvector_centrality,gr_layer,...,Query Title,Query Author,Query ID,Result ID,Result Title,Result Author,Ratings Count,Text Reviews Count,Original Publication Year,Average Rating
0,1340,"1914, and Other Poems by Rupert Brooke (1915)","1914 and Other Poems by Brooke, Rupert (1915)",4,1,0.03,6,0.0,0.000186,2,...,1914 and Other Poems,Rupert Brooke,https://shakespeareandco.princeton.edu/books/b...,9857591,"1914, and Other Poems",Rupert Brooke,196.0,15.0,1915,3.88
1,1027,U.S.A.: The 42nd Parallel / 1919 / The Big Mon...,"1919 by Dos Passos, John (1932)",1,0,0.815,573,0.001874,0.04623,1,...,1919,John Dos Passos,https://shakespeareandco.princeton.edu/books/d...,261441,U.S.A.: The 42nd Parallel / 1919 / The Big Money,John Dos Passos,4700.0,187.0,1930,4.11
2,192,That Is SO Me: 365 Days of Devotions: Flip-Flo...,365 Days (1936),3,1,0.25,175,5.6e-05,0.0151,2,...,365 Days,,https://shakespeareandco.princeton.edu/books/b...,8589547,That Is SO Me: 365 Days of Devotions: Flip-Flo...,Nancy N. Rue,80.0,23.0,2010,3.76
3,614,A Backward Glance by Edith Wharton (1934),"A Backward Glance by Wharton, Edith (1934)",3,1,0.275,231,0.000408,0.016208,1,...,A Backward Glance,Edith Wharton,https://shakespeareandco.princeton.edu/books/w...,5261,A Backward Glance,Edith Wharton,634.0,89.0,1934,3.75
4,591,The Book of Repulsive Women: 8 Rhythms and 5 D...,"A Book by Barnes, Djuna (1923)",4,1,0.015,45,2.4e-05,0.002986,2,...,A Book,Djuna Barnes,https://shakespeareandco.princeton.edu/books/b...,803396,The Book of Repulsive Women: 8 Rhythms and 5 D...,Djuna Barnes,314.0,27.0,1989,3.61


In [236]:
goodreads_ratings_core_periphery.to_csv("core-periphery_goodreads_ratings.csv")

### Future work
- Consider allowing multiple cores ([Xiao Zhang, Travis Martin, and M. E. J. Newman. 2015. “Identification of core-periphery structure in networks.” *Physical Review E* 91](https://journals.aps.org/pre/abstract/10.1103/PhysRevE.91.032803)).
- Compute statistics about the core:
    - density of the core
    - relative size of the core
- Count edges in each of the four quadrants: core -> core, core -> periphery, periphery -> periphery, and periphery -> core.
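
A minimal sketch of how the core statistics listed above might be computed with networkx. The toy graph and the `core` node set are invented for illustration; in practice the core would come from the core-periphery layer assignments. Because these graphs are undirected, core -> periphery and periphery -> core are the same set of edges, so the sketch counts three groups rather than four.

```python
import networkx as nx

# Toy undirected graph with a hypothetical core assignment
G = nx.Graph()
G.add_edges_from([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")])
core = {"a", "b", "c"}

# Density of the subgraph induced by the core nodes
core_density = nx.density(G.subgraph(core))

# Share of all nodes that belong to the core
relative_core_size = len(core) / G.number_of_nodes()

# Classify each edge by how many of its endpoints are in the core
labels = ["periphery-periphery", "core-periphery", "core-core"]
edge_counts = {label: 0 for label in labels}
for u, v in G.edges():
    n_core_endpoints = (u in core) + (v in core)
    edge_counts[labels[n_core_endpoints]] += 1

print(core_density, relative_core_size, edge_counts)
```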

## Other basic network measures
Many of these measures are drawn from Ladd et al.'s ["Exploring and Analyzing Network Data with Python"](https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python). Each measure is applied here only to the Shakespeare and Co dataset, but could just as easily be applied to Goodreads.