# **World of Warcraft Avatar History:**
# **players' transitions as a graph**

* The dataset comes from the paper "World of Warcraft Avatar History Dataset" by Yeng-Ting Lee, Kuan-Ta Chen, Yun-Maw Cheng, and Chin-Laung Lei (https://www.iis.sinica.edu.tw/~swc/pub/world_of_warcraft_avatar_history.html), which explains how the data was collected and which files it contains.
* My idea was to build a graph from this dataset and analyse it with the theory I studied in the [Network Science book by Albert-László Barabási](https://barabasi.com/book/network-science).

In [None]:
!pip install powerlaw
import numpy as np
import pandas as pd
import seaborn as sns
sns.set()
import matplotlib.pyplot as plt
%config InlineBackend.figure_format = 'svg' 
import warnings
warnings.filterwarnings("ignore")
import os
import gc
import plotly.graph_objs as go
from mpl_toolkits.mplot3d import axes3d 
from matplotlib import style 
import networkx as nx
from collections import Counter
from collections import defaultdict
import math
from operator import itemgetter
import networkx.algorithms.centrality as nc
import powerlaw as pwl
from scipy.stats import poisson
from itertools import chain 
from scipy import optimize
import statistics

### **1. Preparing the input data files**

In [None]:
wowah = pd.read_csv('../input/warcraft-avatar-history/wowah_data.csv', parse_dates=True, keep_date_col=True)
zones = pd.read_csv('../input/warcraft-avatar-history/zones.csv', encoding='iso-8859-1')
location_coords = pd.read_csv('../input/warcraft-avatar-history/location_coords.csv', encoding='iso-8859-1')
locations = pd.read_csv('../input/warcraft-avatar-history/locations.csv', encoding='iso-8859-1')

In [None]:
wowah.head()

In [None]:
wowah.rename({'char': 'char', 
              ' level': 'level',
              ' race': 'race',
              ' charclass': 'class',
              ' zone': 'zone',
              ' guild': 'guild',
              ' timestamp': 'timestamp'}, axis=1, inplace=True)

zones['Zone_Name'].replace({'Dalaran<U+7AF6><U+6280><U+5834>': 'Dalaran Arena'}, inplace=True)
wowah['zone'].replace({'Dalaran競技場': 'Dalaran Arena'}, inplace=True)

print('Records dataframe size:', wowah.shape)
print('Data on {:.0f} players'.format(len(wowah['char'].unique())))
wowah.head()

### **2. Preparing the location coordinates in WoW map**

* Can the given files be used to recreate the WoW map? My idea was to plot the location coordinates that come with the dataset to get a rough picture of the WoW geography. Unfortunately, this would only work properly if a shapefile were provided.

In [None]:
location_coords.head()

In [None]:
new = location_coords["Location_Name"].str.split(":" , n = 1, expand = True) 
location_coords["Kingdom"]= new[0] 
location_coords = location_coords.drop(['Location_Name'],axis=1)
new = location_coords["Kingdom"].str.split("/" , n = 1, expand = True) 
location_coords["Kingdom"]= new[0] 
new = location_coords["Kingdom"].str.split("(" , n = 1, expand = True) 
location_coords["Kingdom"]= new[0] 
location_coords['Kingdom']=location_coords['Kingdom'].str.replace('\t', '') 
location_coords['Kingdom'] = location_coords['Kingdom'].str.strip()

### What would it look like in 2D?

In [None]:
x = location_coords['X_coord']
y = location_coords['Y_coord']

plt.scatter(x, y, alpha=0.3)
plt.show()

### What would it look like in 3D?

In [None]:
style.use('ggplot') 
fig = plt.figure() 
ax1 = fig.add_subplot(111, projection='3d') 
x = location_coords['X_coord']
y = location_coords['Y_coord'] 
z = location_coords['Z_coord'] 
ax1.scatter(x, y, z, c = 'm', marker = 'o', alpha=0.3)
ax1.set_xlabel('x-axis') 
ax1.set_ylabel('y-axis') 
ax1.set_zlabel('z-axis') 
plt.show()

In [None]:
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode()

trace1 = go.Mesh3d(x=x,y=y,z=z,opacity=0.6, colorscale='Viridis', 
                   intensity=z,text='kingdom:'+location_coords['Kingdom']+'\n ',
                   hoverinfo='x+y+z+text')

layout = go.Layout(scene = dict(xaxis = dict(nticks=4, range = [-15000,15000],),
                    yaxis = dict(nticks=4, range = [-15000,15000],),
                    zaxis = dict(nticks=4, range = [-250,1500],),),
                    width=700,
                    margin=dict(r=20, l=10,b=10, t=10),)

fig = go.Figure(data=[trace1], layout=layout)

iplot(fig)

* Without a shapefile, this dataset cannot give a clear picture of the official WoW map. I will still use the zones as the players' locations and build a network of the transitions between them.

## **3. Graph data**

* I created a dataframe from the players' locations, with the intention of using them as nodes (similar to the role of airports in the international flight network).

In [None]:
nodes = wowah[['zone']].groupby(['zone']).size().nlargest(n=120).reset_index(name='top')
nodes.index += 1
nodes['id'] = nodes.index
nodes.rename(columns={'top':'values'}, inplace=True)

In [None]:
nodes.head()

* Next, I created a dataframe from the movement of the characters between zones (similar to flights between airports in the international flight network), to use as edges. The number of players making each zone transition is taken as the weight of that edge.
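
As a rough illustration of the edge construction described above, the sketch below counts zone-to-zone transitions for a tiny made-up visit log. The `visits` list and the way it is grouped are assumptions for the example, not data or code from the dataset; only the idea of weighting each (from, to) pair by how often it occurs follows the text.

```python
from collections import Counter, defaultdict

# Hypothetical visit log: (character id, zone), already in chronological order.
visits = [
    (1, "Orgrimmar"), (1, "Durotar"), (1, "The Barrens"),
    (2, "Orgrimmar"), (2, "Durotar"),
    (3, "Durotar"), (3, "Orgrimmar"),
]

# Group the zones visited by each character, preserving the visit order.
paths = defaultdict(list)
for char, zone in visits:
    paths[char].append(zone)

# Count every consecutive (from, to) pair; the count is the edge weight.
edge_weights = Counter()
for zones_seen in paths.values():
    for src, dst in zip(zones_seen, zones_seen[1:]):
        edge_weights[(src, dst)] += 1

for (src, dst), w in sorted(edge_weights.items()):
    print(f"{src} -> {dst}: weight {w}")
```

Keeping the pair as a tuple avoids having to join and re-split zone names on a separator character.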

In [None]:
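# One row per (character, zone) in order of first visit; a stable sort keeps
# that order within each character.
zoneCheck = wowah[['char','zone']].drop_duplicates().sort_values('char', kind='mergesort')
zoneCheck = zoneCheck.rename(columns={'zone':'from'})

# diffc is 1 on the last row of each character (the next row belongs to a
# different character) and 0 otherwise.
zoneCheck['diffc'] = zoneCheck['char'].ne(zoneCheck['char'].shift(-1)).astype(int)
# 'to' is the next zone first visited by the same character.
zoneCheck['to'] = zoneCheck['from'].shift(-1)
movement = zoneCheck.query('diffc == 0').loc[:,'from':'to']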

movement['bof']= movement['from'].astype(str) + '_' + movement['to'].astype(str)
edges = movement.groupby('bof').agg('count')

edges.reset_index(inplace=True)
edges.drop(columns =['diffc','to'], inplace = True)
edges = edges.rename(columns={'from':'weight'})

edges['new'] = edges.bof
new = edges['new'].str.split("_", n = 1, expand = True) 
edges['from'] = new[0]
edges['to'] = new[1]
edges.drop(columns =['new'], inplace = True) 
edges.bof = edges.bof.str.split("_")
edges.head()

## **4. Interactive graph visualization**

In [None]:
!pip install pyvis
from pyvis import network as net

In [None]:
myedges = edges[edges['weight'] > 1] 

got_net = net.Network(height="750px", width="100%", font_color="black",notebook=True)
#bgcolor="#222222", 
got_net.barnes_hut()

sources = myedges['from']
targets = myedges['to']
weights = myedges['weight']

edge_data = zip(sources, targets, weights)

In [None]:
for e in edge_data:
    src = e[0]
    dst = e[1]
    w = e[2]
    
    got_net.add_node(src, src, title=src, borderWidth=2,value=w)
    got_net.add_node(dst, dst, title=dst,borderWidth=2,value=w)
    got_net.add_edge(src, dst, value=w)
    got_net.set_edge_smooth('dynamic')
    
got_net.set_options("""var options = {
  "nodes": {
    "color": {
      "border": "rgba(8,28,53,1)"
    },
    "scaling": {
      "max": 200
    }
  },
  "edges": {
    "color": {
      "inherit": true,
      "opacity": 0.85
    },
    "font": {
      "strokeWidth": 0
    },
    "scaling": {
      "max": 203
    },
    "smooth": {
      "forceDirection": "none"
    }
  },
  "physics": {
    "barnesHut": {
      "gravitationalConstant": -5000,
      "springLength": 250,
      "springConstant": 0.001
    },
    "minVelocity": 0.75
  }
}
""")

In [None]:
#got_net.show_buttons()
got_net.show("./zone_traffic.html")

## **5. Directed graph**

* A directed *weighted* graph is created: 

    nodes = locations of wow players

    edges = players' transition between the locations weighted by the traffic of the transition
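
To make the definition above concrete, here is a minimal sketch of a directed weighted graph stored in plain dictionaries, with its in- and out-degrees. The edge list is hypothetical, and the dictionaries only stand in for `nx.DiGraph` so that the example runs without any dependency.

```python
from collections import defaultdict

# Hypothetical weighted transitions: (from_zone, to_zone, traffic).
weighted_edges = [
    ("Orgrimmar", "Durotar", 5),
    ("Durotar", "Orgrimmar", 3),
    ("Durotar", "The Barrens", 2),
]

out_neighbors = defaultdict(dict)  # zone -> {destination: weight}
in_degree = defaultdict(int)       # zone -> number of incoming edges
for src, dst, w in weighted_edges:
    out_neighbors[src][dst] = w
    in_degree[dst] += 1

# Out-degree is the number of distinct destinations of each zone.
out_degree = {zone: len(dsts) for zone, dsts in out_neighbors.items()}

print("in-degree :", dict(in_degree))
print("out-degree:", out_degree)
```

Because the graph is directed, the edge Orgrimmar to Durotar and the edge Durotar to Orgrimmar are separate and carry their own weights.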

In [None]:
#G.clear()
G=nx.DiGraph()

G.add_nodes_from(nodes['zone'])

myedges = edges[edges['weight'] > 1][['from','to','weight']]
# (from, to, weight) tuples, the format add_weighted_edges_from expects
myedges_list = list(myedges.itertuples(index=False, name=None))

G.add_weighted_edges_from(myedges_list)
mypos = nx.random_layout(G)

print("The network has", len(G), "nodes and ", len(G.edges()), "edges")

### Plot degrees

In [None]:
indeg=dict(G.in_degree()).values()
degin_distri=Counter(indeg)
ind = list(indeg)

outdeg=dict(G.out_degree()).values()
degout_distri=Counter(outdeg)
outd = list(outdeg)

In [None]:
# xin holds each observed in-degree k (ascending) and yin the fraction of
# nodes with that in-degree; the two lists stay in matching order.
xin=[]
yin=[]
for i in sorted(degin_distri):   
    xin.append(i)
    yin.append(degin_distri[i]/len(G))

# same for the out-degree
xout=[]
yout=[]
for i in sorted(degout_distri):   
    xout.append(i)
    yout.append(degout_distri[i]/len(G))

In [None]:
bin_count=10
def drop_zeros(a_list):
    return [i for i in a_list if i>0]

maxxin = np.log10(np.max(xin))
maxyin = np.log10(np.max(yin))
max_base = np.max([maxxin,maxyin])
minxin = np.log10(np.min(drop_zeros(xin)))
bins = np.logspace(minxin,max_base,num=bin_count)
bin_means_yin = (np.histogram(xin,bins,weights=yin)[0] / np.histogram(xin,bins)[0])
bin_means_xin = (np.histogram(xin,bins,weights=xin)[0] / np.histogram(xin,bins)[0])

maxxout = np.log10(np.max(xout))
maxyout = np.log10(np.max(yout))
max_base2 = np.max([maxxout,maxyout])
minxout = np.log10(np.min(drop_zeros(xout)))
bins2 = np.logspace(minxout,max_base2,num=bin_count)
bin_means_yout = (np.histogram(xout,bins2,weights=yout)[0] / np.histogram(xout,bins2)[0])
bin_means_xout = (np.histogram(xout,bins2,weights=xout)[0] / np.histogram(xout,bins2)[0])

* Plotting the degree distributions:

In [None]:
fig, axs = plt.subplots(2, 2,figsize=(10,10))
axs[0,0].plot(xin ,yin , marker='o',color='firebrick',ls='None',label='in-degree')
axs[0,0].plot(xout, yout,marker='o',color='forestgreen',ls='None',label='out-degree')
axs[0,0].set_title("Degree distributions linear, lin-binning",size=14)
axs[0,0].set_xlabel('in/out-degree $k$',size=14)
axs[0,0].set_ylabel('$P(k)$',size=14)
axs[0,0].legend()
axs[0,1].loglog(xin ,yin , marker='o',color='firebrick',ls='None',label='in-degree')
axs[0,1].loglog(xout, yout,marker='o',color='forestgreen',ls='None',label='out-degree')
axs[0,1].set_title('Degree distributions loglog, lin-binning',size=14)
axs[0,1].set_xlabel('in/out-degree $k$',size=14)
axs[0,1].set_ylabel('$P(k)$',size=14)
axs[0,1].legend()
axs[1,0].loglog(bin_means_xin ,bin_means_yin , marker='o',color='firebrick',ls='None',label='in-degree')
axs[1,0].loglog(bin_means_xout ,bin_means_yout,marker='o',color='forestgreen',ls='None',label='out-degree')
axs[1,0].set_title('Degree distributions loglog, log-binning',size=14)
axs[1,0].set_xlabel('in/out-degree $k$',size=14)
axs[1,0].set_ylabel('$P(k)$',size=14)
axs[1,0].legend()
axs[1,1].loglog(xin, np.cumsum(yin), marker='o',color='firebrick',ls='None',label='in-degree')
axs[1,1].loglog(xout, np.cumsum(yout),marker='o',color='forestgreen',ls='None',label='out-degree')
axs[1,1].set_title('Degree distributions loglog, cumulative',size=14)
axs[1,1].set_xlabel('in/out degree $k$',size=14)
axs[1,1].set_ylabel('$P(k)$',size=14)
axs[1,1].legend()

## **6. H undirected graph**

* An undirected weighted graph is created:

In [None]:
myedges = edges[edges['weight'] > 1][['from','to','weight']]
# [from, to, weight] rows; when two zones are linked in both directions,
# the undirected graph keeps the weight that is added last
dt = myedges.values.tolist()

H=nx.Graph()
#H.add_nodes_from(myedges['from'].drop_duplicates())
H.add_nodes_from(nodes['zone'])
H.add_weighted_edges_from(dt)
print("The H graph has ",len(H.nodes())," nodes and ",len(H.edges())," edges." )

* Visualize the nodes by their degree (node positions come from the random layout computed above, not from map coordinates):

    the nodes (zones) with no traffic have degree=0 and are not shown

In [None]:
fig=plt.figure(figsize=(12,8))

s = nx.draw_networkx_nodes(H, mypos,node_size=list(dict(H.degree).values()),
            node_color=list(dict(H.degree).values()),cmap=plt.cm.coolwarm)
cbar=plt.colorbar(s)
cbar.ax.set_ylabel('Degree', size=14)
plt.title('Network nodes',size=14)
plt.axis('off')

* Visualize the edges according to their weight:

    edges with more traffic are drawn thicker than those with less traffic

In [None]:
# Chained comparisons avoid the precedence trap of `&`, which binds tighter than `<=` and `>`.
e1 = [(u, v) for (u, v, d) in H.edges(data=True) if d['weight'] <= 50]
e2 = [(u, v) for (u, v, d) in H.edges(data=True) if 50 < d['weight'] <= 150]
e3 = [(u, v) for (u, v, d) in H.edges(data=True) if 150 < d['weight'] <= 250]
e4 = [(u, v) for (u, v, d) in H.edges(data=True) if 250 < d['weight'] <= 400]
e5 = [(u, v) for (u, v, d) in H.edges(data=True) if 400 < d['weight'] <= 600]
e6 = [(u, v) for (u, v, d) in H.edges(data=True) if 600 < d['weight'] <= 1070]


In [None]:
fig=plt.figure(figsize=(12,8))

s=nx.draw_networkx_nodes(H, pos=mypos,node_color=list(dict(H.degree).values()),
            node_size=list(dict(H.degree).values()),cmap=plt.cm.coolwarm)
#nx.draw_networkx_edges(H, pos=mypos, alpha=0.5)
nx.draw_networkx_edges(H, mypos, edgelist=e1, width=0.2, alpha = 0.5)
nx.draw_networkx_edges(H, mypos, edgelist=e2, width=0.5, alpha = 0.5)
nx.draw_networkx_edges(H, mypos, edgelist=e3, width=1.5, alpha = 0.5)
nx.draw_networkx_edges(H, mypos, edgelist=e4, width=3.5, alpha = 0.5)
nx.draw_networkx_edges(H, mypos, edgelist=e5, width=5.5, alpha = 0.5)
nx.draw_networkx_edges(H, mypos, edgelist=e6, width=8.5, alpha = 0.5)
cbar=plt.colorbar(s)
cbar.ax.set_ylabel('Degree', size=14)
plt.title('Network edges',size=14)
plt.axis('off')

#### Nodes' centrality measures:

In [None]:
def draw(H, mypos, measures, measure_name):
    
    nodes = nx.draw_networkx_nodes(H, mypos, node_size=80, cmap=plt.cm.coolwarm, 
                                   node_color=list(dict(measures).values()))
    
    #edges = nx.draw_networkx_edges(H, mypos)
    nx.draw_networkx_edges(H, mypos, edgelist=e1, width=0.2, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e2, width=0.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e3, width=1.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e4, width=3.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e5, width=5.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e6, width=8.5, alpha = 0.5)

    plt.title(measure_name,size=16)
    plt.colorbar(nodes)
    plt.axis('off')
    #plt.show()

In [None]:
plt.figure(figsize=(10,8))
plt.subplot(2,2,1)
draw(H, mypos, nx.degree_centrality(H), 'Degree Centrality')
plt.subplot(2,2,2)
draw(H, mypos, nx.eigenvector_centrality(H), 'Eigenvector Centrality')
plt.subplot(2,2,3)
draw(H, mypos, nx.closeness_centrality(H), 'Closeness Centrality')
plt.subplot(2,2,4)
draw(H, mypos, nx.betweenness_centrality(H), 'Betweenness Centrality')
plt.show()

## **7. H Network analysis**

In [None]:
# degree distribution of the undirected graph H (not the in-degree of G)
degH_distri = Counter(dict(H.degree()).values())
x=[]
y=[]
for i in sorted(degH_distri):   
    x.append(i)
    y.append(degH_distri[i]/len(H))

bin_count =30
maxx = np.log10(np.max(x))
maxy = np.log10(np.max(y))
max_base = np.max([maxx,maxy])
minx = np.log10(np.min(drop_zeros(x)))
bins = np.logspace(minx,max_base,num=bin_count)
bin_means_y = (np.histogram(x,bins,weights=y)[0] / np.histogram(x,bins)[0])
bin_means_x = (np.histogram(x,bins,weights=x)[0] / np.histogram(x,bins)[0])

* Degree distributions: 

In [None]:
fig, axs = plt.subplots(2, 2,figsize=(10,10))
axs[0,0].plot(x ,y , marker='o',color='royalblue',ls='None')
axs[0,0].set_title("Degree distribution linear, lin-binning",size=14)
axs[0,0].set_xlabel('degree $k$',size=14)
axs[0,0].set_ylabel('$P(k)$',size=14)

axs[0,1].loglog(x ,y , marker='o',color='royalblue',ls='None')
axs[0,1].set_title('Degree distribution loglog, lin-binning',size=14)
axs[0,1].set_xlabel('degree $k$',size=14)
axs[0,1].set_ylabel('$P(k)$',size=14)

axs[1,0].loglog(bin_means_x ,bin_means_y , marker='o',color='royalblue',ls='None')
axs[1,0].set_title('Degree distribution loglog, log-binning',size=14)
axs[1,0].set_xlabel('degree $k$',size=14)
axs[1,0].set_ylabel('$P(k)$',size=14)

axs[1,1].loglog(x, np.cumsum(y), marker='o',color='royalblue',ls='None')
axs[1,1].set_title('Degree distribution loglog, cumulative',size=14)
axs[1,1].set_xlabel('degree $k$',size=14)
axs[1,1].set_ylabel('$P(k)$',size=14)

#fig1, ax = plt.subplots(1, 1,figsize=(16,16))
#ax.plot(deg_distri.keys(),deg_distri.values(),'o',color='mediumpurple',ls='None')
#ax.set_xlabel('num of links k',size=16)
#ax.set_ylabel('num of nodes with k links',size=16)


The degree distribution appears to have a low-degree saturation (a flattened $p_k$ for $k < k_{sat}$), so there are fewer small-degree nodes than a pure power law would predict. This saturation does not affect the scale-free properties of the network.

There also appears to be a high-degree cutoff (a sudden drop of $p_k$ for $k > k_{cut}$) that limits the size of the largest hub, so there are fewer high-degree nodes than a pure power law would predict.
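
Saturation and cutoff are easiest to spot on a log-binned distribution. The following is a minimal pure-Python sketch of log-binning, assuming a made-up degree sequence; the function name `log_binned_pk` and the choice of five bins are illustrative and not taken from the cells above.

```python
import math

def log_binned_pk(degrees, n_bins=5):
    """Return (bin centre, density) pairs using logarithmically spaced bins."""
    positive = [k for k in degrees if k > 0]
    k_min, k_max = min(positive), max(positive)
    # Bin edges grow geometrically from k_min to just above k_max.
    ratio = ((k_max + 1) / k_min) ** (1 / n_bins)
    edges = [k_min * ratio ** i for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for k in positive:
        idx = min(int(math.log(k / k_min) / math.log(ratio)), n_bins - 1)
        counts[idx] += 1
    result = []
    for i, c in enumerate(counts):
        width = edges[i + 1] - edges[i]
        centre = math.sqrt(edges[i] * edges[i + 1])  # geometric mean of the edges
        # Divide by the bin width so that wide bins are not over-counted.
        result.append((centre, c / (len(positive) * width)))
    return result

degrees = [1, 1, 1, 2, 2, 3, 4, 8, 16, 40]  # hypothetical degree sequence
binned = log_binned_pk(degrees)
for centre, density in binned:
    print(f"k ~ {centre:7.2f}   p_k ~ {density:.5f}")
```

Dividing each count by the bin width is what keeps the slope on a log-log plot comparable to the exponent of the underlying distribution.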

## **8. Connectivity**

In [None]:
print("N nodes = ", len(H.nodes()))
print( "L edges = ",len(H.edges()))
print("Is the graph connected?", nx.is_connected(H))

In [None]:
print("density = ",nx.density(H))
print("for complete graph L max = ", len(H.nodes())*(len(H.nodes())-1)/2)

In [None]:
print("The graph has ", nx.number_connected_components(H),"connected components")

In [None]:
for k in nx.connected_components(H):
    print(len(k))

## **9. Extract largest connected component**

In [None]:
nx.connected_components(H)

In [None]:
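# connected_components yields components in no guaranteed order, so sort largest first
graphs = sorted(nx.connected_components(H), key=len, reverse=True)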

In [None]:
H1=H.subgraph(graphs[0])

In [None]:
len(H1)

In [None]:
print(len(H)-len(H1))  # number of nodes outside the largest component

In [None]:
print("Check that the graph is now connected")
nx.is_connected(H1)

## **10. Global clustering coefficient**

In [None]:
print("Total number of triangles in the graph (each triangle is counted once per vertex, hence the division by 3):")
print("triangles = ",sum(list(nx.triangles(H1).values()))/3) 

In [None]:
print("Fraction of connected triples that close into triangles (1 for complete graph):")
print("transitivity = ",nx.transitivity(H1) )

### **Local and average clustering coefficient:**

In [None]:
deg=dict(H1.degree()).values()
deg_distri=Counter(deg)
wei= nx.get_edge_attributes(H1,'weight')

In [None]:
c = dict(nx.clustering(H1,weight='weight'))
cno = dict(nx.clustering(H1))

# keep degree and clustering coefficient paired node by node
node_order = list(H1.nodes())
k = [H1.degree(n) for n in node_order]
cc = [c[n] for n in node_order]
ccno = [cno[n] for n in node_order]

avercl = nx.average_clustering(H1,weight='weight')
averclno = nx.average_clustering(H1)

plt.figure(figsize=(10,5))
plt.subplot(1,2,1)
plt.loglog(k,cc,'bo')
plt.xlabel('degree $k$', fontsize=12)
plt.ylabel('$C(k)$ ', fontsize=12)
plt.title('Local clustering coefficient weighted',size=14)
plt.axhline(avercl, color='k', linestyle='dashed', linewidth=1)
min_xlim, max_xlim = plt.xlim()
plt.text(max_xlim*0.1,avercl*1.1,  '<C>_weighted = {:.4f}'.format(avercl))
plt.subplot(1,2,2)
plt.loglog(k,ccno,'go')
plt.xlabel('degree $k$', fontsize=12)
plt.ylabel('$C(k)$ ', fontsize=12)
plt.title('Local clustering coefficient no weights',size=14)
plt.axhline(averclno, color='k', linestyle='dashed', linewidth=1)
min_xlim, max_xlim = plt.xlim()
plt.text(max_xlim*0.1,averclno*1.01,  '<C> = {:.4f}'.format(averclno))
plt.show()



From the dependence of the clustering coefficient on the node degree we see that the local clustering coefficient is higher for hubs than for low-degree nodes: the high-degree nodes sit in dense local neighborhoods, while the neighborhoods of the low-degree nodes are much sparser.

The local clustering coefficient of the weighted graph shows that traffic changes the clustering compared with the unweighted graph. Nonetheless, the average stays in the hub territory.

## **11. Distances for connected component of graph:**

In [None]:
aver = nx.average_shortest_path_length(H1,weight='weight')
d = dict(nx.shortest_path_length(H1,weight='weight'))
dist = []
for k, v in d.items():
    for k1, v1 in v.items():
        dist.append(v1)
pdist = Counter(dist)
xdist=[]
ydist=[]
for i in sorted(pdist):   
    xdist.append(i)
    ydist.append(pdist[i]/len(dist))


plt.plot(xdist,ydist, marker='o',ls='-',lw=0.7)
plt.title('Distances distribution',size=14)
plt.xlabel('distance d',size=12)
plt.ylabel('$P_d$',size=12)
plt.axvline(aver, color='k', linestyle='dashed', linewidth=1)

min_ylim, max_ylim = plt.ylim()
plt.text(aver*1.1, max_ylim*0.9, '<d> = {:.4f}'.format(aver))
print(" diameter calculated: dmax = ",nx.diameter(H1))

## **12. The relation of the size of the network and the size of the largest hub**

In [None]:
print('kmax =' ,max(list(deg)))
print("kmin = ",min(list(deg)))
print("<k> = {:.2f}".format(np.average(list(deg)))) 
print("lnN = {:.4f}".format(np.log(len(H.nodes()))))
#print("N-1 = ", len(H.nodes)-1)

For an exponential degree distribution, $k_{max}$ is not very different from $k_{min}$ because there are no hubs. In general $k_{max} \sim \ln N$, but clearly this is not the case here.

For a degree distribution following a power law, $k_{max}$ depends polynomially on $N$: $k_{max}$ is 2 orders of magnitude greater than $k_{min}$ because there are hubs.
In general, $k_{max} \sim N^{\frac{1}{\gamma-1}}$.
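
A small numeric sketch of the two scalings may help. It assumes the textbook forms $k_{max} = k_{min} + \ln N / \lambda$ for an exponential distribution and $k_{max} = k_{min} N^{\frac{1}{\gamma-1}}$ for a power law; the values of `n`, `k_min`, `lam` and `gamma` below are hypothetical and are not measured from the WoW network.

```python
import math

def expected_kmax_exponential(k_min, n, lam):
    # Exponential degree distribution: k_max grows only logarithmically with N.
    return k_min + math.log(n) / lam

def expected_kmax_power_law(k_min, n, gamma):
    # Power-law degree distribution: k_max grows polynomially with N.
    return k_min * n ** (1.0 / (gamma - 1.0))

n = 150       # hypothetical network size
k_min = 1     # hypothetical smallest degree
lam = 1.0     # hypothetical decay rate of the exponential distribution
for gamma in (2.2, 2.5, 3.0):
    print(f"gamma = {gamma}: power-law k_max ~ {expected_kmax_power_law(k_min, n, gamma):.1f}")
print(f"exponential k_max ~ {expected_kmax_exponential(k_min, n, lam):.1f}")
```

Comparing the measured $k_{max}$ with these two estimates is a quick sanity check on which regime the network is closer to.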

## **13. Is there a scale-free property?**

In [None]:
print("<k>   = {:.2f}".format(np.average(list(deg)))) 
k2 = np.power(list(deg),2)
print("<k^2> = {:.2f}".format(np.average(k2)))
k3 = np.power(list(deg),3)
print("<k^3> = {:.2f}".format(np.average(k3)))
k4 = np.power(list(deg),4)
print("<k^4> = {:.2f}".format(np.average(k4)))

The higher moments are larger than $\langle k \rangle$ by several orders of magnitude, so there are significant degree variations around the average. An arbitrary node could have a tiny or a very large $k$, so there is no characteristic scale.
A scale would exist only if the degrees of most nodes were comparable to $\langle k \rangle$, which would then serve as the scale.
So the network has the scale-free property.
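
The argument can be illustrated with two made-up degree sequences that share the same average degree. Both sequences and the helper `moment` are assumptions for this sketch; they only show how a single hub inflates $\langle k^2 \rangle$ and the spread around $\langle k \rangle$.

```python
def moment(degrees, n):
    """n-th moment <k^n> of a degree sequence."""
    return sum(k ** n for k in degrees) / len(degrees)

homogeneous = [4, 5, 5, 6, 5, 4, 6, 5]      # hypothetical, no hubs
heterogeneous = [1, 1, 1, 2, 2, 1, 1, 31]   # hypothetical, one hub

for name, seq in (("homogeneous", homogeneous), ("heterogeneous", heterogeneous)):
    k1 = moment(seq, 1)
    k2 = moment(seq, 2)
    sigma = (k2 - k1 ** 2) ** 0.5  # standard deviation of the degrees
    print(f"{name}: <k> = {k1:.2f}, <k^2> = {k2:.2f}, sigma = {sigma:.2f}")
```

When the standard deviation is much larger than the mean, the mean no longer describes a typical node, which is the sense in which the network lacks a scale.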

In [None]:
print("d max = ",nx.diameter(H1))
print("<d> weighted = {:.4f}".format(nx.average_shortest_path_length(H1,weight='weight')))
print("<d> = {:.4f}".format(nx.average_shortest_path_length(H1)))
print("small world property = {:.4f}".format(np.log(len(H1.nodes()))/np.log(np.average(list(deg)))))

In [None]:
print("lnlnN = {:.4f}".format(np.log(np.log(len(H1.nodes()))))) 
print("lnN/lnlnN = {:.4f}".format(np.log(len(H1.nodes()))/np.log(np.log(len(H1.nodes())))))
print("lnN = {:.4f}".format(np.log(len(H1.nodes()))))

The average distance is better described by $\langle d \rangle \sim \frac{\ln \ln N}{\ln(\gamma-1)}$.
This is the ultra-small-world regime, where $\langle d \rangle$ increases with $\ln \ln N$, more slowly than the $\ln N$ of a random network, because the hubs reduce the path length.

The network has the scale-free property because there are hubs, and the ultra-small-world regime corresponds to a degree exponent $2<\gamma<3$.

## **14. Calculating power law $P(k) \sim k^{-\gamma}$:**

In [None]:
logbinx = bin_means_x[np.logical_not(np.isnan(bin_means_x))]
logbiny = bin_means_y[np.logical_not(np.isnan(bin_means_y))]

In [None]:
fit_function = pwl.Fit(logbinx)
print("xmin: ",fit_function.power_law.xmin)
print("degree exponent: {:.4f}".format(fit_function.power_law.alpha))
print("sigma: {:.4f}".format(fit_function.power_law.sigma))
print(" ")
fit_function_fixmin = pwl.Fit(logbinx, xmin=33,xmax=125)
print("new xmin: ",fit_function_fixmin.xmin)
print("new degree exponent: {:.4f}".format(fit_function_fixmin.power_law.alpha))
print("new sigma: {:.4f}".format(fit_function_fixmin.power_law.sigma))
print(" ")
print("KS distance: {:.4f}".format(fit_function.power_law.D))
print("new KS distance: {:.4f}".format(fit_function_fixmin.power_law.D))


I use the log-binned data for the calculation. The degree-exponent estimate with the minimal KS distance, together with its sigma, puts the power law close to the anomalous regime.

But correcting the minimum degree to leave out the low-degree saturation (even though the KS distance is then no longer minimal) reveals that the exponent lies in the ultra-small-world regime, where $2<\gamma<3$.

### **Fitting the power law to the log-binned data:**

In [None]:
fig, (ax1) = plt.subplots(1, 1)
fig.suptitle('Fits and Power laws: log binning k distribution',size=16)
fig.set_size_inches(8,6)
fit = pwl.Fit(logbinx, discrete=True)
fig = fit.plot_pdf(color='royalblue', linewidth=2)
fit.power_law.plot_pdf(color='royalblue', linestyle='--', ax=fig,label='pdf power law')
ax1.loglog(logbinx, logbiny,'o',color='royalblue',ls='None')

fitmin = pwl.Fit(logbinx, xmin=30,discrete=True, xmax=logbinx[-2])
fig = fitmin.plot_pdf(color='tab:red', linewidth=2)
fitmin.power_law.plot_pdf(color='tab:red', linestyle='--', ax=fig,label='pdf power law with $x_{sat}$ and $x_{cut}$')
ax1.loglog(logbinx[5:-1], logbiny[5:-1],'o',color='tab:red',ls='None')

ax1.set_xlabel('degree $k$',size=16)
ax1.set_ylabel('$P(k)$',size=16)
ax1.set_ybound(0.1,0.005)
ax1.legend()

The fitting is not perfect but still it shows a deviation from the power law for small degrees.
The slope of the fit after leaving out the low degree saturation seems closer to the degree distribution.

### **Fitting again by leaving out the small degree saturation:**

Fitting a pure power law uses the formula $P_k = a k^{-\gamma}$

Fitting with low-degree saturation uses $P_k = a(k + k_{sat})^{-\gamma}$

Fitting with low-degree saturation and high-degree cutoff uses $P_k = a(k + k_{sat})^{-\gamma} e^{-\frac{k}{k_{cut}}}$
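
The three formulas can be written as plain functions to see how saturation and cutoff bend the curve. This is a sketch with hypothetical parameter values (`a`, `gamma`, `k_sat`, `k_cut` are chosen for illustration only); it evaluates the formulas and does not perform any fit.

```python
import math

def power_law(k, a, gamma):
    # P_k = a * k^(-gamma)
    return a * k ** (-gamma)

def power_law_sat(k, a, gamma, k_sat):
    # P_k = a * (k + k_sat)^(-gamma): flattens the curve for k << k_sat
    return a * (k + k_sat) ** (-gamma)

def power_law_sat_cut(k, a, gamma, k_sat, k_cut):
    # Adds an exponential cutoff that suppresses degrees above k_cut.
    return a * (k + k_sat) ** (-gamma) * math.exp(-k / k_cut)

a, gamma, k_sat, k_cut = 1.0, 2.5, 10.0, 100.0  # hypothetical values
for k in (1, 10, 100, 300):
    print(k,
          f"{power_law(k, a, gamma):.2e}",
          f"{power_law_sat(k, a, gamma, k_sat):.2e}",
          f"{power_law_sat_cut(k, a, gamma, k_sat, k_cut):.2e}")
```

At small $k$ the saturated form is almost flat, while at large $k$ the exponential factor makes the third form fall well below the other two.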


In [None]:
# k_sat = 83 and k_cut = 125 are fixed by hand; only a and gamma are fitted
def test_func_ksat(k, a, gamma):
    return a*((k+83)**(-gamma))
def test_func_pw(k, a, gamma):
    return a * (np.power(k,-gamma)) 
def test_func_kcut(k, a, gamma):
    return a*((k+83)**(-gamma))*np.exp(-k/125)

params, params_covariance = optimize.curve_fit(test_func_ksat, logbinx, logbiny, p0=[1.,1.])
params1, params_covariance1 = optimize.curve_fit(test_func_pw, logbinx, logbiny, p0=[1.,1.])
params2, params_covariance2 = optimize.curve_fit(test_func_kcut, logbinx, logbiny, p0=[1.,1.])
a_ksat = params[0]
a_pw = params1[0]
gamma_ksat = params[1]
gamma_pw = params1[1]
a_kcut = params2[0]
gamma_kcut = params2[1]

print("Power law: a = {:.6f}".format(a_pw), " gamma = {:.6f}".format(gamma_pw))
print("with low degree saturation: a = {:.6f}".format(a_ksat), " gamma = {:.6f}".format(gamma_ksat))
print("with low degree saturation and high degree cutoff: a = {:.6f}".format(a_kcut), " gamma = {:.6f}".format(gamma_kcut))

plt.figure(figsize=(8, 6))
plt.scatter(logbinx, logbiny, label='degree')
plt.loglog(logbinx, test_func_pw(logbinx, a_pw, gamma_pw),label='power law')
plt.loglog(logbinx[6:], test_func_ksat(logbinx[6:], a_ksat, gamma_ksat),label='power law with $k_{sat}$')
plt.loglog(logbinx[6:-2], test_func_kcut(logbinx[6:-2], a_kcut, gamma_kcut),label='power law with $k_{sat}$ and $k_{cut}$')
plt.title("loglog k distribution (log binning) fitted ",size=16)
plt.xlabel('degree $k$', fontsize=16)
plt.ylabel('$P(k)$', fontsize=16)
plt.legend(loc='best')
plt.show()

Fitting with the classic power law formula is problematic: it appears greatly influenced by small degrees.
By correcting with cutoffs, we can say the high degree nodes follow the power law.

## **15. Assortativity**

Measure degree-assortativity of network:
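
Before the measurement on `H1`, here is a toy version of the same quantity: the average degree of a node's neighbours, $k_{nn}$, grouped by the node's own degree. The small `adjacency` dictionary is a hypothetical graph used only to show the calculation.

```python
from collections import defaultdict

# Hypothetical undirected graph: a hub linked to four nodes, plus one edge c-d.
adjacency = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"hub", "c"},
}
degree = {node: len(nbrs) for node, nbrs in adjacency.items()}

# Average neighbour degree of every node, grouped by the node's own degree.
knn_by_degree = defaultdict(list)
for node, nbrs in adjacency.items():
    knn = sum(degree[j] for j in nbrs) / degree[node]
    knn_by_degree[degree[node]].append(knn)

knn_of_k = {k: sum(v) / len(v) for k, v in sorted(knn_by_degree.items())}
print(knn_of_k)
```

In this toy graph $k_{nn}(k)$ decreases with $k$, the signature of a disassortative network in which low-degree nodes attach to hubs.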

In [None]:
x=[]
y=[]

avg_knn=defaultdict(list)

for n in H1.nodes():
    k=H1.degree(n)
    
    #nn=len(G.neighbors(n))
    total=0
    for j in H1.neighbors(n):
        total+=H1.degree(j)
    
    avg_knn[k].append(float(total)/k)
    
    x.append(k)
    y.append(float(total)/k)

#x.sort(reverse=True)
#y.sort()
z=[]
for k in sorted(avg_knn.keys()):
    knn=np.array(avg_knn[k])
    z.append(np.average(knn))
    
#z.sort(reverse=True)

In [None]:
plt.figure(figsize=(10,7))
plt.scatter(x,y)
plt.plot(sorted(avg_knn.keys()), z,'-r')

plt.xlabel('$k_i$', fontsize=18)
plt.ylabel('$k_{nn}$', fontsize=18)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.yscale('log')
plt.xscale('log')
plt.show()

In [None]:
print("Degree assortativity coefficient of H1 is ",nx.degree_assortativity_coefficient(H1))

A similar calculation with the Pearson correlation coefficient:

In [None]:
r2=nx.degree_pearson_correlation_coefficient(H1)
print(r2)

## **16. Resilience**

In [None]:
def net_attack(graph, ranked_nodes):
    
    fraction_removed=[]#here we store the tuples: (%removed nodes, size of gcc)
    
    graph1=graph.copy()
    nnodes=len(ranked_nodes)
    n=0    
    
    # largest connected component (the first component yielded is not guaranteed to be the largest)
    gcc=max(nx.connected_components(graph1), key=len, default=set())
    
    gcc_size=float(len(gcc))/nnodes
    
    fraction_removed.append( (float(n)/nnodes, gcc_size) )
    
    while gcc_size>0.01 and ranked_nodes:
        
        #we start from the end of the list!
        graph1.remove_node(ranked_nodes.pop())
        
        gcc=max(nx.connected_components(graph1), key=len, default=set())
        gcc_size=float(len(gcc))/nnodes
        n+=1
        fraction_removed.append( (float(n)/nnodes, gcc_size) )
    
    plt.figure(figsize=(10,6))
    plt.subplot(1,2,1)
    nx.draw_networkx_nodes(H, pos=mypos, node_size=80)
    #nx.draw_networkx_edges(H, pos=mypos, alpha=0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e1, width=0.2, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e2, width=0.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e3, width=1.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e4, width=3.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e5, width=5.5, alpha = 0.5)
    nx.draw_networkx_edges(H, mypos, edgelist=e6, width=8.5, alpha = 0.5)
    plt.title('Network before attack',size=16)
    plt.axis('off')
    
    plt.subplot(1,2,2)
    nx.draw_networkx_nodes(graph1, pos=mypos,node_size=80)
    nx.draw_networkx_edges(graph1, pos=mypos, alpha=0.5)
    plt.title('Network after attack',size=16)
    plt.axis('off')
    
    return fraction_removed

### Random attack

In [None]:
zone_nodes=list(H.nodes())
np.random.shuffle(zone_nodes)  # remove nodes in a random order
resilience_random=net_attack(H, zone_nodes)

### Betweenness based attack

In [None]:
zone_nodes_betw=[]

betw=nx.betweenness_centrality(H)
for i in sorted(betw.items(), key=itemgetter(1)):
    zone_nodes_betw.append(i[0])


resilience_betw=net_attack(H, zone_nodes_betw)

### Weighted betweenness based attack

In [None]:
zone_nodes_ebetw=[]

ebetw=nx.betweenness_centrality(H,weight='weight')
for i in sorted(ebetw.items(), key=itemgetter(1)):
    zone_nodes_ebetw.append(i[0])


resilience_ebetw=net_attack(H, zone_nodes_ebetw)

### Degree based attack

In [None]:
zone_nodes_degree=[]

deg=dict(H.degree())
for i in sorted(deg.items(), key=itemgetter(1)):
    zone_nodes_degree.append(i[0])


resilience_deg=net_attack(H, list(zone_nodes_degree))

### Closeness based attack

In [None]:
zone_nodes_cl=[]

cl=nx.closeness_centrality(H)
for i in sorted(cl.items(), key=itemgetter(1)):
    zone_nodes_cl.append(i[0])


resilience_cl=net_attack(H, zone_nodes_cl)

In [None]:
x=[k[0] for k in resilience_random]
y=[k[1] for k in resilience_random]

x1=[k[0] for k in resilience_deg]
y1=[k[1] for k in resilience_deg]

x2=[k[0] for k in resilience_betw]
y2=[k[1] for k in resilience_betw]

x3=[k[0] for k in resilience_ebetw]
y3=[k[1] for k in resilience_ebetw]

x4=[k[0] for k in resilience_cl]
y4=[k[1] for k in resilience_cl]

plt.figure(figsize=(10,7))
plt.plot(x,y, label='random attack')
plt.plot(x1,y1, label='degree based attack')
plt.plot(x2,y2, label='betweenness based attack')
plt.plot(x3,y3, label='weighted betweenness based attack')
plt.plot(x4,y4, label='closeness based attack')


plt.xlabel('fraction of removed nodes $f$', fontsize=18)
plt.ylabel('relative size of largest connected component', fontsize=18)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.title("Robustness of H network",size=16)
plt.legend(loc='upper right')

The network is resilient to the random removal of nodes but vulnerable to targeted attacks. 
The attacks based on closeness centrality seem to cause the most damage.
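
The contrast between random failure and targeted attack can be reproduced on a toy graph. The sketch below is pure Python and uses a hypothetical star graph; `largest_component_size` and `attack_curve` are illustrative helpers, not the `net_attack` function defined above.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def attack_curve(adj, order):
    """(fraction removed, relative size of largest component) after each removal."""
    n = len(adj)
    removed = set()
    curve = [(0.0, largest_component_size(adj, removed) / n)]
    for i, node in enumerate(order, 1):
        removed.add(node)
        curve.append((i / n, largest_component_size(adj, removed) / n))
    return curve

# Hypothetical star graph: one hub connected to five leaves.
leaves = ["l1", "l2", "l3", "l4", "l5"]
adj = {"hub": set(leaves)}
adj.update({leaf: {"hub"} for leaf in leaves})

hub_first = attack_curve(adj, ["hub"])
leaf_first = attack_curve(adj, ["l1"])
print("remove hub :", hub_first)
print("remove leaf:", leaf_first)
```

Removing the single hub shatters the star into isolated nodes, while removing a leaf barely changes the largest component.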

## **17. Epidemic**

SIR model on the H network, seeded at a single node picked from the degree ranking
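
As a compact reference for the model, here is a sketch of a discrete-time SIR process on a toy ring graph. Everything in it is an assumption for illustration: the ring, the parameter values, the fixed random seed and the function `simulate_sir`, which mirrors the transmission and recovery steps in spirit but is not the notebook's own code.

```python
import random

def simulate_sir(adj, seed_node, lambd, mu, rng):
    """Run SIR until no node is infected; return final statuses and prevalence per step."""
    status = {n: "S" for n in adj}
    status[seed_node] = "I"
    infected = [seed_node]
    prevalence = []
    while infected:
        # Transmission: each infected node infects each susceptible neighbour
        # with probability lambd.
        newly_infected = set()
        for i in infected:
            for j in adj[i]:
                if status[j] == "S" and rng.random() < lambd:
                    newly_infected.add(j)
        # Recovery: each currently infected node recovers with probability mu.
        for i in infected:
            if rng.random() < mu:
                status[i] = "R"
        for j in newly_infected:
            status[j] = "I"
        infected = [n for n in adj if status[n] == "I"]
        prevalence.append(len(infected))
    return status, prevalence

n = 10
ring = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # hypothetical ring graph
status, prevalence = simulate_sir(ring, seed_node=0, lambd=0.3, mu=0.3,
                                  rng=random.Random(42))
recovered = sum(1 for s in status.values() if s == "R")
print("final attack rate:", recovered / n)
```

Passing an explicit `random.Random` instance makes a run reproducible, which helps when comparing different seeds or parameter values.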

In [None]:
mu=0.3    #recovery probability per time step (inverse of the infectious period)
lambd=0.02   #probability of infection given a contact

In [None]:
H.disease_status={} #S=0, I=1, R=-1 
infected_nodes=[]#list of infected nodes
node_list=[] #let's choose a seed

deg=dict(H.degree())
for i in sorted(deg.items(), key=itemgetter(1)):
    node_list.append(i[0])
#seed=node_list[-1]
seed=node_list[100]
print("The seed is", seed)
print("The degree of the seed is", H.degree(seed))

infected_nodes.append(seed) #initialize the network

for n in H.nodes():
    if n in infected_nodes:
        H.disease_status[n]=1   #infected
    else:
        H.disease_status[n]=0   #susceptible

In [None]:
I_net=[]
while len(infected_nodes)>0:
    
    #transmission
    for i in infected_nodes:
        for j in H.neighbors(i):
            if H.disease_status[j]==0: #if susceptible
                p=np.random.random()
                if p<lambd: #if p < prob of infection
                    H.disease_status[j]=1 #then it gets infected
    #recovery
    for k in infected_nodes:
        p=np.random.random()
        if p<mu:         #recover with probability mu
            H.disease_status[k]=-1 #then it recovers
    #update of disease status
    infected_nodes=[]
    for n in H.nodes():
        if H.disease_status[n]==1: #if infected
            infected_nodes.append(n)
 #store output
    I_net.append(len(infected_nodes))
    
recovered=0
for n in H.nodes():
    if H.disease_status[n]==-1:
        recovered+=1

print("The total number of recovered nodes is", recovered)
print("The total number of infected nodes is", len(infected_nodes))
print("The final attack rate is", recovered/len(H.nodes()))

plt.figure(figsize=(10,7))
plt.xlabel('time(days)', fontsize=18)
plt.ylabel('prevalence', fontsize=18)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.plot(range(0,len(I_net)),I_net)

Visualise SIR epidemic spreading on the map

In [None]:
H.disease_status={} #S=0, I=1, R=-1
infected_nodes=[]
infected_nodes.append(seed)

for n in H.nodes():
    if n in infected_nodes:
        H.disease_status[n]=1  #infected
    else:
        H.disease_status[n]=0  #susceptible
t=0
node_color=[H.disease_status[v] for v in H]
s=nx.draw_networkx_nodes(H ,pos=mypos ,node_color='r' ,node_size=list(dict(H.degree).values()))
plt.axis('off')

In [None]:
while len(infected_nodes)>0 and t<15:
    
    for i in infected_nodes:
        for j in H.neighbors(i):
            if H.disease_status[j]==0:
                p=np.random.random()
                if p<lambd:
                    H.disease_status[j]=1
                
    for k in infected_nodes:
        p=np.random.random()
        if p<mu:
            H.disease_status[k]=-1
    
    infected_nodes=[]
    for n in H.nodes():
        if H.disease_status[n]==1:
            infected_nodes.append(n)

    t+=1
    node_color=[H.disease_status[v] for v in H]#color code on disease status
    
    plt.figure(figsize=(10,8))
    nx.draw_networkx_nodes(H, pos=mypos, node_size=list(dict(H.degree).values()),node_color=node_color,cmap=plt.cm.RdBu_r, vmin=-1, vmax=1)
    

#### The epidemic threshold for this network can be approximated as $\lambda_c = \frac{\mu}{\langle k \rangle}$

In [None]:
avg_deg1=2*len(H.edges)/H.number_of_nodes()
lc=mu/avg_deg1
print("lc = ",lc)

#### As expected for this network, we have $\langle k^2 \rangle \sim \langle k \rangle^2 + \langle k \rangle$

In [None]:
N=H.number_of_nodes()
sum_k2=0
for i in H.nodes():
    k=H.degree(i)
    sum_k2+=k*k
avg_k2=sum_k2/N
print("<k^2> = ",avg_k2) 
print("<k>^2 + <k> = ",avg_deg1**2 + avg_deg1)
avg_deg=2*len(H.edges)/N
print("average degree = ",avg_deg1)
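
The link between the threshold $\lambda_c = \frac{\mu}{\langle k \rangle}$ and the moment relation checked above can be sketched numerically. The example assumes the heterogeneous mean-field SIR threshold $\lambda_c = \frac{\mu \langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}$, which reduces to $\frac{\mu}{\langle k \rangle}$ exactly when $\langle k^2 \rangle = \langle k \rangle^2 + \langle k \rangle$. The function `sir_threshold` and the degree sequence are hypothetical.

```python
def sir_threshold(k1, k2, mu):
    """Heterogeneous mean-field SIR threshold: mu * <k> / (<k^2> - <k>)."""
    return mu * k1 / (k2 - k1)

mu = 0.3

# Poisson-like moments: <k^2> = <k>^2 + <k>
k1 = 4.0
poisson_like = sir_threshold(k1, k1 ** 2 + k1, mu)
print("Poisson-like:", poisson_like, "vs mu/<k> =", mu / k1)

# Hub-dominated hypothetical degree sequence
degrees = [1, 1, 1, 2, 2, 1, 1, 31]
h1 = sum(degrees) / len(degrees)
h2 = sum(k * k for k in degrees) / len(degrees)
hub_heavy = sir_threshold(h1, h2, mu)
print("hub-heavy   :", hub_heavy, "vs mu/<k> =", mu / h1)
```

If the measured $\langle k^2 \rangle$ is much larger than $\langle k \rangle^2 + \langle k \rangle$, the simple estimate $\frac{\mu}{\langle k \rangle}$ overstates the threshold, so comparing the two printed moments is worth doing before relying on it.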