In [86]:
using Pkg
pkg"activate .."

In [87]:
using DataDeps

In [88]:
register(DataDep("BlogCatalog",
        """
Authors: Nitin Agarwal+, Xufei Wang*, Huan Liu*
Website: http://socialcomputing.asu.edu/datasets/BlogCatalog 
        
### Data Set Information:

1. nodes.csv: the list of all users. It works as a dictionary of every user in the data set for fast reference and contains all node ids used in the dataset.
2. edges.csv: the friendship network among the bloggers. Each friendship is represented as an edge, for example
1,2
which means that the blogger with id "1" is a friend of the blogger with id "2".

### Attribute Information:
This data set was crawled from BlogCatalog ( http://www.blogcatalog.com ) in July 2009. BlogCatalog is a social blog directory website. The data set contains the crawled friendship network; for easier understanding, all contents are organized in CSV format.

#### Basic statistics
Number of bloggers : 88,784
Number of friendship pairs: 4,186,390
        
Please cite the paper:
Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang. "A Social Identity Approach to Identify Familiar Strangers in a Social Network", 3rd International AAAI Conference on Weblogs and Social Media (ICWSM09), pp. 2-9, May 17-20, 2009. San Jose, California.
        
Please cite the repository:
R. Zafarani and H. Liu, (2009). Social Computing Data Repository at ASU [http://socialcomputing.asu.edu]. Tempe, AZ: Arizona State University, School of Computing, Informatics and Decision Systems Engineering. 
""",
"http://socialcomputing.asu.edu/uploads/1252092625/BlogCatalog-dataset.zip",
post_fetch_method=unpack
))

In [89]:
using LightGraphs

In [90]:
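# Node ids in nodes.csv are assumed to run 1:length(nodes).
nodes = parse.(Int, collect(eachline(datadep"BlogCatalog/BlogCatalog-dataset/data/nodes.csv")))
# Note: PathGraph(n) is not empty; it already holds the n-1 edges (i, i+1).
# Use SimpleGraph(length(nodes)) to keep only the friendships in edges.csv.
graph = PathGraph(length(nodes))
for line in eachline(datadep"BlogCatalog/BlogCatalog-dataset/data/edges.csv")
    src, dest = parse.(Int, split(line, ","))   # each line is "src,dest"
    add_edge!(graph, src, dest)                 # returns false for duplicates
end
graph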

{88784, 2180555} undirected simple Int64 graph
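For reference, the same friendship network can also be held as a plain sparse adjacency matrix without LightGraphs. The sketch below uses a hypothetical helper, `adjacency_from_lines`, that is not part of the notebook. It assumes each line is a `src,dest` pair as described in the dataset notes, stores both directions because friendships are undirected, drops self-loops, and starts from an empty matrix so that only the listed pairs become edges.

```julia
using SparseArrays

# Hypothetical helper (not from the notebook): build a symmetric 0/1
# adjacency matrix from "src,dest" lines such as those in edges.csv.
function adjacency_from_lines(lines, n::Integer)
    src = Int[]
    dst = Int[]
    for line in lines
        isempty(strip(line)) && continue          # skip blank lines
        a, b = parse.(Int, split(line, ","))
        a == b && continue                        # simple graph: no self-loops
        push!(src, a, b)                          # store both directions
        push!(dst, b, a)
    end
    # `max` as the combine function collapses repeated pairs to a single 1
    return sparse(src, dst, ones(Float32, length(src)), n, n, max)
end

A = adjacency_from_lines(["1,2", "2,3", "3,2"], 4)
nnz(A)   # two undirected edges, each stored twice
```

On the real data one would pass `eachline(datadep"BlogCatalog/BlogCatalog-dataset/data/edges.csv")` and `length(nodes)`; the result is already `Float32`, like `W` in the next cells.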

In [104]:
using NBInclude
@nbinclude("utils.ipynb")

color_clusters (generic function with 2 methods)

In [93]:
@nbinclude("core.ipynb")

nodeGLoVE (generic function with 2 methods)

In [94]:
W = Float32.(adjacency_matrix(graph));

In [105]:
@time prob_norm!(W)

 30.135259 seconds (661.03 k allocations: 84.883 MiB, 0.13% gc time)


88784×88784 SparseMatrixCSC{Float32,Int64} with 4361110 stored entries:
  [2    ,     1]  =  0.00833333
  [3    ,     1]  =  0.00869565
  [4    ,     1]  =  0.000465116
  [5    ,     1]  =  0.00239808
  [6    ,     1]  =  0.00191205
  [7    ,     1]  =  0.00131579
  [8    ,     1]  =  0.000300842
  [9    ,     1]  =  0.000193874
  [1    ,     2]  =  0.125
  [3    ,     2]  =  0.00869565
  [4    ,     2]  =  0.000465116
  [7    ,     2]  =  0.00131579
  ⋮
  [88781, 88780]  =  0.333333
  [904  , 88781]  =  0.000590667
  [88780, 88781]  =  0.333333
  [88782, 88781]  =  0.333333
  [5014 , 88782]  =  0.004
  [88781, 88782]  =  0.333333
  [88783, 88782]  =  0.333333
  [228  , 88783]  =  0.0075188
  [88782, 88783]  =  0.333333
  [88784, 88783]  =  0.5
  [6595 , 88784]  =  0.00273973
  [88783, 88784]  =  0.333333
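`prob_norm!` comes from the included notebooks and is not shown. Judging from the printed entries, it appears to row-normalize `W`: node 1 has eight stored neighbours (rows 2 to 9 of column 1) and `W[1, 2] = 0.125 = 1/8`, so `W[i, j]` looks like `1/deg(i)`, the transition probability of a simple random walk. Under that assumption, a hypothetical stand-in called `row_normalize!` could look like this:

```julia
using SparseArrays

# Hypothetical stand-in for prob_norm! (assumed behaviour, not the notebook's
# code): divide each stored entry W[i, j] by the sum of row i, so every
# non-empty row becomes a probability distribution over i's neighbours.
function row_normalize!(W::SparseMatrixCSC)
    rows = rowvals(W)
    vals = nonzeros(W)
    rowsums = zeros(eltype(W), size(W, 1))
    for k in eachindex(vals)              # pass 1: accumulate row sums
        rowsums[rows[k]] += vals[k]
    end
    for k in eachindex(vals)              # pass 2: scale each stored entry
        s = rowsums[rows[k]]
        s != 0 && (vals[k] /= s)
    end
    return W
end
```

It only touches the stored entries, walking the CSC arrays `rowvals`/`nonzeros` twice, so the sparsity pattern of `W` is unchanged.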

In [None]:
@time X = cooccurance_matrix(W, 5, 0.25, 0.25)
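`cooccurance_matrix` is also defined in the included notebooks, and the meaning of its arguments `5, 0.25, 0.25` is not shown here. Purely as an illustration of one common way to turn a transition matrix into a node co-occurrence matrix, the hypothetical `power_cooccurrence` below sums the k-step transition matrices P^k for k = 1..K, weighting step k by 1/k in the spirit of GloVe's 1/d distance weighting. It is not claimed to be what the notebook does.

```julia
using SparseArrays

# Hypothetical illustration, NOT the notebook's cooccurance_matrix:
# X = sum_{k=1}^{K} P^k / k for a row-stochastic sparse matrix P.
function power_cooccurrence(P::SparseMatrixCSC, K::Integer)
    X = spzeros(eltype(P), size(P)...)
    Pk = copy(P)                      # holds P^k inside the loop
    for k in 1:K
        X += Pk / k                   # closer steps get larger weight
        k < K && (Pk = Pk * P)        # advance to P^(k+1)
    end
    return X
end
```

Because P^k fills in quickly (the graph above averages roughly 49 neighbours per node), this exact form is likely impractical at K = 5 on all 88,784 nodes; sampling random walks and counting co-occurrences within a window is a common alternative.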

In [None]:
C, V = nodeGLoVE(X, 128; time_limit = 24*60*60)
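`nodeGLoVE` is not shown either. Its name suggests it fits GloVe (Pennington et al., 2014) to the node co-occurrence matrix, with 128 presumably the embedding dimension and `C`, `V` the two resulting sets of vectors; both are assumptions. GloVe minimises J = Σ over X_ij > 0 of f(X_ij) (w_iᵀ w̃_j + b_i + b̃_j − log X_ij)², where f(x) = (x / x_max)^α for x < x_max and 1 otherwise (x_max = 100 and α = 3/4 in the paper). A sketch of that objective over the stored entries of a sparse `X`, with hypothetical names:

```julia
using LinearAlgebra, SparseArrays

# GloVe weighting function: f(x) = (x / xmax)^α below the cutoff, 1 above it.
glove_weight(x; xmax = 100.0, α = 0.75) = x < xmax ? (x / xmax)^α : 1.0

# Hypothetical sketch of the GloVe objective, summed over the stored
# (nonzero) entries of X. W and Wc are d × n matrices whose columns are the
# node and context vectors; b and bc are the matching length-n biases.
function glove_loss(X::SparseMatrixCSC, W, Wc, b, bc; kwargs...)
    rows = rowvals(X)
    vals = nonzeros(X)
    J = 0.0
    for j in 1:size(X, 2), k in nzrange(X, j)
        i, x = rows[k], vals[k]
        err = dot(view(W, :, i), view(Wc, :, j)) + b[i] + bc[j] - log(x)
        J += glove_weight(x; kwargs...) * err^2
    end
    return J
end
```

The default x_max = 100 was chosen for raw word counts; if `X` holds probabilities rather than counts, every entry falls far below it and the cutoff would probably need rescaling.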