Conversation
|
This seems pretty domain specific. I'm not sure about having it as a subpackage under graph. It seems like another package built on top of graph. But I don't really feel strongly about it. I'm not really familiar with the algorithms so I can't comment on correctness or anything, but the API looks decent to me. |
|
Both are reasonably broadly applicable to semantic web, social graph, interactome, etc. analysis. I see this as the basis for network analysis packages that would also include things like centrality. These are the things I needed now. If people strongly object, I'll put them in my own tree. |
|
Maybe consider rename to |
|
Instead of having "a" PageRank implementation, we should support a couple. For example, this could be called ExactPageRank. In the future, we could have an inexact PageRank where random walks are performed to estimate the PageRank value. Are there (vague) future plans to have personalized PageRank? |
|
I didn't have any plans for that, but it seems more like something that should go in a stochastic simulation package. Intuitively, the architecture of such a package feels to me like optimize without the termination mechanisms. |
|
BTW, what do you see as a use for a stochastic simulated PageRank?
|
0d5a15d to 4b4bbed
|
I just reviewed the Wikipedia article on PageRank, and I guess I got the solution methods for PageRank and personalized PageRank confused. As far as I'm aware, personalized PageRank is normally computed/estimated by random walks, as the linear solve/iterative solve is too expensive. It seems like the full-graph PageRank is usually computed with iterative methods and not with random walks. |
|
Yep. OK. At the moment, we are not in a position to do large graphs anyway so the point is moot. The sparse optimisations are described in the last section of the linked article and are reasonably easy, but I don't need them immediately, so I will add them later. |
|
I'm still coming up to speed on graph, but what are the issues with scaling to large graphs? I know some people who were considering Go, and the recent GC improvements to scanning map[int]int really help the use case. I'm not criticizing, just curious. If it's a matter of "we haven't implemented it yet", it's fine, but are there fundamental issues with the package design and large graphs? |
|
I'm not sure I understand the question. At the moment, I have HITS implemented completely; HITS is intended to be used for small subgraphs, so scaling is not an issue. PageRank is implemented for the small graph case, but the large graph case is explained in the linked article - it is based on the observation that web graphs have on average 10 in-edges per node, so you do a sparse matrix-vector multiply. If we had a sparse matrix type, I would have already implemented that, but we don't yet, so I'll do that by hand for this case (to be removed when we do have one - I have been looking into this and I don't want to do sparse BLAS, it's too big - I have been looking at csparse as a possible port target though). I have just realised that perhaps you are taking my comment to be about the graph repo, rather than the network package. I don't see any reason why graph/... won't handle large graphs, since it essentially just defines a set of interfaces and algorithms that use those - I intend to port my graph engine over to the gonum/graph interface when it stabilises, and I use that engine to operate on graphs with >40M edges. Does this answer your question? |
|
That does answer the question, thanks. |
network/hits.go
Outdated
Can we call it "tol" instead of "eps"? It's more like English.
|
LGTM excepting the minor comments. |
|
PTAL. Waiting on an answer to #50 (comment), as I'd like to add the reference if it's the correct one. |
c3ae1ec to 309d96c
|
PageRankSparse added. PTAL |
network/page.go
Outdated
It's a sparse vector element. It only knows a 1-D location.
To explain further, there are two major categories of sparse element. Only triple elements know anything about more than a linear dimension, and those are reasonably rarely used in practice. When the sparse package is implemented, a type like this will serve as both a matrix element and a vector element.
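To make the distinction concrete, here is a minimal sketch of the two element kinds. The type names `vecElem` and `triple` and the helpers `fromTriples` and `mulVec` are hypothetical, chosen for this example; they are not the types in network/page.go or in any sparse package. In a row-compressed layout the row is implied by which slice an element sits in, so the element itself only needs one index, and the same type works for a sparse vector and for a matrix row.

```go
package main

import "fmt"

// vecElem is a hypothetical sparse element that knows only a 1-D location.
type vecElem struct {
	index int
	value float64
}

// triple is a hypothetical coordinate-format element that knows row and column.
type triple struct {
	row, col int
	value    float64
}

// fromTriples groups triples by row, keeping only the column index in each
// stored element. Triples with a row outside [0, nRows) are ignored.
func fromTriples(nRows int, ts []triple) [][]vecElem {
	rows := make([][]vecElem, nRows)
	for _, t := range ts {
		if t.row < 0 || t.row >= nRows {
			continue
		}
		rows[t.row] = append(rows[t.row], vecElem{index: t.col, value: t.value})
	}
	return rows
}

// mulVec computes y = A*x where each row of A is a slice of vecElem.
// Every stored index is assumed to be a valid position in x.
func mulVec(rows [][]vecElem, x []float64) []float64 {
	y := make([]float64, len(rows))
	for i, row := range rows {
		for _, e := range row {
			y[i] += e.value * x[e.index]
		}
	}
	return y
}

func main() {
	rows := fromTriples(2, []triple{{0, 1, 2}, {1, 0, 3}, {1, 1, 4}})
	fmt.Println(mulVec(rows, []float64{1, 10}))
}
```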
@soniakeys @Jragonmiris