network: new package implement PageRank and HITS by kortschak · Pull Request #50 · gonum/graph

kortschak · 2015-05-06T01:43:01Z

UserAB1236872 · 2015-05-06T03:40:11Z

This seems pretty domain specific. I'm not sure about having it as a subpackage under graph. It seems like another package built on top of graph. But I don't really feel strongly about it.

I'm not really familiar with the algorithms so I can't comment on correctness or anything, but the API looks decent to me.

kortschak · 2015-05-06T04:14:02Z

Both are fairly broadly applicable: semantic web, social graph, interactome analysis, and so on. I see this as the basis for network analysis packages that would also include things like centrality measures.

These are the things I needed now. If people strongly object, I'll put them in my own tree.

kortschak · 2015-05-06T04:18:41Z

Maybe consider renaming to network and s/Page/PageRank/g to reflect the intention.

btracey · 2015-05-06T14:22:24Z

Instead of having "a" PageRank implementation, we should support a couple. For example, this could be called ExactPageRank. In the future, we could have an inexact PageRank where random walks are performed to estimate the PageRank value.

Are there (vague) future plans to have personalized PageRank?

kortschak · 2015-05-06T18:29:47Z

I didn't have any plans for that, but it seems more like something that should go in a stochastic simulation package. Intuitively, the architecture of such a package feels to me like optimize without the termination mechanisms.

kortschak · 2015-05-06T21:55:58Z

BTW What do you see as a use for a stochastic simulated PageRank?

btracey · 2015-05-07T01:29:53Z

I just reviewed the Wikipedia article on PageRank, and I guess I got the solution methods for PageRank and personalized PageRank confused. As far as I'm aware, personalized PageRank is normally computed or estimated by random walks, as the linear solve or iterative solve is too expensive. It seems like the full-graph PageRank is usually computed with iterative methods and not with random walks.
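For illustration, a minimal sketch of the random-walk estimate described above. It is not code from this PR: the function name `estimatePersonalized`, the adjacency-slice representation, and the choice to end a walk at dangling nodes are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// estimatePersonalized is a hypothetical helper (not part of gonum/graph).
// It estimates personalized PageRank for source by running walks that all
// start at source. adj[u] lists the out-neighbours of u. At each step the
// walk follows a random out-edge with probability damp and otherwise ends;
// it also ends at nodes with no out-edges. The normalised visit counts
// approximate the personalized PageRank vector.
func estimatePersonalized(adj [][]int, source int, damp float64, walks int, seed int64) []float64 {
	rng := rand.New(rand.NewSource(seed))
	visits := make([]float64, len(adj))
	var total float64
	for w := 0; w < walks; w++ {
		u := source
		for {
			visits[u]++
			total++
			if len(adj[u]) == 0 || rng.Float64() >= damp {
				break // End this walk; the next one restarts at source.
			}
			u = adj[u][rng.Intn(len(adj[u]))]
		}
	}
	for i := range visits {
		visits[i] /= total
	}
	return visits
}

func main() {
	// A directed 3-cycle: 0 -> 1 -> 2 -> 0.
	adj := [][]int{{1}, {2}, {0}}
	fmt.Println(estimatePersonalized(adj, 0, 0.85, 100000, 1))
}
```

The cost depends on the number and length of the walks rather than on the size of the graph, which is why this style of estimate suits the personalized case.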

kortschak · 2015-05-07T01:36:02Z

Yep. OK. At the moment, we are not in a position to do large graphs anyway so the point is moot. The sparse optimisations are described in the last section of the linked article and are reasonably easy, but I don't need them immediately, so I will add them later.

btracey · 2015-05-08T22:50:37Z

I'm still coming up to speed on graph, but what are the issues with scaling to large graphs? I know some people who were considering Go, and the recent GC improvements to scanning map[int]int really help that use case.

I'm not criticizing, just curious. If it's a matter of "we haven't implemented it yet", it's fine, but are there fundamental issues with the package design and large graphs?

kortschak · 2015-05-09T00:47:52Z

I'm not sure I understand the question. At the moment, I have HITS implemented completely; HITS is intended to be used on small subgraphs, so scaling is not an issue. PageRank is implemented for the small graph case, but the large graph case is explained in the linked article: it is based on the observation that web graphs have on average 10 in-edges per node, so you do a sparse matrix-vector multiply. If we had a sparse matrix type, I would have already implemented that. We don't yet, so I'll do it by hand for this case, to be removed when we do have one. I have been looking into this and I don't want to do sparse BLAS, it's too big; I have been looking at csparse as a possible port target though.
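As a rough sketch of the sparse approach mentioned above, the iteration below touches each edge once per pass instead of forming a dense matrix. It is an illustration under stated assumptions, not the PR's PageRankSparse: the name `sparsePageRank`, the adjacency-slice input, and the uniform redistribution of rank from dangling nodes are choices made for the example.

```go
package main

import (
	"fmt"
	"math"
)

// sparsePageRank is a hypothetical helper (not the gonum implementation).
// adj[u] lists the out-neighbours of u. Each iteration is one sparse
// matrix-vector multiply performed edge by edge, so it costs O(nodes+edges).
// Iteration stops when the L1 change between passes drops below tol.
func sparsePageRank(adj [][]int, damp, tol float64) []float64 {
	n := len(adj)
	rank := make([]float64, n)
	next := make([]float64, n)
	for i := range rank {
		rank[i] = 1 / float64(n)
	}
	for {
		// Rank held by nodes without out-edges is spread uniformly.
		var dangling float64
		for u, out := range adj {
			if len(out) == 0 {
				dangling += rank[u]
			}
		}
		base := (1 - damp + damp*dangling) / float64(n)
		for v := range next {
			next[v] = base
		}
		// Push each node's damped rank along its out-edges.
		for u, out := range adj {
			if len(out) == 0 {
				continue
			}
			share := damp * rank[u] / float64(len(out))
			for _, v := range out {
				next[v] += share
			}
		}
		var diff float64
		for v := range next {
			diff += math.Abs(next[v] - rank[v])
		}
		rank, next = next, rank
		if diff < tol {
			return rank
		}
	}
}

func main() {
	// 0 <-> 1, and 2 -> 0; node 2 has no in-edges.
	adj := [][]int{{1}, {0}, {0}}
	fmt.Println(sparsePageRank(adj, 0.85, 1e-10))
}
```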

I have just realised that perhaps you are taking my comment to be about the graph repo, rather than the network package. I don't see any reason why graph/... won't handle large graphs since it essentially just defines a set of interfaces and algorithms that use those - I intend to port my graph engine over to the gonum/graph interface when it stabilises, and I use that engine to operate on graphs with >40M edges.

Does this answer your question?

btracey · 2015-05-09T16:17:01Z

That does answer the question, thanks.

btracey · 2015-05-09T16:21:47Z

network/hits.go

Can we call it "tol" instead of "eps"? It's more like English.
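To show where such a parameter sits, here is a minimal HITS sketch with a `tol` argument controlling termination. It is not the code under review in network/hits.go: the function names, the adjacency-slice input, the L1 change criterion, and the `maxIter` cap are assumptions for the example.

```go
package main

import (
	"fmt"
	"math"
)

// hits is a hypothetical helper (not the gonum implementation). adj[u] lists
// the out-neighbours of u. Hub and authority scores are updated alternately
// and scaled to unit Euclidean length. Iteration stops when the summed
// absolute change in both vectors is below tol, or after maxIter passes.
func hits(adj [][]int, tol float64, maxIter int) (hub, auth []float64) {
	n := len(adj)
	hub = make([]float64, n)
	auth = make([]float64, n)
	for i := range hub {
		hub[i] = 1
	}
	for iter := 0; iter < maxIter; iter++ {
		// Authority of v: sum of the hub scores of nodes linking to v.
		newAuth := make([]float64, n)
		for u, out := range adj {
			for _, v := range out {
				newAuth[v] += hub[u]
			}
		}
		normalize(newAuth)
		// Hub of u: sum of the authority scores of nodes u links to.
		newHub := make([]float64, n)
		for u, out := range adj {
			for _, v := range out {
				newHub[u] += newAuth[v]
			}
		}
		normalize(newHub)
		var delta float64
		for i := range hub {
			delta += math.Abs(newHub[i]-hub[i]) + math.Abs(newAuth[i]-auth[i])
		}
		hub, auth = newHub, newAuth
		if delta < tol {
			break
		}
	}
	return hub, auth
}

// normalize scales v to unit Euclidean length, leaving a zero vector as is.
func normalize(v []float64) {
	var ss float64
	for _, x := range v {
		ss += x * x
	}
	if ss == 0 {
		return
	}
	norm := math.Sqrt(ss)
	for i := range v {
		v[i] /= norm
	}
}

func main() {
	// Nodes 0 and 1 both link to node 2.
	hub, auth := hits([][]int{{2}, {2}, {}}, 1e-9, 100)
	fmt.Println(hub, auth)
}
```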

btracey · 2015-05-09T16:48:00Z

LGTM excepting the minor comments.

kortschak · 2015-05-09T22:13:30Z

PTAL

Waiting on answer to #50 (comment) as I'd like to add the reference if it's the correct one.

kortschak · 2015-05-10T08:23:48Z

PageRankSparse added. PTAL

btracey · 2015-05-10T22:55:41Z

network/page.go

It's a sparse vector element. It only knows a 1-D location.

Tomatayto/tomahto. I'll change it.

To explain further, there are two major categories of sparse element. Only triple elements know about more than a linear dimension, and those are reasonably rarely used in practice. When the sparse package is implemented, a type like this will serve as both a matrix and a vector element.
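A small sketch of the distinction drawn above, with hypothetical type names (`vecElem`, `tripleElem`) that are not taken from network/page.go. It shows how an element that knows only a 1-D index can also back a row-wise sparse matrix, since the row is implied by which slice the element lives in.

```go
package main

import "fmt"

// vecElem is a hypothetical sparse element that knows only a linear index.
type vecElem struct {
	index int
	value float64
}

// tripleElem is the other category: it records both coordinates itself.
type tripleElem struct {
	row, col int
	value    float64
}

// dot returns the product of a sparse vector with a dense vector.
func dot(sparse []vecElem, dense []float64) float64 {
	var sum float64
	for _, e := range sparse {
		sum += e.value * dense[e.index]
	}
	return sum
}

// mulVec treats rows as a sparse matrix in which row i is a sparse vector
// whose indices are column numbers, and returns the product with x.
func mulVec(rows [][]vecElem, x []float64) []float64 {
	y := make([]float64, len(rows))
	for i, row := range rows {
		y[i] = dot(row, x)
	}
	return y
}

func main() {
	rows := [][]vecElem{
		{{index: 1, value: 2}},
		{},
		{{index: 0, value: 3}, {index: 2, value: 4}},
	}
	fmt.Println(mulVec(rows, []float64{1, 2, 3}))

	// The same non-zero entry written as a triple carries its own row.
	fmt.Println(tripleElem{row: 0, col: 1, value: 2})
}
```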

kortschak force-pushed the rank branch from 1c63021 to 964ec9c Compare May 6, 2015 02:40

kortschak force-pushed the rank branch 2 times, most recently from 0d5a15d to 4b4bbed Compare May 7, 2015 00:04

kortschak changed the title ~~rank: new package implement PageRank and HITS~~ network: new package implement PageRank and HITS May 7, 2015

btracey reviewed May 9, 2015

kortschak force-pushed the rank branch 5 times, most recently from c3ae1ec to 309d96c Compare May 10, 2015 08:12

btracey reviewed May 10, 2015