eigenvector_centrality() questions #4650

NapsterInBlue · 2021-03-04T18:13:46Z

Hey there,

Am still very new to networkx and am, generally, getting my bearings on graph/network data, as a whole. All that to say that I'm hardly trying to come off as an A C K T U A L L Y pedant, nor outsource my critical thinking with this issue.

Anyhow, my Linear Algebra chops aren't as developed as I'd like them to be and the concept of Eigenvector Centrality wasn't even remotely as intuitive to me as, say, betweenness. I really like the book I'm using to learn the space, however the section on centrality measures kinda glosses over the mechanics of Eigenvector Centrality, instead opting to share conclusions. Editorializing the three paragraphs a bit:

...the eigenvector centrality uses a recursive definition: 'Tell me who your friends are, and I will tell you who you are.'" ... Mathematically, the eigenvector centrality of a node is the sum of the neighbors' eigenvector centralities divided by lambda-- the largest eigenvalue of the adjacency matrix of the network.

High eigenvector centrality identifies nodes that are surrounded by other nodes with eigenvector centrality. You can use this measure to locate groups of interconnected nodes with high prestige."

And so I took to the source code to better-understand what this was doing and my questions are as follows:

3blue1brown's Linear Algebra series has a great video on Eigenvectors and really helped me cement a geometric intuition for Eigenvalues/vectors. However, I'm coming up blank, trying to superimpose this clear, elegant geometric interpretation to the adjacency matrix. Can someone help me intuit the significance of Eigenvalues/vectors in network science?

I'm pretty green in the realm of numerical analysis, so bear with me on these next two

For this comment chunk

networkx/networkx/algorithms/centrality/eigenvector.py

Lines 115 to 118 in 0ad9394

    
           # Normalize the initial vector so that each entry is in [0, 1]. This is 
        
           # guaranteed to never have a divide-by-zero error by the previous line. 
        
           nstart_sum = sum(nstart.values()) 
        
           x = {k: v / nstart_sum for k, v in nstart.items()}

would it be more apropriate to say

    # Normalize the initial vector so that each entry is in (0, 1), exclusive.
    # This is guaranteed to never have a divide-by-zero error by the previous line.

? And if so, is there any value in making a baby-PR to change it? (Totally cool with "nope, lol", but willing to do it myself, if so)

More generally, is the gist of this method the same as what he shows in this 20-second chunk? And that "convergence" means that the vectors stayed "on the rails of the eigenbasis," compared to the last iteration?

networkx/networkx/algorithms/centrality/eigenvector.py

Lines 136 to 137 in 0ad9394

    
           if sum(abs(x[n] - xlast[n]) for n in x) < nnodes * tol: 
        
               return x

If so, then I'm still unclear what the corresponding v values in x mean-- probably a direct extension of (1), now that I'm thinking about it...

I had to read up on power iteration to follow what I was looking at, for the most part. Was going to ask if there was a compelling reason why the Wikipedia article had a numpy/vectorized implementation and this function didn't... right up until I realized that there was an implementation. So is there a good rule of thumb when I should be using one vs the other?
(Finally) Is there a better avenue for me to ask/rant questions like this? I know a lot of other libraries have Slack/Discord channels, or steer to a particular corner of Stack Overflow, but I didn't see anything either way in the developer guide. And like, totally cool if this is too small a library/community to warrant an off-site communication platform. Just wanted to make sure I wasn't shouting into the void/cluttering up a Bug Queue with... gestures vaguely

Frankly, I've been a Data Scientist for some time now and find myself extremely interested in this domain. I hadn't had any meaningful exposure to it until I attended an awesome workshop, a few PyCons ago. But even then, I went back to my day job, perpetually too busy to dig in more and it fell off my radar (boy, if I had a nickel...). That is, until my new job had me looking into Identity Resolution and brought me back to graphs, in a hurry.

IRL, I'm not sure I know anyone with more than a passing knowlege beyond LinkedIn, FB, and Six Degrees of Kevin Bacon. Which leaves me at the mercy of what I can grok from Google, YouTube, and impulse buying every book I can get my hands on. Which ain't all bad-- book led me to a great tool. Now I'm just looking for a community of people as excited about this stuff as I'm growing to be.

Thank you for coming to my TED Talk 🤷‍♂️

The text was updated successfully, but these errors were encountered:

Aufinal · 2021-03-05T15:14:02Z

Hey there !

Disclaimer : I am not a contributor to networkx, but I can try to answer at least your technical questions.

The entries need not be in (0, 1) ! For example, if nstart = [1, 0, 0, 0], then you can convince yourself that nstart_sum = 1 and therefore x = [1, 0, 0, 0]. The only thing that is ensured is that nstart_sum != 0, so that the division in this line does not throw an error.
The line you highlighted checks for convergence, i.e. "is my new vector close to my previous one" ? If the answer is true, then you've (approximately) found your eigenvector (since when the iterations get close, it means you're approaching the limit).
My guess is that networkx does not explicitly depend on numpy nor scipy, and thus there is a version of this function for when one of those packages is not installed. As a rule of thumb, if you have numpy on your system : use the numpy version !

NapsterInBlue · 2021-03-05T17:31:47Z

Totally makes sense, point for point. Thanks for taking the time to read/reply!

MridulS · 2021-03-05T18:19:45Z

BTW this discussion is more suited towards GitHub discussions (moving this issue to discussions).

My guess is that networkx does not explicitly depend on numpy nor scipy, and thus there is a version of this function for when one of those packages is not installed. As a rule of thumb, if you have numpy on your system : use the numpy version !

Yes numpy/scipy weren't hard dependencies of networkx but now they are and we will be updating the code(#4371) to use numpy version by default whenever someone uses eigenvector_centrality, currently if you want to access the numpy version you need to use eigenvector_centrality_numpy. The python implementation will stay in our codebase for reference (if someone just wants to read it or use it without numpy/scipy).

MridulS closed this as completed Mar 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

eigenvector_centrality() questions #4650

eigenvector_centrality() questions #4650

NapsterInBlue commented Mar 4, 2021

Aufinal commented Mar 5, 2021

NapsterInBlue commented Mar 5, 2021

MridulS commented Mar 5, 2021

This issue was moved to a discussion.

This issue was moved to a discussion.

eigenvector_centrality() questions #4650

eigenvector_centrality() questions #4650

Comments

NapsterInBlue commented Mar 4, 2021

Aufinal commented Mar 5, 2021

NapsterInBlue commented Mar 5, 2021

MridulS commented Mar 5, 2021

This issue was moved to a discussion.