Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: correct calculation of knn(k) in avg_nearest_neighbor_degree() #2419

Merged
merged 13 commits into from Nov 2, 2023

Conversation

szhorvat
Copy link
Member

@szhorvat szhorvat commented Oct 29, 2023

The weighted calculation of $k_{nn}(k)$ was incorrect. What the function did so far was to calculate the weighted mean neighbour degree for each node, then to average these weighted mean neighbour degrees within groups of vertices of the same degree. This does not agree with the definition, e.g. in Barrat et al:

image

Please see here for details:

https://igraph.discourse.group/t/replicating-4-directed-assortativities-in-the-literature-with-knn-mode/1691

CC @suthers

This PR is only draft for now, as I plan to make further changes. This one fix is ready for review.

Further changes will be:

  • It is unclear to me if it makes sense to calculate $k_{nn}(k)$ this way when we only have $k_{nn,i}$ for a subset of vertices (i.e. vids is not set to all vertices). Furthermore, @suthers makes a good argument for using a different "mode" concept when computing $k_{nn}(k)$ (and not $k_{nn,i}$). We need to control the degree calculation of the source and target vertex separately, but we don't really need to control how neighbours are found (other than perhaps specifying whether they should be found in a directed or undirected manner). Therefore I will add an additional function that computes $k_{nn}(k)$ specifically, and always considers all vertices/edges in the graph.
  • This function rejects non-simple graphs, but I think now I have some clarity about how it should be generalized to this case. I plan to do this, at least for self-loops.
  • This will be done in tandem with the joint-degree matrix PR. The JDM, with minor modifications. can be interpreted as the joint degree distribution.

If $P_{ij}$ is the joint degree distribution, then $k_{nn}(k) = \frac{\sum_{j} j P_{kj}}{\sum_{j} P_{kj}}$. I verified that the updated function agrees with this, using an independent weighted computation of $P_{ij}$.

@codecov
Copy link

codecov bot commented Oct 29, 2023

Codecov Report

Merging #2419 (98e7f14) into master (a490853) will increase coverage by 0.01%.
Report is 1 commits behind head on master.
The diff coverage is 91.04%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2419      +/-   ##
==========================================
+ Coverage   83.67%   83.68%   +0.01%     
==========================================
  Files         377      377              
  Lines       62080    62135      +55     
==========================================
+ Hits        51943    51999      +56     
+ Misses      10137    10136       -1     
Files Coverage Δ
src/properties/degrees.c 90.37% <91.04%> (+3.42%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3da71e...98e7f14. Read the comment docs.

@szhorvat szhorvat force-pushed the fix/knnk branch 2 times, most recently from 6ad9008 to 41b7c8d Compare October 29, 2023 13:21
@szhorvat
Copy link
Member Author

szhorvat commented Oct 29, 2023

@suthers Can you have a look through the documentation of the igraph_knnk() function I just added, and suggest improvements and references?

@szhorvat
Copy link
Member Author

szhorvat commented Oct 29, 2023

@ntamas @vtraag This is functionally ready, please review.

What's missing is expanded testing and expanded docs. We also need to settle on a final name for igraph_knnk(). It could be igraph_degree_correlation_function(), as in http://networksciencebook.com/chapter/7#measuring-degree What I dislike about this is that the result is a vector, not a function. This may be confusing to some. Mathematica and NetworkX use "mean/average degree connectivity".

@szhorvat
Copy link
Member Author

Remind me, how do we order parameters in new functions, in particular where does weights go? Please check both here and in #2407

@ntamas
Copy link
Member

ntamas commented Oct 30, 2023

weights is good at its currnt position I think. Typically we have the following order: the graph itself, any out-parameters, then the from and the to parameters (if any), then the weights, then the neighborhood modes and other stuff.

@ntamas
Copy link
Member

ntamas commented Oct 30, 2023

Why igraph_degree_correlation_function(), can't we call it igraph_degree_correlation_vector() if its output is a vector?

@ntamas
Copy link
Member

ntamas commented Oct 30, 2023

Looks good to me but I need some more time to validate the test case. Until then, feel free to go ahead with the documentation and other chores. Thanks!

@szhorvat
Copy link
Member Author

Just added more tests.

@szhorvat
Copy link
Member Author

szhorvat commented Oct 30, 2023

@ntamas Here are the graphs, to ease validation.

ug

image

Degrees are below.

Joint degree distribution:

image

dg

image

In- and out-degrees are below and above.

Joint out-in degree distribution:

image

@vtraag
Copy link
Member

vtraag commented Oct 30, 2023

The name avg_nearest_neighbor_degree sounds most reasonable. I recall us having an earlier discussion on this topic, but I have to find it again.

@szhorvat
Copy link
Member Author

szhorvat commented Oct 30, 2023

The name avg_nearest_neighbor_degree sounds most reasonable. I recall us having an earlier discussion on this topic, but I have to find it again.

@vtraag I am not asking to change the name of the existing function. I am asking for feedback on the name of the new one, currently called igraph_knnk(). This PR contains both a fix to igraph_avg_nearest_neighbor_degree() and a new function, igraph_knnk(), with a more flexible $k_{nn}(k)$ calculation. It makes sense to have both in the same PR, since they are tested against each other.

@vtraag
Copy link
Member

vtraag commented Oct 30, 2023

Sorry, I misunderstood then! I can take an actual closer look later this week probably...

@szhorvat
Copy link
Member Author

I propose igraph_degree_correlation_vector(), and hoping for a quick merge. This is based on "degree correlation function" used in http://networksciencebook.com/chapter/7#measuring-degree It is not used anywhere else, but still the most reasonable name in my opinion.

Other options:

  • igraph_average_degree_connectivity(), in the footsteps for NetworkX and Mathematica. This name is not very descriptive to me.
  • Original papers (ref [4,5] in the book) call it "the nearest-neighbors average connectivity of nodes with connectivity $k$", which is not a phrase I think most people can parse ...
  • Keep igraph_knnk() as it is now.

If I don't hear back in a couple of days, I'll go ahead with the rename to igraph_degree_correlation_vector() and finalize the docs.

@ntamas
Copy link
Member

ntamas commented Nov 1, 2023

igraph_degree_correlation_vector() is okay for me.

@szhorvat
Copy link
Member Author

szhorvat commented Nov 1, 2023

@ntamas I'm not sure if you saw my question in chat. Can you create a new wiki page and write up the preferred argument order very briefly? I always get confused about this and I need some documentation I can refer to.

@szhorvat
Copy link
Member Author

szhorvat commented Nov 1, 2023

@ntamas If the argument order is correct, I would like to merge this as-is, without squashing. I would like to backport the weighted knnk bugfix to the 0.9 branch, so that it can be included in the upcoming R/igraph 1.6.0, and so that we can deal with revdepcheck results soon.

Any additional minor changes an be made after merging.

EDIT: The force push below was to rebase on latest master and resolve conflicts.

@ntamas
Copy link
Member

ntamas commented Nov 2, 2023

I'm not sure if you saw my question in chat. Can you create a new wiki page and write up the preferred argument order very briefly?

Will do, I just need to find some spare time to do it :)

I would like to merge this as-is, without squashing.

Sure, go ahead.

@szhorvat szhorvat marked this pull request as ready for review November 2, 2023 13:22
@szhorvat szhorvat merged commit b912727 into master Nov 2, 2023
35 checks passed
@szhorvat szhorvat deleted the fix/knnk branch November 2, 2023 13:23
@vtraag
Copy link
Member

vtraag commented Nov 2, 2023

Since it's already merged, I won't take a further look anymore...

@szhorvat
Copy link
Member Author

szhorvat commented Nov 2, 2023

Since it's already merged, I won't take a further look anymore...

No need to take a look at the bugfix, I am confident in that.

A read through of the docs, a check of the argument order, and a check of the function name would be very welcome. These can still be improved. I wanted to merge to get the bugfix in ASAP and backport it to 0.9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants