Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How does MCL treat weight of edges? #1

Closed
hw449 opened this issue Dec 20, 2017 · 7 comments
Closed

How does MCL treat weight of edges? #1

hw449 opened this issue Dec 20, 2017 · 7 comments

Comments

@hw449
Copy link

hw449 commented Dec 20, 2017

Thank you for creating such a wonder tool for our community!

I have a file containing in each line edge information (for example: nodeX, nodeY, weight). The higher the weight, the shorter the distance between nodeX and nodeY.
I converted this file into a matrix using networkx (add_weighted_edges_from and to_scipy_sparse_matrix function). I am wondering how does MCL treat the weights of edges?

Thank you very much.

Regards,
Hai

@GuyAllard
Copy link
Owner

GuyAllard commented Jan 8, 2018

Hi Hai,

The MCL algorithm treats the edge weights as similarities.
The absolute magnitude of the edge weights is not important, but the relative magnitude is.
If the graph has an edge (a, b) with a weight 2, and another edge (c, d) with weight 20, then c and d are considered 10x more similar than (a,b).

Your graph, where a higher edge weight indicates a shorter distance should work.

Hope that helps

Guy

@Mataivic
Copy link

Hello, How about a network with negative and positive weights ? My network models ecological interactions between species, but these interactions can be conflicting (negative weights) or benefical (positive weight).

@Moonire
Copy link
Contributor

Moonire commented Oct 19, 2018

The idea behind MCL is to simulate a random walk ad infinitum and than looking where the random walker starting as some node 'i' will tend to go more often. This way groups of strongly connected nodes tend to be in the same cluster after you run MCL.
The weights on the edges can then be thought of (once normalized) as transition probabilities from node 'i' to node 'j', and probabilities can't be negative. So right of the bat you break some basic assumption the algorithm makes by allowing negative values.

If you input an adjacency matrix with negative values to the algorithm, the normalization step will make it so that the sum of the absolute values of any column will add up to one, and not the actual values, which again is breaking a very core assumption the algorithm makes.

You can give it a try but the results would be either nonsensical (which is my guess) or need a brand new interpretation under the new assumptions you've made (and that's if you rework the code to accommodate pruning negative values and stuff like that).

I'd go for the easiest work around and softmax the values of your matrix. You juste have to exponentiate the weights and the algorithm will normalize them for you so less work! That way the negative weights will correspond to very small transition probabilities and MCL would be perfectly usable.

@Mataivic
Copy link

@Moonire OK so I have to deal with that.

  • My edges values are [-1;1] ; what if I add +1 to each of them in order to have [0;2] ?
  • Or I could work on a subgraph containing only positive edges ; in my case, it actually doesn't cause a lot of trouble for future interpretations of the clusters.

@Moonire
Copy link
Contributor

Moonire commented Oct 19, 2018

@Mataivic Simply adding 1 to all weights would be a bad idea (it deletes a -1 weighted edge from the graph).
But if you can work in the positif subset only, then I guess it would solve the problem.

Although I'd like to say having negative weights in the 1st place is an indicator that MCL isn't really what you'er looking for. This got a little far from the topic of this issue so if you want to discuss it in further depth I invite you to email me directly.

@Mataivic
Copy link

@Moonire Yes, I had some doubts so I already started to do a review of various clustering algorithms to find the most suited to my case. I'll keep you in touch if you want.

@Moonire
Copy link
Contributor

Moonire commented Oct 19, 2018

Yeah absolutely ! I'd love to know what solution you find and discuss the problem in further detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants