A question about 'initial_partition' property. #317

Closed

xiangyh9988 opened this issue Aug 17, 2022 · 18 comments

Comments
@xiangyh9988

xiangyh9988 commented Aug 17, 2022

Hi, I'm a little confused about the effect of Infomap's initial_partition property.

I set initial_partition for a subset of nodes following the example in this link, and it indeed improves the clustering performance.

There is a note in that link: 'The initial partition is saved between runs. If you want to use an initial partition for one run only, use run(initial_partition=partition)'. In my understanding, if I set the initial partition with im.initial_partition = {1: 0, 2: 0}, the final module IDs of node 1 and node 2 will stay the same, right?

However, in my experiment I found that some nodes belonging to the same initial partition do not have the same module ID in the final result. For example, providing the initial partition 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1 to im.initial_partition, the final module ID of nodes 1, 2, 3 is 123, while that of nodes 4, 5 is 78 (different from 123). Of course, this does not happen very often.
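Roughly, my setup looks like the sketch below (the network here is just a toy illustration, not my real data):

from infomap import Infomap

im = Infomap(two_level=True, silent=True)

# Toy network; node ids and links are only illustrative.
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (1, 7)]:
    im.add_link(u, v)

# Prior labels for a subset of nodes.
im.initial_partition = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}

im.run()

# Check whether nodes that started in the same initial module stay together.
for node in im.nodes:
    print(node.node_id, node.module_id)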

Did I provide the initial partition in a wrong way, causing this inconsistency? Or is it that the initial_partition property cannot guarantee that the partition stays unchanged?

@danieledler
Contributor

Hi, the last interpretation is correct. Providing an initial_partition only sets the initial partition as a starting point for the optimization. If you want it as the final partition, you can skip the optimization part with no_infomap=True.
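For example, roughly (a sketch; here I pass the option as a keyword argument to run(), and the toy network is just for illustration):

from infomap import Infomap

im = Infomap(two_level=True, silent=True)
im.add_link(1, 2)
im.add_link(2, 3)

# Use the given partition as-is by skipping the optimization step.
im.initial_partition = {1: 0, 2: 0, 3: 1}
im.run(no_infomap=True)

print(im.get_modules())  # node id -> module id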

It seems, though, as if you want something in between: nodes assigned to a module will not be moved out of that module, while all other nodes are free to move. Is that right?

@xiangyh9988
Author

If you want it as the final partition, you can skip the optimization part with no_infomap=True.

Oh, I misunderstood initial_partition. I thought that just setting this property would guarantee that the modules of the assigned nodes are kept unchanged.

It seems, though, as if you want something in between: nodes assigned to a module will not be moved out of that module, while all other nodes are free to move. Is that right?

That's correct! In my view, it's like a semi-supervised paradigm. That is to say, the assigned nodes always stay in their initial modules (as prior ground-truth labels), while the other nodes are moved to their optimal modules during optimization, leading to good community detection (or clustering) results.

So, if I want this behavior, I need to modify the source code to keep the modules of the assigned nodes unchanged, right?

danieledler added a commit that referenced this issue Aug 17, 2022
- Use --freeze-initial-partition to not move nodes with assigned modules
- Modules are frozen if they contain frozen nodes
- Nodes (and modules) with no module assignments are free to move

See #317
@danieledler
Contributor

Yes. It seems like a useful feature, though, so I did a quick implementation.

We had a similar concept before, called "hard partition", which was implemented but lacked an API to use it. I added that as --hard-partition, but what that option does is rebuild the network by merging all nodes with the same module id before any optimization and unpack those hard modules at the end. This doesn't work when you just want to lock two nodes into different modules, as it can only merge nodes with the same module, so I added another option to support that: --freeze-initial-partition. It simply skips trying to move those nodes during optimization.

One problem with this is that we rebuild the network from the modules and try to move those as well, to be able to move bigger chunks of nodes. So, to keep nodes assigned to different initial modules in separate modules, every module with at least one frozen node is also frozen.

I haven't had time to really try it, so please check out the linked branch and see if either of these options helps you.
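For reference, from Python the new option can be passed as a flag string to the constructor, roughly like this (a sketch; I haven't verified this exact snippet against the branch):

from infomap import Infomap

# --freeze-initial-partition: nodes with an assigned module are never
# moved out of that module; unassigned nodes are free to move.
im = Infomap("--two-level --freeze-initial-partition")

im.add_link(1, 2)
im.add_link(2, 3)
im.add_link(3, 4)

im.initial_partition = {1: 0, 2: 0}  # only these two nodes are frozen
im.run()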

@xiangyh9988
Author

Sorry for the late reply.

I have tried setting freeze-initial-partition but still get the same results. At first, I followed #317 to modify the code I had cloned earlier, but it didn't work. Then I cloned the #317 branch and recompiled it, but it still made no difference. Neither way worked; that is, running with or without freeze-initial-partition leads to the same clustering results in my test. It seems to fail to keep the modules of the initialized nodes unchanged.

Also, I found that the log message that should be printed by Log() << "\n -> Freezed " << numFreezed << " nodes in assigned modules. "; in the InfomapBase::initTree function of InfomapBase.cpp never appears. freezeInitialPartition is undoubtedly 1, so numFreezed should be greater than 0. However, the Log() call inside if (numFreezed > 0) is never executed. Maybe the program doesn't step into initTree?

@danieledler
Contributor

Do you run the CLI version or the Python version? Can you show the stdout log you get from running?

@xiangyh9988
Author

Sorry, I'm not sure what the CLI version means. What I did was clone the repo and run make python to get infomap.py and _infomap.cpython-37m-x86_64-linux-gnu.so, which I use in my Python programs.

After checking the log, I found the reason why it made no difference: that's all my bad. I forgot to pass the argument that initializes the modules for the labeled nodes in my test. After adding that argument, I tested the performance again with and without freeze-initial-partition. With freeze-initial-partition, the clustering F-score increases by approximately 1.67%. However, I found that some module IDs of nodes assigned to initial modules still change.

Here is the correct log; the relevant information about freeze-initial-partition is now printed by Log().

=======================================================
  Infomap v2.6.0 starts at 2022-08-22 07:08:25
  -> Input network: 
  -> No file output!
  -> Configuration: freeze-initial-partition
                    meta-data-rate = 0
                    two-level
                    flow-model = undirected
                    seed = 100
  -> 18004 initial module ids provided
=======================================================
  OpenMP 201307 detected with 48 threads...
  -> Ordinary network input, using the Map Equation for first order network flows
Calculating global network flow using flow model 'undirected'... 
  -> Using undirected links.
  => Sum node flow: 1, sum link flow: 1
Build internal network with 89367 nodes and 1018592 links...
  -> Max node flow: 3.05e-05
  -> Max node degree: 71
  -> Max node entropy: 6.149043813
  -> Entropy rate: 4.617265201
  -> One-level codelength: 16.3300108

================================================
Trial 1/1 starting at 2022-08-22 07:08:26
================================================

 -> 71542 nodes not found in cluster file are put into separate modules. 
 -> Freezed 18004 nodes in assigned modules. Two-level compression: 67% 0.36% 0.0129837312% 
Partitioned to codelength 0.66545562 + 4.62856275 = 5.294018375 in 4994 (4958 non-trivial) modules.

=> Trial 1/1 finished in 2.11384582s with codelength 5.29401837


================================================
Summary after 1 trial
================================================
Best end modular solution in 2 levels:
Per level number of modules:         [       4994,           0] (sum: 4994)
Per level number of leaf nodes:      [          0,       89367] (sum: 89367)
Per level average child degree:      [       4994,     17.8949] (average: 281.252)
Per level codelength for modules:    [0.665455620, 0.000000000] (sum: 0.665455620)
Per level codelength for leaf nodes: [0.000000000, 4.628562755] (sum: 4.628562755)
Per level codelength total:          [0.665455620, 4.628562755] (sum: 5.294018375)

===================================================
  Infomap ends at 2022-08-22 07:08:28
  (Elapsed time: 2.84915983s)
===================================================

For now, in my test, freeze-initial-partition helps improve the performance, although some modules still change when I compare labels. Of course, the freeze-initial-partition branch is helpful! I think I should check my code further.

@danieledler
Contributor

Thanks. The CLI version is just the standalone binary you get when running make, which you run on the command line like ./infomap input.net output/ --flags.

Try limiting the number of times the core loops are reapplied on the existing modular network to search for bigger structures with -L 1; then I think the final module ids will be the initial ones for the frozen nodes (if your module ids start from 0). When we allow a larger L (the default), module ids cannot be guaranteed to be the same, as there may be fewer nodes to move on that level than the module id.
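Roughly like this from Python (a sketch; the flags are passed as a string to the constructor):

from infomap import Infomap

# CLI equivalent: ./infomap input.net output/ --two-level --freeze-initial-partition -L 1
# -L 1 limits how many times the core loop is reapplied on the aggregated
# modular network, so module ids are not reshuffled by deeper passes.
im = Infomap("--two-level --freeze-initial-partition -L 1 --seed 100")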

@xiangyh9988
Author

I see. Thank you! I will try your suggestion.

@xiangyh9988
Author

Try limiting the number of times the core loops are reapplied on the existing modular network to search for bigger structures with -L 1

I have tested it, and adding -L 1 makes no difference. For example, in my test, node 1 and node 2 are initialized with the same ground-truth label 11784 and frozen, but their final module IDs are 894 and 3556 respectively, whether or not I use -L 1. If the freeze works, the module IDs of node 1 and node 2 should be the same, i.e. both 894 or both 3556, right?

@danieledler
Contributor

If you read the cluster data from a .tree file, the module ids are supposed to match. If you read them from a .clu file, they are reindexed to ensure a contiguous set.
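For example, cluster data read from a file is passed with --cluster-data (a rough sketch from Python; the same flag works on the command line):

from infomap import Infomap

# A .tree file keeps the original module ids; a .clu file is reindexed
# to a contiguous range when read back in.
im = Infomap("--two-level --cluster-data partition.tree")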

Do you use a .clu file? If so, are there gaps in the module ids?

@xiangyh9988
Author

xiangyh9988 commented Aug 22, 2022

I don't load cluster data from either a .tree or a .clu file. Instead, I prepare the data as follows:

  1. Load the ground-truth (GT) labels of all nodes and the indices of the labeled nodes.
  2. Set the GT labels of the labeled nodes (selected by the indices) as their initial partition via im.initial_partition = {...}, as sketched below.
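In code, it looks roughly like this (a sketch; the file names and loading details are simplified versions of what I actually do):

import pickle

import numpy as np
from infomap import Infomap

links = np.load("links.npy")        # shape (n_links, 2)
weights = np.load("weights.npy")    # shape (n_links,)
with open("initial_partition.pkl", "rb") as f:
    labeled = pickle.load(f)        # {node_id: GT label} for the labeled nodes only

im = Infomap("--two-level --freeze-initial-partition --seed 100")
for (u, v), w in zip(links, weights):
    im.add_link(int(u), int(v), float(w))

im.initial_partition = labeled
im.run()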

Maybe I didn't express the phenomenon clearly. I think it doesn't matter whether the module IDs are contiguous. The problem is that, after optimization, frozen nodes 1 and 2, which should have the same module ID, are assigned different IDs, splitting them into different modules.

@danieledler
Contributor

Aha, I missed the Python case, where you can set the initial partition programmatically. In the code, that should correspond to the .clu case, where the module ids are reassigned. But if two nodes belong to the same module in the initial partition, they should keep doing so irrespective of the recalculated module ids, so they should end up in the same module in the end.

Apparently something doesn't work as it should. Can you replicate the issue with a minimal input network that you can share?

@xiangyh9988
Author

Can you replicate the issue with a minimal input network that you can share?

You mean I share some data of the input network with you, and you help me test on it, right?

I save the links and weights in .npy files and the initial_partition dict in a .pkl file. How can I send the files to you? Could you provide an e-mail address? I'll send you a .tar file.

Thank you!

@danieledler
Contributor

OK, you can mail them to daniel.edler@umu.se. I will be back later today or tomorrow.

@danieledler
Contributor

Hi, I found the issue now and pushed a fix. Before, I also saw a few merges, but now I don't get any. Does the --freeze-initial-partition feature solve your problem?

@xiangyh9988
Author

Hi, freeze-initial-partition now works well in my project. The nodes with initialized modules are all frozen and not moved to any other modules.

I found that, when some nodes have no links, --freeze-initial-partition fails to freeze nodes. For example, when I set the links (0, 1), (0, 3), (3, 4), (3, 5) and set initial_partition to {0: 0, 1: 0, 2: 0}, node 2 has no links at all, so it is omitted from the nodes added to Infomap. If I add a link containing node 2, such as (0, 2), the freeze does work. I think maybe the lack of an explicit insertion of node 2 makes some indexing operation invalid? I'm not sure, just a guess.

So, in my project, I eventually added the nodes that do not appear in any link manually (e.g., info.add_links(2)), and then the freeze does work.

@danieledler
Contributor

Good. You can have non-contiguous node ids in Infomap without any indexing problems, but for nodes without any links, use im.add_node(node_id) to add them explicitly.
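For example, roughly (a sketch based on your network above):

from infomap import Infomap

im = Infomap("--two-level --freeze-initial-partition")

# Node 2 has no links, so add it explicitly.
im.add_node(2)
for u, v in [(0, 1), (0, 3), (3, 4), (3, 5)]:
    im.add_link(u, v)

im.initial_partition = {0: 0, 1: 0, 2: 0}
im.run()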

@xiangyh9988
Author

For now, im.add_node(node_id) indeed makes --freeze-initial-partition work well for me, and I have no further questions.

Thanks for your kind help and great work! I'm going to close the issue.
