A question about 'initial_partition' property. #317

Closed

xiangyh9988 opened this issue Aug 17, 2022 · 18 comments

Comments
@xiangyh9988

xiangyh9988 commented Aug 17, 2022

Hi, I'm a little confused about the effect of Infomap's initial_partition property.

I set initial_partition for a subset of nodes following the example in this link, and it indeed improves the clustering performance.

There is a note in that link: 'The initial partition is saved between runs. If you want to use an initial partition for one run only, use run(initial_partition=partition)'. In my understanding, if I set the initial partition with im.initial_partition = {1: 0, 2: 0}, the final module IDs of node 1 and node 2 will stay the same, right?

However, in my experiment I found that some nodes belonging to the same initial partition do not have the same module ID in the final result. For example, providing the initial partition 1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1 to im.initial_partition, the final module ID of nodes 1, 2, 3 is 123, while that of nodes 4, 5 is 78 (different from 123). Of course, this does not happen very often.
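Roughly, my setup looks like the sketch below (the network here is just a toy illustration, not my real data):

from infomap import Infomap

im = Infomap(two_level=True, silent=True)

# Toy network; node ids and links are only illustrative.
for u, v in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (1, 7)]:
    im.add_link(u, v)

# Prior labels for a subset of nodes.
im.initial_partition = {1: 0, 2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}

im.run()

# Check whether nodes that started in the same initial module stay together.
for node in im.nodes:
    print(node.node_id, node.module_id)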

Did I provide the initial partition in a wrong way, causing this inconsistency? Or is it that the initial_partition property cannot guarantee that the partition stays unchanged?

@danieledler
Contributor

Hi, the last interpretation is correct. Providing an initial_partition only sets the initial partition as a starting point for the optimization. If you want it as the final partition, you can skip the optimization part with no_infomap=True.
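For example, roughly (a sketch; here I pass the option as a keyword argument to run(), and the toy network is just for illustration):

from infomap import Infomap

im = Infomap(two_level=True, silent=True)
im.add_link(1, 2)
im.add_link(2, 3)

# Use the given partition as-is by skipping the optimization step.
im.initial_partition = {1: 0, 2: 0, 3: 1}
im.run(no_infomap=True)

print(im.get_modules())  # node id -> module id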

It seems, though, as if you want something in between: nodes assigned to a module will not be moved out of that module, while all other nodes are free to move. Is that right?

@xiangyh9988
Author

If you want it as the final partition, you can skip the optimization part with no_infomap=True.

Oh, I misunderstood initial_partition. I thought that just setting this property would guarantee that the modules of the assigned nodes are kept unchanged.

It seems, though, as if you want something in between: nodes assigned to a module will not be moved out of that module, while all other nodes are free to move. Is that right?

That's correct! In my view, it's like a semi-supervised paradigm. That is to say, the assigned nodes always stay in their initial modules (as prior ground-truth labels), while the other nodes are moved to their optimal modules during optimization, leading to good community detection (or clustering) results.

So, if I want this behavior, I need to modify the source code to keep the modules of the assigned nodes unchanged, right?

danieledler added a commit that referenced this issue Aug 17, 2022
- Use --freeze-initial-partition to not move nodes with assigned modules
- Modules are frozen if they contain frozen nodes
- Nodes (and modules) with no module assignments are free to move

See #317
@danieledler
Contributor

Yes. It seems like a useful feature, though, so I did a quick implementation.

We had a similar concept before, called "hard partition", which was implemented but lacked an API to use it. I added that as --hard-partition, but what that option does is rebuild the network by merging all nodes with the same module id before any optimization and unpack those hard modules at the end. This doesn't work when you just want to lock two nodes into different modules, as it can only merge nodes with the same module, so I added another option to support that: --freeze-initial-partition. It simply skips trying to move those nodes during optimization.

One problem with this is that we rebuild the network from the modules and try to move those as well, to be able to move bigger chunks of nodes. So, to keep nodes assigned to different initial modules in separate modules, every module with at least one frozen node is also frozen.

I haven't had time to really try it, so please check out the linked branch and see if either of these options helps you.
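For reference, from Python the new option can be passed as a flag string to the constructor, roughly like this (a sketch; I haven't verified this exact snippet against the branch):

from infomap import Infomap

# --freeze-initial-partition: nodes with an assigned module are never
# moved out of that module; unassigned nodes are free to move.
im = Infomap("--two-level --freeze-initial-partition")

im.add_link(1, 2)
im.add_link(2, 3)
im.add_link(3, 4)

im.initial_partition = {1: 0, 2: 0}  # only these two nodes are frozen
im.run()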

@xiangyh9988
Author

Sorry for the late reply.

I have tried setting freeze-initial-partition but still get the same results. At first, I followed #317 to modify the code I had cloned earlier, but it didn't work. Then I cloned the #317 branch and recompiled it, but it still made no difference. Neither way worked; that is, running with or without freeze-initial-partition leads to the same clustering results in my test. It seems to fail to keep the modules of the initialized nodes unchanged.

Also, I found that the log message that should be printed by Log() << "\n -> Freezed " << numFreezed << " nodes in assigned modules. "; in the InfomapBase::initTree function of InfomapBase.cpp never appears. freezeInitialPartition is undoubtedly 1, so numFreezed should be greater than 0. However, the Log() call inside if (numFreezed > 0) is never executed. Maybe the program doesn't step into initTree?

@danieledler
Contributor

Do you run the CLI version or the Python version? Can you show the stdout log you get from running?

@xiangyh9988
Author

Sorry, I'm not sure what the CLI version means. What I did was clone the repo and run make python to get infomap.py and _infomap.cpython-37m-x86_64-linux-gnu.so, which I use in my Python programs.

After checking the log, I found the reason why it made no difference: that's all my bad. I forgot to pass the argument that initializes the modules for the labeled nodes in my test. After adding that argument, I tested the performance again with and without freeze-initial-partition. With freeze-initial-partition, the clustering F-score increases by approximately 1.67%. However, I found that some module IDs of nodes assigned to initial modules still change.

Here is the correct log; the relevant information about freeze-initial-partition is now printed by Log().

=======================================================
  Infomap v2.6.0 starts at 2022-08-22 07:08:25
  -> Input network: 
  -> No file output!
  -> Configuration: freeze-initial-partition
                    meta-data-rate = 0
                    two-level
                    flow-model = undirected
                    seed = 100
  -> 18004 initial module ids provided
=======================================================
  OpenMP 201307 detected with 48 threads...
  -> Ordinary network input, using the Map Equation for first order network flows
Calculating global network flow using flow model 'undirected'... 
  -> Using undirected links.
  => Sum node flow: 1, sum link flow: 1
Build internal network with 89367 nodes and 1018592 links...
  -> Max node flow: 3.05e-05
  -> Max node degree: 71
  -> Max node entropy: 6.149043813
  -> Entropy rate: 4.617265201
  -> One-level codelength: 16.3300108

================================================
Trial 1/1 starting at 2022-08-22 07:08:26
================================================

 -> 71542 nodes not found in cluster file are put into separate modules. 
 -> Freezed 18004 nodes in assigned modules. Two-level compression: 67% 0.36% 0.0129837312% 
Partitioned to codelength 0.66545562 + 4.62856275 = 5.294018375 in 4994 (4958 non-trivial) modules.

=> Trial 1/1 finished in 2.11384582s with codelength 5.29401837


================================================
Summary after 1 trial
================================================
Best end modular solution in 2 levels:
Per level number of modules:         [       4994,           0] (sum: 4994)
Per level number of leaf nodes:      [          0,       89367] (sum: 89367)
Per level average child degree:      [       4994,     17.8949] (average: 281.252)
Per level codelength for modules:    [0.665455620, 0.000000000] (sum: 0.665455620)
Per level codelength for leaf nodes: [0.000000000, 4.628562755] (sum: 4.628562755)
Per level codelength total:          [0.665455620, 4.628562755] (sum: 5.294018375)

===================================================
  Infomap ends at 2022-08-22 07:08:28
  (Elapsed time: 2.84915983s)
===================================================

For now, in my test, freeze-initial-partition helps improve the performance, although some modules still change when I compare labels. Of course, the freeze-initial-partition branch is helpful! I think I should check my code further.

@danieledler
Contributor

Thanks. The CLI version is just the standalone binary you get when running make, which you run on the command line like ./infomap input.net output/ --flags.

Try limiting the number of times the core loops are reapplied on the existing modular network to search for bigger structures with -L 1; then I think the final module ids will be the initial ones for the frozen nodes (if your module ids start from 0). When we allow a larger L (the default), module ids cannot be guaranteed to be the same, as there may be fewer nodes to move on that level than the module id.
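Roughly like this from Python (a sketch; the flags are passed as a string to the constructor):

from infomap import Infomap

# CLI equivalent: ./infomap input.net output/ --two-level --freeze-initial-partition -L 1
# -L 1 limits how many times the core loop is reapplied on the aggregated
# modular network, so module ids are not reshuffled by deeper passes.
im = Infomap("--two-level --freeze-initial-partition -L 1 --seed 100")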

@xiangyh9988
Author

I see. Thank you! I will try your suggestion.

@xiangyh9988
Author

Try limiting the number of times the core loops are reapplied on the existing modular network to search for bigger structures with -L 1

I have tested it, and adding -L 1 makes no difference. For example, in my test, node 1 and node 2 are initialized with the same ground-truth label 11784 and frozen, but their final module IDs are 894 and 3556 respectively, whether or not I use -L 1. If the freeze works, the module IDs of node 1 and node 2 should be the same, i.e. both 894 or both 3556, right?

@danieledler
Contributor

If you read the cluster data from a .tree file, the module ids are supposed to match. If you read them from a .clu file, they are reindexed to ensure a contiguous set.
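For example, cluster data read from a file is passed with --cluster-data (a rough sketch from Python; the same flag works on the command line):

from infomap import Infomap

# A .tree file keeps the original module ids; a .clu file is reindexed
# to a contiguous range when read back in.
im = Infomap("--two-level --cluster-data partition.tree")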

Do you use a .clu file? If so, are there gaps in the module ids?

@xiangyh9988
Author

xiangyh9988 commented Aug 22, 2022

I don't load cluster data from either a .tree or a .clu file. Instead, I prepare the data as follows:

  1. Load the ground-truth (GT) labels of all nodes and the indices of the labeled nodes.
  2. Set the GT labels of the labeled nodes (selected by the indices) as their initial partition via im.initial_partition = {...}, as sketched below.
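In code, it looks roughly like this (a sketch; the file names and loading details are simplified versions of what I actually do):

import pickle

import numpy as np
from infomap import Infomap

links = np.load("links.npy")        # shape (n_links, 2)
weights = np.load("weights.npy")    # shape (n_links,)
with open("initial_partition.pkl", "rb") as f:
    labeled = pickle.load(f)        # {node_id: GT label} for the labeled nodes only

im = Infomap("--two-level --freeze-initial-partition --seed 100")
for (u, v), w in zip(links, weights):
    im.add_link(int(u), int(v), float(w))

im.initial_partition = labeled
im.run()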

Maybe I didn't express the phenomenon clearly. I think it doesn't matter whether the module IDs are contiguous. The problem is that, after optimization, frozen nodes 1 and 2, which should have the same module ID, are assigned different IDs, splitting them into different modules.

@danieledler
Contributor

Aha, I missed the Python case, where you can set the initial partition programmatically. In the code, that should correspond to the .clu case, where the module ids are reassigned. But if two nodes belong to the same module in the initial partition, they should keep doing so irrespective of the recalculated module ids, so they should end up in the same module in the end.

Apparently something doesn't work as it should. Can you replicate the issue with a minimal input network that you can share?

@xiangyh9988
Author

Can you replicate the issue with a minimal input network that you can share?

You mean I share some data of the input network with you, and you help me test on it, right?

I save the links and weights in .npy files and the initial_partition dict in a .pkl file. How can I send the files to you? Could you provide an e-mail address? I'll send you a .tar file.

Thank you!

@danieledler
Contributor

OK, you can mail them to daniel.edler@umu.se. I will be back later today or tomorrow.

@danieledler
Contributor

Hi, I found the issue now and pushed a fix. Before, I also saw a few merges, but now I don't get any. Does the --freeze-initial-partition feature solve your problem?

@xiangyh9988
Author

Hi, freeze-initial-partition now works well in my project. The nodes with initialized modules are all frozen and not moved to any other modules.

I found that, when some nodes have no links, --freeze-initial-partition fails to freeze nodes. For example, when I set the links (0, 1), (0, 3), (3, 4), (3, 5) and set initial_partition to {0: 0, 1: 0, 2: 0}, node 2 has no links at all, so it is omitted from the nodes added to Infomap. If I add a link containing node 2, such as (0, 2), the freeze does work. I think maybe the lack of an explicit insertion of node 2 makes some indexing operation invalid? I'm not sure, just a guess.

So, in my project, I eventually added the nodes that do not appear in any link manually (e.g., info.add_links(2)), and then the freeze does work.

@danieledler
Contributor

Good. You can have non-contiguous node ids in Infomap without any indexing problems, but for nodes without any links, use im.add_node(node_id) to add them explicitly.
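For example, roughly (a sketch based on your network above):

from infomap import Infomap

im = Infomap("--two-level --freeze-initial-partition")

# Node 2 has no links, so add it explicitly.
im.add_node(2)
for u, v in [(0, 1), (0, 3), (3, 4), (3, 5)]:
    im.add_link(u, v)

im.initial_partition = {0: 0, 1: 0, 2: 0}
im.run()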

@xiangyh9988
Author

For now, im.add_node(node_id) indeed makes --freeze-initial-partition work well for me, and I have no further questions.

Thanks for your kind help and great work! I'm going to close the issue.
