Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved performance for group isomorphism/automorphisms #896

Merged
merged 35 commits into from
Nov 23, 2016

Conversation

hulpke
Copy link
Contributor

@hulpke hulpke commented Sep 5, 2016

Ongoing improvements for automorphism group/group isomorphism:

Use PatheticIsomorphism also for pc groups.

Permit specifying characteristic subgroups in pc group automorphism case. Try to do so without computing long orbits.

Improvements to permutation representation of automorphism group.


Added by @alex-konovalov: this PR is described in the commit message in 9e38dd1:

Further enhancements to group automorphisms

This PR contains a number of improvements and additions for testing of group isomorphism, respectively calculation of automorphism groups. It has been tested over several months and is merged to avoid too much divergence between master and development branches.

The main effect of this change should be a significantly improved performance for group isomorphism/automorphisms.

The main changes are:

  • The automorphism based isomorphism test is also used for solvable groups, unless they can be generated by few elements.
  • To this end, the automorphism group calculation allows the (internal) specification of characteristic (or to-be-stabilized) subgroups.
  • When calculating subspace stabilizers in GL, write down a generating set instead of performing an orbit/stabilizer calculation.
  • Improvements of heuristics in the code which represents factor groups as permutation groups and determines smaller degree permutation representations, to avoid falling into certain long-runtime traps. Code may utilize maximal subgroups.
  • A dedicated test file

@codecov-io
Copy link

codecov-io commented Sep 5, 2016

Current coverage is 48.88% (diff: 47.29%)

Merging #896 into master will increase coverage by 0.11%

@@             master       #896   diff @@
==========================================
  Files           424        424          
  Lines        222125     222745   +620   
  Methods        3430       3430          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits         108344     108899   +555   
- Misses       113781     113846    +65   
  Partials          0          0          

Powered by Codecov. Last update 2908745...1c9c9fa

@hulpke hulpke force-pushed the additions branch 4 times, most recently from 064dba3 to ce3ba7b Compare September 11, 2016 19:12
@hulpke hulpke force-pushed the additions branch 2 times, most recently from 7cbe7ce to 0bfda64 Compare September 22, 2016 23:24
@hulpke hulpke mentioned this pull request Sep 22, 2016
@olexandr-konovalov
Copy link
Member

Thank you @hulpke! Could you please write a few words what this PR does ? Even if it's not needed to add an entry for the changes overview, some text which will help others to review the PR will be very useful.

@fingolfin
Copy link
Member

This is work in progress, I think, and as such does not (yet) need to be reviewed. Ultimately, it is what @hulpke wrote in the initial descritption.

# space is fixed
return G;
fi;
bas:=OnSubspacesByCanonicalBasis(bas,One(G)); # canonical form
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe replace this by the equivalent, but clearer (and slightly faster) TriangulizeMat(bas);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I use OnSubspacesByCanonicalBasis is because we don't promise that the canonical form is exactly the result of TriangulizeMat (even though it currently is). As an action by OnSubspaces... follows using it this way avoids issues, even if the definition of one of the two operations ever changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a matter of fact, we do promise that. To quote the documentation of OnSubspacesByCanonicalBasis: "The function returns a mutable matrix which gives the basis of the image of the subspace in Hermite normal form. (In other words: it triangulizes the product of bas with mat.)"


one:=IdentityMat(n,field);
sub:=[]; # space so far
spaces:=Unique(List(spaces,x->OnSubspacesByCanonicalBasis(x,one)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace OnSubspacesByCanonicalBasis(x,one) by TriangulizedMat(x)

Add(spaces,List(one,ShallowCopy));
fi;

osporb:=List(osporb,x->List(x,x->OnSubspacesByCanonicalBasis(x,one)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace OnSubspacesByCanonicalBasis(x,one) by TriangulizedMat(x).

s:=SumIntersectionMat(spaces[i],spaces[j]);
for k in s do
if Length(k)>0 and not ForAny(spaces,x->Length(x)=Length(k) and
RankMat(Concatenation(x,k))=Length(x)) then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to insert TriangulizeMat(k); before the if, and then replace the RankMat check by ... and x = k. The speed should the same or sightly better (as the matrix is already close to HNF at this point, and only half as big as the concatenated matrix; and anyway, we don't have to create that new matrix). In addition, later on, when we handle new, we have to triangulize its members anyway!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course we should only triangulize if k is not empty. So full code might look like this (and since:

if Length(k) = 0 then continue; fi;
TriangulizeMat(k);
if not ForAny(spaces,x->Length(x)=Length(k) and x = k) then

and perhaps the if could even be simplified to if not x in k, though I am not sure how this would affect performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly, I should mention that I run into a case where spaces contains at least 48000 elements, so this part of the code is a major bottleneck... Unfortunately, its complexity is at least O(N^3), where N is the number of spaces. We could reduce this quite a lot by rewriting this code.

For starters, instead of keeping the single list spaces, organize it in two levels: The first level is index by the dimension of the subspace, the second contains a set of spaces of the corresponding dimension.

This way, the second levels are sets, and GAP can peform a binary search. So the check would become

if not k in spaces[Length(k)] then

od;
od;
if Length(new)>0 then
new:=List(new,x->OnSubspacesByCanonicalBasis(x,one)); # canonize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace OnSubspacesByCanonicalBasis(x,one) by TriangulizedMat(x). Or, since we replace new anyway, use a loop which uses TriangulizeMat (no 'd', i.e. makes no copy). Or even better, make the modifications to the loop creating new that I suggested above :-)


spaceincl:=function(big,small)
return RankMat(Concatenation(big,small))=Length(big);
end;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If big and small have equal length, then we don't need to invoke RankMat. This can avoid quite some computational overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course this becomes irrelevant if spaceincl is removed, as I suggest below :-)

SortBy(spaces,Length);
fi;
until Length(new)=0;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restarting like this can throw away a lot of perfectly fine information. In a bad case I have, we start out with about 4000 spaces, which is bad enough; but it then grows to NNN spaces, and starts over.

Why not instead have a workqueue: Initially, all spaces we in this workqueue, and spaces is empty. Instead of iterating over spaces, you iterate over the queue, and gradually add its entries to spaces as well as their sum+intersection with all elements already in spaces at that point. This way, you don't have to restart the loop.

The queue should perhaps also be grouped by dimensions (just as I suggested for spaces, so that we can deal with 1-dimensional spaces first, if we want to, as those are easier to handle

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, I just realized one issue with that: when adding a new space of dimension d to spaces, you'd have to either re-sort spaces[d] (and that would invalidate incl, unless you went through the trouble of applying the sort permutation there...). Or you don't sort it, but then you loose the fast check as to whether a space is already in there... Hrm.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(the NNN in my first comment should be 40000; though this is not the end, before I aborted the computation, new already contained an additional 8000 unique spaces)

else
# check for meet/join closed
s:=SumIntersectionMat(spaces[i],spaces[j]);
for k in s do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both SumIntersectionMat and RankMat (whcih is what spaceincl calls) end up using SemiEchelonMatDestructive. So we compute it twice in the worst case! Let's not do that :-). The easiest (albeit not most efficient, though hopefully it won't mattter) way woud be to always compute SumIntersectionMat, and then check if s[2] = spaces[i].

@fingolfin
Copy link
Member

Now I get this warning:

Syntax warning: Unbound global variable in /Users/mhorn/Projekte/GAP/gap.github/lib/autsr.gi:324
  ma:=MaximalSubgroupClassesSol(G);

@hulpke
Copy link
Contributor Author

hulpke commented Nov 12, 2016

@fingolfin: Harmless warning which is fixed now in 7eae649

@fingolfin
Copy link
Member

Another error I run into (should be easy to fix):

Error, Append: <list1> must be a mutable list in
  Append( jorb, Orbit( AQP, j[1], C[2], C[1], asAutom ) ); at /Users/mhorn/Projekte/GAP/gap.github/lib/autsr.gi:735 called from
AutomGrpSR( G ) at /Users/mhorn/Projekte/GAP/gap.github/lib/morpheus.gi:1950 called from
AutomorphismGroup( d ) at /Users/mhorn/Projekte/GAP/gap.github/lib/autsr.gi:897 called from

Ensure generating sets remain consistent.
Fixed variable name in copied code bit

Conflicts:
	lib/autsr.gi
Utilize a given `autactbase' to help with pc group automorphisms.
… numbers.

Instead of abelian invariants use factors of index of G'.
Also ensure that automorphism group is automorphismgroup and fix comment.

Two small fixes in ordering if there is trivial orbit on lines
Allow degree up to 100000 (computers are bigger now) before hard
Use maximal subgroups in trying perm rep.

(Do not use maxsub in factor if called from maxsub)

FIX: another recursion avoidance.
There are two different situations in which PcSeries is called. The first is
a good pcgs and we happen to want the series -- this is the existing code.

The second case is that the series is needed to do anything with the pcgs.
In this case inducing caused bad recursions. Avoid this.
In some situations, it seems that ClosurePermGroup can iterate very long
time with evrification failing repeatedly. In this situation we now simply
force a new stabilizer chain computation from scratch, than trying to bend
the (apparently confused) chain in the right way.
This is not a beautiful fix but a workaround, albeit one that should have
litlle cost implications.
…roup.

Relax threshold for new permrep force.

FIX: Change reading order to avoid forward declaration

Also ensure a maximal subgroups computation for smaller permutation degree
stays out of expensive operations.
@hulpke hulpke merged commit 9e38dd1 into gap-system:master Nov 23, 2016
@olexandr-konovalov olexandr-konovalov added this to the GAP 4.9.0 milestone Nov 23, 2016
@olexandr-konovalov
Copy link
Member

@hulpke thanks - testing is currently underway (seems there will be diffs - repot will follow). Is there any text that you're suggesting for the overview of changes in GAP 4.9?

@hulpke
Copy link
Contributor Author

hulpke commented Nov 23, 2016

@alex-konovalov
The message of the merge
9e38dd1
contains test for exactly this purpose.

@olexandr-konovalov
Copy link
Member

@hulpke thanks - I've copied it to the 1st comment above, to make it easily seen.

@olexandr-konovalov
Copy link
Member

@hulpke first, if you run make testinstall, the test without packages (when GAP is started with -A option) has 8 info messages:

#I  Alternating recognition needed!
  • if they are needed, I suggest to move them to a higher info level to avoid being displayed in the test.

@olexandr-konovalov
Copy link
Member

@hulpke second, the manual examples test with default packages has diffs below. They look ok to me (I have checked that groups in RelatorsOfFpGroup and ConstituentsCompositionMapping are isomorphic) so I will update examples to the new output.

########> Diff in [ "./group.xml", 831, 838 ]
# Input is:
blocks[1];
# Expected output:
[ (1,2)(3,4)(5,6)(7,8), (1,3)(2,4)(5,8)(6,7), (1,4)(2,3)(5,7)(6,8), 
  (1,5)(2,6)(3,8)(4,7), (1,6)(2,5)(3,7)(4,8), (1,7)(2,8)(3,6)(4,5), 
  (1,8)(2,7)(3,5)(4,6) ]
# But found:
[ (1,2)(3,4)(5,6)(7,8), (1,3)(2,4)(5,7)(6,8), (1,4)(2,3)(5,8)(6,7), 
  (1,5)(2,6)(3,7)(4,8), (1,6)(2,5)(3,8)(4,7), (1,7)(2,8)(3,5)(4,6), 
  (1,8)(2,7)(3,6)(4,5) ]
########
########> Diff in [ "./group.xml", 1171, 1179 ]
# Input is:
IsomorphismGroups( niceaut, SymmetricGroup( 4 ) );
# Expected output:
[ (1,4,2,3), (1,5,4)(2,6,3), (1,2)(3,4), (3,4)(5,6) ] -> 
[ (1,4,3,2), (1,4,2), (1,3)(2,4), (1,4)(2,3) ]
# But found:
[ (1,4,2,3), (1,5,4)(2,6,3), (1,2)(3,4), (3,4)(5,6) ] -> 
[ (1,2,4,3), (2,4,3), (1,4)(2,3), (1,2)(3,4) ]
########

and

########> Diff in [ "./../../lib/grp.gd", 3249, 3253 ]
# Input is:
s:=SubnormalSeries(g,Group((1,2)(3,4)));
# Expected output:
[ Group([ (1,2,3,4), (1,2) ]), Group([ (1,2)(3,4), (1,4)(2,3) ]), 
  Group([ (1,2)(3,4) ]) ]
# But found:
[ Group([ (1,2,3,4), (1,2) ]), Group([ (1,2)(3,4), (1,3)(2,4) ]), 
  Group([ (1,2)(3,4) ]) ]
########
########> Diff in [ "./grpfp.xml", 266, 279 ]
# Input is:
h := IsomorphismPermGroup( g );
# Expected output:
[ a, b ] -> [ (1,2)(3,5), (2,3,4) ]
# But found:
[ a, b ] -> [ (1,2)(4,5), (2,3,4) ]
########
########> Diff in [ "./../../lib/grp.gd", 4151, 4160 ]
# Input is:
iso := IsomorphismFpGroupByGenerators( g, [ (1,2), (1,2,3,4,5) ] );
# Expected output:
#I  the image group has 2 gens and 5 rels of total length 39
[ (1,2), (1,2,3,4,5) ] -> [ F1, F2 ]
# But found:
#I  the image group has 2 gens and 4 rels of total length 50
[ (1,2), (1,2,3,4,5) ] -> [ F1, F2 ]
########
########> Diff in [ "./../../lib/grp.gd", 4151, 4160 ]
# Input is:
RelatorsOfFpGroup( fp );
# Expected output:
[ F1^2, F2^5, (F2^-1*F1)^4, (F1*F2^-1*F1*F2)^3, (F1*F2^2*F1*F2^-2)^2 ]
# But found:
[ F1^2, (F2*F1*F2^-2*F1)^3, 
  F2*F1*F2^-1*(F1*F2)^2*F2^2*(F1*F2^-1)^2*F2^-1*F1, 
  (F2*F1*F2^-1*F1)^2*F2^-1*F1*F2^2*F1*F2^-2*F1*F2*F1 ]
########
########> Diff in [ "./../../lib/grp.gd", 4180, 4189 ]
# Input is:
iso := IsomorphismFpGroupByGenerators( M12, gens );;
# Expected output:
#I  the image group has 3 gens and 20 rels of total length 464
# But found:
#I  the image group has 3 gens and 21 rels of total length 459
########
########> Diff in [ "./../../lib/grp.gd", 4180, 4189 ]
# Input is:
iso := IsomorphismFpGroupByGenerators( M12, gens );;
# Expected output:
#I  the image group has 3 gens and 19 rels of total length 491
# But found:
#I  the image group has 3 gens and 18 rels of total length 337
########
########> Diff in [ "./../../lib/grp.gd", 4231, 4238 ]
# Input is:
iso := IsomorphismFpGroupByGenerators( M12, gens : 
                                          method := "fast" );;
# Expected output:
#I  the image group has 3 gens and 179 rels of total length 4099
# But found:
#I  the image group has 3 gens and 153 rels of total length 3374
########
########> Diff in [ "./../../lib/grp.gd", 4250, 4280 ]
# Input is:
iso := IsomorphismFpGroupByGenerators( G, gens );;
# Expected output:
#I  the image group has 2 gens and 9 rels of total length 94
# But found:
#I  the image group has 2 gens and 10 rels of total length 122
########
########> Diff in [ "./../../lib/grp.gd", 4250, 4280 ]
# Input is:
ConstituentsCompositionMapping(iso);
# Expected output:
[ <action isomorphism>, 
[ (2,3,5,9,16,29)(4,7,13,24,19,32)(6,11,20,34,40,57)(8,15,28,46,42,
    59)(10,18,25,41,49,67)(12,22,37,53,48,66)(14,26,31)(17,30,35,
    50,58,38)(21,36,33)(23,39,56)(27,44,61,72,43,60)(45,62,51,68,
    54,70)(47,64,73)(52,69)(55,71,75,78,77,76)(65,74), 
  (1,2,4,8)(3,6,12,23)(5,10,19,33)(7,14,27,45)(9,17,18,31)(11,21,
    16,28)(13,25,42,57)(20,35,51,67)(22,38,55,70)(24,40,26,43)(29,
    37,54,39)(30,47,65,68)(32,48)(34,49,36,52)(41,58,56,61)(44,50,
    53,64)(46,63,69,59)(60,66,75,79)(62,73,72,77)(71,76,80,74) 
 ] -> [ F1, F2 ] ]
# But found:
[ <action isomorphism>, 
  [ (2,3,4)(5,6)(8,9,10), (1,2,3,5)(6,7,8,9) ] -> [ F1, F2 ] ]
########
########> Diff in [ "./../../lib/ctbl.gd", 4363, 4389 ]
# Input is:
kernel:= KernelOfCharacter( irr[3] );
# Expected output:
Group([ (1,2)(3,4), (1,3)(2,4) ])
# But found:
Group([ (1,2)(3,4), (1,4)(2,3) ])
########
########> Diff in [ "./../../lib/ctbl.gd", 4363, 4389 ]
# Input is:
NormalSubgroupClassesInfo( tbl );
# Expected output:
rec( nsg := [ Group([ (1,2)(3,4), (1,3)(2,4) ]) ],
  nsgclasses := [ [ 1, 3 ] ], nsgfactors := [  ] )
# But found:
rec( nsg := [ Group([ (1,2)(3,4), (1,4)(2,3) ]) ], 
  nsgclasses := [ [ 1, 3 ] ], nsgfactors := [  ] )
########
########> Diff in [ "./../../lib/ctbl.gd", 4363, 4389 ]
# Input is:
NormalSubgroupClassesInfo( tbl );
# Expected output:
rec( nsg := [ Group([ (1,2)(3,4), (1,3)(2,4) ]) ],
  nsgclasses := [ [ 1, 3 ] ], nsgfactors := [ Group([ f1, f2 ]) ] )
# But found:
rec( nsg := [ Group([ (1,2)(3,4), (1,4)(2,3) ]) ], 
  nsgclasses := [ [ 1, 3 ] ], nsgfactors := [ Group([ f1, f2 ]) ] )
########

@olexandr-konovalov
Copy link
Member

olexandr-konovalov commented Nov 24, 2016

@hulpke finally, I haven't seen grpauto.tst completed at all in the tests. The 32-bit version terminates with gap: cannot extend the workspace any more! and the 64-bit version just runs out of time (I've specified that this Jenkins test runs for 10 hours before it is automatically terminated, which is usually more than enough to build GAP and run make test<something>, but not in this case). Is it possible to find a smaller instance of the problem for the test file? Or is it the problem with assertions which should be moved from level 2 to 3?

@olexandr-konovalov
Copy link
Member

@hulpke for the record, the grpauto.tst test is really hard even with the default assertion level. I've commented out START_TEST and GAP started with bin/gap.sh -r -A -x 80 -m 100m -o 1g to run it as we do in make teststandard, and it runs for 2 hours already.

If there is a way to give more lightweight tests, these heavy tests may go into the benchmark directory and we may still run them periodically, requesting more memory when needed.

@olexandr-konovalov
Copy link
Member

I've tried to run grpauto.tst with -K 2g option added to start GAP with the default assertion level (by commenting out START_TEST in the test file). It runs until line 92 then:

# line 90, input:
IsomorphismGroups(G,H);
# line 92, input:
IsomorphismGroups(G,PcGroupCode(CodePcGroup(G),Size(G)))=fail;
gap: will not extend workspace above -K limit, bye!

though it takes about an hour to get there. Just suggested to use -K 2g in regression tests in #959. With that PR merged, of course, there will be still a need in changing grpauto.tst, but at least the test infrastructure will not choke on it that hard.

@hulpke
Copy link
Contributor Author

hulpke commented Nov 25, 2016

@alex-konovalov
The error message "Alternating recognition needed" gets triggered because the code cannot find subgroup information in the table of marks library and the A_n is not natural. So this must be a huge example for which to ask for subgroups or maximal subgroups, or an error in the tables of marks.

@olexandr-konovalov
Copy link
Member

@hulpke:

The error message "Alternating recognition needed" gets triggered because the code cannot find subgroup information in the table of marks library and the A_n is not natural. So this must be a huge example for which to ask for subgroups or maximal subgroups, or an error in the tables of marks.

This happens only with -r -A, as I've mentioned above. Perhaps test redesign intended for GAP 4.9 should treat those examples differently as requiring TomLib, or TomLib should be needed for GAP. My choice would be the former option rather then the latter.

@hungaborhorvath
Copy link
Contributor

The message "Alternating recognition needed" is triggered also in #763 without packages for the example A5 x A5. So this seems to make a problem for consistent tests between with and without packages.

olexandr-konovalov pushed a commit that referenced this pull request Nov 28, 2016
@olexandr-konovalov olexandr-konovalov changed the title Further enhancements to group automorphisms Improved performance for group isomorphism/automorphisms Jan 29, 2018
@olexandr-konovalov olexandr-konovalov added the release notes: added PRs introducing changes that have since been mentioned in the release notes label Jan 29, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: added PRs introducing changes that have since been mentioned in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants