Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory canary #2293

Merged
merged 2 commits into from
Mar 28, 2018
Merged

Memory canary #2293

merged 2 commits into from
Mar 28, 2018

Conversation

ChrisJefferson
Copy link
Contributor

This is an old piece of code I had, simplified and resurrected.

This PR is split into two pieces:

The first, bigger but simpler commit, introduces to GASMAN the concept of MptrEndBags, which denotes the end of the master pointers. Previously, the start of bag memory (called OldBags) was also the end of the master pointers. This lets the two move independantly. This is useful for part 2, but also might be useful for other things (I know I, and others, have considered changing GASMAN to rearrange objects for locality as it GCs. That would be much easier after this PR).

The second, crazier but more self contained part, adds a configure-time switch ( --enable-memory-checking), which adds to GAP the ability to find places where pointers into bag data are incorrectly accessed after a call to GASMAN.

The easiest way to see this in action is using the 3rd commit (which I will remove before merging!) which adds a simple but extremely hard to track down bug (if you didn't know where it was) in GASMAN_LIMITS(), where we try to write to a bag location after making a new plist (note the code also catches invalid reads as well as writes).

Do:

gap> GASMAN_LIMITS();
[ 131072, 2097152, 0 ]
gap> GASMAN_MEM_CHECK(1); # Enables checks
true
gap> GASMAN_LIMITS();
Segmentation fault

If you run GAP in a debugger, it will drop you onto the bad line.

You can start checks at startup by running GAP with --enableMemCheck. Note that GAP will take a LONG time to start (I'm talking a couple of days).

While this code has some slightly crazy memory things, I've tried to keep them tightly defined within #ifdef GAP_MEM_CHECK. I've also tried to keep the code short (there are a number of things one could do more cleverly, but the code is already quite scary).

@codecov
Copy link

codecov bot commented Mar 25, 2018

Codecov Report

Merging #2293 into master will decrease coverage by 0.02%.
The diff coverage is 81.08%.

@@            Coverage Diff             @@
##           master    #2293      +/-   ##
==========================================
- Coverage   72.33%    72.3%   -0.03%     
==========================================
  Files         479      479              
  Lines      250960   251956     +996     
==========================================
+ Hits       181538   182185     +647     
- Misses      69422    69771     +349
Impacted Files Coverage Δ
src/system.h 66.66% <ø> (-33.34%) ⬇️
lib/system.g 67.13% <ø> (ø) ⬆️
src/gasman.h 90% <ø> (ø) ⬆️
src/system.c 68.62% <0%> (-0.51%) ⬇️
src/gap.c 80.41% <100%> (+0.03%) ⬆️
lib/init.g 71.54% <66.66%> (-0.04%) ⬇️
src/saveload.c 64.21% <66.66%> (ø) ⬆️
src/gasman.c 85.47% <95.83%> (+0.18%) ⬆️
hpcgap/lib/hpc/queue.g 66.4% <0%> (-3.2%) ⬇️
... and 8 more

@fingolfin
Copy link
Member

The first commit makes perfect sense to me, and I'd merge it instantly if it was a separate PR -- in case the second commit takes longer to review, it might be worthwhile to get the first one in ASAP, to avoid bitrot.

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks sane and very useful to me. I have not yet actually tried it yet, though. Only have minor code nitpicks.

[AS_HELP_STRING([--enable-memory-checking], [enable memory checking])],
[AC_DEFINE([GAP_MEM_CHECK], [1], [define if building with memory checking])],
[enable_memory_checking=no]
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option is not available in HPC-GAP, is it? Both because it is GASMAN specific, and because it does not perform any thread locking.

It would be kind of nice if this gave an error here if HPCGAP is enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps also add in "WARNING: this makes GAP super slow" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing: you already put in some kind of valgrind based memory checking in the past. Looking at this configure option, it is not clear which of the various memory debugging techniques this refers to...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to tidy up and document the valgrind stuff at some point. I have added a message on GAP startup saying it takes a long time to start up.

lib/init.g Outdated
@@ -560,6 +560,9 @@ BindGlobal( "ShowKernelInformation", function()
if GAPInfo.KernelInfo.KernelDebug then
Add( config, "KernelDebug" );
fi;
if GAPInfo.KernelInfo.MemCheck then
Add(config, "MemCheck (EXPERIMENTAL)");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the "EXPERIMENTAL"? When would it be removed, if ever?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Experimental might be a bad word. If we later changed GCs to boehm it would get removed, and I could imagine there could be later changes which significantly change the behaviour. Also I don't really want people to start using this as their standard GAP, because it is slower and more limited in a number of ways (doesn't give memory back to the OS for example).

src/gap.c Outdated
Obj FuncGASMAN_MEM_CHECK(Obj self, Obj newval)
{
EnableMemCheck = INT_INTOBJ(newval);
return True;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this return true, instead of 0?

src/gasman.c Outdated
* a reference to the internals of a GAP object are kept across
* a garbage collection (which moves GAP objects around).
*
* MultiHeap provides a method of detecting such problems, at
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time the word "MultiHeap" appears, which I find confusing. Either get rid of this; or else "define" it before use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change this. I used to call this MultiHeap.

src/gasman.c Outdated
*
* MultiHeap provides a method of detecting such problems, at
* the cost of GREATLY decreased performance (Starting GAP in
* MultiHeap mode takes hours, rather than seconds).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the PR, you say startup takes "days", here "hours" -- which is it? On what kind of system (clearly I can find old systems where it's much slower, so here really some info on the machine would be useful to gauge the severity of this slowdown)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on the machine. about 18 hours on Linux, 4 days on the linux subsystem on windows, haven't timed it on a mac yet. I'll just say 'days'.

src/gasman.c Outdated
return;

Int newBase = oldBase + 1;
// 0 is special, skip it when cycling
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This explanation seems to start in the middle of a thought, i.e. the reader has to already understand what's going on generally to make sense of it. Perhaps this is clearer?

  // cycle to next memory area, wrapping around at end, and skipping area 0 (which is special)
  Int newBase = oldBase + 1;
  if (newBase >= GetMembufCount())
    newBase = 1;

As you can see, I also use >= instead of == for paranoia -- the interface does not seem to guarantee that the return value of GetMembufCount() is constant. This could be "fixed" by replacing GetMembufCount() by a global constant (enum).

src/gasman.c Outdated

MoveBagMemory(GetMembuf(oldBase), GetMembuf(newBase));

// Block access to old memory, provide access to new memory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consistency: "Block" -> "block", remove trailing full stop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, perhaps "provide" -> "enable" ?

*
* The following code is used by GAP_MEM_CHECK support, which is
* documented in gasman.c
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps move this comment before the #if/#error, as it is already relevant to those?

src/system.c Outdated
}
}

// Make the first membufs the earlier in memory. This is where the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that valid english? In any case, I find it confusing, how about this alternative instead? "Sort the memory buffers ascending by their address, so that the first buffer comes first in memory. This is the one where the master pointers will live".

src/system.c Outdated
{ 0 , "quitonbreak", toggle, &SyQuitOnBreak, 0}, /* Quit GAP if we enter the break loop */
{ 0, "", 0, 0, 0}};
{ 0, "", 0, 0, 0 }
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that running git clang-format is quite convenient, but here it just makes reviewing this part of the PR hard (or impossible). I also find the formatted output quite ugly in parts, du to the way it wraps. So perhaps first have a commit which reformats this struct (ideally with some manual movement of comments, to make it less ugly, but that's optional); and then modify it in a commit after that?

@ChrisJefferson ChrisJefferson force-pushed the new-canary-2 branch 2 times, most recently from 485d291 to 1c33c19 Compare March 25, 2018 18:50
@ChrisJefferson
Copy link
Contributor Author

The only missing part from the PR now (I believe) is that it does not forbid building with HPC-GAP, but the better fix there is I believe to block use with BOEHM_GC, which is related to another open patch.

@fingolfin
Copy link
Member

This needs to be rebased now. Also, that other PR which introduced USE_GASMAN and renamed BOEHM_GC to USE_BOEHM_GC has landed.

src/gasman.c Outdated

UInt GetMembufCount(void);
void * GetMembuf(UInt i);
UInt GetMembufSize(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit wary about declaring external function prototypes outside a .h file -- this way, it is very easy to change the function implementation, while forgetting to adjust the prototype here, and there will be no warning. So I'd feel better if these were declared in e.g. system.h, inside an #ifdef GAP_MEM_CHECK / #endif block.

This allows for a seperation between the end of the master
pointers, and the beginning of the bag data area
Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Nice work!

This adds a new configure option to GAP which enables an
extremely slow debug mode, which can find problems where
references to memory are incorrectly kept across
garbage collections.
@fingolfin fingolfin merged commit 562896c into gap-system:master Mar 28, 2018
@ChrisJefferson ChrisJefferson deleted the new-canary-2 branch March 28, 2018 22:23
@fingolfin fingolfin added kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements topic: kernel labels May 2, 2018
@fingolfin fingolfin added the release notes: added PRs introducing changes that have since been mentioned in the release notes label Sep 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Label for issues suggesting enhancements; and for pull requests implementing enhancements release notes: added PRs introducing changes that have since been mentioned in the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants