Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCOL-3713: Join abort fix #1001

Merged
merged 4 commits into from Jan 15, 2020
Merged

MCOL-3713: Join abort fix #1001

merged 4 commits into from Jan 15, 2020

Conversation

pleblanc1976
Copy link
Contributor

Noticed a couple problems. 1) segregateJoiners() would double-lock djsMutex if any join exceeded the mem limit; 2) the logic to detect the mem overflow might have flagged joins as disk-joins even if that were disabled, requiring a longer than necessary abort process. Also it was fairly unreadble; cleaned it up & added a little inline documentation.

To verify I set the session var columnstore_um_mem_limit first between the PM and UM join limits and executed a large join, then set it below the PM join limit. Verified with gdb that no threads were orphaned; verified with valgrind & top that no memory was leaked. Enabled disk join and verified the same.

Patrick LeBlanc added 2 commits January 6, 2020 18:03
Seems that segregateJoiners would double-lock a mutex.  Not sure why
we're only seeing it now.

This is a checkpoint commit, probably not final for this bug.
A little cleanup.  It's working.
@pleblanc1976 pleblanc1976 changed the title Join abort fix MCOL-3713: Join abort fix Jan 7, 2020
Patrick LeBlanc and others added 2 commits January 8, 2020 16:05
Noticed a race on djsJoiners in abort() and segregateJoiners().
Re-added the mutex lock, but after risk of double lock grab.
djsJoiners.push_back(joiners[i]);
djsJoinerMap.push_back(i);
}
return;
}
#endif

boost::mutex::scoped_lock sl(djsLock);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we take lock here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lock protects djs and djsJoiners. Abort() needs a consistent view of those, and it can come asynchronously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants