
Conversation


@zhengyu123 zhengyu123 commented Nov 22, 2021

There are several issues with ThreadStackTracker.

  1. The following assertion:
       assert(MemTracker::tracking_level() >= NMT_summary, "Must be");
     in ThreadStackTracker::record/release_thread_stack() is unreliable. The full fence after downgrading the tracking level is not sufficient to avoid the race.

  2. The NMT tracking level is downgraded without the ThreadCritical lock held. However, the ThreadCritical lock is required when the internal tracking data structures are actually downgraded, so checking internal state is a reliable way to determine the current tracking state. Added assertions to ensure the correct tracking state.

  3. _thread_counter is updated with the ThreadCritical lock held, but is read without the lock. Changed to atomic updates so that readers will not read a stale value.

  4. NMT events are relatively rare. So far, I have yet to see assertion (1) fail, but adding a new test may help to spot such a problem.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6504/head:pull/6504
$ git checkout pull/6504

Update a local copy of the PR:
$ git checkout pull/6504
$ git pull https://git.openjdk.java.net/jdk pull/6504/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6504

View PR using the GUI difftool:
$ git pr show -t 6504

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6504.diff

@bridgekeeper

bridgekeeper bot commented Nov 22, 2021

👋 Welcome back zgu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title to 8277486: NMT: Cleanup ThreadStackTracker Nov 22, 2021
@openjdk openjdk bot added the rfr Pull request is ready for review label Nov 22, 2021
@openjdk

openjdk bot commented Nov 22, 2021

@zhengyu123 The following label will be automatically applied to this pull request:

  • hotspot-runtime

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-runtime hotspot-runtime-dev@openjdk.org label Nov 22, 2021
@mlbridge

mlbridge bot commented Nov 22, 2021

Webrevs

@dholmes-ora
Member

> _thread_counter is updated with ThreadCritical lock, but is read without the lock. Change to atomic update to ensure reader will not read stale value.

Use of atomic updates with MO_RELAXED will not affect whether an Atomic::load sees a stale value or not.

@zhengyu123
Contributor Author

> _thread_counter is updated with ThreadCritical lock, but is read without the lock. Change to atomic update to ensure reader will not read stale value.
>
> Use of atomic updates with MO_RELAXED will not affect whether an Atomic::load sees a stale value or not.

I thought we settled this in PR 6065, where you stated:

> I see now that the C++ "Modification Order" definition requires the write to the counter to be (for want of a better term) "immediately visible" to any subsequent read - so no stale value could be read.

If you are referring to the inaccurate read of the counter without the lock, then yes, that is a general problem with NMT, where various counters are not strictly synchronized, to avoid taking locks.

@dholmes-ora
Member

Sorry @zhengyu123 but every time I re-read the C++ memory model material I get a different take on exactly what the different MOs provide. The write-read coherence in the modification order requires that the write happens-before the read. The inter-thread happens-before ordering then states five conditions, any of which establishes the happens-before ordering. But I do not see any of those conditions being met for the Atomic::load that does the read and the Atomic::inc or dec that last updated the counter. They do not synchronize with each other (unlike C++ atomics), there is no dependency ordering, and no obvious transitive happens-before or synchronizes-with ordering. The atomic load is, AFAICS, no different from a plain load from a memory ordering or visibility perspective, as it takes no MO parameter and is not specified to provide any particular MO semantics - unlike in C++, where the default is seq-cst. So unless Atomic::inc/dec provide some form of full memory synchronization barrier, the load need not see the most recent write.

assert_post_init();
if (tracking_level() < NMT_summary) return;
if (addr != NULL) {
  ThreadCritical tc;
Member

Would this not require repeating the tracking level check inside the ThreadCritical scope, to prevent the assert in (new|delete)_thread_stack? The tracking level could have changed between lines 268 and 270.

Contributor Author

@zhengyu123 zhengyu123 Nov 24, 2021

With or without the ThreadCritical lock, the assertion in (new|delete)_thread_stack is unreliable, because the tracking level is downgraded without the ThreadCritical lock, so there is no synchronizes-with relationship. A check inside (new|delete)_thread_stack can therefore only rely on internal state.

The early return at line #268 avoids the very expensive ThreadCritical lock, given that NMT is off by default and the tracking level can only be downgraded monotonically.

Anyway, I believe the current shutdown logic for virtual memory tracking is racy; filed JDK-8277788.

Contributor

I would also like one less use of the ThreadCritical lock, since it is taken for malloc when needed for resource allocation.

if (_simple_thread_stacks == NULL) {
  assert(MemTracker::tracking_level() < NMT_summary, "Must be");
  return false;
}
Member

I looked at MemTracker::transition_to, the downgrade path:

  } else if (current_level > level) {
    // Downgrade tracking level, we want to lower the tracking level first
    _tracking_level = level;
    // Make _tracking_level visible immediately.
    OrderAccess::fence();
    VirtualMemoryTracker::transition(current_level, level);
    MallocTracker::transition(current_level, level);
    ThreadStackTracker::transition(current_level, level);

Is the comment about "immediately" correct? I would have assumed OrderAccess::fence() does not have anything to do with timing, just with the order in which the writes become perceptible from outside.

So, concurrent threads may observe tracking level == minimal but the pointers to the infrastructure are still there?

I see that you do not guard MemTracker::transition_to with ThreadCritical. You do guard teardown inside ThreadStackTracker::transition and VirtualMemoryTracker::transition, but teardown in MallocTracker::transition, at least, looks unguarded by ThreadCritical.

Just a proposal: would guarding upstairs, in MemTracker::transition_to, not be clearer and safer, especially if it included changing the tracking level?

Alternatively, do we even have to delete all those structures on shutdown? They are not that costly. We shut down either manually - nobody cares then - or if we encounter native OOM. In the latter case, those detail structures are arguably important since they contain data that may help analyze the OOM. The malloc site table, for instance.

Member

To add to that, I have always wondered why we even expose shutdown via jcmd... especially since it is a one-way operation, so its use is limited.

Contributor Author

ThreadStackTracker could track thread stacks as malloc memory, and yes, MallocTracker::transition is not guarded by ThreadCritical, because that would be too expensive for the other operations.

And yes, I believe we have yet to see any bug reported, solely due to the limited use of the shutdown function :-)

static void new_thread_stack(void* base, size_t size, const NativeCallStack& stack);
static void delete_thread_stack(void* base, size_t size);

static bool track_as_vm() { return AIX_ONLY(false) NOT_AIX(true); }
Member

Unrelated to your patch: I wonder whether this could be controlled as a diagnostic switch, to be able to test both implementations on normal platforms.

@dholmes-ora
Member

@tstuefe :

> I would have assumed OrderAccess::fence() does not have anything to do with timing, just with the order in which the writes become perceptible from outside.

fence() is supposed to be a sledgehammer that forces a consistent view of memory for all threads/processors. Any writes before a fence should be made visible to all other processors before the fence returns, such that loads of any of these locations will see the value written. It is "immediate" in that sense.

@zhengyu123
Contributor Author

> Sorry @zhengyu123 but every time I re-read the C++ memory model stuff I get a different take on exactly what different MO's provide. [...] So unless the Atomic::inc/dec provide some form of full memory synchronization barrier, the load need not see the most recent write.

@dholmes-ora I am with you. In fact, our internal discussion started when I questioned assertions in NMT, e.g. MemoryCounter::deallocation() asserts that the allocation count > 0 and the allocation amount >= the deallocation amount, where the loads are plain loads.

From another perspective, in PR 6065 I mentioned that atomic r-m-w updates at a single location are totally ordered, guaranteed by the coherence protocol (AFAICT, all supported platforms are coherence-based multiprocessor systems).

For a processor to perform an atomic r-m-w operation, it has to acquire the corresponding cache line in exclusive state. Therefore, other processors that happen to own the cache line have to invalidate it before the atomic operation can be performed.

After the atomic r-m-w operation, when another thread tries to load the value, the coherence protocol ensures that it sees the new value.

@tstuefe
Member

tstuefe commented Nov 25, 2021

Hi Zhengyu,

Could we not simply remove the shutdown logic from NMT altogether? I think it's not needed and makes things more complex than necessary.

We use shutdown in two cases:

  1. When the user commands us to, with jcmd. As I argued before, I think this has no real use in practice. Why would you do that? In fact, I have used NMT a lot, for many years, and have never used it.
  2. When NMT shuts down as a reaction to either a MallocSiteTable allocation failure or MST overflow. Both are very rare, almost impossibly so (see below). In both cases I argue shutdown does not help, makes things complicated, and actually hinders analyzing the OOM afterward:
    2.1) Failing to malloc a node for the MST: if that happens, the C-heap is exhausted. Shutting down NMT will not affect the outcome. NMT is very frugal. So we could just ignore the failing allocation and go on with our lives: either the C-heap recovers (highly unlikely) or not. If it recovers, no harm done, NMT continues working. If it does not recover, the VM will get a native OOM in the immediate future. But in the OOM case, the NMT report is very useful. I don't want NMT to shut down and delete the MST; I want to be able to see all my malloc sites.
    2.2) MST overflow: for that to happen, the length of one individual bucket chain has to reach its limit. After https://bugs.openjdk.java.net/browse/JDK-8275320, that limit is 64k nodes for both 32-bit and 64-bit platforms. This plainly cannot happen. But even if it were to happen, we would already have a problem, since MST lookup would have become extremely slow because of the O(n) traversal on every malloc/free.
Note how rare point (2) is: it depends on MST fill size, which depends on NMT stack depth. With the default stack depth of 4, atm we never even reach ~1000 individual call sites. I once did an experiment with stack depth 16 and reached about 10000 call sites. For (2.1), the chance of one of the few MST node mallocs running into OOM is astronomically low, especially since, if the MST is that full, we will see tons of user mallocs, and one of those will certainly hit OOM first. And (2.2) cannot happen at all: even with an MST size of 1, we could accommodate 65k entries, a lot more than even the 10k I reached with stack depth 16. But of course, the MST size is not 1. In fact, I think we should even assert against MST overflow rather than handle it gracefully.

So, let us remove shutdown. It would simplify NMT quite a bit, and remove quite a bit of code and testing too. And the tracking level would be constant throughout the VM's life, making concurrency much simpler.

Cheers, Thomas

@zhengyu123
Contributor Author

> Hi Zhengyu,
>
> Could we not simply remove the shutdown logic from NMT altogether? I think its not needed and makes things more complex than necessary. [...]
>
> So, let us remove shutdown. It would simplify NMT quite a bit, remove quite a bit of code, testing too. And the tracking level would be constant throughout VM life, making concurrency much simpler.

Hi Thomas,

I am with you; I don't see that it is very useful. I believe it is a remnant of the old implementation, when NMT could consume an excessive amount of memory and needed to shut itself down to ensure the health of the JVM.

How do we proceed to remove the command? Do we need a CSR?

Thanks,

-Zhengyu

@coleenp
Contributor

coleenp commented Nov 29, 2021

The last comments attracted my attention. It does seem like the shutdown logic was there because of the 1.0 implementation and shouldn't be needed anymore. If you write a CSR to deprecate it for 18 (quickly), you could remove it in a couple of weeks.

@zhengyu123 zhengyu123 closed this Dec 2, 2021

Labels

hotspot-runtime hotspot-runtime-dev@openjdk.org rfr Pull request is ready for review


4 participants