@bbernhar
Contributor

@bbernhar bbernhar commented Jul 13, 2022

Creating a resource that made the current usage exactly equal to the budget caused the test loop conditioned on IsOverBudget to never terminate. This fix replaces IsOverBudget with GetBudgetLeft so that created resources never exceed the budget.
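The termination condition described above can be sketched with a small model. This is not GPGMM's code: `SegmentInfo` is a hypothetical stand-in for the `Budget` and `CurrentUsage` fields of `DXGI_QUERY_VIDEO_MEMORY_INFO`, this `GetBudgetLeft` is an assumed implementation (remaining budget, clamped at zero) rather than the project's helper, and `FillUpToBudget` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Budget/CurrentUsage fields of DXGI_QUERY_VIDEO_MEMORY_INFO.
struct SegmentInfo {
    std::uint64_t Budget;
    std::uint64_t CurrentUsage;
};

// Assumed semantics: bytes left before reaching the budget, clamped to 0 when at or over it.
std::uint64_t GetBudgetLeft(const SegmentInfo& info) {
    return info.CurrentUsage >= info.Budget ? 0 : info.Budget - info.CurrentUsage;
}

// Simulates creating equally sized resources until the next one would not fit.
// Returns how many were created; the loop never pushes CurrentUsage above Budget.
std::uint64_t FillUpToBudget(SegmentInfo& info, std::uint64_t resourceSize) {
    if (resourceSize == 0) {
        return 0;  // a zero-sized resource would never use up the budget
    }
    std::uint64_t created = 0;
    while (resourceSize <= GetBudgetLeft(info)) {
        info.CurrentUsage += resourceSize;  // stands in for a successful heap creation
        ++created;
    }
    return created;
}
```

Because the loop asks whether the next resource fits before creating it, usage can land exactly on the budget (for example, a budget of 10 filled with resources of size 5) and the loop still exits, instead of waiting for an over-budget state that never arrives.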

@bbernhar bbernhar requested a review from bjjones July 13, 2022 16:38
@github-actions bot added the D3D12 (DirectX 12 Backend Change) and Test (Changes in tests.) labels Jul 13, 2022
Contributor

@bjjones bjjones left a comment

LGTM.

Note that I did find the reason the tests were looping forever on my machine: in D3D12ResidencyManagerTests.cpp L91, the return statement should be an OR, not an AND. If I remember correctly, we may be seeing (valid) behavior differences between hardware. On my Nvidia dGPU, evicted local memory does not get counted towards the non-local memory budget. I think Intel integrated GPUs handle this differently, which might be why you used an AND here.

@bbernhar
Contributor Author

Question -

evicted local memory does not get counted towards the non-local memory budget.

We only evict using the same segment the budget uses, though.

ReturnIfFailed(pResidencyManager->Evict(descriptor.SizeInBytes, memorySegmentGroup));

This check

return local->Budget <= local->CurrentUsage && nonLocal->Budget <= nonLocal->CurrentUsage;

The AND was there because I don't care which segment goes over budget, so long as all of them are under.

Does that sound correct to you?
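To make the difference between the two combinations concrete, here is a small sketch of the quoted check next to the OR variant raised in the first review comment. `SegmentInfo` and the function names are hypothetical; only the `Budget <= CurrentUsage` comparison is taken from the quoted line.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Budget/CurrentUsage fields of DXGI_QUERY_VIDEO_MEMORY_INFO.
struct SegmentInfo {
    std::uint64_t Budget;
    std::uint64_t CurrentUsage;
};

// Same per-segment comparison as the quoted check: usage has reached or passed the budget.
bool IsAtOrOverBudget(const SegmentInfo& s) {
    return s.Budget <= s.CurrentUsage;
}

// AND form (as quoted): true only when every segment is at or over budget.
bool AllSegmentsOverBudget(const SegmentInfo& local, const SegmentInfo& nonLocal) {
    return IsAtOrOverBudget(local) && IsAtOrOverBudget(nonLocal);
}

// OR form (as suggested in review): true as soon as any one segment is at or over budget.
bool AnySegmentOverBudget(const SegmentInfo& local, const SegmentInfo& nonLocal) {
    return IsAtOrOverBudget(local) || IsAtOrOverBudget(nonLocal);
}
```

With the local segment full and the non-local segment mostly empty, the AND form reports "not over budget" while the OR form reports "over budget", so a loop that runs until the AND form becomes true keeps going until both segments fill.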

@bjjones
Contributor

bjjones commented Jul 13, 2022

...never mind. What I first said was wrong; I misread the code and jumped to a conclusion. What's happening on my machine is that local->Budget is exactly equal to local->CurrentUsage, so we're still technically not exceeding the budget, and the check returns false.

I'm getting this in a loop:

GPGMM Info (tid:8740): GPU page-out(5): Number of allocations: 1 (1048576 bytes).
GPGMM Debug (tid:8740): SlabMemoryAllocator(0): Slab size exceeds available memory: 1048576 vs 0 bytes.
GPGMM Info (tid:8740): SlabCacheAllocator(3): Failed to allocate memory for request

The issue may be that we're calling Evict and it doesn't evict immediately (as expected). Then, at SlabMemoryAllocator.cpp L104, we bail out of allocating more because availableForAllocation is still less than slabSize.

@bbernhar
Contributor Author

Where do you see an error?

SlabMemoryAllocator.cpp bails out only when there are no slabs left to re-use from the cache.

GPGMM Info (tid:8740): SlabCacheAllocator(3): Failed to allocate memory for request // Not fatal!

uint64_t SlabMemoryAllocator::FindNextFreeSlabOfSize(uint64_t slabSize) const {

When it falls back, you get a heap sized exactly to the resource, which increases CurrentUsage and usually succeeds a bit longer.

// StandaloneMemoryAllocator sub-allocates memory with exactly one block.
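The fallback order described above can be sketched as a small decision function. This is a simplified model under stated assumptions, not GPGMM's `SlabMemoryAllocator`: `Allocate`, `AllocationKind`, and `AllocationResult` are hypothetical, and the cache lookup done by `FindNextFreeSlabOfSize` is reduced to a boolean.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome categories for this sketch.
enum class AllocationKind { ReusedSlab, NewSlab, Standalone };

struct AllocationResult {
    AllocationKind kind;
    std::uint64_t newHeapBytes;  // newly created heap memory, which adds to CurrentUsage
};

// Simplified decision order: reuse a cached free slab, else create a new slab if it fits in
// the memory still available, else fall back to a standalone heap sized exactly to the request.
AllocationResult Allocate(std::uint64_t requestSize, std::uint64_t slabSize,
                          bool hasFreeCachedSlab, std::uint64_t availableForAllocation) {
    if (hasFreeCachedSlab) {
        return {AllocationKind::ReusedSlab, 0};
    }
    if (slabSize <= availableForAllocation) {
        return {AllocationKind::NewSlab, slabSize};
    }
    // In this model, this branch plays the role of the non-fatal "Failed to allocate memory
    // for request" message: the slab path gives up and an exact-size heap is used instead.
    return {AllocationKind::Standalone, requestSize};
}
```

With a 1048576-byte slab and 0 bytes available (the values in the logged message), the model skips the slab and creates only a request-sized heap, which is why usage keeps creeping up in smaller steps rather than failing outright.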

In D3D12ResidencyTests.cpp, Evict always happens on creation:

desc.Flags |= gpgmm::d3d12::ALLOCATOR_FLAG_ALWAYS_IN_BUDGET;

I suspect either the segment wasn't being updated or we are allocating from the wrong segment. See if GetPreferredMemorySegmentGroup() returns a different segment type here:

DXGI_MEMORY_SEGMENT_GROUP ResidencyManager::GetPreferredMemorySegmentGroup(

@bjjones
Contributor

bjjones commented Jul 13, 2022

GetPreferredMemorySegmentGroup appears to return the correct thing (local).

I think what I may be seeing is that we're doing this:

while (currentUsage <= budget) // currentUsage is exactly the budget here
{
    see that we can't allocate another resource and stay under budget, so evict one resource
    allocate one resource // which is the exact same size as the evicted resource
}

We'll never exceed the budget when doing this - usage will always be equal to the budget when we check (and I think this is correct behavior).
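The evict-then-allocate cycle above can be modeled in a few lines to show why a loop guarded by `currentUsage <= budget` never exits. This is a hypothetical simulation, not the test or GPGMM code: `EverExceedsBudget` is an illustrative name, all resources share one size, and `maxIterations` exists only so the sketch itself terminates.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simulation of the evict-then-allocate cycle: all resources share one size,
// and one resident resource is evicted whenever the next one would not fit in the budget.
// Returns true if usage ever goes over the budget within maxIterations.
bool EverExceedsBudget(std::uint64_t budget, std::uint64_t resourceSize,
                       std::uint64_t maxIterations) {
    std::uint64_t currentUsage = 0;
    for (std::uint64_t i = 0; i < maxIterations; ++i) {
        if (currentUsage > budget) {
            return true;  // a loop guarded by currentUsage <= budget would only stop here
        }
        if (currentUsage + resourceSize > budget && currentUsage >= resourceSize) {
            currentUsage -= resourceSize;  // evict one resource of the same size
        }
        currentUsage += resourceSize;  // create the next resource
    }
    return currentUsage > budget;
}
```

With a budget of 4 and resources of size 1, usage climbs to 4 and then stays there forever (evict 1, create 1), so only the iteration cap ends the simulation. Usage goes over budget only when a single resource is larger than the whole budget and there is nothing to evict.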

Is using IsOverBudget() the correct method for this test? The purpose of the residency manager is to ensure we don't go over budget. It seems like we should instead be testing that the segment budget isn't exceeded after we've allocated enough resources to exceed the budget.

@bbernhar
Contributor Author

The test only fails for notifications, correct?

If so, I suspect a race condition. If the notification arrives after a CreateHeap [going over budget] but before Evict is called for the next CreateHeap, then this loop cannot terminate. You can test this theory by setting a breakpoint here:

std::vector<ID3D12Pageable*> objectsToEvict;

If true, we can make the test become deterministic by changing the termination condition to use the same check in GPGMM_SKIP_TEST_IF_NOT_CREATED_IN_BUDGET.

@bjjones
Contributor

bjjones commented Jul 13, 2022

I've been referring to D3D12ResidencyManagerTests.OverBudget, which appears to be an infinite loop on my machine.

D3D12ResidencyManagerTests.OverBudgetUsingBudgetNotifications seems to loop forever too, but I'm getting an additional message that is not present in the other test: GPGMM Debug (tid:20736): ResourceAllocator(0): Current usage exceeded budget (23275970560 vs 23274641408 bytes). So you could be right about the race condition for this test.

@bbernhar bbernhar changed the title Skip tests when OS budget doesn't allow new heaps to be resident Fix OverBudget tests from never terminating. Jul 14, 2022
@bbernhar
Contributor Author

@bjjones Thanks, I see the issue now. Looks like I got lucky with my heap sizes. PTAL again.

Contributor

@bjjones bjjones left a comment

Note that D3D12ResidencyManagerTests.OverBudgetUsingBudgetNotifications fails for me because allocations are not resident. It may just be that we're getting very close to the budget and normal budget fluctuations are causing things to be evicted before we expect them to be. I saw the test pass when I increased the allocation size to 50MB and made the condition change I suggested in this review. Edit: the test is still flaky after making these changes.

@bjjones
Contributor

bjjones commented Jul 14, 2022

I was able to make D3D12ResidencyManagerTests.OverBudgetUsingBudgetNotifications pass reliably by changing

while (resourceAllocator->GetInfo().UsedMemoryUsage + GPGMM_MB_TO_BYTES(1) < memoryUnderBudget) {

to

while (resourceAllocator->GetInfo().UsedMemoryUsage + bufferDesc.Width <= GetBudgetLeft(residencyManager.Get(), bufferMemorySegment)) {
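The effect of guarding on the size actually being allocated can be seen in a simplified model. This sketch is an illustration under assumptions, not the test code: the limit is a fixed number rather than a live value from `GetBudgetLeft`, `used` stands in for `UsedMemoryUsage`, `width` stands in for `bufferDesc.Width`, and the function names and the `kMB` constant are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 1 MB constant, analogous in spirit to GPGMM_MB_TO_BYTES(1).
constexpr std::uint64_t kMB = 1024ull * 1024ull;

// Old-style guard: checks a fixed 1 MB step, but each iteration allocates `width` bytes.
std::uint64_t FillWithFixedStepGuard(std::uint64_t limit, std::uint64_t width) {
    if (width == 0) {
        return 0;  // avoid an endless loop in this sketch
    }
    std::uint64_t used = 0;
    while (used + kMB < limit) {
        used += width;
    }
    return used;
}

// New-style guard: checks the size that is actually about to be allocated.
std::uint64_t FillWithRequestSizeGuard(std::uint64_t limit, std::uint64_t width) {
    if (width == 0) {
        return 0;  // avoid an endless loop in this sketch
    }
    std::uint64_t used = 0;
    while (used + width <= limit) {
        used += width;
    }
    return used;
}
```

With a 120 MB limit and 50 MB buffers, the fixed-step guard lets usage reach 150 MB, while the request-size guard stops at 100 MB. That is consistent with the explanation below that heap sizes not exactly fitting the remaining budget could spill over it.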

@bbernhar bbernhar force-pushed the fix_test branch 2 times, most recently from 869668c to cb18b5e Compare July 14, 2022 18:44
@bbernhar
Contributor Author

Yup, good catch! I believe the underlying issue was that the created heap sizes did not exactly fit the remaining budget and spilled over it slightly, which caused the test to fail.

Thanks again.

@bbernhar
Contributor Author

@bjjones Could you PTAL again? Thanks!

@bbernhar bbernhar requested a review from bjjones July 15, 2022 17:50
Contributor

@bjjones bjjones left a comment

Still seeing flaky failures on the notification test. Everything else LGTM.

Contributor

@bjjones bjjones left a comment

LGTM

@bbernhar bbernhar merged commit 54451a7 into main Jul 16, 2022
@bbernhar bbernhar deleted the fix_test branch July 16, 2022 00:14
@bbernhar
Contributor Author

@bjjones Merging, thanks!

3 participants