Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce mcdc::TVIdxBuilder (LLVM side, NFC) #80676

Merged
merged 17 commits into from
Feb 26, 2024
Merged

Conversation

chapuni
Copy link
Contributor

@chapuni chapuni commented Feb 5, 2024

This is a preparation of incoming Clang changes (#82448) and just checks TVIdx is calculated correctly. NFC.

TVIdxBuilder calculates deterministic Indices for each Condition Node. It is used for clang to emit TestVector indices (aka ID) and for llvm-cov to reconstruct TestVectors.

This includes the unittest CoverageMappingTest.TVIdxBuilder.

See also
https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798

Deprecate `TestVectors`, since no one uses it.

This affects the output order of ExecVectors.
The current impl emits sorted by binary value of ExecVector.
This impl emits along the traversal of `buildTestVector()`.
This accepts current version of profdata. The output might be different.

See also
https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798
@chapuni
Copy link
Contributor Author

chapuni commented Feb 6, 2024

I reworked to align the current implementation.

@chapuni chapuni changed the title Implement MCDCTVIdxBuilder and MCDCTestVectorBuilder (LLVM side) Implement MCDCTVIdxBuilder (LLVM side) Feb 6, 2024
@chapuni chapuni marked this pull request as ready for review February 6, 2024 10:20
Nodes[0].Width = 1;
Q.push_back(0);

unsigned Ord = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a bit more commentary on the overall process of the algorithm (and MCDCTVIdxBuilder)? What are the expected inputs and the expected output, generally? I think it would help with the readability a bit more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added comments and the unittest. Hope it helps!

// `Index` encodes the bitmask of true values and is initially 0.
MCDCRecord::TestVector TV(NumConditions, MCDCRecord::MCDC_DontCare);
buildTestVector(TV, 1, 0);
buildTestVector(TV, 0, 0, 0);
assert(TVIdxs.size() == NumTestVectors && "TVIdxs wasn't fulfilled");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good assert

Copy link
Contributor Author

@chapuni chapuni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look. I will update this evening.

llvm/lib/ProfileData/Coverage/CoverageMapping.cpp Outdated Show resolved Hide resolved
public:
struct MCDCNode {
int InCount = 0; /// Reference count; temporary use
int Width; /// Number of paths (>= 1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not name it NumPaths?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was W in my prototype design. I have to update my imagination if this would be renamed.
Or, does the comment mislead?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/Width_(disambiguation) Width has many meanings for graphs. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is better for me to rename to easy N or to define Width here.
I haven't learnt the graph theory though, looks like I have introduced similar concept, as far as I read introductions of the theory.

Let me think more.

* Split out `Indices[ID][Cond]`
* Let `Nodes` debug-only.
* Introduce `Offset`
* Introduce `HardMaxTVs`
* Sink `TVIdxBuilder` into `mcdc::`.
* The ctor accepts `SmallVector<ConditionIDs>` indexed by `ID`.
* `class NextIDsBuilder` provides `NextIDs` as`SmallVector<ConditionIDs>`,
  for `TVIdxBuilder` to use it before `MCDCRecordProcessor()`.
  It was `BranchParamsMap` or `Map` as `DenseMap<Branch>`.
* `NodeIDs` and `Fetcher` function are deprecated.
Indices[ID][I] = NextNode.Width;
auto NextWidth = int64_t(NextNode.Width) + Node.Width;
if (NextWidth > HardMaxTVs) {
NumTestVectors = HardMaxTVs; // Overflow
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add debug output when there is an overflow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess overflow is impossible in llvm-cov.
OTOH, Clang side will catch such a overflow and tell it with HardMaxTVs for warnings/errors. See MCDCCoverageBuilder in CoverageMappingGen.cpp, and mcdc-error-conditions.cpp.

I wonder it would be valuable to add debug message here.

}
}

std::sort(Decisions.begin(), Decisions.end());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use llvm::sort?

I think it takes ranges so you can just do llvm::sort(Decisions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've forgot it.

@chapuni chapuni changed the title Implement MCDCTVIdxBuilder (LLVM side) Introduce mcdc::TVIdxBuilder (LLVM side, NFC) Feb 26, 2024
@chapuni chapuni merged commit c087beb into llvm:main Feb 26, 2024
3 of 4 checks passed
@chapuni chapuni deleted the mcdc/tvidx branch February 26, 2024 04:23
}
}

// Sort key ordered by <-Width, Ord>
Copy link

@ZhuUx ZhuUx Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse me. Why end nodes of the decision need to be sorted in descending order of width?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So motivation of the sort is to prepare for truncating all conditions with width=1, is it?
If so, why not just remove them then with an O(n) method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you think they could be removed? It's the prerequisite to assign identical IDs between clang and llvm-cov.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, if we did not sort them, we can not reduce size of bitmaps effectively since there might be other conditions with width>1 getting large index.

I'm trying to remove the sort because I find it breaks some nature if the topological relationship of conditions is abnormal (e.g conditions from pattern matching in rust) and generates troubled index maps. I would investigate it more. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Could you introduce me the discussion, or paste the problematic cond tree anywhere?

Copy link

@ZhuUx ZhuUx Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

☺️ Sorry I found I made a mistake. I previously encountered a example with decision tree:

flowchart TD
C0 -- T --> C1
C1 -- F --> F1
C1 -- T --> C3
C0 -- F --> C2
C2 -- T --> C3
C2 -- F --> F2
C3 -- F --> C4
C3 -- T --> T3
C4 -- T --> T4
C4 -- F --> F4
Loading

Normal boolean expressions can not generate such decision tree. In normal boolean decisions, either true next of a condition is a next (true or false) of its false next, or false next of a condition is a next of its true next.

I tried to disable the sort and fixed some cases. So I guessed something was broken by the sort. But that's wrong. The real problem is the way to update cond bitmap, which is specified for previous cond bitmap but is inappropriate for current. This implementation still works for pattern matching, thanks to your excellent efforts!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance, F1 and F2 have w=1, and others have w=2. Correct?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The index map is right. The key difference is at most one of C0 and C2 can be true, which not likes (C0 && C1) || C2. In previous I updated false bit of C2 when C0 is true since if C0 is true, C2 must be false. While at present we can not do this.

@ZhuUx
Copy link

ZhuUx commented Jun 26, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants