Commit
[pseudo] Placeholder disambiguation strategy: always choose second
Mostly mechanics here. Interesting decisions:
- apply disambiguation in-place instead of copying the forest (debatable, but even the final tree size is significant)
- split decide/apply into different functions: this allows the hard part (decide) to be tested non-destructively and combined with HTML forest easily
- add non-const accessors to forest to enable apply
- unit tests but no lit tests: my plan is to test actual C++ disambiguation heuristics with lit, generic disambiguation mechanics without the C++ grammar

Differential Revision: https://reviews.llvm.org/D132487
1 parent
4a56470
commit 56c54cf
Showing 10 changed files with 269 additions and 11 deletions.
64 changes: 64 additions & 0 deletions
clang-tools-extra/pseudo/include/clang-pseudo/Disambiguate.h
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
//===--- Disambiguate.h - Find the best tree in the forest -------*- C++-*-===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
// | ||
// A GLR parse forest represents every possible parse tree for the source code. | ||
// | ||
// Before we can do useful analysis/editing of the code, we need to pick a | ||
// single tree which we think is accurate. We use three main types of clues: | ||
// | ||
// A) Semantic language rules may restrict which parses are allowed. | ||
// For example, `string string string X` is *grammatical* C++, but only a | ||
// single type-name is allowed in a decl-specifier-sequence. | ||
// Where possible, these interpretations are forbidden by guards. | ||
// Sometimes this isn't possible, or we want our parser to be lenient. | ||
// | ||
// B) Some constructs are rarer, while others are common. | ||
// For example `a<b>::c` is often a template specialization, and rarely a | ||
// double comparison between a, b, and c. | ||
// | ||
// C) Identifier text hints whether they name types/values/templates etc. | ||
// "std" is usually a namespace, a project index may also guide us. | ||
// Hints may be within the document: if one occurrence of 'foo' is a variable | ||
// then the others probably are too. | ||
// (Text need not match: similar CaseStyle can be a weak hint, too). | ||
// | ||
//----------------------------------------------------------------------------// | ||
// | ||
// Mechanically, we replace each ambiguous node with its best alternative. | ||
// | ||
// "Best" is determined by assigning bonuses/penalties to nodes, to express | ||
// the clues of type A and B above. A forest node representing an unlikely | ||
// parse would apply a penalty to every subtree it is present in. | ||
// Disambiguation proceeds bottom-up, so that the score of each alternative | ||
// is known when a decision is made. | ||
// | ||
// Identifier-based hints within the document mean some nodes should be | ||
// *correlated*. Rather than resolve these simultaneously, we make the most | ||
// certain decisions first and use these results to update bonuses elsewhere. | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include "clang-pseudo/Forest.h" | ||
|
||
namespace clang::pseudo { | ||
|
||
struct DisambiguateParams {}; | ||
|
||
// Maps ambiguous nodes onto the index of their preferred alternative. | ||
using Disambiguation = llvm::DenseMap<const ForestNode *, unsigned>; | ||
|
||
// Resolve each ambiguous node in the forest. | ||
// Maps each ambiguous node to the index of the chosen alternative. | ||
// FIXME: current implementation is a placeholder and always chooses the second alternative. | ||
Disambiguation disambiguate(const ForestNode *Root, | ||
const DisambiguateParams &Params); | ||
|
||
// Remove all ambiguities from the forest, resolving them according to Disambig. | ||
void removeAmbiguities(ForestNode *&Root, const Disambiguation &Disambig); | ||
|
||
} // namespace clang::pseudo |
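
The header comment describes bottom-up scoring, which this commit does not yet implement (the placeholder always picks index 1). As a rough illustration only, here is a minimal sketch of what "score each alternative, then pick the best" could look like. The `Node` type, its `Penalty` field, the `Choice` alias, and the `score` function are all hypothetical stand-ins, not the real `ForestNode` API, and the sketch covers only penalties for clue types A and B, not the correlated identifier hints.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-in for a forest node; NOT the real clang::pseudo::ForestNode.
struct Node {
  enum Kind { Terminal, Sequence, Ambiguous } K = Terminal;
  int Penalty = 0;              // Hypothetical cost attached to this node.
  std::vector<Node *> Children; // Elements (Sequence) or alternatives (Ambiguous).
};

// Hypothetical analogue of Disambiguation: ambiguous node -> chosen index.
using Choice = std::unordered_map<const Node *, unsigned>;

// Returns the total penalty of the cheapest tree rooted at N and records the
// index of the cheapest alternative for every ambiguous node visited.
// Children are scored before their parent, i.e. bottom-up.
int score(const Node *N, Choice &Out) {
  switch (N->K) {
  case Node::Terminal:
    return N->Penalty;
  case Node::Sequence: {
    int Total = N->Penalty;
    for (const Node *C : N->Children)
      Total += score(C, Out);
    return Total;
  }
  case Node::Ambiguous: {
    assert(!N->Children.empty() && "ambiguous node needs alternatives");
    unsigned Best = 0;
    int BestScore = score(N->Children[0], Out);
    for (unsigned I = 1; I < N->Children.size(); ++I) {
      int S = score(N->Children[I], Out);
      if (S < BestScore) {
        BestScore = S;
        Best = I;
      }
    }
    Out[N] = Best;
    return BestScore;
  }
  }
  return 0; // Unreachable; silences compiler warnings.
}
```

A real forest shares subtrees between alternatives, so a practical version would probably memoize scores per node rather than recompute them as this sketch does.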
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
//===--- Disambiguate.cpp - Find the best tree in the forest --------------===// | ||
// | ||
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
// See https://llvm.org/LICENSE.txt for license information. | ||
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
// | ||
//===----------------------------------------------------------------------===// | ||
|
||
#include "clang-pseudo/Disambiguate.h" | ||
|
||
namespace clang::pseudo { | ||
|
||
Disambiguation disambiguate(const ForestNode *Root, | ||
const DisambiguateParams &Params) { | ||
// FIXME: this is a dummy placeholder strategy, implement a real one! | ||
Disambiguation Result; | ||
for (const ForestNode &N : Root->descendants()) { | ||
if (N.kind() == ForestNode::Ambiguous) | ||
Result.try_emplace(&N, 1); | ||
} | ||
return Result; | ||
} | ||
|
||
void removeAmbiguities(ForestNode *&Root, const Disambiguation &D) { | ||
std::vector<ForestNode **> Queue = {&Root}; | ||
while (!Queue.empty()) { | ||
ForestNode **Next = Queue.back(); | ||
Queue.pop_back(); | ||
switch ((*Next)->kind()) { | ||
case ForestNode::Sequence: | ||
for (ForestNode *&Child : (*Next)->elements()) | ||
Queue.push_back(&Child); | ||
break; | ||
case ForestNode::Ambiguous: { | ||
assert(D.count(*Next) != 0 && "disambiguation is incomplete!"); | ||
ForestNode *ChosenChild = (*Next)->alternatives()[D.lookup(*Next)]; | ||
*Next = ChosenChild; | ||
Queue.push_back(Next); | ||
break; | ||
} | ||
case ForestNode::Terminal: | ||
case ForestNode::Opaque: | ||
break; | ||
} | ||
} | ||
} | ||
|
||
} // namespace clang::pseudo |
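
To make the in-place rewrite in `removeAmbiguities` easier to follow, the sketch below reproduces the same worklist idea on a toy tree. The `Node` type, the `Choice` alias, and `resolveInPlace` are hypothetical stand-ins for illustration, not the real clang-pseudo types. The worklist holds pointers to pointer slots so that a slot can be overwritten with the chosen alternative, and the slot is re-queued after overwriting because the chosen alternative may itself be ambiguous or contain ambiguous descendants.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-in for a forest node; NOT the real clang::pseudo::ForestNode.
struct Node {
  enum Kind { Terminal, Sequence, Ambiguous } K = Terminal;
  std::vector<Node *> Children; // Elements (Sequence) or alternatives (Ambiguous).
};

// Hypothetical analogue of Disambiguation: ambiguous node -> chosen index.
using Choice = std::unordered_map<const Node *, unsigned>;

// Rewrites every reachable pointer to an ambiguous node so that it points at
// the chosen alternative instead. Nodes are never copied or freed.
void resolveInPlace(Node *&Root, const Choice &D) {
  std::vector<Node **> Queue = {&Root};
  while (!Queue.empty()) {
    Node **Slot = Queue.back();
    Queue.pop_back();
    switch ((*Slot)->K) {
    case Node::Sequence:
      // Visit each child slot so it can be overwritten later if needed.
      for (Node *&Child : (*Slot)->Children)
        Queue.push_back(&Child);
      break;
    case Node::Ambiguous: {
      auto It = D.find(*Slot);
      assert(It != D.end() && "disambiguation is incomplete!");
      // Overwrite the slot, then revisit it to process the chosen subtree.
      *Slot = (*Slot)->Children[It->second];
      Queue.push_back(Slot);
      break;
    }
    case Node::Terminal:
      break;
    }
  }
}
```

With the real API from this commit, the decide and apply steps are separate calls: `disambiguate(Root, Params)` produces the `Disambiguation` map without touching the forest, and `removeAmbiguities(Root, Disambig)` then applies it destructively.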