Skip to content

Conversation

@tpoterba
Copy link
Contributor

Improve the parse trie to use iteration instead of parser recursion.

Improve the parse trie to use iteration instead of parser recursion.
Copy link
Member

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just have a couple easy optimization ideas you can take or leave.

Comment on lines 13 to 23
class ParseTrieNode(val prev: Char, var hasEnd: Boolean, var children: ArrayBuffer[ParseTrieNode]) {
def search(next: Char): ParseTrieNode = {
var i = 0
while (i < children.size) {
val child = children(i)
if (child.prev == next)
return child
i += 1
}
null
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this has good enough performance for now, I'm happy to approve as is, but when I suggested linear search would probably be faster than binary search or hash lookup, I was assuming linear search through contiguous memory.

That would be easy to achieve here: just remove the prev field, and add a childLabels: Array[Char]. Then you search through childLabels, and descend to the corresponding element of children on a match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'll work, nice.


object ParseTrieNode {
def generate(data: Array[String]): ParseTrieNode = {
val root = new ParseTrieNode(null.asInstanceOf[Char], false, new ArrayBuffer[ParseTrieNode])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the null here. Char is a primitive type, I wouldn't have thought this was allowed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it probably produces null char. null.asInstanceOf[Int] == 0. Did this to be clear that the char here is a placeholder / unused, but using your array[char] makes this unnecessary!

else
Some((s.charAt(0), s.substring(1)))
}.groupBy(_._1).mapValues { v => oneOfLiteral(v.map(_._2)) }
val sb: StringBuilder = new StringBuilder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential optimization: to avoid allocating in the parser (and to effectively intern the literal strings), the hasEnd flag on the trie nodes could be replaced by a possibly null value: String, which is a reference to the element of a that led to that node. Then the parser can just return value, or failure if null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that's nice.

@danking danking merged commit ceb27f3 into hail-is:main Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants