Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Fix performance of parse_locus_interval #11216

Merged
merged 2 commits into from
Jan 14, 2022

Conversation

tpoterba
Copy link
Contributor

Improve the parse trie to use iteration instead of parser recursion.

Improve the parse trie to use iteration instead of parser recursion.
Copy link
Collaborator

@patrick-schultz patrick-schultz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just have a couple easy optimization ideas you can take or leave.

Comment on lines 13 to 23
class ParseTrieNode(val prev: Char, var hasEnd: Boolean, var children: ArrayBuffer[ParseTrieNode]) {
def search(next: Char): ParseTrieNode = {
var i = 0
while (i < children.size) {
val child = children(i)
if (child.prev == next)
return child
i += 1
}
null
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this has good enough performance for now, I'm happy to approve as is, but when I suggested linear search would probably be faster than binary search or hash lookup, I was assuming linear search through contiguous memory.

That would be easy to achieve here: just remove the prev field, and add a childLabels: Array[Char]. Then you search through childLabels, and descend to the corresponding element of children on a match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that'll work, nice.


object ParseTrieNode {
def generate(data: Array[String]): ParseTrieNode = {
val root = new ParseTrieNode(null.asInstanceOf[Char], false, new ArrayBuffer[ParseTrieNode])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the null here. Char is a primitive type, I wouldn't have thought this was allowed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it probably produces null char. null.asInstanceOf[Int] == 0. Did this to be clear that the char here is a placeholder / unused, but using your array[char] makes this unnecessary!

else
Some((s.charAt(0), s.substring(1)))
}.groupBy(_._1).mapValues { v => oneOfLiteral(v.map(_._2)) }
val sb: StringBuilder = new StringBuilder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another potential optimization: to avoid allocating in the parser (and to effectively intern the literal strings), the hasEnd flag on the trie nodes could be replaced by a possibly null value: String, which is a reference to the element of a that led to that node. Then the parser can just return value, or failure if null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, that's nice.

@danking danking merged commit ceb27f3 into hail-is:main Jan 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants