-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Fix performance of parse_locus_interval #11216
Conversation
Improve the parse trie to use iteration instead of parser recursion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just have a couple easy optimization ideas you can take or leave.
class ParseTrieNode(val prev: Char, var hasEnd: Boolean, var children: ArrayBuffer[ParseTrieNode]) { | ||
def search(next: Char): ParseTrieNode = { | ||
var i = 0 | ||
while (i < children.size) { | ||
val child = children(i) | ||
if (child.prev == next) | ||
return child | ||
i += 1 | ||
} | ||
null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this has good enough performance for now, I'm happy to approve as is, but when I suggested linear search would probably be faster than binary search or hash lookup, I was assuming linear search through contiguous memory.
That would be easy to achieve here: just remove the prev
field, and add a childLabels: Array[Char]
. Then you search through childLabels
, and descend to the corresponding element of children
on a match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'll work, nice.
|
||
object ParseTrieNode { | ||
def generate(data: Array[String]): ParseTrieNode = { | ||
val root = new ParseTrieNode(null.asInstanceOf[Char], false, new ArrayBuffer[ParseTrieNode]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the null
here. Char
is a primitive type, I wouldn't have thought this was allowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it probably produces null char. null.asInstanceOf[Int] == 0
. Did this to be clear that the char here is a placeholder / unused, but using your array[char] makes this unnecessary!
else | ||
Some((s.charAt(0), s.substring(1))) | ||
}.groupBy(_._1).mapValues { v => oneOfLiteral(v.map(_._2)) } | ||
val sb: StringBuilder = new StringBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential optimization: to avoid allocating in the parser (and to effectively intern the literal strings), the hasEnd
flag on the trie nodes could be replaced by a possibly null value: String
, which is a reference to the element of a
that led to that node. Then the parser can just return value
, or failure if null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, that's nice.
Improve the parse trie to use iteration instead of parser recursion.