Skip to content

Ensure constant time random access in ElemNameSeqReader. #544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 30, 2020

Conversation

omatzcoveo
Copy link
Contributor

Hi there,

Parsing large sequences has performance issues : I traced it back to a quadratic behavior in a loop using ElemNameSeqReader.

ElemNameSeqReader's use of isDefinedAt presuppose constant time random access in its internal seq, but it ends up doing it on a List, I believe fed from AnyElemNameParser. This diff both builds a Vector instead of a List from AnyElemNameParser, and enforces a Vector in ElemNameSeqReader internals. This is successful in fixing the performance issue we've observed at my company.

Let me know if you need me to do something else in the pull request (do I have to do something special to update src_managed/scalaxb/scalaxb.scala ?).

Thanks

@eed3si9n
Copy link
Owner

@omatzcoveo Thanks for the contribution. Could you fix the failing integration tests please?

@omatzcoveo omatzcoveo force-pushed the sequence-parsing-performance branch from 159e645 to 1065537 Compare November 27, 2020 16:02
@omatzcoveo
Copy link
Contributor Author

Last friday I opted to reduce the scope of the change in order not to have any impact on tests and such, and only keep the important bit that removes the quadratic behavior. Thanks!

@eed3si9n eed3si9n merged commit 1d123ed into eed3si9n:master Nov 30, 2020
@eed3si9n
Copy link
Owner

Nicely done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants