Xml Parsing with large embedded SVG and Data URI can be very slow #7

Closed
Nanonid opened this issue Sep 9, 2014 · 8 comments

Nanonid commented Sep 9, 2014

Bountysource

I could really use this performance fix. :-)

I don't have a file I can share just yet, but this diagram.xml file has a large embedded SVG and images encoded as data URIs.

Unfortunately, those large strings are attribute values, not text nodes.

I realize I should probably change the XML model, but I can't do that anytime soon.

Diagram: 2420642 diagram_export_1.xml

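```dart
import 'dart:io';
import 'package:unittest/unittest.dart';
import 'package:xml/xml.dart';

void main() {
  group("diagram", () {
    test("parse visit", () {
      // Read the whole export; the embedded SVG and data URIs live in attributes.
      String xml = new File("diagram_export_1.xml").readAsStringSync();
      Stopwatch stopwatch = new Stopwatch()..start();
      XmlDocument doc = parse(xml);
      print("parse time ${stopwatch.elapsed}");
      stopwatch.reset();
      // DiaTransformer and DiaDoc are the reporter's own classes (not shown).
      DiaTransformer xform = new DiaTransformer();
      DiaDoc dia = xform.visitDocument(doc);
      print("visit ${stopwatch.elapsed}");
    });
  });
}
```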
```
unittest-suite-wait-for-done
parse time 0:00:20.558550
visit 0:00:00.010319
PASS: diagram parse visit

All 1 tests passed.
unittest-suite-success
```
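The original file can't be shared, so as a hypothetical stand-in, a sketch like the following builds a document whose bulk sits in a single attribute value shaped like a base64 data URI. The function name and the `diagram`, `icon`, and `href` names are made up for illustration, and whether this input reproduces the 20-second parse has not been tested.

```dart
/// Hypothetical helper: builds an XML document with one very large
/// attribute value, mimicking an image embedded as a base64 data URI.
/// [payloadLength] is the number of base64 characters in the attribute.
String buildSyntheticDiagram(int payloadLength) {
  const alphabet =
      'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  final payload = StringBuffer();
  for (var i = 0; i < payloadLength; i++) {
    payload.write(alphabet[i % alphabet.length]);
  }
  // The large string lives in an attribute, not in a text node,
  // matching the situation described in this issue.
  return '<diagram><icon href="data:image/png;base64,$payload"/></diagram>';
}
```

Passing something like `buildSyntheticDiagram(2000000)` to the parser would exercise the attribute-value path with no `&` characters in the value.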

Nanonid commented Sep 9, 2014

Profiling shows `_JSSyntaxRegExp._ExecuteMatch` taking 98% of CPU time.


Nanonid commented Sep 9, 2014

I reduced the file to just the large icons and accidentally left in a stray tag.
It still took 20 seconds to report this error:
`Expected </dia:Model>, but found </Icons>`


Nanonid commented Sep 9, 2014

Hacking on the parser now.


Nanonid commented Sep 9, 2014

I thought quote scanning might be the issue, so I hacked in an `anyExcept` scanner (probably not correct). It still took 20 seconds.

```dart
import 'package:petitparser/petitparser.dart';

/**
 * Returns a parser that consumes input up to, but not including, the first
 * occurrence of [element], or up to the end of the input if [element] never
 * occurs. It fails only when no input remains.
 *
 * Roughly similar to `any().starLazy(char(element))`, except that it also
 * succeeds when [element] is missing.
 */
Parser anyExcept(String element, [String message = 'input expected']) {
  return new AnyExceptParser(element, message);
}

class AnyExceptParser extends Parser {
  final String _message;
  final String _element;
  final int _code;

  AnyExceptParser(String element, this._message)
      : _element = element,
        _code = element.codeUnitAt(0);

  @override
  Result parseOn(Context context) {
    var start = context.position;
    var buffer = context.buffer;
    if (start >= buffer.length) {
      return context.failure(_message);
    }
    int len = buffer.length;
    int stop = start;
    // Compare by value with != rather than identical(): equal strings are
    // not guaranteed to be the same object.
    while (stop < len && buffer[stop] != _element) {
      stop++;
    }
    var result = buffer is String
        ? buffer.substring(start, stop)
        : buffer.sublist(start, stop);
    return context.success(result, stop);
  }

  @override
  Parser copy() => new AnyExceptParser(_element, _message);

  @override
  bool equalProperties(AnyExceptParser other) {
    return super.equalProperties(other) &&
        _code == other._code &&
        _message == other._message;
  }
}
```
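For comparison, the same scan can be sketched without the parser framework: `String.indexOf` finds the closing quote in one linear pass. The name `scanAttributeValue` is hypothetical, and like the hack above it does no entity decoding, which is consistent with the 20-second timing above showing that the scan itself was not the bottleneck.

```dart
/// Hypothetical helper: returns the raw text of a quoted attribute value
/// that starts at [start] (the index just after the opening quote), or
/// null if the closing [quote] never appears.
String? scanAttributeValue(String input, int start, {String quote = '"'}) {
  // indexOf walks the string once, so the cost is linear in the value length.
  final end = input.indexOf(quote, start);
  if (end < 0) return null; // unterminated value
  return input.substring(start, end);
}
```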


Nanonid commented Sep 9, 2014

The culprit is `_decodeXml`! Dropping the `.map(_decodeXml)` step from the attribute-value parser:

```dart
      .seq(any().starLazy(char(DOUBLE_QUOTE)).flatten())
//      .seq(any().starLazy(char(DOUBLE_QUOTE)).flatten().map(_decodeXml))
```

```
unittest-suite-wait-for-done
parse time 0:00:00.363150
PASS: diagram parse visit
```


Nanonid commented Sep 9, 2014

I sent a pull request that mostly fixes this issue.
If you want to keep improving performance and usability, I'll let the bounty stand, if you care to.


Nanonid commented Sep 9, 2014

Final results with the new decoder:

```
unittest-suite-wait-for-done
parse time 0:00:00.591667
visit 0:00:00.009809
PASS: diagram parse visit

All 1 tests passed.
unittest-suite-success
```
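The decoder from the pull request isn't shown in this thread. As an illustration only, here is a minimal sketch of a single-pass entity decoder, assuming just the five predefined XML entities plus decimal and hexadecimal character references; `decodeXmlEntities` and its helpers are hypothetical names. The point is the cost profile: a value with no `&`, such as a base64 data URI, is returned unchanged after one `indexOf` scan, and otherwise the output is built once in a `StringBuffer`, so the work stays linear in the input length.

```dart
// Hypothetical sketch, not the code from the pull request.
const _predefinedEntities = {
  'lt': '<',
  'gt': '>',
  'amp': '&',
  'quot': '"',
  'apos': "'",
};

/// Converts a parsed numeric reference to a string, or null if invalid.
String? _fromCodePoint(int? code) =>
    code != null && code >= 0 && code <= 0x10FFFF
        ? String.fromCharCode(code)
        : null;

/// Decodes the five predefined XML entities and &#NNN; / &#xHH;
/// references in one left-to-right pass. Unrecognised references
/// are copied through unchanged.
String decodeXmlEntities(String input) {
  var amp = input.indexOf('&');
  // Fast path: values without '&' (e.g. base64 data URIs) are returned as-is.
  if (amp < 0) return input;

  final out = StringBuffer();
  var pos = 0;
  while (amp >= 0) {
    out.write(input.substring(pos, amp));

    // Look for ';' only within a short window: every reference handled
    // here is under 10 characters, which keeps the decode linear.
    var semi = -1;
    final limit = amp + 12 < input.length ? amp + 12 : input.length;
    for (var i = amp + 1; i < limit; i++) {
      if (input.codeUnitAt(i) == 0x3B /* ';' */) {
        semi = i;
        break;
      }
    }

    String? replacement;
    if (semi > amp + 1) {
      final name = input.substring(amp + 1, semi);
      if (name.startsWith('#x') || name.startsWith('#X')) {
        replacement =
            _fromCodePoint(int.tryParse(name.substring(2), radix: 16));
      } else if (name.startsWith('#')) {
        replacement = _fromCodePoint(int.tryParse(name.substring(1)));
      } else {
        replacement = _predefinedEntities[name];
      }
    }

    if (replacement != null) {
      out.write(replacement);
      pos = semi + 1;
    } else {
      // Not a recognised reference: emit the '&' literally and move on.
      out.write('&');
      pos = amp + 1;
    }
    amp = input.indexOf('&', pos);
  }
  out.write(input.substring(pos));
  return out.toString();
}
```

Applied to a data-URI attribute, the fast path returns immediately, which is the kind of value that dominated this file.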


Nanonid commented Sep 9, 2014

Working for now.
