Use more StringScanner based API to parse XML #114

Commits on Feb 24, 2024

  1. Changed to frozen_string_literal: true.

    ## Why?
    Because `s.check("a")` is slower than `s.check("a".freeze)`: without the magic comment, every call allocates a new String for the literal.
    
    - benchmark/stringscan_2.yaml
    ```
    loop_count: 100000
    contexts:
      - name: No YJIT
        prelude: |
          $LOAD_PATH.unshift(File.expand_path("lib"))
          require 'rexml'
    
    prelude: |
      require 'strscan'
      s = StringScanner.new('abcdefg hijklmn opqrstu vwxyz')
      ptn = "a"
    benchmark:
      'check("a")'            : s.check("a")
      'check("a".freeze)'     : s.check("a".freeze)
      'ptn="a";s.check(ptn)'  : |
        ptn="a"
        s.check(ptn)
      'check(ptn)'            : s.check(ptn)
    ```
    
    ```
    $ benchmark-driver benchmark/stringscan_2.yaml
    Comparison:
              check(ptn):  13524479.4 i/s
       check("a".freeze):  13433638.1 i/s - 1.01x  slower
              check("a"):  10231225.8 i/s - 1.32x  slower
    ptn="a";s.check(ptn):  10013017.0 i/s - 1.35x  slower
    ```
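
    The gap between `check("a")` and `check("a".freeze)` is an allocation effect, and object identity makes it visible. The following is a minimal sketch, assuming it runs in a file without the `frozen_string_literal: true` magic comment; `plain_literal` and `frozen_literal` are hypothetical helper names used only for illustration.

    ```ruby
    require 'strscan'

    # Hypothetical helpers, for illustration only.
    def plain_literal
      "a"          # builds a new String per call unless frozen_string_literal is on
    end

    def frozen_literal
      "a".freeze   # returns the same frozen String on every call
    end

    s = StringScanner.new('abcdefg hijklmn opqrstu vwxyz')

    # true: both calls return the very same object
    same_frozen = frozen_literal.equal?(frozen_literal)
    # false without the magic comment: two distinct objects
    same_plain = plain_literal.equal?(plain_literal)

    # The match result is the same either way; only the allocation differs.
    matched = s.check(frozen_literal)
    ```

    With `frozen_string_literal: true` at the top of the file, the plain literal behaves like the frozen one, which is what this commit relies on.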
    naitoh committed Feb 24, 2024
    0656925

Commits on Feb 26, 2024

  1. Changed processing in REXML::Parsers::BaseParser#pull_event from regular expression to processing using StringScanner.
    
    ## Why
    Improve maintainability by restructuring pull_event so that parsing advances through the input with StringScanner#scan.
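
    To illustrate what scan-driven parsing looks like, the sketch below reads the start of a DOCTYPE declaration one token at a time. It is not the actual `pull_event` code: `read_doctype_name` is a hypothetical helper, and the name pattern is a simplification of the XML `Name` production.

    ```ruby
    require 'strscan'

    # Hypothetical helper: returns the DOCTYPE name, or nil when the input
    # is not a DOCTYPE declaration or the name is missing.
    def read_doctype_name(scanner)
      return nil unless scanner.scan(/<!DOCTYPE/)
      return nil unless scanner.scan(/\s+/)     # the mandatory S after the keyword
      scanner.scan(/[A-Za-z_:][\w:.-]*/)        # simplified Name
    end

    scanner = StringScanner.new('<!DOCTYPE root [ <!-- c --> ]>')
    name = read_doctype_name(scanner)
    rest = scanner.rest                         # unparsed remainder, left for the next step

    # A declaration without a name yields nil instead of a match.
    no_name = read_doctype_name(StringScanner.new('<!DOCTYPE>'))
    ```

    Each `scan` call either consumes a token and moves the position forward or returns nil, so the control flow follows the grammar step by step rather than relying on one large regular expression.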
    
    ## Changed
    - Added Source#string= method for error message output.
    - Added TestParseDocumentTypeDeclaration#test_no_name test case.
    - In the `intSubset` of DOCTYPE, added handling for `Comment`, which begins with "<!" like the other markup declarations.
    
    [intSubset Spec]
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-doctypedecl
    > [28] 	doctypedecl   ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-intSubset
    > [28b] intSubset   ::=  (markupdecl | DeclSep)*
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-markupdecl
    > [29]  markupdecl   ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-elementdecl
    > [45]  elementdecl   ::=   '<!ELEMENT' S Name S contentspec S? '>'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-AttlistDecl
    > [52] 	AttlistDecl   ::=   '<!ATTLIST' S Name AttDef* S? '>'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-EntityDecl
    > [70] 	EntityDecl   ::=   GEDecl | PEDecl
    > [71] 	GEDecl	   ::=   '<!ENTITY' S Name S EntityDef S? '>'
    > [72] 	PEDecl	   ::=   '<!ENTITY' S '%' S Name S PEDef S? '>'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-NotationDecl
    > [82] 	NotationDecl   ::=   '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PI
    > [16] 	PI	   ::=   '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Comment
    > [15] 	Comment	   ::=   '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-DeclSep
    > [28a] DeclSep	   ::=   PEReference | S
    
    https://www.w3.org/TR/2006/REC-xml11-20060816/#NT-PEReference
    > [69]  PEReference   ::=   '%' Name ';'
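
    The productions above show why `Comment` needs separate handling inside `intSubset`: it shares the "<!" prefix with the declarations but ends with "-->" rather than ">". Here is a rough sketch of telling the items apart with StringScanner, assuming simplified terminators (a ">" inside a quoted literal is not handled); `ITEMS` and `subset_item_kinds` are hypothetical names, not REXML API.

    ```ruby
    require 'strscan'

    # [opening pattern, kind, closing pattern] for each intSubset item.
    ITEMS = [
      [/<!--/,       :comment,      /-->/],
      [/<!ELEMENT/,  :elementdecl,  />/],
      [/<!ATTLIST/,  :attlistdecl,  />/],
      [/<!ENTITY/,   :entitydecl,   />/],
      [/<!NOTATION/, :notationdecl, />/],
      [/<\?/,        :pi,           /\?>/],
      [/%/,          :pereference,  /;/],
    ].freeze

    # Hypothetical helper: lists the kinds of items found in an internal subset.
    def subset_item_kinds(source)
      scanner = StringScanner.new(source)
      kinds = []
      loop do
        scanner.skip(/\s+/)                    # DeclSep: S
        break if scanner.eos?
        _opener, kind, closer = ITEMS.find { |start, _, _| scanner.scan(start) }
        raise ArgumentError, "unexpected input at #{scanner.pos}" unless kind
        raise ArgumentError, "unterminated #{kind}" unless scanner.scan_until(closer)
        kinds << kind
      end
      kinds
    end

    kinds = subset_item_kinds('<!ELEMENT a (#PCDATA)> <!-- note --> %ext; <?pi data?>')
    ```

    For the sample input, `kinds` lists an element declaration, a comment, a parameter-entity reference, and a processing instruction, in that order.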
    
    [Benchmark]
    
    ```
    RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
    ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
    Calculating -------------------------------------
                             before       after  before(YJIT)  after(YJIT)
                     dom     11.240      10.569        17.173       18.219 i/s -     100.000 times in 8.896882s 9.461267s 5.823007s 5.488884s
                     sax     31.812      30.716        48.383       52.532 i/s -     100.000 times in 3.143500s 3.255655s 2.066861s 1.903600s
                    pull     36.855      36.354        56.718       61.443 i/s -     100.000 times in 2.713300s 2.750693s 1.763099s 1.627523s
                  stream     34.176      34.758        49.801       54.622 i/s -     100.000 times in 2.925991s 2.877065s 2.008003s 1.830779s
    
    Comparison:
                                  dom
             after(YJIT):        18.2 i/s
            before(YJIT):        17.2 i/s - 1.06x  slower
                  before:        11.2 i/s - 1.62x  slower
                   after:        10.6 i/s - 1.72x  slower
    
                                  sax
             after(YJIT):        52.5 i/s
            before(YJIT):        48.4 i/s - 1.09x  slower
                  before:        31.8 i/s - 1.65x  slower
                   after:        30.7 i/s - 1.71x  slower
    
                                 pull
             after(YJIT):        61.4 i/s
            before(YJIT):        56.7 i/s - 1.08x  slower
                  before:        36.9 i/s - 1.67x  slower
                   after:        36.4 i/s - 1.69x  slower
    
                               stream
             after(YJIT):        54.6 i/s
            before(YJIT):        49.8 i/s - 1.10x  slower
                   after:        34.8 i/s - 1.57x  slower
                  before:        34.2 i/s - 1.60x  slower
    
    ```
    
    - YJIT=ON : 1.06x - 1.10x faster
    - YJIT=OFF : 0.94x - 1.01x (dom, sax, and pull are slightly slower; only stream is faster)
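
    These ratios can be recomputed from the i/s figures in the table as after divided by before. A small sketch follows (`RESULTS` and `ratios` are names chosen here); note that rounding to two decimals gives 1.02x for stream without YJIT, where the summary states 1.01x.

    ```ruby
    # i/s figures copied from the benchmark table.
    RESULTS = {
      dom:    { before: 11.240, after: 10.569, before_yjit: 17.173, after_yjit: 18.219 },
      sax:    { before: 31.812, after: 30.716, before_yjit: 48.383, after_yjit: 52.532 },
      pull:   { before: 36.855, after: 36.354, before_yjit: 56.718, after_yjit: 61.443 },
      stream: { before: 34.176, after: 34.758, before_yjit: 49.801, after_yjit: 54.622 },
    }.freeze

    # Speed ratio = after / before; values above 1.0 mean the change is faster.
    ratios = RESULTS.transform_values do |r|
      {
        no_yjit: (r[:after] / r[:before]).round(2),
        yjit:    (r[:after_yjit] / r[:before_yjit]).round(2),
      }
    end

    ratios.each do |api, r|
      puts format("%-6s no YJIT: %.2fx  YJIT: %.2fx", api, r[:no_yjit], r[:yjit])
    end
    ```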
    
    Co-authored-by: Sutou Kouhei <kou@clear-code.com>
    naitoh and kou committed Feb 26, 2024
    54b0298