Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locally defined structs are slow in RacketCS #3535

Closed
yjqww6 opened this issue Dec 4, 2020 · 4 comments
Closed

Locally defined structs are slow in RacketCS #3535

yjqww6 opened this issue Dec 4, 2020 · 4 comments
Labels
performance Related to how fast things run racket-cs Specific to Racket-on-Chez

Comments

@yjqww6
Copy link
Contributor

yjqww6 commented Dec 4, 2020

What version of Racket are you using?
7.9 [cs]

What program did you run?

#lang racket
(require racket/unsafe/ops)
(define-syntax-rule (test)
  (begin
    (struct A (a b c d) #:authentic)
    (define (build n)
      (cond
        [(> n 0)
         (define a (build (sub1 n)))
         (A a a a a)]
        [else 1]))
    (define (f x)
      (cond
        [(A? x) (unsafe-fx+ (f (A-a x))
                            (f (A-b x))
                            (f (A-c x))
                            (f (A-d x)))]
        [else x]))

    (define t (build 13))
    (collect-garbage)
    (time (f t))))

(test)

(let () (test))

What should have happened?
I expect they have similar performance, but the local version is 3-4x slower.

cpu time: 172 real time: 193 gc time: 0
67108864
cpu time: 677 real time: 689 gc time: 0
67108864

I consider this behavior comes from the lift pass of schemify. The cp0 output is
module-level (seems optimal):

[f.1 (lambda (x_17)
       (if (#3%record? x_17 struct:A.1)
           (let ([app_21 (f.1 (#3%$object-ref
                               'scheme-object
                               x_17
                               9))])
             (let ([app_22 (f.1 (#3%$object-ref
                                 'scheme-object
                                 x_17
                                 17))])
               (let ([app_23 (f.1 (#3%$object-ref
                                   'scheme-object
                                   x_17
                                   25))])
                 (#3%fx+
                  app_21
                  app_22
                  app_23
                  (f.1 (#3%$object-ref
                        'scheme-object
                        x_17
                        33))))))
           x_17))]

local (struct operations are not inlined):

[f_31 (lambda (A-a_25 A-b_26 A-c_27 A-d_28 A?_24 x_41)
        (if (A?_24 x_41)
            (let ([app_34 (f_31 A-a_25 A-b_26 A-c_27
                                A-d_28 A?_24
                                (A-a_25 x_41))])
              (let ([app_35 (f_31 A-a_25 A-b_26 A-c_27
                                  A-d_28 A?_24
                                  (A-b_26 x_41))])
                (let ([app_36 (f_31 A-a_25 A-b_26 A-c_27
                                    A-d_28 A?_24
                                    (A-c_27 x_41))])
                  (#3%fx+
                   app_34
                   app_35
                   app_36
                   (f_31 A-a_25 A-b_26 A-c_27 A-d_28
                         A?_24 (A-d_28 x_41))))))
            x_41))]
@jackfirth jackfirth added performance Related to how fast things run racket-cs Specific to Racket-on-Chez labels Dec 5, 2020
@mflatt
Copy link
Member

mflatt commented Dec 7, 2020

I see two possibilities here: re-implement Racket-style lifting (currently in schemify) at the Chez Scheme layer some time after cp0, or just inline struct operations at the schemify layer. The later looks much easier.

Racket-style lifting ensures that when a function is locally defined and only referenced as a called function (i.e., not as a first-class function), then a closure is never allocated. Instead, values that would otherwise be in the closure are converted to arguments that are passed at each call site. There are cases where creating a closure can be faster than passing extra arguments, but usually not. Experience from the first couple of years of porting Racket to Chez Scheme convinced me that this part of Racket's cost model is important to keep. There's even a macro-based implementation of lifting for the Rumble layer in "racket/src/cs/rumble/define.ss" so that the Racket cost model works there. (Otherwise, for example, the Rumble layer's equal? allocates too much.)

This lifting interferes with inlining, and that was the original motivation for inlining support in schemify. However, schemify doesn't currently try to inline the application of struct operations like predicates and selectors. The authentic operations are more opaque to schemify than to cp0, but schemify could replace an authentic struct selector with (record-accessor <rtd> <port>) and effectively force cp0 to inline.

Adding a Racket-style lifting pass to Chez Scheme would be more general, and that might reduce the need for inlining in schemify. Then again, unless support for cross-linklet optimization is also pushed down to the Chez Scheme level, then inlining is still needed at schemify. Finally, lifting at schemify is probably still needed for code that has to be interpreted (when the enclosing module is too large), unless we find another solution to that problem.

@mflatt
Copy link
Member

mflatt commented Dec 8, 2020

It turns out that inlining at the schemify layer is especially easy: it's just a matter of removing the constraint on the existing inlining implementation that makes it apply only to imported variables.

I'm looking into this in combination with potential changes to avoid the tables of constructors, predicates, accessors, and mutators.

@mflatt mflatt closed this as completed in e02c417 Dec 9, 2020
@yjqww6
Copy link
Contributor Author

yjqww6 commented Dec 17, 2020

Just experimented on a lift pass for chez (working on L6). Some outputs from chez bootstrapping:

(time (for-each (lambda (...) ...) ...))
    526 collections
    11.601378041s elapsed cpu time, including 4.044462000s collecting
    11.633799000s elapsed real time, including 4.056826000s collecting
    8384797152 bytes allocated, including 8382437424 bytes reclaimed

vs

(time (for-each (lambda (...) ...) ...))
    638 collections
    12.246427944s elapsed cpu time, including 4.239168000s collecting
    12.308074000s elapsed real time, including 4.262574000s collecting
    10216838816 bytes allocated, including 10214522000 bytes reclaimed

The gc pressure was reduced by about 18%, though the influence on running time is small.

@mflatt
Copy link
Member

mflatt commented Dec 18, 2020

Nice! I look forward to your results.

maueroats pushed a commit to maueroats/racket that referenced this issue Jun 17, 2021
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
agressively inline structure operations, which can avoid a significant
performance penalty for local structure types.

Closes racket#3535
maueroats pushed a commit to maueroats/racket that referenced this issue Jun 17, 2021
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
agressively inline structure operations, which can avoid a significant
performance penalty for local structure types.

Closes racket#3535
mflatt added a commit to mflatt/ChezScheme that referenced this issue Oct 10, 2023
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
agressively inline structure operations, which can avoid a significant
performance penalty for local structure types.

Closes racket/racket#3535

Original commit: racket/ChezScheme@497351b
mflatt added a commit to mflatt/ChezScheme that referenced this issue Oct 10, 2023
Avoid a global table to register structure procedures, and instead use
a wrapper procedure. At the same time, adjust schemify to more
agressively inline structure operations, which can avoid a significant
performance penalty for local structure types.

Closes racket/racket#3535

Original commit: racket/ChezScheme@54b5839
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Related to how fast things run racket-cs Specific to Racket-on-Chez
Projects
None yet
Development

No branches or pull requests

3 participants