Make (declare (not poll-on-return)) the default
feeley committed Jul 26, 2018
1 parent 354298e commit dfe0b57
Showing 3 changed files with 6,684 additions and 7,505 deletions.
doc/gambit.txi (1 addition & 1 deletion)
@@ -4089,7 +4089,7 @@ The default declarations used by the compiler are equivalent to:
 (run-time-bindings)
 (safe)
 (interrupts-enabled)
-(poll-on-return)
+(not poll-on-return)
 (not debug) ;; depends on debugging command line options
 (debug-location) ;; depends on debugging command line options
 (debug-source) ;; depends on debugging command line options
gsc/_ptree1.scm (1 addition & 1 deletion)
@@ -555,7 +555,7 @@
   (env-declare env (list interrupts-enabled-sym #f)))

 (define (poll-on-return? env) ; true when interrupt checks should be generated on procedure returns
-  (declaration-value poll-on-return-sym #f #t env))
+  (declaration-value poll-on-return-sym #f #f env))

 (define (debug? env) ; true iff debugging information should be generated
   (declaration-value debug-sym #f compiler-option-debug env))
[diff of the third changed file not shown]

10 comments on commit dfe0b57

@gambiteer (Collaborator)

I'm curious, did this make a measurable difference in benchmark times?

@feeley (Member, Author) commented on dfe0b57, Aug 3, 2018

On some benchmarks, yes. A more detailed analysis will have to wait for the CPU backend, when we have more control over the generated machine code.

@vyzo (Contributor) commented on dfe0b57, Aug 4, 2018

What is the effect of this on the semantics of code?

@feeley (Member, Author) commented on dfe0b57, Aug 4, 2018

With (not poll-on-return), the polling interval is no longer bounded by a constant. I haven't done an in-depth analysis, but I believe that in practice it will have only a small effect on the average polling latency.
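
For code that does want polling latency to stay bounded even while a deep non-tail recursion unwinds, the previous behaviour can still be requested with a local declaration, just as in the test below; a minimal sketch (the procedure sum-to and the per-procedure declare are illustrative, not from the commit):

(define (sum-to n)
  (declare (poll-on-return)) ;; re-enable interrupt checks on returns for this procedure only
  (if (zero? n)
      0
      (+ n (sum-to (- n 1)))))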

@gambiteer (Collaborator)

Here's a test:

(declare (standard-bindings)
         (extended-bindings)
         (block)
         (fixnum)
         (not safe))

(define (poll-on-return n)
  (declare (poll-on-return))
  (if (zero? n)
      n
      (+ 1 (poll-on-return (- n 1)))))

(define (no-poll-on-return n)
  (declare (not poll-on-return))
  (if (zero? n)
      n
      (+ 1 (no-poll-on-return (- n 1)))))

which gives:

> (time (poll-on-return 10000000))   
(time (poll-on-return 10000000))
    306 ms real time
    306 ms cpu time (262 user, 45 system)
    7 collections accounting for 195 ms real time (177 user, 18 system)
    257720512 bytes allocated
    59259 minor faults
    no major faults
10000000
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    179 ms real time
    179 ms cpu time (151 user, 28 system)
    1 collection accounting for 79 ms real time (75 user, 4 system)
    219203776 bytes allocated
    24625 minor faults
    no major faults
10000000

This is on

model name      : Intel(R) Xeon(R) CPU E3-1271 v3 @ 3.60GHz

with

gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3) 

and

> gsi -v
v4.8.9 20170203122653 x86_64-unknown-linux-gnu "./configure 'CC=gcc -march=native -D___CAN_IMPORT_CLIB_DYNAMICALLY' '--enable-single-host' '--enable-multiple-versions' '--enable-shared'"

@feeley (Member, Author) commented on dfe0b57, Aug 5, 2018

You aren't measuring what you think you are measuring: this is mostly measuring GC time. Your first measurement of poll-on-return grows the heap to a large size to accommodate the 10 million stack frames, which causes lots of GCs; the second measurement just cruises along because the heap is already very large.
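
One way to take the heap growth out of the measurement (assuming Gambit's -:m runtime option, which sets the minimum heap size in kilobytes) is to start the runtime with a large pre-allocated heap, for example:

> gsi -:m1000000

which gives roughly a 1 GB minimum heap, so the 10 million frames fit without repeated heap resizes and collections.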

@gambiteer (Collaborator)

OK. Here are times with a 1GB minimum heap.

> (time (poll-on-return 10000000))   
(time (poll-on-return 10000000))
    55 ms real time
    55 ms cpu time (18 user, 37 system)
    no collections
    no bytes allocated
    39775 minor faults
    no major faults
10000000
> (time (poll-on-return 10000000))
(time (poll-on-return 10000000))
    36 ms real time
    36 ms cpu time (36 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
10000000
> (time (poll-on-return 10000000))
(time (poll-on-return 10000000))
    30 ms real time
    30 ms cpu time (30 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
10000000
> (time (poll-on-return 10000000))
(time (poll-on-return 10000000))
    29 ms real time
    29 ms cpu time (29 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
10000000
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    34 ms real time
    34 ms cpu time (34 user, 0 system)
    no collections
    no bytes allocated
    1 minor fault
    no major faults
10000000
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    28 ms real time
    28 ms cpu time (28 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
10000000
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    33 ms real time
    33 ms cpu time (33 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
10000000
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    33 ms real time
    33 ms cpu time (33 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
10000000
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    30 ms real time
    30 ms cpu time (30 user, 0 system)
    no collections
    no bytes allocated
    1 minor fault
    no major faults
10000000

I thought the call/return cpu times looked big.

@feeley (Member, Author) commented on dfe0b57, Aug 5, 2018

If you want to measure the overhead, use a small n (say around 100 to 1000) and loop that many times. That way the stack overflow and underflow handling will be factored out. And run for >= 1 sec please.

@gambiteer (Collaborator)

It would be easier for me if you just told me all the things I'd have to look out for at once.
Let's try:

(declare (standard-bindings)
         (extended-bindings)
         (block)
         (fixnum)
         (not safe))

(define (poll-on-return n)
  (declare (poll-on-return))
  (do ((i 0 (+ i 1)))
      ((= i n))
    (let loop ((n 100))
      (if (zero? n)
          n
          (+ 1 (loop (- n 1)))))))

(define (no-poll-on-return n)
  (declare (not poll-on-return))
  (do ((i 0 (+ i 1)))
      ((= i n))
    (let loop ((n 100))
      (if (zero? n)
          n
          (+ 1 (loop (- n 1)))))))

which leads to

> (load "tail-poll-test")            
"/home/lucier/programs/gambit/gambit/tail-poll-test.o5"
> (time (poll-on-return 100000000))   
(time (poll-on-return 100000000))
    9494 ms real time
    9494 ms cpu time (9493 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults
> (time (no-poll-on-return 100000000))
(time (no-poll-on-return 100000000))
    9085 ms real time
    9085 ms cpu time (9085 user, 0 system)
    no collections
    no bytes allocated
    no minor faults
    no major faults

If I make the inner loop 1000, then there's almost no difference:

> (time (poll-on-return 10000000))    
(time (poll-on-return 10000000))
    7063 ms real time
    7052 ms cpu time (7052 user, 0 system)
    no collections
    no bytes allocated
    3 minor faults
    no major faults
> (time (no-poll-on-return 10000000))
(time (no-poll-on-return 10000000))
    7001 ms real time
    6991 ms cpu time (6986 user, 5 system)
    no collections
    no bytes allocated
    1 minor fault
    no major faults
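
For reference, assuming the file above is saved as tail-poll-test.scm and gsc is on the PATH, a typical way to reproduce this run would be:

$ gsc tail-poll-test
$ gsi
> (load "tail-poll-test")
> (time (poll-on-return 100000000))

gsc compiles the file to a dynamically loadable object, which load then picks up, as in the transcript above.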

@feeley (Member, Author) commented on dfe0b57, Aug 6, 2018

> It would be easier for me if you just told me all the things I'd have to look out for at once.

As I said, a more detailed analysis will have to wait for the CPU backend, when we have more control over the generated machine code. With the C backend it is hard to draw any conclusions because the C compiler's optimizations interfere.
