Skip to content

macdice/some-io-tests

Repository files navigation

Here I plan to collect some small programs for measuring various I/O effects,
and log some simple results.  This isn't real benchmarking and isn't on server
class hardware, so take it with a pinch of salt, lime and tequila.

For ZFS, the following experimental fadvise support patch is applied (FreeBSD
legacy and OpenZFS FreeBSD/Linux versions):

   https://github.com/freebsd/freebsd/compare/master...macdice:fadvise
   https://github.com/zfsonfreebsd/ZoF/compare/projects/pr-rebase...macdice:fadvise
   openzfs/zfs#9807

The tests are run on a couple of different machines with 1GB or 4GB of memory and
NVMEe storage, using a 10GB data file so it doesn't fit in memory.

Experiment 2019-12-23: Does prefetch interfere with forward sequential prefetch heuristics?
===========================================================================================

Linux/ext4: yes, very much
FreeBSD/zfs (patched): not much (maybe just syscall overhead?)

tmunro@debian10:~/projects/some-io-tests$ time ./sequential-read teng 0 1

real    0m5.595s
user    0m0.050s
sys     0m2.708s

tmunro@debian10:~/projects/some-io-tests$ time ./sequential-read teng 1 1

real    1m22.506s
user    0m1.091s
sys     0m31.512s


tmunro@freebsd13:~/projects/some-io-tests $ time ./sequential-read /tank1/test/teng 0
       25.03 real         0.07 user        24.28 sys

tmunro@freebsd13:~/projects/some-io-tests $ time ./sequential-read /tank1/test/teng 1
       28.06 real         0.13 user        27.38 sys

(These numbers are quite repeatable, I'm just showing one result for brevity;
the ext4 and zfs results can't be compared, rather different systems, that
isn't the point here).

Conclusion: PostgreSQL should be careful never to generate sequential
fadvise() calls (so long as we use buffered I/O).

Experiment 2019-12-23: Does prefetching backwards help?
=======================================================

Linux/ext4: yes
FreeBSD/zfs (patched): yes

$ ./sequential-read /tank1/test/teng 0 -1

Very bad:

  sequential-read                                     pread ns                                          
           value  ------------- Distribution ------------- count    
           65536 |                                         0        
          131072 |@                                        1594     
          262144 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    42704    
          524288 |@@                                       2230     
         1048576 |                                         130      
         2097152 |                                         10       
         4194304 |                                         3        
         8388608 |                                         0        
        16777216 |                                         1        
        33554432 |                                         0    

And with prefetching:

$ ./sequential-read /tank1/test/teng 20 -1

Very good:

  sequential-read                                     posix_fadvise ns
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |                                         2559
            4096 |@@@@@@@@@@@@@@@@@@@@@@                   182352
            8192 |@@@@@@@@@@@                              94481
           16384 |@@@@@@                                   48742
           32768 |@                                        8443
           65536 |                                         797
          131072 |                                         426
          262144 |                                         325
          524288 |                                         148
         1048576 |                                         55
         2097152 |                                         9
         4194304 |                                         0

  sequential-read                                     pread ns
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |                                         1
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@                213351
            8192 |@@@@@@@@                                 67696
           16384 |@@@@                                     30043
           32768 |@                                        4272
           65536 |@                                        4713
          131072 |@                                        7630
          262144 |@                                        8522
          524288 |                                         1775
         1048576 |                                         174
         2097152 |                                         31
         4194304 |                                         6
         8388608 |                                         0
        16777216 |                                         1
        33554432 |                                         0

Similar effects are visible on Linux/ext4.

Conclusion: PostgreSQL should should still generate fadvise() calls for
backwards scans.

Experiment 2019-12-22: ZFS with explicit prefetching of random pages
====================================================================

root@freebsd13:~/some-io-tests # time ./random-read /tank/db/teng 0 100000 0
0.001u 0.172s 0:18.81 0.9%      175+5784k 89140+0io 0pf+0w
root@freebsd13:~/some-io-tests # time ./random-read /tank/db/teng 0 100000 0
0.001u 0.170s 0:18.79 0.9%      175+5784k 88872+0io 0pf+0w
root@freebsd13:~/some-io-tests # time ./random-read /tank/db/teng 0 100000 0
0.003u 0.171s 0:20.00 0.8%      168+5567k 89242+0io 0pf+0w

Prefetching 20 blocks ahead with posix_fadvise(POSIX_FADV_WILLNEED), I see a ~4x speed-up:

root@freebsd13:~/some-io-tests # time ./random-read /tank/db/teng 20 100000 0
0.001u 0.142s 0:04.02 3.4%      108+3564k 89218+0io 0pf+0w
root@freebsd13:~/some-io-tests # time ./random-read /tank/db/teng 20 100000 0
0.002u 0.191s 0:06.21 3.0%      110+3633k 89177+0io 0pf+0w
root@freebsd13:~/some-io-tests # time ./random-read /tank/db/teng 20 100000 0
0.004u 0.187s 0:05.79 3.1%      132+3807k 89190+0io 0pf+0w

With dtrace we can see the ARC hits and misses pretty clearly:

  random-read                                         pread ns                                          
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |                                         42       
            8192 |                                         1207     
           16384 |@@@                                      8337     
           32768 |                                         829      
           65536 |                                         555      
          131072 |@@@@@@@@@@@@@@@@@@@@@@@@@@               65995    
          262144 |@@@@@@@@                                 19639    
          524288 |@                                        2395     
         1048576 |                                         788      
         2097152 |                                         126      
         4194304 |                                         69       
         8388608 |                                         12       
        16777216 |                                         7        
        33554432 |                                         0   

When the stream of "advice" calls is added, the ARC misses clustered around 131us
are gone, and all pread() calls cluster around the 8us bucket:

  random-read                                         posix_fadvise ns                                  
           value  ------------- Distribution ------------- count    
            1024 |                                         0        
            2048 |                                         280      
            4096 |@@                                       2564     
            8192 |@@@@@@@@@@@@@@@@@@                       23685    
           16384 |@@@@@@@@@@@@@@@@@@                       23669    
           32768 |@@                                       2890     
           65536 |                                         196      
          131072 |                                         40       
          262144 |                                         17       
          524288 |                                         11       
         1048576 |                                         2        
         2097152 |                                         0        

  random-read                                         pread ns                                          
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |                                         165      
            8192 |@@@@@@@@@@@@@@@@@@@                      25734    
           16384 |@@@@@@@@@@@@@@@@@                        23085    
           32768 |@@                                       2307     
           65536 |                                         643      
          131072 |@                                        865      
          262144 |                                         386      
          524288 |                                         79       
         1048576 |                                         27       
         2097152 |                                   

Similar effects visible on Linux/ext4.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published