Benchmark Radxa Rock5 model B #8

Closed
geerlingguy opened this issue Nov 27, 2022 · 4 comments

geerlingguy commented Nov 27, 2022

Running the benchmark now. Surprisingly, Radxa's Debian OS image doesn't seem to ship with the signing key for their apt repository, so I had to add it to the keyring manually before running the playbook.

geerlingguy commented Nov 28, 2022

First run, no particular optimizations:

    ================================================================================
    HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
    Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
    Modified by Julien Langou, University of Colorado Denver
    ================================================================================
  
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
  
    The following parameter values will be used:
  
    N      :   14745
    NB     :     256
    PMAP   : Row-major process mapping
    P      :       1
    Q      :       4
    PFACT  :   Right
    NBMIN  :       4
    NDIV   :       2
    RFACT  :   Crout
    BCAST  :  1ringM
    DEPTH  :       1
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
  
    --------------------------------------------------------------------------------
  
    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
          ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be               1.110223e-16
    - Computational tests pass if scaled residuals are less than                16.0
  
    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR11C2R4       14745   256     1     4              45.80             4.6669e+01
    HPL_pdgesv() start time Mon Nov 28 06:53:03 2022
  
    HPL_pdgesv() end time   Mon Nov 28 06:53:49 2022
  
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.67524587e-03 ...... PASSED
    ================================================================================
  
    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------
  
    End of Tests.
    ================================================================================

So 46.669 Gflops, not bad at all!
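
As a sanity check, the Gflops figure can be reproduced from `N` and `Time` in the output above. This is a minimal sketch, assuming the standard HPL operation count of 2/3·N³ + 3/2·N² floating-point operations for an N×N solve; the function name `hpl_gflops` is made up for illustration. Because the reported time is rounded to two decimals, the result only agrees with HPL's own number to about two decimal places.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Estimate Gflops for an N x N HPL solve.

    Assumes the usual HPL operation count: 2/3 * N^3 + 3/2 * N^2.
    """
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1e9


# N and Time taken from the result line of the run above.
print(f"{hpl_gflops(14745, 45.80):.2f} Gflops")
```
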

And running with P set to 2 and Q set to 4 (to spread the load across all 8 cores, not just the 4 higher-performance A76 cores), I could eke out just a tiny bit more, at 16W:

    ================================================================================
    HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
    Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
    Modified by Julien Langou, University of Colorado Denver
    ================================================================================
  
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
  
    The following parameter values will be used:
  
    N      :   14745
    NB     :     256
    PMAP   : Row-major process mapping
    P      :       2
    Q      :       4
    PFACT  :   Right
    NBMIN  :       4
    NDIV   :       2
    RFACT  :   Crout
    BCAST  :  1ringM
    DEPTH  :       1
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
  
    --------------------------------------------------------------------------------
  
    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
          ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be               1.110223e-16
    - Computational tests pass if scaled residuals are less than                16.0
  
    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR11C2R4       14745   256     2     4              45.45             4.7029e+01
    HPL_pdgesv() start time Mon Nov 28 15:37:59 2022
  
    HPL_pdgesv() end time   Mon Nov 28 15:38:45 2022
  
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.72233956e-03 ...... PASSED
    ================================================================================
  
    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------
  
    End of Tests.
    ================================================================================

47.029 Gflops at 16W - 2.94 Gflops/W
And 46.669 Gflops at 15W - 3.11 Gflops/W
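
The efficiency numbers are just Gflops divided by the measured power draw. A small sketch of that arithmetic, using the two ATLAS runs above (the labels and the helper name `gflops_per_watt` are made up for illustration; the wattages are the ones quoted in this thread):

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Energy efficiency: sustained Gflops per watt of measured power draw."""
    return gflops / watts


# (label, Gflops from the HPL result line, watts as quoted in the thread)
runs = [
    ("P=1, Q=4", 46.669, 15.0),
    ("P=2, Q=4", 47.029, 16.0),
]

for label, gflops, watts in runs:
    print(f"{label}: {gflops_per_watt(gflops, watts):.2f} Gflops/W")
```

So the 8-process run is slightly faster in absolute terms but slightly less efficient, since the extra watt buys less than 1% more throughput.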

geerlingguy commented Sep 7, 2023

Re-running with BLIS instead of ATLAS... (see #15 and #14)

geerlingguy reopened this Sep 7, 2023

51.382 Gflops at 12.0W = 4.28 Gflops/W

  mpirun_output.stdout: |-
    ================================================================================
    HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
    Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
    Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
    Modified by Julien Langou, University of Colorado Denver
    ================================================================================
  
    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.
  
    The following parameter values will be used:
  
    N      :   14745
    NB     :     256
    PMAP   : Row-major process mapping
    P      :       1
    Q      :       4
    PFACT  :   Right
    NBMIN  :       4
    NDIV   :       2
    RFACT  :   Crout
    BCAST  :  1ringM
    DEPTH  :       1
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
  
    --------------------------------------------------------------------------------
  
    - The matrix A is randomly generated for each test.
    - The following scaled residual check will be computed:
          ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
    - The relative machine precision (eps) is taken to be               1.110223e-16
    - Computational tests pass if scaled residuals are less than                16.0
  
    ================================================================================
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR11C2R4       14745   256     1     4              41.60             5.1382e+01
    HPL_pdgesv() start time Thu Sep  7 20:47:42 2023
  
    HPL_pdgesv() end time   Thu Sep  7 20:48:23 2023
  
    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   4.07780599e-03 ...... PASSED
    ================================================================================
  
    Finished      1 tests with the following results:
                  1 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------
  
    End of Tests.
    ================================================================================
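
To compare runs without reading the numbers off by hand, the result line can be pulled out of the HPL output with a small parser. This is a rough sketch, not part of the benchmark playbook: the regex and the `parse_result` helper are hypothetical, and they assume the result line keeps the `T/V N NB P Q Time Gflops` column order shown above.

```python
import re

# Matches an HPL result line such as:
#   WR11C2R4       14745   256     1     4              41.60             5.1382e+01
RESULT_RE = re.compile(
    r"^\s*(W\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.eE+-]+)\s*$"
)


def parse_result(line: str):
    """Return the fields of an HPL result line as a dict, or None if no match."""
    m = RESULT_RE.match(line)
    if m is None:
        return None
    tv, n, nb, p, q, seconds, gflops = m.groups()
    return {
        "tv": tv,
        "n": int(n),
        "nb": int(nb),
        "p": int(p),
        "q": int(q),
        "seconds": float(seconds),
        "gflops": float(gflops),
    }


# Result lines copied from the first (ATLAS) and last (BLIS) runs in this thread.
atlas = parse_result("WR11C2R4       14745   256     1     4              45.80             4.6669e+01")
blis = parse_result("WR11C2R4       14745   256     1     4              41.60             5.1382e+01")

speedup = blis["gflops"] / atlas["gflops"]
print(f"BLIS vs ATLAS at P=1, Q=4: {speedup:.3f}x")
```

With the numbers above, that works out to roughly a 10% improvement for the same N, NB, P, and Q.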
