Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the wrong results bug for certain tile sizes. #17

Merged
merged 2 commits into from
Feb 7, 2024
Merged

Conversation

kabicm
Copy link
Collaborator

@kabicm kabicm commented Feb 7, 2024

The previous version of Tiled-MM was producing wrong results for some of the matrix multiplication parameters occurring in the RPA simulations:

m = 5427, n = 217, k = 2170, alpha = 1, beta = 0, tile_m = tile_n = tile_k = 5000

After a careful analysis, it was shown that the problem occurs when the tile size is a bit smaller than the corresponding matrix dimension, and only in some cases. Here, we had the tile_m size 5000 and the dimension m = 5427.

Further reducing the problem size to a smaller case where the bug is reproducible, led to the following case, giving wrong results:

srun -N 1 -n 1 ./tests/test-multiply -m 5 -n 2 -k 1 --tile_m 4 --tile_n 2 --tile_k 1 --beta 0 --n_streams 2
==================================================
                Benchmarking Tiled-MM
==================================================
         MATRIX SIZES
=============================
 A = (5, 1)
 B = (1, 2)
 C = (5, 2)
=============================
         LEADING DIMS
=============================
 LD_A = 5
 LD_B = 1
 LD_C = 5
=============================
      SCALING CONSTANTS
=============================
 alpha = 1
 beta  = 1
=============================
      TRANSPOSE FLAGS
=============================
 trans_a = N
 trans_b = N
=============================
         TILE SIZES
=============================
 tile_m = 4
 tile_n = 2
 tile_k = 1
=============================
      ADDITIONAL OPTIONS
=============================
 num. of gpu streams = 2
 num. of repetitions = 1
=============================
Initial values in matrix A:
3
7
9
1
7

Initial values in matrix B:
7	5

Initial values in matrix C:
5	0
1	4
4	8
1	3
0	6

Correct result C = beta*C + alpha*A*B:
21	15
49	35
63	45
7	5
49	35

Time [ms] with copying C back: 0
Computed result by Tiled-MM with copying C back :
21	15
49	35
63	45
7	5
49	0

Time [ms] without copying C back: 0
Computed result by Tiled-MM without copying C back :
21	0
49	0
63	0
7	5
49	35

The result is NOT CORRECT

This bug was present in the older version of Tiled-MM as well. The reason it was not showing up in unit tests, was that all the tests in the previous versions were using the default tile size of 5000.

After debugging, we realized the problem was in the way how the actual tile sizes were calculated here. The actual tile sizes can be smaller than the original tile sizes, especially if the matrix dimension is not divisible by the corresponding tile size.

We fixed this problem by specifying the tile coordinates in the function which computes the actual tile sizes (here and here). This function already existed, but it seems the tile coordinates were accidentally left out in the function invocation.

Another issue that we discovered was a small typo where the value of the alpha parameter was printed instead of beta (here).

This PR fixes these problems and produces correct results for all the unit tests:

srun -N 1 -n 1 ./tests/test-multiply -m 5 -n 2 -k 1 --tile_m 4 --tile_n 2 --tile_k 1 --beta 0 --n_streams 2
==================================================
                Benchmarking Tiled-MM
==================================================
         MATRIX SIZES
=============================
 A = (5, 1)
 B = (1, 2)
 C = (5, 2)
=============================
         LEADING DIMS
=============================
 LD_A = 5
 LD_B = 1
 LD_C = 5
=============================
      SCALING CONSTANTS
=============================
 alpha = 1
 beta  = 0
=============================
      TRANSPOSE FLAGS
=============================
 trans_a = N
 trans_b = N
=============================
         TILE SIZES
=============================
 tile_m = 4
 tile_n = 2
 tile_k = 1
=============================
      ADDITIONAL OPTIONS
=============================
 num. of gpu streams = 2
 num. of repetitions = 1
=============================
Initial values in matrix A:
3
7
9
1
7

Initial values in matrix B:
7	5

Initial values in matrix C:
5	0
1	4
4	8
1	3
0	6

Correct result C = beta*C + alpha*A*B:
21	15
49	35
63	45
7	5
49	35

Time [ms] with copying C back: 0
Computed result by Tiled-MM with copying C back :
21	15
49	35
63	45
7	5
49	35

Time [ms] without copying C back: 0
Computed result by Tiled-MM without copying C back :
21	15
49	35
63	45
7	5
49	35

The result is CORRECT

@kabicm kabicm merged commit 85331eb into master Feb 7, 2024
1 check failed
@simonpintarelli
Copy link
Member

@kabicm Thank you very much for the help! I've merged the CI already, seems everything is in good shape again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants