Fixed the wrong results bug for certain tile sizes. #17
The previous version of Tiled-MM was producing wrong results for some of the matrix multiplication parameters occurring in the RPA simulations:
Careful analysis showed that the problem occurs only in some cases where the tile size is slightly smaller than the corresponding matrix dimension. Here, tile_m was 5000 and the dimension m was 5427, which leaves a remainder of 427 rows beyond the first full tile.
Reducing the problem to a smaller size at which the bug is still reproducible led to the following case, which gives wrong results:
This bug was present in the older version of Tiled-MM as well. It did not show up in the unit tests because all the tests in the previous versions used the default tile size of 5000.
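To illustrate why a single default tile size can hide this kind of bug, here is a minimal, language-neutral sketch in Python (not the Tiled-MM test suite, and not its C++/GPU code). It compares a toy tiled multiplication against a plain reference for every combination of small tile sizes, including ones that do not divide the matrix dimensions. All function names here (`matmul_reference`, `matmul_tiled`, `check_all_tile_sizes`) are hypothetical and exist only for this example.

```python
import itertools


def matmul_reference(a, b):
    """Plain triple-loop matrix product used as the ground truth."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def matmul_tiled(a, b, tile_m, tile_n, tile_k):
    """Toy tiled product; edge tiles are clipped to the matrix bounds."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        i1 = min(i0 + tile_m, m)  # last tile along m may be smaller
        for j0 in range(0, n, tile_n):
            j1 = min(j0 + tile_n, n)  # last tile along n may be smaller
            for p0 in range(0, k, tile_k):
                p1 = min(p0 + tile_k, k)  # last tile along k may be smaller
                for i in range(i0, i1):
                    for j in range(j0, j1):
                        c[i][j] += sum(a[i][p] * b[p][j]
                                       for p in range(p0, p1))
    return c


def check_all_tile_sizes(m, n, k):
    """Return every (tile_m, tile_n, tile_k) that gives a wrong result."""
    # Integer inputs keep the comparison exact (no floating-point tolerance).
    a = [[i * k + p + 1 for p in range(k)] for i in range(m)]
    b = [[p * n + j + 1 for j in range(n)] for p in range(k)]
    expected = matmul_reference(a, b)
    failures = []
    # Sweep tile sizes from 1 up to one past each dimension, so that
    # dividing, non-dividing, and oversized tiles are all covered.
    for tm, tn, tk in itertools.product(
            range(1, m + 2), range(1, n + 2), range(1, k + 2)):
        if matmul_tiled(a, b, tm, tn, tk) != expected:
            failures.append((tm, tn, tk))
    return failures
```

Calling `check_all_tile_sizes(5, 4, 3)` returns an empty list when every tile configuration agrees with the reference. A test that only ever uses one large default tile size exercises just one point of this sweep.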
After debugging, we realized the problem was in how the actual tile sizes were calculated here. The actual tile size can be smaller than the nominal tile size, in particular for the last tile along a dimension that is not divisible by the corresponding tile size.
We fixed this problem by passing the tile coordinates to the function that computes the actual tile sizes (here and here). This function already existed, but it seems the tile coordinates were accidentally left out of the function call.
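The dependence of the actual tile size on the tile coordinate can be sketched as follows. This is an illustrative Python sketch under the usual assumption that tiles are laid out contiguously and only the last one is clipped. It is not the Tiled-MM function itself, and the names `num_tiles`, `actual_tile_size`, and `tile_sizes` are hypothetical.

```python
def num_tiles(dim, tile):
    """Number of tiles needed to cover a dimension of length dim."""
    if dim <= 0 or tile <= 0:
        raise ValueError("dim and tile must be positive")
    return (dim + tile - 1) // tile  # ceiling division


def actual_tile_size(dim, tile, tile_idx):
    """Size of the tile at coordinate tile_idx along one dimension.

    Every tile has the nominal size except possibly the last one,
    which only covers what is left of the dimension.
    """
    if not 0 <= tile_idx < num_tiles(dim, tile):
        raise IndexError("tile index out of range")
    return min(tile, dim - tile_idx * tile)


def tile_sizes(dim, tile):
    """Actual sizes of all tiles along one dimension, in order."""
    return [actual_tile_size(dim, tile, t) for t in range(num_tiles(dim, tile))]
```

For the case above, m = 5427 with tile_m = 5000, `tile_sizes(5427, 5000)` gives `[5000, 427]`. Without the tile coordinate, the two tiles cannot be told apart, so the size of the last tile cannot be determined.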
Another issue that we discovered was a small typo where the value of the alpha parameter was printed instead of beta (here).
This PR fixes these problems and produces correct results for all the unit tests: