-
-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement ASR pass for promoting allocatables to non-allocatables #2734
Conversation
53a61b9
to
1520984
Compare
Do you see any speedups? |
I will check tomorrow. |
Nopes not in |
Thanks! I have a few questions:
The code in this PR seems ok, I just want to have good testing of this, or understanding what exactly it does and only call it when it makes sense. |
Here's one example where it helps in speeding up the code, program debug
implicit none
integer :: i, s
s = 0
do i = 1, 200000000
call prg(10, 10, 10, s)
end do
print *, s
contains
subroutine prg(d1, d2, d3, s)
integer, intent(in) :: d1, d2, d3
integer, intent(out) :: s
integer, allocatable :: d(:, :, :)
allocate(d(d1, d2, d3))
s = s + size(d)
end subroutine
end program Without the pass (lf) 18:53:01:~/lfortran_project/lfortran % lfortran /Users/czgdp1807/lfortran_project/debug.f90 --skip-pass=promote_allocatable_to_nonallocatable -o a.out
(lf) 18:53:05:~/lfortran_project/lfortran % time ./a.out
-1863462912
./a.out 4.28s user 0.01s system 99% cpu 4.287 total With the pass (lf) 18:53:40:~/lfortran_project/lfortran % lfortran /Users/czgdp1807/lfortran_project/debug.f90 -o a.out
(lf) 18:53:56:~/lfortran_project/lfortran % time ./a.out
-1863462912
./a.out 3.53s user 0.01s system 99% cpu 3.537 total That's 1.2x speed up. Because you are avoiding one allocate and one deallocate statement. |
Beautiful! I think this is an optimization pass then. Let's only run it for |
The speedup seems to disappear when you use
I can reproduce the speedup without Further analysis would be needed. However, since this is an optimization pass, we need to find a benchmark that is spedup with In Debug mode (the default) I think this should not be run even if it speeds things up. It might be that our allocatable and non-allocatable arrays are the same fast, since both are allocated on the heap in the same way. However, we still want this pass, since eventually the non-allocatable arrays should be faster, and they provide further optimization possibilities (to be implemented). So to move forward:
|
P.S. I tested this with subroutine prg(d1, d2, d3, s)
integer(4), dimension(:, :, :), allocatable :: d
integer(4), intent(in) :: d1
integer(4), intent(in) :: d2
integer(4), intent(in) :: d3
integer(4), intent(out) :: s
allocate(d(d1, d2, d3))
s = s + size(d)
deallocate(d) ! Implicit deallocate
deallocate(d) ! Implicit deallocate
end subroutine prg Into: subroutine prg(d1, d2, d3, s)
integer(4), intent(in) :: d1
integer(4), intent(in) :: d2
integer(4), intent(in) :: d3
integer(4), dimension(d1, d2, d3) :: d
integer(4), intent(out) :: s
s = s + size(d)
end subroutine prg So the pass clearly works. One can see that there are two implicit Btw, the new ASR->Fortran output is awesome, it's my favorite way of seeing what the ASR passes do (thanks to @Thirumalai-Shaktivel, @khushi-411, @Shaikh-Ubaid, @Pranavchiku and others). |
One more advantage of this pass, it helps in avoiding memory leaks wherever possible like, program debug
implicit none
integer :: i, s
s = 0
call prg(10, 10, 10, s)
print *, s
contains
subroutine prg(d1, d2, d3, s)
integer, intent(in) :: d1, d2, d3
integer, intent(out) :: s
integer, allocatable :: d(:, :, :)
s = 0
do i = 1, 20
allocate(d(d1, d2, d3))
s = s + size(d)
end do
end subroutine
end program With this pass (lf) 13:06:04:~/lfortran_project/lfortran % lfortran /Users/czgdp1807/lfortran_project/debug.f90 --fast -o a.out
(lf) 13:06:17:~/lfortran_project/lfortran % leaks --atExit -- ./a.out
a.out(3901) MallocStackLogging: could not tag MSL-related memory as no_footprint, so those pages will be included in process footprint - (null)
a.out(3901) MallocStackLogging: recording malloc and VM allocation stacks using lite mode
20000
Process 3901 is not debuggable. Due to security restrictions, leaks can only show or save contents of readonly memory of restricted processes.
Process: a.out [3901]
Path: /Users/USER/*/a.out
Load Address: 0x100f1c000
Identifier: a.out
Version: 0
Code Type: ARM64
Platform: macOS
Parent Process: leaks [3900]
Date/Time: 2023-10-31 13:06:19.833 +0530
Launch Time: 2023-10-31 13:06:19.797 +0530
OS Version: macOS 14.0 (23A344)
Report Version: 7
Analysis Tool: /usr/bin/leaks
Physical footprint: 3265K
Physical footprint (peak): 3265K
Idle exit: untracked
----
leaks Report Version: 4.0, multi-line stacks
Process 3901: 191 nodes malloced for 15 KB
Process 3901: 0 leaks for 0 total leaked bytes. Without this ASR pass (lf) 12:59:40:~/lfortran_project/lfortran % lfortran /Users/czgdp1807/lfortran_project/debug.f90 --skip-pass=promote_allocatable_to_nonallocatable --fast -o a.out
(lf) 12:59:48:~/lfortran_project/lfortran % leaks --atExit -- ./a.out
a.out(3822) MallocStackLogging: could not tag MSL-related memory as no_footprint, so those pages will be included in process footprint - (null)
a.out(3822) MallocStackLogging: recording malloc and VM allocation stacks using lite mode
20000
Process 3822 is not debuggable. Due to security restrictions, leaks can only show or save contents of readonly memory of restricted processes.
Process: a.out [3822]
Path: /Users/USER/*/a.out
Load Address: 0x102404000
Identifier: a.out
Version: 0
Code Type: ARM64
Platform: macOS
Parent Process: leaks [3821]
Date/Time: 2023-10-31 12:59:50.290 +0530
Launch Time: 2023-10-31 12:59:50.252 +0530
OS Version: macOS 14.0 (23A344)
Report Version: 7
Analysis Tool: /usr/bin/leaks
Physical footprint: 3313K
Physical footprint (peak): 3313K
Idle exit: untracked
----
leaks Report Version: 4.0, multi-line stacks
Process 3822: 210 nodes malloced for 91 KB
Process 3822: 19 leaks for 77824 total leaked bytes.
STACK OF 19 INSTANCES OF 'ROOT LEAK: <malloc in _lfortran_malloc>':
2 a.out 0x102407eb8 main + 44
1 liblfortran_runtime.0.20.3.dylib 0x1028a3e5c _lfortran_malloc + 24 lfortran_intrinsics.c:1580
0 libsystem_malloc.dylib 0x180b08a44 _malloc_zone_malloc_instrumented_or_legacy + 136
====
19 (76.0K) << TOTAL >>
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124008200> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124009200> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12400c600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12400d600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12400e600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12400f600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124010600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124011600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124012600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124013600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124014600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124015600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124016600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124017600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124018600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x124019600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12401a600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12401b600> [4096]
1 (4.00K) ROOT LEAK: <malloc in _lfortran_malloc 0x12401c600> [4096]
Binary Images:
0x102404000 - 0x102407fff +a.out (0) <4681AC7F-0AC3-3A87-823E-F4693369BA5E> /Users/*/a.out
0x102888000 - 0x10288bffb libLeaksAtExit.dylib (64561.82.2) <79DF57F9-0882-3396-B7CA-B66DDA910C71> /usr/lib/libLeaksAtExit.dylib
0x10289c000 - 0x1028a7fff +liblfortran_runtime.0.20.3.dylib (0) <7493D1DE-3E9A-3E86-A92A-C11FF3EDB73D> /Users/*/liblfortran_runtime.0.20.3.dylib
0x1808f8000 - 0x180946f08 libobjc.A.dylib (906) <1C0364B5-D7E3-3170-BAA6-9C28F400E7CC> /usr/lib/libobjc.A.dylib
0x180947000 - 0x1809da873 dyld (1.0.0 - 1122.1) <FFD8AB66-C9AB-31DF-AB80-3A3DFF367DDD> /usr/lib/dyld
0x1809db000 - 0x1809dfff8 libsystem_blocks.dylib (90) <A81CDBDB-D1B1-386E-9029-2F6E0F685E3A> /usr/lib/system/libsystem_blocks.dylib
0x1809e0000 - 0x180a26fff libxpc.dylib (2679.0.25) <84971772-939B-3130-8908-B7A38851A516> /usr/lib/system/libxpc.dylib
0x180a27000 - 0x180a41fff libsystem_trace.dylib (1481.0.12) <4AF0EA7D-E9F4-30E7-914D-FB292EDB7F37> /usr/lib/system/libsystem_trace.dylib
0x180a42000 - 0x180adefc7 libcorecrypto.dylib (1608.0.18) <B3AC12C0-8ED6-35A2-86C6-0BFA55BFF333> /usr/lib/system/libcorecrypto.dylib
0x180adf000 - 0x180b15fff libsystem_malloc.dylib (474.0.13) <00379936-C90E-3D9D-A569-3F97338F1472> /usr/lib/system/libsystem_malloc.dylib
0x180b16000 - 0x180b5cfff libdispatch.dylib (1462.0.4) <8790BA20-19EC-3A36-8975-E34382D9747C> /usr/lib/system/libdispatch.dylib
0x180b5d000 - 0x180b5ffff libsystem_featureflags.dylib (85) <0C18273A-0DB8-3522-853B-67A2187EEF07> /usr/lib/system/libsystem_featureflags.dylib
0x180b60000 - 0x180bdeff3 libsystem_c.dylib (1583.0.14) <1B84A7E4-8958-330C-98B8-27D491DFF69E> /usr/lib/system/libsystem_c.dylib
0x180bdf000 - 0x180c6cff7 libc++.1.dylib (1600.151) <93FBD681-FC78-33A3-AFFE-568702676142> /usr/lib/libc++.1.dylib
0x180c6d000 - 0x180c84fff libc++abi.dylib (1600.151) <43940F08-B65E-3888-8CD3-C52398EB1CA1> /usr/lib/libc++abi.dylib
0x180c85000 - 0x180cbffef libsystem_kernel.dylib (10002.1.13) <A7D3C07D-0A1E-3C4C-8FBA-66905E16BF99> /usr/lib/system/libsystem_kernel.dylib
0x180cc0000 - 0x180cccff3 libsystem_pthread.dylib (519) <E4DEBB6E-421D-33D0-9E17-77AE0E0FE4DC> /usr/lib/system/libsystem_pthread.dylib
0x180ccd000 - 0x180cf1fff libdyld.dylib (1122.1) <4BB77515-DBA8-3EDF-9AF7-3C9EAE959EA6> /usr/lib/system/libdyld.dylib
0x180cf2000 - 0x180cf8ffb libsystem_platform.dylib (306.0.1) <E7E2F10A-772A-397F-BD19-A7EA5A54E49F> /usr/lib/system/libsystem_platform.dylib
0x180cf9000 - 0x180d25ffb libsystem_info.dylib (583.0.1) <8F2A7530-B8D3-3705-B2B7-9347D9495223> /usr/lib/system/libsystem_info.dylib
0x184069000 - 0x184073ff7 libsystem_darwin.dylib (1583.0.14) <67509907-7BCE-3723-A305-0B042A41FDA5> /usr/lib/system/libsystem_darwin.dylib
0x1844db000 - 0x1844ebfff libsystem_notify.dylib (317) <7DA7113E-E87A-3E78-AD4F-77F6EC076E42> /usr/lib/system/libsystem_notify.dylib
0x1862fa000 - 0x186313ff7 libsystem_networkextension.dylib (1834.0.1.0.1) <9DBF57DB-0008-3584-89A3-1463AB9E4967> /usr/lib/system/libsystem_networkextension.dylib
0x18638a000 - 0x1863a1fff libsystem_asl.dylib (398) <2EA1DA8E-A94D-32EA-A64F-DD586FE72599> /usr/lib/system/libsystem_asl.dylib
0x187d3d000 - 0x187d45ff3 libsystem_symptoms.dylib (1848.0.5) <2D7D536C-1D08-304A-9E67-BFB1E99301FE> /usr/lib/system/libsystem_symptoms.dylib
0x18ad89000 - 0x18adb2ffb libsystem_containermanager.dylib (582.0.8) <8E716F51-A02E-3005-9F6F-84163BFAF1AE> /usr/lib/system/libsystem_containermanager.dylib
0x18bcf9000 - 0x18bcfdfff libsystem_configuration.dylib (1296.0.1) <5E85DB71-174E-3E61-A3C9-36D5E0CBBF46> /usr/lib/system/libsystem_configuration.dylib
0x18bcfe000 - 0x18bd03ff3 libsystem_sandbox.dylib (2169.0.12) <3BC895C5-B10E-39F4-B761-E79D5719FB1D> /usr/lib/system/libsystem_sandbox.dylib
0x18ca17000 - 0x18ca19ffb libquarantine.dylib (169) <3011123B-C35C-35FC-AB0B-8D775C9109D1> /usr/lib/system/libquarantine.dylib
0x18d11c000 - 0x18d121ffb libsystem_coreservices.dylib (152) <0ECFD7D1-0230-3C32-BDD7-4715D94A28E6> /usr/lib/system/libsystem_coreservices.dylib
0x18d43c000 - 0x18d472ff3 libsystem_m.dylib (3252) <72899DE6-A120-389C-90DC-B94C392EF655> /usr/lib/system/libsystem_m.dylib
0x18d477000 - 0x18d47effb libmacho.dylib (1009) <1A7038EC-EE49-35AE-8A3C-C311083795FB> /usr/lib/system/libmacho.dylib
0x18d49f000 - 0x18d4acff7 libcommonCrypto.dylib (600025) <BFDF8F55-D3DC-3A92-B8A1-8EF165A56F1B> /usr/lib/system/libcommonCrypto.dylib
0x18d4ad000 - 0x18d4b7fff libunwind.dylib (1600.112) <FB2051CF-36B6-348F-B8DF-5B2F702F6290> /usr/lib/system/libunwind.dylib
0x18d4b8000 - 0x18d4bffff liboah.dylib (312) <5A83FBC3-0944-3D77-8DF2-03E262F67AFA> /usr/lib/liboah.dylib
0x18d4c0000 - 0x18d4c9ff3 libcopyfile.dylib (196) <65612C42-C5E4-3821-B71D-DDE620FB014C> /usr/lib/system/libcopyfile.dylib
0x18d4ca000 - 0x18d4cdfff libcompiler_rt.dylib (103.1) <B29A99B2-7ADE-3371-A774-B690BEC3C406> /usr/lib/system/libcompiler_rt.dylib
0x18d4ce000 - 0x18d4d2ffb libsystem_collections.dylib (1583.0.14) <FA0C6F96-A737-3F29-B640-99E313873A21> /usr/lib/system/libsystem_collections.dylib
0x18d4d3000 - 0x18d4d5ffb libsystem_secinit.dylib (139) <1C92CD80-54E6-3E70-BB3C-7264A160AD7C> /usr/lib/system/libsystem_secinit.dylib
0x18d4d6000 - 0x18d4d8ffb libremovefile.dylib (70) <CE63E895-25E4-350C-BB79-0EFD3AC899E5> /usr/lib/system/libremovefile.dylib
0x18d4d9000 - 0x18d4d9ffb libkeymgr.dylib (31) <F7CE9486-FFF5-3CB8-B26F-75811EF4283A> /usr/lib/system/libkeymgr.dylib
0x18d4da000 - 0x18d4e2fff libsystem_dnssd.dylib (2200.0.8) <82426BAB-A0A9-3C99-8A86-DABB3A0BA072> /usr/lib/system/libsystem_dnssd.dylib
0x18d4e3000 - 0x18d4e8fff libcache.dylib (92) <C683623C-1FF6-3133-9E28-28672FDBA4D3> /usr/lib/system/libcache.dylib
0x18d4e9000 - 0x18d4eafff libSystem.B.dylib (1336) <F0A54B2D-8751-35F1-A3CF-F1A02F842211> /usr/lib/libSystem.B.dylib
0x1a565f000 - 0x1a566aff7 com.apple.MallocStackLogging (1.0 - 64561.91.2) <A193B4F2-B10B-3785-AF87-9E17FD9E2999> /System/Library/PrivateFrameworks/MallocStackLogging.framework/Versions/A/MallocStackLogging
0x220087000 - 0x22008afff libsystem_darwindirectory.dylib (86.0.2) <F1E52C1A-7156-3BED-AD5B-1CE3AFFDFCE3> /usr/lib/system/libsystem_darwindirectory.dylib Speed Up (lf) 14:08:20:~/lfortran_project/lfortran % lfortran /Users/czgdp1807/lfortran_project/debug.f90 --fast -o a.out
(lf) 14:08:28:~/lfortran_project/lfortran % time ./a.out
-1863462912
./a.out 0.00s user 0.00s system 74% cpu 0.007 total
(lf) 14:08:31:~/lfortran_project/lfortran % lfortran /Users/czgdp1807/lfortran_project/debug.f90 --skip-pass=promote_allocatable_to_nonallocatable --fast -o a.out
(lf) 14:08:49:~/lfortran_project/lfortran % time ./a.out
-1863462912
./a.out 2.42s user 1.03s system 93% cpu 3.679 total |
I will add the above example as a test. What do you say @certik? |
The code: do i = 1, 20
allocate(d(d1, d2, d3))
s = s + size(d)
end do should not leak with or without this pass. |
The test is quite slow (several seconds), so I would not add it, unless it runs "immediately". See this comment #2734 (comment) what needs to be done to finish the PR. |
TODO:
|
Actually if we add the test then we will be able to run it only in Release mode. Now I am not sure what to do here. I am thinking to keep this pass in both FAST and NOFAST mode but only in Release and it will not be there in Debug at all. And in testing the test will be run only in Release mode. |
How so? The test should run in all modes. The ASR pass introduced in this PR will only run in Release mode. Run the test in both Debug and Release. |
Well, I mean then I will have to reduce the size of the test. Then we will never now if the test is actually working or not. If I keep the size same then it will take longer in debug mode. |
As we discussed, the test should be small. Indeed, we will not know from just running it that a given performance pass was applied. To know that for sure, you would need to add an ASR check. We can. But as we discussed, it might not be worth it. |
@certik This is ready for review. |
However please do not merge. I will clean up the git history and merge. |
integer, intent(out) :: s | ||
integer, allocatable :: d(:, :, :) | ||
s = 0 | ||
do i = 1, 200000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Debug mode this runs in 21ms, which I think is fine.
time ./b.out
200000000
./b.out 0.01s user 0.00s system 84% cpu 0.021 total
So we can keep it. If needed, we can further lower this number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. But it segfaults on my mac at this number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, than that's a bug that we need to fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to complete it so I am using a smaller number for now. I will fix the segmentation fault later because I have to complete the sync for now.
src/libasr/pass/pass_manager.h
Outdated
"insert_deallocate", | ||
#if !defined(WITH_LFORTRAN_ASSERT) | ||
"promote_allocatable_to_nonallocatable" | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this ifdef needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For using it only in Release mode. This condition checks for that. If WITH_LFORTRAN_ASSERT
is not defined then its Release mode. I think WITH_LFORTRAN_ASSERT
is defined only for Debug mode, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's Release/Debug for the LFortran compiler, not the LFortran runtime. Use the compiler_options.fast
for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than that, I think it is fine.
I will complete it tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is fine.
…_of_struct_arrays
Thank you for finishing it! |
Closes #2656