Fix memory leak on unset of associative array #64

Merged: 6 commits into ksh93:master, Jul 9, 2020

Conversation

@JohnoKing commented Jul 6, 2020

Associative arrays aren't completely freed from memory in nv_delete, which causes a memory leak when an associative array is unset. This is fixed by freeing all elements of the array in a loop before they become inaccessible. Reproducer from https://www.mail-archive.com/ast-users@lists.research.att.com/msg01016.html:

$ cat ./reproducer.sh
#!/bin/ksh
export UNIX95=1
printf 'Memory usage before:'
ps -p $$ -o vsz=
typeset -A stuff
typeset -lui i=0 
for (( i=0; i<1000; i++ )); do
    unset stuff[xyz]
    typeset -A stuff[xyz]
    stuff[xyz][elem0]="data0"
    stuff[xyz][elem1]="data1"
    stuff[xyz][elem2]="data2"
    stuff[xyz][elem3]="data3"
    stuff[xyz][elem4]="data4"
done
printf 'Memory usage after: '
ps -p $$ -o vsz=
$ ksh ./reproducer.sh
Memory usage before: 8592
Memory usage after:  9456

@McDutchie commented Jul 6, 2020

For me, the regression test is still reporting a leak. After applying this diff:

diff --git a/src/cmd/ksh93/tests/leaks.sh b/src/cmd/ksh93/tests/leaks.sh
index 6a915c7..93a2ae8 100755
--- a/src/cmd/ksh93/tests/leaks.sh
+++ b/src/cmd/ksh93/tests/leaks.sh
@@ -110,7 +110,8 @@ do
 	stuff[xyz][elem4]="data4"
 done
 after=$(getmem)
-(( after > before )) && err_exit 'unset of associative array causes memory leak'
+(( after > before )) && err_exit 'unset of associative array causes memory leak' \
+	"(leaked $((after - before)) KiB)"
 
 # ======
 exit $((Errors<125?Errors:125))

the output is:

$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-06+22:06:17
	leaks.sh[113]: unset of associative array causes memory leak (leaked 27648 KiB)
test leaks failed at 2020-07-06+22:06:17 with exit code 1 [ 5 tests 1 error ]
Total errors: 1
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m00.22s     0m00.34s

The reported leak varies each time I run the test; I get anywhere between 9216 KiB and 36864 KiB.

This is on macOS 10.14.6. And no, I did not forget to recompile :)

@McDutchie commented:

Can confirm that this fixes it on FreeBSD (x86_64) and Linux (Debian 10, x86_64). Also, when I run the test before recompiling, the reported memory leak is much smaller, and consistent (832 KiB on FreeBSD, 864 KiB on Debian).

That makes me wonder whether something is wrong with ps on macOS. But then, why don't the other leaks.sh regression tests fail?
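One quick way to sanity-check ps itself (an ad-hoc check of my own, not part of the test suite) is to sample vsz for the same process several times back to back and count the distinct values:

```shell
# Ad-hoc check (not from the PR): sample vsz for the current shell
# five times in a row and count the distinct values. A stable ps
# normally yields a single value here; wildly varying output points
# at the measurement tool rather than at a real leak.
for i in 1 2 3 4 5; do
    ps -p $$ -o vsz=
done | sort -u | wc -l
```

On a deterministic ps this normally prints 1, since the shell's size should not change between back-to-back samples of a tight loop.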

@JohnoKing (author) commented Jul 6, 2020

Does building ksh with bin/package make ast-ksh CCFLAGS='-O2 -D_std_malloc' change anything significantly? Vmalloc provides _std_malloc as a compile-time option to use the system's malloc for memory allocation (making vmalloc effectively a wrapper):

/*
 * define _AST_std_malloc=1 to force the standard malloc
 * if _map_malloc is also defined then _ast_malloc etc.
 * will simply call malloc etc.
 */
#if !defined(_AST_std_malloc) && __CYGWIN__
#define _AST_std_malloc 1
#endif

#if !_mem_win32 && !_mem_sbrk && !_mem_mmap_anon && !_mem_mmap_zero
#undef _std_malloc
#define _std_malloc 1 /* use native malloc/free/realloc */
#endif

On Arch Linux the reported memory leak goes down from 800 KiB to 780 KiB (before patching) when -D_std_malloc is used, although it causes a different memory leak:

$ bin/shtests leaks
#### Regression-testing /home/johno/Documents/GitRepos/Clones/ksh/arch/linux.i386-64/bin/ksh ####
test leaks begins at 2020-07-06+14:43:26
test leaks passed at 2020-07-06+14:43:26 [ 5 tests 0 errors ]
test leaks(C.UTF-8) begins at 2020-07-06+14:43:26
test leaks(C.UTF-8) passed at 2020-07-06+14:43:26 [ 5 tests 0 errors ]
test leaks(shcomp) begins at 2020-07-06+14:43:26
	shcomp-leaks.ksh[86]: memory leak with read -C when deleting compound variable (leaked 64 KiB)
test leaks(shcomp) failed at 2020-07-06+14:43:27 with exit code 1 [ 5 tests 1 error ]
Total errors: 1

@McDutchie commented:

Yes, I'd say that -D_std_malloc changes things significantly. The associative array unset failure goes away, but three other failures take its place:

$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-07+01:28:17
	leaks.sh[67]: variable value reset memory leak -- 9216 KiB after 1000 iterations
	leaks.sh[86]: memory leak with read -C when deleting compound variable (leaked 1092 KiB)
	leaks.sh[94]: memory leak with read -C when using <<< (leaked 18432 KiB)
test leaks failed at 2020-07-07+01:28:17 with exit code 3 [ 5 tests 3 errors ]
Total errors: 3
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m00.22s     0m00.36s

@McDutchie commented Jul 7, 2020

Even if I change -o vsz= to -o rss= (resident set size a.k.a. real size, a nonstandard but common ps option) in the getmem function, ps on macOS keeps producing nondeterministic results: different on every test run, sometimes different by several orders of magnitude. I don't know much about memory management, but this seems absurd; it looks like ps on macOS cannot be trusted. I think we may have to go with your original idea of incorporating the vmstate builtin to run these tests. We can always remove it in a release branch shortly before doing a final release.
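A vmstate-based getmem could look roughly like this. The --format='%(busy_size)u' invocation matches what the commit further down ended up using; the ps fallback is my own hypothetical addition so the function degrades gracefully on builds where the builtin is not compiled in:

```shell
# Sketch of a vmstate-based getmem; the ps fallback is hypothetical
# and only there for shells without the vmstate builtin. Note the
# units differ: vmstate reports bytes, ps vsz reports 1024-byte units.
unit=bytes
getmem()
{
        if command -v vmstate >/dev/null 2>&1
        then    # busy_size excludes freed blocks (see vmstate --man)
                vmstate --format='%(busy_size)u'
        else    unit=KiB
                ps -p $$ -o vsz= | tr -d ' '
        fi
}
getmem
```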

@JohnoKing (author) commented Jul 7, 2020

If ps is the problem, it could be worth looking into a different command such as vmmap (although it is platform-specific).

@McDutchie commented:

The shortest output I can get vmmap to produce on macOS is below. (I also requested pages, because the default uses formats like 75.6M, which is useless for this purpose.)

If I change getmem() in leaks.sh to extract the total RESIDENT PAGES number using awk, as below, then the results are nondeterministic again: different for every test run. Then I tried getting the RESIDENT PAGES column from the line that starts with DefaultMallocZone_ instead. Same thing: nondeterministic results, random failures.

$ vmmap -summary -pages $$
Process:         ksh [81463]
Path:            /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh
Load Address:    0x1015f9000
Identifier:      ksh
Version:         ???
Code Type:       X86-64
Parent Process:  bash [974]

Date/Time:       2020-07-07 21:38:53.647 +0100
Launch Time:     2020-07-07 21:38:46.280 +0100
OS Version:      Mac OS X 10.14.6 (18G5033)
Report Version:  7
Analysis Tool:   /Applications/Xcode.app/Contents/Developer/usr/bin/vmmap
Analysis Tool Version:  Xcode 11.2.1 (11B500)
----

ReadOnly portion of Libraries: Total=236.4M resident=70.7M(30%) swapped_out_or_unallocated=165.7M(70%)
Writable regions: Total=26.6M written=28K(0%) resident=436K(2%) swapped_out=0K(0%) unallocated=26.2M(98%)

                                  VIRTUAL   RESIDENT      DIRTY    SWAPPED   VOLATILE     NONVOL      EMPTY   REGION 
REGION TYPE                         PAGES      PAGES      PAGES      PAGES      PAGES      PAGES      PAGES    COUNT (non-coalesced) 
===========                       =======   ========      =====    =======   ========     ======      =====  ======= 
Kernel Alloc Once                       2          1          1          0          0          0          0        1 
MALLOC guard page                       4          0          0          0          0          0          0        4 
MALLOC metadata                        15         15         15          0          0          0          0        5 
MALLOC_LARGE                           61         29         29          0          0          0          0        3         see MALLOC ZONE table below
MALLOC_LARGE (empty)                   16          0          0          0          0          0          0        1         see MALLOC ZONE table below
MALLOC_SMALL                         4096         27         27          0          0          0          0        2         see MALLOC ZONE table below
MALLOC_TINY                           512         13         13          0          0          0          0        2         see MALLOC ZONE table below
STACK GUARD                         14336          0          0          0          0          0          0        1 
Stack                                2048         14         14          0          0          0          0        1 
__DATA                                593        306         68          0          0          0          0       44 
__LINKEDIT                          56974      15951          0          0          0          0          0        3 
__TEXT                               3538       2138          0          0          0          0          0       42 
shared memory                           2          2          2          0          0          0          0        2 
===========                       =======   ========      =====    =======   ========     ======      =====  ======= 
TOTAL                               82197      18496        169          0          0          0          0      111 

                                 VIRTUAL   RESIDENT      DIRTY    SWAPPED ALLOCATION      BYTES DIRTY+SWAP          REGION
MALLOC ZONE                        PAGES      PAGES      PAGES      PAGES      COUNT  ALLOCATED  FRAG SIZE  % FRAG   COUNT
===========                      =======  =========  =========  =========  =========  =========  =========  ======  ======
DefaultMallocZone_0x101767000       4669         69         69          0        345       365K         0K      0%       7
function getmem
{
        vmmap -summary -pages "$$" | awk '/^DefaultMallocZone_/ { print $3; }'
}

(Note: below, KiB should of course read pages; I didn't bother to change that for this test.)

$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-07+22:07:30
	leaks.sh[67]: variable value reset memory leak -- 5 KiB after 1000 iterations
	leaks.sh[94]: memory leak with read -C when using <<< (leaked 17 KiB)
	leaks.sh[113]: unset of associative array causes memory leak (leaked 11 KiB)
test leaks failed at 2020-07-07+22:07:34 with exit code 3 [ 5 tests 3 errors ]
Total errors: 3
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m01.96s     0m02.44s
$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-07+22:07:35
	leaks.sh[67]: variable value reset memory leak -- 5 KiB after 1000 iterations
	leaks.sh[94]: memory leak with read -C when using <<< (leaked 18 KiB)
	leaks.sh[113]: unset of associative array causes memory leak (leaked 10 KiB)
test leaks failed at 2020-07-07+22:07:39 with exit code 3 [ 5 tests 3 errors ]
Total errors: 3
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m01.95s     0m02.43s
$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-07+22:07:40
	leaks.sh[67]: variable value reset memory leak -- 5 KiB after 1000 iterations
	leaks.sh[94]: memory leak with read -C when using <<< (leaked 20 KiB)
	leaks.sh[113]: unset of associative array causes memory leak (leaked 10 KiB)
test leaks failed at 2020-07-07+22:07:44 with exit code 3 [ 5 tests 3 errors ]
Total errors: 3
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m01.94s     0m02.40s
$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-07+22:07:46
	leaks.sh[94]: memory leak with read -C when using <<< (leaked 20 KiB)
	leaks.sh[113]: unset of associative array causes memory leak (leaked 9 KiB)
test leaks failed at 2020-07-07+22:07:50 with exit code 2 [ 5 tests 2 errors ]
Total errors: 2
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m01.95s     0m02.40s
$ bin/shtests -p leaks
#### Regression-testing /usr/local/src/ksh93/ksh/arch/darwin.i386-64/bin/ksh ####
test leaks begins at 2020-07-07+22:07:58
	leaks.sh[86]: memory leak with read -C when deleting compound variable (leaked 6 KiB)
	leaks.sh[94]: memory leak with read -C when using <<< (leaked 17 KiB)
	leaks.sh[113]: unset of associative array causes memory leak (leaked 8 KiB)
test leaks failed at 2020-07-07+22:08:02 with exit code 3 [ 5 tests 3 errors ]
Total errors: 3
CPU time       user:      system:
main:       0m00.00s     0m00.00s
tests:      0m01.95s     0m02.41s

McDutchie added a commit that referenced this pull request Jul 8, 2020
'ps' does not always give reliable results; on macOS, 'ps' appears
to produce nondeterministic (i.e. randomly varying) results for
'vsz' and 'rss', making it unusable for memory leak tests. See:
#64 (comment)
and further comments.

So let's compile in the vmstate builtin so that we can make sure to
measure things properly. It also reports bytes instead of 1024-byte
blocks, so smaller leaks can be detected.

To be decided: whether or not to disable the vmstate builtin for
release builds in order to save about 12K in the ksh binary.

src/cmd/ksh93/data/builtins.c:
- Add vmstate to the list of builtins that are compiled in.

src/cmd/ksh93/tests/leaks.sh:
- getmem(): get size using: vmstate --format='%(busy_size)u'
  (Using busy_size instead of size seems to make more sense as it
  excludes freed blocks. See vmstate --man)
- Introduce a $unit variable for reporting leaks and set it to
  'bytes'; this makes it easier to change the unit in future.
- Since the tests are now more sensitive, initialise all variables
  before use to avoid false leak detections.
- The last test seemed to need a few more 'read' invocations in
  order to get memory usage to a steady state before the test.
src/cmd/ksh93/sh/name.c:
- Associative arrays weren't being properly freed from memory,
  which was causing a memory leak. This is solved by properly
  freeing associative arrays from memory in 'nv_delete'.
  Reproducer from https://www.mail-archive.com/ast-users@lists.research.att.com/msg01016.html:
  $ typeset -A stuff
  $ typeset -lui i=0
  $ for (( i=0; i<1000; i++ )); do
  $     unset stuff[xyz]
  $     typeset -A stuff[xyz]
  $     stuff[xyz][elem0]="data0"
  $     stuff[xyz][elem1]="data1"
  $     stuff[xyz][elem2]="data2"
  $     stuff[xyz][elem3]="data3"
  $     stuff[xyz][elem4]="data4"
  $ done

src/cmd/ksh93/tests/leaks.sh:
- Add a regression test for memory leaking when an associative
  array is unset.
JohnoKing and others added 2 commits July 8, 2020 16:50
src/cmd/ksh93/tests/leaks.sh:
- The associative array wasn't being unset after the loop finished,
  which caused a spurious failure when the more precise vmstate
  builtin is used instead of ps(1). Fix this with an additional unset
  placed after the loop.
@McDutchie McDutchie merged commit e70925c into ksh93:master Jul 9, 2020
@JohnoKing JohnoKing deleted the assoc-leak branch July 9, 2020 00:17