Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std/nre leaks memory on every created Regex due to auto-translated finalizers behaving differently under ARC/ORC #22868

Closed
Yardanico opened this issue Oct 24, 2023 · 0 comments · Fixed by #22872

Comments

@Yardanico
Copy link
Collaborator

Yardanico commented Oct 24, 2023

Description

First discussed in Telegram (https://t.me/ru_nim_talks/135383), basic repro:

import std/[nre]

proc processString() =
  let regexpUid = re",uid=(.*),mobile=1"
  let uidMatch = "2023-10-23 11:26:36 Instance 1 access_ok,uid=SDfv7772,mobile=1".find(regexpUid)
  if uidMatch.isSome:
    echo uidMatch.get.captures[0]

proc main() =
    for x in 0..200:
        processString()

main()

If you compile it and run under Valgrind with --mm:arc -d:danger -d:useMalloc, you get this:

==2587893== HEAP SUMMARY:
==2587893==     in use at exit: 420,124 bytes in 406 blocks
==2587893==   total heap usage: 4,231 allocs, 3,825 frees, 2,310,170 bytes allocated
==2587893== 
==2587893== 5,427 bytes in 201 blocks are definitely lost in loss record 5 of 6
==2587893==    at 0x4841848: malloc (vg_replace_malloc.c:431)
==2587893==    by 0x10C6A3: allocImpl__system_u1741 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:324)
==2587893==    by 0x10C6A3: allocSharedImpl (@m..@s..@sStuff@snim@slib@ssystem.nim.c:329)
==2587893==    by 0x10C6A3: nimAsgnStrV2 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:1194)
==2587893==    by 0x10C6A3: eqcopy___system_u2618 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:1206)
==2587893==    by 0x11063B: initRegex__impureZnre_u1652 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1174)
==2587893==    by 0x1108C0: re__impureZnre_u6786 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1261)
==2587893==    by 0x111934: processString__tast50_u2 (@mtast2.nim.c:223)
==2587893==    by 0x111C02: main__tast50_u52 (@mtast2.nim.c:294)
==2587893==    by 0x111C02: NimMainModule (@mtast2.nim.c:351)
==2587893==    by 0x10A186: NimMainInner (@mtast2.nim.c:324)
==2587893==    by 0x10A186: NimMain (@mtast2.nim.c:335)
==2587893==    by 0x10A186: main (@mtast2.nim.c:343)
==2587893== 
==2587893== 413,256 bytes in 201 blocks are definitely lost in loss record 6 of 6
==2587893==    at 0x4841848: malloc (vg_replace_malloc.c:431)
==2587893==    by 0x10C004: allocImpl__system_u1741 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:324)
==2587893==    by 0x10C004: allocSharedImpl (@m..@s..@sStuff@snim@slib@ssystem.nim.c:329)
==2587893==    by 0x10C004: alignedAlloc__system_u1881 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:864)
==2587893==    by 0x10C004: newSeqPayloadUninit (@m..@s..@sStuff@snim@slib@ssystem.nim.c:895)
==2587893==    by 0x10C004: prepareSeqAddUninit (@m..@s..@sStuff@snim@slib@ssystem.nim.c:937)
==2587893==    by 0x10CBA8: setLen__impureZnre_u144 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:1405)
==2587893==    by 0x10DF87: initTable__impureZnre_u129 (@m..@s..@sStuff@snim@slib@spure@scollections@stables.nim.c:373)
==2587893==    by 0x11046C: getNameToNumberTable__impureZnre_u55 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1124)
==2587893==    by 0x11076A: initRegex__impureZnre_u1652 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1231)
==2587893==    by 0x1108C0: re__impureZnre_u6786 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1261)
==2587893==    by 0x111934: processString__tast50_u2 (@mtast2.nim.c:223)
==2587893==    by 0x111C02: main__tast50_u52 (@mtast2.nim.c:294)
==2587893==    by 0x111C02: NimMainModule (@mtast2.nim.c:351)
==2587893==    by 0x10A186: NimMainInner (@mtast2.nim.c:324)
==2587893==    by 0x10A186: NimMain (@mtast2.nim.c:335)
==2587893==    by 0x10A186: main (@mtast2.nim.c:343)
==2587893== 
==2587893== LEAK SUMMARY:
==2587893==    definitely lost: 418,683 bytes in 402 blocks
==2587893==    indirectly lost: 0 bytes in 0 blocks
==2587893==      possibly lost: 0 bytes in 0 blocks
==2587893==    still reachable: 1,441 bytes in 4 blocks
==2587893==         suppressed: 0 bytes in 0 blocks

The root cause is the same as #19794 - Nim translates finalizers to destructors 1:1, so in this case the destructor won't free the Table in the Regex. The relevant code - https://github.com/nim-lang/Nim/blob/devel/lib/impure/nre.nim#L219:

proc destroyRegex(pattern: Regex) =
  pcre.free_substring(cast[cstring](pattern.pcreObj))
  if pattern.pcreExtra != nil:
    pcre.free_study(pattern.pcreExtra)

Simplified version that highlights the same behavior under Valgrind:

import std/tables

type
    RegexObj* = object
        someOther: Table[string, int]
    Regex* = ref RegexObj

proc destroyRegex(r: Regex) = 
    echo "doing some destruction"

proc newRegex(): Regex = 
    new(result, destroyRegex)
    result.someOther["test"] = 1

proc main = 
    var r = newRegex()

main()

Nim Version

Nim Compiler Version 2.1.1 [Linux: amd64]
Compiled at 2023-10-24
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: 3fd4e684331a22be84e940833faee111226945ea
active boot switches: -d:release

Possible Solution

As Araq explained in #19794 - one way is to explicitly call the generic system destructor in the finalizer. This worked in my old Nim devel build from August, but it doesn't seem to work on the latest devel anymore:

proc destroyRegex(pattern: Regex) =
  pcre.free_substring(cast[cstring](pattern.pcreObj))
  if pattern.pcreExtra != nil:
    pcre.free_study(pattern.pcreExtra)
  if pattern != nil:
    system.`=destroy`(pattern[])

Gives:

/home/dian/Stuff/nim/lib/impure/nre.nim(254, 6) Error: cannot bind another '=destroy' to: Regex:ObjectType; previous declaration was constructed here implicitly: /home/dian/Stuff/nim/lib/impure/nre.nim(224, 22)

on newest devel.

It did fix the main huge leak, but I don't know if this would then work under --mm:refc or might have some other unintended side effects.

Though, even after fixing it, there's still another weird leak, but it's not as severe:

==2582680== 2,268 bytes in 84 blocks are definitely lost in loss record 5 of 6
==2582680==    at 0x48469B3: calloc (vg_replace_malloc.c:1554)
==2582680==    by 0x10D2E8: alloc0Impl__system_u1759 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:327)
==2582680==    by 0x10D2E8: allocShared0Impl__system_u1772 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:333)
==2582680==    by 0x10D2E8: nimAsgnStrV2 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:1174)
==2582680==    by 0x10D2E8: eqcopy___stdZassertions_u30 (@m..@s..@sStuff@snim@slib@ssystem.nim.c:1186)
==2582680==    by 0x11133B: initRegex__impureZnre_u1671 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1186)
==2582680==    by 0x1115C0: re__impureZnre_u6881 (@m..@s..@sStuff@snim@slib@simpure@snre.nim.c:1273)
==2582680==    by 0x112D3A: processString__tast50_u11 (@mtast2.nim.c:267)
==2582680==    by 0x112F59: scanStrings__tast50_u62 (@mtast2.nim.c:347)
==2582680==    by 0x1130AD: main__tast50_u65 (@mtast2.nim.c:372)
==2582680==    by 0x1130AD: NimMainModule (@mtast2.nim.c:432)
==2582680==    by 0x10A286: NimMainInner (@mtast2.nim.c:398)
==2582680==    by 0x10A286: NimMain (@mtast2.nim.c:409)
==2582680==    by 0x10A286: main (@mtast2.nim.c:417)

Additional Information

Creating the same regexes each time is usually a program logic bug (they should be created globally or inside a table/array/etc so they are reused), so this isn't very high priority I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants