Refactor to use NoGhcTc. #1

Merged (1 commit) on Oct 4, 2018
Conversation

goldfirere

In a few places in the AST, we need to store a type in a non-type
AST node, as in type annotations, type applications, and pattern
annotations. In all of these places, we need, e.g., an
HsExpr GhcTc to store an HsType GhcRn -- note that the type never
makes it past GhcRn. Before this patch, this was done via the extension
fields in the AST. This is unsatisfying, though, because the type in,
e.g., an annotated expression is surely not an extension, but the
raison d'être for the AST node in the first place.

This patch introduces a new type family NoGhcTc such that
NoGhcTc GhcTc = GhcRn and behaves like the type-level identity
otherwise. This new type family is used in a few places that
benefit from it. There should be no outward change in behavior,
and hence, no tests.
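
A minimal sketch of the idea, with stand-in pass types (in GHC itself
GhcRn and GhcTc are synonyms for applications of GhcPass, and the real
definition goes through a helper family over pass indices):

```
{-# LANGUAGE TypeFamilies #-}

import Data.Kind (Type)

-- Stand-ins for GHC's compiler-pass indices.
data GhcRn  -- output of the renamer
data GhcTc  -- output of the typechecker

-- NoGhcTc maps the typechecker pass back to the renamer pass and is
-- the identity on every other index.
type family NoGhcTc (p :: Type) :: Type where
  NoGhcTc GhcTc = GhcRn
  NoGhcTc p     = p

-- A toy annotated-expression node in the spirit of the patch: the
-- signature field sits at pass 'NoGhcTc p', so it stays an
-- 'HsType GhcRn' even once the expression itself is at GhcTc.
data HsType p = HsTypeStub
data Expr p
  = Var String
  | ExprWithTySig (Expr p) (HsType (NoGhcTc p))
```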

m-yac merged commit 30b8720 into m-yac:wip/yac on Oct 4, 2018
m-yac pushed a commit that referenced this pull request Oct 22, 2018
While debugging #15285 I realized that free block lists (free_list in
BlockAlloc.c) get corrupted when multiple scavenge threads allocate and
release blocks concurrently. Here's a picture of one such race:

    Thread 2 (Thread 32573.32601):
    #0  check_tail
        (bd=0x940d40 <stg_TSO_info>) at rts/sm/BlockAlloc.c:860
    #1  0x0000000000928ef7 in checkFreeListSanity
        () at rts/sm/BlockAlloc.c:896
    #2  0x0000000000928979 in freeGroup
        (p=0x7e998ce02880) at rts/sm/BlockAlloc.c:721
    #3  0x0000000000928a17 in freeChain
        (bd=0x7e998ce02880) at rts/sm/BlockAlloc.c:738
    #4  0x0000000000926911 in freeChain_sync
        (bd=0x7e998ce02880) at rts/sm/GCUtils.c:80
    #5  0x0000000000934720 in scavenge_capability_mut_lists
        (cap=0x1acae80) at rts/sm/Scav.c:1665
    #6  0x000000000092b411 in gcWorkerThread
        (cap=0x1acae80) at rts/sm/GC.c:1157
    #7  0x000000000090be9a in yieldCapability
        (pCap=0x7f9994e69e20, task=0x7e9984000b70, gcAllowed=true) at rts/Capability.c:861
    #8  0x0000000000906120 in scheduleYield
        (pcap=0x7f9994e69e50, task=0x7e9984000b70) at rts/Schedule.c:673
    #9  0x0000000000905500 in schedule
        (initialCapability=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:293
    #10 0x0000000000908d4f in scheduleWorker
        (cap=0x1acae80, task=0x7e9984000b70) at rts/Schedule.c:2554
    #11 0x000000000091a30a in workerStart
        (task=0x7e9984000b70) at rts/Task.c:444
    #12 0x00007f99937fa6db in start_thread
        (arg=0x7f9994e6a700) at pthread_create.c:463
    #13 0x000061654d59f88f in clone
        () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

    Thread 1 (Thread 32573.32573):
    #0  checkFreeListSanity
        () at rts/sm/BlockAlloc.c:887
    #1  0x0000000000928979 in freeGroup
        (p=0x7e998d303540) at rts/sm/BlockAlloc.c:721
    #2  0x0000000000926f23 in todo_block_full
        (size=513, ws=0x1aa8ce0) at rts/sm/GCUtils.c:264
    #3  0x00000000009583b9 in alloc_for_copy
        (size=513, gen_no=0) at rts/sm/Evac.c:80
    #4  0x000000000095850d in copy_tag_nolock
        (p=0x7e998c675f28, info=0x421d98 <Main_Large_con_info>, src=0x7e998d075d80, size=513,
        gen_no=0, tag=1) at rts/sm/Evac.c:153
    #5  0x0000000000959177 in evacuate
        (p=0x7e998c675f28) at rts/sm/Evac.c:715
    #6  0x0000000000932388 in scavenge_small_bitmap
        (p=0x7e998c675f28, size=1, bitmap=0) at rts/sm/Scav.c:271
    #7  0x0000000000934aaf in scavenge_stack
        (p=0x7e998c675f28, stack_end=0x7e998c676000) at rts/sm/Scav.c:1908
    #8  0x0000000000934295 in scavenge_one
        (p=0x7e998c66e000) at rts/sm/Scav.c:1466
    #9  0x0000000000934662 in scavenge_mutable_list
        (bd=0x7e998d300440, gen=0x1b1d880) at rts/sm/Scav.c:1643
    #10 0x0000000000934700 in scavenge_capability_mut_lists
        (cap=0x1aaa340) at rts/sm/Scav.c:1664
    #11 0x00000000009299b6 in GarbageCollect
        (collect_gen=0, do_heap_census=false, gc_type=2, cap=0x1aaa340, idle_cap=0x1b38aa0)
        at rts/sm/GC.c:378
    #12 0x0000000000907a4a in scheduleDoGC
        (pcap=0x7ffdec5b5310, task=0x1b36650, force_major=false) at rts/Schedule.c:1798
    #13 0x0000000000905de7 in schedule
        (initialCapability=0x1aaa340, task=0x1b36650) at rts/Schedule.c:546
    #14 0x0000000000908bc4 in scheduleWaitThread
        (tso=0x7e998c0067c8, ret=0x0, pcap=0x7ffdec5b5430) at rts/Schedule.c:2537
    #15 0x000000000091b5a0 in rts_evalLazyIO
        (cap=0x7ffdec5b5430, p=0x9c11f0, ret=0x0) at rts/RtsAPI.c:530
    #16 0x000000000091ca56 in hs_main
        (argc=1, argv=0x7ffdec5b5628, main_closure=0x9c11f0, rts_config=...) at rts/RtsMain.c:72
    #17 0x0000000000421ea0 in main
        ()

In particular, dbl_link_onto(), which is used to add a freed block to a
doubly-linked free list, is not thread-safe and corrupts the list when
called concurrently.

Note that thread 1 is to blame here, as thread 2 properly takes the
spinlock. With this patch we now take the spinlock when freeing a todo
block in the GC, avoiding this race.
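
The shape of the bug, in a minimal hypothetical Haskell sketch (the
real code is the C function dbl_link_onto in rts/sm/BlockAlloc.c, and
the fix takes the allocator's spinlock rather than an MVar):

```
import Control.Concurrent.MVar (MVar, withMVar)
import Data.IORef (IORef, readIORef, writeIORef)

-- Toy stand-in for a block descriptor on a shared free list.
data Block = Block { blkId :: Int, blkLink :: IORef (Maybe Block) }

-- Unsynchronised push, analogous to dbl_link_onto: two threads can
-- both read the same old head, and whichever writes the new head last
-- silently drops the other thread's block from the list.
pushUnsafe :: IORef (Maybe Block) -> Block -> IO ()
pushUnsafe headRef b = do
  old <- readIORef headRef
  writeIORef (blkLink b) old
  writeIORef headRef (Just b)

-- The fix in spirit: serialise pushes with a lock (an MVar here,
-- standing in for the RTS spinlock).
pushLocked :: MVar () -> IORef (Maybe Block) -> Block -> IO ()
pushLocked lock headRef b = withMVar lock (\_ -> pushUnsafe headRef b)
```

Any interleaving in which both threads read the old head before either
writes the new one loses a block, exactly the kind of corruption that
checkFreeListSanity then trips over.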

Test Plan:
- Tried slow validate locally: this patch does not introduce new failures.
- circleci: https://circleci.com/gh/ghc/ghc-diffs/283. The run got
  killed because it took 5 hours, but T7919 (which was previously
  failing on circleci) passed.

Reviewers: simonmar, bgamari, erikd

Reviewed By: simonmar

Subscribers: rwbarton, carter

GHC Trac Issues: #15285

Differential Revision: https://phabricator.haskell.org/D5115
m-yac pushed a commit that referenced this pull request Oct 22, 2018
Add an RTS option, -xp, to load PIC objects anywhere in the address
space. We do this by relaxing the requirement that `mmapForLinker`
return an address below 0x80000000, and by implying
USE_CONTIGUOUS_MMAP.

We also need to change calls to `ocInit` and `ocGetNames` to avoid
dangling pointers when the address of `oc->image` is changed by
`ocAllocateSymbolExtra`.
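
The hazard is the usual one with a buffer that may move on
reallocation; a tiny hypothetical sketch (Haskell's Foreign API
standing in for the RTS's C, with made-up sizes and offsets):

```
import Data.Word (Word8)
import Foreign.Marshal.Alloc (free, mallocBytes, reallocBytes)
import Foreign.Ptr (Ptr, plusPtr)

main :: IO ()
main = do
  image <- mallocBytes 64 :: IO (Ptr Word8)
  -- An address computed into the image, as ocGetNames does with
  -- oc->image:
  let sym = image `plusPtr` 16 :: Ptr Word8
  -- Growing the image may move it, as ocAllocateSymbolExtra may:
  image' <- reallocBytes image 4096
  -- 'sym' now dangles if the buffer moved; every address into the
  -- image must be recomputed from the new base:
  let sym' = image' `plusPtr` 16 :: Ptr Word8
  print (sym' == sym)  -- False whenever the buffer moved
  free image'
```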

Test Plan:
```
$ uname -a
Linux localhost 4.18.8-arch1-1-ARCH #1 SMP PREEMPT Sat Sep 15 20:34:48 UTC 2018 x86_64 GNU/Linux
$ cat mk/build.mk
DYNAMIC_GHC_PROGRAMS = NO
DYNAMIC_BY_DEFAULT = NO

GhcRTSWays += thr_debug
EXTRA_HC_OPTS += -debug
WAY_p_HC_OPTS += -fPIC -fexternal-dynamic-refs
$ inplace/bin/ghc-stage2 --interactive -prof +RTS -xp
GHCi, version 8.7.20180928: http://www.haskell.org/ghc/  :? for help
ghc-stage2: R_X86_64_32 relocation out of range:
ghczmprim_GHCziTypes_ZMZN_closure = 7f690bffab59
Recompile /data/users/watashi/ghc/libraries/ghc-prim/dist-install/build/HSghc-prim-0.5.3.o with -fPIC -fexternal-dynamic-refs.
ghc-stage2: unable to load package `ghc-prim-0.5.3'
$ strace -f -e open,mmap inplace/bin/ghc-stage2 --interactive -prof -fexternal-interpreter -opti+RTS -opti-xp
...
[pid 1355283] open("/data/users/watashi/ghc/libraries/base/dist-install/build/libHSbase-4.12.0.0_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a84842000
[pid 1355283] open("/data/users/watashi/ghc/libraries/base/dist-install/build/libHSbase-4.12.0.0_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a84676000
...
Prelude> System.Posix.Process.getProcessID
...
[pid 1355283] open("/data/users/watashi/ghc/libraries/unix/dist-install/build/libHSunix-2.7.2.2_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 45056, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a67d60000
[pid 1355283] open("/data/users/watashi/ghc/libraries/unix/dist-install/build/libHSunix-2.7.2.2_p.a", O_RDONLY) = 14
[pid 1355283] mmap(NULL, 57344, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6a67d52000
...
```

```
$ uname -a
Darwin watashis-iMac.local 18.0.0 Darwin Kernel Version 18.0.0: Wed Aug 22 20:13:40 PDT 2018; root:xnu-4903.201.2~1/RELEASE_X86_64 x86_64
$ mv /Users/watashi/gao/ghc/libraries/integer-gmp/dist-install/build/HSinteger-gmp-1.0.2.0.o{,._DISABLE_GHC_ISSUE_15105}
$ inplace/bin/ghc-stage2 --interactive  +RTS -xp
GHCi, version 8.7.20181003: http://www.haskell.org/ghc/  :? for help
Prelude> System.Posix.Process.getProcessID
42791
Prelude> Data.Set.fromList [1 .. 10]
fromList [1,2,3,4,5,6,7,8,9,10]
Prelude>
Leaving GHCi.
$ inplace/bin/ghc-stage2 --interactive -prof -fexternal-interpreter
GHCi, version 8.7.20181003: http://www.haskell.org/ghc/  :? for help
Prelude> System.Posix.Process.getProcessID
42806
Prelude> Data.Set.fromList [1 .. 10]
fromList [1,2,3,4,5,6,7,8,9,10]
Prelude>
Leaving GHCi.
```

Also tested with something that used to hit the 2GB limit; it loads
and runs without problems.

Reviewers: simonmar, bgamari, angerman, Phyx, hvr, erikd

Reviewed By: simonmar

Subscribers: rwbarton, carter

Differential Revision: https://phabricator.haskell.org/D5195