Share ModuleGraphs for all files #3232

Merged: 7 commits merged from wip/module-graph-mem into master on Aug 4, 2023
Conversation

@wz1000 wz1000 commented Sep 28, 2022

Remove GetDependencyInformation in favour of GetModuleGraph

Computing and storing GetDependencyInformation for each file individually means
that we perform downsweep on each file separately, wasting a lot of work and using an excessive
amount of memory to store all the duplicated graphs.

However, we already have the GetModuleGraph rule, which we need to compute before compiling
files anyway because it is depended on by NeedsCompilation, which needs to know whether any reverse
dependency of the module we are compiling requires TH. This means that each file already depends on
the results of downsweep for the whole project.

Instead, we can compute the whole graph once when we execute the GetModuleGraph rule, and even use it inside HscEnv.hsc_mod_graph to avoid reconstructing the ModuleGraph on each invocation of GhcSessionDeps.

There may be concerns about excessive build churn due to any change to the result of GetModuleGraph
invalidating the result of GhcSessionDeps too often, but note that this only happens when something
in the header of a module changes, and it could be solved easily by re-introducing
a version of GetDependencyInformation with early cutoff that essentially returns the result of GetModuleGraph
but includes the hash of only the ModSummarys in the downward dependency closure of the file.
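
To make the last point concrete, here is a minimal sketch (not code from this PR) of such an early-cutoff key: hash only the ModSummary fingerprints in the file's downward dependency closure, so edits to unrelated module headers leave the key unchanged. The downwardClosure and summaryFingerprint helpers are hypothetical stand-ins for whatever the shared module graph exposes.

import GHC.Fingerprint (Fingerprint, fingerprintFingerprints)

-- Early-cutoff key for one file: combine only the fingerprints of the
-- ModSummarys it transitively imports. 'downwardClosure' and
-- 'summaryFingerprint' are hypothetical helpers, not ghcide API.
dependencyKey
  :: (file -> [file])      -- downward dependency closure of a file
  -> (file -> Fingerprint) -- fingerprint of that file's ModSummary
  -> file
  -> Fingerprint
dependencyKey downwardClosure summaryFingerprint f =
  fingerprintFingerprints (map summaryFingerprint (downwardClosure f))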

pepeiborra commented Sep 28, 2022

How do you deal with the cycles issue? If there are cycles in the module graph, then you end up with cycles in the build graph, and hls-graph cannot handle them.

This has been discussed before: #1359, #2323 (comment)

wz1000 commented Sep 28, 2022

How do you deal with the cycles issue? If there are cycles in the module graph, then you end up with cycles in the build graph, and hls-graph cannot handle them.

This has been discussed before: #1359, #2323 (comment)

There is no rule cycle because we now construct the ModuleGraph a single time using all the known targets. A GetModuleGraph rule doesn't depend on other GetModuleGraph rules (in fact it is a defineNoFile rule, so there is only one instance). The rawDependencyInformation function already dealt with cycles by maintaining state for modules that have already been visited.
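
For illustration, a minimal sketch of that single-pass downsweep, assuming only an importsOf function (ghcide's real rawDependencyInformation is more involved): one traversal over all known targets, with a visited map so that import cycles terminate and each module is processed exactly once.

import qualified Data.Map.Strict as M

-- Build one adjacency map for the whole project. Cycles are harmless because
-- a module already present in the accumulator is never revisited.
downsweepOnce
  :: Ord mod
  => (mod -> [mod])   -- hypothetical: direct imports of a module
  -> [mod]            -- all known targets
  -> M.Map mod [mod]
downsweepOnce importsOf = go M.empty
  where
    go acc [] = acc
    go acc (m:ms)
      | m `M.member` acc = go acc ms
      | otherwise        = let deps = importsOf m
                           in go (M.insert m deps acc) (deps ++ ms)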

pepeiborra commented Oct 8, 2022

There may be concerns about excessive build churn due to any change to the result of GetModuleGraph
invalidating the result of GhcSessionDeps too often, but note that this only happens when something
in the header of a module changes,

I have concerns about this. It's not just build churn, but the fact that to load a module I first need to compute the module graph for the entire codebase. How is this going to affect init times for large codebases with >10,000 modules?

this could be solved easily by re-introducing
a version of GetDependencyInformation with early cutoff that essentially returns the result of GetModuleGraph
but includes the hash of only the ModSummarys in the downward dependency closure of the file.

By all means, include this in the PR. But it doesn't solve the startup cost issue.

@wz1000 wz1000 force-pushed the wip/module-graph-mem branch 2 times, most recently from d286a48 to aea4709 on December 9, 2022 13:18
wz1000 commented Dec 20, 2022

@pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.

@pepeiborra

@pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.

Where is the option? I cannot see it.

wz1000 commented Dec 20, 2022

@pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.

Where is the option? I cannot see it.

Check now, the push was still going when I made the comment.

@wz1000 wz1000 force-pushed the wip/module-graph-mem branch 6 times, most recently from 2f059f8 to 616d328 on December 20, 2022 11:42
@wz1000 wz1000 enabled auto-merge (rebase) December 20, 2022 11:48
@pepeiborra

@pepeiborra I've replaced checkForImportCycles with a fullModuleGraph option that tries to restore the older behaviour.

Where is the option? I cannot see it.

Check now, the push was still going when I made the comment.

The option looks good, but I kind of want to see a benchmark in a large code base, both with the full module graph and without it.

Do you still want to get this merged before the next release? If it cannot wait, then go ahead and merge it

wz1000 commented Dec 22, 2022

This needs #3391 to work because the eval plugin currently relies on the GHC driver (it calls load), which inspects ms_hs_date, which we unset for GetModSummaryWithoutTimestamps. It is not desirable to use GetModSummary for the module graph, as then any change to any file would invalidate the module graph (which all files depend on), rather than just changes to the module header.

We shouldn't be using the GHC driver anyway as we want to perform loading and recompilation checking using our own logic. The rework of the eval plugin fixes this.

I will land them both after the release.

@pepeiborra

@wz1000 I benchmarked this change in the Bigcorp codebase and the results were not good: 6X longer startup times with the full module graph and 5X without it. This is compared to my O(imports) GetDependencies hack, which for reference is simply:

-- O(imports) per file: reuse the cached GetDependencies results of the direct
-- imports instead of re-running downsweep from scratch.
getDependencies :: NormalizedFilePath -> Action TransitiveDependencies
getDependencies f = do
  imports <- use_ GetLocatedImports f
  -- direct imports that resolved to a known artifact location
  let direct = [artifactFilePath | (_, Just ArtifactsLocation {..}) <- imports]
  -- recurse through the build graph and merge each import's transitive deps
  transitive <- foldMap transitiveModuleDeps <$> uses_ GetDependencies direct
  -- deduplicate and force the final list
  let !final = HashSet.toList $ HashSet.fromList $ direct <> transitive
  return $ TransitiveDependencies final

Full results:

BEFORE

name         | success | samples | startup | setup | userTime | delayedTime | totalTime
------------ | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
edit         | True    | 10      | 2m10s   | 0.00s | 13.82s   | 0.00s       | 13.83s
edit imports | True    | 10      | 1m49s   | 0.00s | 0.27s    | 0.00s       | 0.27s
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats
 355,733,992,672 bytes allocated in the heap
  52,305,367,504 bytes copied during GC
  13,424,971,888 bytes maximum residency (10 sample(s))
     552,684,264 bytes maximum slop
           12803 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       413 colls,   412 par   563.854s  14.237s     0.0345s    0.5817s
  Gen  1        10 colls,     8 par   290.148s  18.952s     1.8952s    10.9664s

  Parallel GC work balance: 89.61% (serial 0%, perfect 100%)

  TASKS: 198 (7 bound, 168 peak workers (191 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  378.871s  ( 89.268s elapsed)
  GC      time  854.002s  ( 33.189s elapsed)
  EXIT    time    0.043s  (  0.006s elapsed)
  Total   time  1232.917s  (122.464s elapsed)

  Alloc rate    938,932,822 bytes per MUT second

  Productivity  30.7% of total user, 72.9% of total elapsed

fullModuleGraph=true

name | success | samples | startup | setup | userTime | delayedTime | totalTime
---- | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
edit | True    | 10      | 12m42s  | 0.00s | 13.31s   | 0.00s       | 13.32s
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats
3,707,510,882,688 bytes allocated in the heap
  96,192,538,160 bytes copied during GC
  19,762,467,872 bytes maximum residency (14 sample(s))
     604,503,008 bytes maximum slop
           18846 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      4080 colls,  4079 par   1592.894s  39.927s     0.0098s    0.8547s
  Gen  1        14 colls,    12 par   589.262s  45.278s     3.2342s    29.5445s

  Parallel GC work balance: 81.60% (serial 0%, perfect 100%)

  TASKS: 284 (7 bound, 270 peak workers (277 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  3939.385s  (722.147s elapsed)
  GC      time  2182.156s  ( 85.205s elapsed)
  EXIT    time    0.089s  (  0.008s elapsed)
  Total   time  6121.632s  (807.361s elapsed)

  Alloc rate    941,139,498 bytes per MUT second

  Productivity  64.4% of total user, 89.4% of total elapsed

fullModuleGraph=false

name | success | samples | startup | setup | userTime | delayedTime | totalTime
---- | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
edit | True    | 10      | 9m46s   | 0.00s | 12.12s   | 0.00s       | 12.12s   
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats 
3,536,421,762,320 bytes allocated in the heap
  86,187,539,816 bytes copied during GC
  17,900,403,312 bytes maximum residency (15 sample(s))
     547,153,296 bytes maximum slop
           17071 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      3636 colls,  3635 par   1222.932s  30.644s     0.0084s    0.4791s
  Gen  1        15 colls,    13 par   549.295s  34.563s     2.3042s    19.7335s

  Parallel GC work balance: 84.83% (serial 0%, perfect 100%)

  TASKS: 268 (7 bound, 255 peak workers (261 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time  3415.233s  (554.057s elapsed)
  GC      time  1772.227s  ( 65.207s elapsed)
  EXIT    time    0.091s  (  0.016s elapsed)
  Total   time  5187.552s  (619.282s elapsed)

  Alloc rate    1,035,484,716 bytes per MUT second

  Productivity  65.8% of total user, 89.5% of total elapsed

wz1000 commented Dec 26, 2022

@pepeiborra do you mean your hack is separate from what lives in HLS master as of now? Is there any regression in fullModuleGraph=false compared to current HLS master? I don't think there should be, but maybe I made a mistake.

I think we could use something like your hack if we can augment hls-graph cycle detection to produce recoverable errors.

@pepeiborra

@pepeiborra do you mean your hack is separate from what lives in HLS master as of now?

Actually, ignore the hack, I was confused; the BEFORE results are not using the hack.

Reason: we are still on ghcide 1.4, and while upgrading to ghcide 1.8 I deleted the hack since GetDependencies no longer exists. Startup times with ghcide 1.4 + the hack were around 1m20s; on vanilla 1.8 it's 2m10s.

Is there any regression in fullModuleGraph=false compared to current HLS master? I don't think there should be, but maybe I made a mistake.

Yes, the regression is 2m10s -> 9m46s for startup. More specifically, by startup I mean loading the first module, which is handpicked to have a very large transitive module graph.

I think we could use something like your hack if we can augment hls-graph cycle detection to produce recoverable errors.

I don't think this can be done cleanly.

@wz1000 wz1000 force-pushed the wip/module-graph-mem branch 2 times, most recently from 299795a to 63d9ec6 on December 26, 2022 11:41
@pepeiborra

With the strictness fix, the regression is less dramatic: 50% longer startup and 32% more total memory in use (17 GB instead of 12.8 GB). Not quite ready yet, but less bad.

name | success | samples | startup | setup | userTime | delayedTime | totalTime
---- | ------- | ------- | ------- | ----- | -------- | ----------- | ---------
edit | True    | 10      | 3m16s   | 0.00s | 12.04s   | 0.00s       | 12.04s   
/data/users/pepeiborra/fbsource/fbcode/buck-out/opt/gen/sigma/ide/sigma-ide --lsp --test --cwd /home/pepeiborra/si_sigma +RTS -A32M -I0 -s/tmp/sigma-ide-benchmark-large.gcStats 
 390,443,605,936 bytes allocated in the heap
  74,427,647,744 bytes copied during GC
  17,896,901,816 bytes maximum residency (14 sample(s))
     560,837,448 bytes maximum slop
           17067 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       929 colls,   928 par   784.410s  19.680s     0.0212s    0.5344s
  Gen  1        14 colls,    12 par   478.783s  34.307s     2.4505s    21.2495s

  Parallel GC work balance: 86.81% (serial 0%, perfect 100%)

  TASKS: 284 (7 bound, 264 peak workers (277 total), using -N40)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.099s elapsed)
  MUT     time  718.973s  (177.425s elapsed)
  GC      time  1263.193s  ( 53.987s elapsed)
  EXIT    time    0.084s  (  0.009s elapsed)
  Total   time  1982.251s  (231.521s elapsed)

  Alloc rate    543,057,234 bytes per MUT second

  Productivity  36.3% of total user, 76.6% of total elapsed

wz1000 commented Jul 12, 2023

I had another look at this, fixed a couple of leaks, and it now seems like this patch is a big win for performance in the most common cases.

There are two options in RulesConfig which can have an impact on performance for large projects:

  1. fullModuleGraph (called checkForImportCycles in baseline)
  2. enableTemplateHaskell

Both of these options are enabled by default.

In baseline HLS, checkForImportCycles and enableTemplateHaskell were largely independent: checkForImportCycles controlled whether, on each call to ghcSessionDepsDefinition, we recomputed the entire module graph below the file (a quadratic, if not exponential, process because of all the duplicated work).
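
For intuition, consider a linear chain of N modules: redoing downsweep below every file visits 1 + 2 + … + N = N(N+1)/2 ModSummarys in total, and fan-out in the import graph only makes the duplication worse.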

With the patch, checkForImportCycles results in calls to GetModuleGraph instead, avoiding duplicated work.

enableTemplateHaskell always entails calls to GetModuleGraph, because we need to decide whether anything above a file in the project-wide graph requires that file to be compiled.

With the patch, enabling either of these options results in calls to GetModuleGraph.

I will be testing two configurations, NOGRAPH={fullModuleGraph=False, enableTemplateHaskell=False} and GRAPH={fullModuleGraph=True, enableTemplateHaskell=True}, as these are the two canonical configurations that decide if GetModuleGraph is called.

I tested this patch with a project generated via the following script:

#!/usr/bin/env bash
# Generate $DEPTH layers of modules with $WIDTH modules on each layer
# Every module on layer N imports all the modules on layer N-1
# MultiLayerModules.hs imports all the modules from the last layer
DEPTH=15
WIDTH=40
for i in $(seq -w 1 $WIDTH); do
  echo "module DummyLevel0M$i where" > DummyLevel0M$i.hs;
done
for l in $(seq 1 $DEPTH); do
  for i in $(seq -w 1 $WIDTH); do
    echo "module DummyLevel${l}M$i where" > DummyLevel${l}M$i.hs;
    for j in $(seq -w 1 $WIDTH); do
      echo "import DummyLevel$((l-1))M$j" >> DummyLevel${l}M$i.hs;
    done
  done
done
echo "module MultiLayerModules where" > MultiLayerModules.hs
for j in $(seq -w 1 $WIDTH); do
  echo "import DummyLevel${DEPTH}M$j" >> MultiLayerModules.hs;
done
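
With the defaults above (DEPTH=15, WIDTH=40), the script generates 16 × 40 + 1 = 641 modules (layers 0 through 15 plus MultiLayerModules) and 15 × 40 × 40 + 40 = 24,040 import edges.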

I focused testing on the two extremes: DummyLevel0M1 (which imports nothing) and MultiLayerModules (which imports everything).

Here are the results of running haskell-language-server from the command line on these files:

commit               | config  | file              | time    | bytes allocated      | peak memory
-------------------- | ------- | ----------------- | ------- | -------------------- | -----------
master               | GRAPH   | MultiLayerModules | 17.288s | 47,771,227,040 bytes | 1035 MiB
wip/module-graph-mem | GRAPH   | MultiLayerModules | 4.222s  | 14,076,683,792 bytes | 712 MiB
master               | NOGRAPH | MultiLayerModules | 6.318s  | 15,325,017,376 bytes | 952 MiB
wip/module-graph-mem | NOGRAPH | MultiLayerModules | 6.126s  | 15,328,584,272 bytes | 960 MiB
master               | GRAPH   | DummyLevel0M1     | 0.396s  | 572,811,704 bytes    | 161 MiB
wip/module-graph-mem | GRAPH   | DummyLevel0M1     | 1.205s  | 4,113,919,264 bytes  | 302 MiB
master               | NOGRAPH | DummyLevel0M1     | 0.397s  | 572,832,632 bytes    | 161 MiB
wip/module-graph-mem | NOGRAPH | DummyLevel0M1     | 0.397s  | 572,799,488 bytes    | 161 MiB

So this branch is much faster in the (default) GRAPH config for MultiLayerModules, slower for DummyLevel0M1, and equivalent in the NOGRAPH config for both files.

However, this is not the full story for the GRAPH config on DummyLevel0M1. Calling haskell-language-server on a file from the command line only runs the TypeCheck, GetHieAst and GenerateCore rules. None of these rules require us to know whether we need to compile a file; it is only GetModIface that forces us to compute the full module graph in HLS master.
A more realistic test would therefore force at least one GetModIface call.

Testing DummyLevel1M1, one level up, so that GetModuleGraph is also called due to needing GetModIface on DummyLevel0M*, we get the following results:

commit               | config | file          | time   | bytes allocated     | peak memory
-------------------- | ------ | ------------- | ------ | ------------------- | -----------
master               | GRAPH  | DummyLevel1M1 | 1.456s | 4,531,463,816 bytes | 416 MiB
wip/module-graph-mem | GRAPH  | DummyLevel1M1 | 1.461s | 4,502,678,944 bytes | 383 MiB

Changing the haskell-language-server command-line check command to instead call GetModIface on its arguments also equalises the performance of both branches on DummyLevel0M1.

So this branch is equivalent to or outperforms master in most realistic scenarios and configurations.

@wz1000 wz1000 force-pushed the wip/module-graph-mem branch 3 times, most recently from 7f06097 to 0220797 on July 12, 2023 09:03
@pepeiborra

Thanks for pushing ahead with this branch and sharing the revised benchmarks!

My only concern is that the benchmarks specifically test startup scenarios, and I would like to see bench numbers for the edit-module-headers scenario that is expected to regress:

this could be solved easily by re-introducing
a version of GetDependencyInformation with early cutoff that essentially returns the result of GetModuleGraph
but includes the hash of only the ModSummarys in the downward dependency closure of the file.

Did you look into this?

wz1000 commented Jul 24, 2023

@pepeiborra I've added a benchmark that edits the header, and also added my script to the benchmarks.

After benchmarking, I made a few more rules use early cutoff; this dramatically improves edit performance for all scenarios other than the DummyLevel0M01 case.

I tested three commits: baseline (master), module-graph (this patch without the early-cutoff improvements) and module-graph-cutoff (this patch). Here are the results of benchmarking:

DummyLevel0M1:

version               configuration   name          success   samples   startup   setup   userT   delayedT   1stBuildT   avgPerRespT   totalT   rulesBuilt   rulesChanged   rulesVisited   rulesTotal   ruleEdges   ghcRebuilds   maxResidency   allocatedBytes
baseline              All             edit-header   True      50        0.32      0.00    0.50    66.47      66.46       0.01          66.97    10           9              13             11504        572850      43            912MB          51438MB
module-graph          All             edit-header   True      50        0.13      0.00    0.45    3.05       3.05        0.01          3.51     10           9              14             10906        164129      1             389MB          17112MB
module-graph-cutoff   All             edit-header   True      50        0.12      0.00    0.44    3.23       3.22        0.01          3.67     10           9              15             11467        141453      82            423MB          17867MB
baseline              All             edit          True      50        0.25      0.00    0.49    67.01      67.00       0.01          67.51    10           9              13             11546        572808      1             849MB          50631MB
module-graph          All             edit          True      50        0.12      0.00    0.46    3.15       3.14        0.01          3.61     10           9              14             10826        164209      81            399MB          17982MB
module-graph-cutoff   All             edit          True      50        0.09      0.00    0.40    3.02       3.01        0.01          3.42     10           9              15             11467        141453      82            423MB          17754MB

(benchmark chart: DummyLevel0M01-edit)

DummyLevel0M01:

version               configuration   name          success   samples   startup   setup   userT   delayedT   1stBuildT   avgPerRespT   totalT   rulesBuilt   rulesChanged   rulesVisited   rulesTotal   ruleEdges   ghcRebuilds   maxResidency   allocatedBytes
baseline              All             edit-header   True      50        0.19      0.00    34.70   66.20      66.19       0.71          100.91   651          9              775            11467        572886      81            781MB          71020MB
module-graph          All             edit-header   True      50        0.11      0.00    34.50   2.95       2.98        0.70          37.45    651          9              855            10906        164129      1             389MB          37691MB
module-graph-cutoff   All             edit-header   True      50        0.14      0.00    2.26    3.06       3.05        0.05          5.33     10           9              856            11469        141450      81            428MB          27791MB
baseline              All             edit          True      50        0.20      0.00    33.97   65.32      65.31       0.69          99.30    651          9              775            11546        572808      1             852MB          70324MB
module-graph          All             edit          True      50        0.12      0.00    36.88   3.26       3.25        0.75          40.14    651          9              855            10826        164208      82            389MB          38301MB
module-graph-cutoff   All             edit          True      50        0.28      0.00    2.22    2.95       2.94        0.05          5.18     10           9              856            11548        141372      1             429MB          27183MB

(benchmark chart: DummyLevel1M01NoTH-edit)

MultiLayerModules:

version               configuration   name          success   samples   startup   setup   userT   delayedT   1stBuildT   avgPerRespT   totalT   rulesBuilt   rulesChanged   rulesVisited   rulesTotal   ruleEdges   ghcRebuilds   maxResidency   allocatedBytes
baseline              All             edit-header   True      50        0.25      0.00    55.89    69.19      69.18       0.38          125.08   668          22             4393           11550        572758      180           865MB          205450MB
module-graph          All             edit-header   True      50        0.16      0.00    59.90    2.83       2.82        0.41          62.73    665          20             3867           10910        164158      104           628MB          103989MB
module-graph-cutoff   All             edit-header   True      50        0.21      0.00    16.37    3.16       3.15        0.11          19.53    19           16             3861           11552        141322      182           578MB          98625MB
baseline              All             edit          True      50        0.26      0.00    118.19   66.16      66.15       0.80          184.35   1871         628            4394           11550        572238      101           1221MB         685464MB
module-graph          All             edit          True      50        0.18      0.00    99.24    3.21       3.21        0.68          102.46   1871         628            3871           10910        163559      180           1081MB         517165MB
module-graph-cutoff   All             edit          True      50        0.27      0.00    64.15    2.98       3.03        0.44          67.14    1221         621            3861           11554        140802      134           1097MB         530448MB

(benchmark chart: MultiLayerModules-edit)

pure (fingerprintToBS $ Util.fingerprintFingerprints $ map (maybe fingerprint0 msrFingerprint) msrs, processDependencyInformation rawDepInfo bm mg)
@pepeiborra

Why is the fingerprint based on the module summaries? I can picture changes to module summaries that would not alter the module graph.

@wz1000

Yes, this may be too coarse. I can try to implement a finer-grained hash, though I'm not sure how much it would improve things.

@wz1000

I don't think we want to do this, because now the dependency information contains a GHC.ModuleGraph, which is essentially a [ModSummary].

(Resolved review comment on ghcide-bench/src/Experiments/Types.hs; now outdated.)
@pepeiborra

We may still want a per-module early cutoff, right? The new benchmark scenario edit-header doesn't alter the module graph, so it doesn't tell us what happens when a portion of the module graph is altered. Previously it would only dirty the modules in that portion, whereas now (even with the ModuleGraph early cutoff) it will dirty all the modules.

Concretely, I'm suggesting to keep the GetDependencyInformation rule, reimplemented as a lookup on GetModuleGraph with early cutoff. I thought this was your original proposal. I might be wrong and this might be unnecessary; only benchmarks can tell.

wz1000 commented Jul 24, 2023

We may still want a per-module early cutoff, right? The new benchmark scenario edit-header doesn't alter the module graph, so it doesn't tell us what happens when a portion of the module graph is altered. Previously it would only dirty the modules in that portion, whereas now (even with the ModuleGraph early cutoff) it will dirty all the modules.

Concretely, I'm suggesting to keep the GetDependencyInformation rule, reimplemented as a lookup on GetModuleGraph with early cutoff. I thought this was your original proposal. I might be wrong and this might be unnecessary; only benchmarks can tell.

GetDependencyInformation was only used by ReportImportCycles, and I've now made ReportImportCycles use early cutoff (with a perfect hash, since the result is just unit). So ReportImportCycles can never dirty any modules. However, GetModuleGraph can still dirty ReportImportCycles. I'm not sure if adding an indirection through GetDependencyInformation in between would really help, since inspecting whether the module graph contains a cycle involving the given file should not be too slow, certainly not slower than recomputing the hash for a restricted subset of the module graph (as we would need to do to implement GetDependencyInformation as you suggest).
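
For concreteness, a rough sketch of that cycle check (importsOf is a hypothetical accessor over the shared module graph, not ghcide's actual code): a file lies on an import cycle exactly when it is reachable from one of its own direct imports.

import qualified Data.Set as S

-- True iff 'm' lies on an import cycle: some direct import of 'm' can reach
-- 'm' again by following imports. 'importsOf' is a hypothetical helper.
partOfCycle :: Ord mod => (mod -> [mod]) -> mod -> Bool
partOfCycle importsOf m = any reaches (importsOf m)
  where
    reaches start = go S.empty [start]
    go _ [] = False
    go seen (x:xs)
      | x == m            = True
      | x `S.member` seen = go seen xs
      | otherwise         = go (S.insert x seen) (importsOf x ++ xs)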

Do you still think we need GetDependencyInformation?

@wz1000 wz1000 force-pushed the wip/module-graph-mem branch 5 times, most recently from 17a9330 to 366e23e on July 26, 2023 13:25
(do queueForEvaluation st nfp
    setSomethingModified VFSUnmodified st [toKey IsEvaluating nfp] "Eval")
(do unqueueForEvaluation st nfp
    setSomethingModified VFSUnmodified st [toKey IsEvaluating nfp] "Eval")
@wz1000

@pepeiborra do you think we need this setSomethingModified call after calling unqueueForEvaluation?

@wz1000 wz1000 added and then removed the merge me label Aug 2, 2023
@mergify mergify bot merged commit 9effc56 into master Aug 4, 2023
46 of 47 checks passed
@fendor fendor mentioned this pull request Aug 8, 2023