Investigate pantsd memory usage increases more than expected #7647

Closed
patliu85 opened this issue May 1, 2019 · 5 comments

Comments

@patliu85
Contributor

patliu85 commented May 1, 2019

As a result of #7596, pantsd's memory usage increases with each run of the same command, reaching 55% more memory usage after 10 runs.

@patliu85
Contributor Author

patliu85 commented May 1, 2019

I used the Python memory_profiler module (https://github.com/pythonprofilers/memory_profiler or https://pypi.org/project/memory-profiler/) to profile pantsd's memory usage while running the following command 10 times:

[ pants (pantsd-sans-forking)] $ for i in $(seq 1 10); do ./pants --enable-pantsd filter 3rdparty/jvm/commons-io:: ; done

Memory usage after the first run was 114.2 MB,
after the second run 119.2 MB,
and after the tenth run 156.3 MB.
So on average, memory usage increased by 4.2 MB, or about 3.67%, after each run.
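
As a rough check of the figures above, the arithmetic can be reproduced in a few lines. This is only a sketch: the variable names are mine, and the inputs are the three readings quoted above. Dividing the total growth by ten runs gives the 4.2 MB figure; dividing by the nine increments between the first and tenth run gives a slightly higher number.

```python
# Readings quoted in the comment above (MB).
first_run_mb = 114.2
tenth_run_mb = 156.3
runs = 10

# Total growth between the first and the tenth run.
total_growth_mb = tenth_run_mb - first_run_mb

# Average growth if the total is spread over all ten runs.
per_run_mb = total_growth_mb / runs

# Average growth if the total is spread over the nine increments
# that separate run 1 from run 10.
per_increment_mb = total_growth_mb / (runs - 1)

# Per-run growth relative to the first-run reading, in percent.
per_run_pct = 100 * per_run_mb / first_run_mb

print(f"total growth:      {total_growth_mb:.1f} MB")
print(f"per run (/10):     {per_run_mb:.1f} MB ({per_run_pct:.1f}% of first run)")
print(f"per increment (/9): {per_increment_mb:.1f} MB")
```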

Here is a list of method calls that potentially increase memory usage in each run
(a chain of method calls from top to bottom):

  1. In DaemonPantsRunner(ProcessManager) class run(self) method in daemon_pants_runner.py:
    (+ 0.2MB):
    with self.nailgunned_stdio(self._maybe_shutdown_socket, self._env) as finalizer, \
          hermetic_environment_as(**self._env), \
          encapsulated_global_logger(), \
          ExceptionSink.exiter_as(self._exiter) as daemon_exiter:
    (+ 2.5MB):
        runner = LocalPantsRunner.create(
          #NoopExiter(),
          daemon_exiter,
          self._args,
          self._env,
          self._target_roots,
          self._graph_helper,
          self._options_bootstrapper,
        )
    (+ 0.3MB):
        exit_code = runner.run()

  2. In LocalPantsRunner class create method in local_pants_runner.py:
    (+ 1.5MB):
    options, build_config, options_bootstrapper = cls.parse_options(
      args,
      env,
      options_bootstrapper=options_bootstrapper,
    )
    (+ 0.8MB):
    if global_options.verify_config:
      options_bootstrapper.verify_configs_against_options(options)

  3. In LocalPantsRunner class parse_options method in local_pants_runner.py:
    (+ 0.4MB):
    options_bootstrapper = options_bootstrapper or OptionsBootstrapper.create(args=args, env=env)
    (+ 0.1MB):
    build_config = BuildConfigInitializer.get(options_bootstrapper)
    (+ 1.6MB):
    options = OptionsInitializer.create(options_bootstrapper, build_config)

  4. In LocalPantsRunner class _run(self) method in local_pants_runner.py:
    (+ 0.3MB):
    self._set_start_time(time.time())
    (+ 0.4MB):
    goal_runner_result = self._maybe_run_v1()

  5. In OptionsInitializer class create method in options_initializer.py:
    (+ 0.1MB):
    global_bootstrap_options = options_bootstrapper.get_bootstrap_options().for_global_scope()
    (+ 1.5MB):
    options = cls._construct_options(options_bootstrapper, build_configuration)
    (+ 0.1MB):
    GlobalOptionsRegistrar.validate_instance(options.for_global_scope())

  6. In OptionsInitializer class _construct_options method in options_initializer.py:
    (+ 1.5MB):
    return options_bootstrapper.get_full_options(known_scope_infos)

  7. In OptionsBootstrapper class get_full_options method in options_bootstrapper.py:
    (+ 1.3MB):
    return self._full_options(tuple(sorted(set(known_scope_infos))))

  8. In OptionsBootstrapper class _full_options method in options_bootstrapper.py:
    (+ 1.3MB):
    The @memoized_method decorator on _full_options method (or the memoize(*args, **kwargs)) in util/memo.py
    (+ 0.8MB):
    options = Options.create(self.env, self.config, known_scope_infos, args=self.args, bootstrap_option_values=bootstrap_option_values)

Based on the chain of method calls above, it seems that the _full_options method in the OptionsBootstrapper class accounts for about half of the memory leak (the per-run increase in memory usage), and that other methods along the chain, including those not listed here, probably account for the other half.
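
To illustrate how a memoizing decorator could retain memory across runs, here is a minimal sketch. It is not the code in util/memo.py: the `memoized_method` below, `FakeBootstrapper`, and `simulate_runs` are hypothetical stand-ins, and the assumption that each run presents a new cache key is mine, not something confirmed in this ticket.

```python
import functools


def memoized_method(func):
    """Hypothetical memoizer: caches results per (instance, args) forever."""
    cache = {}  # lives as long as the decorated function itself

    @functools.wraps(func)
    def wrapper(self, *args):
        key = (self, args)  # strong reference to the instance and its args
        if key not in cache:
            cache[key] = func(self, *args)
        return cache[key]

    wrapper.cache = cache  # exposed only so the sketch can inspect it
    return wrapper


class FakeBootstrapper:
    """Hypothetical stand-in for an options bootstrapper."""

    def __init__(self, args):
        self.args = tuple(args)

    @memoized_method
    def full_options(self, known_scopes):
        # Pretend this builds a large options object.
        return {scope: list(self.args) for scope in known_scopes}


def simulate_runs(n):
    """Each simulated run builds a new instance, so every call is a cache miss."""
    for _ in range(n):
        bootstrapper = FakeBootstrapper(["filter", "3rdparty/jvm/commons-io::"])
        bootstrapper.full_options(("", "filter"))
    return len(FakeBootstrapper.full_options.cache)


cache_size = simulate_runs(10)
print(f"cache entries after 10 runs: {cache_size}")
```

Because the cache holds strong references to both the instance and the result, nothing inserted is ever released in this sketch; a bounded cache or weak references would be the usual mitigations.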

@patliu85
Contributor Author

patliu85 commented May 1, 2019

I also used the Python objgraph module (https://mg.pov.lt/objgraph/objgraph.html) to generate object graphs. The number of new objects (grouped by data type) increases between runs:

  • After first run
=====================================================================
Type                   Old_ids  Current_ids      New_ids Count_Deltas
=====================================================================
function                     0        27767       +27767       +27767
dict                         0        24330       +24330       +24330
tuple                        0        20524       +20524       +20524
list                         0        15922       +15922       +15922
weakref                      0        10294       +10294       +10294
set                          0        10184       +10184       +10184
WeakSet                      0         3851        +3851        +3851
cell                         0         3427        +3427        +3427
type                         0         3080        +3080        +3080
getset_descriptor            0         3071        +3071        +3071
=====================================================================
  • Subsequent run
=======================================================================
Type                     Old_ids  Current_ids      New_ids Count_Deltas
=======================================================================
dict                       24330        30869        +6763        +6539
list                       15922        21712        +5952        +5790
set                        10184        13257        +3107        +3073
OptionHistoryRecord         2971         5652        +2681        +2681
tuple                      20524        23017        +2545        +2493
ScopeInfo                   1686         3370        +1684        +1684
Parser                      1684         3366        +1682        +1682
OptionHistory               1373         2624        +1251        +1251
RankedValue                 1368         2614        +1246        +1246
method                      1474         2110         +665         +636
=======================================================================
  • Options object graph
    [image: Options]

  • OptionsBootstrapper object graph
    [image: OptionsBootstrapper_10]
    [image: OptionsBootstrapper]
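
The per-type count deltas in the tables above can be approximated with the standard library alone. The following is a sketch of the idea, not objgraph's implementation; `type_counts`, `growth`, `OptionRecord`, and `fake_run` are hypothetical names introduced for illustration.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, GC-tracked objects grouped by type name."""
    gc.collect()  # drop collectable garbage so only retained objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=10):
    """Return (type name, current count, delta) rows, largest growth first."""
    rows = [
        (name, count, count - before.get(name, 0))
        for name, count in after.items()
        if count > before.get(name, 0)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)[:limit]


class OptionRecord:
    """Hypothetical object that a run allocates and never releases."""

    def __init__(self, value):
        self.value = value


retained = []  # simulates state that survives between runs


def fake_run():
    retained.extend(OptionRecord(i) for i in range(1000))


before = type_counts()
fake_run()
after = type_counts()

for name, count, delta in growth(before, after):
    print(f"{name:<24}{count:>10}{delta:>+10}")
```

Taking one snapshot per daemon run and diffing consecutive snapshots gives a table of the same shape as the ones above, limited to objects the garbage collector tracks.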

@stuhood
Sponsor Member

stuhood commented May 2, 2019

A ticket that is definitely contributing to this memory usage (although it's not clear how much yet) is: #6555.

@ity ity added this to TODO in Pants Daemon via automation May 3, 2019
@stuhood
Sponsor Member

stuhood commented May 3, 2019

In the randomized objgraph sample, we also see #5668:
[image: Screen Shot 2019-05-02 at 5 07 12 PM]

...i.e., "all of the Targets in a BuildGraph reference the buildgraph".

@illicitonion mentioned that he had seen indications that modern Python 3 does support breaking reference cycles that pass through built-in collections.
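
A small experiment along these lines can check that claim on a given interpreter. This is a sketch under assumptions: the `BuildGraph` and `Target` classes below are toy stand-ins, not the pants classes, and the behavior shown is that of CPython's cycle collector.

```python
import gc
import weakref


class BuildGraph:
    """Toy stand-in: holds its targets in a built-in list."""

    def __init__(self):
        self.targets = []


class Target:
    """Toy stand-in: keeps a back-reference to the graph, forming a cycle."""

    def __init__(self, name, build_graph):
        self.name = name
        self.build_graph = build_graph


def graph_is_collected():
    """Return (alive before gc.collect(), gone after gc.collect())."""
    gc.disable()  # keep automatic collection from interfering with the check
    try:
        graph = BuildGraph()
        graph.targets.append(Target("a", graph))  # graph -> list -> target -> graph
        probe = weakref.ref(graph)
        del graph  # reference counting alone cannot free the cycle
        alive_before = probe() is not None
        gc.collect()  # an explicit collection still runs while gc is disabled
        gone_after = probe() is None
        return alive_before, gone_after
    finally:
        gc.enable()


print(graph_is_collected())
```

If the second value is True, the cycle through the list was reclaimed, which would suggest such cycles delay rather than prevent reclamation, as long as nothing outside the cycle still references the graph.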

@patliu85
Contributor Author

patliu85 commented May 3, 2019

Two branches were forked from the pantsd-sans-forking branch:

  1. pliu/pantsd-sans-forking-using-memprofiler branch, to support using the memory_profiler module
  2. pliu/pantsd-sans-forking-using-objgraph branch, to support using the objgraph module

@stuhood stuhood closed this as completed May 7, 2020
Pants Daemon automation moved this from TODO to Done May 7, 2020