
Arbitrarily high physical memory usage while scrolling #4915

Closed

nnethercote opened this issue Feb 13, 2015 · 24 comments

@nnethercote (Contributor) commented Feb 13, 2015

On both Linux and Mac I can get physical memory usage to become arbitrarily high by repeatedly scrolling up and down the "Guardians of the Galaxy" Wikipedia page from servo-static-suite. Here's some sample -m output on Linux (with my patches from #4894 applied):

_size (MiB)_: _category_
     4530.26: vsize
     1508.38: resident
     1513.60: resident-according-to-smaps
     1465.70: - anonymous (rw-p)
       25.54: - /home/njn/moz/servo/components/servo/target/servo (r-xp)
        6.58: - [heap] (rw-p)
        6.29: - other
        3.55: - /usr/lib/x86_64-linux-gnu/dri/i965_dri.so (r-xp)
        1.36: - /lib/x86_64-linux-gnu/libc-2.19.so (r-xp)
        0.92: - /home/njn/moz/servo/components/servo/target/servo (r--p)
        0.80: - /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.20 (r-xp)
        0.78: - /usr/lib/x86_64-linux-gnu/libX11.so.6.3.0 (r-xp)
        0.68: - /lib/x86_64-linux-gnu/libm-2.19.so (r-xp)
        0.48: - /lib/x86_64-linux-gnu/libglib-2.0.so.0.4200.1 (r-xp)
        0.46: - /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (r-xp)
        0.45: - /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0 (r-xp)
      124.09: system-heap-allocated
       47.72: jemalloc-heap-allocated
       50.70: jemalloc-heap-active
      168.00: jemalloc-heap-mapped

The relatively low system-heap-allocated and jemalloc-heap-allocated numbers indicate that the memory is not on either heap, so it must be coming from mmap(). I did some measurement with vmmap on Mac and it gave me similar indications that it's not heap memory.

I normally use Massif to investigate such cases but I'm getting various task failures when I run Servo under Massif and view this page, alas.

@nnethercote (Contributor, Author) commented Feb 13, 2015

Whoa:

     7832.25: vsize
     5002.53: resident

I used GDB to see where the mmap calls are occurring and they all seem to be coming from jemalloc. So I now suspect that the jemalloc measurements from the memory profiler are bogus.

@nnethercote (Contributor, Author) commented Feb 13, 2015

I've confirmed that the current jemalloc measurements are bogus. #4917 has a fix. With that fix applied, it's clear that the increase is coming from jemalloc, i.e. from allocations within Rust code. Here's a particularly horrifying case I got with Wikipedia:

_category_              : _size (MiB)_
vsize                   :     12433.65
resident                :      9541.64
system-heap-allocated   :        51.40
jemalloc-heap-allocated :      9115.59
jemalloc-heap-active    :      9119.32
jemalloc-heap-mapped    :      9496.00
@larsbergstrom (Contributor) commented Feb 13, 2015

@pcwalton I don't know if you're following this investigation, but it looks like we have a horrific number of allocations coming from Rust code during scrolling.

@nnethercote (Contributor, Author) commented Feb 14, 2015

I'm hoping to get some detailed profiling data next week that will identify which parts of the Rust code are responsible.

@nnethercote (Contributor, Author) commented Feb 16, 2015

I built a more fine-grained memory profiler out of balsa wood, rubber bands, and GDB. I've used it to find a few things.

At first I thought the increasing RSS was due to traversal.rs's bloom filters being cloned frequently (#4938) but that appears to just cause heap churn without increasing memory usage.

I'm now most suspicious of this stack trace, which occurred 128 times, allocating 4 MiB each time, while scrolling once through the Guardians of the Galaxy page.

Breakpoint 1, 0x0000555556fa5410 in je_chunk_alloc_mmap ()
je_chunk_alloc_mmap:  4194304
#0  0x0000555556fa5410 in je_chunk_alloc_mmap ()
#1  0x0000555556f8dae0 in chunk_alloc_core ()
#2  0x0000555556f8dbbf in je_chunk_alloc_arena ()
#3  0x0000555556f8f05e in arena_chunk_init_hard ()
#4  0x0000555556f914cc in arena_bin_nonfull_run_get ()
#5  0x0000555556f9228a in je_arena_tcache_fill_small ()
#6  0x0000555556f8b33f in je_tcache_alloc_small_hard ()
#7  0x0000555556f86d22 in je_mallocx ()
#8  0x0000555555698c76 in heap::imp::allocate::h14b739cd8e4bb4f5Tfa ()
#9  0x0000555555698bc2 in heap::allocate::h279ce9bd90904ff1gaa ()
#10 0x000055555569f189 in heap::exchange_malloc::h7e1f19f5219aebd3ica ()
#11 0x000055555575551f in sync::mpsc::mpsc_queue::Node$LT$T$GT$::new::h16509210899478187253 ()
#12 0x00005555557c7e3d in sync::mpsc::mpsc_queue::Queue$LT$T$GT$::push::h12151795302260744675 ()
#13 0x00005555557c7a9a in sync::mpsc::shared::Packet$LT$T$GT$::send::h2045489467298542895 ()
#14 0x00005555557c5fd3 in sync::mpsc::Sender$LT$T$GT$::send::h6389378511748171549 ()
#15 0x000055555586bab1 in compositor::IOCompositor$LT$Window$GT$::handle_window_message::h10879893934503735665 ()
#16 0x0000555555841499 in compositor::IOCompositor$LT$Window$GT$.CompositorEventListener::handle_event::h7545344722760054378 ()
#17 0x000055555588056a in Browser$LT$Window$GT$::handle_event::h2440061703768279017 ()
#18 0x000055555569ab7a in servo::main () at main.rs:144
#19 0x0000555556f82ec9 in rust_try_inner ()

This may be misleading because this stack trace may only be responsible for a small fraction of each 4 MiB chunk. However, there were only 439 such chunks allocated during the page scrolling, so it seems likely that this really is accounting for a decent fraction (~128/439) of allocations. It's not clear that these allocations are long-lived, but it's my best lead at the moment.

@nnethercote (Contributor, Author) commented Feb 16, 2015

In the same run, the following stack trace also showed up 29 times (out of the 439 chunks).

Breakpoint 1, 0x0000555556fa5410 in je_chunk_alloc_mmap ()
je_chunk_alloc_mmap:  4194304
#0  0x0000555556fa5410 in je_chunk_alloc_mmap ()
#1  0x0000555556f8dae0 in chunk_alloc_core ()
#2  0x0000555556f8dbbf in je_chunk_alloc_arena ()
#3  0x0000555556f8f05e in arena_chunk_init_hard ()
#4  0x0000555556f93f92 in je_arena_malloc_large ()
#5  0x0000555556f8c952 in je_tcache_create ()
#6  0x0000555556f86ee3 in je_mallocx ()
#7  0x0000555556f49acb in ffi::c_str::CString::from_slice::h28d59eb74c6ba9590kc ()
#8  0x0000555556f54e83 in sys_common::thread_info::set::ha5d700b3aeb444c3scE ()
#9  0x0000555556a4cbf7 in thread::Builder::spawn_inner::closure.13027 ()
#10 0x0000555556a4fa54 in thunk::Thunk$LT$$LP$$RP$$C$$u20$R$GT$::new::closure.13204 ()
#11 0x0000555556a4f972 in thunk::F.Invoke$LT$A$C$$u20$R$GT$::invoke::h947583999620215949 ()
#12 0x0000555556f67466 in sys::thread::thread_start::hc5bbfed3723b5236ICB ()
#13 0x00007ffff71e10a5 in start_thread (arg=0x7fffec3ff700) at pthread_create.c:309
#14 0x00007ffff69e688d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
@jdm (Member) commented Feb 16, 2015

That last one is just the code that gives threads names :(

@larsbergstrom (Contributor) commented Feb 16, 2015

Is it possible the default minimum allocation size for jemalloc is 4 MB? If so, it would make sense that it's happening in the code that gives the threads names, as that may be the first thing we do after spawning the pthread that executes allocating Rust code.

As to the other one, it looks like the source of those allocations is probably

self.pending_scroll_events.push(ScrollEvent {
delta: delta,
cursor: cursor,
});
(which has been inlined as the send/push in handle_window_message in the stack above). I wonder if we're generating a bazillion scroll events...
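The failure mode being hypothesized here -- an undrained, unbounded channel -- can be sketched with the standard library's std::sync::mpsc. This is a minimal stand-in illustration, not Servo's actual compositor channel:

```rust
use std::sync::mpsc;

// Sketch of the suspected failure mode: std::sync::mpsc channels are
// unbounded, so a sender that outpaces its receiver heap-allocates one
// queue node per message and the backlog grows without limit.
fn flood(n: u32) -> (mpsc::Sender<u32>, mpsc::Receiver<u32>) {
    let (tx, rx) = mpsc::channel();
    for i in 0..n {
        // Each send allocates a queue Node<T> (the allocation site seen
        // in the stacks); nothing is freed until the receiver drains it.
        tx.send(i).unwrap();
    }
    (tx, rx)
}

fn main() {
    let (_tx, rx) = flood(100_000);
    // 100_000 nodes are live at this point; draining frees them.
    assert_eq!(rx.try_recv().unwrap(), 0);
}
```

If events are produced far faster than the receiving end processes them (e.g. during a mouse-move burst), this queue is where the memory goes.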

@narfg commented Feb 16, 2015

I just want to add that you can also drive memory usage arbitrarily high by just moving the mouse inside the window. It doesn't seem to be a scrolling-specific issue.

@nnethercote (Contributor, Author) commented Feb 16, 2015

If jemalloc is using thread-local heaps, as the tcache naming suggests it is -- but I'm not certain about that -- then, yes, the ffi::c_str::CString::from_slice stack would be explained by that being the first allocation done for each new thread.

@nnethercote (Contributor, Author) commented Feb 17, 2015

glandium confirmed for me that the tcache stuff is thread-local state used by jemalloc to provide lockless fast paths. So it's safe to ignore.

@nnethercote (Contributor, Author) commented Feb 17, 2015

I wonder if we're generating a bazillion scroll events...

I added a debugging println and only got ~1,000 when scrolling through the GotG page, which doesn't seem that high.

@nnethercote (Contributor, Author) commented Feb 17, 2015

I just discovered that the jemalloc used by Rust has good Valgrind hooks (unlike the jemalloc currently used in Firefox), and so running Massif in vanilla mode gives good data about live blocks. And now I have a smoking gun:

97.26% (5,115,008,997B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
92.32% (4,855,251,232B) 0x1B3AE51: je_mallocx (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,258,384B) 0x24CCC4: heap::imp::allocate::h14b739cd8e4bb4f5Tfa (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,258,384B) 0x24CC10: heap::allocate::h279ce9bd90904ff1gaa (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,254,072B) 0x2531D7: heap::exchange_malloc::h7e1f19f5219aebd3ica (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,920B) 0x30956D: sync::mpsc::mpsc_queue::Node$LT$T$GT$::new::h15915457416600639319 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,920B) 0x37BE8B: sync::mpsc::mpsc_queue::Queue$LT$T$GT$::push::h2895255413455116970 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,920B) 0x37BAE8: sync::mpsc::shared::Packet$LT$T$GT$::send::h14191281059685575982 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,920B) 0x37A021: sync::mpsc::Sender$LT$T$GT$::send::h5383973159704401861 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x41FAFF: compositor::IOCompositor$LT$Window$GT$::handle_window_message::h17864685478202790988 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x3F54E7: compositor::IOCompositor$LT$Window$GT$.CompositorEventListener::handle_event::h3511795649320658099 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x4345B8: Browser$LT$Window$GT$::handle_event::h15273756667072440557 (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x24EBC8: main::hb0fefe4297de4b03Taa (main.rs:144)
83.89% (4,412,249,600B) 0x1B371E7: rust_try_inner (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x1B371D4: rust_try (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x1B2ED4B: rt::lang_start::h9168c35e582ccd4cMQF (in /home/njn/moz/servo/components/servo/target/servo)
83.89% (4,412,249,600B) 0x24EED3: main (in /home/njn/moz/servo/components/servo/target/servo)

Which is the same one I was suspicious about yesterday.

I did some more digging with println!() statements and determined that the MouseWindowMoveEventClass and Quit events are the problems. Scroll is not the problem.

First: MouseWindowMoveEventClass. @narfg is right -- scrolling isn't necessary to reproduce this; just moving the mouse around within the window suffices. Each time you do that you only get a few dozen or maybe a hundred such events, but that's enough to send the RSS up by 10s or 100s of MiBs. It'll eventually plateau, until the next mouse movement.

Second: Quit. I've noticed that quitting (by clicking on the 'X' in the window) can be slow, and that RSS can spike while I'm waiting. If I quit with a minimum of mouse movements, I might get a few thousand Quit events. But if I move the mouse around a lot just before quitting (thus generating many MouseWindowMoveEventClass events) I can get millions of Quit events, and this is the fastest way to get truly outrageous memory usage. I just did a run where I wiggled the mouse as fast as possible for about five seconds before quitting. After waiting a couple of minutes for quitting to finish I got bored and ctrl-c'd, but before I did that there had been 55 million Quit events and RSS had hit 20 GiB.

There's obviously a lot of badness going on here, but one thing that particularly concerns me is that the RSS never goes down.

@narfg commented Feb 17, 2015

I noticed that this issue affects only some sites. Mouse movement on en.wikipedia.org causes high memory usage; Hacker News doesn't show this behavior. Probably the main difference between these test cases is that mouse movement on Wikipedia causes all the text to move by a few pixels (issue #2894).

@nnethercote (Contributor, Author) commented Feb 17, 2015

One other thing I didn't mention: handle_window_message also receives millions of Idle events on the GotG page. These are no-ops but it still seems sub-optimal from a CPU/power viewpoint.

@pcwalton (Contributor) commented Feb 17, 2015

We should probably throttle those events.
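One hedged sketch of what such throttling could look like: coalescing redundant events before they ever reach the channel. The enum below mirrors the event names from the stack traces but is not Servo's actual window-event type:

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    Idle,
    MouseMove(i32, i32),
    Scroll(i32, i32),
    Quit,
}

// Collapse a burst of events: Idle is a no-op and can be dropped, only
// the latest mouse position matters, and a single Quit suffices.
fn coalesce(events: &[Event]) -> Vec<Event> {
    let mut out: Vec<Event> = Vec::new();
    for &ev in events {
        match ev {
            Event::Idle => {} // drop no-ops entirely
            Event::MouseMove(..) => match out.last_mut() {
                // Overwrite a pending move instead of queueing another.
                Some(last) if matches!(last, Event::MouseMove(..)) => *last = ev,
                _ => out.push(ev),
            },
            Event::Quit => {
                if !out.contains(&Event::Quit) {
                    out.push(ev); // one Quit is enough
                }
            }
            other => out.push(other),
        }
    }
    out
}

fn main() {
    let burst = [
        Event::MouseMove(1, 1),
        Event::Idle,
        Event::MouseMove(2, 2),
        Event::Quit,
        Event::Quit,
    ];
    assert_eq!(coalesce(&burst), vec![Event::MouseMove(2, 2), Event::Quit]);
}
```

This bounds the queue growth for the event classes identified above regardless of how fast the OS delivers input.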

@nnethercote (Contributor, Author) commented Feb 18, 2015

I did a Massif run where I quit Servo via ctrl-c rather than clicking the window's 'X' button. This avoids generating any Quit events, and mpsc_queue didn't show up at all. So the high mpsc_queue memory usage is caused by the flood of Quit events upon quitting, which makes a certain amount of sense -- millions of Quit events will overwhelm it.

And that means the memory increase caused by mouse moves has another cause. I don't have more data about that yet.

Finally, I tried changing handle_window_message to act on the first Quit event and then ignore all others, but it was flaky -- I got various task failures. Perhaps that doesn't interact well with the existing shutdown machinery, such as ShutdownState.

@nnethercote (Contributor, Author) commented Feb 23, 2015

I just learned that Servo builds are unoptimized by default, so I tried an optimized (release) build. Memory grows substantially faster while moving the mouse in an optimized build, but I have trouble reproducing the spike when quitting -- quitting is now mostly instantaneous (or nearly so).

@nnethercote (Contributor, Author) commented Feb 23, 2015

I did some more profiling with Massif on an optimized build. I kept moving the mouse until 'jemalloc-allocated' hit 3000 MiB, and then I did a Ctrl-C (which skips the sending of Quit events, AFAICT).

Here are the five allocation stacks that account for the most live memory just before shutdown. They all involve parallel.rs, and four of them involve recalc_style.

I looked closely at all of them but don't understand the code well enough to understand why all these allocations are being held onto indefinitely.

-----------------------------------------------------------------------------
09.38% (315,501,440B) 0x4D2446: css::matching::LayoutNode$LT$$u27$ln$GT$.PrivateMatchMethods::cascade_node_pseudo_element::h520664c44b061bbc1Wt (matching.rs:438)
  09.36% (314,695,520B) 0x4CD1BD: css::matching::LayoutNode$LT$$u27$ln$GT$.MatchMethods::cascade_node::h4925eca48b7a52c5m7t (matching.rs:637)
  09.36% (314,695,520B) 0x4C65A2: traversal::RecalcStyleForNode$LT$$u27$a$GT$.PreorderDomTraversal::process::h38470af429c5a97cfkr (traversal.rs:179)
  09.36% (314,695,520B) 0x4C6015: parallel::RecalcStyleForNode$LT$$u27$a$GT$.ParallelPreorderDomTraversal::run_parallel::h69697d0dc0aea4dbsdo (parallel.rs:103)
  09.36% (314,695,520B) 0x4C67F7: parallel::recalc_style::hd7cbb802977b7d35Xdo (parallel.rs:361)

-----------------------------------------------------------------------------
12.37% (416,066,304B) 0x4399D2: vec::Vec$LT$T$GT$::push::h14349886933206795279 (in /home/njn/moz/servo/components/servo/target/release/servo)
12.37% (416,066,304B) 0x4C4D59: inline::LineBreaker::push_fragment_to_line::h8427161d01247e36aBm (inline.rs:690)
  10.63% (357,472,768B) 0x40E510: inline::InlineFlow.Flow::assign_block_size::h7c5e192ad0efa521icn (inline.rs:587)
    10.38% (348,905,728B) 0x411893: flow::Flow::assign_block_size_for_inorder_child_if_necessary::h1847428863239950926 (flow.rs:216)
    10.38% (348,905,728B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
      07.87% (264,701,952B) 0x438FD0: flow::Flow::assign_block_size_for_inorder_child_if_necessary::h16812934223498796580 (list_item.rs:88)
      07.87% (264,701,952B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
      07.87% (264,701,952B) 0x3FF79D: block::BlockFlow.Flow::assign_block_size_for_inorder_child_if_necessary::h4a387f5b1efbd706i1b (block.rs:1672)
      07.87% (264,701,952B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
      07.87% (264,701,952B) 0x3FF79D: block::BlockFlow.Flow::assign_block_size_for_inorder_child_if_necessary::h4a387f5b1efbd706i1b (block.rs:1672)
      07.87% (264,701,952B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
      07.87% (264,701,952B) 0x3FF79D: block::BlockFlow.Flow::assign_block_size_for_inorder_child_if_necessary::h4a387f5b1efbd706i1b (block.rs:1672)
        07.85% (263,818,752B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
        07.85% (263,818,752B) 0x3FF79D: block::BlockFlow.Flow::assign_block_size_for_inorder_child_if_necessary::h4a387f5b1efbd706i1b (block.rs:1672)
        07.85% (263,818,752B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
        07.85% (263,818,752B) 0x3FF79D: block::BlockFlow.Flow::assign_block_size_for_inorder_child_if_necessary::h4a387f5b1efbd706i1b (block.rs:1672)
        07.85% (263,818,752B) 0x3FE516: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
        07.85% (263,818,752B) 0x3FF79D: block::BlockFlow.Flow::assign_block_size_for_inorder_child_if_necessary::h4a387f5b1efbd706i1b (block.rs:1672)
        07.85% (263,818,752B) 0x3FD0B8: block::BlockFlow.Flow::assign_block_size::he6b1638fe675aa9b82b (block.rs:915)
        07.85% (263,818,752B) 0x4C59EB: parallel::AssignISizes$LT$$u27$a$GT$.ParallelPreorderFlowTraversal::run_parallel::ha105d9a7ce1cce72Nbo (traversal.rs:346)
        07.85% (263,818,752B) 0x4C5DF7: parallel::assign_inline_sizes::h6c7dc4bd8626fc23lfo (parallel.rs:381)

-----------------------------------------------------------------------------
11.45% (384,945,088B) 0xA9C362: vec::Vec$LT$T$GT$::push::h10553047079115022048 (in /home/njn/moz/servo/components/servo/target/release/servo)
  10.91% (366,747,520B) 0xA9BFAD: text::text_run::TextRun::break_and_shape::h18c1464b9980ff37Fti (text_run.rs:243)
  10.91% (366,747,520B) 0xA6E5B1: text::text_run::TextRun::new::ha7a18f2e26150989Ssi (text_run.rs:189)
  10.91% (366,747,520B) 0x40A7A9: text::TextRunScanner::scan_for_runs::hc67da7c3097c78d3INq (text.rs:186)
  10.91% (366,747,520B) 0x41733D: construct::FlowConstructor$LT$$u27$a$GT$::build_flow_for_block_starting_with_fragment::hf83e1b79e301b182Akd (construct.rs:335)
    06.48% (217,777,472B) 0x424D4C: construct::FlowConstructor$LT$$u27$a$GT$::build_flow_for_block::hae9ceff41d2cfa28qnd (construct.rs:571)
      06.46% (217,312,448B) 0x42581E: construct::FlowConstructor$LT$$u27$a$GT$::build_flow_for_nonfloated_block::h27092b25655a9f0a3od (construct.rs:580)
        06.32% (212,464,448B) 0x41930D: construct::FlowConstructor$LT$$u27$a$GT$.PostorderNodeMutTraversal::process::h9ff427f9e5553ed0LRd (construct.rs:1266)
        06.32% (212,464,448B) 0x4C68BB: traversal::ConstructFlows$LT$$u27$a$GT$.PostorderDomTraversal::process::h05a27862036cc1796or (traversal.rs:228)
        06.32% (212,464,448B) 0x4C6151: parallel::RecalcStyleForNode$LT$$u27$a$GT$.ParallelPreorderDomTraversal::run_parallel::h69697d0dc0aea4dbsdo (parallel.rs:153)
        06.32% (212,464,448B) 0x4C67F7: parallel::recalc_style::hd7cbb802977b7d35Xdo (parallel.rs:361)

11.45% (384,945,088B) 0xA9C362: vec::Vec$LT$T$GT$::push::h10553047079115022048 (in /home/njn/moz/servo/components/servo/target/release/servo)
  10.91% (366,747,520B) 0xA9BFAD: text::text_run::TextRun::break_and_shape::h18c1464b9980ff37Fti (text_run.rs:243)
  10.91% (366,747,520B) 0xA6E5B1: text::text_run::TextRun::new::ha7a18f2e26150989Ssi (text_run.rs:189)
  10.91% (366,747,520B) 0x40A7A9: text::TextRunScanner::scan_for_runs::hc67da7c3097c78d3INq (text.rs:186)
  10.91% (366,747,520B) 0x41733D: construct::FlowConstructor$LT$$u27$a$GT$::build_flow_for_block_starting_with_fragment::hf83e1b79e301b182Akd (construct.rs:335)
    04.43% (148,970,048B) 0x41ED35: construct::FlowConstructor$LT$$u27$a$GT$.PostorderNodeMutTraversal::process::h9ff427f9e5553ed0LRd (construct.rs:1035)
    04.43% (148,970,048B) 0x4C68BB: traversal::ConstructFlows$LT$$u27$a$GT$.PostorderDomTraversal::process::h05a27862036cc1796or (traversal.rs:228)
    04.43% (148,970,048B) 0x4C6151: parallel::RecalcStyleForNode$LT$$u27$a$GT$.ParallelPreorderDomTraversal::run_parallel::h69697d0dc0aea4dbsdo (parallel.rs:153)
    04.43% (148,970,048B) 0x4C67F7: parallel::recalc_style::hd7cbb802977b7d35Xdo (parallel.rs:361)

-----------------------------------------------------------------------------
02.74% (92,120,064B) 0x41D67F: construct::FlowConstructor$LT$$u27$a$GT$.PostorderNodeMutTraversal::process::h9ff427f9e5553ed0LRd (construct.rs:1196)
02.74% (92,120,064B) 0x4C68BB: traversal::ConstructFlows$LT$$u27$a$GT$.PostorderDomTraversal::process::h05a27862036cc1796or (traversal.rs:228)
02.74% (92,120,064B) 0x4C6151: parallel::RecalcStyleForNode$LT$$u27$a$GT$.ParallelPreorderDomTraversal::run_parallel::h69697d0dc0aea4dbsdo (parallel.rs:153)
02.74% (92,120,064B) 0x4C67F7: parallel::recalc_style::hd7cbb802977b7d35Xdo (parallel.rs:361)
@metajack (Contributor) commented Feb 23, 2015

09.38% (315,501,440B) 0x4D2446: css::matching::LayoutNode$LT$$u27$ln$GT$.PrivateMatchMethods::cascade_node_pseudo_element::h520664c44b061bbc1Wt (matching.rs:438)
09.36% (314,695,520B) 0x4CD1BD: css::matching::LayoutNode$LT$$u27$ln$GT$.MatchMethods::cascade_node::h4925eca48b7a52c5m7t (matching.rs:637)
09.36% (314,695,520B) 0x4C65A2: traversal::RecalcStyleForNode$LT$$u27$a$GT$.PreorderDomTraversal::process::h38470af429c5a97cfkr (traversal.rs:179)
09.36% (314,695,520B) 0x4C6015: parallel::RecalcStyleForNode$LT$$u27$a$GT$.ParallelPreorderDomTraversal::run_parallel::h69697d0dc0aea4dbsdo (parallel.rs:103)
09.36% (314,695,520B) 0x4C67F7: parallel::recalc_style::hd7cbb802977b7d35Xdo (parallel.rs:361)

Looking at the code for this one, it appears that this is all safe code. A new Arc is created for the style, then cloned and passed to the cache. Finally, it is inserted into the mutable function argument. This ends up being stored in the node's layout_data, which is manipulated unsafely. So perhaps we're never freeing layout_data.shared_data?

@jdm (Member) commented Feb 23, 2015

That should happen as part of layout data reaping in the Node destructor.

@metajack (Contributor) commented Mar 12, 2015

I found that tests/ref/inline_block_img_a.html is a reduced test case for this. It leaks every flow every time and none of them ever get freed. It has 6 total flows.

metajack added a commit to metajack/servo that referenced this issue Mar 16, 2015
Cycles were being created in the flow tree since absolutely positioned
descendants had pointers to their containing blocks. This adds
WeakFlowRef (based on Weak<T>) and makes these backpointers weak
references. This also harmonizes our custom Arc<T>, FlowRef, to be
consistent with the upstream implementation.

Fixes servo#4915.
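The shape of this fix can be illustrated with the standard library's Arc/Weak. This is a minimal stand-in for FlowRef/WeakFlowRef, not Servo's actual flow types:

```rust
use std::sync::{Arc, Mutex, Weak};

// A child flow holds only a Weak backpointer to its containing block,
// so parent and child no longer keep each other alive in a cycle.
struct Flow {
    children: Mutex<Vec<Arc<Flow>>>,
    containing_block: Mutex<Weak<Flow>>, // a strong Arc here would leak
}

fn main() {
    let parent = Arc::new(Flow {
        children: Mutex::new(Vec::new()),
        containing_block: Mutex::new(Weak::new()),
    });
    let child = Arc::new(Flow {
        children: Mutex::new(Vec::new()),
        containing_block: Mutex::new(Arc::downgrade(&parent)),
    });
    parent.children.lock().unwrap().push(child.clone());

    // The weak backpointer doesn't bump the strong count, so dropping
    // the last external Arc to `parent` actually frees the whole tree.
    assert_eq!(Arc::strong_count(&parent), 1);
    assert_eq!(Arc::strong_count(&child), 2);
    assert!(child.containing_block.lock().unwrap().upgrade().is_some());
}
```

With strong references in both directions, each node's refcount never reaches zero even after every external handle is dropped, which matches the "none of them ever get freed" observation above.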
bors-servo pushed a commit that referenced this issue Mar 16, 2015 (same message as above). Fixes #4915.
@nnethercote (Contributor, Author) commented Mar 16, 2015

I retested after updating and I can confirm the leak is fixed. On the GotG page the "jemalloc-heap-allocated" measurement is bouncing around in the 61--65 MiB range instead of shooting for the moon. Excellent.

@nnethercote (Contributor, Author) commented Mar 16, 2015

On the GotG page the "jemalloc-heap-allocated" measurement is bouncing around in the 61--65 MiB range instead of shooting for the moon.

I should add: before this fix, "jemalloc-heap-allocated" reached 340--380 MiB as soon as the page finished loading, before even wiggling the mouse at all. So the baseline memory usage is a lot lower as well. Double-excellent.
