Improve memory reporter polling #106

Open
whimboo opened this Issue Mar 22, 2012 · 21 comments

Projects

None yet

5 participants

@whimboo

We should improve the current polling of the memory reporter to report lesser changes, especially because they are cluttering the log file. Here some ideas:

  • Increase the poll interval - easiest solution but gives us lesser updates and delays in showing actual values
  • Use a threshold between the current and the last value and only report if the difference is higher
  • We can trigger the polling manually after a GC/CC to make sure that we report right away.

This was a request by Bill and I kinda would like to see it in v0.3. It's not that hard to implement, so setting mentored label.

@dglol

Use a threshold between the current and the last value and only report if the difference is higher

I like this one the best if the threshold isn't too small. Something like 1/10th of a megabyte would work well, or 1/100th if consistency with about:memory matters.

@whimboo

So there is one problem. Right now we only report resident memory but in the future we want to add more reporters. So how could we decide when to update the internal memory stats? @amccreight, would you have an advice? Is there a reporter we could always use as reference to determine if a change in memory usage has been taken place above a given threshold?

@amccreight
@jlebar

I recommend polling based on where you are in your test, rather than using a time-based interval.

We expect RSS to fluctuate depending on what's happening in the browser -- whether you've had a GC, what page is loaded, etc. So you'll introduce noise by polling on a timer.

FYI, @Nephyrin has been working on http://areweslimyet.com, which uses mozmill to do a memory test based on TP5. Parts of membuster may be redundant with his test.

@Nephyrin

The relevant areweslimyet memory polling happens at:
https://github.com/Nephyrin/MozAreWeSlimYet/blob/master/mozmill_endurance_test/performance.js

The problem with this is it pulls a lot of junk into JS (there's a lot of memory reporters), so our endurance test uses a per-checkpoint event instead of a per-run event to get the data out so it needn't live in JS. This will still create oodles of garbage, however, so you need to be careful with when this runs vs when the GC is invoked.

@whimboo

So MemChaser is not primarily targeted for test harnesses. We will add an internal API later but that's all. It really something which users can make use of. So if a timer is not appropriate we should figure out a better story for polling.

Some ideas:

  • User opens/closes/switches tabs
  • User opens/closes/switches windows
  • Page unload/load events
  • CC/GC statistics observer

Not sure if we could listen for AJAX activity in background tabs, e.g. when Google Mail is open and which could also cause memory increases/drops.

@whimboo

@amccreight, @jlebar, or @Nephyrin can you respond to my latest proposal? How does it sound?

@jlebar

What exactly are you proposing?

The MemShirnk team has its own memory benchmarks at this point. To be honest, I can't say we're particularly concerned with MemChaser.

@whimboo

@jlebar, while you answer is true for your internal usage you miss the real point of the extension. It's getting developed to assist users in detecting memory leaks and broken behavior regarding the GC/CC features. More and more stuff gets added to guideline users to report such a broken behavior with as much as possible information included.

With your benchmark you will never reach the real world where users have installed a huge amount of extensions, all the different hardware configurations in use, relationship with other installed applications, or other weird behavior.

@amccreight, William McCloskey, and Nicholas Nethercote were always encouraging us to work on this extension and requested a lot of features. So your answer was a bit disappointing to us.

@jlebar

As I understand it, there are two separate use-cases here. One is using MemChaser to power a memory benchmark. The other is users installing MemChaser and using that to detect high memory usage and high GC/CC times.

I wasn't aware of the second use case until now. It seems that the main selling point is that you don't have to look at the error console in order to get the GC/CC times, and that you can get a log of memory and GC/CC stats written out to a file.

Perhaps getting GC/CC times output to the statusbar would be helpful for some users. But those stats don't help us fix a problem, they only identify one. Users with bugs to report will still have to open the error console in order to generate GC/CC dumps, and they'll still have to open about:memory to gather information to help us understand where the problem is coming from. So MemChaser does not buy us much, in its current form. Indeed, the state of the art moves quickly with respect to these diagnostic tools, and I'm not sure it's reasonable to expect an add-on to keep up.

Reporting a good memory bug is difficult, and we're a long way from it being easy for "regular" users to usefully report such a bug. It's not just a matter of getting us data from the browser; the user also needs to help us determine whether the site is causing the browser to use large amounts of memory, or which of her extensions is problematic. So ease of use for noticing that you're in a bad situation is not high on my personal list of priorities.

With your benchmark you will never reach the real world where users have installed a huge amount of extensions, all
the different hardware configurations in use, relationship with other installed applications, or other weird behavior.

We use telemetry to gain insight into this on aggregate.

With respect to using MemChaser as a user diagnostic tool, polling every few seconds to collect statistics is perfectly reasonable. But when you use it as a benchmarking tool, you shouldn't use time-based polling.

@amccreight

It sounds like they are not intended MemChaser to be used for anything but users trying to closely monitor their browser.

MemChaser is nice because it makes it very easy to almost continuously monitor GC/CC behavior, without any technical knowledge of going into about:config and changing a setting. The latter is useful because we get GC/CC reports from a variety of users with unknown technical background, and we can more easily just tell them to install the addon and know that they will be able to handle it. I also don't have to worry about them not restoring the preference and ending up in a state where the browser is doing some extra work forever that it doesn't really need to do.

Continuous monitoring is nice because it makes it easier for the user to relate specific actions they are making to GC/CC behavior, and probably more importantly to relate the GC/CC to the browser behavior they are seeing. If they are experiencing occasional pauses, are they GC or CC or something else? With MemChaser, it is easier to see that.

Obviously getting GC/CC times does not fix a problem, but identifying a problem is often the first step towards fixing it. I was having a problem a few months ago where my sweep times were insane, and after identifying it, Bill was able to write some very detailed additional diagnostics to figure out precisely where it was happening, which led us to a problem with an allocator in the JS engine. Right now, MemChaser wouldn't have helped with this, but it was only because I was monitoring my GC times closely that I noticed the problem. This is something that telemetry may not have been able to detect, because it would be lost in the noise. If 10% of sweep times are long, does that mean that they are bad 100% of the time for some users, or bad some of the time for all users?

It is also useful to know that a particular addon is causing difficulties. For CC optimizations, it is possible that an addon uses a weird feature that isn't used much on websites. For instance, ABP uses variants, so Olli had to improve CC optimizations in a particular way to handle ABP and other addons that store data in this way.

All that said, while I think that MemChaser is very useful for monitoring GC/CC behavior, I'm not sure how useful it will be for memory problems. GC/CC behavior is very temporal, which makes fine grained tracking very useful. For memory problems, we are rarely concerned with the second-to-second behavior of the browser. At this point, we are mostly worried about the "steady state" behavior, in terms of what memory looks like after many GCs and CCs have run. We expect the memory usage to spike up after opening a new tabs, then settling down a bit, so seeing that memory went up for 10 seconds probably doesn't tell us much that is useful. I guess it could be useful for fine-tuning some of these image-discarding heuristics that Justin and Kyle are working on.

@ghost

I think it's worthwhile to have a discussion of the goals of MemChaser, although I'm not sure if this is the right forum for it.

Right now we get reports from users of high GC or CC times. It often takes these users a while to reproduce the issue, and having the times in the status bar makes it a little easier to decide when to do a GC or a CC dump when we request it.

However, I think logging is the more useful aspect of MemChaser. Right now, we have no facility for getting historical data about GC or CC times or memory usage. Telemetry gives an overall picture, but it doesn't show how the data varies over time. For me, this sort of data would be very useful. I would like the know the relationship between the size of the GC heap and the GC times, as well as the relationship between the CC times and the number of GC nodes traversed.

The reason I asked about this issue is that I would like users to be able to enable logging for a long time (over a period of days) without having the log grow too large. In addition, it would be nice if we could log selectively. As an example, I have no use for RSS data in the log. So the longer we wait before putting new RSS data in the log, the happier I am.

If the MemShrink team doesn't have any interest in this sort of data, then it would be best to remove RSS altogether. However, I'm actually surprised that an extension like this wouldn't be useful to MemShrink. It's true that you can always add whatever measurements we want to the browser. But it's usually much easier and quicker to modify an extension. Rather than have users download a special build from tryserver, you can just ask them to install an xpi.

@jlebar

I would like the know the relationship between the size of the GC heap and the GC times, as well as the relationship
between the CC times and the number of GC nodes traversed.

If we want to know about this in aggregate, we should use telemetry.

The reason I asked about this issue is that I would like users to be able to enable logging for a long time (over a period
of days) without having the log grow too large.

If we compress the log, I imagine it wouldn't grow too quickly.

However, I'm actually surprised that an extension like this wouldn't be useful to MemShrink. It's true that you can always add whatever measurements we want to the browser. But it's usually much easier and quicker to modify an extension. Rather than have users download a special build from tryserver, you can just ask them to install an xpi.

I guess it could be useful to ask a user who's seeing pauses to install a restartless add-on to see if GC/CC times are the problem. But I'm not sure what other kinds of things we might put into an add-on rather than into the browser. Much of our work is in measuring memory usage, and these memory reporters have to be in gecko itself.

In terms of generating a useful log, if the log included things like "opened page X", "window Y destroyed", that could be useful. But we kind of need to log this data before the problem occurs, so the log needs to be always-on [1]. This is large scope creep for this add-on.

I think it's worthwhile to have a discussion of the goals of MemChaser, although I'm not sure if this is the right forum for it.

Indeed. Perhaps I've been too negative because I don't understand what it's for. :)

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=662814

@ghost

As I said above, telemetry doesn't have the information that I need because it doesn't say anything about time. This is the additional information that a log provides.

However, it sounds like I'm the only person likely to find this log useful. So can we either remove RSS measurements from the log or else reduce the frequency of RSS measurements to at most once per minute?

@jlebar

As I said above, telemetry doesn't have the information that I need because it doesn't say anything about time. This is the additional information that a log provides.

How does time factor into this? Like, if you want to know how the size of the GC heap affects GC durations, why do you need a time dimension?

Note that telemetry can't currently collect the data you want, because we collect the GC heap size and GC time histograms separately. (Maybe this is what you mean about wanting a time dimension?) But it sounds like what you want is a two-dimensional histogram of (heap size) x (gc duration). I've wanted something similar for (rss) x (browser uptime).

@ghost

Yeah, you're right. If we had correlations between variables in telemetry, that would be sufficient.

@jlebar

FWIW, I'm not holding my breath for telemetry to support multi-dimensional histograms. It would be a big change. So if this add-on gives you insight into that data, that's awesome.

@jlebar

Henrik, I think I initially conflated MemChaser and MemBuster. MemBuster is a framework for building memory benchmarks, and is duplicated in large part by areweslimyet.com. MemChaser is, I see now, unrelated.

If the goal is to expose data about Firefox's memory usage to users, I can get behind that! To answer your original question now that I understand it properly, a time-based interval makes sense for the interactive display. To make the log file useful, you'd probably want to include at least window creation and teardown notifications. There are observer topics for these (see content-document-global-created and outer-window-destroyed). Once you have these in the log, time-based polling is still probably fine.

I'm sorry for getting confused and off-track here.

@dglol

Another handy use case is allowing people to create better assumptions. By closing a specific website's tab, and to see your GC activity and memory drop is an intuitive signal that it's likely part of the problem, thus helping create more specific bug reports. The data, by the way, is hardly useful for any meaningful statistical model to fix the large-scale memory issue. It simply lacks the variables.

Also, I think time is a very useful unit for GC. You can get nasty pauses from typical GC durations but with a high frequency.

@whimboo

Thank you all for the replies! It's great to see the information flowing in. Given the overall goal of MemChaser I will make sure to finally be able to put together the wiki page with the features we want to support, so that we can talk about it. I have to finish up some other more important stuff in the next couple of days but I will send around an email to all of us.

So if the memory information is not useful in the log file we can simply turn it off for now. For the future I would like to make it configurable by the user, so it can be selected which entries have to be logged. Bill, would you be happy with it in the short term? We can fold this into a 0.3.1 release.

For the long term I like to see that you are happy with my proposal to only request memory information when triggered by actions. Lets clearly define what we will have to do once 0.3.1 is out.

@whimboo whimboo added a commit to whimboo/memchaser that referenced this issue Apr 11, 2012
@whimboo whimboo Do not log memory statistics because they are not pretty useful right…
… now. Fixes issue #106
40e93e7
@jlebar

For the long term I like to see that you are happy with my proposal to only request memory information when triggered by actions.

We are? I said earlier: "To answer your original question now that I understand it properly, a time-based interval makes sense for the interactive display."

@whimboo whimboo removed this from the 0.6 milestone Feb 14, 2014
@whimboo whimboo added the javascript label May 15, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment