Bug 1436615 - Lots of memory used for activity stream PNGs encoded as data URIs #4150
Conversation
@@ -190,6 +190,35 @@ Section.defaultProps = { | |||
export const SectionIntl = connect(state => ({Prefs: state.Prefs}))(injectIntl(Section)); | |||
|
|||
export class _Sections extends React.PureComponent { | |||
componentDidMount() { | |||
this.props.window.addEventListener("beforeunload", this._gcGBlobUrls); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, why do we need to use beforeunload and not just unload?
beforeunload gives web content an opportunity to block a tab close, which means that when a page (even about:newtab) sets a beforeunload event handler, we have to bypass a bunch of shortcuts and do messaging to see if a tab wants to block tab closing.
So could we swap this beforeunload out for something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I just read up more about beforeunload
, and I wasn't aware that it is a cancelable event. Thinking about it again, I think unload
would be the right choice especially for the call in Card.jsx
:
this.props.window.addEventListener("beforeunload", this.removeCardImageRef); // bad
We only want to remove the image reference only if the window is guaranteed to be unloaded. Doing it in beforeunload
creates a situation in which the reference count will be reduced by 1 even in the case when the tab cancels the unload event, and this violates the invariant that we have before (i.e. reference count refers to the number of images that are currently referencing a specific blob URL).
Yes, we can swap this with unload
.
@Mardak, let's take a look at this as a solution to get rid of data uris. I'm happy to schedule a meeting with the three of us to discuss the approach |
Hey @Mardak, thoughts on this approach? |
How often are multiple cards using the same blob? I would expect it to be one to one in most cases, so the reference counting Map seems to be excessive ?? I.e., Also, given that most of the time, there will be a preloaded new tab that is also referencing the same blob… I'm not actually sure what's the expected cleanup behavior there? Does it even make sense for the main/chrome side to |
Unfortunately, I do not have numbers to back those up. But you might be right -- it will be one to one in most cases. Cards will only reference the same blob only if those URLs have the same social graph images (e.g.
Blob objects in the main process that are no longer referenced by any URLs will be garbage collected by the GC automatically. Blob URLs, on the other hand, are pointers to the actual Blob objects in memory. These URLs created through
We do not need to manually clean up Blob objects. As long as there are no Blob URLs referencing them, they will be handled by the GC automatically.
The content process cannot access blob URLs created on the main process. This is a security error:
This means that we need to do |
One piece of information that might help influence things here - I spent some time this morning re-familiarizing myself with ObjectURLs. According to the documentation for createObjectURL:
I suspect I may have misled Jay a little bit here - in the event that we create an ObjectURL in the browser UI, we have to be careful to revoke the object URLs to avoid leaking the blobs. This is because the objectURL will be tied to the lifetime of the document it was created in - in the event that this was the browser UI, then it'll stick around until the window closes (which might only be when the user quits). However, in the Activity Stream case, I suspect the URL lifetime is tied to the about:newtab document lifetime, which goes away as soon as the tab is closed or the user browses away from about:newtab. We might want to double-check this by looking at about:memory, but I suspect we can drop the manual revocation here. |
In the common case where manually reference counting with a global Map ends up with each entry being 1 count, we now have allocated memory for the Map and code and unload listeners unnecessarily as the default behavior would be to clean up the blobs when leaving about:newtab anyway. ;) And that's just the runtime impact -- there's ongoing maintenance of the extra code too :) I think just switching away from base64 string representation of binary images is the big win here. 👍
There have been bug reports related to activity stream unnecessarily using a lot of memory for copies of the same image, and part of it is caused by the preloaded new tab (and message passing and different processes). I believe the preloaded new tab or any other about:newtabs that the user has open at the same time could be in any of the available content processes, so unless we force them to be in the same content process so that Blob URLs could be shared, it doesn't seem likely (??) there will be a benefit worth the complexity as you point out. |
Terminologies: Blob Objects and Blob Object URLs (which are references to Blob Objects). UpdatesI'd like to update some facts in my previous comment based on conversations with
URLs created through
This will not work. With the URL propagation mechanism that we have currently, it doesn't really makes sense to create a mapping in the content process since the URLs will be broadcasted to all of the content processes anyway. CommentWe know that the Blob Object URL's lifetime is tied to the about:newtab document. It is true that those URLs will be revoked automatically when the tab is closed or when the user browses away from about:newtab. By this reasoning, it looks like we can drop the manual revokation since it will be handled for us automatically eventually. This is potentially problematic in two cases:
In both cases, we have a leak. The generated blob object URL that points to the Blob in the main process prevents that Blob from GC until the document dies. That is when those blob URLs will be revoked automatically and the GC can kick in to remove those Blobs with no references (blob URLs). In Activity Stream, the generated blob objects are stored in LinksCache. This means that even if there are no longer code references to it, the blob will not be cleaned up by the GC. The GC will only clean it up only after the blob in the cache has been replaced with a new one. See:
An ideal way to revoke the Blob Object URL is to do it once the activity-stream/system-addon/content-src/components/Card/Card.jsx Lines 40 to 62 in 9a5e4d2
If we were to revoke the URL in the
IdeaGoal: Switch away from base64 string representation of binary images since it provides us with a huge win.
I agree with that statement. So here's what I'll do:
We will remove We will also need to be aware that these blob URLs are copies in all of the content processes (about 0.03MB each), but I guess that is fine given the complexity that we do not need to handle and this situation is very infrequent. How does this sound to you? @Mardak |
Just to muddy the waters a bit further:
@kmaglione has mentioned once or twice in passing that he'd like to see the activity-stream pages loaded somewhere other than in the content-process for security reasons. I think the concern is that if web-content is able to break into the activity-stream sandbox somehow (by iframing or whatever), it could then use the extra AS privs to conceivably trampoline to get more privs. So the suggestion I recall from him is that we move all the AS pages into their own process, which seems like a sensible thing to do. I suspect we don't want to block on the above, but if it's useful, we could consider doing that... |
This is definitely going to happen down the line, not just for about:newtab, but all about: pages that run in content processes. They might all get lumped into the same single content process, or maybe there'll be sub-strata, but generally we think it's a good idea from a security and privacy point of view to keep Activity Stream running in processes separate from normal web content. It's certainly a thing to keep in mind moving forward, but I don't think it needs to change the strategy here, unless I've missed something. |
Implemented the approach that I mentioned in my previous comment. If all is good, I'll squash those changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly nits except for a question about the NB comment as I'm not sure if it's wrong or just needs some more explanation.
isImageInState(image) { | ||
const {cardImage} = this.state; | ||
if (!image && !cardImage) { | ||
return true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be combined with the return false
below as just return !image && !cardImage
with a comment relating to the 3 possible outcomes that aren't already handled in the if (image && cardImage)
block
} | ||
|
||
// cardImage and new image are both blobs. | ||
return cardImage.path !== undefined && image.path === cardImage.path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pretty sure you can just do return typeof image === "string" ? image === cardImage.url : image.path === cardImage.path
(i.e., if we have a string, it's not going to match a URL anyway. and if we have a blob, the existing path would be set. I suppose one case that the single tertiary would evaluate differently is if both image.path and cardImage.path were both undefined
… but I don't think we care about that)
|
||
// NB: updateCardImage needs to be called after isImageInState since cardImage's | ||
// state is modified in updateCardImage and isImageInState uses the old state. | ||
this.updateCardImage(nextProps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've read through this comment a few times and am still confused. What's the ordering of events / lifecycle stages that leads to this needing to be here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are calling updateCardImage
in componentWillMount
during the mount process. For the update process, we cannot call updateCardImage
in componentWillUpdate
because updateCardImage
calls this.setState
, and hence we're calling it in componentWillReceiveProps
.
From componentWillUpdate() / UNSAFE_componentWillUpdate()
:
Note that you cannot call this.setState() here; nor should you do anything else (e.g. dispatch a Redux action) that would trigger an update to a React component before UNSAFE_componentWillUpdate() returns.
If you need to update state in response to props changes, use getDerivedStateFromProps() instead.
See: https://reactjs.org/docs/react-component.html#updating
I have updated this part of the code to support React >= 16.3 with minimal modifications. Will push shortly.
|
||
// NB: updateCardImage needs to be called after isImageInState since cardImage's | ||
// state is modified in updateCardImage and isImageInState uses the old state. | ||
this.updateCardImage(nextProps); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Separately, @k88hudson I noticed React renamed this method to UNSAFE_componentWillReceiveProps https://reactjs.org/docs/react-component.html#unsafe_componentwillreceiveprops Should we be doing something differently now?
|
||
_Card.defaultProps = { | ||
link: {}, | ||
url: global.URL || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this is only for testing? Any reason not to just override the global for the test instead of passing in special props?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. It's just for testing. I modified the code to override the global for the test because it looks cleaner after thinking about it again.
system-addon/lib/Screenshots.jsm
Outdated
// OS.File object used to easily read off-thread | ||
const file = await OS.File.open(imgPath, {read: true, existing: true}); | ||
let filePathResponse = await fetch(`file://${imgPath}`); | ||
let fileContents = await filePathResponse.blob(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: const
s
* Helper to conditionally revoke the card image if it is a blob. | ||
*/ | ||
maybeRevokeImageBlob() { | ||
if (this.state.cardImage && this.state.cardImage.path !== undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think we need to worry about a valid path being false-y, so just && this.state.cardImage.path
1222e46
to
28986cd
Compare
Updated code to address comments. I also made some functions static so that we do not need to make a lot of changes when we update React to >= 16.3 (currently at 16.2.0). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Functionally this looks good. I'll let @k88hudson comment on the usage of static methods / react bits.
I'm just finishing up another couple reviews but I'll get to this by end of day 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love this approach, I just have a few comments about making this a bit easier to read. Great work 👍
|
||
if (!image) { | ||
nextState.cardImage = null; | ||
} else if (typeof image === "string") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious that using typeof image === "string"
is used to determine whether the image type is a URL or local image blob/path. IMO it would be preferable to use a named function and check for the shape of the local image object explicitly, i.e. something like
/**
* isLocalImageObject - Checks if .image property on link object is a local image
* with blob data / a path
* @param {obj|string} image
* @returns {bool} true if image is a local image object, otherwise false
* (otherwise, image will be a URL as a string)
*/
static isLocalImageObject(image) {
return image && image.data && image.path;
}
// ...
if (!image) {
nextState.cardImage = null;
} else if (_Card.isLocalImageObject(image)) {
nextState.cardImage = {url: global.URL.createObjectURL(image.data), path: image.path};
} else {
nextState.cardImage = {url: image};
}
static isImageInState(state, image) { | ||
const {cardImage} = state; | ||
|
||
if (image && cardImage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, using a more explicit check here would make this easier to read
// rename getNextStateFromProps to getDerivedStateFromProps. | ||
componentWillMount() { | ||
const nextState = _Card.getNextStateFromProps(this.props, this.state); | ||
if (nextState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach! Easier to refactor 👍
let wrapper; | ||
let getNextStateFromPropsSpy = sinon.spy(Card, "getNextStateFromProps"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally we prefer create new spies inbeforeEach
with a sandbox
and call sandbox.restore
in afterEach
to avoid having to keep track of indiviudal spies / forgetting to reset them if additional ones get added
wrapper.instance().componentWillReceiveProps({link: ""}); | ||
assert.calledOnce(getNextStateFromPropsSpy); | ||
}); | ||
it("should cleanup state when componentWillUnmount is called", () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to test cleaning up state, you should check wrapper.state()
. Maybe also change this description to should cleanup state and revoke the image blob when the component unmounts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot call wrapper.state()
when the component has been unmounted. I'll reword this part.
}); | ||
it("should call getNextStateFromProps on componentWillReceiveProps", () => { | ||
wrapper.instance().componentWillReceiveProps({link: ""}); | ||
assert.calledOnce(getNextStateFromPropsSpy); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are ok, but personally I think it would be more useful to have tests checking the correct end state given a particular input rather than the implementation details. This makes the tests more readable and also makes refactoring easier.
Something like:
it("should set url and path on state.cardImage and create blobs if link.image is a blob", () => {
wrapper = shallow(<Card link={{image: TEST_BLOB_OBJECT}} />);
// ...
const state = wrapper.state("cardImage");
assert.deepEqual(state, expectedState);
assert.calledOnce(url.createObjectURL);
});
This applies to tests for getNextStateFromProps
below as well.
} | ||
} | ||
|
||
componentWillUnmount() { | ||
_Card.maybeRevokeImageBlob(this.state); | ||
this.setState({cardImage: null}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not necessary since we are unmounting anyway
Thank you @k88hudson! I have addressed your comments and updated the PR. What do you think about the new changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are great, thank you for refactoring those tests! 👍
* Helper to conditionally revoke the previous card image if it is a blob. | ||
*/ | ||
static maybeRevokeImageBlob(prevState) { | ||
if (prevState.cardImage && prevState.cardImage.path) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could just use _Card.isLocalImageObject(prevState.cardImage)
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually won't work because state has a different shape
* @returns {bool} true if image is a local image object, otherwise false | ||
* (otherwise, image will be a URL as a string) | ||
*/ | ||
static isLocalImageObject(image) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a comment that is only works for props since state has .url
and not .data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
… data URIs Signed-off-by: Jay Lim <jlim@mozilla.com>
This patch fixes bug 1436615 in which lots of memory is used for activity stream PNGs encoded as data URIs when sending data URIs from the parent process to the child process through IPC.
We will use blobs instead of data URIs. Screenshots will be sent as blob objects from the parent process to the child process. The Card component will receive those blob objects and generate blob object URLs. Images in the same tab which reference to the same blob object will share the same blob URL and this is achieved by keeping a global
Map
of blob paths to blob metadata (URL, reference count, and path). Blob object URLs will be revoked when they are no longer in use.treeherder: https://treeherder.mozilla.org/#/jobs?repo=try&revision=87a4335396d684a3294ec983fc7ad06bcf6181c3