Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't copy large Arcs over IPC #10230

Open
Manishearth opened this issue Mar 28, 2016 · 24 comments
Open

Don't copy large Arcs over IPC #10230

Manishearth opened this issue Mar 28, 2016 · 24 comments

Comments

@Manishearth
Copy link
Member

@Manishearth Manishearth commented Mar 28, 2016

Related to #9555, part of the cause (fixing this is not a complete solution).

result.send(Reply::GetFontTemplateReply(maybe_font_template)).unwrap();

We're sending an Arc<FontTemplateData> over IPC here, where FontTemplateData is a largish bucket of bytes. This is probably a holdover from the pre-IPC days.

This makes sense if we were using std's MPSC channels, but the Arc is redundant over IPC since a completely new Arc will be created on the other side and the entire bucket of bytes is copied over through the shared memory. The copying and shmem allocation here is pretty wasteful (and is probably causing the mysterious "out of buffers" issue leading to that panic.

I'm not sure if there is a solution that lets us share memory across processes safely. I think you can create an Arc-like abstraction with shm-backed storage, after all, shm itself is "reference counted" by the kernel and we can mark shm as readonly.

(How useful this would be depends on how the font cache thread determines when to eject an item from its cache. I haven't yet looked at this code; it fiddles with weak pointers.)

cc @pcwalton @antrik

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Mar 28, 2016

Not entirely certain but https://doc.servo.org/ipc_channel/ipc/struct.IpcSharedMemory.html might be what I'm looking for

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Mar 28, 2016

Yeah, it seems like IPCSharedMemory is being used everywhere for this purpose. Not sure how to make it work with the weakptr stuff in the font cache task.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Mar 28, 2016

I've sort of figured out how to make this work. FontTemplateData is placed in readonly shm-backed storage (IPCSharedMemory?). The font cache task maintains a map of FontTemplate objects, which contain Option<SHMBacked<FontTemplateData>>. Note that the data is somewhere in shared memory, SHMBacked<T> is just a shmem handle. When a task requests a template, a copy of the handle is sent over (if one doesn't exist yet, it is created, like we do today), wrapped in a struct containing a heap key. A refcount associated with the FontTemplate is also bumped. In the destructor for the handle in the other process, a message containing the map key is sent back to the font cache thread, which decrements the refcount and evicts the template from the cache if the refcount reaches zero. The destructor also runs shm_unlink.

The process of evicting the template from the cache involves unlinking the shm-backed fonttemplatedata, but the FontTemplate struct hangs around with a None in it, much like what happens in the current code.

Note that the backing storage is managed by the kernel. In case of a rogue process, the most it can do is hold on to this memory longer even after the font cache thread thinks that it has deallocated it (the kernel won't deallocate until everyone has unlinked). It can also request the font cache thread to reduce the refcounts of arbitrary fonttemplates, but this isn't a safety issue; this just means that certain font templates may get evicted early whilst other processes are still using them.


Actually, come to think of it, we can use O_CREAT|O_EXCL somehow to make this work directly; just store the name and try to recreate it every time. We'd have to store the filename in FontTemplate, and attempt to open it with those flags. If we get an EEXIST, just send the fd across. If we get no error, populate the newly-created shmem buffer. If we get EACCESS, this means that the filename we have is stale (and was evicted by the kernel already) and has been claimed by another process.

There's a small issue with this solution: If a font template gets evicted by the kernel, and a new shm object is created with the same name by the same process, we might end up sending the wrong shm handle over. We'd need to avoid name reuse, perhaps by maintaining a list of shm names in use in the font cache process. Alternatively, use names that contain the name of the font itself.

I'm not sure if either of these solutions will work (there's probably a simpler solution I'm overlooking) since I'm very unclear on exactly how the SHM stuff works. Let me know if something is off!

@pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Mar 28, 2016

IpcSharedMemory is indeed intended for this, and we use it in other places.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Mar 28, 2016

In that case we can definitely use the first option above. I'm not sure if the second option can be implemented with a platform-agnostic API.

We'd have to fix IPCSharedMemory on linux to try to use tmpfs harder or just use shmem though. Otherwise we're just reading from a regular file each time, which is precisely what the cache tries to avoid.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Mar 28, 2016

@glennw Does the caching method outlined in #10230 (comment) look okay to you? Is there a better way of doing this?

@pcwalton
Copy link
Contributor

@pcwalton pcwalton commented Mar 29, 2016

IpcSharedMemory should be fixed anyway if it's not actually using shared memory on certain distros. I wouldn't block the fix for this on doing that.

@antrik
Copy link
Contributor

@antrik antrik commented Apr 3, 2016

I think there are two issues to be considered here. One is the fact that we are actually sending an Arc over IPC -- this seems very, very wrong. AIUI, this will totally break the reference counting (and thus result in memory leaks) in the receiver whenever the Arc has any other outstanding references at the time of serialisation.

I'm rather shocked that the type system doesn't protect us from that. Why does serde allow deriving Serialise for non-copy types? Is that ever useful? Maybe we should explicitly add a Copy bound to IpcChannel? (I don't think sending non-copy types over IPC can ever be correct...)

The other concern is the inefficiency of actually copying the data when sending as IPC messages. I hope CPU performance won't be too much of a concern once I'm done optimising the Linux implementation; but having the data redundant in memory in each process indeed seems considerable overhead. (Quickly going up to several MiB per process for just a handful of fonts...)

The first approach proposed by @Manishearth seems a bit fragile to me though: AIUI if a process dies without explicitly releasing the fonts, the cache process will still think it's in use? We would need some more reliable way to keep track of usage: perhaps monitoring when the client channel gets closed, or something along these lines.

(Not sure whether the second approach would avoid this problem?...)

Having said that, if the font data is just a straight copy of the font files -- which from a quick glance seems to be the case at least in some situations (maybe most?) -- caching it in the font process is not actually an optimisation at all. The operating system's virtual memory management can handle it just as well -- and in fact better -- if it's a file mapped directly from backing storage, rather than a copy in anonymous memory. And the handling gets simpler as well: just send a copy of the open font file descriptor. (No need for client reference tracking at all: it can stay mapped in the cache task indefinitely without substantial overhead, as the OS will just discard the pages from main memory if it's not used in a while.)

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Apr 3, 2016

I think there are two issues to be considered here. One is the fact that we are actually sending an Arc over IPC -- this seems very, very wrong. AIUI, this will totally break the reference counting (and thus result in memory leaks) in the receiver whenever the Arc has any other outstanding references at the time of serialisation.

It's wrong, but not that wrong. There will be two unrelated Arcs on either side of the channel. Serde doesn't copy pointers, it deep clones them.

Why does serde allow deriving Serialise for non-copy types? Is that ever useful?

Sure. It definitely makes sense for single owner types like Vec.

The thing is, folks want to use Serde to persist data across sessions, too. In this situation, even an Arc or an Rc is totally fine. There might be an Arc buried within some struct from an external library that you're using, and you want to be able to easily serialize/deserialize it. Serde needs to address this use case, so mostly everything is serializable.

A Copy bound to IPCChannel is too restrictive. Serde recreates vectors and stuff, so sending a vector across a channel is totally ok. Sending Arcs over a channel is an indication of a problem, but not necessarily one. We can use a new marker trait and create such a bound, however I'm not certain how useful this is.

The other concern is the inefficiency of actually copying the data when sending as IPC messages. I hope CPU performance won't be too much of a concern once I'm done optimising the Linux implementation; but having the data redundant in memory in each process indeed seems considerable overhead.

But if we use IPCSharedMemory, there's no copy, right? The idea is that the data is stored in shmem-backed storage from the get-go, and just share fds with other processes.

AIUI if a process dies without explicitly releasing the fonts, the cache process will still think it's in use?

Yes, but that's pretty much par for the game here. You have the same situation with Arcs over threads.

(Not sure whether the second approach would avoid this problem?...)

The second approach relies on the kernel entirely, so it should be fine.

Having said that, if the font data is just a straight copy of the font files -- which from a quick glance seems to be the case at least in some situations (maybe most?) -- caching it in the font process is not actually an optimisation at all. The operating system's virtual memory management can handle it just as well -- and in fact better

Hm, this is an interesting point. However, we may not want to rely on the OS's caching policy since it might lead to a lot of things staying cached when we don't want them to. @glennw might know more.

@metajack
Copy link
Contributor

@metajack metajack commented Apr 4, 2016

Does the deserialized Arc have its strong reference count reset to 1 or something appropriate? I think @antrik's point stands about this causing memory leaks if care is not taken.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Apr 4, 2016

It just creates a new Arc.

(It would be hard not to by accident, serde doesn't have access to Arc's internals so all of this would require unsafe)

@antrik
Copy link
Contributor

@antrik antrik commented Apr 4, 2016

It just creates a new Arc.

Ah, interesting. Does it have special handling for Arcs, or is that what it always does with opaque types?

Sending Arcs over a channel is an indication of a problem, but not necessarily one.

Given the above explanation, I no longer see this as a problem at all...

(The inefficiency of sending large data seems an orthogonal problem to me, not really bound to Arcs in any way?)

AIUI if a process dies without explicitly releasing the fonts, the cache process will still think it's in use?

Yes, but that's pretty much par for the game here. You have the same situation with Arcs over threads.

My understanding is that thread-bound references usually should get released on thread panic during unwinding?

Either way, I don't think the potential of a dying content process leaving large resources such as fonts dangling is a matter we can take lightly...

Having said that, if the font data is just a straight copy of the font files -- which from a quick glance seems to be the case at least in some situations (maybe most?) -- caching it in the font process is not actually an optimisation at all. The operating system's virtual memory management can handle it just as well -- and in fact better

Hm, this is an interesting point. However, we may not want to rely on the OS's caching policy since it might lead to a lot of things staying cached when we don't want them to. @glennw might know more.

The point is that the OS's caching of file contents essentially comes for free, as the OS can just discard the buffers if they are not used for a while and there is a better use for the memory. This is not the case with any custom cache implementation using dedicated memory that the OS is not in a position to discard -- leaving this to the OS is strictly better here.

(This is typically the case for various kinds of caching, as -- at least on monolithic kernels -- the OS generally knows more about memory utilisation etc. than applications do...)

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Apr 4, 2016

Ah, interesting. Does it have special handling for Arcs, or is that what it always does with opaque types?

At no point does serde just serialize by picking up the bytes. It's not that kind of serializer. All serialize/deserialize impls for stdlib types are custom within serde itself.

Given the above explanation, I no longer see this as a problem at all...

The problem is that it doesn't work as expected (even though what is "expected" would lead to a segfault when used over IPC 😄). This is one of the places where ipc-channel doesn't mirror sync::mpsc in behavior and we should tread lightly here.

For example, the font cache thread uses Arcs and Weaks to manage its cache, which works file over sync::mpsc. If the Arc never actually leaves the thread because of IPC and is just deep cloned out, this cache won't work.

The point is that the OS's caching of file contents essentially comes for free

I guess so. It might be better to use tmpfs for this if available since the OS will cache that as well and it's faster. I'm not yet sure if one can get the OS to cache shmem; the Mach API has some nifty bits that make it possible to write an OS-managed shmem. The O_CREAT|O_EXCL trick might work on linux, but I'm not sure.

@antrik
Copy link
Contributor

@antrik antrik commented Apr 4, 2016

The problem is that it doesn't work as expected

Depends on your expectations I guess ;-)

I for one certainly do not expect a reference to magically span process boundaries...

The point is that the OS's caching of file contents essentially comes for free

I guess so. It might be better to use tmpfs for this if available since the OS will cache that as well and it's faster.

I think we are still not on the same page here.

One important bit I didn't mention explicitly that might not be obvious, is that the processes should map the font files directly into their address spaces with mmap(), rather then copying their contents with read(). The OS then only populates these mapped memory regions (by allocating physical memory pages, and reading in the contents from disk) on demand when they are actually accessed.

Later, when memory get full, and the pages haven't been used in a while, the OS will evict them. (They will be paged in again, if they happen to be accessed again at some later point.)

The crucial point is that, as we never write to this memory, the pages are "clean", and consequently the OS can just discard them without writing anything back: it knows that the contents on disk are the same. That's not the case if the data was copied into some separate memory region instead -- such as a file on a tmpfs, or a SHM region. If such anonymous pages need to be evicted, the OS has to actually store the memory contents to swap space, as it knows no other place to keep it.

Thus, keeping a copy of a font file in tmpfs or SHM is not more performant. Paging the data back in from swap space instead of the filesystem is only marginally faster (unless you happen to have some extra-fast swap device -- which is a niche case); but paging it out to swap space first instead of just discarding is much more expensive.

(Moreover, under severe memory pressure, the resulting disk thrashing tends to slow down the entire system, rather than just the affected process...)

The only case where we need to use a tmpfs file, or (preferably) a SHM region, is if the font doesn't come from an actual file. From a cursory look at the code, that might be the case in some situations; but I don't really know for sure without a more thorough investigation. (Or some input from @glennw I guess ;-) )

the Mach API has some nifty bits that make it possible to write an OS-managed shmem.

Actually, it's rather the other way round: Mach's external pager interface enables user processes to back memory objects that look like system-provided objects. (Such as memory-mappable files.)

We could use that, to have the font cache task provide a file object backed by its own memory, so it could be mapped by the content processes just like an ordinary file, in cases where the font doesn't come from an actual file... However, I'm not sure it's worth the effort -- especially considering that Linux (and many other systems I guess) require a different (probably SHM-based) approach to be implemented anyways.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Apr 4, 2016

I for one certainly do not expect a reference to magically span process boundaries...

From an abstraction point of view, I meant 😄 ipc-channel as such is an abstraction one expects to behave similar to sync::mpsc. Here, it doesn't. We should be wary of that.

Later, when memory get full, and the pages haven't been used in a while, the OS will evict them.

Would they be shared, though? We want all processes requesting it to have readonly access to this file without duplicating it in memory.

@antrik
Copy link
Contributor

@antrik antrik commented Apr 4, 2016

@Manishearth yes, I'm pretty sure the OS shares multiple mappings of the same file (that's how shared libraries work for example) -- and even more so, if we transfer an open FD, rather then opening the file separately in every process...

BTW, GitHub fails to show the end of my previous message; so here it is again:

Actually, it's rather the other way round: Mach's external pager interface enables user processes to back memory objects that look like system-provided objects. (Such as memory-mappable files.)

We could use that, to have the font cache task provide a file object backed by its own memory, so it could be mapped by the content processes just like an ordinary file, in cases where the font doesn't come from an actual file... However, I'm not sure it's worth the effort -- especially considering that Linux (and many other systems I guess) require a different (probably SHM-based) approach to be implemented anyways.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Apr 4, 2016

Right. This sounds like one way to do it, then. Does IPC-channel handle file descriptors well, or does this API need to be built into it?

@glennw do you think letting the OS handle the cache via mmap is okay?

@antrik
Copy link
Contributor

@antrik antrik commented Apr 4, 2016

Also, feel free to assign this issue to me -- unless it's considered urgent. It looks pretty interesting, and I'm already somewhat familiar with parts of the code in question. I would like to work on it once I'm done with ipc-channel...

@antrik
Copy link
Contributor

@antrik antrik commented Apr 4, 2016

@Manishearth hm, good point about ipc-channel and FDs... It does have infrastructure for sending FDs of other IPC channels and of shared memory regions -- but not for generic FDs as far as I can see :-( I guess this needs to be added first.

@Manishearth
Copy link
Member Author

@Manishearth Manishearth commented Apr 4, 2016

I don't think it's considered urgent. I'm personally a tad annoyed that the font cache on my machine is doing multiple extra copies and File I/O, but this doesn't block anything per se. The panic cascade that this used to cause is already fixed, of course. Perf is still ok on my machine.

There are multiple parts to this:

  • Making sure IPCSharedMemory doesn't do silly things on linux
  • Figuring out and implementing an IPC-safe cache policy for the font cache, either via IPCSharedMemory or by mmaped fds. Even if we decide against the mmap cache policy we should still have an abstraction in place for the font cache.

We should also perhaps have ipc-channel report (in debug mode or with a flag enabled) when it is doing copies of really large things (ignoring things in IPCSharedMemory of course), to catch future bugs of this kind.

@antrik
Copy link
Contributor

@antrik antrik commented Jul 29, 2016

I thought about this a bit more; and now I'm wondering whether instead of adding generic support in ipc-channel for sending file descriptors (as I suggested previously) that can represent mapped files or shared memory, it might perhaps be better to extend the existing IpcSharedMemory mechanism to work both with mapped files passed in by the caller, as well as "real" anonymous shared memory.

This would avoid complicating the descriptor handling in ipc-channel; and it might be more elegant for the Mach backend, where it's possible to work with memory objects rather than file descriptors...

@nox
Copy link
Member

@nox nox commented Oct 1, 2017

We still don't use IpcSharedMemory or anything like that for fonts, but we do for images.

@antrik
Copy link
Contributor

@antrik antrik commented Oct 1, 2017

@nox images are not typically loaded from files though, right? If we ignore efficient file mapping, the task becomes rather trivial...

@nox
Copy link
Member

@nox nox commented Oct 1, 2017

@antrik Oh right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.