Description
Opening this mostly for my own sake, for future reference
Cache improvements
I haven't put too much thought into our existing cache and mostly aimed for it to be safe. Maybe we can make it smarter to increase cache hits. For example, right now we expect an exact match in stream resolution. Maybe a decoder can still be re-used for a stream whose resolution is strictly smaller?
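To make the idea concrete, here is a minimal sketch of a relaxed cache lookup. Everything in it is hypothetical: `StreamSpec`, `CachedDecoder`, `DecoderCache` and the `allow_smaller` flag are illustrative names, not our actual cache, and `handle` is just a stand-in for a real decoder object. It assumes that codec, bit depth and chroma format must still match exactly, and that only width and height may be smaller than or equal to what the cached decoder was created with. Whether NVDEC can actually re-use such a decoder as-is, or needs to be re-configured first, is not something this sketch answers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical description of what a decoder was configured for."""
    codec: str
    width: int
    height: int
    bit_depth: int = 8
    chroma_format: str = "420"


@dataclass
class CachedDecoder:
    spec: StreamSpec  # configuration the decoder was created with
    handle: object    # stand-in for the real decoder handle


class DecoderCache:
    """Toy cache that can optionally hand out a larger decoder."""

    def __init__(self) -> None:
        self._entries: List[CachedDecoder] = []

    def put(self, decoder: CachedDecoder) -> None:
        self._entries.append(decoder)

    def take(self, wanted: StreamSpec, allow_smaller: bool = True) -> Optional[CachedDecoder]:
        best_idx = None
        for idx, entry in enumerate(self._entries):
            have = entry.spec
            # Assumption: these properties can never differ.
            if (have.codec, have.bit_depth, have.chroma_format) != (
                wanted.codec, wanted.bit_depth, wanted.chroma_format
            ):
                continue
            exact = (have.width, have.height) == (wanted.width, wanted.height)
            fits = have.width >= wanted.width and have.height >= wanted.height
            if not (exact or (allow_smaller and fits)):
                continue
            # Prefer the smallest decoder that fits, to keep big ones available.
            if best_idx is None or (
                have.width * have.height
                < self._entries[best_idx].spec.width * self._entries[best_idx].spec.height
            ):
                best_idx = idx
        if best_idx is None:
            return None
        # Remove the entry so the same decoder is never handed out twice.
        return self._entries.pop(best_idx)
```

With `allow_smaller=False` this degrades to the exact-match behavior, which would make the relaxed policy easy to gate behind a flag while we evaluate it.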
I think this is slightly related to decoder re-configuration:
Decoder re-configuration
The NVDEC docs mention we could re-configure an existing decoder in some cases, typically when a "sequence change" occurs (stream resolution change).
DALI has some code-path for that too.
We cache the decoders, while the docs assume a new decoder would otherwise be instantiated from scratch, so maybe re-configuration isn't needed in our case.
CUDA Streams
We currently hard-code both the NVDEC stream and the NPP stream (for color-conversion) to be the current stream (as, e.g., specified by a context manager). Maybe... those could be different?
Threaded implementation
- NVDEC docs mention the synchronous “mapping” stage could be done in a separate thread.
- In this section they mention there could actually be 3 threads: 1 for demuxing (FFmpeg), 1 for decoding, 1 for mapping.
This shouldn’t be too hard to implement, but it is out of scope for now, and we should factor in the maintenance cost of having to manage thread pools. It's also unclear to me whether this would make any difference when the user is already spawning N threads, each with its own VideoDecoder() instead. It's possible we're already maxing out NVDEC there, and if that's the case the benefits of a threaded implementation would be minimal.
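For future reference, the 3-thread layout could look roughly like the sketch below. It is pure Python with bounded queues between stages, and it is only meant to show the shape of the pipeline: `run_pipeline`, `run_stage`, and the `decode` / `map_frame` callables are hypothetical placeholders, not real FFmpeg or NVDEC calls. It also assumes one thread per stage, so frame order is preserved, and it does no error handling (an exception in a stage would stall the pipeline).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_stage(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(packets, decode, map_frame, max_in_flight=4):
    """Run demux -> decode -> map, each in its own thread."""
    # Bounded queues give back-pressure so no stage runs far ahead.
    demuxed = queue.Queue(maxsize=max_in_flight)
    decoded = queue.Queue(maxsize=max_in_flight)
    mapped = queue.Queue(maxsize=max_in_flight)

    def demux():
        # Placeholder for the demuxing stage: just forwards packets.
        for packet in packets:
            demuxed.put(packet)
        demuxed.put(_DONE)

    threads = [
        threading.Thread(target=demux, daemon=True),
        threading.Thread(target=run_stage, args=(decode, demuxed, decoded), daemon=True),
        threading.Thread(target=run_stage, args=(map_frame, decoded, mapped), daemon=True),
    ]
    for t in threads:
        t.start()

    frames = []
    while True:
        item = mapped.get()
        if item is _DONE:
            break
        frames.append(item)

    for t in threads:
        t.join()
    return frames
```

Even in this toy form, the extra machinery (sentinels, queue sizing, shutdown, and eventually error propagation) gives a feel for the maintenance cost mentioned above.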