The conversion changed the memory layout, which caused corruption on multiprocessor systems. Keep the same scanline stride across the whole image, so that threads working on different parts of the image don't interfere with each other. This also avoids a memcpy, so it should be faster on single-core systems too.
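Here is a minimal C sketch of the stride idea. It assumes an 8-bit, single-plane buffer, and `convert_rows` with its in-place inversion is a hypothetical stand-in for the real conversion. Every band addresses row `y` as `base + y * stride`, so threads given disjoint row ranges write to disjoint bytes. Nothing has to be repacked or copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical conversion: invert each 8-bit sample in a band of rows,
   in place. Every band uses the SAME stride as the full image, so row y
   always starts at base + y * stride, whichever band (thread) touches it.
   Disjoint row ranges therefore map to disjoint bytes. */
static void convert_rows(uint8_t *base, size_t stride, size_t width,
                         size_t first_row, size_t row_count)
{
    assert(width <= stride);
    for (size_t y = first_row; y < first_row + row_count; y++) {
        uint8_t *row = base + y * stride;   /* no per-band repacking, no memcpy */
        for (size_t x = 0; x < width; x++)
            row[x] = (uint8_t)(255 - row[x]);
    }
}
```

Each worker thread would call `convert_rows` with its own `first_row`/`row_count`. The padding bytes between `width` and `stride` are never touched.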
Send all debug output to stderr. This frees up stdout, so you can grab to a stream instead, which can be very useful. Remove a few old debug lines. Also stop printing the version number: it has been 1.0 since the dawn of ages, and no one seems to care.
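Here is a minimal C sketch of the split. It assumes the captured frames are written out as raw bytes, and `debug` and `write_frame` are hypothetical helpers, not functions from this codebase. Diagnostics go through `vfprintf(stderr, ...)`, so only frame data ever reaches the stream passed to `write_frame`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: debug chatter always goes to stderr,
   keeping stdout clean for frame data. */
static void debug(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* Hypothetical helper: write one raw frame to the given stream
   (e.g. stdout). Returns 0 on success, -1 on a short write. */
static int write_frame(FILE *out, const unsigned char *frame, size_t size)
{
    assert(out != NULL && frame != NULL);
    debug("writing %zu-byte frame\n", size);
    return fwrite(frame, 1, size, out) == size ? 0 : -1;
}
```

Calling `write_frame(stdout, buf, size)` lets you redirect stdout to a file or pipe it into another program. The debug messages still appear on the terminal.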
…l works for all box types, and update as needed.
We cannot assume that the number of chroma lines equals half the number of luma lines. The number of luma lines is guaranteed to exceed the vertical resolution, and the number of chroma lines to exceed half of it, but that does not make their ratio exactly 2:1. Use the reported number of chroma lines instead.
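Here is a minimal C sketch of sizing the buffer from the reported counts. It assumes, hypothetically, a single chroma plane that follows the luma plane and shares its stride (as in NV12). `struct frame_layout` and its fields are illustrative names, not a real driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a captured buffer as reported by the
   driver; field names are illustrative only. */
struct frame_layout {
    size_t stride;        /* bytes per line, assumed shared by luma and chroma */
    size_t luma_lines;    /* reported number of luma lines */
    size_t chroma_lines;  /* reported number of chroma lines */
};

/* Offset of the chroma plane: it starts right after all luma lines. */
static size_t chroma_offset(const struct frame_layout *l)
{
    return l->luma_lines * l->stride;
}

/* Total buffer size, using the reported chroma line count instead of
   assuming chroma_lines == luma_lines / 2. */
static size_t frame_size(const struct frame_layout *l, size_t height)
{
    assert(l->luma_lines >= height);
    assert(2 * l->chroma_lines >= height);
    return (l->luma_lines + l->chroma_lines) * l->stride;
}
```

With, say, 580 luma lines and 292 chroma lines, computing the chroma plane as `luma_lines / 2` would come up two lines short.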