Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for more than 32 tracks per media type #1290

Merged
merged 5 commits into from
Oct 3, 2021

Conversation

JoelLinn
Copy link
Contributor

@JoelLinn JoelLinn commented Jul 6, 2021

The track selection is handled by a bit mask.
This mask was 32 bit wide so the maximum number of streamable tracks was limited to 32.
If the video contained more tracks, the logic in mp4_parser.c:2867 would wrap and try to stream more than one track at once for indexes greater than 31.

This patch uses an array of uint64_t as bitmask.
The maximum number of tracks is selected at compile time. It can be increased by adding -DNGX_VOD_MAX_TRACK_COUNT=256 to the cc configuration options and defaults to 64. (This mechanism is used by other nginx modules as well, for example lua.)

A number of macros and inline functions was added to the header which already defined the existing bitset macros. They are needed to simplify operations like (mask1 & mask2) != 0. With -O2 or greater they are optimized and the mentioned operation compiles to only 3 x86 instructions (for up to 256 tracks) if vector extensions are enabled (-mavx2).

I had to add a typedef for the track mask array in order to avoid array type decay when using multi dimensional arrays as function parameters (like size_t tracks_mask[MEDIA_TYPE_COUNT][MASK_ARRAY_SIZE]).

With the python script generate_many_tracks.py it's possible to create a mp4 with many language tracks (≈600), which I successfully tested with NGX_VOD_MAX_TRACK_COUNT=640 and the dash reference client (vlc seems to have a bug with such a high number of tracks). Not that we need that amount of tracks but the test should prove that this PR fixes the underlying issue for good.

Old text:

This patch widens the masks to 64 bits.
It also changes MAX_TRACK_COUNT to 64 in order to catch unsupported files deterministically.
I am not 100% sure about changing MAX_TRACK_COUNT though, I never looked into mixing - it might be possible that it's more sensible to keep it at 1024 and add another check somewhere else.

In our use case, we have one source language audio track and two additional tracks for each language (translation only + translation with quiet source language track). That effectively limits us to only 16 languages.

Edit:
We might hit the 64 track limit at a later point, but fixing that requires larger changes so this will do for now.
I would prefer to discuss the needed changes with maintainers before implementation.

@JoelLinn
Copy link
Contributor Author

If we want to support more than 64 tracks, we need a bitset type. There are two approaches:

  • A type (plus functions) that exposes bitset functionality on a dynamic number of bits, like https://github.com/goldsborough/bitset but based on the ngx array. Number of tracks is virtually unlimited but branching (loop) is required when working with the bitset (performance impact?)
  • Static size bitset type (based on compile time constants). Has no branching (the small constant size for loops get unrolled by the compiler) and compiles to the same assembly as current master if the bitset size is sizeof(size_t).

The dynamic approach is very elegant, especially since there are no templates in C so that the static bitset would probably be a macro mess.

@erankor What are your thoughts on this? Do we need to create some benchmarks?

@erankor
Copy link
Contributor

erankor commented Jul 25, 2021

Question is whether we actually need more than 64, it sounds like quite a lot to me...
If it is needed, I'd go with the static approach. mostly because the dynamic will probably add quite a few error cases to handle. And, there will probably also be complications when working with masks of different sizes.
I don't think the static approach will be messy, can use the existing vod_is_bit_set/vod_set_bit macros + add some function to test if all bits in a mask are set.

Few small comments regarding the existing PR (most are irrelevant if you implement for >64...) -

  1. can replace (uint64_t)1 with 1ULL
  2. no need for parse_utils_extract_uint64_token - the mask is 64, but track indexes can remain 32
  3. better add a const for 0xffff..., I should have done it when I wrote it, but with 64 it's more painful :)

@JoelLinn
Copy link
Contributor Author

I agree that most won’t need more than 64 tracks. Static limit it is then👍

@JoelLinn
Copy link
Contributor Author

I revised the existing PR and replaced the track masks with an array (like it has been with the language masks before).
The original comment was updated with the new approach.

Although it's a lot of changes and additions, the compiler is able to optimise most of it out - especially for a max track count of 64 (which fits in a single register)

@JoelLinn
Copy link
Contributor Author

I rebased and added the bitset test to the new CI pipeline.

@erankor
Copy link
Contributor

erankor commented Sep 2, 2021

@JoelLinn, I briefed through the changes, and while the code seems well written, it is more complex than I'd want it to be...
I think we should try to simplify it, I was thinking about 2 things -

  1. Assume MAX_TRACK_COUNT is a multiple of 64 - remove all the code that handles residual bits and force it during compilation (#if + #error)
  2. Use uint64_t for the mask - I don't really care about 32 bit systems, if that's the reason you used size_t... I want to keep it functioning for the few folks who use them, but if it will run slower there, I'm fine with that. There are a few things that become a bit simpler if you know the size of mask elements.

@jessp01
Copy link
Contributor

jessp01 commented Sep 2, 2021

I agree with @erankor's points. I don't think this code runs on many 32bit archs and I feel these are quite rare in general nowadays.. We shouldn't break functionality of course, but slower exec in favour of cleaner/simpler code flows is a well justified trade off, IMHO.

@JoelLinn
Copy link
Contributor Author

JoelLinn commented Sep 2, 2021

I switched to uint64_t and removed the residual handling. Added an error for MAX_TRACK_COUNT % 64 != 0

vod/media_set.h Outdated Show resolved Hide resolved
vod/media_set_parser.c Outdated Show resolved Hide resolved
ngx_http_vod_request_parse.c Outdated Show resolved Hide resolved
ngx_http_vod_request_parse.c Outdated Show resolved Hide resolved
test/generate_many_tracks.py Outdated Show resolved Hide resolved
vod/manifest_utils.c Show resolved Hide resolved
vod/media_format.h Outdated Show resolved Hide resolved
vod/media_set_parser.c Outdated Show resolved Hide resolved
vod/mkv/mkv_format.c Outdated Show resolved Hide resolved
vod/mp4/mp4_parser.c Outdated Show resolved Hide resolved
@JoelLinn
Copy link
Contributor Author

Integrated the requested changes, I hope I didn't miss anything.

@JoelLinn JoelLinn changed the title Allow for up to 64 tracks per media type (was 32) Allow for more than 32 tracks per media type Sep 12, 2021
Copy link
Contributor

@erankor erankor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelLinn added a few more comments, after these are fixed, will need to run regression tests, and then we should be able to merge

ngx_http_vod_request_parse.c Outdated Show resolved Hide resolved
vod/common.h Outdated Show resolved Hide resolved
vod/manifest_utils.c Outdated Show resolved Hide resolved
vod/manifest_utils.c Outdated Show resolved Hide resolved
vod/manifest_utils.c Outdated Show resolved Hide resolved
vod/manifest_utils.h Outdated Show resolved Hide resolved
vod/manifest_utils.h Outdated Show resolved Hide resolved
vod/media_format.h Outdated Show resolved Hide resolved
vod/media_format.h Outdated Show resolved Hide resolved
@JoelLinn
Copy link
Contributor Author

Pushed changes.

@JoelLinn
Copy link
Contributor Author

@JoelLinn added a few more comments, after these are fixed, will need to run regression tests, and then we should be able to merge

Any results from the regression tests yet?

Copy link
Contributor

@erankor erankor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JoelLinn the regression tests completed successfully!
However, I reviewed the whole thing again now, and have a couple of small comments.
We're getting there... :)

vod/media_set_parser.c Outdated Show resolved Hide resolved
vod/common.h Outdated Show resolved Hide resolved
- Better efficiency by using larger type
- Additional macros/functions
- Convert uses for language mask
- for masks, use bitset based on array
- default to 64 tracks (masks fit into single integer on 64-bit systems)
@JoelLinn
Copy link
Contributor Author

JoelLinn commented Oct 1, 2021

@JoelLinn the regression tests completed successfully! However, I reviewed the whole thing again now, and have a couple of small comments. We're getting there... :)

Relieved to hear there were no regressions...
I made the two small changes you suggested and pushed them.

@erankor erankor merged commit 1cb4a5c into kaltura:master Oct 3, 2021
@erankor
Copy link
Contributor

erankor commented Oct 3, 2021

Merged, thanks!

@JoelLinn
Copy link
Contributor Author

JoelLinn commented Oct 3, 2021

Thanks for putting the work into the reviews. 👍
PRs that touch many files and existing functionality are strenuous to review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants