Cleanup avformat-based preprocessors #2665

Merged
lminiero merged 9 commits into meetecho:master from the cleanup_avformat branch on May 27, 2021

lu-zero
Contributor

@lu-zero lu-zero commented May 15, 2021

If you like the set I can convert the other modules likewise and further factor out the common parts.

@januscla

Thanks for your contribution, @lu-zero! Please make sure you sign our CLA, as it's a required step before we can merge this.

@lminiero
Member

Thanks for the contribution @lu-zero! I definitely see value in the more compact approach you're suggesting, since we had indeed many similar sections that could be streamlined.

I only have one doubt related to the changes you made on AVPacket management, probably related to my limited knowledge of its memory management internals: I see the packet is allocated once before the loop is started, and then at each iteration there's an av_packet_unref call that precedes its usage. Won't that first unref actually get rid of the packet before we get a chance to use it? First iteration aside, it's also not clear whether av_write_frame will add a reference to the packet when using it, as the documentation says it does not take ownership.

There is no need to allocate an encoder to mux. This makes it possible to use
the postprocessor without having a vp8/vp9 encoder in avcodec.
@lu-zero
Contributor Author

lu-zero commented May 17, 2021

av_packet_unref() does the same work as the former av_init_packet(). Since we do not use av_buffer, there is no actual refcounting on the payload.

void av_packet_unref(AVPacket *pkt)
{
    av_packet_free_side_data(pkt); // NO-OP we do not have side data
    av_buffer_unref(&pkt->buf); // NO-OP we do not use the buffer
    av_init_packet(pkt); // same as the deprecated function
    pkt->data = NULL; // basic sanitization
    pkt->size = 0;
}
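
A minimal sketch of the allocate-once, reset-per-iteration pattern described above. To stay self-contained it uses a stand-in `mock_packet` struct and a `mock_packet_reset()` helper instead of the real `AVPacket` and `av_packet_unref()`; both names are hypothetical and only model the case where the packet borrows a caller-owned buffer, so resetting it frees nothing.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for AVPacket (hypothetical, only the fields used here) */
typedef struct mock_packet {
	uint8_t *data;	/* Borrowed pointer, never owned by the packet */
	int size;
	int stream_index;
} mock_packet;

/* Models what the reset amounts to when no refcounted buffer and no
 * side data are attached: the fields simply go back to their defaults */
static void mock_packet_reset(mock_packet *pkt) {
	pkt->data = NULL;
	pkt->size = 0;
	pkt->stream_index = 0;
}

/* Reuses one packet for every frame; returns the total bytes "written" */
static size_t mux_all(uint8_t **frames, const int *sizes, int count) {
	mock_packet pkt = { NULL, 0, 0 };	/* Set up once, before the loop */
	size_t written = 0;
	for(int i = 0; i < count; i++) {
		mock_packet_reset(&pkt);	/* Harmless on a fresh packet too */
		pkt.data = frames[i];
		pkt.size = sizes[i];
		written += (size_t)pkt.size;	/* A real muxer call would go here */
	}
	return written;
}
```

Because the payload pointer is only borrowed, the reset at the top of the first iteration has nothing to release, which is the point made in the reply above.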

@lminiero
Member

Ack, thanks for the clarification 👍

@lu-zero
Contributor Author

lu-zero commented May 19, 2021

@lminiero do you have test samples I can use to make sure everything works after the changes? I have samples for opus and vp8 but not for the others.

@lminiero
Member

@lu-zero I don't have any recording in the other formats, but they should be quite easy to generate. If you have a Janus instance you control somewhere, you can open the echotest demo with a custom codec preference, e.g.:

https://.../echotest.html?vcodec=h264

to force a specific video codec to be used (acodec does the same for audio), and then on the JS console send a record: true command to start recording the session:

echotest.send({ message: { record: true }});

This will create a different mjr file for each medium (in EchoTest audio, video and data). Sending record: false or stopping the demo will finalize the files, and allow you to use them.

@lminiero
Member

@lu-zero please let me know when you think the refactoring is done, so that I know when I can do a review 👍
I'm currently working on a separate branch that, due to some new features, will touch the way some of the packets are processed in the pprec tool as well, so I'll also need to know what I'll have to take into account in terms of refactoring there.

@lu-zero
Contributor Author

lu-zero commented May 21, 2021

Do you have a known use for the silence packets in opus? I just commented it out since locally it seems to cause more problems than it solves. If you are happy with what's in there I'd say it could land.

@lminiero
Member

> Do you have a known use for the silence packets in opus? I just commented it out since locally it seems to cause more problems than it solves. If you are happy with what's in there I'd say it could land.

Actually yes, that's quite important. When we detect a packet loss, we have to inject a silence packet, otherwise the duration of the processed stream will be broken. That's because, at least from my understanding of libogg, packets can only be added to the format with monotonically increasing packet IDs, and so skipping missing packets reduces the duration of the stream (as the 20ms that were missed are not reflected in the audio recording). That might not be important in case of very few losses, but if there's many of them, this could lead to problems, like out-of-sync audio and video streams if they need to be combined ex-post.
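
To make the duration argument concrete, here is a minimal sketch of how many silence packets a gap calls for, assuming 20 ms Opus frames and timestamps expressed in 48 kHz samples. `silence_packets_needed()` is a hypothetical helper written for this illustration, not a function from the postprocessor.

```c
#include <stdint.h>

#define OPUS_SAMPLES_PER_MS 48	/* 48 kHz clock: 48 samples per millisecond */
#define OPUS_FRAME_MS 20	/* Assumed frame duration, as in the thread */

/* Number of 20 ms silence packets needed between two received packets,
 * given their timestamps in 48 kHz samples (0 if they are contiguous) */
static uint64_t silence_packets_needed(uint64_t prev_ts, uint64_t next_ts) {
	const uint64_t frame = OPUS_SAMPLES_PER_MS * OPUS_FRAME_MS;	/* 960 samples */
	if(next_ts <= prev_ts + frame)
		return 0;
	return (next_ts - prev_ts) / frame - 1;
}
```

Each packet that is skipped instead of replaced shortens the output by one frame, so a run of five lost packets would cost 100 ms of audio under these assumptions.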

Member

@lminiero lminiero left a comment

Just had a look at the code, and it looks like a really useful refactoring, thanks! I just added a few minor editorial comments inline (mostly related to the project code style), and a note on the Opus silence ingestion we discussed a few days ago. I haven't tested this yet, but plan to do it soon.

@@ -585,6 +585,8 @@ janus_pp_rec_SOURCES = \
postprocessing/pp-h264.h \
postprocessing/pp-av1.c \
postprocessing/pp-av1.h \
postprocessing/pp-avformat.h \
postprocessing/pp-avformat.c \
Member

Just a small nit, can you invert the order of .c and .h file here to match the others in the list?

#else
vStream = avformat_new_stream(fctx, 0);

vStream = janus_pp_new_video_avstream(fctx, AV_CODEC_ID_AV1, max_width, max_height);;
Member

Small nit: there's a double semicolon at the end.


/* We save the metadata part as a comment (see #1189) */
if(metadata)
av_dict_set(&ctx->metadata, "comment", metadata, 0);
Member

Broken indentation in the above lines: per the project code style, we use tabs, not spaces.

JANUS_LOG(LOG_ERR, "Error guessing format\n");
avformat_free_context(ctx);
return NULL;
}
Member

Same indentation issues for the above lines.


/* WebM output */
fctx = janus_pp_create_avformatcontext("ogg", metadata, destination);
Member

I guess that if we wanted to optionally allow people to output the processing to .mka, as you said in #2658, we'd only need to make the "ogg" part a configurable string where we can put a different format target, correct? Same for videos, I guess, where VP8/VP9 could be outputted to .mkv rather than .webm as we do now.

Contributor Author

That's the plan; ideally it should be easy to select the output format (mp4 or mkv or webm).

I'll try to make the postprocessor able to mux both audio and video in a single pass later probably. One step at a time first though :)

Member

> I'll try to make the postprocessor able to mux both audio and video in a single pass later probably. One step at a time first though :)

That's actually not needed, since our .mjr files only store one media stream anyway (audio, video or data). Any muxing after that should be done separately, e.g., using ffmpeg itself, and possibly using timing info from the .mjr header to sort out sync issues.

av_packet_unref(pkt);
pkt->stream_index = 0;
pkt->data = buffer;
pkt->size = bytes;
Member

Broken indentation (spaces).

Contributor Author

Looks like in one of the machines I tested I do not have editorconfig support. Do you happen to have a clang-format setup I can use?

pkt->stream_index = 0;
pkt->data = buffer;
pkt->size = bytes;
pkt->pts = pkt->dts = av_rescale_q(tmp->ts - list->ts, timebase, fctx->streams[0]->time_base);
Member

Is this maybe why you were getting issues with the silence packets being inserted manually as we did when using libogg directly? Does the ogg format target in ffmpeg automatically insert silence audio packets looking at what the pts is, or is still increased monotonically even in case of gaps?
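
For reference, the timestamp conversion on the line under review can be sketched in plain C. This is only an illustration of rescaling between time bases with rounding to nearest: `rational` and `rescale_ts()` are hypothetical stand-ins, not the libavutil `AVRational` and `av_rescale_q()`, the 1/1000 destination time base in the usage note is an assumption, and the sketch only handles non-negative timestamps small enough not to overflow.

```c
#include <stdint.h>

/* Hypothetical stand-in for a time base expressed as num/den seconds */
typedef struct rational {
	int num;
	int den;
} rational;

/* Converts ts from time base src to time base dst, rounding to nearest.
 * Only valid for ts >= 0 and products that fit in 64 bits. */
static int64_t rescale_ts(int64_t ts, rational src, rational dst) {
	int64_t num = (int64_t)src.num * dst.den;
	int64_t den = (int64_t)src.den * dst.num;
	return (ts * num + den / 2) / den;
}
```

With a source time base of 1/48000 and a destination of 1/1000, a timestamp of 960 samples maps to 20, i.e. one 20 ms frame. A gap in the input timestamps stays a gap after rescaling; nothing in this arithmetic fills it.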

Contributor Author

The sample I used for testing has audio that lasts much longer than the video. I thought the problem was in the audio, but it is probably the video that has problems.

I still need to get a local setup to get better samples.

Inserting the silence seems fine with both ogg and mkv output.

Member

> Inserting the silence seems fine with both ogg and mkv output.

Ack, thanks!

> I still need to get a local setup to get better samples.

There are a couple of old .mjr files you can play with in this folder of the repo: they use the old format, but should be ok if you want to test audio and video streams that have the same length.

#else
vStream->codec->codec_id = AV_CODEC_ID_VP8;
codec_id = AV_CODEC_ID_VP8;
Member

The two codec_id lines don't need to be indented (the one below isn't, for instance), as macro ifs for us are not like "regular" ifs.

@lu-zero
Contributor Author

lu-zero commented May 24, 2021

The formatting problems should be addressed now, I hope :)

'O', 'p', 'u', 's', 'H', 'e', 'a', 'd',
1, 2, 56, 1, 128, 187,
0, 0, 0, 0, 0,
};
Member

Whoops looks like I missed this space-indentation in the previous review, sorry...

@lminiero
Member

> The formatting problems should be addressed now, I hope :)

Thanks! I just found one I missed before (I added a comment above), but the code changes do look fine to me. I plan to make some tests either later today or tomorrow morning, and in case they work fine for me it will be good to merge. Please make sure you sign our CLA before we get to that, though, as we won't be able to merge otherwise.

@lu-zero
Contributor Author

lu-zero commented May 24, 2021

I think I already signed the CLA, it does not show on your side?

@lminiero
Member

Mh no, it's not there. Did you fill the form at the end of the page? Maybe you only did the Github authentication on top? (the page is indeed a bit confusing in that regard)

@lu-zero
Contributor Author

lu-zero commented May 24, 2021

I completed the form at least now.

@lminiero
Member

Thanks, I do see it now! ✌️

av_init_packet(&avpacket);
avpacket.data = (uint8_t *)buffer;
avpacket.size = bytes;
AVPacket *avpacket = av_packet_alloc();
Member

@lu-zero any reason why this allocation is inside the loop? I see for all other codecs you just do it once before the while, and just unref/initialize the packet at each loop, while for G.722 this is happening in the while instead. Is it because we're actually decoding something here, rather than just saving to a container?

@lu-zero
Contributor Author

lu-zero commented May 25, 2021 via email

if(pos >= nextPos) {
JANUS_LOG(LOG_WARN, "[SKIP] pos: %06" SCNu64 ", skipping remaining silence\n", pos);
JANUS_LOG(LOG_WARN, "[SKIP] pos: %06" SCNu64 ", skipping remaining silence\n", pos / 48 / 20 + 1);
Member

Mh looking at these changes again (with fresh eyes from @atoppi) there seems to be something broken in the way pos is updated and handled now. I see how you decided to normalize both pos and nextPos, but before pos depended on i, while now it doesn't: since tmp never changes in the for loop, neither will the value of pos, making the checks broken. Considering pos is used as a basis for the av_rescale_q below, this would likely not insert all the silences we need, in case multiple packets were lost in a row.

Contributor Author

Oh, right. Updated to increase the pos by 20 * 48.
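
A minimal sketch of the loop shape this fix implies, assuming positions are counted in 48 kHz samples and every silence packet lasts 20 ms. `silence_positions()` and its parameters are hypothetical names for this illustration; the actual loop in the postprocessor is not shown in the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define SILENCE_STEP (20 * 48)	/* 20 ms at 48 kHz = 960 samples */

/* Fills out[] with the start position (in 48 kHz samples) of each silence
 * packet between last_pos (last packet written) and next_pos (next packet
 * received). Returns how many positions were produced, at most max_out. */
static size_t silence_positions(uint64_t last_pos, uint64_t next_pos,
		uint64_t *out, size_t max_out) {
	size_t n = 0;
	uint64_t pos = last_pos + SILENCE_STEP;
	while(pos < next_pos && n < max_out) {
		out[n++] = pos;
		pos += SILENCE_STEP;	/* Without this step every packet would share one position */
	}
	return n;
}
```

If `pos` never advanced, the stop check against `next_pos` would compare the same value on every iteration, which is the breakage described in the review comment.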

@lu-zero lu-zero force-pushed the cleanup_avformat branch 2 times, most recently from 7f0a9ed to cd57e6c Compare May 25, 2021 10:53
@atoppi
Member

atoppi commented May 25, 2021

hi @lu-zero ,
gave this PR a run with an opus recording sample (attached).
This specific sample has many 1-packet gaps and a single large gap of consecutive packets.
The new post-processor correctly filled all the gaps, so everything looks fine.

Anyway I have observed some differences in the ffprobe output:

Input #0, ogg, from 'old-pp.opus':
  Duration: 00:00:34.98, start: 0.000000, bitrate: 39 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
    Metadata:
      comment         : {"t": "a", "c": "opus", "s": 1621947438860716, "u": 1621947438863033}
Input #0, ogg, from 'new-pp.opus':
  Duration: 00:00:34.96, start: -0.020000, bitrate: 30 kb/s
    Stream #0:0: Audio: opus, 48000 Hz, stereo, fltp
    Metadata:
      comment         : {"t": "a", "c": "opus", "s": 1621947438860716, "u": 1621947438863033}
      encoder         : Lavf58.29.100

As you can see the opus file from the PR (new-pp.opus):

  1. has a shorter duration (by 20 ms) but a negative start time that seems to "compensate"
  2. has a lower bitrate (30kbps vs 39kbps), indeed the file size is smaller (130K vs 171K)

What is the reason for those differences?

@lu-zero
Contributor Author

lu-zero commented May 25, 2021

lavf adds the pre-skip and apparently has a different muxing strategy; the same number of packets is stored, though.

ffprobe -show_packets shows that all the packets are stored, as I can see by comparing the outputs with vimdiff.

@atoppi
Member

atoppi commented May 26, 2021

Thanks for clarifying. According to the RFC:

> A 'pre-skip' field in the ID header (see Section 5.1) signals the
> number of samples that SHOULD be skipped (decoded but discarded) at
> the beginning of the stream

AFAIU that means that a player will basically skip (i.e. not play) the first 20 ms (1 packet) of the opus file, while the current pp-rec generates a file that plays from the very first packet.
I still don't understand whether the two files are "synchronized": do they start playing at the same point, do they have the same duration, and is there any "loss" of information in the new one (e.g. will that pre-skipped packet be played somehow? why drop it?).
Sorry for the questions, I'm not exactly a codec guy.

As for my other point

> has a lower bitrate (30kbps vs 39kbps), indeed the file size is smaller (130K vs 171K)

do you have any comment?

@lu-zero
Contributor Author

lu-zero commented May 26, 2021

The bitrate is an estimate, computed from the file size and the duration:

    if (ic->pb && (filesize = avio_size(ic->pb)) > 0 && ic->duration > 0) {
        /* compute the bitrate */
        double bitrate = (double) filesize * 8.0 * AV_TIME_BASE /
                         (double) ic->duration;
        if (bitrate >= 0 && bitrate <= INT64_MAX)
            ic->bit_rate = bitrate;
    }
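
The same arithmetic as a standalone sketch, to show why the reported bitrate follows the file size. `estimate_bitrate()` and `TIME_BASE_US` are hypothetical names for this illustration; the only assumption carried over from the snippet above is that the duration is expressed in microseconds.

```c
#include <stdint.h>

#define TIME_BASE_US 1000000	/* Duration units per second (microseconds) */

/* Estimated bitrate in bits per second: total file bits over duration.
 * Returns 0 when either input is not positive. */
static int64_t estimate_bitrate(int64_t filesize_bytes, int64_t duration_us) {
	if(filesize_bytes <= 0 || duration_us <= 0)
		return 0;
	double bitrate = (double)filesize_bytes * 8.0 * TIME_BASE_US /
		(double)duration_us;
	return (int64_t)bitrate;
}
```

Since the whole file counts, container headers and page overhead included, two files with identical payload and nearly identical duration report different bitrates whenever their sizes differ.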

@lminiero
Member

@lu-zero I think what @atoppi is mentioning are two separate things, that may or may not be an issue, and that's what we're trying to figure out:

  1. If the pre-skip is causing the first 20ms of audio to be ignored by players (and media processors), that's a problem, as that may cause some desync problems when we're going to use the resulting media file in other contexts, e.g., to mux it with the corresponding video file. Is this something that libavformat always does, or is there a way to disable it? Or is it related to the pts/dts generated by av_rescale_q, that may generate negative values at startup?
  2. The average bitrate is not just a matter of how it's calculated, but seems to be a consequence of files that actually are of a different size: the same MJR file (with some packet losses to test the silence ingestion) was converted to a 170KB file by the current processor, while it was converted to a 130KB file by the new one. This means that there are 40KB unaccounted for, one way or another, and it's not clear if it's because of differences in how silence is ingested. We'd expect the size to be more or less the same: while there may be differences in how the ogg file header is created when using libogg directly vs. libavformat, we're manually adding a static silent audio packet, and the packets that weren't lost are all expected to have the same size in both files as well. We suspected it might be because of some transcoding happening somewhere (which would indeed result in a different Opus bitstream with a different bitrate), but looking at the code it does seem to be just saving the data to the container as we did for video streams already. Do you know what may be causing such a big difference instead?

Thanks!

@lu-zero
Contributor Author

lu-zero commented May 26, 2021

> @lu-zero I think what @atoppi is mentioning are two separate things, that may or may not be an issue, and that's what we're trying to figure out:
>
> 1. If the pre-skip is causing the first 20ms of audio to be ignored by players (and media processors), that's a problem, as that may cause some desync problems when we're going to use the resulting media file in other contexts, e.g., to mux it with the corresponding video file. Is this something that libavformat always does, or is there a way to disable it? Or is it related to the pts/dts generated by `av_rescale_q`, that may generate negative values at startup?

I explicitly set the packet duration. The original code assumed 20ms packets, I made it explicit. Now the timestamps are identical.

> 2. The average bitrate is not just a matter of how it's calculated, but seems to be a consequence of files that actually are of a different size: the same MJR file (with some packet losses to test the silence ingestion) was converted to a 170KB file by the current processor, while it was converted to a 130KB file by the new one. This means that there are 40KB unaccounted for, one way or another, and it's not clear if it's because of differences in how silence is ingested. We'd expect the size to be more or less the same: while there may be differences in how the ogg file header is created when using libogg directly vs. libavformat, we're manually adding a static silent audio packet, and the packets that weren't lost are all expected to have the same size in both files as well. We suspected it might be because of some transcoding happening somewhere (which would indeed result in a different Opus bitstream with a different bitrate), but looking at the code it does seem to be just saving the data to the container as we did for video streams already. Do you know what may be causing such a big difference instead?

It is the muxing overhead; you can use `ffprobe -show_packets ${file} | grep size | cut -d '=' -f 2 | paste -sd+ | bc` to check.

Your sample has 128947 bytes of actual data.

You can remux using `ffmpeg -i old.opus -c copy new.opus` and get

172K	old.opus
132K	new.opus
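
A rough sketch of where Ogg muxing overhead comes from, assuming the page layout from the Ogg specification (a 27-byte fixed page header plus one lacing byte per segment, with at most 255 segments per page). How the libogg-based code and libavformat actually group packets into pages is not stated in this thread, so `packets_per_page` is a free parameter here, and `ogg_lacing_bytes()` and `ogg_overhead()` are hypothetical helpers. The model ignores the header packets and packets that span pages.

```c
#include <stdint.h>

#define OGG_PAGE_HEADER 27	/* Fixed part of an Ogg page header, in bytes */
#define OGG_MAX_SEGMENTS 255	/* Lacing values available in one page */

/* Lacing values (one byte each) needed to store a packet of the given size */
static uint64_t ogg_lacing_bytes(uint64_t packet_size) {
	return packet_size / 255 + 1;
}

/* Total header overhead for num_packets equal-sized packets when
 * packets_per_page packets are grouped in each page */
static uint64_t ogg_overhead(uint64_t num_packets, uint64_t packet_size,
		uint64_t packets_per_page) {
	uint64_t lacing = ogg_lacing_bytes(packet_size);
	if(packets_per_page == 0 || packets_per_page * lacing > OGG_MAX_SEGMENTS)
		return 0;	/* Grouping not representable in this simple model */
	uint64_t pages = (num_packets + packets_per_page - 1) / packets_per_page;
	return pages * OGG_PAGE_HEADER + num_packets * lacing;
}
```

Under this model the per-page header dominates when pages hold few packets, so the same payload can plausibly differ by tens of kilobytes depending only on the paging strategy.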

@lminiero
Member

Oh I wasn't aware of muxing overhead and that it could be so high, thanks for the clarification and for the helpful snippet!

> I explicitly set the packet duration. The original code assumed 20ms packets, I made it explicit. Now the timestamps are identical.

Do you mean in a new commit? I don't see it yet.

@lu-zero
Contributor Author

lu-zero commented May 26, 2021

The push happened a bit later than the message :)

@atoppi
Member

atoppi commented May 27, 2021

@lu-zero tried with another recording sample of my voice and got the following with the new pp-rec

[ogg @ 0x61b000000080] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[ogg @ 0x61b000000080] Encoder did not produce proper pts, making some up.
audio-new.opus is 70993 bytes
Bye!

Current version does not show any warning.
The generated file seems ok, though.

@lu-zero
Contributor Author

lu-zero commented May 27, 2021

Pull and try again please.

@atoppi
Member

atoppi commented May 27, 2021

thanks, now everything looks good to me 👍

@lminiero
Member

We can merge then! ✌️
Thanks again for your contribution @lu-zero! (and for your patience going through our silly questions 🤭 )

@lminiero lminiero merged commit 161fe7a into meetecho:master May 27, 2021
@lu-zero lu-zero deleted the cleanup_avformat branch May 27, 2021 13:37
@lu-zero
Contributor Author

lu-zero commented May 27, 2021

Thank you for the review :)

@atoppi atoppi mentioned this pull request Jun 1, 2021
BogdanovKirill pushed a commit to 3dEYE/janus-gateway that referenced this pull request Jun 10, 2021
commit 161fe7a
Author: Luca Barbato <luca.barbato@gmail.com>
Date:   Thu May 27 15:29:01 2021 +0200

    Cleanup avformat-based preprocessors (meetecho#2665)
