
Remove the few seconds of "preparation time" from the recording #454

Closed · LukasKalbertodt opened this issue Mar 4, 2020 · 10 comments
Labels: crowd-funded (Part of our crowdfunding campaigns)

@LukasKalbertodt (Member) commented Mar 4, 2020

Usually, at the very beginning of the recording, the user is "preparing" something: for example, switching to PowerPoint, looking at the webcam, or pointing the camera. These few seconds at the beginning feel unprofessional and are useless; they should not be in the recording. Something similar happens at the very end of the video: after the user has spoken the last word, they still have to click "stop recording". We probably want to remove that, too.

We have already discussed this in person several times and see basically two solutions:

  1. We have a proper countdown that gives the user visual and sound cues about when the actual recording will start. (Other recording programs I tried show a big countdown overlay over the screen, but that's not technically possible for us.) So we have to rely on sound effects. This requires the user to have sound output, which is fine in most instances, but not all. (A minimal sketch follows this list.)

  2. Let the user trim the start and end in the "review step" (coming soon). The start and end times would be transferred to the Opencast server, where the video is actually cut. The user could cut precisely, and we developers wouldn't have to sit through a 5 s countdown just to test something. However, users might cut away too much, and if users download their videos, the downloaded file is not cut. We could in theory cut the video in the browser, but that's resource-intensive, slow, and would probably result in pretty bad quality.
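
For option 1, a minimal countdown sketch with audible cues could look like this (a sketch assuming the Web Audio API; the beep pitches and timings are illustrative, not a settled design):

    // Beep once per second; a higher-pitched beep marks the moment the
    // actual recording starts. Resolves when the countdown is done.
    function countdown(seconds = 5) {
      const audioCtx = new AudioContext();
      return new Promise(resolve => {
        const tick = remaining => {
          const osc = audioCtx.createOscillator();
          osc.connect(audioCtx.destination);
          osc.frequency.value = remaining === 0 ? 880 : 440;
          osc.start();
          osc.stop(audioCtx.currentTime + 0.15);
          if (remaining === 0) resolve();
          else setTimeout(() => tick(remaining - 1), 1000);
        };
        tick(seconds);
      });
    }

    // Usage: await countdown(5); then start the MediaRecorder.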

@oas777 commented Mar 4, 2020

My two cents: Proper countdown.

@guest271314

> Let the user trim the start and end in the "review step" (coming soon). The start and end times would be transferred to the Opencast server, where the video is actually cut. The user could cut precisely, and we developers wouldn't have to sit through a 5 s countdown just to test something. However, users might cut away too much, and if users download their videos, the downloaded file is not cut. We could in theory cut the video in the browser, but that's resource-intensive, slow, and would probably result in pretty bad quality.

To get any time slice of a recorded file, a Media Fragment URI (https://www.w3.org/TR/media-frags/) can be used: the exact time range intended to be kept is simply re-recorded. E.g., for a 10-second video where the expected result is to trim 2 seconds from the beginning and 2 seconds from the end: once a Blob URL has been created from the first pass of MediaRecorder, the media can be loaded into a <video> element with src set to a media fragment identifier, e.g.:

initialMediaRecorder.ondataavailable = e => {
  // Decide here which time slices are not intended to be kept.
  const blobURL = URL.createObjectURL(e.data);
  // Media fragment: play only seconds 2 through 8 of the recording.
  video.src = `${blobURL}#t=2,8`;
  video.onplay = _ => {
    // Re-record the playback of just that fragment.
    video.recorder = new MediaRecorder(video.captureStream());
    video.recorder.start();
    video.recorder.ondataavailable = e => {
      // Do stuff with the trimmed 6 seconds of video.
    };
  };
  // Playback pauses automatically at the end of the fragment (t=8).
  video.onpause = _ => video.recorder.stop();
  video.muted = true; // allow programmatic playback without a gesture
  video.play();
};

FWIW, I have used media fragment URIs extensively at https://github.com/guest271314/MediaFragmentRecorder to record specific time slices of multiple video and audio fragments into a single media file.

@mtneug commented Apr 14, 2020

ffmpeg allows trimming recordings at keyframes without re-encoding, which is very quick. I wonder if this is possible in the browser as well. The disadvantage is that the cut points are not as precise, and, as I have noted elsewhere, browsers can produce crazy long distances between frames.

@guest271314

I am not sure "quick" applies consistently in the browser as well. In practice, every frame of video and audio can be saved, and then discarded or removed from the resulting media playback. Chromium and Firefox behave very differently in the case of image processing; in brief, see https://bugs.chromium.org/p/chromium/issues/detail?id=1065675

The numbers below were obtained by incrementing currentTime by 1/60 in a seekToFrame() implementation. At Nightly, Firefox appears to use 30 frames per second, so there the increment was 1/30, in a rudimentary seekToNextFrame attempt at the linked plnkr:

Chromium 83: { originalVideoDuration: 41.582, elapsedTimeToSeekAllVideoFrames: 157.491 }

Nightly 76: { originalVideoDuration: 41.582, elapsedTimeToSeekAllVideoFrames: 40.4 }

What is the complete requirement (including time restrictions)?

@guest271314

@mtneug I am banned from SO and related SE sites for another several years, so I cannot post an answer to the question there.

The MediaRecorder implementation in Chromium does not set the duration of the resulting file, which impacts replay of the video in Chromium; e.g., you cannot seek to the last frame using currentTime alone without calling play().

However, it is possible to capture every video frame nonetheless (https://plnkr.co/edit/k6Itrjzzj60vT2fC?preview) and to set whatever frame rate is required for streaming the images as video (https://plnkr.co/edit/4Tb91b?preview).
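
For reference, a commonly used workaround for the missing duration is to seek far past the end once, which forces Chromium to compute the real duration. A sketch, assuming the recording is available as blobURL:

    const video = document.createElement("video");
    video.src = blobURL; // Blob URL of the Chromium recording (assumed)
    video.onloadedmetadata = () => {
      if (video.duration === Infinity) {
        // Seek past the end; Chromium then determines the real duration.
        video.currentTime = Number.MAX_SAFE_INTEGER;
        video.ontimeupdate = () => {
          video.ontimeupdate = null;
          video.currentTime = 0;
          console.log(video.duration); // now a finite value
        };
      }
    };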

@guest271314

@mtneug See also https://plnkr.co/edit/Inb676?preview, where the frame rate is set to a different value depending on the current set of frames being rendered, and the input is frames of variable width and height (larger dimensions, in general, require fewer frames per second).

    const rs = new ReadableStream({
      async pull(controller) {
        // `frames` is an array of fragments; in each fragment, element 0
        // holds the metadata and the remaining elements are image URLs.
        for (const frame of frames) {
          const [{ duration, frameRate, width, height }] = frame;
          const framesLength = frame.length - 1;
          // Spread the fragment's duration evenly across its frames.
          const frameDuration = Math.ceil((duration * 1000) / framesLength);
          for (let i = 1; i < framesLength; i++) {
            const response = await (await fetch(frame[i])).arrayBuffer();
            controller.enqueue({ response, frameDuration });
          }
        }
        controller.close();
      }
    });
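
A hypothetical consumer of that stream (illustrative, assuming a 2D canvas context ctx is in scope and the frames are PNG data) could paint each frame for its computed duration:

    (async () => {
      const reader = rs.getReader();
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        // Decode the fetched image bytes and draw them onto the canvas.
        const blob = new Blob([value.response], { type: "image/png" });
        const bitmap = await createImageBitmap(blob);
        ctx.drawImage(bitmap, 0, 0);
        await new Promise(r => setTimeout(r, value.frameDuration));
      }
    })();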

@mtneug commented Apr 14, 2020

@guest271314 thanks for the detailed answers. I don't think I understood everything you are saying 😅. Sadly, I don't have any idea about the browser media recording APIs. All I see is that recordings produced with this tool are, let's say, on the strange side. Do I understand you correctly that it is possible to produce videos with a (more) constant frame rate? Does this apply to screen/webcam recordings?

The video I mentioned on SE has a very, very large gap (a single frame covering 0:00:09.548000 to 0:01:03.862000). If you look at it again, Chrome didn't even bother to place an I-frame at 0:01:03.862000:

Input #0, matroska,webm, from 'input.mkv':
  Metadata:
    ENCODER         : Lavf58.29.100
  Duration: 00:06:10.38, start: 0.000000, bitrate: 243 kb/s
    Stream #0:0(eng): Video: h264 (Constrained Baseline), yuv420p(progressive), 1920x1048, SAR 1:1 DAR 240:131, 30.30 fps, 30.30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      DURATION        : 00:06:10.383000000
0:00:00.000000  I
0:00:02.893000  P
0:00:04.489000  P
0:00:04.560000  P
0:00:04.646000  P
0:00:04.721000  P
0:00:04.799000  P
0:00:04.831000  P
0:00:04.902000  P
0:00:04.970000  P
0:00:05.099000  P
0:00:05.160000  P
0:00:05.234000  P
0:00:05.529000  P
0:00:05.599000  P
0:00:05.668000  P
0:00:06.302000  P
0:00:06.373000  P
0:00:07.151000  P
0:00:07.177000  P
0:00:07.259000  P
0:00:07.341000  P
0:00:07.463000  P
0:00:07.534000  P
0:00:07.605000  P
0:00:09.548000  P
0:01:03.862000  P
0:01:03.969000  P
0:01:04.040000  P
0:01:04.110000  P
0:01:04.189000  I
0:01:04.252000  P
0:01:04.319000  P
0:01:04.388000  P
0:01:04.458000  P

The video captured a screen recording of another Chrome window. You can see that the large gap was produced when the user showed static content and did not move the mouse. It seems no change = no new frame.

So say you want to trim this recording from 00:00:02 to 00:00:40. Normally, this can be done with the following ffmpeg command:

$ time ffmpeg -ss 00:00:02 -i input.mkv -to 00:00:38 cut.mkv
________________________________________________________
Executed in  629,53 millis    fish           external 
   usr time  957,07 millis  119,00 micros  956,96 millis 
   sys time  297,89 millis  618,00 micros  297,27 millis 

$ ffprobe -hide_banner cut.mkv 
Input #0, matroska,webm, from 'cut.mkv':
  Metadata:
    ENCODER         : Lavf58.29.100
  Duration: 00:00:07.59, start: 0.891000, bitrate: 122 kb/s
    Stream #0:0(eng): Video: h264 (High), yuv420p(progressive), 1920x1048 [SAR 1:1 DAR 240:131], 30.30 fps, 30.30 tbr, 1k tbn, 60.61 tbc (default)
    Metadata:
      ENCODER         : Lavc58.54.100 libx264
      DURATION        : 00:00:07.590000000

As you can see from the ffprobe output, the produced video is 7 seconds long instead of 38. I'm actually quite surprised by this, as I was hoping ffmpeg would at least place a frame at 38 seconds for a correct duration. This is not the case even when using -vsync cfr; it is only fixed when using e.g. -filter:v 'fps=30' in the filter chain. This is actually done in the default encoding profiles in Opencast (a solution I really dislike and have complained about in another context).
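
For reference, the variant with the fps filter mentioned above would look like this (same input, untested sketch):

$ ffmpeg -ss 00:00:02 -i input.mkv -to 00:00:38 -filter:v 'fps=30' cut.mkv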

Anyway, my original point was that using the above command will re-encode the video. Instead, ffmpeg allows cutting without re-encoding:

$ time ffmpeg -ss 00:00:02 -i input.mkv -to 00:00:38 -c copy cut.mkv
________________________________________________________
Executed in  161,99 millis    fish           external 
   usr time   40,59 millis   84,00 micros   40,51 millis 
   sys time   30,38 millis  558,00 micros   29,82 millis 

$ ffprobe -hide_banner cut.mkv
Input #0, matroska,webm, from 'cut.mkv':
  Metadata:
    ENCODER         : Lavf58.29.100
  Duration: 00:00:09.58, start: 0.000000, bitrate: 139 kb/s
    Stream #0:0(eng): Video: h264 (Constrained Baseline), yuv420p(progressive), 1920x1048, SAR 1:1 DAR 240:131, 30.30 fps, 30.30 tbr, 1k tbn, 60 tbc (default)
    Metadata:
      DURATION        : 00:00:09.581000000

This is much faster, as ffmpeg just copies the frames over without decoding them. The disadvantage is that this is only possible at GOP boundaries, i.e., between I-frames, since only then are all referenced frames present. This is the reason the resulting video in this case is 9 seconds long: ffmpeg could not start copying frames from 00:00:02 and had to fall back to the nearest I-frame, which is at 00:00:00.

The advantage is, of course, that this solution is much, much faster (3.88x in this case, though that is misleading given the short video; trimming a longer section was over 1000x faster on my system).

In my earlier comment, I was wondering if the same can be done within the browser, i.e., copying frames over without decoding them. From your comment, I understand that seeking speed is quite low. But that involves decoding frames, right? Maybe this can be avoided? If it is possible, the original video would still need smaller and more constant GOPs. Even if it is not possible, my SE question would basically be solved for me if OC Studio could be forced to produce a reasonably more constant frame rate.

@guest271314

The concept of the above code is to capture individual images (and audio, if required). Individual images can be posted to a server. If the first N frames, the middle N frames, the end-of-file frames, or any frames in the set need to be removed, that can be done before encoding the video, either locally using canvas and, if necessary, MediaStreamTrack, or by a native application on the server.

@guest271314

> Do I understand you correctly that it is possible to produce videos with a (more) constant frame rate?

Yes. Capture N images, then set whatever frame rate is required.

> Does this apply to screen/webcam recordings?

Yes. The initial MediaStreamTrack captured from getUserMedia(), getDisplayMedia(), or captureStream() can be captured again as a series of images and audio samples in an Array, Map, JSON, or other data structure.

Then remove or add any image or audio frames within those datasets and set any playback rate for the output. Even where the input and output are infinite, it is possible to set a playback rate; see thenickdude/webm-writer-js#8 (comment). Similarly for audio: specific frames within the Float32Arrays can be removed or modified.
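
A minimal sketch of that idea (assuming a playing <video> element named video; all names and the 30 fps output rate are illustrative):

    // Grab still images from a playing <video> into an array via canvas.
    const canvas = document.createElement("canvas");
    const ctx = canvas.getContext("2d");
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;

    const frames = [];
    const grab = () => {
      ctx.drawImage(video, 0, 0);
      frames.push(canvas.toDataURL("image/png"));
      if (!video.ended) requestAnimationFrame(grab);
    };
    grab();

    // Later: drop unwanted frames from `frames`, redraw the rest onto the
    // canvas at a fixed pace, and record the canvas as a new video with a
    // constant frame rate.
    const recorder = new MediaRecorder(canvas.captureStream(30)); // 30 fps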

@LukasKalbertodt (Member, Author)

This issue was closed by #618. Discussion about cutting in Studio itself is now tracked by #621.
