Implement video transcript annotation using HTML `<video>` and WebVTT #1294

robertknight · 2024-02-16T16:42:21Z

This PR implements general video transcript annotation support based on the HTML <video> element and WebVTT or SRT format transcripts. WebVTT is the transcript format supported natively by browsers. SRT is a widely used and very similar text-based format, which is used by Canvas Studio.

There are several components to this:

- A new HTML video transcript view that handles URLs of the form `/video?url={url}&media_url={media_url}&transcript={transcript}`, where `url` is the canonical URL of the video, `media_url` is the URL to load the video from (currently only one is supported, but we could allow multiple in future to support different formats and resolutions) and `transcript` is the URL of a transcript in WebVTT or SRT format. `media_url` is optional; if it is not provided, `url` is used as the default.
- A new API view which takes a URL pointing to a WebVTT or SRT format transcript, fetches it, parses it using the webvtt-py library and then returns it to the frontend.
- An alternate video player in the frontend which uses HTML `<video>` and `<track>` elements to render the video and subtitles, instead of the YouTube video player. The choice of player is controlled by a `player` configuration value set by the backend.
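
As a rough illustration of the URL scheme described in the first item, here is a minimal Python sketch that assembles a viewer URL from its parts. The helper name `build_video_view_url` is hypothetical and not part of this PR; only the `url`, `media_url` and `transcript` query parameters come from the description.

```python
from typing import Optional
from urllib.parse import urlencode


def build_video_view_url(
    base: str, url: str, transcript: str, media_url: Optional[str] = None
) -> str:
    """Build a `/video` viewer URL (hypothetical helper, for illustration only)."""
    params = {"url": url}
    if media_url is not None:
        # Optional: per the description, the view falls back to `url` when omitted.
        params["media_url"] = media_url
    params["transcript"] = transcript
    # urlencode percent-encodes ":" and "/" so the nested URLs survive intact.
    return f"{base.rstrip('/')}/video?{urlencode(params)}"
```

Called with the local development host and the MDN video and transcript URLs, this should reproduce the test link given in the "Testing" section.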

Testing:

Check out this branch, then go to http://localhost:9083/video?url=https%3A%2F%2Finteractive-examples.mdn.mozilla.net%2Fmedia%2Fcc0-videos%2Ffriday.mp4&transcript=https%3A%2F%2Finteractive-examples.mdn.mozilla.net%2Fmedia%2Fexamples%2Ffriday.vtt

You should then see the video presented with the browser's native video player, and the transcript presented in the same way as for YouTube videos. This particular example is taken from an MDN demo. Note that this particular video is expected to be low resolution.

Part of #1293


robertknight · 2024-02-23T07:38:25Z

requirements/prod.in

@@ -22,3 +22,4 @@ whitenoise
 google-auth-oauthlib
 marshmallow
 webargs
+webvtt-py


webvtt-py only declares support for Python versions up to 3.9, but looking at the code it relies only on basic text processing APIs which are unlikely to break in future.

Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?

The last release is 3 years old, but I can also imagine there aren't many libraries covering this use case.

> Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?

WebVTT is a pretty simple format so I think the library was essentially "done" and doesn't need much maintenance. If we do find issues then we can create PRs or create our own parsing code.

robertknight · 2024-02-23T07:43:38Z

via/services/transcript.py

+    def _get_vtt(self, url: str) -> WebVTT:
+        response = self._http_service.get(url)
+        content = response.text.strip()
+        content_buf = StringIO(content)


A malicious user could potentially disrupt the service by passing the URL of a very large transcript file. We could mitigate this by capping the size of the transcript that we're willing to process (e.g. by taking only the first 1 MB).

With that considered, can the HTTP Service stream the response if the server supports it? That would allow us to fetch only up to 1 MB of data or until the end of the file is reached, whichever comes first.

The response object here is a Response instance from the requests library. This can indeed read the response in a streaming fashion.
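
A minimal sketch of the capping idea discussed above, assuming the body is consumed as an iterable of byte chunks. The function name `read_capped` and the 1 MB constant are illustrative assumptions, and nothing here is part of this PR. Whether Via's HTTP service forwards a `stream` option is also an assumption, so the requests usage is shown only in a comment.

```python
from typing import Iterable

# 1 MB cap, taken from the figure floated in the review (an assumption, not a setting in Via).
MAX_TRANSCRIPT_BYTES = 1024 * 1024


def read_capped(chunks: Iterable[bytes], max_bytes: int = MAX_TRANSCRIPT_BYTES) -> bytes:
    """Collect chunks until the stream ends or `max_bytes` have been read."""
    buf = bytearray()
    for chunk in chunks:
        # Take only as much of this chunk as still fits under the cap.
        buf.extend(chunk[: max_bytes - len(buf)])
        if len(buf) >= max_bytes:
            break  # Stop pulling data from the source once the cap is hit.
    return bytes(buf)


# Hypothetical usage with the requests library:
#   with requests.get(url, stream=True, timeout=10) as response:
#       raw = read_capped(response.iter_content(chunk_size=8192))
#       text = raw.decode("utf-8", errors="ignore")
```

Decoding with `errors="ignore"` is one way to tolerate a multi-byte character being cut in half at the cap. A truncated transcript may still end with an incomplete cue, which the parser would need to handle.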

robertknight · 2024-02-23T07:49:31Z

via/services/transcript.py

+        if content.startswith("WEBVTT"):
+            return WebVTT.read_buffer(content_buf)
+
+        # If video is not WebVTT, assume SRT.


We support SRT because it is a common format that Canvas Studio produces, and WebVTT because it is the web-native format. The webvtt-py library also supports SBV, another simple text format, which we could add support for in future.
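
To illustrate how close the two formats are, here is a standalone sketch that detects the format the same way as the hunk above (a leading `WEBVTT` header) and converts SRT text to WebVTT by rewriting the timestamp separators. This is not how the PR does it (the PR delegates parsing to webvtt-py); the function names are hypothetical and the conversion handles only the basic case.

```python
import re

# SRT timestamps use a comma before the milliseconds: 00:00:01,000
_SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def detect_format(content: str) -> str:
    """Return "vtt" if the text starts with the WebVTT header, else assume "srt"."""
    return "vtt" if content.strip().startswith("WEBVTT") else "srt"


def srt_to_vtt(content: str) -> str:
    """Convert basic SRT text to WebVTT (illustrative sketch only)."""
    lines = []
    for line in content.strip().splitlines():
        if "-->" in line:
            # Only touch timing lines: WebVTT uses "." before the milliseconds.
            line = _SRT_TIMESTAMP.sub(r"\1.\2", line)
        lines.append(line)
    # SRT cue numbers are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"
```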


acelaya

Very nice! 👏🏼

I left a couple minor comments. Nothing critical or that needs to be tackled now.

The overall logic works as expected.

acelaya · 2024-02-23T12:53:00Z

via/services/transcript.py

+    def _get_vtt(self, url: str) -> WebVTT:
+        response = self._http_service.get(url)
+        content = response.text.strip()
+        content_buf = StringIO(content)


With that considered, can the HTTP Service stream the response if the server supports it? That would allow us to fetch only up to 1 MB of data or until the end of the file is reached, whichever comes first.

acelaya · 2024-02-23T13:06:47Z

via/static/scripts/video_player/components/VideoPlayerApp.tsx

+  videoId?: string;
+
+  /** URL of the video to load in a `<video>` element. */
+  videoURL?: string;


I guess this means we would have either videoId or videoURL, depending on the player prop, but never both or none.

It might not be super clear to have two optional props with a similar purpose, so maybe we can think about some alternative approaches:

Perhaps we could make videoURL mandatory and remove videoId. Then we extract the logic to compose the YouTube URL from the ID out of the YouTubeVideoPlayer component.

Another option would be to have a prop which is simply video, uniquely representing a video. We know that for YouTube that's the ID, for HTML it's the video URL, and for other types, we'll see.

And finally, since we are already using the player prop as discriminator to render YouTubeVideoPlayer or HTMLVideoPlayer, we could define the props as a union type: `({ videoId: string; type: 'youtube' } | { videoURL: string; type: 'html-video' }) & CommonProps`.

All options have pros and cons. The current implementation is definitely the simplest, even if it is the least strict one.

We can take this discussion separately, to avoid adding extra complexity to this PR.

> Perhaps we could make videoURL mandatory and remove videoId. Then we extract the logic to compose the YouTube URL from the ID out of the YouTubeVideoPlayer component.

The YouTubeVideoPlayer component constructs a URL for the embedded player, which is different from the "standard" URL of the YouTube video. There are several URL formats for YouTube videos, so passing just the ID avoids needing to parse those different formats on the client side. Instead, the backend handles extracting the ID from the URL provided by the user.
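
To make the point about multiple URL formats concrete, here is a hedged sketch of what ID extraction on the backend could look like. It is not Via's actual implementation; the function name `youtube_video_id` and the set of handled URL shapes are assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url: str) -> Optional[str]:
    """Extract a video ID from a few common YouTube URL shapes (illustrative only)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path_parts = [part for part in parsed.path.split("/") if part]

    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        return path_parts[0] if path_parts else None

    if host in {"youtube.com", "m.youtube.com"}:
        if path_parts == ["watch"]:
            # Standard links: https://www.youtube.com/watch?v=<id>
            ids = parse_qs(parsed.query).get("v")
            return ids[0] if ids else None
        if len(path_parts) >= 2 and path_parts[0] in {"embed", "shorts", "live"}:
            # Path-based links: https://www.youtube.com/embed/<id>
            return path_parts[1]

    return None
```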

> And finally, since we are already using the player prop as discriminator to render YouTubeVideoPlayer or HTMLVideoPlayer, we could define the props as a union type

Something like this seems the cleanest to me.

acelaya · 2024-02-23T13:14:47Z

via/templates/view_video.html.jinja2

@@ -16,7 +16,16 @@
      {
        "client_config": {{ client_config | tojson }},
        "client_src": "{{ client_embed_url }}",
+        "player": {{ player | tojson }},


Why is the tojson filter needed here? Isn't player a string?

tojson will wrap the string in quotes. "{{ player }}" would have worked equally well.
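
A small sketch of the difference between the two approaches, using the standard `json` module as a stand-in for Jinja's `tojson` filter (the real filter additionally applies HTML-safe escaping, so this is only an approximation). The function names and the `"html-video"` value are assumptions for illustration.

```python
import json


def render_quoted(value: str) -> str:
    """Mimic `"{{ player }}"`: wrap the raw value in double quotes."""
    return f'"{value}"'


def render_json(value: str) -> str:
    """Approximate `{{ player | tojson }}` with the standard json module."""
    return json.dumps(value)


def is_valid_json(text: str) -> bool:
    """Report whether `text` parses as JSON."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True
```

For a plain identifier such as `"html-video"` both functions produce the same output, which matches the reply above. They diverge only for values containing characters such as double quotes, where manual quoting yields invalid JSON.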

acelaya · 2024-02-23T13:18:56Z

requirements/prod.in

@@ -22,3 +22,4 @@ whitenoise
 google-auth-oauthlib
 marshmallow
 webargs
+webvtt-py


Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?

The last release is 3 years old, but I can also imagine there aren't many libraries covering this use case.

robertknight force-pushed the web-standards-video branch 9 times, most recently from e0a9019 to a14663c, February 22, 2024 17:19

Add webvtt-py dependency

f5e873d

This is the result of adding `webvtt-py` to `prod.in` then running:

```
make requirements
sed -I '' 's/-r requirements\/\(.*\)/-r \1/' *.txt
```

Where the sed command modifies the .txt files to match what Dependabot produces.

robertknight force-pushed the web-standards-video branch from a14663c to 43739f0, February 23, 2024 07:29

robertknight commented Feb 23, 2024


robertknight added 2 commits February 23, 2024 07:52

Implement backend support for HTML video annotation

afebac3

Implement backend services to support video transcript annotation based on web standards, namely the HTML `<video>` element and WebVTT.

Implement frontend support for HTML video annotation

e4c2ce9

Implement frontend UI for video transcript annotation that uses the native `<video>` and `<track>` elements to render the video.

robertknight force-pushed the web-standards-video branch from bb444d9 to e4c2ce9, February 23, 2024 07:52

robertknight marked this pull request as ready for review February 23, 2024 08:16

marcospri self-requested a review February 23, 2024 09:01

acelaya self-requested a review February 23, 2024 09:41

acelaya approved these changes Feb 23, 2024


robertknight merged commit 27150a3 into main Feb 23, 2024
7 checks passed

robertknight deleted the web-standards-video branch February 23, 2024 13:53

robertknight mentioned this pull request Feb 23, 2024

Return 4xx status if fetching WebVTT / SRT transcript fails #1298

Open

This was referenced Mar 18, 2024

Implement general video annotation based on HTML <video> and related standards #1293

Closed

Support WebVTT web native video transcript annotations hypothesis/product-backlog#1518

Closed

robertknight mentioned this pull request Apr 8, 2024

Support disabling video download controls via allow_download query param #1329

Merged
