Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement video transcript annotation using HTML <video> and WebVTT #1294

Merged
merged 3 commits into from
Feb 23, 2024

Conversation

robertknight
Copy link
Member

@robertknight robertknight commented Feb 16, 2024

This PR implements general video transcript annotation support based on the HTML <video> element and WebVTT or SRT format transcripts. WebVTT is the transcript format supported natively by browsers. SRT is a widely used and very similar text-based format, which is used by Canvas Studio.

There are several components to this:

  1. A new HTML video transcript view to handles URLs of the form /video?url={url}&media_url={media_url}&transcript={transcript} where url is the canonical URL of the video, media_url is the URL to load the video from (currently only one is supported, but we could allow multiple to support different formats and resolutions in future) and transcript is the URL of a transcript in WebVTT or SRT format. The media_url is optional and if not provided, url is used as a default.
  2. A new API view which takes a URL pointing to a WebVTT or SRT format transcript, fetches it, parses it using the webvtt library and then returns it to the frontend.
  3. An alternate video player in the frontend which uses HTML <video> and <track> elements to render the video and subtitles, instead of the YouTube video player. The choice of player is controlled by a player configuration set by the backend.

Testing:

Check out this branch, then go to http://localhost:9083/video?url=https%3A%2F%2Finteractive-examples.mdn.mozilla.net%2Fmedia%2Fcc0-videos%2Ffriday.mp4&transcript=https%3A%2F%2Finteractive-examples.mdn.mozilla.net%2Fmedia%2Fexamples%2Ffriday.vtt

You should then the video presented with the browser's native video player, and the transcript presented in the same way as for YouTube videos. This particular example is taken from an MDN demo. Note that it is expected that this particular video is low resolution.

Part of #1293

@robertknight robertknight force-pushed the web-standards-video branch 9 times, most recently from e0a9019 to a14663c Compare February 22, 2024 17:19
This is the result of adding `webvtt-py` to `prod.in` then running:

```
make requirements
sed -I '' 's/-r requirements\/\(.*\)/-r \1/' *.txt
```

Where the sed command modifies the .txt files to match what Dependabot produces.
@@ -22,3 +22,4 @@ whitenoise
google-auth-oauthlib
marshmallow
webargs
webvtt-py
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webvtt-py only declares support for Python versions up to 3.9, but looking at the code it relies only on basic text processing APIs which are unlikely to break in future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?

The last release is 3 years old, but I can also imagine there's not a lot of libraries covering this use case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?

WebVTT is a pretty simple format so I think the library was essentially "done" and doesn't need much maintenance. If we do find issues then we can create PRs or create our own parsing code.

def _get_vtt(self, url: str) -> WebVTT:
response = self._http_service.get(url)
content = response.text.strip()
content_buf = StringIO(content)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A malicious user could potentially disrupt the service by passing the URL of a very large transcript file. We could mitigate this by capping the size of the transcript that we're willing to process (eg. take only the first 1 MB).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that considered, can the HTTP Service stream the response if the server supports it? That would allow us to fetch only up to 1MB of data or until the end of file is reached, whatever comes first.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response object here is a Response instance from the requests library. This can indeed read the response in a streaming fashion.

if content.startswith("WEBVTT"):
return WebVTT.read_buffer(content_buf)

# If video is not WebVTT, assume SRT.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support SRT because that is a common format that Canvas Studio produces, and WebVTT because it is the web-native format. The webvtt-py library also supports SBV which is another simple text format, that we could add support for in future.

Implement backend services to support video transcript annotation based on web
standards, namely the HTML `<video>` element and WebVTT.
Implement frontend UI for video transcript annotation that uses the native
`<video>` and `<track>` elements to render the video.
@robertknight robertknight marked this pull request as ready for review February 23, 2024 08:16
@marcospri marcospri self-requested a review February 23, 2024 09:01
@acelaya acelaya self-requested a review February 23, 2024 09:41
Copy link
Contributor

@acelaya acelaya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! 👏🏼

I left a couple minor comments. Nothing critical or that needs to be tackled now.

The overall logic works as expected.

def _get_vtt(self, url: str) -> WebVTT:
response = self._http_service.get(url)
content = response.text.strip()
content_buf = StringIO(content)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With that considered, can the HTTP Service stream the response if the server supports it? That would allow us to fetch only up to 1MB of data or until the end of file is reached, whatever comes first.

Comment on lines +39 to +42
videoId?: string;

/** URL of the video to load in a `<video>` element. */
videoURL?: string;
Copy link
Contributor

@acelaya acelaya Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this means we would have either videoId or videoURL, depending on the player prop, but never both or none.

It might not be super clear to have two optional props with a similar purpose, so maybe we can think on some alternative approaches:

  • Perhaps we could make videoURL mandatory and remove videoId. Then we extract the logic to compose the YouTube URL from the ID out of the YouTubeVideoPlayer component.
  • Another option would be to have a prop which is simply video, uniquely representing a video. We know that for YouTube that's the ID, for HTML is the video URL, and for other types, we'll see.
  • And finally, since we are already using the player prop as discriminator to render YouTubeVideoPlayer or HTMLVideoPlayer, we could define the props as a union type: ({ videoId: string; type: 'youtube' } | { videoURL: string; type: 'html-video' }) && CommonProps.

All options have pros and cons. Current implementation is definitely the simplest, even if it is the less strict one.

We can take this discussion separately, to avoid adding extra complexity to this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could make videoURL mandatory and remove videoId. Then we extract the logic to compose the YouTube URL from the ID out of the YouTubeVideoPlayer component.

The YouTubeVideoPlayer component constructs a URL for the embedded player, which is different than the "standard" URL of the YouTube video. There are several URL formats for YouTube videos, so passing just the ID avoids needing to parse those different formats on the client side, instead the backend handles extracting the ID from the URL provided by the user.

And finally, since we are already using the player prop as discriminator to render YouTubeVideoPlayer or HTMLVideoPlayer, we could define the props as a union type

Something like this seems the cleanest to me.

@@ -16,7 +16,16 @@
{
"client_config": {{ client_config | tojson }},
"client_src": "{{ client_embed_url }}",
"player": {{ player | tojson }},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the tojson filter needed here? Isn't player a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tojson will wrap the string in quotes. "{{ player }}" would have worked equally well.

@@ -22,3 +22,4 @@ whitenoise
google-auth-oauthlib
marshmallow
webargs
webvtt-py
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?

The last release is 3 years old, but I can also imagine there's not a lot of libraries covering this use case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants