-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement video transcript annotation using HTML <video>
and WebVTT
#1294
Conversation
e0a9019
to
a14663c
Compare
This is the result of adding `webvtt-py` to `prod.in` then running: ``` make requirements sed -I '' 's/-r requirements\/\(.*\)/-r \1/' *.txt ``` Where the sed command modifies the .txt files to match what Dependabot produces.
a14663c
to
43739f0
Compare
@@ -22,3 +22,4 @@ whitenoise | |||
google-auth-oauthlib | |||
marshmallow | |||
webargs | |||
webvtt-py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
webvtt-py only declares support for Python versions up to 3.9, but looking at the code it relies only on basic text processing APIs which are unlikely to break in future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?
The last release is 3 years old, but I can also imagine there's not a lot of libraries covering this use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?
WebVTT is a pretty simple format so I think the library was essentially "done" and doesn't need much maintenance. If we do find issues then we can create PRs or create our own parsing code.
def _get_vtt(self, url: str) -> WebVTT: | ||
response = self._http_service.get(url) | ||
content = response.text.strip() | ||
content_buf = StringIO(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A malicious user could potentially disrupt the service by passing the URL of a very large transcript file. We could mitigate this by capping the size of the transcript that we're willing to process (eg. take only the first 1 MB).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that considered, can the HTTP Service stream the response if the server supports it? That would allow us to fetch only up to 1MB of data or until the end of file is reached, whatever comes first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The response
object here is a Response
instance from the requests
library. This can indeed read the response in a streaming fashion.
if content.startswith("WEBVTT"): | ||
return WebVTT.read_buffer(content_buf) | ||
|
||
# If video is not WebVTT, assume SRT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We support SRT because that is a common format that Canvas Studio produces, and WebVTT because it is the web-native format. The webvtt-py library also supports SBV which is another simple text format, that we could add support for in future.
Implement backend services to support video transcript annotation based on web standards, namely the HTML `<video>` element and WebVTT.
Implement frontend UI for video transcript annotation that uses the native `<video>` and `<track>` elements to render the video.
bb444d9
to
e4c2ce9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! 👏🏼
I left a couple minor comments. Nothing critical or that needs to be tackled now.
The overall logic works as expected.
def _get_vtt(self, url: str) -> WebVTT: | ||
response = self._http_service.get(url) | ||
content = response.text.strip() | ||
content_buf = StringIO(content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With that considered, can the HTTP Service stream the response if the server supports it? That would allow us to fetch only up to 1MB of data or until the end of file is reached, whatever comes first.
videoId?: string; | ||
|
||
/** URL of the video to load in a `<video>` element. */ | ||
videoURL?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this means we would have either videoId
or videoURL
, depending on the player
prop, but never both or none.
It might not be super clear to have two optional props with a similar purpose, so maybe we can think on some alternative approaches:
- Perhaps we could make
videoURL
mandatory and removevideoId
. Then we extract the logic to compose the YouTube URL from the ID out of theYouTubeVideoPlayer
component. - Another option would be to have a prop which is simply
video
, uniquely representing a video. We know that for YouTube that's the ID, for HTML is the video URL, and for other types, we'll see. - And finally, since we are already using the
player
prop as discriminator to renderYouTubeVideoPlayer
orHTMLVideoPlayer
, we could define the props as a union type:({ videoId: string; type: 'youtube' } | { videoURL: string; type: 'html-video' }) && CommonProps
.
All options have pros and cons. Current implementation is definitely the simplest, even if it is the less strict one.
We can take this discussion separately, to avoid adding extra complexity to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could make videoURL mandatory and remove videoId. Then we extract the logic to compose the YouTube URL from the ID out of the YouTubeVideoPlayer component.
The YouTubeVideoPlayer
component constructs a URL for the embedded player, which is different than the "standard" URL of the YouTube video. There are several URL formats for YouTube videos, so passing just the ID avoids needing to parse those different formats on the client side, instead the backend handles extracting the ID from the URL provided by the user.
And finally, since we are already using the player prop as discriminator to render YouTubeVideoPlayer or HTMLVideoPlayer, we could define the props as a union type
Something like this seems the cleanest to me.
@@ -16,7 +16,16 @@ | |||
{ | |||
"client_config": {{ client_config | tojson }}, | |||
"client_src": "{{ client_embed_url }}", | |||
"player": {{ player | tojson }}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the tojson
filter needed here? Isn't player
a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tojson
will wrap the string in quotes. "{{ player }}"
would have worked equally well.
@@ -22,3 +22,4 @@ whitenoise | |||
google-auth-oauthlib | |||
marshmallow | |||
webargs | |||
webvtt-py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be worried that, if they don't support newer Python versions, it's because it is not super well maintained?
The last release is 3 years old, but I can also imagine there's not a lot of libraries covering this use case.
This PR implements general video transcript annotation support based on the HTML
<video>
element and WebVTT or SRT format transcripts. WebVTT is the transcript format supported natively by browsers. SRT is a widely used and very similar text-based format, which is used by Canvas Studio.There are several components to this:
/video?url={url}&media_url={media_url}&transcript={transcript}
whereurl
is the canonical URL of the video,media_url
is the URL to load the video from (currently only one is supported, but we could allow multiple to support different formats and resolutions in future) andtranscript
is the URL of a transcript in WebVTT or SRT format. Themedia_url
is optional and if not provided,url
is used as a default.<video>
and<track>
elements to render the video and subtitles, instead of the YouTube video player. The choice of player is controlled by aplayer
configuration set by the backend.Testing:
Check out this branch, then go to http://localhost:9083/video?url=https%3A%2F%2Finteractive-examples.mdn.mozilla.net%2Fmedia%2Fcc0-videos%2Ffriday.mp4&transcript=https%3A%2F%2Finteractive-examples.mdn.mozilla.net%2Fmedia%2Fexamples%2Ffriday.vtt
You should then the video presented with the browser's native video player, and the transcript presented in the same way as for YouTube videos. This particular example is taken from an MDN demo. Note that it is expected that this particular video is low resolution.
Part of #1293