
jina-ai/executor-video-loader


VideoLoader

The VideoLoader executor loads the components of a video into Jina's Document type. It uses the ffmpeg-python library to extract the image frames, audio, and subtitles from the video.
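The decoding step can be illustrated with a minimal sketch. It assumes ffmpeg has already written raw RGB frames to stdout (for example via ffmpeg-python's `output('pipe:', format='rawvideo', pix_fmt='rgb24')` piping pattern) and, separately, raw mono 32-bit little-endian float samples (`format='f32le'`) for the audio. The frame width and height are assumed to be known in advance, for example from probing the video. The helpers `decode_rgb24_frames` and `decode_f32le_audio` are hypothetical names and are not part of the executor.

```python
import numpy as np


def decode_rgb24_frames(raw: bytes, width: int, height: int) -> np.ndarray:
    """Reshape a raw rgb24 byte stream into (n_frames, height, width, 3) uint8 frames."""
    frame_size = width * height * 3  # 3 bytes per pixel: R, G, B
    if len(raw) % frame_size != 0:
        raise ValueError("byte stream does not hold a whole number of frames")
    return np.frombuffer(raw, dtype=np.uint8).reshape(-1, height, width, 3)


def decode_f32le_audio(raw: bytes) -> np.ndarray:
    """Interpret raw little-endian 32-bit float PCM (mono assumed) as a float32 array."""
    return np.frombuffer(raw, dtype="<f4").astype(np.float32)


# Synthetic bytes stand in for ffmpeg's stdout here.
frames = decode_rgb24_frames(bytes(range(24)), width=2, height=2)
print(frames.shape, frames.dtype)  # (2, 2, 2, 3) uint8

audio = decode_f32le_audio(np.array([0.5, -0.25], dtype="<f4").tobytes())
print(audio.tolist())  # [0.5, -0.25]
```

The resulting dtypes match the uint8 frame blobs and float32 audio blobs that the executor stores.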

The extracted image frames, audio, and subtitles are stored as chunks of the Document. Image frame chunks have the modality image, audio chunks have the modality audio, and subtitle chunks have the modality text. When subtitles are parsed, duplicated captions are merged heuristically into unique captions, and each unique caption is returned as a single chunk with its start and end times stored in its tags.
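The exact merging heuristic is not described here. One simple possibility, shown as a hypothetical sketch rather than the executor's actual logic, is to sort captions by start time and fold every caption whose text repeats the previous one into that caption by extending its end time. The `Caption` class and `merge_duplicate_captions` function are illustrative names; only the field names mirror the `beg_in_seconds` and `end_in_seconds` tags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Caption:
    text: str
    beg_in_seconds: float
    end_in_seconds: float


def merge_duplicate_captions(captions: List[Caption]) -> List[Caption]:
    """Merge consecutive captions whose (whitespace-normalised) text is identical.

    Assumed heuristic: a caption that repeats the previous caption's text extends
    that caption's end time instead of producing a new caption.
    """
    merged: List[Caption] = []
    for cap in sorted(captions, key=lambda c: c.beg_in_seconds):
        text = " ".join(cap.text.split())  # collapse runs of spaces and line breaks
        if merged and merged[-1].text == text:
            merged[-1].end_in_seconds = max(merged[-1].end_in_seconds, cap.end_in_seconds)
        else:
            # Copy into a new object so the input captions are never mutated.
            merged.append(Caption(text, cap.beg_in_seconds, cap.end_in_seconds))
    return merged


raw = [
    Caption("Hello", 0.5, 0.6),
    Caption("Hello", 0.6, 1.2),
    Caption("World", 1.2, 2.0),
]
for c in merge_duplicate_captions(raw):
    print(c)
# Caption(text='Hello', beg_in_seconds=0.5, end_in_seconds=1.2)
# Caption(text='World', beg_in_seconds=1.2, end_in_seconds=2.0)
```

Only adjacent repeats are merged in this sketch, so a line that legitimately recurs later in the video still yields a separate caption.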

Each chunk type stores the following attributes:

| Data | Stored in | Modality | Location | Tags |
| --- | --- | --- | --- | --- |
| Image frames | `blob` (dtype=uint8) | `image` | Index of the frame | `{'timestamp': 0.5}`: timestamp of the frame in seconds; `{'video_uri': 'video.mp4'}`: URI of the video |
| Audio | `blob` (dtype=float32) | `audio` | N/A | `{'sample_rate': 140000}`: sample rate of the audio; `{'video_uri': 'video.mp4'}`: URI of the video |
| Subtitles | `text` (dtype=str) | `text` | Index of the subtitle in the video | `{'beg_in_seconds': 0.5}`: start of the caption in seconds; `{'end_in_seconds': 0.6}`: end of the caption in seconds; `{'video_uri': 'video.mp4'}`: URI of the video |
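The attribute table can be read as a recipe for building chunks. The sketch uses a hypothetical `Chunk` dataclass as a plain-Python stand-in for Jina's Document (it does not use the Jina API). It assumes that frames are sampled at a constant `fps` starting at 0 s, so a frame's timestamp is its index divided by `fps`, and that the whole audio track becomes one chunk. Tag keys follow the table, with the frame tag written as `timestamp`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

import numpy as np


@dataclass
class Chunk:
    """Hypothetical plain-Python stand-in for a Jina chunk Document."""
    modality: str
    blob: Optional[np.ndarray] = None
    text: Optional[str] = None
    location: List[int] = field(default_factory=list)
    tags: Dict[str, Any] = field(default_factory=dict)


def build_chunks(
    video_uri: str,
    frames: np.ndarray,  # uint8, shape (n_frames, height, width, 3)
    fps: float,  # assumed constant sampling rate of the extracted frames
    audio: np.ndarray,  # float32 samples
    sample_rate: int,
    captions: List[Tuple[str, float, float]],  # (text, beg, end), already merged
) -> List[Chunk]:
    chunks: List[Chunk] = []
    # One image chunk per frame; location is the frame index.
    for idx, frame in enumerate(frames):
        chunks.append(Chunk(
            modality="image", blob=frame, location=[idx],
            tags={"timestamp": idx / fps, "video_uri": video_uri},
        ))
    # Assumed: one audio chunk for the whole track; the table lists no location.
    chunks.append(Chunk(
        modality="audio", blob=audio,
        tags={"sample_rate": sample_rate, "video_uri": video_uri},
    ))
    # One text chunk per unique caption; location is the subtitle index.
    for idx, (text, beg, end) in enumerate(captions):
        chunks.append(Chunk(
            modality="text", text=text, location=[idx],
            tags={"beg_in_seconds": beg, "end_in_seconds": end, "video_uri": video_uri},
        ))
    return chunks


chunks = build_chunks(
    "video.mp4",
    np.zeros((3, 2, 2, 3), dtype=np.uint8),
    fps=2,
    audio=np.zeros(16000, dtype=np.float32),
    sample_rate=16000,
    captions=[("Hello", 0.5, 0.6)],
)
print([c.modality for c in chunks])  # ['image', 'image', 'image', 'audio', 'text']
print(chunks[1].tags)  # {'timestamp': 0.5, 'video_uri': 'video.mp4'}
```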