Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed audio-video synchronisation problem in read_video() when using pts as unit #3791

Merged
merged 10 commits into from
May 25, 2021

Conversation

prabhat00155
Copy link
Contributor

No description provided.

@prabhat00155 prabhat00155 requested a review from bjuncek May 7, 2021 14:23
@prabhat00155 prabhat00155 marked this pull request as ready for review May 7, 2021 14:23
@bjuncek bjuncek self-assigned this May 11, 2021
Copy link
Contributor

@bjuncek bjuncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
I've just put minor style edits.

Also, we should consider adding tests for this case to be able to catch the error in the future

torchvision/io/video.py Outdated Show resolved Hide resolved
torchvision/io/_video_opt.py Outdated Show resolved Hide resolved
@prabhat00155 prabhat00155 requested a review from bjuncek May 21, 2021 15:11
Copy link
Contributor

@bjuncek bjuncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Can you just clarify the above comment?

config.video_fps, info['video_fps'], delta=0.0001
)
arr = torch.Tensor(arr)
if arr.shape == visual.shape:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this necessary?
Wouldn't we expect them to be the same shape?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some off by 1, which may be related to the differences in how the closest frame is found in pyav and torchvision(as we had discussed offline).

arr.append(frame.to_ndarray())
_, audio, _ = io.read_video(full_path, start_pts=start_pts_val, pts_unit='pts')
arr = torch.as_tensor(np.concatenate(arr, axis=1))
if arr.shape == audio.shape:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

@prabhat00155 prabhat00155 merged commit f584309 into pytorch:master May 25, 2021
@prabhat00155 prabhat00155 deleted the prabhat00155/video_sync branch May 25, 2021 21:15
facebook-github-bot pushed a commit that referenced this pull request Jun 10, 2021
…en using `pts` as unit (#3791)

Summary:
* Fixed audio-video synchronisation problem in read_video() when using  as unit

* Addressed review comments

* Added unit test

Reviewed By: NicolasHug

Differential Revision: D29027305

fbshipit-source-id: 0a9dc3c877825af7836e7db313cfd2779542b457
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants