Skip to content

Conversation

NicolasHug
Copy link
Contributor

@NicolasHug NicolasHug commented Mar 13, 2025

Towards #549

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 13, 2025
@NicolasHug NicolasHug mentioned this pull request Mar 12, 2025
7 tasks

def get_samples_played_in_range(
self, start_seconds: float, stop_seconds: Optional[float] = None
) -> AudioSamples:
Copy link
Contributor Author

@NicolasHug NicolasHug Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like start_seconds should default to the stream's beginning, but that can be done later.

Copy link
Contributor

@scotts scotts Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do that, we should also mirror that in the video API. We could try to play the same trick that range() does, where the semantics of the first parameter changes based on the number of parameters, but I'd (softly) rather not do that.

Copy link
Contributor Author

@NicolasHug NicolasHug Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'll update #150 eventually with more of these desirable default behaviors. That would be a good onboarding PR.

def __repr__(self):
return _frame_repr(self)


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below: the audio decoding API returns the new AudioSamples class rather than a pure Tensor. I think it has the following benefits:

  • users can keep track of the sample_rate within that struct without having to handle it separately
  • for some edge-cases, like with our mp3 test asset, the stream's beginning isn't 0. So we also returns pts_seconds, which may not always be equal to start_seconds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense, and it mirrors the video API. In the video API, __getitem__() returns just the tensor, but the named methods return Frame or FrameBatch. I think we should probably do that for audio as well.

sample_rate: int

def __post_init__(self):
# This is called after __init__() when a Frame is created. We can run
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# This is called after __init__() when a Frame is created. We can run
# This is called after __init__() when an AudioSamples instance is created. We can run

if (startSeconds == stopSeconds) {
// For consistency with video
return AudioFramesOutput{torch::empty({0}), 0.0};
return AudioFramesOutput{torch::empty({0, 0}), 0.0};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drive-by, the video APIs return something of shape (0, C, H, W) (where C H W are from the metadata). Here, (0, 0) is the best we can do, at least it preserves the number of dimensions.

output_pts_seconds = first_pts

num_samples = frames.shape[1]
last_pts = first_pts + num_samples / self.metadata.sample_rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't last_pts be a float here? Is that your intention, or should we call round()?

Copy link
Contributor Author

@NicolasHug NicolasHug Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's a float. I didn't add the _seconds everywhere, but they're all in seconds. I can do it if you think it improves clarity.

We do call round() just below, which I think is enough? Or is there an edge-case I'm missing?

@NicolasHug NicolasHug merged commit 1fd20b2 into meta-pytorch:main Mar 13, 2025
46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants