Add AudioDecoder docs and tutorial #582
Merged
f6a7f4e
WIP
NicolasHug 179a01c
Merge branch 'main' of github.com:pytorch/torchcodec into sample_rate
NicolasHug 2d97555
Remove old code
NicolasHug 9af4bc8
WI:P
NicolasHug 2adf496
WIP
NicolasHug ef93be4
Fix clipping
NicolasHug db740a6
Merge branch 'main' of github.com:pytorch/torchcodec into sample_rate
NicolasHug ca15232
Driveby, remove preAllocatedOutputTensor
NicolasHug 6aa7b09
Rename avFrame into srcAVFrame
NicolasHug f858d0c
Add flushing
NicolasHug 70ac31e
Put back normal compilation flags
NicolasHug 8deb079
Add tests
NicolasHug 7b09315
Add tests
NicolasHug af4e88a
Nit
NicolasHug 975b0fb
Fix test assets
NicolasHug 7cb2271
Merge branch 'main' of github.com:pytorch/torchcodec into sample_rate
NicolasHug ee1c7b7
Nit
NicolasHug bf9aed2
NULL -> nullptr
NicolasHug f0e2cdd
Use optional
NicolasHug 6f0694e
Merge branch 'main' of github.com:pytorch/torchcodec into audio_tutorial
NicolasHug 7f9d3b0
Add AudioDecoder docs and tutorial
NicolasHug 85d4fc1
Debug
NicolasHug cfb190b
Fix sample rate conversion bug with multi-channel data
NicolasHug dc1f6d7
WIP
NicolasHug b3f37c7
Add test
NicolasHug 8ed45a7
Merge branch 'downsample' into audio_tutorial
NicolasHug 1acd939
Docs
NicolasHug 0b5db95
Add more
NicolasHug 54ad867
Merge branch 'main' of github.com:pytorch/torchcodec into audio_tutorial
NicolasHug
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14,3 +14,4 @@ torchcodec | |
|
|
||
| Frame | ||
| FrameBatch | ||
| AudioSamples | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Copyright (c) Meta Platforms, Inc. and affiliates. | ||
| # All rights reserved. | ||
| # | ||
| # This source code is licensed under the BSD-style license found in the | ||
| # LICENSE file in the root directory of this source tree. | ||
|
|
||
| """ | ||
| ======================================== | ||
| Decoding audio streams with AudioDecoder | ||
| ======================================== | ||
|
|
||
| In this example, we'll learn how to decode an audio file using the | ||
| :class:`~torchcodec.decoders.AudioDecoder` class. | ||
| """ | ||
|
|
||
| # %% | ||
| # First, a bit of boilerplate: we'll download an audio file from the web and | ||
| # define an audio playing utility. You can ignore that part and jump right | ||
| # below to :ref:`creating_decoder_audio`. | ||
| import requests | ||
| from IPython.display import Audio | ||
|
|
||
|
|
||
| def play_audio(samples): | ||
|     return Audio(samples.data, rate=samples.sample_rate) | ||
|
|
||
|
|
||
| # Audio source is CC0: https://opengameart.org/content/town-theme-rpg | ||
| # Attribution: cynicmusic.com pixelsphere.org | ||
| url = "https://opengameart.org/sites/default/files/TownTheme.mp3" | ||
| response = requests.get(url, headers={"User-Agent": ""}) | ||
| if response.status_code != 200: | ||
|     raise RuntimeError(f"Failed to download audio. {response.status_code = }.") | ||
|
|
||
| raw_audio_bytes = response.content | ||
|
|
||
| # %% | ||
| # .. _creating_decoder_audio: | ||
| # | ||
| # Creating a decoder | ||
| # ------------------ | ||
| # | ||
| # We can now create a decoder from the raw (encoded) audio bytes. You can of | ||
| # course use a local audio file and pass the path as input. You can also decode | ||
| # audio streams from videos! | ||
|
|
||
| from torchcodec.decoders import AudioDecoder | ||
|
|
||
| decoder = AudioDecoder(raw_audio_bytes) | ||
|
|
||
| # %% | ||
| # The audio has not yet been decoded by the decoder, but we already have access to | ||
| # some metadata via the ``metadata`` attribute which is an | ||
| # :class:`~torchcodec.decoders.AudioStreamMetadata` object. | ||
| print(decoder.metadata) | ||
|
|
||
| # %% | ||
| # Decoding samples | ||
| # ---------------- | ||
| # | ||
| # To get decoded samples, we just need to call the | ||
| # :meth:`~torchcodec.decoders.AudioDecoder.get_samples_played_in_range` method, | ||
| # which returns an :class:`~torchcodec.AudioSamples` object: | ||
|
|
||
| samples = decoder.get_samples_played_in_range(start_seconds=0) | ||
|
|
||
| print(samples) | ||
| play_audio(samples) | ||
|
|
||
| # %% | ||
| # The ``.data`` field is a tensor of shape ``(num_channels, num_samples)`` and | ||
| # of float dtype with values in [-1, 1]. | ||
| # | ||
| # The ``.pts_seconds`` field indicates the starting time of the output samples. | ||
| # Here it's 0.025 seconds, even though we asked for samples starting from 0. Not | ||
| # all streams start exactly at 0! This is not a bug in TorchCodec; it is a | ||
| # property of the file that was defined when it was encoded. | ||
| # | ||
| # We only output the *start* of the samples, not the end or the duration. Those can | ||
| # be easily derived from the number of samples and the sample rate: | ||
|
|
||
| duration_seconds = samples.data.shape[1] / samples.sample_rate | ||
| print(f"Duration = {int(duration_seconds // 60)}m{int(duration_seconds % 60)}s.") | ||
|
|
||
| # %% | ||
| # Specifying a range | ||
| # ------------------ | ||
| # | ||
| # By default, | ||
| # :meth:`~torchcodec.decoders.AudioDecoder.get_samples_played_in_range` decodes | ||
| # the entire audio stream, but we can specify a custom range: | ||
|
|
||
| samples = decoder.get_samples_played_in_range(start_seconds=10, stop_seconds=70) | ||
|
|
||
| print(samples) | ||
| play_audio(samples) | ||
|
|
||
| # %% | ||
| # Custom sample rate | ||
| # ------------------ | ||
| # | ||
| # We can also decode the samples into a desired sample rate using the | ||
| # ``sample_rate`` parameter of :class:`~torchcodec.decoders.AudioDecoder`. The | ||
| # output will sound the same, but note that the number of samples changes with | ||
| # the sample rate (it decreases when the target rate is below the source rate): | ||
|
|
||
| decoder = AudioDecoder(raw_audio_bytes, sample_rate=16_000) | ||
| samples = decoder.get_samples_played_in_range(start_seconds=0) | ||
|
|
||
| print(samples) | ||
| play_audio(samples) | ||
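A note on the "Custom sample rate" section of the tutorial above: resampling keeps the duration roughly constant, so the number of samples scales with the ratio of the two rates. The sketch below illustrates that arithmetic only. The helper name `expected_num_samples` is hypothetical, and the 44,100 Hz source rate is an assumption for illustration; the actual rate of the file is whatever `decoder.metadata` reports. A real resampler may differ from this estimate by a few samples.

```python
def expected_num_samples(num_samples: int, source_rate: int, target_rate: int) -> int:
    """Estimate the sample count after resampling, assuming duration is preserved.

    Hypothetical helper for illustration; not part of TorchCodec.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    # duration = num_samples / source_rate, so the new count is duration * target_rate.
    return round(num_samples * target_rate / source_rate)


# One minute of audio at an assumed 44.1 kHz source rate, resampled to 16 kHz.
one_minute_at_44k = 60 * 44_100
print(expected_num_samples(one_minute_at_44k, 44_100, 16_000))  # 960000
```

With these assumed rates the count drops by a factor of about 2.76; going from a lower source rate to a higher target rate would increase it instead.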
As I'm writing this, I feel like we should just set the `duration_seconds` field ourselves and make it part of `AudioSamples`? It's not like memory is a problem, and it would be consistent with video frames.
Yes, I think it would be useful. I was wondering the same thing when reading the tutorial.
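For reference, a minimal sketch of what exposing the duration could look like. This is not the actual `AudioSamples` implementation: `SamplesInfo` is a hypothetical stand-in that only carries the values needed for the arithmetic, and `end_seconds` is an illustrative extra, not something proposed in this thread.

```python
from dataclasses import dataclass


@dataclass
class SamplesInfo:
    """Hypothetical stand-in for the AudioSamples fields used in the tutorial."""

    num_samples: int     # samples per channel, i.e. data.shape[1] in the tutorial
    sample_rate: int     # in Hz
    pts_seconds: float   # start time of the first sample

    @property
    def duration_seconds(self) -> float:
        # Same derivation as in the tutorial: number of samples / sample rate.
        return self.num_samples / self.sample_rate

    @property
    def end_seconds(self) -> float:
        # Illustrative only: start time plus duration.
        return self.pts_seconds + self.duration_seconds


info = SamplesInfo(num_samples=32_000, sample_rate=16_000, pts_seconds=0.025)
print(info.duration_seconds)  # 2.0
```

Computing the value on access, as here, avoids storing anything extra; storing it as a plain field at construction time would work just as well and would match the suggestion above more literally.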