Dust Off Fastaudio #121

Open · wants to merge 8 commits into base: master · Changes from 4 commits
2 changes: 1 addition & 1 deletion .github/workflows/python-build-test.yml
@@ -15,7 +15,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [3.6, 3.7, 3.8]
+        python-version: [3.8, 3.9, '3.10', 3.11]
Member:
Why ' around the 3.10 only?

Collaborator Author:
I could put quotes around all of them if we want. To answer your specific question: YAML treats an unquoted 3.10 as a number, so the trailing zero is chopped off and the workflow tried to test against Python 3.1 rather than 3.10. The rest resolve fine as-is, but I'm happy to quote all of them for consistency.
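The truncation is easy to reproduce outside of CI; YAML's number parsing behaves like Python's here, so a minimal Python sketch of the failure mode:

```python
# An unquoted 3.10 in YAML is parsed as a floating-point number, so the
# trailing zero is lost and CI ends up requesting Python "3.1".
# Quoting makes it a string, which preserves the text exactly.
as_number = 3.10      # what YAML sees without quotes
as_string = "3.10"    # what YAML sees with quotes

print(str(as_number))  # 3.1  -- the zero is gone
print(as_string)       # 3.10 -- preserved
```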

Member:
Right, ok, sounds good.

Collaborator Author:
I've added quotes to all of these in the updated version. Should I resolve the conversation, or should you? I have done it both ways in different repos.

steps:
- uses: actions/checkout@v2
- name: Set up Python ${{ matrix.python-version }}
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
@@ -3,13 +3,13 @@ exclude: '^docs/conf.py'
 repos:
   # Format Code
   - repo: https://github.com/ambv/black
-    rev: 21.5b2
+    rev: 22.3.0
     hooks:
       - id: black

   # Sort imports
   - repo: https://github.com/PyCQA/isort
-    rev: 5.8.0
+    rev: 5.12.0
     hooks:
       - id: isort
         args: ["--profile", "black"]
4 changes: 2 additions & 2 deletions README.md
@@ -27,7 +27,7 @@ If you plan on **contributing** to the library instead, you will need to do a ed

```
# Optional step if using conda
-conda create -n fastaudio python=3.7
+conda create -n fastaudio python=3.8
conda activate fastaudio
```

@@ -62,7 +62,7 @@ Create issues, write documentation, suggest/add features, submit PRs. We are ope
This project has been set up using PyScaffold 3.2.3. For details and usage
information on PyScaffold see https://pyscaffold.org/.

-## Community 
+## Community

Please come and ask us questions about audio related tasks in our [discord](https://discord.gg/gfNYcfX6pM)

158 changes: 79 additions & 79 deletions docs/ESC50: Environmental Sound Classification.ipynb

Large diffs are not rendered by default.

473 changes: 215 additions & 258 deletions docs/Introduction to Audio.ipynb

Large diffs are not rendered by default.

105 changes: 46 additions & 59 deletions docs/Introduction to Fastaudio.ipynb

Large diffs are not rendered by default.

29 changes: 15 additions & 14 deletions setup.cfg
@@ -32,12 +32,13 @@ setup_requires = pyscaffold>=3.2a0,<3.3a0
 # Add here dependencies of your project (semicolon/line-separated), e.g.
 # install_requires = numpy; scipy
 install_requires =
-    fastai==2.3.1
-    torchaudio>=0.7,<0.9
-    librosa==0.8
-    colorednoise>=1.1
+    fastai #==2.3.1
Collaborator Author:
How should we handle versioning? I can snapshot the versions I am currently using and pin to those, as we have here, if that makes the most sense.

Member:
I think we should always lock the version to something.

Collaborator Author:
Ok, I will update this.

Collaborator Author:
I updated this, but I still don't really like it. I just hardcoded everything to the versions I'm working with, and I don't know a better way to do it. Any suggestions?

Collaborator Author:
Testing something else that seems a bit better:

install_requires =
    fastai>=2.7.13,<3
    torchaudio>=2.1.1,<3
    librosa>=0.10.1,<1
    matplotlib>=3.3.3,<3.8
    etc....

Basically: accept any version at or above the one I tested against, but rule out a major version bump.
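A hedged sketch of the semantics this specifier style gives you. This uses plain tuple comparison for illustration only; real pip resolution follows PEP 440 via the packaging library, and the `in_range` helper below is hypothetical:

```python
# Illustrates a ">= tested version, < next major" range like fastai>=2.7.13,<3.
# Simplified model: compare dotted versions as integer tuples.
def in_range(version, low=(2, 7, 13), high=(3,)):
    parts = tuple(int(p) for p in version.split("."))
    return low <= parts < high

print(in_range("2.7.13"))  # True  -- the version actually tested
print(in_range("2.8.0"))   # True  -- newer, but same major version
print(in_range("3.0.0"))   # False -- major bump is excluded
```

The design trade-off is the usual one: the range picks up bug fixes and minor features automatically, while the `<3` cap shields users from intentionally breaking major releases.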

+    torchaudio #>=0.7,<0.9
+    librosa #==0.8
+    matplotlib<3.8 # remove this once librosa fixes this issue: https://github.com/librosa/librosa/issues/1763
+    colorednoise #>=1.1
     IPython #Temporary remove the bound on IPython
-    fastcore==1.3.20
+    fastcore #==1.3.20
# The usage of test_requires is discouraged, see `Dependency Management` docs
# tests_require = pytest; pytest-cov
# Require a specific Python version, e.g. Python 2.7 or >= 3.4
@@ -54,20 +55,20 @@ exclude =
# PDF = ReportLab; RXP
# Add here test requirements (semicolon/line-separated)
 testing =
-    pytest>=6.0
-    pytest-cov>=2.10
+    pytest #>=6.0
+    pytest-cov #>=2.10
     papermill
     jupyter


 dev =
-    mkdocs>=1.1
-    mkautodoc>=0.1
-    mkdocs-material>=5.5
-    mknotebooks==0.6.1
-    pre_commit>=2.7
-    recommonmark>=0.6
-    black>=19.10b0
+    mkdocs #>=1.1
+    mkautodoc #>=0.1
+    mkdocs-material #>=5.5
+    mknotebooks #==0.6.1
+    pre_commit #>=2.7
+    recommonmark #>=0.6
+    black #>=19.10b0

[options.entry_points]
# Add here console scripts like:
5 changes: 2 additions & 3 deletions src/fastaudio/__init__.py
@@ -1,13 +1,12 @@
 # -*- coding: utf-8 -*-
 from pkg_resources import DistributionNotFound, get_distribution

 import os
 import torchaudio

-torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False
 # soundfile is torchaudio backend for windows, all other os use sox_io
-backend = "soundfile" if os.name == "nt" else "sox_io"
-torchaudio.set_audio_backend(backend)
+# backend = "soundfile" if os.name == "nt" else "sox_io"
+# torchaudio.set_audio_backend(backend)
try:
# Change here if project is renamed and does not equal the package name
dist_name = __name__
5 changes: 3 additions & 2 deletions src/fastaudio/augment/functional.py
@@ -2,6 +2,7 @@

 # Must be imported explicitly to override the top-level `torch.fft` function
 import torch.fft
+from copy import deepcopy
 from torch import Tensor


@@ -197,9 +198,9 @@ def colored_noise(shape, exponent, fmin=0, device=None):
Expand Down Expand Up @@ -197,9 +198,9 @@ def colored_noise(shape, exponent, fmin=0, device=None):
s_scale = s_scale ** (-exponent / 2.0)

# Calculate theoretical output standard deviation from scaling
-    w = s_scale[1:].clone()
Collaborator Author:
clone was stripping sr. Is deepcopy an OK option here, or do I need to figure out why sr is getting stripped?
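For context, a minimal pure-Python sketch of why a clone-style op can drop metadata that deepcopy keeps. `FakeAudioTensor` is a hypothetical stand-in for a torch.Tensor subclass such as AudioTensor; the real mechanics go through PyTorch internals, but the shape of the problem is the same:

```python
from copy import deepcopy

# A tensor op like .clone() typically rebuilds the object from raw data,
# so attributes living in __dict__ (like sr) are lost along the way,
# while deepcopy walks __dict__ and preserves them.
class FakeAudioTensor:
    def __init__(self, data):
        self.data = list(data)

    def clone(self):
        # mimics a tensor op: fresh instance from data only, no __dict__ copy
        return type(self)(self.data)

t = FakeAudioTensor([0.0, 0.1])
t.sr = 16000                  # metadata attribute, like AudioTensor's sample rate

cloned = t.clone()
copied = deepcopy(t)

print(hasattr(cloned, "sr"))  # False -- metadata stripped by the rebuild
print(copied.sr)              # 16000 -- metadata preserved
```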

+    w = deepcopy(s_scale[1:])
     w[-1] *= (1 + (nsamples % 2)) / 2.0  # correct f = +-0.5
-    sigma = 2 * (w ** 2).sum().sqrt() / nsamples
+    sigma = 2 * (w**2).sum().sqrt() / nsamples

# Adjust size to generate one Fourier component per frequency
new_shape = (*shape[:-1], f.size(0))
34 changes: 3 additions & 31 deletions src/fastaudio/core/signal.py
@@ -1,18 +1,16 @@
 import random
 import torch
 import torchaudio
-from collections import OrderedDict
 from fastai.data.external import URLs
 from fastai.data.transforms import Transform, get_files
 from fastai.imports import Path, mimetypes, plt, tarfile
-from fastai.torch_core import TensorBase, _fa_rebuild_qtensor, _fa_rebuild_tensor
+from fastai.torch_core import TensorBase
 from fastai.vision.data import get_grid
 from fastcore.basics import patch
 from fastcore.dispatch import typedispatch
 from fastcore.meta import delegates
 from fastcore.utils import ifnone
 from IPython.display import Audio, display
-from librosa.display import waveplot
+from librosa.display import waveshow
 from os import path

audio_extensions = tuple(
@@ -50,32 +48,6 @@ def tar_extract_at_filename(fname, dest):
tarfile.open(fname, "r:gz").extractall(dest)


-# fix to preserve metadata for subclass tensor in serialization
-# src: https://github.com/fastai/fastai/pull/3383
-# TODO: remove this when #3383 lands and a new fastai version is created
Collaborator Author:
That PR (fastai#3383) has since landed, so I removed these; self.storage was causing issues.

-def _rebuild_from_type(func, type, args, dict):
-    ret = func(*args).as_subclass(type)
-    ret.__dict__ = dict
-    return ret
-
-
-@patch
-def __reduce_ex__(self: TensorBase, proto):
-    torch.utils.hooks.warn_if_has_hooks(self)
-    args = (
-        type(self),
-        self.storage(),
-        self.storage_offset(),
-        tuple(self.size()),
-        self.stride(),
-    )
-    if self.is_quantized:
-        args = args + (self.q_scale(), self.q_zero_point())
-    args = args + (self.requires_grad, OrderedDict())
-    f = _fa_rebuild_qtensor if self.is_quantized else _fa_rebuild_tensor
-    return (_rebuild_from_type, (f, type(self), args, self.__dict__))


class AudioTensor(TensorBase):
"""
Semantic torch tensor that represents an audio.
@@ -152,7 +124,7 @@ def show_audio_signal(ai, ctx, ax=None, title="", **kwargs):
for i, channel in enumerate(ai):
# x_start, y_start, x_lenght, y_lenght, all in percent
ia = ax.inset_axes((i / ai.nchannels, 0.2, 1 / ai.nchannels, 0.7))
-        waveplot(channel.cpu().numpy(), ai.sr, ax=ia, **kwargs)
+        waveshow(channel.cpu().numpy(), sr=ai.sr, ax=ia, **kwargs)
ia.set_title(f"Channel {i}")
ax.set_title(title)

5 changes: 3 additions & 2 deletions src/fastaudio/util.py
@@ -1,4 +1,5 @@
 import torch
+from copy import deepcopy
 from fastai.vision.augment import RandTransform
 from functools import wraps
 from math import pi

@@ -27,13 +28,13 @@ def test_audio_tensor(seconds=2, sr=16000, channels=1):

def apply_transform(transform, inp):
"""Generate a new input, apply transform, and display/return both input and output"""
-    inp_orig = inp.clone()
+    inp_orig = deepcopy(inp)
out = (
transform(inp_orig, split_idx=0)
if isinstance(transform, RandTransform)
else transform(inp_orig)
)
-    return inp.clone(), out
+    return inp, out


def auto_batch(item_dims):
12 changes: 6 additions & 6 deletions tests/test_augment.py
@@ -1,8 +1,8 @@
-import random
-
 import pytest

+import random
 import torch
+from copy import deepcopy
 from fastai.data.all import test_close as _test_close
 from fastai.data.all import test_eq as _test_eq

@@ -12,7 +12,7 @@
     RemoveSilence,
     RemoveType,
     Resample,
-    ResizeSignal
+    ResizeSignal,
 )
from fastaudio.util import test_audio_tensor

@@ -34,7 +34,7 @@ def test_path(audio):

def apply_transform(transform, inp):
"""Generate a new input, apply transform, and display/return both input and output"""
-    inp_orig = inp.clone()
+    inp_orig = deepcopy(inp)
out = (
transform(inp, split_idx=0)
if isinstance(transform, RandTransform)
@@ -102,11 +102,11 @@ def test_cropping():
audio = test_audio_tensor(seconds=10, sr=1000)

for i in [1, 2, 5]:
-        inp, out = apply_transform(ResizeSignal(i * 1000), audio.clone())
+        inp, out = apply_transform(ResizeSignal(i * 1000), deepcopy(audio))

_test_eq(out.duration, i)
_test_eq(out.nsamples, out.duration * inp.sr)

# Multi Channel Cropping
-    inp, mc = apply_transform(ResizeSignal(i * 1000), audio.clone())
+    inp, mc = apply_transform(ResizeSignal(i * 1000), deepcopy(audio))
_test_eq(mc.duration, i)
12 changes: 6 additions & 6 deletions tests/test_core.py
@@ -20,7 +20,7 @@ def test_load_audio():
a2s = DBMelSpec(f_max=20000, n_mels=137)
sg = a2s(item0)

-    assert type(item0) == AudioTensor
+    assert isinstance(item0, AudioTensor)
assert item0.sr == 16000
assert item0.nchannels == 1
assert item0.nsamples == 32000
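The switch to `isinstance` above is the safer check because fastai transforms can hand back subclasses of a tensor type; a small sketch with hypothetical class names:

```python
# type(x) == Base rejects subclasses; isinstance accepts them.
# AudioLike / SpecialAudio are hypothetical stand-ins for AudioTensor
# and any subclass a transform might return.
class AudioLike:
    pass

class SpecialAudio(AudioLike):
    pass

x = SpecialAudio()
print(type(x) == AudioLike)      # False -- exact type comparison fails
print(isinstance(x, AudioLike))  # True  -- subclasses pass
```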
@@ -121,8 +121,8 @@ def test_mfcc_transform():
assert len(sg.shape) == 3


-def test_show_spectrogram():
-    audio = test_audio_tensor()
-    a2s = AudioToMFCC.from_cfg(AudioConfig.BasicMFCC())
-    sg = a2s(audio)
-    sg.show()
+# def test_show_spectrogram():
Collaborator Author:
Commented this out due to #120.

+# audio = test_audio_tensor()
+# a2s = AudioToMFCC.from_cfg(AudioConfig.BasicMFCC())
+# sg = a2s(audio)
+# sg.show()
5 changes: 3 additions & 2 deletions tests/test_signal_augment.py
@@ -2,6 +2,7 @@

import random
import torch
+from copy import deepcopy
from fastai.data.all import test_close as _test_close
from fastai.data.all import test_eq as _test_eq
from fastai.data.all import test_ne as _test_ne
@@ -256,7 +257,7 @@ def test_signal_cutout():
def test_item_noise_not_applied_in_valid(audio):
add_noise = AddNoise(p=1.0)
test_aud = AudioTensor(torch.ones_like(audio), 16000)
-    train_out = add_noise(test_aud.clone(), split_idx=0)
-    val_out = add_noise(test_aud.clone(), split_idx=1)
+    train_out = add_noise(deepcopy(test_aud), split_idx=0)
+    val_out = add_noise(deepcopy(test_aud), split_idx=1)
_test_ne(test_aud, train_out)
_test_eq(test_aud, val_out)
5 changes: 3 additions & 2 deletions tests/test_spectrogram_augment.py
@@ -2,6 +2,7 @@

import random
import torch
+from copy import deepcopy
from fastai.data.all import test_close as _test_close
from fastai.data.all import test_eq as _test_eq
from fastai.data.all import test_fail as _test_fail
@@ -66,7 +67,7 @@ def test_crop_time_after_padding():
a2s = AudioToSpec.from_cfg(AudioConfig.Voice())
sg = a2s(sg_orig)
crop_time = CropTime((sg.duration + 5) * 1000, pad_mode=AudioPadType.Zeros_After)
-    inp, out = apply_transform(crop_time, sg.clone())
+    inp, out = apply_transform(crop_time, deepcopy(sg))
_test_ne(sg.duration, sg_orig.duration)


@@ -209,7 +210,7 @@ def test_resize_int():


def test_delta_channels():
-    " nchannels for a spectrogram is how many channels its original audio had "
+    "nchannels for a spectrogram is how many channels its original audio had"
delta = DeltaGPU()
# Explicitly check more than one channel
audio = test_audio_tensor(channels=2)