
AudioSpectrogram failed to be plotted under ClassificationInterpretation #95

Closed
riven314 opened this issue May 16, 2021 · 9 comments · Fixed by #97
Labels: bug (Something isn't working), released


riven314 commented May 16, 2021

I have made a public Kaggle kernel to reproduce the bug for your reference (see section 3, "Failed to Show AudioSpectrogram", for the main point):
https://www.kaggle.com/alexlwh/failed-to-show-audiospec-tensor

Essentially, I tried to show the predictions vs. the ground truth using ClassificationInterpretation as follows:

# imports needed by this snippet (in the kernel these come from the fastai star imports)
from fastcore.all import L, delegates, is_listy, patch
from fastai.data.core import TfmdDL
from fastai.interpret import ClassificationInterpretation, Interpretation

interp = ClassificationInterpretation.from_learner(learner)

@patch
def __getitem__(self: Interpretation, idxs):
    "Index into the stored inputs/preds/targs/decoded/losses at `idxs`"
    if not is_listy(idxs):
        idxs = [idxs]
    attrs = 'inputs,preds,targs,decoded,losses'
    return L([getattr(self, attr)[idxs] for attr in attrs.split(',')])

@patch
@delegates(TfmdDL.show_results)
def show_at(self: Interpretation, idxs, **kwargs):
    "Show inputs, decoded predictions and targets at `idxs`"
    inp, _, targ, dec, _ = self[idxs]
    self.dl.show_results((inp, dec), targ, **kwargs)

interp.show_at(0)

And then I got the following error:

/opt/conda/lib/python3.7/site-packages/fastaudio/core/spectrogram.py in show(self, ctx, ax, title, **kwargs)
     75     def show(self, ctx=None, ax=None, title="", **kwargs):
     76         "Show spectrogram using librosa"
---> 77         return show_spectrogram(self, ctx=ctx, ax=ax, title=title, **kwargs)
     78 
     79 

/opt/conda/lib/python3.7/site-packages/fastaudio/core/spectrogram.py in show_spectrogram(sg, title, ax, ctx, **kwargs)
     87         ia = ax.inset_axes((i / sg.nchannels, 0.2, 1 / sg.nchannels, 0.7))
     88         z = specshow(
---> 89             channel.cpu().numpy(), ax=ia, **sg._all_show_args(show_y=i == 0), **kwargs
     90         )
     91         ia.set_title(f"Channel {i}")

/opt/conda/lib/python3.7/site-packages/librosa/display.py in specshow(data, x_coords, y_coords, x_axis, y_axis, sr, hop_length, fmin, fmax, tuning, bins_per_octave, key, Sa, mela, thaat, ax, **kwargs)
    843     # Get the x and y coordinates
    844     y_coords = __mesh_coords(y_axis, y_coords, data.shape[0], **all_params)
--> 845     x_coords = __mesh_coords(x_axis, x_coords, data.shape[1], **all_params)
    846 
    847     axes = __check_axes(ax)

/opt/conda/lib/python3.7/site-packages/librosa/display.py in __mesh_coords(ax_type, coords, n, **kwargs)
    915     if ax_type not in coord_map:
    916         raise ParameterError("Unknown axis type: {}".format(ax_type))
--> 917     return coord_map[ax_type](n, **kwargs)
    918 
    919 

/opt/conda/lib/python3.7/site-packages/librosa/display.py in __coord_time(n, sr, hop_length, **_kwargs)
   1194 def __coord_time(n, sr=22050, hop_length=512, **_kwargs):
   1195     """Get time coordinates from frames"""
-> 1196     return core.frames_to_time(np.arange(n + 1), sr=sr, hop_length=hop_length)

/opt/conda/lib/python3.7/site-packages/librosa/core/convert.py in frames_to_time(frames, sr, hop_length, n_fft)
    186     samples = frames_to_samples(frames, hop_length=hop_length, n_fft=n_fft)
    187 
--> 188     return samples_to_time(samples, sr=sr)
    189 
    190 

/opt/conda/lib/python3.7/site-packages/librosa/core/convert.py in samples_to_time(samples, sr)
    304     """
    305 
--> 306     return np.asanyarray(samples) / float(sr)
    307 
    308 

TypeError: float() argument must be a string or a number, not 'NoneType'

I suspect that at some point AudioSpectrogram.sr fails to be propagated to ClassificationInterpretation, leaving interp.inputs.sr = None. (Not sure exactly at which point the issue happens yet):

interp.inputs._all_show_args()
>>
{'x_coords': None,
 'y_coords': None,
 'x_axis': 'time',
 'y_axis': 'mel',
 'sr': None,
 'hop_length': 1024,
 'fmin': None,
 'fmax': None,
 'tuning': 0.0,
 'bins_per_octave': 12,
 'key': 'C:maj',
 'Sa': None,
 'mela': None,
 'thaat': None,
 'cmap': 'viridis'}
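
For reference, a quick way to confirm where sr goes missing is to compare a batch pulled straight from the dataloaders against the tensors gathered by the interpretation object (a sketch, assuming the learner and interp defined above; variable names are illustrative):

xb, yb = learner.dls.one_batch()
print(xb.sr)             # the expected sample rate is still attached here
print(interp.inputs.sr)  # None: the metadata has been lost by this point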
riven314 changed the title from "AudioSpectrogram failed to be shown under ClassificationInterpretation" to "AudioSpectrogram failed to be plotted under ClassificationInterpretation" on May 16, 2021
scart97 added the bug (Something isn't working) label on May 17, 2021

scart97 commented May 17, 2021

I never tried to run ClassificationInterpretation on audio models, so it's not surprising that it's broken. Why did you have to @patch those two methods?

riven314 commented

@scart97
Just to try out a patch trick from Zach's tutorial: https://walkwithfastai.com/basics.interp
The bug persists even if I don't use the patch trick, e.g.:

# same failure without the patches: get predictions and show them directly
inputs, preds, targs, decoded, losses = learner.get_preds(
    dl=None, with_input=True,
    with_loss=True, with_decoded=True, act=None
)
learner.dls[1].show_results((inputs, decoded), targs)


scart97 commented May 18, 2021

Didn't find the root cause yet, but I could isolate the problem better. The issue is not directly related to ClassificationInterpretation: the input tensor doesn't have the correct sr attribute from the beginning when iterating over the data. To make it clearer, if you grab a batch with dls.one_batch() it has all the properties as expected, but iterating over the DataLoader causes problems:

[Screenshot from 2021-05-18 17-19-59: dls.one_batch() vs. iterating over the DataLoader]
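
In code, the comparison in the screenshot is roughly the following (a sketch, assuming the dls from the kernel):

xb, _ = dls.one_batch()
print(type(xb), xb.sr)         # AudioSpectrogram, sr intact

xb, _ = next(iter(dls.train))  # plain iteration over the DataLoader
print(type(xb), xb.sr)         # same type, but sr comes back as None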

I guess the problem needs to be fixed in fastai, but I don't have enough knowledge about the dataloader internals yet to properly understand what is happening here.


riven314 commented May 19, 2021

@scart97
Thanks for narrowing down the scope. I found something strange on top of that:
AudioSpectrogram manages to inherit sr when the iteration has NO multiprocessing, so I think something fishy happens when the DataLoader uses multiprocessing. For better visibility, I have updated my Kaggle kernel to reproduce the bug:
https://www.kaggle.com/alexlwh/failed-to-show-audiospec-tensor

e.g.
[Screenshot from 2021-05-19 23-06-07: iteration with vs. without multiprocessing]
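
A sketch of that check (assuming the kernel's dls; DataLoader.new is fastai's method for cloning a loader with overridden attributes):

# single-process iteration: the batch never crosses a process boundary
xb, _ = next(iter(dls.train.new(num_workers=0)))
print(xb.sr)   # sr preserved

# multiprocess iteration: batches are pickled between worker and main process
xb, _ = next(iter(dls.train.new(num_workers=2)))
print(xb.sr)   # None: metadata dropped in transit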


kevinbird15 commented May 22, 2021

After digging into this with Arto from the fastai Discord today, Arto submitted a PR that fixes TensorBase: fastai/fastai#3383

I believe we can do something similar as a temporary fix, but once we move to pytorch 1.8.1 we should be able to rely on Tensor.__reduce_ex__ directly, since the new version handles object.__dict__ properly (which ends up being the issue in both pytorch 1.7.1 and fastai, for slightly different reasons). Should I create a temporary workaround, similar to what Arto did in fastai, against the current version of fastaudio that uses fastai 2.2.7, or do we want to raise the version requirement to pytorch 1.8.1 instead (which would also fix the issue)?
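
For what it's worth, the failure can be reproduced without any dataloader at all, since the root cause is serialization of the tensor subclass (a minimal sketch against fastai 2.2.7 / pytorch 1.7.1, i.e. before the fix):

import pickle

import torch
from fastai.torch_core import TensorBase

t = TensorBase(torch.randn(2))
t.sr = 16000                           # attach metadata the way AudioTensor does
t2 = pickle.loads(pickle.dumps(t))     # the round trip DataLoader workers perform
print(getattr(t2, "sr", "<missing>"))  # "<missing>" before the fix: __dict__ is dropped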


riven314 commented May 22, 2021

I have tried out the fix, and it resolves the issue I raised here (i.e. Interpretation.show_at failing to work for AudioTensor).
Below is my updated public kernel with the fix:
https://www.kaggle.com/alexlwh/failed-to-show-audiospec-tensor


kevinbird15 commented May 22, 2021

So, adding that code back here, we would need to add something like this somewhere:

# fix to preserve metadata for subclass tensors in serialization
# src: https://github.com/fastai/fastai/pull/3383
import torch
from collections import OrderedDict
from fastcore.basics import patch
from fastai.torch_core import TensorBase, _fa_rebuild_qtensor, _fa_rebuild_tensor

def _rebuild_from_type(func, type, args, dict):
    # rebuild the plain tensor, cast back to the subclass, then restore __dict__
    ret = func(*args).as_subclass(type)
    ret.__dict__ = dict
    return ret

@patch
def __reduce_ex__(self: TensorBase, proto):
    torch.utils.hooks.warn_if_has_hooks(self)
    args = (type(self), self.storage(), self.storage_offset(), tuple(self.size()), self.stride())
    if self.is_quantized: args = args + (self.q_scale(), self.q_zero_point())
    args = args + (self.requires_grad, OrderedDict())
    f = _fa_rebuild_qtensor if self.is_quantized else _fa_rebuild_tensor
    # pass __dict__ along so subclass metadata (e.g. sr) survives pickling
    return (_rebuild_from_type, (f, type(self), args, self.__dict__))

Then we should be able to keep the same fastai 2.2.7 requirement and stick with all the same dependencies, right?

Where should these chunks of code be added in fastaudio? Maybe instead of patching TensorBase, this __reduce_ex__ should be added to AudioTensor? That would be my proposal at least.
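
Either way, a quick sanity check after applying the patch would be the same pickle round trip as above (sketch):

import pickle

import torch
from fastai.torch_core import TensorBase

# with the __reduce_ex__ patch above applied, metadata should now survive
t = TensorBase(torch.randn(2))
t.sr = 16000
assert pickle.loads(pickle.dumps(t)).sr == 16000  # __dict__ round-trips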


scart97 commented May 22, 2021

The fix needs to be applied at TensorBase, as it affects both the audio and the spectrogram tensors. The plan is to keep this patch in fastaudio until fastai ships a version that fixes it upstream; then we bump the minimal fastai version required by the library.
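
For context, both fastaudio tensor types subclass TensorBase, so a single patch at the base covers them (sketch; import paths assumed):

from fastai.torch_core import TensorBase
from fastaudio.core.signal import AudioTensor
from fastaudio.core.spectrogram import AudioSpectrogram

# one patch on TensorBase fixes serialization for both subclasses
assert issubclass(AudioTensor, TensorBase)
assert issubclass(AudioSpectrogram, TensorBase)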


scart97 commented May 22, 2021

🎉 This issue has been resolved in version 0.1.6 🎉

The release is available on the GitHub release page.

Your semantic-release bot 📦🚀
