Self-Attention for Audio Super-resolution

Keras implementation of the MLSP 2021 paper "Self-Attention for Audio Super-Resolution".

AFILM

Dependencies

The model is implemented in Python 3.7.12. Additional required packages:

  • tensorflow==2.6.0
  • keras==2.6.0
  • h5py==3.1.0
  • numpy==1.19.5
  • scipy==1.4.1
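
The pinned versions above can be installed with pip, for example:

    pip install tensorflow==2.6.0 keras==2.6.0 h5py==3.1.0 numpy==1.19.5 scipy==1.4.1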

Data

Data preparation is the same as for TFiLM. The main dataset is VCTK speech. The model expects .h5 archives containing pairs of high- and low-resolution sound patches.
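
To sanity-check an archive before training, here is a minimal sketch using h5py. The file name is a placeholder, and the dataset keys 'data' (low-resolution input) and 'label' (high-resolution target) are assumptions based on the TFiLM-style preparation scripts; adjust them if your archive differs:

    import h5py

    # List the datasets in the archive, then load the paired patches.
    # Key names are assumptions from the TFiLM-style preparation scripts.
    with h5py.File("vctk_train.h5", "r") as f:
        for name, dset in f.items():
            print(name, dset.shape, dset.dtype)
        X = f["data"][:]   # low-resolution input patches
        Y = f["label"][:]  # high-resolution target patches
    print("low-res:", X.shape, "high-res:", Y.shape)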

Pre-trained models

The shared models were trained in a similar way to previous audio super-resolution models, for fair comparison. You can obtain better results by training them longer. Note that the models do not generalize to other domains: if your recordings are very different from the training dataset, you should collect additional data.

Scale   Dataset      Model
2       VCTK Multi   Download
4       VCTK Multi   Download
8       VCTK Multi   Download
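
If you want to inspect a downloaded checkpoint directly rather than through codes/test.py, here is a minimal sketch assuming a standard Keras-saved model. The file name is a placeholder, and if the checkpoint contains custom attention/FiLM layers you may need to supply them via custom_objects:

    import tensorflow as tf

    # Load a downloaded checkpoint without compiling the training graph.
    # The path is a placeholder; add custom_objects={...} if loading
    # fails on custom layers.
    model = tf.keras.models.load_model("afilm_r4.h5", compile=False)
    model.summary()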

Training

Run the codes/train.py script.

usage: train.py [-h] [--model {afilm,tfilm}] --train TRAIN --val VAL
                [-e EPOCHS] [--batch-size BATCH_SIZE] [--logname LOGNAME]
                [--layers LAYERS] [--alg ALG] [--lr LR]
                [--save_path SAVE_PATH] [--r R] [--pool_size POOL_SIZE]
                [--strides STRIDES]

optional arguments:
  -h, --help            show this help message and exit
  --model {afilm,tfilm}
                        model to train
  --train TRAIN         path to h5 archive of training patches
  --val VAL             path to h5 archive of validation set patches
  -e EPOCHS, --epochs EPOCHS
                        number of epochs to train
  --batch-size BATCH_SIZE
                        training batch size
  --logname LOGNAME     folder where logs will be stored
  --layers LAYERS       number of layers in each of the D and U halves of the
                        network
  --alg ALG             optimization algorithm
  --lr LR               learning rate
  --save_path SAVE_PATH
                        path to save the model
  --r R                 upscaling factor
  --pool_size POOL_SIZE
                        size of pooling window
  --strides STRIDES     pooling stride
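
For example, to train the AFILM model at 4x upscaling (file names and hyperparameter values below are placeholders; the script's defaults apply to any flag you omit):

    python codes/train.py --model afilm --train vctk_train.h5 --val vctk_val.h5 \
        -e 50 --batch-size 16 --lr 3e-4 --r 4 --logname afilm_r4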

Testing

Run the codes/test.py script.

usage: test.py [-h] --pretrained_model PRETRAINED_MODEL
               [--out-label OUT_LABEL] [--wav-file-list WAV_FILE_LIST]
               [--layers LAYERS] [--r R] [--sr SR] [--patch_size PATCH_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  --pretrained_model PRETRAINED_MODEL
                        path to pre-trained model
  --out-label OUT_LABEL
                        append label to output samples
  --wav-file-list WAV_FILE_LIST
                        list of audio files for evaluation
  --layers LAYERS       number of layers in each of the D and U halves of the
                        network
  --r R                 upscaling factor
  --sr SR               high-res sampling rate
  --patch_size PATCH_SIZE
                        Size of patches over which the model operates
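
For example, to run a trained 4x model over a list of recordings (file names are placeholders; 16000 Hz is the high-resolution rate commonly used for VCTK in this line of work, but it should match your setup):

    python codes/test.py --pretrained_model afilm_r4.h5 --out-label afilm \
        --wav-file-list test_files.txt --r 4 --sr 16000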

Citation

@INPROCEEDINGS{9596082,
author={Rakotonirina, Nathanaël Carraz},
booktitle={2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP)}, 
title={Self-Attention for Audio Super-Resolution}, 
year={2021},
volume={},
number={},
pages={1-6},
keywords={Training;Recurrent neural networks;Convolution;Superresolution;Modulation;Machine learning;Network architecture;audio super-resolution;bandwidth extension;self-attention},
doi={10.1109/MLSP52302.2021.9596082}}
