This forked repository adds support for modelling audio textures. Please see the original official README from NVIDIA for details on licenses and citations.
Note: This version of StyleGAN2 is not compatible with PyTorch > 1.8. I used PyTorch 1.7 for my experiments.
Links to the datasets used in my experiments:
- TokWotel (Wood and Metal hits separated) - https://drive.google.com/file/d/1xjU868UgJBwnkrFEXlJg1S5u-SUNK6ag/view?usp=sharing
- A pre-processed subset of the Greatest Hits dataset - https://drive.google.com/file/d/1U3QRj3GQTlCcLj4BriSaWd3JYIP5sE4W/view?usp=sharing (original here)
Please use the notebook called pghi-test.ipynb to visualise the spectrogram representations.
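For a quick sanity check outside the notebook, the sketch below plots a plain STFT log-magnitude spectrogram of one clip with librosa. This is not the PGHI representation that pghi-test.ipynb produces, and the file path and STFT parameters are placeholders.

```python
import librosa
import matplotlib.pyplot as plt
import numpy as np

# Plain STFT log-magnitude view of one training clip (not the PGHI pipeline).
# File path and n_fft/hop_length are placeholders.
y, sr = librosa.load("datasets/tokwotel/example.wav", sr=None)
S = np.abs(librosa.stft(y, n_fft=512, hop_length=128))
S_db = np.clip(librosa.amplitude_to_db(S, ref=np.max), -50, 0)  # clip to the [-50, 0] dB range

plt.figure(figsize=(6, 4))
plt.imshow(S_db, origin="lower", aspect="auto", cmap="magma")
plt.colorbar(label="dB")
plt.xlabel("Frames")
plt.ylabel("Frequency bins")
plt.show()
```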
To train new networks, use the commands below. Note that the dataset directory should contain the '*.wav' files directly, with no sub-directory structure. All my experiments used unconditional training; for conditional training you will also need a dataset.json file, as explained in the original NVIDIA README (see the sketch after the training commands below).
The flag --aug=noaug is important: the augmentations (rotation, etc.) used in the computer-vision domain do not work for learning audio spectrograms.
```
python train.py --outdir=training-runs --data=datasets/tokwotel --gpus=1 --aug=noaug --dry-run
python train.py --outdir=training-runs --data=datasets/tokwotel --gpus=1 --aug=noaug

python train.py --outdir=training-runs --data=datasets/vis-data-256-split --gpus=1 --aug=noaug --dry-run
python train.py --outdir=training-runs --data=datasets/vis-data-256-split --gpus=1 --aug=noaug
```
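As noted above, conditional training additionally needs a dataset.json. The sketch below builds one in the {"labels": [[filename, label], ...]} layout used by the original StyleGAN2-ADA code; the wood/metal class-assignment rule is purely illustrative and not part of this repository.

```python
import json
from pathlib import Path

# Hypothetical example: map each .wav file to an integer class label in the
# {"labels": [[filename, label], ...]} format from the original StyleGAN2-ADA
# README. The rule below (0 = wood, 1 = metal) is only an illustration.
dataset_dir = Path("datasets/tokwotel")
labels = []
for wav in sorted(dataset_dir.glob("*.wav")):
    label = 0 if "wood" in wav.stem.lower() else 1  # illustrative rule only
    labels.append([wav.name, label])

with open(dataset_dir / "dataset.json", "w") as f:
    json.dump({"labels": labels}, f)
```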
We use the PGHI (Phase Gradient Heap Integration) method to generate spectrograms. StyleGAN architectures for audio learn spectrogram representations as images, so the log-magnitude spectrograms need to be rescaled from [-50, 0] to [0, 255]. For this, please use the notebook generate-rescaled-final.ipynb.
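The rescaling itself is a simple linear map. A minimal sketch is shown below, assuming `log_spec` is a 2-D NumPy array of log magnitudes already clipped to [-50, 0] (random placeholder data is used here); the notebook performs the equivalent step as part of the full pipeline.

```python
import numpy as np
from PIL import Image

# Map a [-50, 0] log-magnitude spectrogram to an 8-bit [0, 255] image.
log_spec = np.random.uniform(-50, 0, size=(256, 256))  # placeholder data

img = (log_spec + 50.0) / 50.0 * 255.0   # [-50, 0] -> [0, 255]
img = img.astype(np.uint8)
Image.fromarray(img).save("spectrogram.png")  # saved as an 8-bit grayscale image
```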