<a href="https://colab.research.google.com/github/michele-perrone/SpectrogramPlayer/blob/main/Source/Dataset_Importer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Dataset Importer
This notebook downloads the datasets and stores them in separate folders on Google Drive, so that other notebooks can read them from there instead of downloading them again each time.

In [None]:
# Mount Google Drive

from google.colab import drive
drive.mount('/content/gdrive')
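
Once Drive is mounted, the caching idea from the introduction can be sketched as a small check that runs before any download. `dataset_is_cached` and `DRIVE_ROOT` are hypothetical names, not part of the original notebook, and the check assumes that a non-empty folder means the dataset was imported completely.

```python
from pathlib import Path

# Assumed Drive location, matching the folder names used later in this notebook.
DRIVE_ROOT = "/content/gdrive/MyDrive/SpectrogramPlayer"


def dataset_is_cached(dataset_dir):
    """Return True if dataset_dir exists and contains at least one entry.

    Hypothetical helper: it does not verify that the dataset is complete.
    """
    path = Path(dataset_dir)
    return path.is_dir() and any(path.iterdir())


if dataset_is_cached(f"{DRIVE_ROOT}/LJSPEECH11"):
    print("LJSpeech already on Drive, skipping download")
else:
    print("LJSpeech not found on Drive, download needed")
```

A stricter check could compare the number of files against an expected count, at the cost of hard-coding that count per dataset.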

In [None]:
# For Kaggle datasets: configure the Kaggle environment

import os
os.environ['KAGGLE_CONFIG_DIR'] = '/content/gdrive/MyDrive/SpectrogramPlayer/'
!cd '/content/gdrive/MyDrive/SpectrogramPlayer/' && chmod 600 kaggle.json
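
The same configuration can be done in pure Python, which makes a missing API token fail early with a clear message. This is a minimal sketch: `configure_kaggle` is a hypothetical helper, and it assumes `kaggle.json` has already been uploaded to the given folder.

```python
import os
import stat
from pathlib import Path


def configure_kaggle(config_dir):
    """Point the Kaggle CLI at config_dir and restrict kaggle.json to its owner.

    Hypothetical helper, not part of the original notebook.
    """
    token = Path(config_dir) / "kaggle.json"
    if not token.is_file():
        raise FileNotFoundError(f"Expected a Kaggle API token at {token}")
    # Owner read/write only, equivalent to `chmod 600 kaggle.json`.
    token.chmod(stat.S_IRUSR | stat.S_IWUSR)
    os.environ["KAGGLE_CONFIG_DIR"] = str(config_dir)
    return token
```

In this notebook it would be called as `configure_kaggle('/content/gdrive/MyDrive/SpectrogramPlayer/')`.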

## LJSpeech dataset
Dataset containing short speech clips of a single speaker reading book passages

In [None]:
!kaggle datasets download -d rahulbhalley/ljspeech11

Downloading ljspeech11.zip to /content
 99% 2.97G/2.99G [00:21<00:00, 162MB/s]
100% 2.99G/2.99G [00:21<00:00, 150MB/s]


In [None]:
!unzip \*.zip -d '/content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11' && rm *.zip

Streaming output truncated to the last 5000 lines.
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0111.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0112.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0113.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0114.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0115.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0116.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0117.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0118.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11/LJSpeech-1.1/wavs/LJ030-0119.wav  
  inflating: /content/gdri
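
The shell `unzip` prints one line per extracted file, which is why Colab truncates the output above. A quieter alternative, sketched here with Python's standard `zipfile` module, extracts the archive and reports only the number of members. `extract_archive` is a hypothetical helper; the original notebook uses the shell command.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, target_dir):
    """Extract zip_path into target_dir and return the number of archive members.

    Hypothetical helper, shown as an alternative to the shell `unzip` call.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return len(archive.namelist())
```

For example, `extract_archive('ljspeech11.zip', '/content/gdrive/MyDrive/SpectrogramPlayer/LJSPEECH11')` would mirror the cell above. Passing `-q` to `unzip` is another way to silence the per-file log.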

## GTZAN dataset
Dataset containing audio excerpts of different musical genres

In [None]:
!kaggle datasets download -d andradaolteanu/gtzan-dataset-music-genre-classification

Downloading gtzan-dataset-music-genre-classification.zip to /content
 99% 1.20G/1.21G [00:06<00:00, 161MB/s]
100% 1.21G/1.21G [00:07<00:00, 186MB/s]


In [None]:
!unzip \*.zip -d '/content/gdrive/MyDrive/SpectrogramPlayer/GTZAN' && rm *.zip

Archive:  gtzan-dataset-music-genre-classification.zip
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/features_30_sec.csv  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/features_3_sec.csv  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00000.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00001.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00002.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00003.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00004.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00005.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/blues.00006.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/Data/genres_original/blues/
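
The logged paths above point under `Data/` while the command targets `GTZAN`, so it may be worth checking where the files actually ended up. One way to do that, sketched below, is to count files by extension under a folder. `count_by_extension` is a hypothetical helper and the folder to inspect is an assumption.

```python
from collections import Counter
from pathlib import Path


def count_by_extension(root):
    """Return a Counter mapping lowercase file suffix to file count under root.

    Hypothetical helper for sanity-checking an extracted dataset.
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower()] += 1
    return counts
```

Running `count_by_extension('/content/gdrive/MyDrive/SpectrogramPlayer/GTZAN')` and comparing the `.wav` count with the one for `.../SpectrogramPlayer/Data` would show which folder holds the audio.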

## UrbanSound8K dataset
Dataset containing short labeled excerpts of urban sounds

In [None]:
!kaggle datasets download -d chrisfilo/urbansound8k

In [None]:
!unzip \*.zip -d '/content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K' && rm *.zip

Streaming output truncated to the last 5000 lines.
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/151877-5-1-0.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-0.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-1.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-10.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-11.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-12.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-13.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-14.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-15.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/URBAN8K/fold4/154758-5-0-16.wav  
  inflating: /content/gdrive/MyDrive/Spectrogram

## TUT Urban Acoustic Scenes dataset
Dataset containing recordings of urban acoustic scenes (for example airports) from different cities

In [None]:
base_url = "https://zenodo.org/record/1228142/files/TUT-urban-acoustic-scenes-2018-development.audio."
extension = ".zip"

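# The audio is split into 13 zip parts, numbered 1 to 13 (the range end is exclusive)
for i in range(1, 14):
  url_string = base_url + str(i) + extension
  # -q silences wget; the original `>0` redirected stdout into a file named "0"
  !wget -q $url_string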
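
The download loop can also be written without shell magics, which makes it easy to skip parts that are already present after an interrupted run. This is a sketch using only the standard library. `tut_part_urls` and `download_parts` are hypothetical helpers, and the part range 1 to 13 is taken from the loop above.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://zenodo.org/record/1228142/files/TUT-urban-acoustic-scenes-2018-development.audio."
EXTENSION = ".zip"


def tut_part_urls(first=1, last=13):
    """Build the URLs of the archive parts, numbered first..last inclusive."""
    return [f"{BASE_URL}{i}{EXTENSION}" for i in range(first, last + 1)]


def download_parts(urls, dest_dir="."):
    """Download each URL into dest_dir, skipping files that already exist."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():  # assumes an existing file is a complete download
            urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```

Calling `download_parts(tut_part_urls(), '/content')` would fetch all parts into the Colab working directory, ready for the `unzip` cell that follows.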

In [None]:
!unzip \*.zip -d '/content/gdrive/MyDrive/SpectrogramPlayer/TUT' && rm *.zip

Archive:  TUT-urban-acoustic-scenes-2018-development.audio.1.zip
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/audio/airport-barcelona-0-0-a.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/audio/airport-barcelona-0-1-a.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/audio/airport-barcelona-0-10-a.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/audio/airport-barcelona-0-11-a.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/audio/airport-barcelona-0-12-a.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/audio/airport-barcelona-0-13-a.wav  
  inflating: /content/gdrive/MyDrive/SpectrogramPlayer/TUT/TUT-urban-acoustic-scenes-2018-development/a

## MAESTRO dataset
Dataset containing annotated piano performances