# Known-Artist Live Song ID: A Hashprint Approach

This notebook is a Python implementation of the [original MATLAB version](http://pages.hmc.edu/ttsai/assets/livesongid.tar.gz). The pipeline consists of the following modules:

1. Preprocessing audio with the Constant-Q Transform (CQT)
2. Learning PCA filters
3. Obtaining the hashprint representation by applying the PCA filters to the preprocessed CQT matrix
4. Generating a database from the hashprints of pitch-shifted CQT matrices of the reference tracks
5. Matching a query against the database

In [1]:
GIT_DIR = '/home/zhwang/workspace/live-song-id/'
%cd $GIT_DIR

/home/zhwang/workspace/live-song-id


## Data Preprocessing

For a given artist, we first preprocess the list of reference tracks by computing the CQT of each audio file.

In [2]:
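from preprocess import *
import os
# Imported explicitly so the cells below do not rely on the star import
import numpy as np
import librosa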

In [3]:
# audio_path - where audio files are stored
# list_dir - where list of softlinks are stored
# cqt_dir - where results will be stored
audio_path = '/home/nbanerjee/SoftLinks/'
list_dir = os.path.join(audio_path, 'Lists', '')  # keeps the trailing slash, avoids '//'
cqt_dir = '/home/zhwang/ttemp/livesong_results/' # change this
artist = 'taylorswift'
out_dir = os.path.join(cqt_dir, artist+'_out')
file_paths = get_allpaths(artist, list_dir)

%cd $cqt_dir
%mkdir $out_dir
%cd $out_dir

/data1/zhwang/livesong_results
mkdir: cannot create directory '/home/zhwang/ttemp/livesong_results/taylorswift_out': File exists
/data1/zhwang/livesong_results/taylorswift_out


In [None]:
# Compute and save the CQT of every reference track, recording each output path
with open(artist + '_cqtList.txt', 'w') as f:
    for cur_file in file_paths:
        print('==> Taking CQT of %s' % cur_file)
        y, sr = librosa.load(audio_path + cur_file + '.wav')
        # 121 bins at 24 bins per octave, starting from C3 (130.81 Hz)
        Q = librosa.cqt(y, sr=sr, fmin=130.81, n_bins=121, bins_per_octave=24)
        logQ = preprocess(Q, 3)
        cur_file_path = os.path.join(os.getcwd(), cur_file + '.dat')
        np.savetxt(cur_file_path, logQ)
        f.write(cur_file_path + '\n')


## Learning PCA Filters

In [4]:
%cd $GIT_DIR
from PCA import *

/home/zhwang/workspace/live-song-id


Next, we learn the filters for the hashprint representation by running PCA over the features of all reference songs for this artist. First, we accumulate one covariance matrix per song, computed from the features returned by `getTDE`:

In [None]:
f = open(os.path.join(out_dir, artist + '_cqtList.txt'), 'r')
num_features = 64
nbins = 121
m = 20 # number of frames

In [8]:
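accum_cov = np.zeros((nbins * m, nbins * m))
count = 0

for line in f:
    path = line.strip()  # drop the trailing newline
    print('==> Computing covariance matrix for %s' % os.path.basename(path))
    Q = np.loadtxt(path)
    A = getTDE(Q)
    # np.cov(A.T) treats the rows of A as observations, so at least two rows are needed
    if A.shape[0] > 1:
        accum_cov += np.cov(A.T)
        count += 1  # count only the songs that contributed to the sum
f.close()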

==> Computing covariance matrix for taylorswift_ref1.dat
==> Computing covariance matrix for taylorswift_ref2.dat
==> Computing covariance matrix for taylorswift_ref3.dat
==> Computing covariance matrix for taylorswift_ref4.dat
==> Computing covariance matrix for taylorswift_ref5.dat
==> Computing covariance matrix for taylorswift_ref6.dat
==> Computing covariance matrix for taylorswift_ref7.dat
==> Computing covariance matrix for taylorswift_ref8.dat
==> Computing covariance matrix for taylorswift_ref9.dat
==> Computing covariance matrix for taylorswift_ref10.dat
==> Computing covariance matrix for taylorswift_ref11.dat
==> Computing covariance matrix for taylorswift_ref12.dat
==> Computing covariance matrix for taylorswift_ref13.dat
==> Computing covariance matrix for taylorswift_ref14.dat
==> Computing covariance matrix for taylorswift_ref15.dat
==> Computing covariance matrix for taylorswift_ref16.dat
==> Computing covariance matrix for taylorswift_ref17.dat

==> Co
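
The covariance step above depends on `getTDE`, whose implementation is not shown in this notebook. As a rough illustration only, the sketch below assumes it performs a time-delay embedding: each output row stacks `m` consecutive CQT frames into one vector of length `nbins * m`, which is consistent with the shape of `accum_cov`. The function name `time_delay_embed` is hypothetical, and the real `getTDE` may order or subsample frames differently.

```python
import numpy as np

def time_delay_embed(Q, m=20):
    """Hypothetical stand-in for getTDE (an assumption, not the real code).

    Q has shape (nbins, n_frames). Each output row is the flattened
    (nbins, m) patch starting at one frame, so the result has shape
    (n_frames - m + 1, nbins * m).
    """
    nbins, n_frames = Q.shape
    if n_frames < m:
        # Not enough frames for even one patch
        return np.empty((0, nbins * m))
    return np.stack([Q[:, i:i + m].ravel() for i in range(n_frames - m + 1)])

# Toy input: 3 frequency bins, 4 frames, patches of 2 frames
Q_demo = np.arange(12, dtype=float).reshape(3, 4)
A_demo = time_delay_embed(Q_demo, m=2)
print(A_demo.shape)  # (3, 6)
```

With rows as observations, `np.cov(A.T)` yields an `(nbins * m, nbins * m)` matrix, which is why a song needs at least two rows to contribute a meaningful covariance.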

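Next, we obtain the PCA filters by eigendecomposing the averaged covariance matrix and keeping the `num_features` eigenvectors with the largest eigenvalues.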

In [14]:
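# The averaged covariance is symmetric, so eigh applies and returns real eigenvalues
evals, evecs = LA.eigh(accum_cov / count)
ind = np.argsort(-evals)  # indices of eigenvalues, largest first
evecs = (evecs[:, ind])[:, :num_features]
np.savetxt(os.path.join(out_dir, artist + '_model.dat'), evecs)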
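
To try the filter-learning step without the reference audio, here is a minimal sketch on synthetic data. The toy dimensions, the random "songs", and the variable names are all assumptions made for illustration; only the overall recipe (average the per-song covariances, then keep the leading eigenvectors) follows the cells above. It uses `np.linalg.eigh`, which is intended for symmetric matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k = 6, 2  # toy feature dimension and number of filters to keep

# Two toy "songs", each a (num_frames, dim) feature matrix
songs = [rng.standard_normal((50, dim)), rng.standard_normal((80, dim))]

# Average the per-song covariance matrices (columns are the variables)
accum = np.zeros((dim, dim))
for A in songs:
    accum += np.cov(A, rowvar=False)
avg_cov = accum / len(songs)

# eigh returns real eigenvalues in ascending order for symmetric input
evals, evecs = np.linalg.eigh(avg_cov)
order = np.argsort(evals)[::-1]   # largest eigenvalue first
filters = evecs[:, order[:k]]     # shape (dim, k), one filter per column

print(filters.shape)  # (6, 2)
```

The columns of `filters` are orthonormal, so projecting features onto them gives decorrelated components ordered by explained variance.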

## Applying Filters to CQT Matrix
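
This step corresponds to item 3 of the pipeline listed in the introduction. The notebook does not include its code, so the following is only a sketch under stated assumptions: features are projected onto the learned filters, and each projection is binarized by checking whether it increased relative to an earlier frame (a delta-and-threshold scheme). The function name `hashprint_bits` and the `lag` parameter are hypothetical, and the original implementation may binarize differently.

```python
import numpy as np

def hashprint_bits(A, filters, lag=1):
    """Sketch of a binary fingerprint (assumed scheme, not the original code).

    A:       (n_frames, dim) feature matrix, one row per frame
    filters: (dim, k) projection matrix, one filter per column
    lag:     positive frame offset used for the delta
    Returns a boolean array of shape (n_frames - lag, k).
    """
    proj = A @ filters                # filter responses per frame
    delta = proj[lag:] - proj[:-lag]  # change in each response over `lag` frames
    return delta > 0                  # one bit per filter per frame

# Toy example with identity filters, so the projections equal the inputs
A_toy = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [0.0, 3.0]])
bits_toy = hashprint_bits(A_toy, np.eye(2), lag=1)
print(bits_toy.astype(int).tolist())  # [[1, 0], [0, 1], [0, 1]]
```

With `num_features = 64` filters, a scheme like this would give 64 bits per frame, which fits in a single 64-bit integer.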

## Generating Database
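
The introduction describes the database as built from pitch-shifted CQT matrices. Because the CQT above uses 24 bins per octave, one semitone spans 2 bins, so a pitch shift can be approximated by moving rows of the matrix. The sketch below assumes row 0 is the lowest frequency and fills vacated bins with zeros; the function name `pitch_shift_cqt` and the zero-fill choice are assumptions, not taken from the original code.

```python
import numpy as np

BINS_PER_SEMITONE = 2  # 24 bins per octave / 12 semitones per octave

def pitch_shift_cqt(Q, semitones):
    """Shift a (nbins, n_frames) CQT matrix along the frequency axis.

    Positive values shift content toward higher bins. Bins that have
    no source data after the shift are set to zero.
    """
    shift = semitones * BINS_PER_SEMITONE
    out = np.zeros_like(Q)
    if shift > 0:
        out[shift:] = Q[:-shift]
    elif shift < 0:
        out[:shift] = Q[-shift:]
    else:
        out[:] = Q
    return out

# Toy matrix with 4 bins and 2 frames
Q_toy = np.arange(1, 9).reshape(4, 2)
print(pitch_shift_cqt(Q_toy, 1).tolist())   # [[0, 0], [0, 0], [1, 2], [3, 4]]
print(pitch_shift_cqt(Q_toy, -1).tolist())  # [[5, 6], [7, 8], [0, 0], [0, 0]]
```

Storing fingerprints for several shifts of each reference track lets a query performed in a different key still find a close match.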

## Matching Query Against Database
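
The notebook stops before the matching code, so what follows is only an illustrative sketch, not the original matching scheme. It assumes fingerprints are boolean arrays of shape `(n_frames, k)`, scores a query against a reference by the lowest fraction of differing bits over all alignments (a brute-force Hamming search), and returns the reference with the lowest score. The names `best_alignment_score`, `identify`, and the toy database are hypothetical.

```python
import numpy as np

def best_alignment_score(query_bits, ref_bits):
    """Lowest fraction of mismatched bits over all offsets of the query."""
    n_q, n_r = len(query_bits), len(ref_bits)
    if n_q > n_r:
        return float('inf')  # query longer than reference: no valid alignment
    scores = [np.mean(query_bits != ref_bits[o:o + n_q])
              for o in range(n_r - n_q + 1)]
    return float(min(scores))

def identify(query_bits, database):
    """Return the key of the reference with the lowest mismatch score."""
    return min(database,
               key=lambda name: best_alignment_score(query_bits, database[name]))

# Toy database of two random fingerprints
rng = np.random.default_rng(0)
ref_a = rng.integers(0, 2, size=(100, 16)).astype(bool)
ref_b = rng.integers(0, 2, size=(100, 16)).astype(bool)
database = {'song_a': ref_a, 'song_b': ref_b}

# Query: an excerpt of song_a with a few bits flipped to mimic distortion
query = ref_a[30:50].copy()
query[::4, 0] = ~query[::4, 0]

print(identify(query, database))
```

A brute-force scan like this is quadratic in the fingerprint lengths; it is meant to show the scoring idea, not to be efficient.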