# Reference

**Library**

* [librosa](https://librosa.org/doc/main/feature.html)


**Tech Blog**

* [Section 06: Analog Signals](https://m.blog.naver.com/gkenq/220679236344)
* [How Do We Teach Speech to AI?](https://tech.kakaoenterprise.com/66)
* [Getting Started with Speech Recognition](https://pyy0715.github.io/Audio/)
* [Music Data: spectral_centroid, spectral_rolloff](https://0equal2.tistory.com/144)
* [[Librosa] music/audio processing library Librosa Tutorial - (3) Audio feature extraction](https://bo-10000.tistory.com/entry/Librosa-musicaudio-processing-library-Librosa-%EC%82%AC%EC%9A%A9%EB%B2%95-Tutorial-3-Audio-feature-extraction)
* [Applying Tonnetz Music Theory to Deep Learning](https://inspiringpeople.github.io/data%20analysis/tonnetz-dl/)


**YouTube**

* [DMQA, Introduction to Analysis for Sound Data](https://www.youtube.com/watch?v=1Hhj14QhkaE&t=718s)


**DACON Code Share**

* [CNN ensemble using mel-spectrogram and MFCC + 5-fold / public 0.98](https://dacon.io/en/codeshare/5153)

# Setting

## Library

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

import os
from tqdm.auto import tqdm
import random

In [2]:
import librosa
import librosa.display
import IPython.display as ipd

In [3]:
import warnings
warnings.filterwarnings(action='ignore') 

## Fixed Random Seed

In [4]:
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)

seed_everything(42) # Fix the random seed

# Load Data Set

## Google Drive Mount

In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Unzip File

In [6]:
!unzip -qq '/content/drive/MyDrive/머신러닝 엔지니어링/데이콘/기계 고장 진단/data/기계_고장.zip'

## Load Train / Test Set

In [7]:
df_train = pd.read_csv('./train.csv') # every training sample is a normal sample
df_test = pd.read_csv('./test.csv')

In [8]:
print(df_train.shape)
df_train.head()

(1279, 4)


Unnamed: 0,SAMPLE_ID,SAMPLE_PATH,FAN_TYPE,LABEL
0,TRAIN_0000,./train/TRAIN_0000.wav,2,0
1,TRAIN_0001,./train/TRAIN_0001.wav,0,0
2,TRAIN_0002,./train/TRAIN_0002.wav,0,0
3,TRAIN_0003,./train/TRAIN_0003.wav,2,0
4,TRAIN_0004,./train/TRAIN_0004.wav,2,0


In [9]:
print(df_test.shape)
df_test.head()

(1514, 3)


Unnamed: 0,SAMPLE_ID,SAMPLE_PATH,FAN_TYPE
0,TEST_0000,./test/TEST_0000.wav,2
1,TEST_0001,./test/TEST_0001.wav,2
2,TEST_0002,./test/TEST_0002.wav,0
3,TEST_0003,./test/TEST_0003.wav,0
4,TEST_0004,./test/TEST_0004.wav,0


# Feature Extraction

## Introduction

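**Sound**

* A longitudinal wave transmitted through the vibration of a medium such as air or water
* The sound that reaches the human ear is a wave carried through the air

**The three elements of sound**

* Loudness (how strong the sound is)
* Pitch (how high or low the sound is)
* Timbre (the tone color of the sound)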

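**Analog signal**

* A continuous waveform found in nature
* Can be classified into periodic and aperiodic signals

**Periodic signal**

* Consists of a continuously repeating pattern
* Cycle: one complete pattern
* Can be classified into sine waves and non-sinusoidal waves

**Aperiodic signal**

* Changes constantly over time, with no repeating pattern or cycle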


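**Sine wave**

1. Amplitude

  - Represents the magnitude or strength of the signal
  - The height of the signal
  - Unit: V (volt)

2. Period and Frequency

  - Period: the time needed to complete one cycle
  - Frequency: the reciprocal of the period, i.e. the number of cycles completed in one second
  - High frequency: change over a short span of time
  - Low frequency: change over a long span of time
  - Unit: Hz (cycles / second)

3. Phase

  - The relative position within the carrier-wave cycle at an arbitrary time
  - The position of the waveform relative to time 0
  - The amount by which a waveform is shifted forward or backward along the time axis
  - The state of the first cycle
  - Unit: degree

4. Bandwidth

  - The bandwidth of a composite signal is the difference between the highest and lowest frequencies contained in the signal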

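**Sampling Rate**

* The number of samples taken per second when converting a continuous signal into a digital signal

* To feed sound into a computer, the sound wave has to be represented as numbers

* Sampling is what digitizes the analog signal; the sampling rate sets how many values are recorded per second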
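
As a rough illustration of sampling, the sketch below samples a synthetic tone with NumPy. The 16000 Hz rate matches the `sr` passed to `librosa.load` in this notebook; the 440 Hz tone and the 0.5 second duration are illustrative assumptions, not values taken from the data set.

```python
import numpy as np

sr = 16000        # sampling rate in Hz (same value used with librosa.load here)
duration = 0.5    # seconds (illustrative assumption)
freq = 440.0      # tone frequency in Hz (illustrative assumption)

# One sample every 1 / sr seconds
t = np.arange(int(sr * duration)) / sr
y = np.sin(2 * np.pi * freq * t)

n_samples = len(y)       # equals sr * duration
spacing = t[1] - t[0]    # equals 1 / sr
print(n_samples, spacing)
```

Doubling `sr` would double the number of stored values for the same duration.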

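**Spectrum**

* The graph obtained by moving from the time domain to the frequency domain with the Fourier transform
* Shows how much of each frequency component a short piece of audio (a frame) contains

**Spectrogram**

* Combines the waveform and spectrum views
* Describes an audio signal in terms of frequency, amplitude, and time
* X axis: time
* Y axis: frequency
* Z axis: amplitude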
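
The relationship between frames, spectra, and the spectrogram can be sketched with plain NumPy. This is a simplified short-time Fourier transform under stated assumptions (a synthetic 1000 Hz tone, a Hann window, no padding); `librosa.stft` pads the signal by default, so its frame count differs.

```python
import numpy as np

sr = 16000
n_fft = 2048       # samples per frame
hop_length = 512   # step between consecutive frames

t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 1000.0 * t)   # 1 second, 1000 Hz tone (assumption)

# Slice the waveform into overlapping frames
n_frames = 1 + (len(y) - n_fft) // hop_length
frames = np.stack([y[i * hop_length: i * hop_length + n_fft]
                   for i in range(n_frames)])

# Spectrum of each frame; stacking them over time gives the spectrogram
S = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)

# Rows are frequency bins (Y axis), columns are frames (X axis),
# values are magnitudes (Z axis)
peak_freq = freqs[S.mean(axis=1).argmax()]
print(S.shape, peak_freq)
```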



## Zero Crossing Rate

* Compute the zero-crossing rate of an audio time series
* The rate at which the sign of the signal changes
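
A minimal NumPy sketch of the idea, assuming a single frame and synthetic sine tones (both assumptions for illustration); `librosa.feature.zero_crossing_rate` computes a comparable value frame by frame.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return np.mean(signs[1:] != signs[:-1])

sr = 16000
t = np.arange(sr) / sr
zcr_low = zero_crossing_rate(np.sin(2 * np.pi * 100 * t))    # 100 Hz tone
zcr_high = zero_crossing_rate(np.sin(2 * np.pi * 2000 * t))  # 2000 Hz tone

# A sine at f Hz crosses zero about 2 * f times per second,
# so the rate is roughly 2 * f / sr
print(zcr_low, zcr_high)
```

Higher-frequency or noisier signals change sign more often, which is why this value separates the samples in the table below.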

In [10]:
def get_zero_crossing_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the zero-crossing rate with librosa
        zero = librosa.feature.zero_crossing_rate(y=y)

        y_feature = []
        for e in zero:

            # Use the arithmetic mean of the per-frame zero-crossing rates as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)
    
    zero_df = pd.DataFrame(features,
                           columns=['Zero Crossing Rate'])

    print(zero_df.shape)

    return zero_df

In [11]:
zero_train = get_zero_crossing_feature(df_train)
zero_test = get_zero_crossing_feature(df_test)

zero_train.head()

(1279, 1)

(1514, 1)


Unnamed: 0,Zero Crossing Rate
0,0.133064
1,0.047472
2,0.057276
3,0.130589
4,0.142584


## RMS

* Compute root-mean-square (RMS) value for each frame, either from the audio samples y or from a spectrogram S.
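
A small sketch of per-frame RMS in NumPy, assuming unpadded frames and a synthetic sine of amplitude 0.5 (both assumptions). The frame and hop sizes mirror common librosa defaults, but this is not librosa's implementation.

```python
import numpy as np

def frame_rms(y, frame_length=2048, hop_length=512):
    """RMS of each frame: square, average, then take the square root."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.array([
        np.sqrt(np.mean(y[i * hop_length: i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])

sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440 * t)

rms = frame_rms(y)
# For a sine of amplitude A the RMS is close to A / sqrt(2)
print(rms.mean(), 0.5 / np.sqrt(2))
```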

In [12]:
def get_rms_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the RMS with librosa
        rms = librosa.feature.rms(y=y)

        y_feature = []
        for e in rms:

            # Use the arithmetic mean of the per-frame RMS values as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)
    
    rms_df = pd.DataFrame(features,
                           columns=['RMS'])

    print(rms_df.shape)

    return rms_df

In [13]:
rms_train = get_rms_feature(df_train)
rms_test = get_rms_feature(df_test)

rms_train.head()

(1279, 1)

(1514, 1)


Unnamed: 0,RMS
0,0.005121
1,0.004604
2,0.004401
3,0.005163
4,0.004931


## Poly Feature


* Get coefficients of fitting an nth-order polynomial to the columns of a spectrogram
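
To see what "fitting a polynomial to a spectrogram column" means, the sketch below fits a first-order polynomial to one hypothetical column with `np.polyfit`. The linearly decaying column is made up for illustration. librosa's `poly_features` defaults to `order=1`, which is consistent with the two columns in the output below.

```python
import numpy as np

# Frequency axis for n_fft=2048 at 16 kHz
freqs = np.fft.rfftfreq(2048, d=1 / 16000)

# Hypothetical spectrogram column that decays linearly with frequency
column = 0.2 - 3e-5 * freqs

# Coefficients are returned highest order first: [slope, intercept]
slope, intercept = np.polyfit(freqs, column, deg=1)
print(slope, intercept)
```

The slope describes how quickly energy falls off with frequency, and the intercept describes the overall level.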

In [14]:
def get_poly_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the polynomial coefficients with librosa
        poly = librosa.feature.poly_features(
                                             y=y,
                                             sr=sr,
                                             #order=2
                                             )

        y_feature = []
        for e in poly:

            # Use the arithmetic mean of each coefficient over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['Poly'+str(i) for i in range(len(features[0]))]
    
    poly_df = pd.DataFrame(features,
                           columns=columns)

    print(poly_df.shape)

    return poly_df

In [15]:
poly_train = get_poly_feature(df_train)
poly_test = get_poly_feature(df_test)

poly_train.head()

(1279, 2)

(1514, 2)


Unnamed: 0,Poly0,Poly1
0,-3.2e-05,0.20298
1,-2.2e-05,0.125001
2,-2.1e-05,0.124319
3,-3.2e-05,0.20278
4,-3e-05,0.195355


## Spectral

### Spectral Centroid

* Compute the spectral centroid

* Each frame of a magnitude spectrogram is normalized and treated as a distribution over frequency bins, from which the mean (centroid) is extracted per frame

* Returns the mean (center) frequency of each audio frame

* Computes the center of mass of the spectrum, i.e. the mean of the frequencies weighted by their magnitudes.
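
A minimal sketch of the centroid of a single frame, assuming a synthetic signal made of two equally strong tones at 500 Hz and 1500 Hz (an assumption for illustration). It follows the definition above rather than librosa's exact code path.

```python
import numpy as np

def spectral_centroid(magnitudes, freqs):
    """Magnitude-weighted mean frequency of one frame."""
    weights = magnitudes / magnitudes.sum()   # normalize to a distribution
    return np.sum(freqs * weights)

sr, n_fft = 16000, 2048
t = np.arange(n_fft) / sr
frame = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 1500 * t)

magnitudes = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
freqs = np.fft.rfftfreq(n_fft, d=1 / sr)

centroid = spectral_centroid(magnitudes, freqs)
print(centroid)   # lands midway between the two tones
```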

In [16]:
def get_spectral_centroid_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the spectral centroid with librosa
        centroid = librosa.feature.spectral_centroid(y=y, sr=sr)

        y_feature = []
        for e in centroid:

            # Use the arithmetic mean of the per-frame centroids as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['spectral_centroid_'+str(i) for i in range(len(features[0]))]
    
    centroid_df = pd.DataFrame(features,
                           columns=columns)

    print(centroid_df.shape)

    return centroid_df

In [17]:
spectral_centroid_train = get_spectral_centroid_feature(df_train)
spectral_centroid_test = get_spectral_centroid_feature(df_test)

spectral_centroid_train.head()

(1279, 1)

(1514, 1)


Unnamed: 0,spectral_centroid_0
0,1746.248047
1,966.565838
2,1206.676823
3,1731.6838
4,1845.114687


### Spectral Bandwidth

* Compute p’th-order spectral bandwidth.

* Measures the bandwidth of the spectrum: the magnitude-weighted spread of the frequencies around the spectral centroid (for p = 2, a weighted standard deviation).
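
The definition can be sketched for one frame as follows. The two-bin spectrum is a toy assumption, and `p=2` is assumed as the order.

```python
import numpy as np

def spectral_bandwidth(magnitudes, freqs, p=2):
    """p-th order spread of frequencies around the spectral centroid."""
    weights = magnitudes / magnitudes.sum()
    centroid = np.sum(freqs * weights)
    return np.sum(weights * np.abs(freqs - centroid) ** p) ** (1.0 / p)

# Toy spectrum: two equally strong components at 500 Hz and 1500 Hz
freqs = np.array([500.0, 1500.0])
magnitudes = np.array([1.0, 1.0])

bandwidth = spectral_bandwidth(magnitudes, freqs)
print(bandwidth)   # each component sits 500 Hz away from the 1000 Hz centroid
```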

In [18]:
def get_spectral_bandwidth_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the spectral bandwidth with librosa
        bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr)

        y_feature = []
        for e in bandwidth:

            # Use the arithmetic mean of the per-frame bandwidths as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['spectral_bandwidth_'+str(i) for i in range(len(features[0]))]
    
    bandwidth_df = pd.DataFrame(features,
                           columns=columns)

    print(bandwidth_df.shape)

    return bandwidth_df

In [19]:
spectral_bandwidth_train = get_spectral_bandwidth_feature(df_train)
spectral_bandwidth_test = get_spectral_bandwidth_feature(df_test)

spectral_bandwidth_train.head()

(1279, 1)

(1514, 1)


Unnamed: 0,spectral_bandwidth_0
0,1731.017118
1,1345.701719
2,1619.794231
3,1727.921959
4,1727.019823


### Spectral Contrast

* Compute spectral contrast

* Each frame of a spectrogram S is divided into sub-bands. For each sub-band, the energy contrast is estimated by comparing the mean energy in the top quantile (peak energy) to that of the bottom quantile (valley energy). High contrast values generally correspond to clear, narrow-band signals, while low contrast values correspond to broad-band noise.

In [20]:
def get_spectral_contrast_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the spectral contrast with librosa
        contrast = librosa.feature.spectral_contrast(y=y, sr=sr)

        y_feature = []
        for e in contrast:

            # Use the arithmetic mean of each sub-band's contrast as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['spectral_contrast_'+str(i) for i in range(len(features[0]))]
    
    contrast_df = pd.DataFrame(features,
                           columns=columns)

    print(contrast_df.shape)

    return contrast_df

In [21]:
spectral_contrast_train = get_spectral_contrast_feature(df_train)
spectral_contrast_test = get_spectral_contrast_feature(df_test)

spectral_contrast_train.head()

(1279, 7)

(1514, 7)


Unnamed: 0,spectral_contrast_0,spectral_contrast_1,spectral_contrast_2,spectral_contrast_3,spectral_contrast_4,spectral_contrast_5,spectral_contrast_6
0,16.696441,12.598988,14.744823,15.444423,15.381834,15.325508,52.93793
1,20.836776,11.471659,15.647303,15.401937,15.448171,17.106672,47.677406
2,21.56204,12.946827,16.768408,14.494164,15.234417,16.276049,51.275824
3,17.150536,12.129708,15.350182,15.838211,15.511526,15.375888,53.091559
4,16.29705,12.64251,14.279309,15.827333,15.504191,15.40679,53.62698


### Spectral Flatness

* Compute spectral flatness

* Spectral flatness (or tonality coefficient) quantifies how noise-like a sound is, as opposed to tone-like. A high spectral flatness (closer to 1.0) indicates that the spectrum is similar to white noise. It is often converted to decibels.

* Measures the noisiness (or, conversely, the tonality) of the spectrum. It is the ratio of the geometric mean to the arithmetic mean of the power spectrum. Values closer to 1 indicate white noise (maximum flatness).
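
A minimal sketch of the ratio for a single frame, comparing white noise with a pure tone. The `eps` floor, the Hann window, and the synthetic signals are assumptions for illustration and do not reproduce librosa's exact computation.

```python
import numpy as np

def spectral_flatness(power, eps=1e-10):
    """Geometric mean divided by arithmetic mean of a power spectrum."""
    power = np.maximum(power, eps)               # avoid log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    return geometric_mean / arithmetic_mean

def power_spectrum(frame):
    return np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2

n_fft, sr = 2048, 16000
rng = np.random.default_rng(42)
noise = rng.standard_normal(n_fft)                        # noise-like frame
tone = np.sin(2 * np.pi * 1000 * np.arange(n_fft) / sr)   # tone-like frame

flat_noise = spectral_flatness(power_spectrum(noise))
flat_tone = spectral_flatness(power_spectrum(tone))
print(flat_noise, flat_tone)   # the noise frame scores far higher than the tone
```

Because the geometric mean never exceeds the arithmetic mean, the value always stays between 0 and 1.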

In [22]:
def get_spectral_flatness_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the spectral flatness with librosa
        flatness = librosa.feature.spectral_flatness(y=y)

        y_feature = []
        for e in flatness:

            # Use the arithmetic mean of the per-frame flatness as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['spectral_flatness_'+str(i) for i in range(len(features[0]))]
    
    flatness_df = pd.DataFrame(features,
                           columns=columns)

    print(flatness_df.shape)

    return flatness_df

In [23]:
spectral_flatness_train = get_spectral_flatness_feature(df_train)
spectral_flatness_test = get_spectral_flatness_feature(df_test)

spectral_flatness_train.head()

(1279, 1)

(1514, 1)


Unnamed: 0,spectral_flatness_0
0,0.045295
1,0.004149
2,0.009043
3,0.043398
4,0.051099


### Spectral Rolloff

* Compute roll-off frequency.

* The roll-off frequency is defined for each frame as the center frequency for a spectrogram bin such that at least roll_percent (0.85 by default) of the energy of the spectrum in this frame is contained in this bin and the bins below. This can be used to, e.g., approximate the maximum (or minimum) frequency by setting roll_percent to a value close to 1 (or 0).

* Finds, for each frame of the spectrogram, the frequency below which roll_percent of the spectral energy is contained
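
The roll-off definition can be sketched for one frame with a cumulative sum. The flat ten-bin spectrum is a toy assumption chosen so the result is easy to verify by hand.

```python
import numpy as np

def spectral_rolloff(spectrum, freqs, roll_percent=0.85):
    """Lowest bin frequency at which the cumulative sum reaches roll_percent of the total."""
    cumulative = np.cumsum(spectrum)
    threshold = roll_percent * cumulative[-1]
    idx = np.searchsorted(cumulative, threshold)
    return freqs[idx]

# Toy frame: ten bins of equal energy centered at 0, 10, ..., 90 Hz
freqs = np.arange(0, 100, 10)
spectrum = np.ones(10)

rolloff = spectral_rolloff(spectrum, freqs)
print(rolloff)   # bins up to 80 Hz hold 90% of the total, bins up to 70 Hz only 80%
```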

In [24]:
def get_spectral_rolloff_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
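        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the spectral roll-off with librosa.
        # sr must be passed explicitly; otherwise librosa assumes its default
        # of 22050 Hz and the returned frequencies are scaled incorrectly.
        rolloff = librosa.feature.spectral_rolloff(y=y,
                                                   sr=sr,
                                                   #roll_percent=0.85
                                                   )

        y_feature = []
        for e in rolloff:

            # Use the arithmetic mean of the per-frame roll-off frequencies as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)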

            y_feature.append(e)

        features.append(y_feature)

    columns = ['spectral_rolloff_'+str(i) for i in range(len(features[0]))]
    
    rolloff_df = pd.DataFrame(features,
                           columns=columns)

    print(rolloff_df.shape)

    return rolloff_df

In [25]:
spectral_rolloff_train = get_spectral_rolloff_feature(df_train)
spectral_rolloff_test = get_spectral_rolloff_feature(df_test)

(1279, 1)

(1514, 1)


## Mel Spectrogram

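* Humans are more sensitive to low frequencies than to high frequencies

* For example, a change from 500 Hz to 1500 Hz is perceived clearly, whereas a change from 10,000 Hz to 11,000 Hz is barely noticed.

* The mel scale converts frequencies in Hz into units that follow this perception

* A spectrogram built on this scale is called a mel spectrogram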
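
One commonly cited Hz-to-mel conversion is m = 2595 * log10(1 + f / 700). The sketch below assumes that variant purely for illustration. librosa's default mel scale uses a different (Slaney-style) definition unless `htk=True` is passed, so the numbers here are not what `melspectrogram` uses internally.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel conversion (assumed variant, see the note above)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

# The same 1000 Hz step covers far more mels at low frequencies
low_gap = hz_to_mel(1500) - hz_to_mel(500)
high_gap = hz_to_mel(11000) - hz_to_mel(10000)
print(low_gap, high_gap)
```

The low-frequency step spans several times as many mels as the high-frequency one, which matches the perceptual claim in the list above.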

In [26]:
def get_melspectrogram_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the mel spectrogram with librosa
        melspectrogram = librosa.feature.melspectrogram(y=y,
                                                        sr=sr,
                                                        n_fft=2048,
                                                        hop_length=512,
                                                        )

        y_feature = []
        for e in melspectrogram:

            # Use the arithmetic mean of each mel band over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['Mel_Spectrogram_'+str(i) for i in range(len(features[0]))]
    
    melspectrogram_df = pd.DataFrame(features,
                           columns=columns)

    print(melspectrogram_df.shape)

    return melspectrogram_df

In [27]:
melspectrogram_train = get_melspectrogram_feature(df_train)
melspectrogram_test = get_melspectrogram_feature(df_test)

melspectrogram_train.head()

(1279, 128)

(1514, 128)


Unnamed: 0,Mel_Spectrogram_0,Mel_Spectrogram_1,Mel_Spectrogram_2,Mel_Spectrogram_3,Mel_Spectrogram_4,Mel_Spectrogram_5,Mel_Spectrogram_6,Mel_Spectrogram_7,Mel_Spectrogram_8,Mel_Spectrogram_9,...,Mel_Spectrogram_118,Mel_Spectrogram_119,Mel_Spectrogram_120,Mel_Spectrogram_121,Mel_Spectrogram_122,Mel_Spectrogram_123,Mel_Spectrogram_124,Mel_Spectrogram_125,Mel_Spectrogram_126,Mel_Spectrogram_127
0,0.006786,0.022543,0.021407,0.011658,0.015504,0.020965,0.015224,0.011332,0.022828,0.013995,...,4.7e-05,4e-05,3.1e-05,2.6e-05,2.179996e-05,2.011412e-05,2.071835e-05,1.987173e-05,8.922321e-06,4.079391e-07
1,0.04831,0.080445,0.236145,0.023955,0.03228,0.073725,0.018354,0.012419,0.012543,0.0063,...,1e-06,1e-06,1e-06,1e-06,9.875146e-07,9.252844e-07,9.92663e-07,8.609017e-07,3.431887e-07,1.745894e-08
2,0.032049,0.048137,0.197989,0.010388,0.027858,0.078355,0.014652,0.012361,0.011168,0.005235,...,7e-06,1e-05,6e-06,6e-06,4.137496e-06,6.134112e-06,6.415576e-06,5.692499e-06,2.158605e-06,8.88227e-08
3,0.010575,0.029999,0.030134,0.013963,0.014673,0.018446,0.011953,0.014399,0.020747,0.01338,...,4.5e-05,4.1e-05,3.4e-05,2.4e-05,2.145566e-05,2.014808e-05,1.818825e-05,1.782373e-05,8.351272e-06,4.463063e-07
4,0.001411,0.005008,0.005036,0.003626,0.002263,0.002829,0.00398,0.006516,0.016007,0.011028,...,4.8e-05,4.8e-05,3.8e-05,2.7e-05,2.278587e-05,2.367262e-05,2.226782e-05,2.002815e-05,8.942849e-06,4.63012e-07


## MFCC

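* MFCCs are obtained by applying a DCT (discrete cosine transform) to the log-scaled mel spectrogram, which compresses the matrix into a compact representation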
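
To illustrate why a DCT compresses a smooth spectrum, the sketch below applies an unnormalized DCT-II written in NumPy to a hypothetical log-mel frame. The linear ramp is made up, and librosa applies its own normalization, so the coefficient values are not comparable to the MFCC table below.

```python
import numpy as np

def dct_ii(x):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi * k * (2i + 1) / (2n))."""
    n = len(x)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * i + 1) / (2 * n)) @ x

# A constant frame has all of its content in coefficient 0
flat = dct_ii(np.ones(8))

# Hypothetical smooth log-mel frame with 128 bands
log_mel = np.linspace(0.0, -10.0, 128)
coeffs = dct_ii(log_mel)

# Share of the squared coefficient magnitude held by the first 13 coefficients
energy_share = np.sum(coeffs[:13] ** 2) / np.sum(coeffs ** 2)
print(flat.round(6), energy_share)
```

Smooth spectral envelopes end up in the first few coefficients, which is why MFCC pipelines often keep only a small number of them.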

In [28]:
def get_mfcc_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the MFCCs with librosa
        mfcc = librosa.feature.mfcc(y=y,
                                    sr=sr,
                                    n_mfcc=128,
                                    #dct_type=2
                                    )

        y_feature = []
        for e in mfcc:

            # Use the arithmetic mean of each MFCC over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['MFCC_'+str(i) for i in range(len(features[0]))]
    
    mfcc_df = pd.DataFrame(features,
                           columns=columns)

    print(mfcc_df.shape)

    return mfcc_df

In [29]:
mfcc_train = get_mfcc_feature(df_train)
mfcc_test = get_mfcc_feature(df_test)

mfcc_train.head()

(1279, 128)

(1514, 128)


Unnamed: 0,MFCC_0,MFCC_1,MFCC_2,MFCC_3,MFCC_4,MFCC_5,MFCC_6,MFCC_7,MFCC_8,MFCC_9,...,MFCC_118,MFCC_119,MFCC_120,MFCC_121,MFCC_122,MFCC_123,MFCC_124,MFCC_125,MFCC_126,MFCC_127
0,-332.689484,96.704391,-14.929521,21.968111,-8.563829,-2.02196,-11.857611,3.893353,-5.748076,3.539912,...,0.53368,0.660617,0.524346,-0.307885,-0.814918,-0.123952,0.535305,0.113357,-0.800878,-0.867296
1,-438.377899,142.276978,-2.118732,30.589058,0.734739,15.532813,-2.802753,4.227826,-1.891904,3.577837,...,0.179785,-0.031554,0.05012,0.377868,0.766223,0.740194,0.287944,0.007076,0.350023,0.168382
2,-419.17099,123.297798,10.11094,21.655056,-1.095648,11.256332,-3.402523,1.567492,3.890199,3.804655,...,0.472421,0.330321,0.200077,0.07306,0.516295,0.852534,0.380594,-0.057465,-0.105068,-0.298017
3,-333.733124,97.450333,-13.966936,22.235878,-9.349174,-2.870443,-11.308705,6.399221,-2.479952,3.890206,...,0.084635,0.459112,-0.024202,0.227796,-0.581687,-0.259305,-0.126211,0.116488,-0.928069,-0.161903
4,-333.012543,90.00338,-21.694469,14.749146,-18.316071,-9.914346,-16.342524,2.575432,-6.690783,-0.875636,...,0.058081,0.142688,-0.039779,0.551953,-0.547507,-0.372035,-0.214538,0.094469,-0.619701,-0.231777


## Chroma

* Every musical note is described by one of 12 pitch classes and an octave

* Chroma ignores the octave of each audio frame and represents the distribution over the 12 pitch classes

* The pitch classes are C, C#, D, D#, E, F, F#, G, G#, A, A#, B
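
The octave-folding idea can be sketched by mapping a frequency to its pitch class. This assumes equal temperament with A4 = 440 Hz and MIDI note numbering, neither of which is stated in the notebook, and it handles a single frequency rather than a full chromagram.

```python
import numpy as np

PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                 'F#', 'G', 'G#', 'A', 'A#', 'B']

def pitch_class(freq_hz, a4=440.0):
    """Name of the pitch class closest to freq_hz, ignoring the octave."""
    midi = int(round(69 + 12 * np.log2(freq_hz / a4)))   # A4 is MIDI note 69
    return PITCH_CLASSES[midi % 12]

# 440 Hz and 880 Hz are one octave apart and fold onto the same class
print(pitch_class(440.0), pitch_class(880.0), pitch_class(261.63))
```

A chromagram applies the same folding to every frequency bin of every frame and sums the energy per pitch class.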

### Chroma stft

In [30]:
def get_chroma_stft_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the STFT-based chromagram with librosa
        chroma = librosa.feature.chroma_stft(y=y, sr=sr,
                                             n_chroma=12)

        y_feature = []
        for e in chroma:

            # Use the arithmetic mean of each chroma bin over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['Chroma_stft_'+str(i) for i in range(len(features[0]))]

    chroma_df = pd.DataFrame(features,
                             columns=columns)
    
    print(chroma_df.shape)

    return chroma_df

In [31]:
chroma_stft_train = get_chroma_stft_feature(df_train)
chroma_stft_test = get_chroma_stft_feature(df_test)

chroma_stft_train.head()

(1279, 12)

(1514, 12)


Unnamed: 0,Chroma_stft_0,Chroma_stft_1,Chroma_stft_2,Chroma_stft_3,Chroma_stft_4,Chroma_stft_5,Chroma_stft_6,Chroma_stft_7,Chroma_stft_8,Chroma_stft_9,Chroma_stft_10,Chroma_stft_11
0,0.354076,0.449614,0.691313,0.737164,0.868736,0.368256,0.373235,0.550208,0.682635,0.442081,0.371972,0.448557
1,0.699832,0.883752,0.749316,0.479447,0.313469,0.294811,0.24159,0.255539,0.305991,0.353667,0.5045,0.578966
2,0.643748,0.743568,0.532853,0.347367,0.660792,0.713795,0.348799,0.212565,0.232006,0.249673,0.320929,0.44495
3,0.351917,0.345404,0.483628,0.583686,0.923054,0.483008,0.306648,0.415996,0.64316,0.518243,0.316253,0.422899
4,0.394154,0.36135,0.502068,0.585573,0.874034,0.493424,0.344644,0.505876,0.762814,0.576599,0.379295,0.471015


### Chroma cqt

* Constant-Q chromagram

In [32]:
def get_chroma_cqt_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the constant-Q chromagram with librosa
        chroma = librosa.feature.chroma_cqt(y=y, sr=sr,
                                            n_chroma=12)

        y_feature = []
        for e in chroma:

            # Use the arithmetic mean of each chroma bin over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['Chroma_cqt_'+str(i) for i in range(len(features[0]))]

    chroma_df = pd.DataFrame(features,
                             columns=columns)

    print(chroma_df.shape)

    return chroma_df

In [33]:
chroma_cqt_train = get_chroma_cqt_feature(df_train)
chroma_cqt_test = get_chroma_cqt_feature(df_test)

chroma_cqt_train.head()

(1279, 12)

(1514, 12)


Unnamed: 0,Chroma_cqt_0,Chroma_cqt_1,Chroma_cqt_2,Chroma_cqt_3,Chroma_cqt_4,Chroma_cqt_5,Chroma_cqt_6,Chroma_cqt_7,Chroma_cqt_8,Chroma_cqt_9,Chroma_cqt_10,Chroma_cqt_11
0,0.757129,0.740473,0.833681,0.777558,0.864816,0.575127,0.668361,0.822592,0.771891,0.744747,0.624666,0.748008
1,0.839595,0.940688,0.406635,0.357162,0.386162,0.376999,0.326338,0.33402,0.364956,0.37238,0.448817,0.456221
2,0.78151,0.980583,0.380988,0.299062,0.374296,0.352752,0.313407,0.266047,0.305897,0.299812,0.339886,0.458151
3,0.734154,0.722609,0.857241,0.664273,0.864151,0.624331,0.61623,0.653386,0.71983,0.696828,0.555433,0.678292
4,0.638771,0.692994,0.747055,0.709295,0.892845,0.542818,0.717615,0.758963,0.851609,0.679199,0.548626,0.635112


### Chroma cens

* Computes the chroma variant “Chroma Energy Normalized” (CENS)

* To compute CENS features, the following steps are taken after obtaining chroma vectors using chroma_cqt:

  - L-1 normalization of each chroma vector

  - Quantization of amplitude based on “log-like” amplitude thresholds

  - (optional) Smoothing with sliding window. Default window length = 41 frames

  - (not implemented) Downsampling

* CENS features are robust to dynamics, timbre and articulation, thus these are commonly used in audio matching and retrieval applications.

In [34]:
def get_chroma_cens_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the CENS chromagram with librosa
        chroma = librosa.feature.chroma_cens(y=y, sr=sr,
                                             n_chroma=12)

        y_feature = []
        for e in chroma:

            # Use the arithmetic mean of each chroma bin over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)
            
            y_feature.append(e)

        features.append(y_feature)

    columns = ['Chroma_cens_'+str(i) for i in range(len(features[0]))]

    chroma_df = pd.DataFrame(features,
                             columns=columns)
    
    print(chroma_df.shape)

    return chroma_df

In [35]:
chroma_cens_train = get_chroma_cens_feature(df_train)
chroma_cens_test = get_chroma_cens_feature(df_test)

chroma_cens_train.head()

(1279, 12)

(1514, 12)


Unnamed: 0,Chroma_cens_0,Chroma_cens_1,Chroma_cens_2,Chroma_cens_3,Chroma_cens_4,Chroma_cens_5,Chroma_cens_6,Chroma_cens_7,Chroma_cens_8,Chroma_cens_9,Chroma_cens_10,Chroma_cens_11
0,0.282184,0.268971,0.335462,0.282312,0.349912,0.2416,0.255334,0.322682,0.284497,0.274509,0.247907,0.27694
1,0.476561,0.525678,0.241936,0.196037,0.21289,0.185767,0.146801,0.183199,0.1964,0.212965,0.255334,0.257958
2,0.472701,0.555245,0.247842,0.186213,0.215908,0.223785,0.197759,0.124727,0.179923,0.159576,0.211274,0.296072
3,0.291148,0.276122,0.383755,0.255967,0.370322,0.26251,0.25211,0.250821,0.283331,0.268485,0.237202,0.257384
4,0.25333,0.266444,0.291047,0.277979,0.396557,0.230716,0.282971,0.293179,0.367034,0.264801,0.231437,0.249517


## Tonnetz

* Computes the tonal centroid features (tonnetz)

* This representation projects chroma features onto a 6-dimensional basis representing the perfect fifth, minor third, and major third each as two-dimensional coordinates.

* First described by Euler in 1739, the Tonnetz expresses the relationship between musical tonality and tonal space in a graphical way

In [37]:
def get_tonnetz_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the tonnetz features with librosa
        tonnetz = librosa.feature.tonnetz(y=y, sr=sr,
                                          n_chroma=12,
                                          )

        y_feature = []
        for e in tonnetz:

            # Use the arithmetic mean of each tonnetz dimension over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['Tonnetz_'+str(i) for i in range(len(features[0]))]
    
    tonnetz_df = pd.DataFrame(features,
                           columns=columns)

    print(tonnetz_df.shape)

    return tonnetz_df

In [38]:
tonnetz_train = get_tonnetz_feature(df_train)
tonnetz_test = get_tonnetz_feature(df_test)

tonnetz_train.head()

(1279, 6)

(1514, 6)


Unnamed: 0,Tonnetz_0,Tonnetz_1,Tonnetz_2,Tonnetz_3,Tonnetz_4,Tonnetz_5
0,0.040525,-0.008865,0.032459,0.030543,0.006169,-0.002654
1,-0.05028,-0.004938,-0.10322,0.07416,0.041979,0.004034
2,-0.044634,-0.048258,-0.123949,0.084217,0.042226,-0.004571
3,0.049264,-0.008371,-0.007788,0.035929,-0.004756,-0.008518
4,0.031129,-0.039075,0.022361,0.045907,0.006324,-0.005855


## Rhythm

* Compute the tempogram: local autocorrelation of the onset strength envelope.
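
The core operation, autocorrelation of an onset envelope, can be sketched on a synthetic envelope. The pulse train with one onset every 20 frames is an assumption for illustration. A real tempogram computes this locally inside a sliding window, whereas this sketch uses the whole envelope at once.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Normalized autocorrelation of x for lags 0 .. max_lag - 1."""
    x = x - x.mean()
    ac = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(max_lag)])
    return ac / ac[0]

# Synthetic onset envelope: one pulse every 20 frames
envelope = np.zeros(200)
envelope[::20] = 1.0

ac = autocorrelation(envelope, max_lag=50)

# Skip lag 0 (always 1.0) and find the strongest repetition period
best_lag = 1 + int(np.argmax(ac[1:]))
print(best_lag)
```

The value at lag 0 is always 1.0, which is why the first Rhythm column in the output below is constant.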

In [39]:
def get_rhythm_feature(df):
    features = []
    for path in tqdm(df['SAMPLE_PATH']):
        # Load the wav file with librosa
        y, sr = librosa.load(path, sr=16000)

        # Extract the onset strength envelope with librosa
        onset_envelope = librosa.onset.onset_strength(y=y, sr=sr)

        # Extract the tempogram (rhythm) with librosa
        rhythm = librosa.feature.tempogram(y=y, sr=sr,
                                           onset_envelope=onset_envelope)

        y_feature = []
        for e in rhythm:

            # Use the arithmetic mean of each tempogram lag over time as the feature
            e = np.mean(e)

            # Alternative: use the trimmed mean instead
            #e = stats.trim_mean(e, 0.1)

            y_feature.append(e)

        features.append(y_feature)

    columns = ['Rhythm'+str(i) for i in range(len(features[0]))]
    
    rhythm_df = pd.DataFrame(features,
                           columns=columns)

    print(rhythm_df.shape)

    return rhythm_df

In [40]:
rhythm_train = get_rhythm_feature(df_train)
rhythm_test = get_rhythm_feature(df_test)

rhythm_train.head()

(1279, 384)

(1514, 384)


Unnamed: 0,Rhythm0,Rhythm1,Rhythm2,Rhythm3,Rhythm4,Rhythm5,Rhythm6,Rhythm7,Rhythm8,Rhythm9,...,Rhythm374,Rhythm375,Rhythm376,Rhythm377,Rhythm378,Rhythm379,Rhythm380,Rhythm381,Rhythm382,Rhythm383
0,1.0,0.964344,0.952677,0.959862,0.959969,0.958196,0.960642,0.956272,0.952788,0.954784,...,1.611968e-08,9.374177e-09,5.121984e-09,2.586015e-09,1.177305e-09,4.652507e-10,1.496642e-10,3.464429e-11,4.259505e-12,5.6484e-17
1,1.0,0.960866,0.946292,0.95063,0.952427,0.951049,0.951087,0.951133,0.948267,0.942403,...,1.373749e-08,7.990373e-09,4.366766e-09,2.205181e-09,1.004139e-09,3.968974e-10,1.277002e-10,2.956626e-11,3.63604e-12,7.541711e-17
2,1.0,0.963696,0.950603,0.955985,0.957491,0.959731,0.950146,0.953026,0.956636,0.946794,...,1.333814e-08,7.75741e-09,4.239041e-09,2.140458e-09,9.745611e-10,3.851606e-10,1.239062e-10,2.868253e-11,3.526617e-12,6.96763e-17
3,1.0,0.964074,0.951244,0.957835,0.961687,0.95448,0.952974,0.959107,0.955363,0.949023,...,1.449737e-08,8.430142e-09,4.605828e-09,2.325229e-09,1.058486e-09,4.182538e-10,1.345305e-10,3.113792e-11,3.828134e-12,8.448203000000001e-17
4,1.0,0.964556,0.949203,0.956888,0.962539,0.963658,0.955853,0.952292,0.95167,0.954568,...,1.530646e-08,8.898824e-09,4.860912e-09,2.453513e-09,1.116652e-09,4.411366e-10,1.418529e-10,3.282254e-11,4.033965e-12,6.389284000000001e-17
