# BirdNET Spectrogram Data and Logistic Regression

#### In this notebook, we load the preprocessed BirdNET spectrogram dataset and explore classifying it with logistic regression, alongside SVM, KNN, and random forest classifiers for comparison.

In [15]:
## Import necessary modules
import numpy as np
import os
import librosa
import keras
import pandas as pd
import time

import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score

In [16]:
import classifiers

In [17]:
spectrograms = pd.read_csv('/home/birdsong/processed_data/spectrograms/spectrograms.csv').values
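## labels.csv covers only the first 1216 (true-positive) spectrograms;
## the remaining 7781 false-positive spectrograms get the placeholder label -1.
labels_tp = pd.read_csv('/home/birdsong/processed_data/spectrograms/labels.csv').values[:,1]
labels = np.concatenate((labels_tp, np.full(7781, -1, dtype = int)))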
probabilities = pd.read_csv('/home/birdsong/processed_data/spectrograms/probabilities.csv').values

In [18]:
print(spectrograms.shape)
print(labels.shape)
print(probabilities.shape)

(8997, 1409)
(8997,)
(1216, 25)


In [19]:
## Label the false-positive data with 0 and the true-positive data with 1 in a separate column
## so that splits can be stratified, since ~6/7 of the spectrograms are false positives.
fp_labels = np.zeros(7781, dtype=int)
tp_labels = np.ones(1216, dtype=int)
tp_fp_labels = np.concatenate((tp_labels, fp_labels)).reshape(-1,1)
spectrograms_with_labels = np.concatenate((spectrograms, tp_fp_labels), axis=1)
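
The binary column built above exists so that a train/test split can preserve the roughly 6:1 ratio of false positives to true positives. A minimal sketch of such a split follows, using synthetic stand-in arrays with the same class counts rather than the real spectrograms. How `classifiers.Classifier.t_t_split` actually splits the data is not shown in this notebook, so the names `X_demo` and `is_tp` and the `test_size` and `random_state` values are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins with the same class counts as the notebook
# (1216 true positives followed by 7781 false positives). The feature
# width of 8 is arbitrary and only keeps the example small.
n_tp, n_fp = 1216, 7781
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(n_tp + n_fp, 8))
is_tp = np.concatenate((np.ones(n_tp, dtype=int), np.zeros(n_fp, dtype=int)))

# stratify=is_tp keeps the true/false-positive ratio nearly identical
# in both partitions. test_size and random_state are assumed values.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, is_tp, test_size=0.2, stratify=is_tp, random_state=0
)

print(f"overall TP fraction: {is_tp.mean():.4f}")
print(f"train TP fraction:   {y_train.mean():.4f}")
print(f"test TP fraction:    {y_test.mean():.4f}")
```

Without `stratify`, a random split of data this imbalanced can leave the test set with noticeably fewer true positives than the training set, which makes accuracy figures harder to compare.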

In [20]:
## Only consider the true-positive data
spectrograms_tp = spectrograms[:1216]
print(spectrograms_tp.shape)

## Only consider the false-positive data
spectrograms_fp = spectrograms[1216:]
print(spectrograms_fp.shape)

(1216, 1409)
(7781, 1409)


In [21]:
cls_tp = classifiers.Classifier(input_X = spectrograms_tp,
                                input_y = labels_tp)

cls_tp.t_t_split()

In [22]:
cls_tp.log_reg(CV = False)
print("---------------------")

cls_tp.log_reg(CV = True)
print("---------------------")

cls_tp.svm(CV = False)
print("---------------------")
cls_tp.svm(CV = True)
print("---------------------")
cls_tp.knn(CV = True,
        max_n_neighbors = 30)
print("---------------------")

cls_tp.save_statistics('/home/birdsong/classifier_stats/cls_tp_stats')
classifiers.save_object_to_pickle(cls_tp, '/home/birdsong/classifier_stats/cls_tp')

cls_tp.log_reg_PCA(PCA_dims = [i for i in range(2, 50)])
print("---------------------")

cls_tp.save_statistics('/home/birdsong/classifier_stats/cls_tp_stats')
classifiers.save_object_to_pickle(cls_tp, '/home/birdsong/classifier_stats/cls_tp')

for dim in range(100, 901, 100):
    cls_tp.log_reg_PCA(CV = False, PCA_dim = dim)
    print("---------------------")
    cls_tp.log_reg_PCA(CV = False, PCA_dim = dim)
    print("---------------------")
    
cls_tp.save_statistics('/home/birdsong/classifier_stats/cls_tp_stats')
classifiers.save_object_to_pickle(cls_tp, '/home/birdsong/classifier_stats/cls_tp')

cls_tp.rand_forest()

The Logistic Regression test accuracy was 0.4371584699453552
The Logistic Regression train accuracy was 0.9835430784123911
---------------------
CV Split: 0
CV Split: 1
CV Split: 2
CV Split: 3
CV Split: 4
Elapsed Time: 255.0679657459259
The highest CV Logistic Regression test accuracy was 0.4444444444444444
The highest CV Logistic Regression train accuracy was 1.0
---------------------
The SVM test accuracy was 0.33879781420765026
The SVM train accuracy was 0.6282671829622459
---------------------
CV Split: 0
CV Split: 1
CV Split: 2
CV Split: 3
CV Split: 4
Elapsed Time: 4.225691080093384
The highest CV SVM test accuracy was 0.3333333333333333
The highest CV SVM train accuracy was 0.6372430471584039
---------------------
CV Split: 0
CV Split: 1
CV Split: 2
CV Split: 3
CV Split: 4
Elapsed Time: 39.046441078186035
The k with the highest AVG CV Accuracy was k = 1
The highest CV KNN test accuracy was 0.1761362037427888
The corresponding CV KNN train accuracy was 0.28339720861599
-----------

STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

CV Split: 1


CV Split: 2


CV Split: 3


CV Split: 4


Elapsed Time: 1823.3248736858368
The highest AVG CV Test Accuracy corresponds to  number of components = 22
The highest CV Logistic Regression test accuracy was 0.3600112565076684
The corresponding CV Logistic Regression train accuracy was 0.5162130984830962
---------------------
Projecting to dimension =  100
The Logistic Regression test accuracy was 0.3825136612021858
The Logistic Regression train accuracy was 0.9332042594385286
---------------------
Projecting to dimension =  100
The Logistic Regression test accuracy was 0.3770491803278688
The Logistic Regression train accuracy was 0.9399806389157793
---------------------
Projecting to dimension =  200
The Logistic Regression test accuracy was 0.39344262295081966
The Logistic Regression train accuracy was 0.9816069699903195
---------------------
Projecting to dimension =  200
The Logistic Regression test accuracy was 0.39344262295081966
The Logistic Regression train accuracy was 0.9816069699903195
---------------------
Projecting to
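
The `STOP: TOTAL NO. of f AND g EVALUATIONS EXCEEDS LIMIT` messages in the output above are scikit-learn convergence warnings from the logistic regression solver, and the warning text itself suggests scaling the data or raising `max_iter`. A minimal sketch of that remedy combined with PCA is given below, assuming a standard scikit-learn `Pipeline`. The internals of `classifiers.Classifier.log_reg_PCA` are not shown in this notebook, so the step names, `n_components=22` (borrowed from the best CV result above), and `max_iter=1000` are illustrative, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic multi-class data standing in for the spectrogram features.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=60, n_informative=20,
    n_classes=4, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# Scale first, then project, then classify. Fitting the whole pipeline on
# the training split only keeps test data out of the scaler and the PCA.
pca_log_reg = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=22)),
    ("log_reg", LogisticRegression(max_iter=1000)),  # assumed iteration cap
])
pca_log_reg.fit(X_tr, y_tr)

train_acc = pca_log_reg.score(X_tr, y_tr)
test_acc = pca_log_reg.score(X_te, y_te)
print("train accuracy:", train_acc)
print("test accuracy: ", test_acc)
```

Wrapping the steps in one `Pipeline` also means the same object can be passed to scikit-learn's cross-validation utilities, so scaling and PCA are refit inside every fold.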

In [23]:
cls_tp.save_statistics('/home/birdsong/classifier_stats/cls_tp_stats')
classifiers.save_object_to_pickle(cls_tp, '/home/birdsong/classifier_stats/cls_tp')
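
`classifiers.save_object_to_pickle` comes from the project's own `classifiers` module, whose source is not included in this notebook. A helper with that role could look like the sketch below, built on the standard-library `pickle` module. The function names `save_object_sketch` and `load_object_sketch` and the example dictionary are hypothetical, not the project's code.

```python
import os
import pickle
import tempfile

def save_object_sketch(obj, path):
    """Hypothetical stand-in for a pickle-saving helper: write obj to path."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_object_sketch(path):
    """Read back an object written by save_object_sketch."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Round-trip a small stats dictionary through a temporary file.
stats = {"log_reg_test_acc": 0.4371584699453552}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "cls_tp_demo")
    save_object_sketch(stats, demo_path)
    restored = load_object_sketch(demo_path)

print(restored == stats)
```

Saving after each expensive step, as the cells above do, means a later interruption does not discard results that took minutes to compute.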

In [24]:
cls_tp_fp = classifiers.Classifier(input_X = spectrograms,
                                   input_y = labels)
cls_tp_fp.t_t_split()

In [25]:
cls_tp_fp.log_reg(CV = False)
print("---------------------")

cls_tp_fp.log_reg(CV = True)
print("---------------------")

cls_tp_fp.svm(CV = False)
print("---------------------")
cls_tp_fp.svm(CV = True)
print("---------------------")
cls_tp_fp.knn(CV = True,
        max_n_neighbors = 30)
print("---------------------")

cls_tp_fp.save_statistics('/home/birdsong/classifier_stats/cls_tp_fp_stats')
classifiers.save_object_to_pickle(cls_tp_fp, '/home/birdsong/classifier_stats/cls_tp_fp')

for dim in range(100, 901, 100):
    cls_tp_fp.log_reg_PCA(CV = False, PCA_dim = dim)
    print("---------------------")
    cls_tp_fp.log_reg_PCA(CV = False, PCA_dim = dim)
    print("---------------------")

cls_tp_fp.save_statistics('/home/birdsong/classifier_stats/cls_tp_fp_stats')
classifiers.save_object_to_pickle(cls_tp_fp, '/home/birdsong/classifier_stats/cls_tp_fp')
    
cls_tp_fp.log_reg_PCA(PCA_dims = [i for i in range(2, 50)])
print("---------------------")

cls_tp_fp.save_statistics('/home/birdsong/classifier_stats/cls_tp_fp_stats')
classifiers.save_object_to_pickle(cls_tp_fp, '/home/birdsong/classifier_stats/cls_tp_fp')

cls_tp_fp.rand_forest()

The Logistic Regression test accuracy was 0.9014814814814814
The Logistic Regression train accuracy was 0.9976461357395057
---------------------
CV Split: 0
CV Split: 1
CV Split: 2
CV Split: 3
CV Split: 4
Elapsed Time: 71.42306923866272
The highest CV Logistic Regression test accuracy was 0.9064748201438849
The highest CV Logistic Regression train accuracy was 0.9988558352402745
---------------------
The SVM test accuracy was 0.8629629629629629
The SVM train accuracy was 0.8749836537204132
---------------------
CV Split: 0
CV Split: 1
CV Split: 2
CV Split: 3
CV Split: 4
Elapsed Time: 225.66924810409546
The highest CV SVM test accuracy was 0.8776978417266187
The highest CV SVM train accuracy was 0.8744687806472704
---------------------
CV Split: 0
CV Split: 1
CV Split: 2
CV Split: 3
CV Split: 4
Elapsed Time: 288.2228877544403
The k with the highest AVG CV Accuracy was k = 1
The highest CV KNN test accuracy was 0.8876679618871748
The corresponding CV KNN train accuracy was 0.899993426643

KeyboardInterrupt: 
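
Because 7781 of the 8997 combined samples carry the label -1, accuracy on this dataset has a high floor: always predicting -1 is already right about 86.5% of the time. The sketch below computes that baseline from the counts printed earlier in the notebook and compares it with the reported logistic regression test accuracy. It assumes the test split has roughly the same class balance as the full dataset, which this notebook does not confirm.

```python
# Class counts taken from the shapes printed earlier in the notebook.
n_tp, n_fp = 1216, 7781

# Accuracy of a classifier that always predicts the majority label (-1).
majority_baseline = n_fp / (n_tp + n_fp)

# Logistic regression test accuracy copied from the output above.
reported_test_acc = 0.9014814814814814

gain = reported_test_acc - majority_baseline
print(f"majority-class baseline: {majority_baseline:.4f}")
print(f"reported test accuracy:  {reported_test_acc:.4f}")
print(f"gain over baseline:      {gain:.4f}")
```

Read this way, the accuracies near 0.90 on the combined data are only a few points above the trivial baseline, so they are not directly comparable to the accuracies on the true-positive-only data.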

In [None]:
cls_tp_fp.save_statistics('/home/birdsong/classifier_stats/cls_tp_fp_stats')
classifiers.save_object_to_pickle(cls_tp_fp, '/home/birdsong/classifier_stats/cls_tp_fp')

In [None]:
cls_tp.rand_forest(n_trees = [1000])
print("-------------------")
cls_tp_fp.rand_forest(n_trees = [1000])

In [None]:
cls_tp.save_statistics('/home/birdsong/classifier_stats/cls_tp_stats')
classifiers.save_object_to_pickle(cls_tp, '/home/birdsong/classifier_stats/cls_tp')

cls_tp_fp.save_statistics('/home/birdsong/classifier_stats/cls_tp_fp_stats')
classifiers.save_object_to_pickle(cls_tp_fp, '/home/birdsong/classifier_stats/cls_tp_fp')