# Classification in PySpark's MLlib Project Solution

### Genre classification
Now it's time to apply what we learned in the lectures to a REAL classification project! Have you ever wondered what makes us humans able to tell apart two songs of different genres? How do we inherently know the difference between a pop song and heavy metal? This type of classification may seem easy for us, but it's a very difficult challenge for a computer. So the question is, could an automatic genre classification model be possible?

For this project we will be classifying songs based on a number of characteristics into a set of 23 electronic genres. This technology could be used by an application like Pandora to recommend songs to users or just create meaningful channels. Super fun!

### Dataset
*beatsdataset.csv*
Each row is an electronic music song. The dataset contains 100 songs for each of 23 electronic music genres; they were the top 100 songs of their genres in November 2016. The 71 feature columns are audio features extracted from a random two-minute sample of each audio file. These features were extracted using pyAudioAnalysis (https://github.com/tyiannak/pyAudioAnalysis).

### Your task
Create an algorithm that classifies songs into the 23 genres provided. Test out several different models and select the highest performing one. Also play around with feature selection methods and finally try to make a recommendation to a user.  

For the feature selection aspect of this project, you may need to get a bit creative if you want to select features from a non-tree algorithm. I intentionally did not go over this aspect of PySpark in the previous lectures, to give you a chance to get used to researching the PySpark documentation. Here is the link to the Feature Selectors section of the documentation that just might come in handy: https://spark.apache.org/docs/latest/ml-features.html#feature-selectors

Good luck! Have fun :)

### Source
https://www.kaggle.com/caparrini/beatsdataset

In [42]:
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ClassificationProject').getOrCreate()



In [43]:
from pyspark.ml.feature import VectorAssembler
from pyspark.sql.types import * 
from pyspark.sql.functions import *
from pyspark.ml.feature import StringIndexer
from pyspark.ml.feature import MinMaxScaler

In [44]:
path ="Datasets/"
df = spark.read.csv(path+'beatsdataset.csv',inferSchema=True,header=True)

In [45]:
df.limit(5).toPandas()

Unnamed: 0,_c0,1-ZCRm,2-Energym,3-EnergyEntropym,4-SpectralCentroidm,5-SpectralSpreadm,6-SpectralEntropym,7-SpectralFluxm,8-SpectralRolloffm,9-MFCCs1m,...,63-ChromaVector8std,64-ChromaVector9std,65-ChromaVector10std,66-ChromaVector11std,67-ChromaVector12std,68-ChromaDeviationstd,69-BPM,70-BPMconf,71-BPMessentia,class
0,0,0.13644,0.088861,3.201201,0.262825,0.249212,1.114423,0.007003,0.256682,-22.723259,...,0.003431,0.004981,0.010818,0.024001,0.005201,0.015056,133.333333,0.132792,128.0,BigRoom
1,1,0.117039,0.108389,3.194001,0.247657,0.250288,1.065668,0.005387,0.199821,-21.775871,...,0.004461,0.006441,0.007469,0.015499,0.005589,0.019339,120.0,0.112767,126.0,BigRoom
2,2,0.085308,0.128525,3.123837,0.217205,0.228652,0.789647,0.008247,0.156822,-22.472722,...,0.001529,0.004556,0.007723,0.017482,0.002901,0.022201,133.333333,0.123373,129.0,BigRoom
3,3,0.10305,0.167042,3.15083,0.233593,0.245032,0.967082,0.006571,0.168083,-21.470751,...,0.001591,0.003514,0.009477,0.023162,0.004165,0.015379,133.333333,0.158876,129.0,BigRoom
4,4,0.15173,0.148405,3.194498,0.29373,0.267231,1.353005,0.003872,0.292055,-21.371157,...,0.003945,0.004131,0.01133,0.028188,0.002639,0.019079,133.333333,0.190708,129.0,BigRoom


In [46]:
df.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- 1-ZCRm: double (nullable = true)
 |-- 2-Energym: double (nullable = true)
 |-- 3-EnergyEntropym: double (nullable = true)
 |-- 4-SpectralCentroidm: double (nullable = true)
 |-- 5-SpectralSpreadm: double (nullable = true)
 |-- 6-SpectralEntropym: double (nullable = true)
 |-- 7-SpectralFluxm: double (nullable = true)
 |-- 8-SpectralRolloffm: double (nullable = true)
 |-- 9-MFCCs1m: double (nullable = true)
 |-- 10-MFCCs2m: double (nullable = true)
 |-- 11-MFCCs3m: double (nullable = true)
 |-- 12-MFCCs4m: double (nullable = true)
 |-- 13-MFCCs5m: double (nullable = true)
 |-- 14-MFCCs6m: double (nullable = true)
 |-- 15-MFCCs7m: double (nullable = true)
 |-- 16-MFCCs8m: double (nullable = true)
 |-- 17-MFCCs9m: double (nullable = true)
 |-- 18-MFCCs10m: double (nullable = true)
 |-- 19-MFCCs11m: double (nullable = true)
 |-- 20-MFCCs12m: double (nullable = true)
 |-- 21-MFCCs13m: double (nullable = true)
 |-- 22-ChromaVector1m: double (null

In [47]:
df.groupBy('class').count().show(100)

+--------------------+-----+
|               class|count|
+--------------------+-----+
|           PsyTrance|  100|
|           HardDance|  100|
|              Breaks|  100|
|  HardcoreHardTechno|  100|
|   IndieDanceNuDisco|  100|
|              Trance|  100|
|           DeepHouse|  100|
|ElectronicaDowntempo|  100|
|           ReggaeDub|  100|
|             Minimal|  100|
|         DrumAndBass|  100|
|             Dubstep|  100|
|             BigRoom|  100|
|              Techno|  100|
|               House|  100|
|         FutureHouse|  100|
|        ElectroHouse|  100|
|           GlitchHop|  100|
|           TechHouse|  100|
|              HipHop|  100|
|           FunkRAndB|  100|
|               Dance|  100|
|    ProgressiveHouse|  100|
+--------------------+-----+



## Format Data

In [48]:
df.select([count(when(col(c).isNull(), c)).alias(c) for c in df.columns]).show()


In [49]:
input_columns = df.columns # Collect the column names as a list
input_columns = input_columns[1:-1] # keep only the feature columns: drop the first column (_c0, the row index) and the last (class, the label)

dependent_var = 'class'

In [50]:
renamed = df.withColumn('label_str', df[dependent_var].cast(StringType()))
indexer = StringIndexer(inputCol='label_str', outputCol='label')
indexed = indexer.fit(renamed).transform(renamed)
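
To make the label encoding concrete, here is a minimal pure-Python sketch of the ordering `StringIndexer` uses by default (`frequencyDesc`): the most frequent label gets index 0. The helper name `index_labels` is hypothetical, and breaking ties alphabetically is an assumption based on recent Spark versions. With 100 songs in every genre, that tie-break would be consistent with `BigRoom` receiving the label 0.0 in the output further down.

```python
from collections import Counter

def index_labels(labels):
    """Map each distinct label to a float index, most frequent label first.

    Ties are broken alphabetically (an assumption, see the note above).
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda lab: (-counts[lab], lab))
    return {lab: float(i) for i, lab in enumerate(ordered)}

# Toy labels, not the real dataset
mapping = index_labels(["Techno", "BigRoom", "BigRoom", "House", "Techno", "BigRoom"])
print(mapping)
```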

In [51]:
numeric_inputs = []
string_inputs = []
for column in input_columns:
    if isinstance(indexed.schema[column].dataType, StringType):  # robust across Spark versions, unlike comparing str(dataType)
        indexer = StringIndexer(inputCol = column, outputCol=column+'_num')
        indexed = indexer.fit(indexed).transform(indexed)
        new_col_name = column+'_num'
        string_inputs.append(new_col_name)
    else:
        numeric_inputs.append(column)
indexed.printSchema()

root
 |-- _c0: integer (nullable = true)
 |-- 1-ZCRm: double (nullable = true)
 |-- 2-Energym: double (nullable = true)
 |-- 3-EnergyEntropym: double (nullable = true)
 |-- 4-SpectralCentroidm: double (nullable = true)
 |-- 5-SpectralSpreadm: double (nullable = true)
 |-- 6-SpectralEntropym: double (nullable = true)
 |-- 7-SpectralFluxm: double (nullable = true)
 |-- 8-SpectralRolloffm: double (nullable = true)
 |-- 9-MFCCs1m: double (nullable = true)
 |-- 10-MFCCs2m: double (nullable = true)
 |-- 11-MFCCs3m: double (nullable = true)
 |-- 12-MFCCs4m: double (nullable = true)
 |-- 13-MFCCs5m: double (nullable = true)
 |-- 14-MFCCs6m: double (nullable = true)
 |-- 15-MFCCs7m: double (nullable = true)
 |-- 16-MFCCs8m: double (nullable = true)
 |-- 17-MFCCs9m: double (nullable = true)
 |-- 18-MFCCs10m: double (nullable = true)
 |-- 19-MFCCs11m: double (nullable = true)
 |-- 20-MFCCs12m: double (nullable = true)
 |-- 21-MFCCs13m: double (nullable = true)
 |-- 22-ChromaVector1m: double (null

#### Skewness

In [52]:
d = {}
# Use `c` as the loop variable so it does not shadow pyspark.sql.functions.col
for c in numeric_inputs:
    d[c] = indexed.approxQuantile(c, [0.01, 0.99], 0.25)

for c in numeric_inputs:
    skew = indexed.agg(skewness(indexed[c])).collect()
    skew = skew[0][0]
    if skew > 1:
        # Positive skew: floor/cap at the approximate 1st/99th percentiles, then take log(x + 1)
        indexed = indexed.withColumn(c, \
                                    log(when(indexed[c] < d[c][0], d[c][0])\
                                       .when(indexed[c] > d[c][1], d[c][1])\
                                       .otherwise(indexed[c]) + 1).alias(c))
        print(c, " has been treated for pos skew", skew)
    elif skew < -1:
        # Negative skew: same floor/cap, then take exp(x + 1)
        indexed = indexed.withColumn(c, \
                                    exp(when(indexed[c] < d[c][0], d[c][0])\
                                       .when(indexed[c] > d[c][1], d[c][1])\
                                       .otherwise(indexed[c]) + 1).alias(c))
        print(c, " has been treated for neg skew", skew)

7-SpectralFluxm  has been treated for pos skew 1.6396138160129063
22-ChromaVector1m  has been treated for pos skew 2.4162415204309258
23-ChromaVector2m  has been treated for pos skew 4.154796693680583
24-ChromaVector3m  has been treated for pos skew 1.1974019617504328
25-ChromaVector4m  has been treated for pos skew 2.446635863594906
26-ChromaVector5m  has been treated for pos skew 2.154482876187508
27-ChromaVector6m  has been treated for pos skew 2.01234064472543
28-ChromaVector7m  has been treated for pos skew 1.1829228989215521
29-ChromaVector8m  has been treated for pos skew 3.7372643733999955
30-ChromaVector9m  has been treated for pos skew 2.4117416421548645
31-ChromaVector10m  has been treated for pos skew 2.1979538518563233
32-ChromaVector11m  has been treated for pos skew 2.1924295373960554
33-ChromaVector12m  has been treated for pos skew 2.278981912155668
41-SpectralFluxstd  has been treated for pos skew 1.8577721462401056
56-ChromaVector1std  has been treated for pos skew 1
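
The skew treatment above can be illustrated without Spark. This is a rough sketch, assuming the moment-based (population) definition of skewness; `skewness_of` and `clip_then_log` are hypothetical helper names, the data is a toy list, and the clipping bounds are passed in by hand rather than estimated with `approxQuantile`.

```python
import math

def skewness_of(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5  # undefined for a constant column (m2 == 0)

def clip_then_log(values, lower, upper):
    """Floor at `lower`, cap at `upper`, then apply log(x + 1)."""
    return [math.log(min(max(v, lower), upper) + 1) for v in values]

raw = [1.0, 2.0, 3.0, 4.0, 100.0]            # one large outlier
treated = clip_then_log(raw, lower=1.0, upper=50.0)
print(skewness_of(raw), skewness_of(treated))
```

On this toy list the treated values are less skewed than the raw ones, which is the effect the cell above is after.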

In [53]:
minimums = df.select([min(c).alias(c) for c in df.columns if c in numeric_inputs])
min_array = minimums.select(array(numeric_inputs).alias('mins'))
df_minimum = min_array.select(array_min(min_array.mins)).collect()
df_minimum = df_minimum[0][0]
df_minimum

-30.3789543716

In [54]:
features_list = numeric_inputs + string_inputs
assembler = VectorAssembler(inputCols=features_list, outputCol='features')
output = assembler.transform(indexed).select('features','label')


In [55]:
output.show()

+--------------------+-----+
|            features|label|
+--------------------+-----+
|[0.136439587512,0...|  0.0|
|[0.117038518483,0...|  0.0|
|[0.0853077737447,...|  0.0|
|[0.103049917216,0...|  0.0|
|[0.151729948738,0...|  0.0|
|[0.127046737192,0...|  0.0|
|[0.123395302003,0...|  0.0|
|[0.140027382431,0...|  0.0|
|[0.117635200751,0...|  0.0|
|[0.137400181488,0...|  0.0|
|[0.148838734199,0...|  0.0|
|[0.119846749928,0...|  0.0|
|[0.0786813648231,...|  0.0|
|[0.138335144235,0...|  0.0|
|[0.101304207661,0...|  0.0|
|[0.132862180406,0...|  0.0|
|[0.1533035231,0.1...|  0.0|
|[0.118001599962,0...|  0.0|
|[0.110992493712,0...|  0.0|
|[0.124299777916,0...|  0.0|
+--------------------+-----+
only showing top 20 rows



In [56]:
scaler = MinMaxScaler(inputCol='features', outputCol='scaledFeatures', min = 0, max=1000)
scalerModel = scaler.fit(output)
scaled_data= scalerModel.transform(output)
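
For reference, the rescaling `MinMaxScaler` applies to each feature column can be sketched in plain Python. This follows the formula given in the Spark documentation, `(x - E_min) / (E_max - E_min) * (max - min) + min`; the function name `min_max_rescale` is hypothetical and the input is a toy column.

```python
def min_max_rescale(column, out_min=0.0, out_max=1000.0):
    """Rescale one column of numbers into [out_min, out_max]."""
    col_min, col_max = min(column), max(column)
    if col_max == col_min:
        # Constant column: the Spark docs specify 0.5 * (max + min)
        return [0.5 * (out_max + out_min)] * len(column)
    scale = (out_max - out_min) / (col_max - col_min)
    return [(x - col_min) * scale + out_min for x in column]

print(min_max_rescale([0.0, 5.0, 10.0]))
```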

In [57]:
scaled_data.show()

+--------------------+-----+--------------------+
|            features|label|      scaledFeatures|
+--------------------+-----+--------------------+
|[0.136439587512,0...|  0.0|[519.818266700239...|
|[0.117038518483,0...|  0.0|[435.295463992565...|
|[0.0853077737447,...|  0.0|[297.057129121742...|
|[0.103049917216,0...|  0.0|[374.352647734368...|
|[0.151729948738,0...|  0.0|[586.432337466259...|
|[0.127046737192,0...|  0.0|[478.897323979332...|
|[0.123395302003,0...|  0.0|[462.989461602348...|
|[0.140027382431,0...|  0.0|[535.448873531282...|
|[0.117635200751,0...|  0.0|[437.894973202184...|
|[0.137400181488,0...|  0.0|[524.003195633498...|
|[0.148838734199,0...|  0.0|[573.836456502165...|
|[0.119846749928,0...|  0.0|[447.529820358620...|
|[0.0786813648231,...|  0.0|[268.188480022892...|
|[0.138335144235,0...|  0.0|[528.076459415282...|
|[0.101304207661,0...|  0.0|[366.747280005321...|
|[0.132862180406,0...|  0.0|[504.232915471732...|
|[0.1533035231,0.1...|  0.0|[593.287780078539...|


In [58]:
final_data=scaled_data.select('label', 'scaledFeatures')
final_data = final_data.withColumnRenamed('scaledFeatures', 'features')

In [59]:
train, test= final_data.randomSplit([0.7, 0.3])

In [60]:
train.count()

1578

In [61]:
test.count()

722
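
The split is not exactly 70/30 (1578 of 2300 rows is about 68.6%) because `randomSplit` assigns rows randomly rather than cutting at a fixed position, and no seed is set above. A loose pure-Python analogy, assuming each row is assigned independently with probability 0.7 (the helper `random_split` is hypothetical and is not Spark's actual implementation):

```python
import random

def random_split(rows, train_fraction=0.7, seed=None):
    """Assign each row to train with probability `train_fraction`, else to test."""
    rng = random.Random(seed)
    train, test = [], []
    for row in rows:
        (train if rng.random() < train_fraction else test).append(row)
    return train, test

train_rows, test_rows = random_split(range(2300), seed=42)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split repeatable, which helps when comparing models across runs.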

### Modeling

In [62]:
from pyspark.ml.classification import *
from pyspark.ml.evaluation import *
from pyspark.sql.functions import *
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

In [63]:
Bin_evaluator = BinaryClassificationEvaluator(rawPredictionCol='prediction')
MC_evaluator = MulticlassClassificationEvaluator(metricName='accuracy')
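
As a reference point for the scores below, accuracy is simply the fraction of rows whose predicted label matches the true label. With 23 equally sized genres, random guessing would score about 1/23, roughly 4.35%. A minimal sketch (the `accuracy` helper is hypothetical, not part of PySpark):

```python
def accuracy(labels, predictions):
    """Fraction of positions where the prediction equals the label."""
    matches = sum(1 for y, p in zip(labels, predictions) if y == p)
    return matches / len(labels)

toy_score = accuracy([0.0, 1.0, 2.0, 2.0], [0.0, 2.0, 2.0, 2.0])  # 3 of 4 correct
chance_percent = 100 / 23  # baseline for 23 balanced classes
print(toy_score, chance_percent)
```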

In [64]:
def ClassTrainEval(classifier,features,classes, folds,train,test):

    def FindMtype(classifier):
        # Instantiate Model
        M = classifier
        # Learn what it is
        Mtype = type(M).__name__
        
        return Mtype
    
    Mtype = FindMtype(classifier)
    

    def IntanceFitModel(Mtype,classifier,classes,features,train):
        
        if Mtype == "OneVsRest":
            # instantiate the base classifier.
            lr = LogisticRegression()
            # instantiate the One Vs Rest Classifier.
            OVRclassifier = OneVsRest(classifier=lr)
#             fitModel = OVRclassifier.fit(train)
            # Add parameters of your choice here:
            paramGrid = ParamGridBuilder() \
                .addGrid(lr.regParam, [0.1, 0.01]) \
                .build()
            #Cross Validator requires the following parameters:
            crossval = CrossValidator(estimator=OVRclassifier,
                                      estimatorParamMaps=paramGrid,
                                      evaluator=MulticlassClassificationEvaluator(),
                                      numFolds=folds) # 3 is best practice
            # Run cross-validation, and choose the best set of parameters.
            fitModel = crossval.fit(train)
            return fitModel
        if Mtype == "MultilayerPerceptronClassifier":
            # specify layers for the neural network:
            # input layer of size features, two intermediate of features+1 and same size as features
            # and output of size number of classes
            # Note: crossvalidator cannot be used here
            features_count = len(features[0][0])
            layers = [features_count, features_count+1, features_count, classes]
            MPC_classifier = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=1234)
            fitModel = MPC_classifier.fit(train)
            return fitModel
        if Mtype in("LinearSVC","GBTClassifier") and classes != 2: # These classifiers currently only accept binary classification
            print(Mtype," could not be used because PySpark currently only accepts binary classification data for this algorithm")
            return
        if Mtype in("LogisticRegression","NaiveBayes","RandomForestClassifier","GBTClassifier","LinearSVC","DecisionTreeClassifier"):
  
            # Add parameters of your choice here:
            if Mtype in("LogisticRegression"):
                paramGrid = (ParamGridBuilder() \
#                              .addGrid(classifier.regParam, [0.1, 0.01]) \
                             .addGrid(classifier.maxIter, [10, 15,20])
                             .build())
                
            # Add parameters of your choice here:
            if Mtype in("NaiveBayes"):
                paramGrid = (ParamGridBuilder() \
                             .addGrid(classifier.smoothing, [0.0, 0.2, 0.4, 0.6]) \
                             .build())
                
            # Add parameters of your choice here:
            if Mtype in("RandomForestClassifier"):
                paramGrid = (ParamGridBuilder() \
                               .addGrid(classifier.maxDepth, [2, 5, 10])
#                                .addGrid(classifier.maxBins, [5, 10, 20])
#                                .addGrid(classifier.numTrees, [5, 20, 50])
                             .build())
                
            # Add parameters of your choice here:
            if Mtype in("GBTClassifier"):
                paramGrid = (ParamGridBuilder() \
#                              .addGrid(classifier.maxDepth, [2, 5, 10, 20, 30]) \
#                              .addGrid(classifier.maxBins, [10, 20, 40, 80, 100]) \
                             .addGrid(classifier.maxIter, [10, 15,50,100])
                             .build())
                
            # Add parameters of your choice here:
            if Mtype in("LinearSVC"):
                paramGrid = (ParamGridBuilder() \
                             .addGrid(classifier.maxIter, [10, 15]) \
                             .addGrid(classifier.regParam, [0.1, 0.01]) \
                             .build())
            
            # Add parameters of your choice here:
            if Mtype in("DecisionTreeClassifier"):
                paramGrid = (ParamGridBuilder() \
#                              .addGrid(classifier.maxDepth, [2, 5, 10, 20, 30]) \
                             .addGrid(classifier.maxBins, [10, 20, 40, 80, 100]) \
                             .build())
            
            #Cross Validator requires all of the following parameters:
            crossval = CrossValidator(estimator=classifier,
                                      estimatorParamMaps=paramGrid,
                                      evaluator=MulticlassClassificationEvaluator(),
                                      numFolds=folds) # use the folds argument; 3+ is best practice
            # Fit Model: Run cross-validation, and choose the best set of parameters.
            fitModel = crossval.fit(train)
            return fitModel
    
    fitModel = IntanceFitModel(Mtype,classifier,classes,features,train)
    
    # Print feature selection metrics
    if fitModel is not None:
        
        if Mtype in("OneVsRest"):
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype + '\033[0m')
            # Extract list of binary models
            models = BestModel.models
            for model in models:
                print('\033[1m' + 'Intercept: '+ '\033[0m',model.intercept,'\033[1m' + '\nCoefficients:'+ '\033[0m',model.coefficients)
                coeff_array = model.coefficients.toArray()
                coeff_scores = []
                for x in coeff_array:
                    coeff_scores.append(float(x))
                result = spark.createDataFrame(zip(input_columns,coeff_scores), schema = ['feature','coeff'])
                result.orderBy(result['coeff'].desc()).show(truncate=False)  # show() prints the table itself and returns None
        if Mtype == "MultilayerPerceptronClassifier":
            print("")
            print('\033[1m' + Mtype," Weights"+ '\033[0m')
            print('\033[1m' + "Model Weights: "+ '\033[0m',fitModel.weights.size)
            print("")

        if Mtype in("DecisionTreeClassifier", "GBTClassifier","RandomForestClassifier"):
            # FEATURE IMPORTANCES
            # Estimate of the importance of each feature.
            # Each feature’s importance is the average of its importance across all trees
            # in the ensemble. The importance vector is normalized to sum to 1.
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype," Feature Importances"+ '\033[0m')
            print("(Scores add up to 1)")
            print("Lowest score is the least important")
            print(" ")
            featureImportances = BestModel.featureImportances.toArray()
            print(featureImportances)
            
            if Mtype in("DecisionTreeClassifier"):
                global DT_featureImportances
                DT_featureImportances = BestModel.featureImportances.toArray()
                global DT_BestModel
                DT_BestModel = BestModel
            if Mtype in("GBTClassifier"):
                global GBT_featureImportances
                GBT_featureImportances = BestModel.featureImportances.toArray()
                global GBT_BestModel
                GBT_BestModel = BestModel
            if Mtype in("RandomForestClassifier"):
                global RF_featureImportances
                RF_featureImportances = BestModel.featureImportances.toArray()
                global RF_BestModel
                RF_BestModel = BestModel

        if Mtype in("LogisticRegression"):
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype," Coefficient Matrix"+ '\033[0m')
            print("You should compare these relative to each other")
            print("Coefficients: \n" + str(BestModel.coefficientMatrix))
            print("Intercept: " + str(BestModel.interceptVector))
            global LR_coefficients
            LR_coefficients = BestModel.coefficientMatrix.toArray()
            global LR_BestModel
            LR_BestModel = BestModel

        if Mtype in("LinearSVC"):
            # Get Best Model
            BestModel = fitModel.bestModel
            print(" ")
            print('\033[1m' + Mtype," Coefficients"+ '\033[0m')
            print("You should compare these relative to each other")
            print("Coefficients: \n" + str(BestModel.coefficients))
            global LSVC_coefficients
            LSVC_coefficients = BestModel.coefficients.toArray()
            global LSVC_BestModel
            LSVC_BestModel = BestModel
        
   
    # Set the column names to match the external results dataframe that we will join with later:
    columns = ['Classifier', 'Result']
    
    if Mtype in("LinearSVC","GBTClassifier") and classes != 2:
        Mtype = [Mtype] # make this a list
        score = ["N/A"]
        result = spark.createDataFrame(zip(Mtype,score), schema=columns)
    else:
        predictions = fitModel.transform(test)
        MC_evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
        accuracy = (MC_evaluator.evaluate(predictions))*100
        Mtype = [Mtype] # make this a list
        score = [str(accuracy)] # make this a string and convert to a list
        result = spark.createDataFrame(zip(Mtype,score), schema=columns)
        result = result.withColumn('Result',result.Result.substr(0, 5))
        
    return result

### Logit (Chi-Square Numfeatures=30)

In [65]:
classifier= LogisticRegression()

paramGrid = (ParamGridBuilder().addGrid(classifier.maxIter, [10, 15,20]).build())

crossval = CrossValidator(estimator=classifier,
                         estimatorParamMaps=paramGrid,
                         evaluator=MC_evaluator,
                         numFolds=2)

fitModel = crossval.fit(train)
BestModel = fitModel.bestModel

print('Intercept: ' +str(BestModel.interceptVector))
print('Coefficients: \n' + str(BestModel.coefficientMatrix))

LR_BestModel = BestModel

predictions=fitModel.transform(test)
accuracy = (MC_evaluator.evaluate(predictions))*100

print(accuracy)


Intercept: [0.006327749088245603,0.020273095974877958,0.08247165437409971,0.08622416442355908,0.12582556446955476,-0.02092723663260653,0.006835293954409371,-0.1519403123463118,-0.01810374499302218,0.034740099427522673,-0.06730159788586712,0.010501738595491077,-0.035865041966963086,-0.0696017955056091,-0.00804941095215304,-0.02194806513207668,-0.006426817940390233,-0.09374627556060107,0.03786926177207716,0.029478834094131937,-0.07403515754194856,0.12053866662936238,0.006859333654217729]
Coefficients: 
DenseMatrix([[ 5.32977819e-04,  1.92339292e-03,  1.79232188e-03, ...,
              -2.40750282e-03, -9.89613679e-04,  1.93692382e-03],
             [-3.59532315e-04,  4.57092338e-05,  5.04748462e-04, ...,
               4.56277915e-03, -3.85152745e-03,  2.14977566e-03],
             [-2.99364967e-04,  5.13046881e-04,  1.08536264e-03, ...,
              -2.20879472e-03,  3.92030817e-04, -9.65459694e-04],
             ...,
             [ 2.99706591e-04, -8.65067073e-04, -6.70341283e-04, ...

In [66]:

coeff_array = BestModel.coefficientMatrix.toArray()
coeff_scores = []
for x in coeff_array[0]:
    coeff_scores.append(float(x))

result = spark.createDataFrame(zip(input_columns,coeff_scores), schema=['feature','coeff'])
result.show(100)

+--------------------+--------------------+
|             feature|               coeff|
+--------------------+--------------------+
|              1-ZCRm| 5.32977818938724E-4|
|           2-Energym|0.001923392915402...|
|    3-EnergyEntropym|0.001792321881340...|
| 4-SpectralCentroidm|8.048828244733436E-4|
|   5-SpectralSpreadm|0.001248253380924...|
|  6-SpectralEntropym|9.813579013497574E-4|
|     7-SpectralFluxm|-0.00199969199956...|
|  8-SpectralRolloffm|3.617704284047664...|
|           9-MFCCs1m|0.003554236755111...|
|          10-MFCCs2m|-0.00175323636687...|
|          11-MFCCs3m|-2.34699846001469...|
|          12-MFCCs4m|4.491271670002251...|
|          13-MFCCs5m|3.561334868151826...|
|          14-MFCCs6m|-1.64095021625326...|
|          15-MFCCs7m|-1.95132391256584...|
|          16-MFCCs8m|-3.36765825269175...|
|          17-MFCCs9m|-5.22181906404892E-4|
|         18-MFCCs10m|9.859545719092485E-4|
|         19-MFCCs11m|-1.66984746252996...|
|         20-MFCCs12m|-4.5961665
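
One way to get creative with feature selection for a non-tree model is to rank features by the magnitude of their coefficients. The sketch below is only an illustration under assumptions: `rank_features` is a hypothetical helper, the toy matrix stands in for `BestModel.coefficientMatrix.toArray()` (one row per class), and taking the largest absolute coefficient across classes is one heuristic among several. It is only meaningful when the features share a scale, as they do here after `MinMaxScaler`.

```python
def rank_features(names, coeff_rows, k):
    """Return the k feature names with the largest absolute coefficient in any class."""
    scores = {
        name: max(abs(row[j]) for row in coeff_rows)
        for j, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy values for illustration only, not the fitted coefficients
names = ["1-ZCRm", "2-Energym", "3-EnergyEntropym"]
toy_coeffs = [
    [0.0005, 0.0019, 0.0018],
    [-0.0004, 0.00005, 0.0005],
    [-0.0003, 0.0005, -0.0030],
]
top_two = rank_features(names, toy_coeffs, k=2)
print(top_two)
```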

### One Vs Rest Classifier fwe

In [67]:
lr = LogisticRegression()
classifier = OneVsRest(classifier = lr)

paramGrid = ParamGridBuilder()\
    .addGrid(lr.regParam,[0.1,0.01]) \
    .build()

crossval=CrossValidator(estimator=classifier,
                       estimatorParamMaps=paramGrid,
                       evaluator=MulticlassClassificationEvaluator(),
                       numFolds=2)

fitModel=crossval.fit(train)

BestModel = fitModel.bestModel

models= BestModel.models
for model in models:
    print('\033[1m' + 'Intercept: '+ '\033[0m',model.intercept,'\033[1m' + '\nCoefficients:'+ '\033[0m',model.coefficients)

predictions=fitModel.transform(test)
accuracy=(MC_evaluator.evaluate(predictions))*100
print(accuracy)

[1mIntercept: [0m -5.327607821747951 [1m
Coefficients:[0m [-0.0001247469296608693,0.0010563147071388703,0.0016299668375744092,-0.00010697809951257754,0.001516056297451319,0.0004961902249819562,-0.0011904106180791114,-0.0005837362237384427,0.00343075931848717,-0.0011577457049325968,0.0008890539481378145,0.001437723271617106,0.0007785885950998867,-0.0009049575471640344,-0.0003810973707986484,-0.00035702901643329925,-0.0011060958574965177,0.0014083799065055845,-0.00032151704695149315,0.0002809155840371197,-0.0009324376622848985,0.001760021745283577,-0.0021234708913158615,0.0014969293147074868,-0.00028731966451902923,0.0005061172071978576,-0.0004123308598637224,-0.0024883974102714993,-0.0003038139469712204,-9.189626328586254e-05,-0.00012651095272910352,-0.00029507297698849305,-0.0003592977292747732,-0.0014474156432776083,-0.0020243104431017784,0.0008394868755785959,5.02712771162173e-05,-0.0023918972802158497,-0.0018315213146606545,2.863108186198744e-05,0.00011788693588364681,-0.0004750

[1mIntercept: [0m 6.356285011986955 [1m
Coefficients:[0m [0.0016750584230403249,0.003450131034641761,-0.0016126989996392215,-3.58068799602509e-05,-0.001076650050760353,-0.00019256575620239688,0.00231696558047221,0.00023592696515099362,-0.003382907943845887,-0.0016742660403461496,-0.00010326996506951217,-2.51163433931245e-05,0.0028015459406174332,-0.0005490271537591776,0.0016741383649346289,-0.0012419369457037372,0.00033134561014287826,-0.0012920057006437096,0.0001596496029678244,-0.00022125464788047599,0.0015110693131820065,-0.0015274927762545464,-0.0005360375891956395,0.00035033599365471625,-0.001893748810391613,-0.0032043287880247455,-0.0011387642913875387,0.0006428356760920399,0.0009175139224814714,0.0010969561326689941,-0.0008910811156966763,-0.0025181063031135486,-0.00028298587354630374,-0.0013953417890367238,0.00010375706838407185,-0.0023095659710905855,-0.0008718648457851036,0.0013369487010505444,0.00012769796629383734,-0.0013789339087752516,-0.0028737971857860363,-0.0014041

[1mIntercept: [0m -2.1332264584206064 [1m
Coefficients:[0m [0.0007464282488642449,-0.0033040237731031107,0.001996136175051764,0.001143007614223008,0.002222433468568251,0.00020558890206725969,0.0011330194823688997,0.0004189883297367194,0.0012871799674672084,-0.0008644087041356764,-0.0010926272455598104,0.000903067346937901,-0.0010126048105342088,-0.0009578340318615835,-0.0004471244133879923,0.001558641985216794,0.00048047004850114487,-0.0007746914019089484,0.001139269242306947,0.0011753365769881098,0.0015153684763631852,0.0004347121858176374,0.0006172772332530342,-0.001194751560032499,0.0015241714845209988,-0.00012441277873991745,0.00019085995446587592,-6.378209525718598e-05,0.00028386770611870386,0.0009948724934100619,0.0016265895111172246,-0.0005328739338815845,0.0010076683980567974,-0.0009813755752253297,-0.0010222237238687874,-0.003827189388972431,-0.002500667694925711,-0.002478498669708403,-0.0021780209183043025,-0.0009188561835560869,0.0017327491312230101,-0.000849146321205932

### Multilayer Perceptron Classifier fwe

In [68]:
features = train.select(['features']).collect()
features_count = len(features[0][0])
class_count=final_data.select(countDistinct('label')).collect()
classes = class_count[0][0]

layers = [features_count, features_count+1, features_count, classes]

classifier = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed= 1234)

fitModel = classifier.fit(train)

print('\033[1m' + "Model Weights: "+ '\033[0m',fitModel.weights.size)


predictions = fitModel.transform(test)

accuracy = (MC_evaluator.evaluate(predictions))*100

print("Accuracy: ",accuracy)

Model Weights: 12023
Accuracy:  18.144044321329638
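
The weight count printed above can be checked by hand. Assuming 71 input features and 23 classes, the layers are `[71, 72, 71, 23]`, and each pair of consecutive layers contributes `(inputs + 1) * outputs` weights, where the `+ 1` accounts for the bias term. The helper name `mlp_weight_count` is hypothetical.

```python
def mlp_weight_count(layers):
    """Total weights (including biases) of a fully connected network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

layers = [71, 72, 71, 23]
total_weights = mlp_weight_count(layers)
print(total_weights)  # 72*72 + 73*71 + 72*23 = 12023
```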


### NaiveBayes

In [69]:
classifier= NaiveBayes()
paramGrid = (ParamGridBuilder()\
             .addGrid(classifier.smoothing, [0.0,0.2,0.4,0.6])\
             .build())
crossval = CrossValidator(estimator=classifier,
                         estimatorParamMaps = paramGrid,
                         evaluator= MulticlassClassificationEvaluator(),
                         numFolds=2)

fitModel = crossval.fit(train)

predictions = fitModel.transform(test)
accuracy = (MC_evaluator.evaluate(predictions))*100
print("Accuracy: ",accuracy)

Accuracy:  40.16620498614959


### Decision Tree

In [70]:
classifier = DecisionTreeClassifier()
paramGrid = (ParamGridBuilder()\
            .addGrid(classifier.maxDepth, [2, 5, 10, 20, 30])\
            .addGrid(classifier.maxBins, [10,20,40,80, 100])\
            .build())

crossval = CrossValidator(estimator = classifier,
                         estimatorParamMaps=paramGrid,
                         evaluator=MulticlassClassificationEvaluator(),
                         numFolds=2)

fitModel = crossval.fit(train)

BestModel = fitModel.bestModel
featureImportances= BestModel.featureImportances.toArray()

print("Feature Importances: ", featureImportances)

predictions= fitModel.transform(test)
accuracy = (MC_evaluator.evaluate(predictions))*100
print("Accuracy: ", accuracy)

Feature Importances:  [0.077028   0.02745731 0.01815883 0.01406153 0.00479245 0.
 0.06583188 0.         0.05407617 0.02367465 0.01408755 0.00951782
 0.00173532 0.00341321 0.01002467 0.00459738 0.00545495 0.00886397
 0.00694405 0.00772908 0.00359039 0.00274237 0.01250895 0.0023725
 0.00594858 0.00224597 0.         0.00174741 0.         0.
 0.00685076 0.0093459  0.00183422 0.00969638 0.01520677 0.00366043
 0.0145113  0.01680881 0.02119703 0.         0.00856446 0.00891435
 0.03023958 0.03540894 0.00551505 0.01508603 0.         0.00946655
 0.00325686 0.01214712 0.00091628 0.02958179 0.01093284 0.01104048
 0.01535916 0.         0.00348096 0.00267805 0.0050712  0.00611655
 0.00793518 0.0059349  0.         0.0037584  0.00266624 0.00676977
 0.00794323 0.00617212 0.         0.03242106 0.22490626]
Accuracy:  37.257617728531855
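
To get a feel for how much work this cross-validated search does, the grid can be enumerated in plain Python. This is only a sketch: it mirrors the `maxDepth` and `maxBins` values from the cell above and assumes that `CrossValidator` fits every parameter combination once per fold and then refits the best combination on the full training set.

```python
from itertools import product

# Same values as the ParamGridBuilder call in the Decision Tree cell.
max_depth = [2, 5, 10, 20, 30]
max_bins = [10, 20, 40, 80, 100]
num_folds = 2

grid = list(product(max_depth, max_bins))
cv_fits = len(grid) * num_folds   # one fit per combination per fold
total_fits = cv_fits + 1          # plus the final refit of the best combination

print(len(grid), cv_fits, total_fits)  # 25 50 51
```

The same arithmetic applies to the other grids in this notebook, so adding a value to any one parameter list multiplies the training time accordingly.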


In [71]:
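imp_scores = []
for x in featureImportances:
    # Cast to a plain Python float. int(x), as used for the recorded output below,
    # truncates every importance (all below 1) to 0.
    imp_scores.append(float(x))

# Zip with the input_columns list and create a DataFrame
result = spark.createDataFrame(zip(input_columns,imp_scores), schema=['feature','score'])
print(result.orderBy(result["score"].desc()).show(truncate=False))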

+---------------------+-----+
|feature              |score|
+---------------------+-----+
|52-MFCCs10std        |0    |
|53-MFCCs11std        |0    |
|54-MFCCs12std        |0    |
|55-MFCCs13std        |0    |
|56-ChromaVector1std  |0    |
|57-ChromaVector2std  |0    |
|58-ChromaVector3std  |0    |
|59-ChromaVector4std  |0    |
|60-ChromaVector5std  |0    |
|61-ChromaVector6std  |0    |
|62-ChromaVector7std  |0    |
|63-ChromaVector8std  |0    |
|64-ChromaVector9std  |0    |
|65-ChromaVector10std |0    |
|66-ChromaVector11std |0    |
|67-ChromaVector12std |0    |
|68-ChromaDeviationstd|0    |
|69-BPM               |0    |
|70-BPMconf           |0    |
|71-BPMessentia       |0    |
+---------------------+-----+
only showing top 20 rows

None
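
The all-zero scores in the table above come from integer truncation: every feature importance is below 1, so casting with `int()` yields 0 for each one. A small Spark-free sketch shows the difference, using the first five importances printed by the Decision Tree cell. It assumes the first five entries of `input_columns` are the feature names shown here, in this order.

```python
# First five Decision Tree importances, copied from the output above.
importances = [0.077028, 0.02745731, 0.01815883, 0.01406153, 0.00479245]
# Assumed to be the first five entries of input_columns.
feature_names = ["1-ZCRm", "2-Energym", "3-EnergyEntropym",
                 "4-SpectralCentroidm", "5-SpectralSpreadm"]

truncated = [int(x) for x in importances]    # every value collapses to 0
preserved = [float(x) for x in importances]  # values are kept as-is

# Rank features by importance, highest first.
ranked = sorted(zip(feature_names, preserved), key=lambda pair: pair[1], reverse=True)

print(truncated)
for name, score in ranked:
    print(name, score)
```

With `float()` the ranking becomes meaningful, which is how the Random Forest cell further down builds its table.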


### Random Forest Classifier

In [72]:
classifier = RandomForestClassifier()
paramGrid = (ParamGridBuilder()\
            .addGrid(classifier.maxDepth, [2, 5, 10])
            .addGrid(classifier.maxBins, [5, 10 , 20])
            .addGrid(classifier.numTrees,[5, 20, 50])
            .build())

crossval = CrossValidator(estimator=classifier,
                         estimatorParamMaps=paramGrid,
                         evaluator=MulticlassClassificationEvaluator(),
                         numFolds=2)

fitModel = crossval.fit(train)

BestModel= fitModel.bestModel
featureImportances= BestModel.featureImportances.toArray()
print("Feature Importances: ", featureImportances)

predictions = fitModel.transform(test)

accuracy = (MC_evaluator.evaluate(predictions))*100

print(" ")
print("Accuracy: ", accuracy)

Feature Importances:  [0.02506208 0.02707725 0.02104294 0.01333907 0.01117727 0.0088462
 0.01551201 0.00714973 0.01783216 0.01718051 0.01278898 0.00884493
 0.00863997 0.00898261 0.00963838 0.00882911 0.00900272 0.00925924
 0.00920059 0.00981966 0.00934604 0.00991391 0.00928938 0.01510976
 0.00843184 0.00993823 0.00910101 0.00984084 0.00769268 0.0078826
 0.00858376 0.01045547 0.00958591 0.01167134 0.01162439 0.01115673
 0.01363387 0.01918384 0.02262271 0.01023284 0.01510046 0.01071707
 0.02171162 0.01881061 0.0114218  0.0139186  0.01279879 0.01794861
 0.01280647 0.0152012  0.01341866 0.01788879 0.01470956 0.01839441
 0.0174341  0.01013387 0.00853691 0.01045082 0.00991492 0.0102664
 0.00841088 0.01160771 0.00964672 0.00913043 0.00906054 0.01066449
 0.00825124 0.00825027 0.04579168 0.02488019 0.08819962]
 
Accuracy:  47.091412742382275


In [73]:
imp_scores = []
for x in featureImportances:
    imp_scores.append(float(x))
    
# Then zip with input_columns list and create a df
result = spark.createDataFrame(zip(input_columns,imp_scores), schema=['feature','score'])
print(result.orderBy(result["score"].desc()).show(truncate=False))

+----------------------+--------------------+
|feature               |score               |
+----------------------+--------------------+
|71-BPMessentia        |0.08819961581085413 |
|69-BPM                |0.045791676007317324|
|2-Energym             |0.027077253042407767|
|1-ZCRm                |0.025062083469539136|
|70-BPMconf            |0.02488018894380021 |
|39-SpectralSpreadstd  |0.022622713253370164|
|43-MFCCs1std          |0.02171161824923858 |
|3-EnergyEntropym      |0.021042940235161456|
|38-SpectralCentroidstd|0.019183844439266712|
|44-MFCCs2std          |0.018810613105668367|
|54-MFCCs12std         |0.018394406355297273|
|48-MFCCs6std          |0.01794860551589802 |
|52-MFCCs10std         |0.01788878922113427 |
|9-MFCCs1m             |0.01783215645769427 |
|55-MFCCs13std         |0.01743409787396502 |
|10-MFCCs2m            |0.01718051395309192 |
|7-SpectralFluxm       |0.015512013814870638|
|50-MFCCs8std          |0.015201196884346594|
|24-ChromaVector3m     |0.01510976

### Feature Selection for OneVsRest and Logistic Regression

In [74]:
from pyspark.ml.feature import ChiSqSelector
from pyspark.ml.linalg import Vectors
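
The sweep in the next cell steps `n` through `range(10, maximum, 10)`. Assuming `input_columns` holds the 71 features described in the dataset section, it is worth checking which values that range actually produces, since `range` excludes its upper bound. A minimal sketch:

```python
maximum = 71  # assumed value of len(input_columns)

# Values of n the sweep will test.
tested = list(range(10, maximum, 10))

# Optional variant that also evaluates the full feature set as a baseline.
with_full_set = tested + [maximum]

print(tested)         # [10, 20, 30, 40, 50, 60, 70]
print(with_full_set)  # [10, 20, 30, 40, 50, 60, 70, 71]
```

Including the full feature set gives a reference point for judging whether a reduced set helps or hurts.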

In [75]:
### code from LaylaAi's udemy course
classifiers=[OneVsRest(), LogisticRegression()]
maximum= len(input_columns)

for n in range(10, maximum, 10):
    print("Testing top n= ", n, " features")
    selector=ChiSqSelector(numTopFeatures=n, featuresCol="features", outputCol="selectedFeatures", labelCol= "label")
    bestFeaturesDf= selector.fit(final_data).transform(final_data)
    bestFeaturesDf= bestFeaturesDf.select('label', 'selectedFeatures')
    bestFeaturesDf= bestFeaturesDf.withColumnRenamed('selectedFeatures', 'features')
    features= bestFeaturesDf.select(['features']).collect()
    
    train,test= bestFeaturesDf.randomSplit([0.7, 0.3])
    
    folds= 2
    
    columns = ['Classifier', 'Result']
    vals = [('Place Holder', 'N/A')]
    results = spark.createDataFrame(vals, columns)
    
    for classifier in classifiers:
        new_result = ClassTrainEval(classifier, features, classes, folds, train, test)
        results = results.union(new_result)
        
    results = results.where('Classifier!="Place Holder"')
    results.show(100, False)
    

Testing top n=  10  features
 
OneVsRest
Intercept:  -2.0366462111133212
Coefficients: [-0.0005752594713940964,0.00015924626506736675,-0.006098812114142482,-0.0007171696078283604,0.00031668256889342975,0.0003065116918083959,-0.0006572134733437084,0.0016194540932313766,-0.002160900549895043,0.0018958738082606656]
+-------------------+---------------------+
|feature            |coeff                |
+-------------------+---------------------+
|10-MFCCs2m         |0.0018958738082606656|
|8-SpectralRolloffm |0.0016194540932313766|
|5-SpectralSpreadm  |3.1668256889342975E-4|
|6-SpectralEntropym |3.065116918083959E-4 |
|2-Energym          |1.5924626506736675E-4|
|1-ZCRm             |-5.752594713940964E-4|
|7-SpectralFluxm    |-6.572134733437084E-4|
|4-SpectralCentroidm|-7.171696078283604E-4|
|9-MFCCs1m          |-0.002160900549895043|
|3-EnergyEntropym   |-0.006098812114142482|
+-------------------+---------------------+

None
Intercept:  -7.696526702526525

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|6-SpectralEntropym |0.0018469078575391415 |
|7-SpectralFluxm    |0.0015848773534941154 |
|10-MFCCs2m         |0.0015036295874660225 |
|4-SpectralCentroidm|6.689644567145351E-4  |
|8-SpectralRolloffm |6.369260097688406E-4  |
|5-SpectralSpreadm  |1.0333856765245224E-4 |
|2-Energym          |-0.0011327512495431314|
|1-ZCRm             |-0.002496670971027062 |
|3-EnergyEntropym   |-0.0028112635120365955|
|9-MFCCs1m          |-0.003471089729144676 |
+-------------------+----------------------+

None
Intercept:  -4.364303455367623
Coefficients: [0.002106843876834983,-0.0009413279069609462,0.0010472562588161082,0.001105462126698673,0.0022577325138016216,-0.0005642568877830937,-0.0003752404816965884,-0.0009909706749680568,0.002620664731191733,-0.00393421361637113]
+-------------------+---------------------+
|feature            |coeff          

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|10-MFCCs2m         |0.00945573467687867   |
|3-EnergyEntropym   |0.002708089324040792  |
|9-MFCCs1m          |0.002048432797760987  |
|2-Energym          |0.0013348631720791807 |
|7-SpectralFluxm    |-1.0933977644056485E-5|
|6-SpectralEntropym |-2.8517706421354056E-5|
|4-SpectralCentroidm|-0.0016558004213730413|
|8-SpectralRolloffm |-0.002627303354901193 |
|5-SpectralSpreadm  |-0.004123132554873408 |
|1-ZCRm             |-0.004879326427989962 |
+-------------------+----------------------+

None
Intercept:  -3.8660508043921684
Coefficients: [-0.0031684606993591822,8.729347942811146e-05,0.0006757130057539147,0.005101858074925588,0.0031003485356038184,-0.0015738816933057033,0.0010403134638884634,-0.001968130688719553,0.0008932706108497055,-0.003180422902403353]
+-------------------+----------------------+
|feature            |coeff       

+------------------+------+
|Classifier        |Result|
+------------------+------+
|OneVsRest         |25.07 |
|LogisticRegression|23.52 |
+------------------+------+

Testing top n=  20  features
 
OneVsRest
Intercept:  -2.1008225230345663
Coefficients: [0.00286267216669078,0.0004926217703195879,-0.004954817370643486,7.630801129232283e-05,-0.0007111647274742342,-0.0008244146966446949,0.0006729042855091452,0.00087740185778574,-0.0002580826042051339,-0.0021525201729227766,-0.0027554479340011874,-0.0009153243878235367,-0.0005515557775277354,-0.0019488774500770864,0.0013853388617495008,-0.00011001545781036007,-0.0004023335414033334,0.0011744889263547039,-0.002930917929185269,0.002372773350737075]
+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|1-ZCRm             |0.00286267216669078   |
|20-MFCCs12m        |0.002372773350737075  |
|15-MFCCs7m         |0.0013853388617495008 |
|

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|19-MFCCs11m        |0.0047717784673351514 |
|1-ZCRm             |0.0026704667171506234 |
|13-MFCCs5m         |0.0024086328094465475 |
|12-MFCCs4m         |0.0017833073787380084 |
|2-Energym          |0.0017057744542218597 |
|15-MFCCs7m         |0.0013805981610914679 |
|6-SpectralEntropym |0.0010613751454456601 |
|14-MFCCs6m         |8.280809874153225E-4  |
|7-SpectralFluxm    |7.270487086941188E-4  |
|5-SpectralSpreadm  |3.0865780295822524E-4 |
|20-MFCCs12m        |-5.350846174535463E-5 |
|8-SpectralRolloffm |-7.873031612597696E-4 |
|10-MFCCs2m         |-0.0010971531835923033|
|4-SpectralCentroidm|-0.0018244147762976942|
|9-MFCCs1m          |-0.001967298399720317 |
|11-MFCCs3m         |-0.0021849558681757905|
|18-MFCCs10m        |-0.002280379763475768 |
|16-MFCCs8m         |-0.0026723042551308057|
|17-MFCCs9m         |-0.0027730627711736277|
|3-EnergyE

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|9-MFCCs1m          |0.0037307100164069716 |
|1-ZCRm             |0.0033961183215566243 |
|14-MFCCs6m         |0.00203306563842808   |
|2-Energym          |0.0014656047775375449 |
|5-SpectralSpreadm  |0.001320519432577488  |
|20-MFCCs12m        |0.0011470170930193869 |
|19-MFCCs11m        |9.705673440627657E-4  |
|6-SpectralEntropym |-5.233451393106982E-7 |
|18-MFCCs10m        |-4.931793888098816E-5 |
|15-MFCCs7m         |-8.438943778439616E-5 |
|8-SpectralRolloffm |-3.7093811821052293E-4|
|17-MFCCs9m         |-7.251643387094838E-4 |
|3-EnergyEntropym   |-8.299109708394802E-4 |
|13-MFCCs5m         |-9.390410518098272E-4 |
|10-MFCCs2m         |-0.0019496314744236912|
|7-SpectralFluxm    |-0.0027865126085654304|
|16-MFCCs8m         |-0.0028346391396022867|
|4-SpectralCentroidm|-0.00339444556087742  |
|11-MFCCs3m         |-0.004405615783980581 |
|12-MFCCs4

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|20-MFCCs12m        |0.009377629754641147  |
|3-EnergyEntropym   |0.0063507151115192615 |
|19-MFCCs11m        |0.0028452130831782985 |
|16-MFCCs8m         |0.001506491990111941  |
|9-MFCCs1m          |0.001357417925473597  |
|8-SpectralRolloffm |5.583983821978561E-4  |
|7-SpectralFluxm    |4.2706901807027724E-4 |
|10-MFCCs2m         |3.2051868967084696E-4 |
|6-SpectralEntropym |2.9431470269891094E-4 |
|5-SpectralSpreadm  |5.586892365886732E-6  |
|1-ZCRm             |-2.0032203101245502E-5|
|17-MFCCs9m         |-3.7279582551058213E-4|
|18-MFCCs10m        |-5.565409010457123E-4 |
|13-MFCCs5m         |-6.216327633449585E-4 |
|11-MFCCs3m         |-9.028046905047454E-4 |
|2-Energym          |-0.001266473739593807 |
|14-MFCCs6m         |-0.0014876873649051485|
|12-MFCCs4m         |-0.002369561606213452 |
|15-MFCCs7m         |-0.0047270246626332   |
|4-Spectra

+------------------+------+
|Classifier        |Result|
+------------------+------+
|OneVsRest         |36.84 |
|LogisticRegression|35.51 |
+------------------+------+

Testing top n=  30  features
 
OneVsRest
Intercept:  -1.924032197145241
Coefficients: [0.002517175978528328,-0.00037807548666490815,-0.0035053069671841996,-0.0012753544719304366,0.0005079358806115005,0.0015589824300466428,-0.00014724749780893102,-0.00031528108265526674,-0.0002673501603246808,0.0013925644046746165,-0.0019888291313168703,0.0009425033023330217,-0.0005505293579902523,0.0012845325298368121,8.281909812467854e-05,-0.0015116995893914521,-0.0018147156278705535,-0.0025063650911367197,-0.0009557411625433652,-0.0030797829365719355,-0.0002863562963964347,-0.0006549180838458839,0.002605906355929248,0.0004229823501507851,0.0005726985528116581,0.0001809130890151963,-0.000719128396813578,0.000608685137673554,-0.0031575175583259544,0.0016656767480532312]
+------------------+----------------------

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|1-ZCRm             |0.0043897932530666955 |
|19-MFCCs11m        |0.0025530292504365713 |
|5-SpectralSpreadm  |0.002305672299564812  |
|4-SpectralCentroidm|0.0021515422269254483 |
|10-MFCCs2m         |0.0021024941719164454 |
|29-ChromaVector8m  |0.0018255390455092387 |
|20-MFCCs12m        |0.0018006031295034666 |
|11-MFCCs3m         |0.0016085676529183784 |
|25-ChromaVector4m  |0.0014012445891652433 |
|22-ChromaVector1m  |9.238864417751937E-4  |
|16-MFCCs8m         |8.323271210823475E-4  |
|17-MFCCs9m         |2.0700356950793084E-4 |
|6-SpectralEntropym |1.7846843721094005E-4 |
|15-MFCCs7m         |2.6392959749077904E-5 |
|28-ChromaVector7m  |-1.0989933815625728E-4|
|2-Energym          |-1.161390096604314E-4 |
|24-ChromaVector3m  |-1.4985443946927048E-4|
|9-MFCCs1m          |-7.035167042805372E-4 |
|26-ChromaVector5m  |-8.290856724987755E-4 |
|21-MFCCs1

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|1-ZCRm             |0.002773425595008094  |
|20-MFCCs12m        |0.0020014325581377665 |
|30-ChromaVector9m  |0.0018842627451600585 |
|19-MFCCs11m        |0.0016482815739029587 |
|24-ChromaVector3m  |0.0014783956240008971 |
|23-ChromaVector2m  |0.0014328759322604523 |
|26-ChromaVector5m  |0.0014123998572373223 |
|27-ChromaVector6m  |0.0011994428999526022 |
|18-MFCCs10m        |7.840243819221333E-4  |
|25-ChromaVector4m  |7.782757446767892E-4  |
|10-MFCCs2m         |6.726432572978974E-4  |
|28-ChromaVector7m  |2.983334801896148E-4  |
|2-Energym          |2.2042642475880172E-4 |
|8-SpectralRolloffm |9.103779795143623E-5  |
|14-MFCCs6m         |3.6666330349039266E-5 |
|13-MFCCs5m         |-2.378936516912463E-4 |
|12-MFCCs4m         |-4.772251090607962E-4 |
|21-MFCCs13m        |-8.484923736784485E-4 |
|11-MFCCs3m         |-0.0011751257523725362|
|4-Spectra

+-------------------+----------------------+
|feature            |coeff                 |
+-------------------+----------------------+
|5-SpectralSpreadm  |0.0030822080921069544 |
|24-ChromaVector3m  |0.002349813569941817  |
|2-Energym          |0.0017976641020564666 |
|30-ChromaVector9m  |0.0017790288928057406 |
|20-MFCCs12m        |0.0017724811130956331 |
|19-MFCCs11m        |0.001492825130356284  |
|23-ChromaVector2m  |0.0010497776235379997 |
|10-MFCCs2m         |0.0010486628603899983 |
|11-MFCCs3m         |7.253195980378145E-4  |
|25-ChromaVector4m  |6.649135157448335E-4  |
|28-ChromaVector7m  |5.330152717170127E-4  |
|27-ChromaVector6m  |4.581613849707973E-4  |
|18-MFCCs10m        |4.320358252440539E-4  |
|7-SpectralFluxm    |1.7557820145133626E-4 |
|4-SpectralCentroidm|1.3235092325665574E-4 |
|21-MFCCs13m        |9.312554529702155E-5  |
|6-SpectralEntropym |8.031385804150624E-5  |
|9-MFCCs1m          |-1.25171057433404E-4  |
|26-ChromaVector5m  |-2.5211455567651436E-4|
|1-ZCRm   

+------------------+----------------------+
|feature           |coeff                 |
+------------------+----------------------+
|21-MFCCs13m       |0.004860925429074414  |
|13-MFCCs5m        |0.003328521532535647  |
|18-MFCCs10m       |0.003192743296086016  |
|23-ChromaVector2m |0.0030837203469632987 |
|15-MFCCs7m        |0.002955251829889862  |
|7-SpectralFluxm   |0.0026232871523810194 |
|16-MFCCs8m        |0.0025619978571331495 |
|22-ChromaVector1m |0.0022215569588991345 |
|29-ChromaVector8m |0.001496096463964717  |
|12-MFCCs4m        |0.0013763399964756276 |
|10-MFCCs2m        |0.001081670086565484  |
|17-MFCCs9m        |0.0010146027026375851 |
|25-ChromaVector4m |9.673521854229653E-4  |
|8-SpectralRolloffm|9.403486940962194E-4  |
|26-ChromaVector5m |8.619599115126066E-4  |
|28-ChromaVector7m |6.746315520552448E-4  |
|27-ChromaVector6m |4.663719285095777E-4  |
|6-SpectralEntropym|-2.3049741778658652E-4|
|11-MFCCs3m        |-5.762789063593409E-4 |
|9-MFCCs1m         |-6.581024264

+------------------+------+
|Classifier        |Result|
+------------------+------+
|OneVsRest         |37.06 |
|LogisticRegression|34.48 |
+------------------+------+

Testing top n=  40  features
 
OneVsRest
Intercept:  -2.2562214497522755
Coefficients: [-0.0009274231754282655,0.002789017772166363,0.0018660212026823486,0.0004801928741052842,-0.0030998925338130605,-0.00185747494117412,0.0007471992132751277,0.0003654888051812797,0.000649833152100381,-0.000593372292294594,-0.000545652897800325,0.001472515574817544,-0.00093726405401186,-0.0016946941543825355,0.0012799741146054691,-0.0007693530245834713,-0.0002708073904519862,0.0006057056474142223,-0.0018401439651018461,0.0012542866320235729,0.0002435286919647108,-0.001778506868442974,-0.0005548216699992437,-0.00277785533312121,-0.0015496072857675055,0.0021542709774703137,-0.002963864978156455,-0.0005415523029688809,-0.00043138639520712795,-0.0016380952491339726,0.0013182696288319645,0.001352765714748459,0.0006811

+----------------------+----------------------+
|feature               |coeff                 |
+----------------------+----------------------+
|2-Energym             |0.004939886862373297  |
|26-ChromaVector5m     |0.003182433103372555  |
|25-ChromaVector4m     |0.0029324562671883864 |
|13-MFCCs5m            |0.002433542343304065  |
|6-SpectralEntropym    |0.002248093230620973  |
|7-SpectralFluxm       |0.0022167909796952955 |
|39-SpectralSpreadstd  |0.00210581516471157   |
|16-MFCCs8m            |0.0015851392832844714 |
|22-ChromaVector1m     |0.0011793392905491324 |
|1-ZCRm                |0.0011557732338565712 |
|35-ZCRstd             |0.0010781208152626078 |
|11-MFCCs3m            |6.82191362475763E-4   |
|30-ChromaVector9m     |6.632652144092314E-4  |
|27-ChromaVector6m     |5.977368623574103E-4  |
|17-MFCCs9m            |5.79753993440897E-4   |
|14-MFCCs6m            |4.827232691615228E-4  |
|38-SpectralCentroidstd|3.81876158939347E-4   |
|36-Energystd          |-5.1564964425805

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|21-MFCCs13m           |0.002820887263494494 |
|37-EnergyEntropystd   |0.002365577613307476 |
|17-MFCCs9m            |0.002331626007943838 |
|31-ChromaVector10m    |0.0022293053794738013|
|30-ChromaVector9m     |0.002169450309167539 |
|38-SpectralCentroidstd|0.0020009161155860196|
|33-ChromaVector12m    |0.0019101469196434072|
|7-SpectralFluxm       |0.0016796183227855434|
|8-SpectralRolloffm    |0.0015999542251270125|
|25-ChromaVector4m     |0.001534519651456029 |
|34-ChromaDeviationm   |0.0014792053068444473|
|15-MFCCs7m            |0.0012777383678409734|
|12-MFCCs4m            |0.0012604758194232795|
|24-ChromaVector3m     |0.0010935059443954306|
|11-MFCCs3m            |0.0010692226703075856|
|6-SpectralEntropym    |9.792936355385196E-4 |
|28-ChromaVector7m     |7.509932881311106E-4 |
|23-ChromaVector2m     |7.080400718950668E-4 |
|35-ZCRstd   

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|39-SpectralSpreadstd  |0.003932689196982731 |
|18-MFCCs10m           |0.00333751420182502  |
|27-ChromaVector6m     |0.00314716409680132  |
|33-ChromaVector12m    |0.0029495700894091263|
|30-ChromaVector9m     |0.002592848462942845 |
|22-ChromaVector1m     |0.0022914337979152983|
|2-Energym             |0.0017987008130465716|
|19-MFCCs11m           |0.0017737902073206339|
|3-EnergyEntropym      |0.0016437093125167576|
|32-ChromaVector11m    |0.0011212786222626664|
|28-ChromaVector7m     |9.476607798308863E-4 |
|35-ZCRstd             |6.209664220403514E-4 |
|17-MFCCs9m            |5.143941713008789E-4 |
|4-SpectralCentroidm   |4.7310305527453034E-4|
|20-MFCCs12m           |4.083621826678891E-4 |
|38-SpectralCentroidstd|2.2053702749964734E-4|
|9-MFCCs1m             |1.762324523755532E-4 |
|31-ChromaVector10m    |1.2550889488194874E-4|
|24-ChromaVec

+---------------------+----------------------+
|feature              |coeff                 |
+---------------------+----------------------+
|40-SpectralEntropystd|0.007390392857434433  |
|5-SpectralSpreadm    |0.004819741469687826  |
|39-SpectralSpreadstd |0.0036961588870171327 |
|19-MFCCs11m          |0.003622107409728093  |
|37-EnergyEntropystd  |0.003421610808552074  |
|13-MFCCs5m           |0.0017654265044512159 |
|18-MFCCs10m          |0.00128855712394262   |
|11-MFCCs3m           |0.0012398198082108538 |
|21-MFCCs13m          |0.0011135891391014184 |
|34-ChromaDeviationm  |6.15444361568049E-4   |
|29-ChromaVector8m    |3.9349664601979334E-4 |
|26-ChromaVector5m    |2.8413625182055897E-4 |
|2-Energym            |3.3857840471531157E-6 |
|30-ChromaVector9m    |-1.9645341597874694E-5|
|17-MFCCs9m           |-6.283250193217028E-5 |
|36-Energystd         |-7.574506015239332E-5 |
|4-SpectralCentroidm  |-1.925390823131061E-4 |
|23-ChromaVector2m    |-3.006157831401342E-4 |
|10-MFCCs2m  

+---------------------+----------------------+
|feature              |coeff                 |
+---------------------+----------------------+
|40-SpectralEntropystd|0.0024038210016886254 |
|3-EnergyEntropym     |0.0021487122982431477 |
|37-EnergyEntropystd  |0.002000480404060808  |
|16-MFCCs8m           |0.001673743063598706  |
|14-MFCCs6m           |0.0015247152471011816 |
|10-MFCCs2m           |0.0014753122230766673 |
|12-MFCCs4m           |0.0014133721149080323 |
|21-MFCCs13m          |0.001328989355904975  |
|20-MFCCs12m          |0.0012129643493611394 |
|18-MFCCs10m          |9.117059455872213E-4  |
|13-MFCCs5m           |8.956210406596458E-4  |
|15-MFCCs7m           |8.567364092339191E-4  |
|1-ZCRm               |8.535410278396186E-4  |
|17-MFCCs9m           |7.378500138635622E-4  |
|5-SpectralSpreadm    |7.153103491221146E-4  |
|4-SpectralCentroidm  |5.725435908412531E-4  |
|36-Energystd         |3.293606422658954E-4  |
|34-ChromaDeviationm  |3.1787499199623516E-4 |
|33-ChromaVec

+------------------+------+
|Classifier        |Result|
+------------------+------+
|OneVsRest         |41.43 |
|LogisticRegression|37.62 |
+------------------+------+
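
The result tables printed so far (n = 10 through 40) can be collected into one structure to compare feature counts side by side. The sketch below copies those accuracies verbatim; results for larger `n` are not included here. Because `randomSplit` is called without a seed inside the loop, each `n` is evaluated on a different train/test split, so small differences between rows may partly reflect sampling noise.

```python
# Accuracy (%) per number of selected features, copied from the tables above.
results = {
    10: {"OneVsRest": 25.07, "LogisticRegression": 23.52},
    20: {"OneVsRest": 36.84, "LogisticRegression": 35.51},
    30: {"OneVsRest": 37.06, "LogisticRegression": 34.48},
    40: {"OneVsRest": 41.43, "LogisticRegression": 37.62},
}

# For each classifier, find the feature count with the highest accuracy so far.
best_n = {
    clf: max(results, key=lambda n: results[n][clf])
    for clf in ("OneVsRest", "LogisticRegression")
}

for clf, n in best_n.items():
    print(clf, n, results[n][clf])
```

Among these four settings, both classifiers score highest with 40 selected features.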

Testing top n=  50  features
 
OneVsRest
Intercept:  -0.6517729811935141
Coefficients: [-0.00025592426558317604,0.002299151596031069,0.0012628734329632892,0.00021739479523206082,-0.002780616966324896,-0.0006098179327209569,-7.892185502088389e-05,0.0006602172523825218,0.0008095627406122862,-0.00090920802853373,0.0009298340821089593,-0.0001820915361696027,-0.0009208782853387182,0.00046460821263145496,-0.0002025426665417119,-0.001528361434610846,0.0010774713691462027,-0.0037181519658483807,-0.0004908919128651975,0.0007284530050607916,-0.0016070837439548645,-0.0022498535455055335,0.0013815176277208853,-0.0009965804846215768,-0.00045444332632654273,-0.001678137488666469,-0.0017364314670007001,0.0003841029205702367,-0.000197058141780164,-0.0020425636660032723,-0.0013627681306363018,-1.4555276808302632

+---------------------+---------------------+
|feature              |coeff                |
+---------------------+---------------------+
|22-ChromaVector1m    |0.003740964224585474 |
|6-SpectralEntropym   |0.003278446479080597 |
|30-ChromaVector9m    |0.00242597479494617  |
|28-ChromaVector7m    |0.0023652250516026834|
|32-ChromaVector11m   |0.0014741131743617527|
|50-MFCCs8std         |0.0014433068409009252|
|26-ChromaVector5m    |0.0014293114341790859|
|19-MFCCs11m          |0.0014230142861160911|
|44-MFCCs2std         |0.0013338774865719197|
|17-MFCCs9m           |0.0013318231272475983|
|7-SpectralFluxm      |0.0012550450233560085|
|43-MFCCs1std         |9.229871241734239E-4 |
|12-MFCCs4m           |8.073853252406858E-4 |
|20-MFCCs12m          |7.889545955961047E-4 |
|27-ChromaVector6m    |7.405215570731151E-4 |
|39-SpectralSpreadstd |5.945584046841002E-4 |
|37-EnergyEntropystd  |5.675087523117269E-4 |
|40-SpectralEntropystd|4.6022920202303685E-4|
|10-MFCCs2m           |3.812080616

Intercept:  -9.416949922308481
Coefficients: [0.0004354003757894583,-0.0039583863032910315,-0.0020129709211397116,-5.7711268656914644e-05,-0.002474068859776732,0.0013180466649980527,0.0026786250899213755,0.001487886789838939,0.0013475280998868383,0.00043569920884756056,-0.001383265168697828,0.0007679793942473733,0.0015485673162806699,0.0017567263512477131,-0.00036870251024435154,0.00012551368622606027,0.0009534956736873512,0.0007133248745181945,0.0011363311017936958,-0.0014942871117531256,-0.0004543201358056702,0.0006881935890136723,-0.0025032220385023236,0.0018653682292143032,0.0017600175433383403,-0.00532476589543474,-0.00013548159681048302,0.0011320483425215242,4.06327802620041e-05,-0.001014727270453454,0.00036503227657492285,0.002048581401560435,-0.0024366748920776255,0.0008336639785948841,-0.0005396960547608308,0.0010226426424453747,-0.0013627281687147743,0.0015920944389649476,0.002506516202436939,0.0005335467211086842,0.0021945073176211074,0.0012112944732514857,0

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|35-ZCRstd             |0.003660123787708856 |
|20-MFCCs12m           |0.00352608565460104  |
|41-SpectralFluxstd    |0.00318761379323032  |
|49-MFCCs7std          |0.003172348636044463 |
|22-ChromaVector1m     |0.0031160930703054493|
|3-EnergyEntropym      |0.0023977379259759518|
|28-ChromaVector7m     |0.0023461266723203075|
|26-ChromaVector5m     |0.0021144475957272806|
|40-SpectralEntropystd |0.0015889873666707407|
|38-SpectralCentroidstd|0.001460763626007456 |
|31-ChromaVector10m    |0.0010113513890199176|
|7-SpectralFluxm       |7.962569998400938E-4 |
|37-EnergyEntropystd   |7.305559264522355E-4 |
|36-Energystd          |6.587081931203105E-4 |
|10-MFCCs2m            |4.666468546922877E-4 |
|44-MFCCs2std          |4.261745302192251E-4 |
|11-MFCCs3m            |4.1756885871645927E-4|
|2-Energym             |3.3727262225611936E-4|
|19-MFCCs11m 

+---------------------+---------------------+
|feature              |coeff                |
+---------------------+---------------------+
|6-SpectralEntropym   |0.002945805791154816 |
|30-ChromaVector9m    |0.0025696202347820288|
|28-ChromaVector7m    |0.002360482351996938 |
|16-MFCCs8m           |0.002204500542152425 |
|25-ChromaVector4m    |0.001974378598960497 |
|8-SpectralRolloffm   |0.0016263061724429608|
|19-MFCCs11m          |0.0014449846722661647|
|33-ChromaVector12m   |0.0012743311477784484|
|31-ChromaVector10m   |0.0012213093310186485|
|24-ChromaVector3m    |0.0012145296784825072|
|13-MFCCs5m           |0.001022168374451378 |
|46-MFCCs4std         |0.0010137014030156097|
|40-SpectralEntropystd|9.81616107096124E-4  |
|20-MFCCs12m          |9.802174830957026E-4 |
|50-MFCCs8std         |9.770703927201219E-4 |
|39-SpectralSpreadstd |8.695184253409255E-4 |
|44-MFCCs2std         |8.365973236911786E-4 |
|43-MFCCs1std         |7.6770215834278E-4   |
|21-MFCCs13m          |7.401605954

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|35-ZCRstd             |0.002160753425184763 |
|17-MFCCs9m            |0.002015312584808261 |
|8-SpectralRolloffm    |0.0019986632731626448|
|33-ChromaVector12m    |0.001821944335069351 |
|28-ChromaVector7m     |0.0017474405544106342|
|16-MFCCs8m            |0.0016040908338711888|
|50-MFCCs8std          |0.0011864646421777736|
|10-MFCCs2m            |0.0011602482928320286|
|25-ChromaVector4m     |0.001119226633163608 |
|30-ChromaVector9m     |0.0010126804394660744|
|21-MFCCs13m           |9.994797009050711E-4 |
|18-MFCCs10m           |9.480109978926758E-4 |
|46-MFCCs4std          |9.216729254380266E-4 |
|42-SpectralRolloffstd |8.141701765163606E-4 |
|23-ChromaVector2m     |7.647796166743551E-4 |
|6-SpectralEntropym    |6.963799802937525E-4 |
|38-SpectralCentroidstd|6.484314814262847E-4 |
|11-MFCCs3m            |6.271130835760626E-4 |
|5-SpectralSp

Intercept:  -6.02089695781934
Coefficients: [-0.0005139377873127886,-0.00010732736087360269,0.003142402673207145,-0.001086189483856698,-0.0004302271656190057,-0.002052948399000394,0.001140933059614308,0.00025377341445579585,0.0029038695084857855,-0.0002206765131427839,-0.0011239779873955789,-0.0022469543011024375,0.0008420528172409565,0.0011026534916779322,-0.0010702831470398987,0.0006552716217313484,-0.0006384883936517555,0.0011836406140642573,0.00014044028961531389,-0.00011504483633765505,0.0013523396808976917,0.00035397771861354053,-0.00011740944587970154,-0.00028066153131906457,0.0007061347970614412,-0.0001733423871633174,-0.0015401149760194378,-0.0012267797429642232,0.00027028301644387476,-0.0019227384551531736,0.0001993177359922104,0.001621834566655645,-0.0024785076214114644,0.0010437822851534846,0.0011655474829538183,0.0013632230476924873,-0.0007818446815695272,-0.001261321529890949,0.0007287181006196793,0.0013279259621895433,-0.0007505976173601193,-0.0008318901

Intercept:  -4.3827783611142355
Coefficients: [-0.0006061723636697996,0.0029023343025249104,0.001580388634820583,3.6283702771908875e-05,0.0010394161385585742,-0.003457125249879477,-0.0025420137034595403,0.0015684002370335134,0.0013728523625403514,0.00041905874082275165,-0.0014812157675005244,-0.0015260182459134373,0.00024221707686808658,0.00038021263815317093,0.0011787715211796761,-0.0017258547816803218,-0.0004826238457190414,0.0012721122881402332,0.00046761991287185445,0.0007093436138532297,-0.001098724932588719,-0.0007404348357722665,0.00014493874877262545,-0.0002850365606819478,-4.345339357256193e-05,0.0007449608448729709,0.0001697961114383726,0.0002877859045105302,-0.0002847818493304652,-0.001397229910025995,-0.0011667105287225588,0.0006859986412717353,0.0005565004217862202,-0.0010928554917902964,-0.0014390392158321408,0.00020040607457509564,-0.0016310558276199552,0.0040184540658621554,0.00029346307605412496,0.0007978502649403102,0.0013468650043283632,-0.0009686172

+--------------------+---------------------+
|feature             |coeff                |
+--------------------+---------------------+
|3-EnergyEntropym    |0.004710318053781462 |
|60-ChromaVector5std |0.002020146968100716 |
|37-EnergyEntropystd |0.0018763100957510978|
|2-Energym           |0.0017543445857756996|
|34-ChromaDeviationm |0.0016610389627866868|
|36-Energystd        |0.0016335032755302676|
|48-MFCCs6std        |0.001354898364386598 |
|55-MFCCs13std       |0.001340570386893193 |
|52-MFCCs10std       |0.0013145411231634495|
|32-ChromaVector11m  |0.0012283087165193857|
|53-MFCCs11std       |0.0012130476294986243|
|56-ChromaVector1std |0.0011726379997182184|
|54-MFCCs12std       |0.0010307397068017288|
|21-MFCCs13m         |0.001002554899902844 |
|39-SpectralSpreadstd|9.528013892276358E-4 |
|29-ChromaVector8m   |8.95777485201551E-4  |
|35-ZCRstd           |8.233250318120177E-4 |
|45-MFCCs3std        |7.983351234771344E-4 |
|51-MFCCs9std        |7.612100769325016E-4 |
|41-Spectr

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|51-MFCCs9std          |0.0031457886001516667|
|41-SpectralFluxstd    |0.00267772407052382  |
|8-SpectralRolloffm    |0.0026549219144214524|
|60-ChromaVector5std   |0.002346752724801387 |
|34-ChromaDeviationm   |0.0020576229728437195|
|38-SpectralCentroidstd|0.0017663027364699705|
|16-MFCCs8m            |0.0017360089670905765|
|31-ChromaVector10m    |0.001658075379028541 |
|48-MFCCs6std          |0.0014577689682980582|
|49-MFCCs7std          |0.0013390909935871403|
|36-Energystd          |0.001150275467655392 |
|21-MFCCs13m           |9.708340399758433E-4 |
|3-EnergyEntropym      |8.664669297468573E-4 |
|57-ChromaVector2std   |7.464197374373925E-4 |
|22-ChromaVector1m     |6.81768495332724E-4  |
|32-ChromaVector11m    |6.596204093893387E-4 |
|56-ChromaVector1std   |6.393606989385708E-4 |
|11-MFCCs3m            |5.704551227424666E-4 |
|1-ZCRm      

+-------------------+---------------------+
|feature            |coeff                |
+-------------------+---------------------+
|60-ChromaVector5std|0.006453764949876246 |
|55-MFCCs13std      |0.0033622735811666008|
|25-ChromaVector4m  |0.00335477236258262  |
|6-SpectralEntropym |0.0033209025582756594|
|37-EnergyEntropystd|0.003219812508330832 |
|59-ChromaVector4std|0.002966895364742471 |
|44-MFCCs2std       |0.0016041043121661032|
|58-ChromaVector3std|0.001489853933595376 |
|56-ChromaVector1std|0.0013600873962374216|
|51-MFCCs9std       |0.0012426905843151653|
|29-ChromaVector8m  |0.0010764277506176967|
|16-MFCCs8m         |8.345630969480105E-4 |
|14-MFCCs6m         |8.268058135431907E-4 |
|27-ChromaVector6m  |7.902425349723696E-4 |
|9-MFCCs1m          |6.697143415594907E-4 |
|23-ChromaVector2m  |6.105367898305436E-4 |
|32-ChromaVector11m |4.7194828896683096E-4|
|36-Energystd       |4.2020607193909456E-4|
|22-ChromaVector1m  |3.740610872443714E-4 |
|54-MFCCs12std      |3.149734573

+-------------------+---------------------+
|feature            |coeff                |
+-------------------+---------------------+
|21-MFCCs13m        |0.002989868503101851 |
|4-SpectralCentroidm|0.0023011231171182383|
|15-MFCCs7m         |0.002226764682290678 |
|53-MFCCs11std      |0.0021105498371833094|
|37-EnergyEntropystd|0.0018420398607064696|
|58-ChromaVector3std|0.0018272424657936781|
|50-MFCCs8std       |0.0018112248159622385|
|55-MFCCs13std      |0.0017434940163050343|
|19-MFCCs11m        |0.0017410412738946659|
|27-ChromaVector6m  |0.0016837810182322571|
|3-EnergyEntropym   |0.0015944056665101842|
|60-ChromaVector5std|0.0015135095362086447|
|17-MFCCs9m         |0.0012015576049986918|
|26-ChromaVector5m  |0.001171172285076394 |
|6-SpectralEntropym |9.429304547485125E-4 |
|1-ZCRm             |9.420341200220409E-4 |
|16-MFCCs8m         |7.788365452265335E-4 |
|48-MFCCs6std       |5.458480338137726E-4 |
|45-MFCCs3std       |4.176444524493252E-4 |
|54-MFCCs12std      |3.908242882

Intercept: 1.1724323623168063
Coefficients: [-0.0014632463070668646,-0.0037293620613081564,0.0005479871183454294,-0.001612604123048397,0.0007273678228066464,-0.0014355941819046325,-0.001533815681633388,-0.0008890009482329947,-0.0028718465254600504,0.0022499811840399468,-0.0001015618464862203,-3.7919313542739414e-05,-0.0019670932953473504,0.0003242510493073059,-0.0010460995230661867,8.392054779456661e-05,-0.0032953376633637816,-0.0007031979999083409,-0.0009452968475220557,-0.0006193884293434992,0.00019713506881644302,0.0013357271141036223,0.0008621110970351105,0.0001606642761690714,-0.0010306143928690756,0.0014141419516106625,-0.00027198795923100556,0.00023567758397163084,0.0026986560922194615,-0.00025159350939563043,-0.00038051437284687754,0.001628992741814989,0.0021400782103102554,0.0008709154985120098,0.0009214011147490088,0.0032923434623873963,-0.00036541836319322674,0.0021831396090225557,-9.35666048605307e-05,8.23554603518792e-06,0.00012452337454174977,0.001204282

+---------------------+---------------------+
|feature              |coeff                |
+---------------------+---------------------+
|9-MFCCs1m            |0.0050061609848410715|
|43-MFCCs1std         |0.0034645745677177684|
|22-ChromaVector1m    |0.002395918819804486 |
|70-BPMconf           |0.0023249397888972685|
|30-ChromaVector9m    |0.0018012387626943388|
|16-MFCCs8m           |0.0016432020737475322|
|18-MFCCs10m          |0.001554065628456601 |
|11-MFCCs3m           |0.0015457996210320496|
|2-Energym            |0.001504528175916523 |
|37-EnergyEntropystd  |0.0014658126758984801|
|51-MFCCs9std         |9.490547776473764E-4 |
|6-SpectralEntropym   |9.300931954650559E-4 |
|3-EnergyEntropym     |7.096516025843169E-4 |
|25-ChromaVector4m    |6.46376326682387E-4  |
|12-MFCCs4m           |5.975692074949871E-4 |
|24-ChromaVector3m    |5.481862337746344E-4 |
|13-MFCCs5m           |5.477623204641065E-4 |
|32-ChromaVector11m   |5.041694218966412E-4 |
|46-MFCCs4std         |4.971633887

+--------------------+---------------------+
|feature             |coeff                |
+--------------------+---------------------+
|56-ChromaVector1std |0.005361828360712423 |
|69-BPM              |0.0027816441288569587|
|9-MFCCs1m           |0.0022954748816592274|
|50-MFCCs8std        |0.002148433727313108 |
|43-MFCCs1std        |0.0019774431698532558|
|11-MFCCs3m          |0.0015669271707192817|
|66-ChromaVector11std|0.0015444069430287453|
|62-ChromaVector7std |0.0015062338804391264|
|52-MFCCs10std       |0.0013307147221093682|
|58-ChromaVector3std |0.001018875782845234 |
|27-ChromaVector6m   |0.0010047874684149857|
|59-ChromaVector4std |9.782383707297641E-4 |
|53-MFCCs11std       |9.44530511031019E-4  |
|18-MFCCs10m         |9.020330770078221E-4 |
|21-MFCCs13m         |8.616280994178366E-4 |
|45-MFCCs3std        |8.443360034529914E-4 |
|13-MFCCs5m          |8.420266752635723E-4 |
|35-ZCRstd           |7.989274281528484E-4 |
|67-ChromaVector12std|7.518558489685723E-4 |
|32-Chroma

+--------------------+---------------------+
|feature             |coeff                |
+--------------------+---------------------+
|69-BPM              |0.0036536181244617156|
|44-MFCCs2std        |0.003370939317931728 |
|55-MFCCs13std       |0.003168311323428454 |
|36-Energystd        |0.002641402140501172 |
|3-EnergyEntropym    |0.00233071131431432  |
|63-ChromaVector8std |0.00229294210343538  |
|54-MFCCs12std       |0.0020096460402373355|
|27-ChromaVector6m   |0.0019157772433192437|
|34-ChromaDeviationm |0.0016073868380801523|
|18-MFCCs10m         |0.0014860282897685608|
|52-MFCCs10std       |0.001067713103282287 |
|46-MFCCs4std        |0.0010392960736416142|
|24-ChromaVector3m   |9.640301495846889E-4 |
|48-MFCCs6std        |8.5311145464213E-4   |
|15-MFCCs7m          |8.508770602510055E-4 |
|6-SpectralEntropym  |8.127892275537794E-4 |
|66-ChromaVector11std|7.91200381798846E-4  |
|49-MFCCs7std        |7.805420502762905E-4 |
|60-ChromaVector5std |6.031792924960962E-4 |
|2-Energym

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|29-ChromaVector8m     |0.003499277141498636 |
|43-MFCCs1std          |0.002908352424994844 |
|41-SpectralFluxstd    |0.0028501974748288976|
|7-SpectralFluxm       |0.002687765214341369 |
|21-MFCCs13m           |0.002068158106139719 |
|38-SpectralCentroidstd|0.002048138618551188 |
|39-SpectralSpreadstd  |0.0020233199114395068|
|23-ChromaVector2m     |0.0017685593338607965|
|70-BPMconf            |0.0017570148302831327|
|45-MFCCs3std          |0.0017072178995462197|
|33-ChromaVector12m    |0.0016461424387570205|
|37-EnergyEntropystd   |0.0014637088114300763|
|34-ChromaDeviationm   |0.0014564189569386465|
|47-MFCCs5std          |0.0012680546448222412|
|32-ChromaVector11m    |0.0012014276394617556|
|13-MFCCs5m            |0.001107417561675057 |
|22-ChromaVector1m     |7.268042511052394E-4 |
|28-ChromaVector7m     |7.165908581598595E-4 |
|44-MFCCs2std

+----------------------+---------------------+
|feature               |coeff                |
+----------------------+---------------------+
|11-MFCCs3m            |0.0032193879390392892|
|67-ChromaVector12std  |0.002651313755411129 |
|38-SpectralCentroidstd|0.0016843705481038285|
|43-MFCCs1std          |0.0016533627362578026|
|35-ZCRstd             |0.0016245732059408167|
|30-ChromaVector9m     |0.0016056910529219776|
|42-SpectralRolloffstd |0.0015668082667349638|
|70-BPMconf            |0.0015350844960886432|
|15-MFCCs7m            |0.001473884067628245 |
|41-SpectralFluxstd    |0.0014538521857972516|
|28-ChromaVector7m     |0.001379334463436413 |
|17-MFCCs9m            |0.0012384641396957052|
|44-MFCCs2std          |0.0011638241353168922|
|25-ChromaVector4m     |0.001137396059634726 |
|54-MFCCs12std         |0.001044831519291648 |
|37-EnergyEntropystd   |0.0010427413195330932|
|13-MFCCs5m            |9.696612545398464E-4 |
|7-SpectralFluxm       |7.836428336003808E-4 |
|31-ChromaVec
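
Each table above lists the features of one model sorted by coefficient, largest first. The notebook's own code for building these tables is not shown in this section, so the following is only a minimal plain-Python sketch of that ranking step. The helper name `rank_by_coefficient` and the rounded sample values are assumptions for illustration.

```python
def rank_by_coefficient(feature_names, coefficients):
    """Pair each feature with its coefficient and sort descending by coefficient."""
    if len(feature_names) != len(coefficients):
        raise ValueError("feature_names and coefficients must have the same length")
    pairs = list(zip(feature_names, coefficients))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Illustrative, rounded values (not taken from a specific model above)
names = ["1-ZCRm", "2-Energym", "3-EnergyEntropym"]
coeffs = [-0.0005, 0.0017, 0.0047]

ranked = rank_by_coefficient(names, coeffs)
for name, coeff in ranked:
    print(name, coeff)
```

Sorting by the signed value pushes large negative coefficients to the bottom of the list. If the goal is to rank by influence regardless of direction, sorting on `abs(pair[1])` may be the better choice, and coefficient sizes are only comparable across features when the features are on similar scales.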

#### Selector for Random Forest

In [76]:
from pyspark.ml.feature import VectorSlicer

In [77]:
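# Code from LaylaAi's Udemy course
classifiers = [RandomForestClassifier()]
maximum = len(input_columns)
folds = 2  # fold count passed to ClassTrainEval

# Try the top 10, 20, ... features; range() stops before `maximum`
for n in range(10, maximum, 10):
    print("Testing top n= ", n, " features")

    # Indices of the n most important features, most important first
    best_n_features = featureImportances.argsort()[-n:][::-1].tolist()
    vs = VectorSlicer(inputCol='features', outputCol='best_features', indices=best_n_features)
    bestFeaturesDf = vs.transform(final_data)
    train, test = bestFeaturesDf.randomSplit([0.7, 0.3])

    # Seed the results table with a placeholder row so new rows can be unioned onto it
    columns = ['Classifier', 'Result']
    vals = [('Place Holder', 'N/A')]
    results = spark.createDataFrame(vals, columns)

    # NOTE: the sliced vector lives in 'best_features'; the selection only affects
    # training if ClassTrainEval (defined earlier) reads that column
    for classifier in classifiers:
        new_result = ClassTrainEval(classifier, features, classes, folds, train, test)
        results = results.union(new_result)

    # Drop the placeholder row before displaying
    results = results.where('Classifier!="Place Holder"')
    results.show(100, False)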
    

Testing top n=  10  features
 
RandomForestClassifier  Feature Importances
(Scores add up to 1)
Lowest score is the least important
 
[0.02888963 0.03108573 0.0159277  0.01337458 0.0125509  0.01025784
 0.01419054 0.00750732 0.01859048 0.01639994 0.01111623 0.01124447
 0.00998606 0.00770556 0.01305929 0.00894956 0.00583386 0.00741931
 0.01141357 0.01080727 0.00858454 0.00893249 0.00730578 0.0115212
 0.00794147 0.01098896 0.006381   0.01059559 0.01040044 0.00776125
 0.01073851 0.01119676 0.00833582 0.01017087 0.01328641 0.01185696
 0.01166108 0.02124423 0.02076403 0.01086397 0.01522136 0.00884456
 0.01965365 0.0188907  0.0133015  0.01340496 0.01400851 0.01842625
 0.01527748 0.01439687 0.01311606 0.01799388 0.01366715 0.01976257
 0.0172113  0.01115143 0.01152773 0.00962855 0.00783504 0.00897977
 0.00783178 0.01096438 0.01223009 0.008425   0.01103581 0.0105871
 0.00700039 0.00827773 0.04741375 0.02396123 0.08313222]
+----------------------+------+
|Classifier            |Result|
+-
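
The loop above picks the top n features with the NumPy idiom `featureImportances.argsort()[-n:][::-1]`. As a minimal sketch of what that expression computes, the plain-Python function below returns the indices of the n largest scores, largest first. The name `top_n_indices` and the toy scores are illustrative and not from the notebook. The sketch also lists the values of n that `range(10, maximum, 10)` visits, assuming `maximum` is 71 (the length of the importance array printed above).

```python
def top_n_indices(scores, n):
    """Return the indices of the n largest scores, highest score first."""
    # Sort indices by ascending score, keep the last n, then reverse them
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[-n:][::-1]


# Toy importances (illustrative values only)
toy_scores = [0.03, 0.08, 0.01, 0.05, 0.02]
top_two = top_n_indices(toy_scores, 2)
print(top_two)  # [1, 3]

# Values of n tried by range(10, maximum, 10), assuming maximum == 71
candidate_ns = list(range(10, 71, 10))
print(candidate_ns)  # [10, 20, 30, 40, 50, 60, 70]
```

Because `range` excludes its stop value, the full 71-feature set is never evaluated inside the loop under that assumption.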

### Final Model

In [78]:
classifiers = [RandomForestClassifier()]

# n = 71 keeps every feature, ordered from most to least important
n = 71
best_n_features = featureImportances.argsort()[-n:][::-1].tolist()
vs = VectorSlicer(inputCol='features', outputCol='best_features', indices=best_n_features)
bestFeaturesDf = vs.transform(final_data)
train, test = bestFeaturesDf.randomSplit([0.7, 0.3])

# Seed the results table with a placeholder row so new rows can be unioned onto it
columns = ['Classifier', 'Result']
vals = [('Place Holder', 'N/A')]
results = spark.createDataFrame(vals, columns)

# `folds` is reused from the selector cell above
for classifier in classifiers:
    new_result = ClassTrainEval(classifier, features, classes, folds, train, test)
    results = results.union(new_result)

# Drop the placeholder row before displaying
results = results.where('Classifier!="Place Holder"')
results.show(100, False)
    

 
RandomForestClassifier  Feature Importances
(Scores add up to 1)
Lowest score is the least important
 
[0.03125174 0.03092625 0.0198047  0.01285609 0.00901886 0.00884094
 0.01107597 0.00604407 0.01782944 0.01784096 0.01209756 0.01036299
 0.00739588 0.0107025  0.01014866 0.00802617 0.00996827 0.00641163
 0.00950391 0.00817639 0.01140924 0.0113248  0.00877135 0.01260641
 0.00746633 0.01054556 0.00700342 0.0110766  0.00745274 0.00888207
 0.00967223 0.01454129 0.00763932 0.0119033  0.01194383 0.01215271
 0.01022457 0.01567451 0.0249122  0.01203591 0.01627062 0.01319178
 0.01913719 0.01935344 0.0118876  0.01188497 0.01117717 0.02264837
 0.01464779 0.0115643  0.01516469 0.01863266 0.01784844 0.01535855
 0.019193   0.00858562 0.00798951 0.01110997 0.00722239 0.00737345
 0.00663295 0.01139503 0.01110558 0.00857489 0.01106337 0.01121042
 0.00814962 0.00698486 0.04142901 0.02631359 0.09137778]
+----------------------+------+
|Classifier            |Result|
+----------------------+-----

#### Draw Conclusions

In [83]:
predictions = BestModel.transform(test)

In [84]:
# Songs predicted as class 21.0 whose true label is a different class
false_positives = predictions.filter('label != 21.0 AND prediction == 21.0')
count = false_positives.count()
print(count)
false_positives.show()

10
+-----+--------------------+--------------------+--------------------+--------------------+----------+
|label|            features|       best_features|       rawPrediction|         probability|prediction|
+-----+--------------------+--------------------+--------------------+--------------------+----------+
| 12.0|[231.375566399096...|[535.433070866141...|[0.12822352205422...|[0.00256447044108...|      21.0|
| 12.0|[500.629373568471...|[535.433070866141...|[0.54391737294963...|[0.01087834745899...|      21.0|
| 14.0|[159.800545928214...|[511.811023622047...|[0.34052287581699...|[0.00681045751633...|      21.0|
| 16.0|[245.822938778770...|[519.685039370078...|[0.02578838785735...|[5.15767757147067...|      21.0|
| 16.0|[264.866355108149...|[511.811023622047...|[0.03333333333333...|[6.66666666666666...|      21.0|
| 16.0|[343.458940732645...|[511.811023622047...|[0.11204481792717...|[0.00224089635854...|      21.0|
| 17.0|[240.533842646285...|[511.811023622047...|[0.22941176470588...|
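
The filter above isolates songs predicted as class 21.0 whose true label is something else, that is, the false positives for that class. The same bookkeeping can be sketched in plain Python on a list of `(label, prediction)` pairs, grouped by true label to show which genres are most often mistaken for the target. The function name `false_positives_by_label` and the toy pairs are hypothetical and are not the notebook's data.

```python
from collections import Counter


def false_positives_by_label(pairs, target):
    """Count, per true label, the rows wrongly predicted as `target`."""
    return Counter(
        label
        for label, prediction in pairs
        if prediction == target and label != target
    )


# Toy (label, prediction) pairs, for illustration only
toy_pairs = [(12.0, 21.0), (12.0, 21.0), (16.0, 21.0), (21.0, 21.0), (14.0, 3.0)]

fp = false_positives_by_label(toy_pairs, 21.0)
print(sum(fp.values()))   # 3
print(fp.most_common(1))  # [(12.0, 2)]
```

On the Spark side, a comparable breakdown could likely be obtained by chaining `.groupBy('label').count()` onto the filtered DataFrame.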