This script reads each of the OpenCV classifier files found in this folder and builds a dataframe of the parameter settings used for each classifier.

In the future, we will also want to add the results obtained from the classifiers to this dataframe.

The classifier files record the parameter settings used, but not the specific images used to train them. We therefore also need to link each classifier to its bg.txt (the list of negatives used in training) and its info.data (the list of positive annotations used in training).
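
One way the linking step might start is sketched below. It assumes bg.txt lists one negative image path per line and that info.data follows the usual OpenCV annotation layout (`path count x y w h ...`), with no spaces in the paths. The function names, the returned keys, and the demo files are all hypothetical; nothing here comes from the notebook itself.

```python
import os
import tempfile

def read_negatives(bg_path):
    """Return the image paths listed in a bg.txt file (one per non-empty line)."""
    with open(bg_path) as f:
        return [line.strip() for line in f if line.strip()]

def read_positives(info_path):
    """Return (image_path, [(x, y, w, h), ...]) pairs from an info.data file.

    Assumes each line looks like: path count x1 y1 w1 h1 x2 y2 w2 h2 ...
    """
    positives = []
    with open(info_path) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            path, count = fields[0], int(fields[1])
            nums = [int(v) for v in fields[2:2 + 4 * count]]
            boxes = [tuple(nums[i:i + 4]) for i in range(0, len(nums), 4)]
            positives.append((path, boxes))
    return positives

def training_data_summary(bg_path, info_path):
    """Summarize the training lists as a dict (hypothetical key names)."""
    positives = read_positives(info_path)
    return {
        'bgFile': bg_path,
        'infoFile': info_path,
        'numNegatives': len(read_negatives(bg_path)),
        'numPositiveImages': len(positives),
        'numPositiveBoxes': sum(len(boxes) for _, boxes in positives),
    }

# Demo on throwaway files with made-up entries, so the sketch runs anywhere.
demo_dir = tempfile.mkdtemp()
bg_path = os.path.join(demo_dir, 'bg.txt')
info_path = os.path.join(demo_dir, 'info.data')
with open(bg_path, 'w') as f:
    f.write('neg/img1.jpg\nneg/img2.jpg\n')
with open(info_path, 'w') as f:
    f.write('pos/img1.jpg 1 140 100 45 45\n')
    f.write('pos/img2.jpg 2 100 200 50 50 50 30 25 25\n')
summary = training_data_summary(bg_path, info_path)
```

A summary dict like this could be merged into a classifier's parameter dictionary before the dataframe is built. Which bg.txt and info.data belong to which classifier is not recorded anywhere in the .xml files, so that mapping would still have to be supplied by hand.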


In [1]:
import pandas as pd
import xml.etree.ElementTree as ET
import pprint
import os

In [2]:
# This function processes a classifier .xml file and returns a dictionary with its parameter settings.
def get_parameters(xmlfile):
    tree = ET.parse(xmlfile)
    root = tree.getroot()
    parameters = {}
    parameters['Classifier'] = xmlfile
    for child in root:
        for x in child:
            # Container elements have only indentation whitespace as text, so skip them.
            # x.text is None for an empty element, so check for that first.
            if x.text and '   ' not in x.text:
                parameters[x.tag] = x.text
            for y in x:
                if y.text and '   ' not in y.text:
                    parameters[y.tag] = y.text
    return parameters
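
To see what the two nested loops pick up, the sketch below runs the same traversal over a small hand-written XML string. The sample is an abbreviated assumption about the cascade file layout, not a copy of any file in this folder. It also swaps the whitespace test for a different check: an element counts as a parameter when it has no children (`len(element) == 0`). The name `leaf_parameters` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written stand-in for a cascade file (assumed layout).
SAMPLE = """<opencv_storage>
<cascade>
  <stageType>BOOST</stageType>
  <featureType>LBP</featureType>
  <height>24</height>
  <width>24</width>
  <stageParams>
    <boostType>GAB</boostType>
    <maxWeakCount>100</maxWeakCount></stageParams>
  <featureParams>
    <maxCatCount>256</maxCatCount>
    <featSize>1</featSize></featureParams>
  <stageNum>20</stageNum>
</cascade>
</opencv_storage>
"""

def leaf_parameters(root):
    """Collect tag -> text for childless elements one or two levels below <cascade>."""
    params = {}
    for cascade in root:
        for x in cascade:
            if len(x) == 0:  # no children, so this element holds a value
                params[x.tag] = (x.text or '').strip()
            for y in x:
                if len(y) == 0:
                    params[y.tag] = (y.text or '').strip()
    return params

sample_params = leaf_parameters(ET.fromstring(SAMPLE))
```

Here `stageParams` and `featureParams` are skipped because they only group other elements, while their children (`boostType`, `maxCatCount`, and so on) end up in the dictionary alongside the top-level values. All values are still strings at this point.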
    
    

In [3]:
# This function goes through all of the .xml files in this directory and makes a single data frame showing the parameters used
# in training each of the classifiers.
def make_parameter_df():
    # Match on the file extension so names that merely contain '.xml' are skipped.
    classifiers = [xmlfile for xmlfile in os.listdir(os.getcwd()) if xmlfile.endswith('.xml')]
    parameters = [get_parameters(xmlfile) for xmlfile in classifiers]
    return pd.DataFrame(parameters)
    
    

In [4]:
parameters = make_parameter_df()

In [5]:
parameters

Unnamed: 0,Classifier,boostType,featSize,featureType,height,maxCatCount,maxDepth,maxFalseAlarm,maxWeakCount,minHitRate,mode,stageNum,stageType,weightTrimRate,width
0,cascade_1.xml,GAB,1,LBP,40,256,1,0.5,100,0.9990000128746032,,12,BOOST,0.95,40
1,whale_classifier.xml,GAB,1,HAAR,24,0,1,0.5,100,0.9950000047683715,ALL,4,BOOST,0.95,24
2,cascade_2.xml,GAB,1,LBP,30,256,1,0.5,100,0.9950000047683715,,12,BOOST,0.95,30
3,whale_classifier2.xml,GAB,1,LBP,24,256,1,0.5,100,0.9950000047683715,,20,BOOST,0.95,24


In [6]:
cols = ['Classifier','stageType','featureType','height','width','boostType','minHitRate', 'maxFalseAlarm','weightTrimRate','maxDepth','maxWeakCount','maxCatCount','featSize','mode','stageNum']


In [7]:
test = parameters[cols]

In [8]:
test

Unnamed: 0,Classifier,stageType,featureType,height,width,boostType,minHitRate,maxFalseAlarm,weightTrimRate,maxDepth,maxWeakCount,maxCatCount,featSize,mode,stageNum
0,cascade_1.xml,BOOST,LBP,40,40,GAB,0.9990000128746032,0.5,0.95,1,100,256,1,,12
1,whale_classifier.xml,BOOST,HAAR,24,24,GAB,0.9950000047683715,0.5,0.95,1,100,0,1,ALL,4
2,cascade_2.xml,BOOST,LBP,30,30,GAB,0.9950000047683715,0.5,0.95,1,100,256,1,,12
3,whale_classifier2.xml,BOOST,LBP,24,24,GAB,0.9950000047683715,0.5,0.95,1,100,256,1,,20


In [9]:
test_dict = {}
# `root` only exists inside get_parameters, so parse one classifier here first.
root = ET.parse('cascade_1.xml').getroot()
for child in root:
    for x in child:
        if x.text and '   ' not in x.text:
            test_dict[x.tag] = x.text
        for y in x:
            if y.text and '   ' not in y.text:
                test_dict[y.tag] = y.text
pprint.pprint(test_dict)

In [10]:
test_dict

In [11]:
ls

cascade_1.xml  get_parameters.ipynb   whale_classifier.xml
cascade_2.xml  whale_classifier2.xml


In [12]:
os.listdir('.')  # Python 2 requires an explicit path; '.' is the current directory