# Comparing the Performance of scikit-eLCS and the Original eLCS Algorithm

Author: Robert Zhang - University of Pennsylvania, B.S.E. Computer Science, B.S.E. Economics (SEAS '22, WH '22)

Advisor: Ryan Urbanowicz, PhD - University of Pennsylvania, Department of Biostatistics, Epidemiology, and Informatics & Institute for Biomedical Informatics (IBI)

Date: 04/05/2020

Requirements: (Python 3)
<ul>
    <li>scikit-eLCS</li>
    <li>pandas</li>
    <li>numpy</li>
    <li>scikit-learn</li>
</ul>

## Introduction
This notebook presents a comparison between the performance of the original eLCS Algorithm, as presented in the 2017 textbook "Introduction to Learning Classifier Systems" by Ryan Urbanowicz and Will Browne, and the new scikit-eLCS Python package.

The scikit-eLCS package is an sklearn-compatible Python implementation of the original eLCS Algorithm. It was designed to match the original in training/testing accuracy and training time, while being significantly more user-friendly and including an array of additional real-time analysis tools. This guide will demonstrate these capabilities in detail.

The scikit-eLCS source code and a complete walkthrough of its usage can be found at <a href="https://github.com/UrbsLab/scikit-eLCS">this GitHub repository</a>. The package can be installed via **pip3 install scikit-eLCS**.

## Notebook Organization
**Part 1: Comparing Training Accuracy and Runtime**
<ul>
    <li> 6-bit Multiplexer Problem </li>
    <li> 11-bit Multiplexer Problem </li>
    <li> 20-bit Multiplexer Problem </li>
</ul>

**Part 2: Comparing Testing Accuracy**
<ul>
    <li> 6-bit Multiplexer Problem </li>
    <li> 11-bit Multiplexer Problem </li>
    <li> 20-bit Multiplexer Problem </li>
</ul>

**Part 3: Quick Demo of Additional Analysis Tools Provided by scikit-eLCS**
<ul>
    <li> Iteration Tracking Tool </li>
    <li> Rule Population Tool </li>
    <li> Population Statistics Tools </li>
</ul>

## Part 1: Comparing Training Accuracy and Runtime
We will use the n-bit Multiplexer Problem to test the training accuracy and runtime of the two eLCS implementations. The Multiplexer Problem is a standard LCS benchmark because it is highly epistatic and heterogeneous.
<br>
<br>
<img src="MP.jpg">
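
As a rough illustration of the problem itself, the sketch below enumerates every instance of an n-bit multiplexer in plain Python. It assumes the common convention that the first k bits are address bits (most significant bit first) and the remaining 2^k bits are register bits, so n = k + 2^k. The function names are hypothetical, and the column order of the dataset files used in this notebook may differ.

```python
from itertools import product


def multiplexer_output(bits, address_bits):
    """Return the class of one multiplexer instance.

    Assumed convention: the first `address_bits` bits form a binary
    address (most significant bit first), and the class is the register
    bit that this address selects among the remaining bits.
    """
    address = int("".join(str(b) for b in bits[:address_bits]), 2)
    return bits[address_bits + address]


def all_multiplexer_instances(address_bits):
    """Enumerate every (features, class) pair of the n-bit multiplexer,
    where n = address_bits + 2 ** address_bits."""
    n = address_bits + 2 ** address_bits
    return [(bits, multiplexer_output(bits, address_bits))
            for bits in product((0, 1), repeat=n)]


# 6-bit multiplexer: 2 address bits + 4 register bits.
six_bit = all_multiplexer_instances(2)
print(len(six_bit))                         # 64 instances
print(sum(label for _, label in six_bit))   # 32 instances of class 1
```

The class depends on a different register bit for each address, which is the source of the heterogeneity, and no single bit predicts the class on its own, which is the source of the epistasis. The 64 instances and the 32/32 class split agree with what the original eLCS reports when loading the 6-bit dataset.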

We will use the same hyperparameters and the same random seed for both eLCS implementations to ensure exact replicability. (Without a fixed random seed, the analysis still yields highly similar conclusions.)

### 6-bit Multiplexer Problem with Original eLCS

In [26]:
from eLCS_Timer import Timer
from eLCS_ConfigParser import ConfigParser
from eLCS_Offline_Environment import Offline_Environment
from eLCS_Algorithm import eLCS
from eLCS_Constants import *
import time

helpstr = """Failed attempt to run e-LCS.  Please ensure that a configuration file giving all run parameters has been specified."""
configurationFile = "ConfigFiles/config6.txt"
ConfigParser(configurationFile)
timer = Timer() 
cons.referenceTimer(timer)
env = Offline_Environment()
cons.referenceEnv(env)
cons.parseIterations()

#Run the e-LCS algorithm.
eLCS()
print("Deletion Time:"+str(cons.timer.globalDeletion))
print("Evaluation Time:"+str(cons.timer.globalEvaluation))
print("Matching Time:"+str(cons.timer.globalMatching))
print("Selection Time:"+str(cons.timer.globalSelection))
print("Subsumption Time:"+str(cons.timer.globalSubsumption))
print("Total Time:"+str(cons.timer.globalTime))

----------------------------------------------------------------------------
eLCS Code Demo 5: The Complete eLCS Algorithm - Niche GA + Subsumption
----------------------------------------------------------------------------
Environment: Formatting Data... 
DataManagement: Loading Data... Datasets/o-eLCS/Multiplexer6.txt
DataManagement: Phenotype Column Location = 6
DataManagement: Number of Attributes = 6
DataManagement: Number of Instances = 64
DataManagement: Analyzing Phenotype...
DataManagement: Phenotype Detected as Discrete.
DataManagement: Detecting Classes...
DataManagement: Following Classes Detected:['0', '1']
Class: 0 count = 32
Class: 1 count = 32
DataManagement: Detecting Attributes...
DataManagement: Identified 6 discrete and 0 continuous attributes.
DataManagement: Characterizing Attributes...
----------------------------------------------------------------------------
eLCS: Initializing Algorithm...
Learning Checkpoints: [5000]
Maximum Iterations: 5000
Beginning eLCS l

### 6-bit Multiplexer Problem with scikit-eLCS

In [27]:
import skeLCS
import numpy as np
import pandas as pd

data = pd.read_csv("Datasets/scikit-eLCS/Multiplexer6.csv")
classLabel = "class"
dataFeatures = data.drop(classLabel,axis=1).values
dataPhenotypes = data[classLabel].values

model = skeLCS.eLCS(learningIterations = 5000,randomSeed = 0)
model.fit(dataFeatures,dataPhenotypes)
print("Deletion Time:"+str(model.timer.globalDeletion))
print("Evaluation Time:"+str(model.timer.globalEvaluation))
print("Matching Time:"+str(model.timer.globalMatching))
print("Selection Time:"+str(model.timer.globalSelection))
print("Subsumption Time:"+str(model.timer.globalSubsumption))
print("Total Time:"+str(model.timer.globalTime))
print(model.score(dataFeatures,dataPhenotypes))

Deletion Time:0.05897688865661621
Evaluation Time:0.029959917068481445
Matching Time:3.8522276878356934
Selection Time:0.1961503028869629
Subsumption Time:0.33667635917663574
Total Time:5.285501956939697
1.0


### 11-bit Multiplexer Problem with Original eLCS

In [28]:
configurationFile = "ConfigFiles/config11.txt"
ConfigParser(configurationFile)
timer = Timer() 
cons.referenceTimer(timer)
env = Offline_Environment()
cons.referenceEnv(env)
cons.parseIterations()
eLCS()
print("Deletion Time:"+str(cons.timer.globalDeletion))
print("Evaluation Time:"+str(cons.timer.globalEvaluation))
print("Matching Time:"+str(cons.timer.globalMatching))
print("Selection Time:"+str(cons.timer.globalSelection))
print("Subsumption Time:"+str(cons.timer.globalSubsumption))
print("Total Time:"+str(cons.timer.globalTime))

----------------------------------------------------------------------------
eLCS Code Demo 5: The Complete eLCS Algorithm - Niche GA + Subsumption
----------------------------------------------------------------------------
Environment: Formatting Data... 
DataManagement: Loading Data... Datasets/o-eLCS/Multiplexer11.txt
DataManagement: Phenotype Column Location = 11
DataManagement: Number of Attributes = 11
DataManagement: Number of Instances = 2048
DataManagement: Analyzing Phenotype...
DataManagement: Phenotype Detected as Discrete.
DataManagement: Detecting Classes...
DataManagement: Following Classes Detected:['0', '1']
Class: 0 count = 1024
Class: 1 count = 1024
DataManagement: Detecting Attributes...
DataManagement: Identified 11 discrete and 0 continuous attributes.
DataManagement: Characterizing Attributes...
----------------------------------------------------------------------------
eLCS: Initializing Algorithm...
Learning Checkpoints: [5000]
Maximum Iterations: 5000
Beginn

### 11-bit Multiplexer Problem with scikit-eLCS

In [29]:
data = pd.read_csv("Datasets/scikit-eLCS/Multiplexer11.csv")
classLabel = "class"
dataFeatures = data.drop(classLabel,axis=1).values
dataPhenotypes = data[classLabel].values

model = skeLCS.eLCS(learningIterations = 5000,randomSeed = 0)
model.fit(dataFeatures,dataPhenotypes)
print("Deletion Time:"+str(model.timer.globalDeletion))
print("Evaluation Time:"+str(model.timer.globalEvaluation))
print("Matching Time:"+str(model.timer.globalMatching))
print("Selection Time:"+str(model.timer.globalSelection))
print("Subsumption Time:"+str(model.timer.globalSubsumption))
print("Total Time:"+str(model.timer.globalTime))
print(model.score(dataFeatures,dataPhenotypes))

Deletion Time:2.190622568130493
Evaluation Time:0.04238486289978027
Matching Time:12.66571855545044
Selection Time:0.5445852279663086
Subsumption Time:2.318056583404541
Total Time:17.842293977737427
0.98876953125


### 20-bit Multiplexer Problem with Original eLCS

In [30]:
configurationFile = "ConfigFiles/config20.txt"
ConfigParser(configurationFile)
timer = Timer() 
cons.referenceTimer(timer)
env = Offline_Environment()
cons.referenceEnv(env)
cons.parseIterations()
t = time.time()
eLCS()
print(time.time()-t)
print("Deletion Time:"+str(cons.timer.globalDeletion))
print("Evaluation Time:"+str(cons.timer.globalEvaluation))
print("Matching Time:"+str(cons.timer.globalMatching))
print("Selection Time:"+str(cons.timer.globalSelection))
print("Subsumption Time:"+str(cons.timer.globalSubsumption))
print("Total Time:"+str(cons.timer.globalTime))

----------------------------------------------------------------------------
eLCS Code Demo 5: The Complete eLCS Algorithm - Niche GA + Subsumption
----------------------------------------------------------------------------
Environment: Formatting Data... 
DataManagement: Loading Data... Datasets/o-eLCS/Multiplexer20.txt
DataManagement: Phenotype Column Location = 20
DataManagement: Number of Attributes = 20
DataManagement: Number of Instances = 2000
DataManagement: Analyzing Phenotype...
DataManagement: Phenotype Detected as Discrete.
DataManagement: Detecting Classes...
DataManagement: Following Classes Detected:['0', '1']
Class: 0 count = 1032
Class: 1 count = 968
DataManagement: Detecting Attributes...
DataManagement: Identified 20 discrete and 0 continuous attributes.
DataManagement: Characterizing Attributes...
----------------------------------------------------------------------------
eLCS: Initializing Algorithm...
Learning Checkpoints: [10000]
Maximum Iterations: 10000
Begin

Epoch: 55	 Iteration: 5500	 MacroPop: 886	 MicroPop: 1000	 AccEstimate: 0.8	 AveGen: 0.699700000000001	 Time: 0.35570540428161623
Epoch: 56	 Iteration: 5600	 MacroPop: 883	 MicroPop: 1000	 AccEstimate: 0.68	 AveGen: 0.7011000000000001	 Time: 0.36302796999613446
Epoch: 57	 Iteration: 5700	 MacroPop: 874	 MicroPop: 1000	 AccEstimate: 0.76	 AveGen: 0.7023499999999993	 Time: 0.37010927200317384
Epoch: 58	 Iteration: 5800	 MacroPop: 882	 MicroPop: 1000	 AccEstimate: 0.72	 AveGen: 0.7060999999999988	 Time: 0.3777712027231852
Epoch: 59	 Iteration: 5900	 MacroPop: 876	 MicroPop: 1000	 AccEstimate: 0.79	 AveGen: 0.7065499999999993	 Time: 0.3855320851008097
Epoch: 60	 Iteration: 6000	 MacroPop: 882	 MicroPop: 1000	 AccEstimate: 0.72	 AveGen: 0.7054999999999999	 Time: 0.39206186930338544
Epoch: 61	 Iteration: 6100	 MacroPop: 877	 MicroPop: 1000	 AccEstimate: 0.77	 AveGen: 0.7029499999999995	 Time: 0.3981567541758219
Epoch: 62	 Iteration: 6200	 MacroPop: 885	 MicroPop: 1000	 AccEstimate: 0.8	 AveG

### 20-bit Multiplexer Problem with scikit-eLCS

In [31]:
data = pd.read_csv("Datasets/scikit-eLCS/Multiplexer20.csv")
classLabel = "class"
dataFeatures = data.drop(classLabel,axis=1).values
dataPhenotypes = data[classLabel].values

model = skeLCS.eLCS(learningIterations = 10000,randomSeed = 0)
t = time.time()
model.fit(dataFeatures,dataPhenotypes)
print(time.time()-t)
print("Deletion Time:"+str(model.timer.globalDeletion))
print("Evaluation Time:"+str(model.timer.globalEvaluation))
print("Matching Time:"+str(model.timer.globalMatching))
print("Selection Time:"+str(model.timer.globalSelection))
print("Subsumption Time:"+str(model.timer.globalSubsumption))
print("Total Time:"+str(model.timer.globalTime))
print(model.score(dataFeatures,dataPhenotypes))

71.39654588699341
Deletion Time:17.14607071876526
Evaluation Time:0.08619832992553711
Matching Time:42.485435485839844
Selection Time:1.7833759784698486
Subsumption Time:16.122636079788208
Total Time:71.15319013595581
0.7937928438721251
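
To make the scikit-eLCS timings above easier to compare across problem sizes, the following sketch expresses each timer component as a fraction of the reported total time. The numbers are copied from the three scikit-eLCS outputs in this part. The `component_shares` helper is a hypothetical name introduced for illustration and is not part of either package. The original eLCS runs are left out because their printed output is cut off before the timing lines.

```python
# Timer values (seconds) copied from the scikit-eLCS outputs above.
timings = {
    "6-bit": {
        "Deletion": 0.05897688865661621,
        "Evaluation": 0.029959917068481445,
        "Matching": 3.8522276878356934,
        "Selection": 0.1961503028869629,
        "Subsumption": 0.33667635917663574,
        "Total": 5.285501956939697,
    },
    "11-bit": {
        "Deletion": 2.190622568130493,
        "Evaluation": 0.04238486289978027,
        "Matching": 12.66571855545044,
        "Selection": 0.5445852279663086,
        "Subsumption": 2.318056583404541,
        "Total": 17.842293977737427,
    },
    "20-bit": {
        "Deletion": 17.14607071876526,
        "Evaluation": 0.08619832992553711,
        "Matching": 42.485435485839844,
        "Selection": 1.7833759784698486,
        "Subsumption": 16.122636079788208,
        "Total": 71.15319013595581,
    },
}


def component_shares(run):
    """Return each timer component as a fraction of the run's total time."""
    total = run["Total"]
    return {name: value / total
            for name, value in run.items() if name != "Total"}


shares = {problem: component_shares(run) for problem, run in timings.items()}

for problem, run_shares in shares.items():
    summary = ", ".join(f"{name}: {share:.1%}"
                        for name, share in run_shares.items())
    print(f"{problem} -> {summary}")
```

Matching accounts for the largest share in all three runs (roughly 73%, 71%, and 60% of total time). The shares are best read one at a time rather than as a partition of the total: in the 20-bit run the five components add up to more than the reported total, which suggests that some of the timers overlap.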
