# Learning Classifiers from Scratch

## Model Source

The initial, basic learning classifier system (LCS) model is taken from Dr. Ryan Urbanowicz's "Learning Classifier Systems in a Nutshell" video on YouTube: https://youtu.be/CRge_cZ2cJc?si=1CM2osKW7CptJ-DM

This video's description of an LCS is the simplest and most digestible one found so far, while still being complete in terms of LCS operation. Additionally, some pseudocode snippets have been taken from Dr. Martin Butz's book "Rule-Based Evolutionary Online Learning Systems" and its algorithmic description of XCS.

### Step 1: Initialize Setup

Initialize the empty population. (The functions that build the match set and the correct set are created in Steps 3 and 4.)

```python
# Initialize the empty population. This is only called once, at the beginning of the cycle.
def initialize_population():
    population = []
    return population

population = initialize_population()
print(population)
```

```
[]
```



### Step 2: Feeding Data to LCS

An LCS is an online learning mechanism, but it is normally trained on a dataset. Instances must be fed to the LCS one at a time: from the dataset during training, or from the environment during testing.

```python
import csv

data = './6Multiplexer_Data_Complete.csv'

# Count the lines in the file (header included) so that get_instance can
# return None when the requested line does not exist
def get_data_length(data):
    with open(data, 'r') as file:
        return sum(1 for row in file)

# Convert a row of strings from the CSV reader into a list of ints
def convert_int(instance):
    return [int(i) for i in instance]

# Get the data from the file and return the instance on line `line_num`.
# Line 0 is assumed to be the header row, so the first instance is line 1.
def get_instance(data, line_num):
    lines = get_data_length(data)
    # Valid line numbers run from 0 to lines - 1
    if line_num >= lines:
        return None
    with open(data, 'r') as source:
        reader = csv.reader(source)
        for _ in range(line_num):
            next(reader)
        return convert_int(next(reader))

instance = get_instance(data, 1)
print(instance)
```

```
[0, 0, 0, 0, 0, 0, 0]
```
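
As written, `get_instance` reads the whole file twice on every call (once to count lines, once to skip to the requested line), so a full pass over the dataset takes time quadratic in its length. A minimal sketch of an alternative, assuming the same header-plus-integer-rows CSV layout, reads everything into memory once. `parse_rows` and `load_dataset` are hypothetical names, not part of the original notebook.

```python
import csv

# Hypothetical helpers (not from the original notebook): read the dataset
# into memory once instead of re-opening the file for every instance.
def parse_rows(lines):
    """Parse CSV lines (a file object or a list of strings) into lists of ints."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row (assumed present, as in get_instance)
    # Blank lines come back as empty lists and are dropped
    return [[int(value) for value in row] for row in reader if row]

def load_dataset(path):
    """Load every instance from a CSV file, skipping the header row."""
    with open(path, 'r', newline='') as source:
        return parse_rows(source)
```

With this, `dataset = load_dataset(data)` gives a list in which `dataset[i - 1]` should equal `get_instance(data, i)`.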


### Step 3: Determine if classifiers in population match the current instance

Compare each classifier in the population to the current instance. Every classifier that matches is added to the match set, and its match count, accuracy, and fitness are updated.

```python
# Return True if every specified attribute in a classifier's state equals the
# corresponding value in the instance. "Don't care" attributes are absent from
# the state, so they never cause a mismatch.
def does_match(state, instance):
    for index, value in state:
        if instance[index] != value:
            return False
    return True

# Create the match set by comparing the state of each classifier in the
# population with the current instance, updating the statistics of each match
def create_match_set(population, instance):
    match_set = []
    for classifier in population:
        if does_match(classifier['state'], instance):
            match_set.append(classifier)
            classifier['match count'] += 1
            classifier['accuracy'] = classifier['correct count'] / classifier['match count']
            classifier['fitness'] = classifier['accuracy'] ** 5
    return match_set

match_set = create_match_set(population, instance)
print(population)
print(match_set)
```

```
[]
[]
```
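
A small worked example of the matching rule, using made-up states and a made-up instance (`example_instance`, `general`, and `specific` are illustrative names only). The block repeats `does_match` in a one-line form with the same behavior so that it runs on its own.

```python
# Same rule as does_match above, written with all() so this block runs on its own
def does_match(state, instance):
    return all(instance[index] == value for index, value in state)

# A made-up instance: six attributes followed by the class
example_instance = [1, 0, 1, 1, 0, 0, 0]

general = [(0, 1), (1, 0)]            # ternary condition 10####
specific = [(0, 1), (1, 0), (2, 0)]   # ternary condition 100###

print(does_match(general, example_instance))   # True
print(does_match(specific, example_instance))  # False: attribute 2 is 1, not 0
print(does_match([], example_instance))        # True: an empty state matches everything
```

The more attributes a state leaves unspecified, the more instances it matches, which is what lets general rules cover large parts of the problem.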


### Step 4: Generate the correct set

From the match set, create a correct set by comparing the action or class of each classifier with the action or class of the current instance.

```python
# Create the correct set: every classifier in the match set whose action (class)
# equals the class of the current instance, which is stored as its last element.
# Correct classifiers also have their correct count, accuracy, and fitness updated.
def create_correct_set(match_set, instance):
    correct_set = []
    for classifier in match_set:
        if classifier['action'] == instance[-1]:
            correct_set.append(classifier)
            classifier['correct count'] += 1
            classifier['accuracy'] = classifier['correct count'] / classifier['match count']
            classifier['fitness'] = classifier['accuracy'] ** 5
    return correct_set

correct_set = create_correct_set(match_set, instance)
print(population)
print(correct_set)
```

```
[]
[]
```
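
Both `create_match_set` and `create_correct_set` set fitness to accuracy raised to the fifth power. The hypothetical helper below repeats that arithmetic for a classifier that has matched four instances, to show how sharply the exponent separates accurate classifiers from mediocre ones.

```python
# Hypothetical helper that repeats the bookkeeping done in create_match_set and
# create_correct_set: accuracy = correct count / match count, fitness = accuracy ** 5
def accuracy_and_fitness(match_count, correct_count, exponent=5):
    accuracy = correct_count / match_count
    return accuracy, accuracy ** exponent

for correct_count in (4, 3, 2):
    accuracy, fitness = accuracy_and_fitness(4, correct_count)
    print(f"{correct_count}/4 correct: accuracy={accuracy:.2f}, fitness={fitness:.4f}")
```

At 75% accuracy the fitness is already down to about 0.24, and at 50% accuracy it is 0.03125, so fitness-based selection strongly favors the most accurate rules.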


### Step 5: Covering

In most LCSs, the population starts out empty. Covering adds a classifier built from the current instance to the population whenever the correct set is empty. This is also the step that turns the raw instance data into the classifier dictionary. Because each attribute is specified at random, the covered state (and therefore the printed output) differs from run to run.

```python
import random

# Create a classifier dictionary that covers the current instance. Called when
# the correct set is empty. Each attribute is specified with probability
# `specificity`; otherwise it is left out of the state ("don't care").
def covering(instance, iteration, specificity):
    state = []
    action = instance[-1]
    for x in range(len(instance) - 1):
        if random.random() < specificity:
            state.append((x, instance[x]))
    # Build the classifier once, after the whole state has been generated
    classifier = {'state': state,
                  'action': action,
                  'numerosity': 1,
                  'match count': 1,
                  'correct count': 1,
                  'accuracy': 1,
                  'fitness': 1,
                  'birth iteration': iteration}
    return classifier

# Add a new classifier to the population
def update_population(classifier, population):
    population.append(classifier)

classifier = covering(instance, 1, specificity=.5)
update_population(classifier, population)
print(population)
```

```
[{'state': [(5, 0)], 'action': 0, 'numerosity': 1, 'match count': 1, 'correct count': 1, 'accuracy': 1, 'fitness': 1, 'birth iteration': 1}]
```
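
Covered states are stored as (index, value) tuples. To compare them with the ternary notation (`0`, `1`, and `#` for "don't care") used in most LCS literature, a pair of hypothetical conversion helpers can be sketched as follows.

```python
# Hypothetical helpers: convert between the (index, value) tuple encoding used
# in this notebook and a ternary string where '#' marks a "don't care" attribute
def to_ternary(state, n_attributes):
    specified = dict(state)
    return ''.join(str(specified[i]) if i in specified else '#'
                   for i in range(n_attributes))

def from_ternary(condition):
    return [(i, int(symbol)) for i, symbol in enumerate(condition) if symbol != '#']

print(to_ternary([(5, 0)], 6))   # #####0
print(from_ternary('1#0###'))    # [(0, 1), (2, 0)]
```

For example, the covered classifier printed above, with state `[(5, 0)]`, corresponds to the ternary condition `#####0`.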


### Step 5.1: Population Filling

Below is an example of a loop that goes through the training data and builds a population from the training instances. The loop takes a dataset to train on and a specificity parameter between 0 and 1. A specificity of 1 means every classifier state is fully specific, so the population ends up with essentially one classifier per unique training instance. With a specificity below 1, each attribute in the state of a covered classifier has a chance of being left unspecified. An unspecified attribute is equivalent to the # symbol in most LCS algorithms: the classifier "doesn't care" about that attribute's value. Here, states are encoded as a list of (index, value) tuples, and unspecified attributes are simply left out. For large datasets and complicated problems, this can give a significant speed and memory advantage over representing "don't care" attributes with #.

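```python
# Fill a population from the training data using covering only (no GA yet).
# Despite its name, this is a training pass. Line 0 of the file is assumed to be
# the header, so the loop starts at line 1.
def testing(data, specificity):
    population = initialize_population()
    length = get_data_length(data)
    for i in range(1, length):
        instance = get_instance(data, i)
        match_set = create_match_set(population, instance)
        correct_set = create_correct_set(match_set, instance)
        # Cover only when no matching classifier predicts the correct class
        if len(correct_set) == 0:
            classifier = covering(instance, i, specificity=specificity)
            update_population(classifier, population)
    return population
```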
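
The CSV file itself is not included here. Assuming `6Multiplexer_Data_Complete.csv` holds the complete 6-bit multiplexer problem (two address bits followed by four register bits, then the class), the same 64 instances can be generated in memory. The column order, the header names, and treating the first address bit as the most significant are all assumptions and may not match the original file; `multiplexer_6_rows` and `write_dataset` are hypothetical names.

```python
import csv
import itertools

# Hypothetical generator for the complete 6-bit multiplexer dataset.
# Assumption: attributes 0-1 are address bits (attribute 0 most significant),
# attributes 2-5 are register bits, and the class is the addressed register bit.
def multiplexer_6_rows():
    rows = []
    for bits in itertools.product([0, 1], repeat=6):
        address = 2 * bits[0] + bits[1]
        rows.append(list(bits) + [bits[2 + address]])
    return rows

# Write the rows to a CSV file with a header row (header names are made up)
def write_dataset(path, rows):
    header = ['A_0', 'A_1', 'R_0', 'R_1', 'R_2', 'R_3', 'Class']
    with open(path, 'w', newline='') as target:
        writer = csv.writer(target)
        writer.writerow(header)
        writer.writerows(rows)
```

The first generated row is all zeros, which agrees with the instance printed in Step 2. For example, `write_dataset('./6Multiplexer_generated.csv', multiplexer_6_rows())` (a hypothetical filename) followed by `testing('./6Multiplexer_generated.csv', specificity=1)` should produce one fully specific classifier per instance, since a fully specific classifier matches no other instance.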

### Step 6: Genetic Algorithm

The genetic algorithm is the heart of learning for the LCS. It introduces new rules to the population and evolves accurate, general rules that apply to the training data. The three main portions of the GA are selection, crossover, and mutation, applied in that order.

Selection picks two parent classifiers from the correct set. It is most often done in one of two ways: fitness-proportionate (roulette-wheel) selection or tournament selection. Proportionate selection seems the most logical at first, but it can significantly hinder learning. In proportionate selection, parents are selected with probability directly proportional to their fitness. During training, however, many classifiers often have similar, low accuracy while only a few have high accuracy, so the chance of picking a highly accurate classifier at random is small. This is usually visualized as a roulette wheel. If each classifier's slice of the wheel were sized by its fitness, one highly accurate classifier might take up 25% of the wheel while thousands of inaccurate classifiers take up the other 75%. Spinning the wheel to choose a parent would then pick an inaccurate classifier 75% of the time. There are ways around this, such as fitness sharing, but tournament selection is simpler and is used here. Tournament selection randomly draws a number of classifiers from the correct set and chooses the one with the highest fitness (equivalently here, the highest accuracy, since fitness is accuracy raised to the fifth power) as a parent. This is repeated for the second parent.
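
The notebook implements only tournament selection. For comparison, here is a minimal sketch of fitness-proportionate selection built on Python's `random.choices`, with a toy set reproducing the 25%/75% example above (`roulette_selection`, the `name` key, and the fitness values are all made up for illustration).

```python
import random

# Hypothetical sketch of fitness-proportionate (roulette-wheel) selection:
# each classifier's chance of being picked equals its share of the total fitness.
def roulette_selection(classifier_set):
    weights = [classifier['fitness'] for classifier in classifier_set]
    if sum(weights) == 0:
        # All-zero weights leave nothing to be proportional to: pick uniformly
        return random.choice(classifier_set)
    return random.choices(classifier_set, weights=weights, k=1)[0]

# Toy set: one accurate classifier holds 1.0 of a total fitness of 4.0 (25%)
toy_set = [{'name': 'accurate', 'fitness': 1.0}] + \
          [{'name': 'weak', 'fitness': 0.1} for _ in range(30)]
spins = 10000
picks = sum(roulette_selection(toy_set)['name'] == 'accurate' for _ in range(spins))
print(picks / spins)
```

Over many spins the accurate classifier should be chosen roughly a quarter of the time, so an inaccurate parent is picked about three times out of four, as described above.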

Crossover exchanges attributes between the parent classifier states to create potentially new classifiers. The three main crossover mechanisms are uniform, single-point, and two-point crossover. Uniform crossover goes one attribute at a time and randomly exchanges the values between the two parents. It introduces the most diversity into the population but has two major drawbacks. First, it can be more costly to compute and more involved to code than the other two. Second, it can disrupt learning significantly: if two very accurate classifiers are chosen as parents, uniform crossover can scramble their attributes into offspring that look nothing like either parent. Thus, single-point or two-point crossover is traditionally used. Single-point crossover chooses a random index in the parent states and swaps everything after that point. Each offspring therefore keeps at least half of its attributes from one parent, in their original positions, while gaining attributes from the other parent. Two-point crossover does the same thing but chooses two indices and swaps only the segment between them, so each offspring keeps both ends of one parent's state.
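
A sketch of single-point and two-point crossover for the (index, value) state encoding, assuming both parents have the same number of attributes (`n_attributes`). Because "don't care" attributes are absent from the list, swapping a segment means moving whichever specified attributes fall inside it. `swap_segment`, `single_point_crossover`, and `two_point_crossover` are hypothetical names.

```python
import random

# Swap all specified attributes with start <= index < end between two states
def swap_segment(state_1, state_2, start, end):
    child_1 = [(i, v) for i, v in state_1 if not start <= i < end]
    child_2 = [(i, v) for i, v in state_2 if not start <= i < end]
    child_1 += [(i, v) for i, v in state_2 if start <= i < end]
    child_2 += [(i, v) for i, v in state_1 if start <= i < end]
    # Indices are unique within a state, so sorting orders by index
    return sorted(child_1), sorted(child_2)

# Single point: swap everything from a random interior index to the end
def single_point_crossover(state_1, state_2, n_attributes):
    point = random.randint(1, n_attributes - 1)
    return swap_segment(state_1, state_2, point, n_attributes)

# Two point: swap only the segment between two distinct interior indices,
# so both ends of each parent are kept (needs n_attributes >= 3)
def two_point_crossover(state_1, state_2, n_attributes):
    start, end = sorted(random.sample(range(1, n_attributes), 2))
    return swap_segment(state_1, state_2, start, end)

parent_state_1 = [(0, 1), (2, 0), (5, 1)]   # ternary 1#0##1
parent_state_2 = [(1, 1), (3, 0)]           # ternary #1#0##
print(single_point_crossover(parent_state_1, parent_state_2, 6))
print(two_point_crossover(parent_state_1, parent_state_2, 6))
```

Crossover only moves attributes between the two offspring, so together they always contain exactly the specified attributes of the two parents.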

Lastly, mutation is applied to the offspring of the two parent classifiers. Each attribute of an offspring's state is mutated with a small probability, which either converts a generalized ("don't care") attribute into a specified one or vice versa. When a generalized attribute becomes specified, its value is set to match the current training instance.
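
A minimal mutation sketch for the same encoding, assuming a per-attribute mutation probability `mutation_rate` (a hypothetical parameter name) and that the class is the last element of the instance. Only the condition is mutated here; some LCS implementations also mutate the action, which this sketch leaves out.

```python
import random

# Hypothetical mutation operator for the (index, value) state encoding.
# Each attribute flips with probability mutation_rate: a specified attribute
# becomes "don't care", and a "don't care" attribute becomes specified with the
# value from the current training instance, so an offspring that matched the
# instance still matches it after mutation.
def mutate(state, instance, mutation_rate):
    specified = dict(state)
    for index in range(len(instance) - 1):   # the last element is the class
        if random.random() < mutation_rate:
            if index in specified:
                del specified[index]
            else:
                specified[index] = instance[index]
    return sorted(specified.items())

print(mutate([(0, 1), (3, 1)], [1, 0, 1, 1, 0, 0, 0], mutation_rate=0.1))
```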

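```python
import random

# Tournament selection: pick one parent from a set of classifiers (normally the
# correct set) by sampling a random tournament and keeping its fittest member.
# Call it twice to get two parents.
def tournament_selection(classifier_set, tournament_size):
    if not classifier_set:
        return None
    # Sample without replacement and never request more classifiers than exist
    size = min(tournament_size, len(classifier_set))
    tournament = random.sample(classifier_set, size)
    return max(tournament, key=lambda classifier: classifier['fitness'])

# Toy classifiers with made-up fitness values, for demonstration only
example_set = [{'action': 0, 'fitness': f} for f in (0.1, 0.9, 0.3, 0.5)]
parent_1 = tournament_selection(example_set, 3)
parent_2 = tournament_selection(example_set, 3)
print(parent_1, parent_2)
```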
