# Genetic Agents

Let's win some cartpole!

Genetic algorithms do well here, and they solve the environment pretty consistently. I try some variations to see how they affect performance.

In [1]:
# Display GIFs in Jupyter
from IPython.display import HTML

# OpenAI gym
import gym

# Import local script
import agents

# numpy
import numpy as np

# To speed up the algorithm
from multiprocessing import Pool
n_jobs = 4 # Set your number of cores here

I re-use the function from before.

In [2]:
def trial_agent(agent, trials=100, limit=1000):
    env = gym.make(agent.game)

    scores = []
    for i in range(trials):
        observation = env.reset()
        score = 0
        for t in range(limit):
            action = agent.predict(observation)
            observation, reward, done, info = env.step(action)
            if done:
                break
            score += reward
        scores.append(score)
        
    data_dict = {
        "agent" : agent, 
        "weights" : agent.w, 
        "pedigree" : agent.pedigree, 
        "minimum" : min(scores), 
        "maximum" : max(scores), 
        "mean" : sum(scores)/len(scores)
    }
    
    env.close()
    
    return data_dict

## The genetic algorithm

Below is my implementation of a genetic algorithm. I designed this by reading the Wikipedia article and talking to a few people. It's definitely not the best out there, but it works fine for cartpole.

When you keep the top agents from a generation, this is called [elitism](https://en.wikipedia.org/wiki/Genetic_algorithm#Elitism). It's a way to ensure that the next generation doesn't ruin itself with mutations and bad parenting. The "elite" from a round of testing stick around to maintain the status quo.

Next the parents are [selected](https://en.wikipedia.org/wiki/Selection_(genetic_algorithm%29) to create the offspring. This [genetic operation](https://en.wikipedia.org/wiki/Crossover_(genetic_algorithm%29) is the main part of the algorithm. By mixing genes together, you're effectively searching the parameter space for a solution. Wikipedia uses a crossover method, but I just mix them randomly, gene by gene. My agents mix their DNA by drawing genes from a hat, I guess.
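To make the "genes from a hat" idea concrete, here is a minimal standalone sketch of fitness-proportionate selection followed by per-gene random mixing. The function names `select_parents` and `mix_genes` and the toy population are made up for illustration, and the sketch assumes non-negative fitness values with a positive sum.

```python
import numpy as np

def select_parents(fitness, n_parents, rng):
    """Pick distinct parent indices with probability proportional to fitness.

    Assumes all fitness values are non-negative and sum to more than zero.
    """
    fitness = np.asarray(fitness, dtype=float)
    probs = fitness / fitness.sum()
    return rng.choice(len(fitness), size=n_parents, replace=False, p=probs)

def mix_genes(genomes, rng):
    """Build one child by drawing each gene from a randomly chosen parent."""
    genomes = np.asarray(genomes)
    n_parents, genome_size = genomes.shape
    # For every gene position, decide which parent donates it
    donors = rng.integers(0, n_parents, size=genome_size)
    return genomes[donors, np.arange(genome_size)]

rng = np.random.default_rng(0)

# Toy population: three agents with three genes each (hypothetical values)
population = np.array([[0.1, 0.2, 0.3],
                       [1.1, 1.2, 1.3],
                       [2.1, 2.2, 2.3]])
fitness = [10.0, 50.0, 200.0]

parents = select_parents(fitness, n_parents=2, rng=rng)
child = mix_genes(population[parents], rng)
print("parents:", parents, "child:", child)
```

Every gene of the child comes from exactly one of the selected parents at the same position, so no new gene values appear at this stage.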

[Mutations](https://en.wikipedia.org/wiki/Mutation_(genetic_algorithm%29) are rare, but they're a way of developing new genes that don't exist in the population. This should help the agents escape from a local optimum.
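As a rough illustration of what a mutation does to a genome, the sketch below adds Gaussian noise to a small random fraction of genes. The helper name `mutate` is hypothetical; the parameter names `mutation_rate` and `mutation_amount` mirror the ones used in this notebook's algorithm, with the same default values.

```python
import numpy as np

def mutate(weights, mutation_rate, mutation_amount, rng):
    """Add Gaussian noise to each gene with probability mutation_rate."""
    weights = np.asarray(weights, dtype=float)
    # Boolean mask marking which genes mutate
    mask = rng.random(weights.shape) < mutation_rate
    noise = rng.normal(0.0, mutation_amount, size=weights.shape)
    return weights + mask * noise, mask

rng = np.random.default_rng(1)
genome = np.zeros(1000)
mutated, mask = mutate(genome, mutation_rate=0.01, mutation_amount=0.5, rng=rng)
print(f"{mask.sum()} of {genome.size} genes mutated")
```

Genes outside the mask are returned unchanged, so with a low rate most children are pure mixtures of their parents.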

One last observation: because the games are random, you need many trials to get a reliable mean score. With only a few trials, a handful of lucky games can let bad genes get passed on. With more trials, this is less likely.
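The effect of the trial count can be sketched with a quick simulation. The score distribution below (uniform between 10 and 200 per game) is purely an assumption for illustration, not measured CartPole data; the point is only that the mean over more trials fluctuates less.

```python
import numpy as np

rng = np.random.default_rng(2)

def spread_of_mean(n_trials, n_repeats=2000):
    """Standard deviation of the mean score across repeated evaluations."""
    # Hypothetical per-game scores: uniform between 10 and 200
    scores = rng.uniform(10, 200, size=(n_repeats, n_trials))
    return scores.mean(axis=1).std()

few = spread_of_mean(5)
many = spread_of_mean(100)
print(f"std of the mean over   5 trials: {few:.1f}")
print(f"std of the mean over 100 trials: {many:.1f}")
```

The spread shrinks roughly with the square root of the number of trials, which is why 100 trials per agent gives a much steadier fitness estimate than a handful.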

In [3]:
def genetic_algorithm(results, old=5, new=95, n_parents=2, generations=25, 
                      mutation_rate=0.01, mutation_amount=0.5, order=1, max_score=499.0, 
                      game="CartPole-v1"):
    for round in range(generations):
        # Sort agents by score (fitness)
        top_scores = sorted(results, key=lambda x: x["mean"], reverse=True)

        # The survival of the fittest. Wikipedia calls this "elitism".
        # The top agents of a generation are carried over to the next
        survivors = top_scores[:old]

        # To start breeding new agents, I'll mix weights (genes)
        weight_shape = top_scores[0]["weights"].shape
        gene_pool = [list(i["weights"].flatten()) for i in top_scores]
        pedigree_list = [i["pedigree"] for i in top_scores]
        genome_size = top_scores[0]["weights"].size

        # Scores can be negative, so shift them all to be non-negative
        # (for non-negative scores this still adds the minimum, as before)
        # They also need to sum to 1 for random sampling
        min_score = min([i["mean"] for i in top_scores])
        shift = abs(min_score)
        sum_score = sum([i["mean"]+shift for i in top_scores])
        probs = [(i["mean"]+shift)/sum_score for i in top_scores]

        # For each new agent, randomly select parents
        # Higher-fitness agents are likelier to sire new agents
        children = []
        for birth in range(new):
            parents = np.random.choice(np.arange(len(gene_pool)), 
                             size=n_parents, 
                             replace=False, 
                             p=probs)

            # The offspring get a mix of each parent's weights
            # The weights (genes) are simply copied over
            mix = np.random.randint(0, high=n_parents, size=genome_size)

            weights = []
            pedigree = []
            for i in range(genome_size):
                weights.append(gene_pool[parents[mix[i]]][i])
                pedigree.append(pedigree_list[parents[mix[i]]][i])
                # A mutation happens rarely and adds a bit of noise to a gene
                if np.random.random() < mutation_rate:
                    # Draw a scalar directly instead of converting a 1-element array
                    weights[i] += float(np.random.normal(0, mutation_amount))
                    pedigree[i] += "M"

            children.append({"weights" : weights, "pedigree" : pedigree})

        # Elitism: the top agents survive to fight another day
        new_agents = [i["agent"] for i in survivors]

        # The offspring are added in
        # With the pedigree variable their ancestors are tracked
        for child in children:
            new_agents.append(
                agents.LinearAgent(
                    np.array(child["weights"]).reshape(weight_shape), 
                    pedigree=child["pedigree"],
                    order=order,
                    game=game))

        # Trial the agents using multiple CPU cores
        # The context manager cleans up the worker processes afterwards
        with Pool(n_jobs) as p:
            results = p.map(trial_agent, new_agents)
        
        results = sorted(results, key=lambda x: x["mean"], reverse=True)

        print(f"[{round+1:3}] Population average: {sum([i['mean'] for i in results])/len(results):5.1f}")
        print(f"[{round+1:3}] Best mean score:    {results[0]['mean']:5.1f}, Pedigree: {'-'.join(results[0]['pedigree'])}")
        print()
        
        # End early if maximum is reached
        if results[0]['mean'] >= max_score:
            print(f"[{round+1:3}] Best score reached, ending early")
            break
    return results

## First-order model

The process I use from now on takes this form: I randomly generate an initial population, I breed them for a while, and I display the best agent.

Below you can see that the original population isn't that bad. It sometimes already solves the environment, but other times it fails miserably.

In [4]:
results = []

for a in range(25):
    results.append(trial_agent(agents.LinearAgent(None, id=a)))

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_simple_test.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1b6263e48>, 'weights': array([[ 0.00482662],
       [-0.58437806],
       [-0.08999562],
       [ 0.89924846],
       [ 0.63816057]]), 'pedigree': ['13', '13', '13', '13', '13'], 'minimum': 69.0, 'maximum': 479.0, 'mean': 183.59}


This runs the genetic algorithm. Each round you can see the population's average score (fitness). For the fittest agent you can see its mean score and its pedigree. Each agent of the original population is given an ID number, and each gene is labeled with the ID of the agent it came from. If a gene mutates, an M is appended to its ID.
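A pedigree can be summarised with a few lines of Python. This is a small sketch with a hypothetical helper, `summarize_pedigree`, which assumes the labelling described above: one ancestor ID per gene, with one `M` appended per mutation.

```python
from collections import Counter

def summarize_pedigree(pedigree):
    """Count genes per original ancestor and the total number of mutations."""
    # Strip trailing M markers to recover the ancestor ID of each gene
    ancestors = Counter(gene.rstrip("M") for gene in pedigree)
    mutations = sum(gene.count("M") for gene in pedigree)
    return ancestors, mutations

ancestors, mutations = summarize_pedigree(["13", "1", "11", "13", "13M"])
print(ancestors.most_common(), "mutations:", mutations)
```

For a pedigree like `13-1-11-13-13M`, this shows that three of the five genes trace back to agent 13 and that one of them has mutated once.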

In [5]:
results = genetic_algorithm(results, generations=25)

[  1] Population average:  52.0
[  1] Best mean score:    488.3, Pedigree: 13-11-11-13-13M

[  2] Population average: 151.7
[  2] Best mean score:    499.0, Pedigree: 13-1-11-13-13M

[  2] Best score reached, ending early


It's very likely the scores will increase consistently each round. Usually the environment is solved within the 25 rounds.

In the example I ran, the cart drifts toward the edge of the screen but doesn't get all the way there within the 500 time steps. This counts as a win, even if it isn't a very elegant one.

In [6]:
winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_simple.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a919dac8>, 'weights': array([[0.00482662],
       [0.12806551],
       [0.69777549],
       [0.89924846],
       [1.053473  ]]), 'pedigree': ['13', '1', '11', '13', '13M'], 'minimum': 499.0, 'maximum': 499.0, 'mean': 499.0}


## Second-order agent

Is there an advantage to making the agent's logistic regression a second-order one? This would allow the agent to develop more sophisticated policies.

It doesn't seem to make much of a difference.
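The `agents` module isn't shown here, so the following is only a guess at what a second-order agent might compute: the observation is expanded into all monomials up to the given degree, and the action is the sign of their weighted sum. The feature ordering, the bias position, and the names `polynomial_features` and `predict` are all assumptions. What does line up with the outputs in this notebook is the weight count: 5 for first order and 15 for second order with a 4-value observation.

```python
from itertools import combinations_with_replacement

import numpy as np

def polynomial_features(observation, order):
    """Expand an observation into a bias term plus all monomials up to `order`."""
    obs = np.asarray(observation, dtype=float)
    features = [1.0]  # bias term (assumed)
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(obs.size), degree):
            features.append(np.prod(obs[list(idx)]))
    return np.array(features)

def predict(weights, observation, order):
    """Return action 1 if the weighted feature sum is positive, else 0."""
    x = polynomial_features(observation, order)
    return int(x @ np.ravel(weights) > 0)

obs = [0.01, -0.02, 0.03, 0.04]  # made-up CartPole-like observation
print(len(polynomial_features(obs, order=1)))  # number of first-order features
print(len(polynomial_features(obs, order=2)))  # number of second-order features
```

A logistic regression picks the action whose probability exceeds 0.5, which is the same as checking whether the weighted sum is positive, so the sigmoid itself is left out of the sketch.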

In [7]:
results = []

for a in range(25):
    results.append(trial_agent(agents.LinearAgent(None, order=2, id=a)))

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_complex_test.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a92855c0>, 'weights': array([[-0.02640773],
       [ 0.57012762],
       [-0.29486028],
       [-0.0379439 ],
       [ 0.78203947],
       [ 0.74831177],
       [ 0.63544248],
       [ 0.64601175],
       [-0.43026738],
       [-0.81297677],
       [-0.20149993],
       [-0.9208676 ],
       [ 0.51046582],
       [-0.72095141],
       [ 0.06373336]]), 'pedigree': ['13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13', '13'], 'minimum': 36.0, 'maximum': 189.0, 'mean': 84.0}


In [8]:
results = genetic_algorithm(results, generations=25, order=2)

[  1] Population average:  24.0
[  1] Best mean score:    211.6, Pedigree: 13-13-13-19-13-13-13-19-13-13-19-19-19-19-13

[  2] Population average:  44.6
[  2] Best mean score:    499.0, Pedigree: 13-13-11-19M-13-13-11-19-11-13-13-19-19-13-13

[  2] Best score reached, ending early


In [9]:
winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_complex.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a8079940>, 'weights': array([[-0.02640773],
       [ 0.57012762],
       [ 0.47916927],
       [ 1.63387038],
       [ 0.78203947],
       [ 0.74831177],
       [ 0.63451236],
       [-0.9783834 ],
       [ 0.5089603 ],
       [-0.81297677],
       [-0.20149993],
       [-0.27187147],
       [ 0.57288765],
       [-0.72095141],
       [ 0.06373336]]), 'pedigree': ['13', '13', '11', '19M', '13', '13', '11', '19', '11', '13', '13', '19', '19', '13', '13'], 'minimum': 499.0, 'maximum': 499.0, 'mean': 499.0}


## A bad initial gene pool?

With a good initial batch of agents, the genetic algorithm can just fine-tune these good guesses. But what about a bad batch? I'm curious to know if mutations play a bigger role here.

By selecting the bottom 10% of an initial population of 250, I get agents that topple the pole as quickly as they can. It's even hard to see it fall in the GIF.

This bad initial batch doesn't seem to overly harm the genetic algorithm. I try both no-mutations and high-mutations; both work well.

In [10]:
results = []

# Run 250 random agents
for a in range(250):
    results.append(trial_agent(agents.LinearAgent(None, id=a)))

# Select the bottom tenth
bottom_tenth = sorted(results, key=lambda x: x["mean"], reverse=True)[-25:]

winner = bottom_tenth[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('bad_genetic_simple_test.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a807fb70>, 'weights': array([[ 0.52200294],
       [ 0.23513251],
       [ 0.95194245],
       [-0.22501347],
       [-0.9928648 ]]), 'pedigree': ['148', '148', '148', '148', '148'], 'minimum': 7.0, 'maximum': 10.0, 'mean': 8.28}


### No mutation

In [11]:
results = genetic_algorithm(bottom_tenth, generations=25, mutation_rate=0.0)

[  1] Population average:   8.4
[  1] Best mean score:     10.5, Pedigree: 101-101-177-101-101

[  2] Population average:   8.5
[  2] Best mean score:     10.8, Pedigree: 101-101-177-101-101

[  3] Population average:   8.8
[  3] Best mean score:     28.2, Pedigree: 237-48-117-117-221

[  4] Population average:   9.6
[  4] Best mean score:     39.3, Pedigree: 237-48-117-117-118

[  5] Population average:  10.5
[  5] Best mean score:     42.7, Pedigree: 237-48-117-117-118

[  6] Population average:  13.5
[  6] Best mean score:     47.9, Pedigree: 237-48-117-48-118

[  7] Population average:  18.9
[  7] Best mean score:    103.5, Pedigree: 237-48-194-48-194

[  8] Population average:  24.6
[  8] Best mean score:    104.3, Pedigree: 237-118-194-48-194

[  9] Population average:  31.8
[  9] Best mean score:    106.2, Pedigree: 237-48-194-48-194

[ 10] Population average:  44.1
[ 10] Best mean score:    152.3, Pedigree: 237-177-194-117-194

[ 11] Population average:  56.0
[ 11] Best mean sc

In [12]:
winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('bad_genetic_simple.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a8076860>, 'weights': array([[-0.06521825],
       [-0.00775237],
       [ 0.43074136],
       [ 0.39855194],
       [ 0.36722224]]), 'pedigree': ['237', '124', '194', '48', '194'], 'minimum': 499.0, 'maximum': 499.0, 'mean': 499.0}


### High mutation

In [13]:
results = genetic_algorithm(bottom_tenth, generations=25, mutation_rate=0.1)

[  1] Population average:   8.5
[  1] Best mean score:     19.3, Pedigree: 128-34-128-128-34

[  2] Population average:  12.9
[  2] Best mean score:    355.3, Pedigree: 237-82M-82-82-34

[  3] Population average:  31.7
[  3] Best mean score:    358.6, Pedigree: 237-82M-82-82-34

[  4] Population average:  94.8
[  4] Best mean score:    499.0, Pedigree: 237-34-82-82-34M

[  4] Best score reached, ending early


In [14]:
winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mutants_genetic_simple.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a91cf630>, 'weights': array([[-0.06521825],
       [ 0.24005704],
       [ 0.68586647],
       [ 0.91697999],
       [ 0.70325394]]), 'pedigree': ['237', '34', '82', '82', '34M'], 'minimum': 499.0, 'maximum': 499.0, 'mean': 499.0}


## More than two parents

[Wikipedia mentions](https://en.wikipedia.org/wiki/Genetic_algorithm#Genetic_operators) that having more than two parents is beneficial for the genetic algorithm.

In the example below, both cases reach the same solution. The 5-parent case reaches it faster.

In [18]:
initial = []

for a in range(25):
    initial.append(trial_agent(agents.LinearAgent(None, order=2, id=a)))

winner = sorted(initial, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_parents_test.gif')}'>")

{'agent': <agents.LinearAgent object at 0x7fd1a91b4d68>, 'weights': array([[-0.00606296],
       [ 0.06950005],
       [-0.69813354],
       [ 0.52751006],
       [ 0.55149959],
       [ 0.99437783],
       [-0.82551131],
       [ 0.71378451],
       [-0.70759282],
       [ 0.42861422],
       [ 0.40050821],
       [-0.91465721],
       [ 0.73345818],
       [ 0.50117301],
       [-0.35055604]]), 'pedigree': ['11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11', '11'], 'minimum': 43.0, 'maximum': 499.0, 'mean': 211.01}


### 2 parents

In [19]:
results = genetic_algorithm(initial, order=2, generations=25)

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_2_parents.gif')}'>")

[  1] Population average:  42.7
[  1] Best mean score:    268.7, Pedigree: 11-17-17-11-11-11-11-17-17-11-17-11-11-11-17

[  2] Population average:  98.4
[  2] Best mean score:    316.1, Pedigree: 11-11-5-11-11-23-11-5-11-5-11-23-11-11-5

[  3] Population average: 146.0
[  3] Best mean score:    350.9, Pedigree: 11-11-3-11-11-18-18-11-11-11-6-3-11-11-3

[  4] Population average: 203.5
[  4] Best mean score:    472.5, Pedigree: 11-7-7-11-7-5-7-5-11-6-18-5-5-5-6

[  5] Population average: 227.0
[  5] Best mean score:    469.1, Pedigree: 11-7-7-11-7-11-7-5-11-6-11-5-5-5-6

[  6] Population average: 237.3
[  6] Best mean score:    494.5, Pedigree: 11-7-7-11-7-5-3-5-7-11-7-5-11-23-11M

[  7] Population average: 251.1
[  7] Best mean score:    497.7, Pedigree: 11-7-7-11-7-5-3-5-7-11-7-5-11-23-11M

[  8] Population average: 264.1
[  8] Best mean score:    499.0, Pedigree: 11-7-7-11-7-5-11-5-17-11-7-5-5-11-6

[  8] Best score reached, ending early
{'agent': <agents.LinearAgent object at 0x7fd1a

### 5 parents

In [20]:
results = genetic_algorithm(initial, order=2, n_parents=5, generations=25)

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('genetic_5_parents.gif')}'>")

[  1] Population average:  26.5
[  1] Best mean score:    229.4, Pedigree: 11-11-11-11-11-11-11-11-11-11-11-11-11-11-11

[  2] Population average:  51.5
[  2] Best mean score:    239.0, Pedigree: 11-11-11-11-11-11-11-11-11-11-11-11-11-11-11

[  3] Population average: 105.9
[  3] Best mean score:    376.9, Pedigree: 11-5M-23-12M-21-11-23-22-14-11-22-11-22-5-5

[  4] Population average: 154.5
[  4] Best mean score:    499.0, Pedigree: 11-16-7-12-3-6-11-21-9-7-9-5-22-22-11

[  4] Best score reached, ending early
{'agent': <agents.LinearAgent object at 0x7fd1a37f3da0>, 'weights': array([[-0.00606296],
       [ 0.39127394],
       [ 0.62161606],
       [ 0.91430088],
       [ 0.791029  ],
       [ 0.76946297],
       [-0.82551131],
       [ 0.70465431],
       [-0.0064681 ],
       [ 0.47557238],
       [-0.96687173],
       [-0.45077262],
       [ 0.98306909],
       [-0.60113155],
       [-0.35055604]]), 'pedigree': ['11', '16', '7', '12', '3', '6', '11', '21', '9', '7', '9', '5', '22', '