# Driving a Car Over the Mountain

Cartpole is pretty easy, but mountain-car is hard. According to [Reddit](https://www.reddit.com/r/MachineLearning/comments/67fqv8/da3c_performs_badly_in_mountain_car/), mountain-car is difficult because the agent is only rewarded for a win. The gym [wiki page](https://github.com/openai/gym/wiki/MountainCar-v0) also explains this. Until your car reaches the little flag, you receive a -1 penalty for every action you take, so agents may be tempted to give up and chill at the bottom of the valley.

In order to solve the mountain-car game, the agent has to build momentum by going left and right. This is like how you can dislodge your car from being stuck in snow by rocking it forwards and backwards. Agents have to experiment a little to figure it out.
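
To see why rocking matters, here is a minimal sketch that does not need gym at all. It assumes the update rule and constants of gym's `MountainCar-v0` (force 0.001, gravity 0.0025, flag at position 0.5) and, unlike gym, always starts the car at a fixed position of -0.5. The helper names `step`, `steps_to_flag`, `always_right` and `rocking` are made up for this illustration.

```python
import math

def step(position, velocity, action):
    """One physics step (action: 0 = push left, 1 = idle, 2 = push right)."""
    velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    position = max(-1.2, min(0.6, position))
    if position == -1.2 and velocity < 0:
        velocity = 0.0  # the left wall stops the car
    return position, velocity

def steps_to_flag(policy, start=-0.5, limit=200):
    """Return the number of steps needed to reach the flag, or None on timeout."""
    position, velocity = start, 0.0
    for t in range(1, limit + 1):
        position, velocity = step(position, velocity, policy(position, velocity))
        if position >= 0.5:
            return t
    return None

def always_right(position, velocity):
    """Naive policy: floor it towards the flag."""
    return 2

def rocking(position, velocity):
    """Momentum policy: always push in the direction the car is already moving."""
    return 2 if velocity >= 0 else 0

print("always right:", steps_to_flag(always_right))
print("rocking:     ", steps_to_flag(rocking))
```

Under these assumptions the engine is too weak to climb the hill directly, so `always_right` times out, while `rocking` builds up enough speed to reach the flag within the 200-step limit.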

The helper functions below are the same as in the previous notebook.

In [1]:
# Display GIFs in Jupyter
from IPython.display import HTML

# OpenAI gym
import gym

# Import local script
import agents

# numpy
import numpy as np

# To speed up the algorithm
from multiprocessing import Pool
n_jobs = 4 # Set your number of cores here

In [2]:
def trial_agent(agent, trials=25, limit=200):
    env = gym.make(agent.game)

    scores = []
    for i in range(trials):
        observation = env.reset()
        score = 0
        for t in range(limit):
            action = agent.predict(observation)
            observation, reward, done, info = env.step(action)
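            if done:
                # Note: breaking before the addition skips the final step's
                # reward, so a full 200-step episode scores -199, not -200
                break
            score += reward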
        scores.append(score)
        
    data_dict = {
        "agent" : agent, 
        "weights" : agent.w, 
        "pedigree" : agent.pedigree, 
        "minimum" : min(scores), 
        "maximum" : max(scores), 
        "mean" : sum(scores)/len(scores)
    }
    
    env.close()
    
    return data_dict

In [3]:
def genetic_algorithm(results, old=5, new=95, n_parents=2, generations=25, 
                      mutation_rate=0.01, mutation_amount=0.5, order=1, max_score=499.0, 
                      game="CartPole-v1"):
    for round in range(generations):
        # Sort agents by score (fitness)
        top_scores = sorted(results, key=lambda x: x["mean"], reverse=True)

        # The survival of the fittest. Wikipedia calls this "elitism".
        # The top agents of a generation are carried over to the next
        survivors = top_scores[:old]

        # To start breeding new agents, I'll mix weights (genes)
        weight_shape = top_scores[0]["weights"].shape
        gene_pool = [list(i["weights"].flatten()) for i in top_scores]
        pedigree_list = [i["pedigree"] for i in top_scores]
        genome_size = top_scores[0]["weights"].size

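        # Scores can be negative, so here I shift them to be positive:
        # subtract the lowest score and add 1 so that even the worst agent
        # keeps a non-zero chance (and equal scores don't divide by zero)
        # They also need to sum to 1 for random sampling
        min_score = min([i["mean"] for i in top_scores])
        shifted = [i["mean"] - min_score + 1.0 for i in top_scores]
        sum_score = sum(shifted)
        probs = [s/sum_score for s in shifted]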

        # For each new agent, randomly select parents
        # Higher-fitness agents are likelier to sire new agents
        children = []
        for birth in range(new):
            parents = np.random.choice(np.arange(len(gene_pool)), 
                             size=n_parents, 
                             replace=False, 
                             p=probs)

            # The offspring get a mix of each parent's weights
            # The weights (genes) are simply copied over
            mix = np.random.randint(0, high=n_parents, size=genome_size)

            weights = []
            pedigree = []
            for i in range(genome_size):
                weights.append(gene_pool[parents[mix[i]]][i])
                pedigree.append(pedigree_list[parents[mix[i]]][i])
                # A mutation happens rarely and adds a bit of noise to a gene
                if np.random.random() < mutation_rate:
                    weights[i] += float(np.random.normal(0, mutation_amount))
                    pedigree[i] += "M"

            children.append({"weights" : weights, "pedigree" : pedigree})

        # Elitism: the top agents survive to fight another day
        new_agents = [i["agent"] for i in survivors]

        # The offspring are added in
        # With the pedigree variable their ancestors are tracked
        for child in children:
            new_agents.append(
                agents.LinearAgent(
                    np.array(child["weights"]).reshape(weight_shape), 
                    pedigree=child["pedigree"],
                    order=order,
                    game=game))

        # Trial the agents using multiple CPU cores
        p = Pool(n_jobs)
        results = p.map(trial_agent, new_agents)
        p.close()
        
        results = sorted(results, key=lambda x: x["mean"], reverse=True)

        print(f"[{round+1:3}] Population average: {sum([i['mean'] for i in results])/len(results):5.1f}")
        print(f"[{round+1:3}] Best mean score:    {results[0]['mean']:5.1f}, Pedigree: {'-'.join(results[0]['pedigree'])}")
        print()
        
        # End early if maximum is reached
        if results[0]['mean'] >= max_score:
            print(f"[{round+1:3}] Best score reached, ending early")
            break
    return results
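
The breeding step is easier to follow in isolation. The sketch below pulls the crossover and mutation logic out into a standalone function and runs it on two toy parents. The function name `crossover`, the `rng` argument and the parent ids "7" and "42" are made up for this example; it mirrors the idea used in `genetic_algorithm` rather than replacing it.

```python
import numpy as np

def crossover(parent_weights, parent_pedigrees, mutation_rate=0.01,
              mutation_amount=0.5, rng=None):
    """Uniform crossover: each gene is copied from a randomly chosen parent."""
    rng = np.random.default_rng() if rng is None else rng
    n_parents = len(parent_weights)
    genome_size = len(parent_weights[0])

    # For every gene, pick which parent it comes from
    mix = rng.integers(0, n_parents, size=genome_size)

    weights, pedigree = [], []
    for i in range(genome_size):
        weights.append(float(parent_weights[mix[i]][i]))
        pedigree.append(parent_pedigrees[mix[i]][i])
        # Rare mutation: add Gaussian noise and mark the gene with "M"
        if rng.random() < mutation_rate:
            weights[i] += float(rng.normal(0, mutation_amount))
            pedigree[i] += "M"
    return weights, pedigree

# Two toy parents with three genes each
mum = [0.1, 0.2, 0.3]
dad = [-1.0, -2.0, -3.0]
child_w, child_p = crossover([mum, dad], [["7"] * 3, ["42"] * 3],
                             mutation_rate=0.0, rng=np.random.default_rng(0))
print(child_w, child_p)
```

With `mutation_rate=0.0` every gene in the child is an exact copy from one of the two parents, and the pedigree records which one. Raising the rate makes some entries pick up an "M" suffix, which is how mutated genes show up in the logs further down.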

## Simple agent

This eventually worked with a bit of patience. My initial failures were from trying [continuous-mountain-car](https://github.com/openai/gym/wiki/MountainCarContinuous-v0) first, which was a bad idea. Once I changed my agent into a softmax regression, the non-continuous mountain-car was fine.
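
The `agents` module is not shown in this notebook, so the following is only a rough sketch of what a softmax regression policy could look like, not the actual `LinearAgent` code. It assumes a 3x3 weight matrix (matching the shapes printed below), a bias term prepended to the two observations, and a greedy pick of the most likely action; the names `softmax` and `predict` are illustrative.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def predict(weights, observation):
    """Return the most probable action and the action probabilities.

    weights: (n_features + 1, n_actions) array; the first row acts as a bias.
    """
    features = np.concatenate(([1.0], observation))  # prepend a bias term
    probs = softmax(features @ weights)
    return int(np.argmax(probs)), probs

# Assumed layout: 3 inputs (bias, position, velocity) x 3 actions
weights = np.random.uniform(-1, 1, size=(3, 3))
action, probs = predict(weights, np.array([-0.5, 0.0]))
print(action, probs)
```

Each column of the matrix scores one of the three discrete actions (left, idle, right), which is why this setup suits the discrete game better than the continuous one.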

In [4]:
results = []

for a in range(100):
    results.append(trial_agent(agents.LinearAgent(None, id=a, order=1, game="MountainCar-v0")))

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mountaincar_0.gif', episodes=5, limit=250)}'>")

{'agent': <agents.LinearAgent object at 0x7f8560b8a240>, 'weights': array([[-0.08561069, -0.54933713,  0.87068654],
       [ 0.63230176, -0.74579003, -0.04338294],
       [-0.50436285,  0.11389632,  0.64390371]]), 'pedigree': ['0', '0', '0', '0', '0', '0', '0', '0', '0'], 'minimum': -199.0, 'maximum': -199.0, 'mean': -199.0}


In [5]:
results = genetic_algorithm(results, generations=10, max_score=-120, mutation_amount=5.0, game="MountainCar-v0")

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mountaincar_10.gif', episodes=5, limit=250)}'>")

[  1] Population average: -199.0
[  1] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  2] Population average: -199.0
[  2] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  3] Population average: -199.0
[  3] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  4] Population average: -199.0
[  4] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  5] Population average: -199.0
[  5] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  6] Population average: -199.0
[  6] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  7] Population average: -199.0
[  7] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  8] Population average: -199.0
[  8] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  9] Population average: -199.0
[  9] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[ 10] Population average: -199.0
[ 10] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

{'agent': <agents.LinearAgent object at 

In [6]:
results = genetic_algorithm(results, generations=10, max_score=-120, mutation_amount=5.0, game="MountainCar-v0")

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mountaincar_20.gif', episodes=5, limit=250)}'>")

[  1] Population average: -199.0
[  1] Best mean score:    -199.0, Pedigree: 0-0-0-0-0-0-0-0-0

[  2] Population average: -199.0
[  2] Best mean score:    -198.6, Pedigree: 30-3-40-2-4-41-2-7-59M

[  3] Population average: -199.0
[  3] Best mean score:    -198.2, Pedigree: 30-3-40-2-4-41-2-7-59M

[  4] Population average: -199.0
[  4] Best mean score:    -198.0, Pedigree: 30-3-3-62-4-41-2-7-59M

[  5] Population average: -199.0
[  5] Best mean score:    -196.3, Pedigree: 30-3-40-11-69-40-2-2-59M

[  6] Population average: -199.0
[  6] Best mean score:    -196.2, Pedigree: 30-3-40-11-69-40-2-2-59M

[  7] Population average: -199.0
[  7] Best mean score:    -197.1, Pedigree: 30-3-40-11-69-40-2-2-59M

[  8] Population average: -198.9
[  8] Best mean score:    -193.7, Pedigree: 2-12-3-62-3-0-2-2-59M

[  9] Population average: -198.9
[  9] Best mean score:    -190.5, Pedigree: 2-12-3-62-3-0-2-2-59M

[ 10] Population average: -198.8
[ 10] Best mean score:    -191.6, Pedigree: 2-12-3-62-3-0-2

In [7]:
results = genetic_algorithm(results, generations=10, max_score=-120, mutation_amount=5.0, game="MountainCar-v0")

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mountaincar_30.gif', episodes=5, limit=250)}'>")

[  1] Population average: -198.8
[  1] Best mean score:    -193.5, Pedigree: 2-12-3-62-3-0-2-2-59M

[  2] Population average: -198.8
[  2] Best mean score:    -193.0, Pedigree: 2-12-3-62-3-0-2-2-59M

[  3] Population average: -198.8
[  3] Best mean score:    -192.8, Pedigree: 30-12-3-11-3-0-2-88M-59M

[  4] Population average: -198.8
[  4] Best mean score:    -193.1, Pedigree: 30-12-3-11-3-0-2-88M-59M

[  5] Population average: -198.2
[  5] Best mean score:    -139.4, Pedigree: 92-4M-72-11-3-40-0-69-59M

[  6] Population average: -198.1
[  6] Best mean score:    -131.8, Pedigree: 92-4M-72-11-3-40-0-69-59M

[  7] Population average: -197.3
[  7] Best mean score:    -137.2, Pedigree: 92-4M-72-11-3-40-0-69-59M

[  8] Population average: -197.0
[  8] Best mean score:    -140.8, Pedigree: 92-4M-72-11-3-40-0-69-59M

[  9] Population average: -196.5
[  9] Best mean score:    -142.3, Pedigree: 92-4M-72-11-3-40-0-69-59M

[ 10] Population average: -195.8
[ 10] Best mean score:    -134.9, Pedigre

In [8]:
results = genetic_algorithm(results, generations=10, max_score=-120, mutation_amount=5.0, game="MountainCar-v0")

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mountaincar_40.gif', episodes=5, limit=250)}'>")

[  1] Population average: -194.8
[  1] Best mean score:    -133.4, Pedigree: 92-75-72-11-3-40-0-2-59M

[  2] Population average: -194.8
[  2] Best mean score:    -134.4, Pedigree: 92-75-72-11-3-40-0-2-59M

[  3] Population average: -195.1
[  3] Best mean score:    -133.1, Pedigree: 92-4M-72-11-3-40-0-69-59M

[  4] Population average: -194.5
[  4] Best mean score:    -134.0, Pedigree: 92-75-72-11-3-40-0-2-59M

[  5] Population average: -195.6
[  5] Best mean score:    -137.0, Pedigree: 92-75-72-11-3-40-0-2-59M

[  6] Population average: -194.9
[  6] Best mean score:    -136.0, Pedigree: 92-4M-72-11-3M-40-0M-2-59M

[  7] Population average: -193.8
[  7] Best mean score:    -128.7, Pedigree: 92-4M-72-11-3M-40-0M-2-59M

[  8] Population average: -194.9
[  8] Best mean score:    -130.2, Pedigree: 92-4M-72-11-3-40-0-2-59M

[  9] Population average: -195.2
[  9] Best mean score:    -122.4, Pedigree: 92-4M-72-11-3M-40-0M-3-59

[ 10] Population average: -193.6
[ 10] Best mean score:    -126.6, 

In [9]:
results = genetic_algorithm(results, generations=10, max_score=-120, mutation_amount=5.0, game="MountainCar-v0")

winner = sorted(results, key=lambda x: x["mean"], reverse=True)[0]

print(winner)

HTML(f"<img src='{winner['agent'].render('mountaincar_50.gif', episodes=5, limit=250)}'>")

[  1] Population average: -193.7
[  1] Best mean score:    -124.4, Pedigree: 92-75-72-11-0M-40-2-7M-59M

[  2] Population average: -192.0
[  2] Best mean score:    -120.9, Pedigree: 92-4M-72-11-3M-40-0M-3-59

[  3] Population average: -192.4
[  3] Best mean score:    -112.4, Pedigree: 92-75-72-11-3-0M-0M-7-59M

[  3] Best score reached, ending early
{'agent': <agents.LinearAgent object at 0x7f85535af278>, 'weights': array([[ 0.99252738,  0.40209843,  0.77867077],
       [-0.49730536, -0.11059926, -1.07767823],
       [-9.7163576 , -0.18688903,  6.46876156]]), 'pedigree': ['92', '75', '72', '11', '3', '0M', '0M', '7', '59M'], 'minimum': -117.0, 'maximum': -111.0, 'mean': -112.4}


The little car learns to tear right out of that valley! :-)