### How to interact with the environment

Choose the gym problem using the gym.make(*arg*) function:
> env = gym.make(environment_name)

Before starting the problem the environment must be reset, using:
> env.reset()

To visualise the environment, render it after every step (rendering is optional and only affects the display):
> env.render()

The step function takes one argument, an *action* (an object defined by the environment), and returns 4 values:
> observation, reward, done, info = env.step(action)

1. observation (object) = an environment-specific response
2. reward (float) = amount of reward achieved by the previous action
3. done (boolean) = whether it is time to reset the environment again
4. info (dict) = diagnostic info for debugging


### Below is an example of how to run the environment

In [1]:
import gym

# Use CartPole-v0, Pendulum-v0, etc. when selecting an environment
env = gym.make('CartPole-v0')


for i_episode in range(20):
    observation = env.reset()
    for t in range(1000):
        env.render()
        print(t, observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

[2017-11-11 12:16:01,634] Making new env: CartPole-v0


0 [ 0.00159543 -0.00655307  0.0340134   0.00141308]
1 [ 0.00146437 -0.20214589  0.03404166  0.30463063]
2 [-0.00257855 -0.00752519  0.04013427  0.02287502]
3 [-0.00272905  0.18699891  0.04059177 -0.25687984]
4 [ 0.00101092 -0.00867836  0.03545418  0.04832499]
5 [ 0.00083736  0.18591774  0.03642068 -0.23296439]
6 [ 0.00455571  0.38050088  0.03176139 -0.51394022]
7 [ 0.01216573  0.57516145  0.02148258 -0.79644754]
8 [ 0.02366896  0.76998212  0.00555363 -1.08229572]
9 [ 0.0390686   0.96503033 -0.01609228 -1.37323076]
10 [ 0.05836921  1.16034973 -0.0435569  -1.67090285]
11 [ 0.0815762   1.35594982 -0.07697495 -1.97682639]
12 [ 0.1086952   1.16171831 -0.11651148 -1.70895013]
13 [ 0.13192956  0.96811244 -0.15069048 -1.45468715]
14 [ 0.15129181  1.1647288  -0.17978423 -1.79040503]
Episode finished after 15 timesteps
0 [-0.03145225 -0.00640108  0.04888772 -0.01015412]
1 [-0.03158027 -0.20218882  0.04868464  0.29754388]
2 [-0.03562405 -0.39796975  0.05463551  0.60517494]
3 [-0.04358344 -0.59381

Every environment has Space objects that describe its valid actions and observations.

A Discrete space allows a **fixed range** of **non-negative** integers.

A Box space represents an n-dimensional box; for CartPole above, valid observations are arrays of 4 numbers.
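To make the two space types concrete, the sketch below re-implements their membership rule in plain Python. `SimpleDiscrete` and `SimpleBox` are hypothetical stand-ins written for illustration only; they are not Gym's classes, which live in `gym.spaces`.

```python
class SimpleDiscrete:
    """Hypothetical stand-in: valid values are the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def contains(self, x):
        return isinstance(x, int) and 0 <= x < self.n


class SimpleBox:
    """Hypothetical stand-in: one [low, high] interval per dimension."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low, self.high = list(low), list(high)

    def contains(self, x):
        # The vector must have the right length and sit inside every interval.
        return len(x) == len(self.low) and all(
            lo <= v <= hi for lo, v, hi in zip(self.low, x, self.high)
        )


actions = SimpleDiscrete(2)                         # two possible actions: 0 and 1
observations = SimpleBox([-1, -1, -8], [1, 1, 8])   # a 3-dimensional box

print(actions.contains(1), actions.contains(2))
print(observations.contains([0.5, -0.5, 3.0]), observations.contains([0.5, 2.0, 3.0]))
```

The second check in each line fails because the value falls outside the range (2 is not in 0..1, and 2.0 exceeds the upper bound of 1 in the second dimension).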

In [33]:
import gym
env = gym.make('Pendulum-v0')
print(env.action_space)

print(env.observation_space)



[2017-09-29 13:27:35,937] Making new env: Pendulum-v0


Box(1,)
Box(3,)


Check a Box's bounds using:
> env.observation_space.high

*and*
> env.observation_space.low

In [35]:
print('The highest action space is:\n',env.action_space.high)

print('The lowest action space is:\n',env.action_space.low)

print('\n')

print('The highest observation space is:\n',env.observation_space.high)

print('The lowest observation space is:\n',env.observation_space.low)

The highest action space is:
 [ 2.]
The lowest action space is:
 [-2.]


The highest observation space is:
 [ 1.  1.  8.]
The lowest observation space is:
 [-1. -1. -8.]


This means that when you receive an **observation** it comes as an array of three continuous values, and when you send an **action** it must come in the form of an array (a **numpy.ndarray**) with one continuous value between -2 and 2.
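A small helper can enforce that format before an action is handed to `env.step()`. The sketch below assumes the Pendulum bounds printed above (-2 and 2); `to_action` is a hypothetical helper name, not part of Gym.

```python
import numpy as np

def to_action(value, low=-2.0, high=2.0):
    """Wrap a scalar in a 1-element float array, clipped to [low, high]."""
    return np.clip(np.array([value], dtype=float), low, high)

action = to_action(3.7)   # 3.7 is out of range, so it is clipped to the upper bound
print(action, type(action).__name__, action.shape)
```

Clipping here is a design choice for the sketch: it keeps every value legal, at the cost of silently changing requests that were out of range.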

Below you can see extra methods and techniques to aid in the development of the environment.

1. To create a space object, use the spaces package.
2. With each space object you can check whether it contains a particular value/vector using the *contains()* method. REMEMBER: use a numpy.ndarray, not another array type.

In [1]:
from gym import spaces
from numpy import ndarray
import numpy as np

space = spaces.Box(-3,5,(3,))  # 1-dimensional box of 3 values, each between -3 and 5

# x = space.sample()
# print(type(x))

# x = ndarray((2,),buffer=np.array[2,2],offset=np.int_)

y = np.ndarray((3,), buffer=np.array([-1.0,-3,4.64887352723]),dtype=float)

print(y)

assert space.contains(np.array([1.,2.,3.]))
assert space.contains(y)

# assert space.n == 2

[-1.         -3.          4.64887353]


In [27]:
assert(space.contains(7))

In [30]:
print(env.observation_space.contains)

<bound method Box.contains of Box(4,)>


In [53]:
import gym
env = gym.make('Pendulum-v0')

print(env.action_space)

print(env.observation_space)

# print('The highest action space is:\n',env.action_space.high)

# print('The lowest action space is:\n',env.action_space.low)

print()#'\n')

print('The highest observation space is:\n',env.observation_space.high)

print('The lowest observation space is:\n',env.observation_space.low)

[2017-09-29 17:49:26,419] Making new env: Pendulum-v0


Box(1,)
Box(3,)

The highest observation space is:
 [ 1.  1.  8.]
The lowest observation space is:
 [-1. -1. -8.]


In [4]:
import gym

# Use CartPole-v0, Pendulum-v0, etc. when selecting an environment
env = gym.make('Pendulum-v0')


for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(t, observation)
        action = env.action_space.sample()
        print(action)
        action = [1]  # override the sampled action with a constant torque of 1
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break



[2017-11-06 00:41:01,433] Making new env: Pendulum-v0


0 [-0.72184538 -0.69205437  0.43605584]
[-1.88808296]
1 [-0.71952243 -0.6944692   0.06701506]
[-0.83069041]
2 [-0.72998926 -0.68345862 -0.30383684]
[ 1.89201033]
3 [-0.75235372 -0.65875935 -0.6664308 ]
[ 0.36733952]
4 [-0.78466329 -0.61992219 -1.01050032]
[-0.82794581]
5 [-0.82399428 -0.56659811 -1.32544196]
[-0.84666871]
6 [-0.86664816 -0.49891979 -1.60039055]
[-1.49842211]
7 [-0.90849708 -0.41789121 -1.82458039]
[-0.67677084]
8 [-0.94548263 -0.32567253 -1.9879988 ]
[-0.19870334]
9 [-0.97420841 -0.22565009 -2.08225319]
[ 0.85488085]
10 [-0.99250188 -0.12222934 -2.10149076]
[ 0.26659418]
11 [-0.99979238 -0.0203762  -2.04316276]
[-1.83279644]
12 [-0.99718547  0.07497421 -1.90844492]
[-0.61604077]
13 [-0.98720252  0.15947157 -1.70221425]
[-1.79600534]
14 [-0.9732577   0.22971601 -1.43261058]
[ 1.75504609]
15 [-0.95901187  0.2833659  -1.11032358]
[-0.29806197]
16 [-0.94774902  0.3190169  -0.74779915]
[ 0.3952942]
17 [-0.94187809  0.33595486 -0.35853647]
[ 0.92113291]
18 [-0.94260539  0.33

In [5]:
type(info)

dict

In [9]:
reward

-2.9939712779992167

## What are the control inputs & Sensory Outputs

The **control** signals are in the form of `actions`, passed as a `numpy.ndarray` to the `env.step()` function.

The **sensory** outputs are the `observations`, also in the form of a `numpy.ndarray`.

Furthermore, you receive a `reward`; the goal in any environment is always to **increase** your reward.

`done` is a **signal to reset** the environment.

Finally, `info` is a dict that may contain useful **debugging** information.
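The roles of these four values can be tried out without Gym installed. The sketch below uses a hypothetical `CountdownEnv` toy class (not a Gym environment) that mimics the `reset()`/`step()` interface with plain lists; the reward rule is made up purely for illustration.

```python
class CountdownEnv:
    """Hypothetical toy environment: the episode ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return [float(self.t)]            # initial observation

    def step(self, action):
        self.t += 1
        observation = [float(self.t)]     # sensory output
        reward = -abs(action[0])          # made-up rule: less control effort scores higher
        done = self.t >= self.horizon     # signal to reset
        info = {"steps_left": self.horizon - self.t}
        return observation, reward, done, info


toy_env = CountdownEnv(horizon=5)
observation = toy_env.reset()
total_reward, done = 0.0, False
while not done:
    action = [0.5]                        # control input
    observation, reward, done, info = toy_env.step(action)
    total_reward += reward

print(observation, total_reward, info)
```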

## Create a noisy signal

In [1]:
import gym
import numpy as np

# Use CartPole-v0, Pendulum-v0, etc. when selecting an environment
env = gym.make('Pendulum-v0')

sin = np.sin # sin function
deg = 2*np.pi/360 # Multiply a number by deg to convert from degrees to radians

observation = env.reset()

for t in range(1000):
    env.render()
    print(t, observation)
    t_input = sin(deg*t)*1000
    action = np.ndarray((1,), buffer=np.array(t_input),dtype=float)
    print(action)
    observation, reward, done, info = env.step(action)
    if done:
        print("Episode finished after {} timesteps".format(t+1))
        break




[2017-11-06 00:41:59,936] Making new env: Pendulum-v0


0 [-0.35044782  0.93658226  0.23736629]
[ 0.]
1 [-0.39405493  0.91908689  0.93980298]
[ 17.45240644]
2 [-0.48073722  0.87686472  1.92911815]
[ 34.8994967]
3 [-0.60186436  0.79859833  2.88676669]
[ 52.33595624]
4 [-0.74137662  0.67108919  3.78571544]
[ 69.75647374]
5 [-0.87458099  0.48487947  4.58903233]
[ 87.15574275]
6 [-0.97047818  0.24118892  5.25269193]
[ 104.52846327]
7 [-0.99907184 -0.0430751   5.73358362]
[ 121.86934341]
8 [-0.94169873 -0.33645727  6.00127729]
[ 139.17310096]
9 [-0.7987395  -0.601677    6.04893434]
[ 156.43446504]
10 [-0.58939803 -0.80784278  5.89767658]
[ 173.64817767]
11 [-0.3435776  -0.93912429  5.5917945 ]
[ 190.80899538]
12 [-0.09122419 -0.99583038  5.18745128]
[ 207.91169082]
13 [ 0.14516302 -0.98940775  4.74057849]
[ 224.95105434]
14 [ 0.35283932 -0.93568393  4.29852268]
[ 241.9218956]
15 [ 0.52731878 -0.84966753  3.89675973]
[ 258.8190451]
16 [ 0.66941222 -0.74289116  3.55950909]
[ 275.63735582]
17 [ 0.782415   -0.62275739  3.30234072]
[ 292.37170472]
18

In [35]:
env.render(close=True)

In [33]:
deg = 2*np.pi/360 # Multiply a number by deg to convert from degrees to radians

In [30]:
deg

0.017453292519943295
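The loop above drives the pendulum with a pure sine wave scaled by 1000, which is far outside the [-2, 2] action bounds reported earlier. A sketch of a signal that is actually noisy and stays inside those bounds could look like this; the amplitude, the noise level and the `noisy_sine` name are illustrative assumptions, not values from this notebook.

```python
import math
import random

def noisy_sine(t, amplitude=2.0, noise_std=0.2, low=-2.0, high=2.0, rng=random):
    """Sine of t degrees plus Gaussian noise, clipped to [low, high]."""
    clean = amplitude * math.sin(math.radians(t))   # same as sin(deg * t)
    noisy = clean + rng.gauss(0.0, noise_std)
    return min(high, max(low, noisy))

rng = random.Random(0)                              # seeded for repeatability
signal = [noisy_sine(t, rng=rng) for t in range(360)]
print(min(signal), max(signal))
```

Each value would still need wrapping in a one-element array before being passed to `env.step()`.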

## Microbial GA

In [12]:
import numpy as np

def microbialGA(B,V,pop,gen,MAX_VOLUME,generate,mut,local,cross):
    # B             - benefit of each gene
    # V             - volume of each gene
    # MAX_VOLUME    - maximum volume of the algorithm
    # pop           - size of population
    # gen           - number of generations/matchups. The Algorithm considers the first
    #                 initialised generation as part of the 'gen' generations
    # generate      - 1, to randomly generate population genomes, 0 to set to nothing
    # mut           - Mutation proportion (1 would be 1/length(B))
    # local         - size of the local/neighbourhood search space for a competitor
    # cross         - crossover probability of genes

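    # Interface sketch only: the body is not implemented here. The intended
    # return value is [winner, winnerInd, fitRec, popGens].
    raise NotImplementedError

# Need to get to a target string.
#   [.5,.6,.1,.3,.2]
# from
#   [1,0,1,1,1,0]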

In [238]:
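def fitness(population,target):
    """Returns the fitness of every individual in a population (higher is better)."""

    # One row per individual; each row must have one gene per target value
    assert population.shape[1] == target.shape[0]
    distance = np.subtract(population,target)
    squares = np.square(distance)
    fitnesses = -(squares.sum(axis=1))  # negative squared distance to the target
    return fitnesses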

In [241]:
def fitness2(genome,target):
    """Returns the fitness of a single individual (higher is better)."""
    distance = np.subtract(genome,target)
    squares = np.square(distance)
    fitness_value = -(squares.sum())  # negative squared distance to the target
    return fitness_value
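As a quick sanity check of the fitness measure, the sketch below repeats the same arithmetic on a hand-sized population. The numbers are made up for illustration: a perfect match scores 0 and every other individual scores below that.

```python
import numpy as np

target = np.array([0.5, 0.5])
population = np.array([
    [0.5, 0.5],   # identical to the target
    [1.0, 0.0],   # off by 0.5 in both genes
    [0.0, 0.5],   # off by 0.5 in one gene
])

# Broadcasting subtracts the target from every row at once.
scores = -np.square(population - target).sum(axis=1)
print(scores)                   # one score per individual
print(int(np.argmax(scores)))   # index of the fittest individual
```

Because fitness is a negated distance, `np.argmax` picks the individual closest to the target.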

In [244]:
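def mutate(genome,mut_rate,std_dev=1,low=None,high=None):
    """Adds Gaussian noise to a single gene and clips it to [low, high]."""
    # mut_rate is unused here: the caller decides whether a gene mutates
    change = np.random.randn()*std_dev
    genome = genome + change
    # Compare against None so that a bound of 0 is still applied
    if low is not None and genome < low:
        genome = low
    if high is not None and genome > high:
        genome = high

    return genome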

In [245]:
def microbial(population,target,gen,mut_rate=None,high=None,
              low=None,std_dev=None,locality=5,crossover=0.5):
    """
    Microbial GA for real values
    
    Args:
        population - 2 dimensional array, 0th dimension = each individual
        target - the target array
        gen - number of competitions
        mut_rate - mutation chance (0-1)
        high - high value for a gene
        low - low value for a gene
        std_dev - standard deviation of the mutation change
        locality - how close in the population competitors are (given as a window, so 3 will look up to 3 to the right)
        crossover - crossover probability
    """
    assert 0 <= gen
    
    genes = len(target)
    
    fit_rec = np.zeros((gen+1,),dtype=float)  # fitness record (fitness values are floats)
    
    pop_fits = fitness(population,target)  # population fitness'
    print('pop fit',pop_fits)
    best_loc = np.argmax(pop_fits)  # fittest individual
    fit_rec[0] = pop_fits[best_loc]  # update fitness for 0th generation
    
    if gen == 0:
        return fit_rec, population, best_loc  # same return shape as the normal exit
    
    # RUN FOR GEN GENERATIONS
    for g in range(0,gen,1):
        first = np.random.randint(population.shape[0])
        second = np.random.randint(locality)+1
        second = (first + second) % population.shape[0]  # wrap around the population, not the genome
        
        # FIND THE FITTEST
        if pop_fits[first] > pop_fits[second]:
            winner =  first
            loser = second
        else:
            winner = second
            loser = first
        
        # CROSSOVER
        for gene in range(genes):
            if np.random.rand() < crossover:
                population[loser][gene] = population[winner][gene]
                
            if np.random.rand() < mut_rate:
                if not std_dev:
                    std_dev = (high - low)/8
                # Keyword arguments keep std_dev, low and high in their intended slots
                population[loser][gene] = mutate(population[loser][gene],mut_rate,
                                                 std_dev=std_dev,low=low,high=high)
                
        pop_fits[loser] = fitness2(population[loser], target)
        best_loc = np.argmax(pop_fits)  # fittest individual
        fit_rec[g+1] = pop_fits[best_loc]  # record best fitness after competition g (index 0 holds the initial population)
    
    return fit_rec, population, best_loc


    


## Testing:

- zero generations
- population of 0,1,10
- genome of 0,1,10
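The checks listed above can be sketched against a compact, list-based version of the algorithm. `microbial_sketch` and `fitness_of` are hypothetical names for a simplified re-implementation, not the `microbial` function from this notebook; the sketch also assumes that an empty population should raise an error and that a competitor drawn against itself is skipped.

```python
import random

def fitness_of(genome, target):
    """Negative squared distance to the target (0 is a perfect match)."""
    return -sum((g - t) ** 2 for g, t in zip(genome, target))

def microbial_sketch(population, target, gen, mut_rate=0.1, std_dev=0.05,
                     low=0.0, high=1.0, locality=5, crossover=0.5, seed=0):
    """Compact list-based microbial GA; returns (fit_rec, population, best)."""
    if not population:
        raise ValueError("population must contain at least one individual")
    rng = random.Random(seed)
    size = len(population)
    fits = [fitness_of(ind, target) for ind in population]
    fit_rec = [max(fits)]                      # entry 0: initial population
    for _ in range(gen):
        first = rng.randrange(size)
        second = (first + rng.randint(1, locality)) % size
        if first != second:                    # skip self-matches (tiny populations)
            if fits[first] > fits[second]:
                winner, loser = first, second
            else:
                winner, loser = second, first
            for i in range(len(target)):
                if rng.random() < crossover:   # copy a gene from the winner
                    population[loser][i] = population[winner][i]
                if rng.random() < mut_rate:    # Gaussian mutation, clipped
                    gene = population[loser][i] + rng.gauss(0.0, std_dev)
                    population[loser][i] = min(high, max(low, gene))
            fits[loser] = fitness_of(population[loser], target)
        fit_rec.append(max(fits))
    best = max(range(size), key=lambda i: fits[i])
    return fit_rec, population, best

target = [0.5, 0.4, 0.3]
seeder = random.Random(1)
pop = [[seeder.random() for _ in target] for _ in range(10)]

rec0, _, _ = microbial_sketch([row[:] for row in pop], target, gen=0)
rec, final_pop, best = microbial_sketch([row[:] for row in pop], target, gen=200)
print(len(rec0), len(rec), rec[0], rec[-1])
```

Since only the loser of each competition is overwritten, the best fitness in the record should never decrease, which gives a simple property to assert alongside the edge cases.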

In [258]:
population_size = 100  # size of population

target = np.array([.5,.4,.3,.6,.7,.5,.4,.3,.6,.7,.5,.4,.3,.6,.7,.5,.4,.3,.6,.7])
gen = 100  # Number of generations
genes = len(target)
population = np.random.rand(population_size,genes)  # Random population
mut_rate=1/genes
high = 1
low = 0
std_dev = 0.01


fit_rec, population, best_loc = microbial(population,target,gen,mut_rate,high,low,std_dev);



(100, 20)
(100, 20)
(100,)
pop fit [-1.78231731 -2.44322635 -2.34297062 -1.3229577  -2.22403625 -1.9388414
 -2.3143494  -3.46901384 -1.00448564 -2.488112   -2.05057925 -1.46148271
 -2.37095382 -2.27983076 -2.29700934 -1.85886381 -1.78888629 -2.06022927
 -2.00817618 -2.1639418  -1.66293608 -1.79454997 -2.18655352 -1.89777246
 -1.9493583  -1.48624629 -2.7457098  -2.15018056 -1.48577011 -1.98168966
 -2.25188683 -1.65639927 -1.93108944 -2.08747322 -1.12960477 -1.31351796
 -2.22097886 -2.30101604 -1.78984594 -2.0187899  -1.4190773  -2.46652922
 -1.86489972 -1.52674242 -2.38617712 -2.53678867 -1.39319287 -2.0167824
 -1.16692063 -1.51671078 -1.65289208 -2.65779482 -2.71548534 -2.82197893
 -2.04199059 -2.77725576 -1.64276291 -1.48114571 -2.1138224  -0.92561744
 -1.30760766 -1.73617407 -3.0032328  -1.78195604 -2.16908119 -2.41036345
 -2.04276047 -2.0391628  -2.40590902 -2.24111668 -1.71867447 -1.52438194
 -2.58225103 -1.28040022 -0.96511826 -1.5366828  -3.01191674 -1.79143452
 -3.02687395 -1.65

In [262]:
print(population[best_loc])

[ 0.58796359  0.15613047  0.45220443  0.74550569  0.711187    0.48420162
  0.37280222  0.63079244  0.3234952   0.59458472  0.60448037  0.48359388
  0.44094051  0.49121987  0.55419981  0.88611335  0.28471837  0.85739956
  0.84221821  0.8156183 ]


In [1]:
import matplotlib.pyplot as plt
plt.plot(fit_rec)
plt.xlabel('competition')
plt.ylabel('best fitness')
plt.show()

NameError: name 'fit_rec' is not defined

In [6]:
env.close()