## smartcab project


### Implement a Basic Driving Agent

To begin, your only task is to get the **smartcab** to move around in the environment. At this point, you will not be concerned with any sort of optimal driving policy. Note that the driving agent is given the following information at each intersection:
- The next waypoint location relative to its current location and heading.
- The state of the traffic light at the intersection and the presence of oncoming vehicles from other directions.
- The current time left from the allotted deadline.

To complete this task, simply have your driving agent choose a random action from the set of possible actions (`None`, `'forward'`, `'left'`, `'right'`) at each intersection, disregarding the input information above. Set the simulation deadline enforcement, `enforce_deadline`, to `False` and observe how it performs.


In [13]:
import random
from smartcab.environment import Agent, Environment
from smartcab.planner import RoutePlanner
from smartcab.simulator import Simulator
from smartcab.agent import LearningAgent
from collections import OrderedDict
import numpy as np

In [14]:
e = Environment()
a = e.create_agent(LearningAgent)
a.env.valid_actions

[None, 'forward', 'left', 'right']

In [20]:
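# randint's upper bound is exclusive, so use the list length to make all four actions reachable
a.env.valid_actions[np.random.randint(len(a.env.valid_actions))]  # choose a random action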

'left'

***QUESTION:*** _Observe what you see with the agent's behavior as it takes random actions. Does the **smartcab** eventually make it to the destination? Are there any other interesting observations to note?_

With a random walk over 100 trials, about half of the smartcabs reached the destination within the hard time limit (100), but not within the deadline. This result may depend on the route planner, which always points toward the final destination, and on the small, simple grid network (8*6).

### Inform the Driving Agent

Now that your driving agent is capable of moving around in the environment, your next task is to identify a set of states that are appropriate for modeling the **smartcab** and environment. The main source of state variables is the set of current inputs at the intersection, but not all of them may require representation. You may choose to explicitly define states, or use some combination of inputs as an implicit state. At each time step, process the inputs and update the agent's current state using the `self.state` variable. Continue with the simulation deadline enforcement `enforce_deadline` set to `False`, and observe how your driving agent now reports the change in state as the simulation progresses.



In [26]:
a.env.agent_states[a]

{'heading': (0, 1), 'location': (8, 3)}
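
The `agent_states` entry above holds only the location and heading. As a minimal sketch of how the intersection inputs could be folded into `self.state`, the snippet below builds a hashable tuple from the next waypoint and a dictionary of sensed inputs. The `State` container, the `build_state` helper, and the input keys (`light`, `oncoming`, `left`, `right`) are assumptions made for illustration, not something this notebook defines.

```python
from collections import namedtuple

# Hypothetical state container: which inputs to keep is a modeling choice.
# Here the 'right' input and the deadline are left out (an assumption).
State = namedtuple("State", ["next_waypoint", "light", "oncoming", "left"])


def build_state(next_waypoint, inputs):
    """Combine the planner's waypoint with sensed inputs into a hashable state."""
    return State(
        next_waypoint=next_waypoint,
        light=inputs["light"],
        oncoming=inputs["oncoming"],
        left=inputs["left"],
    )


# Example inputs, shaped like the dictionary assumed above.
example_inputs = {"light": "red", "oncoming": None, "left": "forward", "right": None}
example_state = build_state("left", example_inputs)
print(example_state)
```

Because a named tuple is hashable, a state built this way can be used directly as a dictionary key in a Q-table.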

***QUESTION:*** _What states have you identified that are appropriate for modeling the **smartcab** and environment? Why do you believe each of these states to be appropriate for this problem?_


***OPTIONAL:*** _How many states in total exist for the **smartcab** in this environment? Does this number seem reasonable given that the goal of Q-Learning is to learn and make informed decisions about each state? Why or why not?_
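
The size of the state space depends entirely on which variables are kept. As a worked example under one assumed choice (next waypoint, light, oncoming traffic, and traffic from the left, with the value sets listed in the comments), the count can be enumerated directly. The variable names and value sets below are illustrative assumptions.

```python
from itertools import product

# Assumed value sets for each state variable (illustrative only).
WAYPOINTS = ["forward", "left", "right"]      # next waypoint before arrival
LIGHTS = ["red", "green"]                     # traffic light
TRAFFIC = [None, "forward", "left", "right"]  # intended move of another vehicle
ACTIONS = [None, "forward", "left", "right"]  # the agent's valid actions

# Every combination of (waypoint, light, oncoming, left).
all_states = list(product(WAYPOINTS, LIGHTS, TRAFFIC, TRAFFIC))
n_states = len(all_states)
n_q_entries = n_states * len(ACTIONS)

print("states: {}, state-action pairs: {}".format(n_states, n_q_entries))
```

Under these assumptions there are 3 * 2 * 4 * 4 = 96 states and 384 state-action pairs. Adding the deadline as a state variable would multiply that count by the number of distinct deadline values.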


### Implement a Q-Learning Driving Agent

With your driving agent being capable of interpreting the input information and having a mapping of environmental states, your next task is to implement the Q-Learning algorithm for your driving agent to choose the *best* action at each time step, based on the Q-values for the current state and action. Each action taken by the **smartcab** will produce a reward which depends on the state of the environment. The Q-Learning driving agent will need to consider these rewards when updating the Q-values. Once implemented, set the simulation deadline enforcement `enforce_deadline` to `True`. Run the simulation and observe how the **smartcab** moves about the environment in each trial.

The formulas for updating Q-values can be found in [this](https://classroom.udacity.com/nanodegrees/nd009/parts/0091345409/modules/e64f9a65-fdb5-4e60-81a9-72813beebb7e/lessons/5446820041/concepts/6348990570923) video.
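
For reference, here is a minimal sketch of the standard one-step Q-learning update, `Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a'))`. The linked video may use different notation. The `q_update` helper, the dictionary-based Q-table, and the default `alpha` and `gamma` values are assumptions chosen for illustration.

```python
from collections import defaultdict

ACTIONS = [None, "forward", "left", "right"]


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    # Target combines the immediate reward with the discounted future estimate.
    target = reward + gamma * best_next
    # Blend the old estimate with the target according to the learning rate.
    q[(state, action)] = (1 - alpha) * q[(state, action)] + alpha * target
    return q[(state, action)]


# Q-table keyed by (state, action); unseen pairs default to 0.0.
q_table = defaultdict(float)
first = q_update(q_table, "s0", "forward", reward=2.0, next_state="s1")
second = q_update(q_table, "s1", "left", reward=1.0, next_state="s0")
print(first, second)
```

In the first call every Q-value is still zero, so the new value is `0.5 * 2.0 = 1.0`. The second call sees that value when looking ahead to `s0`, giving `0.5 * (1.0 + 0.9 * 1.0) = 0.95`.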


***QUESTION:*** _What changes do you notice in the agent's behavior when compared to the basic driving agent when random actions were always taken? Why is this behavior occurring?_


### Improve the Q-Learning Driving Agent

Your final task for this project is to enhance your driving agent so that, after sufficient training, the **smartcab** is able to reach the destination within the allotted time safely and efficiently. Parameters in the Q-Learning algorithm, such as the learning rate (`alpha`), the discount factor (`gamma`) and the exploration rate (`epsilon`) all contribute to the driving agent's ability to learn the best action for each state. To improve on the success of your **smartcab**:
- Set the number of trials, `n_trials`, in the simulation to 100.
- Run the simulation with the deadline enforcement `enforce_deadline` set to `True` (you will need to reduce the update delay `update_delay` and set the `display` to `False`).
- Observe the driving agent's learning and **smartcab's** success rate, particularly during the later trials.
- Adjust one or several of the above parameters and iterate this process.

This task is complete once you have arrived at what you determine is the best combination of parameters required for your driving agent to learn successfully.
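
One common way to use the exploration rate is epsilon-greedy action selection, optionally with `epsilon` decaying over the trials so that later trials rely more on what was learned. The sketch below is one possible approach, not a prescribed one. The helper names (`choose_action`, `decayed_epsilon`), the linear decay schedule, and the start and end values are assumptions.

```python
import random

ACTIONS = [None, "forward", "left", "right"]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Greedy choice, breaking ties between equally good actions at random.
    best = max(q.get((state, a), 0.0) for a in ACTIONS)
    best_actions = [a for a in ACTIONS if q.get((state, a), 0.0) == best]
    return rng.choice(best_actions)


def decayed_epsilon(trial, n_trials=100, start=1.0, end=0.05):
    """Linearly decay epsilon from `start` at trial 0 to `end` at the last trial."""
    frac = min(trial / float(n_trials - 1), 1.0)
    return start + (end - start) * frac


# With epsilon = 0 the agent always picks the highest-valued action.
q_table = {("s0", "left"): 1.0, ("s0", "forward"): 0.2}
greedy_action = choose_action(q_table, "s0", epsilon=0.0)
print(greedy_action, decayed_epsilon(0), decayed_epsilon(99))
```

Tuning then amounts to rerunning the 100 trials with different `alpha`, `gamma`, and `epsilon` settings and comparing the success rate in the later trials.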


***QUESTION:*** _Report the different values for the parameters tuned in your basic implementation of Q-Learning. For which set of parameters does the agent perform best? How well does the final driving agent perform?_


***QUESTION:*** _Does your agent get close to finding an optimal policy, i.e. reach the destination in the minimum possible time, and not incur any penalties? How would you describe an optimal policy for this problem?_