#### Exercise 1
In this exercise you will create the $Q$-network and the target $\hat{Q}$-network and set the optimizer. Remember that the Deep $Q$-Network (DQN) is a neural network that approximates the action-value function $Q(s,a)\approx Q^*(s,a)$. It does this by learning how to map states to $Q$ values.

In [2]:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.losses import MSE
from tensorflow.keras.optimizers import Adam

In [9]:
# Shape of the state vector and number of possible actions.
state_size = (8,)
num_actions = 4

# Create the Q-Network.
q_network = Sequential([
    Input(shape=state_size),
    Dense(units=64, activation='relu'),
    Dense(units=64, activation='relu'),
    Dense(units=num_actions, activation='linear')
    ])

# Create the target Q^-Network.
target_q_network = Sequential([
    Input(shape=state_size),
    Dense(units=64, activation='relu'),
    Dense(units=64, activation='relu'),
    Dense(units=num_actions, activation='linear')
    ])

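# Set the optimizer. ALPHA (the learning rate) is not defined in this section;
# define it before uncommenting the line below.
# optimizer = Adam(learning_rate=ALPHA)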
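
To make the mapping from states to $Q$ values concrete, here is a minimal NumPy sketch that mirrors the layer shapes used above. It is not the Keras implementation: the weights are random and untrained, and the helper `mlp_forward`, the seed, and the batch of states are hypothetical, chosen only for illustration. The point is that a batch of 8-dimensional states yields one $Q$ value per action.

```python
import numpy as np

state_dim, hidden, num_actions = 8, 64, 4
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Random, untrained weights with the same shapes as the Dense layers above.
W1, b1 = rng.normal(size=(state_dim, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
W3, b3 = rng.normal(size=(hidden, num_actions)), np.zeros(num_actions)

def mlp_forward(states):
    """Hypothetical helper: two ReLU layers followed by a linear output."""
    h1 = np.maximum(states @ W1 + b1, 0.0)
    h2 = np.maximum(h1 @ W2 + b2, 0.0)
    return h2 @ W3 + b3  # shape (batch, num_actions)

states = rng.normal(size=(5, state_dim))      # a batch of 5 made-up states
q_values = mlp_forward(states)                # one row of Q values per state
greedy_actions = np.argmax(q_values, axis=1)  # highest-valued action per state
```

The linear output layer matters here: $Q$ values are unbounded real numbers, so no squashing activation is applied to them.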


#### Exercise 2

In this exercise you will implement line ***12*** of the algorithm outlined in *Fig 3* above and you will also compute the loss between the $y$ targets and the $Q(s,a)$ values. In the cell below, complete the `compute_loss` function by setting the $y$ targets equal to:

$$
\begin{equation}
    y_j =
    \begin{cases}
      R_j & \text{if episode terminates at step  } j+1\\
      R_j + \gamma \max_{a'}\hat{Q}(s_{j+1},a') & \text{otherwise}\\
    \end{cases}       
\end{equation}
$$
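
The two cases can be folded into a single expression by multiplying the bootstrap term by `(1 - done)`, where `done` is 1 for a terminal transition and 0 otherwise. The following NumPy sketch works through three made-up transitions; the values of `gamma`, `rewards`, `max_q_next`, and `done` are illustrative assumptions, not data from the lab.

```python
import numpy as np

gamma = 0.5                                # illustrative discount factor
rewards = np.array([1.0, 2.0, -1.0])       # R_j
max_q_next = np.array([10.0, 20.0, 4.0])   # max over a' of Q^(s_{j+1}, a')
done = np.array([0.0, 1.0, 0.0])           # 1.0 marks a terminal transition

# Terminal transitions keep only the reward; the others add the discounted term.
y = rewards + gamma * max_q_next * (1.0 - done)
# y[0] = 1 + 0.5 * 10 = 6.0
# y[1] = 2.0 (terminal, bootstrap term zeroed)
# y[2] = -1 + 0.5 * 4 = 1.0
```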

In [11]:
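def compute_loss(experiences, gamma, q_network, target_q_network):
    """
    Computes the mean squared error between the y targets and the Q(s,a) values.

    Args:
      experiences: tuple of (states, actions, rewards, next_states, done_vals)
      gamma: discount factor
      q_network: Keras model that predicts the Q values
      target_q_network: Keras model used to compute the y targets

    Returns:
      loss: the mean squared error between the y targets and the Q(s,a) values
    """
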
    # Unpack the mini-batch of experience tuples.
    states, actions, rewards, next_states, done_vals = experiences
    
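    # Compute max over a' of Q^(s_{j+1}, a') with the target network.
    max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)

    # Set y = R if the episode terminates, otherwise set y = R + γ max Q^(s_{j+1}, a').
    # The factor (1 - done_vals) zeroes the second term for terminal transitions.
    y_targets = rewards + (gamma * max_qsa * (1 - done_vals))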
        
    # Get the Q values, then keep only Q(s_j, a_j) for the action taken in each row.
    q_values = q_network(states)
    q_values = tf.gather_nd(q_values, tf.stack([tf.range(q_values.shape[0]),
                                                tf.cast(actions, tf.int32)], axis=1))
        
    # Compute the loss.
    loss = MSE(y_targets, q_values) 
    
    return loss
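
The `tf.gather_nd` step above picks, for each row of the batch, the $Q$ value of the action that was actually taken. As a rough illustration of that indexing, here is the equivalent operation in NumPy on made-up numbers. NumPy stands in for TensorFlow here, and all values are hypothetical. The actions are assumed to be stored as floats, which would explain the cast to an integer type in the function above.

```python
import numpy as np

# Made-up Q values for a batch of 3 states and 4 actions.
q_values = np.array([[0.1, 0.2, 0.3, 0.4],
                     [1.0, 2.0, 3.0, 4.0],
                     [5.0, 6.0, 7.0, 8.0]])
actions = np.array([2.0, 0.0, 3.0])  # assumed to be stored as floats

rows = np.arange(q_values.shape[0])  # row index j for each experience
cols = actions.astype(np.int32)      # column index a_j for each experience
q_taken = q_values[rows, cols]       # picks 0.3, 1.0 and 8.0

# Mean squared error against made-up targets.
y_targets = np.array([0.5, 1.0, 6.0])
loss = np.mean((y_targets - q_taken) ** 2)
```

Pairing each row index with its action index is what `tf.stack([...], axis=1)` builds in the original function: one `(row, action)` pair per experience.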