
## Motivation
A key problem in deep learning is data efficiency. Modern tools can achieve excellent performance, but they are often data-hungry, which makes deploying deep learning in the real world challenging for many tasks. Active Learning uses a “human in the loop” approach to data labelling: it drastically reduces the amount of data that needs to be labelled and makes machine learning applicable where labelling costs would otherwise be too high [1].





In [0]:
from IPython.display import clear_output, Image, HTML, display

## Passive Learning
Passive learning gathers a large amount of data randomly sampled from the underlying distribution and uses this large dataset to train a model that can perform some sort of prediction. This approach has notable disadvantages:
* Too many samples are wasted.
* Learning is limited by the sampling resolution.

<img src=https://imgur.com/M7Afc39.png width="400">

## Active Learning: Definition and Concepts
Active Learning is a semi-supervised technique whose main hypothesis is that if a learning algorithm can select the data it wants to learn from, it can perform better than traditional methods with significantly less data for training. In other words, **Active Learning (AL) is an interactive approach to simultaneously build a labelled data set and train a machine learning model.**

**How do we make machines curious to learn?**


### AL algorithm:

1. A relatively large unlabeled dataset is gathered.
2. A domain expert labels a few positive examples in the dataset.

<img src=https://imgur.com/BHi6GRx.png width="400">

3. A classifier is trained on labeled samples.
4. The classifier is applied to the rest of the corpus.

<img src=https://imgur.com/ZPya6mX.png width="400">



5. A few of the most “useful” examples are selected (e.g., those that increase classification performance).
6. The expert labels the selected examples, and they are added to the training set.

<img src=https://imgur.com/yCSp1kU.png width="400">

7. Go to step 3.
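
The steps above can be sketched on a toy problem. The following is a minimal illustration, not the method of any particular library: the pool is a grid of 1-D points, the "expert" is a hypothetical `oracle` function that assigns class 1 when `x >= 0.5`, and the "classifier" is a simple threshold computed by the hypothetical helper `fit_threshold`. "Most useful" is taken to mean "closest to the current decision boundary", which is one possible choice among many.

```python
def oracle(x):
    """Hypothetical stand-in for the domain expert: class 1 when x >= 0.5."""
    return int(x >= 0.5)


def fit_threshold(labeled):
    """Step 3: 'train' a 1-D classifier as the midpoint between the two classes."""
    largest_zero = max(x for x, y in labeled if y == 0)
    smallest_one = min(x for x, y in labeled if y == 1)
    return (largest_zero + smallest_one) / 2


# Step 1: gather a relatively large unlabeled pool (200 points in [0, 1]).
pool = [(i + 0.5) / 200 for i in range(200)]

# Step 2: the expert labels a few examples (here: the two extremes, one per class).
labeled = [(x, oracle(x)) for x in (pool[0], pool[-1])]
pool = pool[1:-1]

n_queries = 8
for _ in range(n_queries):
    threshold = fit_threshold(labeled)            # step 3: train on labeled data
    # Steps 4-5: score the pool and pick the point closest to the decision
    # boundary, i.e. the one the classifier is least sure about.
    query = min(pool, key=lambda x: abs(x - threshold))
    labeled.append((query, oracle(query)))        # step 6: expert labels it
    pool.remove(query)                            # step 7: repeat from step 3

threshold = fit_threshold(labeled)
print("labels used:", len(labeled), "learned threshold:", threshold)
```

With only ten labels the learned threshold ends up close to the true boundary, because every query roughly halves the interval in which the boundary can lie.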


<img src=https://imgur.com/smisThj.png width="400">

<img src=https://imgur.com/NuD954f.png width="400">

## Active Learning for Robotics:
In Reinforcement Learning, the reward function is the feedback signal (the labels) that the agent uses to learn a new skill or achieve a new task. Real-world tasks often involve high-dimensional observations such as images, and in practice designing reward functions for robotic skills from such raw observations is very challenging.

<img src=https://bair.berkeley.edu/static/blog/end_to_end/drape.gif width="200">
<img src=https://bair.berkeley.edu/static/blog/end_to_end/push.gif width="200">

One solution is to learn the **rewards** for such tasks from a small number of user-provided goal examples and from samples collected by the policy, by training a classifier on both. The RL algorithm then uses this updated classifier as the reward for learning a policy that achieves the desired goal, and this alternating process continues until the samples collected by the policy are indistinguishable from the user-provided goal examples [2].



### Method: 
1. The user provides examples of successful outcomes (80 user-provided examples are used in [2]).
2. Learn a reward function on images using a success classifier. In [2], a convolutional neural network is used to learn the success classifier on image data.
3. Run RL with this reward.
4. Actively query the human user. These active queries are selected based on uncertainty estimates from the classifier that is being used as the reward function, and they make it possible to learn effective rewards from a small number of initial examples.


<img src=https://imgur.com/mGli9UF.png width="500">


Some example queries made by the algorithm, and the corresponding labels provided by a human user. This data is fed back into the classifier.

<img src=https://imgur.com/v41mHCE.png width="500">

## Classification Uncertainty 
The simplest measure is the uncertainty of classification defined by ([3] slide 22) $$U(x)=1-P(\hat{x}|x)$$

where $x$ is the instance to be predicted and $\hat{x}$ is the most likely prediction.

For example [5], if you have classes [0, 1, 2] and classification probabilities [0.1, 0.2, 0.7], the most likely class according to the classifier is 2 with uncertainty 0.3. If you have three instances with class probabilities

In [0]:
import numpy as np

proba = np.array([[0.1 , 0.85, 0.05],
                  [0.6 , 0.3 , 0.1 ],
                  [0.39, 0.61, 0.0 ]])

the corresponding uncertainties are:

In [0]:
1 - proba.max(axis=1)

In the above example, the most uncertain sample is the second one. When querying for labels based on this measure, the strategy selects the sample with the highest uncertainty.
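
As a cross-check, the same numbers can be computed in plain Python, together with two other commonly used measures: the margin between the two most likely classes and the entropy of the predicted distribution. This is an illustrative sketch only; the helper names `least_confidence`, `margin` and `entropy` are made up here and are not modAL functions (modAL ships its own margin- and entropy-based measures in `modAL.uncertainty`).

```python
import math

proba = [[0.1, 0.85, 0.05],
         [0.6, 0.3, 0.1],
         [0.39, 0.61, 0.0]]


def least_confidence(p):
    """U(x) = 1 - P(most likely class | x); higher means more uncertain."""
    return 1.0 - max(p)


def margin(p):
    """Gap between the two most likely classes; smaller means more uncertain."""
    top, second = sorted(p, reverse=True)[:2]
    return top - second


def entropy(p):
    """Shannon entropy in nats (zero-probability classes contribute nothing)."""
    return -sum(q * math.log(q) for q in p if q > 0)


uncertainties = [least_confidence(p) for p in proba]
margins = [margin(p) for p in proba]
entropies = [entropy(p) for p in proba]

# Index of the instance each measure would query first.
by_uncertainty = max(range(len(proba)), key=lambda i: uncertainties[i])
by_margin = min(range(len(proba)), key=lambda i: margins[i])
by_entropy = max(range(len(proba)), key=lambda i: entropies[i])
print(by_uncertainty, by_margin, by_entropy)
```

Note that the measures do not have to agree: for these three instances, classification uncertainty and entropy both pick the second instance, while the margin picks the third one, whose two best classes (0.61 vs. 0.39) are closest together.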

## Active Learning on MNIST
First we will introduce `modAL`, a modular active learning framework for Python. modAL is built on top of scikit-learn, but you can also use TensorFlow Keras or PyTorch models, if those are your frameworks of choice.

In [0]:
!pip install modAL

To start, you will import the packages:

In [0]:
import tensorflow as tf
# visualization
from matplotlib import pyplot as plt

## Dataset preparation
**MNIST** is a dataset of labelled pictures of handwritten digits, with 60K training samples and 10K test samples.

In [0]:
# Load and prepare the MNIST dataset.
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# create a small dataset to make the training faster
x_train = x_train[:5000]
y_train = y_train[:5000]

# Take 400 test samples for validation; the neural network never sees them during training
x_test = x_test[:400]
y_test = y_test[:400]

### Verify the data
To verify that the dataset looks correct, let's plot the first 25 images from the training set and display the class name below each image.

In [0]:
class_names = ['0', '1', '2', '3', '4',
               '5', '6', '7', '8', '9']

plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([]) # Disable xticks (text labels)
    plt.yticks([]) # Disable yticks (text labels)
    plt.grid(False)
    plt.imshow(x_train[i], cmap=plt.cm.binary)
    # The MNIST labels are plain integers, so each label
    # can be used directly as an index into class_names
    plt.xlabel(class_names[y_train[i]])
plt.show()

### Data Preprocessing
Add a channels dimension

In [0]:
print ("The shape of training examples:  " , x_train.shape)

In [0]:
x_train = x_train[..., tf.newaxis]
x_test = x_test[..., tf.newaxis]

For normalization, we divide the grayscale pixel values by 255 so that they lie in the range [0, 1].

In [0]:
# Convert the samples from integers to floating-point numbers:
x_train = x_train / 255.0
x_test = x_test / 255.

### Pool-Based Sampling
In pool-based sampling, the learner starts from a small labelled set (in our example, 1000 samples) and selects further instances to be labelled from a large pool of unlabelled examples (here, the remaining 4000 samples) based on their “informativeness”. Informativeness is quantified by a user-selected metric.

In [0]:
# assemble initial data
n_initial = 1000
initial_idx = np.random.choice(range(len(x_train)), size=n_initial, replace=False)
x_initial = x_train[initial_idx]
y_initial = y_train[initial_idx]

# generate the pool
# remove the initial data from the training dataset
x_pool = np.delete(x_train, initial_idx, axis=0)
y_pool = np.delete(y_train, initial_idx, axis=0)

print ("The shape of training examples:  " , x_initial[0].shape)
print ("The number of all training examples: ", x_train.shape[0])
print ("The number of training examples with labels: ", x_initial.shape[0])
print ("The number of training examples without labels: ", x_pool.shape[0])
print ("The number of testing examples:  ", x_test.shape[0])

### Create the Model
Build the `tf.keras.Sequential` model by stacking layers. 


In [0]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])

Let's display the architecture of our model so far.

In [0]:
model.summary()

Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as you go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). Typically, as the width and height shrink, you can afford (computationally) to add more output channels in each Conv2D layer.
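
The shrinking width and height can be reproduced by hand. The sketch below assumes the Keras defaults used by the model above (unpadded `'valid'` convolutions with stride 1, and max pooling whose stride equals the pool size); the helper names `conv2d_valid` and `max_pool` are made up for this illustration.

```python
def conv2d_valid(size, kernel):
    """Output length of an unpadded ('valid') convolution with stride 1."""
    return size - kernel + 1


def max_pool(size, pool):
    """Output length of non-overlapping max pooling (stride == pool size)."""
    return size // pool


side = 28          # MNIST images are 28 x 28 pixels
shapes = []

side = conv2d_valid(side, 3)     # Conv2D(32, kernel_size=(3, 3))
shapes.append((side, side, 32))
side = max_pool(side, 2)         # MaxPooling2D(pool_size=(2, 2))
shapes.append((side, side, 32))
side = conv2d_valid(side, 3)     # Conv2D(64, kernel_size=(3, 3))
shapes.append((side, side, 64))

# Dense(64) acts on the last axis only and keeps the 64 channels,
# so Flatten receives a tensor of the same shape as the last entry.
flattened = side * side * 64

for shape in shapes:
    print(shape)
print("features after Flatten:", flattened)
```

`model.summary()` reports the same shapes with an extra leading `None` for the batch dimension.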

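The next step is to choose a loss function for training. The `losses.SparseCategoricalCrossentropy` loss compares the predicted class probabilities with the true integer label and calculates the loss. It is used for multi-class classification. Because the last layer of the model already applies a softmax, `from_logits` is set to `False`.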

In [0]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
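
To make the loss concrete, here is a small plain-Python sketch of the underlying formula: for each sample, the loss is the negative log of the probability assigned to the true class, averaged over the batch. The function name and the example values are made up for illustration, and the sketch ignores the numerical clipping that Keras applies internally.

```python
import math


def sparse_categorical_crossentropy(y_true, probs):
    """Mean of -log(probability assigned to the true class) over the batch."""
    losses = [-math.log(p[label]) for label, p in zip(y_true, probs)]
    return sum(losses) / len(losses)


# Two samples with three classes each; the labels are integers, not one-hot vectors.
probs = [[0.1, 0.2, 0.7],
         [0.6, 0.3, 0.1]]
y_true = [2, 1]

loss = sparse_categorical_crossentropy(y_true, probs)
print(loss)
```

The first sample is predicted confidently and correctly (probability 0.7 for class 2) and contributes little; the second assigns only 0.3 to the true class and dominates the average.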

Many models train better if you gradually reduce the learning rate during training. Use `optimizers.schedules` to reduce the learning rate over time:

In [0]:
STEPS_PER_EPOCH = 1

lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
  0.001,
  decay_steps=STEPS_PER_EPOCH*1000,
  decay_rate=1,
  staircase=False)

optimizer = tf.keras.optimizers.Adam(lr_schedule)
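
For intuition, the schedule configured above follows the formula `initial_learning_rate / (1 + decay_rate * step / decay_steps)` when `staircase=False`. The plain-Python function below re-implements that formula with the same values, purely as an illustration; `inverse_time_decay` is a made-up name and not part of TensorFlow.

```python
def inverse_time_decay(step, initial_lr=0.001, decay_steps=1000, decay_rate=1.0):
    """Learning rate after `step` optimizer steps (continuous, non-staircase decay)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)


for step in (0, 1000, 3000):
    print(step, inverse_time_decay(step))
```

With these settings the learning rate is halved after 1000 optimizer steps and quartered after 3000.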

### Compile and train the model

In [0]:
model.compile(optimizer=optimizer,
              loss=loss_object,
              metrics=['accuracy'])

In [0]:
model.fit(
          x_train, 
          y_train,
          epochs=3,
          validation_data=(x_test, y_test),
          verbose=1)

In [0]:
# evaluate returns two values (test_loss, test_acc); only the accuracy is used
_, test_acc = model.evaluate(x_test, y_test)
print("the accuracy of model after 3 epochs of training: ", test_acc*100, "%")

### Active Learning
Suppose that you can query the label of an unlabelled instance, but each query is expensive. Which instance would you choose? Querying an instance in the region where the model is uncertain typically yields more information than querying one at random.

The key components of any active learning workflow are the **model** you choose, the **informativeness** measure you use and the **query** strategy you apply to request labels. modAL was designed with modularity, flexibility and extensibility in mind. Because it builds on the scikit-learn API, it allows you to rapidly create active learning workflows with nearly complete freedom. You can also replace parts with your own custom-built solutions, which makes it easy to design novel algorithms: instead of choosing from a small set of built-in components, you can integrate scikit-learn, TensorFlow/Keras or PyTorch models into your algorithm and tailor your own query strategies and uncertainty measures.

In [0]:
from modAL.models import ActiveLearner
from modAL.uncertainty import uncertainty_sampling # , classifier_margin, classifier_entropy  

Use the same architecture, re-created so that training starts from freshly initialized weights:

In [0]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
    tf.keras.layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax')
])

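# Use a fresh optimizer: one that has already trained another model
# may hold state tied to that model's variables
model.compile(optimizer=tf.keras.optimizers.Adam(lr_schedule),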
              loss=loss_object,
              metrics=['accuracy'])

In [0]:
# initialize ActiveLearner
learner = ActiveLearner(
    estimator=model,
    query_strategy=uncertainty_sampling,
    # the following arguments are passed on to model.fit when the model is trained
    X_training=x_initial, 
    y_training=y_initial,
    epochs=2,
    validation_data=(x_test, y_test),
    verbose=1
)

Evaluate the model on the test data using `evaluate`



In [0]:
# Collect the accuracy scores so that they can be plotted after the human user has been queried.
accuracy_scores = []
# evaluate returns two values (test_loss, test_acc); only the accuracy is used
_, test_acc = model.evaluate(x_test, y_test)
print("the accuracy of model after 2 epochs of training: ", test_acc*100, "%")
# add test_acc to the list
accuracy_scores.append(test_acc)

Actively query the human user

In [0]:
# query for labels
query_idx, query_inst = learner.query(x_pool)
# training=False is needed only if there are layers with different
# behavior during training versus inference (e.g. Dropout).
predictions = model(query_inst, training=False)[0]
# the predicted class:
class_idx = tf.argmax(predictions).numpy()

# print the results
print("the index of query: ", query_idx)
print("the network prediction: ", class_idx)
print("the network output: ", predictions.numpy())
plt.imshow(query_inst[0].reshape([28, 28]), cmap=plt.cm.binary)

In [0]:
# supply the label for the queried instance; 6 is a placeholder for the
# digit that the human user reads off the image shown above
user_answer = [6]
learner.teach(x_pool[query_idx], user_answer)
print("The right answer was: ", y_pool[query_idx])

Remove this query from the pool

In [0]:
x_pool, y_pool = np.delete(x_pool, query_idx, axis=0), np.delete(y_pool, query_idx, axis=0)

Evaluate the model using the test data after the first query

In [0]:
# evaluate returns two values (test_loss, test_acc); only the accuracy is used
_, test_acc = model.evaluate(x_test, y_test)
# add test_acc to the list
accuracy_scores.append(test_acc)
print("the accuracy of model after the first query ", test_acc*100, "%")

### Active Learning Loop

In [0]:
n_queries = 6
for i in range(n_queries):
    # ask the learner for the most uncertain instance in the pool
    query_idx, query_inst = learner.query(x_pool)
    plt.title('Digit to label')
    plt.imshow(query_inst.reshape(28, 28), cmap=plt.cm.binary)
    plt.show()
    print("Which digit is this?")
    # the human user types in the label
    y_new = np.array([int(input())], dtype=int)
    print("The right answer was: ", y_pool[query_idx])
    # retrain on the newly labelled instance, then remove it from the pool
    learner.teach(query_inst, y_new)
    x_pool, y_pool = np.delete(x_pool, query_idx, axis=0), np.delete(y_pool, query_idx, axis=0)
    # track the test accuracy after every query
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    accuracy_scores.append(test_acc)

In [0]:
# one query was made before the loop and n_queries inside it
print("the accuracy of the model after", n_queries + 1, "queries: ", test_acc*100, "%")

In [0]:
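# 'seaborn-white' was renamed to 'seaborn-v0_8-white' in matplotlib 3.6; use whichever is available
style = 'seaborn-v0_8-white' if 'seaborn-v0_8-white' in plt.style.available else 'seaborn-white'
with plt.style.context(style):
    plt.figure(figsize=(10, 5))
    plt.title('Accuracy of your model')
    plt.plot(range(n_queries+2), accuracy_scores)
    plt.scatter(range(n_queries+2), accuracy_scores)
    plt.xlabel('number of queries')
    plt.ylabel('accuracy')
    plt.show()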

## References:
[1] A. Kirsch, "BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning"  
[2] A. Singh, "End-to-End Robotic Reinforcement Learning without Reward Engineering"  
[3] J.M. Zöllner, "Machine Learning 2, Lecture 4"  
[4] https://www.tensorflow.org/tutorials/  
[5] https://modal-python.readthedocs.io/   
[6] https://medium.com/@ODSC/crash-course-pool-based-sampling-in-active-learning-cb40e30d49df   
[7] https://sites.google.com/view/reward-learning-rl/home
