Hi,
In the active learning for regression example, we used Gaussian processes. While the sklearn version seems to keep its length-scale and noise parameters static (maybe I am doing something wrong), other implementations, e.g. GPyTorch, allow tuning these via gradient descent.
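To show what I mean, here is a minimal sketch of how I understand the two options in sklearn (toy data and an RBF + WhiteKernel, not the actual example code): as far as I know, fit() re-optimizes the kernel hyperparameters by maximizing the log marginal likelihood unless optimizer=None is passed, which keeps them fixed at their initial values.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))
y = np.sin(6 * X).ravel() + 0.1 * rng.standard_normal(20)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

# Default behaviour: fit() maximizes the log marginal likelihood
# (L-BFGS-B, optionally with restarts) to set length scale and noise.
gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=5)
gpr.fit(X, y)
print(gpr.kernel_)                          # fitted hyperparameters
print(gpr.log_marginal_likelihood_value_)

# Keeping the hyperparameters static instead: disable the optimizer.
gpr_fixed = GaussianProcessRegressor(kernel=kernel, optimizer=None)
gpr_fixed.fit(X, y)
print(gpr_fixed.kernel_)                    # same as the initial kernel
```

So if the fitted kernel never changes in the example, I suspect I am either passing optimizer=None somewhere or not re-fitting the model after new points are added.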
In a batch learning setting we would have the whole training set, so tuning the hyperparameters to maximize the log marginal likelihood makes sense. But does it also make sense to do so while performing active learning, when the labelled dataset is really small?
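To make the question concrete, this is roughly the loop I have in mind (a made-up 1-D problem with uncertainty sampling; the pool, oracle, and kernel here are just for illustration, not the example's actual code). Re-fitting the GP at every step means the length scale and noise are re-estimated from only a handful of points each time:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def oracle(x):
    # Hypothetical labelling function standing in for the true target.
    return np.sin(6 * x).ravel() + 0.1 * rng.standard_normal(len(x))

X_pool = rng.uniform(0, 1, size=(200, 1))
X_train, X_pool = X_pool[:3], X_pool[3:]    # tiny labelled seed set
y_train = oracle(X_train)

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)

for step in range(10):
    # Each fit() re-runs the marginal-likelihood optimization on the
    # current (very small) training set.
    gpr = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=3)
    gpr.fit(X_train, y_train)

    # Uncertainty sampling: query the pool point with the largest predictive std.
    _, std = gpr.predict(X_pool, return_std=True)
    idx = int(np.argmax(std))
    x_new = X_pool[idx:idx + 1]

    X_train = np.vstack([X_train, x_new])
    y_train = np.concatenate([y_train, oracle(x_new)])
    X_pool = np.delete(X_pool, idx, axis=0)

    print(step, gpr.kernel_)   # watch how the hyperparameters drift per step
```

My worry is that with three to ten points the marginal likelihood surface is not very informative, so the re-estimated hyperparameters could jump around and distort the uncertainty estimates used for querying. Is it better to fix them (or regularize them with priors, as GPyTorch allows) until more data has been collected?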