
Suggestion: simplify learner interface #94

Open
basnijholt opened this issue Dec 19, 2018 · 10 comments

(original issue on GitLab)

opened by Christoph Groth (@cwg) at 2018-01-20T10:03:09.612Z

It seems to me that "adaptive" could greatly profit by becoming trivially extensible.

The adaptive package consists of: a collection of learners and a collection of runners. I think that "adaptive" should not strive to contain all learners and runners that are useful to someone. Rather, it seems to me that it would create the most value by establishing a simple notion of a "learner" and providing a set of generally useful learners as well as generally useful runners.

One can easily imagine learners and runners that are out-of-scope of a single package. Just to name a few that I have in mind:

  • An MPI runner
  • A bandstructure/spectrum learner
  • A vectorized integrator learner

Even if one believes that the above should be all included into "adaptive", my point is that adaptive evaluation is such a general idea that it is beyond the scope of a single package. This is why I suggest specifying a simple and "natural" basic runner-learner interface, such that it would become trivial to write a simple runner. One could then stop using an ABC for learners and other packages could contain compatible learners and runners without even explicitly depending on "adaptive", but "adaptive" would be useful together with them.

If the learner interface is simple enough, third-party packages will hopefully adopt it instead of using callback functions.

What is a learner? It seems to me that a learner is best seen as something that, in accordance with some internal model of a function, can intelligently query points of a function and, based on partial or complete results of the queries, predict that function. Thus, the core business of a learner consists of

  • choosing points that should be evaluated,
  • accepting results of evaluations,
  • predicting the function (and estimating the quality of this prediction!).

In today's "adaptive", in addition to the above, learners have to do considerable bookkeeping (because results may be fed back arbitrarily while the learner might need specific groups of them).

What would be a natural, almost invisible, interface to the above functionality? How about the following? (From the point of view of a trivial runner):

import numpy as np
from matplotlib import pyplot

learner = SomeLearner(a, b, done_condition)
for x in learner:
    learner.feed(x, f(x))

# Plot the learned function using n samples.
x = np.linspace(a, b, n)
pyplot.plot(x, learner(x))

The above pretty minimal interface to a learner consists of the following parts:

  • The learner itself is an iterator (not an iterable) that chooses new points on request.
  • It has a method to feed back results. (The requested points are immutable objects that also serve as a handle when feeding back results.)
  • When called like a function, the learner predicts values.

In addition to this strict minimum, there could be some optional functionality, like estimating the intermediate quality of the prediction.
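To make the proposed protocol concrete, here is a toy learner (a sketch, not part of adaptive; all names are hypothetical) that always bisects the largest gap between evaluated points and uses a simple point budget as its "done" condition:

import numpy as np

class BisectionLearner:
    # Toy learner following the proposed minimal protocol.

    def __init__(self, a, b, n_points):
        self.xs, self.ys = [], []
        self.queue = [a, b]       # points chosen but not yet handed out
        self.n_points = n_points  # "done" condition: a simple point budget
        self.served = 0

    def __iter__(self):
        return self  # an iterator, not merely an iterable

    def __next__(self):
        if self.served >= self.n_points:
            raise StopIteration   # the natural way to signal "done"
        if not self.queue:
            # Choose the midpoint of the largest gap (assumes results
            # are fed back before the next request).
            xs = sorted(self.xs)
            i = int(np.argmax(np.diff(xs)))
            self.queue.append((xs[i] + xs[i + 1]) / 2)
        self.served += 1
        return self.queue.pop(0)

    def feed(self, x, y):
        self.xs.append(x)
        self.ys.append(y)

    def __call__(self, x):
        # Predict by linear interpolation through the data seen so far.
        order = np.argsort(self.xs)
        return np.interp(x, np.asarray(self.xs)[order],
                         np.asarray(self.ys)[order])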

Compared to my earlier attempts at adaptive evaluation I have been convinced by you guys that:

  • The runner should drive the learner (and not vice versa) by being able to request as many points as it has resources to evaluate at any given time. (This seems to be the optimal strategy in the common case where any evaluation of a point takes about the same amount of time.) If the learner is really out of ideas, there will be a natural way to signal that.
  • Explicit priorities, and the ability of the learner to cancel requests are probably not worth the hassle.

However, I still think that it is useful if learners are able to provide points in bunches and expect them back in the same bunches. Moreover, this implicitly establishes a common priority for all the points of a bunch. At the same time, as subsequent bunches are requested without feeding back results, these have a lower and lower relative priority.

A useful corollary of this is that it is now naturally possible to drive a learner with the least possible amount of speculation but still utilizing parallelism. Consider the situation where there are only a few available cores per active learner. We want to utilize some parallelism, but without becoming too speculative, because this will increase the overall workload and ultimately slow down the whole process. With the above protocol, this means that the runner should only request one bunch of points at a time for each learner, and only request the next bunch when the result has been fed back. In the common case where this provides enough work to occupy all available cores, it is the most efficient approach.

Note that there is no way to do the same with current "adaptive", because it doesn't have the notion of, even implicit, priority. The best one can do if there are a few active learners is requesting one point at a time in a round-robin fashion. This is inefficient not only because of the lack of vectorization, but also because learner A might need only one point right now without speculating, but learner B might need three.

So it seems to me that points should arrive in bunches (i.e. a Python sequence), and should be fed back in the same bunches. In practice, one can assume for many learners and runners that points are sequences (of sequences ...) of numbers, i.e. numerical arrays. (For safety, it's useful if these arrays are immutable.)
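A minimal bunch-oriented runner under this protocol might then look as follows (a sketch, assuming the learner yields bunches and that feed accepts a whole bunch at once; f must be picklable for a process pool):

from concurrent.futures import ProcessPoolExecutor

def run(learner, f, max_workers=4):
    # Request one bunch at a time, evaluate its points in parallel,
    # and feed the whole bunch back before requesting the next one.
    # All points of a bunch implicitly share the same priority, and
    # the runner never speculates beyond a single bunch.
    with ProcessPoolExecutor(max_workers=max_workers) as ex:
        for points in learner:
            values = list(ex.map(f, points))
            learner.feed(points, values)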

The goal of a learner is to predict a function. What is the most natural interface to a predicted function? That of a function call: learner(points).

To summarize, I propose to define learners as iterators that yield sequences-of-points to be evaluated. The results are fed back using a feed-method. The learner's current prediction can be queried by a simple function call. I believe that this minimal interface encapsulates all the essential functionality of a learner for a wide spectrum of applications. It is sufficient for all kinds of synchronous and asynchronous runners as well as just-in-time function plotters, etc.


originally posted by Joseph Weston (@jbweston) at 2018-01-31T12:46:30.169Z on GitLab

One could then stop using an ABC for learners and other packages could contain compatible learners and runners without even explicitly depending on "adaptive"

But an ABC is such a specification. The problem of explicitly depending on adaptive is solved by us implementing __subclasshook__ for the Learner ABC.
This is IMO superior to merely documenting what the learner interface is.

  • The learner defines the "done" condition
    learner = SomeLearner(a, b, done_condition)

I am not sure that the learner is the correct place for this. Currently in adaptive the end condition is supplied to
the runner; I believe that this is the correct place for it.


  • The learner itself is an iterator (not an iterable) that chooses new points on request.

I'm not sure that this is a good idea. This means that the following:

for point in learner:
    pass

changes the state of the learner. I understand that this is generally true for iterators, but I think it is very confusing.

Also, in the above, the learner is the only one who can decide how many points to give back in each call to next() (in the above
code it is not obvious that point is a sequence, but I saw that later in your post you said that this could be the case). This would
seem a valid decision for cquad, where the learner naturally wants to give back all the points in an interval, but I'm not sure that it
is the best choice for other learners.

For example, imagine we have a 1D learner with just the boundary points evaluated and we want to
choose the next 2 points before giving back any data (so that the runner can get 2-fold parallelism).
If we ask the learner for both points at once (i.e. learner.choose_points(2)) then it can sensibly
choose to divide the interval into 3 parts:

x--------x  ->  x--x--x--x

However, if we have to do 2 separate calls to learner.choose_points(1) then the only thing the learner can do is
interval bisection:

x--------x -> x----x----x -> x--x--x----x

which leads to a suboptimal distribution of points, given the current information! In this example we get "better"
results by allowing the runner to decide how many points it wants, but allowing the learner to decide how to
distribute those points.
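A sketch of what the bulk request could do (choose_points here is hypothetical; given only the two boundary points, splitting into n + 1 equal parts is the best static choice):

import numpy as np

def choose_points(a, b, n):
    # With only the boundary points evaluated, place n new points so
    # that [a, b] is split into n + 1 equal parts.
    return list(np.linspace(a, b, n + 2)[1:-1])

print(choose_points(0, 1, 2))  # [0.333..., 0.666...] -- the trisection above
# Two separate calls for one point each can only bisect: first 0.5,
# then 0.25 (or 0.75), giving the uneven x--x--x----x distribution.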

In the cquad example I could imagine that we would write the learner algorithm purely in terms of intervals,
and there would be a "middleware" sitting between that and the runner that would work out the points->intervals
mapping. The runner requests points from it, and it then requests intervals from the learner. Does that make sense?
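A sketch of such a middleware (all names are hypothetical): the learner yields intervals, the runner sees flat bunches of points, and results are regrouped before being handed back:

class IntervalToPointsAdapter:
    # Hypothetical middleware between an interval-based learner and a
    # point-based runner.

    def __init__(self, interval_learner):
        self.learner = interval_learner
        self.owner = {}  # point -> the interval that requested it

    def __iter__(self):
        return self

    def __next__(self):
        interval = next(self.learner)    # e.g. a cquad interval
        points = interval.points()       # hypothetical method
        for p in points:
            self.owner[p] = interval
        return points

    def feed(self, points, values):
        # Regroup the results by interval and hand each one back.
        for p, v in zip(points, values):
            self.owner.pop(p).add_result(p, v)  # hypothetical method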


  • It has a method to feed back results. (The requested points are immutable objects that also serve as a handle when feeding back results.)

This already exists; it is called add_point currently, but perhaps feed is a better name (up till now we have not been consistent about what we call elements of the domain and codomain of the learned function, or pairs of the two).


When called like a function, the learner predicts values.

This is a nice addition! @anton-akhmerov reminded me that this does not always make sense, though, because not all learners are trying
to interpolate a function, e.g. AveragingLearner.


originally posted by Christoph Groth (@cwg) at 2018-02-01T23:20:08.574Z on GitLab

One could then stop using an ABC for learners and other packages
could contain compatible learners and runners without even
explicitly depending on "adaptive"

But an ABC is such a specification. The problem of explicitly
depending on adaptive is solved by us implementing __subclasshook__
for the Learner ABC. This is IMO superior to merely documenting what
the learner interface is.

My point is that simplifying the learner interface could turn it into an
unofficial standard. Then other libraries could become interoperable
with adaptive without either depending on the other. The
subclasshook mechanism only works retroactively: adaptive has to
explicitly accept specific external classes as valid learners.

Of course it's OK to depend on adaptive, but it's better to avoid
dependencies if there is no good reason for them.

  • The learner defines the "done" condition
    learner = SomeLearner(a, b, done_condition)

I am not sure that the learner is the correct place for
this. Currently in adaptive the end condition is supplied to the
runner; I believe that this is the correct place for it.

But the runner has no understanding at all of the class of functions
that is being learned. That's the learner's field of expertise. That's
why "goals" will always depend on the specific learner type and as such
(if generally useful) are best implemented as methods of it.

Of course, with the current design you can also write:

Runner(..., goal=Learner.some_goal)

but I don't see how it helps the runner to be able to trigger the goal
check at will. That check is better triggered by the learner,
because it knows best when and how often this is best done. For example,
in cquad the error estimate will only change when an interval gets
updated, so there's no point in checking more often than that. The
runner has no understanding of that.

As far as custom goals that have not been foreseen by the Learner author
are concerned, there are multiple possible solutions, but the most
elegant seems to me to be subclassing the learner. This makes it explicit that
the new goal is specific to a learner.

  • The learner itself is an iterator (not an iterable) that chooses
    new points on request.

I'm not sure that this is a good idea. This means that the following:

for point in learner:
    pass

changes the state of the learner. I understand that this is generally
true for iterators, but I think it is very confusing.

You have a good point here.

When I started thinking about the learner interface I was looking for a
pythonic way to abstract a producer-consumer, i.e. an object
that provides things on request and accepts things that are fed to it
(like a file, except that a file works with streams of characters rather than
objects).

My first idea was to use a (generator-like) coroutine, which, as you know,
can not only yield objects but can also be fed objects. The problem
here is that Python's syntax for using a generator in this way is not
nice for this kind of usage.
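For illustration, driving a generator-based coroutine learner requires priming it with next() and then alternating send() calls, which reads awkwardly (a sketch with a placeholder update rule):

def learner_coroutine(a, b, n):
    # Sketch: yield a point, receive its value via send().
    x = (a + b) / 2
    for _ in range(n):
        y = yield x        # hand out a point, get f(x) fed back
        x = (x + b) / 2    # placeholder: pretend y guided this choice

gen = learner_coroutine(0, 1, n=5)
x = next(gen)              # the generator must be primed before send()
try:
    while True:
        x = gen.send(x ** 2)   # feed f(x) in, get the next point out
except StopIteration:
    pass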

Perhaps you have a better idea? In C++ I would write something like:

while (points = learner.get()) {
    values = evaluate(points);
    learner.feed(points, values);
}

The nice thing here is that getting no points is also the exit condition
for the loop. If only Python's assignments were expressions... But
they aren't, so one would have to write:

while True:
    points = learner.get()
    if not points:
        break
    values = evaluate(points)
    learner.feed(points, values)

The iterator trick allows one to simplify this to

for points in learner:
    values = evaluate(points)
    learner.feed(points, values)

The C++ version is indeed nicer because get() is explicit. But I
consider the Python version OK; at least it's better than any
alternative I can think of.

For example, imagine we have a 1D learner with just the boundary
points evaluated and we want to choose the next 2 points before giving
back any data (so that the runner can get 2-fold parallelism). If we
ask the learner for both points at once
(i.e. learner.choose_points(2)) then it can sensibly choose to
divide the interval into 3 parts:

x--------x  ->  x--x--x--x

(...)

The problem with this approach is that it's nondeterministic. The
learned function will depend on the history of available resources. I
think it is a problem if a program that fails for some input cannot be
rerun to reproduce the bug.

But your point is valid. One can imagine cases where it is useful for
the learner to know how many evaluations can be done in parallel right
now. In the C++ interface sketched above I would add a parameter to
get that defaults to 1 and specifies a "soft" maximum number of
points. The learner would try to provide a number of points that is as
close to but not higher than that maximum. But it would still work in
full "chunks", i.e. the first request from cquad would yield 33 points.

How about this:

while learner.unsatisfied():
    points = learner.get(n)
    values = evaluate(points)
    learner.feed(points, values)

Unfortunately, this is getting much more complicated than my original
proposal. I have the impression that the simple original interface
would work well in practice.

In the cquad example I could imagine that we would write the learner
algorithm purely in terms of intervals, and there would be a
"middleware" sitting between that and the runner that would work out
the points->intervals mapping. The runner requests points from it, and
it then requests intervals from the learner. Does that make sense?

Yes, but what's your goal here?

When called like a function, the learner predicts values.

This is a nice addition! @anton-akhmerov reminded me that this does
not always make sense, though, because not all learners are trying
to interpolate a function, e.g. AveragingLearner.

I think that it's a nice and crisp definition to say that learners learn
a function y=f(x). The AveragingLearner does not do that. Its x
is some pseudo-random seed that the averaging learner actually does not
seem to need -- it never requests points with the same seed twice.

I'm not saying that the AveragingLearner is not useful. But I think
that it should be implemented as a learner of a noisy function of no
variable, i.e. x would always be None or an empty sequence. Hence,
to get the estimate of the average one would evaluate learner(None).
IMHO this corresponds better to what AveragingLearner does.

There could of course also be an averaging learner of a function of a
variable, e.g. the noisy measurement of a current given some gate
voltage.
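Under this reading, an averaging learner would fit the proposed protocol roughly as follows (a sketch; the class and its budget parameter are hypothetical, not adaptive's AveragingLearner):

import statistics

class AveragingLearner:
    # Sketch: learns a noisy function of no variable.  The requested
    # "point" is always None; the prediction is the running mean.

    def __init__(self, n_samples):
        self.values = []
        self.n_samples = n_samples  # "done" condition: a sample budget

    def __iter__(self):
        return self

    def __next__(self):
        if len(self.values) >= self.n_samples:
            raise StopIteration
        return None                 # the function takes no argument

    def feed(self, x, y):
        self.values.append(y)

    def __call__(self, x=None):
        return statistics.mean(self.values)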


originally posted by Anton Akhmerov (@anton-akhmerov) at 2018-02-02T08:37:29.136Z on GitLab

I'm not saying that the AveragingLearner is not useful. But I think
that it should be implemented as a learner of a noisy function of no
variable, i.e. x would always be None or an empty sequence. Hence,
to get the estimate of the average one would evaluate learner(None).
IMHO this corresponds better to what AveragingLearner does.

Good point, it sounds very reasonable that learner() should return the average of the function that is learned (and probably have a more detailed interface for more detailed information).

How to deal with the need to keep track of the collection of prng seeds? What should keep track of them?


originally posted by Joseph Weston (@jbweston) at 2018-02-02T09:42:13.080Z on GitLab

The subclasshook mechanism only works retroactively: adaptive has to explicitly accept specific external classes as valid learners

This claim is demonstrably false:

import abc

class Animal(abc.ABC):

    @classmethod
    def __subclasshook__(cls, obj):
        # Any class whose MRO provides make_noise counts as an Animal.
        return any(hasattr(parent, 'make_noise') for parent in obj.__mro__)

    @abc.abstractmethod
    def make_noise(self):
        pass


class Duck:  # note: neither inherits from Animal nor is register()ed

    def make_noise(self):
        return "quack"


duck = Duck()
print(isinstance(duck, Animal))  # prints True

Also the Python docs say so.

I maintain that using an ABC is as good a documentation as any for describing our "learner protocol".


The problem with this approach is that it's nondeterministic. The learned function will depend on the history of available resources.

I agree that it will not be deterministic from the user's perspective; however, I do not perceive this as a problem. A given run produces
a given set of calls to the learner's API (e.g. get(3), feed(...), get(2), feed(...), ...), and if you create a new learner and repeat the
exact same set of calls, then the learners will be in the same state.

I think it is a problem if a program that fails for some input cannot be rerun to reproduce the bug.

At the moment we get around this by having the option of maintaining a log of the calls made to the learner.
While this may not be practical for every run (you essentially need to store all the data twice) it has so far been
a reasonable compromise between reproducibility and usefulness.
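Such a log could be kept by a thin wrapper around the learner (a sketch with hypothetical names; adaptive's actual logging mechanism differs in detail):

class LoggingLearner:
    # Record every call made to a learner so that a non-deterministic
    # run can be replayed exactly on a fresh learner afterwards.

    def __init__(self, learner):
        self.learner = learner
        self.log = []

    def get(self, n):
        points = self.learner.get(n)
        self.log.append(('get', n))
        return points

    def feed(self, points, values):
        self.log.append(('feed', points, values))
        self.learner.feed(points, values)

def replay(log, fresh_learner):
    # Re-apply the recorded calls to reproduce the learner's state.
    for entry in log:
        if entry[0] == 'get':
            fresh_learner.get(entry[1])
        else:
            fresh_learner.feed(entry[1], entry[2])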


But the runner has no understanding at all of the class of functions that is being learned.

Yes, but the user does. The user is the one who provides the goal. Here are some examples of reasonable goals that we have already used in adaptive:

  1. The learner's "loss" is below some tolerance
  2. A certain number of points have been added to the learner
  3. A certain time has elapsed since the start of the run

I believe that (1) is the type of goal that you were considering up till now. At the moment the "loss" is just an arbitrary figure of merit that cannot
really be compared between learners (not even necessarily between learners of the same class); however, I believe that it would make sense to "standardize"
this somehow (e.g. make it so that a loss of 1 means that the learner's "internal goal" has been reached). However, I believe that it should totally
be possible to keep requesting points, even after the learner's "internal goal" has been reached. The user is king, and if they decide that
they want more points then they should get more points dammit!

(2) is also a very common goal (we often use it to compare a learner's performance to a naive gridding of parameter space). I think you will agree
that supplying lambda learner: learner.n > 1000 is more lightweight than subclassing the learner and overriding a goal method.

(3) is clearly outside the remit of the learner, and there is not really a natural place to "start the timer" if the logic is inside the learner.
We could imagine a learner that takes the execution time of different points into account, but this is, IMO, a separate consideration.
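Expressed as callables handed to the runner, goals (1)-(3) might look like this (a sketch; loss() and n are the learner attributes used above, and the Runner call is hypothetical):

import time

goal_loss = lambda learner: learner.loss() < 0.01   # (1) tolerance reached
goal_npoints = lambda learner: learner.n > 1000     # (2) point budget

t0 = time.monotonic()  # the timer naturally lives outside the learner
goal_time = lambda learner: time.monotonic() - t0 > 3600   # (3) time budget

# Runner(learner, goal=goal_loss)   # hypothetical usage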


originally posted by Joseph Weston (@jbweston) at 2018-02-02T09:44:57.028Z on GitLab

I forgot to add:

4. No goal. You leave the runner running overnight and come back in the morning to visually inspect how it's getting on. If it "looks reasonable" you stop the runner.


originally posted by Joseph Weston (@jbweston) at 2018-02-02T10:00:45.189Z on GitLab

In the cquad example I could imagine that we would write the learner
algorithm purely in terms of intervals, and there would be a
"middleware" sitting between that and the runner that would work out
the points->intervals mapping. The runner requests points from it, and
it then requests intervals from the learner. Does that make sense?

Yes, but what's your goal here?

That the learner can return objects that are not mapped 1-1 onto "points" that can be evaluated by the runner.

The learner's internal algorithm is not necessarily easily expressed in terms of points, but in terms of other objects such as intervals.
OTOH a runner needs to get points so that it can call the function and get values.


originally posted by Christoph Groth (@cwg) at 2018-04-19T11:43:57.486Z on GitLab

I only knew two ways of becoming an instance of an ABC: deriving from it and using register(). Thanks for teaching me the third way: __subclasshook__. I hope that I know all of them now.

However, I have to say that __subclasshook__ feels like a hack that defies the whole idea of ABCs, namely of having an explicit statement of compliance that goes beyond duck typing. With the first two methods of declaring a class to adhere to an ABC there's someone who inspects the class in question and then makes the statement that it adheres to the protocol specified by the ABC.

Using __subclasshook__ is an approach that reduces ABCs to duck typing, so what's the point other than providing a way to migrate an existing library from duck typing to ABCs without breaking compatibility?

But OK, since my proposal is to use duck typing for learners, an ABC with __subclasshook__ works just as well.


originally posted by Christoph Groth (@cwg) at 2018-04-19T12:03:19.695Z on GitLab

Joseph wrote:

I agree that it will not be deterministic from the user's perspective; however, I do not perceive this as a problem. A given run produces a given set of calls to the learner's API (e.g. get(3), feed(...), get(2), feed(...), ...), and if you create a new learner and repeat the exact same set of calls, then the learners will be in the same state.

I think that this is too weak. It means that when you do computations on a cluster and start getting weird results, the problem will be extremely difficult to debug. If you're lucky, you can reduce the size of the problem and run it on a single CPU, but what to do if this is not possible?

But even if there are no bugs, being non-deterministic is highly problematic. So you run a simulation and it gives you different results each time, even when starting from exactly the same state? That's like working with an analog computer!

There are certainly problems where non-deterministic concurrent algorithms may be the only way to go, but I do not have the impression that learners belong to that category. Let's not open the can of worms of non-determinism without need!

That brings me to the following. Further up I wrote:

One can imagine cases where it is useful for the learner to know how many evaluations can be done in parallel right now. In the C++ interface sketched above I would add a parameter to get that defaults to 1 and specifies a "soft" maximum number of points. The learner would try to provide a number of points that is as close to but not higher than that maximum. But it would still work in full "chunks", i.e. the first request from cquad would yield 33 points.

I think now that I was wrong there. There's no need for any parameter to get(). It can be useful to provide some typical chunk size or similar hint upfront, but if determinism is to be preserved, any hint that is provided during evaluation cannot have any influence on how the learner continues to work.

So sure, one could provide the number of currently free cores to get(), but this could serve only as a mere optimization to avoid calling get() multiple times. That seems unnecessary, since an appropriate chunk size should be selected before learning starts.


originally posted by Christoph Groth (@cwg) at 2018-04-19T14:50:11.555Z on GitLab

Joseph, concerning the 4 types of goals that you raised, I think that they are perfectly compatible with the iterator API. (Although I consider usage type 2 only useful for benchmarking learners.)

A reasonable default "loss" in that case would be zero, i.e. continue forever. Then case 1 would be handled by specifying a different precision goal, and cases 2 to 4 would be handled by breaking out of the loop when the appropriate condition is met.
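In iterator form, those cases are just conditions for breaking out of the driving loop (a sketch, reusing the learner and evaluate placeholders from above):

import time

t0 = time.monotonic()
for points in learner:                   # case 1: the learner stops itself
    values = evaluate(points)
    learner.feed(points, values)
    if learner.n > 1000:                 # case 2: point budget
        break
    if time.monotonic() - t0 > 3600:     # case 3: time budget
        break
    # case 4: no goal -- just interrupt the runner when it looks reasonable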

You say that you find it problematic that

for point in learner:
    pass

changes the state of the learner. I guess that you would prefer instead

while point := learner.get():
    pass

if only that were valid Python.

Now it happens that PEP 572 (currently in the works) proposes to make the above valid Python, and one strong argument against it is that assignment expressions are not really necessary in the above case because iterators already provide a way to do the same. So if you can convince me why the second syntax is actually better than the first one, I'd be actually happy, because that would give me an argument in support of PEP 572!

The second syntax allows values to be fed back to the "iterator", but I actually can't think of that being very useful in general. (As I detailed in a comment above, IMO it's not useful in the case of learners.)


originally posted by Joseph Weston (@jbweston) at 2018-05-23T14:09:11.898Z on GitLab

scikit-optimize has a similar architecture to ours (though their Learner is narrower in scope). They have methods ask and tell, which I think are nice and succinct.
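For reference, their ask/tell loop looks roughly like this (a sketch against scikit-optimize's Optimizer; the search space and objective are made up):

from skopt import Optimizer

opt = Optimizer(dimensions=[(-2.0, 2.0)])   # a 1D search space
for _ in range(10):
    x = opt.ask()             # the optimizer proposes a point
    y = (x[0] - 0.5) ** 2     # evaluate the objective there
    opt.tell(x, y)            # feed the result back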
