
cold start handling in ranked batch sampling #28

Closed
zhangyu94 opened this issue Jan 10, 2019 · 10 comments
@zhangyu94 (Contributor)

Hi!

The behavior of cold start handling in ranked batch sampling seems to differ from Cardoso et al.'s "Ranked batch-mode active learning".

modAL/modAL/batch.py, lines 133 to 139 in 452898f:

```python
if classifier.X_training is None:
    labeled = select_cold_start_instance(X=unlabeled, metric=metric, n_jobs=n_jobs)
elif classifier.X_training.shape[0] > 0:
    labeled = classifier.X_training[:]

# Define our record container and the maximum number of records to sample.
instance_index_ranking = []
```

In modAL's implementation, the instance selected by select_cold_start_instance in the cold start case is not added to the instance list instance_index_ranking, whereas in "Ranked batch-mode active learning" the cold start instance appears to be the first item of instance_index_ranking. Currently, select_cold_start_instance returns only the instance itself:

```python
return X[best_coldstart_instance_index].reshape(1, -1)
```

If my understanding of the algorithm proposed in the paper and of modAL's implementation is correct, we could change the return of select_cold_start_instance to

```python
return best_coldstart_instance_index, X[best_coldstart_instance_index].reshape(1, -1)
```

store best_coldstart_instance_index in instance_index_ranking, and revise ranked_batch accordingly.
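For concreteness, a minimal sketch of the proposed change (the averaged-distance selection shown here is an assumption about select_cold_start_instance's criterion, not a copy of modAL's code):

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def select_cold_start_instance(X, metric='euclidean', n_jobs=1):
    # Hypothetical revised version: return the chosen index *and* the
    # instance, so that ranked_batch can seed instance_index_ranking with it.
    average_distances = np.mean(
        pairwise_distances(X, metric=metric, n_jobs=n_jobs), axis=0)
    best_coldstart_instance_index = int(np.argmin(average_distances))
    return best_coldstart_instance_index, X[best_coldstart_instance_index].reshape(1, -1)

# Illustrative use inside ranked_batch's cold start branch:
unlabeled = np.random.rand(10, 3)
best_index, labeled = select_cold_start_instance(X=unlabeled)
instance_index_ranking = [best_index]  # the cold start instance now heads the ranking
```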

@cosmic-cortex (Member)

Great, thanks! It is certainly not added. I'll take a look at the paper as soon as possible!

@cosmic-cortex (Member)

Thanks for the PR! Fixed by #29.

@zhangyu94 (Contributor, Author)

Hmm... this issue and PR #29 actually address different problems.
This issue is about the best cold start instance not being added to the first batch (in ranked_batch), while PR #29 addresses the incorrect computation of the instance index (in select_instance).

I haven't opened a pull request for this issue yet, since solving it will very likely require changing the API of select_cold_start_instance.
If needed, I can open a PR for it later today.

@cosmic-cortex (Member)

Sorry, I remembered the issue wrong and didn't reread the post before commenting. Issue reopened! :)

cosmic-cortex reopened this Jan 14, 2019
@cosmic-cortex (Member)

In any case, there is no need to rush with the PR! I probably won't have time to work on it this week, so any help is appreciated!

@zhangyu94 (Contributor, Author)

Hi, I have opened PR #30 for this issue.

By the way, I think it would be great if we could compose the cold start handling mechanism that currently works for ranked batch sampling (and possibly other cold start handling strategies in the future) with the other active sampling strategies supported by modAL.

@cosmic-cortex (Member)

Alright, thanks for the PR! I finally had time to review and merge it. Currently, some cold start handling is implemented for the utility measure functions, but it only checks whether the estimator has been fitted yet and, if not, returns a zero array. Implementing the same density-based cold start criterion for a general query function is a good idea.
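That fitted-check pattern would look roughly like this (a simplified sketch for illustration; the utility computation and function name are stand-ins, not modAL's exact code):

```python
import numpy as np
from sklearn.exceptions import NotFittedError

def classifier_uncertainty(classifier, X):
    # If the estimator has not been fitted yet, fall back to a zero
    # utility array, as described above.
    try:
        class_probabilities = classifier.predict_proba(X)
    except NotFittedError:
        return np.zeros(shape=(X.shape[0],))
    return 1 - np.max(class_probabilities, axis=1)
```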

@zhangyu94 (Contributor, Author)

I see. I will take a look at how to integrate cold start handling mechanisms.

One thing I have been thinking about is whether it is better to pass the cold start function to the query strategy functions or to the Learner at initialization. It is logically sounder to pass the cold start criterion to a query strategy, since "cold start criteria" are part of the "query strategy"; in terms of implementation, however, it seems much easier to do it the other way. If we pass the cold start criterion to the Learner, it seems we only need to change the Learner.query method to support cold start handling for all the query strategies. By comparison, if the cold start criterion is passed to the query strategy functions, every query strategy function may need to be revised.
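For concreteness, a sketch of the Learner-side option (hypothetical class and parameter names, not modAL's actual ActiveLearner API):

```python
class Learner:
    # Hypothetical sketch: the cold start strategy is passed at initialization,
    # and query() dispatches to it while there is no training data yet.
    def __init__(self, estimator, query_strategy, cold_start_strategy=None):
        self.estimator = estimator
        self.query_strategy = query_strategy
        self.cold_start_strategy = cold_start_strategy
        self.X_training = None

    def query(self, X_pool, **query_kwargs):
        if self.X_training is None and self.cold_start_strategy is not None:
            return self.cold_start_strategy(X_pool, **query_kwargs)
        return self.query_strategy(self.estimator, X_pool, **query_kwargs)
```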

Thanks.

@cosmic-cortex (Member)

I agree completely. I think it is better if the cold start strategy is passed to the query strategy, even if all the query strategy functions need to be modified.

In connection with this, I also plan to refactor the query strategy functions. If you check the code, for instance here, the implementations of the uncertainty_sampling, margin_sampling, and entropy_sampling functions are almost identical, apart from the function they call to calculate the utility. This could be solved with a function factory or some similar construct. The only reason I implemented it this way is that I wanted to avoid adding docstrings one by one later. Do you have any idea what might work well here? We might kill two birds with one stone, because this would also solve the problem you outlined.
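A function factory of the kind described here might look like this (a sketch; multi_argmax stands in for the actual selection helper, and the signatures are illustrative):

```python
import numpy as np

def multi_argmax(values, n_instances=1):
    # Indices of the n_instances largest utility scores.
    return np.argpartition(-values, n_instances - 1)[:n_instances]

def make_query_strategy(utility_measure, docstring=None):
    # Build a query strategy from a utility measure, so uncertainty_sampling,
    # margin_sampling and entropy_sampling can share a single body.
    def query_strategy(classifier, X, n_instances=1, **utility_kwargs):
        utility = utility_measure(classifier, X, **utility_kwargs)
        query_idx = multi_argmax(utility, n_instances=n_instances)
        return query_idx, X[query_idx]
    query_strategy.__doc__ = docstring  # docstrings can still be attached here
    return query_strategy
```

The docstring concern could then be handled by passing each docstring to the factory once, rather than writing near-identical function bodies.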

@zhangyu94 (Contributor, Author)

Hmm... I don't have a better idea than using a function factory.

A possible alternative is to lift the query strategies from functions to instances of a QueryStrategy class. Different instances of this QueryStrategy class could have different scorers (e.g., classifier_entropy) and cold start handlers, but this solution doesn't seem to have a clear advantage over the function factory.
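A sketch of what that could look like (hypothetical names; the scorer and selector arguments mirror the factory idea above):

```python
class QueryStrategy:
    # Hypothetical sketch: a query strategy as a configurable object composing
    # a scorer with a selector and an optional cold start handler.
    def __init__(self, scorer, selector, cold_start_handler=None):
        self.scorer = scorer                    # e.g. classifier_entropy
        self.selector = selector                # e.g. multi_argmax
        self.cold_start_handler = cold_start_handler

    def __call__(self, classifier, X, n_instances=1):
        # Fall back to the cold start handler while no training data exists.
        if getattr(classifier, 'X_training', None) is None and self.cold_start_handler:
            return self.cold_start_handler(X, n_instances)
        utility = self.scorer(classifier, X)
        query_idx = self.selector(utility, n_instances=n_instances)
        return query_idx, X[query_idx]
```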
