cold start handling in ranked batch sampling #28
Comments
Great, thanks! It is certainly not added. I'll take a look at the paper as soon as possible!
Thanks for the PR! Fixed by #29.
Hmm... This issue and PR #29 are actually addressing different problems. I haven't started a pull request for this issue, since solving it very likely requires changing the API of select_cold_start_instance.
Sorry, I remembered the issue wrong and didn't read the post again before commenting. Issue reopened! :)
In any case, no need to rush with the PR! I probably won't have time to work on this this week, so any help is appreciated!
Hi, I have opened PR #30 for this issue. By the way, I think it would be great if we could compose the cold start handling mechanism that currently works for ranked batch sampling (and possibly other cold start handling strategies in the future) with the other active sampling strategies supported by modAL.
Alright, thanks for the PR! I finally had the time to review and merge it. Currently, some cold start handling is implemented for the utility measure functions, but it only checks whether the estimator has been fitted yet, and if not, it returns a zero array. Implementing the same density-based cold start criterion for a general query function is a good idea.
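The fitted-check fallback described in the previous comment can be sketched roughly as follows. This is an illustrative utility function, not the actual modAL code; the name utility_with_cold_start is hypothetical:

```python
import numpy as np
from sklearn.exceptions import NotFittedError

def utility_with_cold_start(estimator, X):
    """Toy utility measure with a fitted-check cold start:
    if the estimator has not been fitted yet, return a zero array
    instead of raising, so the caller can fall back gracefully."""
    try:
        proba = estimator.predict_proba(X)
    except NotFittedError:
        return np.zeros(X.shape[0])
    # Higher utility = less confident prediction.
    return 1.0 - proba.max(axis=1)
```

Returning a zero array keeps the utility function total, but it means the query strategy itself has no information to rank instances with before the first fit, which is exactly why a density-based cold start criterion is attractive.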
I see. I will take a look at how to integrate cold start handling mechanisms. One thing I have been thinking about is whether it is better to pass the cold start function to the query strategy functions or to the Learner when it is initialized. It is logically sounder to pass cold start criteria to a query strategy, since "cold start criteria" are part of the "query strategy", but in implementation it seems much easier to do it the other way. If we pass the cold start criteria to the Learner, we would only need to change the Learner.query method to support cold start handling for all query strategies. By comparison, if the cold start criteria are passed to the query strategy functions, all of the query strategy functions may need to be revised. Thanks.
I agree completely. I think it is better if the cold start strategy is passed to the query strategy, even if all query strategy functions need to be modified. In connection with this, I also plan to refactor the query strategy functions. If you check the code, for instance here, the implementation of the …
Hmm... I don't have better ideas than using a function factory. A possible alternative is to lift the query strategies from functions to instances of a QueryStrategy class. Different instances of this QueryStrategy class could have different scorers (e.g., …
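The QueryStrategy idea floated above could look something like the sketch below. All names here (QueryStrategy, scorer, cold_start) are hypothetical illustrations of the design, not part of the modAL API:

```python
import numpy as np
from sklearn.exceptions import NotFittedError

class QueryStrategy:
    """Hypothetical sketch: a query strategy as an object composed of
    a scorer and a cold start fallback, instead of a plain function."""

    def __init__(self, scorer, cold_start):
        self.scorer = scorer          # e.g. an uncertainty measure
        self.cold_start = cold_start  # fallback used when the estimator is unfitted

    def __call__(self, estimator, X, n_instances=1):
        try:
            scores = self.scorer(estimator, X)
        except NotFittedError:
            # Delegate to the cold start criterion (e.g. density-based).
            return self.cold_start(X, n_instances)
        # Highest-scoring instances first.
        return np.argsort(scores)[::-1][:n_instances]
```

One advantage of this shape over a function factory is that the cold start criterion becomes a swappable component, so the same scorer can be paired with different cold start strategies without touching the scorer itself.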
Hi!
The behavior of cold start handling in ranked batch sampling seems to differ from Cardoso et al.'s "Ranked batch-mode active learning".
modAL/modAL/batch.py
Lines 133 to 139 in 452898f
In modAL's implementation, in the case of a cold start, the instance selected by select_cold_start_instance is not added to the instance list instance_index_ranking.
In "Ranked batch-mode active learning", however, the instance selected by select_cold_start_instance appears to be the first item in instance_index_ranking.
modAL/modAL/batch.py
Line 46 in 452898f
If my understanding of the algorithm proposed in the paper and of modAL's implementation is correct, we can change the return of select_cold_start_instance to

```python
return best_coldstart_instance_index, X[best_coldstart_instance_index].reshape(1, -1)
```

store best_coldstart_instance_index in instance_index_ranking, and revise ranked_batch correspondingly.
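A minimal sketch of the proposed change. The selection criterion here (closest instance to the data centroid) is a stand-in, since the real select_cold_start_instance uses its own similarity measure; the point is only the revised return value and how ranked_batch would seed the ranking with it:

```python
import numpy as np

def select_cold_start_instance(X):
    """Illustrative stand-in: pick the instance closest to the data centroid.
    (The actual modAL criterion differs; only the return shape matters here.)"""
    distances = np.linalg.norm(X - X.mean(axis=0), axis=1)
    best_coldstart_instance_index = int(np.argmin(distances))
    # Proposed change: return the index alongside the instance,
    # so the caller can record it in instance_index_ranking.
    return best_coldstart_instance_index, X[best_coldstart_instance_index].reshape(1, -1)

def ranked_batch_cold_start(X):
    """Sketch of the revised ranked_batch behavior: the cold start pick
    becomes the FIRST item of instance_index_ranking, as in the paper."""
    instance_index_ranking = []
    cold_start_index, _instance = select_cold_start_instance(X)
    instance_index_ranking.append(cold_start_index)
    return instance_index_ranking
```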