Update GridSearch and add to tutorial #439
Conversation
Nice! Two overall comments (besides the inline ones):
- We need documentation of the optional params in the docstring so we know what they are :)
- Let's just make this the default in the tutorial?
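For instance, a docstring covering the optional params might look roughly like this (the class name and the param descriptions here are illustrative assumptions, not the actual documentation added in this PR; the signature is taken from the diff below):

```python
class ExampleModel(object):
    """Toy stand-in used only to illustrate the requested docstring style."""

    def score(self, X_test, L_test, gold_candidate_set, b=0.5,
              set_unlabeled_as_neg=True, disp=True):
        """Score predicted labels against the gold candidate set.

        :param b: decision threshold on the predicted marginal
            probabilities (default 0.5)
        :param set_unlabeled_as_neg: if True, count unlabeled candidates
            as negative predictions when scoring
        :param disp: if True, print the score breakdown
        """
        raise NotImplementedError("docstring illustration only")
```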
```diff
@@ -97,7 +97,7 @@ def predict(self, X, b=0.5):
         """Return numpy array of elements in {-1,0,1} based on predicted marginal probabilities."""
         return np.array([1 if p > b else -1 if p < b else 0 for p in self.marginals(X)])

-    def score(self, X_test, L_test, gold_candidate_set, b=0.5, set_unlabeled_as_neg=True):
+    def score(self, X_test, L_test, gold_candidate_set, b=0.5, set_unlabeled_as_neg=True, disp=True):
```
Rename `disp` to `display` for readability?
```python
def search_space(self):
    return product(*self.param_val_ranges)

def fit(self, X_test, L_test, gold_candidate_set, b=0.5, set_unlabeled_as_neg=True, **model_hyperparams):
```
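For reference, `product` here is `itertools.product`, which lazily enumerates the full Cartesian product of the discrete value ranges; a minimal standalone sketch (the example parameter values are assumptions):

```python
from itertools import product

# Hypothetical discrete ranges for two hyperparameters
param_val_ranges = [
    [0.001, 0.01, 0.1],  # e.g. learning rate
    [10, 50],            # e.g. number of epochs
]

# Every combination: 3 * 2 = 6 grid points
grid = list(product(*param_val_ranges))
```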
Some minor things:
- I would call it e.g. `X_validation` to make it clear that they shouldn't use the test set here
- `L_test` is a vector of labels, so it should be e.g. `validation_labels` (upper-case single letters are reserved for matrices)
```python
p, r, f1 = scores[:3]
tp, fp, tn, fn = self.model.score(X_test, L_test, gold_candidate_set, b, set_unlabeled_as_neg, disp=False)
p, r = float(len(tp)) / (len(tp) + len(fp)), float(len(tp)) / (len(tp) + len(fn))
f1 = 2.0 * (p * r) / (p + r) if (p + r) > 0 else 0
```
We should pull these out as standard utility functions
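A minimal sketch of what such a utility might look like (the name and count-based signature are hypothetical, not the repo's actual API; it mirrors the computation in the snippet above):

```python
def binary_prf1(ntp, nfp, nfn):
    """Precision, recall, and F1 from true-positive, false-positive,
    and false-negative counts. Hypothetical utility function."""
    p = float(ntp) / (ntp + nfp) if (ntp + nfp) > 0 else 0.0
    r = float(ntp) / (ntp + nfn) if (ntp + nfn) > 0 else 0.0
    f1 = 2.0 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f1
```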
```python
class RandomSearch(GridSearch):
```
Minor: I was thinking `RandomSearch` would sample over a continuous space of param values, rather than filtering a set of given discreet values as here; I think that would be more convenient, because then the user only needs to specify a range rather than a set of values?
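The continuous version suggested here could be sketched roughly as follows (the function name and API are assumptions for illustration, not the implemented code):

```python
import random

def sample_hyperparams(param_ranges, n, seed=123):
    """Draw n random points, each coordinate uniform within its
    (low, high) range. Hypothetical sketch of a continuous
    RandomSearch, not the repo's actual API.

    param_ranges: dict mapping hyperparameter name -> (low, high).
    """
    rng = random.Random(seed)
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in param_ranges.items()}
        for _ in range(n)
    ]
```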
Shhhhhhhhh values
Well played
Ok @ajratner changes made

@henryre looks good, added a bit of text in the tutorial. Two questions remaining:

Update: 0.4 F1 from the non-grid-search tutorial, 0.667 for the grid-search tutorial

lol progressbar broke the build

Might be helpful. Same issue: keras-team/keras#2110

So let's just put in how to proceed without grid search (if you don't have a dev set labeled); this will preempt a lot of questions!

Dev set F1: 0.4 -> 0.57

LGTM; will merge in after Travis passes; we should merge into
Berg gets back on the board
Fixes #436