Explanation for 7.19 #190

qbolec opened this Issue Nov 28, 2015 · 0 comments



qbolec commented Nov 28, 2015

You wrote:

> Solving for θ0 is identical to solving for the biased coin case from
> before: it is just the relative frequency of positive labels in your data
> (because θ0 doesn’t depend on x at all).

While the solution is correct, I'm not sure this is the right argument for it.
I think that θ0 is a number, not a variable, and thus cannot really "depend on x" as such.
And even if this phrase is just a shorter way of saying that "y does not depend on x", that would not be true either: we know that y depends on x, as otherwise we would not even attempt the problem of analyzing movie reviews.

The reason I came up with to justify your claim while reading this paragraph was different.
If I imagine copies of equation 7.19 written for every example in the training set and multiplied together, I get the probability of observing this training set under my current model (the θs).
As I want to maximize this likelihood, I want this long product to be as big as possible.
I can regroup the terms of this product so that the (θ0) and (1-θ0) factors are all together, and the rest of the expression is "θ0-free" and can be treated as a constant multiplier; to maximize the product I can focus only on the part containing (θ0) and (1-θ0).
And you already showed me that this is maximized for θ0 equal to the relative frequency of label 1.
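
Written out, the regrouping looks roughly like this. I'm assuming here that equation 7.19 has the usual naive Bayes shape (a label term times per-feature terms) and that labels are written as 0/1; the names N_1, N_0 (counts of each label) and C are my own notation, not the book's.

```latex
\begin{align*}
\prod_{n=1}^{N} p(y_n, x_n)
  &= \prod_{n=1}^{N} \theta_0^{[y_n = 1]} (1 - \theta_0)^{[y_n = 0]}
     \cdot \underbrace{\prod_{n=1}^{N} p(x_n \mid y_n)}_{C,\ \text{free of } \theta_0} \\
  &= C \, \theta_0^{N_1} (1 - \theta_0)^{N_0} \\[4pt]
\frac{\partial}{\partial \theta_0}
  \left[ \log C + N_1 \log \theta_0 + N_0 \log (1 - \theta_0) \right]
  &= \frac{N_1}{\theta_0} - \frac{N_0}{1 - \theta_0} = 0
  \quad \Longrightarrow \quad
  \theta_0 = \frac{N_1}{N_1 + N_0}
\end{align*}
```

The constant C drops out as soon as the derivative is taken, which is the precise sense in which the rest of the product does not matter for θ0.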
A similar "grouping" argument can be used for the other "coins" in this huge product.
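
To convince myself, here is a small numerical check of the grouping argument. It is only a sketch: the toy data set, the 0/1 label encoding, and the names `log_likelihood`, `best_theta0`, `theta_pos` and `theta_neg` are all made up for illustration and are not taken from the book. It holds the feature "coins" fixed at arbitrary values and grid-searches θ0.

```python
import math

# Toy training set (invented for illustration): (label, binary features).
data = [
    (1, (1, 0)),
    (1, (1, 1)),
    (1, (0, 1)),
    (0, (0, 0)),
]


def log_likelihood(theta0, theta_pos, theta_neg):
    """Log of the product of per-example probabilities (naive Bayes form)."""
    total = 0.0
    for y, x in data:
        # The "label coin": theta0 for label 1, (1 - theta0) for label 0.
        total += math.log(theta0 if y == 1 else 1.0 - theta0)
        # The "feature coins" for this label; theta0 does not appear here.
        coins = theta_pos if y == 1 else theta_neg
        for x_d, t in zip(x, coins):
            total += math.log(t if x_d == 1 else 1.0 - t)
    return total


def best_theta0(theta_pos, theta_neg, steps=100):
    """Grid search over theta0 with the feature coins held fixed."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t0: log_likelihood(t0, theta_pos, theta_neg))


# Relative frequency of label 1 in the toy data.
relative_frequency = sum(y for y, _ in data) / len(data)

# Two arbitrary, different settings of the feature coins.
print(best_theta0((0.5, 0.5), (0.5, 0.5)))
print(best_theta0((0.9, 0.2), (0.3, 0.7)))
print(relative_frequency)
```

Changing the feature coins changes the value of the likelihood but not where it peaks in θ0, so both grid searches should land on the relative frequency of label 1.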
