- Given a large set of triples that come from some family trees, figure out the regularities
- (x has-mom y) & (y has-husband z) => (x has-father z)
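To make the regularity above concrete, here is a minimal sketch that applies it by hand to a list of triples. The function name `infer_fathers` and the people in the example are made up for illustration; the point of the notes is that such rules should be discovered from the triples, not hand-coded like this.

```python
def infer_fathers(triples):
    """Apply (x has-mom y) & (y has-husband z) => (x has-father z)."""
    # Map each wife to her husband from the has-husband triples.
    husband_of = {s: o for s, r, o in triples if r == "has-husband"}
    inferred = set()
    for s, r, o in triples:
        # For every child with a known mother, look up the mother's husband.
        if r == "has-mom" and o in husband_of:
            inferred.add((s, "has-father", husband_of[o]))
    return inferred


# Hypothetical family-tree triples (subject, relation, object).
triples = [
    ("colin", "has-mom", "victoria"),
    ("victoria", "has-husband", "james"),
    ("charlotte", "has-mom", "victoria"),
]
fathers = infer_fathers(triples)
print(sorted(fathers))
```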
- Squared error measure drawbacks:
- If the desired output is 1 and the actual output is 1e-9, there is almost no gradient for the logistic unit to fix the error
- If assigning probabilities to mutually exclusive classes, the outputs should sum to 1, but squared error does nothing to enforce this
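The vanishing gradient in the first drawback can be checked numerically. This is a small sketch assuming a single logistic unit with y = 1 / (1 + e^(-z)) and squared error E = (t - y)^2 / 2, so that dE/dz = (y - t) · y · (1 - y); the helper names are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error_grad_z(z, t):
    """dE/dz for E = 0.5 * (t - y)**2 with y = sigmoid(z)."""
    y = sigmoid(z)
    return (y - t) * y * (1.0 - y)


# Choose the logit so that the unit outputs y = 1e-9 while the target is 1.
y_bad = 1e-9
z = math.log(y_bad / (1.0 - y_bad))
grad = squared_error_grad_z(z, t=1.0)
print(sigmoid(z), grad)
```

The error (y - t) is nearly -1, but it is multiplied by y · (1 - y), which is about 1e-9, so the gradient with respect to z is almost zero.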
- Force outputs to represent a probability distribution across discrete alternatives
- Cost function: negative log probability of the right answers
- The steepness of dC/dy exactly balances the flatness of dy/dz
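The balancing act in the last point can be sketched in code. This assumes the output layer is a softmax, y_i = e^(z_i) / Σ_j e^(z_j), which the notes describe but do not name, and that C = -Σ_j t_j log y_j for a one-hot target t. Under those assumptions the chain rule gives dC/dz_i = y_i - t_i, which the sketch compares against a finite-difference estimate. All function names are illustrative.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, t):
    """C = -sum_j t_j * log(y_j), with y = softmax(z)."""
    y = softmax(z)
    return -sum(tj * math.log(yj) for tj, yj in zip(t, y))


def analytic_grad(z, t):
    """dC/dz_i = y_i - t_i under the softmax + cross-entropy assumption."""
    return [yj - tj for yj, tj in zip(softmax(z), t)]


def numeric_grad(z, t, eps=1e-6):
    """Central finite-difference estimate of dC/dz."""
    grads = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        grads.append((cross_entropy(zp, t) - cross_entropy(zm, t)) / (2 * eps))
    return grads


# The correct class (index 0) gets a tiny probability, yet the gradient stays large.
z = [-8.0, 2.0, 1.0]
t = [1.0, 0.0, 0.0]
print(softmax(z))
print(analytic_grad(z, t))
print(numeric_grad(z, t))
```

Even though the output for the correct class is very small, dC/dz for that class is close to -1, unlike the squared-error case where the gradient nearly vanishes.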