Because we compute the Hessian of the cross-entropy loss, classification does not scale well to a large number of classes, so we have capped the maximum number of classes at 30.
Performance enhancements likely already allow us to raise that limit somewhat. To raise it significantly, however, we need to think about how to tackle this bottleneck. A reasonable option to explore is identifying classes that are well separated in feature space: for any class pair (i, j) with no region of feature space where the decision is between i and j, one could safely zero H(i,j) and H(j,i). This would in effect create block-diagonal Hessians, and we could bound the maximum block size as we do now.
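To make the idea concrete, here is a small sketch of the block-diagonal structure this would produce. It assumes a softmax classifier, where the Hessian of the cross-entropy with respect to the logits is diag(p) - p p^T; the function names and the `separated_pairs` input are hypothetical, and how separated pairs would actually be identified is left out:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_hessian(logits):
    # Hessian of softmax cross-entropy w.r.t. the logits: diag(p) - p p^T.
    p = softmax(logits)
    return np.diag(p) - np.outer(p, p)

def sparsify(H, separated_pairs):
    # Zero H[i, j] and H[j, i] for class pairs (i, j) judged to be
    # well separated in feature space (hypothetical input).
    H = H.copy()
    for i, j in separated_pairs:
        H[i, j] = 0.0
        H[j, i] = 0.0
    return H

def blocks(H):
    # Connected components of the sparsity pattern give the
    # block-diagonal structure; the largest component bounds the
    # per-block cost, analogous to the current class-count cap.
    n = H.shape[0]
    adj = H != 0
    seen = np.zeros(n, dtype=bool)
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        stack, comp = [s], []
        seen[s] = True
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in np.nonzero(adj[u])[0]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        comps.append(sorted(comp))
    return comps
```

For example, with four classes where classes {0, 1} never compete with classes {2, 3}, zeroing the four cross pairs splits the Hessian into two 2x2 blocks, each of which can be factored independently.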