feat(experiments): add text taggers #4294
This PR adds the ability to classify text. We define two classifiers: a Naïve Bayes (NB) classifier and a multiclass nonnegative matrix factorization (NMF) classifier. Both use bag-of-words TF-IDF vectors as features. The purpose of this code is to allow Firefox to classify pages into topics by examining the text found on the page.
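For readers unfamiliar with the approach, here is a minimal sketch of a Naïve Bayes classifier over bag-of-words TF-IDF features. It is not the code in this PR: every function name, the tokenizer, the smoothed IDF formula, and the add-one smoothing are assumptions chosen for illustration, and the NMF classifier is not shown.

```javascript
// Hypothetical sketch: none of these names come from the PR itself.

// Split text into lowercase alphanumeric tokens (assumed tokenizer).
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Bag of words: map each term to its count in one document.
function termFrequencies(tokens) {
  const tf = new Map();
  for (const tok of tokens) {
    tf.set(tok, (tf.get(tok) || 0) + 1);
  }
  return tf;
}

// Smoothed inverse document frequency over a corpus of token lists.
function inverseDocFrequencies(tokenizedDocs) {
  const df = new Map();
  for (const tokens of tokenizedDocs) {
    for (const tok of new Set(tokens)) {
      df.set(tok, (df.get(tok) || 0) + 1);
    }
  }
  const idf = new Map();
  const n = tokenizedDocs.length;
  for (const [tok, count] of df) {
    idf.set(tok, Math.log((1 + n) / (1 + count)) + 1);
  }
  return idf;
}

// TF-IDF vector as a Map from term to weight; terms unseen in training are dropped.
function tfidfVector(text, idf) {
  const vec = new Map();
  for (const [tok, count] of termFrequencies(tokenize(text))) {
    if (idf.has(tok)) {
      vec.set(tok, count * idf.get(tok));
    }
  }
  return vec;
}

// Train multinomial Naive Bayes on TF-IDF weights: accumulate per-class term weights.
function trainNaiveBayes(examples) {
  const idf = inverseDocFrequencies(examples.map(e => tokenize(e.text)));
  const classes = new Map(); // label -> { docs, total, weights }
  for (const { text, label } of examples) {
    if (!classes.has(label)) {
      classes.set(label, { docs: 0, total: 0, weights: new Map() });
    }
    const c = classes.get(label);
    c.docs += 1;
    for (const [tok, w] of tfidfVector(text, idf)) {
      c.weights.set(tok, (c.weights.get(tok) || 0) + w);
      c.total += w;
    }
  }
  return { idf, classes, numDocs: examples.length };
}

// Pick the label with the highest log posterior, using add-one smoothing.
function classify(model, text) {
  const vec = tfidfVector(text, model.idf);
  const vocabSize = model.idf.size;
  let best = null;
  let bestScore = -Infinity;
  for (const [label, c] of model.classes) {
    let score = Math.log(c.docs / model.numDocs);
    for (const [tok, w] of vec) {
      const p = ((c.weights.get(tok) || 0) + 1) / (c.total + vocabSize);
      score += w * Math.log(p);
    }
    if (score > bestScore) {
      bestScore = score;
      best = label;
    }
  }
  return best;
}
```

In this sketch, training on a handful of labeled page texts and then calling `classify(model, pageText)` returns the single most likely topic label.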
This code is part of the Pocket Personalization v2 experiment which uses content analysis to locally build interest profiles.
This code is dark.
We reviewed this internally on PR Pocket#1
@k88hudson We're aiming to land the feature before the August 23 code freeze. There's more code to be written, but because it's a fairly large amount of new code, I thought it was prudent to break it up into smaller PRs rather than hold on to one giant PR. The idea is that smaller PRs are easier to understand and lower the overall risk. The feature is going to be gated behind an A/B test and will launch dark, and then be turned on for some fraction of users through Shield.
If it's useful, I'm on Slack, and have been working with Scott Downe, who I think sits next to you.
Ok thanks! Just wanted to know if this was planned for an uplift or not.
The plan is to try really, really hard to not need an uplift after the Sept 4 beta release, but if we miss the date, we'll need an uplift to beta. Otherwise, we're stuck for a long time. Fortunately, we're going to be behind a flag, so hopefully that should be possible.
Let's also track this on Bugzilla https://bugzilla.mozilla.org/show_bug.cgi?id=1483667
Please resolve all the issues in my comments.
For the Object->Map switch, if you'd like to do that later, please file a follow-up bug for that. It's not only for speed, but also to be consistent in this codebase. If you're looking for a collection structure to store key/value pairs, Map is always preferred.
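As a small illustration of the Object->Map point, the sketch below counts tokens both ways. The function names are hypothetical and not taken from this PR. It shows one general JavaScript reason to prefer Map for key/value storage, in addition to the speed and consistency reasons given above: a plain Object inherits keys such as "constructor" from its prototype, which can corrupt counts for words that collide with those keys.

```javascript
// Hypothetical helpers for illustration only; not code from this PR.

// Counting with a plain Object: inherited prototype keys leak into lookups.
function countTokensWithObject(tokens) {
  const counts = {};
  for (const tok of tokens) {
    // For tok === "constructor", counts[tok] is the inherited Object function,
    // so the "+ 1" becomes string concatenation instead of a numeric increment.
    counts[tok] = (counts[tok] || 0) + 1;
  }
  return counts;
}

// Counting with a Map: only keys that were explicitly set exist.
function countTokensWithMap(tokens) {
  const counts = new Map();
  for (const tok of tokens) {
    counts.set(tok, (counts.get(tok) || 0) + 1);
  }
  return counts;
}

const tokens = ["constructor", "page", "page"];
const asObject = countTokensWithObject(tokens);
const asMap = countTokensWithMap(tokens);
console.log(typeof asObject.constructor); // a string, not a number
console.log(asMap.get("constructor"), asMap.get("page"));
```

A Map also preserves insertion order and exposes `size` directly, which tends to make iteration over collected features simpler.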