This is my code for An Introduction to Machine Learning with Web Data by Hilary Mason

ML Master Class

Classifying Web Documents


Searching it:

http GET\?query\=jazz\&api-key\=####

This is supervised learning: we provide examples with known labels so we can guess which labels belong to the new, unlabeled data we want to predict.

Use Naive Bayes to predict what kind of article a given article is.
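A minimal sketch of the idea, assuming a bag-of-words model with add-one smoothing. The training sentences and the labels "arts" and "business" are made up for illustration and are not from the class.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents is a list of (text, label) pairs; returns the counts we need."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the prior: how common this label is in the training set.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, label) pairs above zero.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data, for illustration only.
training = [
    ("jazz trumpet concert review", "arts"),
    ("saxophone jazz album", "arts"),
    ("stock market earnings report", "business"),
    ("market shares fall", "business"),
]
model = train_naive_bayes(training)
print(classify(model, "jazz concert"))
print(classify(model, "market report"))
```

Working with log-probabilities avoids multiplying many tiny numbers together.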

Naive Bayes Algorithm

ab + (1 - a)(1 - b)

(.60)(.72) + (1 - .60)(1 - .72)
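The expression above looks like the denominator of the usual rule for combining two independent probabilities a and b, which is ab / (ab + (1 - a)(1 - b)). The numerator is not shown in these notes, so treat the full formula as an assumption. Under that assumption, a quick sketch (the function name `combine` is just for illustration):

```python
def combine(a, b):
    """Combine two independent probabilities (assumed formula, see above)."""
    agree = a * b                    # both pieces of evidence say "yes"
    disagree = (1 - a) * (1 - b)     # both pieces of evidence say "no"
    return agree / (agree + disagree)


combined = combine(0.60, 0.72)
print(round(combined, 3))
```

With a = .60 and b = .72 the denominator is .432 + .112 = .544, so the combined value is about 0.794.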


Stemming reduces words to their linguistic stems. It can be used instead of more complex natural language analysis.

program = programming = programmer
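A toy sketch of the idea, assuming a tiny hand-picked suffix list. This is not a real stemming algorithm such as Porter's, and `naive_stem` is a hypothetical helper; it only shows how the three words above can collapse to one stem.

```python
SUFFIXES = ("ing", "er", "ed", "s")  # toy list, nowhere near complete


def naive_stem(word):
    """Strip one known suffix, then undo a doubled final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # "programm" -> "program": drop one letter of a doubled final consonant.
    if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


for w in ("program", "programming", "programmer"):
    print(w, "->", naive_stem(w))
```

A real stemmer has many more rules and will not always agree with this toy version.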


WordNet is a lexical database that covers most English words. It can tell you whether a word can be used as a noun or as a verb. You can use the Wordnik API to enlarge your data set.

K Nearest Neighbors

Based on the metaphor of distance: find the K labeled points nearest to an unlabeled point and infer its label from theirs.

  • Good for image analysis
  • Face recognition
  • If you can imagine it in space then you can use it
  • Anything you can plot

Euclidean distance is a good distance measure to start with.
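A minimal sketch of K Nearest Neighbors with Euclidean distance, assuming points are plain tuples of numbers and ties are not a concern. The sample points and the labels "a" and "b" are made up.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two points of the same dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(labeled_points, query, k=3):
    """Label the query with the most common label among its k nearest points."""
    nearest = sorted(
        labeled_points,
        key=lambda item: euclidean_distance(item[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled points: two small clusters in the plane.
points = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
print(knn_predict(points, (2, 2)))
print(knn_predict(points, (7, 9)))
```

Sorting every point is fine for a sketch; larger data sets usually need a spatial index instead.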


A Support Vector Machine calculates a dividing hyperplane between two sets.

libsvm is a good library.
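Once a dividing hyperplane is known, classifying a point comes down to checking which side of it the point falls on. The sketch below shows only that last step. The weights and bias are made up by hand rather than trained, and it does not use libsvm.

```python
def side_of_hyperplane(weights, bias, point):
    """Return +1 or -1 depending on the side of the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1


# Hand-picked (not trained) hyperplane: the line x = y in the plane.
weights = (1.0, -1.0)
bias = 0.0
print(side_of_hyperplane(weights, bias, (3, 1)))
print(side_of_hyperplane(weights, bias, (1, 3)))
```

Training an SVM is the part that finds good values for the weights and bias, ideally with the widest possible margin between the two sets.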


Take weak learners and make them stronger by combining them.
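One simple way to combine learners is an unweighted majority vote. This is a sketch of that idea only, not a full boosting algorithm (boosting also reweights the training examples and the learners). The data set is artificial: each learner looks at a single bit, and the true label is whatever most of the three bits say.

```python
from itertools import product


def majority_vote(learners, x):
    """Return 1 if more than half of the learners predict 1, else 0."""
    votes = sum(learner(x) for learner in learners)
    return 1 if votes * 2 > len(learners) else 0


# Three weak learners: learner i just returns bit i of the input.
weak_learners = [lambda x, i=i: x[i] for i in range(3)]

# Artificial data: every 3-bit input, labeled 1 when at least two bits are set.
data = [(x, 1 if sum(x) >= 2 else 0) for x in product((0, 1), repeat=3)]


def accuracy(predict):
    """Fraction of the data set that the predictor labels correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


print([accuracy(learner) for learner in weak_learners])
print(accuracy(lambda x: majority_vote(weak_learners, x)))
```

Each single-bit learner gets 6 of the 8 inputs right, while the combined vote gets all 8 right on this artificial data.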

Active Learning

Use a classifier to label items that we understand.

Bring unknown items to an alternative system for labeling.

Re-train classifiers.
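The three steps above can be sketched as one round of a loop. Everything here is an assumption for illustration: a nearest-centroid classifier on single numbers, a margin threshold to decide which items count as "understood", and an `oracle` function standing in for the alternative labeling system (for example, a person).

```python
def train(labeled):
    """Nearest-centroid model: the mean value of each label."""
    sums = {}
    for value, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + value, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}


def predict(model, value):
    """Return (label, margin); a small margin means the model is unsure."""
    ranked = sorted(model, key=lambda label: abs(value - model[label]))
    best, second = ranked[0], ranked[1]
    margin = abs(value - model[second]) - abs(value - model[best])
    return best, margin


def active_learning_round(labeled, unlabeled, oracle, min_margin):
    """Auto-label confident items, send the rest to the oracle, re-train."""
    model = train(labeled)
    auto_labeled, asked = [], []
    new_labeled = list(labeled)
    for value in unlabeled:
        label, margin = predict(model, value)
        if margin >= min_margin:
            auto_labeled.append((value, label))       # items we understand
        else:
            asked.append(value)                       # unknown items
            new_labeled.append((value, oracle(value)))
    return train(new_labeled), auto_labeled, asked    # re-trained model


# Made-up data and a stand-in oracle.
labeled = [(1.0, "short"), (2.0, "short"), (9.0, "long"), (10.0, "long")]
unlabeled = [1.2, 5.0, 9.8, 6.0]
oracle = lambda value: "long" if value >= 5.5 else "short"

model, auto_labeled, asked = active_learning_round(labeled, unlabeled, oracle, 2.0)
print(auto_labeled)
print(asked)
```

Only the items near the boundary go to the oracle, which keeps the expensive labeling work small.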
