## Story Time: The Netflix Prize
<img src="figures/netflix-prize.png" width="60%">

The Netflix Prize was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based only on previous ratings and without any other information about the users or films: users and films were identified only by numbers assigned for the contest.
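Collaborative filtering of this kind works purely from the rating matrix itself. One common family of methods used during the Prize is matrix factorization: each user and each film gets a small vector of latent factors, and a rating is predicted as the dot product of the two. The sketch below is a minimal, hypothetical SGD version on a made-up 4×4 rating matrix (the function names, learning rate and regularization values are my own choices), not the winning team's method.

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Hypothetical helper: plain SGD matrix factorization. Learns user factors P
    and film factors Q so that P @ Q.T approximates the observed (non-NaN) ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = 0.1 * rng.standard_normal((n_users, n_factors))
    Q = 0.1 * rng.standard_normal((n_items, n_factors))
    observed = np.argwhere(~np.isnan(ratings))  # (user, film) pairs that have a rating
    for _ in range(epochs):
        for u, i in observed:
            err = ratings[u, i] - P[u] @ Q[i]
            # Simultaneous update: both right-hand sides use the old P[u] and Q[i]
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return P, Q

def observed_rmse(ratings, P, Q):
    """RMSE of the model P @ Q.T, measured only on the known ratings."""
    mask = ~np.isnan(ratings)
    return float(np.sqrt(np.mean((ratings[mask] - (P @ Q.T)[mask]) ** 2)))

# Made-up 4-user x 4-film rating matrix; NaN marks a rating we want to predict.
R = np.array([
    [5, 4, np.nan, 1],
    [4, np.nan, 1, 1],
    [1, 1, np.nan, 5],
    [np.nan, 1, 5, 4],
], dtype=float)

P0, Q0 = factorize(R, epochs=0)  # untrained: random initial factors only
P, Q = factorize(R)
print("RMSE on known ratings:", round(observed_rmse(R, P0, Q0), 2),
      "->", round(observed_rmse(R, P, Q), 2))
print("Predicted rating of user 0 for film 2:", round(float(P[0] @ Q[2]), 2))
```

The missing entries of `R` are never used for training; the learned factors fill them in, which is the same kind of prediction task the contest posed.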

On 21 September 2009, the grand prize of US$1,000,000 was awarded to the team BellKor's Pragmatic Chaos, whose predictions beat those of Netflix's own algorithm, Cinematch, by 10.06% (measured as the reduction in root-mean-square error on a held-out test set).
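The 10.06% is a relative reduction in RMSE, not an accuracy percentage. A minimal sketch of the arithmetic, assuming the publicly reported test-set RMSEs of 0.9525 for Cinematch and 0.8567 for the winning blend; the function names are my own.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction of a new model over a baseline, in percent."""
    return 100 * (baseline_rmse - new_rmse) / baseline_rmse

# Toy example with made-up ratings: two predictions off by one star, one exact
print(round(rmse([4, 3, 5], [5, 3, 4]), 3))  # prints 0.816

# Reported test-set RMSEs (assumed): Cinematch 0.9525, winning blend 0.8567
print(f"{percent_improvement(0.9525, 0.8567):.2f}%")  # prints 10.06%
```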

The competition took about three years to complete. The winning entry was not a single algorithm but a complex blend of many algorithms, each one reducing the error ever so slightly.

Netflix never used the full winning solution! In its own words, the additional accuracy of the new methods “did not seem to justify the engineering effort needed to bring them into a production environment.”

### OK, nice story, but why?

## ML is not magic!

<img src="figures/magician.jpg" width="70%">

The point is, it's very easy to underestimate the complexity that comes with ML.

There is a pretty awesome paper named **["A Few Useful Things to Know about Machine Learning"](http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)** by Prof. Pedro Domingos, in which he discusses the pitfalls of ML. Here are a few of its most important points, though I urge you to read the paper itself.

* **Sometimes data is not enough** Quoting Domingos: "... the need for knowledge in learning should not be surprising. Machine learning is not magic; it can’t get something from nothing. What it does is get more from less. Programming, like all engineering, is a lot of work: we have to build everything from scratch. Learning is more like farming, which lets nature do most of the work. Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs."

* **More Data > Clever Algorithm** Quoting Domingos: "Suppose you’ve constructed the best set of features you can, but the classifiers you’re getting are still not accurate enough. What can you do now? There are two main choices: design a better learning algorithm, or gather more data. [...] As a rule of thumb, a dumb algorithm with lots and lots of data beats a clever one with modest amounts of it. (After all, machine learning is all about letting data do the heavy lifting.)"

* **The CURSE of Dimensionality** This expression was coined by Bellman in 1961 to refer to the fact that many algorithms that work fine in low dimensions become intractable when the input is high-dimensional. Generalizing correctly becomes exponentially harder as the dimensionality (number of features) of the examples grows, because a fixed-size training set covers a dwindling fraction of the input space. Even with a moderate dimension of 100 and a huge training set of a trillion examples, the latter covers only a fraction of about $10^{-18}$ of the input space. This is what makes machine learning both necessary and hard.
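The $10^{-18}$ figure can be reproduced if we assume the 100 features are binary (my assumption; the passage does not say so). The input space then has $2^{100} \approx 1.27 \times 10^{30}$ points, and $10^{12} / 2^{100} \approx 7.9 \times 10^{-19}$. A minimal sketch with a hypothetical helper that generalizes this to any number of discrete values per feature:

```python
def coverage_fraction(n_examples: int, n_features: int, values_per_feature: int = 2) -> float:
    """Hypothetical helper: upper bound on the fraction of a discrete input space
    covered by n_examples distinct examples, assuming every feature takes
    values_per_feature possible values (binary by default)."""
    space_size = values_per_feature ** n_features  # exact Python int, no overflow
    return min(1.0, n_examples / space_size)      # cannot cover more than 100%

print(f"{coverage_fraction(10**12, 100):.1e}")  # prints 7.9e-19

# The same trillion examples against a growing number of binary features
for d in (10, 20, 40, 100):
    print(d, coverage_fraction(10**12, d))
```

At 20 binary features a trillion examples exhaust the space; by 40 they no longer do, and by 100 they cover a vanishing sliver of it.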



### Which brings us to...

## Diving Deeper into ML

If you wish to better understand the inner workings of machine learning, how each prediction algorithm works and what makes it work, you can follow this path.

Note that this path is filled with math and statistics along with programming. It is very easy to forget that, at ground level, ML is all about math and statistics, especially when an elegant library like scikit-learn provides such a splendid abstraction.
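For instance, ordinary least-squares linear regression (the model scikit-learn's `LinearRegression` fits) boils down to a few lines of linear algebra. A minimal NumPy sketch on hypothetical synthetic data, solving the normal equations $(X^\top X)\,w = X^\top y$ and cross-checking the result with `np.linalg.lstsq`:

```python
import numpy as np

# Hypothetical synthetic data: y = 1 + 3*x1 - 2*x2 plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1 + 3 * X[:, 0] - 2 * X[:, 1] + 0.01 * rng.normal(size=100)

# Prepend a column of ones so the intercept is learned like any other weight
X_b = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equations (X^T X) w = X^T y, solved directly instead of inverting X^T X
w_normal = np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Cross-check with NumPy's least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X_b, y, rcond=None)

print("intercept, w1, w2 =", np.round(w_normal, 2))
```

Both solvers recover weights close to the ones used to generate the data; `lstsq` is the numerically safer choice when features are nearly collinear, because forming $X^\top X$ squares the condition number.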

However, this path has its benefits too:

* You get a better understanding of hyperparameters.
* Visualization becomes easier.
* Standard algorithms sometimes fail on non-trivial problems :( and knowing their internals helps you adapt them.
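As an example of the first point: the `alpha` hyperparameter of ridge regression (the same name scikit-learn's `Ridge` uses) is simply the weight of an L2 penalty on the coefficients, and the closed-form solution $w = (X^\top X + \alpha I)^{-1} X^\top y$ shows directly why larger values shrink the weights. A minimal NumPy sketch on hypothetical synthetic data, with no intercept term for simplicity:

```python
import numpy as np

def ridge_weights(X, y, alpha):
    """Closed-form ridge regression without intercept: w = (X^T X + alpha*I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hypothetical synthetic data with known true weights
rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))
true_w = np.array([4.0, -3.0, 2.0, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.5, size=50)

# Stronger regularization pulls the weight vector toward zero
alphas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_weights(X, y, a)) for a in alphas]
for a, n in zip(alphas, norms):
    print(f"alpha={a:>6}: ||w|| = {n:.3f}")
```

With `alpha=0` this reduces to ordinary least squares; seeing the formula makes it clear that `alpha` trades fit on the training data for smaller, more stable weights.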
    

### How to go about it then?

* A highly recommended approach is to start with Andrew Ng's Machine Learning course on [Coursera](https://www.coursera.org/learn/machine-learning). It provides a gentle yet solid introduction to the inner workings of machine learning and has lots of programming exercises too.
* Start playing with a number of ML-related notebooks shared by [IPython](https://github.com/ipython/ipython/wiki/A-gallery-of-interesting-IPython-Notebooks).
* Some examples shared by [scikit-learn](http://scikit-learn.org/stable/auto_examples/)
* [Kaggle](https://www.kaggle.com/)
* This excellent guide to diving into machine learning on [GitHub](https://github.com/hangtwenty/dive-into-machine-learning)

### Finally

<img src="figures/ml_map.png">

# THANK YOU