## Introduction ##
* This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s get started!

## What is Machine Learning ##

* Machine Learning is the science (and art) of programming computers so they can learn from data.
* Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
* A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

> **Example:** Spam Filter.
>
> **Task:** Correctly identify spam.
>
> **Experience:** Training data (a collection of emails labeled as spam or ham). A Machine Learning algorithm uses this data to produce a model.
>
> **Performance:** Ratio of correctly classified emails (accuracy).
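
As a minimal sketch of the performance measure P in the spam example, the snippet below computes the ratio of correctly classified emails. The labels and model outputs are made up for illustration, and `accuracy` is a hypothetical helper name, not part of any library.

```python
def accuracy(predictions, labels):
    """Performance measure P: fraction of emails classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Hypothetical labels for five emails: True means spam, False means ham.
true_labels = [True, False, True, False, False]
# Hypothetical output of some spam model on the same five emails.
model_output = [True, False, False, False, False]

# The model misses one spam email, so 4 of 5 emails are classified correctly.
print(accuracy(model_output, true_labels))
```

In Mitchell's terms, the system "learns" if this number goes up as the model is given more training emails (experience E).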

## Why Use Machine Learning? ##

Machine Learning is great for:

* Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
    > * For example: Spam Filter.
    > * Traditional approach: Write a rule-based system that flags patterns such as '4U' or 'cre%it car%'. But as spammers adapt (using 'ForU' instead of '4U'), the rule list keeps growing and quickly becomes unwieldy.
    > * Machine Learning approach: The system keeps learning as new training data is fed to the algorithm, so it adapts without anyone adding rules by hand.
* Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
    > * For example: Speech recognition that distinguishes the words 'one' and 'two'.
    > * Traditional approach: It is very difficult even to define a good set of hand-written rules.
    > * Machine Learning approach: It only needs a good training set of recordings of the words.
* Fluctuating environments: a Machine Learning system can adapt to new data.
* Getting insights about complex problems and large amounts of data.
    > * For example: Data mining.
    > * Inspect the ML-based solution to see what it has learned.
    > * Understand the problem better.
    > * Iterate if needed.
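
The contrast between a hand-written rule list and a learned one in the spam example can be sketched as follows. This is a toy illustration under stated assumptions, not a real spam filter: the emails, the `min_ratio` threshold, and all function names are hypothetical.

```python
from collections import Counter

# Traditional approach: a hand-maintained rule list that must be edited
# every time spammers change their wording.
HAND_WRITTEN_RULES = ["4u", "credit card"]

def rule_based_is_spam(email):
    text = email.lower()
    return any(pattern in text for pattern in HAND_WRITTEN_RULES)

def learn_spam_words(emails, labels, min_ratio=3.0):
    """Flag words that appear far more often in spam than in ham."""
    spam_counts, ham_counts = Counter(), Counter()
    for email, is_spam in zip(emails, labels):
        target = spam_counts if is_spam else ham_counts
        # Count each word at most once per email.
        target.update(set(email.lower().split()))
    # Add-one smoothing keeps the ratio defined for words never seen in ham.
    return {
        word for word, count in spam_counts.items()
        if (count + 1) / (ham_counts[word] + 1) >= min_ratio
    }

def learned_is_spam(email, spam_words):
    return any(word in spam_words for word in email.lower().split())

# Hypothetical training data: spammers have switched from "4U" to "ForU".
emails = [
    "cheap loans foru today",
    "win prizes foru now",
    "lunch meeting today",
    "project report attached",
]
labels = [True, True, False, False]  # True means spam

spam_words = learn_spam_words(emails, labels)
new_email = "exclusive deal foru"
print(rule_based_is_spam(new_email))           # the old rules miss the new wording
print(learned_is_spam(new_email, spam_words))  # the learned word list catches it
```

Retraining on newly flagged emails updates `spam_words` automatically, whereas `HAND_WRITTEN_RULES` only changes when someone edits it.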

## Types of Machine Learning Systems ##

1. Trained with/without human supervision:
    > * Supervised
    > * Unsupervised
    > * Semisupervised
    > * Reinforcement learning
2. Can/cannot learn incrementally on the fly:
    > * Online learning
    > * Batch learning
3. Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do:                   
    > * Instance-based learning
    > * Model-based learning
4. These criteria are not exclusive and can be combined. Example:
    > * A state-of-the-art spam filter may learn on the fly using a deep neural network model trained using examples of spam and ham; this makes it an online, model-based, supervised learning system.
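
The third criterion can be sketched in a few lines. Assuming a made-up data set of car mileages and prices, the instance-based predictor answers by looking up the most similar known point, while the model-based predictor first fits a line and then discards the lookup. All names and numbers here are hypothetical.

```python
# Hypothetical training data: mileage (thousands of km) and price (thousands).
train_x = [10.0, 20.0, 30.0, 40.0]
train_y = [18.0, 16.0, 14.0, 12.0]

def predict_instance_based(x):
    """Instance-based: reuse the target of the most similar known point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def fit_line(xs, ys):
    """Model-based: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train_x, train_y)

def predict_model_based(x):
    """Use only the two fitted parameters, not the training points."""
    return slope * x + intercept

print(predict_instance_based(33.0))  # target of the nearest neighbor (x = 30)
print(predict_model_based(33.0))     # value read off the fitted line
```

The instance-based predictor must keep the whole training set around at prediction time; the model-based one only needs `slope` and `intercept`.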
    
### Supervised Learning ###

* The training data fed to the algorithm includes the desired solutions, called *labels*.
* The training data features/attributes are called *predictors*.
* Two types:
    1. Classification
        > * Training data labels are discrete classes (categories), not continuous quantities.
        > * Example: Spam filter (spam or ham)
    2. Regression
        > * Training data labels are continuous numeric values, often called *targets*.
        > * Example: Predicting car prices
        > * Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
* Important supervised learning algorithms:
    > * k-Nearest Neighbors
    > * Linear Regression
    > * Logistic Regression
    > * Support Vector Machines (SVMs)
    > * Decision Trees and Random Forests
    > * Neural networks (Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann machines. They can also be semisupervised, such as in deep belief networks and unsupervised pretraining.)
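
The remark about Logistic Regression can be illustrated with a short sketch: a linear score (a regression-style output) is squashed into a probability, and thresholding that probability gives a class label. The weights, bias, and feature names below are hypothetical and hand-picked, not learned from data.

```python
import math

def sigmoid(score):
    """Squash any real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical weights, NOT learned from data: one weight per feature.
WEIGHTS = {"contains_link": 1.5, "mentions_money": 2.0, "known_sender": -3.0}
BIAS = -1.0

def spam_probability(features):
    """Linear score turned into the probability of belonging to class 'spam'."""
    score = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return sigmoid(score)

def classify(features, threshold=0.5):
    """Classification: threshold the probability to get a discrete label."""
    return "spam" if spam_probability(features) >= threshold else "ham"

suspicious = {"contains_link": 1, "mentions_money": 1, "known_sender": 0}
trusted = {"contains_link": 0, "mentions_money": 0, "known_sender": 1}

print(round(spam_probability(suspicious), 3), classify(suspicious))
print(round(spam_probability(trusted), 3), classify(trusted))
```

In an actual Logistic Regression model the weights and bias would be estimated from labeled training data; only the prediction step is shown here.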
    
 
> **_NOTE_**
> In Machine Learning, an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context but generally means an attribute plus its value (e.g., “Mileage = 15,000”). Many people use the words attribute and feature interchangeably, though.