## Introduction ##
* This chapter introduces a lot of fundamental concepts (and jargon) that every data scientist should know by heart. It will be a high-level overview (the only chapter without much code), all rather simple, but you should make sure everything is crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s get started!

## What is Machine Learning ##

* Machine Learning is the science (and art) of programming computers so they can learn from data.
* Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
* A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)

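> **Example:** Spam filter.
>
> **Task (T):** Flag spam among new emails.
>
> **Experience (E):** The training data, a collection of example emails, each labeled as spam or ham (non-spam). The learning algorithm uses this data to build a model.
>
> **Performance (P):** Needs to be defined; for example, the ratio of correctly classified emails (accuracy).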
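
The performance measure P can be made concrete with a minimal Python sketch, assuming accuracy is the chosen measure. The `accuracy` function and the five labels below are hypothetical, made up for illustration:

```python
def accuracy(predicted, actual):
    """Performance measure P: fraction of emails whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical results for five emails: what the filter predicted vs. the true labels.
predicted = ["spam", "ham", "spam", "ham", "ham"]
actual = ["spam", "ham", "ham", "ham", "spam"]
print(accuracy(predicted, actual))  # 0.6 (3 of 5 correct)
```

"Learning" in Mitchell's sense means this number, measured on new emails, goes up as the program gets more experience (training data).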

## Why Use Machine Learning? ##

Machine Learning is great for:

* Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
    > * For example: spam filter.
    > * Traditional approach: write a rule-based system that flags suspicious patterns ('4U', 'cre%it car%'). But as spammers get smarter (writing 'ForU' instead of '4U'), the list of rules keeps growing and quickly becomes unwieldy.
    > * Machine learning approach: the filter automatically learns which words and phrases are good predictors of spam, and keeps adapting as new training data (e.g., emails flagged by users) is fed to the algorithm.
* Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
    > * For example: speech recognition that distinguishes the words 'one' and 'two'.
    > * With a traditional rule-based approach, it is very difficult even to define a good solution.
    > * A machine learning approach only needs a good training set of recordings of each word.
* Fluctuating environments: a Machine Learning system can adapt to new data.
* Getting insights about complex problems and large amounts of data.
    > * For example: Data mining.
    > * Inspect the ML-based solution to see what it has learned.
    > * Understand the problem better.
    > * Iterate if needed.
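
A toy Python sketch of the spam-filter contrast above. `RULES`, `rule_based_is_spam`, `train_word_scores`, `learned_is_spam`, and the four training emails are all hypothetical; the learned scorer is a deliberately crude word-count heuristic (not a real spam filter), just enough to show that retraining on user-flagged examples picks up 'ForU' without anyone editing a rule:

```python
from collections import Counter

# Traditional approach: a hand-written list of suspicious patterns (hypothetical).
RULES = ["4u", "credit card", "free"]

def rule_based_is_spam(email):
    text = email.lower()
    return any(pattern in text for pattern in RULES)

# ML approach (crude sketch): learn from labeled emails which words lean toward spam.
def train_word_scores(emails, labels):
    """Score each word as (# times seen in spam) - (# times seen in ham)."""
    spam_counts, ham_counts = Counter(), Counter()
    for email, label in zip(emails, labels):
        target = spam_counts if label == "spam" else ham_counts
        target.update(email.lower().split())
    vocabulary = set(spam_counts) | set(ham_counts)
    return {word: spam_counts[word] - ham_counts[word] for word in vocabulary}

def learned_is_spam(email, scores):
    """Flag the email if its words lean toward spam overall; unseen words score 0."""
    return sum(scores.get(word, 0) for word in email.lower().split()) > 0

# Hypothetical training set, including spam that users flagged for using 'ForU'.
emails = ["Cheap pills ForU", "Meeting at noon", "Win money ForU now", "Lunch at noon tomorrow"]
labels = ["spam", "ham", "spam", "ham"]
scores = train_word_scores(emails, labels)

new_email = "Great deal ForU"
print(rule_based_is_spam(new_email))       # False: no rule mentions 'ForU'
print(learned_is_spam(new_email, scores))  # True: 'foru' appeared only in spam
```

Keeping up with spammers then means feeding in new labeled emails and retraining, rather than hand-editing `RULES`.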

## Types of Machine Learning Systems ##

1. Trained with/without human supervision:
    > * Supervised
    > * Unsupervised
    > * Semisupervised
    > * Reinforcement learning
2. Can/cannot learn incrementally on the fly:
    > * Online learning
    > * Batch learning
3. Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do:
    > * Instance-based learning
    > * Model-based learning
4. Example:
    > * A state-of-the-art spam filter may learn on the fly using a deep neural network model trained using examples of spam and ham; this makes it an online, model-based, supervised learning system.
    
### Supervised Learning ###

* The training data fed to the algorithm includes the desired solutions, called *labels*.
* The features (attributes) of the training data used to predict the labels are called *predictors*.
* Two types:
    1. Classification
        > * Labels are discrete classes (categories).
        > * Example: spam filter (spam or ham).
    2. Regression
        > * Labels are continuous numeric target values.
        > * Example: predicting a car's price from features such as mileage, age, and brand.
        > * Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
* Important supervised learning algorithms:
    > * k-Nearest Neighbors
    > * Linear Regression
    > * Logistic Regression
    > * Support Vector Machines (SVMs)
    > * Decision Trees and Random Forests
    > * Neural networks (Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann machines. They can also be semisupervised, such as in deep belief networks and unsupervised pretraining.)
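
To make the classification/regression split concrete, here is a minimal k-Nearest Neighbors sketch in plain Python (no scikit-learn). The same neighbor search serves both tasks: a majority vote for a discrete label, an average for a numeric one. The car data (mileage in thousands of km, age in years, condition, price) is invented for illustration, and features are left unscaled for brevity, which a real model normally would not do:

```python
import math
from collections import Counter

def k_nearest(train_X, x, k):
    """Indices of the k training instances closest to x (Euclidean distance)."""
    by_distance = sorted((math.dist(row, x), i) for i, row in enumerate(train_X))
    return [i for _, i in by_distance[:k]]

def knn_classify(train_X, train_y, x, k=3):
    """Classification: majority vote over the neighbors' discrete labels."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, x, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, x, k=3):
    """Regression: mean of the neighbors' numeric labels."""
    return sum(train_y[i] for i in k_nearest(train_X, x, k)) / k

# Hypothetical cars: [mileage in thousands of km, age in years]
X = [[10, 1], [20, 2], [150, 10], [160, 12], [30, 3]]
condition = ["good", "good", "worn", "worn", "good"]  # discrete labels (classification)
price = [20000, 18000, 4000, 3500, 16000]             # numeric labels (regression)

query = [25, 2]
print(knn_classify(X, condition, query))  # good
print(knn_regress(X, price, query))       # 18000.0
```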
    
 
> **_NOTE_**
> In Machine Learning an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”). Many people use the words attribute and feature interchangeably, though.

### Unsupervised Learning ###

* The training data does not contain the desired solutions and hence is *unlabeled*.
* Unsupervised learning tasks:
    > * **Clustering** - detect groups of similar instances within the data; a hierarchical clustering algorithm (HCA) can also subdivide each group into smaller subgroups.
    > * **Visualization** - produce a 2D or 3D representation of complex, unlabeled data that can easily be plotted, preserving as much structure as possible (e.g., keeping separate clusters in the input space from overlapping in the plot). Example algorithm: t-SNE.
    > * **Dimensionality Reduction** - simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car’s mileage may be strongly correlated with its age, so a dimensionality reduction algorithm may merge them into one feature that represents the car’s wear and tear. This is called **_feature extraction_**. **[TIP:** _It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better._**]**
    > * **Anomaly Detection** - for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is trained with normal instances, and when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly.
    > * **Association Rule Learning** - dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running association rule learning on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak, so you may want to place these items close to each other.
* Important unsupervised learning algorithms:
    > * **Clustering:** k-Means, Hierarchical Cluster Analysis (HCA), Expectation Maximization
    > * **Visualization and dimensionality reduction:** Principal Component Analysis (PCA), Kernel PCA, Locally-Linear Embedding (LLE), t-distributed Stochastic Neighbor Embedding (t-SNE)
    > * **Association rule learning:** Apriori, Eclat
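
As an illustration of clustering on unlabeled data, here is a minimal k-Means sketch (Lloyd's algorithm) in plain Python rather than a library implementation. The six 2D points are hypothetical; note that no labels are given, and the algorithm discovers the two groups on its own:

```python
import math
import random

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def k_means(points, k, n_iters=100, seed=0):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iters):
        # Assignment step: cluster index of the nearest centroid for every point.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(coords) / len(members) for coords in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave its centroid in place
        if new_centroids == centroids:  # converged: assignments will not change anymore
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical unlabeled data: two well-separated groups of points.
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
labels, centroids = k_means(points, k=2)
print(sorted(centroids))  # [(1.5, 1.5), (8.5, 8.5)]
```

The cluster numbering itself is arbitrary (it depends on the random initialization); only the grouping is meaningful.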

### Semisupervised Learning ###

* The training data contains a little labeled data and a lot of unlabeled data.
* Example:
    > Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person and it is able to name everyone in every photo, which is useful for searching photos.
* Algorithms:
    > Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
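
The photo-hosting example can be sketched in a few lines of Python. This assumes the unsupervised (clustering) step has already run and produced a cluster id for each detected face; the `faces` list, `propagate_labels`, and `photos_of` are hypothetical, and the supervised part is just spreading one user-provided name per cluster:

```python
# Hypothetical output of the unsupervised step: one (photo, cluster_id) pair per detected face.
faces = [("photo1", 0), ("photo2", 1), ("photo5", 0), ("photo5", 1), ("photo7", 1), ("photo11", 0)]

def propagate_labels(faces, user_labels):
    """Spread one user-provided name per cluster to every face in that cluster.

    user_labels maps a face index to a name, e.g. {0: "A"} means face 0 shows person A.
    """
    cluster_names = {faces[i][1]: name for i, name in user_labels.items()}
    return [(photo, cluster_names.get(cluster)) for photo, cluster in faces]

def photos_of(person, named_faces):
    """Photos containing person, in first-seen order (useful for photo search)."""
    return list(dict.fromkeys(photo for photo, name in named_faces if name == person))

# The user labels just one face per person: face 0 is person A, face 1 is person B.
named = propagate_labels(faces, {0: "A", 1: "B"})
print(photos_of("A", named))  # ['photo1', 'photo5', 'photo11']
print(photos_of("B", named))  # ['photo2', 'photo5', 'photo7']
```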
    
### Reinforcement Learning ###

* The learning system, called an *agent* in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties, in the form of negative rewards). It must then learn by itself the best strategy, called a *policy*, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
* Examples:
    > * Robots learning how to walk.
    > * DeepMind’s AlphaGo learned its winning policy by analyzing millions of games, and then playing many games against itself.
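
To make agent, reward, and policy concrete, here is a toy tabular Q-learning sketch in plain Python. The corridor environment (`step`, five states, goal at state 4, reward -1 per move) and all hyperparameters are hypothetical; Q-learning is just one simple RL algorithm, far simpler than the techniques behind systems like AlphaGo. The policy is read off as the action with the highest estimated value in each state:

```python
import random

N_STATES = 5        # states 0..4 along a corridor; state 4 is the goal (terminal)
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Toy environment: each move costs -1 (a penalty) until the agent reaches the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, -1.0, done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(state):
        best = max(q[(state, a)] for a in ACTIONS)
        return rng.choice([a for a in ACTIONS if q[(state, a)] == best])  # break ties randomly

    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Mostly exploit current estimates, occasionally explore a random action.
            action = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(state)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + discounted best future value.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)
```

Starting every Q value at 0 while each move costs -1 makes untried actions look attractive, which pushes the agent to explore before it settles on moving right (+1) toward the goal.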
    
### Batch Learning (aka Offline Learning) ###

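* The system cannot learn incrementally: it must be trained using all the available data.
* Training can take a lot of time and computing resources, depending on the amount of data.
* Hence, training is usually done offline: the system is trained first, then launched into production, where it runs without learning anymore.
* To learn about new data, a new version of the system is trained from scratch on the full dataset (new plus existing data) and then replaces the old one; this can be automated on a schedule (e.g., daily or weekly).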
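
A minimal Python sketch of the batch retraining cycle, assuming a simple linear model fitted by closed-form least squares. `fit_line` and the data points are hypothetical; the point is that each training run sees the whole dataset, and new data is handled by refitting from scratch on old plus new data rather than updating the existing model:

```python
def fit_line(xs, ys):
    """Batch training: closed-form least squares for y = slope * x + intercept over ALL the data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Initial training run on the data collected so far (hypothetical values).
old_x, old_y = [1, 2, 3], [2, 4, 6]
model_v1 = fit_line(old_x, old_y)
print(model_v1)  # (2.0, 0.0)

# Scheduled retraining: new data has arrived, so train a fresh model on old + new data,
# then swap it in for model_v1 in production.
new_x, new_y = [4, 5], [9, 11]
model_v2 = fit_line(old_x + new_x, old_y + new_y)
print(model_v2)  # approximately (2.3, -0.5)
```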

### Online Learning ###


