## Machine Learning

![ML_overview](ML_overview.png "ML_overview")


### [Definition @wikipedia](https://en.wikipedia.org/wiki/Machine_learning)

>"Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases."

### Learning

- [**Definition** @wikipedia](https://en.wikipedia.org/wiki/Machine_learning)

    Learning is the act of acquiring new, or modifying and reinforcing, existing knowledge, behaviors, skills, values, or preferences and may involve synthesizing different types of information.
    
    
- The **author's** opinion:

    In simple terms, historical data or observations are used to predict or derive actionable tasks.
    
    
- **Data**:


    Unlabeled Data: ordinarily the raw form of the data
    
    Labeled Data: data to which a meaning (a label) has been attached
    
- **Tasks**:


    A task is a problem that a machine learning algorithm is meant to solve
    
    
- **Algorithms** (***By subfields***):
        
    ***supervised***: inferring a function from labeled training data (wikipedia)

    ***unsupervised***: inferring a function to describe hidden structure from unlabeled data (wikipedia)

    ***semi-supervised***: learning using both labeled and unlabeled data to infer models better

    ***reinforcement***: (this book)

        learning that focuses on maximizing the rewards from the result, 
        also called credit assessment learning 
        1. Rewards will be issued for decisions made
        2. Results are not seen immediately; a sequence of steps may need to be executed first.
        3. The goal of the learning algorithm is to maximize the cumulative rewards

    ***deep learning***: (wikipedia) (I think the wikipedia definition is better)

        1. Deep learning has been characterized as a buzzword, or a rebranding of neural networks
        2. Deep learning (deep structured learning, hierarchical learning or deep machine learning) 
           is a branch of machine learning based on a set of algorithms that attempt to model high-level 
           abstractions in data by using multiple processing layers, with complex structures or otherwise, 
           composed of multiple non-linear transformations
        3. Various deep learning architectures such as deep neural networks, convolutional deep neural networks, 
           deep belief networks and recurrent neural networks have been applied to fields like computer vision, 
           automatic speech recognition, natural language processing, audio recognition and bioinformatics
        
        

        


- **Algorithms** ( ***By problem categories***):
        
    ***Classification***: ([Definition @wikipedia](https://en.wikipedia.org/wiki/Statistical_classification))

        1. classification is the problem of identifying to which of a set of categories (sub-populations) 
           a new observation belongs, on the basis of a training set of data containing observations 
           (or instances) whose category membership is known
        
        2. outputs are discrete
        
        3. supervised learning approach

    ***Regression***: 
    
        1. outputs are continuous
        
        2. supervised learning approach
    
    ***Clustering***: (the author)
    
        1. In short, clustering is a classification analysis that does not start with a specific target in mind 
           (e.g. good/bad, will buy/will not buy) 
        
        2.  unsupervised learning approach.
        
    ***Optimization***:
    
        I think it is NOT appropriate to place this concept here. 
        It is a math concept rather than a machine learning one.
        

- **Models**

    A model describes data observed in a system. E.g.
    
    - ***Logical models***

    - ***Geometric models***

    - ***Probabilistic models***
    

- **Data inconsistencies** in Machine learning


    Under-fitting
    
    Over-fitting
    
    Data instability
    
    Unpredictable future

### Performance measures

#### 1. Discrete Outputs: Accuracy, Recall, Precision

- Assume the total number of observations is 10,000


- Actual and predicted:

<table>
<tr><td>_</td><td>Predicted Positive</td><td>Predicted Negative</td></tr>
<tr><td>Actual Positive</td><td><b>TP</b> (True Positive): 500</td><td><b>FN</b> (False Negative): 400</td></tr>
<tr><td>Actual Negative</td><td><b>FP</b> (False Positive): 100</td><td><b>TN</b> (True Negative): 9000</td></tr>
</table>

- **Accuracy**:

    is the percentage of predictions that were correct
    
    = (TP+TN) / Total = (500+9000) / 10000 = 95%
    

- **Recall**:

    is the percentage of actual positive cases that were correctly predicted positive
    
    = TP / (TP + FN) = 500 / (500 + 400) = 55.6%
    
    
- **Precision**:

    is the percentage of predicted positive cases that were actually positive
    
    = TP / (TP + FP) = 500 / (500 + 100) = 83.3%
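
The three measures can be checked with a few lines of Python. This is a minimal sketch using the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP)) and the counts from the table above; the function names are illustrative only.

```python
# Counts taken from the confusion matrix above.
tp, fn, fp, tn = 500, 400, 100, 9000

def accuracy(tp, fn, fp, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)

def recall(tp, fn):
    """Share of actual positives that were predicted positive."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of predicted positives that were actually positive."""
    return tp / (tp + fp)

print(f"accuracy  = {accuracy(tp, fn, fp, tn):.1%}")  # 95.0%
print(f"recall    = {recall(tp, fn):.1%}")            # 55.6%
print(f"precision = {precision(tp, fp):.1%}")         # 83.3%
```

Note that accuracy is high mainly because negatives dominate the data, while recall shows that nearly half of the actual positives are missed.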


#### 2. Continuous Outputs: 

- **MSE** (mean square error) or **MSD** (mean-square deviation)
    ![MEAN SQUARED ERROR](mse.jpg 'MSE')
        where:

            Pi: predicted value
            Ai: actual value
        
    **RMSE** (root-mean-square error) or **RMSD** (root-mean-square deviation): the square root of this quantity.
    

- **MAE** (mean absolute error) or **MAD** (mean absolute deviation)
    ![MEAN ABSOLUTE ERROR](mae.jpg 'MAE')


- **NMSE** (normalized mean square error)
    
    The MSE compared against (normalized by) the error of a benchmark predictor.
    
    ![NORMALIZED MEAN SQUARED ERROR](nmse.jpg 'NMSE')

- **NMAE** (normalized mean absolute error)
    
    Similar to NMSE, but based on absolute errors
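
As a rough sketch, the measures for continuous outputs can be written in plain Python. `Pi` and `Ai` follow the notation above. The benchmark used for NMSE and NMAE is an assumption here (a baseline that always predicts the mean of the actual values), since the exact formula in the figure is not reproduced in the text; the sample values are made up for illustration.

```python
import math

def mse(predicted, actual):
    """Mean squared error: average of (Pi - Ai)^2."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root-mean-square error: square root of the MSE."""
    return math.sqrt(mse(predicted, actual))

def mae(predicted, actual):
    """Mean absolute error: average of |Pi - Ai|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def mean_baseline(actual):
    # Assumption: the benchmark always predicts the mean of the actual values.
    mean_a = sum(actual) / len(actual)
    return [mean_a] * len(actual)

def nmse(predicted, actual):
    """MSE of the model divided by the MSE of the assumed benchmark."""
    return mse(predicted, actual) / mse(mean_baseline(actual), actual)

def nmae(predicted, actual):
    """MAE of the model divided by the MAE of the assumed benchmark."""
    return mae(predicted, actual) / mae(mean_baseline(actual), actual)

# Made-up example values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print(f"MSE  = {mse(predicted, actual):.4f}")   # 0.5000
print(f"RMSE = {rmse(predicted, actual):.4f}")  # 0.7071
print(f"MAE  = {mae(predicted, actual):.4f}")   # 0.5000
print(f"NMSE = {nmse(predicted, actual):.4f}")  # 0.1000
print(f"NMAE = {nmae(predicted, actual):.4f}")  # 0.2500
```

Under this assumed benchmark, a normalized value below 1 means the model beats the mean predictor, and a value above 1 means it does worse.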


#### 3. Bias–Variance Tradeoff 

- [Definition at wikipedia](https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff)
    
    In statistics and machine learning, the bias–variance tradeoff (or dilemma) is the problem of simultaneously minimizing **two sources of error** that prevent supervised learning algorithms from generalizing beyond their training set:
    ![Bias Variance Tradeoff](bias_variance_tradeoff.jpg 'bias_variance_tradeoff') 
    
    - **TotalErr(x) = Bias^2 + Variance + Irreducible Error**
    
    - The **bias** is error from erroneous assumptions in the learning algorithm. 
        - High bias can cause an algorithm to miss the relevant relations between features and target outputs (**underfitting**).
        - Less complex models normally have high bias
    
    - The **variance** is error from sensitivity to small fluctuations in the training set. 
        - High variance can cause **overfitting**: modeling the random noise in the training data, rather than the intended outputs.
        - More complex models normally have high variance
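
The decomposition can be illustrated with a small simulation. The sketch below is not from the book: the true function, the noise level, the query point and the two toy models are all assumptions chosen for illustration. It repeatedly draws training sets, fits a very simple model (a constant) and a very flexible one (1-nearest-neighbor), and estimates Bias^2 and Variance of each prediction at one point.

```python
import math
import random
from statistics import mean, pvariance

rng = random.Random(0)   # fixed seed so the run is repeatable
NOISE_SD = 0.3           # assumed noise level
X0 = 0.25                # assumed query point

def true_f(x):
    # Assumed ground-truth function.
    return math.sin(2 * math.pi * x)

def sample_training_set(n=30):
    xs = [rng.random() for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, NOISE_SD) for x in xs]
    return xs, ys

def constant_model(xs, ys, x):
    # Very simple model: ignores x and predicts the mean target.
    return mean(ys)

def nearest_neighbor_model(xs, ys, x):
    # Very flexible model: copies the target of the closest training point.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]

def bias2_and_variance(model, trials=500):
    # Predict at X0 with a model fitted on many independent training sets.
    preds = [model(*sample_training_set(), X0) for _ in range(trials)]
    bias2 = (mean(preds) - true_f(X0)) ** 2
    return bias2, pvariance(preds)

const_bias2, const_var = bias2_and_variance(constant_model)
nn_bias2, nn_var = bias2_and_variance(nearest_neighbor_model)

print(f"constant model: bias^2 = {const_bias2:.3f}, variance = {const_var:.3f}")
print(f"1-NN model:     bias^2 = {nn_bias2:.3f}, variance = {nn_var:.3f}")
```

In this setup the constant model should show the larger Bias^2 and the nearest-neighbor model the larger Variance, which matches the underfitting and overfitting descriptions above.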


- If a model has a high bias, how does its error vary as a function of the amount of data?
    ![Bias vs Data Size](bias_data_size.jpg 'bias_data_size') 
    - Training set error goes up as data size increases
    - Testing set error goes down
    - As the data size grows, these 2 kinds of errors tend to converge to the same, relatively high, value
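
A small simulation can reproduce this behaviour. The sketch below is an illustration under stated assumptions, not the book's experiment: the data come from a noisy sine curve, and the high-bias model is a single constant (the mean of the training targets). Errors are averaged over many random draws for each training-set size.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

def sample_targets(n):
    # Assumed data source: noisy sine curve on [0, 1).
    return [math.sin(2 * math.pi * rng.random()) + rng.gauss(0, 0.3)
            for _ in range(n)]

def mse_of_constant(c, ys):
    # Mean squared error when every prediction is the constant c.
    return sum((y - c) ** 2 for y in ys) / len(ys)

def learning_curve_point(n_train, trials=300, n_test=200):
    train_total, test_total = 0.0, 0.0
    for _ in range(trials):
        y_train = sample_targets(n_train)
        y_test = sample_targets(n_test)
        c = sum(y_train) / len(y_train)  # the fitted high-bias "model"
        train_total += mse_of_constant(c, y_train)
        test_total += mse_of_constant(c, y_test)
    return train_total / trials, test_total / trials

curve = {n: learning_curve_point(n) for n in (2, 10, 50, 200)}
for n, (train_err, test_err) in curve.items():
    print(f"n = {n:3d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

With more data the training error rises and the testing error falls until they nearly coincide, but both stay high because the model is too simple for the data.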


- The remedy for high bias:
        
    - Maybe too few features selected ==> choose more features
    - Increase the complexity of the model 
    - Increasing the data size will not be of much help
    
    
- The remedy for high variance:
    
    Training set error and testing set error tend not to converge.
    
    ![Variance vs Data Size](variance_data_size.jpg 'variance_data_size') 
    
    - Maybe too many features selected ==> reduce the number of features
    - Decrease the complexity of the model
    - Increasing the data size will be of some help


#### PAC 

I don't find the PAC concept introduced in this section very helpful.

Two types of uncertainties per Probably Approximately Correct (**PAC**) theory:

- Approximate: 
    This measures how much error is accepted for a hypothesis
    
- Probability: 
    This measures the degree of certainty, as a percentage, that the hypothesis is correct
