# Machine Learning

- Machine learning is a method of data analysis that automates analytical model building. 
- Using algorithms that iteratively learn from data, machine learning allows computers to find hidden insights without being explicitly programmed where to look.  

## What is it used for?
- Fraud detection
- Web search results
- Real-time ads on web pages
- Credit scoring and next-best offers
- Prediction of equipment failures
- New pricing models
- Network intrusion detection
- Recommendation engines
- Customer segmentation
- Predicting customer churn
- Pattern and image recognition
- Email spam filtering
- Financial modeling

## Machine Learning Process

Main flow: Data Acquisition > Data Cleaning > Model Training & Building > Model Testing > Model Deployment  

Test data branch: Data Cleaning > Test Data > Model Testing  

Feedback loop: Model Testing > Model Training & Building
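
The test data branch above can be sketched as a simple hold-out split. This is a minimal illustration, not a prescribed method: the helper name `train_test_split`, the 25% test fraction, and the placeholder `cleaned_rows` are all assumptions made for the example.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical stand-in for records that came out of the data cleaning step.
cleaned_rows = list(range(12))
train_rows, test_rows = train_test_split(cleaned_rows)

print(len(train_rows), "rows for model training")
print(len(test_rows), "rows held back for model testing")
```

Keeping the test rows out of training is what makes the Model Testing step an honest estimate of how the model behaves on unseen data.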

## Machine Learning Types

### There are 3 types of Machine Learning Algorithms
- Supervised Learning
    - You have labeled data and are trying to predict a label based on known features
    - Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known
        - Ex. A piece of equipment could have data points labeled either "F" (Failed) or "R" (Runs)
    - The learning algorithm receives a set of inputs along with the corresponding correct outputs, and it learns by comparing its actual output with the correct outputs to find errors. It then modifies the model accordingly.
    - Through methods like classification, regression, prediction, and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data.
    - Supervised learning is commonly used in applications where historical data predicts likely future events
        - Ex. It can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
        - Or it can attempt to predict the price of a house from its features, using houses for which we have historical price data
    
- Unsupervised Learning
    - You have unlabeled data and are trying to group together similar data points based on features
        - Unsupervised Learning is used against data that has no historical labels
        - The system is not told the "right answer". The algorithm must figure out what is being shown.
        - The goal is to explore the data and find some structure within
        - Ex. It can find the main attributes that separate customer segments from each other.
        - Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition
        - These algorithms are also used to segment text topics, recommend items and identify data outliers.
        
- Reinforcement Learning
    - Algorithm learns to perform an action from experience
    - Reinforcement Learning is often used for robotics, gaming, and navigation
    - With reinforcement learning, the algorithm discovers through "Trial and Error" which actions yield the greatest rewards.
    - This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do)
    - The objective is for the agent to choose actions that maximize the expected reward over a given amount of time
    - The agent will reach the goal much faster by following a good policy
    - So the goal in reinforcement learning is to learn the best policy
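
The trial-and-error idea behind reinforcement learning can be sketched with a small multi-armed bandit. This is only an illustration under stated assumptions: the epsilon-greedy strategy, the function name `run_bandit`, the three reward means, and the noise level are all chosen for the example and do not come from the notes above.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a multi-armed bandit.

    Returns the agent's estimated mean reward for each action.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions   # agent's current value estimate per action
    counts = [0] * n_actions        # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        # The environment responds with a noisy reward (hypothetical noise level).
        reward = rng.gauss(true_means[action], 0.1)
        counts[action] += 1
        # Incremental update of the running mean reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

# Hypothetical environment: three actions with different average rewards.
estimates = run_bandit([0.2, 0.5, 0.9])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("learned policy: always choose action", best_action)
```

Here the agent is the loop that picks actions, the environment is the reward draw, and the learned policy is simply "choose the action with the highest estimated reward".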

### Linear Regression
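
The error-correcting loop described under Supervised Learning (compare the output with the correct answer, then adjust the model) can be sketched for a one-feature linear regression. This is a minimal sketch assuming plain gradient descent on the mean squared error; the function name `fit_line`, the learning rate, and the toy data are assumptions made for the example.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Compare the model's actual outputs with the correct outputs.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Modify the model in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy labeled data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(f"learned slope {w:.3f}, intercept {b:.3f}")
```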

#### Bias Variance Trade-off
- Bias: Error due to overly simple assumptions in the model (underfitting)
- Variance: Error due to an overly complex model that tries to fit the training data as closely as possible
- Trade-off: A balance achieved between two desirable but incompatible features; a compromise
- The bias-variance trade-off is the point beyond which adding model complexity (flexibility) mostly fits noise
    - Error = Noise + Bias² + Variance
- Past that point the training error keeps going down, as it has to, but the test error starts to go up
- Beyond the trade-off point, the model begins to overfit
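
The two error terms can be estimated by simulation. The sketch below is an illustration under assumptions, not part of the original notes: the true function `x * x`, the noise level, the query point `x0 = 0.9`, and the two models (a constant "predict the mean" model and a 1-nearest-neighbor model) are all chosen for the example.

```python
import random

def true_f(x):
    """Hypothetical true relationship between x and y."""
    return x * x

def sample_training_set(rng, n=20, noise_sd=0.3):
    """Draw a fresh noisy training set from the true function."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    """Very simple model: ignore x0 and predict the average label."""
    return sum(ys) / len(ys)

def predict_1nn(xs, ys, x0):
    """Very flexible model: copy the label of the single closest point."""
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias2_and_variance(model, x0, trials=1000, seed=0):
    """Estimate squared bias and variance of a model's prediction at x0."""
    rng = random.Random(seed)
    preds = [model(*sample_training_set(rng), x0) for _ in range(trials)]
    mean_pred = sum(preds) / trials
    bias2 = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias2, variance

simple_bias2, simple_var = bias2_and_variance(predict_mean, x0=0.9)
flexible_bias2, flexible_var = bias2_and_variance(predict_1nn, x0=0.9)
print(f"simple model:   bias^2={simple_bias2:.3f}  variance={simple_var:.3f}")
print(f"flexible model: bias^2={flexible_bias2:.3f}  variance={flexible_var:.3f}")
```

In this setup the simple model should show the larger squared bias and the flexible model the larger variance, which is the tension the trade-off describes.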

### Logistic Regression

- Logistic Regression is a method for Classification
- Although the name may be confusing at first, logistic regression allows us to solve classification problems, where we are trying to predict discrete categories
- The convention for binary classification is to have two classes 0 and 1
- We can't use a normal linear regression model on binary groups. It won't lead to a good fit
- Instead, we can transform our linear regression to a logistic regression curve
- The Sigmoid (aka Logistic) Function takes in any value and maps it to a value between 0 and 1
- This means we can take our Linear Regression solution and place it into the Sigmoid Function
- We can set a cutoff point at 0.5: anything below it results in class 0, and anything above it results in class 1
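
The sigmoid and the 0.5 cutoff can be sketched as follows. The function names, the example `weight` and `intercept`, and the choice to assign a probability of exactly 0.5 to class 1 are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow for very negative z.
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_class(x, weight, intercept, cutoff=0.5):
    """Place the linear model's output into the sigmoid, then apply the cutoff."""
    probability = sigmoid(weight * x + intercept)
    return 1 if probability >= cutoff else 0

# Hypothetical fitted coefficients for a single feature.
weight, intercept = 1.5, -2.0
for x in (0.0, 1.0, 3.0):
    p = sigmoid(weight * x + intercept)
    print(f"x={x}: probability={p:.3f}, class={predict_class(x, weight, intercept)}")
```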

### Model Evaluation

- After you train a logistic regression model on some training data, you will evaluate your model's performance on some test data.
- You can use a confusion matrix to evaluate classification models.
- Basic Terminology
    - True Positives (TP)
    - True Negatives (TN)
    - False Positives (FP)
    - False Negatives (FN)
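
The four counts of a binary confusion matrix can be tallied directly from true and predicted labels. This is a small sketch: the function name `confusion_counts` and the example labels are made up for illustration, and accuracy is shown as one common metric derived from the counts.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP and FN for binary labels, where 1 is the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["TP"] += 1      # correctly predicted positive
        elif actual == 0 and predicted == 0:
            counts["TN"] += 1      # correctly predicted negative
        elif actual == 0 and predicted == 1:
            counts["FP"] += 1      # predicted positive, actually negative
        else:
            counts["FN"] += 1      # predicted negative, actually positive
    return counts

# Hypothetical test labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
accuracy = (counts["TP"] + counts["TN"]) / len(y_true)
print(counts, "accuracy:", accuracy)
```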

### K Nearest Neighbors (KNN)

- Training Algorithm
    1. Store all the labeled data so that new data can be compared against it
    2. Choose k, the number of nearest neighbors to consult
- Prediction Algorithm
    1. Calculate the distance from x (the unknown data point) to every stored point and find the k closest
    2. Predict the majority label of the k closest
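
The prediction steps above can be sketched in a few lines. This is a minimal version assuming Euclidean distance and a simple majority vote; the function name `knn_predict` and the toy points are illustrative only.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, x, k=3):
    """Predict the label of x from the majority label of its k nearest neighbors."""
    # "Training" is just keeping the labeled data; all the work happens here.
    distances = sorted(
        (math.dist(point, x), label)
        for point, label in zip(train_points, train_labels)
    )
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Hypothetical labeled data: two small clusters.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2, 2)))
print(knn_predict(train_points, train_labels, (6, 5)))
```

`math.dist` requires Python 3.8 or later. An odd k is a common choice for two classes because it avoids tied votes.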

### Decision Trees and Random Forests

#### Tree Method
- Nodes (box) 
    - Split for the value of a certain attribute  
- Edges (arrow)
    - Outcome of a split to next node
- Root
    - The node that performs the first split
- Leaves
    - Terminal nodes that predict the outcome

#### Entropy and Information Gain
- The mathematical methods used to choose the best split  
    - Entropy: a measure of how mixed the labels in a node are
    - Information Gain: the reduction in entropy achieved by a split
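
Both quantities can be computed directly from label lists. The sketch below uses the standard definitions (Shannon entropy in bits, and information gain as parent entropy minus the size-weighted entropy of the children); the function names and the "F"/"R" example labels are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical node with equipment labels "F" (Failed) and "R" (Runs).
parent = ["F", "F", "R", "R"]
perfect_split = [["F", "F"], ["R", "R"]]
useless_split = [["F", "R"], ["F", "R"]]

print("parent entropy:", entropy(parent))
print("gain of perfect split:", information_gain(parent, perfect_split))
print("gain of useless split:", information_gain(parent, useless_split))
```

A tree-building algorithm would evaluate candidate splits this way and keep the one with the highest gain.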

#### Random Forests
- To improve performance, we can use many trees, with a random sample of features considered as candidates at each split
    - A new random sample of m features is chosen for every single tree at every single split
    - For classification, m is typically chosen to be the square root of p, the total number of features
    - Suppose there is one very strong feature in the dataset. When using "bagged" trees, most of the trees will use that feature as the top split, resulting in an ensemble of similar trees that are highly correlated
    - Averaging highly correlated quantities does not significantly reduce variance
    - By randomly leaving out candidate features from each split, Random Forests "decorrelate" the trees, such that the averaging process can reduce the variance of the resulting model
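
Two of the ideas above can be sketched numerically: drawing about sqrt(p) candidate features for a split, and seeing why correlation limits the benefit of averaging. The function names and feature names are hypothetical, and the variance expression assumes identically distributed trees with a common pairwise correlation `rho`.

```python
import math
import random

def candidate_features(feature_names, rng):
    """Draw about sqrt(p) features to consider at one split."""
    m = max(1, int(math.sqrt(len(feature_names))))
    return rng.sample(feature_names, m)

def variance_of_average(sigma2, rho, n_trees):
    """Variance of the mean of n_trees predictions, each with variance sigma2
    and pairwise correlation rho."""
    return rho * sigma2 + (1 - rho) * sigma2 / n_trees

rng = random.Random(0)
features = [f"feature_{i}" for i in range(9)]   # hypothetical p = 9 features
print("candidates for this split:", candidate_features(features, rng))

# Averaging 100 trees: highly correlated versus uncorrelated predictions.
print("rho=0.9:", variance_of_average(1.0, 0.9, 100))
print("rho=0.0:", variance_of_average(1.0, 0.0, 100))
```

With high correlation the first term dominates and adding more trees barely helps, which is why decorrelating the trees matters.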

### Support Vector Machines (SVM)

Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis.

Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier.

An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible.  

New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.   

Hyperplane: the decision boundary, chosen to maximize the margin between classes  
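
Once a linear SVM has been trained, prediction only needs the hyperplane. The sketch below assumes a hyperplane of the form `w · x + b = 0` scaled so that the closest training points satisfy `|w · x + b| = 1`, in which case the margin width is `2 / ||w||`. The weights `w`, offset `b`, and function names are hypothetical values for illustration, not the result of a real training run.

```python
import math

def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w . x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_side(w, b, x):
    """Assign +1 or -1 depending on which side of the hyperplane x falls on."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def margin_width(w):
    """Width of the gap between the two classes, assuming canonical scaling."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane in two dimensions.
w, b = (1.0, 1.0), -5.0
print(predict_side(w, b, (4.0, 4.0)))
print(predict_side(w, b, (1.0, 1.0)))
print("margin width:", round(margin_width(w), 3))
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the classes stay on their own sides.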