# Machine Learning Master Notes 1 - Introduction to Machine Learning

## What is Machine Learning?

>Arthur Samuel - `"Machine Learning (ML) is the field of study that gives computers the ability to learn without explicitly being programmed."`

1. A machine learning (ML) algorithm is a computer program that makes predictions from input data without being explicitly programmed with the rules for arriving at those predictions.
2. Machine learning programs are trained to make predictions based on historical examples/data.
3. We compare the model's predictions against the known outcomes and use the resulting metrics to measure the model's performance.
4. The machine learning algorithm then improves its performance by fine-tuning its parameters as it is given additional historical examples.
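
As a minimal sketch of "learning from examples", the snippet below fits a straight line to a handful of made-up data points using ordinary least squares. The function name `fit_line` and the data are assumptions for illustration only. The rule that links inputs to outputs is never written into the program; it is recovered from the examples.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor,
    # which cancels out in the ratio below).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical examples (inputs and their known outcomes).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# "Training": the parameters are derived from the data.
slope, intercept = fit_line(xs, ys)

# "Prediction": apply the learned parameters to an input not seen before.
prediction = slope * 5.0 + intercept
```

Here `slope` and `intercept` play the role of the learned parameters, and `prediction` is the model's output for a new input.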


### Function of Machine Learning

1. Machine Learning is able to use data to explain what happened.
2. Machine Learning is able to use data to predict what will happen.
3. Machine Learning is able to make relevant suggestions based on historical data.

### Machine Learning Types

There are 2 main types of machine learning. They are **Supervised Learning** and **Unsupervised Learning**.

1. In **Supervised Learning**, we provide the machine learning model with a predetermined set of inputs together with the expected outcomes. The model determines its parameters by learning from the historical data and uses those parameters as the basis for prediction. Examples of supervised learning are:
    - Predicting housing prices based on historical housing sizes and prices;
    - Predicting the sales of a product based on historical data;
    - Predicting whether an email is spam based on the words in the email and historical data.


2. For **Unsupervised Learning**, we feed the machine learning model raw input data and let it discover the structure or pattern of the data by itself. Examples are:
    - Grouping related news articles together.
    - Grouping similar data into the same group.
    - Detecting data that differ from the norm.


### Supervised Learning

1. In **Supervised Learning**, we provide input data (x) together with the expected answer or outcome (y) for the machine learning model to learn from. The model determines its parameters by examining the input data alongside the expected answers. After some iterations, the model is able to use the learned parameters to make predictions for new inputs that come without an expected answer. Predictions are therefore based on previous historical data.
2. Common applications for supervised learning are:
    - predicting prices (such as housing prices, stock prices etc) based on historical trends,
    - predicting temperatures,
    - predicting production volume,
    - classifying email as spam or not spam,
    - determining if a tumor is benign or malignant,
    - translating a group of text into another language, or
    - converting an audio input to text output.
3. **Supervised Learning** can be further divided into **Regression** and **Classification** problems.
    - **Regression** problems require the machine learning (ML) model to predict a number from infinitely many possible values. Examples of regression problems are predicting housing prices, stock prices, temperature, water level, sales volume or production volume.
    - **Classification** problems require the ML model to assign a problem/object to one of 2 or more classes/categories. For example, we can classify a tumor as malignant or benign, or an email as spam or not spam.
    - Besides classifying data into Yes/No or True/False, we can classify data into multiple classes/categories. For example, we can classify books into genres such as Mystery, SciFi, Thriller, Self-Help, Business, etc. This is called **Multi-Class Classification**.
    - A common example of a classification problem is determining whether a tumor is benign or malignant based on its size and shape. In this application, we provide the ML model with a set of historical data containing tumor size, tumor shape and the expected answer. The ML model examines the input data together with the expected answers and fine-tunes its learning parameters. Once training is done, the ML model can predict whether a new tumor is benign or malignant without being given the expected outcome.
4. There are many types of machine learning algorithms. The most basic ML algorithm is Linear Regression. However, plain linear regression is often too limited, because many of the relationships we want to predict are not linear. In those cases we can use **Polynomial Regression**. We use **Logistic Regression** for classification problems. In addition, we also use other ML models such as **Neural Networks** and **Decision Trees** for supervised learning.
5. Most applications of machine learning are supervised learning.
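
To make the classification case more concrete, here is a minimal sketch of logistic regression trained with batch gradient descent. It uses a single feature (tumor size) rather than size and shape, and the data values, function names, learning rate and number of epochs are all assumptions chosen for illustration, not a reference implementation.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.3, epochs=5000):
    """Learn the parameters w and b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (malignant) if the predicted probability reaches the threshold, else 0."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Hypothetical training data: tumor size (cm) with labels 0 = benign, 1 = malignant.
sizes = [1.0, 1.5, 2.0, 3.5, 4.0, 4.5]
labels = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(sizes, labels)
small_tumor_class = predict(w, b, 1.0)
large_tumor_class = predict(w, b, 4.5)
```

The model outputs a probability, and the threshold turns that probability into a class. A regression model would return the number itself instead of a category.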


### Unsupervised Learning
1. For **Unsupervised Learning**, we feed the computer program raw data without any expected or predetermined result. The ML model is supposed to examine the raw data and define its structure by detecting similar features or characteristics.
2. The most common application of unsupervised learning is **Clustering**. Other common applications are **Anomaly Detection** and **Dimensionality Reduction**.
3. An example of clustering is the way Google News groups related news stories together. The ML model has to find similarities among the given input features without referencing any labels or outputs.
4. We use anomaly detection to detect data/activities that are not normal. Dimensionality reduction is used to compress data so that it can be represented with fewer numbers.
5. For unsupervised learning, we use algorithms such as **K-Means Clustering** and **PCA (Principal Component Analysis)**. We also use the **Gaussian Anomaly Detection Algorithm** to monitor and detect abnormalities.
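
The clustering idea can be sketched with a small pure-Python version of K-Means. The points below are made up, and using the first `k` points as the initial centroids is a simplification assumed here to keep the example deterministic; practical implementations usually pick starting centroids at random. Note that no labels are supplied: the groups emerge from the data alone.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iterations=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Simplifying assumption: use the first k points as starting centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        # Stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical unlabeled data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
```

After the loop stops, `labels` holds the cluster index assigned to each point and `centroids` holds the center of each cluster.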


## Machine Learning Components

There are 3 main components in a machine learning process. They are data, machine learning models and the result. See diagram below.

![MLd1a.jpg](attachment:7cc11919-335d-405e-84c2-bf2b1273d4c5.jpg)

### Data
- Data is the most important component; without data there is no machine learning.
- For supervised learning, data consists of independent features and the expected outcome. We need a minimum of one feature. In practice, we often include many features to improve the model's accuracy.
- In unsupervised learning, data consists of only independent features.
- In the example of predicting housing prices, we predict the price based on size, number of rooms and location. Size, number of rooms and location are the features.
- The historical transacted housing price is the expected outcome.
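
As a small illustration of how such data is typically laid out, the sketch below separates a few made-up housing records into a feature table `X` and an outcome list `y`. The field names and values are hypothetical.

```python
# Hypothetical historical records: three features plus the transacted price.
houses = [
    {"size_sqft": 1200, "rooms": 3, "location": "suburb", "price": 300000},
    {"size_sqft": 800, "rooms": 2, "location": "city", "price": 350000},
    {"size_sqft": 2000, "rooms": 4, "location": "rural", "price": 280000},
]

FEATURES = ("size_sqft", "rooms", "location")  # independent features
TARGET = "price"                               # expected outcome

# Supervised learning uses both X and y; unsupervised learning would use X only.
X = [[house[name] for name in FEATURES] for house in houses]
y = [house[TARGET] for house in houses]
```

Each row of `X` lines up with the entry of `y` at the same position, which is how the model pairs an input with its expected outcome.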

### ML Model
- The machine learning model includes the methods and algorithms that generate the predicted result.
- The methods and algorithms usually involve some mathematical/statistical computation.
- The learning parameters are the components of the formula that are learned from the data and used to make new predictions.
- Additionally, we may add some control elements, such as the learning rate, to control the speed and accuracy of the algorithm.
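
The distinction between learning parameters and control elements can be sketched with a one-feature linear model. In the snippet below, `w` and `b` are the learning parameters inside the formula, while `learning_rate` is a control element that sets how large each adjustment is. The data and function names are assumptions for illustration.

```python
def predict(x, w, b):
    """The model formula: w and b are the learning parameters."""
    return w * x + b


def mean_squared_error(xs, ys, w, b):
    """Average squared difference between predictions and expected outcomes."""
    return sum((predict(x, w, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient_step(xs, ys, w, b, learning_rate):
    """Adjust w and b once, in the direction that reduces the squared error."""
    n = len(xs)
    errors = [predict(x, w, b) - y for x, y in zip(xs, ys)]
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b


# Hypothetical data and an untrained starting point.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w, b = 0.0, 0.0

before_mse = mean_squared_error(xs, ys, w, b)
w, b = gradient_step(xs, ys, w, b, learning_rate=0.05)
after_mse = mean_squared_error(xs, ys, w, b)
```

A single step already lowers the error on this data. A larger learning rate moves faster but can overshoot, which is why it is treated as a setting to tune rather than something the model learns.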

### Result
- In machine learning, the results are the analyses, predictions, classifications and suggestions derived from the model.

## Machine Learning Process in Supervised Learning 

In supervised learning, the process includes additional steps in which the results are evaluated and the learning parameters that generated them are adjusted or optimized.

![MLd1b.jpg](attachment:5999f069-8d1d-448a-8b3c-45bb859b4a0b.jpg)

### How Supervised Learning is Performed
1. Using the example of predicting housing prices, we first obtain a dataset with features such as the size of the house, location, number of rooms and age of the house, together with the transacted price (the expected outcome).
2. Then we separate the dataset into a training dataset and a test dataset.
3. After that, we choose a learning model before we start training the learning algorithm with our data. We can use a regression model or a neural network for this problem.
4. In this example, the ML algorithm examines the input features of each house together with its transacted sale price and formulates a set of learning parameters.
5. When training is done, the ML model uses the learned parameters to predict prices for new houses from their input features alone.
6. We check the model's accuracy by predicting the outcomes for the test data. The differences between the predictions and the actual outcomes are the basis for measuring the performance of the ML model.
7. We can choose to improve the learning parameters by adding more historical data with expected outcomes.
8. If the performance is not satisfactory, we can adjust the input data by including more features.
9. We can also modify the machine learning model settings, such as the learning rate.
10. Increasing the number of iterations is also part of the optimization.
11. Each of these changes, including adding more historical data, can affect performance. The training cycle is repeated, adjusting the necessary settings and data each time, until the performance of the model is satisfactory.
12. If we still cannot arrive at a satisfactory result, we need to consider a different ML model in step 3.
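
The steps above can be sketched end to end in a few lines. This is a simplified illustration under stated assumptions: the dataset is synthetic, it has a single feature (size, in units of 100 square meters, with price in units of 100,000), the model is a linear regression trained by gradient descent, and the 80/20 split, learning rate and iteration count are arbitrary choices.

```python
import random


def train(xs, ys, learning_rate=0.1, iterations=5000):
    """Steps 3-4: fit price = w * size + b by gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mean_absolute_error(xs, ys, w, b):
    """Step 6: average absolute gap between predicted and actual prices."""
    return sum(abs((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)


# Step 1: a synthetic dataset (made up for this sketch, with a little noise).
sizes = [0.5 + 0.1 * i for i in range(10)]
prices = [3.0 * s + 0.5 + (0.02 if i % 2 == 0 else -0.02) for i, s in enumerate(sizes)]
examples = list(zip(sizes, prices))

# Step 2: shuffle with a fixed seed, then split 80% training / 20% test.
random.Random(0).shuffle(examples)
split = int(0.8 * len(examples))
train_set, test_set = examples[:split], examples[split:]
train_x = [x for x, _ in train_set]
train_y = [y for _, y in train_set]
test_x = [x for x, _ in test_set]
test_y = [y for _, y in test_set]

# Steps 3-5: train on the training set only.
w, b = train(train_x, train_y)

# Step 6: evaluate on data the model has not seen during training.
test_mae = mean_absolute_error(test_x, test_y, w, b)
```

If `test_mae` were too high, steps 7 to 12 correspond to changing the inputs to this script: more examples, more features, a different `learning_rate` or `iterations` value, or a different model in place of `train`.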


## Machine Learning Advancement

**If we can perform machine learning using statistical computation, why did machine learning only start to be more effective in the last decade?**

We can answer the question above by briefly describing the historical progress of machine learning. Machine learning started as early as the 1950s and 1960s. In the beginning, most machine learning predictions were based on statistical algorithms such as regression, nearest neighbor and support vector machines. These statistical algorithms are sometimes called classical machine learning.

However, due to a lack of computational power, complex computation could not be done cheaply. Acquisition of data was also costly. For regression problems, the lack of computational power and data made progress slow. Only organizations with large mainframe computers could perform large-scale regression computation.

With the development of integrated circuits and microcomputers, computing power became more widely available, and we were able to handle more demanding computation. Classical machine learning algorithms were embedded in enterprise statistical software.

However, one key element was still missing: acquiring data was still expensive. Neural networks also matured with the development of backpropagation, but they did not take off because massive amounts of data were not yet available.

Web 2.0 and widespread access to the internet began around 2004, and the Internet became a platform for business transactions. This transition generated a large amount of data. The quantity of data became so large that it acquired its own term: big data.

Big data helped to improve machine learning performance, and neural networks started producing promising results around the 2010s. For this reason, neural networks are often treated as distinct from classical machine learning.


## Reference

https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained#:~:text=It%20was%20defined%20in%20the,learn%20without%20explicitly%20being%20programmed.%E2%80%9D

https://www.coursera.org/specializations/machine-learning-introduction

## End of Note 1