# The Machine Learning Landscape

### What is Machine Learning

*Machine Learning -* **(1)** is the science (and art) of programming computers so they can learn from data. **(2)** is the field of study that gives computers the ability to learn without being explicitly programmed.

### Understanding a Machine Learning Program

**Spam Filter Example**

The examples that the system uses to learn are called the *training set*. Each training example is called a *training instance (or sample)*. In this case, the *task T* is to flag spam for new emails, the *experience E* is the training data, and the *performance measure P* needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called *accuracy*, and it is often used in classification tasks. \
\
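One of the central arguments for machine learning is scalability. A spam filter built from hand-written, deterministic rules is very hard to scale and maintain, because every new spam trick needs a new rule, whereas a statistical model learns the patterns from the examples themselves. \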
\
With machine learning, we can build ML pipelines that retrain the model as the dataset grows and becomes more granular. In essence, you can scale machine learning solutions in a CI/CD manner by creating a continuous feedback loop. \
\
Also, *Data Mining* is the practice of applying machine learning techniques to large amounts of data to discover patterns and insights. In other words, it helps humans see correlations and patterns that otherwise would not be discovered. \
\
To summarize, Machine Learning is great for:

* Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.


* Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution.


* Fluctuating environments: a Machine Learning system can adapt to new data. 


* Getting insights about complex problems and large amounts of data.
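
To make the spam-filter performance measure *P* concrete, here is a minimal sketch of *accuracy* computed by hand. The labels below are hypothetical (1 = spam, 0 = not spam), and the `accuracy` helper is illustrative, not part of any library.

```python
# Hypothetical labels for five emails: 1 = spam, 0 = not spam (ham).
y_true = [1, 0, 0, 1, 1]   # what each email actually is
y_pred = [1, 0, 1, 1, 0]   # what the filter predicted

def accuracy(true_labels, predicted_labels):
    """Ratio of correctly classified instances to all instances."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

print(accuracy(y_true, y_pred))  # 0.6
```

If scikit-learn is installed, `sklearn.metrics.accuracy_score(y_true, y_pred)` returns the same ratio.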

### Types of Machine Learning Systems

There are so many different types of learning systems that it is useful to classify them broadly:

* Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)


* Whether or not they can learn incrementally on the fly (online versus batch learning)


* Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)



#### Supervised vs. Unsupervised Learning

##### *Supervised*
In a supervised learning system, the training set you feed to the system includes the **Desired Solution**, known as **labels**. \
\
The typical supervised learning tasks are *Classification Problems* (e.g., deciding whether an email is spam) and *Numeric Prediction*, where the system predicts a numeric *Target* value, such as the price of a car, given a set of *Features* (mileage, age, brand, etc.) called *predictors*. This numeric prediction task is called *regression*.
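
A minimal regression sketch with scikit-learn's `LinearRegression`, assuming made-up car data: the features (predictors) are mileage and age, and the labels (targets) are prices generated from the hypothetical rule `price = 35 - 0.15 * mileage - 1.0 * age`, so the result can be checked by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features (predictors): mileage in thousands of km, age in years. Hypothetical cars.
X = np.array([
    [20.0, 1.0],
    [40.0, 2.0],
    [60.0, 4.0],
    [80.0, 5.0],
    [100.0, 7.0],
])
# Labels (targets): price in thousands of dollars, generated from a hypothetical rule.
y = 35.0 - 0.15 * X[:, 0] - 1.0 * X[:, 1]

model = LinearRegression()
model.fit(X, y)  # supervised: the labels y are given during training

new_car = np.array([[50.0, 3.0]])
predicted_price = model.predict(new_car)[0]
print(f"Predicted price: {predicted_price:.1f}k")  # Predicted price: 24.5k
```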



##### Notes

*In Machine Learning an attribute is a data type (e.g., “mileage”), while a feature has several meanings, depending on the context, but generally means an attribute plus its value (e.g., “mileage = 15,000”). Many people use the words attribute and feature interchangeably.*


##### Notable Supervised Systems

* k-Nearest Neighbors

* Linear Regression

* Logistic Regression

* Support Vector Machines (SVMs)

* Decision Trees and Random Forests

* Neural networks


##### *Unsupervised*

In an unsupervised learning system, the training data is **unlabeled**.


##### Notable Unsupervised Systems

* Clustering
    - K-Means
    - DBSCAN
    - Hierarchical Cluster Analysis (HCA)


* Anomaly and Novelty Detection
    - One-class SVM
    - Isolation Forest
    

* Visualization and Dimensionality Reduction
    - Principal Component Analysis (PCA)
    - Kernel PCA
    - Locally Linear Embedding (LLE)
    - t-Distributed Stochastic Neighbor Embedding (t-SNE)
    

* Association rule learning
    - Apriori
    - Eclat
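
A minimal clustering sketch with scikit-learn's `KMeans`, assuming a tiny hypothetical set of unlabeled 2-D points. Note that `fit_predict` receives only `X`; no labels are involved.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled 2-D points forming two well-separated groups.
X = np.array([
    [1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
    [8.0, 8.0], [8.2, 7.9], [7.9, 8.1],
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)  # assigns each point to a cluster without any labels
print(cluster_ids)
```

The cluster numbers themselves (0 or 1) are arbitrary; only the grouping is meaningful.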
    
    
    
##### Notes:

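*Feature Extraction -* the process of merging several correlated features into one (a form of dimensionality reduction) to simplify the data without losing too much information. For example, a car's mileage and age could be merged into a single "wear and tear" feature.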
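
Feature extraction can be sketched with PCA, one of the dimensionality-reduction algorithms listed above. The mileage/age values are hypothetical; the features are standardized first so that mileage's larger scale does not dominate.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical cars: mileage (thousands of km) and age (years) are strongly correlated.
X = np.array([[20, 1], [40, 2], [60, 3], [80, 4], [100, 5.5]], dtype=float)

X_scaled = StandardScaler().fit_transform(X)   # put both features on the same scale
pca = PCA(n_components=1)
wear_and_tear = pca.fit_transform(X_scaled)    # one merged feature per car

print(wear_and_tear.shape)              # (5, 1)
print(pca.explained_variance_ratio_)    # share of the variance the merged feature keeps
```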


*Anomaly Detection -* can be used to automatically remove outliers (unusual instances) from a training set before it is fed to another learning algorithm.
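
A sketch of outlier removal with scikit-learn's `IsolationForest`, assuming a hypothetical 1-D dataset with one obvious outlier. The `contamination` value (the expected share of outliers) is an assumption you would tune for real data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical 1-D measurements; 95.0 is an obvious outlier.
X = np.array([[10.1], [9.8], [10.3], [9.9], [10.0], [10.2], [95.0]])

# contamination = expected fraction of outliers (an assumption for this toy data).
iso = IsolationForest(contamination=0.15, random_state=0)
labels = iso.fit_predict(X)        # 1 = inlier, -1 = outlier
X_clean = X[labels == 1]           # keep only the inliers for training
print(X_clean.ravel())
```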


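*Novelty Detection -* aims to flag new instances that look different from all the instances in the training set. Unlike anomaly detection, it expects a clean training set, free of the kind of instances you want to detect.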
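
Novelty detection can be sketched with scikit-learn's `LocalOutlierFactor` in `novelty=True` mode (an algorithm not listed in these notes): it is fitted on a clean, hypothetical training set and then asked about new instances.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical "clean" training set: values clustered around 10.
X_train = np.array([[9.8], [9.9], [10.0], [10.1], [10.2], [10.3], [9.7], [10.05]])

detector = LocalOutlierFactor(n_neighbors=3, novelty=True)
detector.fit(X_train)  # learn what "normal" looks like

X_new = np.array([[10.02], [25.0]])
predictions = detector.predict(X_new)  # 1 = looks normal, -1 = novelty
print(predictions)
```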


*Association Rule Learning -* can be used to discover interesting relations (correlations) between attributes in large amounts of data.
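
Association rules are usually scored with *support* (how often items appear together) and *confidence* (how often the rule holds when its left-hand side appears). Below is a minimal pure-Python sketch over hypothetical shopping baskets; Apriori and Eclat are efficient ways to search many candidate rules, which this sketch does not attempt.

```python
# Hypothetical shopping baskets (one set of items per transaction).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions)

def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent) / count(antecedent)

print(support({"bread", "butter"}))       # 0.6
print(confidence({"bread"}, {"butter"}))  # 0.75 (rule: bread -> butter)
```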


##### Semisupervised Learning

Algorithms that can learn from *training data* sets that are only partially labeled, usually a lot of unlabeled data and a little labeled data.


**Ex.** Some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are: one label per person is enough for it to name everyone in every photo.
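
A minimal semisupervised sketch using scikit-learn's `LabelSpreading` (a graph-based method, not one named in these notes), assuming hypothetical 1-D data with two groups and a single labeled instance per group, much like naming each person once. scikit-learn marks unlabeled instances with `-1`.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Hypothetical data: two groups of instances ("person A" near 1, "person B" near 5).
X = np.array([[1.0], [1.1], [0.9], [1.2], [5.0], [5.1], [4.9], [5.2]])
# Only one instance per group is labeled; -1 means "unlabeled".
y = np.array([0, -1, -1, -1, 1, -1, -1, -1])

model = LabelSpreading(kernel="knn", n_neighbors=3)
model.fit(X, y)  # spreads the two known labels through the similarity graph
print(model.transduction_)  # a label inferred for every instance
```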


##### Reinforcement Learning


In this context, the learning system, called an *agent*, can observe the environment, select and perform actions, and receive *rewards* (or *penalties*, i.e., negative rewards) in return. The agent must then learn by itself the best strategy, called a *policy*, to get the most reward over time. A *policy* defines which action the *agent* should choose in a given situation.
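
Reinforcement learning can be sketched with tabular Q-learning, one classic RL algorithm (not named in these notes). The environment is a hypothetical 5-cell corridor: the agent starts in cell 0, can move left or right, and receives a reward of 1 only when it reaches cell 4. The learned *policy* is the best-scoring action in each cell; the hyperparameters are illustrative choices.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GOAL = N_STATES - 1

def step(state, action):
    """Environment: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == GOAL:
        return next_state, 1.0, True   # reward for reaching the goal
    return next_state, 0.0, False

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else exploit (random tie-break)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                    # cap the episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = train()
# The learned policy: the best action in each non-goal cell.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)  # [1, 1, 1, 1] (always move right)
```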



##### Batch Learning

A system that cannot learn incrementally. In other words, it must be trained using all of the available data. This generally takes a lot of time and computing resources, so it is typically done *offline*. \
\
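Today, it is fairly easy to automate this: a machine learning pipeline can periodically retrain a new version of the system from scratch on the updated, full dataset and swap it in for the old one.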


##### Online Learning

Systems that learn incrementally by being fed data sequentially, either individually or in small groups called *mini-batches*. \
\
These systems are great for programs that receive a continuous flow of data (e.g., stock prices) and need to adapt to change rapidly or autonomously.
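
Online learning can be sketched with scikit-learn's `SGDRegressor.partial_fit`, which updates the model one mini-batch at a time. The data stream here is simulated from the hypothetical rule `y = 3x + 2` plus noise.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)
model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

# Simulated stream: 200 mini-batches of 10 instances each from y = 3x + 2 (+ noise).
for _ in range(200):
    X_batch = rng.uniform(-1, 1, size=(10, 1))
    y_batch = 3 * X_batch[:, 0] + 2 + rng.normal(0, 0.1, size=10)
    model.partial_fit(X_batch, y_batch)  # update the model with this mini-batch only

print(model.coef_, model.intercept_)  # should approach [3.] and [2.]
```

Each call to `partial_fit` sees only the current mini-batch, so the batch can be discarded afterward.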


##### Notes:

An important parameter in online learning is the *learning rate*, which controls how fast the system adapts to changing data. If you set the learning rate high, your system will rapidly adapt to new data, but it will also tend to quickly forget the old data. Conversely, if you set it low, it will learn more slowly, but it will be less sensitive to noise in the new data and to *outliers*.
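
The trade-off can be illustrated with a single running estimate updated by `estimate += learning_rate * (observation - estimate)`, an exponential moving average with the same form as a gradient step. This is a toy illustration, not a full learning algorithm; the stream values are hypothetical, with an outlier at the end.

```python
def online_update(estimate, observation, learning_rate):
    """Move the estimate a fraction `learning_rate` of the way toward the new observation."""
    return estimate + learning_rate * (observation - estimate)

def run_stream(stream, learning_rate, start=10.0):
    """Feed the stream one value at a time and return the final estimate."""
    estimate = start
    for x in stream:
        estimate = online_update(estimate, x, learning_rate)
    return estimate

stream = [10.0, 10.0, 10.0, 100.0]  # hypothetical stream; the last value is an outlier

fast = run_stream(stream, learning_rate=0.9)  # adapts quickly, but the outlier dominates
slow = run_stream(stream, learning_rate=0.1)  # adapts slowly, less sensitive to the outlier
print(round(fast, 2), round(slow, 2))  # 91.0 19.0
```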



##### Instance-Based Learning vs. Model-Based Learning

Another way to classify machine learning systems is by how they *generalize*, that is, how they make predictions about new data coming in as inputs. Having a good performance measure on the training data is all well and good, but the true goal is to perform well on new instances.


##### Instance-Based Learning

The system learns the examples by heart, then generalizes to new cases by using a similarity measure to compare them to the learned examples (or a subset of them).
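
A from-scratch sketch of instance-based learning: k-Nearest Neighbors with Euclidean distance as the similarity measure. The 2-D feature vectors and labels are hypothetical, and the "model" is simply the stored examples.

```python
import math
from collections import Counter

# Hypothetical stored examples: (feature vector, label).
training_examples = [
    ((1.0, 1.0), "ham"),
    ((1.5, 2.0), "ham"),
    ((5.0, 5.0), "spam"),
    ((6.0, 5.5), "spam"),
    ((5.5, 6.0), "spam"),
]

def knn_predict(x, examples, k=3):
    """Classify x by majority vote among the k most similar stored examples."""
    # Similarity measure: Euclidean distance (smaller distance = more similar).
    by_distance = sorted(examples, key=lambda ex: math.dist(x, ex[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_predict((5.2, 5.1), training_examples))  # spam
print(knn_predict((1.2, 1.4), training_examples))  # ham
```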


##### Model-based Learning

Another way to generalize from a set of examples is to build a model of these examples and then use that model to make *predictions*.
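
A minimal model-based sketch with NumPy's `np.polyfit`, assuming hypothetical GDP-per-capita and life-satisfaction values generated from the rule `life_sat = 4 + 0.05 * gdp`. Once the two parameters are learned, predictions no longer need the training data.

```python
import numpy as np

# Hypothetical data: GDP per capita (thousands of $) and life satisfaction (0-10),
# generated from the rule life_sat = 4 + 0.05 * gdp so the fit can be checked.
gdp = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
life_sat = 4.0 + 0.05 * gdp

# Model selection: assume a linear model life_sat = theta0 + theta1 * gdp.
# np.polyfit returns the coefficients highest degree first: [theta1, theta0].
theta1, theta0 = np.polyfit(gdp, life_sat, deg=1)

def predict(gdp_value):
    """Only the two learned parameters are needed; the training data can be discarded."""
    return theta0 + theta1 * gdp_value

print(round(predict(35.0), 2))  # 5.75
```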


##### Notes:

To evaluate how well your model fits the data with a *performance measure*, you can either define a *utility function* (or *fitness function*) that measures how *good* your model is, or a *cost function* that measures how *bad* it is.


**Training** means finding the model parameters that optimize the chosen measure, that is, maximize the *utility function* or, more commonly for models like linear regression, minimize the *cost function*.
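
Training against a cost function can be sketched with batch gradient descent minimizing the mean squared error (MSE) of a linear model. The data comes from the hypothetical rule `y = 1 + 2x`, and the learning rate and iteration count are illustrative choices.

```python
import numpy as np

# Hypothetical training data from y = 1 + 2x (no noise), so the best parameters are known.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * X

def mse_cost(theta0, theta1):
    """Cost function: mean squared error of the linear model theta0 + theta1 * x."""
    errors = theta0 + theta1 * X - y
    return np.mean(errors ** 2)

theta0, theta1 = 0.0, 0.0
learning_rate = 0.05
for _ in range(2000):
    errors = theta0 + theta1 * X - y
    # Gradients of the MSE with respect to each parameter.
    grad0 = 2 * np.mean(errors)
    grad1 = 2 * np.mean(errors * X)
    # Step downhill: move each parameter against its gradient.
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

print(mse_cost(0.0, 0.0), mse_cost(theta0, theta1))  # cost before vs. after training
```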


## Training Example 1

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn.linear_model
```

```python
# OECD Better Life Index data, indexed by country; drop every column that contains missing values
oecd_df = pd.read_csv('./data/oecd_bli.csv', index_col='Country').dropna(axis='columns')
# GDP per capita, indexed by (Entity, Year)
gdp_per_captia_per_country = pd.read_csv('./data/gdp-per-capita-worldbank.csv', index_col=['Entity', 'Year'])
```

### OECD Happiness Data By Country

```python
oecd_df.head()
```

| Country | LOCATION | INDICATOR | Indicator | MEASURE | Measure | INEQUALITY | Inequality | Unit Code | Unit | PowerCode Code | PowerCode | Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Australia | AUS | JE_LMIS | Labour market insecurity | L | Value | TOT | Total | PC | Percentage | 0 | Units | 5.4 |
| Austria | AUT | JE_LMIS | Labour market insecurity | L | Value | TOT | Total | PC | Percentage | 0 | Units | 3.5 |
| Belgium | BEL | JE_LMIS | Labour market insecurity | L | Value | TOT | Total | PC | Percentage | 0 | Units | 3.7 |
| Canada | CAN | JE_LMIS | Labour market insecurity | L | Value | TOT | Total | PC | Percentage | 0 | Units | 6.0 |
| Czech Republic | CZE | JE_LMIS | Labour market insecurity | L | Value | TOT | Total | PC | Percentage | 0 | Units | 3.1 |


### GDP Per Capita

```python
gdp_per_captia_per_country.head(20)
```

| Entity | Year | Code | GDP per capita (int.-\$) (constant 2011 international \$) |
|---|---|---|---|
| Afghanistan | 2002 | AFG | 1063.635574 |
| Afghanistan | 2003 | AFG | 1099.194507 |
| Afghanistan | 2004 | AFG | 1062.24936 |
| Afghanistan | 2005 | AFG | 1136.123214 |
| Afghanistan | 2006 | AFG | 1161.124889 |
| Afghanistan | 2007 | AFG | 1284.775213 |
| Afghanistan | 2008 | AFG | 1298.143159 |
| Afghanistan | 2009 | AFG | 1531.173993 |
| Afghanistan | 2010 | AFG | 1614.255001 |
| Afghanistan | 2011 | AFG | 1660.739856 |


#### View 