# Quick journey into Unsupervised Learning

> Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don’t know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI.

-- _Yann LeCun_

![](https://miro.medium.com/max/1062/1*iRO5wpHndXLmiD_ljoY8zw.png)

#### Supervised vs Unsupervised ML Settings

In **Supervised Learning**, we have a dataset consisting of both input features and a desired output, such as in the spam / no-spam example.

The task is to construct a model (or program) that can predict the desired output for an unseen object, given its set of features.

![supervised examples](images/ml_supervised_example.png)

Supervised learning is further broken down into two categories, **classification** and **regression**.

In classification, the label is discrete (a.k.a. *categorical data*, often encoded as *integer values*), such as "spam" or "no spam".

In other words, it provides a clear-cut distinction between categories. 

In regression, the label is continuous, i.e. a *floating-point numerical output*.
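As a minimal sketch of the difference, the snippet below fits one classifier and one regressor and compares the kind of values they predict. The choice of datasets (`load_iris`, `load_diabetes`) and estimators is an assumption made purely for illustration; any labeled dataset would do.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: the target is one of a fixed set of categories (0, 1, 2).
X_cls, y_cls = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X_cls, y_cls)
cls_pred = clf.predict(X_cls[:5])

# Regression: the target is a continuous quantity.
X_reg, y_reg = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X_reg, y_reg)
reg_pred = reg.predict(X_reg[:5])

print("classification predictions:", cls_pred, cls_pred.dtype)
print("regression predictions:", reg_pred, reg_pred.dtype)
```

The classifier can only ever return one of the class ids it saw during training, while the regressor returns arbitrary real numbers.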

---

In **Unsupervised Learning** there is no desired output associated with the data.

Instead, we are interested in extracting some form of knowledge or model from the given data.

In a sense, you can think of unsupervised learning as a means of discovering labels from the data itself.

Unsupervised learning comprises tasks such as **dimensionality reduction**, **clustering**, and **feature extraction**. 

![unsupervised examples](images/ml_unsupervised_example.png)

###### Data Workflows

**Supervised ML workflow**

![supervised workflow](./images/supervised_workflow.svg)

**Unsupervised ML workflow**

<img src="./images/unsupervised_workflow.svg" class="maxw50" />

**These are not two separate worlds, though.**

##### Unsupervised learning to improve Supervised ML solutions

Recent successes in machine learning have been driven by the availability of **lots of data**, advances in computer hardware and cloud-based resources, and breakthroughs in machine learning algorithms. But these successes have mostly been in narrow AI problems such as image classification, computer vision, speech recognition, natural language processing, and machine translation.

###### Insufficient Labeled data

> I think AI is akin to building a rocket ship. You need a huge engine and a lot of fuel. If you have a large engine and a tiny amount of fuel, you won’t make it to orbit. If you have a tiny engine and a ton of fuel, you can’t even lift off. To build a rocket you need a huge engine and a lot of fuel.

-- _Andrew Ng_

If machine learning were a rocket ship, data would be the fuel: without lots and lots of data, the rocket ship cannot fly. But not all data is created equal.
To use supervised algorithms, we need lots of labeled data, which is hard and costly to generate.

With unsupervised learning, we can group unlabeled examples and use those groups to label them automatically (**clustering**), or we can **generate new data** (e.g. Variational Autoencoders, Generative Adversarial Networks).
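The clustering route can be sketched as follows. This is an illustrative example only: the synthetic data from `make_blobs`, the choice of `KMeans`, and the number of clusters are all assumptions, and the cluster ids it produces are pseudo-labels that still need a human to give them meaning.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled data: the true labels returned by make_blobs are discarded.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Group the points into 3 clusters (the number of clusters is an assumption here).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
pseudo_labels = kmeans.fit_predict(X)

# Each example now carries a cluster id that can serve as a candidate label.
print(pseudo_labels[:10])
```

In practice one would inspect a few examples per cluster, name the clusters, and only then use them as training labels for a supervised model.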

###### Curse of Dimensionality

Even with the advances in computational power, big data is hard for machine learning algorithms to manage. In general, adding more instances is not too problematic because we can parallelize operations. 

However, the **more features** we have, the **more difficult** training becomes.

In a very high-dimensional space, supervised algorithms need to learn how to separate points and build a function approximation to make good decisions. 

When the features are very numerous, this search becomes very expensive in both time and compute. In some cases, it may be impossible to find a good solution fast enough.

This problem is known as the **curse of dimensionality**, and unsupervised learning is well suited to help manage this. 

With **dimensionality reduction**, we can find the most salient features in the original feature set, reduce the number of dimensions to a more manageable number while losing very little important information in the process, and then apply supervised algorithms to more efficiently perform the search for a good function approximation. 
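A minimal sketch of this idea, assuming PCA as the dimensionality reduction technique and the bundled digits dataset (64 pixel features) as the example data. The 95% explained-variance threshold and the logistic regression classifier are illustrative choices, not a recommendation.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 8x8 digit images flattened into 64 features.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Unsupervised step (PCA) followed by a supervised step (classifier).
# n_components=0.95 keeps as many components as needed to explain
# 95% of the variance in the training data.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    LogisticRegression(max_iter=2000),
)
model.fit(X_train, y_train)

n_kept = model.named_steps["pca"].n_components_
accuracy = model.score(X_test, y_test)
print(f"kept {n_kept} of {X.shape[1]} dimensions, test accuracy {accuracy:.3f}")
```

Note that PCA never sees the labels: it is fit on the features alone, and the classifier then works in the reduced space.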

###### Feature Engineering

Feature engineering is one of the most vital tasks data scientists perform. 

Without the **right** features, the machine learning algorithm will not be able to separate points in space well enough to make good decisions on *never-before-seen* examples. 

In ML jargon, it is very common to repeat the mantra: _Garbage in, garbage out_.

However, feature engineering is typically very labor-intensive; it requires humans to creatively hand-engineer the right types of features. 

Instead, we can use **representation learning** from unsupervised learning algorithms to automatically learn the right types of feature representations to help solve the task at hand.
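One simple way to illustrate learned features, offered here as an assumption rather than the canonical approach, is to let `KMeans` learn a set of prototypes and then describe each example by its distances to them. Neural approaches such as autoencoders are the more common form of representation learning; the sketch below only swaps in a lightweight stand-in that runs with scikit-learn alone.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Unsupervised step: learn 50 prototypes from the raw pixels (no labels used).
# KMeans.transform then maps each image to its distances to those prototypes.
# The value 50 is an arbitrary illustrative choice.
model = make_pipeline(
    KMeans(n_clusters=50, n_init=10, random_state=0),
    LogisticRegression(max_iter=5000),
)
model.fit(X_train, y_train)

# The learned representation: one column per prototype.
learned = model.named_steps["kmeans"].transform(X_test)
accuracy = model.score(X_test, y_test)
print(learned.shape, f"test accuracy {accuracy:.3f}")
```

The classifier never sees the original pixels, only the features produced by the unsupervised step, so no hand-engineering is involved.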

---

## Scikit-learn Cheatsheet

_or... a possible rule-of-thumb map to guide you through understanding the `Machine learning` $\mapsto$ `Data` relationship_

<img src="./images/scikit-learn-cheatsheet.png" class="maxw100" />