### **Introduction to Unsupervised Learning**

- **"After supervised learning, the most widely used form of machine learning is unsupervised learning."**  
  Supervised learning, where the algorithm learns from labeled data, is the most widely used approach. Unsupervised learning comes second.

- **"Let's take a look at what that means."**  
  Now, we'll explore what "unsupervised learning" is and how it works.

---

### **Comparing Supervised and Unsupervised Learning**

- **"When we're looking at supervised learning in the last video, recall it looks something like this in the case of a classification problem."**  
  In supervised learning, we have input data (like tumor size and patient age) and output labels (like whether a tumor is benign or malignant).

- **"Each example was associated with an output label y such as benign or malignant, designated by the circles and crosses."**  
  In the plot, every example carries a known answer, or "label": a cross (X) for malignant and a circle (O) for benign.

- **"In unsupervised learning, we're given data that isn't associated with any output labels y."**  
  In unsupervised learning, we only have input data and **no labels** telling us the "right" answer.
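
The difference is easiest to see in the shape of the data itself. The snippet below is a minimal sketch with made-up patient records (the ages, tumor sizes, and labels are illustrative assumptions, not values from the lecture):

```python
# Hypothetical patient records: (age in years, tumor size in cm).
# Supervised learning: every input x is paired with an output label y.
labeled_data = [
    ((35, 1.5), "benign"),
    ((62, 4.5), "malignant"),
]

# Unsupervised learning: the same kind of inputs x, but no label y at all.
unlabeled_data = [x for x, _y in labeled_data]

print(unlabeled_data)
```

A supervised algorithm would learn from the `(x, y)` pairs, while an unsupervised one only ever sees the entries of `unlabeled_data`.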

---

### **An Example of Unsupervised Learning**

- **"Say you're given data on patients and their tumor size and the patient's age, but not whether the tumor was benign or malignant."**  
  Imagine you have data like:

  - Patient A: Age 40, Tumor size 2 cm
  - Patient B: Age 60, Tumor size 4 cm

  You don’t know whether either tumor is benign (harmless) or malignant (dangerous).

- **"The dataset looks like this on the right."**  
  This refers to the chart or graph where the data points (patients) are plotted based on their age and tumor size.

  ![Example Image](Unsupervised.png)

- **"We're not asked to diagnose whether the tumor is benign or malignant, because we're not given any labels y in the dataset."**  
  Unlike supervised learning, we don’t have answers (labels) like "benign" or "malignant" to guide us.

- **"Instead, our job is to find some structure or some pattern or just find something interesting in the data."**  
  The algorithm is not told what to look for. It has to find groupings or other structure using the inputs alone.

- **"This is unsupervised learning."**  
  This is what unsupervised learning is all about: discovering patterns or relationships in unlabeled data.

---

### **Clustering Example**

- **"An unsupervised learning algorithm might decide that the data can be assigned to two different groups or two different clusters."**  
  The algorithm might notice that the data naturally forms two groups (clusters) based on similarities.

- **"And so it might decide that there's one cluster or group over here, and there's another cluster or group over here."**  
  For example:

  - Cluster 1: Patients with small tumors and younger ages.
  - Cluster 2: Patients with large tumors and older ages.

- **"This is a particular type of unsupervised learning called a clustering algorithm."**  
  Clustering is a type of unsupervised learning where data is grouped into similar clusters.
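
The lecture does not say which clustering algorithm is used. As one possible illustration, here is a minimal sketch of k-means, a common clustering algorithm, run on made-up patient data. The data points, the `kmeans` helper, and the choice of starting centers are all assumptions made for this example.

```python
import math

# Made-up (age, tumor size in cm) points; no labels are attached.
patients = [
    (25, 1.0), (30, 1.5), (35, 1.2), (40, 2.0),
    (60, 4.0), (65, 4.5), (70, 5.0), (58, 3.8),
]

def kmeans(points, k, iterations=10):
    """Tiny k-means sketch: returns a cluster index for each point."""
    # Assumption: use the first k points as the starting centers.
    centers = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign each point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Step 2: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments

clusters = kmeans(patients, k=2)
print(clusters)
```

On this toy data the younger patients with smaller tumors end up in one cluster and the older patients with larger tumors in the other. Note that the features are not rescaled here, so age dominates the distance; real use would usually normalize the features first.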

---

### **Clustering in Real Life: Google News Example**

![Example Image](GoogleNews.png)

- **"For example, clustering is used in Google News."**  
  Clustering is used by Google News to group related news stories together.

- **"What Google News does is every day it looks at hundreds of thousands of news articles on the internet and groups related stories together."**  
  Each day, Google’s algorithm scans hundreds of thousands of news articles and identifies which ones cover the same story.

- **"For example, here is a sample from Google News, where the headline of the top article is 'Giant panda gives birth to rare twin cubs at Japan's oldest zoo.'"**  
  Let’s say this is one article. The algorithm looks at all articles and finds others with similar words or topics.

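- **"The algorithm notices repeated words like 'panda,' 'twins,' and 'zoo' and groups articles with these common words together."**  
  Shared vocabulary is the signal: articles that use many of the same words are probably covering the same story.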

- **"The clustering algorithm is finding articles that mention similar words and grouping them into clusters."**  
  The algorithm groups related articles into one cluster based on shared words.

- **"There isn’t an employee at Google News who’s telling the algorithm to find articles with these specific words."**  
  No one manually tells the algorithm what to group. The algorithm figures it out on its own. This is why it’s "unsupervised."
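
To make the idea of grouping by shared words concrete, here is a minimal sketch. It is not how Google News actually works: it uses a simple greedy grouping based on word overlap (Jaccard similarity), and the headlines other than the panda one, the `STOP_WORDS` list, and the `threshold` value are all assumptions made for illustration.

```python
# Only the first headline comes from the lecture; the rest are invented.
headlines = [
    "Giant panda gives birth to rare twin cubs at Japan's oldest zoo",
    "Zoo celebrates as panda welcomes twin cubs",
    "Twin panda cubs born at zoo in Japan",
    "Stock markets rally after interest rate decision",
    "Interest rate decision lifts stock markets",
]

# A few very common words that say little about the topic.
STOP_WORDS = {"at", "to", "as", "in", "after", "the", "a"}

def words(headline):
    """Lowercased set of words in a headline, minus the stop words."""
    return set(headline.lower().split()) - STOP_WORDS

def similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    return len(a & b) / len(a | b)

def group(items, threshold=0.2):
    """Greedy grouping: join the first group whose first member is similar."""
    groups = []  # each group is a list of indices into items
    for i, item in enumerate(items):
        for g in groups:
            if similarity(words(item), words(items[g[0]])) >= threshold:
                g.append(i)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([i])
    return groups

for g in group(headlines):
    print([headlines[i] for i in g])
```

Nothing in the code mentions pandas or stock markets. The groups emerge only from which headlines share words, which is the sense in which the process is unsupervised.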

---

### **Clustering Genetic Data Example**

- **"This image shows a picture of DNA microarray data."**  
  A DNA microarray measures how strongly each gene is expressed in each person. The results can be laid out like a spreadsheet covering many individuals.

- **"Each column represents one person, and each row represents a gene."**  
  Think of it like this:

  - Each column is a person (Person A, Person B, etc.).
  - Each row is a gene (e.g., eye color gene, height gene).

- **"What you can do is run a clustering algorithm to group individuals into categories or types of people."**  
  The algorithm looks at patterns in the data and groups similar people together based on their genetic traits.
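
The row and column layout matters when preparing such data. The sketch below uses made-up expression values (an assumption for illustration) and shows the transpose step needed to compare people rather than genes, followed by a distance comparison of the kind many clustering algorithms build on. It stops short of a full clustering algorithm.

```python
import math

# Made-up expression levels: each row is a gene, each column is a person.
people = ["A", "B", "C", "D"]
expression = [
    [0.9, 0.8, 0.1, 0.2],  # gene 1
    [0.1, 0.2, 0.9, 0.8],  # gene 2
    [0.5, 0.6, 0.4, 0.5],  # gene 3
]

# Grouping people means comparing columns, so transpose first:
# profiles[i] holds every gene measurement for person i.
profiles = list(zip(*expression))

def nearest_person(i):
    """Index of the person whose gene profile is closest to person i."""
    others = [j for j in range(len(people)) if j != i]
    return min(others, key=lambda j: math.dist(profiles[i], profiles[j]))

for i, name in enumerate(people):
    print(name, "is most similar to", people[nearest_person(i)])
```

In this toy matrix, A and B have similar profiles, as do C and D, so a clustering algorithm would be expected to place each pair in the same group.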

---

### **Clustering Customers Example**

- **"Many companies have huge databases of customer information."**  
  Companies collect a lot of data about their customers, like age, shopping habits, or spending patterns.

- **"Can you automatically group your customers into different market segments?"**  
  Businesses use clustering to discover customer groups, such as young shoppers or budget-conscious buyers, without defining those groups in advance.

---

### **Summary**

- **"To summarize, a clustering algorithm, which is a type of unsupervised learning algorithm, takes data without labels and tries to automatically group them into clusters."**  
  Clustering groups unlabeled data into clusters based on similarities.

- **"So maybe the next time you see or think of a panda, maybe you think of clustering as well."**  
  A fun way to remember clustering: the panda stories were grouped together because they shared the same words, with no labels involved.
