# Introduction to Machine Learning

## Fundamentals of Machine Learning

* **Learn from Data**. Computers can learn from data without being explicitly programmed for every specific task (<u>rule-based</u> &rarr; <u>data-driven</u>). 

* **Identify Patterns**. Algorithms can automatically find patterns, relationships, and insights within data.

* **Generalization**. A trained model should make accurate predictions on new, unseen data, not only on the data it was trained on.

* **Algorithms and Models**. Algorithms (sets of instructions) are used to build models (representations of the learned patterns).
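
The shift from rule-based to data-driven can be sketched with a toy example. The data, the hand-picked rule, and the helper names (`fit_threshold`, `accuracy`) below are all made up for illustration; the point is only that an algorithm (`fit_threshold`) produces a model (a single learned threshold), which is then checked on held-out data to see whether it generalizes.

```python
# Toy data (made up for illustration): (exclamation_marks, is_spam)
train = [(0, 0), (1, 0), (2, 0), (5, 1), (6, 1), (8, 1)]
test = [(1, 0), (7, 1), (3, 0), (9, 1)]  # held out, never used for fitting


def rule_based(x):
    """Rule-based: a person picks the threshold by hand."""
    return int(x > 10)


def accuracy(predict, data):
    """Fraction of examples whose prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)


def fit_threshold(data):
    """Algorithm: choose the threshold with the best training accuracy."""
    xs = sorted({x for x, _ in data})
    # Candidate thresholds are the midpoints between neighboring values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x > t), data))


threshold = fit_threshold(train)  # the learned "model" is just this number


def learned(x):
    """Data-driven: the threshold comes from the training data."""
    return int(x > threshold)


print("learned threshold:", threshold)
print("rule-based test accuracy:", accuracy(rule_based, test))
print("data-driven test accuracy:", accuracy(learned, test))
```

On this toy data the learned threshold is 3.5, which classifies all four held-out examples correctly, while the hand-picked rule gets only half of them right.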

## Learning Paradigms and Machine Learning Tasks

Machine learning tasks can be categorized according to the learning paradigm and the specific goal they pursue:

### Supervised Learning

* Each input example is paired with a known output (its label or target variable).
* The goal is to learn a *mapping function* that can predict the output for new, unseen input data.

Common supervised tasks are classification and regression.

### Classification Task
* **Goal**: Assign data points to <u>predefined</u> categories or classes.
* The output variable is categorical.
    * **Binary Classification**: Predicting one of two classes.
    * **Multi-class Classification**: Predicting one of more than two classes.
* Examples:
    * <u>Image Classification</u>: Identifying objects in an image (e.g., cat, dog, car).   
    * <u>Spam Detection</u>: Classifying emails as spam or not spam.   
    * <u>Medical Diagnosis</u>: Determining if a patient has a certain disease based on symptoms and test results.   
    * <u>Fraud Detection</u>: Identifying fraudulent transactions based on user behavior and transaction details.   
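
As a minimal sketch of binary classification, the block below trains a perceptron, one of the simplest linear classifiers, on a tiny spam dataset. The features, labels, and function names are assumptions made for illustration; the notes do not prescribe a particular algorithm.

```python
# Toy data (made up): (number_of_links, number_of_exclamation_marks)
# Label +1 = spam, -1 = not spam.
X = [(3, 4), (4, 3), (5, 5), (4, 5), (0, 1), (1, 0), (1, 1), (0, 0)]
y = [1, 1, 1, 1, -1, -1, -1, -1]


def score(w, b, x):
    """Linear score: weighted sum of the features plus a bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Map the score to one of the two classes."""
    return 1 if score(w, b, x) > 0 else -1


def train_perceptron(X, y, max_epochs=1000):
    """Perceptron rule: nudge the weights toward each misclassified example."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in zip(X, y):
            if target * score(w, b, x) <= 0:  # wrong side of the boundary
                w = [wi + target * xi for wi, xi in zip(w, x)]
                b += target
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, b


w, b = train_perceptron(X, y)
print("weights:", w, "bias:", b)
print("new email (5 links, 4 marks):", predict(w, b, (5, 4)))
print("new email (0 links, 2 marks):", predict(w, b, (0, 2)))
```

The perceptron only converges when the two classes are linearly separable, as they are in this toy data. Nothing in the code depends on the inputs being emails: any task with two numeric features and two classes could reuse it unchanged.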

A model designed for one type of task can be reused <u>regardless of what the inputs and outputs represent</u>.

* For example, a model that fits binary text classification can be applied to:

    * Sentiment Analysis: Classifying text (e.g., reviews, tweets) as positive or negative. Adding a third class such as neutral turns this into a multi-class problem.


The main learning paradigms, and some common tasks within each, are summarized below:

* Supervised Learning: In supervised learning, the algorithm learns from labeled data, meaning each data point is associated with a known output or target variable. The goal is to learn a mapping function that can predict the output for new, unseen input data.   

   * Classification: The goal is to assign data points to predefined categories or classes. The output variable is discrete.
      * Binary Classification: Predicting one of two classes (e.g., spam/not spam, cat/dog).   
      * Multi-class Classification: Predicting one of more than two classes (e.g., identifying different types of flowers, classifying news articles into topics).   
  
   * Regression: The goal is to predict a continuous numerical value. The output variable is continuous.
      * Linear Regression: Predicting a value based on a linear relationship with the input features (e.g., predicting house prices based on size).
      * Polynomial Regression: Predicting a value based on a polynomial relationship with the input features.   
      * Time Series Forecasting: Predicting future values in a sequence based on past values (e.g., predicting stock prices, weather forecasting).   
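
The house-price example for linear regression can be sketched with ordinary least squares on a single feature. The sizes and prices below are invented and deliberately noise-free so the fitted line is easy to check by hand; `fit_line` is a hypothetical helper name, not a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: y is approximately slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y divided by the variance of x gives the slope.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: size in square meters, price in thousands.
sizes = [50, 80, 100, 120]
prices = [110, 170, 210, 250]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 150 + intercept  # continuous output for an unseen size

print("slope:", slope, "intercept:", intercept)
print("predicted price for 150 square meters:", predicted)
```

Unlike a classifier, the model returns a number on a continuous scale. Here the data lie exactly on the line price = 2 × size + 10, so the prediction for 150 square meters is 310.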
  
* Unsupervised Learning: In unsupervised learning, the algorithm learns from unlabeled data, without any explicit output or target variable. The goal is to discover hidden patterns, structures, or relationships in the data.   

   * Clustering: Grouping similar data points together based on their features without prior knowledge of the groups (e.g., customer segmentation, document analysis).
      * K-Means Clustering: Partitioning data into k clusters based on distance to centroids.   
      * Hierarchical Clustering: Creating a tree-like structure of clusters.   
      * Density-Based Clustering: Identifying clusters based on the density of data points.   
  
   * Dimensionality Reduction: Reducing the number of features in a dataset while preserving its essential information (e.g., data visualization, feature extraction).
      * Principal Component Analysis (PCA): Finding the principal components that capture the most variance in the data.   
      * t-distributed Stochastic Neighbor Embedding (t-SNE): Embedding high-dimensional data in two or three dimensions for visualization, while keeping similar points close together.
  
   * Association Rule Mining: Discovering interesting relationships or associations between variables in large datasets (e.g., market basket analysis).
      * Apriori Algorithm: Finding frequent itemsets in transactional data.   
      * Eclat Algorithm: Finding frequent itemsets with a depth-first search over a vertical layout that maps each item to the transactions containing it.
  
   * Anomaly Detection: Identifying data points that deviate significantly from the normal behavior or patterns in the data (e.g., fraud detection, fault detection).   
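
To make the clustering idea concrete, here is a minimal k-means sketch in plain Python. The points are invented, and initializing the centroids with the first `k` points is a simplification chosen so the run is deterministic; practical implementations usually use a more careful initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iters=100):
    """Alternate between assigning points to centroids and moving centroids."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Made-up, unlabeled data: two visually separate groups.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, k=2)

print("cluster labels:", labels)
print("centroids:", centroids)
```

No labels are supplied at any point: the grouping `[0, 0, 0, 1, 1, 1]` emerges from the distances alone, even though both initial centroids start inside the first group.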

* Reinforcement Learning: In reinforcement learning, an agent learns to interact with an environment by taking actions and receiving rewards or penalties. The goal is for the agent to learn an optimal policy (a mapping from states to actions) that maximizes the cumulative reward over time.   

   * Control Tasks: Learning to control a system or agent to achieve a specific goal (e.g., robotics, autonomous driving).
   * Game Playing: Training agents to play games, with or without opponents (e.g., AlphaGo, Atari games).
   * Recommendation Systems: Optimizing recommendations based on user feedback and rewards.   
   * Resource Management: Learning to allocate resources efficiently.   
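
The agent, environment, reward, and policy described above can be sketched with tabular Q-learning on a tiny corridor. Everything here is an assumption made for illustration: the five-cell environment, the reward of 1 at the goal, the hyperparameters, and the choice to explore purely at random (Q-learning is off-policy, so it can still learn the greedy policy from random behavior).

```python
import random

N_STATES = 5  # states 0..4 on a line; state 4 is the goal
ACTIONS = ["left", "right"]


def step(state, action):
    """Hypothetical environment: move one cell; reward 1 on reaching the goal."""
    next_state = max(state - 1, 0) if action == "left" else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    """Estimate the value of each (state, action) pair from experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS))  # explore uniformly at random
            next_state, reward, done = step(state, ACTIONS[a])
            # Target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# The policy maps each non-goal state to its highest-valued action.
policy = [
    ACTIONS[max(range(len(ACTIONS)), key=lambda a: q[s][a])]
    for s in range(N_STATES - 1)
]
print("greedy policy for states 0-3:", policy)
```

The learned policy is to move right in every state, which is the shortest path to the reward. The discount factor `gamma` is what makes closer rewards worth more, so "left" always ends up with a lower value than "right".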

* Other Types of Machine Learning Tasks (Sometimes Considered Subcategories or Hybrid Approaches):

   * Semi-Supervised Learning: Learning from a combination of labeled and unlabeled data. This is useful when labeling data is expensive or time-consuming.   
   * Self-Supervised Learning: Learning from unlabeled data where the labels are generated from the data itself through a pretext task (e.g., predicting a missing part of an image). The learned representations can then be used for downstream supervised tasks.   
   * Machine Translation: Translating text from one language to another. This can be approached with sequence-to-sequence models in supervised learning.   
   * Transcription: Converting relatively unstructured data such as audio into discrete text (e.g., speech recognition).
   * Synthesis and Sampling: Generating new data samples that are similar to the training data (e.g., generating images, text, or music).

The choice of machine learning task depends heavily on the nature of the data, the problem you are trying to solve, and the desired outcome. Understanding these different types of tasks is fundamental to applying machine learning effectively.