# <font color="#418FDE" size="6.5" uppercase>**ML Problem Types**</font>

>Last update: 20260131.
    
By the end of this lecture, you will be able to:
- Describe supervised, unsupervised, and reinforcement learning in intuitive terms. 
- Match simple problem descriptions to the appropriate machine learning category. 
- Identify the type of data and feedback signal required for each learning category. 


## **1. Supervised Learning Basics**

### **1.1. Inputs and Labels**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_01_01.jpg?v=1769916633" width="250">



>* Model learns from paired inputs and answers
>* Labels guide learning like a teacher’s corrections

>* Inputs and labels can take many forms
>* Models learn patterns from many input–label examples

>* Messy inputs or wrong labels confuse models
>* Carefully chosen features and accurate labels boost performance
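The idea of learning from input–label pairs can be sketched in a few lines. This is a minimal illustration that assumes a made-up fruit dataset (weight in grams, a 0–10 smoothness score) and uses a 1-nearest-neighbor rule as the "model". The data, labels, and helper names are hypothetical.

```python
# Hypothetical labeled examples: (weight_grams, smoothness_0_to_10) -> label.
# Every input is paired with its correct answer, which is what makes this supervised.
training_data = [
    ((150, 8), "apple"),
    ((170, 9), "apple"),
    ((130, 3), "orange"),
    ((140, 2), "orange"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_nearest_neighbor(features):
    """Return the label of the closest training input (1-nearest-neighbor rule)."""
    closest_pair = min(training_data, key=lambda pair: squared_distance(pair[0], features))
    return closest_pair[1]

# New, unlabeled fruits: the model answers using the labeled examples it has seen.
print(predict_nearest_neighbor((160, 7)))
print(predict_nearest_neighbor((135, 4)))
```

Note that weight (in grams) varies far more than the smoothness score, so it dominates the distance here; this is one concrete way in which feature choice and scaling affect what a model learns.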



### **1.2. Classification and Regression**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_01_02.jpg?v=1769916644" width="250">



>* Classification predicts discrete category labels from examples
>* Regression predicts continuous numeric values from features

>* Classification picks one option from a fixed set, e.g., spam or not spam
>* Regression outputs a number on a continuous scale, e.g., a house price

>* Same data can support both prediction types
>* Choosing type guides data prep and evaluation
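To see how the same rows can support both prediction types, here is a small sketch on hypothetical house data. Regression predicts the price itself with an ordinary least-squares line; classification predicts a category derived from the same rows using a nearest-centroid rule. The data, the 300 cutoff for "expensive", and the function names are assumptions made for illustration.

```python
# Hypothetical houses: floor area (m^2) and sale price (thousands).
sizes = [50, 70, 90, 110, 130]
prices = [150, 210, 270, 330, 390]

# Regression target: the price itself (a continuous number).
# Fit price = slope * size + intercept with ordinary least squares.
n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices)) / sum(
    (x - mean_x) ** 2 for x in sizes
)
intercept = mean_y - slope * mean_x

def predict_price(size):
    return slope * size + intercept

# Classification target: a category derived from the same rows
# (assumed cutoff: "expensive" if price > 300, chosen only for illustration).
labels = ["expensive" if p > 300 else "affordable" for p in prices]

# Nearest-centroid classifier: the mean size of each class acts as its prototype.
centroids = {}
for label in sorted(set(labels)):
    members = [s for s, l in zip(sizes, labels) if l == label]
    centroids[label] = sum(members) / len(members)

def predict_category(size):
    return min(centroids, key=lambda label: abs(size - centroids[label]))

print("Regression (price):", predict_price(80))
print("Classification (category):", predict_category(80))
```

The choice of target changes the evaluation too: a regression prediction is judged by how far off the number is, while a classification prediction is simply right or wrong.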



### **1.3. Loan Approval Prediction**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_01_03.jpg?v=1769916654" width="250">



>* Bank uses past applicants’ details and outcomes
>* Model learns from feature–label pairs (applicant details, past outcome)

>* Model repeatedly adjusts itself using labeled examples
>* Learns numeric patterns to predict loan outcomes

>* Uses labeled past data to learn decisions
>* Applies learned patterns to score applications it has never seen
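The loan example can be sketched as a tiny logistic regression trained by gradient descent. The source does not say which model a bank would use, so logistic regression is an assumed, commonly used choice here; the applicants, the two features (income in $10k and debt-to-income ratio), and the "repaid = 1" label are all made up.

```python
import math
import statistics

# Hypothetical past applicants (made-up data):
# (income in $10k, debt-to-income ratio) -> 1 if the loan was repaid, 0 if it defaulted.
applicants = [
    ((8.0, 0.2), 1),
    ((6.5, 0.3), 1),
    ((7.0, 0.1), 1),
    ((2.5, 0.7), 0),
    ((3.0, 0.6), 0),
    ((2.0, 0.8), 0),
]

# Standardize each feature (z-score) so gradient descent is well behaved.
columns = list(zip(*(features for features, _ in applicants)))
means = [statistics.mean(col) for col in columns]
stds = [statistics.pstdev(col) for col in columns]

def standardize(features):
    return [(x - m) / s for x, m, s in zip(features, means, stds)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression trained with batch gradient descent on the log-loss.
weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.5
examples = [(standardize(f), label) for f, label in applicants]

for _ in range(1000):
    grad_w = [0.0, 0.0]
    grad_b = 0.0
    for x, label in examples:
        # Prediction error on one labeled example drives the weight update.
        error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - label
        grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
        grad_b += error
    weights = [w - learning_rate * g / len(examples) for w, g in zip(weights, grad_w)]
    bias -= learning_rate * grad_b / len(examples)

def repay_probability(income, debt_ratio):
    """Estimated probability that a new applicant repays (under this toy model)."""
    x = standardize((income, debt_ratio))
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

print(round(repay_probability(7.5, 0.15), 3))  # profile similar to past repaid loans
print(round(repay_probability(2.2, 0.75), 3))  # profile similar to past defaults
```

A real system would use far more data, hold out examples to measure accuracy, and check the model for fairness; this sketch only shows the supervised loop of "labeled past data in, predictions for new cases out".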



## **2. Finding Hidden Groups**

### **2.1. Input Only Scenarios**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_02_01.jpg?v=1769916669" width="250">



>* Only inputs, no known correct labels given
>* Model finds hidden patterns; this is unsupervised learning

>* Supervised uses labeled examples to predict answers
>* Unsupervised finds patterns from unlabeled input data

>* Reinforcement learning involves actions, rewards, and sequences
>* Unsupervised input scenarios use static data without labels
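Since the three categories differ mainly in the feedback signal the data carries, a rough rule of thumb fits in a small function. This is a simplification: it assumes each dataset is a dictionary, and the key names (`labels`, `actions`, `rewards`) and the example datasets are hypothetical rather than any standard format.

```python
# Hypothetical helper: choose a learning category from the feedback signal a
# dataset provides. The dictionary keys used here are illustrative only.
def learning_category(dataset):
    if "actions" in dataset and "rewards" in dataset:
        return "reinforcement"  # feedback comes from acting in an environment over time
    if "labels" in dataset:
        return "supervised"     # every input is paired with a correct answer
    return "unsupervised"       # inputs only: look for hidden structure

emails = {"inputs": ["win a prize", "meeting at 3"], "labels": ["spam", "not spam"]}
shoppers = {"inputs": [[120.0, 4], [15.0, 1], [118.0, 5]]}
robot_log = {"states": [0, 1, 2], "actions": ["right", "right"], "rewards": [-0.1, 1.0]}

for name, data in [("emails", emails), ("shoppers", shoppers), ("robot_log", robot_log)]:
    print(name, "->", learning_category(data))
```

Real problems are not always this clean (for example, only some inputs may be labeled), so treat this as a checklist for reading a problem description, not a strict rule.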



### **2.2. Discovering Hidden Structure**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_02_02.jpg?v=1769916682" width="250">



>* Many unlabeled examples; we suspect hidden patterns
>* Algorithms group raw data, revealing underlying structure

>* Exploratory questions signal hidden-structure style problems
>* Algorithms group unlabeled data into meaningful clusters

>* Predicting known labels is supervised learning, not structure discovery
>* Grouping unlabeled data means discovering hidden structure
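One widely used algorithm for grouping unlabeled data is k-means (Lloyd's algorithm). The source does not name a specific algorithm, so k-means is an assumed example here; the 2-D points and the choice of `k=2` are made up for illustration.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points; returns labels and centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centers

# Unlabeled points that happen to form two loose groups (made-up data).
points = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (5.0, 5.1), (5.2, 4.8), (4.9, 5.3)]
labels, centers = kmeans(points, k=2)
print("cluster ids:", labels)
print("cluster centers:", centers)
```

The cluster ids are arbitrary numbers, not meaningful labels; deciding what each group represents is left to a person.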



### **2.3. Customer Grouping Example**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_02_03.jpg?v=1769916694" width="250">



>* Retailer uses rich customer data without labels
>* Algorithm finds unexpected customer groups for marketing

>* Clustering groups customers with similar shopping behavior
>* Humans interpret clusters and design targeted marketing actions

>* Unsupervised learning finds patterns in unlabeled data
>* Discovered clusters guide decisions across many domains
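A customer-grouping workflow can be sketched as: scale the features, group similar customers, then summarize each group in original units so a person can interpret it. This sketch uses single-linkage grouping with a distance cutoff (customers closer than the cutoff land in the same group, transitively). That technique, the customer records, and the `THRESHOLD` value are all assumptions for illustration.

```python
import statistics

# Hypothetical customers: (annual spend in $, store visits per month). No labels.
customers = {
    "c1": (2400.0, 8), "c2": (2600.0, 9), "c3": (2500.0, 7),
    "c4": (300.0, 1), "c5": (250.0, 2), "c6": (400.0, 1),
}

# Put both features on a comparable scale (z-scores) before measuring distance.
names = list(customers)
columns = list(zip(*customers.values()))
means = [statistics.mean(col) for col in columns]
stds = [statistics.pstdev(col) for col in columns]
scaled = {n: [(x - m) / s for x, m, s in zip(customers[n], means, stds)] for n in names}

# Single-linkage grouping: customers closer than THRESHOLD are joined, and groups
# merge transitively. A small union-find structure tracks the groups.
THRESHOLD = 1.0  # assumed cutoff in standardized units
parent = {n: n for n in names}

def find(n):
    while parent[n] != n:
        parent[n] = parent[parent[n]]  # path halving keeps lookups short
        n = parent[n]
    return n

for i, a in enumerate(names):
    for b in names[i + 1:]:
        distance = sum((x - y) ** 2 for x, y in zip(scaled[a], scaled[b])) ** 0.5
        if distance < THRESHOLD:
            parent[find(a)] = find(b)

groups = {}
for n in names:
    groups.setdefault(find(n), []).append(n)

# Profile each group in original units so a person can interpret and name it.
for members in groups.values():
    avg_spend = statistics.mean(customers[m][0] for m in members)
    avg_visits = statistics.mean(customers[m][1] for m in members)
    print(sorted(members), f"avg spend ${avg_spend:.0f}, avg visits {avg_visits:.1f}/month")
```

The printed profiles are where human judgment comes in: a marketer might call one group "frequent high spenders" and the other "occasional shoppers" and plan different campaigns for each.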



## **3. Reinforcement Learning Basics**

### **3.1. Agent Environment Interaction**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_03_01.jpg?v=1769916708" width="250">



>* Agent repeatedly observes state and chooses actions
>* Environment returns new state and reward over time

>* Agent gets time-linked states, actions, rewards
>* Feedback can be delayed and noisy; learning relies on ongoing interaction

>* Agent actions create a continuous feedback loop
>* Traffic light example shows learning from changing rewards
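The observe, act, reward loop can be sketched with a toy traffic-light environment. Everything here is hypothetical: the queue sizes, the rule that a green light clears up to 2 cars, one arrival per direction per step, and a reward equal to minus the number of waiting cars. No learning happens in this sketch; two hand-written policies are compared to show that different actions produce different rewards.

```python
class TrafficLightEnv:
    """Toy intersection (hypothetical): queue lengths for two directions."""

    def __init__(self):
        self.queues = [3, 3]  # cars waiting: [north-south, east-west]

    def state(self):
        return tuple(self.queues)

    def step(self, action):
        """action 0 = green for north-south, 1 = green for east-west."""
        self.queues[action] = max(self.queues[action] - 2, 0)  # green clears up to 2 cars
        self.queues = [q + 1 for q in self.queues]  # one new car arrives per direction
        reward = -sum(self.queues)  # fewer waiting cars -> higher (less negative) reward
        return self.state(), reward

def run_episode(policy, steps=4):
    env = TrafficLightEnv()
    state = env.state()
    total_reward = 0
    for _ in range(steps):
        action = policy(state)            # agent observes the state and acts
        state, reward = env.step(action)  # environment returns new state and reward
        total_reward += reward
    return total_reward

def always_north_south(state):
    return 0  # ignores the state entirely

def longer_queue_first(state):
    return 0 if state[0] >= state[1] else 1

print("always north-south:", run_episode(always_north_south))
print("longer queue first:", run_episode(longer_queue_first))
```

A learning agent would use these reward differences to shift its behavior toward the better policy over many episodes.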



### **3.2. Rewards and Actions**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_03_02.jpg?v=1769916719" width="250">



>* Agent acts, environment returns simple numeric rewards
>* Agent learns to favor actions with higher rewards

>* Actions can be discrete (move left) or continuous (steering angle)
>* Agent learns which actions pay off through rewards

>* Agent generates its own data; rewards are often sparse and delayed
>* Learns policies maximizing long-term total reward
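"Long-term total reward" is commonly formalized as a discounted return, G = r0 + γ·r1 + γ²·r2 + …, where a discount factor γ between 0 and 1 makes later rewards count less. The source does not specify discounting, so `gamma=0.9` is an assumption. The reward sequences below reuse the grid-world scheme of -0.1 per ordinary move and +1.0 at the goal.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards where a reward k steps in the future is weighted by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):  # fold from the last reward backwards
        total = reward + gamma * total
    return total

# Hypothetical episodes: reaching the goal in 3 moves vs. in 6 moves.
quick = [-0.1, -0.1, 1.0]
slow = [-0.1] * 5 + [1.0]

print("quick:", round(discounted_return(quick), 5))
print("slow: ", round(discounted_return(slow), 5))
```

With discounting, the quicker path scores noticeably higher, so an agent maximizing this quantity is pushed toward reaching the goal sooner.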



```python
#@title Python Code - Rewards and Actions

# This script illustrates reinforcement learning rewards and actions.
# We simulate a tiny grid world environment.
# The focus is on feedback signals (rewards), not on model details.

# Import the built-in random module (no other libraries are needed).
import random

# Set a fixed random seed for reproducible behavior.
random.seed(42)

# Define the grid size, the start position, and the terminal goal position.
GRID_ROWS, GRID_COLS = 2, 3
START_STATE = (1, 0)
GOAL_STATE = (0, 2)

# Define the discrete actions available to the agent.
ACTIONS = ["left", "right", "up", "down"]

# Return the next state and reward for taking an action in a state.
def step(state, action):
    row, col = state

    # Compute the next position; moving into a wall leaves the agent in place.
    if action == "left":
        col = max(col - 1, 0)
    elif action == "right":
        col = min(col + 1, GRID_COLS - 1)
    elif action == "up":
        row = max(row - 1, 0)
    elif action == "down":
        row = min(row + 1, GRID_ROWS - 1)

    next_state = (row, col)

    # Reward: +1.0 for reaching the goal, a small -0.1 penalty for any other move.
    reward = 1.0 if next_state == GOAL_STATE else -0.1

    return next_state, reward

# A simple policy that ignores the state and picks an action uniformly at random.
def random_policy(state):
    _ = state
    return random.choice(ACTIONS)

# Episode settings.
num_episodes = 5
max_steps_per_episode = 6

# Store each episode's (state, action, reward, next_state) tuples.
trajectories = []

# Simulate episodes in which the agent explores the environment.
for episode in range(num_episodes):
    state = START_STATE
    episode_steps = []

    for _ in range(max_steps_per_episode):
        action = random_policy(state)
        next_state, reward = step(state, action)

        episode_steps.append((state, action, reward, next_state))
        state = next_state

        # End the episode early once the goal is reached.
        if state == GOAL_STATE:
            break

    trajectories.append(episode_steps)

# Print a concise summary emphasizing actions and rewards.
print("Episode summaries showing actions and reward feedback:")

for episode_index, episode_steps in enumerate(trajectories):
    total_reward = sum(step_info[2] for step_info in episode_steps)
    first_step = episode_steps[0]
    last_step = episode_steps[-1]
    reached_goal = last_step[3] == GOAL_STATE

    print(
        "Episode", episode_index + 1,
        "started", first_step[0],
        "ended", last_step[3],
        "steps", len(episode_steps),
        "reached_goal", reached_goal,
        "total_reward", round(total_reward, 2),
    )
```




### **3.3. Game Playing Agent**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_01/Lecture_C/image_03_03.jpg?v=1769916762" width="250">



>* Agent sees game state and chooses moves
>* Learns from delayed rewards instead of correct labels

>* Agent experiences state, chooses actions, receives rewards
>* Interactive trajectories teach strategies for long-term reward

>* RL uses delayed, sparse, sometimes noisy rewards
>* Agent learns policies from long histories of interactions
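Learning from a delayed reward can be sketched with tabular Q-learning, one standard reinforcement learning algorithm (the source does not name a specific one). The "game" is a hypothetical 6-cell corridor that pays +1 only when the agent reaches the last cell; the learning rate, discount, exploration rate, and number of games are assumed values. Even though only the final move is rewarded, earlier moves acquire value through the bootstrapped update.

```python
import random

# Hypothetical "game": a corridor of 6 cells. The agent starts in cell 0 and is
# rewarded (+1) only when it reaches the last cell; every other move earns 0.
N_CELLS = 6
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right

def play_move(position, action):
    next_position = min(max(position + action, 0), N_CELLS - 1)
    done = next_position == N_CELLS - 1
    reward = 1.0 if done else 0.0  # sparse, delayed reward
    return next_position, reward, done

# Tabular Q-learning: Q[s][a] estimates the discounted future reward of move a in cell s.
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # assumed learning rate, discount, exploration rate
Q = [[0.0, 0.0] for _ in range(N_CELLS)]
rng = random.Random(0)

for _ in range(200):  # 200 practice games
    position, done = 0, False
    while not done:
        # Epsilon-greedy choice; ties between equal estimates are broken at random.
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))
        else:
            best = max(Q[position])
            a = rng.choice([i for i, q in enumerate(Q[position]) if q == best])
        next_position, reward, done = play_move(position, ACTIONS[a])
        # Bootstrapped target: reward now plus discounted value of the best next move.
        target = reward if done else reward + gamma * max(Q[next_position])
        Q[position][a] += alpha * (target - Q[position][a])
        position = next_position

greedy_policy = ["right" if Q[s][1] > Q[s][0] else "left" for s in range(N_CELLS - 1)]
print(greedy_policy)
```

No move was ever labeled "correct"; the agent discovers that stepping right pays off by playing many games and propagating the final reward backwards through its value estimates.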



## **Lecture Summary**


In this lecture, you learned to:
- Describe supervised, unsupervised, and reinforcement learning in intuitive terms. 
- Match simple problem descriptions to the appropriate machine learning category. 
- Identify the type of data and feedback signal required for each learning category. 

In the next module (Module 2), we will cover 'Data and Features'.