# <font color="#418FDE" size="6.5" uppercase>**Features And Targets**</font>

>Last update: 20260131.
    
By the end of this lecture, you will be able to:
- Define features and targets in the context of supervised learning. 
- Identify features and targets in simple problem descriptions and tables. 
- Compare alternative feature choices for a basic prediction task. 


## **1. Features and Targets**

### **1.1. Understanding Features**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_01_01.jpg?v=1769918083" width="250">



>* Features are input details describing each example
>* Together they let models find patterns and predict

>* Features may be numeric, categorical, or binary
>* Raw data can be transformed into clearer features

>* Use only timely, prediction-relevant information as features
>* Avoid future data leaks; ensure features are observable
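
As a minimal sketch of the points above, the snippet below turns one raw record into numeric, categorical, and binary features, and derives clearer features from a raw date. The order record, its column names, and the `build_features` helper are hypothetical, chosen only for illustration.

```python
from datetime import date

# Hypothetical raw record for one online order (all names are illustrative).
raw_order = {
    "order_date": date(2026, 1, 31),  # raw date, awkward to use directly
    "basket_total": "42.50",          # numeric value stored as text
    "payment_method": "card",         # categorical
    "used_coupon": "yes",             # binary value stored as text
}


def build_features(order):
    """Turn one raw record into clearer model inputs."""
    weekday = order["order_date"].weekday()  # Monday is 0, Sunday is 6
    return {
        "basket_total": float(order["basket_total"]),         # numeric
        "payment_method": order["payment_method"],            # categorical
        "used_coupon": 1 if order["used_coupon"] == "yes" else 0,  # binary
        "order_weekday": weekday,                             # derived from the date
        "is_weekend": 1 if weekday >= 5 else 0,               # derived binary
    }


features = build_features(raw_order)
print(features)
```

Every value here is something that would be observable when the order is placed, which is the condition the last two bullets ask for.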



### **1.2. Understanding Targets**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_01_02.jpg?v=1769918094" width="250">



>* Target is the outcome the model predicts
>* Known targets guide learning to improve future predictions

>* Targets may be numbers, categories, or structures
>* Target type guides model, loss, and metrics

>* Targets reflect noisy, biased real-world data
>* Careful target design affects learning, accuracy, fairness



### **1.3. Real World Examples**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_01_03.jpg?v=1769918107" width="250">



>* House price is the target to predict
>* House details are features used to learn patterns

>* Patient data features predict future health risk
>* Model learns from labeled records to estimate probability

>* Spam filters use email properties as features
>* Model predicts spam label as target category
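
The house price example can be sketched as a small table split into features and a target. The records, column names, and the `split_features_target` helper below are made up for illustration; a real dataset would have many more rows and columns.

```python
# Hypothetical house records; values are invented for illustration.
houses = [
    {"area_sqm": 80, "bedrooms": 2, "age_years": 30, "price": 210_000},
    {"area_sqm": 120, "bedrooms": 3, "age_years": 10, "price": 340_000},
    {"area_sqm": 65, "bedrooms": 1, "age_years": 45, "price": 150_000},
]

TARGET = "price"  # the outcome we want to predict


def split_features_target(rows, target):
    """Separate each row into its feature columns and its target value."""
    X = [{name: value for name, value in row.items() if name != target}
         for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(houses, TARGET)
print("Features of first house:", X[0])
print("Target of first house:", y[0])
```

The same split applies to the other two examples: patient measurements or email properties go into `X`, and the health outcome or spam label goes into `y`.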



## **2. Choosing Prediction Targets**

### **2.1. Defining Prediction Questions**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_02_01.jpg?v=1769918121" width="250">



>* Define a precise question for model predictions
>* Question choice sets target variable and features

>* Targets are future outcomes, unknown at prediction
>* Features are details already known at prediction time

>* Same dataset can answer different prediction questions
>* State question first, then choose targets and features
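
To illustrate how one dataset can serve different prediction questions, here is a rough sketch. The customer table, the two questions, and the `frame_problem` helper are assumptions made for this example. Both outcome columns describe the future, so neither is allowed to be a feature for the other.

```python
# Hypothetical customer table; the last two columns are future outcomes.
customers = [
    {"tenure_months": 3, "monthly_spend": 20.0,
     "churned_next_month": 1, "spend_next_month": 0.0},
    {"tenure_months": 24, "monthly_spend": 55.0,
     "churned_next_month": 0, "spend_next_month": 60.0},
    {"tenure_months": 12, "monthly_spend": 35.0,
     "churned_next_month": 0, "spend_next_month": 30.0},
]

# Columns that are unknown at prediction time, whichever question we ask.
OUTCOME_COLUMNS = {"churned_next_month", "spend_next_month"}

# State the question first, then map it to a target column.
questions = {
    "Will this customer churn next month?": "churned_next_month",
    "How much will this customer spend next month?": "spend_next_month",
}


def frame_problem(rows, target):
    """Return feature names, feature rows, and target values for one question."""
    feature_names = [name for name in rows[0] if name not in OUTCOME_COLUMNS]
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


for question, target in questions.items():
    feature_names, X, y = frame_problem(customers, target)
    print(question)
    print("  features:", feature_names)
    print("  target values:", y)
```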



### **2.2. Classification and Regression**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_02_02.jpg?v=1769918132" width="250">



>* Classification targets are categories, not numbers
>* Look for limited labels like yes/no, approved/denied

>* Regression predicts numeric amounts like price or time
>* Numeric targets change error meaning and evaluation

>* Decide if the target is category or number
>* This choice defines task type and feature use
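
The sketch below shows one way the target type changes what "error" means. For a category target we count exact matches (accuracy); for a numeric target we measure how far off each prediction is (mean absolute error). The loan and price values are invented, and these two metrics are common choices rather than the only ones.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true category exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average distance between predicted and true numbers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: the target is a label from a small fixed set.
loan_true = ["approved", "denied", "approved", "approved"]
loan_pred = ["approved", "approved", "approved", "denied"]

# Regression: the target is a numeric amount (prices in thousands).
price_true = [200.0, 350.0, 150.0, 400.0]
price_pred = [210.0, 330.0, 160.0, 400.0]

print("Loan accuracy:", accuracy(loan_true, loan_pred))
print("Price mean absolute error:", mean_absolute_error(price_true, price_pred))
```

A category prediction is either right or wrong, while a numeric prediction can be wrong by a little or by a lot, which is why the two tasks are evaluated differently.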



### **2.3. Prediction Timeframe**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_02_03.jpg?v=1769918144" width="250">



>* Prediction timeframe links features to outcome timing
>* Look for phrases that specify when prediction applies

>* Use only information known at prediction time
>* Earlier columns are features; later outcomes are targets

>* Different timeframes create separate prediction targets
>* Match each target's column to its timeframe
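
One way to make the timeframe explicit is to record when each column becomes known relative to the prediction moment. The sketch below assumes a hypothetical subscription dataset in which day 0 is the moment of prediction; the column names and day offsets are illustrative only.

```python
# Day on which each column's value becomes known (day 0 = prediction moment).
column_available_day = {
    "signup_channel": -90,
    "logins_last_30_days": 0,
    "support_tickets_last_30_days": 0,
    "cancelled_within_30_days": 30,
    "cancelled_within_90_days": 90,
}

PREDICTION_DAY = 0

# Columns known at or before the prediction moment are safe to use as features.
feature_columns = [name for name, day in column_available_day.items()
                   if day <= PREDICTION_DAY]

# Columns that only become known later are candidate targets,
# one per prediction timeframe.
targets_by_horizon = {day: name for name, day in column_available_day.items()
                      if day > PREDICTION_DAY}

print("Features:", feature_columns)
for horizon, target in sorted(targets_by_horizon.items()):
    print(f"Target for the {horizon}-day question:", target)
```

Using `cancelled_within_30_days` as a feature for the 90-day question would leak future information, because that value is not yet known on day 0.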



## **3. Evaluating Feature Usefulness**

### **3.1. Comparing Feature Perspectives**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_03_01.jpg?v=1769918157" width="250">



>* Experts describe real-world concepts; scientists model variables
>* Different feature choices reflect different views of reality

>* Users want simple, observable, ethical features
>* Builders balance accuracy with fairness and transparency

>* Short-term features favor easy, highly correlated data
>* Long-term features prioritize stability, fairness, and durability



### **3.2. Spotting Useful Features**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_03_02.jpg?v=1769918169" width="250">



>* Choose features plausibly linked to the outcome
>* Ask why each feature might affect predictions

>* Look for features that vary with outcomes
>* Prefer features whose changes track target changes

>* Prefer specific, stable, timely features for prediction
>* Combine these traits to systematically judge feature usefulness
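
A quick, informal check of whether a feature varies with the outcome is to group the target by each feature value and compare the group averages. The delivery data and the `mean_target_by_value` helper below are hypothetical, and a gap between averages in six rows is only a hint, not proof of usefulness.

```python
from collections import defaultdict


def mean_target_by_value(feature_values, targets):
    """Average target for each distinct value of a feature."""
    groups = defaultdict(list)
    for value, target in zip(feature_values, targets):
        groups[value].append(target)
    return {value: sum(ts) / len(ts) for value, ts in groups.items()}


# Target: delivery time in minutes for six hypothetical orders.
delivery_minutes = [25, 40, 30, 55, 20, 50]

# Candidate feature 1: changes from order to order.
is_raining = [0, 1, 0, 1, 0, 1]

# Candidate feature 2: identical for every order, so it cannot track the target.
city = ["Springfield"] * 6

by_rain = mean_target_by_value(is_raining, delivery_minutes)
by_city = mean_target_by_value(city, delivery_minutes)

print("Mean delivery time by rain flag:", by_rain)
print("Mean delivery time by city:", by_city)
```

The rain flag separates the orders into two groups with clearly different average delivery times, while the constant city column gives a single group and therefore no way to tell orders apart.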



### **3.3. Information Rich Features**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_02/Lecture_B/image_03_03.jpg?v=1769918181" width="250">



>* Information rich features strongly track target changes
>* They hold concentrated signal, not random noise

>* Avoid features that mostly repeat each other
>* Prefer features adding new, complementary information

>* Reliable, precise measurements make features more useful
>* Prefer features close to real causal processes



```python
#@title Python Code - Information Rich Features

# This script compares simple feature usefulness visually.
# It shows which feature is information rich.
# We use tiny synthetic data for clarity.

# import required numerical and plotting libraries.
import numpy as np
import matplotlib.pyplot as plt

# create small array of study hours as main feature.
study_hours = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# create exam scores mostly depending on study hours.
exam_scores = 50 + study_hours * 5 + np.array([0, 2, -1, 1, 0, -2, 1, 0])

# create backpack color codes as uninformative feature.
backpack_color_code = np.array([1, 2, 1, 3, 2, 1, 3, 2])

# compute correlation between study hours and exam scores.
corr_study = np.corrcoef(study_hours, exam_scores)[0, 1]

# compute correlation between backpack color and exam scores.
corr_color = np.corrcoef(backpack_color_code, exam_scores)[0, 1]

# print short summary comparing both feature correlations.
print("Correlation study_hours vs exam_scores:", round(float(corr_study), 3))
print("Correlation backpack_color vs exam_scores:", round(float(corr_color), 3))

# create figure with two subplots for visual comparison.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))

# plot exam scores against study hours as scatter.
axes[0].scatter(study_hours, exam_scores, color="tab:blue")
axes[0].set_title("Study hours vs exam score")
axes[0].set_xlabel("Study hours feature")
axes[0].set_ylabel("Exam score target")

# plot exam scores against backpack color codes.
axes[1].scatter(backpack_color_code, exam_scores, color="tab:orange")
axes[1].set_title("Backpack color vs exam score")
axes[1].set_xlabel("Backpack color feature")
axes[1].set_ylabel("Exam score target")

# adjust layout and display the single comparison figure.
plt.tight_layout()
plt.show()
```




# <font color="#418FDE" size="6.5" uppercase>**Features And Targets**</font>


In this lecture, you learned to:
- Define features and targets in the context of supervised learning. 
- Identify features and targets in simple problem descriptions and tables. 
- Compare alternative feature choices for a basic prediction task. 

In the next module (Module 3), we will cover 'Math Essentials'.