# Machine Learning & Decision-Making – Training Worksheet

### Topics: Markov Decision Processes (MDP), Python Programming, Decision Trees, and K-Means Clustering

## Section 1: Markov Decision Processes (MDP)

### 1. Define a Markov Decision Process (MDP).

##### A Markov Decision Process (MDP) is a mathematical framework for modeling sequential decision-making problems in which actions affect future outcomes. It is mainly used in reinforcement learning, where an agent interacts with an environment over time to maximize cumulative reward.

### 2. List and explain the five components of an MDP.

##### 1. States (S) – All possible situations the agent can be in.

##### 2. Actions (A) – All possible actions the agent can take.

##### 3.Transition Probability (P) – Probability of moving from one state to another after an action.

##### 4.Reward (R) – Immediate feedback received after taking an action.

##### 5.Discount Factor (γ) – Decides how much future rewards matter (0 ≤ γ ≤ 1).
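
The five components can be written down as plain data. The sketch below uses a hypothetical two-state example; the state names, actions, probabilities, and rewards are invented for illustration and are not part of the worksheet.

```python
# Hypothetical two-state MDP; every name and number here is illustrative.
states = ["low", "high"]                 # S
actions = ["wait", "charge"]             # A

# P[(state, action)] maps each possible next state to its probability.
P = {
    ("low", "wait"):    {"low": 1.0},
    ("low", "charge"):  {"low": 0.2, "high": 0.8},
    ("high", "wait"):   {"low": 0.3, "high": 0.7},
    ("high", "charge"): {"high": 1.0},
}

# R[(state, action)] is the immediate reward for taking that action.
R = {
    ("low", "wait"): 0.0,
    ("low", "charge"): -1.0,
    ("high", "wait"): 2.0,
    ("high", "charge"): -1.0,
}

gamma = 0.9                              # discount factor

# Sanity check: outgoing probabilities sum to 1 for every (state, action) pair.
for key, dist in P.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, key
```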

### 3. Explain the Markov Property.

##### The Markov Property means that the future depends only on the present state (and the action taken in it), not on the history that led there.

##### Example: If you know where the robot is now, you don’t need to know how it got there to decide the next move.
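
Written as a formula, the property says that conditioning on the full history adds nothing beyond the current state and action. The notation is an assumption made here, since the worksheet does not fix one: $S_t$ is the state and $A_t$ the action at step $t$.

```latex
P(S_{t+1} = s' \mid S_t, A_t) \;=\; P(S_{t+1} = s' \mid S_0, A_0, \dots, S_t, A_t)
```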

### 4. Difference between policy vs value function, and reward vs return.
#### Policy vs Value Function

##### Policy (π): A rule that tells which action to take in a given state.

##### Value Function (V): The expected return from a state when following a policy, i.e. how good the state is in the long run.

#### Reward vs Return

##### Reward: Immediate feedback.

##### Return: Total (discounted) reward accumulated over time.
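
In symbols, one common way to write these two quantities is shown below. The indexing is an assumption chosen for this sketch: $r_k$ is the $k$-th reward received, $T$ is the last step, and $\gamma$ is the discount factor.

```latex
G = \sum_{k=0}^{T} \gamma^{k} r_{k},
\qquad
V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G \mid s_0 = s \,\right]
```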

### Scenario: A robot moves in a grid with rewards for reaching a goal and penalties per move. Identify states, actions, rewards, task type, and define a deterministic policy.

##### States: Each cell in the grid

##### Actions: Up, Down, Left, Right

##### Rewards: +10 for goal, −1 per move

##### Task Type: Episodic (ends when goal is reached)

##### Deterministic Policy: In every cell, always take the action that moves one step closer to the goal
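
The scenario can be sketched in a few lines of Python. The layout is an assumption, since the worksheet does not give one: a 3×3 grid, start at (0, 0), goal at (2, 2), rows growing downward, and the final move earning both the −1 step penalty and the +10 goal reward.

```python
# Assumed layout: 3x3 grid, start (0, 0), goal (2, 2); rows grow downward.
GOAL = (2, 2)
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def policy(state):
    """Deterministic policy: fix the row first, then the column."""
    row, col = state
    if row < GOAL[0]:
        return "Down"
    if row > GOAL[0]:
        return "Up"
    if col < GOAL[1]:
        return "Right"
    return "Left"

def run_episode(start=(0, 0)):
    """Follow the policy until the goal; return the path and the total reward."""
    state, total, path = start, 0, [start]
    while state != GOAL:
        d_row, d_col = MOVES[policy(state)]
        state = (state[0] + d_row, state[1] + d_col)
        total -= 1                       # penalty per move
        if state == GOAL:
            total += 10                  # reward for reaching the goal
        path.append(state)
    return path, total

path, total = run_episode()
print(path)    # [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
print(total)   # 6  (four moves at -1 each, plus +10)
```

Because the policy maps each state to exactly one action, running it twice from the same start always produces the same path.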

### Numerical Exercise: Given γ = 0.9 and rewards [5, 2, 1], compute the return G.

##### Given: γ = 0.9, Rewards = [5, 2, 1]
##### Return: G = 5 + (0.9 × 2) + (0.9² × 1)
##### G = 5 + 1.8 + 0.81 = 7.61
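
The same arithmetic can be checked in Python. The helper name `discounted_return` is made up for this sketch.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * r_k over the reward sequence, starting at k = 0."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

g = discounted_return([5, 2, 1], 0.9)
print(round(g, 2))  # 7.61
```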

## Section 2: Python Programming Basics
### 1. Even Numbers (1-20)

In [37]:
evens = []
for i in range(1,21):
    if i % 2 == 0:
        evens.append(i)
print(evens)

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]


##### List comprehension

In [38]:
evens = [i for i in range(1,21) if i % 2 == 0]
print(evens)

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]


### 2. Write a function to compute the mean of a list.

In [39]:
def mean(numbers):
    # Arithmetic mean; raises ZeroDivisionError if the list is empty.
    return sum(numbers) / len(numbers)
print(mean([1, 2, 3, 4, 5]))

3.0


### 3. Create NumPy arrays and compute statistics.

In [40]:
import numpy as np
arr = np.array([10,20,30,40])
print(arr.mean())
print(arr.std())

25.0
11.180339887498949


### 4. Create a Pandas DataFrame and filter rows based on conditions.


In [41]:
import pandas as pd

data = {'age':[22,25,30], 'Salary': [30000, 50000, 70000]}
df = pd.DataFrame(data)

print(df[df["Salary"]> 40000])

   age  Salary
1   25   50000
2   30   70000


### Debug the following code:

In [42]:
for i in range(5):
    print(i)

0
1
2
3
4


## Section 3: Decision Trees

### 1. What problems do decision trees solve?

##### Classification (Yes/No, Spam/Not Spam)
##### Regression (price prediction)

### 2. Define entropy and information gain.

##### Entropy: Measures the impurity (randomness) of the class labels in a node
##### Information Gain: Reduction in entropy after a split
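
Both quantities are short to compute by hand. The sketch below uses Shannon entropy in bits and hypothetical yes/no labels; the function names and the data are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical labels: 4 "yes" and 4 "no".
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]                 # each child is pure
useless_split = [["yes", "yes", "no", "no"]] * 2          # children look like the parent

print(entropy(parent))                          # 1.0
print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A feature that produces the perfect split would be the natural choice for the root node, because it removes all of the parent's entropy.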

### 3. Why do decision trees overfit?

##### Decision trees can keep splitting until they memorize the training data, including its noise. Deep trees = high variance

### 4. Given a small dataset, decide the best root node intuitively.

##### Choose the feature that splits the data most cleanly (highest information gain).

### Python Practice: Train a DecisionTreeClassifier with max_depth=3 and generate predictions.

In [44]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)


model = DecisionTreeClassifier(max_depth=3)
model.fit(X_train, y_train)


print(model.predict(X_test))

[1 2 0 1 0 0 2 1 0 1 1 0 1 0 1 0 2 0 0 1 2 2 0 0 2 0 2 1 1 0 2 1 0 0 0 0 0
 2]


## Section 4: K-Means Clustering

### 1. Is K-Means supervised or unsupervised?

##### K-Means is unsupervised learning: it groups data without using labels

### 2. Describe the K-Means algorithm steps.

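##### 1. Choose K (the number of clusters) and initialize K centroids
##### 2. Assign each point to its nearest centroid
##### 3. Update each centroid to the mean of its assigned points
##### 4. Repeat steps 2 and 3 until convergence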
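
The steps above can be sketched in plain Python. This is a minimal illustration and not scikit-learn's implementation: the function name, the random initialization from the data points, the empty-cluster handling, and the sample data are all assumptions.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # step 1: pick K starting centroids
    labels = None
    for _ in range(iterations):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:                 # step 4: stop once assignments settle
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                          # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical 2-D data with two obvious groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(data, k=2)
print(labels)
```

The label numbers themselves are arbitrary; what matters is that the first three points share one label and the last three share the other.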

### 3. Why is feature scaling important?

##### K-Means is distance-based, so features should be on the same scale; otherwise features with large values (e.g. income) dominate the distance.
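
A small numeric check shows the effect. The four (age, income) rows are hypothetical, and `standardize` is an illustrative helper that applies zero-mean, unit-variance scaling, which is one common choice among several.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical customers as (age, income).
customers = [(20, 20000), (60, 21000), (22, 23000), (58, 40000)]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

a, b, c, _ = customers
# Raw distances: the 40-year age gap between a and b barely registers next to income.
print(dist(a, b) < dist(a, c))        # True

sa, sb, sc, _ = standardize(customers)
# Scaled distances: a is now closer to c, who has a similar age and a similar income.
print(dist(sa, sb) < dist(sa, sc))    # False
```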

### 4. Explain the Elbow Method and inertia.

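##### Inertia is the sum of squared distances from each point to its assigned centroid. The Elbow Method plots K against inertia and picks the K where the decrease slows sharply (the "elbow").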
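
To make the idea concrete, the sketch below computes inertia for hand-picked partitions of a tiny 1-D dataset. The data and the partitions are assumptions chosen for illustration; in practice the partitions would come from fitting K-Means once per K.

```python
def inertia(clusters):
    """Sum of squared distances from each point to its cluster mean (1-D data)."""
    total = 0.0
    for cluster in clusters:
        centre = sum(cluster) / len(cluster)
        total += sum((x - centre) ** 2 for x in cluster)
    return total

# Hand-picked partitions of the hypothetical data [1, 2, 3, 10, 11, 12] for K = 1, 2, 3.
partitions = {
    1: [[1, 2, 3, 10, 11, 12]],
    2: [[1, 2, 3], [10, 11, 12]],
    3: [[1, 2], [3], [10, 11, 12]],
}

for k, clusters in partitions.items():
    print(k, inertia(clusters))
# 1 125.5
# 2 4.0
# 3 2.5
```

Inertia drops steeply from K = 1 to K = 2 and only slightly after that, so the elbow for this data is at K = 2.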

### Python Practice: Fit a KMeans model with k=3 and extract cluster labels.


In [47]:
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[20, 20000], [25, 50000],[40, 90000]])

kmeans = KMeans(n_clusters=3)
labels = kmeans.fit_predict(X)
print(labels)

[0 2 1]


## Section 5: Applied Scenario

You are given customer data (Age, Income, Spending Score). Choose between Decision Tree, K-Means, or MDP for segmentation and justify your choice. If recommendations are sequential, which concept applies?

#### K-Means:
##### 1. No labels
##### 2. Goal is customer segmentation

#### Sequential recommendations: MDP
##### Each recommendation affects what the customer does next, so choosing recommendations over time is a sequential decision problem.

## End of Worksheet