# 🧠 Assignment 1: Understanding Machine Learning Basics

Welcome! In this notebook, we’ll explore the core concepts of **Machine Learning (ML)** and implement a simple ML pipeline using the Iris dataset.

---

**Objectives:**
- Explain types of ML with examples and analogies
- Use a real dataset for demonstration
- Build, train, and test a classifier
- Interpret the model results


## 📘 What is Machine Learning?

Machine Learning is an approach in which computers **learn patterns from data** to make predictions or decisions, instead of following hand-written rules.

---

### 🎓 Supervised Learning
> Like teaching a child using flashcards.

We give the model *features* (inputs) and *labels* (correct answers) so that it learns to map inputs to outputs.

🟢 **Use Case**: Predict if a customer will churn  
🛠️ Algorithms: Logistic Regression, Decision Tree, SVM
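
As a minimal sketch of the features-and-labels idea, the snippet below fits a logistic regression to a tiny made-up churn dataset. The numbers and the `monthly_usage_hours` feature are hypothetical and only illustrate the workflow.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature (monthly usage in hours) per customer
monthly_usage_hours = [[1], [2], [3], [10], [12], [15]]
# Labels (the "correct answers"): 1 = churned, 0 = stayed
churned = [1, 1, 1, 0, 0, 0]

# Learn the mapping from feature to label
clf = LogisticRegression()
clf.fit(monthly_usage_hours, churned)

# Predict for two new customers the model has never seen
new_customers = [[2.5], [11]]
predictions = clf.predict(new_customers)
print(predictions)
```

The model never sees an explicit rule such as "low usage means churn"; it infers that relationship from the labeled examples.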

---

### 🧩 Unsupervised Learning
> Like letting a child explore toys without labels.

The model finds hidden patterns or groups in the data without being told what to look for.

🟢 **Use Case**: Grouping customers by behavior  
🛠️ Algorithms: K-Means, PCA, DBSCAN
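
A rough sketch of the same idea in code: K-Means groups the Iris measurements without ever seeing the species labels. Choosing 3 clusters is an assumption made for this illustration; in practice the number of clusters is itself a modeling decision.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Use only the measurements; the species labels are deliberately ignored
X = load_iris().data

# Ask for 3 groups (an assumption for this sketch)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Each sample gets a cluster id; each cluster has a center in feature space
print(cluster_ids[:10])
print(kmeans.cluster_centers_.shape)
```

The cluster ids are arbitrary integers, so cluster `0` does not correspond to any particular species.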

---

### 🕹️ Reinforcement Learning
> Like training a dog: reward good behavior, discourage bad.

The model (an *agent*) learns by trial and error, receiving rewards or penalties for the actions it takes.

🟢 **Use Case**: Robotics, game-playing agents  
🛠️ Algorithms: Q-Learning, PPO, DDPG
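
To make the reward-and-penalty loop concrete, here is a minimal tabular Q-learning sketch. The environment (a 5-cell corridor with the goal at the right end), the reward values, and the hyperparameters are all hypothetical choices for illustration.

```python
import numpy as np

# Hypothetical toy environment: a corridor of 5 cells, start at 0, goal at 4
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right
GOAL = N_STATES - 1

rng = np.random.default_rng(0)
q_table = np.zeros((N_STATES, len(ACTIONS)))
alpha, gamma, epsilon = 0.5, 0.9, 0.2  # learning rate, discount, exploration

for episode in range(200):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: explore sometimes, otherwise take the best-known action
        if rng.random() < epsilon:
            action = int(rng.integers(len(ACTIONS)))
        else:
            action = int(np.argmax(q_table[state]))
        # Walls at both ends keep the agent inside the corridor
        next_state = min(max(state + ACTIONS[action], 0), GOAL)
        # Reward for reaching the goal, small penalty for every other step
        reward = 1.0 if next_state == GOAL else -0.01
        # Q-learning update toward reward plus discounted best future value
        q_table[state, action] += alpha * (
            reward + gamma * q_table[next_state].max() - q_table[state, action]
        )
        state = next_state

# Best action per non-goal state according to the learned table
greedy_policy = q_table[:GOAL].argmax(axis=1)
print(greedy_policy)
```

After training, the greedy policy should prefer moving right in every cell, because that is the only way to collect the goal reward.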


## 📊 Dataset Used: Iris Dataset

In [None]:
from sklearn.datasets import load_iris
import pandas as pd

# Load Iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Display first 5 rows
df.head()

**Features** (all measured in centimeters):
- Sepal length
- Sepal width
- Petal length
- Petal width

**Target:**
- `species` → One of `setosa`, `versicolor`, `virginica`

This dataset is often used to demonstrate classification models because it is small (150 samples), balanced (50 samples per species), and has no missing values.
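
A quick way to check these properties is to inspect the DataFrame directly. The sketch below rebuilds `df` so that it runs on its own; the variable names `class_counts` and `missing` are introduced here for illustration.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Size of the table: rows are flowers, columns are 4 features plus the target
n_rows, n_cols = df.shape
# Number of samples per species
class_counts = df["species"].value_counts()
# Total number of missing values across all columns
missing = int(df.isna().sum().sum())

print(n_rows, n_cols)
print(class_counts)
print(missing)
```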


## 🛠️ ML Workflow: Train-Test Split and Model Training

In [None]:
from sklearn.model_selection import train_test_split

# Split into features and target
X = df[iris.feature_names]
y = df['species']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
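
A plain random split does not guarantee that each species appears equally often in the test set. One optional variation, sketched here as an alternative and not as the method used in this notebook, is to pass `stratify=y` so that the class proportions are preserved in both parts.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd

iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(pd.Categorical.from_codes(iris.target, iris.target_names), name="species")

# stratify=y keeps the 1/3-1/3-1/3 class balance in both train and test sets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

test_counts = y_test_s.value_counts()
print(test_counts)
```

With 150 samples and `test_size=0.2`, the stratified test set holds 30 samples, 10 per species.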


In [None]:
from sklearn.tree import DecisionTreeClassifier

# Initialize and train model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

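# Print each result separately so the report keeps its line breaks
print(f"Accuracy: {accuracy:.2f}")
print(report)
print(conf_matrix)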

## ✅ Result Interpretation

- **Accuracy**: 100% on test data (30 samples)
- **Classification Report**: Perfect precision and recall for all 3 classes
- **Confusion Matrix**:
  ```
  [[10,  0,  0],
   [ 0,  9,  0],
   [ 0,  0, 11]]
  ```
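
To make the matrix easier to read, one option is to label its rows and columns with the species names. The sketch below repeats the split and training steps so that it runs on its own; the `true`/`pred` prefixes and the name `labeled_matrix` are illustrative choices.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Rows = true species, columns = predicted species
labeled_matrix = pd.DataFrame(
    confusion_matrix(y_test, model.predict(X_test)),
    index=[f"true {name}" for name in iris.target_names],
    columns=[f"pred {name}" for name in iris.target_names],
)
print(labeled_matrix)
```

Entries on the diagonal are correct predictions; any off-diagonal entry would be a flower of one species classified as another.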

---

### 🧠 What Did the Model Learn?

> "The model is a decision tree: a flowchart that repeatedly splits the data on the most informative feature. For example, *setosa* flowers have much shorter and narrower petals than the other two species, so a single split on a petal measurement is enough to separate them. Further branches then distinguish *versicolor* from *virginica*."
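
The learned flowchart can be inspected directly. As a sketch, scikit-learn's `export_text` prints the tree's split rules and `feature_importances_` shows how much each feature contributed; the model is retrained here so the snippet runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Text rendering of the tree: one line per split or leaf
rules = export_text(model, feature_names=list(iris.feature_names))
print(rules)

# Relative contribution of each feature to the tree's splits (sums to 1)
importances = dict(zip(iris.feature_names, model.feature_importances_))
for name, importance in importances.items():
    print(f"{name}: {importance:.3f}")
```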

This works very well here, but a perfect score on 30 test samples from a small, clean dataset is weak evidence of generalization. Cross-validation and tests on larger, noisier datasets give a more reliable picture.
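
One inexpensive check is k-fold cross-validation, which evaluates the model on several different train/test partitions instead of a single one. This is a minimal sketch; 5 folds is a common default, not a value taken from this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Train and score the same kind of tree on 5 different splits of the data
scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)

print(scores)
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

If the fold scores vary noticeably, a single train-test split can give an overly optimistic or pessimistic impression.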


---

### 🎓 Summary

- ML lets computers learn from data; its three main types are supervised, unsupervised, and reinforcement learning.
- We demonstrated a supervised learning pipeline using the Iris dataset.
- The decision tree classified all 30 test samples correctly.
- Real-world datasets are often noisy or imbalanced and usually need more careful validation and tuning.

Feel free to experiment with different classifiers like `LogisticRegression`, `RandomForestClassifier`, or even use `GridSearchCV` for tuning!
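
As a starting point for that experimentation, the sketch below compares two other classifiers with 5-fold cross-validation and then tunes the decision tree's `max_depth` with `GridSearchCV`. The candidate depths and `max_iter=1000` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare alternative classifiers using mean 5-fold accuracy
candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}
mean_scores = {}
for name, estimator in candidates.items():
    mean_scores[name] = cross_val_score(estimator, X, y, cv=5).mean()
    print(f"{name}: {mean_scores[name]:.3f}")

# Tune one hyperparameter of the decision tree (the grid is an arbitrary choice)
param_grid = {"max_depth": [2, 3, 4, None]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

On a dataset this small the differences between models are likely to be minor, so treat the ranking as a demonstration of the workflow more than a verdict.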

