# DAY 8: Introduction to Machine Learning + First Model

**1. What is Machine Learning (ML)?**

- Machine Learning is the ability of machines to learn patterns from data and make decisions without being explicitly programmed.

**2. Types of ML**  

| Type              | Example                     |
| ----------------- | --------------------------- |
| **Supervised**    | Predict marks, pass/fail    |
| **Unsupervised**  | Group students (clustering) |
| **Reinforcement** | AI game players, robots     |


We’ll start with **Supervised ML**.



**3. Common ML Terms**

| Term             | Meaning                                     |
| ---------------- | ------------------------------------------- |
| **Features (X)** | Input data (e.g., age, marks)               |
| **Target (y)**   | What we predict (e.g., passed: Yes/No)      |
| **Model**        | A learned function mapping input to output  |
| **Fit/Train**    | Teach the model using data                  |
| **Predict**      | Use the model to make predictions           |
| **Accuracy**     | Fraction of predictions that are correct    |
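
To make the vocabulary concrete, here is a minimal sketch of how the terms map onto code. `ThresholdModel` is a hypothetical toy class written for this illustration, not a scikit-learn model; the six rows are the first six students from the dataset used in the Code section.

```python
# Hypothetical toy model used only to illustrate the vocabulary above.
class ThresholdModel:
    """Predicts 1 (pass) when marks reach a learned cutoff."""

    def fit(self, X, y):
        # Fit/Train: try each midpoint between neighbouring marks and keep
        # the cutoff that classifies the most training rows correctly.
        marks = sorted(row[0] for row in X)
        candidates = [(a + b) / 2 for a, b in zip(marks, marks[1:])]
        self.cutoff = max(candidates, key=lambda c: self._correct(c, X, y))
        return self

    def predict(self, X):
        # Predict: map each input row to an output label.
        return [1 if row[0] >= self.cutoff else 0 for row in X]

    @staticmethod
    def _correct(cutoff, X, y):
        # Number of rows where the cutoff rule agrees with the true label.
        return sum((row[0] >= cutoff) == bool(label) for row, label in zip(X, y))


X = [[85], [45], [78], [50], [90], [33]]   # Features: one column, marks
y = [1, 0, 1, 0, 1, 0]                     # Target: 1 = passed, 0 = failed

toy = ThresholdModel().fit(X, y)           # Model after training
y_pred = toy.predict([[40], [80]])         # Predictions for two new students
accuracy = sum(p == t for p, t in zip(toy.predict(X), y)) / len(y)
print(toy.cutoff, y_pred, accuracy)
```

The cutoff is learned from the rows rather than typed in by hand, which is the sense in which the model is "not explicitly programmed".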


**4. First ML Model: Logistic Regression (Classification)**
- We’ll predict whether a student passes based on their marks.
- This is a classification problem because the output is a category (Yes/No).
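
As a rough sketch of the idea behind logistic regression: it computes a linear score from the marks, squashes it into a probability between 0 and 1 with the logistic (sigmoid) function, and then picks a category by comparing that probability with 0.5. The `WEIGHT` and `BIAS` values below are hypothetical numbers chosen for illustration; a trained model learns its own.

```python
import math

# Hypothetical parameters chosen for illustration; a real model learns them.
WEIGHT = 0.2
BIAS = -11.0   # puts the 50% point at 55 marks, since 0.2 * 55 - 11 = 0


def pass_probability(marks):
    """Logistic (sigmoid) function applied to a linear score."""
    score = WEIGHT * marks + BIAS
    return 1.0 / (1.0 + math.exp(-score))


def predict_pass(marks):
    """Turn the probability into a category: 1 = pass, 0 = fail."""
    return 1 if pass_probability(marks) >= 0.5 else 0


for marks in (45, 55, 85):
    print(marks, round(pass_probability(marks), 3), predict_pass(marks))
```

Low marks give a probability near 0, high marks a probability near 1, and the category flips where the probability crosses 0.5.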



# Code

**1. Importing libraries**

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
```



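**2. Uploading the CSV File**

```python
# Google Colab only: opens a file picker and saves the chosen file in the session
from google.colab import files
uploaded = files.upload()
```

```text
Saving day8_students_data.csv to day8_students_data.csv
```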


**3. Loading the Dataset**

```python
df = pd.read_csv("day8_students_data.csv")
```

**4. Clean the Data**

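```python
df.dropna(inplace=True)
df
```

| Unnamed: 0 | Name      | Age | Marks | Passed |
| ---------- | --------- | --- | ----- | ------ |
| 0          | Mritunjay | 19  | 85    | Yes    |
| 1          | Priya     | 21  | 45    | No     |
| 2          | Rahul     | 20  | 78    | Yes    |
| 3          | Aditi     | 22  | 50    | No     |
| 4          | Vikram    | 23  | 90    | Yes    |
| 5          | Sita      | 20  | 33    | No     |
| 6          | Aman      | 21  | 60    | Yes    |
| 7          | Neha      | 19  | 88    | Yes    |
| 8          | Karan     | 22  | 70    | Yes    |
| 9          | Pooja     | 20  | 40    | No     |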


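`dropna` removes every row that contains a missing value (e.g., where Marks is empty). scikit-learn's `LogisticRegression` raises an error when it meets missing values, so this step keeps training from failing.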
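
To show the effect without pandas, here is a plain-Python sketch of the same idea: keep a row only when none of its values is missing. The rows are made up for illustration, and `None` stands in for an empty CSV cell (which pandas would load as `NaN`).

```python
# Made-up rows for illustration; None stands in for an empty CSV cell.
rows = [
    {"Name": "A", "Marks": 85, "Passed": "Yes"},
    {"Name": "B", "Marks": None, "Passed": "No"},
    {"Name": "C", "Marks": 40, "Passed": None},
    {"Name": "D", "Marks": 60, "Passed": "Yes"},
]

# Keep a row only if every one of its values is present.
clean_rows = [
    row for row in rows
    if all(value is not None for value in row.values())
]

print(len(rows), "rows before,", len(clean_rows), "rows after")
print([row["Name"] for row in clean_rows])
```

Comparing the row count before and after is a quick way to see how much data the cleaning step removed.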



**5. Convert 'Passed' to Numbers**

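```python
df["Passed"] = df["Passed"].map({"Yes": 1, "No": 0})
df
```

| Unnamed: 0 | Name      | Age | Marks | Passed |
| ---------- | --------- | --- | ----- | ------ |
| 0          | Mritunjay | 19  | 85    | 1      |
| 1          | Priya     | 21  | 45    | 0      |
| 2          | Rahul     | 20  | 78    | 1      |
| 3          | Aditi     | 22  | 50    | 0      |
| 4          | Vikram    | 23  | 90    | 1      |
| 5          | Sita      | 20  | 33    | 0      |
| 6          | Aman      | 21  | 60    | 1      |
| 7          | Neha      | 19  | 88    | 1      |
| 8          | Karan     | 22  | 70    | 1      |
| 9          | Pooja     | 20  | 40    | 0      |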


ML models compute with numbers, not text like "Yes"/"No", so we convert the labels:

- "Yes" → 1
- "No" → 0

This is called label encoding.
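
One thing worth checking after encoding is whether every label was actually covered by the mapping. In pandas, `Series.map` with a dictionary leaves `NaN` for any value that is not a key. The plain-Python sketch below mimics that behaviour with `dict.get`, using made-up labels in which `"yes"` (lowercase) is deliberately misspelled.

```python
label_to_number = {"Yes": 1, "No": 0}

# Made-up labels; "yes" (lowercase) is deliberately not in the mapping.
labels = ["Yes", "No", "yes", "No"]

# dict.get returns None for an unknown label instead of raising an error.
encoded = [label_to_number.get(label) for label in labels]

# Collect the labels that the mapping did not cover.
unmapped = [label for label in labels if label not in label_to_number]

print(encoded)
print(unmapped)
```

If `unmapped` is not empty, the labels need cleaning (for example, fixing capitalization) before training.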

**6. Choose Input (X) and Output (y)**

```python
X = df[["Marks"]]      # Feature (input)
y = df["Passed"]       # Target (what we want to predict)
print(X.head())
print(y.head())
```

```text
   Marks
0     85
1     45
2     78
3     50
4     90
0    1
1    0
2    1
3    0
4    1
Name: Passed, dtype: int64
```


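Here:

- `X` = marks of the students (input). The double brackets in `df[["Marks"]]` keep `X` as a two-dimensional DataFrame, which is the shape scikit-learn expects for features.
- `y` = whether they passed or not (output), a one-dimensional Series.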

**7. Split the Data**

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
```

```text
(8, 1) (2, 1)
```


You’re dividing the data into:

*  Training Set (80%) – to teach the model

*  Testing Set (20%) – to check how well the model learned

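Testing on rows the model never saw during training lets you detect overfitting (memorizing the data instead of learning patterns). Setting `random_state=42` makes the split reproducible.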
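
Conceptually, the split shuffles the row positions and holds some of them back. The sketch below shows that idea in plain Python. `simple_split` is a hypothetical helper, and its shuffle will generally not select the same rows as scikit-learn's `train_test_split`, even with the same seed.

```python
import random

# Marks and encoded labels from the dataset above.
marks = [85, 45, 78, 50, 90, 33, 60, 88, 70, 40]
passed = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0]


def simple_split(n_rows, test_size, seed):
    """Shuffle row indices, then hold out the last `test_size` share for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)       # seeded, so the split is repeatable
    n_test = max(1, round(n_rows * test_size))
    return indices[:-n_test], indices[-n_test:]


train_idx, test_idx = simple_split(len(marks), test_size=0.2, seed=42)

X_train = [[marks[i]] for i in train_idx]
y_train = [passed[i] for i in train_idx]
X_test = [[marks[i]] for i in test_idx]
y_test = [passed[i] for i in test_idx]

print(len(X_train), len(X_test))
```

No row appears in both sets, which is what makes the test score an honest check.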

**8. Train the ML Model**

```python
model = LogisticRegression()
model.fit(X_train, y_train)
```


You’re training a model called Logistic Regression.
Despite the name, it’s used for classification (predicting categories like Yes/No).
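
To give a feel for what "training" means here, the sketch below fits the two parameters of a logistic model with plain gradient descent on the log-loss. This is an illustration under simplifying assumptions, not what `LogisticRegression().fit` does internally: scikit-learn's default uses the `lbfgs` solver with L2 regularization, so its learned numbers will differ. The learning rate and iteration count are assumed values, and all ten rows are used for simplicity.

```python
import math

# Marks and encoded labels from the dataset above.
marks = [85, 45, 78, 50, 90, 33, 60, 88, 70, 40]
passed = [1, 0, 1, 0, 1, 0, 1, 1, 1, 0]

# Standardize marks so plain gradient descent behaves well with a simple step size.
mean = sum(marks) / len(marks)
std = (sum((m - mean) ** 2 for m in marks) / len(marks)) ** 0.5
z = [(m - mean) / std for m in marks]


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


w, b = 0.0, 0.0          # parameters to learn
learning_rate = 0.5      # assumed value, not tuned

for _ in range(5000):
    # Gradient of the average log-loss with respect to w and b.
    errors = [sigmoid(w * zi + b) - yi for zi, yi in zip(z, passed)]
    grad_w = sum(e * zi for e, zi in zip(errors, z)) / len(z)
    grad_b = sum(errors) / len(z)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


def predict(mark):
    """Classify one student using the learned parameters."""
    return 1 if sigmoid(w * (mark - mean) / std + b) >= 0.5 else 0


print([predict(m) for m in marks])
```

Each pass nudges `w` and `b` in the direction that makes the predicted probabilities agree better with the known labels.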

**9. Make Predictions**

```python
y_pred = model.predict(X_test)
print(y_pred)
```

```text
[1 0]
```


**10. Check Accuracy**

```python
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

```text
Accuracy: 1.0
```


An accuracy of 1.0 (or 100%) means:

✅ Your model predicted every row in the test data correctly.

Keep in mind that the test set here has only 2 rows, so this score is a very rough estimate of how the model will do on new students.
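
For reference, here is a small sketch of the calculation behind the score: by default, `accuracy_score` returns the fraction of predictions that match the true labels. The `accuracy` function below is a hand-written stand-in, and `y_test` is inferred from the reported result (an accuracy of 1.0 means it must equal `y_pred`).

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


y_pred = [1, 0]          # predictions printed in step 9
y_test = [1, 0]          # implied by the reported accuracy of 1.0

print(accuracy(y_test, y_pred))        # 1.0
print(accuracy(y_test, [1, 1]))        # 0.5: one mistake out of two
```

A single wrong prediction would have halved the score, which is why larger test sets give more informative numbers.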