# 🌳 XGBoost for Beginners

This notebook teaches **XGBoost** using **one simple real-life example**.

Think of **XGBoost as a smart decision-maker** that learns by fixing its mistakes step by step.

We will predict **Pass or Fail** based on how many hours a student studies.

👉 This notebook is **Google Colab ready**.


In [None]:
# Step 1: Install & Import XGBoost (already available in Colab)
import xgboost as xgb
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

print("XGBoost version:", xgb.__version__)


## 📊 Our One Example: Study Hours → Pass / Fail

Rule (for understanding only):
- Study more → higher chance to pass


In [None]:
# Create a simple dataset
data = {
    "Study_Hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "Result": [0, 0, 0, 1, 1, 1, 1, 1]  # 0 = Fail, 1 = Pass
}

df = pd.DataFrame(data)
df


## 🧠 What is XGBoost? (Very Simple)

- XGBoost uses **many small decision trees**
- Each new tree fixes mistakes made earlier
- That’s why it is called **Boosting**
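
The idea in the list above can be sketched in plain Python. This is a toy illustration, not XGBoost's actual algorithm: it uses one-split "stumps" and squared error, and the helper names `fit_stump` and `predict_stump` are made up for this example. Each round fits a stump to the current mistakes (residuals) and adds a small correction.

```python
# Toy boosting sketch (assumption: squared error and one-split stumps,
# which is simpler than what XGBoost really does).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
result = [0, 0, 0, 1, 1, 1, 1, 1]  # 0 = Fail, 1 = Pass

def fit_stump(x, residuals):
    """Find the threshold whose two-sided means best fit the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)

def predict_stump(stump, xi):
    threshold, left_value, right_value = stump
    return left_value if xi <= threshold else right_value

learning_rate = 0.3
base = sum(result) / len(result)      # start from the average outcome
predictions = [base] * len(hours)
stumps = []

for round_number in range(20):
    # Residuals are the mistakes the model still makes.
    residuals = [y - p for y, p in zip(result, predictions)]
    stump = fit_stump(hours, residuals)
    stumps.append(stump)
    # Each new stump nudges the predictions toward the correct answers.
    predictions = [p + learning_rate * predict_stump(stump, xi)
                   for p, xi in zip(predictions, hours)]

labels = [1 if p >= 0.5 else 0 for p in predictions]
print("First split: Study_Hours <=", stumps[0][0])
print("Predicted labels:", labels)
```

The small `learning_rate` is why many rounds are needed: each stump fixes only part of the remaining error.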


## ✂️ Split Data into Training and Testing

- Training data → teach the model
- Testing data → check learning
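
To make the split less of a black box, here is a rough sketch of what a train/test split does, written in plain Python. The function `simple_split` is hypothetical and only mimics the idea; it will not pick the same rows as scikit-learn's `train_test_split`.

```python
import math
import random

hours = [1, 2, 3, 4, 5, 6, 7, 8]
result = [0, 0, 0, 1, 1, 1, 1, 1]

def simple_split(features, labels, test_fraction=0.25, seed=42):
    """Shuffle row positions, then hold out a fraction of them for testing."""
    positions = list(range(len(features)))
    random.Random(seed).shuffle(positions)  # fixed seed gives a repeatable shuffle
    test_count = math.ceil(len(positions) * test_fraction)
    test_pos, train_pos = positions[:test_count], positions[test_count:]

    def pick(values, chosen):
        return [values[i] for i in chosen]

    return (pick(features, train_pos), pick(features, test_pos),
            pick(labels, train_pos), pick(labels, test_pos))

train_hours, test_hours, train_result, test_result = simple_split(hours, result)
print("Training size:", len(train_hours))  # 6
print("Testing size:", len(test_hours))    # 2
```

The model never sees the held-out rows during training, so they give an honest check of what it learned.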


In [None]:
X = df[["Study_Hours"]]
y = df["Result"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print("Training size:", len(X_train))
print("Testing size:", len(X_test))


## 🌳 Train an XGBoost Model

We use a **classifier** because the output is one of two classes: Pass or Fail.


In [None]:
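model = xgb.XGBClassifier(
    n_estimators=20,       # number of trees built one after another
    max_depth=3,           # maximum depth of each tree
    learning_rate=0.3,     # how much each new tree contributes
    eval_metric='logloss'  # loss tracked during training
)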

model.fit(X_train, y_train)

print("Model trained successfully")


## 🔮 Make Predictions

Ask the trained model to predict Pass or Fail for the test rows it has not seen.


In [None]:
y_pred = model.predict(X_test)

pd.DataFrame({
    "Study_Hours": X_test["Study_Hours"].values,
    "Actual_Result": y_test.values,
    "Predicted_Result": y_pred
})


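## 📏 Check Accuracy

Accuracy is the fraction of test predictions that match the actual results. With only 2 test rows here, it can only be 0, 0.5, or 1, so treat the number as a demonstration rather than a reliable score.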
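
As a quick sanity check on what the metric means, accuracy can be computed by hand. The labels below are made up for this illustration and are not output from the model.

```python
# Hypothetical labels, made up for this illustration (not model output).
actual = [0, 1, 1, 0]
predicted = [0, 1, 0, 0]

# Count the positions where the prediction matches the actual result.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print("Accuracy:", accuracy)  # 3 of 4 match, so 0.75
```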


In [None]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)


## ✅ Key Takeaway

👉 **XGBoost is a powerful model that learns from mistakes step by step**.

Used in:
- Kaggle competitions
- Business predictions
- AI & ML systems
