<a href="https://colab.research.google.com/github/iyan-coder/EDA-and-Model-Trainer-3/blob/main/BankMarketing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🏦 Bank Marketing Dataset – README & Exploratory Data Analysis

## 1. Problem Statement
A Portuguese bank ran several **phone‑based direct‑marketing campaigns** to convince clients to subscribe to a term deposit.  
Your task is to **predict whether a client will subscribe (`y`)** using the historical campaign data in **`bank.csv`**.

Why it matters:  
- **Marketing efficiency** – call only the clients most likely to convert.  
- **Customer experience** – avoid annoying customers with irrelevant calls.  
- **Cost reduction** – fewer wasted minutes → lower outbound‑call costs.

---

## 2. Column Dictionary & Business Insight

| Column | Type | What it Means (Business Angle) |
|--------|------|--------------------------------|
| `age` | numeric | Customer’s age in years. Often U‑shaped effect: middle‑aged clients may be more financially stable, retirees may have idle funds. |
| `job` | categorical | Profession (e.g., *management*, *services*, *technician*). Relates to **income level** and financial literacy. |
| `marital` | categorical | *married*, *single*, *divorced*. Life stage can affect saving goals. |
| `education` | categorical | *primary*, *secondary*, *tertiary*, *unknown*. Higher education often ↗ adoption of financial products. |
| `default` | categorical | Has credit in default? *yes/no*. Strong negative signal for new deposits. |
| `balance` | numeric | Average yearly bank balance (EUR). Proxy for liquidity and trust in the bank. |
| `housing` | categorical | Has a housing loan? Indicates existing liabilities. |
| `loan` | categorical | Has personal loan? Similar to above; can reduce disposable income. |
| `contact` | categorical | Contact channel (*cellular*, *telephone*, *unknown*). Campaigns via mobile often perform better. |
| `day` | numeric | Last contact day of month (1–31). Not very predictive alone; combine with `month`. |
| `month` | categorical | *jan … dec*. Strong seasonal pattern: e.g., **May** & **October** often see spikes. |
| `duration` | numeric | Call length in seconds. ***Most predictive*** feature (longer calls → higher likelihood of “yes”), but it causes **leakage**: the duration is only known once the call has ended, so exclude it from any model that must decide whom to call. |
| `campaign` | numeric | Number of contacts with this client during the campaign. Repeated calls may wear people out. |
| `pdays` | numeric | Days since the client was last contacted in a previous campaign (−1 means *never contacted before*). Small positive value ⇒ recently called. |
| `previous` | numeric | Number of contacts with this client before the current campaign. |
| `poutcome` | categorical | Outcome of the **previous** campaign (*success*, *failure*, *other*, *unknown*). Historical success is a strong positive signal. |
| `y` | target | *yes* if the client subscribed; otherwise *no*. **Imbalanced** (~11 % “yes”). |

---
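
Several caveats in the column dictionary translate directly into preprocessing steps: the `−1` sentinel in `pdays`, the leakage risk of `duration`, and the class imbalance in `y`. The following is a minimal sketch of how these might be handled, not the notebook's actual pipeline. The helper `prepare_features`, the flag column `was_contacted_before`, and the four-row sample are hypothetical and only reuse the column names from `bank.csv`.

```python
import pandas as pd


def prepare_features(df: pd.DataFrame, drop_duration: bool = True) -> pd.DataFrame:
    """Hypothetical helper applying the caveats from the column dictionary."""
    out = df.copy()

    # pdays uses -1 as a sentinel for "never contacted before". Split it into
    # a boolean flag and a day count with NaN in place of the sentinel, so
    # -1 is not treated as a real number of days.
    out["was_contacted_before"] = out["pdays"] != -1
    out["pdays"] = out["pdays"].where(out["pdays"] != -1)

    # duration is only known after the call ends, so drop it when the model
    # is meant to decide whom to call.
    if drop_duration:
        out = out.drop(columns=["duration"])

    # Encode the target as 1 ("yes") / 0 ("no").
    out["y"] = (out["y"] == "yes").astype(int)
    return out


# Made-up rows that mimic the bank.csv schema (illustrative only, not real data).
sample = pd.DataFrame({
    "age": [30, 45, 61, 38],
    "duration": [79, 220, 1042, 151],
    "pdays": [-1, 339, -1, 92],
    "y": ["no", "no", "yes", "no"],
})

prepared = prepare_features(sample)
positive_rate = prepared["y"].mean()  # share of "yes" rows, a quick imbalance check
print(prepared)
print(f"Share of 'yes': {positive_rate:.2%}")
```

On the real data, the same imbalance check would be run after loading `bank.csv` with `pd.read_csv`. Keeping `drop_duration` as a parameter makes it easy to compare a benchmark model that uses `duration` against a realistic one that does not.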


In [None]:
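import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Render plots inline (IPython magic; remove this line when running as a plain script)
%matplotlib inline

# Widen pandas display limits so wide frames print in full
pd.set_option('display.max_rows', 130)      # show up to 130 rows before truncating
pd.set_option('display.max_columns', None)  # show all columns
pd.set_option('display.width', 1000)        # avoid column wrapping

# Silence library warnings to keep output readable (this also hides deprecation notices)
warnings.filterwarnings("ignore")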