# Classification in Machine Learning: Bank Term Deposit Subscription

Classification in Machine Learning is a type of supervised learning where the goal is to predict the category or class of an input data point based on its features. The model is trained on a labeled dataset, where each input data point has a corresponding class label.

**Key Characteristics of Classification:**
1. **Discrete Output:** The output of a classification model is categorical, meaning the predictions belong to predefined classes or categories.
2. **Supervised Learning:** It requires labeled data for training, where each input is paired with its correct class label.
3. **Decision Boundary:** The model learns a decision boundary that separates different classes in the feature space.
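
To make the three characteristics concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. The toy points, labels, and function names are invented for illustration and are unrelated to the bank dataset. In this sketch the decision boundary is the set of points equidistant from the two class centroids.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Supervised step: average the labeled points of each class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Discrete output: return the label of the closest centroid."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


# Hypothetical toy data: two features per point, two classes.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["no", "no", "yes", "yes"]

centroids = fit_centroids(train_points, train_labels)
print(predict(centroids, (1.2, 1.4)))
print(predict(centroids, (5.5, 5.0)))
```

The model can only ever answer with one of the labels it saw during training, which is what distinguishes classification from regression.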

The dataset comes from a bank marketing campaign. The target variable 'y' records whether a client subscribed to a term deposit and is binary, taking the values 'yes' or 'no'.


#### Step-by-step process:

**Step 0 - Load and Explore the Data**

**Step 1 - Data Preprocessing**

* Handle missing values
* Encode categorical variables
* Split the data into training and testing sets
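
The three preprocessing bullets can be sketched on a tiny made-up frame before applying them to the real file. The column values below are hypothetical, and the sketch assumes pandas and scikit-learn are installed; it mirrors the approach used later in this notebook (mode imputation, one LabelEncoder per column, a random train/test split) rather than prescribing the only way to do it.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical miniature frame; the real columns come from bank.csv.
toy = pd.DataFrame({
    "age": [30, 41, 52, 28, 39, 47, 33, 60],
    "marital": ["single", "married", None, "single",
                "married", "divorced", "married", "married"],
    "y": ["no", "no", "yes", "no", "yes", "no", "no", "yes"],
})

# 1. Handle missing values: fill each column's gaps with its mode.
for col in toy.columns:
    if toy[col].isnull().any():
        toy[col] = toy[col].fillna(toy[col].mode()[0])

# 2. Encode categorical variables, keeping one fitted encoder per column.
encoders = {}
for col in ["marital", "y"]:
    encoders[col] = LabelEncoder()
    toy[col] = encoders[col].fit_transform(toy[col])

# 3. Split into training and testing sets.
X = toy.drop(columns="y")
y = toy["y"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)
```

Keeping the fitted encoders in a dictionary matters later: the same mappings are needed to encode new inputs and to decode predictions.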

**Step 2 - Feature Selection: Select relevant features for the model (this walkthrough keeps every column except the target 'y')**

**Step 3 - Model Training: Train a machine learning model (e.g., Logistic Regression, Random Forest, etc.)**

**Step 4 - Model Saving: Save the trained model for later use.**

**Step 5 - Streamlit App: Create a Streamlit app to get predictions using the trained model.**

Let's start with data loading.

**Step 0 to Step 4: Preprocess, Train, and Save the Model and Label Encoders**

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
import joblib

# Step 0: Load the data
file_path = 'bank.csv'
data = pd.read_csv(file_path)
print(data.head())  # a bare data.head() in the middle of a cell displays nothing

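# Step 1: Preprocessing - check for and handle missing values
missing_values = data.isnull().sum()
if missing_values.any():
    print("Missing values found:\n", missing_values)
    # Fill each affected column with its mode (most frequent value) for simplicity
    for col in data.columns:
        if data[col].isnull().any():
            # Assign the result back: fillna(inplace=True) on a selected column
            # is chained assignment and is unreliable in recent pandas versions
            data[col] = data[col].fillna(data[col].mode()[0])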

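# Encode categorical variables; this includes the target 'y', whose encoder
# the Streamlit app later uses to decode predictions
label_encoders = {}
categorical_columns = data.select_dtypes(include=['object']).columns

for col in categorical_columns:
    le = LabelEncoder()
    data[col] = le.fit_transform(data[col])  # fit and transform in one step
    label_encoders[col] = le  # keep the fitted encoder for reuse at prediction time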

# Save the label encoders so the Streamlit app can reuse the same mappings
joblib.dump(label_encoders, 'label_encoders.joblib')

# Print the encoded column names as a quick sanity check
print("Label encoders saved, keys:", label_encoders.keys())

# Step 2: Split the data into features (X) and target (y); every column except 'y' is kept as a feature
X = data.drop('y', axis=1)
y = data['y']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 3: Train and evaluate the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(report)

# Step 4: Save the trained model (the label encoders were already saved above)
model_path = 'rf_model.joblib'
joblib.dump(model, model_path)
print(f"Model saved at {model_path}")


Label encoders saved, keys: dict_keys(['job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'poutcome', 'y'])
Accuracy: 0.9133284777858703
              precision    recall  f1-score   support

           0       0.94      0.96      0.95      7303
           1       0.65      0.51      0.57       935

    accuracy                           0.91      8238
   macro avg       0.79      0.74      0.76      8238
weighted avg       0.91      0.91      0.91      8238

Model saved at rf_model.joblib
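
The report is easier to read next to a baseline. The sketch below reuses only numbers printed above (the supports 7303 and 935, and the precision and recall for class 1) to compute the accuracy of always predicting class 0 and to re-derive the F1 score. The variable names are illustrative.

```python
# Values copied from the classification report above.
support_no, support_yes = 7303, 935
precision_yes, recall_yes = 0.65, 0.51

# Accuracy of a trivial model that always predicts the majority class (0).
majority_baseline = support_no / (support_no + support_yes)

# F1 is the harmonic mean of precision and recall.
f1_yes = 2 * precision_yes * recall_yes / (precision_yes + recall_yes)

print(f"Majority-class baseline accuracy: {majority_baseline:.4f}")
print(f"F1 for class 1: {f1_yes:.2f}")
```

Because roughly 89% of the test rows belong to class 0, an accuracy of 0.91 is only a modest improvement over the baseline. The recall of 0.51 for class 1 means about half of the actual subscribers in the test set are missed, so the per-class rows say more about this model than the overall accuracy does.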


**Step 5: Create a Streamlit App for Prediction**

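Save the following code as `app_class.py` in the same folder as `rf_model.joblib` and `label_encoders.joblib`, then start it from a terminal (for example, the VS Code integrated terminal) with `streamlit run app_class.py`. The app must be launched this way; executing the code as a notebook cell does not start it.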

In [None]:
import streamlit as st
import joblib
import pandas as pd

# Load the model and label encoders
model = joblib.load('rf_model.joblib')
label_encoders = joblib.load('label_encoders.joblib')
# Categorical feature columns plus the target 'y', whose encoder decodes predictions
expected_columns = ['job', 'marital', 'education', 'default', 'housing',
                    'loan', 'contact', 'month', 'day_of_week', 'poutcome', 'y']

# Stop early if any required encoder is missing
missing_columns = set(expected_columns) - set(label_encoders.keys())
if missing_columns:
    st.error(f"Missing label encoders for: {sorted(missing_columns)}")
    st.stop()

# Title
st.title('Bank Marketing Prediction')

# Collecting user input
def get_user_input():
    age = st.number_input('Age', min_value=18, max_value=100, value=30)
    job = st.selectbox('Job', options=list(label_encoders['job'].classes_))
    marital = st.selectbox('Marital', options=list(label_encoders['marital'].classes_))
    education = st.selectbox('Education', options=list(label_encoders['education'].classes_))
    default = st.selectbox('Default', options=list(label_encoders['default'].classes_))
    housing = st.selectbox('Housing', options=list(label_encoders['housing'].classes_))
    loan = st.selectbox('Loan', options=list(label_encoders['loan'].classes_))
    contact = st.selectbox('Contact', options=list(label_encoders['contact'].classes_))
    month = st.selectbox('Month', options=list(label_encoders['month'].classes_))
    day_of_week = st.selectbox('Day of Week', options=list(label_encoders['day_of_week'].classes_))
    duration = st.number_input('Duration', min_value=0, step=10, value=1)
    campaign = st.number_input('Campaign', min_value=1, step=1, value=1)
    pdays = st.number_input('Pdays', min_value=0, step=1, value=999)
    previous = st.number_input('Previous', min_value=0, step=1, value=0)
    poutcome = st.selectbox('Poutcome', options=list(label_encoders['poutcome'].classes_))
    emp_var_rate = st.number_input('Employment Variation Rate', value=1.0)
    # format='%.3f' shows all three decimals of the default values
    cons_price_idx = st.number_input('Consumer Price Index', value=93.994, format='%.3f')
    cons_conf_idx = st.number_input('Consumer Confidence Index', value=-36.4)
    euribor3m = st.number_input('Euribor 3 Month Rate', value=4.857, format='%.3f')
    nr_employed = st.number_input('Number of Employees', value=5191.0)

    user_input = {
        'age': age,
        'job': label_encoders['job'].transform([job])[0],
        'marital': label_encoders['marital'].transform([marital])[0],
        'education': label_encoders['education'].transform([education])[0],
        'default': label_encoders['default'].transform([default])[0],
        'housing': label_encoders['housing'].transform([housing])[0],
        'loan': label_encoders['loan'].transform([loan])[0],
        'contact': label_encoders['contact'].transform([contact])[0],
        'month': label_encoders['month'].transform([month])[0],
        'day_of_week': label_encoders['day_of_week'].transform([day_of_week])[0],
        'duration': duration,
        'campaign': campaign,
        'pdays': pdays,
        'previous': previous,
        'poutcome': label_encoders['poutcome'].transform([poutcome])[0],
        'emp.var.rate': emp_var_rate,
        'cons.price.idx': cons_price_idx,
        'cons.conf.idx': cons_conf_idx,
        'euribor3m': euribor3m,
        'nr.employed': nr_employed,
    }

    return pd.DataFrame([user_input])

# Get user input
user_input_df = get_user_input()

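# Prediction
if st.button('Predict'):
    prediction = model.predict(user_input_df)
    prediction_proba = model.predict_proba(user_input_df)

    # Decode the numeric prediction back to its original label ('yes' or 'no')
    predicted_label = label_encoders['y'].inverse_transform(prediction)[0]
    # Column 1 of predict_proba is encoded class 1, which is 'yes'
    # because LabelEncoder sorts its classes ('no' -> 0, 'yes' -> 1)
    proba_yes = prediction_proba[0][1]
    st.write(f"Prediction: {predicted_label}")
    st.write(f"Probability of Yes: {proba_yes:.2f}")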

