<a href="https://colab.research.google.com/github/halimcan/Home-Credit-Default-Project/blob/credit_card_balance_branch7/credit_card_balance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Import the required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Authenticate the Colab session so it can access BigQuery
from google.colab import auth
auth.authenticate_user()

In [3]:
# Install the BigQuery client library (google-cloud-bigquery)
!pip install --quiet google-cloud-bigquery
from google.cloud import bigquery

In [4]:
# Initialize the BigQuery client
client = bigquery.Client(project="homecredit-478707")

# Load the full Credit_card_balance table from BigQuery into a pandas DataFrame
query = """
SELECT *
FROM `homecredit-478707.Homecredit_Tables.Credit_card_balance`
"""
credit_card_balance = client.query(query).to_dataframe()

In [5]:
credit_card_balance.head(3)

|  | SK_ID_PREV | SK_ID_CURR | MONTHS_BALANCE | AMT_BALANCE | AMT_CREDIT_LIMIT_ACTUAL | AMT_DRAWINGS_ATM_CURRENT | AMT_DRAWINGS_CURRENT | AMT_DRAWINGS_OTHER_CURRENT | AMT_DRAWINGS_POS_CURRENT | AMT_INST_MIN_REGULARITY | ... | AMT_RECIVABLE | AMT_TOTAL_RECEIVABLE | CNT_DRAWINGS_ATM_CURRENT | CNT_DRAWINGS_CURRENT | CNT_DRAWINGS_OTHER_CURRENT | CNT_DRAWINGS_POS_CURRENT | CNT_INSTALMENT_MATURE_CUM | NAME_CONTRACT_STATUS | SK_DPD | SK_DPD_DEF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2740914 | 340339 | -1 | 131669.145 | 225000 | 10800.0 | 10800.0 | 0.0 | 0.0 | 6000.48 | ... | 127891.935 | 127891.935 | 2.0 | 2 | 0.0 | 0.0 | 6.0 | Active | 0 | 0 |
| 1 | 1598223 | 427122 | -6 | 44959.455 | 45000 | 450.0 | 1664.46 | 0.0 | 1214.46 | 2291.04 | ... | 43613.955 | 43613.955 | 1.0 | 3 | 0.0 | 2.0 | 30.0 | Active | 0 | 0 |
| 2 | 1128014 | 102143 | -3 | 0.0 | 135000 | 0.0 | 20419.74 | 0.0 | 20419.74 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 5 | 0.0 | 5.0 | 4.0 | Active | 0 | 0 |


# Primary Key Check

The natural primary key of the credit_card_balance table is the composite key:

(SK_ID_PREV, MONTHS_BALANCE)


MONTHS_BALANCE is the month of each monthly snapshot of the credit card account, counted relative to the application date: -1 is the most recent month, -2 the month before it, and so on.

SK_ID_PREV uniquely identifies a specific credit card account.

Together, these two fields uniquely define one monthly record for one credit card account.

Note:

SK_ID_CURR is not a unique identifier in this table, because a single customer (SK_ID_CURR) may have multiple credit cards, each represented by a different SK_ID_PREV.
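To make the key structure concrete, here is a small sketch on a hypothetical toy frame (all IDs are invented, not taken from the dataset): one customer holds two cards, so SK_ID_CURR repeats while the (SK_ID_PREV, MONTHS_BALANCE) pair stays unique.

```python
import pandas as pd

# Hypothetical toy data: customer 100001 holds two cards (SK_ID_PREV 1 and 2)
toy = pd.DataFrame({
    "SK_ID_PREV":     [1, 1, 2, 2, 3],
    "SK_ID_CURR":     [100001, 100001, 100001, 100001, 100002],
    "MONTHS_BALANCE": [-1, -2, -1, -2, -1],
})

# SK_ID_CURR alone repeats, so it cannot serve as the key of this table
curr_dupes = toy.duplicated(subset=["SK_ID_CURR"]).sum()

# The composite (SK_ID_PREV, MONTHS_BALANCE) pair identifies each row exactly once
pk_dupes = toy.duplicated(subset=["SK_ID_PREV", "MONTHS_BALANCE"]).sum()

# Number of distinct cards held by each customer
cards_per_customer = toy.groupby("SK_ID_CURR")["SK_ID_PREV"].nunique()

print("Duplicate SK_ID_CURR rows:", curr_dupes)
print("Duplicate PK rows:", pk_dupes)
print(cards_per_customer.to_dict())
```

The same `groupby(...).nunique()` call on the real table would show how many customers hold more than one card.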

In [7]:
duplicates = credit_card_balance.duplicated(
    subset=["SK_ID_PREV", "MONTHS_BALANCE"]
).sum()

print("Duplicate PK rows:", duplicates)
print("Total rows:", credit_card_balance.shape[0])

# 0 duplicates: (SK_ID_PREV, MONTHS_BALANCE) uniquely identifies each row


Duplicate PK rows: 0
Total rows: 3840312
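The check above covers uniqueness only; a primary key should also be fully populated. A minimal sketch of a reusable check that reports both, assuming a hypothetical helper name `check_primary_key` (not part of pandas or the original notebook) and an invented toy frame:

```python
import numpy as np
import pandas as pd


def check_primary_key(df: pd.DataFrame, cols: list) -> dict:
    """Hypothetical helper: report duplicate and null counts for a candidate key."""
    return {
        "rows": len(df),
        # Rows whose key combination already appeared earlier in the frame
        "duplicate_rows": int(df.duplicated(subset=cols).sum()),
        # A valid key must be fully populated: rows with any null key column
        "null_key_rows": int(df[cols].isna().any(axis=1).sum()),
    }


# Hypothetical toy frame with one missing MONTHS_BALANCE
toy = pd.DataFrame({
    "SK_ID_PREV":     [1, 1, 2],
    "MONTHS_BALANCE": [-1, -2, np.nan],
})

report = check_primary_key(toy, ["SK_ID_PREV", "MONTHS_BALANCE"])
print(report)
```

On the real table the call would be `check_primary_key(credit_card_balance, ["SK_ID_PREV", "MONTHS_BALANCE"])`.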


# Feature Engineering

# 1) Utilization Features

In [8]:
cc = credit_card_balance.copy()


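# UTILIZATION = balance / credit limit (1e-6 avoids division by zero)
cc["UTILIZATION"] = cc["AMT_BALANCE"] / (cc["AMT_CREDIT_LIMIT_ACTUAL"] + 1e-6)

# Sanity cap: clip ratios to [0, 5] so extreme values cannot dominate aggregates
cc["UTILIZATION_CLIPPED"] = cc["UTILIZATION"].clip(0, 5)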
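The `+ 1e-6` term prevents a division-by-zero error, but if a month has a credit limit of 0 and a positive balance, the raw ratio becomes enormous and can dominate the `mean` and `std` aggregations of UTILIZATION (the clipped version caps it at 5). A minimal sketch of one alternative, assuming a zero limit should be treated as "utilization undefined": replace 0 with NaN before dividing so pandas aggregations skip those months. The toy values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy months: the last one has a zero credit limit
toy = pd.DataFrame({
    "AMT_BALANCE":             [50_000.0, 0.0, 1_000.0],
    "AMT_CREDIT_LIMIT_ACTUAL": [100_000.0, 45_000.0, 0.0],
})

# Epsilon approach used in the notebook: the zero-limit month becomes ~1e9
eps_util = toy["AMT_BALANCE"] / (toy["AMT_CREDIT_LIMIT_ACTUAL"] + 1e-6)

# Alternative: a zero limit leaves utilization undefined (NaN), which mean/std skip
nan_util = toy["AMT_BALANCE"] / toy["AMT_CREDIT_LIMIT_ACTUAL"].replace(0, np.nan)

print(eps_util.tolist())
print(nan_util.tolist())
print("mean (epsilon):", eps_util.mean(), "| mean (NaN):", nan_util.mean())
```

Which treatment is preferable depends on how the model should see zero-limit months; the NaN version simply keeps them out of the ratio statistics.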


# 2) Payment Behavior

In [9]:
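# Minimum payment ratio = amount paid this month / minimum instalment
cc["MIN_PAYMENT_RATIO"] = cc["AMT_PAYMENT_CURRENT"] / (cc["AMT_INST_MIN_REGULARITY"] + 1e-6)

# Flag months where the minimum instalment (AMT_INST_MIN_REGULARITY) is missing (NaN)
cc["MISSING_MIN_PAYMENT_FLAG"] = cc["AMT_INST_MIN_REGULARITY"].isna().astype(int)

# Underpayment: paid less than the minimum instalment (ratio < 1)
cc["UNDERPAYMENT_FLAG"] = (cc["MIN_PAYMENT_RATIO"] < 1).astype(int)

# Overpayment: paid more than the minimum instalment (ratio > 1)
cc["OVERPAYMENT_FLAG"] = (cc["MIN_PAYMENT_RATIO"] > 1).astype(int)

# No payment this month (a NaN payment is not flagged, since NaN == 0 is False)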
cc["NO_PAYMENT_FLAG"] = (cc["AMT_PAYMENT_CURRENT"] == 0).astype(int)
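Two edge cases affect these flags: a month with a minimum instalment of 0 and a payment of 0 gets a ratio of 0 and is counted as an underpayment, while a month with a missing payment gets a NaN ratio and is never flagged (comparisons with NaN are False). A minimal sketch on invented toy months, comparing the notebook's flag with one possible alternative (an assumption, not the original method) that judges underpayment only when a positive minimum was required and a payment was recorded:

```python
import numpy as np
import pandas as pd

# Hypothetical toy months covering the edge cases
toy = pd.DataFrame({
    "AMT_PAYMENT_CURRENT":     [500.0, 2_000.0, 0.0, np.nan],
    "AMT_INST_MIN_REGULARITY": [1_000.0, 1_000.0, 0.0, 1_000.0],
})

ratio = toy["AMT_PAYMENT_CURRENT"] / (toy["AMT_INST_MIN_REGULARITY"] + 1e-6)

# Notebook logic: 0 / 1e-6 = 0 counts as underpayment, NaN < 1 is False
under_eps = (ratio < 1).astype(int)

# Alternative: only months with a positive required minimum and a recorded
# payment can be underpaid; all other months are flagged 0
required = toy["AMT_INST_MIN_REGULARITY"] > 0
paid_known = toy["AMT_PAYMENT_CURRENT"].notna()
under_alt = (
    required
    & paid_known
    & (toy["AMT_PAYMENT_CURRENT"] < toy["AMT_INST_MIN_REGULARITY"])
).astype(int)

print(under_eps.tolist())
print(under_alt.tolist())
```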

# 3) Drawings Behavior (ATM, POS, Other)

In [10]:
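# Total amount drawn this month across the ATM, POS, and other channels
cc["TOTAL_DRAWINGS"] = (
    cc["AMT_DRAWINGS_ATM_CURRENT"] +
    cc["AMT_DRAWINGS_POS_CURRENT"] +
    cc["AMT_DRAWINGS_OTHER_CURRENT"]
)

# Total number of drawing transactions this month across the same channels
cc["DRAWING_FREQUENCY"] = (
    cc["CNT_DRAWINGS_ATM_CURRENT"] +
    cc["CNT_DRAWINGS_POS_CURRENT"] +
    cc["CNT_DRAWINGS_OTHER_CURRENT"]
)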
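Summing columns with `+` propagates missing values: if any of the three components is NaN in a month, the total is NaN. A minimal sketch on invented toy months comparing this with a row-wise `DataFrame.sum(axis=1, min_count=1)`, which skips NaN but still returns NaN when all three components are missing. In the three rows displayed earlier, AMT_DRAWINGS_CURRENT equals the sum of the three components, so comparing against it is a possible sanity check on the full table.

```python
import numpy as np
import pandas as pd

cols = ["AMT_DRAWINGS_ATM_CURRENT", "AMT_DRAWINGS_POS_CURRENT", "AMT_DRAWINGS_OTHER_CURRENT"]

# Hypothetical toy months: the second has a missing ATM amount, the third is all missing
toy = pd.DataFrame({
    "AMT_DRAWINGS_ATM_CURRENT":   [450.0, np.nan, np.nan],
    "AMT_DRAWINGS_POS_CURRENT":   [1214.46, 300.0, np.nan],
    "AMT_DRAWINGS_OTHER_CURRENT": [0.0, 0.0, np.nan],
})

# Operator "+" (as in the notebook): any NaN makes the whole total NaN
plus_total = toy[cols[0]] + toy[cols[1]] + toy[cols[2]]

# Row-wise sum: NaN components are skipped; min_count=1 keeps all-missing months NaN
sum_total = toy[cols].sum(axis=1, min_count=1)

print(plus_total.tolist())
print(sum_total.tolist())
```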


# 4) Delinquency (DPD)

In [11]:
# Any days past due this month
cc["LATE_MONTH_FLAG"] = (cc["SK_DPD"] > 0).astype(int)
# Days past due with tolerance (small overdue amounts are ignored)
cc["SERIOUS_DPD_FLAG"] = (cc["SK_DPD_DEF"] > 0).astype(int)
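According to the Home Credit column descriptions, SK_DPD_DEF counts days past due with a tolerance that ignores debts with low amounts, so a month can be late by SK_DPD while SK_DPD_DEF stays 0. A small sketch on invented toy months showing how the two flags differ and what the later `mean` and `sum` aggregations report:

```python
import pandas as pd

# Hypothetical toy months for one card
toy = pd.DataFrame({
    "SK_DPD":     [0, 5, 12, 0],
    "SK_DPD_DEF": [0, 0, 12, 0],  # second month: overdue amount too small to count
})

toy["LATE_MONTH_FLAG"] = (toy["SK_DPD"] > 0).astype(int)
toy["SERIOUS_DPD_FLAG"] = (toy["SK_DPD_DEF"] > 0).astype(int)

# mean = share of months flagged, sum = number of months flagged
summary = toy[["LATE_MONTH_FLAG", "SERIOUS_DPD_FLAG"]].agg(["mean", "sum"])
print(summary)
```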

# 5) Categorical – Contract Status

In [12]:
# One indicator column per contract status value (e.g. CC_STATUS_Active)
cc_status = pd.get_dummies(cc["NAME_CONTRACT_STATUS"], prefix="CC_STATUS")
cc = pd.concat([cc, cc_status], axis=1)
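Averaging a one-hot column over a customer's months gives the share of months spent in that status, which is how these columns are aggregated later. A minimal sketch on invented toy data; it passes `dtype=int` (an addition, not in the original cell) because recent pandas versions return bool indicator columns by default:

```python
import pandas as pd

# Hypothetical toy months: customer 1 has 3 Active months and 1 Completed month
toy = pd.DataFrame({
    "SK_ID_CURR":           [1, 1, 1, 1, 2],
    "NAME_CONTRACT_STATUS": ["Active", "Active", "Active", "Completed", "Active"],
})

# One 0/1 column per status value, named with the CC_STATUS prefix
status = pd.get_dummies(toy["NAME_CONTRACT_STATUS"], prefix="CC_STATUS", dtype=int)
toy = pd.concat([toy, status], axis=1)

# Mean of each indicator per customer = share of that customer's months in the status
shares = toy.groupby("SK_ID_CURR")[list(status.columns)].mean()
print(shares)
```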

# 6) Customer-Level Aggregation (SK_ID_CURR)

In [15]:
# Aggregate the monthly rows so that each output row summarizes one customer's credit card behavior.
# The engineered features capture spending, repayment, and delinquency patterns.


agg_dict = {
    # Utilization statistics: how much of the card limit the customer uses
    "UTILIZATION": ["mean", "max", "min", "std"],     # Higher utilization is typically associated with higher credit risk
    "UTILIZATION_CLIPPED": ["mean", "max"],           # Capped version, less sensitive to extreme ratios

    # Payment behavior: minimum payment discipline
    "MIN_PAYMENT_RATIO": ["mean", "min", "max", "std"],  # <1 = underpayment, >1 = overpayment
    "UNDERPAYMENT_FLAG": ["mean", "sum"],                # Share and count of underpaid months
    "OVERPAYMENT_FLAG": ["mean", "sum"],                 # Share and count of overpaid months
    "NO_PAYMENT_FLAG": ["mean", "sum"],                  # Share and count of months with zero payment (a likely risk signal)
    "MISSING_MIN_PAYMENT_FLAG": ["mean"],                # Share of months with a missing (NaN) minimum instalment

    # Drawings behavior: ATM, POS, and other drawings, which indicate liquidity needs
    "TOTAL_DRAWINGS": ["mean", "sum", "max"],            # Amounts drawn (cash withdrawals and purchases)
    "DRAWING_FREQUENCY": ["mean", "sum", "max"],         # Number of drawing transactions

    # Delinquency features: past-due behavior
    "LATE_MONTH_FLAG": ["mean", "sum"],                  # SK_DPD > 0: late months
    "SERIOUS_DPD_FLAG": ["mean", "sum"],                 # SK_DPD_DEF > 0: past due on a non-negligible amount

    # Balance & limit behavior: debt level and credit limit
    "AMT_BALANCE": ["mean", "max"],                      # Card balance levels
    "AMT_CREDIT_LIMIT_ACTUAL": ["mean", "max"],          # Credit limit levels (a higher limit may indicate lower risk)
}

# Add the one-hot contract status columns (one per NAME_CONTRACT_STATUS value, e.g. Active, Completed)
for col in cc_status.columns:
    agg_dict[col] = ["mean"]                             # Share of months in each status category

# Perform the aggregation at the customer level
cc_agg = cc.groupby("SK_ID_CURR").agg(agg_dict)

# Flatten the MultiIndex columns, e.g. ("UTILIZATION", "mean") -> "CC_UTILIZATION_MEAN"
cc_agg.columns = ["CC_" + "_".join(col).upper() for col in cc_agg.columns]
cc_agg.reset_index(inplace=True)
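To see what the aggregation and flattening produce, here is a minimal sketch on invented toy rows with a reduced `agg_dict`; the flattening line is the same expression the notebook uses:

```python
import pandas as pd

# Hypothetical toy monthly rows for two customers
toy = pd.DataFrame({
    "SK_ID_CURR":      [1, 1, 1, 2],
    "UTILIZATION":     [0.2, 0.6, 1.0, 0.0],
    "LATE_MONTH_FLAG": [0, 1, 1, 0],
})

agg_dict = {
    "UTILIZATION": ["mean", "max"],
    "LATE_MONTH_FLAG": ["mean", "sum"],
}

# A dict of lists yields MultiIndex columns such as ("UTILIZATION", "mean")
agg = toy.groupby("SK_ID_CURR").agg(agg_dict)

# Join the two levels with "_", upper-case, and add the CC_ prefix
agg.columns = ["CC_" + "_".join(col).upper() for col in agg.columns]
agg = agg.reset_index()
print(agg)
```

Applied to the status indicators, this scheme yields names such as `CC_CC_STATUS_ACTIVE_MEAN`, because the dummy prefix already starts with `CC_`.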


# Merge into application_train

In [None]:
# application_train = application_train.merge(cc_agg, on="SK_ID_CURR", how="left")
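The left join keeps every applicant; applicants without any credit card history receive NaN in all CC_ columns. A minimal sketch with hypothetical stand-ins for application_train and cc_agg (both frames invented); `validate="one_to_one"` is an optional pandas check, added here as an assumption, that raises if either side has duplicate SK_ID_CURR values:

```python
import pandas as pd

# Hypothetical stand-ins for application_train and cc_agg
application = pd.DataFrame({"SK_ID_CURR": [1, 2, 3], "TARGET": [0, 1, 0]})
cc_agg_toy = pd.DataFrame({"SK_ID_CURR": [1, 3], "CC_UTILIZATION_MEAN": [0.6, 0.1]})

# how="left" keeps every applicant; validate raises MergeError on duplicate keys
merged = application.merge(cc_agg_toy, on="SK_ID_CURR", how="left", validate="one_to_one")

# Applicant 2 has no credit card rows, so its CC_ feature is NaN
print(merged)
print("Rows preserved:", len(merged) == len(application))
```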

# Credit Card Balance – Summary of Feature Engineering

Processed 3.84 million monthly credit card snapshots and aggregated them into customer-level behavioral features.

Engineered utilization metrics (balance-to-limit ratios) capturing financial stress and credit usage patterns.

Derived payment behavior indicators, including minimum payment discipline, underpayment frequency, and months with no payment.

Extracted cash withdrawal and POS spending characteristics through drawing amounts and transaction frequency.

Built delinquency features from SK_DPD and SK_DPD_DEF (days past due, the latter ignoring small overdue amounts) to quantify late and more serious overdue behavior.

Included contract status history by one-hot encoding NAME_CONTRACT_STATUS (e.g. Active, Completed) and aggregating it as the share of months in each status.

Aggregated all signals per customer (SK_ID_CURR) with mean, max, min, std, and sum, producing one behavioral profile per customer.

Final output: cc_agg, a customer-level table (one row per SK_ID_CURR) of credit card–based candidate features, ready to be merged into the main application data for model training.