**An interactive interface** lets users view and modify their own data in real time. Users enter their unique ID to retrieve their existing record, including personal and financial details. They see a summary of their data, such as age, occupation, annual income, and credit-related information, and can then edit it directly through input fields. The interface uses dropdowns for categorical features (e.g., occupation, payment behaviour) and sliders for continuous features such as credit history age.
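
As a rough sketch of what sits behind the dropdowns and sliders, the snippet below shows how a dropdown label could be mapped to its integer code and how a credit history string could be split into the years and months used as slider defaults. The helper names `parse_credit_history_age` and `encode_choice` and the shortened occupation mapping are illustrative assumptions, not part of the application itself.

```python
import re

# Shortened, illustrative mapping; the application defines the full one.
OCCUPATION_MAPPING = {0: "Accountant", 1: "Architect", 2: "Developer"}
OCCUPATION_REVERSE = {label: code for code, label in OCCUPATION_MAPPING.items()}


def parse_credit_history_age(text):
    """Return (years, months) from a string such as '22 Years and 9 Months'.

    Falls back to (0, 0) when the value does not match the expected format,
    for example when the cell is missing.
    """
    match = re.match(r"(\d+)\s+Years\s+and\s+(\d+)\s+Months", str(text))
    if match is None:
        return 0, 0
    return int(match.group(1)), int(match.group(2))


def encode_choice(label, reverse_mapping):
    """Translate the label picked in a dropdown back to its integer code."""
    return reverse_mapping[label]


years, months = parse_credit_history_age("22 Years and 9 Months")
credit_history_age = years + months / 12  # single numeric feature, in years
print(years, months, credit_history_age)
print(encode_choice("Developer", OCCUPATION_REVERSE))
```

Keeping the parsing in a small function makes the fallback for malformed values explicit and easy to check outside Streamlit.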

Once the user has updated their data, they can click a "Predict" button to see how the changes affect the predicted credit score and associated probabilities (e.g., likelihood of being classified as "Good" or "Poor").
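
The step from class probabilities to a displayed result can be sketched as follows. This is a minimal illustration, assuming a two-class output ordered as (Standard, Poor), which are the labels used in the messages the application displays; `summarize_prediction` is a hypothetical helper and the probabilities are hand-written stand-ins for the output of `model.predict_proba`.

```python
def summarize_prediction(probabilities, labels=("Standard", "Poor")):
    """Return (label, percent) for the most likely class of a single sample.

    `probabilities` has the shape of predict_proba output for one row,
    e.g. [[p_class0, p_class1]].
    """
    row = list(probabilities[0])
    best = max(range(len(row)), key=row.__getitem__)
    return labels[best], row[best] * 100


# Hand-written probabilities standing in for model.predict_proba(input_data).
label, percent = summarize_prediction([[0.27, 0.73]])
print(f"Predicted credit score: {label} ({percent:.2f}%)")
```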

**NOTE**

Streamlit can be used on Google Colab, but it is better suited to a plain Python script. The application can still be built and launched from Colab, but launching it takes a few extra steps, such as installing pyngrok and registering an account on the ngrok website. We therefore advise running **app.py** as a Python script and typing `streamlit run app.py` in a terminal to launch it.
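
Note that app.py reads its data and model files from Google Drive paths that exist only inside Colab, so running it as a local script means pointing those paths somewhere else. One possible way to do that is sketched below; the `CREDIT_APP_DIR` environment variable and the `resolve_path` helper are hypothetical and are not read by app.py as written.

```python
import os
from pathlib import Path

# Hypothetical environment variable; app.py does not read it as written.
BASE_DIR_ENV = "CREDIT_APP_DIR"
# Drive folder used by the notebook when running on Colab.
COLAB_DEFAULT = "/content/drive/MyDrive/NCKH 2024"


def resolve_path(relative, env=None):
    """Join `relative` onto the configured base folder.

    Uses the folder named by CREDIT_APP_DIR when it is set,
    otherwise falls back to the Colab Drive folder.
    """
    env = os.environ if env is None else env
    base = Path(env.get(BASE_DIR_ENV, COLAB_DEFAULT))
    return base / relative


# Pass an explicit mapping to see the override without touching os.environ.
print(resolve_path("data - NCKH-NEU/Xtrain.csv", env={}))
print(resolve_path("data - NCKH-NEU/Xtrain.csv", env={BASE_DIR_ENV: "./project"}))
```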

In [None]:
!pip install streamlit pyngrok



In [None]:
from google.colab import drive
drive.mount('/content/drive')
import pydeck as pdk
import pickle
from pyngrok import ngrok

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Application

In [None]:
%%writefile app.py
import streamlit as st
import pickle as pkl
import numpy as np
import pandas as pd
import warnings
from xgboost import XGBClassifier
# from alibi.explainers import AnchorTabular

Xtrain = pd.read_csv('/content/drive/MyDrive/NCKH 2024/data - NCKH-NEU/Xtrain.csv')
with open('/content/drive/MyDrive/NCKH 2024/Modeling - NCKH-NEU/soft_voting_clf.pkl', 'rb') as file:
    model = pkl.load(file)
with open('/content/drive/MyDrive/NCKH 2024/Modeling - NCKH-NEU/minmax_scaler.pkl', 'rb') as file:
    scaler = pkl.load(file)
data_display = pd.read_csv('/content/drive/MyDrive/NCKH 2024/data - NCKH-NEU/display_data.csv')

# # Define the function for suggesting changes
# def suggest_changes_for_poor_class(input_data, train = Xtrain):
#     feature_names = train.columns.tolist()

#     warnings.filterwarnings("ignore", message="X does not have valid feature names")

#     explainer = AnchorTabular(model.predict_proba, feature_names=feature_names)

#     train_array = train.to_numpy()
#     numerical_features = [train.columns.get_loc(col) for col in train.select_dtypes(include=['float', 'int']).columns]
#     categorical_features = ['Occupation', 'Payment_of_Min_Amount', 'Payment_Behaviour', 'Auto Loan',
#            'Credit-Builder Loan','Debt Consolidation Loan', 'Home Equity Loan',
#            'Mortgage Loan', 'No Loan','Not Specified', 'Payday Loan', 'Personal Loan', 'Student Loan']
#     for col in categorical_features:
#         numerical_features.append(train.columns.get_loc(col))
#     explainer.fit(train_array, numerical_features=numerical_features)

#     input_data_array = input_data.to_numpy()
#     input_data_scaled = scaler.transform(input_data_array)

#     explanation = explainer.explain(input_data_scaled[0])

#     anchor_rule = explanation.anchor
#     predicted_class = explanation.raw['prediction']

#     if predicted_class == 1:  # 'Poor' class
#         suggestion = f"The model predicts Poor.\n"
#         suggestion += "To improve the likelihood of being classified as Standard, you can consider:\n"
#         # Map the anchor rule back to the original (unscaled) values
#         for condition in anchor_rule:
#             if '<=' in condition:
#                 feature, threshold = condition.split(' <= ')
#                 # Get the corresponding scaled value and reverse the scaling
#                 scaled_value = float(threshold)
#                 original_value = scaler.inverse_transform([[scaled_value if col == feature else 0 for col in feature_names]])[0][feature_names.index(feature)]
#                 suggestion += f" - Try to increase {feature} to be greater than {round(original_value)}.\n"
#             elif '>' in condition:
#                 feature, threshold = condition.split(' > ')
#                 # Get the corresponding scaled value and reverse the scaling
#                 scaled_value = float(threshold)
#                 original_value = scaler.inverse_transform([[scaled_value if col == feature else 0 for col in feature_names]])[0][feature_names.index(feature)]
#                 suggestion += f" - Try to decrease {feature} to be less than {round(original_value)}.\n"
#         return suggestion
#     else:  # Not 'Poor' class
#         return f"The model predicts Standard.\n"

st.title("Credit Score Prediction and Suggestions")

# User ID input
st.markdown('<h2 style="font-size: 27px;">Enter your User ID:</h2>', unsafe_allow_html=True)
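# A non-empty label avoids Streamlit's empty-label warning; it stays hidden because the heading above already prompts the user
user_id = st.text_input("User ID", label_visibility="collapsed")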

cols_df = ['Annual_Income', 'Monthly_Inhand_Salary', 'Num_Bank_Accounts',
               'Num_Credit_Card', 'Interest_Rate', 'Num_of_Loan', 'Delay_from_due_date',
               'Num_of_Delayed_Payment', 'Changed_Credit_Limit', 'Num_Credit_Inquiries',
               'Outstanding_Debt', 'Credit_Utilization_Ratio', 'Total_EMI_per_month',
               'Amount_invested_monthly', 'Monthly_Balance']
cols_display = ['Annual Income', 'Monthly Inhand Salary', 'Number Bank Accounts',
                'Number Credit Card', 'Interest Rate', 'Number of Loan',
                'Total days delay from due date', 'Number of Delayed Payment', 'Percentage of change Credit Limit',
                'Number of Credit Inquiries', 'Outstanding Debt', 'Credit Utilization Ratio',
                'Total EMI per month','Amount invested monthly', 'Monthly Balance']

# Mappings for encoding
occupation_mapping = {0: 'Accountant', 1: 'Architect', 2: 'Developer', 3: 'Doctor', 4: 'Engineer',
                      5: 'Entrepreneur', 6: 'Journalist', 7: 'Lawyer', 8: 'Manager', 9: 'Mechanic',
                      10: 'Media Manager', 11: 'Musician', 12: 'Scientist', 13: 'Teacher', 14: 'Writer'}

payment_behaviour_mapping = {0: 'High spent Large value payments', 1: 'High spent Medium value payments',
                             2: 'High spent Small value payments', 3: 'Low spent Large value payments',
                             4: 'Low spent Medium value payments', 5: 'Low spent Small value payments'}

payment_of_min_amount_mapping = {0: 'No Minimum Payment Due', 1: 'Below Minimum Payment', 2: 'Pays Minimum Required'}

# Reverse mappings
occupation_reverse_mapping = {v: k for k, v in occupation_mapping.items()}
payment_behaviour_reverse_mapping = {v: k for k, v in payment_behaviour_mapping.items()}
payment_of_min_amount_reverse_mapping = {v: k for k, v in payment_of_min_amount_mapping.items()}

order = ['Age', 'Occupation', 'Annual_Income',
       'Monthly_Inhand_Salary', 'Num_Bank_Accounts', 'Num_Credit_Card',
       'Interest_Rate', 'Num_of_Loan', 'Delay_from_due_date',
       'Num_of_Delayed_Payment', 'Changed_Credit_Limit',
       'Num_Credit_Inquiries', 'Outstanding_Debt', 'Credit_Utilization_Ratio',
       'Credit_History_Age', 'Payment_of_Min_Amount', 'Total_EMI_per_month',
       'Amount_invested_monthly', 'Payment_Behaviour', 'Monthly_Balance',
       'Auto Loan', 'Credit-Builder Loan',
       'Debt Consolidation Loan', 'Home Equity Loan', 'Mortgage Loan',
       'No Loan', 'Not Specified', 'Payday Loan', 'Personal Loan',
       'Student Loan']


if user_id in data_display['ID'].values:
    st.markdown('<h2 style="font-size: 25px;"><strong>User Found!</strong></h2>', unsafe_allow_html=True)

    # Fetch and display existing user data
    user_data = data_display[data_display['ID'] == user_id].iloc[0]
    st.markdown('<h2 style="font-size: 20px;">Your Data</h2>', unsafe_allow_html=True)
    st.dataframe(user_data.drop(['ID', 'Customer ID', 'Credit Score']), use_container_width=True, height=400)

    # Show original class
    st.markdown(f'<p style="font-size: 25px;"><strong>Class:</strong> {user_data["Credit Score"]}</p>', unsafe_allow_html=True)

    # Allow user to modify data
    st.markdown('<h2 style="font-size: 30px;">Modify Your Data Below to see the result changes</h2>', unsafe_allow_html=True)
    modified_data = {}
    st.markdown(f'<p style="font-size: 18px;"><strong>ID:</strong> {user_data["ID"]}</p>', unsafe_allow_html=True)
    st.markdown(f'<p style="font-size: 18px;"><strong>Name:</strong> {user_data["Name"]}</p>', unsafe_allow_html=True)
    st.markdown(f'<p style="font-size: 18px;"><strong>Age:</strong> {user_data["Age"]}</p>', unsafe_allow_html=True)
    st.markdown(f'<p style="font-size: 18px;"><strong>Month:</strong> {user_data["Month"]}</p>', unsafe_allow_html=True)

    modified_data['Age'] = user_data['Age']

    for col_df, col_display in zip(cols_df, cols_display):
        st.markdown(f'<p style="font-size: 18px;"><strong>{col_display.capitalize()}:</strong> {user_data[col_display]}</p>', unsafe_allow_html=True)
        modified_data[col_df] = st.text_input(f"Edit {col_display}", value=str(user_data[col_display]))
        try:
            modified_data[col_df] = float(modified_data[col_df])
        except ValueError:
            st.error(f"Please enter a valid number for {col_display.capitalize()}.")

    # Credit History Age
    st.markdown('<p style="font-size: 18px;"><strong>Enter Your Credit History Age:</strong></p>', unsafe_allow_html=True)
    current_credit_history_age = user_data['Credit History Age']

    # Extract years and months from the string format
    import re
    # str() guards against missing (NaN) values, which would otherwise raise a TypeError
    match = re.match(r"(\d+)\s+Years\s+and\s+(\d+)\s+Months", str(current_credit_history_age))
    if match:
        years_from_data = int(match.group(1))
        months_from_data = int(match.group(2))
    else:
        years_from_data, months_from_data = 0, 0  # Fallback in case of format issues

    # Input fields with default values from the dataset
    years = st.slider("Years", min_value=0, max_value=100, value=years_from_data)
    months = st.slider("Months", min_value=0, max_value=11, value=months_from_data)

    credit_history_age_numerical = years + months / 12
    modified_data['Credit_History_Age'] = credit_history_age_numerical

    # Single-choice categorical features (showing original labels)
    st.markdown('<p style="font-size: 18px;"><strong>Occupation:</strong></p>', unsafe_allow_html=True)
    modified_data['Occupation'] = st.selectbox(
        "Choose Occupation",
        list(occupation_mapping.values()),
        index=occupation_reverse_mapping[user_data['Occupation']]
    )

    st.markdown('<p style="font-size: 18px;"><strong>Payment Behaviour:</strong></p>', unsafe_allow_html=True)
    modified_data['Payment_Behaviour'] = st.selectbox(
        "Choose Payment Behaviour",
        list(payment_behaviour_mapping.values()),
        index=payment_behaviour_reverse_mapping[user_data['Payment Behaviour']]
    )

    st.markdown('<p style="font-size: 18px;"><strong>Payment of Minimum Amount:</strong></p>', unsafe_allow_html=True)
    modified_data['Payment_of_Min_Amount'] = st.selectbox(
        "Is payment in minimum amount?",
        list(payment_of_min_amount_mapping.values()),
        index=payment_of_min_amount_reverse_mapping[user_data['Payment of Min Amount']]
    )

    # Map the selected values back to encoded values
    modified_data['Occupation'] = occupation_reverse_mapping[modified_data['Occupation']]
    modified_data['Payment_Behaviour'] = payment_behaviour_reverse_mapping[modified_data['Payment_Behaviour']]
    modified_data['Payment_of_Min_Amount'] = payment_of_min_amount_reverse_mapping[modified_data['Payment_of_Min_Amount']]

    # Multi-choice categorical features
    all_loan_type = sorted(
        data_display['Type of Loan']
        .str.replace("and ", "", regex=False)
        .str.split(", ")
        .explode()
        .unique()
    )
    user_loan_types = user_data['Type of Loan'].replace("and ", "").split(", ")

    st.markdown('<p style="font-size: 18px;"><strong>Type of Loan:</strong></p>', unsafe_allow_html=True)
    modified_data['Type_of_Loan'] = st.multiselect("Select Type of Loan", all_loan_type, default=user_loan_types)
    for loan_type in all_loan_type:
        modified_data[loan_type] = 1 if loan_type in modified_data['Type_of_Loan'] else 0
    modified_data.pop('Type_of_Loan')

    # Combine all features into a DataFrame
    input_data = pd.DataFrame([modified_data])
    input_data = input_data[order]

    if st.button("Predict"):
        # Get the probabilities for each class (Standard and Poor)
        y_prob = model.predict_proba(input_data)

        # Class 0 is Standard and class 1 is Poor
        prob_good = y_prob[0][0] * 100  # Probability of Standard (class 0)
        prob_poor = y_prob[0][1] * 100  # Probability of Poor (class 1)

        if prob_good > prob_poor:
            st.markdown(f'<p style="font-size: 20px; color: green;"><strong>Predicted credit score: Standard ({prob_good:.2f}% probability)</strong></p>', unsafe_allow_html=True)
        else:
            st.markdown(f'<p style="font-size: 20px; color: red;"><strong>Predicted credit score: Poor ({prob_poor:.2f}% probability)</strong></p>', unsafe_allow_html=True)
            # suggestion = suggest_changes_for_poor_class(input_data)
            # st.markdown(f'<p style="font-size: 20px;"><strong>Suggestions:</strong></p>', unsafe_allow_html=True)
            # st.write(suggestion)


else:
    st.markdown('<h2 style="font-size: 20px;">If You Do Not Have An ID, Please Enter Your Data Below</h2>', unsafe_allow_html=True)
    # Collect user input
    user_data = {}
    user_data['Age'] = st.text_input("Age", value=0)
    try:
        user_data['Age'] = int(user_data['Age'])
    except ValueError:
        st.error("Please enter a valid number for Age.")

    # Numerical Features
    for col_df, col_display in zip(cols_df, cols_display):
        user_data[col_df] = st.text_input(
            f"{col_display.capitalize()}",
            value=0
        )
        try:
            user_data[col_df] = float(user_data[col_df])
        except ValueError:
            st.error(f"Please enter a valid number for {col_display.capitalize()}.")

    years = st.slider("Years", min_value=0, max_value = 100, value=0)
    months = st.slider("Months", min_value=0, max_value=11, value=0)
    credit_history_age_numerical = years + months / 12
    user_data['Credit_History_Age'] = credit_history_age_numerical

    # Single-choice categorical features (showing original labels)
    user_data['Occupation'] = st.selectbox(
        "Occupation",
        list(occupation_mapping.values())
    )

    user_data['Payment_Behaviour'] = st.selectbox(
        "Payment Behaviour",
        list(payment_behaviour_mapping.values())
    )

    user_data['Payment_of_Min_Amount'] = st.selectbox(
        "Payment of Minimum Amount",
        list(payment_of_min_amount_mapping.values())
    )

    # Map the selected values back to encoded values
    user_data['Occupation'] = occupation_reverse_mapping[user_data['Occupation']]
    user_data['Payment_Behaviour'] = payment_behaviour_reverse_mapping[user_data['Payment_Behaviour']]
    user_data['Payment_of_Min_Amount'] = payment_of_min_amount_reverse_mapping[user_data['Payment_of_Min_Amount']]

    # Multi-choice categorical features
    all_loan_type = sorted(
        data_display['Type of Loan']
        .str.replace("and ", "", regex=False)
        .str.split(", ")
        .explode()
        .unique()
    )
    user_data['Type_of_Loan'] = st.multiselect("Type of Loan", all_loan_type)
    for loan_type in all_loan_type:
        user_data[loan_type] = 1 if loan_type in user_data['Type_of_Loan'] else 0
    user_data.pop('Type_of_Loan')
    # Combine all features into a DataFrame in the column order the model expects
    user_input = pd.DataFrame([user_data])
    user_input = user_input[order]

    if st.button("Predict"):
        # Get the probabilities for each class (Standard and Poor)
        y_prob = model.predict_proba(user_input)

        # Class 0 is Standard and class 1 is Poor
        prob_good = y_prob[0][0] * 100  # Probability of Standard (class 0)
        prob_poor = y_prob[0][1] * 100  # Probability of Poor (class 1)

        # Display the result with percentages
        st.markdown(f'<p style="font-size: 20px; color: green;"><strong>Probability of Standard: {prob_good:.2f}%</strong></p>', unsafe_allow_html=True)
        st.markdown(f'<p style="font-size: 20px; color: red;"><strong>Probability of Poor: {prob_poor:.2f}%</strong></p>', unsafe_allow_html=True)


        # # Suggest changes for the class
        # suggestion = suggest_changes_for_poor_class(user_input)
        # st.markdown(f'<p style="font-size: 20px;"><strong>Suggestions:</strong></p>', unsafe_allow_html=True)
        # st.write(suggestion)


Overwriting app.py


# Run interface

To launch the app on Colab, you have to use pyngrok and register an account on the ngrok website. An authtoken is needed to authorise usage; it is free and can be obtained after signing up for an account. For convenience, an authtoken is already in place.

Run cells 1, 2, and 3 in that order. Then click the first link in the output of cell 2 (the ngrok URL) to open the interface. The links printed by cell 3 will be blocked, as Colab does not expose them directly.
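
For readers who prefer a single entry point, the tunnel and server steps could be combined roughly as follows. This is only a sketch: `build_streamlit_command` and `launch_with_tunnel` are hypothetical helpers, and the tunnel part assumes pyngrok is installed and an authtoken has already been saved.

```python
import subprocess
import sys


def build_streamlit_command(script="app.py", port=8502):
    """Build the argument list equivalent to `streamlit run app.py --server.port 8502`."""
    return [sys.executable, "-m", "streamlit", "run", script, "--server.port", str(port)]


def launch_with_tunnel(script="app.py", port=8502):
    """Start Streamlit in the background and open an ngrok tunnel to its port."""
    # Imported lazily so the command builder works without pyngrok installed.
    from pyngrok import ngrok

    ngrok.kill()  # stop any tunnels left over from a previous run
    server = subprocess.Popen(build_streamlit_command(script, port))
    tunnel = ngrok.connect(port)
    return server, tunnel.public_url


# Show the command that would be run, without starting anything.
print(" ".join(build_streamlit_command()[1:]))
```

Calling `launch_with_tunnel()` would return the server process and the public URL to open; terminating the returned process stops the app.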

In [None]:
#Cell 1

!ngrok authtoken 2qnF0dEZHMTsjxp2BEsikpa23fK_2neHTnR9hLQbMdKpY9Nwa

Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


In [None]:
#Cell 2

ngrok.kill()  # Stops any active tunnels
url = ngrok.connect(8502)  # Creates a fresh tunnel on a new port
print(f"New Streamlit app URL: {url}")

New Streamlit app URL: NgrokTunnel: "https://a113-35-197-105-138.ngrok-free.app" -> "http://localhost:8502"


In [None]:
#Cell 3

!streamlit run app.py --server.port 8502 &


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8502
  Network URL: http://172.28.0.12:8502
  External URL: http://35.197.105.138:8502

https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations
configuration generated by an older version of XGBoost, please export the model by calling
`Booster.save_model` from that version first, then load it back in current version. See:

    https://xgboost.readthedocs.io/en/stable/tutorials/saving_model.html

for more details about differences between saving model and serializing.
