Visualization 1:



Risk Profile Dashboard:

An interactive dashboard in which users select loan attributes (e.g., income category, home ownership, loan term) and see how these factors relate to loan condition.

Interactive controls such as filters, sliders, and drop-down menus let users explore different scenarios and see how each variable relates to loan risk. The version below uses multi-select drop-down menus.

In [1]:
%pip install pandas dash plotly scikit-learn


Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd

# Load the dataset.
# A github.com ".../blob/..." URL serves GitHub's HTML file viewer, not the CSV itself,
# and pandas fails to tokenize that page (ParserError), so read from raw.githubusercontent.com.
df = pd.read_csv('https://raw.githubusercontent.com/movcha/team_project/main/data/mortgage.csv')

# Display the first few rows to understand the structure
print(df.head())
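The ParserError comes from the URL rather than the data: a github.com link containing /blob/ serves GitHub's HTML file viewer, so pandas tries to parse a web page as CSV. The file content itself is served from raw.githubusercontent.com. A minimal sketch of that rewrite, using a hypothetical helper github_blob_to_raw and assuming the branch name contains no "/":

```python
from urllib.parse import urlparse


def github_blob_to_raw(url: str) -> str:
    """Rewrite https://github.com/<owner>/<repo>/blob/<branch>/<path>
    into https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>.

    Hypothetical helper; assumes the branch name contains no "/".
    """
    parsed = urlparse(url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    parts = parsed.path.strip("/").split("/")
    # Expected layout: owner / repo / "blob" / branch / path...
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a blob URL: {url}")
    owner, repo, _, branch, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"


raw_url = github_blob_to_raw(
    "https://github.com/movcha/team_project/blob/main/data/mortgage.csv"
)
print(raw_url)
```

The resulting URL can be passed straight to pd.read_csv.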


In [None]:
# Dash 2+ provides dcc, html, Input and Output directly from the dash package;
# the separate dash_core_components / dash_html_components packages are deprecated
from dash import Dash, dcc, html, Input, Output
import plotly.express as px

# Initialize the Dash app
app = Dash(__name__)

# Define the layout of the app
app.layout = html.Div([
    html.H1("Interactive Risk Profile Dashboard"),

    # Dropdown for Income Category (every category selected initially)
    dcc.Dropdown(
        id='income-dropdown',
        options=[{'label': cat, 'value': cat} for cat in df['income_category'].unique()],
        value=df['income_category'].unique().tolist(),
        multi=True
    ),

    # Dropdown for Home Ownership
    dcc.Dropdown(
        id='home-ownership-dropdown',
        options=[{'label': ho, 'value': ho} for ho in df['home_ownership'].unique()],
        value=df['home_ownership'].unique().tolist(),
        multi=True
    ),

    # Dropdown for Loan Term
    dcc.Dropdown(
        id='loan-term-dropdown',
        options=[{'label': term, 'value': term} for term in df['term'].unique()],
        value=df['term'].unique().tolist(),
        multi=True
    ),

    # Graph for visualizing the filtered data
    dcc.Graph(id='risk-profile-graph')
])

# Define the callback to update the graph based on the dropdown selections
@app.callback(
    Output('risk-profile-graph', 'figure'),
    Input('income-dropdown', 'value'),
    Input('home-ownership-dropdown', 'value'),
    Input('loan-term-dropdown', 'value')
)
def update_graph(selected_income, selected_home_ownership, selected_loan_term):
    # A cleared dropdown may send None; treat it as an empty selection so isin() does not fail
    filtered_df = df[
        df['income_category'].isin(selected_income or []) &
        df['home_ownership'].isin(selected_home_ownership or []) &
        df['term'].isin(selected_loan_term or [])
    ]

    # Create a scatter plot with Plotly Express, one color per loan condition
    fig = px.scatter(
        filtered_df,
        x='annual_inc',
        y='loan_amount',
        color='loan_condition',
        title="Loan Amount vs. Annual Income",
        labels={"annual_inc": "Annual Income", "loan_amount": "Loan Amount"}
    )

    # Update layout for better readability
    fig.update_layout(
        xaxis_title='Annual Income',
        yaxis_title='Loan Amount',
        legend_title='Loan Condition'
    )

    return fig

# Run the app (app.run replaces the deprecated app.run_server)
if __name__ == '__main__':
    app.run(debug=True)


Running the App:
Step 1 -
Save the data-loading code and the dashboard code together as app.py (the dashboard needs df) and run it from a terminal:

python app.py

Step 2 -
Access the interactive dashboard at http://127.0.0.1:8050/
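The description above mentions sliders, while the app uses only drop-down menus. One way to keep the filtering testable outside Dash is to pull it out of the callback into a plain pandas function. The sketch below does that and adds an optional (min, max) annual-income bound of the kind a dcc.RangeSlider would supply. The function name filter_loans and the sample rows are hypothetical; the column names are the ones the dashboard code uses.

```python
import pandas as pd


def filter_loans(df, incomes, ownerships, terms, income_range=None):
    """Apply the dashboard's drop-down filters, plus an optional
    (min, max) annual-income range such as a range slider would supply.

    Hypothetical helper for illustration.
    """
    # A cleared multi-select drop-down can send None or []; both mean "no rows match"
    mask = (
        df["income_category"].isin(incomes or [])
        & df["home_ownership"].isin(ownerships or [])
        & df["term"].isin(terms or [])
    )
    if income_range is not None:
        low, high = income_range
        # between() is inclusive on both ends by default
        mask &= df["annual_inc"].between(low, high)
    return df[mask]


# Made-up rows for illustration only
sample = pd.DataFrame({
    "income_category": ["Low", "Low", "Medium", "High"],
    "home_ownership": ["RENT", "MORTGAGE", "RENT", "OWN"],
    "term": ["36 months", "60 months", "36 months", "36 months"],
    "annual_inc": [44400, 100000, 45000, 250000],
})

subset = filter_loans(
    sample, ["Low", "Medium"], ["RENT"], ["36 months"], income_range=(40000, 50000)
)
print(subset)
```

Inside update_graph, the three isin conditions could then be replaced by a call such as filter_loans(df, selected_income, selected_home_ownership, selected_loan_term), with a slider's value passed as income_range.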




Visualization 2:

Correlation Matrix Heatmap:
A heatmap of the correlations between numeric variables (e.g., interest rate, dti, income, loan amount) and the encoded loan condition. It shows how each variable relates to loan outcome and helps identify the key factors influencing loan risk.

In [None]:
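from sklearn.preprocessing import LabelEncoder

# Initialize LabelEncoder
label_encoder = LabelEncoder()

# Encode categorical columns as integers.
# LabelEncoder assigns codes in sorted (alphabetical) label order, and fit_transform
# refits on every call, so afterwards label_encoder.classes_ describes only loan_condition.
df['home_ownership_encoded'] = label_encoder.fit_transform(df['home_ownership'])
df['income_category_encoded'] = label_encoder.fit_transform(df['income_category'])
df['loan_condition_encoded'] = label_encoder.fit_transform(df['loan_condition'])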

# Display the DataFrame with encoded columns
print(df.head())


        id  year     issue_d  final_d  emp_length_int home_ownership  \
0  1062177  2011  01/12/2011  1062013             2.0       MORTGAGE   
1  1049352  2011  01/12/2011  1042013             1.0       MORTGAGE   
2  1062976  2011  01/12/2011  1042013            10.0           RENT   
3  1058564  2011  01/12/2011  1122014             6.0           RENT   
4  1061837  2011  01/12/2011  1122014             7.0           RENT   

  income_category  annual_inc  income_cat  loan_amount  ... grade_cat    dti  \
0             Low       44400           1        15000  ...         4   3.59   
1             Low      100000           1         6600  ...         2  15.53   
2             Low       45000           1         4000  ...         5  15.20   
3             Low       57600           1         8000  ...         1  11.52   
4             Low       60000           1        15000  ...         1  12.78   

    total_pymnt total_rec_prncp  recoveries installment        region  \
0  17991.5300
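LabelEncoder sorts labels alphabetically before assigning codes, so an income_category with the labels Low, Medium and High would be encoded High=0, Low=1, Medium=2. That scrambles the natural order a correlation coefficient relies on. Below is a sketch of an order-preserving alternative with pandas, assuming exactly those three labels (check df['income_category'].unique() first); INCOME_ORDER and the sample Series are illustrative. The df.head() output above also shows an income_cat column with value 1 for Low incomes; if it already follows Low < Medium < High, it may serve the same purpose.

```python
import pandas as pd

# Assumed label set and order; verify against the real column before use
INCOME_ORDER = ["Low", "Medium", "High"]

incomes = pd.Series(["Low", "High", "Medium", "Low"])

# Alphabetical codes, mirroring how LabelEncoder orders its classes_
alphabetical = {label: code for code, label in enumerate(sorted(incomes.unique()))}
alphabetical_codes = incomes.map(alphabetical).tolist()
print(alphabetical_codes)

# Order-preserving codes: Low < Medium < High (labels outside INCOME_ORDER become -1)
ordinal_codes = pd.Categorical(incomes, categories=INCOME_ORDER, ordered=True).codes.tolist()
print(ordinal_codes)
```

The same approach does not help home_ownership, which has no natural order; its integer codes are arbitrary either way.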

In [None]:
# Select numerical columns including encoded ones
numerical_df = df[['annual_inc', 'loan_amount', 'interest_rate', 'dti', 'home_ownership_encoded', 'income_category_encoded', 'loan_condition_encoded']]

# Calculate the correlation matrix
corr_matrix = numerical_df.corr()

# Display the correlation matrix
print(corr_matrix)


                         annual_inc  loan_amount  interest_rate       dti  \
annual_inc                 1.000000     0.448802      -0.046938 -0.240445   
loan_amount                0.448802     1.000000       0.124681 -0.117306   
interest_rate             -0.046938     0.124681       1.000000  0.204087   
dti                       -0.240445    -0.117306       0.204087  1.000000   
home_ownership_encoded    -0.132741    -0.148310       0.110455  0.031725   
income_category_encoded    0.112003     0.187335      -0.052455 -0.089661   
loan_condition_encoded     0.081224     0.000517      -0.124773 -0.062724   

                         home_ownership_encoded  income_category_encoded  \
annual_inc                            -0.132741                 0.112003   
loan_amount                           -0.148310                 0.187335   
interest_rate                          0.110455                -0.052455   
dti                                    0.031725                -0.089661   
hom
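DataFrame.corr() computes the Pearson coefficient by default, r = cov(x, y) / (std(x) * std(y)). Computing it by hand on the annual_inc and loan_amount values from the first five rows printed earlier shows what each cell of the matrix means. For a column with only two codes, such as loan_condition_encoded when loan_condition has two labels, the same formula gives the point-biserial correlation. The helper name pearson is hypothetical, and unlike pandas it does not skip missing values.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Pearson r = sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2)), where dx, dy are deviations from the mean.

    Hypothetical helper; assumes no missing values (pandas drops NaN pairs instead).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# annual_inc and loan_amount for the five rows shown by df.head() above
sample = pd.DataFrame({
    "annual_inc": [44400, 100000, 45000, 57600, 60000],
    "loan_amount": [15000, 6600, 4000, 8000, 15000],
})

r_manual = pearson(sample["annual_inc"], sample["loan_amount"])
r_pandas = sample.corr().loc["annual_inc", "loan_amount"]
print(round(r_manual, 4), round(r_pandas, 4))
```

Five rows are far too few to say anything about the dataset; the point is only that the two computations agree.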

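The heatmap below uses color to represent correlation values. With the Viridis colorscale and its automatic range, dark purple marks the lowest (most negative) coefficient in the matrix and yellow the highest, so darkness alone does not indicate strength: a strong negative correlation appears dark, while a strong positive one appears bright.
We will look for strong correlations (close to 1 or -1) between the variables and loan condition. The encoded home_ownership column has no natural order, so its coefficients depend on arbitrary codes and should be read with caution.

For example, LabelEncoder codes labels alphabetically. If the two loan_condition labels sort with the bad outcome first (e.g. "Bad Loan" before "Good Loan"), a negative correlation between interest_rate and loan_condition_encoded indicates that higher interest rates are associated with worse loan conditions. The matrix above shows a value of about -0.12, which is a weak association.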
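To identify the key factors numerically, the loan_condition_encoded column of the matrix can be sorted by absolute value while keeping the sign. A sketch with a hypothetical helper rank_by_correlation; the demo uses the four coefficients visible in the loan_condition_encoded row printed above, and with the full matrix you would pass corr_matrix directly.

```python
import pandas as pd


def rank_by_correlation(corr: pd.DataFrame, target: str) -> pd.Series:
    """Return the other variables ordered by |correlation| with `target`,
    keeping the sign so the direction of each relationship stays visible.

    Hypothetical helper for illustration.
    """
    col = corr[target].drop(labels=target)
    return col.reindex(col.abs().sort_values(ascending=False).index)


# Coefficients copied from the visible loan_condition_encoded row of the printed matrix
loan_corr = pd.Series({
    "annual_inc": 0.081224,
    "loan_amount": 0.000517,
    "interest_rate": -0.124773,
    "dti": -0.062724,
    "loan_condition_encoded": 1.0,
})
demo_corr = pd.DataFrame({"loan_condition_encoded": loan_corr})

ranked = rank_by_correlation(demo_corr, "loan_condition_encoded")
print(ranked)
```

On these values interest_rate ranks first, but all four visible coefficients are below 0.13 in absolute value, so none of them shows a strong linear relationship with loan condition.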

In [None]:
import plotly.graph_objects as go

# Create the heatmap; x and y are the variable names, so Plotly labels both axes with them
fig = go.Figure(data=go.Heatmap(
    z=corr_matrix.values,
    x=corr_matrix.columns,
    y=corr_matrix.columns,
    colorscale='Viridis',
    colorbar=dict(title='Correlation')
))

# Update layout for better readability
fig.update_layout(
    title='Correlation Matrix Heatmap',
    xaxis_title='Variables',
    yaxis_title='Variables'
)

# Show the heatmap
fig.show()
