**Introduction to the Company Credit Risk Predictor**

This Jupyter notebook is part of a project that assesses company credit risk by predicting whether a company's credit rating is investment grade or junk grade. It sets up an interactive environment in which users select a machine learning model and enter a company's ticker symbol. When executed, the selected model processes the company's data and returns a prediction, along with the model's metrics and visualizations such as its confusion matrix and feature importance plot.

This user-friendly tool serves as a practical aid for investors and financial analysts, offering insights into potential credit risks associated with different companies.


In [1]:
from ipywidgets import interact, widgets
from IPython.display import clear_output


In [2]:
import sys
sys.path.append('../scripts')  
from functions import process_data_and_predict


In [7]:
from ipywidgets import interact, widgets, Output, HTML
from IPython.display import clear_output, display
from functions import process_data_and_predict
from PIL import Image
import os
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.expand_frame_repr', True)

# Define model types and model numbers
model_types = ['random_forest', 'random_forest_kfolding', 'gbm', 'svc', 'xgboost', 'deeplearning']
model_numbers = [1, 2, 3, 4, 5]

# Create dropdown widgets
model_type_dropdown = widgets.Dropdown(options=model_types, description='Model Type:')
model_number_dropdown = widgets.Dropdown(options=model_numbers, description='Model Number:')
ticker_input = widgets.Text(description='Ticker:', value='AAPL')  # Set default value for Ticker
execute_button = widgets.Button(description='Execute')

# Create Output widgets to display the results and the image
output_widget = Output()
image_output_widget = Output()

# Define a variable to store the current value of the ticker input
current_ticker_value = 'AAPL'

# Define callback function to update the current ticker value when the input changes
def update_ticker_value(change):
    global current_ticker_value
    current_ticker_value = change.new

# Observe changes to the value attribute of the ticker input widget
ticker_input.observe(update_ticker_value, names='value')

# Define callback function for button click
def on_execute_button_clicked(button):
    model_type = model_type_dropdown.value
    model_number = model_number_dropdown.value
    ticker = current_ticker_value  # Use the current value of the ticker input
    
    # Clear previous outputs
    with output_widget:
        clear_output()
    with image_output_widget:
        clear_output()
    
    predictions, metrics_df = process_data_and_predict(ticker, model_type, model_number)
    
    # Display predictions and metrics in the output widget
    with output_widget:
        print(f"Ticker: {ticker}")
        print("Predictions:")
        if predictions[0] == 0:
            print("Junk Grade")
        elif predictions[0] == 1:
            print("Investment Grade")
        else:
            print("Error: Unexpected prediction value")
        print("\nMetrics:")
        # Get the number of columns
        num_cols = len(metrics_df.columns)

        # Splitting the columns into two halves
        first_half_df = metrics_df.iloc[:, :num_cols // 2]
        second_half_df = metrics_df.iloc[:, num_cols // 2:]

        print(first_half_df.to_string(index=False))
        # Concatenate instead of passing two arguments, which would put a stray space before the table
        print("\n" + second_half_df.to_string(index=False))
        # print(metrics_df.to_string(index=False))
    
    # Reset dropdowns to their default values
    model_type_dropdown.value = model_types[0]  # Set model type dropdown to its first option
    model_number_dropdown.value = model_numbers[0]  # Set model number dropdown to its first option
    
    # Display model information, the confusion matrix, and (for random forests) the importance plot
    image_dir = f'../img/models/{model_type}'
    image_path = f'{image_dir}/model{model_number}_confusion_matrix.png'
    with image_output_widget:
        print('About the Selected Model: ')
        print(f'Model Type: {model_type}')
        print(f'Model Number: {model_number}')
        if os.path.exists(image_path):
            display(HTML("<h2>Confusion Matrix</h2>"))
            display(Image.open(image_path))
        else:
            print('Image not found.')
        # Importance plots are only looked up for the random forest models
        if model_type in ('random_forest', 'random_forest_kfolding'):
            image_path2 = f'{image_dir}/model{model_number}_importances_plot.png'
            # Check that the file exists before opening it to avoid a FileNotFoundError
            if os.path.exists(image_path2):
                display(Image.open(image_path2))
            else:
                print('Model Importance Plot Not Found')

# Attach callback function to button click event
execute_button.on_click(on_execute_button_clicked)

# Create an HTML widget for the banner
banner_html = HTML("<h1 style='color: #FFFFFF; background-color: #ADD8E6; font-size: 24px; font-weight: bold; text-align: center;'>Model Predictor</h1>")

# Display widgets in a VBox layout for cleaner appearance
widgets.VBox([banner_html, model_type_dropdown, model_number_dropdown, ticker_input, execute_button, output_widget, image_output_widget])



## Summary

Applying machine learning to this problem produced encouraging results. Experimenting with several models revealed a pattern: some models are better at predicting positive outcomes, while others are better at identifying negative ones. The random forest models were the top performers, with a 95% accuracy rate on the test dataset.
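
Overall accuracy can hide exactly this kind of per-class difference. The following is a minimal sketch, not the project's evaluation code, that computes accuracy alongside per-class recall. It assumes the notebook's label encoding (0 = junk grade, 1 = investment grade); the function name `per_class_recall` and the labels are made up for illustration.

```python
def per_class_recall(y_true, y_pred):
    """Return (accuracy, recall for class 0, recall for class 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # investment grade, predicted correctly
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # junk grade, predicted correctly
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # junk grade, predicted as investment grade
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # investment grade, predicted as junk grade
    accuracy = (tp + tn) / len(pairs)
    recall_junk = tn / (tn + fp) if (tn + fp) else 0.0
    recall_investment = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall_junk, recall_investment


# Illustrative labels only: 18 investment-grade and 2 junk-grade companies.
y_true = [1] * 18 + [0] * 2
# A degenerate model that labels every company investment grade.
y_pred = [1] * 20

accuracy, recall_junk, recall_investment = per_class_recall(y_true, y_pred)
print(accuracy, recall_junk, recall_investment)  # 0.9 0.0 1.0
```

In this toy case the model reaches 90% accuracy without identifying a single junk-grade company, which is why per-class metrics are worth reading next to the headline accuracy.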

However, deployment in real-world scenarios exposed a weakness in predicting junk credit status (S&P BB+ or lower). Despite oversampling and undersampling to address class imbalance, the models struggled to identify junk credit accurately. They did, however, consistently predict investment-grade status correctly.
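
For readers unfamiliar with oversampling, the idea can be sketched in a few lines. This is an illustrative stand-in using only the standard library, not the resampling code used in the project; `random_oversample` and the sample rows are hypothetical. It duplicates randomly chosen minority-class rows until both classes are the same size.

```python
import random


def random_oversample(rows, labels, minority_label, seed=0):
    """Duplicate random minority-class rows until the classes are balanced."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    minority = [row for row, y in zip(rows, labels) if y == minority_label]
    majority_count = sum(1 for y in labels if y != minority_label)
    extra = max(0, majority_count - len(minority))
    if not minority:
        # Nothing to sample from, so return the data unchanged.
        return list(rows), list(labels)
    sampled = [rng.choice(minority) for _ in range(extra)]
    return list(rows) + sampled, list(labels) + [minority_label] * extra


# Illustrative data: six investment-grade rows (1) and two junk-grade rows (0).
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
labels = [1, 1, 1, 1, 1, 1, 0, 0]

balanced_rows, balanced_labels = random_oversample(rows, labels, minority_label=0)
print(balanced_labels.count(0), balanced_labels.count(1))  # 6 6
```

Resampling of this kind should be applied to the training split only. Duplicating rows before the train/test split leaks copies of the same company into the test set and inflates the reported metrics.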

To improve model performance, alternative methods such as k-fold cross-validation (k-folding) and feature engineering were explored. One notable limitation was that our API does not provide industry sector information. Sector data was available in the training and testing datasets, and model performance improved when it was used, but these features were dropped because the API cannot retrieve them. Incorporating industry sector data would likely improve prediction accuracy significantly.
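
As a reminder of how k-fold cross-validation partitions the data, here is a minimal sketch of the index splitting step. It is a simplified illustration rather than the project's training code; the helper `k_fold_indices` is hypothetical, and it omits the shuffling and stratification that a library splitter would normally provide.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each sample lands in the test fold exactly once, so averaging a metric over the folds gives a less split-dependent estimate than a single train/test split. With imbalanced classes such as junk versus investment grade, a stratified splitter that preserves the class ratio in every fold is usually the safer choice.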
