# Credit Risk Classification

This notebook walks through a logistic regression analysis that predicts loan status. The steps are:
- Read and inspect the lending data
- Create features (X) and labels (y)
- Split the data into training and test sets
- Train a logistic regression model
- Evaluate the model with a confusion matrix and classification report

In [None]:
# Importing necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report

print('Imported necessary libraries.')

## Step 1: Load and Inspect Data

Read the CSV file and check the first few rows of the data to understand its structure.

In [None]:
# Load the lending data into a DataFrame
try:
    df = pd.read_csv('lending_data.csv')  # default UTF-8 decoding also reads plain ASCII files
    print('Data loaded successfully.')
    print('Data preview:')
    print(df.head())
except Exception as e:
    print('Error loading data:', e)
    raise  # re-raise so later cells do not run without `df`
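
Beyond `df.head()`, it can help to check column types, missing values, and how the two `loan_status` classes are distributed, since credit data is often imbalanced. The sketch below runs on a small made-up DataFrame: only `loan_status` comes from this notebook, and the other column names and the `summarize` helper are assumptions for illustration. Pass `df` from the cell above to apply it to the real data.

```python
import pandas as pd

# Hypothetical stand-in for lending_data.csv; only `loan_status` is known from the notebook.
toy_df = pd.DataFrame({
    "loan_size": [10000.0, 8500.0, 12000.0, 9500.0, 20000.0],
    "interest_rate": [7.5, 6.9, 8.1, 7.2, 11.4],
    "loan_status": [0, 0, 0, 0, 1],
})

def summarize(frame, label_col="loan_status"):
    """Return per-column missing-value counts and the share of each label value."""
    missing = frame.isna().sum()
    class_share = frame[label_col].value_counts(normalize=True).sort_index()
    return missing, class_share

missing, class_share = summarize(toy_df)
print(toy_df.dtypes)
print(missing)
print(class_share)
```

A heavily skewed class share is worth keeping in mind when reading the accuracy figures later on.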

## Step 2: Create Features and Labels

- Create the label set `y` from the `loan_status` column.
- Create the features DataFrame `X` by dropping the `loan_status` column.

In [None]:
# Create labels (y) and features (X)
try:
    y = df['loan_status']
    X = df.drop(columns='loan_status')
    print('Labels and features created.')
    print('Labels (first 5):')
    print(y.head())
    print('Features (first 5):')
    print(X.head())
except Exception as e:
    print('Error processing features and labels:', e)
    raise  # re-raise so later cells do not run without `X` and `y`

## Step 3: Split Data into Training and Testing Sets

We use a 70/30 split (`test_size=0.3`) to create training and testing sets, and fix `random_state` so the split is reproducible.

In [None]:
# Split data into training and testing sets
try:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    print('Data successfully split into training and testing sets.')
    print('Training set size:', X_train.shape)
    print('Testing set size:', X_test.shape)
except Exception as e:
    print('Error splitting the data:', e)
    raise  # re-raise so later cells do not run without the split
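
If the two loan classes turn out to be imbalanced, a purely random split can leave the test set with a different share of high-risk loans than the training set. One option is the `stratify` argument of `train_test_split`, sketched here on made-up data (the feature names and the 90/10 class ratio are assumptions, not properties of `lending_data.csv`).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy imbalanced data: 90 healthy (0) and 10 high-risk (1) loans; values are made up.
rng = np.random.default_rng(42)
toy_X = pd.DataFrame({
    "feature_a": rng.normal(size=100),
    "feature_b": rng.normal(size=100),
})
toy_y = pd.Series([0] * 90 + [1] * 10, name="loan_status")

# stratify=toy_y asks for the same class ratio in the training and testing subsets
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    toy_X, toy_y, test_size=0.3, random_state=42, stratify=toy_y
)

print("Train share of class 1:", ys_train.mean())
print("Test share of class 1:", ys_test.mean())
```

In this notebook the equivalent change would be adding `stratify=y` to the existing call; whether it matters depends on how skewed `loan_status` is.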

## Step 4: Train a Logistic Regression Model

We create an instance of `LogisticRegression` and fit it on the training set. Predictions on the testing set follow in Step 5.

In [None]:
# Initialize and train the logistic regression model
try:
    model = LogisticRegression(max_iter=1000)  # raise the iteration cap (default 100) to help the solver converge
    model.fit(X_train, y_train)
    print('Model training completed.')
except Exception as e:
    print('Error training the model:', e)
    raise  # re-raise so later cells do not run without a fitted model
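
Raising `max_iter` is one way to deal with convergence warnings. Another common option, when features sit on very different scales, is to standardize them before fitting. The sketch below assumes that situation and uses made-up data; the pipeline is an illustration, not the model trained in this notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data with two features on very different scales
rng = np.random.default_rng(0)
n = 200
big = rng.normal(50_000, 10_000, size=n)   # e.g. a dollar amount
small = rng.normal(0.5, 0.1, size=n)       # e.g. a ratio
toy_X = np.column_stack([big, small])
toy_y = (big > 55_000).astype(int)         # label depends only on the large-scale feature

# StandardScaler is fitted inside the pipeline, so it only sees the data passed to fit()
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(toy_X, toy_y)
print("Training accuracy:", scaled_model.score(toy_X, toy_y))
```

Wrapping the scaler and the classifier in one pipeline means calling `fit` on the training set alone keeps test-set statistics out of the scaling step.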

## Step 5: Model Predictions and Evaluation

- Predict the test data labels using the trained model
- Generate a confusion matrix and classification report

In [None]:
# Generate predictions on the test dataset
try:
    y_pred = model.predict(X_test)
    print('Predictions generated.')

    # Evaluate the model
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    print('Confusion Matrix:')
    print(cm)
    print('\nClassification Report:')
    print(cr)
except Exception as e:
    print('Error in model prediction or evaluation:', e)
    raise  # re-raise so a failed evaluation is not mistaken for a finished run
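
To connect the two outputs: for binary labels 0 and 1, `confusion_matrix` puts true labels on the rows and predicted labels on the columns, giving `[[TN, FP], [FN, TP]]`. The class-1 figures in the classification report follow from those four counts. The helper below (`binary_metrics`, a hypothetical name) shows the arithmetic on made-up counts, which are not results from this notebook.

```python
def binary_metrics(cm):
    """Precision, recall, F1 (for class 1) and accuracy from a 2x2 matrix laid out as
    [[TN, FP], [FN, TP]] (rows = true labels, columns = predicted labels)."""
    (tn, fp), (fn, tp) = cm
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up counts for illustration only
example_cm = [[950, 10], [5, 35]]
metrics = binary_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With counts like these, accuracy is high mostly because healthy loans dominate, so the class-1 precision and recall are the more informative numbers for high-risk loans.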

## Conclusion

The logistic regression model has been trained and evaluated. Review the confusion matrix and classification report to see how well it distinguishes healthy (0) from high-risk (1) loans.

*Note: Debug print statements have been included throughout the notebook to help track the process.*