# CREDIT SCORE ANALYSIS FOR KENYAN SACCOS LIMITED 

- Industry: Financial Services, specifically the SACCO industry in Kenya.
- Business Problem: SACCOs play a crucial role in providing financial services to their members, including loans. Assessing the creditworthiness of SACCO members is vital to making sound lending decisions and managing the risk associated with loan portfolios.

## Project Objectives
The primary objectives of this project within the SACCO industry in Kenya are as follows:

- Credit Scoring Model: Develop a predictive model tailored to SACCOs that accurately assesses the creditworthiness of their members based on demographic and financial attributes.

- Risk Management: Improve the ability of SACCOs to manage loan portfolios effectively by identifying and mitigating the risk of default among members.

- Operational Efficiency: Streamline the credit assessment process for SACCOs, making it more efficient and data-driven.

## Potential Business Impact
The successful completion of this project is expected to deliver several significant business benefits to SACCOs in Kenya:

- Risk Mitigation: A robust credit scoring model will enable SACCOs to reduce the risk of lending to individuals with a higher likelihood of default, enhancing the overall financial stability of the SACCO.

- Improved Lending: SACCOs will be able to make more informed lending decisions, leading to better loan portfolio performance and profitability.

- Member Engagement: Providing fair and transparent credit assessments can improve trust and engagement among SACCO members, potentially attracting more members and deposits.

- Regulatory Compliance: Compliance with regulatory requirements, such as the SASRA (Sacco Societies Regulatory Authority) guidelines in Kenya, is essential for the SACCO industry, and an accurate credit scoring model can aid in achieving compliance.

## Stakeholders
Key stakeholders involved in this project within the SACCO industry in Kenya include:

- SACCOs: These are the primary users of the credit scoring model, including large SACCOs, small community-based SACCOs, and other SACCO institutions in Kenya.

- SASRA and Regulatory Authorities: Government agencies, including SASRA, responsible for overseeing and regulating the SACCO industry in Kenya.

- SACCO Members: Individuals who are members of SACCOs and are seeking loans or other financial services.

- Data Analysts: The team responsible for developing and maintaining the credit scoring model, often working in collaboration with SACCOs and regulatory authorities.

# DATA UNDERSTANDING 

The dataset containing information about SACCO members' attributes and credit scores will be used to train and evaluate the credit scoring model. The model will leverage this data to make predictions regarding the creditworthiness of SACCO members, facilitating more informed lending decisions.

The dataset covers a sample of over 100 people from around the world and records the following attributes for each person:

- Age
- Gender
- Income
- Education
- Marital Status
- Number of Children
- Home Ownership
- Credit Score

# DATA PREPROCESSING

## IMPORTS 

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

Load the dataset 

In [13]:
data = pd.read_csv(r"C:\Users\Administrator\OneDrive\Desktop\CREDIT SCORE API\Credit Score Classification Dataset.csv")
data.head()

| Unnamed: 0 | Age | Gender | Income | Education | Marital Status | Number of Children | Home Ownership | Credit Score |
|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Female | 50000 | Bachelor's Degree | Single | 0 | Rented | High |
| 1 | 30 | Male | 100000 | Master's Degree | Married | 2 | Owned | High |
| 2 | 35 | Female | 75000 | Doctorate | Married | 1 | Owned | High |
| 3 | 40 | Male | 125000 | High School Diploma | Single | 0 | Owned | High |
| 4 | 45 | Female | 100000 | Bachelor's Degree | Married | 3 | Owned | High |
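
As a rough sketch of a first look at the data, the snippet below rebuilds the five preview rows shown above as a small frame, then separates numeric from categorical columns and counts the target classes. The frame name `sample` and the helper lists are illustrative assumptions; on the real data the same calls would run on `data`.

```python
import pandas as pd

# Illustrative frame built from the five preview rows shown in this notebook;
# in practice the same calls would be applied to the full `data` frame.
sample = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Gender": ["Female", "Male", "Female", "Male", "Female"],
    "Income": [50000, 100000, 75000, 125000, 100000],
    "Education": ["Bachelor's Degree", "Master's Degree", "Doctorate",
                  "High School Diploma", "Bachelor's Degree"],
    "Marital Status": ["Single", "Married", "Married", "Single", "Married"],
    "Number of Children": [0, 2, 1, 0, 3],
    "Home Ownership": ["Rented", "Owned", "Owned", "Owned", "Owned"],
    "Credit Score": ["High"] * 5,
})

# Separate numeric and categorical columns by dtype.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Count how often each target class occurs.
class_counts = sample["Credit Score"].value_counts()

print("Numeric columns:", numeric_cols)
print("Categorical columns:", categorical_cols)
print(class_counts)
```

Checking the class counts on the full dataset is worthwhile before modeling, because a heavily imbalanced target affects both the split and the choice of metric.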


Check for missing values

In [14]:
# Check for missing values in each column
missing_values = data.isnull().sum()

# Display columns with missing values and their respective counts
print("Columns with Missing Values:")
for column, count in missing_values.items():
    if count > 0:
        print(f"{column}: {count} missing values")

Columns with Missing Values:


The output lists no columns, so the data has no missing values and needs no imputation.
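
A more compact version of the same check is sketched below on a hypothetical toy frame (`toy` and its values are made up for illustration, since the real data has no gaps). It filters the missing-value counts directly and also counts duplicate rows, which is a common companion check.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two gaps, only to show what the output looks like.
toy = pd.DataFrame({
    "Age": [25, np.nan, 35],
    "Income": [50000, 100000, np.nan],
    "Gender": ["Female", "Male", "Female"],
})

# Count missing values per column and keep only columns that have any.
missing = toy.isnull().sum()
columns_with_gaps = missing[missing > 0]
print(columns_with_gaps)

# Count fully duplicated rows.
duplicate_rows = int(toy.duplicated().sum())
print("Duplicate rows:", duplicate_rows)
```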

One-hot encoding the categorical features

In [24]:
# Encode categorical variables
data_encoded = pd.get_dummies(data, columns=["Gender", "Education", "Marital Status", "Home Ownership"], drop_first=True)

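# Split data into features (X) and target (y)
# Note: X still includes the "Unnamed: 0" index column read from the CSV;
# consider dropping it, since a row index carries no predictive signal.
X = data_encoded.drop(columns=["Credit Score"])
y = data_encoded["Credit Score"]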
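
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a three-row toy frame (the frame `toy` is an assumption built from values in the preview). Each categorical column becomes one indicator column per category, minus the first category in sorted order, which acts as the reference level.

```python
import pandas as pd

# Toy frame using category values that appear in the preview rows.
toy = pd.DataFrame({
    "Gender": ["Female", "Male", "Female"],
    "Home Ownership": ["Rented", "Owned", "Owned"],
    "Income": [50000, 100000, 75000],
})

# One-hot encode; drop_first removes the first category of each column
# ("Female" and "Owned" here) to avoid redundant indicator columns.
encoded = pd.get_dummies(toy, columns=["Gender", "Home Ownership"], drop_first=True)

print(encoded.columns.tolist())
print(encoded)
```

Dropping one level per feature avoids perfectly collinear columns, which matters for linear models fitted without regularization.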

Splitting the data

In [30]:
from sklearn.model_selection import train_test_split

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
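
With a categorical target, a stratified split is one option worth considering, since it keeps the class proportions the same in both sets. The sketch below uses a hypothetical imbalanced target (`X_demo`, `y_demo`, and the 80/20 class mix are assumptions, not properties of the real dataset).

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 80 "High" and 20 "Low" labels.
X_demo = pd.DataFrame({"Income": range(100)})
y_demo = pd.Series(["High"] * 80 + ["Low"] * 20)

# stratify=y_demo preserves the 80/20 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(y_tr.value_counts())
print(y_te.value_counts())
```

On the real data this would amount to passing `stratify=y` to the existing call, provided every class has at least two members.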

In [31]:
from sklearn.preprocessing import StandardScaler

# Scale features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
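
The scaler is fitted on the training set only and then reused on the test set, so no test-set statistics leak into training. A small sketch on synthetic numbers (the arrays below are randomly generated assumptions, not the real features) shows what that implies for the scaled values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic income-like features, generated only for illustration.
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=50_000, scale=10_000, size=(80, 2))
X_te = rng.normal(loc=50_000, scale=10_000, size=(20, 2))

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learns mean and std from training data
X_te_scaled = scaler.transform(X_te)      # reuses the training mean and std

print("Train mean:", X_tr_scaled.mean(axis=0))
print("Train std: ", X_tr_scaled.std(axis=0))
print("Test mean: ", X_te_scaled.mean(axis=0))
```

The training columns end up with mean 0 and standard deviation 1, while the test columns are only approximately standardized, which is the expected behavior.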

# MODELING

A simple linear regression model serves as a first baseline. Because the target is categorical, it is label-encoded to integers before fitting.

In [33]:
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# Encode the target variable using label encoding
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)

# Initialize and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train_encoded)
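
One caveat when regressing on label-encoded classes: `LabelEncoder` assigns integer codes in sorted (alphabetical) order, which need not match the ranking of the credit score categories. The sketch below assumes a hypothetical label set of "Low", "Average", and "High" (the preview only shows "High") and contrasts the encoder's codes with an explicit ordinal mapping.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical label set; the preview in this notebook only shows "High".
labels = ["High", "Low", "Average", "High"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)
print(list(encoder.classes_))  # classes are stored in sorted order
print(codes.tolist())

# An explicit mapping keeps the intended ranking Low < Average < High.
ordinal_map = {"Low": 0, "Average": 1, "High": 2}
ordinal_codes = [ordinal_map[label] for label in labels]
print(ordinal_codes)
```

If the categories are ordered, an explicit mapping like this gives the regression target a meaningful scale; otherwise a classifier is usually the more natural choice.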

In [40]:
from sklearn.metrics import mean_squared_error

# Encode the test target with the encoder fitted on the training labels
y_test_encoded = label_encoder.transform(y_test)

# Predict on the scaled test features
y_pred = model.predict(X_test)

# Calculate Mean Squared Error (MSE)
mse = mean_squared_error(y_test_encoded, y_pred)

print(f"Mean Squared Error (MSE): {mse}")


Mean Squared Error (MSE): 0.28088010823610005


MSE measures the average squared difference between the model's predictions and the actual values.
Lower MSE values are generally better, indicating better model performance.
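
To make the definition concrete, the following sketch computes MSE by hand on four made-up values and compares it with scikit-learn's result. The arrays `y_true` and `y_hat` are illustrative assumptions, not outputs of the model above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up encoded targets and predictions, for illustration only.
y_true = np.array([0, 1, 2, 1])
y_hat = np.array([0.5, 1.0, 1.5, 2.0])

# MSE = mean of squared residuals: (0.25 + 0 + 0.25 + 1) / 4
manual_mse = np.mean((y_true - y_hat) ** 2)
library_mse = mean_squared_error(y_true, y_hat)

print(manual_mse)   # 0.375
print(library_mse)  # 0.375
```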

The R-squared (R2) score measures the proportion of the variance in the target that the features explain. Values closer to 1 indicate a better fit, while a value near 0 means the model does little better than always predicting the mean of the target.
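
As a worked example of the definition, R2 equals one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up values (`y_true` and `y_hat` are illustrative assumptions) and checks the result against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up encoded targets and predictions, for illustration only.
y_true = np.array([0, 1, 2, 1])
y_hat = np.array([0.5, 1.0, 1.5, 2.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares: 1.5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares: 2.0
manual_r2 = 1 - ss_res / ss_tot

print(manual_r2)                # 0.25
print(r2_score(y_true, y_hat))  # 0.25
```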

In [41]:
from sklearn.metrics import r2_score

# Calculate R2 score
r2 = r2_score(y_test_encoded, y_pred)

print(f"R-squared (R2): {r2}")


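R-squared (R2): 0.07309564282086978

An R2 of about 0.07 means the linear model explains roughly 7% of the variance in the encoded credit score, so it is only a weak baseline.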
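
Since the credit score is a category rather than a continuous quantity, a classifier is a natural alternative to linear regression. The sketch below is not the method used in this notebook: it swaps in logistic regression on synthetic data from `make_classification` (all names and parameters here are assumptions) to show the shape of such a workflow, with accuracy as the metric.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic three-class problem standing in for the credit score classes.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=42
)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on training data only, then the classifier.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)

predictions = clf.predict(X_te)
accuracy = accuracy_score(y_te, predictions)
print(f"Accuracy: {accuracy:.3f}")
```

On the real data, the one-hot encoded features and the original `Credit Score` labels could be passed to the same pipeline directly, because scikit-learn classifiers accept string class labels.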
