# _Credit Score Classification Using Machine Learning_

<img src='https://storage.googleapis.com/kaggle-datasets-images/2289007/3846912/ad5e128929f5ac26133b67a6110de7c0/dataset-cover.jpg?t=2022-06-22-14-33-45'>

_This project aims to classify the credit scores of bank and credit card customers into three classes (Good, Standard, and Poor) using their credit history data. It is treated as a classification problem within supervised learning._

_The study uses a labeled dataset containing credit card customers' demographic information, financial status, and credit usage habits. After the data preprocessing steps, machine learning algorithms were applied to train a model, and its performance was analyzed with several evaluation metrics (Accuracy, Precision, Recall, F1-Score)._

_The trained model was turned into a web application with Streamlit so that it can make predictions from real user input, and it was published online on Hugging Face Spaces. The project is thus presented both as a technical machine learning study and as an end-to-end data science application._

## _Data Preprocessing_

### _Import_

In [1]:
import pandas as pd
pd.set_option('display.max_columns', 100)

# pandas is used for data analysis and tabular (DataFrame) operations.
# The option above sets the maximum number of displayed columns to 100 so wide tables are shown in full.

import numpy as np

# numpy provides numerical utilities; np.nan is used below to mark missing values.

import warnings
warnings.filterwarnings('ignore')

# The warnings module controls warning messages raised at runtime.
# This line hides them to keep the output clean.

import matplotlib.pyplot as plt
import seaborn as sns 

# matplotlib.pyplot and seaborn are used for data visualization.
# plt covers basic plotting, while sns is preferred for more advanced, polished charts.

### _Read Data_

In [2]:
df = pd.read_csv('train.csv')

# Reads the CSV file 'train.csv' and loads it into the variable 'df' as a pandas DataFrame.

### _Exploratory Data Analysis_

In [3]:
df.head()
# Displays the first 5 rows of the dataset.
# Useful for a general look at the data structure and columns.

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,0x1602,CUS_0xd40,January,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,11.27,4.0,_,809.98,26.82262,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
1,0x1603,CUS_0xd40,February,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",-1,,11.27,4.0,Good,809.98,31.94496,,No,49.574949,118.28022162236736,Low_spent_Large_value_payments,284.62916249607184,Good
2,0x1604,CUS_0xd40,March,Aaron Maashoh,-500,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,_,4.0,Good,809.98,28.609352,22 Years and 3 Months,No,49.574949,81.699521264648,Low_spent_Medium_value_payments,331.2098628537912,Good
3,0x1605,CUS_0xd40,April,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",5,4.0,6.27,4.0,Good,809.98,31.377862,22 Years and 4 Months,No,49.574949,199.4580743910713,Low_spent_Small_value_payments,223.45130972736783,Good
4,0x1606,CUS_0xd40,May,Aaron Maashoh,23,821-00-0265,Scientist,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",6,,11.27,4.0,Good,809.98,24.797347,22 Years and 5 Months,No,49.574949,41.420153086217326,High_spent_Medium_value_payments,341.48923103222177,Good


In [4]:
df.sample()
# Returns one random row from the dataset (pass n to get more rows).
# Useful for spot-checking the data and getting a general impression.

Unnamed: 0,ID,Customer_ID,Month,Name,Age,SSN,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
69282,0x1abf4,CUS_0x2ae8,March,,20,498-74-4915,Engineer,132273.75,,5,4,4,2,"Debt Consolidation Loan, and Mortgage Loan",1,8,6.46,3.0,Good,827.64,24.000386,25 Years and 3 Months,No,144.701312,939.0165324222124,Low_spent_Large_value_payments,266.3634053909602,Standard


In [5]:
df.shape
# Returns the dimensions of the dataset as a (rows, columns) tuple.
# For example, a dataset with 1470 rows and 35 columns gives (1470, 35).

(100000, 28)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 28 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   ID                        100000 non-null  object 
 1   Customer_ID               100000 non-null  object 
 2   Month                     100000 non-null  object 
 3   Name                      90015 non-null   object 
 4   Age                       100000 non-null  object 
 5   SSN                       100000 non-null  object 
 6   Occupation                100000 non-null  object 
 7   Annual_Income             100000 non-null  object 
 8   Monthly_Inhand_Salary     84998 non-null   float64
 9   Num_Bank_Accounts         100000 non-null  int64  
 10  Num_Credit_Card           100000 non-null  int64  
 11  Interest_Rate             100000 non-null  int64  
 12  Num_of_Loan               100000 non-null  object 
 13  Type_of_Loan              88592 non-null   ob
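
The `info()` output lists columns such as `Age` and `Annual_Income` as `object` even though they hold numbers. As a rough sketch of how such columns could be spotted automatically, the snippet below measures the share of values in each column that parse as numbers. The toy values, the helper name `numeric_share`, and the 0.5 threshold are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy data standing in for the real file (values are illustrative only).
toy = pd.DataFrame({
    "Age": ["23", "28_", "-500"],
    "Occupation": ["Scientist", "Engineer", "Mechanic"],
    "Annual_Income": ["19114.12", "34847.84_", "12000"],
})

def numeric_share(series: pd.Series) -> float:
    """Fraction of values that pandas can parse as a number."""
    return pd.to_numeric(series, errors="coerce").notna().mean()

# One share per column; columns above the (assumed) threshold look numeric.
shares = toy.apply(numeric_share)
numeric_like = shares[shares >= 0.5].index.tolist()
print(numeric_like)
```

A column with a high but not perfect share is a hint that a few malformed entries are forcing the whole column to `object`.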

### _Missing Value Imputation | Feature Engineering_

In [7]:
df.isnull().sum()
# Shows how many missing (NaN) values each column contains.
# Used for missing-data analysis.

ID                              0
Customer_ID                     0
Month                           0
Name                         9985
Age                             0
SSN                             0
Occupation                      0
Annual_Income                   0
Monthly_Inhand_Salary       15002
Num_Bank_Accounts               0
Num_Credit_Card                 0
Interest_Rate                   0
Num_of_Loan                     0
Type_of_Loan                11408
Delay_from_due_date             0
Num_of_Delayed_Payment       7002
Changed_Credit_Limit            0
Num_Credit_Inquiries         1965
Credit_Mix                      0
Outstanding_Debt                0
Credit_Utilization_Ratio        0
Credit_History_Age           9030
Payment_of_Min_Amount           0
Total_EMI_per_month             0
Amount_invested_monthly      4479
Payment_Behaviour               0
Monthly_Balance              1200
Credit_Score                    0
dtype: int64
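
Raw counts are easier to judge next to the share of rows they represent. The following is a minimal sketch on a made-up toy frame (the values and the name `missing_summary` are assumptions) that puts both in one table and sorts by the share of missing values.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration).
toy = pd.DataFrame({
    "Name": ["A", None, "C", "D"],
    "Monthly_Inhand_Salary": [1800.0, np.nan, np.nan, 2100.0],
    "Age": [23, 31, 45, 52],
})

# Count and percentage of missing values per column, worst column first.
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": (toy.isnull().mean() * 100).round(2),
}).sort_values("missing_pct", ascending=False)

print(missing_summary)
```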

In [8]:
df = df.drop(columns=['ID', 'Customer_ID', 'Name', 'SSN'], errors='ignore')

In [9]:
# Target: Credit_Score  →  Good / Standard / Poor

In [10]:
num_like_cols = [
    'Age', 'Annual_Income', 'Num_of_Loan',
    'Num_of_Delayed_Payment', 'Changed_Credit_Limit',
    'Outstanding_Debt', 'Amount_invested_monthly',
    'Monthly_Balance'
]

In [11]:
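# Convert numeric-looking text columns to numbers.
# Values that cannot be parsed become NaN (errors='coerce') and are imputed later.
df[num_like_cols] = df[num_like_cols].apply(
    lambda x: pd.to_numeric(x, errors='coerce')
)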
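
With `errors='coerce'`, any entry that is not a clean number is discarded and later replaced by an imputed value. If the malformed entries are numbers with stray underscores attached (an assumption about the raw file, for example `28_`), a possible variant is to strip those characters first so the underlying number is kept. The helper name `to_clean_numeric` and the sample values are hypothetical.

```python
import pandas as pd

def to_clean_numeric(series: pd.Series) -> pd.Series:
    """Strip surrounding underscores/spaces, then parse; leftovers become NaN."""
    cleaned = series.astype(str).str.strip("_ ")
    return pd.to_numeric(cleaned, errors="coerce")

# Illustrative values: a clean number, a number with a stray underscore,
# an out-of-range number, a bare placeholder, and a missing entry.
raw_age = pd.Series(["23", "28_", "-500", "_", None])
clean_age = to_clean_numeric(raw_age)
print(clean_age)
```

Out-of-range values such as `-500` still parse as numbers here, so they need a separate range check.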

In [12]:
df['Credit_History_Age_Months'] = (
    df['Credit_History_Age']
    .str.extract(r'(\d+) Years.*?(\d+) Months')
    .astype(float)
    .dot([12, 1])
)
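
To make the conversion above concrete, here is the same expression applied to a small hand-made series (the sample strings are assumptions modeled on the format shown in `df.head()`). The regular expression captures the year and month numbers, and the dot product with `[12, 1]` computes `years * 12 + months`.

```python
import numpy as np
import pandas as pd

history = pd.Series(["22 Years and 1 Months", "0 Years and 11 Months", np.nan])

# Two capture groups -> two columns: years and months.
parts = history.str.extract(r"(\d+) Years.*?(\d+) Months").astype(float)

# years * 12 + months * 1; rows without a match stay NaN.
months = parts.dot([12, 1])
print(months)
```

The first row gives 22 * 12 + 1 = 265 months, and missing inputs remain missing so they can be imputed afterwards.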

In [13]:
df.drop(columns='Credit_History_Age', inplace=True)

In [14]:
num_cols = [
    'Monthly_Inhand_Salary',
    'Num_Credit_Inquiries',
    'Credit_Utilization_Ratio',
    'Total_EMI_per_month'
]

In [15]:
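# Columns filled with their most frequent value. Despite the name, this list also
# holds numeric columns (Credit_History_Age_Months, Amount_invested_monthly, Monthly_Balance).
cat_cols = [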
    'Type_of_Loan',
    'Credit_History_Age_Months',
    'Amount_invested_monthly',
    'Payment_Behaviour',
    'Monthly_Balance',
]

In [16]:
from sklearn.impute import SimpleImputer

num_imputer = SimpleImputer(strategy='median')
cat_imputer = SimpleImputer(strategy='most_frequent')

df[num_cols] = num_imputer.fit_transform(df[num_cols])

df[cat_cols] = cat_imputer.fit_transform(df[cat_cols])
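
The two column lists above are written by hand. One possible alternative, sketched here with plain pandas on a made-up toy frame, is to derive the groups from the column dtypes so that numeric columns get the median and text columns get the mode. This is an illustration of the idea under those assumptions, not the notebook's method.

```python
import numpy as np
import pandas as pd

# Toy frame with one numeric and one text column (values are illustrative).
toy = pd.DataFrame({
    "Monthly_Balance": [312.5, np.nan, 331.2, 223.5],
    "Payment_Behaviour": ["Low_spent", None, "Low_spent", "High_spent"],
})

# Split columns by dtype instead of listing them manually.
numeric_cols = toy.select_dtypes(include="number").columns
text_cols = toy.select_dtypes(exclude="number").columns

imputed = toy.copy()
# Median for numeric columns.
imputed[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].median())
# Most frequent value for text columns.
for col in text_cols:
    imputed[col] = toy[col].fillna(toy[col].mode()[0])

print(imputed)
```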

In [17]:
df.isnull().sum()  # check

Month                           0
Age                          4939
Occupation                      0
Annual_Income                6980
Monthly_Inhand_Salary           0
Num_Bank_Accounts               0
Num_Credit_Card                 0
Interest_Rate                   0
Num_of_Loan                  4785
Type_of_Loan                    0
Delay_from_due_date             0
Num_of_Delayed_Payment       9746
Changed_Credit_Limit         2091
Num_Credit_Inquiries            0
Credit_Mix                      0
Outstanding_Debt             1009
Credit_Utilization_Ratio        0
Payment_of_Min_Amount           0
Total_EMI_per_month             0
Amount_invested_monthly         0
Payment_Behaviour               0
Monthly_Balance                 0
Credit_Score                    0
Credit_History_Age_Months       0
dtype: int64

In [18]:
df[num_like_cols] = df[num_like_cols].fillna(df[num_like_cols].median())

In [19]:
df.isna().sum()

Month                        0
Age                          0
Occupation                   0
Annual_Income                0
Monthly_Inhand_Salary        0
Num_Bank_Accounts            0
Num_Credit_Card              0
Interest_Rate                0
Num_of_Loan                  0
Type_of_Loan                 0
Delay_from_due_date          0
Num_of_Delayed_Payment       0
Changed_Credit_Limit         0
Num_Credit_Inquiries         0
Credit_Mix                   0
Outstanding_Debt             0
Credit_Utilization_Ratio     0
Payment_of_Min_Amount        0
Total_EMI_per_month          0
Amount_invested_monthly      0
Payment_Behaviour            0
Monthly_Balance              0
Credit_Score                 0
Credit_History_Age_Months    0
dtype: int64

In [20]:
df.loc[(df['Age'] < 0) | (df['Age'] > 100), 'Age'] = df['Age'].median()
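
In the cell above, the median is taken over all ages, including the implausible ones being replaced. A small variant, shown here on made-up values, computes the median from the plausible ages only and uses the same 0 to 100 range; whether the difference matters depends on how many outliers the column contains.

```python
import pandas as pd

# Illustrative ages with two implausible entries (-500 and 8698).
ages = pd.Series([23.0, 23.0, -500.0, 41.0, 8698.0, 35.0])

# Same rule as above: anything outside [0, 100] is treated as invalid.
valid = ages.between(0, 100)

# Median of the plausible ages only.
valid_median = ages[valid].median()

# Keep valid ages, replace the rest with that median.
ages_fixed = ages.where(valid, valid_median)
print(ages_fixed.tolist())
```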

In [21]:
df.head()

Unnamed: 0,Month,Age,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score,Credit_History_Age_Months
0,January,23.0,Scientist,19114.12,1824.843333,3,4,3,4.0,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,11.27,4.0,_,809.98,26.82262,No,49.574949,80.415295,High_spent_Small_value_payments,312.494089,Good,265.0
1,February,23.0,Scientist,19114.12,3093.745,3,4,3,4.0,"Auto Loan, Credit-Builder Loan, Personal Loan,...",-1,14.0,11.27,4.0,Good,809.98,31.94496,No,49.574949,118.280222,Low_spent_Large_value_payments,284.629162,Good,191.0
2,March,33.0,Scientist,19114.12,3093.745,3,4,3,4.0,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7.0,9.4,4.0,Good,809.98,28.609352,No,49.574949,81.699521,Low_spent_Medium_value_payments,331.209863,Good,267.0
3,April,23.0,Scientist,19114.12,3093.745,3,4,3,4.0,"Auto Loan, Credit-Builder Loan, Personal Loan,...",5,4.0,6.27,4.0,Good,809.98,31.377862,No,49.574949,199.458074,Low_spent_Small_value_payments,223.45131,Good,268.0
4,May,23.0,Scientist,19114.12,1824.843333,3,4,3,4.0,"Auto Loan, Credit-Builder Loan, Personal Loan,...",6,14.0,11.27,4.0,Good,809.98,24.797347,No,49.574949,41.420153,High_spent_Medium_value_payments,341.489231,Good,269.0


In [22]:
df['Type_of_Loan'].unique()

array(['Auto Loan, Credit-Builder Loan, Personal Loan, and Home Equity Loan',
       'Credit-Builder Loan', 'Auto Loan, Auto Loan, and Not Specified',
       ..., 'Home Equity Loan, Auto Loan, Auto Loan, and Auto Loan',
       'Payday Loan, Student Loan, Mortgage Loan, and Not Specified',
       'Personal Loan, Auto Loan, Mortgage Loan, Student Loan, and Student Loan'],
      dtype=object)

In [23]:
pb = df['Payment_Behaviour'].replace('!@9#%8', np.nan)  # placeholder -> missing
df['Payment_Behaviour'] = pb.fillna(pb.mode()[0])  # mode of the valid values only

In [24]:
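# Turn the trailing "and" into a separator and blank out the "Not Specified" placeholder
# so that each loan type becomes its own token.
df['Type_of_Loan'] = (
    df['Type_of_Loan']
    .str.replace('and', ',', regex=False)
    .str.replace('Not Specified', '', regex=False)
)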

In [25]:
loan_dummies = df['Type_of_Loan'].str.get_dummies(sep=', ')

In [26]:
df = pd.concat([df.drop('Type_of_Loan', axis=1), loan_dummies], axis=1)
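
The loan-type steps above can be traced on a few hand-written strings (the sample values are assumptions in the same format as the `unique()` output). Each distinct loan type becomes a 0/1 column, empty tokens left behind by the replacements are ignored, and a loan type that appears several times in one row still yields 1 rather than a count.

```python
import pandas as pd

loans = pd.Series([
    "Auto Loan, Personal Loan, and Home Equity Loan",
    "Credit-Builder Loan",
    "Auto Loan, Auto Loan, and Not Specified",
])

# Same cleaning as in the notebook cell.
cleaned = (
    loans
    .str.replace("and", ",", regex=False)
    .str.replace("Not Specified", "", regex=False)
)

# One indicator column per loan type.
loan_flags = cleaned.str.get_dummies(sep=", ")
print(loan_flags)
```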

In [27]:
df['Credit_Mix'] = df['Credit_Mix'].replace('_', df['Credit_Mix'].mode()[0])

In [28]:
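# Keep rows with a positive age and a non-negative payment delay.
# .copy() makes the result an independent DataFrame for the assignments that follow.
df = df[(df['Age'] > 0) & (df['Delay_from_due_date'] >= 0)].copy()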

In [29]:
df.head()

Unnamed: 0,Month,Age,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score,Credit_History_Age_Months,Auto Loan,Credit-Builder Loan,Debt Consolidation Loan,Home Equity Loan,Mortgage Loan,Payday Loan,Personal Loan,Student Loan
0,January,23.0,Scientist,19114.12,1824.843333,3,4,3,4.0,3,7.0,11.27,4.0,Standard,809.98,26.82262,No,49.574949,80.415295,High_spent_Small_value_payments,312.494089,Good,265.0,1,1,0,1,0,0,1,0
2,March,33.0,Scientist,19114.12,3093.745,3,4,3,4.0,3,7.0,9.4,4.0,Good,809.98,28.609352,No,49.574949,81.699521,Low_spent_Medium_value_payments,331.209863,Good,267.0,1,1,0,1,0,0,1,0
3,April,23.0,Scientist,19114.12,3093.745,3,4,3,4.0,5,4.0,6.27,4.0,Good,809.98,31.377862,No,49.574949,199.458074,Low_spent_Small_value_payments,223.45131,Good,268.0,1,1,0,1,0,0,1,0
4,May,23.0,Scientist,19114.12,1824.843333,3,4,3,4.0,6,14.0,11.27,4.0,Good,809.98,24.797347,No,49.574949,41.420153,High_spent_Medium_value_payments,341.489231,Good,269.0,1,1,0,1,0,0,1,0
5,June,23.0,Scientist,19114.12,3093.745,3,4,3,4.0,8,4.0,9.27,4.0,Good,809.98,27.262259,No,49.574949,62.430172,Low_spent_Small_value_payments,340.479212,Good,270.0,1,1,0,1,0,0,1,0


In [30]:
df["Credit_Score"].value_counts()

Credit_Score
Standard    52961
Poor        28949
Good        17499
Name: count, dtype: int64
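
The counts above show that the classes are not evenly represented. As a quick worked calculation using those counts, the sketch below converts them to shares and to the common weighting heuristic `n_samples / (n_classes * count)`, which is the formula behind scikit-learn's `class_weight='balanced'`. Whether to use such weights is a modeling decision outside this preprocessing step.

```python
import pandas as pd

# Counts copied from the value_counts() output.
counts = pd.Series({"Standard": 52961, "Poor": 28949, "Good": 17499})

# Share of each class in the data.
shares = (counts / counts.sum()).round(3)

# Inverse-frequency weights: rarer classes get larger weights.
weights = counts.sum() / (len(counts) * counts)

print(shares)
print(weights.round(3))
```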

In [31]:
df['Credit_Score'] = df['Credit_Score'].map({'Poor': 0, 'Standard': 1, 'Good': 2})
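
Because `map` silently turns any label missing from the dictionary into NaN, it can help to check the result and keep the reverse mapping for turning predictions back into readable labels. A minimal sketch, with `label_to_code` and `code_to_label` as illustrative names:

```python
import pandas as pd

label_to_code = {"Poor": 0, "Standard": 1, "Good": 2}
# Reverse lookup for decoding model predictions later.
code_to_label = {code: label for label, code in label_to_code.items()}

scores = pd.Series(["Good", "Poor", "Standard", "Good"])
encoded = scores.map(label_to_code)

# Unmapped labels would become NaN, so verify none slipped through.
assert encoded.notna().all()

decoded = encoded.map(code_to_label)
print(encoded.tolist(), decoded.tolist())
```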

In [32]:
df.tail()

Unnamed: 0,Month,Age,Occupation,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score,Credit_History_Age_Months,Auto Loan,Credit-Builder Loan,Debt Consolidation Loan,Home Equity Loan,Mortgage Loan,Payday Loan,Personal Loan,Student Loan
99995,April,25.0,Mechanic,39628.99,3359.415833,4,6,7,2.0,23,7.0,11.5,3.0,Standard,502.38,34.663572,No,35.104023,60.971333,High_spent_Large_value_payments,479.866228,0,378.0,1,0,0,0,0,0,0,1
99996,May,25.0,Mechanic,39628.99,3359.415833,4,6,7,2.0,18,7.0,11.5,3.0,Standard,502.38,40.565631,No,35.104023,54.18595,High_spent_Medium_value_payments,496.65161,0,379.0,1,0,0,0,0,0,0,1
99997,June,25.0,Mechanic,39628.99,3359.415833,4,6,5729,2.0,27,6.0,11.5,3.0,Good,502.38,41.255522,No,35.104023,24.028477,High_spent_Large_value_payments,516.809083,0,380.0,1,0,0,0,0,0,0,1
99998,July,25.0,Mechanic,39628.99,3359.415833,4,6,7,2.0,20,14.0,11.5,3.0,Good,502.38,33.638208,No,35.104023,251.672582,Low_spent_Large_value_payments,319.164979,1,381.0,1,0,0,0,0,0,0,1
99999,August,25.0,Mechanic,37550.74,3359.415833,4,6,7,2.0,18,6.0,11.5,3.0,Good,502.38,34.192463,No,35.104023,167.163865,Low_spent_Small_value_payments,393.673696,0,382.0,1,0,0,0,0,0,0,1


In [33]:
# Drop rows with implausible interest rates (100 or above), such as the 5729 seen in df.tail().
df = df[df['Interest_Rate'] < 100]

### _Save Data_

In [34]:
df.to_csv('clean_train.csv', index=False)
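
After saving, a quick round trip can confirm that the file reloads with the expected shape and no missing values. The sketch below uses a tiny stand-in frame and a temporary directory so it does not touch the real `clean_train.csv`; the values are made up.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in for the cleaned DataFrame.
toy = pd.DataFrame({"Age": [23.0, 33.0], "Credit_Score": [2, 0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clean_train.csv")
    toy.to_csv(path, index=False)  # index=False avoids an extra unnamed column
    reloaded = pd.read_csv(path)

# The reloaded data should match the original shape and contain no NaN.
print(reloaded.shape, int(reloaded.isnull().sum().sum()))
```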

_In this step, missing and inconsistent values were handled, and the dataset was cleaned and brought into a standard structure. The cleaned data was saved in CSV format for use in the modeling stage._