# Credit Card Fraud Detection
## Student: Rakan Hejazi - 222040101119

This notebook applies two machine learning models to credit card fraud detection:
1. **Random Forest** (ensemble method)
2. **XGBoost** (gradient boosting)

Both models are tuned with a randomized hyperparameter search and evaluated with accuracy, precision, recall, F1-score, and ROC AUC.

## 1. Importing the Libraries

In [None]:
# Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, roc_curve, confusion_matrix, classification_report
)
from xgboost import XGBClassifier
import warnings
warnings.filterwarnings('ignore')

print("Libraries loaded successfully!")

## 2. Loading the Cleaned Data
The preprocessed data is loaded from `data/processed/creditcard_clean.csv`.

In [None]:
# Load the cleaned dataset
df = pd.read_csv('data/processed/creditcard_clean.csv')

print(f"Dataset shape: {df.shape}")
print("\nClass distribution:")
print(df['Class'].value_counts())
print(f"\nFraud rate: {df['Class'].mean() * 100:.2f}%")
df.head()

## 3. Preparing the Data for Training

In [None]:
# Separate the features from the target variable
X = df.drop('Class', axis=1)
y = df['Class']

# Split into training and test sets; stratify=y keeps the fraud rate the same in both
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")
print("\nTraining set class distribution:")
print(y_train.value_counts())
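
The effect of `stratify=y` can be checked on synthetic labels. The sketch below is illustrative only: `X_demo` and `y_demo` are made-up arrays (20 positives among 10,000 rows), not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10,000 rows with 20 positives (0.2%), mimicking a rare fraud class
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(10_000, 3))
y_demo = np.zeros(10_000, dtype=int)
y_demo[:20] = 1

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, the positive rate should match across the two splits
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positives: {y_tr.sum()} ({train_rate:.4%})")
print(f"test positives:  {y_te.sum()} ({test_rate:.4%})")
```

Without `stratify`, a random split of so few positives could leave the test set with a noticeably different fraud rate, which would make the test metrics harder to interpret.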

## 4. Model 1: Random Forest with Hyperparameter Tuning

In [None]:
# Define the Random Forest parameter grid
rf_param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize the Random Forest model
rf_model = RandomForestClassifier(random_state=42, n_jobs=-1)

# Use RandomizedSearchCV, which samples 20 of the 108 grid combinations, for faster tuning
rf_search = RandomizedSearchCV(
    rf_model,
    rf_param_grid,
    n_iter=20,
    cv=3,
    scoring='f1',
    random_state=42,
    n_jobs=-1,
    verbose=1
)

print("Training Random Forest with hyperparameter tuning...")
rf_search.fit(X_train, y_train)

print(f"\nBest parameters: {rf_search.best_params_}")
print(f"Best CV F1 score: {rf_search.best_score_:.4f}")
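
The cost of the randomized search follows directly from the grid size, `n_iter`, and `cv`. The sketch below repeats the grid under the hypothetical name `rf_grid_copy` so that it runs on its own.

```python
from math import prod

# Copy of the Random Forest grid defined above (the name rf_grid_copy is illustrative)
rf_grid_copy = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
n_iter, cv = 20, 3

# Total combinations an exhaustive grid search would have to try
n_combinations = prod(len(values) for values in rf_grid_copy.values())

# Cross-validation fits run by the randomized search; the final refit adds one more
n_cv_fits = n_iter * cv
coverage = n_iter / n_combinations

print(f"grid combinations: {n_combinations}")
print(f"cross-validation fits: {n_cv_fits}")
print(f"share of the grid sampled: {coverage:.1%}")
```

An exhaustive search over the same grid with 3 folds would need 324 fits, so the randomized search trades coverage of the grid for roughly a fivefold reduction in training time.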

In [None]:
# Get the best Random Forest model
best_rf = rf_search.best_estimator_

# Make predictions
rf_pred = best_rf.predict(X_test)
rf_pred_proba = best_rf.predict_proba(X_test)[:, 1]

# Compute the metrics
rf_accuracy = accuracy_score(y_test, rf_pred)
rf_precision = precision_score(y_test, rf_pred)
rf_recall = recall_score(y_test, rf_pred)
rf_f1 = f1_score(y_test, rf_pred)
rf_auc = roc_auc_score(y_test, rf_pred_proba)

print("=" * 50)
print("RANDOM FOREST RESULTS")
print("=" * 50)
print(f"Accuracy:  {rf_accuracy:.4f}")
print(f"Precision: {rf_precision:.4f}")
print(f"Recall:    {rf_recall:.4f}")
print(f"F1-Score:  {rf_f1:.4f}")
print(f"AUC:       {rf_auc:.4f}")
print("\nClassification report:")
print(classification_report(y_test, rf_pred, target_names=['Normal', 'Fraud']))
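
To make the relationship between these metrics concrete, the sketch below computes them by hand from confusion-matrix counts. The counts are hypothetical, chosen only for illustration; they are not results from this notebook.

```python
# Hypothetical confusion-matrix counts (not taken from the notebook's output)
tn, fp, fn, tp = 56_850, 10, 20, 80
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)   # share of flagged transactions that are fraud
recall = tp / (tp + fn)      # share of fraud cases that are flagged
f1 = 2 * precision * recall / (precision + recall)

# A classifier that labels every transaction as normal catches no fraud at all
baseline_accuracy = (tn + fp) / total

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"f1:        {f1:.4f}")
print(f"accuracy of the all-normal baseline: {baseline_accuracy:.4f}")
```

With these counts the all-normal baseline still reaches an accuracy above 0.99, which is why accuracy alone says little on data this imbalanced.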

## 5. Model 2: XGBoost with Hyperparameter Tuning

In [None]:
# Compute scale_pos_weight (negative count / positive count) for the imbalanced data
scale_pos_weight = (y_train == 0).sum() / (y_train == 1).sum()

# Define the XGBoost parameter grid
xgb_param_grid = {
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7, 10],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0]
}

# Initialize the XGBoost model
# (use_label_encoder is omitted: it is deprecated and ignored by recent XGBoost releases)
xgb_model = XGBClassifier(
    random_state=42,
    scale_pos_weight=scale_pos_weight,
    eval_metric='logloss'
)

# Apply RandomizedSearchCV
xgb_search = RandomizedSearchCV(
    xgb_model,
    xgb_param_grid,
    n_iter=20,
    cv=3,
    scoring='f1',
    random_state=42,
    n_jobs=-1,
    verbose=1
)

print("Training XGBoost with hyperparameter tuning...")
xgb_search.fit(X_train, y_train)

print(f"\nBest parameters: {xgb_search.best_params_}")
print(f"Best CV F1 score: {xgb_search.best_score_:.4f}")
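
The `scale_pos_weight` ratio is simple arithmetic on the class counts. The sketch below works it through on hypothetical labels (`y_demo` is made up: 9,980 normal and 20 fraud transactions).

```python
from collections import Counter

# Hypothetical labels: 9,980 normal (0) and 20 fraud (1) transactions
y_demo = [0] * 9_980 + [1] * 20
counts = Counter(y_demo)

# Ratio of negatives to positives, the same formula used for scale_pos_weight above
scale_pos_weight_demo = counts[0] / counts[1]

# After weighting, the positives carry the same total weight as the negatives
weighted_positive_total = counts[1] * scale_pos_weight_demo

print(f"scale_pos_weight: {scale_pos_weight_demo}")
print(f"weighted positives: {weighted_positive_total}, negatives: {counts[0]}")
```

In this example each fraud case counts as much as 499 normal transactions during training, so errors on the rare class are no longer drowned out by the majority class.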

In [None]:
# Get the best XGBoost model
best_xgb = xgb_search.best_estimator_

# Make predictions
xgb_pred = best_xgb.predict(X_test)
xgb_pred_proba = best_xgb.predict_proba(X_test)[:, 1]

# Compute the metrics
xgb_accuracy = accuracy_score(y_test, xgb_pred)
xgb_precision = precision_score(y_test, xgb_pred)
xgb_recall = recall_score(y_test, xgb_pred)
xgb_f1 = f1_score(y_test, xgb_pred)
xgb_auc = roc_auc_score(y_test, xgb_pred_proba)

print("=" * 50)
print("XGBOOST RESULTS")
print("=" * 50)
print(f"Accuracy:  {xgb_accuracy:.4f}")
print(f"Precision: {xgb_precision:.4f}")
print(f"Recall:    {xgb_recall:.4f}")
print(f"F1-Score:  {xgb_f1:.4f}")
print(f"AUC:       {xgb_auc:.4f}")
print("\nClassification report:")
print(classification_report(y_test, xgb_pred, target_names=['Normal', 'Fraud']))
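
Precision, recall, and F1 above are computed from the hard labels returned by `predict`, which for a binary problem amounts to a 0.5 cutoff on the predicted fraud probability. Moving that cutoff trades precision against recall. The sketch below shows the effect on toy data; the labels, probabilities, and the helper `precision_recall_at` are invented for illustration.

```python
# Toy labels and predicted fraud probabilities, invented for illustration
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.30, 0.55, 0.70, 0.90]


def precision_recall_at(threshold):
    """Return (precision, recall) when probabilities >= threshold are flagged as fraud."""
    y_pred = [int(p >= threshold) for p in y_prob]
    tp = sum(1 for y, yhat in zip(y_true, y_pred) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(y_true, y_pred) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(y_true, y_pred) if y == 1 and yhat == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


results = {t: precision_recall_at(t) for t in (0.25, 0.5, 0.75)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")
```

In the notebook, the same comparison could be run on `rf_pred_proba` or `xgb_pred_proba`. A lower cutoff catches more fraud at the cost of more false alarms, so the right value depends on the relative cost of the two error types.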

## 6. ROC Curve Comparison

In [None]:
# Compute the ROC curves
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_pred_proba)
xgb_fpr, xgb_tpr, _ = roc_curve(y_test, xgb_pred_proba)

# Plot the ROC curves
plt.figure(figsize=(10, 8))

plt.plot(rf_fpr, rf_tpr, label=f'Random Forest (AUC = {rf_auc:.4f})', linewidth=2)
plt.plot(xgb_fpr, xgb_tpr, label=f'XGBoost (AUC = {xgb_auc:.4f})', linewidth=2)
plt.plot([0, 1], [0, 1], 'k--', label='Random Classifier', linewidth=1)

plt.xlabel('False Positive Rate (FPR)', fontsize=12)
plt.ylabel('True Positive Rate (TPR)', fontsize=12)
plt.title('ROC Curve Comparison', fontsize=14)
plt.legend(loc='lower right', fontsize=11)
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('roc_curves.png', dpi=300, bbox_inches='tight')
plt.show()

print("ROC curve saved as 'roc_curves.png'")
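
The AUC values in the legend have a direct interpretation: the probability that a randomly chosen fraud case receives a higher score than a randomly chosen normal one. The sketch below computes that quantity by brute force on four toy points (labels and scores invented for illustration).

```python
from itertools import product

# Toy labels and scores, invented for illustration
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

pos_scores = [s for s, y in zip(y_score, y_true) if y == 1]
neg_scores = [s for s, y in zip(y_score, y_true) if y == 0]

# Count a win when the positive outranks the negative, and half a win for a tie
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in product(pos_scores, neg_scores)
)
auc = wins / (len(pos_scores) * len(neg_scores))
print(f"pairwise AUC: {auc}")
```

Three of the four positive/negative pairs are ranked correctly, giving 0.75. Because AUC depends only on the ranking of the scores, it is unaffected by the cutoff used to turn probabilities into labels.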

## 7. Confusion Matrices

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Random Forest confusion matrix
rf_cm = confusion_matrix(y_test, rf_pred)
sns.heatmap(rf_cm, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=['Normal', 'Fraud'],
            yticklabels=['Normal', 'Fraud'])
axes[0].set_title('Random Forest - Confusion Matrix', fontsize=12)
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')

# XGBoost confusion matrix
xgb_cm = confusion_matrix(y_test, xgb_pred)
sns.heatmap(xgb_cm, annot=True, fmt='d', cmap='Greens', ax=axes[1],
            xticklabels=['Normal', 'Fraud'],
            yticklabels=['Normal', 'Fraud'])
axes[1].set_title('XGBoost - Confusion Matrix', fontsize=12)
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('Actual')

plt.tight_layout()
plt.savefig('confusion_matrices.png', dpi=300, bbox_inches='tight')
plt.show()
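
When reading the heatmaps, it helps to remember that `confusion_matrix` puts actual classes on the rows and predicted classes on the columns. The sketch below unpacks the four cells on a tiny made-up example (`y_true_demo` and `y_pred_demo` are illustrative).

```python
from sklearn.metrics import confusion_matrix

# Tiny made-up example: 4 normal and 2 fraud transactions
y_true_demo = [0, 0, 0, 0, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 0]

cm = confusion_matrix(y_true_demo, y_pred_demo)

# Row-major order for binary labels: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = (int(v) for v in cm.ravel())

print(cm)
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")
```

For fraud detection, the bottom-left cell (false negatives, i.e. missed fraud) is usually the costliest one to watch.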

## 8. Results Comparison Table

In [None]:
# Build the results DataFrame
results_df = pd.DataFrame({
    'Model': ['Random Forest', 'XGBoost'],
    'Accuracy': [rf_accuracy, xgb_accuracy],
    'Precision': [rf_precision, xgb_precision],
    'Recall': [rf_recall, xgb_recall],
    'F1-Score': [rf_f1, xgb_f1],
    'AUC': [rf_auc, xgb_auc]
})

# Round the values
results_df = results_df.round(4)

# Display the results
print("=" * 70)
print("MODEL COMPARISON RESULTS")
print("=" * 70)
print(results_df.to_string(index=False))

# Save the results to a CSV file
results_df.to_csv('model_comparison_results.csv', index=False)
print("\nResults saved to 'model_comparison_results.csv'")

## 9. Best Model Analysis

In [None]:
# Pick the best model by F1-score (an important metric for imbalanced data)
best_model_idx = results_df['F1-Score'].idxmax()
best_model_name = results_df.loc[best_model_idx, 'Model']

print("=" * 70)
print("FINAL ANALYSIS")
print("=" * 70)
print(f"\nBEST MODEL: {best_model_name}")
print("\nBest model metrics:")
print(results_df.loc[best_model_idx].to_string())

print("\n" + "-" * 70)
print("ANALYSIS NOTES:")
print("-" * 70)
print("""
1. Dataset Characteristics:
   - The dataset is highly imbalanced (fraud cases are rare)
   - F1-score and AUC are more informative than accuracy, which stays high
     even when every transaction is labeled normal

2. Model Performance:
   - Only XGBoost compensates for the class imbalance explicitly, through scale_pos_weight
   - Random Forest is trained without class weighting and relies on ensemble voting
   - Both searches optimize F1, so candidates are ranked by fraud-class performance

3. Overfitting/Underfitting:
   - Hyperparameter tuning helps limit overfitting
   - Cross-validation estimates how well each configuration generalizes
   - max_depth controls model complexity

4. Recommendations:
   - For fraud detection, prioritize recall so that more fraud cases are caught
   - Balance precision against recall according to business needs
   - For production, consider an ensemble of the two models
""")

## 10. Feature Importance

In [None]:
# Get the feature importances from the best models
fig, axes = plt.subplots(1, 2, figsize=(16, 8))

# Random Forest feature importance
rf_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': best_rf.feature_importances_
}).sort_values('Importance', ascending=True).tail(15)

axes[0].barh(rf_importance['Feature'], rf_importance['Importance'], color='steelblue')
axes[0].set_title('Random Forest - Top 15 Features', fontsize=12)
axes[0].set_xlabel('Importance')

# XGBoost feature importance
xgb_importance = pd.DataFrame({
    'Feature': X.columns,
    'Importance': best_xgb.feature_importances_
}).sort_values('Importance', ascending=True).tail(15)

axes[1].barh(xgb_importance['Feature'], xgb_importance['Importance'], color='forestgreen')
axes[1].set_title('XGBoost - Top 15 Features', fontsize=12)
axes[1].set_xlabel('Importance')

plt.tight_layout()
plt.savefig('feature_importance.png', dpi=300, bbox_inches='tight')
plt.show()

---
## Summary

This notebook implemented:
1. Random Forest with hyperparameter tuning (n_estimators, max_depth, min_samples_split, min_samples_leaf)
2. XGBoost with hyperparameter tuning (learning_rate, n_estimators, max_depth, subsample, colsample_bytree)
3. A full set of evaluation metrics (accuracy, precision, recall, F1-score, AUC)
4. ROC curves for both models
5. A comparison table covering all metrics
6. Best-model analysis and handling of imbalanced data

**Data source:** `data/processed/creditcard_clean.csv` (preprocessed in Data_Cleaning.ipynb)