# **DSW Data Challenge**

# Predictive Modeling for Churn Labels in Telco Customer Data using Machine Learning

## **Import Libraries & Data Understanding**

### **Import Libraries and File**

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from scipy.stats import chi2_contingency
from tabulate import tabulate
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, f1_score, recall_score
from sklearn.metrics import precision_score, average_precision_score, roc_auc_score, roc_curve, auc
from xgboost import XGBClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier

The code above imports the libraries and modules needed for data analysis and modeling: data manipulation (pandas, NumPy), visualization (seaborn, Matplotlib, Plotly), and several machine learning algorithms, namely Random Forest, XGBoost, K-Nearest Neighbors, and AdaBoost. It also imports SMOTE from imbalanced-learn to handle class imbalance.

In [None]:
from google.colab import files
upload = files.upload()

The code above uses the `google.colab` library to upload a file to the Google Colab environment.

In [None]:
df = pd.read_excel('Telco_customer_churn_adapted_v2.xlsx')

This line uses `pandas` to read the Excel file 'Telco_customer_churn_adapted_v2.xlsx' into a DataFrame stored in the variable `df`.

In [None]:
df

This line displays the DataFrame `df` that was just read. Inspecting its contents lets you check the structure of the data used in this project, including its columns, values, and other basic information.

### **Checking Data Types**

In [None]:
df.info()

`df.info()` prints a summary of the DataFrame `df`: the number of entries, the number of columns, the data type of each column, and the non-null count per column, which reveals any missing values. This helps you understand the characteristics of the data and decide on the next steps in analysis or preprocessing.

### **Checking Data Statistics**

In [None]:
df.describe()

`df.describe()` produces summary descriptive statistics for the numeric columns of `df`: the count, mean, standard deviation (std), minimum (min), lower quartile (25%), median (50%), upper quartile (75%), and maximum (max) of each numeric column.

### **Checking Null and Duplicated Values**

In [None]:
df.isnull().sum()

`df.isnull().sum()` counts the missing (null) values in each column of `df`. The result shows that there are no null values in this DataFrame, so no special handling of missing data is needed.

In [None]:
df.duplicated().sum()

`df.duplicated().sum()` counts the duplicate rows in `df`. The result is 0, which means no two rows in this DataFrame are identical.
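To make the two checks concrete, here is a minimal sketch on a made-up four-row frame. The column names `plan` and `fee` and their values are hypothetical and are not taken from the Telco file.

```python
import pandas as pd

# Hypothetical toy data: one missing value and one fully duplicated row
toy = pd.DataFrame({
    'plan': ['A', 'B', 'B', None],
    'fee': [10, 20, 20, 30],
})

null_counts = toy.isnull().sum()         # missing values per column
duplicate_rows = toy.duplicated().sum()  # rows identical to an earlier row

print(null_counts)
print(duplicate_rows)
```

Note that `duplicated()` only flags the second and later copies of a row, so the count is the number of rows that would be dropped by `drop_duplicates()`.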

### **Feature Engineering**

In [None]:
df['Total Charges'] = df['Tenure Months'] * df['Monthly Purchase (Thou. IDR)']

In [None]:
df

### **Checking Outliers**

In [None]:
df.plot(kind='box', layout=(3, 5), figsize=(20, 10), subplots=True, sharex=False, sharey=False)
plt.show()

This code draws a box plot for each numeric feature in `df`. Box plots are used to identify outliers in the data.
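A box plot drawn with Matplotlib's default settings marks as outliers the points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies the same rule numerically; the helper name `iqr_outliers` and the sample values are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def iqr_outliers(series, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return series[(series < lower) | (series > upper)]

# Hypothetical values; 100 sits far from the rest
values = pd.Series([10, 12, 11, 13, 12, 11, 100])
outliers = iqr_outliers(values)
print(outliers.tolist())
```

Applied to a real column, for example `iqr_outliers(df['Tenure Months'])`, this would list the same points that the box plot draws as individual markers, up to small differences in how quartiles are interpolated.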

## **Exploratory Data Analysis (EDA)**

### **Make New Dataframe for Correlation and Encode Target Feature**

In [None]:
corr_df = df.copy()

In [None]:
# Map the target to 1/0 by assignment; replace(..., inplace=True) on a selected
# column may not update corr_df itself in recent pandas versions
corr_df['Churn Label'] = corr_df['Churn Label'].map({'Yes': 1, 'No': 0})

The code above encodes the 'Churn Label' column of `corr_df`: 'Yes' becomes 1 and 'No' becomes 0.

### **Categorical Features Correlation to Churn Label**

In [None]:
def contingency_table(feature1, feature2):
  return pd.crosstab(df[feature1], df[feature2])

The code above defines a function that builds a contingency table between two features (columns) of `df`.

In [None]:
categorical_data = df.select_dtypes(include='object').columns.to_list()
chi_values = []
p_values = []
table_data = []

for i in categorical_data:
    contingency_feature = contingency_table('Churn Label', i)
    chi2, p, _, _ = chi2_contingency(contingency_feature)
    table = tabulate(contingency_feature, headers='keys', tablefmt='pretty')
    table_data.append({
        'Feature': i,
        'Chi-Squared Value': chi2,
        'P-Value': p,
        'Contingency Table': table
    })
    chi_values.append(chi2)
    p_values.append(p)

print(tabulate(table_data, headers='keys', tablefmt='pretty'))

The code above runs a chi-squared test of independence between 'Churn Label' (the target variable) and each of the other categorical variables in the DataFrame.
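For intuition about what the chi-squared test measures, the sketch below computes the Pearson statistic by hand from a small contingency table. The counts are hypothetical and the helper `chi_squared` is an illustration only. Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic for a table like this one would differ from the uncorrected value computed here.

```python
import numpy as np

def chi_squared(observed):
    """Pearson chi-squared statistic for a contingency table (no continuity correction)."""
    observed = np.asarray(observed, dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    # Counts expected in each cell if rows and columns were independent
    expected = row_totals * col_totals / observed.sum()
    statistic = ((observed - expected) ** 2 / expected).sum()
    return statistic, expected

# Hypothetical counts: rows = churn No/Yes, columns = two categories of a feature
table = [[30, 10],
         [10, 30]]
stat, expected = chi_squared(table)
print(stat)
print(expected)
```

The further the observed counts are from the counts expected under independence, the larger the statistic and the smaller the p-value.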

In [None]:
# Data
features = [item['Feature'] for item in table_data]
chi_values = [item['Chi-Squared Value'] for item in table_data]
p_values = [item['P-Value'] for item in table_data]

The lines above build three lists, `features`, `chi_values`, and `p_values`, from the results of the chi-squared tests run in the previous cell.

In [None]:
# Plot Chi-squared values
plt.figure(figsize=(8, 4))
plt.barh(features, chi_values, color='skyblue')
plt.xlabel('Chi-Squared Value')
plt.title('Chi-Squared Values for Categorical Features')
plt.gca().invert_yaxis()
plt.show()

# Plot P-values
plt.figure(figsize=(8, 4))
plt.barh(features, p_values, color='lightcoral')
plt.xlabel('P-Value')
plt.title('P-Values for Categorical Features')
plt.gca().invert_yaxis()
plt.show()

The code above visualizes the results of the chi-squared tests on the categorical features. It produces two horizontal bar charts: one of the chi-squared value and one of the p-value for each categorical feature.

### **Numerical Features Correlation to Churn Label**

In [None]:
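# Correlate the numeric columns with the target, then drop the target's
# self-correlation by name (slicing [1:] after an ascending sort would drop
# the most negative feature instead)
num_corr_df = corr_df.corr(numeric_only=True)['Churn Label'].drop('Churn Label').sort_values(ascending=True)
fig, ax = plt.subplots(1, figsize=(8, 4))
sns.barplot(
    y=num_corr_df.index,
    x=num_corr_df.values,
    ax=ax
)
plt.title('Correlation of numeric features with Churn Label')
plt.show()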

The code above computes and visualizes the correlation between the numeric features in the DataFrame and the target variable 'Churn Label'.
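`DataFrame.corr()` uses the Pearson coefficient by default, which remains well defined when one variable is a 0/1 label. The sketch below computes it from the definition; the `tenure` and `churn` values are made up for illustration and the helper `pearson` is not part of the original notebook.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())

# Hypothetical data: short-tenure customers churn (1), long-tenure customers stay (0)
tenure = [1, 2, 3, 40, 50, 60]
churn = [1, 1, 1, 0, 0, 0]
r = pearson(tenure, churn)
print(r)
```

A negative value here means that larger tenure goes with the label 0, that is, with customers who did not churn.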

### **Categorical and Numerical Features Correlation to Churn Label with One Hot Encoding**

In [None]:
ohe_df = pd.get_dummies(corr_df)
ohe_df.head()

The code above creates dummy variables (one-hot encoding) from `corr_df`, which contains categorical features. One-hot encoding turns each category of a categorical feature into its own binary (0 or 1) column.
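As a small illustration of what `pd.get_dummies` does, the sketch below encodes a toy frame. The rows are invented for the example; only the column names echo the dataset.

```python
import pandas as pd

# Hypothetical rows with one categorical and one numeric column
toy = pd.DataFrame({
    'Payment Method': ['Credit', 'Pulsa', 'Debit', 'Credit'],
    'Tenure Months': [2, 10, 5, 7],
})

encoded = pd.get_dummies(toy)
print(encoded)

# Each row has exactly one active indicator among the 'Payment Method_*' columns
indicator_sum = encoded.filter(like='Payment Method_').sum(axis=1)
print(indicator_sum.tolist())
```

Numeric columns pass through unchanged, while each category becomes a column named `<feature>_<category>`.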

In [None]:
plt.figure(figsize=(28, 26))
sns.heatmap(ohe_df.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.show()

The code above draws a correlation heatmap of the features in `ohe_df`, the DataFrame produced by one-hot encoding. A heatmap is a useful visual tool for seeing how strongly the features correlate with one another.

In [None]:
# Compute the correlations with the target once and reuse them
churn_corr = ohe_df.corr()['Churn Label'].sort_values(ascending=True)
fig = px.bar(x=churn_corr.index, y=churn_corr.values, color=churn_corr.values)
fig.show()

The code above draws a bar chart of the correlation between each feature in `ohe_df` and the target variable 'Churn Label'.

### **Categorical and Numerical Features Correlation to Churn Label with Label Encoding**

In [None]:
def label_encoder(dataframe_series):
    if dataframe_series.dtype == 'object':
        return LabelEncoder().fit_transform(dataframe_series)
    return dataframe_series

In [None]:
le_df = corr_df.apply(lambda x: label_encoder(x))
le_df.head()

The code above defines a function `label_encoder` that converts the values of a column (Series) into integer codes using `LabelEncoder`, leaving non-object columns unchanged. The function is then applied to every column of `corr_df` with `.apply()`.
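`LabelEncoder` assigns integer codes according to the sorted order of the distinct values. The plain-Python sketch below mimics that behavior so the resulting codes are easy to inspect; `label_codes` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
def label_codes(values):
    """Map each distinct value to its position in the sorted list of distinct values."""
    classes = sorted(set(values))
    lookup = {label: code for code, label in enumerate(classes)}
    return [lookup[v] for v in values], classes

# Example values for a device-class style column
codes, classes = label_codes(['Mid End', 'High End', 'Low End', 'High End'])
print(classes)
print(codes)
```

Because the codes follow alphabetical order rather than any meaningful ranking, correlations computed on label-encoded multi-category features should be read with some caution.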

In [None]:
plt.figure(figsize=(16, 12))
sns.heatmap(le_df.corr(), annot=True, cmap='coolwarm', linewidths=0.5)
plt.show()

The code above draws a correlation heatmap of the features in `le_df`, the DataFrame produced by label encoding. A heatmap is a useful visual tool for seeing how strongly the features correlate with one another.

In [None]:
fig = px.bar(le_df.corr()['Churn Label'].sort_values(ascending=True), color='value')
fig.show()

The code above draws a bar chart of the correlation between each feature in `le_df` and the target variable 'Churn Label'.

### **Feature Importance using Random Forest Model**

In [None]:
X = le_df.drop(['Churn Label'], axis=1)
y = le_df['Churn Label']
clf = RandomForestClassifier()
clf.fit(X, y)
feature_importance = clf.feature_importances_
print(feature_importance)

In [None]:
# Build a DataFrame to make visualization easier
feature_importance_df = pd.DataFrame({'Feature': X.columns, 'Importance': feature_importance})
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=True)

# Plot a bar chart of feature importance using Plotly Express
fig = px.bar(feature_importance_df, x='Importance', y='Feature', orientation='h', title='Feature Importance in Random Forest Model', color='Importance')
fig.update_xaxes(title_text='Feature Importance')
fig.update_yaxes(title_text='Feature')
fig.show()

### **Checking Data Distribution (Univariate Analysis)**

In [None]:
print(df.groupby('Tenure Months')['Customer ID'].nunique())
px.histogram(df, x='Tenure Months', title='Distribution of Tenure Months')

In [None]:
print(df.groupby('Location')['Customer ID'].nunique())
px.histogram(df, x='Location', title='Distribution of Location')

In [None]:
print(df.groupby('Device Class')['Customer ID'].nunique())
px.histogram(df, x='Device Class', title='Distribution of Device Class')

In [None]:
print(df.groupby('Games Product')['Customer ID'].nunique())
px.histogram(df, x='Games Product', title='Distribution of Games Product')

In [None]:
print(df.groupby('Music Product')['Customer ID'].nunique())
px.histogram(df, x='Music Product', title='Distribution of Music Product')

In [None]:
print(df.groupby('Education Product')['Customer ID'].nunique())
px.histogram(df, x='Education Product', title='Distribution of Education Product')

In [None]:
print(df.groupby('Call Center')['Customer ID'].nunique())
px.histogram(df, x='Call Center', title='Distribution of Call Center')

In [None]:
print(df.groupby('Video Product')['Customer ID'].nunique())
px.histogram(df, x='Video Product', title='Distribution of Video Product')

In [None]:
print(df.groupby('Use MyApp')['Customer ID'].nunique())
px.histogram(df, x='Use MyApp', title='Distribution of Use MyApp')

In [None]:
print(df.groupby('Payment Method')['Customer ID'].nunique())
px.histogram(df, x='Payment Method', title='Distribution of Payment Method')

In [None]:
print(df.groupby('Monthly Purchase (Thou. IDR)')['Customer ID'].nunique())
px.histogram(df, x='Monthly Purchase (Thou. IDR)', title='Distribution of Monthly Purchase (Thou. IDR)')

In [None]:
print(df.groupby('Churn Label')['Customer ID'].nunique())
px.histogram(df, x='Churn Label', title='Distribution of Churn Label')

In [None]:
print(df.groupby('CLTV (Predicted Thou. IDR)')['Customer ID'].nunique())
px.histogram(df, x='CLTV (Predicted Thou. IDR)', title='Distribution of CLTV (Predicted Thou. IDR)')

### **Bivariate Analysis**

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Tenure Months']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Tenure Months', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Tenure Months Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Location']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Location', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Location Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Device Class']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Device Class', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Device Class Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Games Product']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Games Product', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Games Product Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Music Product']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Music Product', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Music Product Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Education Product']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Education Product', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Education Product Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Call Center']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Call Center', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Call Center Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Video Product']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Video Product', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Video Product Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Use MyApp']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Use MyApp', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Use MyApp Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Payment Method']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Payment Method', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Payment Method Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'Monthly Purchase (Thou. IDR)']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='Monthly Purchase (Thou. IDR)', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs Monthly Purchase (Thou. IDR) Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
# Create a DataFrame with the data
data = df.groupby(['Churn Label', 'CLTV (Predicted Thou. IDR)']).size().reset_index(name='Count')

# Create an interactive bar chart
fig = px.bar(data, x='CLTV (Predicted Thou. IDR)', y='Count', color='Churn Label', barmode='group',
             title='Churn Label vs CLTV (Predicted Thou. IDR) Graph',
             labels={'Churn Label': 'Churn Label'}
             )

# Show the interactive plot
fig.show()

In [None]:
data = df.groupby(['Device Class', 'Games Product']).size().reset_index(name='Count')

fig = px.bar(data, x='Games Product', y='Count', color='Device Class',
             title='Device Class vs Games Product',
             labels={'Device Class': 'Device Class'},
             barmode='stack'
      )

fig.show()

In [None]:
data = df.groupby(['Device Class', 'Payment Method']).size().reset_index(name='Count')

fig = px.bar(data, x='Payment Method', y='Count', color='Device Class',
             title='Device Class vs Payment Method',
             labels={'Device Class': 'Device Class'},
             barmode='stack'
      )

fig.show()

In [None]:
fig = px.histogram(df, x='Monthly Purchase (Thou. IDR)', color='Device Class',marginal='box')
fig.show()

In [None]:
fig = px.histogram(df, x='Total Charges', color='Churn Label', marginal='box')
fig.show()

In [None]:
df.groupby('Churn Label')['Total Charges'].sum()

In [None]:
df.groupby('Churn Label')['Tenure Months'].sum()

In [None]:
fig = px.histogram(df, x='Tenure Months', color='Churn Label',marginal='box')
fig.show()

In [None]:
df.groupby('Churn Label')['Tenure Months'].quantile([.50,.75,.90,.95])

In [None]:
df.groupby('Churn Label')['Tenure Months'].mean()

In [None]:
# Histogram of monthly purchase by churn label, with a marginal box plot
# (the previous cell referenced 'Age' and 'HeartDisease', columns that do not exist in this dataset)
fig = px.histogram(df, x='Monthly Purchase (Thou. IDR)', color='Churn Label', marginal='box')
fig.show()

In [None]:
df.groupby('Churn Label')['Monthly Purchase (Thou. IDR)'].quantile([.50,.75,.90,.95])

In [None]:
df.groupby('Churn Label')['Monthly Purchase (Thou. IDR)'].mean()

In [None]:
fig = px.histogram(df, x='CLTV (Predicted Thou. IDR)', color='Churn Label',marginal='box')
fig.show()

In [None]:
df.groupby('Churn Label')['CLTV (Predicted Thou. IDR)'].sum()

In [None]:
df.groupby('Churn Label')['CLTV (Predicted Thou. IDR)'].quantile([.50,.75,.90,.95])

In [None]:
df.groupby('Churn Label')['CLTV (Predicted Thou. IDR)'].mean()

## **Modeling**

### **Feature Selection**

In [None]:
fix_df = df.drop(['Customer ID', 'Location', 'Longitude', 'Latitude'], axis=1)

In [None]:
fix_df.info()

In [None]:
fix_df.head()

### **Encoding**

In [None]:
# Map the target to 1/0 by assignment; replace(..., inplace=True) on a selected
# column may not update fix_df itself in recent pandas versions
fix_df['Churn Label'] = fix_df['Churn Label'].map({'Yes': 1, 'No': 0})

In [None]:
def encode_data(dataframe_series):
    if dataframe_series.dtype=='object':
        dataframe_series = LabelEncoder().fit_transform(dataframe_series)
    return dataframe_series

In [None]:
fix_df = fix_df.apply(lambda x: encode_data(x))
fix_df.head(10)

### **Oversampling**

In [None]:
# Compute the correlations with the target once and reuse them
churn_corr = fix_df.corr()['Churn Label'].sort_values(ascending=True)
fig = px.bar(x=churn_corr.index, y=churn_corr.values, color=churn_corr.values)
fig.show()

In [None]:
fix_df.groupby('Churn Label')['Churn Label'].count()

In [None]:
over = SMOTE(sampling_strategy=1)

X = fix_df.drop("Churn Label", axis=1).values
y = fix_df['Churn Label'].values

In [None]:
X, y = over.fit_resample(X, y)
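The resampling step above balances the two classes by generating synthetic minority-class rows. As a rough, simplified sketch of the core idea behind SMOTE (not imbalanced-learn's actual implementation), a synthetic point is placed somewhere on the line segment between a minority sample and one of its nearest minority neighbours. The helper `smote_like_sample` and the sample points are hypothetical.

```python
import numpy as np

def smote_like_sample(minority, rng, k=2):
    """Create one synthetic point between a random minority sample and one of
    its k nearest minority neighbours (assumes the points are distinct)."""
    minority = np.asarray(minority, dtype=float)
    i = rng.integers(len(minority))
    distances = np.linalg.norm(minority - minority[i], axis=1)
    neighbours = np.argsort(distances)[1:k + 1]  # position 0 is the point itself
    j = rng.choice(neighbours)
    lam = rng.random()  # interpolation weight in [0, 1)
    return minority[i] + lam * (minority[j] - minority[i])

rng = np.random.default_rng(0)
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0], [3.0, 2.0]]
synthetic = smote_like_sample(minority, rng)
print(synthetic)
```

Since every synthetic row is derived from existing rows, a commonly used alternative to the order in this notebook is to split the data first and resample only the training portion, so that no test row influences the synthetic training data.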

### **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2, test_size=0.2)

### **Making Model Function**

In [None]:
def model(method, X_train, y_train, X_test, y_test):
    method.fit(X_train, y_train)

    predictions = method.predict(X_test)
    c_matrix = confusion_matrix(y_test, predictions)

    percentages = (c_matrix / np.sum(c_matrix, axis=1)[:, np.newaxis]).round(2) * 100

    TP = c_matrix[1, 1]  # True Positives
    TN = c_matrix[0, 0]  # True Negatives
    FP = c_matrix[0, 1]  # False Positives
    FN = c_matrix[1, 0]  # False Negatives

    labels = [
        [f'TN: {TN} ({percentages[0, 0]:.2f}%)', f'FP: {FP} ({percentages[0, 1]:.2f}%)'],
        [f'FN: {FN} ({percentages[1, 0]:.2f}%)', f'TP: {TP} ({percentages[1, 1]:.2f}%)']
    ]
    labels = np.asarray(labels)

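    sns.heatmap(c_matrix, annot=labels, fmt='', cmap='Blues')
    plt.show()

    # Compute ROC AUC from the predicted probability of the positive class;
    # hard 0/1 predictions would reduce the ROC curve to a single threshold
    probabilities = method.predict_proba(X_test)[:, 1]
    print(f'ROC AUC: {roc_auc_score(y_test, probabilities):.2%}')
    print(f'Model accuracy: {accuracy_score(y_test, predictions):.2%}')
    print(classification_report(y_test, predictions))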
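The function above relies on the layout used by `confusion_matrix`: rows are the true classes and columns are the predicted classes, so for labels 0 and 1 the cells read TN, FP on the first row and FN, TP on the second. The sketch below rebuilds that layout by hand on made-up labels to show how the row percentages and the usual metrics follow from it; `confusion_counts` is an illustrative helper, not a scikit-learn function.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """2x2 matrix with rows = true class and columns = predicted class (labels 0 and 1)."""
    matrix = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Hypothetical true and predicted labels
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

cm = confusion_counts(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Row-wise percentages: share of each true class assigned to each predicted class
row_pct = cm / cm.sum(axis=1, keepdims=True) * 100

recall = tp / (tp + fn)     # share of actual churners that were caught
precision = tp / (tp + fp)  # share of predicted churners that were correct
print(cm)
print(row_pct)
print(recall, precision)
```

With row-wise normalization, the percentage shown in the TP cell is the recall of the churn class, which is often the figure of most interest in churn prediction.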

### **XGBoost**

In [None]:
xgb = XGBClassifier(learning_rate=0.01, max_depth=16, n_estimators=1000, random_state=55)

In [None]:
model(xgb, X_train, y_train, X_test, y_test)

### **K-Nearest Neighbor (KNN)**

In [None]:
knn = KNeighborsClassifier(n_neighbors=10)
model(knn, X_train, y_train, X_test, y_test)

### **Random Forest**

In [None]:
RF = RandomForestClassifier(n_estimators=50, max_depth=16, random_state=55)
model(RF, X_train, y_train, X_test, y_test)

### **AdaBoost**

In [None]:
boosting = AdaBoostClassifier(learning_rate=0.01, random_state=55)
model(boosting, X_train, y_train, X_test, y_test)

### **Predictive AI Model**

In [None]:
model = XGBClassifier(learning_rate=0.01, max_depth=16, n_estimators=1000, random_state=55)
model.fit(X_train, y_train)

In [None]:
print('Which Device Class do you use?')
print('0. High End')
print('1. Low End')
print('2. Mid End')
deviceClass = int(input('Answer: '))

print('How long have you been subscribed (in months)?')
tenureMonths = int(input('Answer: '))

print('Do you use Games Product?')
print('0. No')
print('1. Yes')
gamesProduct = int(input('Answer: '))

print('Do you use Music Product?')
print('0. No')
print('1. Yes')
musicProduct = int(input('Answer: '))

print('Do you use Education Product?')
print('0. No')
print('1. Yes')
educationProduct = int(input('Answer: '))

print('Do you use the Call Center?')
print('0. No')
print('1. Yes')
callCenter = int(input('Answer: '))

print('Do you use Video Product?')
print('0. No')
print('1. Yes')
videoProduct = int(input('Answer: '))

print('Do you use MyApp?')
print('0. No')
print('1. Yes')
myApp = int(input('Answer: '))

print('What is your total monthly spending (in thousands of IDR)?')
monthlyPurchase = int(input('Answer: '))

print('Which Payment Method do you use?')
print('0. Credit')
print('1. Debit')
print('2. Digital Wallet')
print('3. Pulsa')
paymentMethod = int(input('Answer: '))

print('What is your total CLTV (in thousands of IDR)?')
cltv = int(input('Answer: '))

# Same feature engineering as applied to the training data
totalCharges = tenureMonths * monthlyPurchase

In [None]:
# Prepare input data from user
user_input = {
    'Device Class': deviceClass,
    'Tenure Months': tenureMonths,
    'Games Product': gamesProduct,
    'Music Product': musicProduct,
    'Education Product': educationProduct,
    'Call Center': callCenter,
    'Video Product': videoProduct,
    'Use MyApp': myApp,
    'Monthly Purchase (Thou. IDR)': monthlyPurchase,
    'Payment Method': paymentMethod,
    'CLTV (Predicted Thou. IDR)': cltv,
    'Total Charges': totalCharges
}

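# Create a DataFrame from the user input, with columns in the same order as the training features
feature_order = fix_df.drop('Churn Label', axis=1).columns
user_df = pd.DataFrame([user_input])[feature_order]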

In [None]:
# Use the model to make predictions; it was fitted on NumPy arrays, so pass the raw values
predicted_churn = model.predict(user_df.values)
predicted_churn_prob = model.predict_proba(user_df.values)

print("Probability of churn 'Yes':", predicted_churn_prob[0][1])
print("Probability of churn 'No':", predicted_churn_prob[0][0])

if predicted_churn[0] == 1:
    print("Based on your input, the customer is predicted to churn: 'Yes'")
else:
    print("Based on your input, the customer is predicted to churn: 'No'")