**Data Mining Superstore: Exploration, Clustering & Classification**

Step 1: Upload and Load Data

In [None]:
import pandas as pd
df = pd.read_excel("Datasets Superstore.xlsx")
df.head()

Step 2: Data Exploration

In [None]:
df.info()
print("\nMissing values:\n", df.isnull().sum())
# A cell only displays its last expression automatically, so print the summary explicitly
print(df.describe())
df.columns

Step 3: Visualization

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

df.groupby('Category')['Sales'].sum().plot(kind='bar', title='Sales per Category')
plt.ylabel('Total Sales')
plt.show()

sns.scatterplot(data=df, x='Discount', y='Profit')
plt.title('Discount vs Profit')
plt.show()

Step 4: Profit/Loss Classification (Decision Tree)

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

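# Binary target: 1 = profitable row (Profit > 0), 0 = loss or break-even
df['label'] = df['Profit'].apply(lambda x: 1 if x > 0 else 0)
# Predictor columns; missing values are replaced with 0
features = df[['Sales', 'Discount', 'Quantity']].fillna(0)
labels = df['label']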

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42)
model = DecisionTreeClassifier(random_state=42)  # fixed seed so the fitted tree is reproducible
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

Step 4.1: Profit/Loss Classification (KNN)

In [None]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

df['label'] = df['Profit'].apply(lambda x: 1 if x > 0 else 0)
features = df[['Sales', 'Discount', 'Quantity']].fillna(0)
labels = df['label']

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42)
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


Step 4.2: Profit/Loss Classification (Random Forest)

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

df['label'] = df['Profit'].apply(lambda x: 1 if x > 0 else 0)
features = df[['Sales', 'Discount', 'Quantity']].fillna(0)
labels = df['label']

X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Classification Report:\n", classification_report(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


**CONCLUSION**

The three methods yield different accuracies because they work differently and differ in how sensitive they are to the data. The Decision Tree is flexible enough to capture patterns in the data and reaches 93% accuracy, but it can overfit if its growth is not constrained. KNN reaches a lower accuracy (85%) because it depends heavily on the distances between data points, so its performance drops when the data are not standardized or contain outliers. Random Forest, in turn, reduces overfitting by combining many trees, which makes it more stable and gives the highest accuracy (94%). The differences in accuracy are therefore expected: each algorithm has its own strengths and weaknesses in handling the data.
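
The point about KNN and unstandardized data can be illustrated with a minimal sketch. It does not use the Superstore file: the data below are synthetic, and the column ranges and the labeling rule (the label depends only on the discount) are assumptions chosen to make the effect visible. The sketch compares KNN on raw features with KNN placed behind a `StandardScaler` in a pipeline, scoring both with 5-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption, NOT the Superstore dataset)
rng = np.random.default_rng(42)
n = 600
sales = rng.uniform(10, 5000, n)     # large numeric range
discount = rng.uniform(0.0, 0.8, n)  # small numeric range
quantity = rng.integers(1, 15, n)
X = np.column_stack([sales, discount, quantity])

# Hypothetical labeling rule: "profitable" only when the discount is low
y = (discount < 0.3).astype(int)

# KNN on raw features: distances are dominated by the Sales column
knn_raw = KNeighborsClassifier(n_neighbors=5)

# KNN after standardization: every feature contributes on a comparable scale
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

acc_raw = cross_val_score(knn_raw, X, y, cv=5).mean()
acc_scaled = cross_val_score(knn_scaled, X, y, cv=5).mean()

print(f"KNN without scaling: {acc_raw:.3f}")
print(f"KNN with scaling:    {acc_scaled:.3f}")
```

Putting the scaler inside the pipeline means it is fitted on the training folds only, so no information from the held-out fold leaks into the scaling. Whether standardization would raise the 85% reported above for the real data is not something this sketch can show; it would have to be checked on the actual `features` and `labels`.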