<a href="https://colab.research.google.com/github/rndy44/preprocesingdata/blob/main/preprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [20]:
# ===================================================================
# PART 1: LOADING THE DATA FROM GOOGLE DRIVE
# ===================================================================

# Import the required libraries
from google.colab import drive
import pandas as pd

# 1. Mount Google Drive in Colab
drive.mount('/content/drive')

# 2. Absolute path to the Excel file
# (matches the folder structure discussed earlier)
file_path = '/content/drive/MyDrive/mesin learning/Dataset/Online Retail.xlsx'

# 3. Load the dataset from that path
try:
    df = pd.read_excel(file_path)
    print("✅ SUCCESS! Dataset loaded.")
    print("--------------------------------------------------")
except FileNotFoundError:
    print(f"❌ ERROR: File still not found at '{file_path}'.")
    print("   Make sure the path is correct and the file exists at that location.")
    # Set df to None so the preprocessing section below is skipped
    df = None

# ===================================================================
# PART 2: DATA PREPROCESSING (ONLY IF THE DATA WAS LOADED SUCCESSFULLY)
# ===================================================================

if df is not None:
    # Show the first 5 rows to inspect the raw data
    print("\nFirst 5 rows of the raw data:")
    print(df.head())
    print("\nInitial data info:")
    df.info()

    # --- STEP 1: Clean Missing Values ---
    print("\n1. Dropping rows with a missing CustomerID...")
    df.dropna(subset=['CustomerID'], inplace=True)
    print("   Data shape after dropping rows with missing values:", df.shape)


    # --- STEP 2: Clean Invalid Data ---
    print("\n2. Dropping rows with zero or negative Quantity...")
    # .copy() makes df an independent DataFrame, so the column assignments
    # below do not trigger SettingWithCopyWarning
    df = df[df['Quantity'] > 0].copy()
    print("   Data shape after removing non-positive quantities:", df.shape)


    # --- STEP 3: Convert Data Types ---
    print("\n3. Converting CustomerID to string...")
    # Cast to int first so that 17850.0 becomes '17850' rather than '17850.0'
    df['CustomerID'] = df['CustomerID'].astype(int).astype(str)


    # --- STEP 4: Create a New Column (Feature Engineering) ---
    print("\n4. Creating new column 'TotalPrice' (Quantity * UnitPrice)...")
    df['TotalPrice'] = df['Quantity'] * df['UnitPrice']


    # --- FINAL RESULT ---
    print("\n--------------------------------------------------")
    print("✅ PREPROCESSING COMPLETE!")
    print("\nFirst 5 rows of the cleaned data:")
    print(df.head())
    print("\nFinal data info:")
    df.info()

    # Optional: save the cleaned data to a new CSV file in Google Drive
    cleaned_file_path = '/content/drive/MyDrive/mesin learning/Dataset/Online_Retail_Cleaned.csv'
    df.to_csv(cleaned_file_path, index=False)
    print(f"\nCleaned data saved to: {cleaned_file_path}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✅ SUCCESS! Dataset loaded.
--------------------------------------------------

First 5 rows of the raw data:
  InvoiceNo StockCode                          Description  Quantity  \
0    536365    85123A   WHITE HANGING HEART T-LIGHT HOLDER         6   
1    536365     71053                  WHITE METAL LANTERN         6   
2    536365    84406B       CREAM CUPID HEARTS COAT HANGER         8   
3    536365    84029G  KNITTED UNION FLAG HOT WATER BOTTLE         6   
4    536365    84029E       RED WOOLLY HOTTIE WHITE HEART.         6   

          InvoiceDate  UnitPrice  CustomerID         Country  
0 2010-12-01 08:26:00       2.55     17850.0  United Kingdom  
1 2010-12-01 08:26:00       3.39     17850.0  United Kingdom  
2 2010-12-01 08:26:00       2.75     17850.0  United Kingdom  
3 2010-12-01 08:26:00       3.39     17850.0  United Ki

Before running the loading code, upload the `Online Retail.xlsx` file to your Colab environment: click the folder icon in the left sidebar, then click the "Upload to session storage" icon and select the file from your computer. The loading code reads from the Google Drive path stored in `file_path`, so either place the file at that Drive location or point `file_path` at the uploaded copy.
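
One way to handle both locations is to check for the file before loading it. The sketch below is a minimal illustration, not part of the original notebook: the helper `find_dataset` and the list `CANDIDATE_PATHS` are hypothetical names, and the session-storage path `/content/Online Retail.xlsx` assumes the file was uploaded to Colab's default working directory.

```python
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical candidate locations, checked in order:
# 1. session storage (assumes the upload landed in /content)
# 2. the Google Drive path used by the loading code
CANDIDATE_PATHS = [
    "/content/Online Retail.xlsx",
    "/content/drive/MyDrive/mesin learning/Dataset/Online Retail.xlsx",
]


def find_dataset(candidates: Iterable[str] = CANDIDATE_PATHS) -> Optional[Path]:
    """Return the first candidate path that is an existing file, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


dataset_path = find_dataset()
if dataset_path is None:
    print("Dataset not found. Upload it to session storage or mount Google Drive first.")
else:
    print(f"Dataset found at: {dataset_path}")
    # The result can then be passed to the loader, e.g. pd.read_excel(dataset_path)
```

Checking the path up front separates "file is missing" from other loading failures, so the `try`/`except FileNotFoundError` in the loading cell becomes a second line of defense rather than the only one.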