<a href="./advanced_algoritms.ipynb" target="_self">
  <button style="
    padding:10px 18px;
    font-size:16px;
    background-color:#2563eb;
    color:white;
    border:none;
    border-radius:8px;
    cursor:pointer;">
    ➡️ Go to Advanced Algorithms
  </button>
</a>

# 🏨 Hotel Booking Cancellation Prediction

This project predicts whether a hotel booking will be cancelled. It covers **data preprocessing**, **feature engineering**, **feature selection**, and **oversampling with SMOTE**.

---

## 1️⃣ Feature Selection and Engineering

- **Source:** CSV files in the `Feature_Selection` folder:
  - `X_train_selected.csv`
  - `X_test_selected.csv`

- **What was done:**
  - **String → numeric** conversion (for example, month names in `"arrival_date_month"` → numbers `1–12`)
  - Simple imputation of **missing values**:
    - Numeric: column mean (`mean`)
    - Categorical: most frequent value (`most_frequent`)
  - **Encoding:**
    - **Label Encoding / One-Hot Encoding** was applied to the categorical features
  - **Feature Selection:**
    - The most important features were selected with **LassoCV** (features with `coef_ != 0` were kept)
    - Result:
      - The number of features was reduced
      - The most informative columns were retained

- **Saved to:** the `Feature_Selection` folder

---

## 2️⃣ Oversampling with SMOTE

- **Source:** the feature-selected training set
  - `X_train_selected.csv`
  - `y_train.csv`

- **What was done:**
  - The rare class (`is_canceled=1`) was synthetically oversampled with **SMOTE** (`Synthetic Minority Over-sampling Technique`)
  - After oversampling the training set is balanced (the minority and majority classes have the same number of rows)

- **Result:**
  - Balanced dataset:
    - `X_train_selected_smote.csv`
    - `y_train_smote.csv`

- **Saved to:** the `SMOTE_Data` folder

- **💡 Note:** SMOTE is applied only to the training set. The test set is left **imbalanced** so that evaluation reflects real-world conditions.
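SMOTE does not simply duplicate minority rows: each synthetic row lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The NumPy sketch below illustrates only that interpolation idea. It is not imbalanced-learn's implementation (which uses a nearest-neighbours estimator and supports more options), the name `smote_sketch` is hypothetical, and the full pairwise distance matrix makes it suitable for small arrays only.

```python
import numpy as np


def smote_sketch(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 42) -> np.ndarray:
    """Generate n_new synthetic minority rows by SMOTE-style interpolation (illustration only)."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)  # cannot have more neighbours than other points

    # Pairwise Euclidean distances between minority samples (O(n^2) memory)
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]    # k nearest minority neighbours per row

    base = rng.integers(0, len(X_min), size=n_new)            # random minority sample
    pick = neighbours[base, rng.integers(0, k, size=n_new)]   # one of its k neighbours
    gap = rng.random((n_new, 1))                               # position on the segment, in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])
```

To fully balance a binary training set, `n_new` would be the majority count minus the minority count, which matches what SMOTE's default strategy does.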

---

## 3️⃣ Folder structure of the files used

- Data/
  - Preprosessed/
    - X_train.csv
    - X_test.csv
    - y_train.csv
    - y_test.csv
  - Feature_Selection/
    - X_train_selected.csv
    - X_test_selected.csv
  - SMOTE_Data/
    - X_train_selected_smote.csv
    - y_train_smote.csv
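The layout above can also be expressed relative to a project root, instead of the hard-coded `C:\Users\...` paths used in the code cells. This is a minimal sketch with `pathlib`; the names `PROJECT_ROOT`, `LAYOUT`, `data_paths` and `missing_files` are hypothetical, and the root is assumed to be the folder that contains `Data/`.

```python
from pathlib import Path

# Hypothetical project root: the folder that contains Data/
PROJECT_ROOT = Path.cwd()

# Folder -> files, mirroring the layout listed above
LAYOUT = {
    "Preprosessed": ["X_train.csv", "X_test.csv", "y_train.csv", "y_test.csv"],
    "Feature_Selection": ["X_train_selected.csv", "X_test_selected.csv"],
    "SMOTE_Data": ["X_train_selected_smote.csv", "y_train_smote.csv"],
}


def data_paths(root: Path) -> dict[str, Path]:
    """Map each expected file name to its full path under root/Data (names are unique)."""
    data_dir = root / "Data"
    return {name: data_dir / folder / name
            for folder, names in LAYOUT.items() for name in names}


def missing_files(root: Path) -> list[Path]:
    """Return the expected files that do not exist yet under root."""
    return [p for p in data_paths(root).values() if not p.exists()]
```

For example, `missing_files(PROJECT_ROOT)` returns an empty list once every preprocessing, selection and SMOTE step has written its output.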

import pandas as pd
import logging
import os

# =========================
# LOG FILE PATH
# =========================
log_path = r"C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Log\data_loader.log"
os.makedirs(os.path.dirname(log_path), exist_ok=True)

logging.basicConfig(
    filename=log_path,
    filemode="a",
    format="%(asctime)s - %(levelname)s - %(message)s",
    level=logging.INFO,
    force=True,  # Python 3.8+: replace handlers left over from an earlier cell run
)

logging.info("===== FEATURE-SELECTED DATA LOADER STARTED =====")

# =========================
# DATA PATHS
# =========================
FE_PATH = r"C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Data\Feature_Selection"
PREP_PATH = r"C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Data\Preprosessed"

# Full path for every file, so all four are built the same way
PATHS = {
    "X_train": os.path.join(FE_PATH, "X_train_selected.csv"),
    "X_test":  os.path.join(FE_PATH, "X_test_selected.csv"),
    "y_train": os.path.join(PREP_PATH, "y_train.csv"),
    "y_test":  os.path.join(PREP_PATH, "y_test.csv"),
}

# =========================
# DATA LOAD
# =========================
try:
    X_train = pd.read_csv(PATHS["X_train"])
    X_test  = pd.read_csv(PATHS["X_test"])
    y_train = pd.read_csv(PATHS["y_train"]).values.ravel()  # flatten to a 1D array
    y_test  = pd.read_csv(PATHS["y_test"]).values.ravel()

    logging.info("Feature-selected datasets loaded successfully")
    logging.info(f"X_train shape: {X_train.shape}")
    logging.info(f"X_test  shape: {X_test.shape}")
    logging.info(f"y_train shape: {y_train.shape}")
    logging.info(f"y_test  shape: {y_test.shape}")

except Exception as e:
    logging.exception(f"Error while loading datasets: {e}")  # also logs the traceback
    raise

# =========================
# SHAPE CHECK
# =========================
if X_train.shape[0] != len(y_train):
    logging.error("X_train and y_train have different numbers of rows")
    raise ValueError("Train set mismatch")

if X_test.shape[0] != len(y_test):
    logging.error("X_test and y_test have different numbers of rows")
    raise ValueError("Test set mismatch")

# =========================
# TARGET LEAKAGE CHECK
# =========================
# y_train is a 1D NumPy array here (it has no .columns), so the leakage check
# looks for the target column name directly in the feature tables.
TARGET_COL = "is_canceled"
for name, df in (("X_train", X_train), ("X_test", X_test)):
    if TARGET_COL in df.columns:
        logging.error(f"Target column '{TARGET_COL}' leaked into {name}!")
        raise ValueError("Target leakage detected")

logging.info("DLP checks passed successfully")
logging.info("===== FEATURE-SELECTED DATA LOADER FINISHED =====")

print("✅ Feature-selected datasets loaded and validated successfully")

✅ Feature-selected datasets loaded and validated successfully
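The row-count and target-leakage checks above can be wrapped in a single reusable helper, which could then also be run on the SMOTE output. This is a minimal sketch: `validate_split` is a hypothetical name, and the target column name `is_canceled` is taken from the `y_train_smote.csv` header written later in this notebook.

```python
import numpy as np
import pandas as pd


def validate_split(X: pd.DataFrame, y, name: str, target_col: str = "is_canceled") -> None:
    """Raise ValueError if X and y are misaligned or the target column leaked into X."""
    y = np.asarray(y).ravel()  # accepts a Series, a 1-column DataFrame or an ndarray
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"{name}: {X.shape[0]} rows in X but {y.shape[0]} labels in y")
    # y is a plain array at this point, so leakage is detected by column name in X
    if target_col in X.columns:
        raise ValueError(f"{name}: target column '{target_col}' found in X")
```

Usage would be `validate_split(X_train, y_train, "train")` and `validate_split(X_test, y_test, "test")`.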


from imblearn.over_sampling import SMOTE
import pandas as pd
import os

# =========================
# DATA PATHS
# =========================
FE_PATH = r"C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Data\Feature_Selection"
PREP_PATH = r"C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Data\Preprosessed"

X_train_file = os.path.join(FE_PATH, "X_train_selected.csv")
y_train_file = os.path.join(PREP_PATH, "y_train.csv")

# =========================
# LOAD DATA
# =========================
X_train_selected = pd.read_csv(X_train_file)
y_train = pd.read_csv(y_train_file).values.ravel()  # 1D array

# =========================
# SMOTE SAVE PATH
# =========================
SAVE_PATH = r"C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Data\SMOTE_Data"
os.makedirs(SAVE_PATH, exist_ok=True)

# =========================
# SMOTE SAMPLING
# =========================
print("🚀 Starting SMOTE oversampling...")

smote = SMOTE(random_state=42)  # fixed seed for reproducible synthetic rows
X_train_res, y_train_res = smote.fit_resample(X_train_selected, y_train)

print("✅ SMOTE finished")
print(f"Train shape before SMOTE: {X_train_selected.shape}")
print(f"Train shape after SMOTE: {X_train_res.shape}")

# =========================
# SAVE TO CSV
# =========================
X_train_res.to_csv(os.path.join(SAVE_PATH, "X_train_selected_smote.csv"), index=False)
pd.DataFrame(y_train_res, columns=['is_canceled']).to_csv(
    os.path.join(SAVE_PATH, "y_train_smote.csv"), index=False
)

print(f"✅ SMOTE-balanced train dataset saved as CSV to: {SAVE_PATH}")

🚀 Starting SMOTE oversampling...
✅ SMOTE finished
Train shape before SMOTE: (95512, 34)
Train shape after SMOTE: (120518, 34)
✅ SMOTE-balanced train dataset saved as CSV to: C:\Users\Rasulbek907\Desktop\Hotel Booking Cancellation Prediction\Data\SMOTE_Data
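SMOTE's default `sampling_strategy="auto"` oversamples only the minority class, up to the majority-class count, so for a binary target the two printed shapes are enough to recover the class counts. The small worked check below assumes the default strategy was used (the cell does not set it) and that `is_canceled` is binary; the helper name `smote_auto_counts` is hypothetical.

```python
def smote_auto_counts(n_before: int, n_after: int) -> dict[str, int]:
    """Infer binary class counts, assuming SMOTE raised the minority class to the majority count."""
    if n_after % 2:
        raise ValueError("a balanced binary set must have an even number of rows")
    majority = n_after // 2            # after SMOTE both classes have this many rows
    minority = n_before - majority     # minority count before SMOTE
    return {"majority": majority,
            "minority_before": minority,
            "synthetic": n_after - n_before}  # rows generated by SMOTE


print(smote_auto_counts(95512, 120518))
# → {'majority': 60259, 'minority_before': 35253, 'synthetic': 25006}
```

Under these assumptions, about 36.9% of the training rows (35253 / 95512) were cancellations before resampling. The actual counts can be confirmed directly with `pd.Series(y_train).value_counts()` and `pd.Series(y_train_res).value_counts()`.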
