# In ML, a PIPELINE is a chain (workflow) that organizes and automates all the stages

# Why is a PIPELINE important?

1) Code stays organized (each stage lives in its own block)
2) Repetitive steps are automated (for example, the scaler and the encoder run by themselves every time)
3) Data leakage is prevented (the scaler is fitted on the training data only and then merely applied to the test data)
4) Deploying the model becomes easier (the very same pipeline is used in production)
5) Hyperparameter tuning is simplified (GridSearchCV works conveniently with Pipeline)
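
Point 3 can be illustrated with a minimal sketch. The synthetic data from `make_classification` and every variable name here are assumptions for illustration, not part of the original notebook. Because the scaler is a step of the pipeline, it learns its statistics from the training split only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

leak_free = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Inside cross-validation the scaler is re-fitted on each training fold only
cv_scores = cross_val_score(leak_free, X_tr, y_tr, cv=5)

# fit() learns the scaling statistics from the training split;
# score() only applies them to the test split
leak_free.fit(X_tr, y_tr)
test_score = leak_free.score(X_te, y_te)

# The fitted scaler's mean equals the training mean, not the full-data mean
scaler_mean = leak_free.named_steps["scaler"].mean_
mean_matches_train = np.allclose(scaler_mean, X_tr.mean(axis=0))
print(cv_scores.mean(), test_score, mean_matches_train)
```

Calling `StandardScaler().fit(X_demo)` before the split would instead let test-set statistics influence training, which is the leakage the list refers to.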

# Types of pipeline

1) Manual
2) Auto
3) Manual + Auto combination

# Existing kinds of pipeline

1) Data Preprocessing Pipeline :: missing-value imputation // encoding // scaling // feature engineering // balancing // outlier removal
2) Modeling Pipeline :: preprocessing + model combined into one object
3) MLOps Pipeline (production pipeline) :: data ingestion // data validation // feature store // model training // model tuning // continuous training (CT) // continuous deployment (CD) // monitoring and drift detection (tools :: Airflow // Prefect // MLflow // Kubeflow)
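
Several of the preprocessing steps in item 1 can be chained as ordinary pipeline steps. The sketch below is an assumption-laden illustration: the helper `clip_outliers`, the clipping bounds, and the toy array are hypothetical. It clips extreme values rather than removing rows, because a scikit-learn transformer inside a `Pipeline` cannot change the number of samples.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def clip_outliers(values, low=-1.5, high=1.5):
    """Clip standardized values into [low, high] (hypothetical bounds)."""
    return np.clip(values, low, high)


prep_demo = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),  # missing-value imputation
    ("log", FunctionTransformer(np.log1p)),         # simple feature engineering
    ("scaler", StandardScaler()),                   # scaling
    ("clip", FunctionTransformer(clip_outliers)),   # outlier handling by clipping
])

# Toy column with one missing value and one extreme value
raw = np.array([[1.0], [2.0], [np.nan], [4.0], [1000.0]])
clean = prep_demo.fit_transform(raw)
print(clean.ravel())
```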

# SCIKIT-LEARN + a complete PIPELINE in practice

In [20]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer


df = pd.read_csv(
    r"C:\Users\Jahongir\desktop\practise\Data\Row_Data\employee_promotion.csv"
)

# TARGET
y = df["recruitment_channel"]
X = df.drop("recruitment_channel", axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler())
])

categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))
])

numeric_cols = X.select_dtypes(include=["int64", "float64"]).columns
cat_cols = X.select_dtypes(include=["object", "category"]).columns


preprocess = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, numeric_cols),
        ("cat", categorical_transformer, cat_cols)
    ]
)

pipe = Pipeline([
    ("prep", preprocess),
    ("model", GradientBoostingClassifier(random_state=42))
])

params = {
    "model__n_estimators": [50, 100],
    "model__learning_rate": [0.01, 0.1]
}

grid = GridSearchCV(
    pipe,
    params,
    cv=5,
    scoring="accuracy",
    n_jobs=-1
)

grid.fit(X_train, y_train)


print("Best Params:", grid.best_params_)
print("Best CV Score:", grid.best_score_)
print("Test Accuracy:", grid.score(X_test, y_test))


Best Params: {'model__learning_rate': 0.1, 'model__n_estimators': 50}
Best CV Score: 0.5544405586540021
Test Accuracy: 0.558474730888524
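
Because preprocessing and model live in one object, the fitted pipeline can be saved and reloaded as a single artifact, which is what makes the "same pipeline in production" point practical. A minimal sketch, assuming synthetic data and a hypothetical file name `pipeline.joblib`:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=42)

demo_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", GradientBoostingClassifier(n_estimators=20, random_state=42)),
])
demo_pipe.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "pipeline.joblib"  # hypothetical file name
    joblib.dump(demo_pipe, model_path)   # save preprocessing + model together
    restored = joblib.load(model_path)   # what a serving process would do

# The restored pipeline reproduces the original predictions
same_predictions = np.array_equal(
    demo_pipe.predict(X_demo), restored.predict(X_demo)
)
print(same_predictions)
```

In the notebook above, `grid.best_estimator_` is itself a fitted `Pipeline` and could be saved the same way.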


In [None]:
# PIPELINE ARCHITECTURE IN REAL PROJECTS


>>>> Raw Data 

>>>> Data Cleaning (missing values, duplicates)

>>>> Feature engineering

>>>> Training Pipeline

>>>> Model Registry (MLflow)

>>>> Deployment (Docker, FastAPI)


>>>> Monitoring + Drift Detection
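
The last stage can be sketched with one simple, commonly used check: a two-sample Kolmogorov-Smirnov test comparing a feature's training distribution with its live distribution. This is only one possible approach, not the method of any particular monitoring tool. The helper `has_drifted`, the threshold `alpha`, and the synthetic arrays are assumptions for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Synthetic stand-ins for one numeric feature
train_feature = rng.normal(loc=0.0, scale=1.0, size=2000)  # seen at training time
live_shifted = rng.normal(loc=1.0, scale=1.0, size=2000)   # production data, mean shifted


def has_drifted(reference, current, alpha=0.01):
    """Return True when the KS test rejects 'both samples share one distribution'."""
    result = ks_2samp(reference, current)
    return bool(result.pvalue < alpha)


print(has_drifted(train_feature, live_shifted))   # shifted data
print(has_drifted(train_feature, train_feature))  # identical data
```

In a real system a check like this would run on a schedule for each monitored feature, and a positive result would trigger an alert or retraining.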



# Where PIPELINES are used


| Domain | Example (Pipeline) |
|------|------------------|
| Fraud Detection | incoming transaction → pipeline → model |
| Recommendation System | user event → pipeline → ranking |
| NLP (Natural Language Processing) | text cleaning → tokenizer → model |
| CV (Computer Vision) | image resize → augmentation → model |
| Banking | scoring pipeline |
| Industry | predictive maintenance |
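
The NLP row of the table can be sketched as a scikit-learn pipeline. The tiny corpus, the labels, and the choice of `TfidfVectorizer` with `LogisticRegression` are assumptions for illustration; here the vectorizer handles lowercasing and tokenization in one step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = positive review, 0 = negative review
texts = [
    "great product, works well",
    "excellent quality and fast delivery",
    "I love it, great value",
    "works great, very happy",
    "terrible product, broke quickly",
    "bad quality, very disappointed",
    "awful, waste of money",
    "broke after one day, terrible",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

text_pipe = Pipeline([
    # text cleaning + tokenizer: lowercase, drop English stop words, build TF-IDF features
    ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
    # model
    ("model", LogisticRegression(max_iter=1000)),
])
text_pipe.fit(texts, labels)

train_pred = text_pipe.predict(texts)
new_pred = text_pipe.predict(["great quality, very happy"])
print(train_pred, new_pred)
```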


# A system that automates and organizes the ML process

In [21]:
# Manual Pipeline -- a pipeline built by hand
# AutoML Pipeline -- a pipeline that selects the model automatically


In [23]:
# # Manual Pipeline -- here the developer specifies every step by hand

# 1) Preprocessing
# 2) Feature engineering
# 3) Scaling
# 4) Encoding
# 5) Model selection
# 6) Hyperparameter tuning
# 7) Training
# 8) Prediction

In [25]:
# #  Why is a Manual Pipeline preferable?

# 1) Full control stays with the developer
# 2) Any transformation you want can be applied
# 3) Widely used in Kaggle and real-world projects
# 4) Debugging is easier along the way

In [27]:
# # Disadvantages ---

# 1) It takes a lot of time
# 2) The code gets long
# 3) Tuning is done by hand


# Manual Pipeline in practice

In [28]:
# Build the preprocessing block

In [29]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer

numeric_cols = [
    "no_of_trainings",
    "employee_id",
    "age",
    "previous_year_rating",
    "length_of_service",
    "awards_won",
    "avg_training_score"
]

cat_cols = [
    "department",
    "region",
    "education",
    "gender"
]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols)
])



In [30]:
# Load an ML model for the pipeline

In [33]:
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

pipe = Pipeline([

    ("prep", preprocess),
    ("model", RandomForestClassifier())
])

In [34]:
# Training step


In [None]:
pipe.fit(X_train, y_train)

In [36]:
# predict

In [None]:
pred = pipe.predict(X_test)

In [38]:
# tuning

In [None]:
from sklearn.model_selection import GridSearchCV

params = {
    "model__n_estimators": [ 50, 100, 200],
    "model__max_depth": [3,5,10]
}

grid = GridSearchCV(pipe, params, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.best_score_)
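
After a grid search it is often useful to look at every tried combination, not only the best one. A minimal sketch, assuming synthetic data and a smaller hypothetical grid so that it runs quickly:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

demo_pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])
demo_params = {
    "model__n_estimators": [10, 30],
    "model__max_depth": [3, 5],
}

demo_grid = GridSearchCV(demo_pipe, demo_params, cv=3)
demo_grid.fit(X_demo, y_demo)

# One row per parameter combination, best-ranked first
results = pd.DataFrame(demo_grid.cv_results_)[
    [
        "param_model__n_estimators",
        "param_model__max_depth",
        "mean_test_score",
        "rank_test_score",
    ]
].sort_values("rank_test_score")
print(results)

# The winning pipeline, already refitted on all the data passed to fit()
best_pipe = demo_grid.best_estimator_
```

The `model__` prefix in the parameter names routes each value to the pipeline step named `model`, which is why the same grid syntax works for any step.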

# AUTO PIPELINE

In [40]:
# # AUTO Pipeline -- this one fully automates every ML step

# 1) Preprocessing + AUTOMATIC
# 2) Feature engineering + AUTOMATIC
# 3) Scaling + AUTOMATIC
# 4) Encoding + AUTOMATIC
# 5) Model selection + AUTOMATIC
# 6) Hyperparameter tuning + AUTOMATIC
# 7) Training + AUTOMATIC
# 8) Prediction + AUTOMATIC


# We only supply the data; the AUTO PIPELINE does everything else

In [43]:
# # THERE ARE SEVERAL KINDS OF AUTO PIPELINE

# 1) AUTO-SKLEARN
# 2) TPOT
# 3) H2O AUTOML
# 4) MLJAR
# 5) GOOGLE VERTEX AI AUTOML
# 6) AZURE AUTOML


# THE MOST POPULAR ONES ARE AUTO-SKLEARN AND TPOT

In [44]:
# AUTOML PIPELINE -- WORKING WITH AUTO-SKLEARN

In [None]:
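# Note: auto-sklearn targets Linux; on Windows it typically needs WSL or Docker
import autosklearn.classification as ask
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# X and y are the features and target defined earlier in the notebook
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

automl = ask.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total search budget, in seconds
    per_run_time_limit=30         # time limit for a single model fit, in seconds
)


automl.fit(X_train, y_train)
pred = automl.predict(X_test)

print("Accuracy:", accuracy_score(y_test, pred))

# Table of the models that were tried, ranked by validation performance
print(automl.leaderboard())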



# Auto ML TPOT 

In [47]:
!pip install Tpot

Collecting tpot
  Downloading TPOT-1.1.0-py3-none-any.whl.metadata (1.9 kB)
Collecting update-checker>=0.16 (from tpot)
  Downloading update_checker-0.18.0-py3-none-any.whl.metadata (2.3 kB)
Collecting tqdm>=4.36.1 (from tpot)
  Downloading tqdm-4.67.3-py3-none-any.whl.metadata (57 kB)
Collecting stopit>=1.1.1 (from tpot)
  Downloading stopit-1.1.2.tar.gz (18 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting optuna>=3.0.5 (from tpot)
  Downloading optuna-4.7.0-py3-none-any.whl.metadata (17 kB)
Collecting networkx>=3.0 (from tpot)
  Downloading networkx-3.6.1-py3-none-any.whl.metadata (6.8 kB)
Collecting dask>=2024.4.2 (from tpot)
  Downloading dask-2026.1.2-py3-none-any.whl.metadata (3.8 kB)
Coll




In [None]:
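from tpot import TPOTClassifier

# The parameter names below follow the classic TPOT (0.x) API; the 1.x release
# shown in the install log above may use different names (not verified here)
tpot = TPOTClassifier(
    generations=5,
    population_size=20,
    cv=5,
    verbosity=2
)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

# Write the best pipeline found as a standalone Python script
tpot.export("best_pipeline.py")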


In [48]:
# TPOT uses a genetic algorithm to search over pipelines, which is why it is powerful
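
The idea behind a genetic search can be sketched with a toy example. This is not TPOT's implementation: TPOT evolves whole pipelines, whereas the sketch below evolves only two hyperparameters of a decision tree. The search space, population size, and the helpers `fitness`, `mutate`, and `crossover` are all assumptions for illustration.

```python
import random

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

random.seed(0)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)

DEPTHS = [1, 2, 3, 5, 8, None]   # candidate max_depth values
MIN_LEAF = [1, 2, 5, 10, 20]     # candidate min_samples_leaf values


def random_individual():
    return (random.choice(DEPTHS), random.choice(MIN_LEAF))


def fitness(individual):
    """Mean 3-fold cross-validated accuracy of the encoded tree."""
    depth, min_leaf = individual
    model = DecisionTreeClassifier(
        max_depth=depth, min_samples_leaf=min_leaf, random_state=0
    )
    return cross_val_score(model, X_demo, y_demo, cv=3).mean()


def crossover(parent_a, parent_b):
    """The child takes max_depth from one parent and min_samples_leaf from the other."""
    return (parent_a[0], parent_b[1])


def mutate(individual):
    """Randomly resample one of the two genes."""
    depth, min_leaf = individual
    if random.random() < 0.5:
        depth = random.choice(DEPTHS)
    else:
        min_leaf = random.choice(MIN_LEAF)
    return (depth, min_leaf)


population = [random_individual() for _ in range(6)]
history = []  # best fitness seen in each generation
for generation in range(3):
    ranked = sorted(population, key=fitness, reverse=True)
    history.append(fitness(ranked[0]))
    parents = ranked[:3]  # selection: keep the best half unchanged
    children = [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(3)
    ]
    population = parents + children

best = max(population, key=fitness)
print(best, history)
```

Because the best individuals are carried over unchanged, the best score can never get worse from one generation to the next.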


# Manual PIPELINE vs Auto PIPELINE --- comparison

| Criterion | Manual Pipeline | AutoML Pipeline |
|------|----------------|------------------|
| Preprocessing | manual | automatic |
| Feature engineering | manual | automatic |
| Algorithm selection | manual | automatic |
| Hyperparameter tuning | manual | automatic |
| Model selection | manual | automatic |
| Control | 100% | lower |
| Build speed | slow | very fast |
| Large datasets | good | slows down |
| For a professional ML engineer | very suitable | less so |
| For a beginner | hard | very convenient |


In [49]:
# TPOT stands for Tree-based Pipeline Optimization Tool

# Which pipeline is convenient where and when

# 1) Manual pipeline
# --- when the model is being put into production
# --- when custom preprocessing is needed
# --- when feature engineering is needed
# --- top Kaggle competitions
# --- when working as an ML engineer

# 2) Auto pipeline
# --- preparing a quick baseline
# --- when ML knowledge is limited
# --- to avoid checking 100 models one by one
# --- during the research stage
# --- for analysts or BI specialists
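
A middle ground between the two approaches is to keep a manual pipeline but loop over a few candidate models to get a quick baseline. A minimal sketch, assuming synthetic data and a hypothetical shortlist of three models:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=1)

# Hypothetical shortlist; the dummy model gives the score to beat
candidates = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=50, random_state=1),
}

scores = {}
for name, model in candidates.items():
    # Same preprocessing for every candidate, so the comparison is fair
    pipe_demo = Pipeline([("scaler", StandardScaler()), ("model", model)])
    scores[name] = cross_val_score(pipe_demo, X_demo, y_demo, cv=5).mean()

best_name = max(scores, key=scores.get)
print(scores, best_name)
```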

