# 04 - Ray: Distributed ML Pipeline

In this notebook we build a distributed ML pipeline with Ray.

**Ray Features:**
- Easy parallelization with `@ray.remote`
- Ray Train: Distributed model training
- Ray Tune: Hyperparameter optimization
- Ray Data: Large-scale data processing

**Use Case:** NYC taxi fare prediction

**Dataset:** NYC Yellow Taxi 2023 (12 months, ~38M rows)

## 1. Setup

In [1]:
# Install Ray and the ML libraries
!pip install "ray[default]" scikit-learn xgboost pyarrow -q


In [2]:
import ray
import numpy as np
import pandas as pd
import time
import json
import os
import psutil
import gc
import urllib.request
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

print(f"Ray version: {ray.__version__}")
print(f"CPU count: {os.cpu_count()}")

Ray version: 2.52.0
CPU count: 12


In [3]:
# Benchmark helpers
results = {
    'framework': 'ray',
    'dataset': 'nyc_taxi_12_months',
    'operations': {}
}

def get_memory_mb():
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024

def benchmark(func, name):
    gc.collect()
    mem_before = get_memory_mb()
    start = time.time()
    result = func()
    end = time.time()
    mem_after = get_memory_mb()
    
    duration = end - start
    mem_used = mem_after - mem_before
    
    results['operations'][name] = {
        'duration_sec': round(duration, 3),
        'memory_mb': round(mem_used, 2)
    }
    
    print(f"\n{'='*50}")
    print(f"Operation: {name}")
    print(f"Duration: {duration:.3f} s")
    print(f"Memory: {mem_used:.2f} MB")
    print(f"{'='*50}")
    
    return result
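
A note on what this helper measures: `get_memory_mb` reads the resident memory of the notebook (driver) process only, so work done inside Ray worker processes does not appear in the memory delta. For the timing part, `time.perf_counter()` is a monotonic clock intended for measuring intervals, whereas `time.time()` follows the system clock. A minimal sketch of the same timing idea as a context manager, using only the standard library; the names `timed` and `log` are illustrative and not part of the notebook:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(name, log):
    """Record the wall-clock duration of the enclosed block in log[name]."""
    start = time.perf_counter()  # monotonic clock, suited to measuring intervals
    try:
        yield
    finally:
        # Stored even if the block raises, so failed steps are still timed
        log[name] = round(time.perf_counter() - start, 3)

log = {}
with timed("sum_squares", log):
    total = sum(i * i for i in range(100_000))

print(log)
```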

In [4]:
# Download data - 12 months
DATA_DIR = 'data'
os.makedirs(DATA_DIR, exist_ok=True)

MONTHS = ['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
          '2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12']
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_{}.parquet"

taxi_files = []
total_size = 0

for month in MONTHS:
    filename = f"yellow_tripdata_{month}.parquet"
    filepath = os.path.join(DATA_DIR, filename)
    taxi_files.append(filepath)
    
    if not os.path.exists(filepath):
        url = BASE_URL.format(month)
        print(f"Downloading: {filename}...")
        urllib.request.urlretrieve(url, filepath)
        print(f"Downloaded: {filename}")
    else:
        print(f"Already present: {filename}")
    
    total_size += os.path.getsize(filepath)

print(f"\nTotal file size: {total_size / 1024**2:.1f} MB")
print(f"File count: {len(taxi_files)}")

Already present: yellow_tripdata_2023-01.parquet
Already present: yellow_tripdata_2023-02.parquet
Already present: yellow_tripdata_2023-03.parquet
Already present: yellow_tripdata_2023-04.parquet
Already present: yellow_tripdata_2023-05.parquet
Already present: yellow_tripdata_2023-06.parquet
Already present: yellow_tripdata_2023-07.parquet
Already present: yellow_tripdata_2023-08.parquet
Already present: yellow_tripdata_2023-09.parquet
Already present: yellow_tripdata_2023-10.parquet
Already present: yellow_tripdata_2023-11.parquet
Already present: yellow_tripdata_2023-12.parquet

Total file size: 606.3 MB
File count: 12
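
The download loop skips any file that already exists on disk, so an interrupted download would leave a truncated file that later runs treat as complete. One way to guard against this, sketched here under assumptions, is to write to a temporary name and rename only on success. The helper `download_atomic`, its `fetch` parameter and the `.part` suffix are hypothetical; `fetch` defaults to `urllib.request.urlretrieve`, the call the notebook already uses.

```python
import os
import urllib.request

def download_atomic(url, filepath, fetch=urllib.request.urlretrieve):
    """Download url to filepath, exposing the file only once it is complete."""
    if os.path.exists(filepath):
        return False  # already downloaded, nothing to do
    tmp_path = filepath + ".part"
    try:
        fetch(url, tmp_path)
        os.replace(tmp_path, filepath)  # atomic rename on the same filesystem
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up a partial download after a failure
    return True
```

In the loop, the `os.path.exists` check and the `urlretrieve` call would collapse into a single `download_atomic(url, filepath)` call.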


## 2. Start Ray

In [5]:
# Start Ray
if ray.is_initialized():
    ray.shutdown()

ray.init(ignore_reinit_error=True)

print("\nRAY CLUSTER INFO")
print("="*50)
print(f"Nodes: {len(ray.nodes())}")
print(f"CPUs: {ray.cluster_resources().get('CPU', 0)}")
print(f"Memory: {ray.cluster_resources().get('memory', 0) / 1024**3:.1f} GB")

2025-11-26 00:41:43,449 INFO worker.py:2014 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265



RAY CLUSTER INFO
Nodes: 1
CPUs: 12.0
Memory: 58.0 GB




## 3. Parallel Data Loading with @ray.remote

In [6]:
# Ray remote function - load each file in parallel and compute statistics
@ray.remote
def load_and_process_file(file_path):
    """Load a single file and compute its summary statistics."""
    df = pd.read_parquet(file_path)
    
    # Basic statistics
    stats = {
        'file': os.path.basename(file_path),
        'rows': len(df),
        'avg_fare': df['fare_amount'].mean(),
        'avg_distance': df['trip_distance'].mean(),
        'avg_tip': df['tip_amount'].mean(),
        'total_revenue': df['total_amount'].sum()
    }
    return stats

In [7]:
# Compute statistics in parallel
def parallel_stats():
    # Process all files in parallel
    futures = [load_and_process_file.remote(f) for f in taxi_files]
    results_list = ray.get(futures)
    return pd.DataFrame(results_list)

df_stats = benchmark(parallel_stats, 'parallel_file_stats')
print("\nMonthly Statistics (Parallel Load):")
df_stats


Operation: parallel_file_stats
Duration: 3.146 s
Memory: 6.22 MB

Monthly Statistics (Parallel Load):


| | file | rows | avg_fare | avg_distance | avg_tip | total_revenue |
|---|---|---|---|---|---|---|
| 0 | yellow_tripdata_2023-01.parquet | 3066766 | 18.367069 | 3.847342 | 3.367941 | 82865190.0 |
| 1 | yellow_tripdata_2023-02.parquet | 2913955 | 18.220381 | 3.868058 | 3.384825 | 78380970.0 |
| 2 | yellow_tripdata_2023-03.parquet | 3403766 | 18.908448 | 3.903871 | 3.495237 | 94636360.0 |
| 3 | yellow_tripdata_2023-04.parquet | 3288250 | 19.360578 | 4.096176 | 3.512068 | 92957240.0 |
| 4 | yellow_tripdata_2023-05.parquet | 3513649 | 19.876871 | 4.345816 | 3.609887 | 101765800.0 |
| 5 | yellow_tripdata_2023-06.parquet | 3307234 | 19.98804 | 4.36879 | 3.594915 | 96137080.0 |
| 6 | yellow_tripdata_2023-07.parquet | 2907108 | 19.703195 | 4.489381 | 3.446826 | 83049820.0 |
| 7 | yellow_tripdata_2023-08.parquet | 2824209 | 19.718527 | 4.782808 | 3.410652 | 80851980.0 |
| 8 | yellow_tripdata_2023-09.parquet | 2846722 | 20.671109 | 4.274268 | 3.625289 | 84780900.0 |
| 9 | yellow_tripdata_2023-10.parquet | 3522285 | 20.06174 | 3.926695 | 3.632724 | 102749900.0 |


In [8]:
# Total row count
total_rows = df_stats['rows'].sum()
print(f"\nTotal rows: {total_rows:,}")
print(f"Total revenue: ${df_stats['total_revenue'].sum():,.0f}")


Total rows: 38,310,226
Total revenue: $1,090,383,412
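
Row counts and revenue can simply be summed across files, but the per-file `avg_fare` values are monthly means: averaging them directly would weight a small month the same as a large one. A sketch of the row-weighted combination, using the January and February figures from the monthly statistics above (the function name `weighted_mean` is illustrative):

```python
def weighted_mean(values, weights):
    """Mean of values where each entry counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# rows and avg_fare for 2023-01 and 2023-02, copied from the monthly statistics
rows = [3066766, 2913955]
avg_fare = [18.367069, 18.220381]

combined = weighted_mean(avg_fare, rows)
print(f"Row-weighted mean fare: {combined:.4f}")
```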


## 4. Preparing Data for ML

To save memory, we train the model on a single month of data.

In [9]:
# Load one month of data (enough for ML)
def load_sample_data():
    df = pd.read_parquet(taxi_files[0])  # January 2023
    return df

df_sample = benchmark(load_sample_data, 'load_sample_data')
print(f"\nSample data: {len(df_sample):,} rows")


Operation: load_sample_data
Duration: 0.316 s
Memory: 566.91 MB

Sample data: 3,066,766 rows


In [10]:
# Feature Engineering
def prepare_features():
    df = df_sample.copy()
    
    # Datetime features
    df['pickup_hour'] = df['tpep_pickup_datetime'].dt.hour
    df['pickup_dayofweek'] = df['tpep_pickup_datetime'].dt.dayofweek
    df['pickup_day'] = df['tpep_pickup_datetime'].dt.day
    
    # Is weekend?
    df['is_weekend'] = (df['pickup_dayofweek'] >= 5).astype(int)
    
    # Rush hour? (7-9, 17-19)
    df['is_rush_hour'] = df['pickup_hour'].apply(
        lambda x: 1 if (7 <= x <= 9) or (17 <= x <= 19) else 0
    )
    
    # Feature selection
    feature_cols = [
        'trip_distance', 'pickup_hour', 'pickup_dayofweek', 
        'is_weekend', 'is_rush_hour', 'PULocationID', 'DOLocationID',
        'passenger_count'
    ]
    
    # Target
    target_col = 'fare_amount'
    
    # Cleaning
    df = df.dropna(subset=feature_cols + [target_col])
    df = df[(df['fare_amount'] > 0) & (df['fare_amount'] < 200)]
    df = df[(df['trip_distance'] > 0) & (df['trip_distance'] < 50)]
    df = df[df['passenger_count'] > 0]
    
    return df[feature_cols], df[target_col], feature_cols

X, y, feature_cols = benchmark(prepare_features, 'feature_engineering')
print(f"\nFeature shape: {X.shape}")
print(f"Features: {feature_cols}")


Operation: feature_engineering
Duration: 2.979 s
Memory: 740.66 MB

Feature shape: (2883426, 8)
Features: ['trip_distance', 'pickup_hour', 'pickup_dayofweek', 'is_weekend', 'is_rush_hour', 'PULocationID', 'DOLocationID', 'passenger_count']
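
`Series.apply` with a Python lambda calls the function once per row, which is slow on millions of rows. A vectorized formulation of the same rush-hour rule is sketched below on a toy frame. It is an alternative and not the code that produced the timings above.

```python
import pandas as pd

# Toy data standing in for df['pickup_hour']
df = pd.DataFrame({"pickup_hour": [6, 7, 9, 10, 16, 17, 19, 20]})

# between() is inclusive on both ends by default, matching 7 <= x <= 9
hour = df["pickup_hour"]
df["is_rush_hour"] = (hour.between(7, 9) | hour.between(17, 19)).astype(int)

print(df)
```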


In [11]:
# Train/Test Split
def split_data():
    # Take a smaller sample (for speed)
    sample_size = min(500000, len(X))
    indices = np.random.choice(len(X), sample_size, replace=False)
    
    X_sampled = X.iloc[indices]
    y_sampled = y.iloc[indices]
    
    return train_test_split(
        X_sampled, y_sampled, test_size=0.2, random_state=42
    )

X_train, X_test, y_train, y_test = benchmark(split_data, 'train_test_split')
print(f"\nTrain: {len(X_train):,} samples")
print(f"Test: {len(X_test):,} samples")


Operation: train_test_split
Duration: 0.234 s
Memory: 0.34 MB

Train: 400,000 samples
Test: 100,000 samples
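
`np.random.choice` is called without a seed here, so a different 500,000-row subset is drawn on every run and the metrics can shift slightly between runs even though `train_test_split` itself is seeded. A seeded generator makes the subsample repeatable. This is a sketch; the helper name `sample_indices` and the seed value 42 are arbitrary choices.

```python
import numpy as np

def sample_indices(n_rows, sample_size, seed=42):
    """Draw unique row positions with a fixed seed."""
    rng = np.random.default_rng(seed)
    size = min(sample_size, n_rows)  # never ask for more rows than exist
    return rng.choice(n_rows, size=size, replace=False)

idx_a = sample_indices(1_000, 100)
idx_b = sample_indices(1_000, 100)
print((idx_a == idx_b).all())  # same seed, same positions
```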


## 5. Baseline Model

In [12]:
# Baseline: RandomForest
def train_baseline():
    model = RandomForestRegressor(
        n_estimators=100,
        max_depth=10,
        random_state=42,
        n_jobs=-1
    )
    model.fit(X_train, y_train)
    
    y_pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    return {'rmse': rmse, 'mae': mae, 'r2': r2, 'model': model}

baseline_result = benchmark(train_baseline, 'baseline_training')
print("\nBaseline Model Results:")
print(f"  RMSE: ${baseline_result['rmse']:.2f}")
print(f"  MAE:  ${baseline_result['mae']:.2f}")
print(f"  R2:   {baseline_result['r2']:.4f}")


Operation: baseline_training
Duration: 10.381 s
Memory: 161.74 MB

Baseline Model Results:
  RMSE: $3.78
  MAE:  $1.81
  R2:   0.9481


## 6. Parallel Hyperparameter Tuning with Ray

In [13]:
# Put the data into the Ray object store
X_train_ref = ray.put(X_train.values)
y_train_ref = ray.put(y_train.values)
X_test_ref = ray.put(X_test.values)
y_test_ref = ray.put(y_test.values)

print("Data loaded into the Ray object store")

Data loaded into the Ray object store


In [14]:
# Model training as a Ray remote task
@ray.remote
def train_model_with_params(params, X_train, y_train, X_test, y_test):
    """Train and evaluate a model with the given parameters."""
    model = RandomForestRegressor(
        n_estimators=params['n_estimators'],
        max_depth=params['max_depth'],
        min_samples_split=params['min_samples_split'],
        random_state=42,
        n_jobs=1  # Ray already parallelizes across tasks
    )
    
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    mae = mean_absolute_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    
    return {
        'params': params,
        'rmse': rmse,
        'mae': mae,
        'r2': r2
    }

In [15]:
# Hyperparameter grid
param_grid = [
    {'n_estimators': 50, 'max_depth': 5, 'min_samples_split': 2},
    {'n_estimators': 50, 'max_depth': 10, 'min_samples_split': 2},
    {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 2},
    {'n_estimators': 100, 'max_depth': 10, 'min_samples_split': 2},
    {'n_estimators': 100, 'max_depth': 15, 'min_samples_split': 5},
    {'n_estimators': 150, 'max_depth': 10, 'min_samples_split': 2},
    {'n_estimators': 150, 'max_depth': 15, 'min_samples_split': 5},
    {'n_estimators': 200, 'max_depth': 10, 'min_samples_split': 2},
]

print(f"Parameter combinations to test: {len(param_grid)}")

Parameter combinations to test: 8
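
The grid above is a hand-picked list of eight combinations. If a full Cartesian grid were wanted instead, it could be generated with `itertools.product`. The value lists below are illustrative and yield more combinations than the notebook actually trains.

```python
from itertools import product

# Candidate values per hyperparameter (illustrative, not the notebook's grid)
search_space = {
    "n_estimators": [50, 100, 150],
    "max_depth": [5, 10, 15],
    "min_samples_split": [2, 5],
}

names = list(search_space)
full_grid = [dict(zip(names, combo)) for combo in product(*search_space.values())]

print(len(full_grid))  # 3 * 3 * 2 combinations
print(full_grid[0])
```

Each dictionary has the same keys as the entries of `param_grid`, so the list could be passed to the same remote training function.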


In [16]:
# Parallel hyperparameter search
def parallel_hyperparameter_search():
    # Launch all models in parallel
    futures = [
        train_model_with_params.remote(
            params, X_train_ref, y_train_ref, X_test_ref, y_test_ref
        )
        for params in param_grid
    ]
    # Collect the results
    results_list = ray.get(futures)
    return results_list

print("Starting parallel hyperparameter search...")
tuning_results = benchmark(parallel_hyperparameter_search, 'parallel_tuning')

# Find the best model
best_result = min(tuning_results, key=lambda x: x['rmse'])
print(f"\nBest parameters: {best_result['params']}")
print(f"Best RMSE: ${best_result['rmse']:.2f}")
print(f"Best R2: {best_result['r2']:.4f}")

Starting parallel hyperparameter search...

Operation: parallel_tuning
Duration: 165.651 s
Memory: 0.37 MB

Best parameters: {'n_estimators': 150, 'max_depth': 15, 'min_samples_split': 5}
Best RMSE: $3.73
Best R2: 0.9497


In [17]:
# Show all results
print("\nALL HYPERPARAMETER RESULTS")
print("="*80)
print(f"{'n_est':<8} {'depth':<8} {'split':<8} {'RMSE ($)':<12} {'MAE ($)':<12} {'R2':<12}")
print("-"*80)

sorted_results = sorted(tuning_results, key=lambda x: x['rmse'])
for r in sorted_results:
    p = r['params']
    print(f"{p['n_estimators']:<8} {p['max_depth']:<8} {p['min_samples_split']:<8} {r['rmse']:<12.2f} {r['mae']:<12.2f} {r['r2']:<12.4f}")


ALL HYPERPARAMETER RESULTS
n_est    depth    split    RMSE ($)     MAE ($)      R2          
--------------------------------------------------------------------------------
150      15       5        3.73         1.72         0.9497      
100      15       5        3.73         1.72         0.9495      
50       10       2        3.78         1.81         0.9481      
100      10       2        3.78         1.81         0.9481      
150      10       2        3.78         1.81         0.9481      
200      10       2        3.79         1.81         0.9480      
100      5        2        4.12         2.11         0.9384      
50       5        2        4.12         2.11         0.9384      


## 7. Serial vs Parallel Comparison

In [18]:
# Serial hyperparameter search (for comparison)
def serial_hyperparameter_search():
    results_list = []
    for params in param_grid:
        model = RandomForestRegressor(
            n_estimators=params['n_estimators'],
            max_depth=params['max_depth'],
            min_samples_split=params['min_samples_split'],
            random_state=42,
            n_jobs=1
        )
        model.fit(X_train.values, y_train.values)
        y_pred = model.predict(X_test.values)
        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
        results_list.append({'params': params, 'rmse': rmse})
    return results_list

print("Starting serial hyperparameter search...")
serial_results = benchmark(serial_hyperparameter_search, 'serial_tuning')

Starting serial hyperparameter search...

Operation: serial_tuning
Duration: 615.992 s
Memory: 5.67 MB


In [19]:
# Comparison
parallel_time = results['operations']['parallel_tuning']['duration_sec']
serial_time = results['operations']['serial_tuning']['duration_sec']

print("\nSERIAL vs PARALLEL COMPARISON")
print("="*50)
print(f"Serial:   {serial_time:.2f}s")
print(f"Parallel: {parallel_time:.2f}s")
print(f"Speedup:  {serial_time/parallel_time:.1f}x")


SERIAL vs PARALLEL COMPARISON
Serial:   615.99s
Parallel: 165.65s
Speedup:  3.7x
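
The speedup is the ratio of the two durations. Only eight tasks are submitted, so at most eight run concurrently, and the parallel run cannot finish before its most expensive single configuration does. A small sketch of the arithmetic, using the durations reported above; defining efficiency as speedup divided by the number of tasks is a choice made here for illustration:

```python
serial_time = 615.992    # seconds, from the serial_tuning benchmark
parallel_time = 165.651  # seconds, from the parallel_tuning benchmark
n_tasks = 8              # len(param_grid); each task trains one model

speedup = serial_time / parallel_time
efficiency = speedup / n_tasks  # 1.0 would mean perfect scaling over 8 tasks

print(f"Speedup: {speedup:.2f}x")
print(f"Efficiency over {n_tasks} tasks: {efficiency:.0%}")
```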


## 8. Final Model and Feature Importance

In [20]:
# Final model with the best parameters
final_model = RandomForestRegressor(
    **best_result['params'],
    random_state=42,
    n_jobs=-1
)
final_model.fit(X_train, y_train)

# Feature importance
importance_df = pd.DataFrame({
    'feature': feature_cols,
    'importance': final_model.feature_importances_
}).sort_values('importance', ascending=False)

print("\nFEATURE IMPORTANCE")
print("="*50)
for _, row in importance_df.iterrows():
    bar = '#' * int(row['importance'] * 50)
    print(f"{row['feature']:<20} {row['importance']:.3f} {bar}")


FEATURE IMPORTANCE
trip_distance        0.963 ################################################
DOLocationID         0.018 
PULocationID         0.008 
pickup_hour          0.006 
pickup_dayofweek     0.003 
passenger_count      0.001 
is_rush_hour         0.000 
is_weekend           0.000 


In [21]:
# Predictions on sample rows
print("\nSAMPLE PREDICTIONS")
print("="*70)

sample_idx = np.random.choice(len(X_test), 10, replace=False)
sample_X = X_test.iloc[sample_idx]
sample_y = y_test.iloc[sample_idx]
sample_pred = final_model.predict(sample_X)

print(f"{'Distance':<12} {'Hour':<8} {'Actual ($)':<12} {'Predicted ($)':<14} {'Error ($)':<10}")
print("-"*70)
for i in range(10):
    error = abs(sample_y.iloc[i] - sample_pred[i])
    print(f"{sample_X.iloc[i]['trip_distance']:<12.2f} {int(sample_X.iloc[i]['pickup_hour']):<8} {sample_y.iloc[i]:<12.2f} {sample_pred[i]:<14.2f} {error:<10.2f}")


SAMPLE PREDICTIONS
Distance     Hour     Actual ($)   Predicted ($)  Error ($) 
----------------------------------------------------------------------
2.00         16       13.50        14.68          1.18      
10.01        21       42.90        41.75          1.15      
0.97         16       10.00        9.56           0.44      
0.62         9        7.90         7.15           0.75      
0.91         19       7.20         8.07           0.87      
0.40         21       6.50         5.40           1.10      
2.42         17       17.70        16.51          1.19      
3.82         11       26.10        23.88          2.22      
0.70         23       5.80         6.69           0.89      
2.44         10       17.00        17.76          0.76      
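
To make the error metrics concrete, the sketch below recomputes MAE and RMSE by hand for the ten sample rows printed above. The values are copied from that output; ten rows are far too few to compare against the test-set metrics, so this only illustrates the definitions.

```python
import math

# Actual and predicted fares copied from the sample rows above
actual = [13.50, 42.90, 10.00, 7.90, 7.20, 6.50, 17.70, 26.10, 5.80, 17.00]
predicted = [14.68, 41.75, 9.56, 7.15, 8.07, 5.40, 16.51, 23.88, 6.69, 17.76]

errors = [a - p for a, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)             # mean absolute error
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # root mean squared error

print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

RMSE is never smaller than MAE on the same errors, because squaring gives large misses more weight.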


## 9. Save Results

In [22]:
results['total_memory_mb'] = round(get_memory_mb(), 2)
results['total_memory_gb'] = round(get_memory_mb() / 1024, 2)
results['row_count'] = int(total_rows)
results['best_params'] = best_result['params']
results['best_rmse'] = round(best_result['rmse'], 2)
results['best_r2'] = round(best_result['r2'], 4)
results['speedup'] = round(serial_time / parallel_time, 2)

os.makedirs('results', exist_ok=True)
with open('results/ray_benchmark.json', 'w') as f:
    json.dump(results, f, indent=2)

print("Results saved: results/ray_benchmark.json")
print("\n" + json.dumps(results, indent=2))

Results saved: results/ray_benchmark.json

{
  "framework": "ray",
  "dataset": "nyc_taxi_12_months",
  "operations": {
    "parallel_file_stats": {
      "duration_sec": 3.146,
      "memory_mb": 6.22
    },
    "load_sample_data": {
      "duration_sec": 0.316,
      "memory_mb": 566.91
    },
    "feature_engineering": {
      "duration_sec": 2.979,
      "memory_mb": 740.66
    },
    "train_test_split": {
      "duration_sec": 0.234,
      "memory_mb": 0.34
    },
    "baseline_training": {
      "duration_sec": 10.381,
      "memory_mb": 161.74
    },
    "parallel_tuning": {
      "duration_sec": 165.651,
      "memory_mb": 0.37
    },
    "serial_tuning": {
      "duration_sec": 615.992,
      "memory_mb": 5.67
    }
  },
  "total_memory_mb": 2024.96,
  "total_memory_gb": 1.98,
  "row_count": 38310226,
  "best_params": {
    "n_estimators": 150,
    "max_depth": 15,
    "min_samples_split": 5
  },
  "best_rmse": 3.73,
  "best_r2": 0.9497,
  "speedup": 3.72
}


## 10. Ray Summary

In [23]:
print("""
RAY ADVANTAGES
==============

1. EASY PARALLELIZATION
   - @ray.remote decorator
   - Fetch results with ray.get()
   - Any Python function can run as a parallel task

2. OBJECT STORE
   - Share data with ray.put()
   - Zero-copy data sharing
   - Efficient communication between workers

3. RAY TRAIN
   - Distributed model training
   - XGBoost, PyTorch, TensorFlow support
   - Checkpointing

4. RAY TUNE
   - Hyperparameter optimization
   - Grid, Random, Bayesian search
   - Early stopping, scheduling

5. RAY SERVE
   - Model serving at scale
   - A/B testing
   - Batch inference

WHEN TO USE RAY?
================
- ML model training/tuning
- Hyperparameter optimization
- Distributed Python applications
- Model serving at scale
- Reinforcement learning

WHEN TO USE DASK?
=================
- DataFrame operations
- ETL pipelines
- Pandas-like workloads

WHEN TO USE POLARS?
===================
- Single-node maximum speed
- Memory efficiency
- Pandas-like DataFrame API (not a drop-in replacement)
""")





In [24]:
# Shut down Ray
ray.shutdown()
print("Ray shut down.")

Ray shut down.


---

## Next Step

Comparison of all frameworks:

-> `05_karsilastirma.ipynb`