# 01 - Pandas Baseline: Limits and Performance Analysis

In this notebook we examine the limits of Pandas on the **NYC Yellow Taxi** dataset.

**NYC Taxi Dataset (12 Months - Full Year):**
- Source: NYC Taxi & Limousine Commission
- Period: January - December 2023 (full year)
- Total: ~38M rows, ~9-10 GB in memory once loaded
- A real-world big data scenario

**Goal:**
- Show the limits of Pandas' single-threaded, in-memory execution model
- Take baseline measurements (duration, memory)


## 1. Setup and Data Download

In [1]:
# Required libraries
import pandas as pd
import numpy as np
import time
import json
import os
import psutil
import gc
import urllib.request
from IPython.display import display, HTML

print(f"Pandas version: {pd.__version__}")

Pandas version: 2.2.2


In [2]:
# Dictionary that stores the benchmark results
results = {
    'framework': 'pandas',
    'dataset': 'nyc_taxi_12_months',
    'operations': {}
}

def get_memory_mb():
    """Return the current process memory usage (RSS) in MB."""
    process = psutil.Process(os.getpid())
    return process.memory_info().rss / 1024 / 1024

def benchmark(func, name):
    """Measure the run time and memory usage of a function.

    Memory is the RSS difference before and after the call, so it misses
    transient peaks and can be zero or negative when memory is released.
    """
    gc.collect()
    mem_before = get_memory_mb()
    start = time.perf_counter()  # monotonic, high-resolution timer
    result = func()
    end = time.perf_counter()
    mem_after = get_memory_mb()
    
    duration = end - start
    mem_used = mem_after - mem_before
    
    results['operations'][name] = {
        'duration_sec': round(duration, 3),
        'memory_mb': round(mem_used, 2)
    }
    
    print(f"\n{'='*50}")
    print(f"Operation: {name}")
    print(f"Duration: {duration:.3f} s")
    print(f"Memory: {mem_used:.2f} MB")
    print(f"{'='*50}")
    
    return result
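
The RSS difference used by `benchmark` only compares memory before and after the call, so a temporary spike inside `func` goes unnoticed. As a rough complement, the sketch below records peak allocations with the standard-library `tracemalloc` module. The name `benchmark_peak` is hypothetical and not part of this notebook, and memory allocated by native libraries outside Python's allocator hooks may not be counted, so the figure is an approximation.

```python
import time
import tracemalloc

def benchmark_peak(func):
    """Return (result, duration_sec, peak_mb) for a single call of func."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func()
        duration = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes since start().
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, duration, peak_bytes / 1024 / 1024

# Throwaway workload: allocate an 8 MB buffer that is freed on return.
def workload():
    tmp = bytearray(8 * 1024 * 1024)
    return len(tmp)

value, seconds, peak_mb = benchmark_peak(workload)
print(f"result={value}, duration={seconds:.4f} s, peak={peak_mb:.1f} MB")
```

The buffer no longer exists when `workload` returns, so an RSS delta could report close to zero here, while the traced peak still reflects the temporary allocation.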

In [3]:
# NYC Taxi data - 12 months (January - December 2023)
DATA_DIR = 'data'
os.makedirs(DATA_DIR, exist_ok=True)

# Files for all 12 months (full year)
MONTHS = ['2023-01', '2023-02', '2023-03', '2023-04', '2023-05', '2023-06',
          '2023-07', '2023-08', '2023-09', '2023-10', '2023-11', '2023-12']
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_{}.parquet"

taxi_files = []
total_size = 0

for month in MONTHS:
    filename = f"yellow_tripdata_{month}.parquet"
    filepath = os.path.join(DATA_DIR, filename)
    taxi_files.append(filepath)
    
    if not os.path.exists(filepath):
        url = BASE_URL.format(month)
        print(f"Downloading: {filename}...")
        urllib.request.urlretrieve(url, filepath)
        print(f"Downloaded: {filename}")
    else:
        print(f"Already present: {filename}")
    
    total_size += os.path.getsize(filepath)

print(f"\nTotal file size: {total_size / 1024**2:.1f} MB")
print(f"File count: {len(taxi_files)}")

Downloading: yellow_tripdata_2023-01.parquet...
Downloaded: yellow_tripdata_2023-01.parquet
Downloading: yellow_tripdata_2023-02.parquet...
Downloaded: yellow_tripdata_2023-02.parquet
Downloading: yellow_tripdata_2023-03.parquet...
Downloaded: yellow_tripdata_2023-03.parquet
Downloading: yellow_tripdata_2023-04.parquet...
Downloaded: yellow_tripdata_2023-04.parquet
Downloading: yellow_tripdata_2023-05.parquet...
Downloaded: yellow_tripdata_2023-05.parquet
Downloading: yellow_tripdata_2023-06.parquet...
Downloaded: yellow_tripdata_2023-06.parquet
Downloading: yellow_tripdata_2023-07.parquet...
Downloaded: yellow_tripdata_2023-07.parquet
Downloading: yellow_tripdata_2023-08.parquet...
Downloaded: yellow_tripdata_2023-08.parquet
Downloading: yellow_tripdata_2023-09.parquet...
Downloaded: yellow_tripdata_2023-09.parquet
Downloading: yellow_tripdata_2023-10.parquet...
Downloaded: yellow_tripdata_2023-10.parquet
Downloading: yellow_tripdata_2023-11.parquet...
Downloaded: yellow_tripdata_2023-11.parquet
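
The download loop treats any file that already exists as complete, so an interrupted download would be silently reused on the next run. One possible safeguard, sketched here under assumptions, is to write to a temporary `.part` file and rename it only after the transfer finishes. The helper name `download_atomic` is hypothetical, and a local `file://` URL stands in for the real `BASE_URL` so the example runs offline.

```python
import os
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download_atomic(url, dest_path):
    """Download url to dest_path via a temporary '.part' file."""
    part_path = dest_path + ".part"
    try:
        with urllib.request.urlopen(url) as response, open(part_path, "wb") as out:
            shutil.copyfileobj(response, out)
        # The final name only appears once the whole payload is on disk.
        os.replace(part_path, dest_path)
    finally:
        # Clean up a leftover partial file if the transfer failed midway.
        if os.path.exists(part_path):
            os.remove(part_path)
    return dest_path

# Offline demo: a local file plays the role of the remote parquet file.
demo_dir = tempfile.mkdtemp()
source = os.path.join(demo_dir, "source.bin")
with open(source, "wb") as f:
    f.write(b"demo payload")

target = download_atomic(Path(source).as_uri(), os.path.join(demo_dir, "target.bin"))
print(os.path.getsize(target))
```

With this pattern, the `os.path.exists(filepath)` check in the loop would only ever see fully written files.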

## 2. Loading the Data (Pandas)


In [4]:
# Check the memory status
print("MEMORY STATUS (Before Loading)")
print("="*50)
print(f"Used: {get_memory_mb():.0f} MB")
print(f"Total system RAM: {psutil.virtual_memory().total / 1024**3:.1f} GB")
print(f"Available: {psutil.virtual_memory().available / 1024**3:.1f} GB")
print(f"\nEstimated memory required: ~8-12 GB")

MEMORY STATUS (Before Loading)
Used: 177 MB
Total system RAM: 83.5 GB
Available: 81.7 GB

Estimated memory required: ~8-12 GB


In [5]:
# Load all 12 months of data
def load_all_taxi_data():
    dfs = []
    for i, filepath in enumerate(taxi_files):
        print(f"Loading: {os.path.basename(filepath)} ({i+1}/{len(taxi_files)})...")
        df_month = pd.read_parquet(filepath)
        dfs.append(df_month)
        print(f"  Rows: {len(df_month):,}, Memory: {get_memory_mb():.0f} MB")
    
    print("\nConcatenating...")
    return pd.concat(dfs, ignore_index=True)

# Note: on some systems the OS may kill the process before Python
# gets the chance to raise MemoryError.
try:
    df = benchmark(load_all_taxi_data, 'load_data')
    print(f"\nTotal rows: {len(df):,}")
    print(f"Column count: {len(df.columns)}")
    print(f"Memory usage: {df.memory_usage(deep=True).sum() / 1024**3:.2f} GB")
    DATA_LOADED = True
except MemoryError as e:
    print(f"\n{'!'*60}")
    print("OUT OF MEMORY ERROR!")
    print(f"{'!'*60}")
    print(f"\nPandas could not load the 12 months of data!")
    print(f"This is exactly the limit we want to demonstrate.")
    print(f"\nSolution: use Dask or Polars!")
    DATA_LOADED = False

Loading: yellow_tripdata_2023-01.parquet (1/12)...
  Rows: 3,066,766, Memory: 747 MB
Loading: yellow_tripdata_2023-02.parquet (2/12)...
  Rows: 2,913,955, Memory: 1232 MB
Loading: yellow_tripdata_2023-03.parquet (3/12)...
  Rows: 3,403,766, Memory: 1764 MB
Loading: yellow_tripdata_2023-04.parquet (4/12)...
  Rows: 3,288,250, Memory: 2264 MB
Loading: yellow_tripdata_2023-05.parquet (5/12)...
  Rows: 3,513,649, Memory: 2780 MB
Loading: yellow_tripdata_2023-06.parquet (6/12)...
  Rows: 3,307,234, Memory: 3273 MB
Loading: yellow_tripdata_2023-07.parquet (7/12)...
  Rows: 2,907,108, Memory: 3693 MB
Loading: yellow_tripdata_2023-08.parquet (8/12)...
  Rows: 2,824,209, Memory: 4235 MB
Loading: yellow_tripdata_2023-09.parquet (9/12)...
  Rows: 2,846,722, Memory: 4658 MB
Loading: yellow_tripdata_2023-10.parquet (10/12)...
  Rows: 3,522,285, Memory: 5175 MB
Loading: yellow_tripdata_2023-11.parquet (11/12)...
  Rows: 3,339,715, Memory: 5648 MB
[output truncated]

In [6]:
if not DATA_LOADED:
    print("\nContinuing with 1 month of data (for demo purposes)...")
    df = pd.read_parquet(taxi_files[0])
    results['dataset'] = 'nyc_taxi_1_month_fallback'
    print(f"Rows: {len(df):,}")

In [7]:
# Data preview
df.head()

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount | congestion_surcharge | airport_fee | Airport_fee |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2023-01-01 00:32:10 | 2023-01-01 00:40:36 | 1.0 | 0.97 | 1.0 | N | 161 | 141 | 2 | 9.3 | 1.0 | 0.5 | 0.0 | 0.0 | 1.0 | 14.3 | 2.5 | 0.0 | |
| 1 | 2 | 2023-01-01 00:55:08 | 2023-01-01 01:01:27 | 1.0 | 1.1 | 1.0 | N | 43 | 237 | 1 | 7.9 | 1.0 | 0.5 | 4.0 | 0.0 | 1.0 | 16.9 | 2.5 | 0.0 | |
| 2 | 2 | 2023-01-01 00:25:04 | 2023-01-01 00:37:49 | 1.0 | 2.51 | 1.0 | N | 48 | 238 | 1 | 14.9 | 1.0 | 0.5 | 15.0 | 0.0 | 1.0 | 34.9 | 2.5 | 0.0 | |
| 3 | 1 | 2023-01-01 00:03:48 | 2023-01-01 00:13:25 | 0.0 | 1.9 | 1.0 | N | 138 | 7 | 1 | 12.1 | 7.25 | 0.5 | 0.0 | 0.0 | 1.0 | 20.85 | 0.0 | 1.25 | |
| 4 | 2 | 2023-01-01 00:10:29 | 2023-01-01 00:21:19 | 1.0 | 1.43 | 1.0 | N | 107 | 79 | 1 | 11.4 | 1.0 | 0.5 | 3.28 | 0.0 | 1.0 | 19.68 | 2.5 | 0.0 | |


In [8]:
# Data types
print("\nCOLUMN INFO")
print("="*60)
df.info()


COLUMN INFO
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38310226 entries, 0 to 38310225
Data columns (total 20 columns):
 #   Column                 Dtype         
---  ------                 -----         
 0   VendorID               int64         
 1   tpep_pickup_datetime   datetime64[us]
 2   tpep_dropoff_datetime  datetime64[us]
 3   passenger_count        float64       
 4   trip_distance          float64       
 5   RatecodeID             float64       
 6   store_and_fwd_flag     object        
 7   PULocationID           int64         
 8   DOLocationID           int64         
 9   payment_type           int64         
 10  fare_amount            float64       
 11  extra                  float64       
 12  mta_tax                float64       
 13  tip_amount             float64       
 14  tolls_amount           float64       
 15  improvement_surcharge  float64       
 16  total_amount           float64       
 17  congestion_surcharge   float64       
 18  air
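
Most columns listed above are stored as 64-bit numbers or as Python objects, which is a large part of the frame's memory footprint. The sketch below shows, on a small synthetic frame with made-up values, how narrower dtypes can reduce it. The helper name `shrink` and the chosen target dtypes are assumptions for illustration; `float32` in particular loses precision, so it may not be appropriate for monetary columns in a real analysis.

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the taxi frame (values are made up).
rng = np.random.default_rng(0)
n = 100_000
demo = pd.DataFrame({
    "passenger_count": rng.integers(0, 7, n).astype("float64"),
    "fare_amount": rng.uniform(3, 80, n),
    "store_and_fwd_flag": rng.choice(["N", "Y"], n),
    "PULocationID": rng.integers(1, 266, n),
})

def shrink(frame):
    """Return a copy with narrower dtypes where the value range allows it."""
    out = frame.copy()
    out["passenger_count"] = out["passenger_count"].astype("float32")
    out["fare_amount"] = out["fare_amount"].astype("float32")
    # Two distinct flag values compress well as a categorical.
    out["store_and_fwd_flag"] = out["store_and_fwd_flag"].astype("category")
    # Location IDs in this demo stay below 266, so int16 is wide enough.
    out["PULocationID"] = out["PULocationID"].astype("int16")
    return out

before_mb = demo.memory_usage(deep=True).sum() / 1024**2
after_mb = shrink(demo).memory_usage(deep=True).sum() / 1024**2
print(f"before: {before_mb:.2f} MB, after: {after_mb:.2f} MB")
```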

## 3. Basic Operations

### 3.1 Filtering

In [9]:
def filter_trips():
    return df[
        (df['trip_distance'] > 5) &
        (df['fare_amount'] > 20) &
        (df['fare_amount'] < 500)
    ]

df_filtered = benchmark(filter_trips, 'filter_trips')
print(f"\nFiltered: {len(df_filtered):,} / {len(df):,}")
print(f"Ratio: {len(df_filtered)/len(df)*100:.2f}%")


Operation: filter_trips
Duration: 1.267 s
Memory: 1067.39 MB

Filtered: 6,542,088 / 38,310,226
Ratio: 17.08%
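
The roughly 1 GB reported for this step is the materialized copy of the matching rows. When only the count or ratio is needed, the boolean mask can be summed directly without building that copy. The sketch below illustrates this on a tiny made-up frame; it uses the same three conditions, with `Series.between(..., inclusive="neither")` expressing the open interval on `fare_amount`.

```python
import pandas as pd

demo = pd.DataFrame({
    "trip_distance": [0.9, 6.2, 12.0, 7.5, 30.0],
    "fare_amount": [9.3, 25.0, 48.5, 20.0, 600.0],
})

# Same conditions as filter_trips: distance > 5 and 20 < fare < 500.
mask = (demo["trip_distance"] > 5) & demo["fare_amount"].between(20, 500, inclusive="neither")

matching = int(mask.sum())          # counts True values, no filtered copy
ratio = matching / len(demo) * 100
print(f"Filtered: {matching} / {len(demo)} ({ratio:.2f}%)")
```

The mask itself is still one boolean per row, so this reduces memory use rather than eliminating it.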


### 3.2 GroupBy - Hourly Analysis

In [10]:
def groupby_hour():
    df['pickup_hour'] = df['tpep_pickup_datetime'].dt.hour
    return df.groupby('pickup_hour').agg({
        'fare_amount': 'mean',
        'trip_distance': 'mean',
        'tip_amount': 'mean',
        'VendorID': 'count'
    }).rename(columns={'VendorID': 'trip_count'})

df_hourly = benchmark(groupby_hour, 'groupby_hour')
df_hourly


Operation: groupby_hour
Duration: 2.443 s
Memory: 146.48 MB


| pickup_hour | fare_amount | trip_distance | tip_amount | trip_count |
|---|---|---|---|---|
| 0 | 19.85742 | 4.031465 | 3.496476 | 1088628 |
| 1 | 18.010486 | 3.631036 | 3.17027 | 731321 |
| 2 | 16.919905 | 4.401059 | 2.939176 | 483366 |
| 3 | 17.911575 | 4.760022 | 3.011689 | 319641 |
| 4 | 23.55361 | 10.257194 | 3.652719 | 217492 |
| 5 | 27.447236 | 11.625794 | 4.242081 | 226411 |
| 6 | 22.958906 | 8.691477 | 3.617328 | 532181 |
| 7 | 19.473654 | 5.978029 | 3.34163 | 1044241 |
| 8 | 18.611942 | 4.447673 | 3.294112 | 1446062 |
| 9 | 18.616237 | 4.139941 | 3.305051 | 1632601 |


### 3.3 GroupBy - Monthly Analysis

In [11]:
def groupby_month():
    df['pickup_month'] = df['tpep_pickup_datetime'].dt.to_period('M')
    return df.groupby('pickup_month').agg({
        'fare_amount': ['mean', 'sum'],
        'trip_distance': 'mean',
        'tip_amount': 'sum',
        'VendorID': 'count'
    })

df_monthly = benchmark(groupby_month, 'groupby_month')
df_monthly


Operation: groupby_month
Duration: 3.600 s
Memory: 292.59 MB


| pickup_month | fare_amount (mean) | fare_amount (sum) | trip_distance (mean) | tip_amount (sum) | VendorID (count) |
|---|---|---|---|---|---|
| 2001-01 | 45.516667 | 273.1 | 10.793333 | 23.46 | 6 |
| 2002-12 | 42.254545 | 464.8 | 11.468182 | 53.17 | 11 |
| 2003-01 | 64.083333 | 384.5 | 13.215 | 27.2 | 6 |
| 2008-12 | 35.391304 | 814.0 | 7.849565 | 113.12 | 23 |
| 2009-01 | 14.753333 | 221.3 | 2.808 | 22.03 | 15 |
| 2014-11 | 33.1 | 33.1 | 6.43 | 8.73 | 1 |
| 2022-10 | 59.909091 | 659.0 | 0.98 | 90.19 | 11 |
| 2022-12 | 16.628 | 415.7 | 3.312 | 106.98 | 25 |
| 2023-01 | 18.366913 | 56326288.54 | 3.847351 | 10328466.23 | 3066726 |
| 2023-02 | 18.220242 | 53093841.17 | 3.867976 | 9863234.47 | 2914003 |
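
The monthly table contains a handful of trips with pickup timestamps far outside 2023 (for example 2001-01 and 2008-12), which suggests bad timestamps in the raw files. A minimal sketch of one way to handle them, assuming the analysis should cover 2023 only, is to filter on a half-open date interval before grouping. The frame below is made up for illustration.

```python
import pandas as pd

demo = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2001-01-01 00:05:00",   # out-of-range timestamp
        "2023-01-15 08:00:00",
        "2023-01-20 09:30:00",
        "2023-02-03 18:45:00",
        "2024-01-01 00:10:00",   # spills past the end of the year
    ]),
    "fare_amount": [45.0, 10.0, 20.0, 30.0, 17.2],
})

# Half-open interval [2023-01-01, 2024-01-01) keeps only 2023 pickups.
pickup = demo["tpep_pickup_datetime"]
in_2023 = (pickup >= "2023-01-01") & (pickup < "2024-01-01")
clean = demo[in_2023]

monthly = (
    clean.groupby(clean["tpep_pickup_datetime"].dt.to_period("M"))["fare_amount"]
    .agg(["mean", "count"])
)
print(monthly)
```

Whether such rows should be dropped or kept is an analysis decision; for a benchmark, the filter also adds its own cost to the measured time.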


### 3.4 Payment Type Analysis

In [12]:
def payment_analysis():
    payment_map = {
        1: 'Credit Card',
        2: 'Cash',
        3: 'No Charge',
        4: 'Dispute',
        5: 'Unknown',
        6: 'Voided'
    }
    df['payment_name'] = df['payment_type'].map(payment_map)
    
    return df.groupby('payment_name').agg({
        'fare_amount': 'mean',
        'tip_amount': 'mean',
        'total_amount': 'sum',
        'VendorID': 'count'
    }).rename(columns={'VendorID': 'count'}).sort_values('count', ascending=False)

df_payment = benchmark(payment_analysis, 'payment_analysis')
df_payment


Operation: payment_analysis
Duration: 4.316 s
Memory: 292.29 MB


| payment_name | fare_amount | tip_amount | total_amount | count |
|---|---|---|---|---|
| Credit Card | 19.814779 | 4.396301 | 890779100.0 | 29856932 |
| Cash | 19.388154 | 0.001941 | 157050200.0 | 6405059 |
| Dispute | 1.724659 | 0.05158 | 1130853.0 | 498015 |
| No Charge | 8.127843 | 0.035321 | 2540632.0 | 240862 |
| Unknown | 0.0 | 0.0 | 0.0 | 2 |
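
The counts in this table add up to 37,000,870, while the frame has 38,310,226 rows, so 1,309,356 trips are missing from the result. A likely explanation is that their `payment_type` is not a key in `payment_map`: `Series.map` returns NaN for unmapped values, and `groupby` drops NaN keys by default. The sketch below reproduces the effect on a made-up frame and keeps those rows under an explicit label; `"Unmapped"` is a hypothetical name chosen for this example.

```python
import pandas as pd

payment_map = {1: "Credit Card", 2: "Cash", 3: "No Charge",
               4: "Dispute", 5: "Unknown", 6: "Voided"}

demo = pd.DataFrame({
    "payment_type": [1, 1, 2, 0, 0, 4],   # 0 is not a key in payment_map
    "total_amount": [14.3, 16.9, 9.0, 22.0, 18.5, 3.0],
})

# Default behaviour: rows with an unmapped code vanish from the result.
dropped = demo.groupby(demo["payment_type"].map(payment_map))["total_amount"].count()

# Keep them visible by giving unmapped codes an explicit label.
demo["payment_name"] = demo["payment_type"].map(payment_map).fillna("Unmapped")
counts = demo.groupby("payment_name")["total_amount"].agg(["sum", "count"])

print(f"rows counted by default: {dropped.sum()} of {len(demo)}")
print(counts.sort_values("count", ascending=False))
```

Passing `dropna=False` to `groupby` is another option when a NaN group is acceptable in the output.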


### 3.5 Location Analysis

In [13]:
def location_analysis():
    return df.groupby('PULocationID').agg({
        'VendorID': 'count',
        'fare_amount': 'mean',
        'trip_distance': 'mean',
        'tip_amount': 'mean'
    }).rename(columns={'VendorID': 'trip_count'}).sort_values('trip_count', ascending=False)

df_locations = benchmark(location_analysis, 'location_analysis')
print("\nTop 15 Pickup Locations:")
df_locations.head(15)


Operation: location_analysis
Duration: 1.295 s
Memory: 0.00 MB

Top 15 Pickup Locations:


| PULocationID | trip_count | fare_amount | trip_distance | tip_amount |
|---|---|---|---|---|
| 132 | 1992304 | 61.080875 | 15.829447 | 8.688623 |
| 237 | 1791795 | 13.009022 | 1.92618 | 2.665287 |
| 161 | 1766041 | 16.473684 | 2.637109 | 3.230466 |
| 236 | 1596584 | 13.584188 | 2.137991 | 2.766769 |
| 162 | 1353753 | 15.879083 | 2.648804 | 3.152426 |
| 138 | 1305259 | 42.688226 | 9.857485 | 8.664628 |
| 186 | 1305113 | 16.984974 | 2.449244 | 3.162617 |
| 230 | 1270681 | 19.279069 | 3.472265 | 3.477988 |
| 142 | 1256024 | 14.334973 | 2.802366 | 2.894276 |
| 170 | 1131673 | 15.845585 | 2.605052 | 3.063634 |


### 3.6 Sorting - Most Expensive Trips

In [14]:
def top_expensive_trips():
    return df.nlargest(1000, 'total_amount')[[
        'tpep_pickup_datetime', 'trip_distance',
        'fare_amount', 'tip_amount', 'total_amount',
        'PULocationID', 'DOLocationID'
    ]]

df_expensive = benchmark(top_expensive_trips, 'top_expensive_trips')
print("\nTop 10 Most Expensive Trips:")
df_expensive.head(10)


Operation: top_expensive_trips
Duration: 9.214 s
Memory: 0.25 MB

Top 10 Most Expensive Trips:


| | tpep_pickup_datetime | trip_distance | fare_amount | tip_amount | total_amount | PULocationID | DOLocationID |
|---|---|---|---|---|---|---|---|
| 17439335 | 2023-06-12 13:33:06 | 1.5 | 386983.63 | 0.0 | 386987.63 | 100 | 50 |
| 25349953 | 2023-09-02 15:15:39 | 21.3 | 187502.96 | 0.0 | 187513.9 | 239 | 132 |
| 25555738 | 2023-09-05 10:16:13 | 0.7 | 143163.45 | 0.0 | 143167.45 | 249 | 90 |
| 26249203 | 2023-09-11 14:54:55 | 0.0 | 19152.9 | 0.0 | 29156.9 | 43 | 264 |
| 27892250 | 2023-09-30 17:58:34 | 0.0 | 12015.47 | 0.0 | 12015.47 | 163 | 264 |
| 31549066 | 2023-10-23 20:43:13 | 0.0 | 6339.0 | 0.0 | 6339.0 | 48 | 125 |
| 14388207 | 2023-05-16 10:12:28 | 40.81 | 6300.9 | 0.0 | 6304.9 | 239 | 264 |
| 36222775 | 2023-12-12 07:51:03 | 0.0 | 95.16 | 4174.0 | 4269.16 | 264 | 264 |
| 9952970 | 2023-04-06 14:08:51 | 12.58 | 2449.5 | 0.0 | 2451.0 | 216 | 265 |
| 37232285 | 2023-12-20 18:49:49 | 6.7 | 2320.11 | 0.0 | 2372.79 | 233 | 40 |


### 3.7 Rolling Statistics (Moving Average)

In [15]:
def daily_rolling_stats():
    # Daily totals
    daily = df.set_index('tpep_pickup_datetime').resample('D').agg({
        'fare_amount': 'sum',
        'trip_distance': 'sum',
        'VendorID': 'count'
    }).rename(columns={'VendorID': 'trip_count'})
    
    # 7-day moving average
    daily['fare_7d_avg'] = daily['fare_amount'].rolling(7).mean()
    daily['trips_7d_avg'] = daily['trip_count'].rolling(7).mean()
    
    return daily

df_daily = benchmark(daily_rolling_stats, 'daily_rolling_stats')
df_daily.tail(10)


Operation: daily_rolling_stats
Duration: 34.781 s
Memory: -29.23 MB


| tpep_pickup_datetime | fare_amount | trip_distance | trip_count | fare_7d_avg | trips_7d_avg |
|---|---|---|---|---|---|
| 2023-12-25 | 936450.2 | 232377.3 | 45466 | 1787275.0 | 94814.142857 |
| 2023-12-26 | 1427927.72 | 287616.43 | 68261 | 1663229.0 | 87131.0 |
| 2023-12-27 | 1709622.23 | 309827.57 | 81293 | 1586211.0 | 81787.0 |
| 2023-12-28 | 1770609.03 | 314216.55 | 84419 | 1521062.0 | 77119.142857 |
| 2023-12-29 | 1770287.27 | 317479.94 | 86171 | 1478444.0 | 73517.0 |
| 2023-12-30 | 1680648.99 | 320304.65 | 82501 | 1494440.0 | 73401.142857 |
| 2023-12-31 | 1472304.08 | 281323.76 | 76955 | 1538264.0 | 75009.428571 |
| 2024-01-01 | 17.2 | 2.41 | 2 | 1404488.0 | 68514.571429 |
| 2024-01-02 | 0.0 | 0.0 | 0 | 1200498.0 | 58763.0 |
| 2024-01-03 | 354.09 | 38.28 | 4 | 956317.2 | 47150.285714 |
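
This is by far the slowest step in the notebook, and `set_index` on the full frame followed by `resample` is a plausible contributor. An alternative worth timing, sketched below on a made-up frame, groups on the pickup date and then reindexes to a continuous calendar so that days without trips still appear as zeros. No speed claim is made here; the sketch only checks that both approaches give the same daily totals.

```python
import pandas as pd

demo = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2023-12-30 10:00", "2023-12-30 22:15",
        "2023-12-31 23:59",
        "2024-01-02 00:30",          # 2024-01-01 has no trips at all
    ]),
    "fare_amount": [10.0, 20.0, 15.0, 5.0],
})

# Reference: the resample-based approach used in daily_rolling_stats.
by_resample = (
    demo.set_index("tpep_pickup_datetime")
    .resample("D")["fare_amount"]
    .agg(["sum", "count"])
)

# Alternative: group on the pickup date, then fill in missing days.
day = demo["tpep_pickup_datetime"].dt.floor("D")
by_group = demo.groupby(day)["fare_amount"].agg(["sum", "count"])
full_days = pd.date_range(day.min(), day.max(), freq="D", name="tpep_pickup_datetime")
by_group = by_group.reindex(full_days, fill_value=0)

print(by_group)
```

Without the `reindex` step, empty days would be missing entirely, and a 7-row rolling window would then span more than seven calendar days.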


## 4. The Limits of Pandas - Summary

In [16]:
print("\n" + "="*70)
print("PANDAS LIMITS - 12 MONTHS OF NYC TAXI DATA")
print("="*70)

print(f"\n1. MEMORY USAGE")
print(f"   DataFrame: {df.memory_usage(deep=True).sum() / 1024**3:.2f} GB")
print(f"   Process: {get_memory_mb() / 1024:.2f} GB")
print(f"   Rows: {len(df):,}")

print(f"\n2. PERFORMANCE ISSUES")
total_time = sum(r['duration_sec'] for r in results['operations'].values())
print(f"   Total operation time: {total_time:.1f} s")
print(f"   Single-threaded execution")
print(f"   Most Pandas operations run on one core, with no built-in parallelism")

print(f"\n3. SCALING PROBLEM")
print(f"   2 years of data = ~77M rows = ~19 GB+ memory")
print(f"   Colab limit: ~12 GB")
print(f"   Result: does not fit in memory")

print(f"\n4. SOLUTION")
print(f"   -> Polars: speed (5-10x expected; measured in the next notebook)")
print(f"   -> Dask: out-of-core, distributed")
print(f"   -> Ray: ML pipelines")


PANDAS LIMITS - 12 MONTHS OF NYC TAXI DATA

1. MEMORY USAGE
   DataFrame: 9.67 GB
   Process: 8.98 GB
   Rows: 38,310,226

2. PERFORMANCE ISSUES
   Total operation time: 62.1 s
   Single-threaded execution
   Most Pandas operations run on one core, with no built-in parallelism

3. SCALING PROBLEM
   2 years of data = ~77M rows = ~19 GB+ memory
   Colab limit: ~12 GB
   Result: does not fit in memory

4. SOLUTION
   -> Polars: speed (5-10x expected; measured in the next notebook)
   -> Dask: out-of-core, distributed
   -> Ray: ML pipelines
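
The scaling estimate can be checked against this notebook's own measurements. The short calculation below derives bytes per row from the reported DataFrame size (9.67 GB for 38,310,226 rows) and extrapolates to two years. It assumes a second year has about the same row count and column mix, which is an assumption and not something measured here.

```python
# Figures reported by this notebook for the 12-month frame.
rows_one_year = 38_310_226
frame_gb_one_year = 9.67

bytes_per_row = frame_gb_one_year * 1024**3 / rows_one_year

# Assumption: a second year has about the same row count and column mix.
rows_two_years = 2 * rows_one_year
frame_gb_two_years = rows_two_years * bytes_per_row / 1024**3

print(f"bytes per row: {bytes_per_row:.0f}")
print(f"two years: {rows_two_years:,} rows, {frame_gb_two_years:.2f} GB")
```

This works out to about 271 bytes per row and roughly 19 GB for two years, before counting the temporary copies that operations such as filtering create.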


## 5. Save the Results

In [17]:
memory_mb = get_memory_mb()  # read once so the MB and GB values agree
results['total_memory_mb'] = round(memory_mb, 2)
results['total_memory_gb'] = round(memory_mb / 1024, 2)
results['row_count'] = len(df)
results['file_size_mb'] = round(total_size / 1024**2, 2)

os.makedirs('results', exist_ok=True)
with open('results/pandas_benchmark.json', 'w') as f:
    json.dump(results, f, indent=2)

print("Results saved: results/pandas_benchmark.json")
print("\n" + json.dumps(results, indent=2))

Results saved: results/pandas_benchmark.json

{
  "framework": "pandas",
  "dataset": "nyc_taxi_12_months",
  "operations": {
    "load_data": {
      "duration_sec": 5.228,
      "memory_mb": 7269.83
    },
    "filter_trips": {
      "duration_sec": 1.232,
      "memory_mb": 1048.85
    },
    "groupby_hour": {
      "duration_sec": 2.443,
      "memory_mb": 146.48
    },
    "groupby_month": {
      "duration_sec": 3.6,
      "memory_mb": 292.59
    },
    "payment_analysis": {
      "duration_sec": 4.316,
      "memory_mb": 292.29
    },
    "location_analysis": {
      "duration_sec": 1.295,
      "memory_mb": 0.0
    },
    "top_expensive_trips": {
      "duration_sec": 9.214,
      "memory_mb": 0.25
    },
    "daily_rolling_stats": {
      "duration_sec": 34.781,
      "memory_mb": -29.23
    }
  },
  "total_memory_mb": 9199.43,
  "total_memory_gb": 8.98,
  "row_count": 38310226,
  "file_size_mb": 606.29
}


## 6. Summary Table

In [18]:
summary_data = []
for op, metrics in results['operations'].items():
    summary_data.append({
        'Operation': op,
        'Duration (s)': metrics['duration_sec'],
        'Memory (MB)': metrics['memory_mb']
    })

df_summary = pd.DataFrame(summary_data)
print("\n" + "="*60)
print("PANDAS BENCHMARK SUMMARY (NYC Taxi - 12 Months)")
print("="*60)
display(df_summary)

total_time = sum(m['duration_sec'] for m in results['operations'].values())
print(f"\nTotal time: {total_time:.1f} s")
print(f"Total memory: {results['total_memory_gb']:.2f} GB")
print(f"Row count: {results['row_count']:,}")


PANDAS BENCHMARK SUMMARY (NYC Taxi - 12 Months)


| | Operation | Duration (s) | Memory (MB) |
|---|---|---|---|
| 0 | load_data | 5.228 | 7269.83 |
| 1 | filter_trips | 1.232 | 1048.85 |
| 2 | groupby_hour | 2.443 | 146.48 |
| 3 | groupby_month | 3.6 | 292.59 |
| 4 | payment_analysis | 4.316 | 292.29 |
| 5 | location_analysis | 1.295 | 0.0 |
| 6 | top_expensive_trips | 9.214 | 0.25 |
| 7 | daily_rolling_stats | 34.781 | -29.23 |



Total time: 62.1 s
Total memory: 8.98 GB
Row count: 38,310,226


---

## Next Step

We will process the same 12 months of data with **Polars** and measure the speed difference.

-> `02_polars_demo.ipynb`