- VendorID: A code indicating the TPEP provider that provided the record.
  - 1 = Creative Mobile Technologies, LLC
  - 2 = Curb Mobility, LLC
  - 6 = Myle Technologies Inc
  - 7 = Helix
- payment_type: A numeric code signifying how the passenger paid for the trip.
  - 0 = Flex Fare trip
  - 1 = Credit card
  - 2 = Cash
  - 3 = No charge
  - 4 = Dispute
  - 5 = Unknown
  - 6 = Voided trip
- tip_amount: Tip amount. This field is automatically populated for credit card tips. Cash tips are not included.
- total_amount: The total amount charged to passengers. Does not include cash tips.
- Source: https://www.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf



silver_uber_fare - Data Cleaning and Transformation Steps
- Read the bronze_uber_fare table
- Dropped unneeded columns (EventProcessedUtcTime, PartitionId, EventEnqueuedUtcTime)
- Converted data types (string, float, timestamp)
- Converted pickup_datetime to timestamp format
- Mapped payment_type to numeric codes (0 to 6)
- Checked tip_amount for anomalies and added the is_tip_anomaly flag
- Recomputed total_amount and overwrote it (fare + tip + surcharge + mta_tax + tolls)
- Dropped the calculated_total and total_mismatch columns
- Checked for negative values; the has_negative_values column was dropped because no row was True
- Generated trip_id as a SHA2 hash (medallion + hack_license + pickup_datetime)
- Detected outliers with IQR analysis (fare_amount, tip_amount, total_amount)
- Added the is_fare_outlier, is_tip_outlier, and is_total_outlier columns
- Extracted the month into a month column to partition on
- Wrote the data in Delta format, with partitionBy("month"), to the silver_uber_fare table


In [1]:
df_fare = spark.table("bronze.bronze_uber_fare_partioned")
df_fare.printSchema()
df_fare.show(5)
# read the bronze table


StatementMeta(, bd81cb7c-3f6c-4746-9b6a-1568d7f6beb0, 3, Finished, Available, Finished)

root
 |-- medallion: long (nullable = true)
 |-- hack_license: long (nullable = true)
 |-- vendor_id: string (nullable = true)
 |-- pickup_datetime: string (nullable = true)
 |-- payment_type: string (nullable = true)
 |-- fare_amount: double (nullable = true)
 |-- surcharge: double (nullable = true)
 |-- mta_tax: double (nullable = true)
 |-- tip_amount: double (nullable = true)
 |-- tolls_amount: double (nullable = true)
 |-- total_amount: double (nullable = true)
 |-- EventProcessedUtcTime: timestamp (nullable = true)
 |-- PartitionId: long (nullable = true)
 |-- EventEnqueuedUtcTime: timestamp (nullable = true)
 |-- ingest_date: date (nullable = true)

+----------+------------+---------+-------------------+------------+-----------+---------+-------+----------+------------+------------+---------------------+-----------+--------------------+-----------+
| medallion|hack_license|vendor_id|    pickup_datetime|payment_type|fare_amount|surcharge|mta_tax|tip_amount|tolls_amount|total_amou

In [3]:
df_fare = df_fare.drop("EventProcessedUtcTime", "EventEnqueuedUtcTime", "PartitionId")
# dropped the unneeded columns that came from the stream

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 5, Finished, Available, Finished)

In [4]:
from pyspark.sql.functions import to_timestamp

df_fare = df_fare.withColumn("pickup_datetime", to_timestamp("pickup_datetime"))
# converted pickup_datetime to timestamp


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 6, Finished, Available, Finished)

In [5]:
from pyspark.sql.functions import col, sum as spark_sum

# Numeric columns
numeric_columns = [
    "fare_amount", "surcharge", "mta_tax",
    "tip_amount", "tolls_amount", "total_amount"
]

# Show the null count per column
df_fare.select([spark_sum(col(c).isNull().cast("int")).alias(c + "_nulls") for c in numeric_columns]).show()
# checked whether the numeric columns contain null values


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 7, Finished, Available, Finished)

+-----------------+---------------+-------------+----------------+------------------+------------------+
|fare_amount_nulls|surcharge_nulls|mta_tax_nulls|tip_amount_nulls|tolls_amount_nulls|total_amount_nulls|
+-----------------+---------------+-------------+----------------+------------------+------------------+
|                0|              0|            0|               0|                 0|                 0|
+-----------------+---------------+-------------+----------------+------------------+------------------+



In [6]:
from pyspark.sql.functions import when

df_fare = df_fare.withColumn(
    "has_negative_values",
    when(
        (col("fare_amount") < 0) |
        (col("tip_amount") < 0) |
        (col("surcharge") < 0) |
        (col("mta_tax") < 0) |
        (col("tolls_amount") < 0) |
        (col("total_amount") < 0),
        True
    ).otherwise(False)
)
# flagged rows that contain a negative amount

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 8, Finished, Available, Finished)

In [7]:
df_fare.filter(col("has_negative_values") == True).show()


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 9, Finished, Available, Finished)

+---------+------------+---------+---------------+------------+-----------+---------+-------+----------+------------+------------+-----------+-------------------+
|medallion|hack_license|vendor_id|pickup_datetime|payment_type|fare_amount|surcharge|mta_tax|tip_amount|tolls_amount|total_amount|ingest_date|has_negative_values|
+---------+------------+---------+---------------+------------+-----------+---------+-------+----------+------------+------------+-----------+-------------------+
+---------+------------+---------+---------------+------------+-----------+---------+-------+----------+------------+------------+-----------+-------------------+
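The empty result above means no row carries a negative amount, which is why the flag column is dropped before the final write. As a minimal plain-Python sketch of the same row-level rule (the sample rows, including their surcharge and mta_tax values, are illustrative assumptions and not taken from the table):

```python
# Amount columns checked by the notebook's has_negative_values flag.
AMOUNT_COLUMNS = [
    "fare_amount", "tip_amount", "surcharge",
    "mta_tax", "tolls_amount", "total_amount",
]

def has_negative_values(row: dict) -> bool:
    """Return True if any monetary column in the row is below zero."""
    return any(row[c] < 0 for c in AMOUNT_COLUMNS)

# Illustrative rows (assumed values, not real records).
ok_row = {
    "fare_amount": 6.5, "tip_amount": 1.0, "surcharge": 1.0,
    "mta_tax": 0.5, "tolls_amount": 0.0, "total_amount": 9.0,
}
bad_row = dict(ok_row, fare_amount=-6.5)

print(has_negative_values(ok_row))   # False
print(has_negative_values(bad_row))  # True
```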



In [8]:
df_fare.filter((col("payment_type") == "CSH") & (col("tip_amount") > 0)) \
       .select("pickup_datetime", "payment_type", "fare_amount", "tip_amount", "total_amount") \
       .show(20, truncate=False)
# inspected tip_amount on cash payments

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 10, Finished, Available, Finished)

+-------------------+------------+-----------+----------+------------+
|pickup_datetime    |payment_type|fare_amount|tip_amount|total_amount|
+-------------------+------------+-----------+----------+------------+
|2013-12-02 19:00:36|CSH         |6.5        |1.0       |9.0         |
|2013-12-02 19:16:45|CSH         |31.0       |6.5       |39.0        |
|2013-12-02 19:41:22|CSH         |11.0       |2.5       |15.0        |
|2013-12-02 20:25:41|CSH         |31.5       |10.0      |42.5        |
|2013-12-03 09:03:37|CSH         |12.0       |2.5       |15.0        |
|2013-12-19 16:14:51|CSH         |52.0       |11.56     |69.39       |
|2013-12-19 17:39:17|CSH         |12.5       |2.0       |16.0        |
|2013-12-19 20:45:05|CSH         |7.5        |1.7       |10.2        |
|2013-12-19 21:01:45|CSH         |6.0        |1.0       |8.0         |
|2013-12-19 21:09:05|CSH         |6.0        |1.4       |8.4         |
|2013-12-20 00:40:33|CSH         |6.0        |1.0       |8.0         |
|2013-

In [9]:
from pyspark.sql.functions import lit

df_fare = df_fare.withColumn(
    "tip_amount",
    when(col("payment_type") == "CSH", lit(0.0)).otherwise(col("tip_amount"))
)
# set tip_amount to zero for cash payments

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 11, Finished, Available, Finished)

In [10]:
# Is tip_amount zero for all CSH payments?
df_fare.filter(col("payment_type") == "CSH") \
       .select("payment_type", "tip_amount") \
       .distinct() \
       .show()


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 12, Finished, Available, Finished)

+------------+----------+
|payment_type|tip_amount|
+------------+----------+
|         CSH|       0.0|
+------------+----------+
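Zeroing the cash tips leaves total_amount still holding the original tip, which is presumably why the step summary at the top lists a recomputation of total_amount (fare + tip + surcharge + mta_tax + tolls). That cell does not appear in this notebook, so the following is only a plain-Python sketch of the arithmetic; the function name and the surcharge and mta_tax values of the sample row are assumptions.

```python
def recompute_total(fare, tip, surcharge, mta_tax, tolls):
    """Sum the fare components; rounding to cents is an assumption of this sketch."""
    return round(fare + tip + surcharge + mta_tax + tolls, 2)

# Illustrative cash row: fare, tip and total follow the pattern of the CSH
# rows shown earlier; surcharge and mta_tax are assumed values.
row = {
    "payment_type": "CSH", "fare_amount": 6.5, "tip_amount": 1.0,
    "surcharge": 1.0, "mta_tax": 0.5, "tolls_amount": 0.0, "total_amount": 9.0,
}

# Step 1: cash tips are zeroed, as in the cell above.
if row["payment_type"] == "CSH":
    row["tip_amount"] = 0.0

# Step 2: total_amount is rebuilt from its components.
row["total_amount"] = recompute_total(
    row["fare_amount"], row["tip_amount"], row["surcharge"],
    row["mta_tax"], row["tolls_amount"],
)
print(row["total_amount"])  # 8.0
```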



In [11]:
from pyspark.sql.functions import col
from pyspark.sql.types import FloatType, StringType

df_fare = df_fare \
    .withColumn("medallion", col("medallion").cast(StringType())) \
    .withColumn("hack_license", col("hack_license").cast(StringType())) \
    .withColumn("fare_amount", col("fare_amount").cast(FloatType())) \
    .withColumn("surcharge", col("surcharge").cast(FloatType())) \
    .withColumn("mta_tax", col("mta_tax").cast(FloatType())) \
    .withColumn("tip_amount", col("tip_amount").cast(FloatType())) \
    .withColumn("tolls_amount", col("tolls_amount").cast(FloatType())) \
    .withColumn("total_amount", col("total_amount").cast(FloatType()))
# data type conversions

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 13, Finished, Available, Finished)

In [12]:
from pyspark.sql.functions import col, count

df_fare.groupBy("medallion", "hack_license", "pickup_datetime") \
    .agg(count("*").alias("row_count")) \
    .filter(col("row_count") > 1) \
    .show(20, truncate=False)
# uniqueness check on the three key columns

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 14, Finished, Available, Finished)

+---------+------------+---------------+---------+
|medallion|hack_license|pickup_datetime|row_count|
+---------+------------+---------------+---------+
+---------+------------+---------------+---------+
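The empty result means the combination (medallion, hack_license, pickup_datetime) is unique, so it can serve as a composite key. The same check can be sketched in plain Python with a counter; the helper name is hypothetical and the second sample row is deliberately duplicated to show a hit.

```python
from collections import Counter

def find_duplicate_keys(rows, key_columns):
    """Return {key_tuple: count} for every key that occurs more than once."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}

KEY_COLUMNS = ["medallion", "hack_license", "pickup_datetime"]

# Illustrative rows; the last one repeats the second on purpose.
rows = [
    {"medallion": "2013003977", "hack_license": "2013017276", "pickup_datetime": "2013-12-02 16:56:56"},
    {"medallion": "2013007612", "hack_license": "2013000105", "pickup_datetime": "2013-12-02 16:56:56"},
    {"medallion": "2013007612", "hack_license": "2013000105", "pickup_datetime": "2013-12-02 16:56:56"},
]

duplicates = find_duplicate_keys(rows, KEY_COLUMNS)
print(duplicates)
```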



In [13]:
numeric_columns = [
    "fare_amount", "surcharge", "mta_tax",
    "tip_amount", "tolls_amount", "total_amount"
]


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 15, Finished, Available, Finished)

In [14]:
df_fare.select(numeric_columns).describe().show()
# summary statistics of the numeric columns via describe()

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 16, Finished, Available, Finished)

+-------+------------------+-------------------+--------------------+------------------+------------------+------------------+
|summary|       fare_amount|          surcharge|             mta_tax|        tip_amount|      tolls_amount|      total_amount|
+-------+------------------+-------------------+--------------------+------------------+------------------+------------------+
|  count|          43083367|           43083367|            43083367|          43083367|          43083367|          43083367|
|   mean|12.666198806676242|0.31803246645042893| 0.49808463205765696|1.4286403628630662|0.2732496817003479| 15.18473377926524|
| stddev|10.399346129058674| 0.3597583464743384|0.030887139054312434|2.2841854230584353|1.2567056654652873|12.524633696934872|
|    min|               2.5|                0.0|                 0.0|               0.0|               0.0|               2.5|
|    max|             500.0|               15.0|                 0.5|             200.0|              20.0|    

In [15]:
from pyspark.sql.functions import col

# Select the suspicious tip records
df_tip_anomaly_check = df_fare.filter(
    (col("tip_amount") > col("fare_amount")) | 
    (col("tip_amount") > (col("total_amount") / 2))
)

# Show these records
df_tip_anomaly_check.select(
    "fare_amount", "tip_amount", "total_amount", "payment_type", "pickup_datetime"
).orderBy(col("tip_amount").desc()).show(20, truncate=False)
# inspected anomalies in the tips given

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 17, Finished, Available, Finished)

+-----------+----------+------------+------------+-------------------+
|fare_amount|tip_amount|total_amount|payment_type|pickup_datetime    |
+-----------+----------+------------+------------+-------------------+
|52.0       |200.0     |252.5       |CRD         |2013-12-05 04:49:10|
|25.0       |200.0     |226.0       |CRD         |2013-10-07 05:57:00|
|7.0        |200.0     |208.0       |CRD         |2013-09-10 04:46:01|
|6.0        |200.0     |206.5       |CRD         |2013-09-16 12:51:22|
|2.5        |200.0     |203.0       |CRD         |2013-09-11 12:04:03|
|13.5       |200.0     |215.0       |CRD         |2013-09-26 16:09:00|
|23.0       |200.0     |224.0       |CRD         |2013-09-07 21:54:51|
|13.5       |200.0     |214.5       |CRD         |2013-09-28 02:28:20|
|6.5        |200.0     |207.5       |CRD         |2013-09-10 05:04:00|
|2.5        |200.0     |203.0       |CRD         |2013-10-24 13:35:00|
|6.5        |200.0     |207.5       |CRD         |2013-10-05 03:08:27|
|5.5  

In [16]:
from pyspark.sql.functions import when, lit

df_fare = df_fare.withColumn(
    "is_tip_anomaly",
    when((col("tip_amount") > col("fare_amount")) | (col("tip_amount") > (col("total_amount") / 2)), lit(True)).otherwise(lit(False))
)
# a tip is flagged as an anomaly if it exceeds the fare or half of the total amount

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 18, Finished, Available, Finished)
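The anomaly rule can be checked by hand on individual rows. The sketch below restates it as a plain-Python predicate (the function name is an assumption) and applies it to two rows taken from the outputs above.

```python
def is_tip_anomaly(fare_amount, tip_amount, total_amount):
    """True if the tip exceeds the fare or exceeds half of the total."""
    return tip_amount > fare_amount or tip_amount > total_amount / 2

# Row from the anomaly listing: fare 52.0, tip 200.0, total 252.5.
print(is_tip_anomaly(52.0, 200.0, 252.5))  # True

# Row from the cash-tip listing: fare 6.5, tip 1.0, total 9.0.
print(is_tip_anomaly(6.5, 1.0, 9.0))       # False
```

When total_amount is the sum of non-negative components that include the fare and the tip, a tip above half the total is necessarily above the fare as well, so the second clause only adds rows where total_amount is inconsistent with its components.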

In [17]:
from pyspark.sql.functions import col, when, lit

def mark_iqr_outliers(df, column_name, flag_column_name):
    """Add a boolean flag column marking IQR outliers in column_name.

    Method used: Q1 = first quartile, Q3 = third quartile, IQR = Q3 - Q1.
    Outlier: value < Q1 - 1.5 * IQR or value > Q3 + 1.5 * IQR.
    """
    # Quartiles are approximate (relative error 0.01)
    q1, q3 = df.approxQuantile(column_name, [0.25, 0.75], 0.01)
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    print(f"\n🧮 IQR outlier bounds for {column_name}:")
    print(f"Q1: {q1}, Q3: {q3}, IQR: {iqr}")
    print(f"Lower bound: {lower_bound}, Upper bound: {upper_bound}")

    return df.withColumn(
        flag_column_name,
        when((col(column_name) < lower_bound) | (col(column_name) > upper_bound), lit(True)).otherwise(lit(False))
    )

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 19, Finished, Available, Finished)

In [18]:
df_fare = mark_iqr_outliers(df_fare, "fare_amount", "is_fare_outlier")
df_fare = mark_iqr_outliers(df_fare, "tip_amount", "is_tip_outlier")
df_fare = mark_iqr_outliers(df_fare, "total_amount", "is_total_outlier")


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 20, Finished, Available, Finished)


🧮 IQR outlier bounds for fare_amount:
Q1: 6.5, Q3: 14.5, IQR: 8.0
Lower bound: -5.5, Upper bound: 26.5

🧮 IQR outlier bounds for tip_amount:
Q1: 0.0, Q3: 2.0, IQR: 2.0
Lower bound: -3.0, Upper bound: 5.0

🧮 IQR outlier bounds for total_amount:
Q1: 8.079999923706055, Q3: 17.0, IQR: 8.920000076293945
Lower bound: -5.300000190734863, Upper bound: 30.380000114440918


In [19]:
df_fare.filter(col("is_fare_outlier") == True) \
       .select("fare_amount", "tip_amount", "total_amount", "pickup_datetime") \
       .orderBy(col("fare_amount").desc()) \
       .show(10, truncate=False)

df_fare.filter(col("is_tip_outlier") == True) \
       .select("fare_amount", "tip_amount", "total_amount", "pickup_datetime") \
       .orderBy(col("tip_amount").desc()) \
       .show(10, truncate=False)

df_fare.filter(col("is_total_outlier") == True) \
       .select("fare_amount", "tip_amount", "total_amount", "pickup_datetime") \
       .orderBy(col("total_amount").desc()) \
       .show(10, truncate=False)


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 21, Finished, Available, Finished)

+-----------+----------+------------+-------------------+
|fare_amount|tip_amount|total_amount|pickup_datetime    |
+-----------+----------+------------+-------------------+
|500.0      |0.0       |500.0       |2013-09-18 05:22:33|
|500.0      |0.0       |500.0       |2013-12-31 22:18:35|
|500.0      |0.0       |500.0       |2013-12-29 04:49:07|
|500.0      |0.0       |500.0       |2013-12-31 21:12:26|
|500.0      |0.0       |500.0       |2013-10-20 03:38:04|
|500.0      |0.0       |500.0       |2013-09-07 14:47:10|
|500.0      |0.0       |500.0       |2013-10-26 15:13:20|
|500.0      |0.0       |500.0       |2013-09-10 15:28:11|
|500.0      |0.0       |500.0       |2013-09-14 16:00:48|
|500.0      |0.0       |500.0       |2013-09-07 18:51:00|
+-----------+----------+------------+-------------------+
only showing top 10 rows

+-----------+----------+------------+-------------------+
|fare_amount|tip_amount|total_amount|pickup_datetime    |
+-----------+----------+------------+---------

Bounds used (taken from the approxQuantile output of In [18]; the quartiles are approximate, so they can differ slightly between runs):
📌 fare_amount:
Q1 = 6.5, Q3 = 14.5 → IQR = 8.0

Lower bound = -5.5 (irrelevant because there are no negative fares)

Upper bound = 26.5
📍 So fare_amount > 26.5 → is_fare_outlier = True

📌 tip_amount:
Q1 = 0.0, Q3 = 2.0 → IQR = 2.0

Upper bound = 5.0
📍 So tip_amount > 5.0 → is_tip_outlier = True

📌 total_amount:
Q1 ≈ 8.08, Q3 = 17.0 → IQR ≈ 8.92

Upper bound ≈ 30.38
📍 So total_amount > 30.38 → is_total_outlier = True
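The bound arithmetic can be reproduced without Spark. The sketch below recomputes the fences from the quartiles printed by In [18] for fare_amount and tip_amount; the helper names are assumptions, and only the 1.5 × IQR rule itself comes from the notebook.

```python
def iqr_bounds(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, bounds):
    """True if the value lies strictly outside the fences."""
    lower, upper = bounds
    return value < lower or value > upper

# Quartiles as printed by the In [18] output.
fare_bounds = iqr_bounds(6.5, 14.5)
tip_bounds = iqr_bounds(0.0, 2.0)

print(fare_bounds)  # (-5.5, 26.5)
print(tip_bounds)   # (-3.0, 5.0)

print(is_outlier(500.0, fare_bounds))  # True: the 500.0 fares listed above
print(is_outlier(12.0, fare_bounds))   # False
```

Because fares and tips are never negative in this data, the lower fences are never hit and only the upper fences decide the flags.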

In [20]:
numeric_columns = [
    "fare_amount", "surcharge", "mta_tax",
    "tip_amount", "tolls_amount", "total_amount"
]


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 22, Finished, Available, Finished)

In [21]:
df_fare.select(numeric_columns).describe().show()


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 23, Finished, Available, Finished)

+-------+------------------+-------------------+--------------------+------------------+------------------+------------------+
|summary|       fare_amount|          surcharge|             mta_tax|        tip_amount|      tolls_amount|      total_amount|
+-------+------------------+-------------------+--------------------+------------------+------------------+------------------+
|  count|          43083367|           43083367|            43083367|          43083367|          43083367|          43083367|
|   mean|12.666198806676242|0.31803246645042893| 0.49808463205765696|1.4286403628630662|0.2732496817003479| 15.18473377926524|
| stddev|10.399346129058674| 0.3597583464743384|0.030887139054312434|2.2841854230584353|1.2567056654652873|12.524633696934869|
|    min|               2.5|                0.0|                 0.0|               0.0|               0.0|               2.5|
|    max|             500.0|               15.0|                 0.5|             200.0|              20.0|    

In [22]:
from pyspark.sql.functions import sha2, concat_ws, col

df_fare = df_fare.withColumn(
    "trip_id",
    sha2(concat_ws("_", col("medallion"), col("hack_license"), col("pickup_datetime")), 256)
)
# trip_id: SHA-256 hash of medallion, hack_license and pickup_datetime joined with "_"

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 24, Finished, Available, Finished)

In [23]:
df_fare.select("trip_id", "medallion", "hack_license", "pickup_datetime").show(5, truncate=False)


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 25, Finished, Available, Finished)

+----------------------------------------------------------------+----------+------------+-------------------+
|trip_id                                                         |medallion |hack_license|pickup_datetime    |
+----------------------------------------------------------------+----------+------------+-------------------+
|af9e663c1b95c06ba065e229bf3fd946ac3ce477bf856ba91368ad983e9aad06|2013003977|2013017276  |2013-12-02 16:56:56|
|07549d63ffb319ad924dc08c16e2aed2535dc7500b18c0cb523ca20fc613df3a|2013007612|2013000105  |2013-12-02 16:56:56|
|d93b0ed763306f815b8b03d66726c103537c1cbadc1d8214ccb5f3a9cc8f94d9|2013011681|2013017948  |2013-12-02 16:56:57|
|b9dd1c82e0291673bd36aaaac363c18aec11ef695ace5fd97f9481e87471aaf1|2013005282|2013038741  |2013-12-02 16:56:57|
|b46a48f8b18b6355e9dd71fd660e4c0f7b29c26a726f156cd06c3f79833f2e86|2013012888|2013023942  |2013-12-02 16:56:58|
+----------------------------------------------------------------+----------+------------+-------------------+
o
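The trip_id construction joins the three key columns with "_" and hashes the result with SHA-256. A plain-Python sketch of the same idea uses hashlib; it assumes the timestamp is rendered as a "yyyy-MM-dd HH:mm:ss" string before hashing, and it is not verified here that it reproduces the exact digests shown above.

```python
import hashlib

def make_trip_id(medallion: str, hack_license: str, pickup_datetime: str) -> str:
    """Join the key parts with '_' and return the SHA-256 hex digest."""
    key = "_".join([medallion, hack_license, pickup_datetime])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Key values taken from the first two rows of the output above.
id_a = make_trip_id("2013003977", "2013017276", "2013-12-02 16:56:56")
id_b = make_trip_id("2013007612", "2013000105", "2013-12-02 16:56:56")

print(len(id_a))     # 64 hex characters
print(id_a != id_b)  # True: different keys give different ids
```

The id is deterministic, so re-running the pipeline on the same bronze rows yields the same trip_id values, which is what makes the later distinct-count check meaningful.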

In [24]:
from pyspark.sql.functions import when

df_fare = df_fare.withColumn(
    "payment_type",
    when(col("payment_type") == "FLEX", 0)
    .when(col("payment_type") == "CRD", 1)
    .when(col("payment_type") == "CSH", 2)
    .when(col("payment_type") == "NOC", 3)
    .when(col("payment_type") == "DIS", 4)
    .when(col("payment_type") == "UNK", 5)
    .when(col("payment_type") == "VOID", 6)
    .otherwise(None)  # or lit(5), if unmapped values should be treated as Unknown
)
# mapped payment types to numeric codes

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 26, Finished, Available, Finished)

In [25]:
df_fare.select("payment_type").distinct().show()


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 27, Finished, Available, Finished)

+------------+
|payment_type|
+------------+
|           1|
|           3|
|           5|
|           4|
|           2|
+------------+
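The distinct values show codes 1 to 5 and no null, so every raw value in this data matched one of the mapped strings. The same mapping can be kept as a lookup table, which also makes the reverse direction (code to label) easy. The sketch below is plain Python; the dictionary and function names are assumptions, and the labels come from the data dictionary at the top.

```python
# Raw string codes as used in the when() chain above.
PAYMENT_TYPE_CODES = {
    "FLEX": 0, "CRD": 1, "CSH": 2, "NOC": 3, "DIS": 4, "UNK": 5, "VOID": 6,
}

# Labels from the data dictionary.
PAYMENT_TYPE_LABELS = {
    0: "Flex Fare trip", 1: "Credit card", 2: "Cash", 3: "No charge",
    4: "Dispute", 5: "Unknown", 6: "Voided trip",
}

def encode_payment_type(raw):
    """Return the numeric code, or None for an unmapped value (like otherwise(None))."""
    return PAYMENT_TYPE_CODES.get(raw)

print(encode_payment_type("CRD"))                       # 1
print(encode_payment_type("XYZ"))                       # None
print(PAYMENT_TYPE_LABELS[encode_payment_type("CSH")])  # Cash
```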



In [26]:
from pyspark.sql.functions import month

df_fare = df_fare.withColumn("month", month("pickup_datetime"))
# month column used to partition the write to the silver layer

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 28, Finished, Available, Finished)
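The month column holds only the calendar month (1 to 12). The pickup dates shown in this notebook all fall in 2013, so the partitions do not mix years here. The plain-Python sketch below shows the extraction and, as a hypothetical alternative that the notebook does not use, a year-month key that would stay unambiguous if the data spanned several years.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def partition_month(pickup_datetime: str) -> int:
    """Calendar month (1-12), as month("pickup_datetime") yields in the notebook."""
    return datetime.strptime(pickup_datetime, TIMESTAMP_FORMAT).month

def partition_year_month(pickup_datetime: str) -> str:
    """Hypothetical alternative key, e.g. '2013-12'; not used by the notebook."""
    return datetime.strptime(pickup_datetime, TIMESTAMP_FORMAT).strftime("%Y-%m")

print(partition_month("2013-12-02 19:00:36"))       # 12
print(partition_month("2013-09-10 04:46:01"))       # 9
print(partition_year_month("2013-12-02 19:00:36"))  # 2013-12
```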

In [2]:

# Drop the has_negative_values column (no row was flagged)
df_fare_cleaned = df_fare.drop("has_negative_values")


StatementMeta(, bd81cb7c-3f6c-4746-9b6a-1568d7f6beb0, 4, Finished, Available, Finished)

In [30]:

df_fare_cleaned.write.format("delta") \
    .mode("overwrite") \
    .partitionBy("month") \
    .saveAsTable("silver.silver_uber_farev3")
# write to the silver layer


StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 32, Finished, Available, Finished)

In [31]:
df_fare_cleaned.printSchema()

StatementMeta(, 50d5abc7-92b9-45ef-b015-71ff490a5f5a, 33, Finished, Available, Finished)

root
 |-- medallion: string (nullable = true)
 |-- hack_license: string (nullable = true)
 |-- vendor_id: string (nullable = true)
 |-- pickup_datetime: timestamp (nullable = true)
 |-- payment_type: integer (nullable = true)
 |-- fare_amount: float (nullable = true)
 |-- surcharge: float (nullable = true)
 |-- mta_tax: float (nullable = true)
 |-- tip_amount: float (nullable = true)
 |-- tolls_amount: float (nullable = true)
 |-- total_amount: float (nullable = true)
 |-- ingest_date: date (nullable = true)
 |-- is_tip_anomaly: boolean (nullable = false)
 |-- is_fare_outlier: boolean (nullable = false)
 |-- is_tip_outlier: boolean (nullable = false)
 |-- is_total_outlier: boolean (nullable = false)
 |-- trip_id: string (nullable = true)
 |-- month: integer (nullable = true)



In [4]:
df_fare = spark.table("silver.silver_uber_farev3")
total_rows = df_fare.count()
print(f"Total row count: {total_rows}")


StatementMeta(, bd81cb7c-3f6c-4746-9b6a-1568d7f6beb0, 6, Finished, Available, Finished)

Total row count: 43083367


In [5]:
unique_trip_ids = df_fare.select("trip_id").distinct().count()
print(f"Distinct trip_id count: {unique_trip_ids}")


StatementMeta(, bd81cb7c-3f6c-4746-9b6a-1568d7f6beb0, 7, Finished, Available, Finished)

Distinct trip_id count: 43083367


In [6]:
duplicate_count = total_rows - unique_trip_ids
print(f"Rows with a duplicate trip_id: {duplicate_count}")


StatementMeta(, bd81cb7c-3f6c-4746-9b6a-1568d7f6beb0, 8, Finished, Available, Finished)

Rows with a duplicate trip_id: 0
