In [5]:
import sys
!{sys.executable} -m pip install openpyxl

import pandas as pd
import numpy as np






In [11]:
SOURCE_FILE = "./data/dane_puste_przebiegi_cleaned.xlsx"

EMPTY_PASSAGE_FILE = "./data/output/empty_passage_edges.xlsx"

## Load data

In [8]:
df = pd.read_excel(SOURCE_FILE)
df

Unnamed: 0,full_number,client,tractor,route,load_city,load_postal_code,load_country,unload_city,unload_postal_code,unload_country,pickup_planned,pickup_actual,delivery_planned,delivery_actual,empty_km_map,loaded_km_map,total_km_map
0,26692/2025,129.0,WGM9815L,(DE) Nürnberg - (AT) Wörgl - (DE) Nürnberg,nürnberg,90475,DE,nürnberg,90475,DE,2025-09-08 00:30:00,2025-09-08 00:46:00,2025-09-08 22:00:00,2025-09-08 22:15:00,0.000,553.013,553.013
1,27082/2025,105.0,PZ4S023,(PL) Jarosty - (CZ) Praha 5,jarosty,97-310,PL,praha 5,,CZ,2025-09-08 01:10:00,2025-09-07 23:55:00,2025-09-09 03:15:00,2025-09-09 01:47:27,0.000,540.058,540.058
2,27028/2025,61.0,PZ4R993,(DE) Euskirchen - (FR) amiens,euskirchen,53881,DE,amiens,80013,FR,2025-09-08 05:00:00,2025-09-08 16:32:00,2025-09-08 15:00:00,2025-09-09 11:16:00,0.000,409.626,409.626
3,27026/2025,47.0,WGM8283K,(CZ) Modřice - (CZ) Pelhřimov - (CZ) VRSKMAN,modřice,66442,CZ,vrskman,43111,CZ,2025-09-08 05:30:00,2025-09-08 05:48:28,2025-09-08 12:30:00,2025-09-08 14:04:21,318.425,351.447,669.872
4,27049/2025,105.0,WGM5118H,(PL) Chlastawa - (DE) Berlin,chlastawa,66-210,PL,berlin,13597,DE,2025-09-08 06:00:00,2025-09-09 00:34:00,2025-09-09 06:00:00,2025-09-09 05:17:00,0.789,216.089,216.878
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1638,29273/2025,61.0,WGM9817L,(FR) amiens - (DE) Langenbach,amiens,80013,FR,langenbach,85416,DE,2025-09-27 10:00:00,2025-09-27 13:31:00,2025-09-30 15:30:00,NaT,0.000,913.536,913.536
1639,29398/2025,105.0,WGM5121H,(PL) Osieczna - (CZ) Praha - CERNY MOST,osieczna,64-113,PL,praha - cerny most,19800,CZ,2025-09-27 10:00:00,2025-09-27 20:10:04,2025-09-29 20:45:00,NaT,0.973,372.595,373.568
1640,29444/2025,90.0,PZ911JJ,(PL) OSIECZNA - (PL) Wiskitki,osieczna,64-113,PL,wiskitki,,PL,2025-09-27 12:00:00,2025-09-26 14:09:12,2025-09-27 17:00:00,2025-09-27 20:12:12,0.000,311.320,311.320
1641,29515/2025,61.0,PO2PY72,(PL) OSIECZNA - (PL) Sochaczew,osieczna,64-113,PL,sochaczew,96-500,PL,2025-09-27 14:00:00,NaT,2025-10-01 03:00:00,NaT,0.000,316.000,316.000


## 📄 Logic for Detecting Empty Passages

An **empty run** (also called an empty passage in this notebook) is a movement in which a vehicle travels **without cargo** between two consecutive transport orders.

### 1. Input Data

| Column             | Description                                                                 |
|--------------------|-----------------------------------------------------------------------------|
| `tractor`          | Tractor / vehicle ID                                                         |
| `load_city`        | Loading city of the current order                                            |
| `load_country`     | Loading country                                                              |
| `unload_city`      | Unloading city of the current order                                          |
| `unload_country`   | Unloading country                                                            |
| `pickup_planned`   | Planned pickup date and time                                                 |
| `delivery_planned` | Planned delivery date and time                                               |
| `empty_km_map`     | Empty kilometers recorded on an order; attributed to the run preceding it    |

### 2. Sorting the Transport Orders

The dataset is sorted by:

1. `tractor` (vehicle ID)  
2. `pickup_planned` (planned loading time)

This ensures that all orders for a given vehicle are in correct chronological order, enabling pairwise analysis of consecutive tasks.
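
As a small illustration of this step, the sketch below sorts a toy frame the same way. The tractor IDs and timestamps are made-up values, not rows from the dataset.

```python
import pandas as pd

# Toy orders (hypothetical values), deliberately out of order.
orders = pd.DataFrame({
    "tractor": ["B2", "A1", "A1", "B2"],
    "pickup_planned": pd.to_datetime([
        "2025-09-09 08:00", "2025-09-10 06:00",
        "2025-09-08 05:00", "2025-09-08 12:00",
    ]),
})

# Sort by vehicle first, then by planned pickup time within each vehicle.
orders_sorted = (
    orders.sort_values(["tractor", "pickup_planned"])
          .reset_index(drop=True)
)
print(orders_sorted)
```

After sorting, each tractor's rows are contiguous and ordered by time, so row `i-1` and row `i` of the same tractor form a consecutive pair.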

### 3. Detecting Empty Runs

For each tractor, the script analyzes consecutive pairs of orders:

```
(prev) order i-1 → (curr) order i
```

A pair is recorded as an empty run when both of the following conditions hold:
- **Location change**

    The unloading city of the previous order differs from the loading city of the current order.  
    This indicates that the vehicle moved between the two locations without carrying out a transport order. Only city names are compared, and a missing city counts as different.

- **Empty kilometers are recorded**

    `empty_km_map > 0`

    This confirms that the system registered an empty movement for the current order.

For every detected pair, the script also reports:
- **Time between orders**

    If both timestamps are available, the time between the orders is computed:

    ```
    time_diff = curr.pickup_planned − prev.delivery_planned
    ```
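
The same checks can also be written without an explicit loop. The following is a minimal sketch on made-up toy data, assuming the frame is already sorted by `tractor` and `pickup_planned`. It uses `groupby(...).shift(1)` to place each tractor's previous order next to the current one. It is an alternative formulation for illustration, not the code this notebook runs.

```python
import pandas as pd

# Toy data (hypothetical values), already sorted by tractor and pickup time.
orders = pd.DataFrame({
    "tractor": ["A1", "A1", "A1", "B2", "B2"],
    "load_city": ["berlin", "worms", "jarosty", "amiens", "amiens"],
    "unload_city": ["landsberg", "jarosty", "praha", "amiens", "berlin"],
    "pickup_planned": pd.to_datetime([
        "2025-09-12 06:00", "2025-09-13 08:00", "2025-09-14 09:00",
        "2025-09-12 07:00", "2025-09-13 07:00",
    ]),
    "delivery_planned": pd.to_datetime([
        "2025-09-12 18:00", "2025-09-13 20:00", "2025-09-15 10:00",
        "2025-09-12 15:00", "2025-09-13 19:00",
    ]),
    "empty_km_map": [0.0, 92.7, 0.0, 0.0, 0.0],
})

# Previous order of the same tractor, aligned to the current row.
prev = orders.groupby("tractor")[["unload_city", "delivery_planned"]].shift(1)

# The first order of each tractor has no predecessor and cannot form a pair.
has_prev = orders.groupby("tractor").cumcount() > 0
location_changed = prev["unload_city"] != orders["load_city"]
has_empty_km = orders["empty_km_map"] > 0  # NaN compares as False

is_empty_run = has_prev & location_changed & has_empty_km

# Subtracting timestamps gives a Timedelta (NaT if either side is missing).
time_diff = orders["pickup_planned"] - prev["delivery_planned"]

empty_runs = orders.loc[is_empty_run, ["tractor", "load_city", "empty_km_map"]].assign(
    start_city=prev.loc[is_empty_run, "unload_city"],
    time=time_diff[is_empty_run],
)
print(empty_runs)
```

In this toy frame only the second order of tractor `A1` qualifies: the previous unload city (`landsberg`) differs from its load city (`worms`) and its `empty_km_map` is positive.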

### 4. Output Data

Each detected empty run is stored with the following structure:

| Field            | Description                                                                            |
|------------------|----------------------------------------------------------------------------------------|
| `tractor`        | Vehicle ID                                                                             |
| `start_city`     | Unloading city of the previous order                                                   |
| `unload_country` | Country of unloading                                                                   |
| `end_city`       | Loading city of the next order                                                         |
| `load_country`   | Country of loading                                                                     |
| `start_date`     | Planned delivery datetime of the previous order (`delivery_planned`)                   |
| `end_date`       | Planned pickup datetime of the next order (`pickup_planned`)                           |
| `time`           | Time difference between the two orders                                                 |
| `empty_km_map`   | Number of recorded empty kilometers                                                   |
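
The structure above is an edge list, with one row per detected run. If a city-to-city matrix is wanted, one possible way to derive it is `DataFrame.pivot_table`. The sketch below uses made-up rows and sums the empty kilometers per (start, end) pair. Summing is an assumption here; counting runs or averaging would work the same way with a different `aggfunc`.

```python
import pandas as pd

# Hypothetical edge list in the output structure described above.
edges_example = pd.DataFrame({
    "start_city": ["landsberg", "landsberg", "sochaczew"],
    "end_city": ["worms", "worms", "jarosty"],
    "empty_km_map": [90.0, 30.0, 124.5],
})

# Rows: start city, columns: end city, cells: total empty km.
# City pairs with no recorded run are filled with 0.
matrix = edges_example.pivot_table(
    index="start_city",
    columns="end_city",
    values="empty_km_map",
    aggfunc="sum",
    fill_value=0,
)
print(matrix)
```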


## ⚙️ Matrix of empty passages

In [None]:
# Convert to datetime format (unparseable values become NaT)
df["pickup_planned"] = pd.to_datetime(df["pickup_planned"], errors="coerce")
df["delivery_planned"] = pd.to_datetime(df["delivery_planned"], errors="coerce")

# Sort by tractor and planned pickup date
df = df.sort_values(["tractor", "pickup_planned"]).reset_index(drop=True)

edges = []

for tractor, group in df.groupby("tractor"):
    group = group.sort_values("pickup_planned").reset_index(drop=True)
    
    # Iterate by pair of rows (1..N-1)
    for i in range(1, len(group)):
        prev = group.iloc[i - 1]
        curr = group.iloc[i]
        
        # Check if previous unload city ≠ next load city
        if prev["unload_city"] != curr["load_city"]:
            # Check if empty_km_map > 0
            if pd.notna(curr["empty_km_map"]) and curr["empty_km_map"] > 0:
                # Calculate time between unload and next load
                time_diff = None
                if pd.notna(prev["delivery_planned"]) and pd.notna(curr["pickup_planned"]):
                    time_diff = curr["pickup_planned"] - prev["delivery_planned"]
                
                # Add entry to list
                edges.append({
                    "tractor": tractor,
                    "start_city": prev["unload_city"],
                    "unload_country": prev["unload_country"],
                    "end_city": curr["load_city"],
                    "load_country": curr["load_country"],
                    "start_date": prev["delivery_planned"],
                    "end_date": curr["pickup_planned"],
                    "time": time_diff,
                    "empty_km_map": curr["empty_km_map"]
                })

print(f"\nNumber of detected empty runs: {len(edges)}")


Number of detected empty runs: 1035


In [10]:
edges_df = pd.DataFrame(edges)
edges_df["time"] = edges_df["time"].astype(str)

edges_df


Unnamed: 0,tractor,start_city,unload_country,end_city,load_country,start_date,end_date,time,empty_km_map
0,PO2PY63,landsberg,DE,sülzetal,DE,2025-09-12 18:00:00,2025-09-13 08:00:00,0 days 14:00:00,92.733
1,PO2PY63,,DE,euskirchen,DE,2025-09-15 09:00:00,2025-09-17 13:00:00,2 days 04:00:00,44.906
2,PO2PY63,meineweh-schleinitz,DE,bitterfeld-wolfen,DE,2025-09-18 17:00:00,2025-09-19 00:00:00,0 days 07:00:00,78.261
3,PO2PY63,hardenberg,NL,georgsmarienhütte,DE,2025-09-22 07:30:00,2025-09-22 14:00:00,0 days 06:30:00,121.448
4,PO2PY63,,NL,geertruidenberg,NL,2025-09-24 08:00:00,2025-09-24 11:00:00,0 days 03:00:00,26.527
...,...,...,...,...,...,...,...,...,...
1030,WGM9818L,sochaczew,PL,jarosty,PL,2025-09-16 12:00:00,2025-09-16 17:00:00,0 days 05:00:00,124.451
1031,WGM9818L,budapest,HU,gyál,HU,2025-09-18 02:30:00,2025-09-18 17:30:00,0 days 15:00:00,44.083
1032,WGM9818L,usti nad labem,CZ,rakovník,CZ,2025-09-19 14:00:00,2025-09-19 17:00:00,0 days 03:00:00,114.713
1033,WGM9818L,rüsselsheim am main,DE,worms,DE,2025-09-22 06:30:00,2025-09-22 12:00:00,0 days 05:30:00,64.882


In [12]:

edges_df.to_excel(EMPTY_PASSAGE_FILE, index=False)
print(f"Saved result to: {EMPTY_PASSAGE_FILE}")

Saved result to: ./data/output/empty_passage_edges.xlsx
