# Simple Guide: Using Route-Tankerkoenig Integration

This notebook shows:
1. How to get model-ready data
2. What the output looks like
3. How to access specific data
4. Simple examples for the model

**Two ways to get data:**
- Method 1: Full pipeline (user addresses → model data) - Use when ORS API works
- Method 2: Test data (example stations) - Use when ORS API is down

## Setup

In [None]:
import pandas as pd

# Option A: If running from src/integration
from route_tankerkoenig_integration import (
    get_fuel_prices_for_route,       # Method 1: Full pipeline
    integrate_route_with_prices      # Method 2: With test data
)

# Option B: If running from project root, use this import instead
# from src.integration.route_tankerkoenig_integration import (
#     get_fuel_prices_for_route,      # Method 1: Full pipeline
#     integrate_route_with_prices     # Method 2: With test data
# )



--- Check GEOCODING RESULTS ---
Start: Wilhelmstraße, Tübingen, BW, Germany
 → lat = 48.529871, lon = 9.073020
End: Charlottenstraße 45, Reutlingen, BW, Germany
 → lat = 48.495020, lon = 9.220132

--- Check ROUTE ---
Route: 15.2 km
Duration: 22 min
Number of points unthinned list: 272 coordinates along the route
Simplified route from 272 to 45 points

 Delete later 
 Status code: 200 Reason: OK

--- Stations with ETA ---


## Method 1: Full Pipeline (When ORS Works)

The user provides start and end addresses; the pipeline geocodes them, calculates the route, finds fuel stations along it, and returns model-ready data:

In [None]:
# User inputs
start_locality = "Tübingen"
end_locality = "Reutlingen"
start_address = "Wilhelmstraße 7"  # Optional; defaults to the town center if omitted
end_address = "Charlottenstraße 45"  # Optional; defaults to the town center if omitted

# Main function call
try:
    model_input = get_fuel_prices_for_route(
        start_locality=start_locality,
        end_locality=end_locality,
        start_address=start_address,
        end_address=end_address,
        use_realtime=False  # True: live API prices; False: yesterday's prices stand in for today's
    )
    
    print(f"✓ Success! Found {len(model_input)} stations")
    
except Exception as e:
    print(f"✗ Failed: {e}")
    print("Using Method 2 (test data) instead...")
    model_input = None 


COMPLETE FUEL PRICE PIPELINE
Route: Tübingen → Reutlingen
Mode: HISTORICAL (demo)

Step 1: Geocoding addresses...
  Start: Wilhelmstraße, Tübingen, BW, Germany (48.52987, 9.07302)
  End: Charlottenstraße 45, Reutlingen, BW, Germany (48.49502, 9.22013)

Step 2: Calculating route...
  Distance: 15.2 km
  Duration: 22 min

Step 3: Finding fuel stations along route...
  Route has 272 coordinate points
  Buffer: 300m, Route length: 15.2km
Simplified route from 272 to 45 points

 Delete later 
 Status code: 200 Reason: OK
  Found 0 stations

  Possible reasons:
    1. ORS POI database doesn't have stations for this area
    2. Buffer distance is too small (current: 300m)
    3. Route is very short
    4. ORS API rate limit reached

  Try running route_stations.py directly to verify it works.
✓ Success! Found 0 stations
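
Note that the run above prints "✓ Success!" even though zero stations were found, because an empty list does not raise an exception. A minimal sketch of a guard that treats an empty result like a failure follows; `pick_model_input` is a hypothetical helper and not part of the integration module, and the stand-in data only exists so the sketch runs without API access.

```python
def pick_model_input(primary, fallback):
    """Return `primary` if it holds at least one station, else call `fallback()`.

    `primary` is the result of Method 1 (may be None or an empty list).
    `fallback` is a zero-argument callable, e.g. a wrapper around Method 2.
    """
    if primary:  # None and [] are both falsy
        return primary
    print("No stations from the full pipeline; falling back to test data.")
    return fallback()


# Stand-in values so the sketch runs without any API access
empty_result = []
test_stations = [{"name": "Aral", "lat": 48.51279, "lon": 9.07341}]
chosen = pick_model_input(empty_result, lambda: test_stations)
print(f"Stations available: {len(chosen)}")
```

In the notebook, the fallback callable could wrap the `integrate_route_with_prices(...)` call from Method 2.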


## Method 2: Test Data (When ORS is Down)

Use hard-coded example stations and skip geocoding and routing:

In [3]:
# Example stations (fallback when ORS is unavailable; captured from an earlier successful route_stations.py run)
example_stations_with_eta = [
    {
        "name": "Aral",
        "lat": 48.51279,
        "lon": 9.07341,
        "distance": 45.7,
        "distance_along_m": 3058,
        "fraction_of_route": 0.201,
        "eta": "2025-11-23T14:35:49"
    },
    {
        "name": "Esso",
        "lat": 48.49228,
        "lon": 9.20297,
        "distance": 33.8,
        "distance_along_m": 13494,
        "fraction_of_route": 0.887,
        "eta": "2025-11-23T14:50:46"
    },
    {
        "name": "Jet",
        "lat": 48.49178,
        "lon": 9.19813,
        "distance": 29.3,
        "distance_along_m": 13140,
        "fraction_of_route": 0.864,
        "eta": "2025-11-23T14:50:16"
    }
]

# Skip routing and go straight to price enrichment to get the model input
model_input = integrate_route_with_prices(
    stations_with_eta=example_stations_with_eta,
    use_realtime=False
)

print(f"✓ Using test data: {len(model_input)} stations")


ROUTE-TANKERKOENIG INTEGRATION
Mode: HISTORICAL (yesterday = current)
Input stations: 3
Loading all stations from Supabase...
  Loaded 5,000 stations...
  Loaded 10,000 stations...
  Loaded 15,000 stations...
  Filtered out 34 stations with invalid coordinates
Loaded 17,651 valid stations from Supabase

Matching stations to Tankerkoenig database...
Matched: 3, Unmatched: 0

Fetching historical prices from Supabase...
Retrieved historical prices for 3 stations

Using yesterday's prices as current prices (demo mode)...

INTEGRATION COMPLETE
Total stations processed: 3
Stations with complete price data: 3
✓ Using test data: 3 stations
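
The log above shows each route station being matched to a Tankerkoenig entry. How the matching is implemented is not shown in this notebook; one plausible reading is a nearest-neighbour search by great-circle distance. The sketch below illustrates that idea only. `haversine_m` and `nearest_candidate` are hypothetical names, and the candidate list is a two-entry stand-in for the full station table.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def nearest_candidate(station, candidates):
    """Return (closest candidate, distance in metres) for one route station."""
    scored = [
        (haversine_m(station["lat"], station["lon"], c["tk_latitude"], c["tk_longitude"]), c)
        for c in candidates
    ]
    distance, best = min(scored, key=lambda pair: pair[0])
    return best, distance


route_station = {"name": "Aral", "lat": 48.51279, "lon": 9.07341}
candidates = [
    {"tk_name": "Aral Tankstelle", "tk_latitude": 48.51295, "tk_longitude": 9.073625},
    {"tk_name": "Esso Tankstelle", "tk_latitude": 48.492396, "tk_longitude": 9.202997},
]
match, distance_m = nearest_candidate(route_station, candidates)
print(match["tk_name"], round(distance_m, 1))
```

For this pair the sketch yields roughly 24 m, close to the `match_distance_m` of 23.85 reported for the Aral station in this guide.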


## What the Output Looks Like

In [4]:
# model_input is a list of dictionaries
print(f"Type: {type(model_input)}")
print(f"Number of stations: {len(model_input)}")

# Look at first station
print("\nFirst station (complete data):")
print("=" * 60)
first_station = model_input[0]
for key, value in first_station.items():
    print(f"{key:25} {value}")

Type: <class 'list'>
Number of stations: 3

First station (complete data):
osm_name                  Aral
lat                       48.51279
lon                       9.07341
distance_along_m          3058
fraction_of_route         0.201
eta                       2025-11-23T14:35:49
station_uuid              ca944555-1ee4-4228-979d-8313690ae950
tk_name                   Aral Tankstelle
brand                     ARAL
city                      Tübingen
tk_latitude               48.51295
tk_longitude              9.073625
match_distance_m          23.85
time_cell                 29
price_lag_1d_e5           1.759
price_lag_1d_e10          1.699
price_lag_1d_diesel       1.699
price_lag_2d_e5           1.759
price_lag_2d_e10          1.699
price_lag_2d_diesel       1.729
price_lag_3d_e5           1.769
price_lag_3d_e10          1.709
price_lag_3d_diesel       1.729
price_lag_7d_e5           1.779
price_lag_7d_e10          1.719
price_lag_7d_diesel       1.689
price_current_e5          1.75
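
The `time_cell` field is not defined in this guide. The value 29 for an ETA of 14:35:49 would be consistent with half-hour bins counted from midnight (0 to 47). That reading is an assumption, and `half_hour_cell` below is a hypothetical helper that only illustrates it:

```python
from datetime import datetime


def half_hour_cell(eta_iso):
    """Map an ISO timestamp to a half-hour bin index (0 = 00:00-00:29, 47 = 23:30-23:59)."""
    t = datetime.fromisoformat(eta_iso)
    return t.hour * 2 + t.minute // 30


print(half_hour_cell("2025-11-23T14:35:49"))  # 29
```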

## Accessing Specific Data

Simple examples:

In [5]:
# Example 1: Get first station
first_station = model_input[0]
print(f"First station: {first_station['tk_name']}")

# Example 2: Get second station's brand
second_station = model_input[1]
brand = second_station['brand']
print(f"Second station brand: {brand}")

# Example 3: Get E5 lag prices for third station
third_station = model_input[2]
lag_1d = third_station['price_lag_1d_e5']
lag_7d = third_station['price_lag_7d_e5']
print(f"Third station E5: 1d={lag_1d}, 7d={lag_7d}")

# Example 4: Loop through all stations
print("\nAll stations:")
for i, station in enumerate(model_input):
    print(f"{i+1}. {station['tk_name']} ({station['brand']})")

First station: Aral Tankstelle
Second station brand: ESSO
Third station E5: 1d=1.729, 7d=1.739

All stations:
1. Aral Tankstelle (ARAL)
2. Esso Tankstelle (ESSO)
3. JET REUTLINGEN KONRAD-ADENAUER-STR. 63 (JET)
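
Because `model_input` is a plain list of dictionaries, standard Python tools work on it directly. As a small sketch, the hypothetical helper `cheapest_station` picks the station with the lowest price for a given field. It assumes that a missing price would show up as `None` or as an absent key, which this guide does not confirm.

```python
def cheapest_station(stations, price_key="price_lag_1d_e5"):
    """Return the station dict with the lowest value for `price_key`, or None."""
    priced = [s for s in stations if s.get(price_key) is not None]
    if not priced:
        return None
    return min(priced, key=lambda s: s[price_key])


# Names and prices copied from the example output in this guide
stations = [
    {"tk_name": "Aral Tankstelle", "price_lag_1d_e5": 1.759},
    {"tk_name": "Esso Tankstelle", "price_lag_1d_e5": 1.739},
    {"tk_name": "JET REUTLINGEN KONRAD-ADENAUER-STR. 63", "price_lag_1d_e5": 1.729},
]
best = cheapest_station(stations)
print(best["tk_name"])  # JET REUTLINGEN KONRAD-ADENAUER-STR. 63
```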


## Convert to DataFrame (Optional)

In [6]:
# Convert to pandas DataFrame
df = pd.DataFrame(model_input)

print(f"DataFrame shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")

# Display the full DataFrame
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.width', None)  # Don't wrap


# Show first few rows
df.head()

DataFrame shape: (3, 30)

Columns: ['osm_name', 'lat', 'lon', 'distance_along_m', 'fraction_of_route', 'eta', 'station_uuid', 'tk_name', 'brand', 'city', 'tk_latitude', 'tk_longitude', 'match_distance_m', 'time_cell', 'price_lag_1d_e5', 'price_lag_1d_e10', 'price_lag_1d_diesel', 'price_lag_2d_e5', 'price_lag_2d_e10', 'price_lag_2d_diesel', 'price_lag_3d_e5', 'price_lag_3d_e10', 'price_lag_3d_diesel', 'price_lag_7d_e5', 'price_lag_7d_e10', 'price_lag_7d_diesel', 'price_current_e5', 'price_current_e10', 'price_current_diesel', 'is_open']


Unnamed: 0,osm_name,lat,lon,distance_along_m,fraction_of_route,eta,station_uuid,tk_name,brand,city,tk_latitude,tk_longitude,match_distance_m,time_cell,price_lag_1d_e5,price_lag_1d_e10,price_lag_1d_diesel,price_lag_2d_e5,price_lag_2d_e10,price_lag_2d_diesel,price_lag_3d_e5,price_lag_3d_e10,price_lag_3d_diesel,price_lag_7d_e5,price_lag_7d_e10,price_lag_7d_diesel,price_current_e5,price_current_e10,price_current_diesel,is_open
0,Aral,48.51279,9.07341,3058,0.201,2025-11-23T14:35:49,ca944555-1ee4-4228-979d-8313690ae950,Aral Tankstelle,ARAL,Tübingen,48.51295,9.073625,23.85,29,1.759,1.699,1.699,1.759,1.699,1.729,1.769,1.709,1.729,1.779,1.719,1.689,1.759,1.699,1.699,
1,Esso,48.49228,9.20297,13494,0.887,2025-11-23T14:50:46,e8d55212-b30f-449e-b65c-913a7b2002b1,Esso Tankstelle,ESSO,REUTLINGEN,48.492396,9.202997,13.05,29,1.739,1.679,1.639,1.759,1.699,1.659,1.759,1.699,1.669,1.749,1.689,1.619,1.739,1.679,1.639,
2,Jet,48.49178,9.19813,13140,0.864,2025-11-23T14:50:16,51d4b6c4-a095-1aa0-e100-80009459e03a,JET REUTLINGEN KONRAD-ADENAUER-STR. 63,JET,REUTLINGEN,48.49176,9.19802,8.43,29,1.729,1.669,1.619,1.749,1.689,1.649,1.749,1.689,1.649,1.739,1.679,1.609,1.729,1.669,1.619,
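
In the rows above, `eta` is stored as an ISO string and `is_open` is empty. A quick check like the following sketch can surface such gaps and convert `eta` to a proper timestamp before further processing. The small frame is a stand-in with values taken from the example, and representing the empty `is_open` as `None` is an assumption.

```python
import pandas as pd

# Stand-in frame; in the notebook this would be df = pd.DataFrame(model_input)
df = pd.DataFrame([
    {"tk_name": "Aral Tankstelle", "eta": "2025-11-23T14:35:49", "is_open": None},
    {"tk_name": "Esso Tankstelle", "eta": "2025-11-23T14:50:46", "is_open": None},
])

# Parse the ISO strings so time arithmetic and .dt accessors work
df["eta"] = pd.to_datetime(df["eta"])

# Count missing values per column and show only columns that have any
missing = df.isna().sum()
print(missing[missing > 0])
```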


## Accessing Data in DataFrame

In [7]:
# Get all brands
print("Brands:")
print(df['brand'].tolist())

# Get second station's brand
print(f"\nSecond station brand: {df.loc[1, 'brand']}")

# Get all E5 lag_1d prices
print("\nAll E5 lag_1d prices:")
print(df['price_lag_1d_e5'].tolist())

# Get specific columns
summary = df[['tk_name', 'brand', 'price_lag_1d_e5', 'price_lag_7d_e5']]
print("\nSummary view:")
print(summary)

Brands:
['ARAL', 'ESSO', 'JET']

Second station brand: ESSO

All E5 lag_1d prices:
[1.759, 1.739, 1.729]

Summary view:
                                  tk_name brand  price_lag_1d_e5  \
0                         Aral Tankstelle  ARAL            1.759   
1                         Esso Tankstelle  ESSO            1.739   
2  JET REUTLINGEN KONRAD-ADENAUER-STR. 63   JET            1.729   

   price_lag_7d_e5  
0            1.779  
1            1.749  
2            1.739  
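
Derived columns follow the same pattern. As an illustration, this sketch adds the change in E5 price between 7 days ago and yesterday. The column name `e5_change_7d` is made up for this example, and the frame repeats the values from the summary view so the block runs on its own.

```python
import pandas as pd

# Values copied from the summary view above
df = pd.DataFrame({
    "brand": ["ARAL", "ESSO", "JET"],
    "price_lag_1d_e5": [1.759, 1.739, 1.729],
    "price_lag_7d_e5": [1.779, 1.749, 1.739],
})

# Negative values mean the price fell over the last week; round to 3 decimals
# to hide floating-point noise
df["e5_change_7d"] = (df["price_lag_1d_e5"] - df["price_lag_7d_e5"]).round(3)
print(df[["brand", "e5_change_7d"]])
```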


## Model Input Preparation

The model expects four lag-price columns (1, 2, 3, and 7 days back) for a single fuel type. Here's how to extract them:

In [8]:
# Choose fuel type
fuel_type = "e5"  # or "e10" or "diesel"; the columns are renamed below to match the model's inputs

# Extract the 4 columns you need
model_features = df[[
    f'price_lag_1d_{fuel_type}',
    f'price_lag_2d_{fuel_type}',
    f'price_lag_3d_{fuel_type}',
    f'price_lag_7d_{fuel_type}'
]].copy()

# Rename columns to match your training data
model_features.columns = [
    'price_lag_1d',
    'price_lag_2d',
    'price_lag_3d',
    'price_lag_7d'
]

print(f"Model input for {fuel_type}:")
print(model_features)

Model input for e5:
   price_lag_1d  price_lag_2d  price_lag_3d  price_lag_7d
0         1.759         1.759         1.769         1.779
1         1.739         1.759         1.759         1.749
2         1.729         1.749         1.749         1.739
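
If a station lacks one of the lag prices, the feature row will contain a gap, and many models cannot handle that. The sketch below separates complete rows from incomplete ones before prediction. `split_complete_rows` is a hypothetical helper, and the gap in row 1 is made up purely to show the behaviour.

```python
import pandas as pd


def split_complete_rows(features):
    """Return (rows without gaps, index labels of rows with at least one gap)."""
    complete_mask = features.notna().all(axis=1)
    return features[complete_mask], features.index[~complete_mask].tolist()


# Illustrative data: the None in row 1 is invented for this example
features = pd.DataFrame({
    "price_lag_1d": [1.759, None, 1.729],
    "price_lag_2d": [1.759, 1.759, 1.749],
    "price_lag_3d": [1.769, 1.759, 1.749],
    "price_lag_7d": [1.779, 1.749, 1.739],
})
X, dropped = split_complete_rows(features)
print(f"Usable rows: {len(X)}, dropped row labels: {dropped}")
```

Keeping the original index on `X` makes it possible to map predictions back to the right stations afterwards.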


## Helper Function for Model Input

To make this easier, here's a simple function that selects the four lag columns for a fuel type and renames them automatically:

In [None]:
def prepare_model_input(model_input_list, fuel_type):
    """
    Convert integration output to a model-ready DataFrame for one fuel type.

    Args:
        model_input_list: List of station dicts returned by
            get_fuel_prices_for_route() or integrate_route_with_prices()
        fuel_type: "e5", "e10", or "diesel"

    Returns:
        DataFrame with 4 columns: price_lag_1d, price_lag_2d, price_lag_3d, price_lag_7d
    """
    df = pd.DataFrame(model_input_list)
    
    model_features = df[[
        f'price_lag_1d_{fuel_type}',
        f'price_lag_2d_{fuel_type}',
        f'price_lag_3d_{fuel_type}',
        f'price_lag_7d_{fuel_type}'
    ]].copy()
    
    model_features.columns = [
        'price_lag_1d',
        'price_lag_2d',
        'price_lag_3d',
        'price_lag_7d'
    ]
    
    return model_features

# Usage:
X = prepare_model_input(model_input, "e5")  # Pass "e10" or "diesel" to get X for another fuel type
print("Ready for model.predict(X):")
print(X)
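
The helper above fails with a fairly opaque `KeyError` if `fuel_type` is misspelled or the station list is empty (as happened in the Method 1 run). A hedged variant with explicit checks might look like this; `prepare_model_input_checked`, `VALID_FUELS`, and `LAGS` are names invented for this sketch.

```python
import pandas as pd

VALID_FUELS = ("e5", "e10", "diesel")
LAGS = ("1d", "2d", "3d", "7d")


def prepare_model_input_checked(model_input_list, fuel_type):
    """Like prepare_model_input, but with clearer errors for bad input."""
    fuel = fuel_type.lower()
    if fuel not in VALID_FUELS:
        raise ValueError(f"fuel_type must be one of {VALID_FUELS}, got {fuel_type!r}")
    if not model_input_list:
        raise ValueError("model_input_list is empty; no stations to prepare")

    df = pd.DataFrame(model_input_list)
    source_cols = [f"price_lag_{lag}_{fuel}" for lag in LAGS]
    missing = [c for c in source_cols if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")

    # Strip the fuel suffix so the columns match the names used above
    renames = {f"price_lag_{lag}_{fuel}": f"price_lag_{lag}" for lag in LAGS}
    return df[source_cols].rename(columns=renames)


# One station with values from the example output
example = [{
    "price_lag_1d_e5": 1.759, "price_lag_2d_e5": 1.759,
    "price_lag_3d_e5": 1.769, "price_lag_7d_e5": 1.779,
}]
print(prepare_model_input_checked(example, "E5"))
```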

## Summary

### Quick Reference:

```python
# Get data
model_input = get_fuel_prices_for_route(...)  # List of dicts

# Access as list
first_station = model_input[0]
brand = first_station['brand']

# Or convert to DataFrame
df = pd.DataFrame(model_input)
brand = df.loc[0, 'brand']

# For the model
X = prepare_model_input(model_input, "e5")
predictions = model.predict(X)
```
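
The quick reference ends with `model.predict(X)`, but `model` itself is not defined in this notebook. To show how predictions can be attached back to the stations, the sketch below uses a stand-in `PersistenceModel` that simply returns yesterday's price. This class is hypothetical and is not the project's model; the key `predicted_e5` is likewise made up.

```python
import pandas as pd


class PersistenceModel:
    """Hypothetical stand-in: predicts that the price equals yesterday's price."""

    def predict(self, X):
        return X["price_lag_1d"].to_numpy()


# Two stations with E5 lag prices from the example output
stations = [
    {"tk_name": "Aral Tankstelle", "price_lag_1d_e5": 1.759, "price_lag_2d_e5": 1.759,
     "price_lag_3d_e5": 1.769, "price_lag_7d_e5": 1.779},
    {"tk_name": "Esso Tankstelle", "price_lag_1d_e5": 1.739, "price_lag_2d_e5": 1.759,
     "price_lag_3d_e5": 1.759, "price_lag_7d_e5": 1.749},
]

# Build the four-column feature frame in the same row order as the station list
lags = ["1d", "2d", "3d", "7d"]
X = pd.DataFrame(
    [{f"price_lag_{lag}": s[f"price_lag_{lag}_e5"] for lag in lags} for s in stations]
)
predictions = PersistenceModel().predict(X)

# Row i of X corresponds to stations[i], so predictions can be zipped back
for station, predicted in zip(stations, predictions):
    station["predicted_e5"] = float(predicted)
    print(f"{station['tk_name']}: {station['predicted_e5']:.3f}")
```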

### Available Fields:

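**Station Info:**
- `station_uuid`, `tk_name`, `brand`, `city`, `is_open`
- `osm_name`, `lat`, `lon`, `tk_latitude`, `tk_longitude`, `match_distance_m`

**Route Info:**
- `distance_along_m`, `fraction_of_route`, `eta`, `time_cell`

**Model Features (for each fuel type: `e5`, `e10`, `diesel`):**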
- `price_lag_1d_{fuel}` - Yesterday's price
- `price_lag_2d_{fuel}` - 2 days ago
- `price_lag_3d_{fuel}` - 3 days ago
- `price_lag_7d_{fuel}` - 7 days ago

**Current Prices:**
- `price_current_{fuel}` - Today's price (live API price if `use_realtime=True`; otherwise yesterday's price is used as a stand-in)
