# Complete Pipeline: User Input --> Model Input

This notebook shows the complete pipeline using one function:

**`get_fuel_prices_for_route()`** - Takes user addresses, returns model-ready data

## What This Notebook Covers
1. How to call the main function
2. What the output looks like
3. How to access specific data
4. How to prepare it for the model

---

**Note:** Because the ORS POI API is currently down, this notebook uses example data. Once the API is available again, the real function will work in exactly the same way.

## Setup

In [None]:
import pandas as pd

# Option A: If running from src/integration
from route_tankerkoenig_integration import (
    get_fuel_prices_for_route,    # Main function for the fully automated pipeline
    integrate_route_with_prices,  # Backup for when the ORS API is down
)

# Option B: If running from the project root, use this import instead
# from src.integration.route_tankerkoenig_integration import (
#     get_fuel_prices_for_route,
#     integrate_route_with_prices,
# )

## Method 1: Complete Pipeline (Use for Streamlit)

This is what we need to use in the Streamlit dashboard:

In [None]:
# User inputs (from Streamlit text fields)
start_locality = "Tübingen"  # REQUIRED
end_locality = "Reutlingen"  # REQUIRED
start_address = "Wilhelmstraße 7"  # Optional (empty string = city center)
end_address = "Charlottenstraße 45"  # Optional (empty string = city center)

# Main call: get_fuel_prices_for_route()
# Internally it runs route_stations.py to create the route, uses the route output
# as input for the price lookup, and returns model_input (a list of dictionaries for the model).
try:
    model_input = get_fuel_prices_for_route(
        start_locality=start_locality,
        end_locality=end_locality,
        start_address=start_address,
        end_address=end_address,
        use_realtime=False  # True for live prices in production
    )
    
    print(f"✓ Success! Found {len(model_input)} stations")
    
except Exception as e:
    print(f"✗ Pipeline failed: {e}")
    print("\nUsing example data instead (ORS API might be down)...")
    
    # Fallback to example data
    model_input = None  # We'll use Method 2 below


COMPLETE FUEL PRICE PIPELINE
Route: Tübingen → Reutlingen
Mode: HISTORICAL (demo)

Step 1: Geocoding addresses...
  Start: Wilhelmstraße, Tübingen, BW, Germany (48.52987, 9.07302)
  End: Charlottenstraße 45, Reutlingen, BW, Germany (48.49502, 9.22013)

Step 2: Calculating route...
  Distance: 15.2 km
  Duration: 22 min

Step 3: Finding fuel stations along route...
  Route has 272 coordinate points
  Buffer: 300m, Route length: 15.2km
Simplified route from 272 to 45 points

 Delete later 
 Status code: 200 Reason: OK
  Found 0 stations

  Possible reasons:
    1. ORS POI database doesn't have stations for this area
    2. Buffer distance is too small (current: 300m)
    3. Route is very short
    4. ORS API rate limit reached

  TIP: Try running route_stations.py directly to verify it works.
✓ Success! Found 0 stations
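
In the run above no exception was raised, so the cell reports success even though 0 stations were found, and the fallback branch never runs. The following is a minimal sketch of a guard that treats an empty result the same way as a failure. The name `fetch_with_fallback` and its two callable parameters are hypothetical and not part of the project code.

```python
from typing import Callable, Dict, List

Station = Dict[str, object]


def fetch_with_fallback(
    primary: Callable[[], List[Station]],
    fallback: Callable[[], List[Station]],
) -> List[Station]:
    """Return primary() unless it raises or yields no stations (hypothetical helper)."""
    try:
        stations = primary()
    except Exception as exc:  # broad on purpose: any pipeline error triggers the fallback
        print(f"Pipeline failed: {exc}")
        stations = []

    if not stations:
        # Covers both the exception case and the "Found 0 stations" case
        print("No stations from the main pipeline, using fallback data instead")
        stations = fallback()
    return stations
```

In this notebook, `primary` could be a lambda wrapping the `get_fuel_prices_for_route(...)` call and `fallback` a lambda wrapping `integrate_route_with_prices(...)` with the example stations from Method 2.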


## Method 2: Using Example Data (For testing)

When the ORS API is down, use this approach with known coordinates:

In [21]:
# Example coordinates (from when route_stations.py worked)
example_stations_with_eta = [
    {
        "name": "Aral",
        "lat": 48.51279,
        "lon": 9.07341,
        "distance": 45.7,
        "distance_along_m": 3058,
        "fraction_of_route": 0.201,
        "eta": "2025-11-20T14:35:49.644499"
    },
    {
        "name": "Esso",
        "lat": 48.49228,
        "lon": 9.20297,
        "distance": 33.8,
        "distance_along_m": 13494,
        "fraction_of_route": 0.887,
        "eta": "2025-11-20T14:50:46.836178"
    },
    {
        "name": "Jet",
        "lat": 48.49178,
        "lon": 9.19813,
        "distance": 29.3,
        "distance_along_m": 13140,
        "fraction_of_route": 0.864,
        "eta": "2025-11-20T14:50:16.412891"
    }
]

# Run integration (skips route finding, goes straight to price enrichment)
model_input = integrate_route_with_prices(
    stations_with_eta=example_stations_with_eta,
    use_realtime=False
)

print(f"✓ Using example data: {len(model_input)} stations")


ROUTE-TANKERKOENIG INTEGRATION
Mode: HISTORICAL (yesterday = current)
Input stations: 3
Loading all stations from Supabase...
  Loaded 5,000 stations...
  Loaded 10,000 stations...
  Loaded 15,000 stations...
  Filtered out 34 stations with invalid coordinates
Loaded 17,651 valid stations from Supabase

Matching stations to Tankerkoenig database...
Matched: 3, Unmatched: 0

Fetching historical prices from Supabase...
Retrieved historical prices for 3 stations

Using yesterday's prices as current prices (demo mode)...

INTEGRATION COMPLETE
Total stations processed: 3
Stations with complete price data: 3
✓ Using example data: 3 stations
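
The log above shows each route station being matched to a Tankerkoenig entry, and the output later reports a `match_distance_m` per station. How `integrate_route_with_prices()` does this internally is not shown in this notebook. The sketch below assumes a nearest-neighbour match by haversine distance. The candidate keys `latitude`/`longitude`, the 100 m cutoff, and both function names are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_station(
    route_station: Dict[str, float],
    candidates: List[Dict[str, float]],
    max_distance_m: float = 100.0,  # assumed cutoff, not taken from the project
) -> Optional[Tuple[Dict[str, float], float]]:
    """Return (candidate, distance_m) for the closest candidate within the cutoff."""
    best, best_d = None, max_distance_m
    for cand in candidates:
        d = haversine_m(
            route_station["lat"], route_station["lon"],
            cand["latitude"], cand["longitude"],
        )
        if d <= best_d:
            best, best_d = cand, d
    return (best, best_d) if best is not None else None
```

For the Aral example, the OSM point (48.51279, 9.07341) and the Tankerkoenig point (48.51295, 9.073625) come out roughly 24 m apart with this formula, which is in line with the reported `match_distance_m` of 23.85.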


## Output Format

Both methods produce the same output format. This is what will be used as input to the model:

In [22]:
# Look at the first station
print("First station complete output:")
print("=" * 60)

first_station = model_input[0]
for key, value in first_station.items():
    print(f"{key:25} {value}")

First station complete output:
osm_name                  Aral
lat                       48.51279
lon                       9.07341
distance_along_m          3058
fraction_of_route         0.201
eta                       2025-11-20T14:35:49.644499
station_uuid              ca944555-1ee4-4228-979d-8313690ae950
tk_name                   Aral Tankstelle
brand                     ARAL
city                      Tübingen
tk_latitude               48.51295
tk_longitude              9.073625
match_distance_m          23.85
time_cell                 29
price_yesterday_e5        1.769
price_yesterday_e10       1.709
price_yesterday_diesel    1.729
price_7days_e5            1.789
price_7days_e10           1.729
price_7days_diesel        1.659
price_current_e5          1.769
price_current_e10         1.709
price_current_diesel      1.729
is_open                   None
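
The integration log reports a count of "Stations with complete price data". How that count is computed internally is not shown here. The sketch below assumes that "complete" means all nine price fields from the record above are present and not `None`. `PRICE_FIELDS` and `has_complete_prices` are hypothetical names.

```python
from typing import Dict, List

# The nine price fields that appear in each station record above
PRICE_FIELDS: List[str] = [
    f"price_{period}_{fuel}"
    for period in ("yesterday", "7days", "current")
    for fuel in ("e5", "e10", "diesel")
]


def has_complete_prices(station: Dict[str, object]) -> bool:
    """True if every price field exists and is not None (assumed definition of 'complete')."""
    return all(station.get(field) is not None for field in PRICE_FIELDS)
```

A check like `sum(has_complete_prices(s) for s in model_input)` would then give the number of stations that are usable as model input under this assumption.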


## Examples: Accessing Specific Data

In [23]:
# Example 1: Get brand of first station
first_station = model_input[0]
print(f"First station brand: {first_station['brand']}")

# Example 2: Get brand of second station
second_station = model_input[1]
print(f"Second station brand: {second_station['brand']}")

# Example 3: Get E5 price of third station
third_station = model_input[2]
print(f"Third station E5 price: €{third_station['price_current_e5']}")

# Example 4: Get time cell
print(f"First station time cell: {first_station['time_cell']}")

# Example 5: Compare OSM name vs Tankerkoenig name
print(f"OSM name: {first_station['osm_name']}")
print(f"Tankerkoenig name: {first_station['tk_name']}")

First station brand: ARAL
Second station brand: ESSO
Third station E5 price: €1.749
First station time cell: 29
OSM name: Aral
Tankerkoenig name: Aral Tankstelle
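
Because `model_input` is a plain list of dictionaries, it can also be searched without pandas. As a small sketch, the hypothetical helper `cheapest_station` below picks the station with the lowest current price for a given fuel type and skips stations where that price is missing.

```python
from typing import Dict, List, Optional


def cheapest_station(stations: List[Dict[str, object]], fuel: str = "e5") -> Optional[Dict[str, object]]:
    """Return the station with the lowest price_current_<fuel>, or None if no price is available."""
    key = f"price_current_{fuel}"
    priced = [s for s in stations if s.get(key) is not None]
    if not priced:
        return None
    return min(priced, key=lambda s: s[key])


# Reduced records with the values shown in this notebook
example = [
    {"brand": "ARAL", "price_current_e5": 1.769},
    {"brand": "ESSO", "price_current_e5": 1.759},
    {"brand": "JET", "price_current_e5": 1.749},
]
best = cheapest_station(example)
print(best["brand"], best["price_current_e5"])
```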


## Convert to DataFrame (Probably Easier to Work With)

In [28]:
# Convert to pandas DataFrame
df = pd.DataFrame(model_input)

print(f"DataFrame shape: {df.shape}")
print(f"Number of stations: {len(df)}")
print(f"Number of columns: {len(df.columns)}")

# Display the full DataFrame
pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.width', None)  # Don't wrap


# Display
df

DataFrame shape: (3, 24)
Number of stations: 3
Number of columns: 24


| | osm_name | lat | lon | distance_along_m | fraction_of_route | eta | station_uuid | tk_name | brand | city | tk_latitude | tk_longitude | match_distance_m | time_cell | price_yesterday_e5 | price_yesterday_e10 | price_yesterday_diesel | price_7days_e5 | price_7days_e10 | price_7days_diesel | price_current_e5 | price_current_e10 | price_current_diesel | is_open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aral | 48.51279 | 9.07341 | 3058 | 0.201 | 2025-11-20T14:35:49.644499 | ca944555-1ee4-4228-979d-8313690ae950 | Aral Tankstelle | ARAL | Tübingen | 48.51295 | 9.073625 | 23.85 | 29 | 1.769 | 1.709 | 1.729 | 1.789 | 1.729 | 1.659 | 1.769 | 1.709 | 1.729 | |
| 1 | Esso | 48.49228 | 9.20297 | 13494 | 0.887 | 2025-11-20T14:50:46.836178 | e8d55212-b30f-449e-b65c-913a7b2002b1 | Esso Tankstelle | ESSO | REUTLINGEN | 48.492396 | 9.202997 | 13.05 | 29 | 1.759 | 1.699 | 1.669 | 1.749 | 1.689 | 1.629 | 1.759 | 1.699 | 1.669 | |
| 2 | Jet | 48.49178 | 9.19813 | 13140 | 0.864 | 2025-11-20T14:50:16.412891 | 51d4b6c4-a095-1aa0-e100-80009459e03a | JET REUTLINGEN KONRAD-ADENAUER-STR. 63 | JET | REUTLINGEN | 48.49176 | 9.19802 | 8.43 | 29 | 1.749 | 1.689 | 1.649 | 1.739 | 1.679 | 1.619 | 1.749 | 1.689 | 1.649 | |


## Access Data in DataFrame

In [25]:
# Get all brands
print("All brands:")
print(df['brand'].tolist())

# Get brand of second station
print(f"\nSecond station brand: {df.loc[1, 'brand']}")

# Get all E5 prices
print("\nAll current E5 prices:")
print(df['price_current_e5'].tolist())

# Summary view
print("\nSummary:")
summary = df[['tk_name', 'brand', 'time_cell', 'price_current_e5']]
print(summary)

All brands:
['ARAL', 'ESSO', 'JET']

Second station brand: ESSO

All current E5 prices:
[1.769, 1.759, 1.749]

Summary:
                                  tk_name brand  time_cell  price_current_e5
0                         Aral Tankstelle  ARAL         29             1.769
1                         Esso Tankstelle  ESSO         29             1.759
2  JET REUTLINGEN KONRAD-ADENAUER-STR. 63   JET         29             1.749


## Prepare Data for the Model

Extract only the columns needed for prediction:

In [None]:
# Model input columns (what the model needs)
model_columns = [
    'brand',                    # Station brand (categorical)
    'time_cell',                # Arrival time (0-47)
    'price_yesterday_e5',       # Yesterday's price at same time
    'price_7days_e5'            # Price 7 days ago at same time
]

# Target column (what we want to predict)
target_column = 'price_current_e5'

# Prepare features (X) and target (y)
X = df[model_columns].copy()
y = df[target_column].copy()

print("Features for model (X):")
print(X)
print("\nTarget to predict (y):")
print(y)

Features for model (X):
  brand  time_cell  price_yesterday_e5  price_7days_e5
0  ARAL         29               1.769           1.789
1  ESSO         29               1.759           1.749
2   JET         29               1.749           1.739

Target to predict (y):
0    1.769
1    1.759
2    1.749
Name: price_current_e5, dtype: float64
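
The `time_cell` feature ranges from 0 to 47, which suggests one cell per half hour of the day. The example data fits that reading: an ETA of 14:35 falls into cell 29. The pipeline's own implementation is not shown in this notebook, so the sketch below only assumes this half-hour definition. `eta_to_time_cell` is a hypothetical name.

```python
from datetime import datetime


def eta_to_time_cell(eta_iso: str) -> int:
    """Map an ISO 8601 timestamp to a half-hour slot index (0-47), assuming 30-minute cells."""
    eta = datetime.fromisoformat(eta_iso)
    return eta.hour * 2 + eta.minute // 30


# ETAs from the example stations in this notebook
for eta in ("2025-11-20T14:35:49.644499", "2025-11-20T14:50:46.836178"):
    print(eta, "->", eta_to_time_cell(eta))
```

Under this assumption both example ETAs map to the same cell, which matches the `time_cell` value of 29 for all three stations.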


## Summary

### Production Workflow:

```python
# Step 1: User enters addresses in Streamlit
start_locality = "Tübingen"
end_locality = "Reutlingen"

# Step 2: ONE function call gets everything
model_input = get_fuel_prices_for_route(
    start_address="",
    start_locality=start_locality,
    end_address="",
    end_locality=end_locality,
    use_realtime=True
)

# Step 3: Convert to DataFrame
df = pd.DataFrame(model_input)

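# Step 4: Model uses the output for its predictions
# (model_columns as defined in the section above; model is the trained model, not defined in this notebook)
X = df[model_columns]
predictions = model.predict(X)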
```