# Where should I ride today?
## A geographic analysis of recreational cycling in Slovenia
---
Acquiring and initially cleaning the raw segment data from Strava

The data comes from the Strava app, through the Strava API. A segment request returns the 10 "top" segments inside the area defined by its south-westernmost and north-easternmost coordinates. A segment counts only if it is fully contained in the area (both its start and its end). We therefore make two passes over Slovenia: 1. smaller areas with 33% overlap between neighbouring areas, 2. larger areas with 25% overlap between neighbouring areas.

Extreme points of Slovenia (latitude, longitude):
- W: 46.29733474309481,13.377298277567231
- S: 45.42310226559324,15.176809675876207
- E: 46.47550563728631,16.595553138466073
- N: 46.8749042171303,16.234607006235194
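
The grid cells below work from a bounding box derived from these four points. A minimal sketch of that derivation, assuming each point is a (latitude, longitude) pair; the names `EXTREME_POINTS` and `bounding_box` are illustrative and not part of the original notebook:

```python
# Extreme points of Slovenia as (latitude, longitude), copied from the list above.
EXTREME_POINTS = {
    "W": (46.29733474309481, 13.377298277567231),
    "S": (45.42310226559324, 15.176809675876207),
    "E": (46.47550563728631, 16.595553138466073),
    "N": (46.8749042171303, 16.234607006235194),
}


def bounding_box(points):
    """Return (lat_min, lat_max, lon_min, lon_max) covering all points."""
    lats = [lat for lat, _ in points.values()]
    lons = [lon for _, lon in points.values()]
    return min(lats), max(lats), min(lons), max(lons)


lat_min, lat_max, lon_min, lon_max = bounding_box(EXTREME_POINTS)
print(lat_min, lat_max, lon_min, lon_max)
```

The minimum latitude comes from the southern point, the maximum latitude from the northern one, and the longitudes from the western and eastern points. The notebook uses these values shortened to four decimals.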

In [99]:
import numpy as np

def generate_grid(lat_min, lat_max, lon_min, lon_max, 
                  step, overlap):
    """
    Generates bounding boxes over a region with optional overlap.
    Returns a list of tuples: (sw_lat, sw_lon, ne_lat, ne_lon)
    """
    boxes = []

    lat_step = step - overlap
    lon_step = step - overlap

    lat_vals = np.arange(lat_min, lat_max, lat_step)
    lon_vals = np.arange(lon_min, lon_max, lon_step)

    for lat in lat_vals:
        for lon in lon_vals:
            sw_lat = lat
            sw_lon = lon
            ne_lat = min(lat + step, lat_max)
            ne_lon = min(lon + step, lon_max)
            boxes.append((sw_lat, sw_lon, ne_lat, ne_lon))

    return boxes


Smaller boxes:

In [100]:
# Extreme points
lat_min = 45.4231
lat_max = 46.8749
lon_min = 13.3773
lon_max = 16.5955

# Generate grid
grid_small = generate_grid(lat_min, lat_max, lon_min, lon_max, step=0.15, overlap=0.05)

# Print first few boxes
# for box in grid_small:
#     a, b, c, d = box
#     print(f"SW: ({a}, {b}), NE: ({c}, {d})")

print(len(grid_small), "boxes generated.")

495 boxes generated.


Bigger boxes:

In [101]:
# Generate grid
grid_big = generate_grid(lat_min, lat_max, lon_min, lon_max, step=0.6, overlap=0.15)

# Print first few boxes
# for box in grid_big:
#     a, b, c, d = box
#     print(f"SW: ({a}, {b}), NE: ({c}, {d})")

print(len(grid_big), "boxes generated.")

32 boxes generated.
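
As a sanity check on the two box counts, the number of boxes can be worked out by hand: south-west corners are spaced `step - overlap` apart, so each axis needs the range divided by that stride, rounded up. A small sketch, with the helper names `expected_box_count` and `overlap_fraction` introduced here for illustration only:

```python
import math


def expected_box_count(lat_min, lat_max, lon_min, lon_max, step, overlap):
    """Boxes needed when south-west corners are spaced (step - overlap) apart."""
    stride = step - overlap
    n_lat = math.ceil((lat_max - lat_min) / stride)
    n_lon = math.ceil((lon_max - lon_min) / stride)
    return n_lat * n_lon


def overlap_fraction(step, overlap):
    """Share of a box's side that it shares with its neighbour."""
    return overlap / step


bounds = (45.4231, 46.8749, 13.3773, 16.5955)
print(expected_box_count(*bounds, step=0.15, overlap=0.05), overlap_fraction(0.15, 0.05))
print(expected_box_count(*bounds, step=0.6, overlap=0.15), overlap_fraction(0.6, 0.15))
```

This gives 15 x 33 = 495 boxes for the small grid and 4 x 8 = 32 for the big one, matching the counts printed above. The overlap fractions are one third and one quarter. Boxes on the northern and eastern edges are clipped by the `min(...)` calls in `generate_grid`, so they are narrower than `step`.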


In [1]:
import requests
import json
import time

In [103]:
def get_segments_for_area(sw_lat, sw_lon, ne_lat, ne_lon):
    url = "https://www.strava.com/api/v3/segments/explore"
    
    # Define parameters for the API request
    params = {
        "bounds": f"{sw_lat},{sw_lon},{ne_lat},{ne_lon}",
        'activity_type': 'riding'
    }
    headers = {
        'accept': 'application/json',
        'authorization': '****',  # Add your access token here
    }

    try:
        # Make the API request
        response = requests.get(url, headers=headers, params=params)
        
        # Check if the request was successful
        if response.status_code == 200:
            return response.json()  # Return the segments data in JSON format
        else:
            print(f"Error: {response.status_code} for bounds {sw_lat}, {sw_lon}, {ne_lat}, {ne_lon}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Request failed for bounds {sw_lat}, {sw_lon}, {ne_lat}, {ne_lon}: {e}")
        return None

# Function to save segments data to a file
def save_segments_to_file(data, filename):
    with open(filename, 'a') as f:
        json.dump(data, f)
        f.write("\n")  # To separate each entry with a newline

# Function to loop over grid cells and collect data
def fetch_and_save_segments(grid):

    grid_cells = grid
    file_name = f"{'big' if grid == grid_big else 'small'}box_segments_data.txt"
    
    # Loop over all grid cells and fetch segments
    for i, (sw_lat, sw_lon, ne_lat, ne_lon) in enumerate(grid_cells):
        print(f"Fetching segments for grid cell {i+1}/{len(grid_cells)}: SW({sw_lat}, {sw_lon}) - NE({ne_lat}, {ne_lon})")
        
        # Fetch segments for this grid cell
        segments_data = get_segments_for_area(sw_lat, sw_lon, ne_lat, ne_lon)
        
        if segments_data:
            # Save the fetched data to a file
            save_segments_to_file(segments_data, filename=file_name)
        
        # To avoid hitting the API rate limits, add a delay
        time.sleep(10)

# Fetch and save segments for the entire region, one grid pass at a time (uncomment to run)
# fetch_and_save_segments(grid_small)
# fetch_and_save_segments(grid_big)
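
The access token is masked in the cell above. One way to supply it without hard-coding is sketched here. Strava's API expects an OAuth access token in an `Authorization: Bearer ...` header; the environment variable name `STRAVA_ACCESS_TOKEN` and the helper `build_headers` are assumptions made for this example, not names from the notebook.

```python
import os


def build_headers(token=None):
    """Build request headers for the Strava API from an access token."""
    if token is None:
        # Hypothetical variable name; use whatever your own setup provides.
        token = os.environ.get("STRAVA_ACCESS_TOKEN", "")
    if not token:
        raise ValueError("No Strava access token provided.")
    return {
        "Accept": "application/json",
        "Authorization": f"Bearer {token}",
    }


print(build_headers("example-token")["Authorization"])
```

The resulting dictionary could be passed as `headers=` to `requests.get` in `get_segments_for_area`.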


Merging the two files and removing segments that appear more than once gives an intermediate file with all unique segments in the covered area (the rectangle spanned by Slovenia's extreme points). It is ready for further cleaning: segments entirely outside Slovenia, segments that are too short, segments that are too long, ...

In [104]:
def load_segments(file_path):
    segments = []
    with open(file_path, 'r') as f:
        for line in f:
            data = json.loads(line)
            segments.extend(data.get('segments', []))
    return segments

small_segments = load_segments('data/raw/smallbox_segments_data.txt')
big_segments = load_segments('data/raw/bigbox_segments_data.txt')

all_segments = small_segments + big_segments
unique_segments = {seg['id']: seg for seg in all_segments}  # Keep latest if duplicated
filtered_segments = list(unique_segments.values())

# with open('data/intermediate/filtered_segments.json', 'w') as f:
#     json.dump(filtered_segments, f, indent=2)

print(f"Loaded {len(all_segments)} segments -> {len(filtered_segments)} unique segments saved.")


Loaded 5128 segments -> 3190 unique segments saved.
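
The de-duplication relies on a dictionary keyed by segment id: a later record with the same id overwrites the earlier one, while the id keeps its original position. A toy illustration with made-up records (the helper name `deduplicate_by_id` is introduced here, not taken from the notebook):

```python
def deduplicate_by_id(segments):
    """Keep one record per segment id; a later record overwrites an earlier one."""
    return list({seg["id"]: seg for seg in segments}.values())


# Toy records standing in for the two passes (not real Strava data).
small_pass = [{"id": 1, "name": "A (small pass)"}, {"id": 2, "name": "B"}]
big_pass = [{"id": 1, "name": "A (big pass)"}, {"id": 3, "name": "C"}]

unique = deduplicate_by_id(small_pass + big_pass)
print(len(unique))
print(unique[0]["name"])
```

Segment 1 appears in both passes, so three unique records remain, and the surviving copy of segment 1 is the one from the big pass.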


Additional cleaning of segments:
1. Too short and too long:
   - < 100 m (anyone can create a segment, so there are surely many that are ridden often but are unimportant); note that the cell below counts with a 300 m threshold
   - > 25 km (as far as I know, Slovenia has no interesting routes longer than this; these are probably just marathon or race courses and the like)

In [105]:
short_segments = [seg for seg in filtered_segments if seg['distance'] < 300]


print(len(short_segments), "short segments found.")


long_segments = [seg for seg in filtered_segments if seg['distance'] > 25000]

print()
print(len(long_segments), "long segments found.")
for seg in long_segments:
    print(seg['name'], seg['distance'])


# this turned out not to be worthwhile, hah

41 short segments found.

8 long segments found.
Kolpski krog 68899.5
Koč.Reka,Gotenica (od table do table) 40733.8
Hinjski krog 32900.6
Kalce - Col - Godovič - Kalce 46540.0
ŠP torkova obvoz Loka-Dobrina 54104.8
Torkova runda (kratka) 53254.8
weitensfeld/prekova/flatnitz/abzw.deutschgr. 63850.6
Dichterstein_Runde 25445.4
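
The cell above only counts the outliers, and the author judged the filter not worthwhile. For completeness, this is a sketch of what applying it could look like. The thresholds default to the 100 m and 25 km named in the text; the function name `filter_by_length` and the toy records are illustrative.

```python
def filter_by_length(segments, min_m=100, max_m=25_000):
    """Keep segments whose distance in metres lies within [min_m, max_m]."""
    return [seg for seg in segments if min_m <= seg["distance"] <= max_m]


# Toy records; only the 'distance' field (metres) matters here.
toy = [
    {"name": "too short", "distance": 80.0},
    {"name": "ok", "distance": 4200.0},
    {"name": "too long", "distance": 68899.5},
]
print([seg["name"] for seg in filter_by_length(toy)])
```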


I infer that the segments/explore call picks its top 10 not by how many efforts the segments have but by how many users have starred them (which by itself filters out random short segments that are otherwise ridden a lot). Unfortunately, the exact criterion by which it returns segments is not documented anywhere.

In [None]:
import folium

with open("data/intermediate/filtered_segments.json") as f:
    segments = json.load(f)

map_center = [46.1512, 14.9955]
m = folium.Map(location=map_center, zoom_start=8)

for seg in segments:
    start_lat = seg['start_latlng'][0]
    start_lon = seg['start_latlng'][1]
    end_lat = seg['end_latlng'][0]
    end_lon = seg['end_latlng'][1]

    # Compute the midpoint of the segment
    mid_lat = (start_lat + end_lat) / 2
    mid_lon = (start_lon + end_lon) / 2

    popup = f"{seg['name']}<br>Length: {seg['distance']:.1f} m<br>Elev: {seg['elev_difference']:.1f} m"
    folium.CircleMarker(
        location=[mid_lat, mid_lon],
        radius=3,
        color='blue',
        fill=True,
        popup=popup
    ).add_to(m)

m.save("segments_map.html")


Removing segments outside the map of Slovenia (the check below keeps only segments whose start and end both lie inside the border)

In [None]:
from shapely.geometry import shape

with open("countries.geojson") as f:
    data = json.load(f)

# Find Slovenia's geometry
slovenia_geom = None
for feature in data["features"]:
    if feature["properties"]["name"] == "Slovenia":
        slovenia_geom = shape(feature["geometry"])
        break

if slovenia_geom is None:
    raise Exception("Slovenia not found in GeoJSON.")


In [None]:
from shapely.geometry import Point

def is_segment_inside_slovenia(segment, slovenia_geom):
    start = Point(segment['start_latlng'][1], segment['start_latlng'][0])
    end = Point(segment['end_latlng'][1], segment['end_latlng'][0])
    
    return slovenia_geom.contains(start) and slovenia_geom.contains(end)

In [None]:
with open("data/intermediate/filtered_segments.json") as f:
    segments = json.load(f)

filtered_segments = []
for seg in segments:
    if is_segment_inside_slovenia(seg, slovenia_geom):
        filtered_segments.append(seg)

with open('data/intermediate/filtered_segments_inside_slovenia.json', 'w') as f:
    json.dump(filtered_segments, f, indent=2)


In [106]:
print(slovenia_geom.bounds)
print(slovenia_geom.area)
print(slovenia_geom.contains(Point(14.9955, 46.1512)))  # Center of Slovenia

(13.365261, 45.423637, 16.515302, 46.863962)
2.365766635100998
True
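
The bounds printed above are in shapely's `(min_x, min_y, max_x, max_y)` order, that is longitude first, while Strava's `start_latlng` and `end_latlng` are `[lat, lon]`. The sketch below is a plain bounding-box test, not the polygon test used in the notebook. It shows the axis order and could serve as a cheap pre-check, since a point outside the box cannot be inside the polygon. `SLOVENIA_BOUNDS` and `latlng_in_bounds` are names made up for this example.

```python
# Bounds as printed above: (min_lon, min_lat, max_lon, max_lat).
SLOVENIA_BOUNDS = (13.365261, 45.423637, 16.515302, 46.863962)


def latlng_in_bounds(latlng, bounds=SLOVENIA_BOUNDS):
    """Check a Strava-style [lat, lon] pair against (min_lon, min_lat, max_lon, max_lat)."""
    lat, lon = latlng
    min_lon, min_lat, max_lon, max_lat = bounds
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


print(latlng_in_bounds([46.1512, 14.9955]))  # map centre used in the notebook
print(latlng_in_bounds([14.9955, 46.1512]))  # same point with the axes swapped
```

The first call returns `True` and the second `False`, which is why `is_segment_inside_slovenia` builds each `Point` as `(lon, lat)`.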


In [107]:
print(len(filtered_segments), "segments inside Slovenia found.")

3190 segments inside Slovenia found.


In [None]:
map_center = [46.1512, 14.9955]
m = folium.Map(location=map_center, zoom_start=8)

for seg in filtered_segments:
    start_lat = seg['start_latlng'][0]
    start_lon = seg['start_latlng'][1]
    end_lat = seg['end_latlng'][0]
    end_lon = seg['end_latlng'][1]

    # Compute the midpoint of the segment
    mid_lat = (start_lat + end_lat) / 2
    mid_lon = (start_lon + end_lon) / 2

    popup = f"{seg['name']}<br>Length: {seg['distance']:.1f} m<br>Elev: {seg['elev_difference']:.1f} m"
    folium.CircleMarker(
        location=[mid_lat, mid_lon],
        radius=3,
        color='blue',
        fill=True,
        popup=popup
    ).add_to(m)

m.save("segments_map2.html")


In [None]:
with open("data/intermediate/filtered_segments.json") as f:
    segments = json.load(f)

with open("data/intermediate/filtered_segments_inside_slovenia.json") as f:
    filtered_segments = json.load(f)

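# Compare by segment id instead of scanning the whole list of dicts for each segment
inside_ids = {seg['id'] for seg in filtered_segments}
segments_outside_slovenia = [seg for seg in segments if seg['id'] not in inside_ids]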
print(len(segments_outside_slovenia), "segments outside Slovenia found.")

map_center = [46.1512, 14.9955]
m = folium.Map(location=map_center, zoom_start=8)

for seg in segments_outside_slovenia:
    start_lat = seg['start_latlng'][0]
    start_lon = seg['start_latlng'][1]
    end_lat = seg['end_latlng'][0]
    end_lon = seg['end_latlng'][1]

    # Compute the midpoint of the segment
    mid_lat = (start_lat + end_lat) / 2
    mid_lon = (start_lon + end_lon) / 2

    popup = f"{seg['name']}<br>Length: {seg['distance']:.1f} m<br>Elev: {seg['elev_difference']:.1f} m"
    folium.CircleMarker(
        location=[mid_lat, mid_lon],
        radius=3,
        color='blue',
        fill=True,
        popup=popup
    ).add_to(m)

m.save("segments_map_outside_slovenia.html")
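
The three map cells repeat the same midpoint and popup logic. One possible refactoring is sketched below, kept free of folium so it can be run on its own. The helper name `marker_spec` and the toy segment are illustrative and not part of the notebook.

```python
def marker_spec(seg):
    """Return (location, popup_html) for one segment's midpoint marker."""
    start_lat, start_lon = seg["start_latlng"]
    end_lat, end_lon = seg["end_latlng"]
    # Midpoint of the two endpoints, not of the route itself.
    location = [(start_lat + end_lat) / 2, (start_lon + end_lon) / 2]
    popup = (
        f"{seg['name']}<br>Length: {seg['distance']:.1f} m"
        f"<br>Elev: {seg['elev_difference']:.1f} m"
    )
    return location, popup


# Toy segment with the fields the map cells read (values are made up).
toy = {
    "name": "Example climb",
    "distance": 1234.56,
    "elev_difference": 78.9,
    "start_latlng": [46.0, 14.0],
    "end_latlng": [46.2, 14.4],
}
location, popup = marker_spec(toy)
print(location)
print(popup)
```

Inside each map loop, the result would be passed on as `folium.CircleMarker(location=location, radius=3, color='blue', fill=True, popup=popup).add_to(m)`. For loop-shaped segments the start and end may nearly coincide, so the marker would sit near the start rather than in the middle of the route.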

In [108]:
HEADERS2 = {
    'Authorization': '****',  # Add your access token here
    'Accept': 'application/json'
}

def get_segment_details(segment_id):
    url = f"https://www.strava.com/api/v3/segments/{segment_id}"
    try:
        response = requests.get(url, headers=HEADERS2)
        if response.status_code == 200:
            return response.json()
        else:
            print(f"Error fetching segment {segment_id}: {response.status_code}")
            return None
    except requests.exceptions.RequestException as e:
        print(f"Request failed for segment {segment_id}: {e}")
        return None

# detailed_segments = []

# with open("data/intermediate/filtered_segments_inside_slovenia.json") as f:
#     base_segments = json.load(f)

# for i, segment in enumerate(base_segments):
#     seg_id = segment["id"]
#     print(f"Fetching {i+1}/{len(base_segments)}: Segment ID {seg_id}")
#     details = get_segment_details(seg_id)
#     if details:
#         detailed_segments.append(details)
#     time.sleep(10)  # reduces the chance of exceeding the rate limit

# # Save the results to a new file (same path the next cell reads)
# with open("data/intermediate/detailed_segments.json", "w") as out_file:
#     json.dump(detailed_segments, out_file, indent=2)

In [109]:
with open("data/intermediate/detailed_segments.json") as f:
    segments = json.load(f)

with open("data/intermediate/filtered_segments_inside_slovenia.json") as f:
    filtered_segments = json.load(f)

detailed_ids = {d_seg['id'] for d_seg in segments}

failed_segments = [seg for seg in filtered_segments if seg["id"] not in detailed_ids]

print(len(detailed_ids), "detailed segments found.")
print(len(failed_segments), "segments not found in detailed segments.")

1526 detailed segments found.
0 segments not found in detailed segments.


1526 segments in Slovenia with additional data such as the number of efforts, the fastest ride, the maximum grade, ... ready for further processing:

- Removing unneeded attributes such as `resource_state`, `private`, `hazardous`, `starred`, `created_at` and `updated_at`, `star_count`, `athlete_segment_stats`, `local_legend`, `city`, `state`.
- Adding our own attributes (area/region, our own categorisation, flat/hilly, long/short/medium)

In [110]:
print(segments[0].keys())

dict_keys(['id', 'resource_state', 'name', 'activity_type', 'distance', 'average_grade', 'maximum_grade', 'elevation_high', 'elevation_low', 'start_latlng', 'end_latlng', 'elevation_profile', 'elevation_profiles', 'climb_category', 'city', 'state', 'country', 'private', 'hazardous', 'starred', 'created_at', 'updated_at', 'total_elevation_gain', 'map', 'effort_count', 'athlete_count', 'star_count', 'athlete_segment_stats', 'xoms', 'local_legend'])
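
The planned cleanup could be sketched as follows. The keys to drop are the ones listed above, spelled as in the `dict_keys` output. The labels `terrain` and `length_class` and every threshold are placeholders chosen for this example; the notebook does not define them.

```python
# Keys to drop, named as in the dict_keys output above.
DROP_KEYS = {
    "resource_state", "private", "hazardous", "starred", "created_at",
    "updated_at", "star_count", "athlete_segment_stats", "local_legend",
    "city", "state",
}


def clean_segment(seg, flat_max_grade=2.0, short_max_m=1_000, long_min_m=5_000):
    """Drop unneeded keys and add illustrative terrain/length labels."""
    cleaned = {k: v for k, v in seg.items() if k not in DROP_KEYS}
    # Thresholds are placeholders, not values from the notebook.
    cleaned["terrain"] = "flat" if abs(seg["average_grade"]) <= flat_max_grade else "hilly"
    if seg["distance"] < short_max_m:
        cleaned["length_class"] = "short"
    elif seg["distance"] >= long_min_m:
        cleaned["length_class"] = "long"
    else:
        cleaned["length_class"] = "medium"
    return cleaned


# Toy record with a subset of the real keys (values are made up).
toy = {"id": 1, "name": "Example", "distance": 3200.0, "average_grade": 6.1,
       "city": "Ljubljana", "starred": False}
print(clean_segment(toy))
```

For the toy record, `city` and `starred` are removed and the segment is labelled `hilly` and `medium`.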
