# MVG Rad Bike Ride Data Analysis – Munich

This analysis explores a dataset of public bike rental trips in Munich using Kotlin, DataFrame, and Kandy/Kandy-Geo.

It focuses on revealing patterns in:
- Temporal distribution of rides (by hour, month, season, weekday)
- Most used stations and common routes
- Spatial distribution of rides across Munich districts
- Ride durations and geographical start/end points

The dataset contains geolocation data, timestamps, and station names. By combining this with a GeoJSON file of Munich’s Stadtbezirke (districts), we’re able to assign rides to administrative areas and derive deeper insights.

The goal of the analysis is to uncover **meaningful patterns in urban mobility**, understand how users interact with the bike-sharing system, and identify spatial or temporal anomalies.

Each step in the analysis is documented separately, including data cleaning, transformation, visualization, and geospatial reasoning.

---


In [19]:

%useLatestDescriptors
%use dataframe, kandy


# Step 1: Read and Clean Data

This step reads the CSV file using a German locale, then converts time and coordinate fields to usable formats.
It also computes ride durations and a route label combining start and end station names.
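
As a rough illustration of why the locale matters, the sketch below parses a coordinate written with a decimal comma. It assumes the raw file uses German number formatting; `parseGermanDecimal` is a hypothetical helper and not part of the notebook.

```kotlin
import java.text.NumberFormat
import java.util.Locale

// Hypothetical helper: parse a number that uses a decimal comma (German convention).
fun parseGermanDecimal(text: String): Double =
    NumberFormat.getInstance(Locale.GERMAN).parse(text.trim()).toDouble()

// Example value, assuming the raw CSV stores latitudes as "48,13795".
val exampleLat = parseGermanDecimal("48,13795")
```

With an English locale the same string would stop parsing at the comma, which is why the reader is given `Locale.GERMAN` explicitly.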


In [39]:
import java.util.Locale

val df = DataFrame.readCsv(
    fileOrUrl = "data/mvg-rad-data.csv",
    delimiter = ';',
    parserOptions = ParserOptions(locale = Locale.GERMAN)
)
df.describe()

name,type,count,unique,nulls,top,freq,mean,std,min,p25,median,p75,max
Row,Int,710106,710106,0,1,1,355053.5,204990.089464,1,177526.916667,355053.500000,532580.083333,710106
STARTTIME,kotlinx.datetime.LocalDateTime,710106,301480,0,2023-05-19T18:24,25,,,2023-01-01T00:26,2023-05-06T16:43,2023-07-04T19:35,2023-09-07T20:35,2023-12-31T23:54
ENDTIME,kotlinx.datetime.LocalDateTime,710106,300963,0,2023-03-09T11:26,22,,,2023-01-01T00:42,2023-05-06T17:17,2023-07-04T19:54,2023-09-07T21:04,2024-01-01T16:00
STARTLAT,Double,710106,14295,0,0.000000,7885,47.61,5.045196,0.000000,48.128250,48.143190,48.159040,53.094660
STARTLON,Double,710106,20472,0,11.558320,3653,11.772959,3.141217,-71.178000,11.549040,11.567820,11.584500,141.353220
ENDLAT,Double,710106,14941,0,0.000000,8103,47.589505,5.17064,-55.973800,48.128300,48.143190,48.159020,53.094660
ENDLON,Double,710106,21391,0,0.000000,3843,11.720168,3.034583,-99.259350,11.549040,11.567710,11.584430,141.353220
RENTAL_IS_STATION,Int?,710106,4,44,0,549834,0.255351,0.739268,0,0.000000,0.000000,0.000000,12
RENTAL_STATION_NAME,String?,710106,332,551795,Sandstraße,3570,,,AGROB Nord Ismaning,Giesing,Laimer Platz,Romanplatz,astopark
RETURN_IS_STATION,Int?,710106,4,60,0,583801,0.215537,0.788633,0,0.000000,0.000000,0.000000,12


In [40]:
import org.jetbrains.kotlinx.dataframe.api.*
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.Duration

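// STARTTIME and ENDTIME were read as kotlinx.datetime.LocalDateTime;
// re-parse their string form into java.time.LocalDateTime values
val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm")

val cleaned = df
    .convert("STARTTIME").with { LocalDateTime.parse(it.toString(), formatter) }
    .convert("ENDTIME").with { LocalDateTime.parse(it.toString(), formatter) }
    // Normalize any decimal commas so that every coordinate ends up as a Double
    .convert("STARTLAT", "STARTLON", "ENDLAT", "ENDLON")
    .with { it.toString().replace(',', '.').toDouble() }

// Ride duration in whole minutes, stored as a Double
val withDuration = cleaned.add("DURATION_MIN") {
    val start = it["STARTTIME"] as LocalDateTime
    val end = it["ENDTIME"] as LocalDateTime
    Duration.between(start, end).toMinutes().toDouble()
}

// Route label "start → end"; a missing or blank station name becomes "Non-station"
val rides = withDuration.add("ROUTE") { row ->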
    val start = row["RENTAL_STATION_NAME"]?.toString()?.ifBlank { "Non-station" } ?: "Non-station"
    val end = row["RETURN_STATION_NAME"]?.toString()?.ifBlank { "Non-station" } ?: "Non-station"
    "$start → $end"
}

rides.head()

Row,STARTTIME,ENDTIME,STARTLAT,STARTLON,ENDLAT,ENDLON,RENTAL_IS_STATION,RENTAL_STATION_NAME,RETURN_IS_STATION,RETURN_STATION_NAME,DURATION_MIN,ROUTE
1,2023-01-01T00:26,2023-01-01T00:51,48.13795,11.54569,48.16123,11.55782,0,,1,Barbarastr,25.0,Non-station → Barbarastr
2,2023-01-01T00:30,2023-01-01T00:42,48.12903,11.54431,48.14797,11.53445,0,,0,,12.0,Non-station → Non-station
3,2023-01-01T00:32,2023-01-01T00:45,48.16841,11.55566,48.16467,11.57649,0,,0,,13.0,Non-station → Non-station
4,2023-01-01T00:34,2023-01-01T00:46,48.16843,11.55567,48.16464,11.57648,0,,0,,12.0,Non-station → Non-station
5,2023-01-01T00:35,2023-01-01T00:51,48.17104,11.54878,48.16243,11.53007,0,,0,,16.0,Non-station → Non-station


# Step 2: Histogram of Ride Duration

A histogram is plotted for rides under 60 minutes to understand the distribution of trip durations.
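
To make the binning concrete, here is a minimal sketch of equal-width binning in plain Kotlin. `binCounts` is a hypothetical helper, and Kandy's own histogram statistic may place bin edges differently, so treat this as an illustration of the idea rather than a reproduction of the plot.

```kotlin
// Count how many values fall into each of `binCount` equal-width bins over [min, max).
fun binCounts(values: List<Double>, min: Double, max: Double, binCount: Int): IntArray {
    val counts = IntArray(binCount)
    val width = (max - min) / binCount
    for (v in values) {
        if (v < min || v >= max) continue  // skip values outside the plotted range
        // Guard against floating-point rounding pushing the index past the last bin
        val index = ((v - min) / width).toInt().coerceAtMost(binCount - 1)
        counts[index] += 1
    }
    return counts
}

// Durations of the first five rides shown above, plus one ride longer than an hour.
val sampleDurations = listOf(25.0, 12.0, 13.0, 12.0, 16.0, 75.0)
val sampleBins = binCounts(sampleDurations, 0.0, 60.0, 30)
```

With 30 bins over 0 to 60 minutes, each bin spans two minutes, so the three rides of 12 to 13 minutes land in the same bin. The sketch also skips negative durations, which a plain `< 60` filter would keep.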


In [44]:
import org.jetbrains.letsPlot.Stat

val capped = rides.filter { (it["DURATION_MIN"] as Number).toDouble() < 60 }

capped.plot {
    histogram("DURATION_MIN", binsOption = BinsOption.byNumber(30)) {
        fillColor(Stat.count) {
            scale = continuous(Color.GREEN..Color.RED)
        }
        borderLine.color = Color.BLACK
    }
    layout {
        title = "Duration of bike rides"
        x {
            scale = continuous(limits = 0.0..60.0)
        }
    }
}

# Step 3: Most Frequent Rental Stations

Top 10 stations where users most frequently rented bikes are shown in a vertical bar chart.
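
The same group, count, sort, and take pattern can be sketched with the Kotlin standard library alone. `topStations` is a hypothetical helper and the input list is a toy sample, not the real data.

```kotlin
// Count non-null station names and return the n most frequent ones.
fun topStations(names: List<String?>, n: Int): List<Pair<String, Int>> =
    names.filterNotNull()              // rides without a rental station are ignored
        .groupingBy { it }
        .eachCount()
        .toList()
        .sortedByDescending { it.second }
        .take(n)

// Toy sample: null stands for a ride that did not start at a station.
val sampleStations = listOf("Sandstraße", null, "Giesing", "Sandstraße", null, "Romanplatz", "Giesing", "Sandstraße")
val sampleTop = topStations(sampleStations, 2)
```

Filtering nulls first matters here because most rides in the dataset have no rental station name.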


In [23]:
val stationUsage = rides
    .filter { it["RENTAL_STATION_NAME"] != null }
    .groupBy("RENTAL_STATION_NAME")
    .count()
    .sortByDesc("count")
    .take(10)

stationUsage.plot {
    layout.title = "Most Frequent Rental Stations"
    bars {
        x("RENTAL_STATION_NAME") { axis.name = "Station" }
        y("count") { axis.name = "Number of Rentals" }
        fillColor = Color.BLUE
    }
}


# Step 4: Top 10 Most Common Routes

The 10 most frequent start→end routes are plotted as bars. The first cell drops only rides with no station at either end; the second keeps only routes that both start and end at a station.


In [24]:
val routes = rides.groupBy("ROUTE")
    .count()
    .filter { it["ROUTE"] != "Non-station → Non-station" }
    .sortByDesc("count")
    .take(10)

routes.plot {
    bars {
        x("ROUTE") { axis.name = "Route" }
        y("count") { axis.name = "Count" }
        fillColor = Color.ORANGE
    }
    layout.title = "Top 10 Most Common Routes"
}

In [25]:
val routes = rides.groupBy("ROUTE")
    .count()
    .filter { it["ROUTE"]?.toString()?.let { r ->
        !r.startsWith("Non-station") && !r.endsWith("Non-station")
    } == true }
    .sortByDesc("count")
    .take(10)

routes.plot {
    bars {
        x("ROUTE") { axis.name = "Route" }
        y("count") { axis.name = "Count" }
        fillColor = Color.ORANGE
    }
    layout.title = "Top 10 Most Common Routes"
}

# Step 5: Hourly Usage

A line plot shows ride demand by hour of the day to reveal peak usage times.
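
One detail worth keeping in mind: a group-and-count only produces rows for hours that actually occur. The sketch below, using a hypothetical `ridesPerHour` helper, counts into a fixed 24-slot array so that empty hours show up as zero.

```kotlin
import java.time.LocalDateTime

// Count rides per start hour; hours without rides stay at zero.
fun ridesPerHour(starts: List<LocalDateTime>): IntArray {
    val counts = IntArray(24)
    for (start in starts) counts[start.hour] += 1
    return counts
}

// Toy sample built from start times that appear in the outputs above.
val sampleStarts = listOf(
    LocalDateTime.of(2023, 1, 1, 0, 26),
    LocalDateTime.of(2023, 1, 1, 0, 30),
    LocalDateTime.of(2023, 5, 19, 18, 24),
)
val sampleHourly = ridesPerHour(sampleStarts)
```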


In [26]:
import java.time.LocalDateTime

val demandPerHour = rides
    .add("HOUR") { (it["STARTTIME"] as LocalDateTime).hour }
    .groupBy("HOUR")
    .count()
    .sortBy("HOUR")

demandPerHour.plot {
    line {
        x("HOUR") { axis.name = "Hour of Day" }
        y("count") { axis.name = "Number of Rides" }
        color = Color.LIGHT_GREEN
    }
    points {
        x("HOUR")
        y("count")
    }
    layout.title = "Hourly Bike Usage"
}

# Step 6: Geolocation of Start and End Points

A random sample of rides is filtered to coordinates inside a bounding box around Munich, then drawn as a scatter plot with start points in red and end points in blue.
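
The validity check can be factored into a small reusable type. This is a sketch only: `BoundingBox` is a hypothetical class, and the limits are the ones used in the cell below.

```kotlin
// Axis-aligned box in longitude/latitude degrees.
data class BoundingBox(
    val lonMin: Double, val lonMax: Double,
    val latMin: Double, val latMax: Double,
) {
    // True when the point lies inside the box (edges included).
    fun contains(lon: Double, lat: Double): Boolean =
        lon in lonMin..lonMax && lat in latMin..latMax
}

// Limits taken from the filter used in this step.
val munichBox = BoundingBox(lonMin = 11.3, lonMax = 11.8, latMin = 48.0, latMax = 48.3)
```

Because (0.0, 0.0) lies far outside this box, the box test alone already rejects the zero placeholders seen in the `describe()` output.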


In [49]:
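val sampleSize = 1000
// Draw a random sample first, then drop rides with implausible coordinates,
// so the result can hold fewer than sampleSize rides
val filteredRides = rides.shuffle().head(sampleSize).filter { row ->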
    val startLon = row["STARTLON"] as Double
    val startLat = row["STARTLAT"] as Double
    val endLon = row["ENDLON"] as Double
    val endLat = row["ENDLAT"] as Double

    val lonMin = 11.3
    val lonMax = 11.8
    val latMin = 48.0
    val latMax = 48.3

    val startValid = startLon != 0.0 && startLat != 0.0 &&
            startLon in lonMin..lonMax && startLat in latMin..latMax
    val endValid = endLon != 0.0 && endLat != 0.0 &&
            endLon in lonMin..lonMax && endLat in latMin..latMax

    startValid && endValid
}



filteredRides.plot {
    points {
        x("STARTLON") { axis.name = "Longitude" }
        y("STARTLAT") { axis.name = "Latitude" }
        color = Color.RED
        alpha = 0.5
    }
    points {
        x("ENDLON")
        y("ENDLAT")
        color = Color.BLUE
        alpha = 0.5
    }
    layout.title = "Geolocation of Start (Red) and End (Blue) Points"
}



# Step 7: Export to GeoJSON

Converts a filtered subset of rides to a GeoJSON format for geospatial analysis or use in map visualizations.


In [28]:
import kotlinx.serialization.Serializable
import kotlinx.serialization.SerialName
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json
import java.io.File
import org.jetbrains.kotlinx.dataframe.DataFrame

@Serializable
@SerialName("FeatureCollection")
data class GeoJsonFeatureCollection(
    val type: String = "FeatureCollection",
    val features: List<GeoJsonFeature>
)

@Serializable
@SerialName("Feature")
data class GeoJsonFeature(
    val type: String = "Feature",
    val geometry: GeoJsonGeometry,
    val properties: Map<String, String>? = null
)

@Serializable
@SerialName("Point")
data class GeoJsonGeometry(
    val type: String = "Point",
    val coordinates: List<Double>
)

fun saveFilteredRidesAsGeoJson(filteredRides: DataFrame<*>, relativeDir: String, filename: String) {
    val features = mutableListOf<GeoJsonFeature>()

    for (row in filteredRides) {
        val startLon = row["STARTLON"] as Double
        val startLat = row["STARTLAT"] as Double
        val endLon = row["ENDLON"] as Double
        val endLat = row["ENDLAT"] as Double

        features.add(
            GeoJsonFeature(
                geometry = GeoJsonGeometry(coordinates = listOf(startLon, startLat)),
                properties = mapOf("point" to "start")
            )
        )
        features.add(
            GeoJsonFeature(
                geometry = GeoJsonGeometry(coordinates = listOf(endLon, endLat)),
                properties = mapOf("point" to "end")
            )
        )
    }

    val featureCollection = GeoJsonFeatureCollection(features = features)
    val jsonString = Json { prettyPrint = true; encodeDefaults = true }.encodeToString(featureCollection)

    val dir = File(relativeDir)
    if (!dir.exists()) dir.mkdirs()  // create folder if it doesn’t exist

    val file = File(dir, filename)
    file.writeText(jsonString)
    println("Saved GeoJSON with ${features.size} points to ${file.absolutePath}")
}

// Save to the relative folder "data" (created if missing)
saveFilteredRidesAsGeoJson(filteredRides, "data", "rides.geojson")

// Show preview of saved file
//println(File("data", "rides.geojson").readText().take(500))


Saved GeoJSON with 1926 points to /Users/enriquelopezmanas/Documents/Machine-Learning/mvg-bike-analysis//data/rides.geojson


# Step 8: Map Rides Over Munich Districts

Overlays bike ride start points onto a polygon map of Munich's administrative districts using Kandy-Geo.


In [29]:
%use kandy-geo
val bikeRides =
    GeoDataFrame.readGeoJson("data/rides.geojson")

In [30]:

%use kandy-geo
// Load GeoJSON from your local file path
val munichArea = GeoDataFrame.readGeoJson("data/munich_geojson.json")

munichArea.df.geometry.type().toString()
// Plot the polygon(s) of Munich city area
munichArea.plot {
    geoMap() {
        fillColor = Color.LIGHT_BLUE
        borderLine {
            color = Color.RED
            width = 1.5
        }
    }

    layout {
        title = "Munich Metropolitan Area"

    }
}

In [31]:

%use kandy-geo
// Load GeoJSON from your local file path
val munichArea = GeoDataFrame.readGeoJson("data/munich_geojson.json")

munichArea.df.geometry.type().toString()
// Plot the polygon(s) of Munich city area
munichArea.plot {
    geoMap() {
        fillColor = Color.LIGHT_BLUE
        borderLine {
            color = Color.RED
            width = 1.5
        }
    }
    withData(bikeRides) {
        geoPoints() {
            size = 1.0
            color = Color.YELLOW
        }
    }
    layout {
        title = "Munich Metropolitan Area"
        size = 700 to 500
    }
}

In [32]:
import java.time.DayOfWeek

// Flag each ride as weekend (Saturday or Sunday) or weekday, then compare the totals
val ridesWithDayType = rides.add("IS_WEEKEND") { row ->
    val day = (row["STARTTIME"] as LocalDateTime).dayOfWeek
    day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY
}

val usageByDayType = ridesWithDayType
    .groupBy("IS_WEEKEND")
    .count()

usageByDayType.plot {
    bars {
        x("IS_WEEKEND") { axis.name = "Is Weekend" }
        y("count") { axis.name = "Number of Rides" }
        fillColor = Color.LIGHT_BLUE
    }
    layout.title = "Weekend vs Weekday Usage"
}

# Step 9: Rides Per District

Uses JTS to determine which district polygon contains each ride's start point, then counts and visualizes rides per district.
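
For intuition about what a containment test does, here is a simplified ray-casting check in plain Kotlin. It is not how JTS implements `contains` (JTS also handles holes, multi-polygons, and boundary cases robustly); `Vertex` and `pointInRing` are hypothetical names used only for this sketch.

```kotlin
data class Vertex(val lon: Double, val lat: Double)

// Ray casting: count how many ring edges a horizontal ray from the point crosses.
// An odd number of crossings means the point is inside the ring.
fun pointInRing(lon: Double, lat: Double, ring: List<Vertex>): Boolean {
    var inside = false
    var j = ring.lastIndex
    for (i in ring.indices) {
        val a = ring[i]
        val b = ring[j]
        // Does the edge a-b straddle the point's latitude?
        if ((a.lat > lat) != (b.lat > lat)) {
            // Longitude at which the edge crosses that latitude
            val lonAtLat = a.lon + (lat - a.lat) * (b.lon - a.lon) / (b.lat - a.lat)
            if (lon < lonAtLat) inside = !inside
        }
        j = i
    }
    return inside
}

// Toy square standing in for a district outline.
val toyDistrict = listOf(
    Vertex(11.5, 48.1), Vertex(11.6, 48.1), Vertex(11.6, 48.2), Vertex(11.5, 48.2),
)
```

Note the argument order: like the `Coordinate(lon, lat)` call in the cell below, the x value is the longitude and the y value is the latitude.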


In [33]:

import org.jetbrains.kotlinx.dataframe.api.*
import org.locationtech.jts.geom.GeometryFactory
import org.locationtech.jts.geom.Coordinate
import org.locationtech.jts.geom.Point
import org.locationtech.jts.geom.Geometry

// Prepare geometry tools
val geometryFactory = GeometryFactory()

// Extract district name and geometry from munichArea
val districts: List<Pair<String, Geometry>> = munichArea.df.rows().map { row ->
    val name = row["name"]?.toString() ?: "Unknown"  // or use the proper column for district name
    val geometry = row["geometry"] as Geometry
    name to geometry
}

// Assign each ride to a district
val ridesWithDistrict = rides.add("district") { row ->
    val lat = row["STARTLAT"]?.toString()?.toDoubleOrNull()
    val lon = row["STARTLON"]?.toString()?.toDoubleOrNull()

    if (lat == null || lon == null) return@add "Unknown"

    val point = geometryFactory.createPoint(Coordinate(lon, lat))
    val district = districts.firstOrNull { (_, geom) -> geom.contains(point) }

    district?.first ?: "Unknown"
}


// Count rides per district
val rideCounts = ridesWithDistrict
    .groupBy("district")
    .aggregate {
        count().into("rides")
    }
    .sortBy("rides")

val cleanedRideCounts = rideCounts
    .add("district_clean") {
        val regex = Regex("""^Stadtbezirk \d+\s*""")
        regex.replace(it["district"].toString(), "")
    }
cleanedRideCounts.plot {
    layout.title = "Number of Rides per Munich District"
    layout.size = 900 to 500
    barsH {
        y("district_clean") {
            axis.name = "District"
        }
        x("rides") {
            axis.name = "Number of Rides"
        }
        alpha = 0.75

    }
}


# Step 10: Monthly Usage Trend

Converts STARTTIME into year-month format and counts rides per month, plotted as a line chart.
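
An alternative to formatting strings is `java.time.YearMonth`, which sorts chronologically and prints as `yyyy-MM`. The sketch below assumes the start times are available as `java.time.LocalDateTime` values; `tripsPerMonth` is a hypothetical helper.

```kotlin
import java.time.LocalDateTime
import java.time.YearMonth

// Count trips per calendar month, ordered chronologically.
fun tripsPerMonth(starts: List<LocalDateTime>): Map<YearMonth, Int> =
    starts.groupingBy { YearMonth.from(it) }
        .eachCount()
        .toSortedMap()

// Toy sample built from start times that appear in the outputs above.
val sampleMonthly = tripsPerMonth(
    listOf(
        LocalDateTime.of(2023, 5, 19, 18, 24),
        LocalDateTime.of(2023, 1, 1, 0, 26),
        LocalDateTime.of(2023, 1, 1, 0, 30),
    )
)
```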


In [34]:
import org.jetbrains.kotlinx.dataframe.api.*
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// STARTTIME is already a java.time.LocalDateTime after Step 1; format it as a year-month string
val monthFormatter = DateTimeFormatter.ofPattern("yyyy-MM")
val ridesWithMonth = rides.add("yearMonth") { row ->
    (row["STARTTIME"] as LocalDateTime).format(monthFormatter)
}

// Group by yearMonth and count
val tripsByMonth = ridesWithMonth
    .groupBy("yearMonth")
    .aggregate {
        count().into("trips")
    }
    .sortBy("yearMonth")

// Plot
tripsByMonth.plot {
    layout.title = "Number of Trips by Month"
    line {
        x("yearMonth") {
            axis.name = "Month"
        }
        y("trips") {
            axis.name = "Number of Trips"
        }
        color = Color.BLUE

    }
}


# Step 11: Seasonal Usage

Maps each ride to a season based on its month, then counts and plots seasonal usage as horizontal bars.


In [35]:
import org.jetbrains.kotlinx.dataframe.api.*
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.jetbrains.kotlinx.dataframe.DataRow

// Add season column based on STARTTIME month
// (STARTTIME is already a java.time.LocalDateTime after Step 1)
val ridesWithSeason = rides.add("season") { row ->
    val month = (row["STARTTIME"] as LocalDateTime).monthValue

    when (month) {
        12, 1, 2 -> "Winter"
        3, 4, 5 -> "Spring"
        6, 7, 8 -> "Summer"
        9, 10, 11 -> "Autumn"
        else -> "Unknown"
    }
}

val seasonOrder = listOf("Spring", "Summer", "Autumn", "Winter")


val tripsBySeason = ridesWithSeason
    .groupBy("season")
    .aggregate {
        count().into("trips")
    }
    // Add a helper column with the season index for sorting
    .add("seasonIndex") { row ->
        seasonOrder.indexOf(row["season"].toString())
    }
    // Sort by that helper column
    .sortBy("seasonIndex")
    // Drop the helper column if you want
    .remove("seasonIndex")

// Reverse the row order (Winter first, Spring last) to control the order of the bars
val reversedTripsBySeason = tripsBySeason.reverse()

reversedTripsBySeason.plot {
    layout.title = "Number of Trips by Season"
    barsH {
        y("season") {
            axis.name = "Season"
        }
        x("trips") {
            axis.name = "Number of Trips"
        }
        alpha = 0.75
        fillColor("season") {
            scale = categoricalColorHue()
        }
    }
}

