<div align="right" style="text-align: right"><i>Peter Norvig, Oct 2017<br>pandas Aug 2020<br>Data updated monthly</i></div>

# Bike Stats Code

Code to support the analysis in the notebook [Bike-Stats.ipynb](Bike-Stats.ipynb).

In [20]:
from IPython.display import HTML
from typing import Iterator, Iterable, Tuple, List, Dict
from collections import namedtuple
import matplotlib
import matplotlib.pyplot as plt
import numpy  as np
import pandas as pd
import re

# Reading Data: `rides` and `yearly`

I saved a bunch of my recorded [Strava](https://www.strava.com/athletes/575579) rides, most of them longer than 25 miles, as [`bikerides.tsv`](bikerides.tsv).  The columns are: the date; the year; a title; the elapsed time of the ride; the length of the ride in miles; and the total climbing in feet, e.g.: 

    Mon, 10/5	2020	Half way around the bay on bay trail	6:26:35	80.05	541
    
I parse the file into the pandas dataframe `rides`, adding derived columns for miles per hour, vertical meters climbed per hour (VAM), grade in feet per mile, grade in percent, and kilometers ridden:

In [35]:
def parse_rides(lines):
    """Parse a bikerides.tsv file."""
    return drop_index(add_columns(pd.read_table(lines, comment='#',
                       converters=dict(hours=parse_hours, feet=parse_int))))

def parse_hours(time: str) -> float: 
    """Parse '4:30:00' => 4.5 hours."""
    hrs = sum(int(x) * 60 ** (i - 2) 
              for i, x in enumerate(reversed(time.split(':'))))
    return round(hrs, 2)

def parse_int(field: str) -> int: return int(field.replace(',', ''))

def add_columns(rides) -> pd.DataFrame:
    """Compute new columns from existing ones."""
    mi, hr, ft = rides['miles'], rides['hours'], rides['feet']
    return rides.assign(
        mph=round(mi / hr, 2),
        vam=round(ft / hr / 3.28084),
        fpm=round(ft / mi),
        pct=round(ft / mi * 100 / 5280, 2),
        kms=round(mi * 1.609, 2))

def drop_index(frame) -> pd.DataFrame:
    """Hide the index when displaying: replace every index label with ''."""
    frame.index = [''] * len(frame)
    return frame
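As a worked check of the derived columns, here is a minimal stand-alone sketch in plain Python (no pandas). It applies the same formulas to the single example ride shown above. The helper name `ride_stats` is hypothetical and not part of the notebook.

```python
def parse_hours(time: str) -> float:
    """Parse 'H:MM:SS' or 'MM:SS' into hours, rounded to 2 places (same formula as above)."""
    # Reversed fields are seconds, minutes, hours, with weights 60**-2, 60**-1, 60**0.
    hrs = sum(int(x) * 60 ** (i - 2)
              for i, x in enumerate(reversed(time.split(':'))))
    return round(hrs, 2)

def ride_stats(miles: float, hours: float, feet: float) -> dict:
    """Hypothetical helper: the derived columns of add_columns, for one ride."""
    return dict(
        mph=round(miles / hours, 2),               # average speed
        vam=round(feet / hours / 3.28084),         # vertical meters climbed per hour
        fpm=round(feet / miles),                   # feet of climbing per mile
        pct=round(feet / miles * 100 / 5280, 2),   # average grade in percent
        kms=round(miles * 1.609, 2))               # distance in kilometers

# The example row: 6:26:35 elapsed, 80.05 miles, 541 feet of climbing.
hours = parse_hours('6:26:35')
stats = ride_stats(80.05, hours, 541)
print(hours, stats)
```

For this ride that works out to 6.44 hours, 12.43 mph, a VAM of 26 m/h, 7 feet per mile, a 0.13% average grade, and 128.8 km.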

In [39]:
rides  = parse_rides(open('bikerides.tsv'))
yearly = parse_rides(open('bikeyears.tsv')).drop(columns=['date', 'title'])

# Reading Data: `segments`

I picked some representative climbing segments ([`bikesegments.csv`](bikesegments.csv)) with the segment length in miles and climb in feet, along with several of my times on the segment. A line like

    Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44
    
means that this segment of Old La Honda Rd is 2.98 miles long with 1255 feet of climbing, and that I've selected three of my times on it: the fastest, middle, and slowest of the times that Strava shows. (However, I ended up dropping the slowest time in the charts to make them less busy.)

In [23]:
def parse_segments(lines) -> pd.DataFrame:
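    """Parse segment lines into a dataframe with one row per (title, hours, miles, feet)
    ride, plus the derived columns from add_columns."""
    records = []
    for segment in lines:
        # The [:5] slice keeps title, miles, feet, and the first two (fastest, middle) times.
        title, mi, ft, *times = segment.split(',')[:5]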
        for time in times:
            records.append((title, parse_hours(time), float(mi), parse_int(ft)))
    return add_columns(pd.DataFrame(records, columns=('title', 'hours', 'miles', 'feet')))
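The `[:5]` slice in `parse_segments` is what drops the slowest time: it keeps the title, miles, feet, and only the first two times. Here is a minimal stand-alone sketch of that step on the Old La Honda line above. The function name `parse_segment_line` is hypothetical.

```python
def parse_hours(time: str) -> float:
    """Parse 'MM:SS' (or 'H:MM:SS') into hours, rounded to 2 places."""
    hrs = sum(int(x) * 60 ** (i - 2)
              for i, x in enumerate(reversed(time.split(':'))))
    return round(hrs, 2)

def parse_segment_line(line: str) -> list:
    """Hypothetical helper: one CSV line -> [(title, hours, miles, feet), ...]."""
    # Keep at most 5 fields: title, miles, feet, fastest time, middle time.
    title, mi, ft, *times = line.split(',')[:5]
    # int() and float() ignore the space that follows each comma.
    return [(title, parse_hours(t), float(mi), int(ft)) for t in times]

records = parse_segment_line('Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44\n')
for title, hours, miles, feet in records:
    print(title, hours, round(miles / hours, 2), 'mph')
```

Only two records come back; the third time, 36:44, never reaches the loop.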

In [24]:
segments = parse_segments(open('bikesegments.csv'))

# Reading Data: `places` and `tiles`

Monthly, I will take my [summary data from wandrer.earth](https://wandrer.earth/athletes/3534/santa-clara-county-california) and enter it in the file [bikeplaces.csv](bikeplaces.csv), in a format where

      San Carlos,99.0,SMC,22.2,26.0,32.9,,37.2,39.0,40.5,,41.4,,,41.7,,,,,,59.5,78.7

means that San Carlos has 99.0 miles of roads and is in San Mateo County (SMC); I had ridden 22.2% of its roads in the first month that I kept track, and 78.7% in the most recent month. An empty entry means there was no change that month.

In [25]:
places = pd.read_csv('bikeplaces.csv', comment='#')
months = [m for m in places.columns if '/' in m]
places['maxpct'] = [max(p for p in place[4:] if not pd.isna(p))
                    for place in places.itertuples()]
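The blank-means-no-change convention amounts to a forward fill. Here is a minimal sketch on the San Carlos line using only the standard library. `fill_forward` is a hypothetical name; in the notebook the same job is done by `replace_nans` at plotting time, and pandas also offers `ffill`.

```python
import csv

def fill_forward(values, start=0.0):
    """Replace each missing value (None) with the most recent actual value."""
    filled, prev = [], start
    for v in values:
        if v is not None:
            prev = v
        filled.append(prev)
    return filled

line = 'San Carlos,99.0,SMC,22.2,26.0,32.9,,37.2,39.0,40.5,,41.4,,,41.7,,,,,,59.5,78.7'
name, miles, area, *month_fields = next(csv.reader([line]))
pcts = [float(m) if m else None for m in month_fields]   # blank field -> no change that month
filled = fill_forward(pcts)
maxpct = max(p for p in pcts if p is not None)           # like the maxpct column
print(name, float(miles), area, filled[-1], maxpct)
```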

In [41]:
tiles  = drop_index(pd.DataFrame(columns='date tiles square cluster'.split(),
                                 data=[('Sep 2022', 2481, '11x11', 295)]))

| date | tiles | square | cluster |
|------|-------|--------|---------|
| Sep 2022 | 2481 | 11x11 | 295 |


# Plotting and Curve-Fitting

In [27]:
plt.rcParams["figure.figsize"] = (12, 6)

def show(X, Y, data, title='', degrees=(2, 3)): 
    """Plot X versus Y and a best fit curve to it, with some bells and whistles."""
    grid(); plt.ylabel(Y); plt.xlabel(X); plt.title(title)
    plt.scatter(X, Y, data=data, c='grey', marker='+')
    X1 = np.linspace(min(data[X]), max(data[X]), 100)
    for degree in degrees:
        F = poly_fit(data[X], data[Y], degree)
        plt.plot(X1, [F(x) for x in X1], '-')
    
def grid(axis='both'): 
    "Turn on the grid."
    plt.minorticks_on() 
    plt.grid(which='major', ls='-', alpha=3/4, axis=axis)
    plt.grid(which='minor', ls=':', alpha=1/2, axis=axis)
    
def poly_fit(X, Y, degree: int) -> callable:
    """The polynomial function that best fits the X,Y vectors."""
    coeffs = np.polyfit(X, Y, degree)[::-1]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs)) 

estimator = poly_fit(rides['feet'] / rides['miles'], 
                   rides['miles'] / rides['hours'], 2)

def estimate(miles, feet, estimator=estimator) -> float:
    """Given a ride distance in miles and total climb in feet, estimate time in minutes."""
    return round(60 * miles / estimator(feet / miles))

def top(frame, field, n=20): return frame.sort_values(field, ascending=False).head(n)
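`np.polyfit` returns coefficients highest-degree first, which is why `poly_fit` reverses them before summing `c * x ** i`. The sketch below checks this on made-up data (an exact quadratic) and shows the arithmetic behind `estimate`: minutes = 60 × miles / mph, where mph is predicted from the grade in feet per mile. The linear model `toy_speed` is invented purely for illustration and is not fitted to the ride data.

```python
import numpy as np

def poly_fit(X, Y, degree: int):
    """Same construction as above: coefficients reversed to lowest degree first."""
    coeffs = np.polyfit(X, Y, degree)[::-1]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs))

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 1 + 2 * X + 3 * X ** 2              # made-up data lying exactly on a quadratic
F = poly_fit(X, Y, 2)
G = np.poly1d(np.polyfit(X, Y, 2))      # numpy's own callable polynomial, for comparison
print(F(5), G(5))                        # both close to 1 + 10 + 75 = 86

def toy_speed(fpm):
    """Invented speed model: 15 mph on the flat, 0.05 mph slower per foot/mile of climb."""
    return 15 - 0.05 * fpm

def estimate(miles, feet, estimator=toy_speed) -> float:
    """Minutes for a ride: 60 * miles / (predicted mph at this grade)."""
    return round(60 * miles / estimator(feet / miles))

print(estimate(20, 1000))   # grade 50 ft/mi -> 12.5 mph -> 96 minutes
```

`np.poly1d` would be an equivalent, vectorized alternative to the hand-written lambda.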

# Plotting Wandrer Places

In [28]:
special_areas = dict(sf='San Francisco Neighborhoods', sj='San Jose Neighborhoods', 
                     far='Far Away Places', county='Bay Area Counties', big='California, USA and Earth')

def wandering(places, pcts=(100, 99, 90, 50, 33.3, 25, 0), special_areas=special_areas):
    "Plot charts within the various percent ranges, and special groups."
    for i in range(len(pcts) - 1):
        hi, lo = pcts[i], pcts[i + 1]
        inrange = places[(places.maxpct > lo) & (places.maxpct <= hi) & ~places.area.isin(special_areas)]
        wandrer_plot(f'Places with {lo}% to {hi}% roads traveled', inrange)
    for area in special_areas:
        wandrer_plot(special_areas[area], places[places.area == area])
        
def wandrer_plot(title, places):
    """Plot Wandrer.earth percent of roads ridden over time, one line per place."""
    if len(places) == 0:
        return # Don't make an empty plot
    places = places.sort_values(by='maxpct', ascending=False)
    fig, ax = plt.subplots()
    # zip stops after len(markers) places.
    for (_, name, miles, area, *pcts, maxpct), marker in zip(places.itertuples(), markers):
        pcts = replace_nans(pcts) # A blank month means no change since the previous month
        ax.plot(pcts, ':', marker=marker, label=label(pcts, name, miles))
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), shadow=True,
              prop=matplotlib.font_manager.FontProperties(family='monospace'))
    plt.xticks(range(len(months)), labels=months, rotation=90, fontfamily='monospace')
    plt.ylabel('Percent of Roads Ridden')
    plt.title(title); plt.tight_layout(); grid(axis='y'); plt.show()
    
markers = '^v><osdhxDHPX*' * 3 # Matplotlib markers
bonuses = (0.02, 0.1, 2, 25, 50, 90, 99)   # Percents that earn important bonuses

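def label(pcts, name, miles) -> str:
    """Make a legend label: latest percent, miles ridden/total miles, miles to next bonus, and name."""
    pct = f'{rounded(pcts[-1]):>3}' if pcts[-1] > 1.4 else f'{pcts[-1]}'
    done = miles * pcts[-1] / 100 # Miles of road ridden so far
    # Miles still needed to reach the first bonus threshold above the current percent.
    bonus = next((f' {rounded((p - pcts[-1]) / 100 * miles):>3} to {p}%'
                  for p in bonuses if p > pcts[-1]), '')
    return f'{pct}% ({rounded(done):>3}/{rounded(miles):<3} mi){bonus} {name}'
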
def replace_nans(numbers) -> list:
    """Replace NaN (not a number) values with the previous actual number (0 before any)."""
    result = []
    prev = 0
    for x in numbers:
        if x == x: # True for every number except NaN, which is not equal to itself
            prev = x
        else: # Not a Number
            x = prev
        result.append(x)
    return result

def rounded(x: float) -> str: 
    """Round x to 3 spaces wide (if possible)."""
    return f'{round(x):,d}' if x > 10 else f'{x:.1f}'
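Two pieces of arithmetic drive these charts. Each place goes in the chart whose range (lo, hi] contains its maximum percent, and the legend shows how many more miles reach the next bonus threshold: (p - pct) / 100 × road miles. Here is a stand-alone sketch of both, applied to the San Carlos figures (99.0 road miles, 78.7% ridden). The names `bucket` and `next_bonus` are hypothetical, and this version counts only thresholds strictly above the current percent.

```python
PCTS = (100, 99, 90, 50, 33.3, 25, 0)        # chart boundaries, as in wandering()
BONUSES = (0.02, 0.1, 2, 25, 50, 90, 99)     # bonus thresholds, as in bonuses above

def bucket(maxpct, pcts=PCTS):
    """Return the (lo, hi) pair with lo < maxpct <= hi, or None."""
    for hi, lo in zip(pcts, pcts[1:]):
        if lo < maxpct <= hi:
            return lo, hi
    return None

def next_bonus(pct, miles, bonuses=BONUSES):
    """(threshold, miles still needed) for the first bonus above pct, or None."""
    for p in bonuses:
        if p > pct:
            return p, (p - pct) / 100 * miles
    return None

print(bucket(78.7))            # (50, 90)
print(next_bonus(78.7, 99.0))  # (90, ~11.19)
```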

# Pareto Front  

In [34]:
def make_leaders(data):
    """Make a dataframe of leaders in San Mateo (SMC) and Santa Clara (SCC) counties."""
    leaders = pd.DataFrame(data, columns=['Name', 'Initials', 'SMC %', 'SCC %', 'Front?'])
    # Percent to miles: 29.51 * pct == pct / 100 * 2951 road miles in SMC (7564 for SCC).
    leaders['SMC miles'] = [round(29.51 * d[2]) for d in data]
    leaders['SCC miles'] = [round(75.64 * d[3]) for d in data]
    leaders['Total miles'] = leaders['SMC miles'] + leaders['SCC miles']
    leaders['Total %'] = leaders['SMC %'] + leaders['SCC %']
    return drop_index(leaders.sort_values('Total %', ascending=False))

leaders = make_leaders([ # Data as of Sept 8, 2022
    ('Barry Mann', 'BM', 75.34, 29.32, 1),   ('Jason Molenda', 'JM', 7.13, 54.59, 1),  
    ('Peter Norvig', 'PN', 55.26, 30.31, 1), ('Brian Feinberg', 'BF', 29.72, 35.93, 1),
    ('Jim Brooks', 'JB', 4.23, 43.53, 0),    ('Megan Gardner', 'MG', 92.51, 8.69, 1),
    ('Matthew Ring', 'MR', 75.53, 1.48, 0),  ('Elliot Huff', 'EH', 51.78, 8.14, 0)])

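def pareto_front(leaders):
    """Scatter-plot SMC % versus SCC %, draw a dotted line through the riders flagged
    as on the Pareto front, label each point with initials, and return the table
    without the 'Front?' column."""
    ax = leaders.plot('SMC %', 'SCC %', grid=True, kind='scatter')
    front = sorted((x, y) for i, (_, _, x, y, f, *_) in leaders.iterrows() if f)
    ax.plot(*zip(*front), ':'); ax.axis('square'); grid()
    for i, (name, initials, x, y, *_) in leaders.iterrows():
        ax.text(x - 2, y + 2, initials)
    return leaders.drop(columns=['Front?'])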
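The `Front?` column is entered by hand. As a cross-check, here is a minimal sketch that computes the front directly: a rider is on the Pareto front if no other rider has at least as high a percent in both counties (with a different pair of percents). It uses the (SMC %, SCC %) figures from the `make_leaders` call above; `on_front` is a hypothetical name, and the O(n²) scan is fine for a handful of riders.

```python
def dominates(a: tuple, b: tuple) -> bool:
    """True if a is at least as good as b in both coordinates and not identical to it."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def on_front(points: dict) -> set:
    """Names whose (x, y) point is not dominated by any other point."""
    return {name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())}

# (SMC %, SCC %) as of Sept 8, 2022.
points = {'Barry Mann': (75.34, 29.32),  'Jason Molenda': (7.13, 54.59),
          'Peter Norvig': (55.26, 30.31), 'Brian Feinberg': (29.72, 35.93),
          'Jim Brooks': (4.23, 43.53),   'Megan Gardner': (92.51, 8.69),
          'Matthew Ring': (75.53, 1.48), 'Elliot Huff': (51.78, 8.14)}
print(sorted(on_front(points)))
```

This yields the same five riders that are flagged with `Front?` = 1.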

# Eddington Number

In [30]:
def Ed_number(rides, units) -> int:
    """Eddington number: The maximum integer e such that you have bicycled 
    a distance of at least e on at least e days."""
    distances = sorted(rides[units], reverse=True)
    return max((e for e, d in enumerate(distances, 1) if d >= e), default=0)

def Ed_gap(distances, target) -> int:
    """The number of rides needed to reach an Eddington number target."""
    return target - sum(distances >= target)

def Ed_gaps(rides, N=10) -> pd.DataFrame:
    """A table of N Eddington numbers, from the current one upward (in kms and miles),
    with the number of additional rides each one needs."""
    E_km = Ed_number(rides, 'kms')
    E_mi = Ed_number(rides, 'miles')
    data = [(E_km + d, Ed_gap(rides.kms,   E_km + d), 
             E_mi + d, Ed_gap(rides.miles, E_mi + d))
            for d in range(N)]
    df = pd.DataFrame(data, columns=['kms', 'kms gap', 'miles', 'miles gap'])
    return drop_index(df)

def Ed_progress(rides, years=range(2022, 2012, -1)) -> pd.DataFrame:
    """A table of cumulative Eddington numbers (kms and miles) at the end of each year, most recent first."""
    # A range (unlike a reversed() iterator) can be iterated again on every call.
    def Ed(year, unit): return Ed_number(rides[rides['year'] <= year], unit)
    data = [(y, Ed(y, 'kms'), Ed(y, 'miles')) for y in years]
    df = pd.DataFrame(data, columns=['year', 'Ed_km', 'Ed_mi'])
    return drop_index(df)
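Here is a worked example on made-up ride lengths in plain Python. The lowercase names mark these as a sketch, not the notebook's functions. With rides of 30, 25, 12, 8, 5 and 3 miles, the fifth-longest ride is at least 5 but the sixth is shorter than 6, so the Eddington number is 5. Reaching 6 needs two more rides of at least 6, because only four rides are that long. The `default=0` guards against an empty list.

```python
def ed_number(distances) -> int:
    """Largest e such that at least e of the distances are >= e (0 if none)."""
    ranked = sorted(distances, reverse=True)
    return max((e for e, d in enumerate(ranked, 1) if d >= e), default=0)

def ed_gap(distances, target: int) -> int:
    """How many more rides of length >= target are needed for Eddington number target."""
    return target - sum(d >= target for d in distances)

rides = [30, 25, 12, 8, 5, 3]               # made-up ride lengths
print(ed_number(rides))                     # 5
print(ed_gap(rides, 6), ed_gap(rides, 7))   # 2 3
```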

# Climbing to Space

In [31]:
per_month_climbing = [35.491, 31.765, 39.186, 33.641, 32.782, 14.809, 46.731, 38.556] # Kft per month, 2022

space = {'100 kms': 100 * 3.28084, '10 Everests': 290.320, '50 miles': 50 * 5.280} # Kft

def climbing(per_month=per_month_climbing, space=space):
    """Plot my cumulative and per-month climbing in 2022 against the 'space' goal lines."""
    total = np.cumsum(per_month)
    for goal in space:
        plt.plot(range(12), [space[goal]] * 12, ':', label=goal)
    plt.plot(range(len(total)), total, 'o-', label='my total')
    plt.plot(range(len(total)), per_month, 's-.', label='per month')
    plt.legend(loc=(1.04, .64), fontsize='large'); grid()
    plt.xlabel('Month of 2022'); plt.ylabel('Total climbing (Kft)')
    plt.xticks(range(12), 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split())
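The goal lines are unit conversions into thousands of feet (Kft). 100 km is 100,000 m × 3.28084 ft/m ≈ 328.084 Kft, ten times Everest's 29,032 ft is 290.32 Kft, and 50 miles is 50 × 5,280 ft = 264 Kft. The sketch below accumulates the monthly figures above and reports the first month each goal is reached. `first_month_reaching` is a hypothetical helper.

```python
from itertools import accumulate

per_month = [35.491, 31.765, 39.186, 33.641, 32.782, 14.809, 46.731, 38.556]  # Kft, Jan-Aug 2022
goals = {'100 kms': 100 * 3.28084, '10 Everests': 10 * 29.032, '50 miles': 50 * 5.280}
month_names = 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split()

totals = list(accumulate(per_month))   # running total, like np.cumsum

def first_month_reaching(goal: float, totals=totals):
    """Name of the first month whose running total is >= goal, or None."""
    return next((month_names[i] for i, t in enumerate(totals) if t >= goal), None)

for name, goal in goals.items():
    print(f'{name:12} {goal:8.3f} Kft  reached: {first_month_reaching(goal)}')
```

Through August the total is about 272.961 Kft. That passes the 50-mile line (in August) but is still short of both 10 Everests and 100 km.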