<div align="right" style="text-align: right"><i>Peter Norvig, Oct 2017<br>pandas Aug 2020<br>Data updated monthly</i></div>

# Bike Code

Code to support the analysis in the notebook [Bike Speed versus Grade.ipynb](Bike%20Speed%20versus%20Grade.ipynb).

In [1]:
from IPython.core.display import HTML
from typing import Iterator, Iterable, Tuple, List, Dict
from collections import namedtuple
import matplotlib
import matplotlib.pyplot as plt
import numpy  as np
import pandas as pd
import re

# Reading Data: `rides`

I saved a bunch of my recorded [Strava](https://www.strava.com/athletes/575579) rides, most of them longer than 25 miles, as [`bikerides.tsv`](bikerides.tsv).  The columns are: the date; the year; a title; the elapsed time of the ride; the length of the ride in miles; and the total climbing in feet, e.g.: 

    Mon, 10/5	2020	Half way around the bay on bay trail	6:26:35	80.05	541
    
I parse the file into the pandas dataframe `rides`, adding derived columns for miles per hour, vertical meters climbed per hour (VAM), grade in feet per mile, grade in percent, and kilometers ridden:

In [2]:
def parse_rides(lines):
    """Parse a bikerides.tsv file."""
    return add_columns(pd.read_table(lines, comment='#',
                       converters=dict(hours=parse_hours, feet=parse_int)))

def parse_hours(time: str) -> float: 
    """Parse '4:30:00' => 4.5 hours."""
    hrs = sum(int(x) * 60 ** (i - 2) 
              for i, x in enumerate(reversed(time.split(':'))))
    return round(hrs, 2)

def parse_int(field: str) -> int: return int(field.replace(',', ''))

def add_columns(rides) -> pd.DataFrame:
    """Compute new columns from existing ones."""
    mi, hr, ft = rides['miles'], rides['hours'], rides['feet']
    return rides.assign(
        mph=round(mi / hr, 2),
        vam=round(ft / hr / 3.28084),
        fpm=round(ft / mi),
        pct=round(ft / mi * 100 / 5280, 2),
        kms=round(mi * 1.609, 2))
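
As a worked example, the sample row shown earlier can be pushed through the same arithmetic by hand. This is a minimal sketch in plain Python rather than pandas; `clock_to_hours` and `derived` are hypothetical stand-ins for `parse_hours` and `add_columns`, not part of the notebook.

```python
def clock_to_hours(time: str) -> float:
    """Hypothetical stand-in for parse_hours: 'H:MM:SS' or 'MM:SS' => hours."""
    seconds = 0
    for part in time.split(':'):
        seconds = seconds * 60 + int(part)   # base-60 accumulation, left to right
    return round(seconds / 3600, 2)

def derived(miles: float, hours: float, feet: int) -> dict:
    """Hypothetical scalar version of add_columns, using the same formulas."""
    return dict(
        mph=round(miles / hours, 2),              # miles per hour
        vam=round(feet / hours / 3.28084),        # vertical meters climbed per hour
        fpm=round(feet / miles),                  # feet climbed per mile
        pct=round(feet / miles * 100 / 5280, 2),  # grade in percent
        kms=round(miles * 1.609, 2))              # kilometers ridden

hours = clock_to_hours('6:26:35')   # the sample row: 80.05 miles, 541 feet
row = derived(miles=80.05, hours=hours, feet=541)
print(hours)   # 6.44
print(row)     # {'mph': 12.43, 'vam': 26, 'fpm': 7, 'pct': 0.13, 'kms': 128.8}
```

Because the hours are rounded to two decimals before the division, the derived speeds inherit a small rounding error; that is negligible for multi-hour rides.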

In [3]:
rides  = parse_rides(open('bikerides.tsv'))
yearly = parse_rides(open('bikeyears.tsv')).drop(columns=['date', 'title'])
yearly['miles'] = list(map(round, yearly['miles']))
yearly.index = [''] * len(yearly)

# Reading Data: `segments`

I picked some representative climbing segments ([`bikesegments.csv`](bikesegments.csv)) with the segment length in miles and climb in feet, along with several of my times on the segment. A line like

    Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44
    
means that this segment of Old La Honda Rd is 2.98 miles long with 1255 feet of climbing, and that I've selected three of my times on that segment: the fastest, middle, and slowest of the times that Strava shows. (However, I ended up dropping the slowest time in the charts to make them less busy.)

In [4]:
def parse_segments(lines) -> pd.DataFrame:
    """Parse segments into rides. Each ride is a tuple of:
    (segment_title, time, miles, feet_climb).
    Only the first two times on each line are kept."""
    records = []
    for segment in lines:
        title, mi, ft, *times = segment.split(',')[:5]
        for time in times:
            records.append((title, parse_hours(time), float(mi), parse_int(ft)))
    return add_columns(pd.DataFrame(records, columns=('title', 'hours', 'miles', 'feet')))

In [5]:
segments = parse_segments(open('bikesegments.csv'))
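
To make the expansion concrete, here is a minimal sketch of how one segment line turns into one record per kept time. The helper `segment_records` is hypothetical; it assumes every time is in `MM:SS` form and stores whole seconds rather than rounded hours, which differs from `parse_segments`.

```python
def segment_records(line: str, keep_times: int = 2) -> list:
    """Hypothetical sketch: expand one segment line into
    (title, seconds, miles, feet) records, one per kept time."""
    title, mi, ft, *times = [field.strip() for field in line.split(',')]
    records = []
    for time in times[:keep_times]:            # drops the slowest (last) time
        minutes, seconds = map(int, time.split(':'))
        records.append((title, 60 * minutes + seconds, float(mi), int(ft)))
    return records

records = segment_records('Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44')
for title, secs, mi, ft in records:
    print(f'{title}: {mi / (secs / 3600):.2f} mph at {ft / mi:.0f} feet per mile')
```

For the sample line this gives two records, at roughly 6.20 mph and 5.25 mph, both at about 421 feet per mile.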

# Reading Data: `places`

Monthly, I will take my [summary data from wandrer.earth](https://wandrer.earth/athletes/3534/santa-clara-county-california) and enter it in the file [bikeplaceupdates.txt](bikeplaceupdates.txt), in a format where

      por |  48.2 | Portola Valley | 
      --------------------------------------------------------------------------------
      2022-03 por 99.5 sky 99.24
      
means that "por" is the abbreviation for Portola Valley, which has 48.2 miles of roads, and in March 2022, I had ridden 99.5% of the roads in Portola Valley, as well as 99.24% of the roads in Sky Londa, etc. (I wanted both the place declarations and the monthly updates to be in one file, in case I decide to globally replace some abbreviation.)

In [6]:
Place = namedtuple('Place', 'name, miles, special, months, pcts')

def parse_places(filename='bikeplaces.txt', sep='-'*80) -> Dict:
    """Parse file into a dict:
    places = {'por':  Place('Portola Valley', 48.2, '', [month, ...], [pct, ...])}"""
    places = {}
    declarations, updates = open(filename).read().split(sep)
    for abbrev, miles, name, special in tokenize(declarations, sep='|'):
        places[abbrev] = Place(name, float(miles), special, [], [])
    for month, tokens in enumerate(tokenize(updates)):
        for i in range(1, len(tokens), 2): 
            abbrev, pct = tokens[i], float(tokens[i+1])
            places[abbrev].months.append(month)
            places[abbrev].pcts.append(pct)
    return places

def tokenize(text, sep=None): 
    """Split text into lines split by sep; strip each token; ignore blanks and comments."""
    lines = text.splitlines()
    return [[token.strip() for token in line.split(sep)]
            for line in lines if line.strip() and not line.startswith('#')]

places = parse_places()
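
The two-part file format can be exercised on a small inline sample. This is a sketch under stated assumptions: `parse_sample` is a hypothetical, dict-based variant of `parse_places`, the Sky Londa mileage is made up, and the `2022-02` line is invented so that there are two months of data.

```python
SAMPLE = """\
por |  48.2 | Portola Valley |
sky |  10.0 | Sky Londa |
""" + '-' * 80 + """
2022-02 por 99.0
2022-03 por 99.5 sky 99.24
"""

def parse_sample(text: str, sep: str = '-' * 80) -> dict:
    """Hypothetical sketch of the parse: declarations first, then monthly updates."""
    declarations, updates = text.split(sep)
    places = {}
    for line in declarations.strip().splitlines():
        abbrev, miles, name, special = [t.strip() for t in line.split('|')]
        places[abbrev] = dict(name=name, miles=float(miles), special=special,
                              months=[], pcts=[])
    for month, line in enumerate(updates.strip().splitlines()):
        _date, *pairs = line.split()           # the date token itself is not used
        for abbrev, pct in zip(pairs[0::2], pairs[1::2]):
            places[abbrev]['months'].append(month)
            places[abbrev]['pcts'].append(float(pct))
    return places

sample_places = parse_sample(SAMPLE)
print(sample_places['por']['months'], sample_places['por']['pcts'])  # [0, 1] [99.0, 99.5]
print(sample_places['sky']['months'], sample_places['sky']['pcts'])  # [1] [99.24]
```

Note that the month stored for each update is the position of its line in the updates section, not a value parsed from the date token, so a skipped month in the file would shift every later index.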

# Plotting and Curve-Fitting

In [7]:
plt.rcParams["figure.figsize"] = (10, 6)

def show(X, Y, data, title='', degrees=(2, 3)): 
    """Plot X versus Y and a best fit curve to it, with some bells and whistles."""
    grid(); plt.ylabel(Y); plt.xlabel(X); plt.title(title)
    plt.scatter(X, Y, data=data, c='grey', marker='+')
    X1 = np.linspace(min(data[X]), max(data[X]), 100)
    for degree in degrees:
        F = poly_fit(data[X], data[Y], degree)
        plt.plot(X1, [F(x) for x in X1], '-')
    
def grid(axis='both'): 
    "Turn on the grid."
    plt.minorticks_on() 
    plt.grid(which='major', ls='-', alpha=3/4, axis=axis)
    plt.grid(which='minor', ls=':', alpha=1/2, axis=axis)
    
def poly_fit(X, Y, degree: int) -> callable:
    """The polynomial function that best fits the X,Y vectors."""
    coeffs = np.polyfit(X, Y, degree)[::-1]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs)) 

estimator = poly_fit(rides['feet'] / rides['miles'], 
                   rides['miles'] / rides['hours'], 2)

def estimate(miles, feet, estimator=estimator) -> float:
    """Given a ride distance in miles and total climb in feet, estimate time in minutes."""
    return round(60 * miles / estimator(feet / miles))

def top(frame, field, n=20): return frame.sort_values(field, ascending=False).head(n)
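
The estimate is just distance divided by a speed that depends on the average grade. A minimal sketch of that arithmetic follows, using hypothetical coefficients that were NOT fitted to the ride data, and plain Python in place of `np.polyfit`:

```python
def poly(coeffs):
    """Polynomial with coefficients listed lowest degree first,
    the same order poly_fit uses after its [::-1] reversal."""
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs))

# Hypothetical coefficients, for illustration only:
# speed(g) = 14 - 0.05 g + 0.0001 g^2, in mph, with g in feet per mile.
speed = poly([14.0, -0.05, 0.0001])

def estimate_minutes(miles, feet, speed=speed) -> int:
    """Minutes = 60 * distance / estimated speed at the ride's average grade."""
    return round(60 * miles / speed(feet / miles))

print(speed(100))                    # 10 mph at 100 feet per mile
print(estimate_minutes(20, 2000))    # 120 minutes for 20 miles at that grade
```

A low-degree polynomial is only trustworthy within the range of grades it was fitted on; outside that range the curve can turn upward or cross zero, so the estimate is best treated as a rough guide.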

# Plotting Wandrer Places

In [8]:
special_groups = dict(sf='San Francisco Neighborhoods', sj='San Jose Neighborhoods', 
                      far='Far Away Places', county='Bay Area Counties', big='California, USA and Earth')

def wandering(places, pcts=(100, 99, 90, 50, 33.3, 25, 0), specials=special_groups):
    "Plot charts within the various percent ranges, and special groups."
    for i in range(len(pcts) - 1):
        hi, lo = pcts[i], pcts[i + 1]
        abbrevs = [a for a in places 
                   if not places[a].special 
                   and lo <= max_pct(a) < hi]
        wandrer_plot(f'Places with {lo}% to {hi}% roads traveled', places, abbrevs)
    for s in specials:
        abbrevs = [a for a in places if places[a].special == s]
        wandrer_plot(specials[s], places, abbrevs)
        
def max_pct(abbrev) -> float: 
    """The maximum percent of roads achieved for this place abbreviation."""
    if not places[abbrev].pcts: print('Warning: No pcts for', abbrev)
    return max(places[abbrev].pcts, default=0)
        
def wandrer_plot(title, places, abbrevs):
    """Plot Wandrer.earth data for the places with given abbrevs."""
    if not abbrevs:
        return # Don't make an empty plot
    abbrevs = sorted(abbrevs, key=max_pct, reverse=True)
    fig, ax = plt.figure(), plt.subplot(111)
    for abbrev, marker in zip(abbrevs, markers):
        name, miles, special, months, pcts = places[abbrev]
        ax.plot(months, pcts, ':', marker=marker, label=label(pcts, name, miles))
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), shadow=True,
              prop=matplotlib.font_manager.FontProperties(family='monospace'))
    months = places['usa'].months
    plt.xticks(months, labels=[month_name(i) for i in months], rotation=90)
    plt.ylabel('Percent of Roads Ridden')
    plt.title(title); plt.tight_layout(); grid(axis='y'); plt.show()
    
markers = '^v><osdhxDHPX*' * 3 # Matplotlib markers
bonuses = (0.02, 0.1, 2, 25, 50, 90, 99)   # Percents that earn important bonuses

def label(pcts, name, miles) -> str:
    """Make a label for the legend."""
    pct = f'{rounded(pcts[-1]):>3}' if pcts[-1] > 1.4 else f'{pcts[-1]}'
    done = miles * pcts[-1]
    bonus = next((f' {rounded((p - pcts[-1]) / 100 * miles):>3} to {p}%' 
                  for p in bonuses if p >= pcts[-1]), '')
    return f'{pct}% ({rounded(done / 100):>3}/{rounded(miles):<3} mi){bonus} {name}'

def month_name(i, start=2020 * 12 + 6) -> str:
    """Maps 0 -> '2020-07' and 13 -> '2021-08', etc."""
    year  = (start + i) // 12
    month = (start + i) %  12 + 1
    return f'{year}-{month:02}'

def rounded(x: float) -> str: 
    """Round x to 3 spaces wide (if possible)."""
    return f'{round(x):,d}' if x > 10 else f'{x:.1f}'
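
The "miles to next bonus" part of each legend label comes down to one subtraction and one multiplication. A minimal sketch, with the hypothetical helper `miles_to_next_bonus` pulled out of `label` for clarity:

```python
BONUSES = (0.02, 0.1, 2, 25, 50, 90, 99)   # same thresholds as `bonuses` above

def miles_to_next_bonus(pct: float, miles: float, bonuses=BONUSES):
    """Return (threshold, miles still to ride) for the first threshold >= pct,
    or None if every threshold has already been passed."""
    for p in bonuses:
        if p >= pct:
            return p, (p - pct) / 100 * miles
    return None

print(miles_to_next_bonus(85.0, 40.0))   # made-up place, 40 miles at 85%: next is 90%
print(miles_to_next_bonus(99.5, 48.2))   # Portola Valley at 99.5%: None
```

For the made-up place, 5 percentage points of 40 miles leaves 2 miles to ride before the 90% bonus.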

# Pareto Front Across Two Counties

In [9]:
def make_leaders(data):
    """Make a dataframe of leaders in two counties."""
    leaders = pd.DataFrame(data, columns=['Name', 'SMC %', 'SCC %', 'Front?'])
    leaders['SMC miles'] = [round(29.51 * d[1]) for d in data]
    leaders['SCC miles'] = [round(75.64 * d[2]) for d in data]
    leaders['Total miles'] = leaders['SMC miles'] + leaders['SCC miles']
    return leaders

leaders = make_leaders([
    ('Barry Mann', 73.37, 29.35, 1),   ('Jason Molenda', 7.13, 54.65, 1),  
    ('Peter Norvig', 50.06, 30.31, 1), ('Brian Feinberg', 29.72, 35.59, 1),
    ('Jim Brooks', 4.23, 43.39, 0),    ('Megan Gardner', 89.43, 8.69, 1),
    ('Matthew Ring', 72.67, 1.48, 0),  ('Elliot  Huff', 50.43, 8.14, 0)])
                   
def pareto_front(leaders):
    ax = leaders.plot('SMC %', 'SCC %', grid=True, kind='scatter')
    front = sorted((x, y) for i, (_, x, y, f, *_) in leaders.iterrows() if f)
    ax.plot(*zip(*front), ':'); ax.axis('square'); grid()
    for i, (name, x, y, *_) in leaders.iterrows():
        initials = ''.join(w[0] for w in name.split())
        ax.text(x - 2, y + 2, initials)
    return leaders.drop(columns=['Front?'])
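
The `Front?` flags above are entered by hand. As a cross-check, the front can also be computed: a rider is on it when nobody else is at least as good in both counties and strictly better in one. The sketch below is a hypothetical helper, not part of the notebook, using the percentages from the table:

```python
def on_front(point, points) -> bool:
    """True if no other point is >= in both coordinates and > in at least one."""
    x, y = point
    return not any(x2 >= x and y2 >= y and (x2 > x or y2 > y)
                   for x2, y2 in points)

riders = {   # name: (SMC %, SCC %), copied from the leaders data
    'Barry Mann':     (73.37, 29.35), 'Jason Molenda': (7.13, 54.65),
    'Peter Norvig':   (50.06, 30.31), 'Brian Feinberg': (29.72, 35.59),
    'Jim Brooks':     (4.23, 43.39),  'Megan Gardner': (89.43, 8.69),
    'Matthew Ring':   (72.67, 1.48),  'Elliot Huff': (50.43, 8.14)}

front = sorted(name for name, p in riders.items() if on_front(p, riders.values()))
print(front)
# ['Barry Mann', 'Brian Feinberg', 'Jason Molenda', 'Megan Gardner', 'Peter Norvig']
```

The computed front agrees with the five riders flagged with 1 in the data. The check is quadratic in the number of riders, which is fine for a table this small.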

# Eddington Number

In [10]:
def Ed_number(rides, units) -> int:
    """Eddington number: The maximum integer e such that you have bicycled 
    a distance of at least e on at least e days."""
    distances = sorted(rides[units], reverse=True)
    return max(e for e, d in enumerate(distances, 1) if d >= e)

def Ed_gap(distances, target) -> int:
    """The number of rides needed to reach an Eddington number target."""
    return target - sum(distances >= target)

def Ed_gaps(rides, N=10) -> pd.DataFrame:
    """A table of the number of rides still needed to reach each of N Eddington 
    targets, starting at the current number, in kms and miles."""
    E_km = Ed_number(rides, 'kms')
    E_mi = Ed_number(rides, 'miles')
    data = [(E_km + d, Ed_gap(rides.kms,   E_km + d), 
             E_mi + d, Ed_gap(rides.miles, E_mi + d))
            for d in range(N)]
    df = pd.DataFrame(data, columns=['kms', 'kms gap', 'miles', 'miles gap'])
    return df

def Ed_progress(rides, years=range(2022, 2012, -1)) -> pd.DataFrame:
    """A table of Eddington numbers by year, most recent year first."""
    def Ed(year, unit): return Ed_number(rides[rides['year'] <= year], unit)
    data  = [(y, Ed(y, 'kms'), Ed(y, 'miles')) for y in years]
    df = pd.DataFrame(data, columns=['year', 'Ed_km', 'Ed_mi'])
    return df
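
A toy example may make the definition easier to check by hand. This sketch uses plain lists instead of a dataframe, made-up ride lengths, and hypothetical names `eddington` and `gap`; unlike `Ed_number`, it returns 0 rather than raising when no ride qualifies.

```python
def eddington(distances) -> int:
    """Largest e such that at least e of the distances are >= e (0 if none)."""
    ranked = sorted(distances, reverse=True)
    return max((e for e, d in enumerate(ranked, 1) if d >= e), default=0)

def gap(distances, target: int) -> int:
    """How many more rides of at least `target` are needed to reach that target."""
    return target - sum(d >= target for d in distances)

toy = [50, 40, 30, 30, 20, 10, 5]   # made-up ride lengths
print(eddington(toy))   # 6: six rides are at least 6 long, but not seven at least 7
print(gap(toy, 7))      # 1
print(gap(toy, 20))     # 15
```

The gap grows quickly with the target because raising the target both demands more rides and disqualifies shorter ones that used to count.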

# Climbing to Space

In [11]:
per_month_climbing = [35.491, 31.765, 39.186, 33.641, 32.782, 14.809, 46.731]

space = {'100 kms': 328.084, '10 Everests': 290.320, '50 miles': 50 * 5.280}

def climbing(per_month=per_month_climbing, space=space):
    """Plot progress in climbing."""
    total = np.cumsum(per_month)
    for label in space:
        plt.plot(range(12), [space[label]] * 12, ':', label=label)
    plt.plot(range(len(total)), total, 'o-', label='my total')
    plt.plot(range(len(total)), per_month, 's-.', label='per month')
    plt.legend(loc=(1.04, .64), fontsize='large'); grid()
    plt.xlabel('Month of 2022'); plt.ylabel('Total climbing (Kft)')
    plt.xticks(range(12), 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split())
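
The same running total can be checked without plotting. A minimal sketch, assuming the seven monthly figures listed above and using `itertools.accumulate` in place of `np.cumsum`; the names `running` and `remaining` are hypothetical:

```python
from itertools import accumulate

per_month = [35.491, 31.765, 39.186, 33.641, 32.782, 14.809, 46.731]  # Kft, from above
goals = {'10 Everests': 290.320, '50 miles': 50 * 5.280}              # Kft, from above

running = list(accumulate(per_month))     # cumulative climbing after each month
total = running[-1]
remaining = {goal: round(kft - total, 3) for goal, kft in goals.items()}

print(round(total, 3))    # about 234.405 Kft after seven months
print(remaining)          # about 55.9 Kft to 10 Everests, 29.6 Kft to 50 miles
```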