<div align="right" style="text-align: right"><i>Peter Norvig, Oct 2017<br>pandas Aug 2020<br>Data updated monthly</i></div>

# Bike Code

Code to support the analysis in the notebook [Bike Speed versus Grade.ipynb](Bike%20Speed%20versus%20Grade.ipynb).

In [28]:
from IPython.core.display import HTML
from typing import Iterator, Tuple, List, Dict
import matplotlib
import matplotlib.pyplot as plt
import numpy  as np
import pandas as pd
import re

# Reading Data: `rides`

I downloaded a bunch of my recorded [Strava](https://www.strava.com/athletes/575579) rides, most of them longer than 25 miles (with a few exceptions), as [`bikerides.tsv`](bikerides.tsv).  The columns are: the date; the year; a title; the elapsed time of the ride; the length of the ride in miles; and the total climbing in feet, e.g.: 

    Mon, 10/5	2020	Half way around the bay on bay trail	6:26:35	80.05	541
    
I parse the file into the pandas dataframe `rides`, adding derived columns for miles per hour, vertical feet climbed per hour (VAM), grade in feet per mile, grade in percent, and kilometers ridden:

In [42]:
def parse_hours(time: str) -> float: 
    """Parse '4:30:00' => 4.5 hours."""
    while time.count(':') < 2: 
        time = '0:' + time
    # total_seconds() includes any whole days; .seconds alone would wrap at 24 hours
    return round(pd.Timedelta(time).total_seconds() / 60 / 60, 4)

def parse_int(field: str) -> int: return int(field.replace(',', ''))

def add_derived_columns(rides) -> pd.DataFrame:
    return rides.assign(
        mph=round(rides['miles'] / rides['hours'], 2),
        vam=round(rides['feet'] / rides['hours']),
        fpm=round(rides['feet']  / rides['miles']),
        pct=round(rides['feet']  / rides['miles'] * 100 / 5280, 2),
        kms=round(rides['miles'] * 1.609, 2))

In [43]:
rides = add_derived_columns(pd.read_table(open('bikerides.tsv'), comment='#',
            converters=dict(hours=parse_hours, feet=parse_int)))
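
As a sanity check on the derived columns, here is the sample line from above worked through in plain Python. This is a minimal sketch that does not use pandas; the helper name `to_hours` is hypothetical and only stands in for `parse_hours`.

```python
def to_hours(time: str) -> float:
    """Convert 'h:mm:ss' (or 'mm:ss') to hours; hypothetical stand-in for parse_hours."""
    parts = [int(p) for p in time.split(':')]
    while len(parts) < 3:          # pad missing hours (or minutes) with zero
        parts.insert(0, 0)
    h, m, s = parts
    return round(h + m / 60 + s / 3600, 4)

line = 'Mon, 10/5\t2020\tHalf way around the bay on bay trail\t6:26:35\t80.05\t541'
date, year, title, time, miles, feet = line.split('\t')
hours, miles, feet = to_hours(time), float(miles), int(feet.replace(',', ''))

derived = dict(
    mph=round(miles / hours, 2),                # miles per hour
    vam=round(feet / hours),                    # feet climbed per hour
    fpm=round(feet / miles),                    # grade in feet per mile
    pct=round(feet / miles * 100 / 5280, 2),    # grade in percent (5280 feet per mile)
    kms=round(miles * 1.609, 2))                # kilometers ridden
print(hours, derived)
```

For this flat ride the grade is only about 7 feet per mile, or 0.13%.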

In [44]:
rides

| | date | year | title | hours | miles | feet | mph | vam | fpm | pct | kms |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sun, 3/14 | 2021 | Fremont / Union City | 5.3131 | 69.58 | 1552 | 13.10 | 292.0 | 22.0 | 0.42 | 111.95 |
| 1 | Fri, 3/12 | 2021 | Santa Clara | 5.1606 | 69.49 | 1138 | 13.47 | 221.0 | 16.0 | 0.31 | 111.81 |
| 2 | Sun, 2/7 | 2021 | Saratoga / Campbell | 5.8925 | 78.38 | 2270 | 13.30 | 385.0 | 29.0 | 0.55 | 126.11 |
| 3 | Fri, 1/8 | 2021 | Coyote Hills Geocaching | 4.9689 | 69.08 | 797 | 13.90 | 160.0 | 12.0 | 0.22 | 111.15 |
| 4 | Sun, 10/11 | 2020 | Los Altos Hills Paths | 5.8247 | 65.03 | 1870 | 11.16 | 321.0 | 29.0 | 0.54 | 104.63 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 474 | Sun, 6/23 | 2013 | Climb | 2.2747 | 24.30 | 2001 | 10.68 | 880.0 | 82.0 | 1.56 | 39.10 |
| 475 | Sat, 7/13 | 2013 | Doug's Event | 1.8653 | 21.35 | 1677 | 11.45 | 899.0 | 79.0 | 1.49 | 34.35 |
| 476 | Sun, 8/4 | 2013 | Kris's first trike ride | 1.8558 | 20.96 | 988 | 11.29 | 532.0 | 47.0 | 0.89 | 33.72 |
| 477 | Sun, 11/24 | 2013 | Alpine Rd | 1.7100 | 21.02 | 1289 | 12.29 | 754.0 | 61.0 | 1.16 | 33.82 |


# Reading Data: `segments`

I picked some representative climbing segments ([`bikesegments.csv`](bikesegments.csv)) with the segment length in miles and climb in feet, along with several of my times on the segment. A line like

    Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44
    
means that this segment of Old La Honda Rd is 2.98 miles long with 1255 feet of climbing, and that I've selected three of my times on that segment: the fastest, middle, and slowest of the times that Strava shows. (However, I ended up dropping the slowest time in the charts to make them less busy.)

In [5]:
def parse_segments(lines):
    """Parse segments into rides. Each ride is a tuple of:
    (segment_title, time,  miles, feet_climb)."""
    for segment in lines:
        # Keep the title, miles, feet, and only the first two times (the slowest is dropped)
        title, mi, ft, *times = segment.split(',')[:5]
        for time in times:
            yield title, parse_hours(time), float(mi), parse_int(ft)

In [15]:
segments = add_derived_columns(pd.DataFrame(
               parse_segments(open('bikesegments.csv')), 
               columns='title	hours	miles	feet'.split()))
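
To make the segment format concrete, here is the Old La Honda line from above split by hand. This is a sketch in plain Python, not the notebook's own parser; the helper `mmss_to_hours` is a hypothetical name, and it assumes every time is given as `mm:ss`.

```python
def mmss_to_hours(time: str) -> float:
    """Convert 'mm:ss' to hours (hypothetical helper; assumes no hours field)."""
    minutes, seconds = time.strip().split(':')
    return round((int(minutes) + int(seconds) / 60) / 60, 4)

segment = 'Old La Honda, 2.98, 1255, 28:49, 34:03, 36:44'

# The first five fields are title, miles, feet, and two times; the slice drops the slowest.
title, mi, ft, *times = [field.strip() for field in segment.split(',')[:5]]

rows = [(title, mmss_to_hours(t), float(mi), int(ft)) for t in times]
grade_pct = round(int(ft) / float(mi) * 100 / 5280, 2)   # same formula as the pct column
for row in rows:
    print(row)
print(grade_pct)
```

Each time on a segment becomes its own row, so one line of the file yields two rides with the same distance and climb.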

# Reading Data: `places`

Each month, I take my [summary data from wandrer.earth](https://wandrer.earth/athletes/3534/santa-clara-county-california) and enter it in the file [bikeplaces.txt](bikeplaces.txt), in a format where

      Cupertino: 172: 22.1 23.9 26.2*3 26.3 | 26.4
      
means that Cupertino has 172 miles of roads; that in the first month I kept track, I had ridden 22.1% of them; that in the most recent month I had ridden 26.4%; and that, as `26.2*3` indicates, I stayed at 26.2% for 3 months in a row. The `|` marks the end of a year. A line that starts with `#` is a comment, and a line that starts with `:` is the title of a category for the entries that follow it.
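
The repeat notation is easiest to see on the sample line itself. The following is a minimal standalone sketch using only the standard library; it handles the `*` repeats and the `|` year marker, and nothing else.

```python
import re

line = 'Cupertino: 172: 22.1 23.9 26.2*3 26.3 | 26.4'

# The year marker carries no data, so turn it into whitespace before splitting on ':'.
place, miles, pcts = line.replace('|', ' ').split(':')

# ' 26.2*3' becomes ' 26.2 26.2 26.2': the captured ' 26.2' is repeated 3 times.
expanded = re.sub(r'( [0-9.]+)[*]([0-9]+)',
                  lambda m: m.group(1) * int(m.group(2)), pcts)

percents = [float(p) for p in expanded.split()]
print(place, float(miles), percents)
```

The result has one percentage per month, so its length is the number of months tracked (seven here).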

In [8]:
class Month(int):
    """An integer in the form: 12 * year + month."""
    def __str__(self): return f'{(self - 1) // 12}-{(self % 12) or 12:02d}'

start   = Month(2020 * 12 + 7) # Starting month: July 2020
bonuses = (25, 90, 99)         # Percents that earn important bonuses

Entry = Tuple[str, float, List[float]] # (Place_Name, miles_of_roads, [pct_by_month,...])

def wandrer(category, entries, start=start):
    """Plot Wandrer.earth data."""
    fig, ax = plt.figure(), plt.subplot(111); plt.plot()
    for (place, miles, pcts), marker in zip(entries, '^v><osdhxDHPX*1234'):
        N = len(pcts)
        dates = [Month(start + i) for i in range(N)]
        X = [dates[i] for i in range(N) if pcts[i]]
        Y = [pcts[i]  for i in range(N) if pcts[i]]
        ax.plot(X, Y, ':', marker=marker, label=label(pcts, place, miles))
    all_pcts = [p for _, _, pcts in entries for p in pcts if p]
    for p in bonuses: 
        if min(all_pcts) < p < max(all_pcts):
            ax.plot(dates, [p] * N, 'k:', lw=1, alpha=3/4) # Plot bonus line
    ax.legend(loc='center left', bbox_to_anchor=(1, 0.5), shadow=True,
              prop=matplotlib.font_manager.FontProperties(family='monospace'))
    plt.xticks(dates, [str(d) for d in dates], rotation=90)
    plt.ylabel('Percent of Area Ridden')
    plt.title(category); plt.tight_layout(); grid(axis='y'); plt.show()
    
def label(pcts, place, miles) -> str:
    pct = f'{rounded(pcts[-1]):>3}' if pcts[-1] > 1 else f'{pcts[-1]}'
    done = miles * pcts[-1]
    bonus = next((f' {rounded((p - pcts[-1]) / 100 * miles):>3} to {p}%' 
                  for p in bonuses if p >= pcts[-1]), '')
    return f'{pct}% ({rounded(done / 100):>3}/{rounded(miles):>3} mi){bonus} {place}'
    
def parse_places(lines) -> Dict[str, List[Entry]]:
    "Parse bikeplaces.txt into a dict of {'Title': [entry,...]}"
    places = {}
    title = None  # Title of the current category; set by a line that starts with ':'
    for line in lines:
        line = line.strip()
        if line.startswith('#') or not line: 
            pass
        elif line.startswith(':'):
            title = line.strip(':')
            places[title] = []
        else:
            places[title].append(parse_entry(line))
            places[title].sort(key=lambda entry: -entry[-1][-1])
    return places
    
def parse_entry(line: str) -> Entry:
    """Parse line => ('Place Name', miles, [percents]); '=' repeats the previous percent."""
    if line.count(':') != 2:
        print('bad', line)
    place, miles, pcts = line.replace('|', ' ').split(':')
    pcts = re.sub('( [0-9.]+)[*]([0-9]+)', lambda m: m.group(1) * int(m.group(2)),
                  pcts).split()
    for i, p in enumerate(pcts):
        pcts[i] = pcts[i - 1] if p == '=' else 100 if p == '100' else float(p)
    return place, float(miles), pcts 
                   
def rounded(x: float) -> str: return f'{round(x):,d}' if x > 10 else f'{x:.1f}'

def wandering(places: dict):
    "Plot charts of unique roads ridden in various places."
    for category in places:
        wandrer(category, places[category])
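
The `Month` encoding packs a year and month into one integer so that adding 1 always moves to the next month. The sketch below mirrors the formatting arithmetic in a standalone function; `month_label` is a hypothetical name used only for illustration.

```python
def month_label(n: int) -> str:
    """Format n = 12 * year + month as 'YYYY-MM' (hypothetical helper)."""
    # December has n % 12 == 0, so it is shown as 12, and (n - 1) // 12
    # keeps it in its own year instead of rolling over to the next one.
    return f'{(n - 1) // 12}-{(n % 12) or 12:02d}'

start = 2020 * 12 + 7                              # July 2020
labels = [month_label(start + i) for i in range(7)]
print(labels)
```

Counting seven months from July 2020 ends in January 2021, with December correctly labeled `2020-12`.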

In [9]:
places = parse_places(open('bikeplaces.txt'))
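
The chart legend reports how many more miles are needed to reach the next bonus level. Here is that arithmetic as a standalone sketch, using the Cupertino figures from the example line; `miles_to_next_bonus` is a hypothetical name, not a function from this notebook.

```python
bonuses = (25, 90, 99)   # bonus thresholds, in percent

def miles_to_next_bonus(miles: float, pct: float):
    """Return (bonus_percent, miles_still_needed) for the first bonus at or
    above pct, or None if every bonus has been passed. Hypothetical helper."""
    for p in bonuses:
        if p >= pct:
            return p, (p - pct) / 100 * miles
    return None

ridden = 172 * 26.4 / 100                      # miles of Cupertino roads ridden so far
target, needed = miles_to_next_bonus(172, 26.4)
print(round(ridden, 1), target, round(needed, 1))
```

With 26.4% of 172 miles ridden, the 25% bonus is already earned, so the next target is 90%, about 109 miles away.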

# Eddington Number

In [17]:
def Ed_number(distances) -> int:
    """Eddington number: The maximum integer e such that you have bicycled 
    a distance of at least e on at least e days."""
    distances = sorted(distances, reverse=True)
    # default=0 covers the case where no ride is long enough to count
    return max((e for e, d in enumerate(distances, 1) if d >= e), default=0)

def Ed_gap(distances, target) -> int:
    """The number of additional rides of at least `target` needed to reach
    an Eddington number of `target`."""
    return target - sum(distances >= target)

def Ed_progress(years=range(2013, 2022), rides=rides) -> pd.DataFrame:
    """A table of Eddington numbers by year, and a plot."""
    def Ed(year, d): return Ed_number(rides[rides['year'] <= year][d])
    data  = [(y, Ed(y, 'kms'), Ed(y, 'miles')) for y in years]
    frame = pd.DataFrame(data, columns=['year', 'Ed_km', 'Ed_mi'])
    frame.plot('year', ['Ed_km', 'Ed_mi'], style='o:',
               title='Eddington Numbers in kms and miles')
    grid(axis='y')
    return frame
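
A small worked example may help with the definition. The distances below are made up for illustration, and the function names `eddington` and `rides_needed` are hypothetical; the sketch uses plain lists rather than a dataframe column, and counts rides of at least the target distance, following the "at least e" wording of the definition.

```python
def eddington(distances) -> int:
    """Largest e such that at least e of the distances are at least e."""
    ranked = sorted(distances, reverse=True)
    # After sorting, the e-th longest ride must itself be at least e.
    return max((e for e, d in enumerate(ranked, 1) if d >= e), default=0)

def rides_needed(distances, target: int) -> int:
    """Additional rides of at least `target` needed for an Eddington number of `target`."""
    return max(0, target - sum(d >= target for d in distances))

made_up = [10, 8, 5, 4, 3, 1]        # illustrative distances only
print(eddington(made_up))            # four rides are at least 4, but only three are at least 5
print(rides_needed(made_up, 5))
```

Here the Eddington number is 4, and two more rides of at least 5 would raise it to 5.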

# Plotting and Curve-Fitting

In [11]:
plt.rcParams["figure.figsize"] = (10, 6)

def show(X, Y, data, title='', degrees=(2, 3)): 
    """Plot X versus Y and a best fit curve to it, with some bells and whistles."""
    grid(); plt.ylabel(Y); plt.xlabel(X); plt.title(title)
    plt.scatter(X, Y, data=data, c='grey', marker='+')
    X1 = np.linspace(min(data[X]), max(data[X]), 100)
    for degree in degrees:
        F = poly_fit(data[X], data[Y], degree)
        plt.plot(X1, [F(x) for x in X1], '-')
    
def grid(axis='both'): 
    "Turn on the grid."
    plt.minorticks_on() 
    plt.grid(which='major', ls='-', alpha=3/4, axis=axis)
    plt.grid(which='minor', ls=':', alpha=1/2, axis=axis)
    
def poly_fit(X, Y, degree: int) -> callable:
    """The polynomial function that best fits the X,Y vectors."""
    coeffs = np.polyfit(X, Y, degree)[::-1]
    return lambda x: sum(c * x ** i for i, c in enumerate(coeffs)) 

estimator = poly_fit(rides['feet'] / rides['miles'], 
                   rides['miles'] / rides['hours'], 2)

def estimate(miles, feet, estimator=estimator) -> float:
    """Given a ride distance in miles and total climb in feet, estimate time in minutes."""
    return 60 * miles / estimator(feet / miles)

def sort(frame, field): return frame.sort_values(field, ascending=False)
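
To show how a fitted polynomial turns into a time estimate, here is a sketch in plain Python. The coefficients are hypothetical, chosen for round numbers rather than fitted to my rides. `np.polyfit` returns coefficients with the highest power first, which is why `poly_fit` reverses them.

```python
# Hypothetical model: speed_mph = 0.0002 * g**2 - 0.06 * g + 15,
# where g is the grade in feet per mile. Highest power first, as np.polyfit returns.
high_first = [0.0002, -0.06, 15.0]
coeffs = high_first[::-1]            # after reversing, coeffs[i] multiplies g ** i

def speed(g: float) -> float:
    """Evaluate the polynomial at grade g (feet per mile)."""
    return sum(c * g ** i for i, c in enumerate(coeffs))

def minutes(miles: float, feet: float) -> float:
    """Estimated ride time in minutes: distance divided by predicted speed."""
    return 60 * miles / speed(feet / miles)

print(speed(50))            # 30 miles with 1500 feet of climbing is 50 feet per mile
print(minutes(30, 1500))
```

Under these made-up coefficients, a grade of 50 feet per mile gives about 12.5 mph, so a 30-mile ride with 1500 feet of climbing is estimated at about 144 minutes.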