# Relative importance of boulder and lead towards combined score

The goal of this notebook is to explore whether one of the two combined climbing disciplines, boulder or lead, tends to have an outsized impact on combined ranking.

We hypothesize that there is some advantage to being a strong lead climber over being a strong boulderer.

This analysis builds on the work already done by [Kuba Główka](https://twitter.com/kuba_glowka/status/1690468469320417282).

In [131]:
%matplotlib inline
import pandas as pd
import seaborn as sns
import scipy.stats  # kendalltau lives in the stats submodule
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

pd.set_option('display.max_columns', 500)
sns.set(font_scale=2)

In [5]:
# Import the data from all boulder & lead combined events in 2022 and 2023
# (the only years in which this scoring has been used)

from pathlib import Path
csvs = Path('data/combined').glob('*.csv')

# Read each CSV file into DataFrame
# This creates a list of dataframes
dfs = []
for filename in csvs:
    dfs.append(pd.read_csv(filename))

# Concat the dfs into one big df
alldf = pd.concat(dfs)
alldf.sample(5)

Unnamed: 0,event_id,year,location,category,stage,athlete_id,name,combined_rank,combined_score,boulder_rank,boulder_score,lead_rank,lead_score
19,1301,2023,Bern,women,semi-final,1750,CONDIE Kyra,20.0,45.5,13.0,33.4,21.0,12.1
5,1234,2022,Morioka,women,final,2381,TANII Natsuki,6.0,97.3,8.0,29.2,5.0,68.1
14,1301,2023,Bern,women,semi-final,620,AVEZOU Zélia,15.0,73.3,11.0,34.3,15.0,39.0
1,1290,2022,Laval,men,semi-final,1274,Scherz Stefan,2.0,140.1,4.0,56.0,1.0,84.1
0,1290,2022,Laval,women,final,455,Janicot Hélène,1.0,178.0,3.0,78.0,1.0,100.0


## Sanity check

As a first step, let's confirm Kuba's mean and standard deviation numbers for the Bern event, to check that the data is correct and that we're starting from the same place.

In [150]:
bern_event_id=1301
bern_event = alldf[alldf['event_id'] == bern_event_id]

grouped = bern_event.groupby(['stage', 'category']).agg({'boulder_score':['mean',lambda x: x.std(ddof=0)], 'lead_score':['mean',lambda x: x.std(ddof=0)]}).round(2)
d = {'<lambda_0>':'std'}
grouped1 = grouped.rename(columns = d)
grouped1

stage,category,boulder_score mean,boulder_score std,lead_score mean,lead_score std
final,men,84.025,10.67,61.55,14.15
final,women,52.0,22.21,69.9625,23.77
semi-final,men,62.95,14.6,58.995,22.77
semi-final,women,38.92381,20.0,48.490476,19.36


Okay, good: these numbers look the same.
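
The sanity check passes `ddof=0`, which gives the population standard deviation; pandas defaults to the sample standard deviation (`ddof=1`), which comes out larger on small fields. A minimal sketch of the difference, using made-up scores rather than the Bern data:

```python
import numpy as np
import pandas as pd

# Hypothetical scores for an eight-athlete final (not real results)
scores = pd.Series([99.6, 84.1, 80.5, 74.3, 69.0, 64.1, 59.4, 44.8])

sample_std = scores.std()            # pandas default: ddof=1
population_std = scores.std(ddof=0)  # what the sanity check uses

# NumPy's default is ddof=0, so it agrees with the population value
numpy_std = np.std(scores.to_numpy())

print(f"sample: {sample_std:.2f}, population: {population_std:.2f}, numpy: {numpy_std:.2f}")
```

With only eight athletes in a final the two conventions differ by a factor of sqrt(8/7), so it matters which one is used when comparing against someone else's numbers.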

## Further exploration

Kuba's analysis shows

1. In the Bern event, scores in lead rounds tended to have a higher standard deviation than in the corresponding boulder round. This suggests that there is greater separation between climbers during lead rounds.
2. Kuba also compares each climber's score in each discipline to the mean for that round and gives a "normalized %". Climbers with standout lead performances tended to do better in the results.

These points suggest that there is an advantage to being a stronger lead climber in the combined event -- or at least that that was the case in Bern.
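
One way to read the "normalized %" idea is as each score divided by the mean score of its round. The exact formula Kuba used is not reproduced here, so the definition below is an assumption, and the rows are made up for illustration:

```python
import pandas as pd

# Made-up rows in the same shape as the combined results table
rounds = pd.DataFrame({
    'stage': ['final'] * 4,
    'category': ['men'] * 4,
    'name': ['A', 'B', 'C', 'D'],
    'boulder_score': [80.0, 60.0, 40.0, 20.0],
    'lead_score': [100.0, 30.0, 20.0, 50.0],
})

# Assumed definition: score as a percentage of the round mean
round_keys = ['stage', 'category']
for discipline in ['boulder', 'lead']:
    col = f'{discipline}_score'
    round_mean = rounds.groupby(round_keys)[col].transform('mean')
    rounds[f'{discipline}_pct_of_mean'] = (rounds[col] / round_mean * 100).round(1)

print(rounds[['name', 'boulder_pct_of_mean', 'lead_pct_of_mean']])
```

On the real data the grouping keys would also need `event_id`, since stage and category repeat across events.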

I will expand on this analysis in the following ways.

1. Answer the question -- across all events using this combined format (albeit there have not been many), do lead scores tend to have a higher standard deviation than those of the corresponding boulder round?
2. Explore novel approaches to quantifying how each discipline contributes to the final ranking.
3. Explore the correlation between standard deviation and the relative importance of a discipline to the combined score.
4. Explore whether, even accounting for a difference in standard deviation, the lead score is more important to the final combined ranking. This would suggest that beyond just the scoring, something about the lead event (e.g. being held second) carries more weight in the final ranking.

### Do lead rounds tend to have higher standard deviation than boulder rounds?

In [226]:
# First some clean up, drop any DNS rows
# alldf[alldf.isna().any(axis=1)] -- 3 rows with DNS for some portion
# .copy() gives an independent frame, so the column assignment below
# does not trigger a SettingWithCopyWarning
cleandf = alldf.dropna().copy()

In [228]:
cols = cleandf.columns[cleandf.columns.str.endswith('_score')]
cleandf[cols] = cleandf[cols].apply(pd.to_numeric)
cleandf.dtypes


event_id            int64
year                int64
location           object
category           object
stage              object
athlete_id          int64
name               object
combined_rank     float64
combined_score    float64
boulder_rank      float64
boulder_score     float64
lead_rank         float64
lead_score        float64
dtype: object

In [248]:
grouped = cleandf.groupby(['event_id','stage', 'category'])
result = grouped.apply(lambda df: pd.Series({
    'boulder_std': round(df['boulder_score'].std(ddof=0),2),
    'lead_std': round(df['lead_score'].std(ddof=0),2)
})).reset_index()
result['lead_std_higher'] = result['lead_std'] - result['boulder_std'] >= 0
result

Unnamed: 0,event_id,stage,category,boulder_std,lead_std,lead_std_higher
0,1234,final,men,17.32,13.22,False
1,1234,final,women,24.89,21.68,False
2,1234,semi-final,men,22.11,12.34,False
3,1234,semi-final,women,23.76,17.68,False
4,1247,final,men,18.62,36.0,True
5,1247,final,women,20.47,25.47,True
6,1287,final,men,27.22,24.07,False
7,1287,final,women,15.77,12.57,False
8,1287,semi-final,men,20.31,24.76,True
9,1287,semi-final,women,26.11,29.47,True


In [231]:
result['lead_std_higher'].value_counts()

lead_std_higher
False    9
True     9
Name: count, dtype: int64

Answer: no. Across the events held so far, lead had the higher standard deviation in 9 rounds and boulder in the other 9.

But is the lead advantage confined only to events where the lead standard deviation is higher? Is there another way to quantify the impact of a discipline on the combined ranking for that round?

### Quantifying discipline impact on rank

Rather than comparing each athlete's score in a round to the mean for that round, we can look at how well the ranking from a single discipline predicts the final rank. For example, if we rank athletes on the boulder round alone, how close is that ranking to the final combined ranking?

In [232]:
# There are various algorithms to compare ranked lists
# Let's start with Kendall's tau, an unweighted rank correlation measure
# This measure gives as much weight to ranks at the top as the bottom of the list
# which may not be right for this case.

# First, let's try it on a single event
bern_men_final = cleandf[(cleandf['event_id'] == bern_event_id) & (cleandf['stage'] == 'final') & (cleandf['category'] == 'men')]
# Sort explicitly rather than relying on the rows already being in combined-rank order
combined_ranked_athletes = bern_men_final.sort_values(by='combined_rank')['athlete_id'].to_numpy()
boulder_ranked_athletes = bern_men_final.sort_values(by='boulder_rank')['athlete_id'].to_numpy()
lead_ranked_athletes = bern_men_final.sort_values(by='lead_rank')['athlete_id'].to_numpy()

boulder_rank_corr_k = scipy.stats.kendalltau(combined_ranked_athletes, boulder_ranked_athletes)
lead_rank_corr_k = scipy.stats.kendalltau(combined_ranked_athletes, lead_ranked_athletes)
print(f"{boulder_rank_corr_k}, {lead_rank_corr_k}")

SignificanceResult(statistic=0.14285714285714285, pvalue=0.7195436507936508), SignificanceResult(statistic=0.5714285714285714, pvalue=0.06101190476190476)


In [233]:
print(combined_ranked_athletes)
print(boulder_ranked_athletes)
print(lead_ranked_athletes)

[ 1214  1929  2276 13040 11490  1364  2101   547]
[ 2276  1214 13040  1929  1364 11490  2101   547]
[ 1214  1929 13040 11490  1364  2101  2276   547]


In [234]:
bern_men_final[['athlete_id','name','combined_rank','boulder_rank','lead_rank']]

Unnamed: 0,athlete_id,name,combined_rank,boulder_rank,lead_rank
0,1214,SCHUBERT Jakob,1.0,2.0,1.0
1,1929,DUFFY Colin,2.0,4.0,2.0
2,2276,NARASAKI Tomoa,3.0,1.0,7.0
3,13040,ANRAKU Sorato,4.0,3.0,3.0
4,11490,ROBERTS Toby,5.0,6.0,3.0
5,1364,ONDRA Adam,6.0,5.0,5.0
6,2101,LEE Dohyun,7.0,7.0,5.0
7,547,JENFT Paul,8.0,8.0,8.0
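
Part of the reason the Kendall result feels odd may be that `kendalltau` was given arrays of athlete IDs, so it compares the ID numbers position by position rather than the ranks themselves. A sketch of the alternative, passing the rank columns from the table above directly (SciPy's default tau-b variant accounts for the tied lead ranks):

```python
import pandas as pd
from scipy import stats

# Ranks copied from the Bern men's final table above
bern_men_final_ranks = pd.DataFrame({
    'athlete_id': [1214, 1929, 2276, 13040, 11490, 1364, 2101, 547],
    'combined_rank': [1, 2, 3, 4, 5, 6, 7, 8],
    'boulder_rank': [2, 4, 1, 3, 6, 5, 7, 8],
    'lead_rank': [1, 2, 7, 3, 3, 5, 5, 8],
})

# kendalltau returns (statistic, pvalue); unpack so this works across SciPy versions
boulder_tau, boulder_p = stats.kendalltau(
    bern_men_final_ranks['combined_rank'], bern_men_final_ranks['boulder_rank'])
lead_tau, lead_p = stats.kendalltau(
    bern_men_final_ranks['combined_rank'], bern_men_final_ranks['lead_rank'])

print(f"boulder tau: {boulder_tau:.3f}, lead tau: {lead_tau:.3f}")
```

With only eight athletes the p-values are of limited use, so the statistic is best read as a descriptive measure here.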


In [235]:
# Intuitively the result above feels a little weird
# We can turn instead to a different algorithm: Rank-biased Overlap (RBO)
import rbo

boulder_rank_corr = rbo.RankingSimilarity(combined_ranked_athletes, boulder_ranked_athletes).rbo()
lead_rank_corr = rbo.RankingSimilarity(combined_ranked_athletes, lead_ranked_athletes).rbo()

print(f"boulder_rank_corr: {boulder_rank_corr:.3f}, lead_rank_corr: {lead_rank_corr:.3f}")

# The RBO result feels a little more sensible, both boulder and lead
# rounds for this event are fairly similar to the final result,
# but lead is somewhat more similar

boulder_rank_corr: 0.746, lead_rank_corr: 0.881
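
To make the similarity measure concrete: the values printed above are consistent with a plain average overlap, where the shared fraction of the two top-d lists is computed for each depth d and all depths are then weighted equally. That reading is inferred from the numbers rather than from the `rbo` package's documentation, and `average_overlap` below is a hypothetical helper, not part of that package:

```python
def average_overlap(ranking_a, ranking_b):
    """Mean, over every depth d, of the fraction of items shared by the two top-d lists."""
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    agreements = []
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreements.append(len(seen_a & seen_b) / d)
    return sum(agreements) / depth

# Athlete orderings copied from the printed arrays for the Bern men's final
combined = [1214, 1929, 2276, 13040, 11490, 1364, 2101, 547]
boulder = [2276, 1214, 13040, 1929, 1364, 11490, 2101, 547]
lead = [1214, 1929, 13040, 11490, 1364, 2101, 2276, 547]

print(f"boulder: {average_overlap(combined, boulder):.3f}, "
      f"lead: {average_overlap(combined, lead):.3f}")
# boulder: 0.746, lead: 0.881
```

Because the first few positions are part of every later prefix, a disagreement near the top of the list costs more than one near the bottom, which is the top-weighted behaviour this comparison is after.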


In [251]:
# Now let's calculate this similarity score for every round for every event

def ranked_athletes(df, by='combined_rank'):
    return df.sort_values(by=by)['athlete_id'].to_numpy()
    
result2 = grouped.apply(lambda df: pd.Series({
    'boulder_rank_similarity': round(rbo.RankingSimilarity(ranked_athletes(df), ranked_athletes(df, by='boulder_rank')).rbo(), 3), 
    'lead_rank_similarity': round(rbo.RankingSimilarity(ranked_athletes(df), ranked_athletes(df, by='lead_rank')).rbo(), 3),
}))

result = result.set_index(['event_id','stage','category']).join(result2)
result

event_id,stage,category,boulder_std,lead_std,lead_std_higher,boulder_rank_similarity,lead_rank_similarity
1234,final,men,17.32,13.22,False,0.694,0.707
1234,final,women,24.89,21.68,False,0.763,0.975
1234,semi-final,men,22.11,12.34,False,0.861,0.649
1234,semi-final,women,23.76,17.68,False,0.819,0.849
1247,final,men,18.62,36.0,True,0.599,0.958
1247,final,women,20.47,25.47,True,0.895,0.948
1287,final,men,27.22,24.07,False,0.694,0.844
1287,final,women,15.77,12.57,False,0.677,0.88
1287,semi-final,men,20.31,24.76,True,0.754,0.884
1287,semi-final,women,26.11,29.47,True,0.819,0.869


In [252]:
result['lead_rc_higher'] = result['lead_rank_similarity'] - result['boulder_rank_similarity'] >= 0

In [253]:
result['lead_rc_higher'].value_counts()

lead_rc_higher
True     13
False     5
Name: count, dtype: int64

In [254]:
result[['boulder_rank_similarity', 'lead_rank_similarity']].describe()

Unnamed: 0,boulder_rank_similarity,lead_rank_similarity
count,18.0,18.0
mean,0.764667,0.838889
std,0.072877,0.090489
min,0.599,0.649
25%,0.71225,0.78175
50%,0.766,0.862
75%,0.8185,0.88325
max,0.895,0.975


In [255]:
result[['boulder_rank_similarity', 'lead_rank_similarity']].median()

boulder_rank_similarity    0.766
lead_rank_similarity       0.862
dtype: float64

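These results show that across combined-format comps, the lead ranking tends to be more similar to the final combined ranking than the boulder ranking is: lead similarity is higher in 13 of 18 rounds, with a median of 0.862 against 0.766 for boulder.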

### Does this round rank correlation score correlate with a round having a higher std dev?

In [256]:
#result['lead_std_higher'] = result['lead_std'] - result['boulder_std'] >= 0
pd.crosstab(result.lead_rc_higher,result.lead_std_higher, normalize='columns')

lead_rc_higher,lead_std_higher=False,lead_std_higher=True
False,0.333333,0.222222
True,0.666667,0.777778


Standard deviation and rank similarity don't look strongly related. Lead is the more similar discipline in 78% of the rounds where lead has the higher standard deviation, but also in 67% of the rounds where it does not. So even when the lead standard deviation is lower than that of the boulder round, the lead score still appears to matter more to the final ranking according to this metric (i.e. the lead-only ranking is more similar to the final ranking).

This could be because standard deviation hides some features of the boulder score distribution. Lead scores are continuous, but boulder scores are not. We could imagine a boulder round with high standard deviation where scores are nonetheless clumped e.g. around 2 tops and around 4 tops and athletes are not actually well separated.
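
A small made-up example of that clumping effect: the first set of scores sits in two tight clusters, which is how a field split between roughly 2 tops and 4 tops might look, while the second is evenly spread. The numbers are invented for illustration and are not taken from any event:

```python
import numpy as np

# Invented boulder-style scores: two tight clusters
clumped = np.array([49.7, 49.8, 49.9, 50.0, 99.7, 99.8, 99.9, 100.0])
# Invented lead-style scores: evenly spread
spread = np.linspace(30.0, 100.0, 8)

def median_gap(scores):
    """Median difference between neighbouring scores once sorted."""
    return float(np.median(np.diff(np.sort(scores))))

# ndarray.std() uses ddof=0, matching the rest of the notebook
for label, scores in [('clumped', clumped), ('spread', spread)]:
    print(f"{label}: std={scores.std():.1f}, median gap={median_gap(scores):.1f}")
```

The clustered set has the larger standard deviation, yet most neighbouring athletes in it are separated by only a tenth of a point.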

Another possibility is that there is some ordering effect. The lead round goes last, and so from a psychological perspective, might confer some advantage to lead climbers.

One note -- these rounds are not independent of each other, and it does look like the setting team for a particular comp has an impact: if one round at an event has a higher standard deviation in boulder than in lead, the other rounds at that event tend to as well.

In [257]:
result

event_id,stage,category,boulder_std,lead_std,lead_std_higher,boulder_rank_similarity,lead_rank_similarity,lead_rc_higher
1234,final,men,17.32,13.22,False,0.694,0.707,True
1234,final,women,24.89,21.68,False,0.763,0.975,True
1234,semi-final,men,22.11,12.34,False,0.861,0.649,False
1234,semi-final,women,23.76,17.68,False,0.819,0.849,True
1247,final,men,18.62,36.0,True,0.599,0.958,True
1247,final,women,20.47,25.47,True,0.895,0.948,True
1287,final,men,27.22,24.07,False,0.694,0.844,True
1287,final,women,15.77,12.57,False,0.677,0.88,True
1287,semi-final,men,20.31,24.76,True,0.754,0.884,True
1287,semi-final,women,26.11,29.47,True,0.819,0.869,True


### Quantifying athlete specialty and its impact on rank

We previously explored using rank correlation to ascribe relative importance of a discipline to the combined score. Let's look at another way to quantify whether being a specialist in lead confers more of an advantage than being a specialist in boulder.

We can categorize or score athletes as lead or boulder specialists and look at how that impacts rankings.

Kuba's analysis did this by looking at how athletes performed in the individual discipline portion of the Bern event, but we can also use the world ranking points prior to the Bern competition to give athletes a lead and boulder score.

Caveat: these rankings are based on accumulated points which are subject to bias in terms of whether athletes are choosing to attend or skip events. For example, Janja is currently ranked #4 in bouldering, which intuitively feels wrong, but this is because she has not attended very many boulder world cups over the past two years.

In [258]:
# Read wr csvs into DataFrames
# This creates a list of dataframes

csvs = Path('data/wr').glob('*.csv')

wrdfs = []
for filename in csvs:
    wrdfs.append(pd.read_csv(filename))

for df in wrdfs:
    discipline = df.loc[0, 'discipline']
    df.rename(columns={"rank": f"{discipline}_rank", "score": f"{discipline}_score"}, inplace=True)

men_dfs = [df for df in wrdfs if df.loc[0, 'category'] == 'men']
women_dfs = [df for df in wrdfs if df.loc[0, 'category'] != 'men']

# Merge the dfs into one big df
w_wrdf = pd.concat([d.set_index('athlete_id').drop(['discipline', 'category'], axis=1) for d in women_dfs], axis=1).reset_index()
w_wrdf = w_wrdf.loc[:,~w_wrdf.columns.duplicated(keep='first')]

m_wrdf = pd.concat([d.set_index('athlete_id').drop(['discipline', 'category'], axis=1) for d in men_dfs], axis=1).reset_index()
m_wrdf = m_wrdf.loc[:,~m_wrdf.columns.duplicated(keep='first')]

wrdf = pd.concat([w_wrdf, m_wrdf])
wrdf.sample(10)

Unnamed: 0,athlete_id,name,boulder_rank,boulder_score,lead_rank,lead_score,combined_rank,combined_score
217,13854,,,,121.0,17.0,,
156,2045,KIM Hong il,155.0,3.5,,,,
0,13040,ANRAKU Sorato,1.0,4570.0,1.0,4405.0,1.0,5325.0
181,519,,,,39.0,578.0,,
33,914,Tesio Giorgia,34.0,725.66,76.0,95.0,48.0,675.16
104,7164,XIANG HONGCHUN,105.0,16.0,,,,
52,1131,Vezonik Gregor,53.0,221.0,95.0,45.0,65.0,266.0
225,1990,,,,151.0,7.0,,
147,1095,Janse van Rensburg Mel,148.0,4.0,149.0,5.0,118.0,5.0
93,2214,Yau Ka-chun,94.0,28.83,112.0,27.0,86.0,71.83


In [263]:
# Throw out rows with NaN, we only want combined athletes
# .copy() so that the new columns added below go onto an independent frame
cleanwrdf = wrdf.dropna().copy()
cleanwrdf.head(10)

Unnamed: 0,athlete_id,name,boulder_rank,boulder_score,lead_rank,lead_score,combined_rank,combined_score
0,1803,Grossman Natalia,1.0,4617.5,15.0,1916.0,4.0,4915.0
1,11462,Bertone Oriane,2.0,4451.5,28.0,858.0,8.0,4005.0
2,1811,Raboutou Brooke,3.0,4370.0,7.0,3080.0,3.0,4980.0
3,1147,Garnbret Janja,4.0,4105.0,1.0,6220.0,1.0,6805.0
4,2294,Nonaka Miho,5.0,3215.0,18.0,1676.0,6.0,4300.0
5,4017,Kerem Ayala,6.0,2802.0,78.0,88.33,20.0,2110.33
6,2501,Mackenzie Oceania,7.0,2790.0,42.0,511.0,18.0,2210.0
7,13021,Sanders Anastasia,8.0,2285.0,30.0,790.66,13.0,2559.66
8,1226,Pilz Jessica,9.0,2218.0,4.0,4135.0,5.0,4445.0
9,2380,Mori Ai,10.0,2095.0,2.0,4805.0,2.0,5730.0


In [269]:
# Now let's give each athlete a normalized boulder and lead rank
# (ascending=False so that world rank 1 maps to the highest percentile)
cleanwrdf.loc[:,'boulder_rank_pct'] = cleanwrdf['boulder_rank'].rank(pct=True, ascending=False)
cleanwrdf.loc[:,'lead_rank_pct'] = cleanwrdf['lead_rank'].rank(pct=True, ascending=False)
cleanwrdf.head()

Unnamed: 0,athlete_id,name,boulder_rank,boulder_score,lead_rank,lead_score,combined_rank,combined_score,boulder_rank_pct,lead_rank_pct
0,1803,Grossman Natalia,1.0,4617.5,15.0,1916.0,4.0,4915.0,0.997758,0.876682
1,11462,Bertone Oriane,2.0,4451.5,28.0,858.0,8.0,4005.0,0.988789,0.782511
2,1811,Raboutou Brooke,3.0,4370.0,7.0,3080.0,3.0,4980.0,0.979821,0.943946
3,1147,Garnbret Janja,4.0,4105.0,1.0,6220.0,1.0,6805.0,0.970852,0.997758
4,2294,Nonaka Miho,5.0,3215.0,18.0,1676.0,6.0,4300.0,0.961883,0.849776
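
As a reminder of what `rank(pct=True, ascending=False)` does with world ranks, here is a toy example with invented values: the athlete with world rank 1 ends up with the highest percentile, and tied ranks share the average percentile.

```python
import pandas as pd

# Invented world ranks; two athletes share rank 3
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'lead_rank': [1.0, 2.0, 3.0, 3.0],
})

# ascending=False ranks the largest number lowest, so world rank 1
# receives the highest percentile
toy['lead_rank_pct'] = toy['lead_rank'].rank(pct=True, ascending=False)
print(toy)
```

The tie handling is consistent with the top-ranked athletes above showing 0.997758 rather than 1.0: the men's and women's tables are concatenated before ranking, so each world rank appears twice and is treated as a tie.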


Now let's look at podiums across these events and whether there is a skew in the percentile rank of athletes on the podium.

In [295]:
rankpcts = cleanwrdf[['athlete_id','boulder_rank_pct','lead_rank_pct']]
rankpct = cleandf.merge(rankpcts, on='athlete_id')
rankpct.head()

Unnamed: 0,event_id,year,location,category,stage,athlete_id,name,combined_rank,combined_score,boulder_rank,boulder_score,lead_rank,lead_score,boulder_rank_pct,lead_rank_pct
0,1247,2022,Munich,men,final,1214,Schubert Jakob,1.0,175.6,3.0,80.5,1.0,95.1,0.82287,0.979821
1,1301,2023,Bern,men,semi-final,1214,SCHUBERT Jakob,5.0,144.8,18.0,44.8,1.0,100.0,0.82287,0.979821
2,1301,2023,Bern,men,final,1214,SCHUBERT Jakob,1.0,183.6,2.0,99.6,1.0,84.0,0.82287,0.979821
3,1247,2022,Munich,men,final,1364,Ondra Adam,2.0,170.8,1.0,80.7,2.0,90.1,0.872197,0.876682
4,1301,2023,Bern,men,semi-final,1364,ONDRA Adam,6.0,144.1,12.0,64.1,5.0,80.0,0.872197,0.876682


In [298]:
podiumrankpct = rankpct[(rankpct.stage == 'final') & (rankpct.combined_rank <= 3.0)]

podiumrankpct[['boulder_rank_pct','lead_rank_pct']].median()

boulder_rank_pct    0.899103
lead_rank_pct       0.903587
dtype: float64

In [299]:
podiumrankpct[['boulder_rank_pct','lead_rank_pct']].describe()

Unnamed: 0,boulder_rank_pct,lead_rank_pct
count,29.0,29.0
mean,0.870651,0.85658
std,0.120394,0.150737
min,0.5,0.401345
25%,0.840807,0.775785
50%,0.899103,0.903587
75%,0.961883,0.979821
max,0.997758,0.997758


In [300]:
finalsrankpct = rankpct[(rankpct.stage == 'final')]

finalsrankpct[['boulder_rank_pct','lead_rank_pct']].median()

boulder_rank_pct    0.872197
lead_rank_pct       0.849776
dtype: float64

In [301]:
podiumrankpct[['name','boulder_rank_pct','lead_rank_pct']]

Unnamed: 0,name,boulder_rank_pct,lead_rank_pct
0,Schubert Jakob,0.82287,0.979821
2,SCHUBERT Jakob,0.82287,0.979821
3,Ondra Adam,0.872197,0.876682
6,Ginés López Alberto,0.778027,0.849776
16,SEO Chaehyun,0.881166,0.979821
18,SEO Chaehyun,0.881166,0.979821
21,Tanii Natsuki,0.5,0.921525
25,Ito Futaba,0.872197,0.79148
41,GARNBRET Janja,0.970852,0.997758
43,Garnbret Janja,0.970852,0.997758
