# Looking for the best Formula 1 season

For my master's project, I'm working on a piece that answers the question: **Which championship-winning team had the best Formula 1 season?**

To answer this question, I'll be checking three definitions of best:

1. most wins in a season
1. most podiums in a season
1. how close the performance was to perfect

To do this, I started with data provided by the [Ergast Developer API](https://ergast.com/mrd/). I noticed an error in the driver-constructor pairing for the 1950 season and wanted to verify things before moving forward. My original plan was to create a table of the driver-constructor pairs for each race and compare it with the data I had.

Instead I went straight to the source for F1 information, [formula1.com](https://formula1.com), and scraped the results of every race from 1950 to 2018. The further back I went, the more holes there were in how disqualifications and withdrawals were recorded (or, in this case, not recorded).

I have since pulled data from [statsf1.com](https://www.statsf1.com/), which is tabulated in an easy-to-understand way, is more complete than the formula1.com data, and doesn't have the issues of the Ergast data.

In [123]:
import pandas as pd
import numpy as np

In [2]:
race_results = pd.read_csv("../data/other/race_results_v3.csv")

In [3]:
race_results.head()

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h )
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s )
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s )
3,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,
4,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,


Quickly check for the races in this period (should be 997)

In [4]:
race_results.race_id.max()

997

When I scraped the results, I set the order of disqualified drivers to `-99`. Now we can replace that with a value that makes sense: if a driver is disqualified, their team gets the same credit as if they had finished last.

In [5]:
def update_order(row):
    # Find every row from the same race as this one.
    season = race_results[race_results.year == row.year]
    race = season[season["round"] == row["round"]]

    last_place = race.order.max()
    # Only needed by the disabled "ab" branch below.
    avg_retire = np.round(race[race.position == "ab"].order.mean())

    # Entries marked dsq, f, or np are scored as if they finished last.
    if row.position in ("dsq", "f", "np"):
        return last_place
#     elif (row.position == "ab"):
#         return avg_retire
    else:
        return row.order

In [6]:
race_results["p_final"] = race_results.apply(update_order, axis=1)

In [7]:
race_results.head(20)

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),1
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s ),2
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s ),3
3,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,,4
4,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,,5
5,1,1950,1,Britain,6,6,Bob GERARD,ERA,ERA ERA,,6
6,1,1950,1,Britain,7,7,Cuth HARRISON,ERA,ERA ERA,,7
7,1,1950,1,Britain,8,8,Philippe ETANCELIN,Talbot Lago,Talbot Lago Talbot,,8
8,1,1950,1,Britain,9,9,David HAMPSHIRE,Maserati,Maserati Maserati,,9
9,1,1950,1,Britain,10,10,Joe FRY,Maserati,Maserati Maserati,,10
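
The `update_order` function above re-filters the whole table once per row. The same rule could probably be written as a vectorized operation. This is a minimal sketch on a made-up race, assuming the same `position` and `order` columns and that `race_id` identifies a race as well as the year and round pair does:

```python
import pandas as pd

# Toy data: one race with a disqualified driver whose scraped order is -99.
toy = pd.DataFrame({
    "race_id":  [1, 1, 1, 1],
    "position": ["1", "2", "ab", "dsq"],
    "order":    [1, 2, 3, -99],
})

# Largest order value in each race, broadcast back to every row of that race.
last_place = toy.groupby("race_id")["order"].transform("max")

# Rows marked dsq, f, or np take the last place; all others keep their order.
moved = toy["position"].isin(["dsq", "f", "np"])
toy["p_final"] = toy["order"].where(~moved, last_place)
```

On the real data this would avoid filtering `race_results` once per row, but it has not been checked against the notebook's output.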


I will work with a slice of this `race_results` DataFrame that only includes each team in its championship-winning season. Let's make that slice now:

In [9]:
winning_teams.head(20)

Unnamed: 0,year,team
0,1950,Alfa Romeo
1,1951,Alfa Romeo
2,1952,Ferrari
3,1953,Ferrari
4,1954,Mercedes
5,1955,Mercedes
6,1956,Ferrari
7,1957,Maserati
8,1958,Ferrari
9,1959,Cooper


Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final,keep
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),1,both
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s ),2,both
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s ),3,both
3,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,,4,left_only
4,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,,5,left_only
5,1,1950,1,Britain,6,6,Bob GERARD,ERA,ERA ERA,,6,left_only
6,1,1950,1,Britain,7,7,Cuth HARRISON,ERA,ERA ERA,,7,left_only
7,1,1950,1,Britain,8,8,Philippe ETANCELIN,Talbot Lago,Talbot Lago Talbot,,8,left_only
8,1,1950,1,Britain,9,9,David HAMPSHIRE,Maserati,Maserati Maserati,,9,left_only
9,1,1950,1,Britain,10,10,Joe FRY,Maserati,Maserati Maserati,,10,left_only


Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final,keep
9649,358,1982,1,South Africa,2,2,Carlos REUTEMANN,Williams,Williams Ford Cosworth,1h 32m 23.347s ( +14.946s ),2,both
9652,358,1982,1,South Africa,5,5,Keke ROSBERG,Williams,Williams Ford Cosworth,1h 32m 54.540s ( +46.139s ),5,both
9680,359,1982,2,Brazil,dsq,2,Keke ROSBERG,Williams,Williams Ford Cosworth,Weight infringement 1h 44m 05.737s,31,both
9697,359,1982,2,Brazil,ab,19,Carlos REUTEMANN,Williams,Williams Ford Cosworth,Collision,19,both
9711,360,1982,3,USA West,2,2,Keke ROSBERG,Williams,Williams Ford Cosworth,1h 58m 39.978s ( +14.660s ),2,both
9729,360,1982,3,USA West,ab,20,Mario ANDRETTI,Williams,Williams Ford Cosworth,Suspension,20,both
9756,362,1982,5,Belgium,2,2,Keke ROSBERG,Williams,Williams Ford Cosworth,1h 35m 49.263s ( +07.268s ),2,both
9765,362,1982,5,Belgium,ab,11,Derek DALY,Williams,Williams Ford Cosworth,Accident,11,both
9792,363,1982,6,Monaco,6,6,Derek DALY,Williams,Williams Ford Cosworth,Accident,6,both
9797,363,1982,6,Monaco,ab,11,Keke ROSBERG,Williams,Williams Ford Cosworth,Suspension,11,both
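
The cell that builds the slice isn't shown, but the `keep` column in the tables above has the values that a pandas merge produces with an indicator column. Here is a sketch of that approach on made-up rows, assuming `winning_teams` has the `year` and `team` columns displayed earlier (the `flagged` and `results` names are illustrative):

```python
import pandas as pd

race_results = pd.DataFrame({
    "year":   [1950, 1950, 1982, 1982],
    "team":   ["Alfa Romeo", "Talbot Lago", "Williams", "Ferrari"],
    "driver": ["Driver A", "Driver B", "Driver C", "Driver D"],
})
winning_teams = pd.DataFrame({
    "year": [1950, 1982],
    "team": ["Alfa Romeo", "Williams"],
})

# indicator="keep" adds a column recording where each row was found:
# "both" means the (year, team) pair is one of the winning runs,
# "left_only" means it only appears in race_results.
flagged = race_results.merge(winning_teams, on=["year", "team"],
                             how="left", indicator="keep")

# Keep only the rows that belong to a winning run.
results = flagged[flagged["keep"] == "both"].drop(columns="keep")
```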


---

## Method 01: Wins

Let's compare championship seasons by how many wins each team got in their season.

We can look for wins by doing one of two things:

* pick all rows where `order == 1`
* pick all rows where `position == "1"`

In terms of wins, there were three races where two drivers shared first: 1951 French GP (Alfa Romeo), 1956 Argentine GP (Ferrari), and 1957 British GP (Vanwall).

For this analysis I care more that the constructor/team finished first than I do about it being a shared drive. By selecting rows using the position column, I also don't have to worry about shared drives.
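
To illustrate the difference between the two selections, here is a small sketch on made-up rows. It assumes shared drives are stored the way the scraped data appears to store them: one row with the numeric position and extra rows with position `"&"` and the same `order`.

```python
import pandas as pd

# Race 1 is a shared win (two rows with order 1); Race 2 is a normal win.
rows = pd.DataFrame({
    "race_name": ["Race 1", "Race 1", "Race 1", "Race 2"],
    "position":  ["1", "&", "2", "1"],
    "order":     [1, 1, 2, 1],
    "driver":    ["Driver A", "Driver B", "Driver C", "Driver B"],
    "team":      ["Team X"] * 4,
})

by_order = rows[rows["order"] == 1]          # includes both halves of the shared win
by_position = rows[rows["position"] == "1"]  # one row per race won
```

Counting `by_order` would credit the team with three wins from two races, while `by_position` gives two.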


In [17]:
wins.head(12)

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),1
25,2,1950,2,Monaco,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,3h 13m 18.7s ( 98.701 km/h ),1
127,4,1950,4,Switzerland,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 02m 53.7s ( 149.279 km/h ),1
150,5,1950,5,Belgium,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 47m 26s ( 177.097 km/h ),1
164,6,1950,6,France,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 57m 52.8s ( 168.729 km/h ),1
188,7,1950,7,Italy,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 51m 17.4s ( 176.543 km/h ),1
222,8,1951,1,Switzerland,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 07m 53.64s ( 143.444 km/h ),1
309,10,1951,3,Belgium,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 45m 46.2s ( 183.985 km/h ),1
325,11,1951,4,France,1,1,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,,1
433,15,1951,8,Spain,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 46m 54.10s ( 158.939 km/h ),1


In [18]:
wins.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 502 entries, 0 to 25185
Data columns (total 11 columns):
race_id             502 non-null int64
year                502 non-null int64
round               502 non-null int64
race_name           502 non-null object
position            502 non-null object
order               502 non-null int64
driver              502 non-null object
team                502 non-null object
constructor_long    502 non-null object
extra               502 non-null object
p_final             502 non-null int64
dtypes: int64(5), object(6)
memory usage: 47.1+ KB


Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
10028,371,1982,14,Switzerland,1,1,Keke ROSBERG,Williams,Williams Ford Cosworth,1h 32m 41.087s ( 196.796 km/h ),1


Now that we have the wins, we can group and start counting.

In [21]:
win_count = wins_grouped.order.count().rename("wins")

In [22]:
win_count.sort_values(ascending=False).head(10)

year  team    
2016  Mercedes    19
2015  Mercedes    16
2014  Mercedes    16
2002  Ferrari     15
1988  McLaren     15
2004  Ferrari     15
2013  Red Bull    13
1996  Williams    12
2017  Mercedes    12
2011  Red Bull    12
Name: wins, dtype: int64
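
`wins_grouped` is defined in a cell that isn't shown. Here is a sketch of one plausible way to build it, assuming a group-by on `year` and `team`, which is what the index of the output above suggests (the toy rows and the `win_table` name are made up):

```python
import pandas as pd

wins = pd.DataFrame({
    "year":  [1950, 1950, 1951, 1952],
    "team":  ["Alfa Romeo", "Alfa Romeo", "Alfa Romeo", "Ferrari"],
    "order": [1, 1, 1, 1],
})

# One group per championship run; counting rows counts wins.
wins_grouped = wins.groupby(["year", "team"])
win_count = wins_grouped["order"].count().rename("wins")

# Flatten the (year, team) index into ordinary columns.
win_table = win_count.to_frame().reset_index()
```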

We can turn this Series into a DataFrame for what we'll be doing with it later.

In [23]:
win_count = win_count.to_frame().reset_index()

In [24]:
win_count.sort_values(by="wins",ascending=False).head(10)

Unnamed: 0,year,team,wins
63,2016,Mercedes,19
62,2015,Mercedes,16
61,2014,Mercedes,16
52,2002,Ferrari,15
38,1988,McLaren,15
54,2004,Ferrari,15
60,2013,Red Bull,13
58,2011,Red Bull,12
46,1996,Williams,12
34,1984,McLaren,12


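Things match up. (The runs tied on 12 wins can come out in a different order, since the sort doesn't guarantee how ties are ordered, which is why the last rows differ.)

To compare seasons more fairly, we should also normalize by the number of races in each season. We'll compute the percentage of races won in each season.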

In [25]:
def races_in_season(row):
    season = results[results.year == int(row.year)]
    return season["round"].max()

def get_win_percentage(row):
    w = float(row.wins)
    total = float(row.races)
    return (w/total)*100

In [26]:
win_analysis = win_count.copy()

In [27]:
win_analysis["races"] = win_count.apply(races_in_season, axis=1)

In [28]:
win_analysis.head(10)

Unnamed: 0,year,team,wins,races
0,1950,Alfa Romeo,6,7
1,1951,Alfa Romeo,4,8
2,1952,Ferrari,7,8
3,1953,Ferrari,7,9
4,1954,Mercedes,4,9
5,1955,Mercedes,5,7
6,1956,Ferrari,5,8
7,1957,Maserati,4,8
8,1958,Ferrari,2,11
9,1959,Cooper,5,9


In [29]:
win_analysis["win_percentage"] = win_analysis.apply(get_win_percentage, axis=1)

In [30]:
win_analysis.sort_values(by="wins", ascending=False).head(10)

Unnamed: 0,year,team,wins,races,win_percentage
63,2016,Mercedes,19,21,90.47619
62,2015,Mercedes,16,19,84.210526
61,2014,Mercedes,16,19,84.210526
52,2002,Ferrari,15,17,88.235294
38,1988,McLaren,15,16,93.75
54,2004,Ferrari,15,18,83.333333
60,2013,Red Bull,13,19,68.421053
58,2011,Red Bull,12,19,63.157895
46,1996,Williams,12,16,75.0
34,1984,McLaren,12,16,75.0


In [31]:
win_analysis.sort_values(by="win_percentage", ascending=False).head(10)

Unnamed: 0,year,team,wins,races,win_percentage
38,1988,McLaren,15,16,93.75
63,2016,Mercedes,19,21,90.47619
52,2002,Ferrari,15,17,88.235294
2,1952,Ferrari,7,8,87.5
0,1950,Alfa Romeo,6,7,85.714286
62,2015,Mercedes,16,19,84.210526
61,2014,Mercedes,16,19,84.210526
54,2004,Ferrari,15,18,83.333333
3,1953,Ferrari,7,9,77.777778
34,1984,McLaren,12,16,75.0


McLaren's 1988 run is about 3.3 percentage points better than Mercedes's 2016 run.
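
The row-wise helpers above can also be written as plain column arithmetic. Here is a small sketch using three rows copied from the table above (the `top` and `gap` names are illustrative):

```python
import pandas as pd

win_analysis = pd.DataFrame({
    "year":  [1988, 2016, 2002],
    "team":  ["McLaren", "Mercedes", "Ferrari"],
    "wins":  [15, 19, 15],
    "races": [16, 21, 17],
})

# Share of the season's races won, as a percentage.
win_analysis["win_percentage"] = win_analysis["wins"] / win_analysis["races"] * 100

# Gap between the top two runs, in percentage points.
top = win_analysis.sort_values("win_percentage", ascending=False)
gap = top["win_percentage"].iloc[0] - top["win_percentage"].iloc[1]
```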

Let's save this analysis for plotting purposes in the piece.

In [32]:
win_analysis.to_csv("../data/output/win_analysis.csv", index=False)

---

## Method 02: Podiums

Looking at wins is a good start, but it leaves out a lot about a team's performance over a season:

* It only shows a very narrow slice of the team's drivers' performance. If we only know that one of the drivers won, we have no idea how the other driver did.
* It offers limited room for comparison. Winning is a binary variable: you win or you don't. The history of the sport is greyer than that. For example, Keke Rosberg won the drivers' championship in 1982 with only one victory that season. Looking only at the number of wins doesn't provide any context about how this happened.

We can dig a little deeper and look at podiums. The podium refers to the drivers who finished first, second, and third in a given race. A team that consistently has both drivers on the podium over a season is doing exceptionally well. (For example, Mercedes's dominance is better understood when you see Bottas and Hamilton on the podium for almost every race of 2019 so far.)

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),1
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s ),2
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s ),3
25,2,1950,2,Monaco,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,3h 13m 18.7s ( 98.701 km/h ),1
127,4,1950,4,Switzerland,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 02m 53.7s ( 149.279 km/h ),1
128,4,1950,4,Switzerland,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 02m 54.1s ( +00.4s ),2
150,5,1950,5,Belgium,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 47m 26s ( 177.097 km/h ),1
151,5,1950,5,Belgium,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 47m 40s ( +14.000s ),2
164,6,1950,6,France,1,1,Juan Manuel FANGIO,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 57m 52.8s ( 168.729 km/h ),1
165,6,1950,6,France,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 58m 18.5s ( +25.7s ),2


In [36]:
podium_count.sort_values(ascending=False).head(10)

year  team    
2016  Mercedes    33
2015  Mercedes    32
2014  Mercedes    31
2004  Ferrari     29
2011  Red Bull    27
2002  Ferrari     27
2017  Mercedes    26
2018  Mercedes    25
1988  McLaren     25
2001  Ferrari     24
Name: podiums, dtype: int64

In [37]:
podium_count = podium_count.to_frame().reset_index()

Unnamed: 0,year,team,podiums
64,2016,Mercedes,33
63,2015,Mercedes,32
62,2014,Mercedes,31
54,2004,Ferrari,29
52,2002,Ferrari,27
59,2011,Red Bull,27
65,2017,Mercedes,26
38,1988,McLaren,25
66,2018,Mercedes,25
61,2013,Red Bull,24


We'll want to normalize this as well, but a little differently. We need to account for both the number of races in a season and the number of drivers a team entered in each race (usually two per team, but sometimes a third driver substituted for one of the two, and in the earlier years teams entered more than two).

To account for this, we'll look at each race and count the number of unique drivers who raced. We'll ignore the drivers whose `position` value is one of the following:

* nq: not qualified
* npq: not pre-qualified
* exc: excluded
* tf: parade lap
* f: withdrawal

We'll keep in the drivers whose `position` value was:

* a number spot
* ab: retired
* nc: not classified


For each race, we'll take the minimum of 3 and the number of drivers the team entered in that race. A race has at most 3 podium places, and a team that only brought two drivers can get at most two of them.

In [2]:
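# Note: "f" (withdrawal) is not in this list, so withdrawn drivers are still counted.
keep_out = ["nq", "npq", "exc", "tf"]
race_entries = results[~results.position.isin(keep_out)]

def podium_spots(row):
    # All entries for this team in this season (each row has a year and a team).
    season = race_entries[race_entries.year == row.year]
    team = season[season.team == row.team]
    rounds = team["round"].unique()
    spots = 0

    # A team can take at most 3 podium places per race, and no more
    # than the number of drivers it entered.
    for rnd in rounds:
        driver_entries = team[team["round"] == rnd].driver.nunique()
        spots += min(3, driver_entries)

    return spots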


In [42]:
podium_analysis.head()

Unnamed: 0,year,team,podiums,podium_spots
0,1950,Alfa Romeo,12,18
1,1951,Alfa Romeo,9,21
2,1952,Ferrari,17,22
3,1953,Ferrari,16,24
4,1954,Mercedes,7,17
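
The podium-spot count can also be sketched with a group-by instead of a row-wise function. This is an illustration on made-up entries, assuming the `year`, `team`, `round`, and `driver` columns used throughout (the `per_race` name is hypothetical):

```python
import pandas as pd

# Made-up entries: a team with 4 drivers in round 1 and 2 drivers in round 2.
race_entries = pd.DataFrame({
    "year":   [1950] * 6,
    "team":   ["Team X"] * 6,
    "round":  [1, 1, 1, 1, 2, 2],
    "driver": ["A", "B", "C", "D", "A", "B"],
})

# Unique drivers per (year, team, round), capped at the 3 podium places.
per_race = (race_entries
            .groupby(["year", "team", "round"])["driver"]
            .nunique()
            .clip(upper=3))

# Sum the per-race caps to get the season total for each run.
podium_spots = (per_race
                .groupby(level=["year", "team"])
                .sum()
                .rename("podium_spots")
                .reset_index())
```

Here round 1 contributes 3 spots (four drivers, capped at three) and round 2 contributes 2, for a total of 5.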


In [43]:
def podium_percentage(row):
    p = float(row.podiums)
    total = float(row.podium_spots)
    return (p/total) * 100

In [44]:
podium_analysis["podium_percentage"] = podium_analysis.apply(podium_percentage, axis=1)

In [45]:
podium_analysis.sort_values(by="podium_percentage", ascending=False).head(10)

Unnamed: 0,year,team,podiums,podium_spots,podium_percentage
63,2015,Mercedes,32,38,84.210526
62,2014,Mercedes,31,38,81.578947
54,2004,Ferrari,29,36,80.555556
52,2002,Ferrari,27,34,79.411765
64,2016,Mercedes,33,42,78.571429
38,1988,McLaren,25,32,78.125
2,1952,Ferrari,17,22,77.272727
59,2011,Red Bull,27,38,71.052632
51,2001,Ferrari,24,34,70.588235
43,1993,Williams,22,32,68.75


While Mercedes's 2016 run has the most podiums (they also had the most wins), their podium percentage is only the fifth highest. Of their 33 podiums, 19 are first-place finishes. With only 14 other podiums, they can have had both drivers on the podium in at most 14 races, so they missed a double podium in at least 7 of the season's 21 races.

Their 2014 and 2015 podium percentages were noticeably better.

McLaren's 1988 run also ranks lower by podium percentage than by win percentage. This could be related to their car performance or to driver mistakes costing them podiums.

Ferrari's 2002 run, third by win percentage, drops to fourth by podium percentage, but it still ranks above Mercedes 2016 and McLaren 1988. That F2002 was really robust.

We can save this podium analysis now.

In [47]:
podium_analysis.to_csv("../data/output/podium_analysis.csv", index=False)

---

### Putting both podiums and wins together

We can combine the `win_analysis` and `podium_analysis` dataframes to make web loading slightly faster (1 request vs 2 requests, no duplicate columns requested).

In [48]:
analysis = pd.merge(win_analysis, podium_analysis, on=["year","team"])

Unnamed: 0,year,team,wins,races,win_percentage,podiums,podium_spots,podium_percentage
0,1950,Alfa Romeo,6,7,85.714286,12,18,66.666667
1,1951,Alfa Romeo,4,8,50.0,9,21,42.857143
2,1952,Ferrari,7,8,87.5,17,22,77.272727
3,1953,Ferrari,7,9,77.777778,16,24,66.666667
4,1954,Mercedes,4,9,44.444444,7,17,41.176471


Let's also add a column to be a single label for each run:

In [52]:
analysis.head()

Unnamed: 0,year,team,wins,races,win_percentage,podiums,podium_spots,podium_percentage,run_id
0,1950,Alfa Romeo,6,7,85.714286,12,18,66.666667,Alfa Romeo 1950
1,1951,Alfa Romeo,4,8,50.0,9,21,42.857143,Alfa Romeo 1951
2,1952,Ferrari,7,8,87.5,17,22,77.272727,Ferrari 1952
3,1953,Ferrari,7,9,77.777778,16,24,66.666667,Ferrari 1953
4,1954,Mercedes,4,9,44.444444,7,17,41.176471,Mercedes 1954
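
The cell that adds `run_id` isn't shown. Judging by the output, the label is the team name followed by the year, which could be built like this (a sketch on two rows, not necessarily the original code):

```python
import pandas as pd

analysis = pd.DataFrame({
    "year": [1950, 1952],
    "team": ["Alfa Romeo", "Ferrari"],
})

# Label each run as "<team> <year>", e.g. "Ferrari 1952".
analysis["run_id"] = analysis["team"] + " " + analysis["year"].astype(str)
```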


---

## Method 03: Race Averages and Consistency

The best podium finish a team can have is one driver in first and the other in second: a one-two finish. We could count who had the most one-two finishes each season, but I think it's more interesting to see who came closest overall to a perfect season.

To figure this out, I'll introduce the idea of a race average: for each race, I'll average all of the team's finishing positions. The lower the average, the better the team performed in that race. If a team has two drivers, the best possible result is a one-two finish, which is a race average of 1.5.
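
As a small illustration of the race average (both helper functions here are hypothetical and not part of the notebook):

```python
def race_average(finishes):
    """Mean finishing position of a team's drivers in one race."""
    return sum(finishes) / len(finishes)


def best_possible_average(n_drivers):
    """Average when a team's n drivers fill the top n places: (1 + ... + n) / n."""
    return (n_drivers + 1) / 2


one_two = race_average([1, 2])        # a perfect race for a two-driver team
win_and_12th = race_average([1, 12])  # a win with the other car far back
```

A one-two finish gives 1.5, while a win paired with a twelfth place gives 6.5, so the race average penalizes a team whose second car is far down the order. With three drivers the best possible average would be 2.0.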


First let's try to see how many drivers each team had for each race.

In [55]:
race_entries.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2449 entries, 0 to 25189
Data columns (total 11 columns):
race_id             2449 non-null int64
year                2449 non-null int64
round               2449 non-null int64
race_name           2449 non-null object
position            2449 non-null object
order               2449 non-null int64
driver              2449 non-null object
team                2449 non-null object
constructor_long    2449 non-null object
extra               2449 non-null object
p_final             2449 non-null int64
dtypes: int64(5), object(6)
memory usage: 229.6+ KB


year  team        round
1950  Alfa Romeo  1        4
                  2        3
                  4        3
                  5        3
                  6        3
                  7        5
1951  Alfa Romeo  1        4
                  3        3
                  4        4
                  5        4
                  6        4
                  7        5
                  8        4
Name: driver, dtype: int64

This is a good starting point. I can go race by race and get the averages the same way (with, say, `.order.mean()` instead of `.driver.nunique()`).

But I also want to look at how shared drives are handled.

year  team        round  driver                 
1950  Alfa Romeo  1      Giuseppe FARINA            1
                         Juan Manuel FANGIO         1
                         Luigi FAGIOLI              1
                         Reg PARNELL                1
                  2      Giuseppe FARINA            1
                         Juan Manuel FANGIO         1
                         Luigi FAGIOLI              1
                  4      Giuseppe FARINA            1
                         Juan Manuel FANGIO         1
                         Luigi FAGIOLI              1
                  5      Giuseppe FARINA            1
                         Juan Manuel FANGIO         1
                         Luigi FAGIOLI              1
                  6      Giuseppe FARINA            1
                         Juan Manuel FANGIO         1
                         Luigi FAGIOLI              1
                  7      Consalvo SANESI            1
                         Giuseppe

In the 1950 Italian GP, Juan Manuel Fangio has two records because of shared driving. Let's look at a more extreme example, the 1955 Argentine Grand Prix:

In [119]:
f1_1955 = race_results[race_results.year == 1955]

In [120]:
f1_1955.head()

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
1307,42,1955,1,Argentina,1,1,Juan Manuel FANGIO,Mercedes,Mercedes Mercedes,3h 00m 38.6s ( 124.738 km/h ),1
1308,42,1955,1,Argentina,2,2,Jose-Froilan GONZALEZ,Ferrari,Ferrari Ferrari,,2
1309,42,1955,1,Argentina,&,2,Giuseppe FARINA,Ferrari,Ferrari Ferrari,3h 02m 08.2s ( +1m 29.6s ),2
1310,42,1955,1,Argentina,&,2,Maurice TRINTIGNANT,Ferrari,Ferrari Ferrari,,2
1311,42,1955,1,Argentina,3,3,Giuseppe FARINA,Ferrari,Ferrari Ferrari,,3


In [121]:
argentina55 = f1_1955[f1_1955["round"] == 1]

In [122]:
argentina55

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
1307,42,1955,1,Argentina,1,1,Juan Manuel FANGIO,Mercedes,Mercedes Mercedes,3h 00m 38.6s ( 124.738 km/h ),1
1308,42,1955,1,Argentina,2,2,Jose-Froilan GONZALEZ,Ferrari,Ferrari Ferrari,,2
1309,42,1955,1,Argentina,&,2,Giuseppe FARINA,Ferrari,Ferrari Ferrari,3h 02m 08.2s ( +1m 29.6s ),2
1310,42,1955,1,Argentina,&,2,Maurice TRINTIGNANT,Ferrari,Ferrari Ferrari,,2
1311,42,1955,1,Argentina,3,3,Giuseppe FARINA,Ferrari,Ferrari Ferrari,,3
1312,42,1955,1,Argentina,&,3,Umberto MAGLIOLI,Ferrari,Ferrari Ferrari,,3
1313,42,1955,1,Argentina,&,3,Maurice TRINTIGNANT,Ferrari,Ferrari Ferrari,,3
1314,42,1955,1,Argentina,4,4,Hans HERRMANN,Mercedes,Mercedes Mercedes,,4
1315,42,1955,1,Argentina,&,4,Karl KLING,Mercedes,Mercedes Mercedes,,4
1316,42,1955,1,Argentina,&,4,Stirling MOSS,Mercedes,Mercedes Mercedes,,4


In [65]:
g3 = argentina55.groupby(["team", "driver"])

In [66]:
g3.order.count()

team      driver               
Ferrari   Giuseppe FARINA          2
          Jose-Froilan GONZALEZ    1
          Maurice TRINTIGNANT      3
          Umberto MAGLIOLI         1
Gordini   Elie BAYOL               1
          Jesus IGLESIAS           1
          Pablo BIRGER             1
Lancia    Alberto ASCARI           1
          Eugenio CASTELLOTTI      1
          Luigi VILLORESI          2
Maserati  Alberto URIA             1
          Carlos MENDITEGUY        2
          Clemar BUCCI             1
          Harry SCHELL             3
          Jean BEHRA               3
          Luigi MUSSO              2
          Roberto MIERES           1
          Sergio MANTOVANI         2
Mercedes  Hans HERRMANN            1
          Juan Manuel FANGIO       1
          Karl KLING               2
          Stirling MOSS            2
Name: order, dtype: int64

In [67]:
ag55_drivers = g3.order.mean()

In [68]:
ag55_drivers.head(10)

team     driver               
Ferrari  Giuseppe FARINA           2.500000
         Jose-Froilan GONZALEZ     2.000000
         Maurice TRINTIGNANT       5.333333
         Umberto MAGLIOLI          3.000000
Gordini  Elie BAYOL               16.000000
         Jesus IGLESIAS           10.000000
         Pablo BIRGER             20.000000
Lancia   Alberto ASCARI           15.000000
         Eugenio CASTELLOTTI      12.000000
         Luigi VILLORESI          14.500000
Name: order, dtype: float64

In [69]:
ag55 = ag55_drivers.to_frame().reset_index()

In [70]:
ag55.head(10)

Unnamed: 0,team,driver,order
0,Ferrari,Giuseppe FARINA,2.5
1,Ferrari,Jose-Froilan GONZALEZ,2.0
2,Ferrari,Maurice TRINTIGNANT,5.333333
3,Ferrari,Umberto MAGLIOLI,3.0
4,Gordini,Elie BAYOL,16.0
5,Gordini,Jesus IGLESIAS,10.0
6,Gordini,Pablo BIRGER,20.0
7,Lancia,Alberto ASCARI,15.0
8,Lancia,Eugenio CASTELLOTTI,12.0
9,Lancia,Luigi VILLORESI,14.5


In [71]:
ag55_main = ag55.groupby(["team"]).order.mean()

In [72]:
ag55_main.to_frame().reset_index()

Unnamed: 0,team,order
0,Ferrari,3.208333
1,Gordini,15.333333
2,Lancia,13.833333
3,Maserati,9.4375
4,Mercedes,6.25


Extending this pattern to other races could be the way to go:

1. Group by `["year", "round", "team", "driver"]` and find the average finishing position for each driver.
1. Turn the result into a DataFrame.
1. Group that DataFrame by `["year", "round", "team"]` and find the average finishing position for each team in each round.
1. Turn the result into a DataFrame. Save it for the general graphic, and do step 5 for the winning runs.
1. Group by `["year", "team"]` and find each team's average finishing position and its standard deviation. With this we can compare the teams and also get an idea of how varied each team's performance was (the ideal is a small average with a small variance).
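
The five steps above can be sketched end to end on a made-up team. The column names follow the notebook, but the data, the team, and the intermediate variable names are illustrative assumptions:

```python
import pandas as pd

# Made-up results for one team over two rounds; driver "A" has two
# records in round 1 (a shared drive).
entries = pd.DataFrame({
    "year":   [1955] * 5,
    "round":  [1, 1, 1, 2, 2],
    "team":   ["Team X"] * 5,
    "driver": ["A", "A", "B", "A", "B"],
    "order":  [1, 3, 4, 1, 2],
})

# Steps 1-2: one average finishing position per driver per race.
per_driver = (entries
              .groupby(["year", "round", "team", "driver"])["order"]
              .mean()
              .rename("average_finishing_position")
              .reset_index())

# Steps 3-4: one average per team per race.
per_round = (per_driver
             .groupby(["year", "team", "round"])["average_finishing_position"]
             .mean()
             .reset_index())

# Step 5: season mean and spread (sample standard deviation) of the race averages.
season = (per_round
          .groupby(["year", "team"])["average_finishing_position"]
          .agg(["mean", "std"])
          .reset_index())
```

In this toy case the race averages are 3.0 and 1.5, so the season mean is 2.25. A team with the same mean but a smaller standard deviation would be the more consistent one.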

Let's try things out with the 16 teams.

In [73]:
grouped= race_entries.groupby(["year", "round", "team", "driver"])

In [74]:
grouped.order.mean()

year  round  team        driver                 
1950  1      Alfa Romeo  Giuseppe FARINA             1.0
                         Juan Manuel FANGIO         12.0
                         Luigi FAGIOLI               2.0
                         Reg PARNELL                 3.0
      2      Alfa Romeo  Giuseppe FARINA            12.0
                         Juan Manuel FANGIO          1.0
                         Luigi FAGIOLI              11.0
      4      Alfa Romeo  Giuseppe FARINA             1.0
                         Juan Manuel FANGIO         12.0
                         Luigi FAGIOLI               2.0
      5      Alfa Romeo  Giuseppe FARINA             4.0
                         Juan Manuel FANGIO          1.0
                         Luigi FAGIOLI               2.0
      6      Alfa Romeo  Giuseppe FARINA             7.0
                         Juan Manuel FANGIO          1.0
                         Luigi FAGIOLI               2.0
      7      Alfa Romeo  Consalvo SANES

In [75]:
grouped_drivers = race_entries.groupby(["year", "round", "team", "driver"]).order.mean().rename("average_finishing_position")

In [76]:
g_drivers = grouped_drivers.to_frame().reset_index()

In [77]:
g_drivers.head(21)

Unnamed: 0,year,round,team,driver,average_finishing_position
0,1950,1,Alfa Romeo,Giuseppe FARINA,1.0
1,1950,1,Alfa Romeo,Juan Manuel FANGIO,12.0
2,1950,1,Alfa Romeo,Luigi FAGIOLI,2.0
3,1950,1,Alfa Romeo,Reg PARNELL,3.0
4,1950,2,Alfa Romeo,Giuseppe FARINA,12.0
5,1950,2,Alfa Romeo,Juan Manuel FANGIO,1.0
6,1950,2,Alfa Romeo,Luigi FAGIOLI,11.0
7,1950,4,Alfa Romeo,Giuseppe FARINA,1.0
8,1950,4,Alfa Romeo,Juan Manuel FANGIO,12.0
9,1950,4,Alfa Romeo,Luigi FAGIOLI,2.0


This is Alfa Romeo in 1950. Let's move on to getting their full team average.

In [78]:
grouped_rounds = g_drivers.groupby(["year", "team", "round"])

In [79]:
g_rounds = grouped_rounds.average_finishing_position.mean().to_frame().reset_index()

In [80]:
g_rounds.head(10)

Unnamed: 0,year,team,round,average_finishing_position
0,1950,Alfa Romeo,1,4.5
1,1950,Alfa Romeo,2,8.0
2,1950,Alfa Romeo,4,5.0
3,1950,Alfa Romeo,5,2.333333
4,1950,Alfa Romeo,6,3.333333
5,1950,Alfa Romeo,7,10.8
6,1951,Alfa Romeo,1,3.25
7,1951,Alfa Romeo,3,7.0
8,1951,Alfa Romeo,4,6.75
9,1951,Alfa Romeo,5,6.5


Now we can get to the team, year combos: 

In [81]:
grouped = g_rounds.groupby(["year", "team"])

In [82]:
g = grouped.average_finishing_position.mean().to_frame().reset_index()

In [83]:
g.sort_values("average_finishing_position", ascending=True).head()

Unnamed: 0,year,team,average_finishing_position
66,2017,Mercedes,3.175
64,2015,Mercedes,3.210526
54,2004,Ferrari,3.333333
60,2011,Red Bull,3.473684
65,2016,Mercedes,3.47619


Let's save this to a csv file:

In [84]:
g.to_csv("../data/output/averages_champions.csv", index=False)

---
#### Aside: Dealing with drivers who had multiple records in a race:
Before I do this for all the races and all the teams, I want to check whether the way I handled multiple records is reasonable. When a driver had multiple records, I averaged them into a single one. What if I handled them another way?

1. Take the lowest finish (highest number)?
1. Take the highest finish (lowest number)?

Let's compare the averages across these three cases. First we need a way to check whether a driver had more than one record in a race. I calculated that earlier:

In [85]:
count_driver_entries = race_entries.groupby(["year", "team", "round", "driver"]).order.count().rename("records")

In [86]:
count_driver_entries.head(12)

year  team        round  driver            
1950  Alfa Romeo  1      Giuseppe FARINA       1
                         Juan Manuel FANGIO    1
                         Luigi FAGIOLI         1
                         Reg PARNELL           1
                  2      Giuseppe FARINA       1
                         Juan Manuel FANGIO    1
                         Luigi FAGIOLI         1
                  4      Giuseppe FARINA       1
                         Juan Manuel FANGIO    1
                         Luigi FAGIOLI         1
                  5      Giuseppe FARINA       1
                         Juan Manuel FANGIO    1
Name: records, dtype: int64

Let's turn this into a DataFrame and work on finding the averages for each race:

In [87]:
driver_entries = count_driver_entries.to_frame().reset_index()

In [88]:
driver_entries.head()

Unnamed: 0,year,team,round,driver,records
0,1950,Alfa Romeo,1,Giuseppe FARINA,1
1,1950,Alfa Romeo,1,Juan Manuel FANGIO,1
2,1950,Alfa Romeo,1,Luigi FAGIOLI,1
3,1950,Alfa Romeo,1,Reg PARNELL,1
4,1950,Alfa Romeo,2,Giuseppe FARINA,1


Let's now define functions for each of the three cases:

In [89]:
def all_finishes(row):
    season = race_entries[race_entries.year == row.year]
    race = season[season["round"] == row["round"]]
    team = race[race.team == row.team]
    driver = team[team.driver == row.driver]
    return driver.order.unique().tolist()

def highest_finish_v1(row):
    finishes = row.finishes
    if len(finishes) < 2:
        return finishes[0]
    else:
        return min(finishes)
    
def lowest_finish_v1(row):
    finishes = row.finishes
    if len(finishes) < 2:
        return finishes[0]
    else:
        return max(finishes)

def average_finish_v1(row):
    finishes = row.finishes
    if len(finishes) < 2:
        return finishes[0]
    else:
        return np.mean(finishes)

In [90]:
driver_entries["finishes"] = driver_entries.apply(all_finishes, axis = 1)
driver_entries["highest_finish"] = driver_entries.apply(highest_finish_v1, axis = 1)
driver_entries["lowest_finish"] = driver_entries.apply(lowest_finish_v1, axis = 1)
driver_entries["average_finish"] = driver_entries.apply(average_finish_v1, axis = 1)

In [91]:
driver_entries.head(20)

Unnamed: 0,year,team,round,driver,records,finishes,highest_finish,lowest_finish,average_finish
0,1950,Alfa Romeo,1,Giuseppe FARINA,1,[1],1,1,1.0
1,1950,Alfa Romeo,1,Juan Manuel FANGIO,1,[12],12,12,12.0
2,1950,Alfa Romeo,1,Luigi FAGIOLI,1,[2],2,2,2.0
3,1950,Alfa Romeo,1,Reg PARNELL,1,[3],3,3,3.0
4,1950,Alfa Romeo,2,Giuseppe FARINA,1,[12],12,12,12.0
5,1950,Alfa Romeo,2,Juan Manuel FANGIO,1,[1],1,1,1.0
6,1950,Alfa Romeo,2,Luigi FAGIOLI,1,[11],11,11,11.0
7,1950,Alfa Romeo,4,Giuseppe FARINA,1,[1],1,1,1.0
8,1950,Alfa Romeo,4,Juan Manuel FANGIO,1,[12],12,12,12.0
9,1950,Alfa Romeo,4,Luigi FAGIOLI,1,[2],2,2,2.0
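
The three row-wise helpers can also be written as a single grouped aggregation. The sketch below is an alternative, not the code used above. It assumes a frame shaped like `race_entries` with `year`, `round`, `team`, `driver` and `order` columns, and the toy rows (including the driver with two records) are invented for illustration.

```python
import pandas as pd

# Toy stand-in for race_entries; the rows are made up for illustration.
# "Driver B" appears twice in round 1 to mimic a driver with two records.
toy_entries = pd.DataFrame({
    "year":   [1950, 1950, 1950, 1950],
    "round":  [1, 1, 1, 2],
    "team":   ["Team X"] * 4,
    "driver": ["Driver A", "Driver B", "Driver B", "Driver A"],
    "order":  [1, 3, 7, 2],
})

# One row per (year, team, round, driver) with best, worst and mean finish.
finish_summary = (
    toy_entries
    .groupby(["year", "team", "round", "driver"])["order"]
    .agg(highest_finish="min", lowest_finish="max", average_finish="mean")
    .reset_index()
)
print(finish_summary)
```

Unlike `all_finishes`, this version does not de-duplicate finishing orders first, which would only change the mean if the same `order` were recorded twice for one driver in one race.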


Now that we have these, we can run the three calculations (average, lowest, and highest finish) using the earlier algorithm:

In [92]:
group_drivers_avg = driver_entries.copy().groupby(["year", "round", "team", "driver"]).average_finish.mean().rename("average_finish")
group_drivers_low = driver_entries.copy().groupby(["year", "round", "team", "driver"]).lowest_finish.mean().rename("average_finish")
group_drivers_hig = driver_entries.copy().groupby(["year", "round", "team", "driver"]).highest_finish.mean().rename("average_finish")

In [93]:
g_d_avg = group_drivers_avg.to_frame().reset_index()
g_d_low = group_drivers_low.to_frame().reset_index()
g_d_hig = group_drivers_hig.to_frame().reset_index()

Change the last three letters of the variable name (`avg`, `low` or `hig`) in the next cell to check the different groups.

In [94]:
g_d_hig.head(21)

Unnamed: 0,year,round,team,driver,average_finish
0,1950,1,Alfa Romeo,Giuseppe FARINA,1
1,1950,1,Alfa Romeo,Juan Manuel FANGIO,12
2,1950,1,Alfa Romeo,Luigi FAGIOLI,2
3,1950,1,Alfa Romeo,Reg PARNELL,3
4,1950,2,Alfa Romeo,Giuseppe FARINA,12
5,1950,2,Alfa Romeo,Juan Manuel FANGIO,1
6,1950,2,Alfa Romeo,Luigi FAGIOLI,11
7,1950,4,Alfa Romeo,Giuseppe FARINA,1
8,1950,4,Alfa Romeo,Juan Manuel FANGIO,12
9,1950,4,Alfa Romeo,Luigi FAGIOLI,2


In [95]:
g_r_avg = g_d_avg.groupby(["year", "team", "round"]).average_finish.mean().to_frame().reset_index()
g_r_low = g_d_low.groupby(["year", "team", "round"]).average_finish.mean().to_frame().reset_index()
g_r_hig = g_d_hig.groupby(["year", "team", "round"]).average_finish.mean().to_frame().reset_index()

In [96]:
g_r_hig.head(10)

Unnamed: 0,year,team,round,average_finish
0,1950,Alfa Romeo,1,4.5
1,1950,Alfa Romeo,2,8.0
2,1950,Alfa Romeo,4,5.0
3,1950,Alfa Romeo,5,2.333333
4,1950,Alfa Romeo,6,3.333333
5,1950,Alfa Romeo,7,10.6
6,1951,Alfa Romeo,1,3.25
7,1951,Alfa Romeo,3,7.0
8,1951,Alfa Romeo,4,4.25
9,1951,Alfa Romeo,5,6.5


In [97]:
g_avg = g_r_avg.groupby(["year", "team"]).average_finish.mean().to_frame().reset_index()
g_low = g_r_low.groupby(["year", "team"]).average_finish.mean().to_frame().reset_index()
g_hig = g_r_hig.groupby(["year", "team"]).average_finish.mean().to_frame().reset_index()

In [98]:
g_low.head(10)

Unnamed: 0,year,team,average_finish
0,1950,Alfa Romeo,5.694444
1,1951,Alfa Romeo,8.157143
2,1952,Ferrari,12.661756
3,1953,Ferrari,7.853125
4,1954,Mercedes,6.083333
5,1955,Mercedes,7.555556
6,1956,Ferrari,9.702381
7,1957,Maserati,10.196712
8,1958,Ferrari,7.833333
9,1959,Cooper,9.825893


In [99]:
g_hig.head(10)

Unnamed: 0,year,team,average_finish
0,1950,Alfa Romeo,5.627778
1,1951,Alfa Romeo,7.071429
2,1952,Ferrari,12.391915
3,1953,Ferrari,7.603125
4,1954,Mercedes,6.083333
5,1955,Mercedes,6.555556
6,1956,Ferrari,7.804762
7,1957,Maserati,10.002268
8,1958,Ferrari,7.833333
9,1959,Cooper,9.825893


In [100]:
g_avg.head(10)

Unnamed: 0,year,team,average_finish
0,1950,Alfa Romeo,5.661111
1,1951,Alfa Romeo,7.614286
2,1952,Ferrari,12.526835
3,1953,Ferrari,7.728125
4,1954,Mercedes,6.083333
5,1955,Mercedes,7.055556
6,1956,Ferrari,8.753571
7,1957,Maserati,10.09949
8,1958,Ferrari,7.833333
9,1959,Cooper,9.825893


The differences seem pretty small. Let's put them all together into one dataframe.

In [101]:
g = g_avg.copy()

In [102]:
g["average_finish_lowest"] = g_low.average_finish
g["average_finish_highest"] = g_hig.average_finish

In [103]:
g.head()

Unnamed: 0,year,team,average_finish,average_finish_lowest,average_finish_highest
0,1950,Alfa Romeo,5.661111,5.694444,5.627778
1,1951,Alfa Romeo,7.614286,8.157143,7.071429
2,1952,Ferrari,12.526835,12.661756,12.391915
3,1953,Ferrari,7.728125,7.853125,7.603125
4,1954,Mercedes,6.083333,6.083333,6.083333


In [104]:
def avg(row):
    a = [row.average_finish_lowest, row.average_finish_highest]
    return np.mean(a)

def mean(row):
    a = [row.average_finish_lowest, row.average_finish, row.average_finish_highest]
    return np.mean(a)

def overall_stdev(row):
    a = [row.average_finish_lowest, row.average_finish, row.average_finish_highest]
    return np.std(a)

In [105]:
g["avg"] = g.apply(avg, axis = 1)
g["mean"] = g.apply(mean, axis = 1)
g["std"] = g.apply(overall_stdev, axis = 1)

In [106]:
g

Unnamed: 0,year,team,average_finish,average_finish_lowest,average_finish_highest,avg,mean,std
0,1950,Alfa Romeo,5.661111,5.694444,5.627778,5.661111,5.661111,2.721655e-02
1,1951,Alfa Romeo,7.614286,8.157143,7.071429,7.614286,7.614286,4.432410e-01
2,1952,Ferrari,12.526835,12.661756,12.391915,12.526835,12.526835,1.101622e-01
3,1953,Ferrari,7.728125,7.853125,7.603125,7.728125,7.728125,1.020621e-01
4,1954,Mercedes,6.083333,6.083333,6.083333,6.083333,6.083333,0.000000e+00
5,1955,Mercedes,7.055556,7.555556,6.555556,7.055556,7.055556,4.082483e-01
6,1956,Ferrari,8.753571,9.702381,7.804762,8.753571,8.753571,7.746997e-01
7,1957,Maserati,10.099490,10.196712,10.002268,10.099490,10.099490,7.938161e-02
8,1958,Ferrari,7.833333,7.833333,7.833333,7.833333,7.833333,0.000000e+00
9,1959,Cooper,9.825893,9.825893,9.825893,9.825893,9.825893,0.000000e+00


So I don't have to worry about whether to pick the highest, the lowest, or the average finish for each driver. The deviations are small (the standard deviation across the three variants is below 0.8 positions in the rows shown above), so the choice makes little difference, and the mean of the three is the average finish in most cases. Now we can get back to method 3.
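
One possible reason the `avg` and `mean` columns match `average_finish` so closely: when a driver has at most two finishes in a race, the mean of those finishes is the midpoint of the best and worst, and every later step is a plain mean over the same groups, so the midpoint carries through to the season level. The sketch below illustrates this with invented numbers. It has not been checked against the full data.

```python
import pandas as pd

# Invented per-driver values for one team over two rounds.
# Driver B has two finishes (3rd and 7th) in round 1, so the variants differ there.
toy = pd.DataFrame({
    "round":   [1, 1, 2, 2],
    "driver":  ["Driver A", "Driver B", "Driver A", "Driver B"],
    "highest": [1, 3, 2, 4],
    "lowest":  [1, 7, 2, 4],
})
# With at most two finishes, the mean is the midpoint of best and worst.
toy["average"] = (toy["highest"] + toy["lowest"]) / 2

# Average drivers within each round, then rounds within the season.
per_round = toy.groupby("round")[["highest", "average", "lowest"]].mean()
season = per_round.mean()
print(season)

# The season-level average is still the midpoint of the other two variants.
midpoint = (season["highest"] + season["lowest"]) / 2
print(abs(midpoint - season["average"]) < 1e-12)
```

A driver with three or more distinct finishes in one race would break the midpoint property, which may account for the "most cases" caveat.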

---

Let's take another look at the winning teams, find the standard deviation and variance of their finishes, and maybe compute a confidence interval for each season.

In [109]:
g_drivers = race_entries.groupby(["year", "round", "team", "driver"]).order.mean().rename("average_finish").to_frame().reset_index()
g_rounds = g_drivers.groupby(["year", "team", "round"]).average_finish.mean().to_frame().reset_index()
g_teams = g_rounds.groupby(["year", "team"]).average_finish.mean().to_frame().reset_index()

In [110]:
g_teams.head()

Unnamed: 0,year,team,average_finish
0,1950,Alfa Romeo,5.661111
1,1951,Alfa Romeo,7.614286
2,1952,Ferrari,12.526835
3,1953,Ferrari,7.728125
4,1954,Mercedes,6.083333


In [111]:
def get_std(row):
    season = g_rounds[g_rounds.year == row.year]
    team = season[season.team == row.team]
    return team.average_finish.std()

def get_var(row):
    season = g_rounds[g_rounds.year == row.year]
    team = season[season.team == row.team]
    return team.average_finish.var()

# g_rounds.groupby(["year", "team"]).average_finish.describe()

In [112]:
g_teams["std"] = g_teams.apply(get_std, axis = 1)
g_teams["var"] = g_teams.apply(get_var, axis = 1)
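
The row-wise `get_std` and `get_var` lookups could also be replaced by one grouped aggregation, which the commented-out `describe()` line hints at. A minimal sketch, assuming a frame shaped like `g_rounds`; the values below are invented.

```python
import pandas as pd

# Toy stand-in for g_rounds: one average finish per team per round (invented values).
toy_rounds = pd.DataFrame({
    "year": [2000] * 3 + [2001] * 3,
    "team": ["Team X"] * 3 + ["Team Y"] * 3,
    "round": [1, 2, 3, 1, 2, 3],
    "average_finish": [2.0, 4.0, 6.0, 1.0, 1.0, 4.0],
})

# Mean, sample standard deviation and sample variance per season.
# pandas uses ddof=1 for std and var, the same default as Series.std() and Series.var().
team_stats = (
    toy_rounds
    .groupby(["year", "team"])["average_finish"]
    .agg(["mean", "std", "var"])
    .rename(columns={"mean": "average_finish"})
    .reset_index()
)
print(team_stats)
```

This avoids re-filtering `g_rounds` once per row, which matters more once the same process is run for every team rather than only the champions.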

In [113]:
g_teams.head(10)

Unnamed: 0,year,team,average_finish,std,var
0,1950,Alfa Romeo,5.661111,3.167222,10.031296
1,1951,Alfa Romeo,7.614286,3.980129,15.841429
2,1952,Ferrari,12.526835,7.670544,58.837243
3,1953,Ferrari,7.728125,2.681312,7.189433
4,1954,Mercedes,6.083333,1.237156,1.530556
5,1955,Mercedes,7.055556,4.784717,22.893519
6,1956,Ferrari,8.753571,2.682358,7.195043
7,1957,Maserati,10.09949,3.466743,12.018309
8,1958,Ferrari,7.833333,1.940074,3.763889
9,1959,Cooper,9.825893,2.229432,4.970367
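
The confidence interval mentioned above is not computed in this notebook. One way it might be sketched is a normal-approximation 95% interval, mean ± 1.96 · std / √n, where n is the number of rounds the team entered. The normal approximation is an assumption here: with only a handful of races in the early seasons, a t-based critical value would give a wider interval, and rounds within a season are not truly independent samples. The values below are invented.

```python
import numpy as np
import pandas as pd

# Toy stand-in for g_rounds (invented values): average finish per team per round.
toy_rounds = pd.DataFrame({
    "year": [2000] * 4,
    "team": ["Team X"] * 4,
    "round": [1, 2, 3, 4],
    "average_finish": [2.0, 4.0, 4.0, 6.0],
})

Z_95 = 1.96  # normal critical value for a two-sided 95% interval (assumed choice)

ci = (
    toy_rounds
    .groupby(["year", "team"])["average_finish"]
    .agg(["mean", "std", "count"])
    .reset_index()
)
# Standard error of the mean, then the interval bounds.
ci["sem"] = ci["std"] / np.sqrt(ci["count"])
ci["ci_low"] = ci["mean"] - Z_95 * ci["sem"]
ci["ci_high"] = ci["mean"] + Z_95 * ci["sem"]
print(ci)
```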


In [114]:
g_teams.sort_values("average_finish", ascending=True)

Unnamed: 0,year,team,average_finish,std,var
66,2017,Mercedes,3.175000,1.779082,3.165132
64,2015,Mercedes,3.210526,3.220230,10.369883
54,2004,Ferrari,3.333333,2.555271,6.529412
60,2011,Red Bull,3.473684,3.207949,10.290936
65,2016,Mercedes,3.476190,4.578417,20.961905
38,1988,McLaren,3.718750,3.768151,14.198958
67,2018,Mercedes,3.857143,3.428348,11.753571
52,2002,Ferrari,4.147059,3.757346,14.117647
63,2014,Mercedes,4.236842,3.931244,15.454678
57,2007,Ferrari,4.794118,3.450810,11.908088


Let's save this to work on in the presentation:

In [115]:
g_teams.to_csv("../data/output/averages_champions.csv", index=False)

Now let's apply the same process to all the teams and races:

In [116]:
all_race_entries = race_results[~race_results.position.isin(keep_out)]

In [117]:
all_race_entries.head(10)

Unnamed: 0,race_id,year,round,race_name,position,order,driver,team,constructor_long,extra,p_final
0,1,1950,1,Britain,1,1,Giuseppe FARINA,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 23.6s ( 146.378 km/h ),1
1,1,1950,1,Britain,2,2,Luigi FAGIOLI,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 13m 26.2s ( +02.6s ),2
2,1,1950,1,Britain,3,3,Reg PARNELL,Alfa Romeo,Alfa Romeo Alfa Romeo,2h 14m 15.6s ( +52.0s ),3
3,1,1950,1,Britain,4,4,Yves GIRAUD-CABANTOUS,Talbot Lago,Talbot Lago Talbot,,4
4,1,1950,1,Britain,5,5,Louis ROSIER,Talbot Lago,Talbot Lago Talbot,,5
5,1,1950,1,Britain,6,6,Bob GERARD,ERA,ERA ERA,,6
6,1,1950,1,Britain,7,7,Cuth HARRISON,ERA,ERA ERA,,7
7,1,1950,1,Britain,8,8,Philippe ETANCELIN,Talbot Lago,Talbot Lago Talbot,,8
8,1,1950,1,Britain,9,9,David HAMPSHIRE,Maserati,Maserati Maserati,,9
9,1,1950,1,Britain,10,10,Joe FRY,Maserati,Maserati Maserati,,10


In [None]:
dsq = all_race_entries[all_race_entries.position == "dsq"]

In [None]:
dsq.head()

In [None]:
results[results.position == "dsq"]

In [None]:
retired = r[r.position == "ab"]