### Let’s start by importing pandas and creating a sample DataFrame.

In [1]:
import pandas as pd
df = pd.DataFrame({
    "name": ["John","Jane","Emily","Lisa","Matt","Jenny","Adam"],
    "current": [92,94,87,82,90,78,84],
    "overall": [184,173,184,201,208,182,185],
    "group":["A","B","C","A","A","C","B"]
})
df

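| | name | current | overall | group |
|---|---|---|---|---|
| 0 | John | 92 | 184 | A |
| 1 | Jane | 94 | 173 | B |
| 2 | Emily | 87 | 184 | C |
| 3 | Lisa | 82 | 201 | A |
| 4 | Matt | 90 | 208 | A |
| 5 | Jenny | 78 | 182 | C |
| 6 | Adam | 84 | 185 | B |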


#### We have created a DataFrame with 7 rows and 4 columns. Let’s start with the default settings and assign a rank to the rows based on the overall column.

In [2]:
df["rank_default"] = df["overall"].rank()
df

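| | name | current | overall | group | rank_default |
|---|---|---|---|---|---|
| 0 | John | 92 | 184 | A | 3.5 |
| 1 | Jane | 94 | 173 | B | 1.0 |
| 2 | Emily | 87 | 184 | C | 3.5 |
| 3 | Lisa | 82 | 201 | A | 6.0 |
| 4 | Matt | 90 | 208 | A | 7.0 |
| 5 | Jenny | 78 | 182 | C | 2.0 |
| 6 | Adam | 84 | 185 | B | 5.0 |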


#### The rank_default column contains the rank values. Two important things to note about the default settings:

* The order is ascending, so the lowest value is assigned the first rank. In our example, Jane has the lowest overall score, so she is ranked first.

* In the case of ties, the rank is the average of the positions the tied rows occupy. For instance, John and Emily share the third-lowest overall score. Together they occupy positions 3 and 4, so each is assigned rank 3.5, the average of 3 and 4. The next lowest score is 185 and it is ranked 5th.
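
The tie arithmetic described above can be checked directly. The following is a minimal sketch that reuses the overall scores from the sample DataFrame; the variable names `overall`, `ranks`, and `tied_rank` are chosen here for illustration only.

```python
import pandas as pd

# Same "overall" scores as the sample DataFrame above
overall = pd.Series([184, 173, 184, 201, 208, 182, 185])

# Default settings: ascending=True, method="average"
ranks = overall.rank()

# The two 184s occupy positions 3 and 4, so each receives (3 + 4) / 2
tied_rank = (3 + 4) / 2

print(ranks.tolist())
print(ranks[0] == tied_rank, ranks[2] == tied_rank)
```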

#### These are the default settings. Let’s change the order and sort them in descending order so that the person with the highest score is ranked 1st.

In [3]:
df["rank_default_desc"] = df["overall"].rank(ascending=False)
df = df.sort_values(by="rank_default_desc", ignore_index=True)
df

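| | name | current | overall | group | rank_default | rank_default_desc |
|---|---|---|---|---|---|---|
| 0 | Matt | 90 | 208 | A | 7.0 | 1.0 |
| 1 | Lisa | 82 | 201 | A | 6.0 | 2.0 |
| 2 | Adam | 84 | 185 | B | 5.0 | 3.0 |
| 3 | John | 92 | 184 | A | 3.5 | 4.5 |
| 4 | Emily | 87 | 184 | C | 3.5 | 4.5 |
| 5 | Jenny | 78 | 182 | C | 2.0 | 6.0 |
| 6 | Jane | 94 | 173 | B | 1.0 | 7.0 |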


We have also sorted the rows based on the “rank_default_desc” column by using the sort_values function. It is easier to follow and compare the values in this way.

Matt has the highest overall score, so he is ranked 1st when the order is descending. John and Emily now occupy the 4th and 5th positions, so they are both assigned rank 4.5 because we are still using the average method for ties.
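
As a small sanity check on the two orderings, the sketch below compares ascending and descending ranks for the same scores. It relies on one assumption worth stating: with method="average" and no missing values, each descending rank equals the number of rows plus one, minus the ascending rank. The names `asc`, `desc`, and `mirrored` are illustrative.

```python
import pandas as pd

overall = pd.Series([184, 173, 184, 201, 208, 182, 185])

asc = overall.rank()                  # lowest score gets rank 1
desc = overall.rank(ascending=False)  # highest score gets rank 1

# With method="average" and no missing values, the two rankings mirror each other
mirrored = len(overall) + 1 - asc

print(desc.tolist())
print(mirrored.equals(desc))
```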

# Different ranking methods

The rank function offers five options for handling ties.
The option is selected with the method parameter, and the default value is “average”, as we have seen in the previous examples.


The other options are “min”, “max”, “first”, and “dense”. Let’s first compare the min and max with the average.

In [4]:
# create DataFrame
df = pd.DataFrame({
    "name": ["John","Jane","Emily","Lisa","Matt","Jenny","Adam"],
    "current": [92,94,87,82,90,78,84],
    "overall": [184,173,184,201,208,182,185],
    "group":["A","B","C","A","A","C","B"]
})

In [5]:
# create rank columns
df["rank_default"] = df["overall"].rank(ascending=False)
df["rank_min"] = df["overall"].rank(method="min", ascending=False)
df["rank_max"] = df["overall"].rank(method="max", ascending=False)

In [6]:
# sort rows
df = df.sort_values(by="rank_default", ignore_index=True)

df

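| | name | current | overall | group | rank_default | rank_min | rank_max |
|---|---|---|---|---|---|---|---|
| 0 | Matt | 90 | 208 | A | 1.0 | 1.0 | 1.0 |
| 1 | Lisa | 82 | 201 | A | 2.0 | 2.0 | 2.0 |
| 2 | Adam | 84 | 185 | B | 3.0 | 3.0 | 3.0 |
| 3 | John | 92 | 184 | A | 4.5 | 4.0 | 5.0 |
| 4 | Emily | 87 | 184 | C | 4.5 | 4.0 | 5.0 |
| 5 | Jenny | 78 | 182 | C | 6.0 | 6.0 | 6.0 |
| 6 | Jane | 94 | 173 | B | 7.0 | 7.0 | 7.0 |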


The difference is observed when two or more rows have the same value. John and Emily are the 4th and 5th people.

* method = “average” gives the average value, which is 4.5.
* method = “min” gives the lowest value which is 4.
* method = “max” gives the highest value which is 5.
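
The same three methods extend to ties of more than two rows. The sketch below uses a small set of hypothetical scores (not taken from the sample DataFrame) with a three-way tie, where the tied rows occupy positions 2, 3, and 4 in descending order.

```python
import pandas as pd

# Hypothetical scores with a three-way tie at 85
scores = pd.Series([90, 85, 85, 85, 70])

comparison = pd.DataFrame({
    "score": scores,
    # average of positions 2, 3, 4
    "average": scores.rank(method="average", ascending=False),
    # lowest of positions 2, 3, 4
    "min": scores.rank(method="min", ascending=False),
    # highest of positions 2, 3, 4
    "max": scores.rank(method="max", ascending=False),
})

print(comparison)
```

In all three cases the row after the tie is ranked 5th, because the tied rows still take up three positions.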

#### You may have noticed that we did not write the method parameter when creating the “rank_default” column. Since it is the default value, we do not need to specify it but it also works if you write it as follows:

In [7]:
df["rank_default"] = df["overall"].rank(method="average", ascending=False)
df

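| | name | current | overall | group | rank_default | rank_min | rank_max |
|---|---|---|---|---|---|---|---|
| 0 | Matt | 90 | 208 | A | 1.0 | 1.0 | 1.0 |
| 1 | Lisa | 82 | 201 | A | 2.0 | 2.0 | 2.0 |
| 2 | Adam | 84 | 185 | B | 3.0 | 3.0 | 3.0 |
| 3 | John | 92 | 184 | A | 4.5 | 4.0 | 5.0 |
| 4 | Emily | 87 | 184 | C | 4.5 | 4.0 | 5.0 |
| 5 | Jenny | 78 | 182 | C | 6.0 | 6.0 | 6.0 |
| 6 | Jane | 94 | 173 | B | 7.0 | 7.0 | 7.0 |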


The other two options for the method parameter are “first” and “dense”. Let’s add them to our current DataFrame and then explain what they do.

In [8]:
df["rank_first"] = df["overall"].rank(method="first", ascending=False)
df["rank_dense"] = df["overall"].rank(method="dense", ascending=False)
df

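| | name | current | overall | group | rank_default | rank_min | rank_max | rank_first | rank_dense |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Matt | 90 | 208 | A | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
| 1 | Lisa | 82 | 201 | A | 2.0 | 2.0 | 2.0 | 2.0 | 2.0 |
| 2 | Adam | 84 | 185 | B | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
| 3 | John | 92 | 184 | A | 4.5 | 4.0 | 5.0 | 4.0 | 4.0 |
| 4 | Emily | 87 | 184 | C | 4.5 | 4.0 | 5.0 | 5.0 | 4.0 |
| 5 | Jenny | 78 | 182 | C | 6.0 | 6.0 | 6.0 | 6.0 | 5.0 |
| 6 | Jane | 94 | 173 | B | 7.0 | 7.0 | 7.0 | 7.0 | 6.0 |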


* The “first” method breaks ties by the order in which the rows appear. Among tied rows, the first observation in the DataFrame takes the next available rank, the second one takes the rank after that, and so on. In our case, John and Emily occupy the 4th and 5th positions and John appears first, so John’s rank is 4 and Emily’s rank is 5.

* The “dense” method is like “min”, but it does not leave a gap between the tied rows and the next row. John and Emily have rank 4, just as they do with “min”, but the rank of the next person differs. Jenny is ranked 6th with the “min” method but 5th with the “dense” method.
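
To make the gap behaviour easier to see, here is a short sketch with hypothetical scores that contain two separate ties. The data and the column names in `ranks` are made up for illustration.

```python
import pandas as pd

# Hypothetical scores with two separate ties (95 and 90)
scores = pd.Series([95, 95, 90, 90, 80])

ranks = pd.DataFrame({
    "score": scores,
    # tied rows share the lowest position; a gap follows each tie
    "min": scores.rank(method="min", ascending=False),
    # tied rows are numbered in order of appearance
    "first": scores.rank(method="first", ascending=False),
    # tied rows share a rank and the next rank follows without a gap
    "dense": scores.rank(method="dense", ascending=False),
})

print(ranks)
```

With “dense”, the highest rank equals the number of distinct scores (3 here), whereas “min” and “first” both end at 5, the number of rows.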

# Ranking based on multiple columns

In [9]:
# create DataFrame
df = pd.DataFrame({
    "name": ["John","Jane","Emily","Lisa","Matt","Jenny","Adam"],
    "current": [92,94,87,82,90,78,84],
    "overall": [184,173,184,201,208,182,185],
    "group":["A","B","C","A","A","C","B"]
})

In [10]:
# create rank column
df["rank_overall_current"] = df[["overall","current"]].apply(tuple, axis=1).rank(ascending=False)

In [11]:
# sort values
df = df.sort_values(by="rank_overall_current", ignore_index=True)
df

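| | name | current | overall | group | rank_overall_current |
|---|---|---|---|---|---|
| 0 | Matt | 90 | 208 | A | 1.0 |
| 1 | Lisa | 82 | 201 | A | 2.0 |
| 2 | Adam | 84 | 185 | B | 3.0 |
| 3 | John | 92 | 184 | A | 4.0 |
| 4 | Emily | 87 | 184 | C | 5.0 |
| 5 | Jenny | 78 | 182 | C | 6.0 |
| 6 | Jane | 94 | 173 | B | 7.0 |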


The rank is created based on the combination of overall and current. Since the values in the overall column come first in each tuple, they are compared first. If the overall values are equal, then the current values are considered.

For instance, John and Emily have the same overall value but John’s current value is higher so he is ranked before Emily.
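
The same ordering can also be obtained without building tuples. The sketch below is an alternative to the approach shown above, not a replacement for it: it sorts by both columns and numbers each distinct (overall, current) pair with groupby and ngroup. One assumption to note is that rows with identical values in both columns would share a rank, as with the “dense” method; the sample data has no such rows, so the ranks match the table above (as integers rather than floats).

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["John", "Jane", "Emily", "Lisa", "Matt", "Jenny", "Adam"],
    "current": [92, 94, 87, 82, 90, 78, 84],
    "overall": [184, 173, 184, 201, 208, 182, 185],
})

# Sort by overall first, then current, both descending
ordered = df.sort_values(by=["overall", "current"], ascending=False)

# Number each distinct (overall, current) pair in order of appearance.
# The result keeps the original index, so it aligns with df on assignment.
df["rank_overall_current"] = (
    ordered.groupby(["overall", "current"], sort=False).ngroup() + 1
)

print(df.sort_values(by="rank_overall_current")[["name", "rank_overall_current"]])
```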

# Ranking within groups

In [12]:
# create DataFrame
df = pd.DataFrame({
    "name": ["John","Jane","Emily","Lisa","Matt","Jenny","Adam"],
    "current": [92,94,87,82,90,78,84],
    "overall": [184,173,184,201,208,182,185],
    "group":["A","B","C","A","A","C","B"]
})

In [14]:
# create rank column
df["group_rank"] = df.groupby("group")["overall"].rank(ascending=False)

In [15]:
# sort values
df = df.sort_values(by=["group","group_rank"], ignore_index=True)
df

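| | name | current | overall | group | group_rank |
|---|---|---|---|---|---|
| 0 | Matt | 90 | 208 | A | 1.0 |
| 1 | Lisa | 82 | 201 | A | 2.0 |
| 2 | John | 92 | 184 | A | 3.0 |
| 3 | Adam | 84 | 185 | B | 1.0 |
| 4 | Jane | 94 | 173 | B | 2.0 |
| 5 | Emily | 87 | 184 | C | 1.0 |
| 6 | Jenny | 78 | 182 | C | 2.0 |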


We now have a separate rank within each group. We can also change the method of the rank function. It works just like the examples we have done so far. The only difference is that everything is done separately for each group.
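
As an illustration of changing the method within groups, the sketch below uses hypothetical data in which group A contains a tie. The sample DataFrame has no ties inside a group, so the extra row for “Tom” and the modified scores are assumptions made only for this example.

```python
import pandas as pd

# Hypothetical data with a tie (201) inside group A
df = pd.DataFrame({
    "name": ["Matt", "Lisa", "Tom", "John", "Adam", "Jane"],
    "overall": [208, 201, 201, 184, 185, 173],
    "group": ["A", "A", "A", "A", "B", "B"],
})

grouped = df.groupby("group")["overall"]

# Ranks restart at 1 in every group; only the tie handling differs
df["group_rank_min"] = grouped.rank(method="min", ascending=False)
df["group_rank_dense"] = grouped.rank(method="dense", ascending=False)

print(df)
```

In group A, John is ranked 4th with “min” but 3rd with “dense”, while the ranks in group B are unaffected by the tie in group A.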