# Data Ranking

Ranking assigns each element of a series a position based on its value. This notebook covers:

- Ranking in ascending or descending order
- Dense ranking when two or more values are the same
- Maximum ranking when two or more values are the same
- Minimum ranking when two or more values are the same
- Ranking within groups

In [1]:
import pandas as pd
import datetime
import numpy as np

In [2]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [3]:
# Load the Superstore sample and derive month and year columns from the order date
data=pd.read_excel('Sample - Superstore.xls')
data['OrderDateMonth']=data['Order Date'].dt.month
data['OrderDateYear']=data['Order Date'].dt.year

In [4]:
salesMonthYear=pd.pivot_table(data,index=['OrderDateMonth'],columns=['OrderDateYear'],aggfunc='sum',values='Sales')

In [5]:
salesMonthYear

OrderDateMonth,2014,2015,2016,2017
1,14236.895,18174.0756,18542.491,43971.374
2,4519.892,11951.411,22978.815,20301.1334
3,55691.009,38726.252,51715.875,58872.3528
4,28295.345,34195.2085,38750.039,36521.5361
5,23648.287,30131.6865,56987.728,44261.1102
6,34595.1276,24797.292,40344.534,52981.7257
7,33946.393,28765.325,39261.963,45264.416
8,27909.4685,36898.3322,31115.3743,63120.888
9,81777.3508,64595.918,73410.0249,87866.652
10,31453.393,31404.9235,59687.745,77776.9232


Pandas provides a rank method that takes an optional ascending parameter, which defaults to True. When it is False, the data is ranked in reverse, so larger values are assigned smaller ranks.

In [6]:
salesMonthYear.rank()

OrderDateMonth,2014,2015,2016,2017
1,2.0,2.0,1.0,3.0
2,1.0,1.0,2.0,1.0
3,9.0,9.0,7.0,7.0
4,5.0,7.0,4.0,2.0
5,3.0,5.0,8.0,4.0
6,8.0,3.0,6.0,6.0
7,7.0,4.0,5.0,5.0
8,4.0,8.0,3.0,8.0
9,12.0,10.0,10.0,11.0
10,6.0,6.0,9.0,9.0


In [7]:
salesMonthYear.rank(ascending=False)

OrderDateMonth,2014,2015,2016,2017
1,11.0,11.0,12.0,10.0
2,12.0,12.0,11.0,12.0
3,4.0,4.0,6.0,6.0
4,8.0,6.0,9.0,11.0
5,10.0,8.0,5.0,9.0
6,5.0,10.0,7.0,7.0
7,6.0,9.0,8.0,8.0
8,9.0,5.0,10.0,5.0
9,1.0,3.0,3.0,2.0
10,7.0,7.0,4.0,4.0
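
The effect of the ascending parameter is easier to see on a tiny series. The sketch below uses made-up sales figures (the values and month labels are illustrative assumptions, not taken from the Superstore data) and ranks them in both directions.

```python
import pandas as pd

# Hypothetical monthly sales figures, used only for illustration
sales = pd.Series([250.0, 100.0, 400.0], index=["Jan", "Feb", "Mar"])

ascending_rank = sales.rank()                  # smallest value gets rank 1
descending_rank = sales.rank(ascending=False)  # largest value gets rank 1

print(pd.DataFrame({"sales": sales, "asc": ascending_rank, "desc": descending_rank}))
```

When there are no ties, the two ranks of any element add up to the number of elements plus one, which is a quick sanity check when switching direction.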


In [8]:
sampleDF=pd.DataFrame(data={'A':[100,101,102,101,103,105,107,107,105,99]})

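By default (method='average'), tied values each receive the mean of the ranks they would otherwise occupy.

In [9]:
sampleDF['defaultRank']=sampleDF['A'].rank()
sampleDF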

Unnamed: 0,A,defaultRank
0,100,2.0
1,101,3.5
2,102,5.0
3,101,3.5
4,103,6.0
5,105,7.5
6,107,9.5
7,107,9.5
8,105,7.5
9,99,1.0
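
The fractional ranks such as 3.5 and 7.5 come from averaging the positions that tied values occupy in sorted order. As a rough check, the sketch below rebuilds those averages by hand for the same values and compares them with the result of rank(); the helper names (order, positions, manual_average) are introduced here for illustration only.

```python
import pandas as pd

values = pd.Series([100, 101, 102, 101, 103, 105, 107, 107, 105, 99])

# Positions 1..n that each value occupies once the series is sorted.
# kind="stable" keeps tied values in their original relative order.
order = values.sort_values(kind="stable")
positions = pd.Series(range(1, len(order) + 1), index=order.index, dtype=float)

# Average the positions over each group of equal values,
# then restore the original row order.
manual_average = positions.groupby(order).transform("mean").sort_index()

print(pd.DataFrame({"A": values, "manual": manual_average, "rank": values.rank()}))
```

For example, the two 101 values occupy sorted positions 3 and 4, so both receive (3 + 4) / 2 = 3.5.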


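With method='dense', tied values share a rank and the next distinct value gets the next consecutive rank, so no ranks are skipped.

In [10]: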
sampleDF['denseRank']=sampleDF['A'].rank(method='dense')
sampleDF

Unnamed: 0,A,defaultRank,denseRank
0,100,2.0,2.0
1,101,3.5,3.0
2,102,5.0,4.0
3,101,3.5,3.0
4,103,6.0,5.0
5,105,7.5,6.0
6,107,9.5,7.0
7,107,9.5,7.0
8,105,7.5,6.0
9,99,1.0,1.0


When a series contains tied values, method='min' assigns every value in the tie the lowest rank that the tie occupies, as shown below.

In [11]:
sampleDF['minRank']=sampleDF['A'].rank(method='min')
sampleDF

Unnamed: 0,A,defaultRank,denseRank,minRank
0,100,2.0,2.0,2.0
1,101,3.5,3.0,3.0
2,102,5.0,4.0,5.0
3,101,3.5,3.0,3.0
4,103,6.0,5.0,6.0
5,105,7.5,6.0,7.0
6,107,9.5,7.0,9.0
7,107,9.5,7.0,9.0
8,105,7.5,6.0,7.0
9,99,1.0,1.0,1.0


When a series contains tied values, method='max' assigns every value in the tie the highest rank that the tie occupies, as shown below.

In [12]:
sampleDF['maxRank']=sampleDF['A'].rank(method='max')
sampleDF

Unnamed: 0,A,defaultRank,denseRank,minRank,maxRank
0,100,2.0,2.0,2.0,2.0
1,101,3.5,3.0,3.0,4.0
2,102,5.0,4.0,5.0,5.0
3,101,3.5,3.0,3.0,4.0
4,103,6.0,5.0,6.0,6.0
5,105,7.5,6.0,7.0,8.0
6,107,9.5,7.0,9.0,10.0
7,107,9.5,7.0,9.0,10.0
8,105,7.5,6.0,7.0,8.0
9,99,1.0,1.0,1.0,1.0
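
To compare the tie-breaking methods directly, it can help to build all the rank columns in one step and look only at the tied rows. The following is a minimal sketch on the same values; the names comparison and tied are assumptions made for this example.

```python
import pandas as pd

values = pd.Series([100, 101, 102, 101, 103, 105, 107, 107, 105, 99], name="A")

# One column per tie-breaking method covered above
methods = ["average", "min", "max", "dense"]
comparison = pd.DataFrame({m: values.rank(method=m) for m in methods})
comparison.insert(0, "A", values)

# Keep only the tied values (101, 105, 107), where average, min and max disagree
tied = comparison[values.duplicated(keep=False)]
print(tied.sort_values("A"))
```

Note that dense ranks also differ from the other methods for untied values that follow a tie, because dense ranking never skips a rank.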


# Ranking within groups using groupby

In [13]:
data.head(2)

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit,OrderDateMonth,OrderDateYear
0,1,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96,2,0.0,41.9136,11,2016
1,2,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94,3,0.0,219.582,11,2016


In [14]:
someMoreDf=data.groupby(['OrderDateYear','Category']).agg({'Profit':'sum'}).reset_index()

In [15]:
someMoreDf

Unnamed: 0,OrderDateYear,Category,Profit
0,2014,Furniture,5457.7255
1,2014,Office Supplies,22593.4161
2,2014,Technology,21492.8325
3,2015,Furniture,3015.2029
4,2015,Office Supplies,25099.5338
5,2015,Technology,33503.867
6,2016,Furniture,6959.9531
7,2016,Office Supplies,35061.2292
8,2016,Technology,39773.992
9,2017,Furniture,3018.3913


Ranking the Profit column directly compares every row against all the others, regardless of year.

In [16]:
someMoreDf['overallRank']=someMoreDf['Profit'].rank()

In [17]:
someMoreDf

Unnamed: 0,OrderDateYear,Category,Profit,overallRank
0,2014,Furniture,5457.7255,3.0
1,2014,Office Supplies,22593.4161,6.0
2,2014,Technology,21492.8325,5.0
3,2015,Furniture,3015.2029,1.0
4,2015,Office Supplies,25099.5338,7.0
5,2015,Technology,33503.867,8.0
6,2016,Furniture,6959.9531,4.0
7,2016,Office Supplies,35061.2292,9.0
8,2016,Technology,39773.992,11.0
9,2017,Furniture,3018.3913,2.0


Iterating over the grouped column shows the profit values that fall into each year's group.

In [19]:
for i in someMoreDf.groupby("OrderDateYear")["Profit"]:
    print(i)

(2014, 0     5457.7255
1    22593.4161
2    21492.8325
Name: Profit, dtype: float64)
(2015, 3     3015.2029
4    25099.5338
5    33503.8670
Name: Profit, dtype: float64)
(2016, 6     6959.9531
7    35061.2292
8    39773.9920
Name: Profit, dtype: float64)
(2017, 9      3018.3913
10    39736.6217
11    50684.2566
Name: Profit, dtype: float64)


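Calling rank on the grouped column ranks each profit only against the other values from the same year. With ascending=False, the most profitable category in each year gets rank 1.

In [18]: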
someMoreDf['groupRank']=someMoreDf.groupby("OrderDateYear")["Profit"].rank(ascending=False,method='dense')
someMoreDf

Unnamed: 0,OrderDateYear,Category,Profit,overallRank,groupRank
0,2014,Furniture,5457.7255,3.0,3.0
1,2014,Office Supplies,22593.4161,6.0,1.0
2,2014,Technology,21492.8325,5.0,2.0
3,2015,Furniture,3015.2029,1.0,3.0
4,2015,Office Supplies,25099.5338,7.0,2.0
5,2015,Technology,33503.867,8.0,1.0
6,2016,Furniture,6959.9531,4.0,3.0
7,2016,Office Supplies,35061.2292,9.0,2.0
8,2016,Technology,39773.992,11.0,1.0
9,2017,Furniture,3018.3913,2.0,3.0
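
A common use of a per-group rank is to pick the top entry in each group. The sketch below is one possible way to do that; it rebuilds a small frame from the 2014 to 2016 rows shown above so that it runs on its own, and the names profits and top_per_year are illustrative.

```python
import pandas as pd

# Yearly profit per category, copied from the 2014-2016 rows shown above
profits = pd.DataFrame({
    "OrderDateYear": [2014] * 3 + [2015] * 3 + [2016] * 3,
    "Category": ["Furniture", "Office Supplies", "Technology"] * 3,
    "Profit": [5457.7255, 22593.4161, 21492.8325,
               3015.2029, 25099.5338, 33503.8670,
               6959.9531, 35061.2292, 39773.9920],
})

# Rank categories within each year, highest profit first
profits["groupRank"] = (
    profits.groupby("OrderDateYear")["Profit"].rank(ascending=False, method="dense")
)

# Keep only the best-ranked category of every year
top_per_year = profits.loc[profits["groupRank"] == 1, ["OrderDateYear", "Category"]]
print(top_per_year)
```

Filtering on groupRank <= 2 instead would keep the two most profitable categories per year, which is why a rank column is often more flexible than a plain per-group maximum.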
