# Pandas Sort - Sorting Data in Python

## Getting Started with Pandas Sort Methods

### Preparing the Dataset

In [6]:
from pathlib import Path
import pandas as pd
import glob

In [7]:
CURRENT_DIR = Path.cwd()
DATA_DIR = CURRENT_DIR / "data"

In [8]:
column_subset = [
    'id',
    'make',
    'model',
    'year',
    'cylinders',
    'fuelType',
    'trany',
    'mpgData',
    'city08',
    'highway08'
]

In [29]:
df = pd.read_csv(
    DATA_DIR / 'vehicles.csv',
    usecols=column_subset,
    nrows=100
)

In [30]:
df.head()

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993


## Sorting on a Single Column

Sorting the DataFrame on a single column in ascending order:

In [31]:
df.sort_values('city08')

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
...,...,...,...,...,...,...,...,...,...,...
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993
76,23,4,Regular,31,10066,Mazda,626,Y,Manual 5-spd,1993


To sort in descending order instead, set `ascending=False`:

In [32]:
df.sort_values(
    by='city08',
    ascending=False
)

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
76,23,4,Regular,31,10066,Mazda,626,Y,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
58,10,8,Regular,11,1005,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985


Both `.sort_values()` and `.sort_index()` accept a `kind` argument that selects the *sorting algorithm*: `quicksort`, `mergesort`, or `heapsort`.

`quicksort` is the default algorithm when sorting on a single column. To get a stable sort, one that keeps tied rows in their original relative order, pass `kind='mergesort'`:

In [33]:
df.sort_values(
    by='city08',
    ascending=False,
    kind='mergesort'
)

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993
10,23,4,Regular,30,10006,Toyota,Corolla,Y,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
69,10,8,Regular,11,1006,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985


**Note**: In pandas, `kind` is ignored when you sort on more than one column or label.
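
The effect of a stable sort is easiest to see on a small frame with ties. The sketch below uses made-up toy data (not `vehicles.csv`), and the names `cars` and `stable` are only illustrative:

```python
import pandas as pd

# Toy data for illustration: three rows tie on city08 == 23.
cars = pd.DataFrame(
    {
        "model": ["Charger", "Corolla", "Testarossa", "626", "Legacy"],
        "city08": [23, 23, 9, 23, 17],
    }
)

# With a stable algorithm, tied rows keep their original relative order,
# so the rows with city08 == 23 come out as index 0, 1, 3.
stable = cars.sort_values(by="city08", ascending=False, kind="mergesort")

print(stable)
```

This mirrors the output above, where the rows with `city08` equal to 23 appear in index order (2, 7, 8, 9, 10) once `mergesort` is used.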

## Sorting on Multiple Columns

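To sort on more than one column, pass a list of column names to `by`. Rows are ordered by the first column, and ties are broken by the next: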

In [34]:
df.sort_values(
    by=['city08', 'highway08']
)[['city08', 'highway08']]

Unnamed: 0,city08,highway08
80,9,10
47,9,11
99,9,13
1,9,14
58,10,11
...,...,...
9,23,30
10,23,30
8,23,31
76,23,31


Sorting by multiple columns in ascending order:

In [35]:
df.sort_values(
    by=['make', 'model']
)[['make', 'model']]

Unnamed: 0,make,model
0,Alfa Romeo,Spider Veloce 2000
18,Audi,100
19,Audi,100
20,BMW,740i
21,BMW,740il
...,...,...
12,Volkswagen,Golf III / GTI
13,Volkswagen,Jetta III
15,Volkswagen,Jetta III
16,Volvo,240


The order of the names in `by` sets the sort priority. Here `model` is sorted first, and `make` only breaks ties:

In [36]:
df.sort_values(
    by=['model', 'make']
)[['make', 'model']]

Unnamed: 0,make,model
18,Audi,100
19,Audi,100
16,Volvo,240
17,Volvo,240
75,Mazda,626
...,...,...
62,Ford,Thunderbird
63,Ford,Thunderbird
88,Oldsmobile,Toronado
42,CX Automotive,XM v6


Sorting in descending order:

In [37]:
df.sort_values(
    by=['make', 'model'],
    ascending=False
)[['make', 'model']]

Unnamed: 0,make,model
16,Volvo,240
17,Volvo,240
13,Volkswagen,Jetta III
15,Volkswagen,Jetta III
11,Volkswagen,Golf III / GTI
...,...,...
21,BMW,740il
20,BMW,740i
18,Audi,100
19,Audi,100


**Note:** with textual data, the sort is **case sensitive**, meaning capitalized text will appear *first* in ascending order and *last* in descending order. 
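
If case-insensitive ordering is what you want, one option is the `key` argument of `.sort_values()`, which is available in pandas 1.1 and later. The following is a minimal sketch on made-up data; the frame `names` and its values are assumptions for illustration only:

```python
import pandas as pd

# Toy data mixing lowercase and capitalized text.
names = pd.DataFrame({"make": ["audi", "Volvo", "BMW", "dodge"]})

# Default behavior: capitalized text sorts before lowercase text.
case_sensitive = names.sort_values(by="make")

# `key` receives the column as a Series and must return a Series of the
# same length; lowercasing it makes the comparison case-insensitive.
case_insensitive = names.sort_values(
    by="make",
    key=lambda col: col.str.lower()
)

print(case_sensitive["make"].tolist())
print(case_insensitive["make"].tolist())
```

The `key` function only affects how values are compared. The values stored in the DataFrame are left unchanged.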

Sorting by multiple columns with different sort orders:

In [38]:
df.sort_values(
    by=['make', 'model', 'city08'],
    ascending=[True, True, False]
)[['make', 'model', 'city08']]

Unnamed: 0,make,model,city08
0,Alfa Romeo,Spider Veloce 2000,19
18,Audi,100,17
19,Audi,100,17
20,BMW,740i,14
21,BMW,740il,14
...,...,...,...
11,Volkswagen,Golf III / GTI,18
15,Volkswagen,Jetta III,20
13,Volkswagen,Jetta III,18
17,Volvo,240,19
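
The per-column `ascending` list can be checked on a tiny frame where the result is easy to verify by eye. This is a sketch with made-up values, not rows taken from `vehicles.csv`:

```python
import pandas as pd

# Toy data: two makes, each with two rows that differ only in city08.
cars = pd.DataFrame(
    {
        "make": ["Volvo", "Audi", "Volvo", "Audi"],
        "model": ["240", "100", "240", "100"],
        "city08": [18, 17, 19, 16],
    }
)

# One flag per column in `by`: make and model ascending, city08 descending.
mixed = cars.sort_values(
    by=["make", "model", "city08"],
    ascending=[True, True, False]
)

print(mixed)
```

Within each make and model pair, the row with the higher `city08` value comes first. The `ascending` list must have the same length as `by`.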


## Sorting the DataFrame on its Index

To see index sorting in action, first create a DataFrame whose index is out of order by sorting on `make` and `model`:

In [39]:
sorted_df = df.sort_values(by=['make', 'model'])
sorted_df

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
18,17,6,Premium,22,10013,Audi,100,Y,Automatic 4-spd,1993
19,17,6,Premium,24,10014,Audi,100,N,Manual 5-spd,1993
20,14,8,Premium,20,10015,BMW,740i,N,Automatic 5-spd,1993
21,14,8,Premium,20,10016,BMW,740il,N,Automatic 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
12,21,4,Regular,29,10008,Volkswagen,Golf III / GTI,Y,Manual 5-spd,1993
13,18,4,Regular,26,10009,Volkswagen,Jetta III,N,Automatic 4-spd,1993
15,20,4,Regular,28,10010,Volkswagen,Jetta III,N,Manual 5-spd,1993
16,18,4,Regular,23,10011,Volvo,240,Y,Automatic 4-spd,1993


Calling `.sort_index()` sorts by index in ascending order, which returns the new DataFrame to its original order:

In [40]:
sorted_df.sort_index()

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
95,17,6,Regular,25,10083,Pontiac,Grand Prix,Y,Automatic 3-spd,1993
96,17,6,Regular,27,10084,Pontiac,Grand Prix,N,Automatic 4-spd,1993
97,15,6,Regular,24,10085,Pontiac,Grand Prix,N,Automatic 4-spd,1993
98,15,6,Regular,24,10086,Pontiac,Grand Prix,N,Manual 5-spd,1993
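
The round trip can be confirmed on a small frame. The sketch below uses made-up data, and the variable names are only illustrative:

```python
import pandas as pd

# Toy data with the default RangeIndex 0, 1, 2.
cars = pd.DataFrame(
    {"make": ["Ferrari", "Dodge", "Audi"], "city08": [9, 23, 17]}
)

# Sorting by value reorders the rows but keeps each row's index label.
by_make = cars.sort_values(by="make")

# Sorting by index puts the rows back in their original order.
restored = by_make.sort_index()

print(by_make.index.tolist())
print(restored.equals(cars))
```

This works because `.sort_values()` carries the original index labels along with the rows rather than renumbering them.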


If you want to set a custom index using the make and model columns, then you can pass a list to `.set_index()`:

In [41]:
assigned_index_df = df.set_index(
    ['make', 'model']
)
assigned_index_df

Unnamed: 0_level_0,Unnamed: 1_level_0,city08,cylinders,fuelType,highway08,id,mpgData,trany,year
make,model,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Alfa Romeo,Spider Veloce 2000,19,4,Regular,25,1,Y,Manual 5-spd,1985
Ferrari,Testarossa,9,12,Regular,14,10,N,Manual 5-spd,1985
Dodge,Charger,23,4,Regular,33,100,Y,Manual 5-spd,1985
Dodge,B150/B250 Wagon 2WD,10,8,Regular,12,1000,N,Automatic 3-spd,1985
Subaru,Legacy AWD Turbo,17,4,Premium,23,10000,N,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...
Pontiac,Grand Prix,17,6,Regular,25,10083,Y,Automatic 3-spd,1993
Pontiac,Grand Prix,17,6,Regular,27,10084,N,Automatic 4-spd,1993
Pontiac,Grand Prix,15,6,Regular,24,10085,N,Automatic 4-spd,1993
Pontiac,Grand Prix,15,6,Regular,24,10086,N,Manual 5-spd,1993


This is called a `MultiIndex`, or **hierarchical index**. Your DataFrame is now indexed by more than one key, and `.sort_index()` sorts on those keys in order, `make` first and then `model`:

In [42]:
assigned_index_df.sort_index()

Unnamed: 0_level_0,Unnamed: 1_level_0,city08,cylinders,fuelType,highway08,id,mpgData,trany,year
make,model,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Alfa Romeo,Spider Veloce 2000,19,4,Regular,25,1,Y,Manual 5-spd,1985
Audi,100,17,6,Premium,22,10013,Y,Automatic 4-spd,1993
Audi,100,17,6,Premium,24,10014,N,Manual 5-spd,1993
BMW,740i,14,8,Premium,20,10015,N,Automatic 5-spd,1993
BMW,740il,14,8,Premium,20,10016,N,Automatic 5-spd,1993
...,...,...,...,...,...,...,...,...,...
Volkswagen,Golf III / GTI,21,4,Regular,29,10008,Y,Manual 5-spd,1993
Volkswagen,Jetta III,18,4,Regular,26,10009,N,Automatic 4-spd,1993
Volkswagen,Jetta III,20,4,Regular,28,10010,N,Manual 5-spd,1993
Volvo,240,18,4,Regular,23,10011,Y,Automatic 4-spd,1993


Sorting by index in descending order:

In [43]:
assigned_index_df.sort_index(ascending=False)

Unnamed: 0_level_0,Unnamed: 1_level_0,city08,cylinders,fuelType,highway08,id,mpgData,trany,year
make,model,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Volvo,240,18,4,Regular,23,10011,Y,Automatic 4-spd,1993
Volvo,240,19,4,Regular,26,10012,Y,Manual 5-spd,1993
Volkswagen,Jetta III,18,4,Regular,26,10009,N,Automatic 4-spd,1993
Volkswagen,Jetta III,20,4,Regular,28,10010,N,Manual 5-spd,1993
Volkswagen,Golf III / GTI,18,4,Regular,26,10007,N,Automatic 4-spd,1993
...,...,...,...,...,...,...,...,...,...
BMW,740il,14,8,Premium,20,10016,N,Automatic 5-spd,1993
BMW,740i,14,8,Premium,20,10015,N,Automatic 5-spd,1993
Audi,100,17,6,Premium,22,10013,Y,Automatic 4-spd,1993
Audi,100,17,6,Premium,24,10014,N,Manual 5-spd,1993
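
A compact way to check how a `MultiIndex` sorts is to build one on a few rows and inspect the resulting index tuples. The data below is made up for illustration:

```python
import pandas as pd

# Toy data; the rows are not taken from vehicles.csv.
cars = pd.DataFrame(
    {
        "make": ["Volvo", "Audi", "BMW", "Audi"],
        "model": ["240", "100", "740i", "80"],
        "city08": [18, 17, 14, 20],
    }
)

indexed = cars.set_index(["make", "model"])

# Levels are compared left to right: make first, then model.
ascending_idx = indexed.sort_index()
descending_idx = indexed.sort_index(ascending=False)

print(list(ascending_idx.index))
print(list(descending_idx.index))
```

Note that `model` holds strings, so `"100"` sorts before `"80"` because the comparison is character by character, not numeric.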


## Sorting the Columns of your DataFrame

To sort the columns by their labels, pass `axis=1` to `.sort_index()`:

In [44]:
df.sort_index(axis=1)

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993
...,...,...,...,...,...,...,...,...,...,...
95,17,6,Regular,25,10083,Pontiac,Grand Prix,Y,Automatic 3-spd,1993
96,17,6,Regular,27,10084,Pontiac,Grand Prix,N,Automatic 4-spd,1993
97,15,6,Regular,24,10085,Pontiac,Grand Prix,N,Automatic 4-spd,1993
98,15,6,Regular,24,10086,Pontiac,Grand Prix,N,Manual 5-spd,1993


To sort columns in descending order:

In [45]:
df.sort_index(axis=1, ascending=False)

Unnamed: 0,year,trany,mpgData,model,make,id,highway08,fuelType,cylinders,city08
0,1985,Manual 5-spd,Y,Spider Veloce 2000,Alfa Romeo,1,25,Regular,4,19
1,1985,Manual 5-spd,N,Testarossa,Ferrari,10,14,Regular,12,9
2,1985,Manual 5-spd,Y,Charger,Dodge,100,33,Regular,4,23
3,1985,Automatic 3-spd,N,B150/B250 Wagon 2WD,Dodge,1000,12,Regular,8,10
4,1993,Manual 5-spd,N,Legacy AWD Turbo,Subaru,10000,23,Premium,4,17
...,...,...,...,...,...,...,...,...,...,...
95,1993,Automatic 3-spd,Y,Grand Prix,Pontiac,10083,25,Regular,6,17
96,1993,Automatic 4-spd,N,Grand Prix,Pontiac,10084,27,Regular,6,17
97,1993,Automatic 4-spd,N,Grand Prix,Pontiac,10085,24,Regular,6,15
98,1993,Manual 5-spd,N,Grand Prix,Pontiac,10086,24,Regular,6,15
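
Because the columns in this dataset already display in alphabetical order, the ascending case is hard to see. A sketch with a deliberately unordered toy frame makes the effect visible:

```python
import pandas as pd

# Toy frame whose columns are intentionally not alphabetical.
cars = pd.DataFrame({"make": ["Audi"], "city08": [17], "year": [1993]})

# axis=1 sorts the column labels instead of the row index.
columns_ascending = cars.sort_index(axis=1)
columns_descending = cars.sort_index(axis=1, ascending=False)

print(columns_ascending.columns.tolist())
print(columns_descending.columns.tolist())
```

Only the column order changes. The rows and their values stay as they were.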


## Working with Missing Data when Sorting in Pandas

Create a new column based on the existing `mpgData` column, mapping True where `mpgData` equals `Y` and `NaN` where it doesn’t:

In [46]:
df['mpgData_'] = df['mpgData'].map({'Y': True})
df

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgData_
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985,True
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985,
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985,True
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993,
...,...,...,...,...,...,...,...,...,...,...,...
95,17,6,Regular,25,10083,Pontiac,Grand Prix,Y,Automatic 3-spd,1993,True
96,17,6,Regular,27,10084,Pontiac,Grand Prix,N,Automatic 4-spd,1993,
97,15,6,Regular,24,10085,Pontiac,Grand Prix,N,Automatic 4-spd,1993,
98,15,6,Regular,24,10086,Pontiac,Grand Prix,N,Manual 5-spd,1993,


By default, `.sort_values()` places missing values last (`na_position='last'`). To have the missing data appear first in your DataFrame instead, set `na_position` to `'first'`:

In [47]:
df.sort_values(
    by='mpgData_',
    na_position='first'
)

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgData_
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985,
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993,
5,21,4,Regular,24,10001,Subaru,Loyale,N,Automatic 3-spd,1993,
11,18,4,Regular,26,10007,Volkswagen,Golf III / GTI,N,Automatic 4-spd,1993,
...,...,...,...,...,...,...,...,...,...,...,...
32,15,8,Premium,23,10026,Cadillac,Eldorado,Y,Automatic 4-spd,1993,True
33,15,8,Premium,23,10027,Cadillac,Seville,Y,Automatic 4-spd,1993,True
37,17,6,Regular,25,10030,Chevrolet,Lumina,Y,Automatic 3-spd,1993,True
85,17,6,Regular,27,10074,Oldsmobile,Cutlass Supreme,Y,Automatic 4-spd,1993,True
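
Both settings of `na_position` can be compared side by side on a small frame. The sketch below repeats the `.map()` step on made-up data, so the model names and values are assumptions:

```python
import pandas as pd

# Toy data: one row has mpgData == "N", which maps to NaN.
cars = pd.DataFrame(
    {
        "model": ["Spider", "Testarossa", "Charger"],
        "mpgData": ["Y", "N", "Y"],
    }
)
cars["mpgData_"] = cars["mpgData"].map({"Y": True})

# Default: rows with missing values go to the end.
nan_last = cars.sort_values(by="mpgData_")

# na_position='first' moves them to the top instead.
nan_first = cars.sort_values(by="mpgData_", na_position="first")

print(nan_last)
print(nan_first)
```

`na_position` accepts only `'first'` or `'last'`, and it applies regardless of whether the sort is ascending or descending.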


## Using Sort Methods to Modify your DataFrame

Using `.sort_values()` in place:

In [48]:
df.sort_values('city08', inplace=True)

In [49]:
df

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgData_
99,9,8,Premium,13,10087,Rolls-Royce,Brooklands/Brklnds L,N,Automatic 4-spd,1993,
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985,
80,9,8,Regular,10,1007,Dodge,B350 Wagon 2WD,N,Automatic 3-spd,1985,
47,9,8,Regular,11,1004,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
...,...,...,...,...,...,...,...,...,...,...,...
9,23,4,Regular,30,10005,Toyota,Corolla,Y,Automatic 4-spd,1993,True
8,23,4,Regular,31,10004,Toyota,Corolla,Y,Manual 5-spd,1993,True
7,23,4,Regular,26,10003,Toyota,Corolla,Y,Automatic 3-spd,1993,True
76,23,4,Regular,31,10066,Mazda,626,Y,Manual 5-spd,1993,True


Using `.sort_index()` with `inplace` set to `True` also modifies the DataFrame directly. Because `df` was just sorted in place by `city08`, this returns it to its original index order:

In [50]:
df.sort_index(inplace=True)
df

Unnamed: 0,city08,cylinders,fuelType,highway08,id,make,model,mpgData,trany,year,mpgData_
0,19,4,Regular,25,1,Alfa Romeo,Spider Veloce 2000,Y,Manual 5-spd,1985,True
1,9,12,Regular,14,10,Ferrari,Testarossa,N,Manual 5-spd,1985,
2,23,4,Regular,33,100,Dodge,Charger,Y,Manual 5-spd,1985,True
3,10,8,Regular,12,1000,Dodge,B150/B250 Wagon 2WD,N,Automatic 3-spd,1985,
4,17,4,Premium,23,10000,Subaru,Legacy AWD Turbo,N,Manual 5-spd,1993,
...,...,...,...,...,...,...,...,...,...,...,...
95,17,6,Regular,25,10083,Pontiac,Grand Prix,Y,Automatic 3-spd,1993,True
96,17,6,Regular,27,10084,Pontiac,Grand Prix,N,Automatic 4-spd,1993,
97,15,6,Regular,24,10085,Pontiac,Grand Prix,N,Automatic 4-spd,1993,
98,15,6,Regular,24,10086,Pontiac,Grand Prix,N,Manual 5-spd,1993,
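
One detail worth checking when sorting in place is the return value: with `inplace=True`, both methods return `None` and change the existing object. A minimal sketch on made-up data:

```python
import pandas as pd

# Toy data with the default RangeIndex 0, 1, 2.
cars = pd.DataFrame(
    {"model": ["Spider", "Testarossa", "Charger"], "city08": [19, 9, 23]}
)

# With inplace=True the method returns None and reorders `cars` itself.
result = cars.sort_values("city08", inplace=True)
order_after_sort_values = cars.index.tolist()

# Sorting the index in place restores the original row order.
cars.sort_index(inplace=True)
order_after_sort_index = cars.index.tolist()

print(result)
print(order_after_sort_values)
print(order_after_sort_index)
```

A common mistake is writing `cars = cars.sort_values(..., inplace=True)`, which replaces the DataFrame with `None`.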
