Topics covered:
<ul>
    <li>Sorting DataFrames based on a single column</li>
    <li>Sorting DataFrames based on multiple columns</li>
    <li>Sorting in ascending and descending order</li>
    <li>Sorting based on row and column labels</li>
    <li>Sorting in the presence of missing values</li>
</ul>

In [None]:
import pandas as pd

The dataset has 26 columns, but for this demo we will use only the few columns listed below.

In [None]:
sub_columns = ['id', 'make', 'model', 'year', 'cylinders', 'fuelType', 'trany', 'mpgData', 'city08', 'highway08']

In [None]:
df1 = pd.read_csv('../csv-files/vehicles.csv', usecols = sub_columns)

In [None]:
df1.info()

In [None]:
df1.head()

In [None]:
#Check for null values
null_vals = df1.isnull().sum()
print(null_vals[null_vals != 0])

<strong>Sorting data based on a column:</strong>

In [None]:
#Sort df1 based on the values of fuelType
df1.sort_values('fuelType')

Sorting does not affect the original DataFrame. sort_values returns a new, sorted DataFrame and leaves df1 unchanged. This can be verified by looking at df1.

In [None]:
df1.head()
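The same behaviour can be checked on a small stand-in DataFrame. This is a minimal sketch: toy_df and its values are made up for illustration and are not taken from vehicles.csv.

```python
import pandas as pd

# Hypothetical toy data standing in for a few rows of the vehicles dataset
toy_df = pd.DataFrame({
    'make': ['Toyota', 'Audi', 'Ford'],
    'city08': [30, 18, 22],
})

# sort_values returns a new DataFrame; toy_df itself is not modified
sorted_df = toy_df.sort_values('city08')

print(sorted_df['city08'].tolist())  # sorted copy
print(toy_df['city08'].tolist())     # original row order is preserved
```

Note that the sorted copy keeps the original index labels, so its index is no longer in increasing order.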

In [None]:
#Sort df1 by using the column city08
df1.sort_values('city08')

<strong>Sorting data in descending order:</strong>

In [None]:
df1.sort_values(by = 'city08', ascending = False)

<strong>Sorting using multiple columns:</strong>

In [None]:
df1.sort_values(by = ['city08', 'highway08']).head(20)

<strong>Sorting multiple columns in descending order:</strong>

In [None]:
df1.sort_values(by = ['city08', 'highway08'], ascending = False).head(20)

<strong>Different order for different columns:</strong>

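You may sort by multiple columns using a different order for each column by passing a list to ascending, with one value per column in by. For example, in the cell below, city08 and highway08 are sorted in descending order while year is sorted in ascending order. Because inplace = True is set, df1 itself is modified instead of a sorted copy being returned.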

In [None]:
df1.sort_values(by = ['city08', 'highway08', 'year'], ascending = [False, False, True], inplace = True)
df1.head()
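To make the per-column ordering easier to follow, here is a minimal sketch on a hypothetical four-row DataFrame (the values are invented, not taken from vehicles.csv). It also shows that a call with inplace = True returns None, so its result should not be assigned and reused.

```python
import pandas as pd

# Hypothetical toy data with ties in city08 and highway08
toy_df = pd.DataFrame({
    'city08': [20, 25, 25, 20],
    'highway08': [30, 32, 32, 28],
    'year': [2010, 2015, 2005, 2012],
})

sort_cols = ['city08', 'highway08', 'year']
sort_order = [False, False, True]  # one flag per column in sort_cols

# city08 and highway08 descending; ties are broken by year ascending
mixed = toy_df.sort_values(by=sort_cols, ascending=sort_order)
print(mixed)

# With inplace=True the DataFrame is sorted in place and None is returned
in_place_df = toy_df.copy()
result = in_place_df.sort_values(by=sort_cols, ascending=sort_order, inplace=True)
print(result)
```

The two rows with city08 = 25 and highway08 = 32 are ordered by year, so the 2005 row comes before the 2015 row.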

<strong>Sorting based on index:</strong>

In [None]:
#Sort df1 by its row index; this returns a sorted copy and does not modify df1
df1.sort_index()

We now create a new DataFrame named new_index_df from df1, where the index is set to the values in the column year.

In [None]:
new_index_df = df1.set_index('year')

In [None]:
new_index_df.head()

In [None]:
#Sort row indices in descending order
new_index_df.sort_index(ascending = False)
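As a minimal sketch of the same two steps on hypothetical data (toy_df and its values are made up for illustration), set_index moves year into the index and sort_index then orders the rows by those labels:

```python
import pandas as pd

# Hypothetical toy data; 'year' will become the row index
toy_df = pd.DataFrame({
    'make': ['Ford', 'Audi', 'Toyota'],
    'year': [1999, 2015, 2007],
})

by_year = toy_df.set_index('year')

# Sort rows by index label, newest year first
newest_first = by_year.sort_index(ascending=False)
print(newest_first)
```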

<strong>Sorting based on column labels:</strong>

Sorting can also be applied to column labels by setting the axis parameter to 1. The cell below sorts the columns of new_index_df by their labels in descending order; the rows are left in their existing order.

In [None]:
new_index_df.sort_index(axis = 1, ascending = False)
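The effect of axis = 1 is easiest to see on a single-row example. This is a sketch with a hypothetical toy_df; only the column order changes, and string labels are compared alphabetically.

```python
import pandas as pd

# Hypothetical one-row DataFrame with three labelled columns
toy_df = pd.DataFrame({
    'make': ['Ford'],
    'city08': [22],
    'trany': ['Manual'],
})

# axis=1 sorts the column labels instead of the row labels
cols_desc = toy_df.sort_index(axis=1, ascending=False)
cols_asc = toy_df.sort_index(axis=1)

print(cols_desc.columns.tolist())
print(cols_asc.columns.tolist())
```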

<strong>Sorting in the presence of missing values:</strong>

In [None]:
#Sort by cylinders in descending order, placing null values at the beginning
df1.sort_values(by = 'cylinders', na_position = 'first', ascending = False)

In [None]:
#Sort by cylinders in ascending order, placing null values at the end (the default)
df1.sort_values(by = 'cylinders', na_position = 'last')
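The na_position parameter controls where missing values go independently of the ascending setting. The sketch below uses a hypothetical toy_df with one missing cylinders value to show both placements.

```python
import pandas as pd

# Hypothetical toy data with one missing value in 'cylinders'
toy_df = pd.DataFrame({
    'make': ['Toyota', 'Tesla', 'Ford', 'Audi'],
    'cylinders': [4.0, float('nan'), 8.0, 6.0],
})

# Descending order, but the missing value is still placed first
nan_first = toy_df.sort_values(by='cylinders', na_position='first', ascending=False)

# Ascending order with the missing value at the end
nan_last = toy_df.sort_values(by='cylinders', na_position='last')

print(nan_first['make'].tolist())
print(nan_last['make'].tolist())
```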