In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Sorting is a way of ordering elements based on their rank. In this notebook we'll see different ways of performing sorting in pandas. This notebook is part of a blog post I wrote on the same topic.

07/14/021: Updated for the latest version

07/08/021: Updated for the latest version

In [None]:
df = pd.read_csv('/kaggle/input/most-starred-github-repositories/Most starred Github Repositories.csv')
df.head()

Let's quickly go over the various columns of the dataset:
* Project Name: Name of the repository on GitHub.
* Stars: A bookmark or display of appreciation for a repository.
* Forks: A fork is a copy of a repository that you manage.
* Language: Main programming languages used in the project.
* Open Issues: Issues are suggested improvements, tasks or questions related to the repository. The issues which haven't been resolved are labelled as open issues.
* Description: A paragraph detailing the purpose of the project.
* Last Commit: A commit, or "revision", is an individual change to a file (or set of files). This field stores the date and time of the last commit.

Note: All the above definitions have been taken from the GitHub glossary.

The current dataset is ordered by the number of Stars ⭐️, i.e., the project with the most stars comes first, and so on. Pandas supports three kinds of sorting: sorting by index labels, sorting by column values, and sorting by a combination of both. Let's now look at the different ways of sorting this dataset with some examples:

## 1. Sorting on a single column
The function used for sorting in pandas is called `DataFrame.sort_values()`. It is used to sort a DataFrame by its column or row values. Let's sort the dataset by the Forks column.

In [None]:
forks = df.sort_values(by='forks',ascending=False)
forks.head()

The function `DataFrame.sort_values` comes with a lot of parameters. We will touch upon a few important ones as we advance through the article. In the above example, we have encountered two of them:
* `by`: specifies the column name (or list of column names) used to determine the sorted order. This parameter is required.
* `ascending`: specifies whether to sort the dataframe in ascending or descending order. The default is `ascending=True`. To sort in descending order, we need to specify `ascending=False`.
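
To see the effect of `by` and `ascending` in isolation, here is a minimal sketch on a small made-up dataframe. The rows are hypothetical and only the column names `repo_name` and `forks` mirror the dataset.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the dataset.
toy = pd.DataFrame({
    "repo_name": ["alpha", "beta", "gamma"],
    "forks": [120, 45, 300],
})

# Default behaviour: ascending=True, smallest value first.
ascending_order = toy.sort_values(by="forks")

# ascending=False puts the largest value first.
descending_order = toy.sort_values(by="forks", ascending=False)

print(ascending_order["repo_name"].tolist())   # ['beta', 'alpha', 'gamma']
print(descending_order["repo_name"].tolist())  # ['gamma', 'alpha', 'beta']
```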

## 2. Sorting on multiple columns
Pandas also makes it possible to sort the dataset on multiple columns. Simply pass a list of the desired column names to the `sort_values` function as follows:

In [None]:
df.sort_values(by=['issues','stars']).head()

In the example above, we have sorted the dataframe first by the number of open issues (`issues`) and then by the number of stars (`stars`), which breaks ties among projects with the same number of open issues. Note that by default, both columns are sorted in ascending order.

## 3. Sorting by Multiple Columns With Different Sort Orders
When sorting by multiple columns, it is also possible to pass in different sort orders for different columns.

In [None]:
df.sort_values(by=['issues', 'stars'],
        ascending=[False, True]).head(10)

In the above example, the dataframe is first sorted on the `Open Issues` column in descending order and then, within each group of equal values, on the `Stars` column in ascending order.
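
The interaction between the two sort orders is easier to follow on a tiny dataframe with deliberate ties. The sketch below uses invented rows; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical toy data with ties in 'issues' so the second key matters.
toy = pd.DataFrame({
    "repo_name": ["alpha", "beta", "gamma", "delta"],
    "issues": [10, 50, 50, 10],
    "stars": [900, 300, 100, 700],
})

# 'issues' descending first; ties are then broken by 'stars' ascending.
mixed = toy.sort_values(by=["issues", "stars"], ascending=[False, True])

print(mixed["repo_name"].tolist())  # ['gamma', 'beta', 'delta', 'alpha']
```

The `ascending` list must have the same length as the `by` list, with one flag per column.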

## 4. Sorting by index

Another way of sorting a dataframe is by its index. In section 1, we created a dataframe named `forks`. This is just another version of the original dataframe, sorted on the `Forks` column. The dataframe appears like this:

In [None]:
forks.head()

As is evident, the index is unsorted. We can sort it by using the `dataframe.sort_index()` function.


In [None]:
forks.sort_index()

Alternatively, you can sort the index in descending order by passing the argument `ascending=False` to the function above.
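
As a small illustration of both directions, the following sketch sorts a made-up dataframe by value and then restores or reverses the index order. The data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data with the default index 0, 1, 2.
toy = pd.DataFrame({"forks": [120, 45, 300]})

# Sorting by value shuffles the index labels.
by_forks = toy.sort_values(by="forks", ascending=False)

# sort_index() orders rows by index label, ascending by default.
restored = by_forks.sort_index()
reversed_index = by_forks.sort_index(ascending=False)

print(by_forks.index.tolist())        # [2, 0, 1]
print(restored.index.tolist())        # [0, 1, 2]
print(reversed_index.index.tolist())  # [2, 1, 0]
```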


## 5. Ignore the index while sorting

The original index can also be discarded while sorting the dataframe by setting `ignore_index=True`. This results in an index labeled from 0 to n-1, where n is the number of observations.



In [None]:
df.sort_values(by='forks',ascending=False, ignore_index=True).head()
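
To make the difference visible, this sketch compares the resulting index with and without `ignore_index=True` on a hypothetical three-row dataframe.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"forks": [120, 45, 300]})

# Without ignore_index, each row keeps its original label.
kept = toy.sort_values(by="forks", ascending=False)

# With ignore_index=True, the result is relabeled 0..n-1.
relabeled = toy.sort_values(by="forks", ascending=False, ignore_index=True)

print(kept.index.tolist())       # [2, 0, 1]
print(relabeled.index.tolist())  # [0, 1, 2]
```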

## 6. Choosing the sorting algorithm
Pandas also lets us choose the sorting algorithm. By default, `sort_values` uses the **quicksort** algorithm. However, we can choose between the **'quicksort'**, **'mergesort'** and **'heapsort'** algorithms using the `kind` parameter. Remember that this option is only applied when sorting on a single column or label.

In [None]:
df.sort_values(by='forks', kind='mergesort').head()
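
One practical reason to pick `mergesort` is that it is a stable sort: rows with equal keys keep their original relative order. The sketch below shows this on invented data with tied values.

```python
import pandas as pd

# Hypothetical toy data with ties in 'forks'.
toy = pd.DataFrame({
    "repo_name": ["alpha", "beta", "gamma", "delta"],
    "forks": [50, 10, 50, 10],
})

# mergesort is stable: 'beta' stays ahead of 'delta', 'alpha' ahead of 'gamma'.
stable = toy.sort_values(by="forks", kind="mergesort")

print(stable["repo_name"].tolist())  # ['beta', 'delta', 'alpha', 'gamma']
```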

## 7. Sorting by column names

Additionally, we can sort the dataframe by its column names instead of its row labels using the `sort_index()` function. For this, we need to set the `axis` parameter to 1.


In [None]:
df.sort_index(axis=1).head(5)

The columns above have been sorted in ascending alphabetical order. By setting `ascending=False`, the sorting can be done in descending order also.
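
A minimal sketch of both column orders, using a hypothetical one-row dataframe whose column names are borrowed from the dataset:

```python
import pandas as pd

# Hypothetical toy data; the columns are deliberately out of alphabetical order.
toy = pd.DataFrame({"stars": [1], "forks": [2], "issues": [3]})

# axis=1 sorts the column labels rather than the row labels.
alphabetical = toy.sort_index(axis=1)
reverse_alphabetical = toy.sort_index(axis=1, ascending=False)

print(alphabetical.columns.tolist())          # ['forks', 'issues', 'stars']
print(reverse_alphabetical.columns.tolist())  # ['stars', 'issues', 'forks']
```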

## 8. Performing operations in-place

By setting the `inplace` parameter to `True`, the sorting operation is done in place. This means that the existing dataframe gets modified and the method returns `None`. When `inplace=False` (the default), the operation takes place on a copy of the dataframe, which is then returned. The original dataframe remains unchanged.

In [None]:
df.sort_values(by='forks', inplace=True)  # returns None, so don't assign the result
df.head()
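
A common pitfall is assigning the result of an in-place sort to a variable. The following sketch, on hypothetical data, contrasts the two modes and shows what each call returns.

```python
import pandas as pd

# Hypothetical toy data.
toy = pd.DataFrame({"forks": [120, 45, 300]})

# Default (inplace=False): a sorted copy is returned, 'toy' is untouched.
sorted_copy = toy.sort_values(by="forks")
before = toy["forks"].tolist()

# inplace=True: 'toy' itself is reordered and the call returns None.
result = toy.sort_values(by="forks", inplace=True)
after = toy["forks"].tolist()

print(before)  # [120, 45, 300]
print(after)   # [45, 120, 300]
print(result)  # None
```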

## 9. Handling missing values

Data usually contains null values. By setting the `na_position` parameter of `sort_values()` to `'first'` or `'last'`, we can choose to put NaNs at the beginning or at the end. The default is `'last'`.




In [None]:
df.sort_values(by='forks', na_position='first') #NaN placed first  
df.sort_values(by='forks', na_position='last') #NaN placed in the end
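
The dataset may not contain missing values in this column, so here is a small sketch with an explicitly inserted NaN. The rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in 'forks'.
toy = pd.DataFrame({
    "repo_name": ["alpha", "beta", "gamma"],
    "forks": [120, np.nan, 45],
})

nan_first = toy.sort_values(by="forks", na_position="first")
nan_last = toy.sort_values(by="forks", na_position="last")

print(nan_first["repo_name"].tolist())  # ['beta', 'gamma', 'alpha']
print(nan_last["repo_name"].tolist())   # ['gamma', 'alpha', 'beta']
```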

## 10. Apply the key function to the values before sorting

We can also apply a key function to the values before sorting. The function expects a `Series` and returns a `Series` with the same shape as the input. It is applied to each column in `by` independently. In the example below, we first convert the `Project Name` column (`repo_name`) to lowercase and then sort the dataframe on this column.

In [None]:
df.sort_values(by='repo_name',key=lambda col: col.str.lower())[:5]
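
To see why the key function matters, this sketch compares a plain string sort with a case-insensitive one. The repository names are hypothetical examples.

```python
import pandas as pd

# Hypothetical toy data mixing upper- and lowercase names.
toy = pd.DataFrame({"repo_name": ["vue", "React", "angular", "Bootstrap"]})

# Plain sort: uppercase letters order before lowercase ones.
plain = toy.sort_values(by="repo_name")

# The key lowercases values for comparison only; the stored values are unchanged.
case_insensitive = toy.sort_values(by="repo_name", key=lambda col: col.str.lower())

print(plain["repo_name"].tolist())             # ['Bootstrap', 'React', 'angular', 'vue']
print(case_insensitive["repo_name"].tolist())  # ['angular', 'Bootstrap', 'React', 'vue']
```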

### Conclusion and additional resources

In this article we looked at the different ways of sorting a dataframe using the pandas library. We looked at the usage of both sort_values() as well as the sort_index() functions along with their parameters. [The official documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) is an excellent resource if you are thinking of going deeper into the details.