Sorting data is an important step in data analysis as it helps to organize and structure the information for easier interpretation and decision-making. Whether we're working with small datasets or large ones, sorting allows us to arrange data in a meaningful way.

Pandas provides the sort_values() method, which sorts a DataFrame by one or more columns in either ascending or descending order and, by default, returns a new DataFrame rather than changing the original.

### 1. Sorting a DataFrame by a Single Column
The sort_values() method in Pandas makes it easy to sort our DataFrame by a single column. It sorts in ascending order by default; passing ascending=False sorts in descending order instead.

In [1]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Age')
print(sorted_df)

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80


In [2]:
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)

      Name  Age  Score
3    David   40     80
2  Charlie   35     95
1      Bob   30     90
0    Alice   25     85


#### Parameters of `sort_values()`

| Parameter      | Description                                                                 |
|----------------|-----------------------------------------------------------------------------|
| `by`           | Specifies the column(s) to sort by.                                         |
| `ascending`    | A boolean, or a list of booleans with one entry per column in `by`. `True` (default) sorts ascending, `False` descending. |
| `inplace`      | If `True`, modifies the original DataFrame; otherwise returns a new sorted one. |
| `na_position`  | Controls where `NaN` values are placed: `'first'` (top) or `'last'` (default). |
| `ignore_index` | If `True`, resets the index after sorting.                                  |
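
The parameters in the table can be tried out on the same example data. The sketch below is only an illustration: it sorts by Score (a column choice made here, not in the cells above) to show how the index, `ignore_index` and `inplace` behave.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Score": [85, 90, 95, 80],
})

# Default behaviour: a new DataFrame is returned and the original row labels are kept.
by_score = df.sort_values(by="Score")
print(by_score.index.tolist())        # [3, 0, 1, 2]

# ignore_index=True relabels the sorted rows 0..n-1.
by_score_reset = df.sort_values(by="Score", ignore_index=True)
print(by_score_reset.index.tolist())  # [0, 1, 2, 3]

# inplace=True sorts the DataFrame itself and returns None.
df_copy = df.copy()
result = df_copy.sort_values(by="Score", ascending=False, inplace=True)
print(result)                         # None
print(df_copy["Name"].tolist())       # ['Charlie', 'Bob', 'Alice', 'David']
```

Because `inplace=True` returns `None`, assigning its result back to a variable is a common mistake; the non-inplace form is usually easier to chain.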


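### 2. Sorting a DataFrame by Multiple Columns
When sorting by multiple columns, we pass a list of column names to by. Pandas sorts by the first column and uses each later column only to break ties in the columns before it, for example sorting by Age and then by Score for rows that share the same Age. In the example below every Age is unique, so Score never has to break a tie and the result matches sorting by Age alone.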

In [3]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by=['Age', 'Score'])
print(sorted_df)

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80
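
To see the second column take effect, the data needs repeated values in the first column. The following sketch uses a hypothetical `df_ties` frame with duplicate ages and passes a list to `ascending` so that each column gets its own direction.

```python
import pandas as pd

# Hypothetical data with repeated ages so that Score has ties to break.
df_ties = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88],
})

# Age ascending; within the same Age, Score descending.
sorted_ties = df_ties.sort_values(by=["Age", "Score"], ascending=[True, False])
print(sorted_ties["Name"].tolist())   # ['Bob', 'David', 'Charlie', 'Eve', 'Alice']
```

The list given to `ascending` must have the same length as the list given to `by`.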


### 3. Sorting a DataFrame with Missing Values
In real-world datasets, missing values (NaN) are common. By default, sort_values() places NaN values at the end, whether we sort in ascending or descending order. If we need them at the top, we can set na_position='first'.

In [4]:
import pandas as pd
data_with_nan = {"Name": ["Alice", "Bob", "Charlie", "David"],
                 "Age": [28, 22, None, 22]}
df_nan = pd.DataFrame(data_with_nan)

sorted_df = df_nan.sort_values(by="Age", na_position="first")
print(sorted_df)

      Name   Age
2  Charlie   NaN
1      Bob  22.0
3    David  22.0
0    Alice  28.0
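
The position of NaN is controlled separately from the sort direction. As a small sketch, assuming a variant of the data above with distinct ages (so that tie order does not matter), a descending sort still leaves NaN last unless `na_position` says otherwise.

```python
import pandas as pd

# Variant of the example data with distinct ages (an assumption made for clarity).
df_nan = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [28, 22, None, 25],
})

# Descending sort: NaN stays at the end under the default na_position="last".
desc_last = df_nan.sort_values(by="Age", ascending=False)
print(desc_last["Name"].tolist())    # ['Alice', 'David', 'Bob', 'Charlie']

# Descending sort with NaN moved to the top.
desc_first = df_nan.sort_values(by="Age", ascending=False, na_position="first")
print(desc_first["Name"].tolist())   # ['Charlie', 'Alice', 'David', 'Bob']
```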


### 4. Sorting by Index
In addition to sorting by column values, we may also want to sort a DataFrame by its index. The sort_index() method does this, sorting the index in ascending order by default; passing ascending=False reverses the order. The example DataFrame already has its index in ascending order, so the first call below returns the rows unchanged.

In [6]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)

df_sorted_by_index = df.sort_index()
print(df_sorted_by_index)

      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80


In [7]:
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)
df_sorted_by_index_desc = df.sort_index(ascending=False)
print(df_sorted_by_index_desc)

      Name  Age  Score
3    David   40     80
2  Charlie   35     95
1      Bob   30     90
0    Alice   25     85
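
The effect of sort_index() is easier to see when the index starts out of order. The sketch below uses a hypothetical `df_unordered` frame with shuffled row labels, and also shows `axis=1`, which sorts the column labels instead.

```python
import pandas as pd

# Hypothetical frame whose row labels are deliberately out of order.
df_unordered = pd.DataFrame(
    {"Name": ["Charlie", "Alice", "David", "Bob"], "Age": [35, 25, 40, 30]},
    index=[2, 0, 3, 1],
)

# Rows are rearranged so that the index reads 0, 1, 2, 3.
by_index = df_unordered.sort_index()
print(by_index["Name"].tolist())      # ['Alice', 'Bob', 'Charlie', 'David']

# axis=1 sorts the column labels instead of the row labels.
by_columns = df_unordered.sort_index(axis=1)
print(by_columns.columns.tolist())    # ['Age', 'Name']
```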


### 5. Choosing a Sorting Algorithm
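Pandas lets us choose the sorting algorithm with the kind parameter: 'quicksort' (the default), 'mergesort', 'heapsort' or 'stable'. For a DataFrame, kind only takes effect when sorting by a single column or label. Only 'mergesort' and 'stable' are stable, meaning rows with equal sort keys keep their original relative order.

#### 1. QuickSort (kind='quicksort')
QuickSort is an efficient divide-and-conquer sorting algorithm. It selects a "pivot" element and partitions the data into two parts: one with elements smaller than the pivot and the other with elements greater than the pivot. Each part is then sorted the same way. It is not a stable sort.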

In [8]:
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Age', kind='quicksort')
print(sorted_df)

      Name  Age  Score
1      Bob   22     90
3    David   22     80
2  Charlie   25     95
0    Alice   28     85
4      Eve   28     88


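#### 2. MergeSort (kind='mergesort')
Merge sort splits the data in half, sorts each half and merges the two sorted halves. It is stable, so rows with the same Age keep their original relative order.

In [9]: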
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Age', kind='mergesort')
print(sorted_df)

      Name  Age  Score
1      Bob   22     90
3    David   22     80
2  Charlie   25     95
0    Alice   28     85
4      Eve   28     88


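#### 3. HeapSort (kind='heapsort')
Heap sort builds a binary heap from the data and repeatedly extracts the largest remaining element. It is not stable, so rows with equal keys can change relative order, as Alice and Eve do in the output below.

In [10]: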
import pandas as pd

data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Age', kind='heapsort')
print(sorted_df)

      Name  Age  Score
1      Bob   22     90
3    David   22     80
2  Charlie   25     95
4      Eve   28     88
0    Alice   28     85
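
Stability matters most when sorts are chained. As a rough sketch (the two-step approach is an illustration, not something the cells above do), we can sort by Score first and then apply a stable sort by Age; the Score order is then preserved inside each Age group.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88],
})

# Step 1: order rows by Score, highest first.
by_score = df.sort_values(by="Score", ascending=False)

# Step 2: a stable sort on Age keeps the Score order inside each Age group.
stable = by_score.sort_values(by="Age", kind="mergesort")
print(stable["Name"].tolist())   # ['Bob', 'David', 'Charlie', 'Eve', 'Alice']
```

With an unstable algorithm in step 2, the order of Bob and David, or of Eve and Alice, would not be guaranteed.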


### 6. Applying Custom Sorting Logic
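We can also apply custom sorting logic using the key parameter. The key function receives the whole column as a Series and must return a Series of the same length, which Pandas then uses for the comparison. This is useful when we need to sort strings in a specific way, such as ignoring case.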

In [11]:
import pandas as pd
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)

sorted_df = df.sort_values(by='Name', key=lambda col: col.str.lower())
print(sorted_df)

      Name  Age  Score
0    Alice   28     85
1      Bob   22     90
2  Charlie   25     95
3    David   22     80
4      Eve   28     88
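
Since every name above starts with a capital letter, the key makes no visible difference there. The sketch below uses a hypothetical `df_mixed` frame with inconsistent capitalisation to compare the default ordering with the case-insensitive one.

```python
import pandas as pd

# Hypothetical names with inconsistent capitalisation.
df_mixed = pd.DataFrame({"Name": ["bob", "Alice", "david", "Charlie"]})

# Default ordering: uppercase letters sort before lowercase ones.
default_order = df_mixed.sort_values(by="Name")
print(default_order["Name"].tolist())      # ['Alice', 'Charlie', 'bob', 'david']

# The key lowercases the column for comparison only; the stored values are unchanged.
case_insensitive = df_mixed.sort_values(by="Name", key=lambda col: col.str.lower())
print(case_insensitive["Name"].tolist())   # ['Alice', 'bob', 'Charlie', 'david']
```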


The Pandas filter() method subsets the rows or columns of a DataFrame based on their labels. It is useful when we need to select data by exact labels, partial string matches or regular expression patterns. Because it looks only at the index and column labels, never at the values stored in the DataFrame, it is a quick way to focus on specific parts of our data.

In [12]:
import pandas as pd

df = pd.read_csv("nba.csv")

df

              Name            Team  Number  Position   Age  Height  Weight            College     Salary
0    Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
1      Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
2     John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
3      R.J. Hunter  Boston Celtics    28.0        SG  22.0     6-5   185.0      Georgia State  1148640.0
4    Jonas Jerebko  Boston Celtics     8.0        PF  29.0    6-10   231.0                NaN  5000000.0
..             ...             ...     ...       ...   ...     ...     ...                ...        ...
453   Shelvin Mack       Utah Jazz     8.0        PG  26.0     6-3   203.0             Butler  2433333.0
454      Raul Neto       Utah Jazz    25.0        PG  24.0     6-1   179.0                NaN   900000.0
455   Tibor Pleiss       Utah Jazz    21.0         C  26.0     7-3   256.0                NaN  2900000.0
456    Jeff Withey       Utah Jazz    24.0         C  26.0     7-0   231.0             Kansas   947276.0


In [13]:
df_filtered_columns = df.filter(items=['Name', 'Team', 'Salary'])
print(df_filtered_columns)

              Name            Team     Salary
0    Avery Bradley  Boston Celtics  7730337.0
1      Jae Crowder  Boston Celtics  6796117.0
2     John Holland  Boston Celtics        NaN
3      R.J. Hunter  Boston Celtics  1148640.0
4    Jonas Jerebko  Boston Celtics  5000000.0
..             ...             ...        ...
453   Shelvin Mack       Utah Jazz  2433333.0
454      Raul Neto       Utah Jazz   900000.0
455   Tibor Pleiss       Utah Jazz  2900000.0
456    Jeff Withey       Utah Jazz   947276.0
457            NaN             NaN        NaN

[458 rows x 3 columns]


# DataFrame.filter() in Pandas  

**Syntax:**  
`DataFrame.filter(items=None, like=None, regex=None, axis=None)`  

---

## Parameters  

| Parameter | Description                                                                 |
|-----------|-----------------------------------------------------------------------------|
| `items`   | A list of labels to keep. Only the specified labels will be retained.       |
| `like`    | A string to match labels that contain this substring.                       |
| `regex`   | A regular expression pattern to match labels.                               |
| `axis`    | Specifies whether to filter rows (`axis=0`) or columns (`axis=1`). By default, it operates on columns. |

---

## Return  
Returns an object of the same type as the input: a **DataFrame** or a **Series**.
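
The three matching modes can be compared side by side on a small frame. This is only a sketch: `players` is a hypothetical stand-in that reuses a few column names and values from the NBA data so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for the NBA data, so the example runs without nba.csv.
players = pd.DataFrame({
    "Name": ["Avery Bradley", "Jae Crowder"],
    "Team": ["Boston Celtics", "Boston Celtics"],
    "Age": [25.0, 25.0],
    "Salary": [7730337.0, 6796117.0],
})

# items: keep exactly these column labels.
by_items = players.filter(items=["Name", "Salary"])
print(by_items.columns.tolist())   # ['Name', 'Salary']

# like: keep labels that contain the substring (case-sensitive).
by_like = players.filter(like="am")
print(by_like.columns.tolist())    # ['Name', 'Team']

# regex: keep labels where the pattern is found; ^ anchors it to the start.
by_regex = players.filter(regex="^[AS]")
print(by_regex.columns.tolist())   # ['Age', 'Salary']
```

Only one of `items`, `like` and `regex` can be passed in a single call; combining them raises a `TypeError`.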


In [14]:
df_filtered_like = df.filter(like='a', axis=1)
print(df_filtered_like)

              Name            Team     Salary
0    Avery Bradley  Boston Celtics  7730337.0
1      Jae Crowder  Boston Celtics  6796117.0
2     John Holland  Boston Celtics        NaN
3      R.J. Hunter  Boston Celtics  1148640.0
4    Jonas Jerebko  Boston Celtics  5000000.0
..             ...             ...        ...
453   Shelvin Mack       Utah Jazz  2433333.0
454      Raul Neto       Utah Jazz   900000.0
455   Tibor Pleiss       Utah Jazz  2900000.0
456    Jeff Withey       Utah Jazz   947276.0
457            NaN             NaN        NaN

[458 rows x 3 columns]


In [18]:
df_filtered_regex = df.filter(regex='[sS]', axis=1)
df_filtered_regex.head()

   Position     Salary
0        PG  7730337.0
1        SF  6796117.0
2        SG        NaN
3        SG  1148640.0
4        PF  5000000.0


In [22]:
df_filtered_rows = df.filter(items=[0, 1, 2], axis=0)
df_filtered_rows

            Name            Team  Number  Position   Age  Height  Weight            College     Salary
0  Avery Bradley  Boston Celtics     0.0        PG  25.0     6-2   180.0              Texas  7730337.0
1    Jae Crowder  Boston Celtics    99.0        SF  25.0     6-6   235.0          Marquette  6796117.0
2   John Holland  Boston Celtics    30.0        SG  27.0     6-5   205.0  Boston University        NaN
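
Row filtering becomes more useful when the index holds meaningful labels. The sketch below assumes a hypothetical `players` frame indexed by player name, and contrasts label matching with a boolean mask, which is what we would use to select rows by their values.

```python
import pandas as pd

# Hypothetical frame indexed by player name (the NBA data above uses a numeric index).
players = pd.DataFrame(
    {"Team": ["Boston Celtics", "Boston Celtics", "Utah Jazz"],
     "Age": [25.0, 25.0, 26.0]},
    index=["Avery Bradley", "Jae Crowder", "Shelvin Mack"],
)

# With axis=0, like and regex are matched against the row labels.
by_label = players.filter(like="Mack", axis=0)
print(by_label.index.tolist())   # ['Shelvin Mack']

# filter() never inspects cell values; a boolean mask does that job.
by_value = players[players["Team"] == "Utah Jazz"]
print(by_value.index.tolist())   # ['Shelvin Mack']
```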
